What is Data Profiling | Data Profiling Basics

This page covers the definition of data profiling, the classification of data profiling tasks, and the use cases and challenges of data profiling.

Data Profiling definition

Data profiling is the process of examining the data in an existing source and collecting statistics and informative summaries about it. The result is a much smaller body of information (metadata) that describes the larger dataset.

Figure-1: Data profiling tasks

Data Profiling Tasks

➨It examines the data available in an existing data source and collects statistics and information about that data.
➨It condenses a large dataset into a smaller, informative summary.
➨It collects metadata in order to support data management.
➨It yields information about individual columns and about column sets.
Figure-1 depicts the various data profiling tasks.
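
The single-column statistics mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a complete profiler: the function name `profile_column`, the choice of statistics and the sample `ages` column are assumptions made for the example, and `None` is assumed to mark a missing value.

```python
from collections import Counter

def profile_column(values):
    """Collect basic single-column statistics; None is treated as a missing value."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "rows": len(values),                      # total number of rows
        "nulls": len(values) - len(present),      # missing values
        "distinct": len(counts),                  # number of distinct values
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        # (value, frequency) of the most frequent value
        "most_common": counts.most_common(1)[0] if counts else None,
    }

# Hypothetical sample column
ages = [34, 27, None, 34, 51, 27, 34]
summary = profile_column(ages)
print(summary)
```

The returned dictionary is the kind of small, informative metadata that profiling produces from a larger column of raw values.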

Data Profiling Use cases

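The following are the use cases of data profiling.
➨Query optimization: Profiling supplies the counts and histograms that a query optimizer uses for its estimates.
➨Data cleansing: Profiling reveals patterns, duplicates and rule violations so that they can be removed.
➨Data integration: Profiling finds inclusion dependencies across databases, which show how their tables relate.
➨Scientific data management: Profiling helps in handling new, unfamiliar datasets.
➨Data analytics and data mining: Profiling gives a first overview of the data before deeper analysis.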
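
Two of the use cases above, histograms for query optimization and inclusion dependencies for data integration, can be sketched as follows. The function names, the equi-width bucketing scheme and the sample columns are assumptions chosen for illustration; real database systems use their own histogram types and dependency discovery algorithms.

```python
def equi_width_histogram(values, buckets):
    """Count how many values fall into each of `buckets` equally wide ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1  # fall back to 1 when all values are equal
    counts = [0] * buckets
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts

def inclusion_dependency_holds(dependent, referenced):
    """Return True if every non-null value in `dependent` also occurs in `referenced`."""
    referenced_values = set(referenced)
    return all(v in referenced_values for v in dependent if v is not None)

# Hypothetical numeric column for the histogram
histogram = equi_width_histogram([1, 2, 2, 3, 8, 9, 10], buckets=3)

# Hypothetical key columns taken from two different databases
orders_customer_id = [101, 102, 101, 104]
customers_id = [101, 102, 103, 104]
forward = inclusion_dependency_holds(orders_customer_id, customers_id)
reverse = inclusion_dependency_holds(customers_id, orders_customer_id)

print(histogram, forward, reverse)
```

An inclusion dependency that holds in one direction only, as in this sample, suggests that the dependent column may act as a foreign key into the referenced column.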

Data Profiling Challenges

The following are the challenges involved in data profiling.
➨Computational complexity: The cost grows with the number of rows (sorting, hashing) and with the number of columns and column combinations.
➨Large space requirements.
➨New data types (beyond strings and numbers) and new data models (beyond relational).
➨New requirements: user-oriented profiling, streaming data, etc.
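
The computational complexity challenge can be made concrete with a small sketch. A table with n columns has 2^n - 1 non-empty column combinations, and checking each one requires a pass over all rows. The code below is a naive illustration under assumed names (`candidate_column_sets`, `unique_column_combinations`) and made-up sample rows; it is not an efficient discovery algorithm.

```python
from itertools import combinations

def candidate_column_sets(num_columns):
    """Number of non-empty column combinations a profiler may need to inspect."""
    return 2 ** num_columns - 1

def unique_column_combinations(rows, columns):
    """Naively find minimal column sets whose values identify every row uniquely."""
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(range(len(columns)), size):
            # Skip supersets of an already found unique combination (not minimal).
            if any(set(u) <= set(combo) for u in found):
                continue
            # Hash the projected rows; a combination is unique if no two rows collide.
            projected = {tuple(row[i] for i in combo) for row in rows}
            if len(projected) == len(rows):
                found.append(combo)
    return [tuple(columns[i] for i in combo) for combo in found]

# Hypothetical sample table
columns = ["first", "last", "city"]
rows = [
    ("Ann", "Lee", "Oslo"),
    ("Ann", "Kim", "Oslo"),
    ("Bob", "Lee", "Rome"),
]

uniques = unique_column_combinations(rows, columns)
print(candidate_column_sets(3), candidate_column_sets(20), uniques)
```

With 3 columns there are only 7 candidate combinations, but with 20 columns there are already more than a million, each needing a hashing pass over every row.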
