What is Data Profiling | Data Profiling Basics
This page covers the definition of data profiling, the classification of data profiling tasks, and the use cases and challenges of data profiling.
Data Profiling definition
Data profiling is the process of examining a large data source and collecting informative summaries (statistics and metadata) that describe it in a much smaller form.
Data Profiling Tasks
➨It examines the data available in an existing data source and collects statistics and information about that data.
➨It condenses large volumes of data into a much smaller set of informative summaries.
➨It collects metadata to support data management.
➨It produces information about individual columns and about column sets (combinations of columns).
Figure 1 depicts the various data profiling tasks.
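As an illustration of single-column profiling, here is a minimal Python sketch that reduces a column of values to a small summary. The function name `profile_column` and the choice of statistics are illustrative assumptions, not a standard API. Missing values are assumed to be represented as `None`.

```python
from collections import Counter
from typing import Any, Optional


def profile_column(values: list[Optional[Any]]) -> dict[str, Any]:
    """Collect basic single-column statistics; None is treated as a missing value."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)           # value -> number of occurrences
    most_common = counts.most_common(1)  # [(value, count)] or [] if column is empty
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_frequent": most_common[0][0] if most_common else None,
    }


ages = [34, 28, None, 34, 51, 28, 34]
print(profile_column(ages))
# {'rows': 7, 'nulls': 1, 'distinct': 3, 'min': 28, 'max': 51, 'most_frequent': 34}
```

The summary has a handful of entries regardless of how many rows the column holds. That size reduction is what the definition above refers to.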
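Profiling column sets goes beyond single columns. One example is finding combinations of columns whose values uniquely identify every row (candidate keys). The following is a brute-force sketch under simplifying assumptions. Rows are Python dicts, `None` is compared like any other value, and the function names are hypothetical. Practical profilers use far more efficient pruning strategies.

```python
from itertools import combinations


def is_unique(rows: list[dict], cols: tuple[str, ...]) -> bool:
    """True if no two rows share the same values on the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_unique_column_sets(rows: list[dict], columns: list[str]) -> list[tuple[str, ...]]:
    """Brute-force search for minimal unique column combinations."""
    found: list[tuple[str, ...]] = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # A superset of an already-unique set is unique but not minimal, so skip it.
            if any(set(f) <= set(combo) for f in found):
                continue
            if is_unique(rows, combo):
                found.append(combo)
    return found


people = [
    {"first": "Ann", "last": "Lee", "city": "Oslo"},
    {"first": "Ann", "last": "Kim", "city": "Rome"},
    {"first": "Bob", "last": "Lee", "city": "Rome"},
]
print(minimal_unique_column_sets(people, ["first", "last", "city"]))
# [('first', 'last'), ('first', 'city'), ('last', 'city')]
```

No single column is unique here, but every pair of columns is. This also shows why the number of column combinations quickly becomes a cost factor.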
Data Profiling Use cases
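The following are common use cases of data profiling.
➨Query optimization: profiling produces counts (such as the number of rows and distinct values) and histograms, which a query optimizer uses to estimate the cost of query plans.
➨Data cleansing: profiling detects duplicate records and rule violations so that they can be removed or corrected.
➨Data integration: profiling discovers inclusion dependencies across databases, which point to columns that can be used to link the data.
➨Scientific data management: profiling helps in understanding newly created or acquired datasets.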
➨Data analytics and data mining
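For the query-optimization use case, a histogram lets an optimizer estimate how many rows a predicate will match without scanning the table. The sketch below builds a simple equi-width histogram and estimates the row count for `value < x`. It assumes values are spread uniformly inside each bucket. The function names are hypothetical, and database systems use their own, often more elaborate, histogram types.

```python
def equi_width_histogram(values: list[float], buckets: int) -> tuple[list[float], list[int]]:
    """Return bucket boundaries and per-bucket counts for an equi-width histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1.0  # avoid zero width when all values are equal
    counts = [0] * buckets
    for v in values:
        # Values equal to the maximum are placed in the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(buckets + 1)]
    return edges, counts


def estimate_less_than(edges: list[float], counts: list[int], x: float) -> float:
    """Estimate the number of rows with value < x, assuming uniformity within each bucket."""
    total = 0.0
    for i, c in enumerate(counts):
        left, right = edges[i], edges[i + 1]
        if x >= right:
            total += c  # whole bucket lies below x
        elif x > left:
            total += c * (x - left) / (right - left)  # partial bucket
    return total


edges, counts = equi_width_histogram([1, 2, 2, 3, 5, 8, 9, 10], 3)
print(edges, counts)                             # [1.0, 4.0, 7.0, 10.0] [4, 1, 3]
print(estimate_less_than(edges, counts, 5.5))    # 4.5
```

The true count of values below 5.5 is 5, so the estimate of 4.5 is approximate. That trade-off is typical: a few numbers replace a full scan.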
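For the data-cleansing use case, a simple form of duplicate detection normalizes each record and groups identical results. The normalization rule used here is a hypothetical choice: trim whitespace and lower-case strings. Real cleansing tools typically add fuzzy matching, which this sketch does not attempt.

```python
from collections import defaultdict


def normalize(record: dict) -> tuple:
    """Hypothetical normalization: trim whitespace and lower-case string fields."""
    return tuple(
        (k, v.strip().lower() if isinstance(v, str) else v)
        for k, v in sorted(record.items())
    )


def find_duplicates(records: list[dict]) -> list[list[int]]:
    """Group row indexes whose normalized records are identical."""
    groups: dict[tuple, list[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[normalize(rec)].append(i)
    return [idx for idx in groups.values() if len(idx) > 1]


customers = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": " ann lee", "email": "ANN@example.com"},
    {"name": "Bob Kim", "email": "bob@example.com"},
]
print(find_duplicates(customers))  # [[0, 1]]
```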
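For the data-integration use case, a unary inclusion dependency A ⊆ B holds when every value of column A also appears in column B. The sketch below checks all column pairs between two tables, which are assumed to come from different databases. Tables are represented as hypothetical dicts that map column names to value lists, and `None` values are ignored.

```python
def unary_inclusion_dependencies(left: dict[str, list], right: dict[str, list]) -> list[tuple[str, str]]:
    """Return pairs (A, B) where every non-null value of left column A occurs in right column B."""
    right_sets = {col: {v for v in vals if v is not None} for col, vals in right.items()}
    found = []
    for a, vals in left.items():
        a_set = {v for v in vals if v is not None}
        for b, b_set in right_sets.items():
            # An all-null column is skipped: it would be trivially included everywhere.
            if a_set and a_set <= b_set:
                found.append((a, b))
    return found


order_table = {"customer_id": [1, 2, 2, 3], "qty": [5, 1, 1, 2]}              # database 1
customer_table = {"id": [1, 2, 3, 4], "zip": [10115, 20095, 80331, 50667]}  # database 2
print(unary_inclusion_dependencies(order_table, customer_table))  # [('customer_id', 'id')]
```

A discovered dependency is only a candidate link. Columns of small integers can satisfy an inclusion dependency by coincidence, so results usually need review before they are used as join keys.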
Data Profiling Challenges
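The following are the challenges involved in data profiling.
➨Scale: a large number of rows makes operations such as sorting and hashing expensive, and a large number of columns leads to a very large number of column combinations.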
➨Large space requirements.
➨New data types (beyond strings/numbers) and data models (beyond relational).
➨New requirements, such as user-oriented profiling and the profiling of streaming data.
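The growth in column combinations can be made concrete. A relation with n columns has one non-empty column combination for every non-empty subset of its columns. A multi-column profiling task that examines all of them must therefore consider:

$$
\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1
$$

For example, a table with only 20 columns already has 2^20 - 1 = 1,048,575 column combinations, which is why practical algorithms rely on pruning.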
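For the streaming requirement, a profiler generally cannot store or re-sort the full data. Instead, it can update its statistics as each value arrives. The sketch below does this with Welford's online algorithm for mean and variance. The class name and fields are hypothetical. Exact distinct counts are not covered, because streams usually need approximate structures such as HyperLogLog for that.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamingProfile:
    """Incrementally maintained column statistics; values are never stored."""
    count: int = 0
    nulls: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean (Welford)
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, value: Optional[float]) -> None:
        """Fold one incoming value into the statistics."""
        if value is None:
            self.nulls += 1
            return
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two non-null values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


profile = StreamingProfile()
for v in [2, 4, None, 4, 4, 5, 5, 7, 9]:
    profile.update(v)
print(profile.count, profile.nulls, profile.mean, profile.minimum, profile.maximum)
```

Each update takes constant time and memory. This property suits unbounded streams, where sorting or hashing the whole column is not possible.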