What is Big Data | Tutorial on Big Data Basics

Definition of Big Data:
As per Gartner (2012): "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."

In other words, Big Data is the name given to the large volumes of data that are being collected and warehoused from sources such as:
• Web Data, E-commerce
• Purchases at department or grocery stores
• Bank/Credit Card Transactions
• Social Networks

How Much Data?

• Google processes 20 PB (Petabyte) a day (Statistics: year 2008)
• 1 PB = 10^15 bytes = 1 million gigabytes = 1 thousand terabytes
• Facebook has 2.5 PB of user data + 15 TB/day (Statistics: April 2009)
• eBay has 6.5 PB of user data + 50 TB/day (Statistics: May 2009)
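
To make these magnitudes easier to compare, the unit arithmetic can be sketched in a few lines of Python. This is only an illustration using the decimal (SI) convention 1 PB = 10^15 bytes; the helper `days_to_double` is a hypothetical name, and it assumes the daily growth rate stays constant, which the figures above do not claim.

```python
# Decimal (SI) unit sizes in bytes, following 1 PB = 10^15 bytes.
GB = 10**9
TB = 10**12
PB = 10**15


def days_to_double(stored_bytes: int, daily_growth_bytes: int) -> float:
    """Days until the stored volume doubles, ASSUMING constant daily growth."""
    return stored_bytes / daily_growth_bytes


print(PB // GB)        # gigabytes in one petabyte -> 1000000
print(PB // TB)        # terabytes in one petabyte -> 1000
print(20 * PB // TB)   # 20 PB/day expressed in terabytes -> 20000

# 2.5 PB stored, growing by 15 TB/day (April 2009 figures quoted above)
print(round(days_to_double(2_500 * TB, 15 * TB), 1))  # -> 166.7

# 6.5 PB stored, growing by 50 TB/day (May 2009 figures quoted above)
print(round(days_to_double(6_500 * TB, 50 * TB), 1))  # -> 130.0
```

Under that constant-growth assumption, the 2009 figures imply that both data stores would roughly double in size within four to six months.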

The Three Vs of Big Data

Big Data Vectors

➨high-volume: the amount of data
➨high-velocity: the rate at which data is collected, acquired, generated, or processed
➨high-variety: different data types such as text, audio, video, image data, XML, relational data (e.g. tables, transactions, legacy data), graph data (semantic web, social networks), and streaming data (data you can scan only once)
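
The "scan only once" constraint on streaming data can be illustrated with a small single-pass computation. The sketch below is not tied to any particular streaming system; `running_stats` and `sensor_readings` are hypothetical names, and the readings are made-up values. It keeps only a few running variables instead of storing the stream.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, maximum) after ONE pass over the stream."""
    count = 0
    mean = 0.0
    maximum = float("-inf")
    for value in stream:
        count += 1
        # Incremental mean: adjust the old mean by the new value's share.
        mean += (value - mean) / count
        maximum = max(maximum, value)
    return count, mean, maximum


def sensor_readings() -> Iterator[float]:
    """Hypothetical stream; a generator can be consumed only once."""
    for value in (3.0, 5.0, 4.0, 8.0):
        yield value


print(running_stats(sensor_readings()))  # -> (4, 5.0, 8.0)
```

Because the function never revisits earlier values, its memory use stays constant no matter how long the stream runs.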

What has been done with these data

• Aggregation and Statistics
-Data warehouse and OLAP
• Indexing, Searching, and Querying
-Keyword-based search
-Pattern matching (XML/RDF)
• Knowledge discovery
-Data Mining
-Statistical Modeling
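
Two of the items above, aggregation and keyword-based search, can be sketched at toy scale. The records, documents, and function names below are hypothetical and chosen only for illustration; real data warehouses and search engines apply the same ideas to far larger volumes with specialised storage and indexing.

```python
from collections import defaultdict

# Hypothetical purchase records: (store, category, amount)
purchases = [
    ("north", "grocery", 12.50),
    ("north", "clothing", 40.00),
    ("south", "grocery", 7.25),
    ("south", "grocery", 3.75),
]


def total_by(records, key_index):
    """Aggregation: sum the amount column, grouped by one other column."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


def build_index(documents):
    """Indexing: map each lowercase word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Keyword search: ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical document collection
documents = {
    1: "big data basics",
    2: "hadoop for big data",
    3: "data mining tutorial",
}
index = build_index(documents)

print(total_by(purchases, 0))      # totals per store
print(total_by(purchases, 1))      # totals per category
print(search(index, "big data"))   # documents containing both words
```

Grouping by a different column, as in the two `total_by` calls, is a very small version of viewing the same data along different dimensions, which is the idea behind OLAP.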

Hadoop was developed to meet the growing demands of Big Data. Refer to: What is Hadoop and its advantages and disadvantages

