Advantages of Hadoop | Disadvantages of Hadoop
This page covers the advantages (benefits) and disadvantages (drawbacks) of Hadoop.
What is Big Data?
• "Big data" is similar to small data but bigger. The word "Big" in big data refers not just to data volume. It also refers to the fast rate at which data originates, its complex format and its origination from a variety of sources. The three V's of big data are Volume, Velocity and Variety.
• Big data poses many challenges, including data capture, curation, storage, search, sharing, transfer, analysis and presentation.
• Traditional computing techniques are not sufficient to meet these challenges. As a result, Hadoop was developed. Refer to What is Big Data and its advantages and disadvantages.
What is Hadoop?
Google originally developed a programming model known as "MapReduce", which divides a big task into smaller parts. These small tasks are then assigned to many computers, and the processed results are integrated to form the resultant dataset.
Building on the solution published by Google, Doug Cutting and team developed an open source project known as "Hadoop". Hadoop applies "MapReduce (MR)" algorithms, in which the data is split into parts that are processed in parallel. This is shown in figure-1.
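The divide, assign and integrate steps described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are assumptions made for this example, and everything runs in a single process, whereas Hadoop would run each chunk on a different node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_outputs):
    """Shuffle: group all emitted values by key across every mapper's output."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different machine.
    mapped = [map_phase(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input, already split into two chunks.
chunks = ["big data is big", "hadoop handles big data"]
counts = word_count(chunks)
print(counts)  # {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'handles': 1}
```

Because each call to `map_phase` depends only on its own chunk, the map step can be spread over as many machines as there are chunks. Only the shuffle step needs to bring data from different machines together.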
• Hadoop is an Apache open source framework written in Java which allows distributed processing of large datasets across clusters of computers using simple programming models.
• It is used to develop applications that perform complete statistical analysis on huge amounts of data.
• Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
At its core, the Hadoop architecture consists of two layers:
• MapReduce as Processing/Computation layer
• Hadoop Distributed File System (HDFS) as Storage layer
Benefits or advantages of Hadoop
Hadoop addresses the big data challenges described above.
Following are the benefits or advantages of Hadoop:
➨More storage and computing power can be achieved by adding more nodes to the Hadoop cluster. The nodes can be commodity servers, which avoids the need to buy specialized high-end hardware. Hence it is a cheaper solution.
➨It can handle unstructured and semi-structured data as well as structured data.
➨Hadoop clusters provide storage and distributed computing all in one.
➨The Hadoop framework has the built-in power and flexibility to process datasets that were earlier too large for a single machine to handle.
➨The HDFS layer in Hadoop has self-healing, replication and fault tolerance characteristics. It automatically re-replicates data if a server or disk crashes.
➨Hadoop offers scalability, reliability and plenty of libraries for various applications at lower cost.
➨It distributes data across different servers and runs computation on the nodes that hold the data, which helps prevent network overloading.
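The self-healing behaviour of HDFS mentioned above can be illustrated with a toy simulation. This is a simplified sketch under stated assumptions, not HDFS code: the node and block names are hypothetical, replicas are placed at random (real HDFS uses a rack-aware placement policy), and the only detail taken from HDFS is the default replication factor of 3.

```python
import random

REPLICATION_FACTOR = 3  # HDFS default; configurable per cluster

def place_blocks(blocks, nodes, rng):
    """Store each block on REPLICATION_FACTOR distinct, randomly chosen nodes."""
    return {block: set(rng.sample(nodes, REPLICATION_FACTOR)) for block in blocks}

def handle_node_failure(placement, failed, live_nodes, rng):
    """Drop the failed node's replicas and copy each affected block to a new node."""
    for holders in placement.values():
        if failed in holders:
            holders.discard(failed)
            # Only nodes that do not already hold this block are candidates.
            candidates = [node for node in live_nodes if node not in holders]
            holders.add(rng.choice(candidates))

rng = random.Random(0)  # fixed seed so the run is repeatable
nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
placement = place_blocks(["blk_a", "blk_b", "blk_c"], nodes, rng)

failed = "node2"
live_nodes = [node for node in nodes if node != failed]
handle_node_failure(placement, failed, live_nodes, rng)

for block, holders in sorted(placement.items()):
    print(block, sorted(holders))
```

After the simulated failure every block is again held by three live nodes, which is the property that lets the cluster keep serving data when a server or disk is lost.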
Drawbacks or disadvantages of Hadoop
Following are the drawbacks or disadvantages of Hadoop:
➨It is not suitable for small data sets or for real-time data applications.
➨Join operations across multiple data sets are complex.
➨Early Hadoop versions did not have storage or network level encryption. Later releases support both, but they are not enabled by default and must be configured.
➨Cluster management is hard, i.e. operations such as debugging, distributing software and collecting logs are difficult in a cluster.
➨Operating the cluster with a single master makes scaling difficult.
➨The programming model is very restrictive, as every computation must be expressed as map and reduce steps.
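To show why joins are awkward in the map and reduce model, the following sketch imitates a reduce-side join in plain Python. It is an illustration under assumptions, not Hadoop code: the `users` and `orders` data sets, the tags "U" and "O", and all function names are invented for this example.

```python
from collections import defaultdict

# Two hypothetical data sets that share a user_id key.
users = [(1, "alice"), (2, "bob")]                              # (user_id, name)
orders = [(1, "book"), (1, "pen"), (2, "lamp"), (3, "desk")]    # (user_id, item)

def map_users(record):
    """Map: emit the join key and tag the value with its source data set."""
    user_id, name = record
    return user_id, ("U", name)

def map_orders(record):
    user_id, item = record
    return user_id, ("O", item)

def reduce_join(user_id, tagged_values):
    """Reduce: separate the values by tag, then pair every name with every item."""
    names = [value for tag, value in tagged_values if tag == "U"]
    items = [value for tag, value in tagged_values if tag == "O"]
    return [(user_id, name, item) for name in names for item in items]

# Shuffle: group the tagged values from both data sets by join key.
groups = defaultdict(list)
mapped = [map_users(r) for r in users] + [map_orders(r) for r in orders]
for key, value in mapped:
    groups[key].append(value)

joined = [row for key in sorted(groups) for row in reduce_join(key, groups[key])]
print(joined)  # [(1, 'alice', 'book'), (1, 'alice', 'pen'), (2, 'bob', 'lamp')]
```

A join that is a single statement in SQL needs source tagging, a shuffle on the join key and custom reducer logic here. The order for user 3 is dropped because no matching user record reaches its reducer, so this sketch behaves like an inner join.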