What is Hadoop | Tutorial on Hadoop basics
This Hadoop tutorial page covers what Hadoop is and introduces the Hadoop basics.
Definition of Hadoop:
Hadoop is an open-source software framework that supports data-intensive distributed applications. It is licensed under the Apache License 2.0.
Hadoop answers the question: "How to process big data at reasonable cost and in reasonable time?"
Refer to What is Big Data and its advantages and disadvantages>>.
In other words, Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data. Its main components are:
• MapReduce - an offline (batch) computing engine for processing data in parallel
• HDFS - the Hadoop Distributed File System, which stores data in blocks across the cluster
• HBase (pre-alpha) - online (random, real-time) data access on top of HDFS
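To make the MapReduce component concrete, here is a minimal word-count sketch in plain Java that mimics the map, shuffle and reduce phases. The class and method names are illustrative only; a real Hadoop job would instead extend the Mapper and Reducer classes from the Hadoop API.

```java
import java.util.*;
import java.util.stream.*;

// Hadoop-free sketch of the MapReduce word-count pattern.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phase: group pairs by key and sum the values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data needs big tools",
                                     "hadoop handles big data");
        List<Map.Entry<String, Integer>> mapped = lines.stream()
                .flatMap(l -> map(l).stream())
                .collect(Collectors.toList());
        System.out.println(reduce(mapped));
        // prints {big=3, data=2, hadoop=1, handles=1, needs=1, tools=1}
    }
}
```

In a real cluster the map calls run in parallel on many machines and the shuffle moves pairs over the network, but the logical flow is the same.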
Hadoop is in use at many organizations that handle big data, such as Yahoo, Amazon, Facebook and Netflix. Three common applications of Hadoop are search (grouping related documents), advertisement, and security (searching for uncommon patterns).
Requirements of Hadoop
Following are the goals or requirements of Hadoop:
➨High scalability and availability
➨Abstract and facilitate the storage and processing of large and/or rapidly growing data sets (both structured and unstructured).
➨Use inexpensive commodity hardware; individual machines provide little redundancy, so reliability comes from replication in software.
➨Move computation to the data rather than moving data to the computation.
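The last goal, moving computation rather than data, can be sketched with a toy example (illustrative names, not Hadoop code): each data block is summed where it "lives", and only the small partial results cross the network, never the raw records.

```java
import java.util.*;

// Sketch of "move computation, not data": process each block locally,
// ship only the small per-block result.
public class DataLocalitySketch {

    // Simulated blocks, as if stored on three different nodes.
    static List<int[]> blocks = List.of(
            new int[]{1, 2, 3},
            new int[]{4, 5},
            new int[]{6, 7, 8, 9});

    // The computation "shipped" to each node: a local sum over its block.
    static long localSum(int[] block) {
        long s = 0;
        for (int v : block) s += v;
        return s;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int[] block : blocks) {
            total += localSum(block); // only one number per node travels
        }
        System.out.println(total); // prints 45
    }
}
```

Shipping a few bytes of code to each node and returning one number per node is far cheaper than pulling gigabytes of raw blocks to a central machine.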
Hadoop Architecture Framework
The figure depicts the Hadoop architecture framework.
Following are the salient features of the Hadoop framework:
➨Worker nodes of the cluster are where most of the computational power and storage of the system lies.
➨Worker nodes run a TaskTracker to accept and reply to MapReduce tasks, and a DataNode to store needed blocks as close to the computation as possible.
➨Distributed, with some centralization.
➨Hadoop itself is written in Java; applications can also be written in other languages such as Ruby and Python (e.g. via Hadoop Streaming).
➨The central control node runs the NameNode, which keeps track of HDFS directories and files, and the JobTracker, which dispatches compute tasks to the TaskTrackers.
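The master/worker split above can be modeled with a toy in-memory lookup (hypothetical names, not Hadoop's actual API): the NameNode holds only metadata mapping files to blocks and blocks to DataNodes, and a client uses that metadata to fetch each block directly from the node that stores it.

```java
import java.util.*;

// Toy model of the HDFS metadata lookup done by a client read.
public class HdfsLookupSketch {

    // "NameNode" metadata: file -> ordered block IDs, block ID -> hosting node.
    static Map<String, List<String>> fileToBlocks = Map.of(
            "/logs/2024.txt", List.of("blk_1", "blk_2"));
    static Map<String, String> blockToNode = Map.of(
            "blk_1", "datanode-a",
            "blk_2", "datanode-b");

    // Client-side read: one metadata round trip, then per-block locations.
    static List<String> locate(String path) {
        List<String> locations = new ArrayList<>();
        for (String blk : fileToBlocks.getOrDefault(path, List.of())) {
            locations.add(blockToNode.get(blk) + ":" + blk);
        }
        return locations;
    }

    public static void main(String[] args) {
        System.out.println(locate("/logs/2024.txt"));
        // prints [datanode-a:blk_1, datanode-b:blk_2]
    }
}
```

The key design point this mirrors is that the central node never touches file contents; it only answers "where are the blocks?", which keeps the master lightweight while the DataNodes serve the heavy traffic.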