Katz and Rabkin (2013) state that the Hadoop ecosystem consists of four basic modules together with a number of additional projects. These basic modules are described below:
i) Hadoop Common: This module contains the common utilities that provide support to the other Hadoop modules.
ii) Hadoop Distributed File System: The Hadoop Distributed File System, commonly known as HDFS, stores and manages very large amounts of data (datasets) and provides sequential read/write operations.
iii) Hadoop YARN: Hadoop YARN schedules jobs and manages the resources in a cluster.
iv) Hadoop MapReduce: MapReduce provides parallel processing to the system. This is achieved by dividing a job into many map tasks and then merging the intermediate values, through reduce tasks, into a single result.
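The map/reduce division of work described in point iv) can be illustrated with a minimal, single-machine sketch in Python. This is an illustrative simulation of the programming model only, not the Hadoop API: the map phase emits key-value pairs, and the reduce phase merges the values for each key into a single result.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: merge the values of each key into a single count."""
    counts = defaultdict(int)
    for word, value in pairs:
        counts[word] += value
    return dict(counts)

docs = ["big data", "big cluster"]
result = reduce_phase(map_phase(docs))
print(result)  # {'big': 2, 'data': 1, 'cluster': 1}
```

In Hadoop itself, the map tasks run in parallel across the cluster and the framework shuffles the intermediate pairs to the reduce tasks; the sketch above only shows the logical dataflow.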
However, according to Bhandarkar (2013), there are several additional projects that work with Hadoop, including HBase, Hive, Pig, Sqoop and Flume. These projects are discussed below:
i) HBase: HBase is a non-relational database system that runs on top of HDFS. Its function is to read and write data stored in HDFS.
ii) Hive: Hive provides data warehousing on Hadoop through a SQL-like query language.
iii) Pig: Pig provides a high-level query language that is commonly used for data analysis.
iv) Spark: Spark provides faster analysis of large datasets. Moreover, Spark uses its own framework for data processing instead of MapReduce.
v) Sqoop: Sqoop, developed by Cloudera, transfers data between relational databases and Hadoop.
vi) Flume: The task of Flume is to collect live (streaming) data from a variety of sources.
vii) ZooKeeper: ZooKeeper provides coordination among all the software components used in the Hadoop ecosystem (Bhandarkar, 2013).
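To give a flavour of the SQL-like aggregation queries that Hive (point ii) runs over data in Hadoop, the following Python sketch uses the built-in sqlite3 module as a stand-in engine. The table and column names are hypothetical; Hive itself would execute HiveQL over data in HDFS, not over SQLite.

```python
import sqlite3

# In-memory database standing in for a hypothetical Hive-managed table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10.0), ("south", 5.0), ("north", 7.5)])

# A Hive-style aggregation: total sales per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('north', 17.5), ('south', 5.0)]
```

The point of Hive is that a declarative query of this kind is compiled into jobs that run on the cluster, so analysts can query large datasets without writing MapReduce programs by hand.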
The figure below depicts the Hadoop ecosystem.