The digital revolution is the biggest thing in the early 2010’s, similar to manufacturing or industrial revolution in 1800’s. The digital revolution requires to go through evolution in data analytics to achieve the industry needs (Franks, 2014). Big Data is defined as “extremely large sets of data which can be analysed to obtain patterns, associations and trends that relates to human behaviour and interactions” (Arthur, 2013).
The increase of social media and usage of internet in day to day life results in data generation and millions of people using this would generate huge amount of data. This has resulted in huge increase of data in last few decades. There are many reasons behind this massive increase in data. Some reasons for this increase are from sources like Facebook, twitter, LinkedIn, YouTube etc. It is estimated that these sources also contributes in this increase. These data generated can be termed as “Big Data” (White, 2012).
A business can only withstand the current growing competitive market by understanding its customers and by analysing their behaviour patterns. This information about their customers can be in different forms like structured and semi-structured format and this needs to be analysed, usage of traditional approaches like databases and excel file would not be possible. So, to operate such datasets of various formats, huge size and analysing them in a short time requires some special tools like Hadoop.
Background: The word “Big data” can be used to define massive volumes of structured, semi-structured or un-structured data which are so huge in size that processing them by using traditional approach, becomes a challenge. In other words, these are large data which requires huge time to process .
Big data have three main properties i.e. Volume, variety and Velocity. However, there are other two dimensions which need to be considered, they are Variability and complexity (Sailaja, 2014).
This Big data can be analysed by many tools, one of such is Hadoop. The Hadoop provides feature of data warehousing and analytics using the concept of MapReduce and HDFS (Hadoop Distributed File System) (Shaw, 2013). The modern day business faces many challenges like finding an efficient way to store and manage the data and to extract valuable information from data in limited time frame. These challenges can be easily tackle by Apache Hadoop which is an open source framework (White, 2012). Apart from Apache Hadoop, the companies like Amazon Web Services, Cloudera, Hortonbox etc. have set the new icon of economy by providing the storage and analytics under the same roof.