Classification of Big data:
Big data is classified into three categories: structured data, semi-structured data and unstructured data.
Structured data is defined as data that is stored in databases in the form of tables and columns. Minelli, Chambers & Ambiga (2013) state that this type of data has a relational key, which helps to map the data into pre-defined fields. Structured data is used very effectively in development because it is the simplest way to manage the required information. Jeremy (2010) claims that only 5-10 % of all data is structured in nature.
Semi-structured data is defined as data that is not stored in relational databases. However, Meeker & Hong (2014) state that this type of data has some pre-defined properties that are used to organise the data, making it easier to analyse. This type of data requires less space to store, provides more clarity and is easy to compute. Jeremy (2010) claims that semi-structured data currently accounts for 5-10 % of stored data.
Unstructured data is data that does not have any structure. It is mostly generated by two sources: machines and humans. This kind of data is found everywhere in large quantities and is growing at a fast pace. Jeremy (2010) claims that unstructured data accounts for the highest percentage of stored data, about 80 % of the total.
These different types of data can be explained through examples that are encountered often in daily routine. The examples are presented in tabular form in Table 2.3.
Table 2.3: Examples of structured, semi-structured and unstructured data
|Structured data|Semi-structured data|Unstructured data|
|An example of structured data is an SQL database.|Examples of semi-structured data are CSV files, NoSQL databases, e-mail etc. The artefact developed in this project will also make use of semi-structured data in the form of a CSV file.|Examples of unstructured data can be broadly classified under two categories: machine-generated data, such as satellite images, scientific data, surveillance videos and radar or sonar data, and human-generated data, such as social media data and mobile content.|
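The three categories in the table can be illustrated with a minimal Python sketch that uses only the standard library. The table name `customer`, the column names and the sample values are hypothetical and chosen only for illustration. CSV is treated as semi-structured here because that is how it is classified above.

```python
import csv
import io
import sqlite3

# Structured: every row must fit a schema that is declared up front.
# The "customer" table and its columns are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer (id, name, city) VALUES (1, 'Asha', 'Leeds')")
structured_row = conn.execute(
    "SELECT name, city FROM customer WHERE id = 1"
).fetchone()
conn.close()

# Semi-structured: the header line names the fields, but no database
# enforces types or keys; every value is read back as a string.
csv_text = "id,name,city\n2,Ben,York\n"
semi_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Unstructured: free text with no fields at all, so any analysis has to
# impose its own interpretation (here, a naive whitespace word count).
post = "Just moved to Hull, loving the new flat!"
word_count = len(post.split())

print(structured_row)
print(semi_rows[0])
print(word_count)
```

In the structured case the database rejects rows that do not fit the declared columns. In the semi-structured case the field names travel with the data itself. In the unstructured case no field names exist at all.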
Wiggins (2010) argues that data is broadly classified into two parts, i.e. structured and non-structured (which comprises semi-structured and unstructured data), whereas Meeker & Hong (2014) claim that data is of three types. The concept of Wiggins (2010) can be shown by the graph by Edelle (2016), which represents when structured and non-structured data started growing. The graph demonstrates that structured data came into existence before the 1980s, has grown gradually with time and is estimated to reach about 0.5 zettabytes by 2020. Unstructured and semi-structured data, however, came into existence in the early 2000s, have grown tremendously with time and are estimated to exceed 2 zettabytes by 2020.
The estimated figures of structured, semi-structured and unstructured data (Source: Edell, 2016)
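As a rough check on the 2020 estimates quoted above, the short calculation below works out the implied share of non-structured data. It is only a sketch: the two figures are taken from the text and not read off the graph, and "more than 2 zettabytes" is treated as exactly 2, so the resulting share should be read as an approximate lower bound.

```python
# Approximate 2020 estimates as quoted in the text (zettabytes).
structured_zb = 0.5      # "about 0.5 zettabytes"
non_structured_zb = 2.0  # assumption: "more than 2" taken as exactly 2

total_zb = structured_zb + non_structured_zb

# Share of all data that is semi-structured or unstructured.
non_structured_share = non_structured_zb / total_zb

# How many times larger the non-structured volume is.
ratio = non_structured_zb / structured_zb

print(f"Non-structured share: at least {non_structured_share:.0%}")
print(f"Non-structured to structured ratio: at least {ratio:.0f}:1")
```

Under these assumptions, non-structured data makes up at least about 80 % of the total, roughly four times the structured volume.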
The concept that Meeker & Hong (2014) followed can be demonstrated by the bar graph by Bhavna (2015) depicted in Fig. 2.3. This graph represents the amount of data contributed by sources such as unstructured data, databases (structured) and e-mail (semi-structured).
The amount of data in petabytes from 2008 to 2015. (Source: Bhavna, 2015)
From the above, it can be concluded that unstructured data is consistently larger in volume than structured data.