Fernandez, Verdejo and Teodora (2014) state that Big Data is characterised by "the three Vs", which are used to describe the evolution of data and the changes involved in collecting and storing information. These three Vs are Volume, Variety and Velocity. However, Sailaja (2014) claims that two further factors should also be considered: Variability and Complexity.
The amount, or volume, of data owned by small and large businesses, IT systems and individuals is growing rapidly. This growth can be measured using a quantitative approach, and the evaluation can take place in several ways: i) measuring the amount of data collected and stored, in the conventional way, in bytes; ii) counting the total number of records stored in databases; iii) counting the number of transactions being run; or iv) counting the total number of tables, files and other forms of data (Minelli, Chambers & Ambiga, 2013). Cattell (2010), however, wrote that analysis was traditionally performed on small data sets, known as "samples"; the data sets were small because of the limited volume of storage available. Modern businesses and IT systems use larger data sets, which increases the accuracy of the analysis.
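The four volume measures listed above can be sketched in code. The following is a minimal illustration using a hypothetical in-memory "database" (a mapping of table names to lists of records); all table names, fields and the byte approximation are invented for the example.

```python
# Illustrative sketch: summarising data volume by tables, records and bytes.
# The "database" structure and its contents are hypothetical examples.

def volume_metrics(database):
    """Summarise a dict of table_name -> list of record dicts."""
    total_tables = len(database)
    total_records = sum(len(rows) for rows in database.values())
    # Approximate stored size by the UTF-8 length of each record's text form.
    total_bytes = sum(
        len(repr(row).encode("utf-8"))
        for rows in database.values()
        for row in rows
    )
    return {
        "tables": total_tables,
        "records": total_records,
        "approx_bytes": total_bytes,
    }

sample_db = {
    "orders": [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 4.50}],
    "customers": [{"id": 1, "name": "Ada"}],
}
metrics = volume_metrics(sample_db)
```

In practice each of these counts would be taken from database catalogues or file-system metadata rather than computed in memory, but the measures themselves are the same.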
Moreover, what counts as "volume" differs between small and large companies. Small companies define data of between one terabyte and one petabyte as big data, whereas the same volume is not considered big for large companies such as Amazon, which routinely deal with data at the petabyte scale and beyond.
The graph in figure 2.4 shows that the volume of information has grown tremendously over time, whereas the storage available to save that information remains limited.
From the above discussion it can be seen that the volume of data generated by different sources is the main driver of big data analytics. However, the term "variety" also plays a vital role in Big Data. Variety concerns the different kinds of data that are generated (structured, semi-structured or unstructured), which together create massive volumes of data. Richard (2010) claims that traditional systems deal most often with structured data; because the current era also comprises semi-structured and unstructured data, the traditional method faces many limitations. These limitations are even greater when semi-structured or unstructured data need to be converted into structured data, making data warehousing much costlier (Cattell, 2010).
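The cost of forcing semi-structured data into a structured form can be illustrated with a short sketch. Here two JSON-like records with inconsistent fields are squeezed into one fixed schema; the records, field names and schema are all invented for the example. Note that the fixed schema silently drops any field it did not anticipate, which is one of the limitations mentioned above.

```python
# Illustrative sketch: converting semi-structured records into one
# structured schema. All records and field names are hypothetical.
import json

raw_records = [
    '{"name": "Ada", "email": "ada@example.com"}',
    '{"name": "Alan", "phone": "555-0100", "notes": "prefers post"}',
]

# A fixed structured schema loses any field it does not anticipate.
SCHEMA = ("name", "email", "phone")

def to_structured(raw):
    """Parse one JSON record and keep only the schema's fields."""
    doc = json.loads(raw)
    return {field: doc.get(field) for field in SCHEMA}

structured = [to_structured(r) for r in raw_records]
```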
Velocity defines the pace at which data is generated. Currently, data is being produced at great pace, and it is believed that this will continue to rise even more rapidly. Chu (2014) states that this sudden increase is caused by the generation of real-time data (such as location-based data). Storing and then analysing real-time data with the traditional approach is difficult, as this has to be done in limited time while maintaining proper accuracy. In contrast, live-stream data processing produces quick and accurate responses.
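The contrast between batch and stream processing can be shown with a toy example: a streaming computation updates its result once per arriving event instead of re-reading the whole history. The event values below are invented for illustration.

```python
# Toy stream-processing sketch: the running mean is updated incrementally,
# in O(1) work per event, rather than recomputed over all stored data.
# The event stream is a hypothetical example.

def running_mean(stream):
    """Yield the mean of the values seen so far, after each event."""
    total, count = 0.0, 0
    for value in stream:
        total += value
        count += 1
        yield total / count

events = [10, 20, 30, 40]
means = list(running_mean(events))  # one result per arriving event
```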
Figure 2.5 depicts the relationship between the value of information and time. According to the graph, Richard (2006) states that the longer it takes to generate a result, the lower the value of the information.
Sailaja (2014) defines the term variability as variation in the data, which can be analysed through sentiment analysis. Variability here refers to meaning that is constantly changing: the same data can mean different things at different times and in different places. To extract the exact meaning, proper sentiment analysis is needed (Repate & Yambem, 2015).
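A toy example makes the point that meaning depends on context. In the sketch below, which is not any published sentiment method, a word's polarity flips when it follows a negation, so the same word contributes a different meaning in different sentences; the lexicon and rule are invented for illustration.

```python
# Toy context-sensitive sentiment sketch: the word "great" scores +1 on
# its own but -1 after "not". The lexicon and negation rule are
# hypothetical, purely to illustrate meaning changing with context.

BASE_SENTIMENT = {"great": 1, "terrible": -1}

def score(tokens):
    """Score a tokenised sentence, flipping polarity after a negation."""
    total = 0
    negate = False
    for word in tokens:
        if word == "not":
            negate = True
            continue
        value = BASE_SENTIMENT.get(word, 0)
        total += -value if negate else value
        negate = False
    return total
```

Real sentiment analysis must handle far richer context (time, place, slang), but the principle is the same: the raw token alone does not fix the meaning.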
Richard (2006) writes that data is collated from many different sources and needs to be processed quickly, which increases the complexity of the work. According to Sailaja (2014), this processing involves a large number of tasks, such as linking, matching, cleaning and transforming the data into the required format.
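The tasks Sailaja (2014) lists can be sketched end to end on two tiny invented sources: values are cleaned, records from the two sources are linked on a matching key, and the result is transformed into one required format. The source names, fields and data are all hypothetical.

```python
# Illustrative sketch of cleaning, linking/matching and transforming
# records from two hypothetical sources into one required format.

crm_source = [{"Email": " ADA@EXAMPLE.COM ", "Name": "Ada"}]
billing_source = [{"email": "ada@example.com", "balance": "12.50"}]

def clean_email(value):
    """Cleaning: normalise whitespace and case so keys can match."""
    return value.strip().lower()

def link_and_transform(crm, billing):
    """Link billing records to CRM records and emit one unified format."""
    by_email = {clean_email(r["Email"]): r for r in crm}  # matching index
    merged = []
    for bill in billing:
        key = clean_email(bill["email"])
        person = by_email.get(key, {})
        merged.append({
            "email": key,
            "name": person.get("Name"),
            "balance": float(bill["balance"]),  # transform to required type
        })
    return merged

records = link_and_transform(crm_source, billing_source)
```

Even in this toy form, most of the code is cleaning and matching rather than analysis, which reflects why these preparatory tasks dominate the complexity of big data processing.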
All data generated in the current era possesses the five properties stated above, and the dataset considered in developing the artefact will exhibit them as well. These characteristics help in predicting the properties of the data, which can be very helpful when storing large volumes of data.