Big data as a whole faces two main challenges: how to store and manage data efficiently, and how to extract meaningful and accurate information within a specific time span. On closer inspection, these break down into several more specific challenges: dealing with heterogeneous and incomplete data, scaling of big data, the time taken to analyse data, and privacy of data.
- Dealing with heterogeneous and incomplete data: The same data is often held at different sources with inconsistent information, so analysing it requires combining information from those sources (Li et al., 2014). This requires efficient analysis tools that can combine the information, and it also requires extra storage to hold the data in an identical size and structure. The information may also include unimportant or nuisance values, known as errors, which must be cleaned. A further challenge is missing information. Missing values are treated as NULL values, and they can reduce the accuracy of the analysis (Chu, 2014).
- Scaling of big data: As the name suggests, the data is big in size, and it keeps growing at a great pace. Managing this data efficiently has been a challenge for the last few decades. The difficulty persists because data volume is scaling faster than the resources available to process it, while CPU speeds remain static (Chu, 2014).
- Time taken to analyse the data: The third challenge faced by big data is time. The time taken for analysis depends directly on the size of the dataset: the larger the dataset, the more time is required to analyse it (Meeker & Hong, 2014). However, a business or IT firm needs to spend as little time as possible on analysis in order to make the maximum profit from the results. Another reason analysis takes so long is that the data has to be sorted repeatedly, and a sort cannot begin from the middle of the dataset. Each time, the sort has to start from either the beginning or the end of the dataset (Chu, 2014).
- Privacy: Another concern is privacy. Transaction-related data in particular requires more privacy than other data, because a breach can expose personal data or account information (Meeker & Hong, 2014).
The challenges discussed above are some of the most common ones faced during the collection of data. They can be minimised by choosing appropriate tools according to the demands of the data that businesses and IT companies receive.
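To make the first challenge more concrete, here is a minimal Python sketch of combining records from two sources that use different field names and contain missing (NULL) values. Everything in it is an assumption made for illustration: the sample records, the field names, and the helper functions `normalise_billing`, `merge_sources` and `missing_fields` are hypothetical and do not come from any of the cited works or tools.

```python
# Hypothetical records for the same customers held by two different sources.
# All field names and values are invented for illustration.
crm_records = [
    {"customer_id": 1, "name": "Ana", "email": "ana@example.com"},
    {"customer_id": 2, "name": "Ben", "email": None},
]
billing_records = [
    {"cust": 2, "mail": "ben@example.com", "balance": 40.0},
    {"cust": 3, "mail": None, "balance": None},
]

EXPECTED_FIELDS = ["customer_id", "name", "email", "balance"]


def normalise_billing(record):
    """Map the billing source's field names onto the CRM's field names."""
    return {
        "customer_id": record["cust"],
        "email": record["mail"],
        "balance": record["balance"],
    }


def merge_sources(primary, secondary):
    """Combine records by customer_id, filling gaps in one source from the other."""
    merged = {}
    for record in list(primary) + list(secondary):
        target = merged.setdefault(record["customer_id"], {})
        for field, value in record.items():
            # Keep the first non-missing value seen for each field.
            if target.get(field) is None:
                target[field] = value
    return merged


def missing_fields(record, expected):
    """List the expected fields that are absent or NULL (None) in a record."""
    return [field for field in expected if record.get(field) is None]


merged = merge_sources(crm_records, [normalise_billing(r) for r in billing_records])
report = {cid: missing_fields(rec, EXPECTED_FIELDS) for cid, rec in merged.items()}

for cid, gaps in report.items():
    print(cid, merged[cid], "missing:", gaps)
```

In this sketch, customer 2 ends up complete because each source supplies what the other lacks, while customers 1 and 3 still carry NULL fields that an analysis would have to handle. On real data volumes the same idea would normally be expressed in a distributed tool rather than in-memory dictionaries.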
Big Data tools: Many tools for taming big data are currently available on the market. Wayner (2012) claims that Hadoop is the most popular tool used to organise racks of servers, and that NoSQL databases are used to store data on these racks. However, Loshin (2016) states that there are several other tools for developers that use NoSQL databases to store data on racks. Some of them are: IBM Big Data Analytics, HP Big Data, SAP Big Data Analytics, Microsoft Big Data (Azure), Oracle Big Data Analytics, Talend Open Studio, Amazon Web Services, and many more. A big data platform is selected according to the needs and cost requirements of the business or IT company.
Every tool has drawbacks, which vary from one tool to another. Some of these drawbacks will be discussed in upcoming blog posts.