SQL Server 2012 supports Hadoop. Hadoop is not a relational database system, nor is it a replacement or substitute for SQL Server. Hadoop is designed mainly to deal with unstructured and semi-structured data. SQL Server can store such data as XML or as FILESTREAM data; however, it runs into size limits and needs a lot of processing power to do so. Hadoop came into existence less than a decade ago and quickly became popular in the field of big data. It was inspired by techniques that Google used to index textual information.
According to Facebook (2010), it ran one of the largest Hadoop clusters, storing more than 20 petabytes of data. Hadoop is written in Java and runs on large clusters of servers. Adding servers to a cluster and removing them is easy, which makes Hadoop scalable: the more servers there are, the more processing power is available. The developers of Hadoop are making it platform independent, which means that Hadoop can now also run on Windows.
Relational databases came into existence around 1970 through the work of Edgar F. Codd at IBM, not long before the launch of home computers. The concept was later adopted by Oracle, Informix and others. At approximately the same time, SQL was developed by Donald D. Chamberlin and Raymond F. Boyce. SQL became the standard language for querying and analysing the stored data. Hadoop, by contrast, came into existence towards the end of the first decade of the 21st century. Hadoop was first released by Apache and was later adopted by many open source providers such as Cloudera and Hortonworks. The differences between the two approaches are shown in the table below:
|1||Technology||RDBMS are databases used for storing data.||Hadoop is a framework used to handle large volumes of data.|
|2||Type of data used||RDBMS use structured data and provide simple storage and analysis of it. They are not used for semi-structured or unstructured data.||Hadoop uses semi-structured or unstructured data that comes from a variety of sources such as e-mail, videos, photos and social media posts. It can also join, aggregate and analyse such data easily.|
|3||Storage||Relational databases store data in tables, and the data is defined by a schema. These are static in nature.||Hadoop stores its data in the form of key-value pairs. These are dynamic in nature.|
|4||Scalability||RDBMS suit a constant workload. If scaling is required, more horsepower (CPU and RAM) is added to the single server that holds the dataset.||Hadoop is a good solution for companies and businesses whose data volume varies all the time. Hadoop needs more CPU and RAM in total than an RDBMS, but spreads the work over low-powered machines that run in parallel.|
|5||Querying||RDBMS use the SQL query language.||Hadoop uses MapReduce programs. Tools built on top of MapReduce, such as Hive, accept SQL-like commands.|
|6||Size of dataset||RDBMS typically handle gigabytes of data.||Hadoop is used for large datasets of up to a few petabytes.|
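To make the querying row of the table more concrete, the MapReduce model can be sketched in a few lines of plain Python. This is only an illustration that runs on one machine, not Hadoop's actual API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, and a real Hadoop job would distribute each phase across the servers of the cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) key-value pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse the list of values for each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in sequence (illustrative, single machine only)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Hadoop stores big data", "big data needs Hadoop"]
    print(word_count(sample))
```

In an RDBMS the same result would usually come from a single SQL statement that counts rows with GROUP BY; a tool such as Hive lets a similar SQL-like statement be written against data held in Hadoop.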
The RDBMS approach follows the ACID properties, where ACID is an acronym for Atomicity, Consistency, Isolation and Durability; this makes the RDBMS approach suitable for transactions. Hadoop, however, follows the BASE approach: Basically Available, Soft state and Eventually consistent. Apart from this, Hadoop is also shaped by the CAP theorem (Consistency, Availability, Partition tolerance), which NoSQL systems follow. MySQL, in contrast, is a relational database queried with SQL, and Hadoop has a tool called Hive which offers similar SQL-like querying.
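The atomicity part of ACID can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for any RDBMS; the `accounts` table and the helper names `make_db`, `transfer` and `balances` are assumptions made up for this example. The credit is applied first, so when the debit violates the balance check, the rollback visibly undoes a change that had already been made.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Fails the CHECK constraint if src cannot cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, "alice", "bob", 30), balances(db))
    print(transfer(db, "alice", "bob", 500), balances(db))
```

The second transfer fails as a whole: the credit to the destination account is rolled back together with the rejected debit, so the balances stay as they were after the first transfer.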