Wednesday, April 3, 2019
Technologies to Analyze Big Data
Hassan, Ruman Ul

Currently, most companies like Facebook, Google, and Amazon are generating extensive data, and this data is termed big data. In addition to the above-mentioned sources, there are numerous other sources like banking, airlines, the stock market, and digital media that generate big data. Nandimath, Patil, Banerjee, Kakade, and Vaidya (2013) state that the volume of data being generated daily is increasing rapidly and the size of this data is nearing zettabytes (p. 700). This data holds value that can help business organizations improve their business stability and increase their profit. However, big data creates the problem of storage and processing. Until about ten years ago, data was stored and processed in a traditional database management system, known as a Relational Database Management System (RDBMS). After the rise of big data, it became really difficult for the RDBMS to process this enormous data. Thus, many researchers focused their study on developing a technology that can effectively analyze big data. After extensive research, Google proposed the Google File System for storing big data and the MapReduce algorithm for processing it. Moreover, Nandimath et al. (2013) assert that Apache Hadoop is used for distributed processing of big data (p. 700). This framework aids many organizations in efficiently analyzing their big data. Besides Hadoop, the other technologies that help in analyzing big data are Pig, Hive, HBase, ZooKeeper, and Sqoop. Each tool has its own requirements, so the usage of these tools depends on the criticality of the data and the requirements of the organization or business.
However, the three major technologies to analyze big data are Hadoop, Hive, and Pig.

Hadoop is one of the major technologies to analyze big data. It is the framework developed by Apache for processing extensive data sets. This framework helps business firms to effectively process their unstructured data like video, audio, and images. In addition, it helps many business organizations improve their financial stability by effectively analyzing their data. Furthermore, the Hadoop framework consists of two main components: the Hadoop Distributed File System (HDFS) and the MapReduce programming paradigm. The function of HDFS is to store comprehensive datasets in a distributed environment. A distributed environment allows the developer to store large data sets on multiple machines. Thus, it helps in improving the retrieval process for immense data. In addition, Nandimath et al. (2013) state that Hadoop uses its own file system, HDFS, which facilitates fast transfer of data and can sustain node failure as a whole (p. 700). It also helps developers overcome the storage problem. For example, if immense data is stored on a single machine, its size creates a problem for processing and retrieval. If that data is distributed over multiple machines, processing and retrieval become easier for the developer. Besides fast processing and retrieval, reliability is also a benefit of HDFS. HDFS achieves high reliability by replicating the data on different machines. Therefore, if any machine fails in the distributed environment, the data of that particular machine can be easily recovered from the replicas. According to Dittrich and Ruiz (2012), the benefit of MapReduce is that developers need to define only single functions for the map and reduce tasks (p. 2014). This MapReduce paradigm helps developers overcome the problem of efficiently processing the data.
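The idea that a developer supplies only one map function and one reduce function can be sketched in plain Python. This is a toy, single-process illustration and not Hadoop's actual API: the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and the in-memory grouping only imitates what the framework would do across many machines.

```python
from collections import defaultdict

def map_fn(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1

def reduce_fn(key, values):
    """Reduce: combine all counts that were emitted for one word."""
    return key, sum(values)

def run_job(records, mapper, reducer):
    """Hypothetical driver standing in for the framework (assumption)."""
    groups = defaultdict(list)
    for record in records:                  # map phase over every record
        for key, value in mapper(record):
            groups[key].append(value)       # group emitted values by key
    # reduce phase: one reducer call per distinct key
    return dict(reducer(key, values) for key, values in groups.items())

posts = ["big data is big", "data is stored on HDFS"]
counts = run_job(posts, map_fn, reduce_fn)
print(counts)
```

Only `map_fn` and `reduce_fn` carry the problem-specific logic here; everything in `run_job` is the kind of work a framework would be expected to take over.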
Moreover, Nandimath et al. (2013) believe that the purpose of map is to divide the job into small parts and distribute them to different nodes, while the purpose of reduce is to generate the desired result (p. 701). For instance, if Facebook wants to analyze user interests, it will first deploy the generated data on HDFS, perform the map task to divide the zettabytes of data, and then perform the reduce task to get the desired result. Thus, Hadoop helps organizations efficiently analyze their extensive datasets.

Another technology to analyze big data is Hive. It is a data warehouse framework built upon Hadoop. It gives the developer the ability to structure and analyze the data. In Hadoop, the data processing task is performed using the Java programming language, whereas in Hive it is performed using Structured Query Language (SQL). In addition, Borkar, Carey, and Liu (2012) assert that Hive is SQL-inspired and reported to be used for over 90% of the Facebook MapReduce use cases (p. 2). Thus, the main goal of Hive is to process the data through a SQL-like interface. Moreover, the traditional SQL standards were restricting Hive from performing some intensive operations like extracting, transforming, and loading big data. As a result, Hive developed its own query language called Hive Query Language (HQL). Besides traditional SQL standards, HQL includes some Hive-specific extensions that make it easier for the developer to effectively analyze big data. Furthermore, Hive helps developers overcome the scalability issue by using the distributed file system mechanism. It also helps them achieve fast response times through HQL. For example, general SQL statements like SELECT and INSERT consume more time on a traditional database management system for big data, whereas in Hive the same operations can be performed efficiently.
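What a SQL-like interface looks like from the user's side can be shown with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in SQL engine: this is not Hive or HQL, it says nothing about performance on big data, and the `likes` table with its columns and rows is made up for illustration.

```python
import sqlite3

# Hypothetical sample data (assumption): one row per user preference.
rows = [
    ("asha", "India", "rock"),
    ("ben", "USA", "jazz"),
    ("chen", "China", "rock"),
    ("dev", "India", "jazz"),
    ("eva", "India", "rock"),
]

# In-memory database used only as a stand-in for a SQL-style engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (user_name TEXT, country TEXT, genre TEXT)")

# INSERT: load the rows with a parameterized statement.
conn.executemany("INSERT INTO likes VALUES (?, ?, ?)", rows)

# SELECT: one declarative statement replaces hand-written grouping code.
query = "SELECT genre, COUNT(*) FROM likes GROUP BY genre ORDER BY genre"
genre_counts = conn.execute(query).fetchall()
print(genre_counts)
conn.close()
```

The point of the sketch is the shape of the work: the user states what result is wanted, and the engine decides how to compute it.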
Moreover, Liu, Liu, Liu, and Li (2013) conclude that with precise system parameter tuning in Hive, an acceptable performance can be achieved (p. 45). This means that if the developer carefully tunes the system parameters for analyzing the data, performance can be improved for that task.

Besides Hadoop and Hive, Pig is also a major technology to analyze big data. Pig allows the developer to analyze and process enormous datasets quickly and easily through transformations. Its language is also called a dataflow language. The Pig framework is used along with HDFS and the MapReduce paradigm. Pig works similarly to Hive except for the query language: in Pig a task is performed using Pig Latin, whereas in Hive the task is performed using HQL. The main benefit of Pig is that Pig Latin queries can be integrated with other languages like Java, JRuby, and Python, and it also allows users to define their own functions to perform the task as per their needs. Moreover, as Pig is a dataflow language, it helps developers illustrate the data transformation process. For example, in Pig it is easier to perform data transformation operations like Split, Stream, and Group than in SQL. In addition, the Pig framework is divided into two parts: the Pig Latin language and the Pig interpreter. Pig Latin is a query language for processing big data. Lee, Lee, Choi, Chung, and Moon (2011) assert that in the Pig framework a task is processed using the Pig Latin language (p. 14). Pig Latin queries help developers process the data efficiently and quickly. The other component of the Pig framework is the Pig interpreter. The work of the interpreter is to check Pig Latin queries for bugs and to convert them into MapReduce jobs.
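The dataflow style, where data moves step by step through operations such as split and group, can be imitated in plain Python. This is only an analogy and not Pig Latin: the helper names `split` and `group` and the sample records are hypothetical, and nothing here is converted into MapReduce jobs.

```python
from collections import defaultdict

# Hypothetical sample records (assumption): one dict per user.
records = [
    {"user": "asha", "country": "India", "genre": "rock"},
    {"user": "ben", "country": "USA", "genre": "jazz"},
    {"user": "dev", "country": "India", "genre": "jazz"},
    {"user": "eva", "country": "India", "genre": "rock"},
]

def split(rows, predicate):
    """Route each row into one of two outputs based on a condition."""
    matched, rest = [], []
    for row in rows:
        (matched if predicate(row) else rest).append(row)
    return matched, rest

def group(rows, key):
    """Collect user names under each distinct value of the given field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["user"])
    return dict(groups)

# Each step feeds the next, so the transformation reads top to bottom.
india, elsewhere = split(records, lambda row: row["country"] == "India")
by_genre = group(india, "genre")
print(by_genre)
```

Each intermediate result has a name, which is what makes a dataflow description easy to follow and to debug one step at a time.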
For example, if a Facebook developer writes a Pig Latin query to find the people in India who like rock music, this query is first checked for bugs by the Pig interpreter and then converted into MapReduce jobs. Thus, with the help of Pig Latin queries, developers can avoid the stress of writing tedious Java code to perform the same action.

In conclusion, the three technologies to process big data are Hadoop, Hive, and Pig. These frameworks help business organizations find the value in their data. In addition, each technology approaches the task differently. For instance, Apache Hadoop is useful for analyzing offline data, but it cannot process real-time data like banking data. Moreover, Hive provides a SQL-like interface that makes processing a lot easier because the user does not have to write long, tedious code. Hive suits users who are less comfortable with programming but strong in SQL. Similarly, Pig also makes the processing task much easier for users: MapReduce jobs can be written as Pig Latin queries to get the desired results. Therefore, organizations should select the technology based on their data formats and requirements. Nevertheless, all these technologies help organizations process and store their data efficiently.