Optimizing Hive Performance on Analyzing Log Data with Hadoop

L. K. Vishwamitra, Vijay Singh Pawar

Abstract


Web servers automatically create and maintain log files that record user behaviour, so analyzing these server logs can yield valuable information. Log files collect a variety of data about the requests made to a web server and act as a visitor sign-in sheet. They can show which pages get the most and the least traffic, which sites refer visitors to a site, and which pages visitors view. Heavy web usage generates a huge volume of log files, and processing that volume with an RDBMS becomes a problem. In this paper we propose Hadoop, an open-source solution for handling big data. For analysis we use Hive, a big data analytics tool that runs on top of Hadoop and supports an SQL-like query language called HiveQL. We also enhance the performance of Hive queries on log data.
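The questions listed in the abstract (most and least visited pages, top referring sites) reduce to simple group-and-count aggregations over parsed log lines. The sketch below illustrates those aggregations in plain Python on a few made-up log lines. It is not the authors' Hive implementation: the Apache "combined" log format, the `LOG_PATTERN` expression, the `summarize` helper, and the sample lines are all assumptions introduced here for illustration, and a HiveQL query would express the same counts as a `GROUP BY` over a table of log records.

```python
import re
from collections import Counter

# Assumption: logs follow the Apache "combined" format; the paper does not name one.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count hits per page and per referrer, skipping lines that do not parse."""
    pages, referrers = Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # ignore malformed entries instead of failing
        pages[match["path"]] += 1
        # "-" marks a request with no referrer in this log format
        if match["referrer"] not in ("", "-"):
            referrers[match["referrer"]] += 1
    return pages, referrers


# Hypothetical sample data, invented for this example.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2018:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Oct/2018:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [10/Oct/2018:13:57:12 +0000] "GET /about.html HTTP/1.1" 200 512 "http://example.com/" "Mozilla/5.0"',
    'malformed line',
]

pages, referrers = summarize(SAMPLE)
most_visited = pages.most_common(1)[0]
least_visited = min(pages.items(), key=lambda item: item[1])
top_referrer = referrers.most_common(1)[0]

print("Most visited page:", most_visited)
print("Least visited page:", least_visited)
print("Top referrer:", top_referrer)
```

A single-process script like this only works while the logs fit on one machine. The abstract's argument is that at web scale the same counts are better computed by Hive over data stored in Hadoop.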

Cite this Article

L.K. Vishwamitra, Vijay Singh Pawar. Optimizing Hive Performance on Analyzing Log Data with Hadoop. Journal of Communication Engineering & Systems. 2018; 8(3): 45–52p.


Keywords


Hadoop, web mining, log data, bigdata, pattern analysis

Copyright (c) 2018 Journal of Communication Engineering & Systems