
TensorFlow-Based Approach to Identify Author of the Text

Kajal Patel, Dr. Brijesh S. Bhatt

Abstract


A large amount of content is now available on the Internet, and people upload information online in the form of opinions, reviews, descriptions, recipes, and so on. To trace the authenticity of such data, an author identification system is needed. Author identification is the process of identifying the author of an unknown text document, and the problem has become harder as the volume of anonymous text has grown with the rapid growth of Internet use. Many authorship attribution methods have been evaluated for natural languages such as English, Chinese, and Arabic, and most experiments rely on lexical and word-based features. Classification techniques such as Naïve Bayes [1], Support Vector Machine, Neural Network, Multilayer Perceptron, Decision Tree [2], and k-nearest-neighbor have already been used, and previous work has shown that the Support Vector Machine [3] is a good classifier for author identification. In this work we identify authors using deep learning. For identification we use features such as bag-of-words, word tokenization, and stemming. We build a deep neural network and an LSTM network using the TensorFlow library [4]. The average accuracy achieved is 68% with the TensorFlow deep neural network and 38.68% with the TensorFlow LSTM network.
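
The feature-extraction step named in the abstract (word tokenization, stemming, bag-of-words) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. The function names, the regular-expression tokenizer, and the small suffix list are assumptions made for the example, and a real system would more likely use an established stemmer and feed the resulting vectors to a TensorFlow model.

```python
import re
from collections import Counter

# Hypothetical suffix list for illustration only; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_vocabulary(documents):
    """Map each distinct stem in the corpus to a column index (sorted order)."""
    stems = {stem(tok) for doc in documents for tok in tokenize(doc)}
    return {s: i for i, s in enumerate(sorted(stems))}


def bag_of_words(text, vocabulary):
    """Return a count vector over the vocabulary; unseen stems are ignored."""
    vector = [0] * len(vocabulary)
    counts = Counter(stem(tok) for tok in tokenize(text))
    for s, count in counts.items():
        if s in vocabulary:
            vector[vocabulary[s]] = count
    return vector


if __name__ == "__main__":
    # Toy corpus, invented for the example.
    corpus = ["The cat walked home.", "Cats are walking."]
    vocab = build_vocabulary(corpus)
    print(vocab)
    print(bag_of_words("Cats walked and walked", vocab))
```

Each document becomes a fixed-length count vector, which is the kind of input a feed-forward classifier expects. An LSTM would instead typically consume the token sequence itself, so the word order is preserved.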

Cite this Article

Kajal Patel, Brijesh S. Bhatt. TensorFlow-based Approach to Identify Author of the Text. Current Trends in Information Technology. 2018; 8(3): 23–29p.


Keywords


Stylometric features, machine learning classifiers, LSTM (Long Short-Term Memory) network, lexical features, function words/stop words, deep learning, bag-of-words, support vector machine

References


Sara E Manar E Bouanani, Ismail Kassou. Authorship Analysis Studies: A Survey. International Journal of Computer Applications. 2014; 86(12).

Ahmed M Mohsen, Nagwa M El-Makky, Nagia Ghanem. Author Identification using Deep Learning. IEEE International Conference on Machine Learning and Applications, 2016.

Rachel M Green, John W. Sheppard. Comparing Frequency- and Style-based Features for Twitter Author Identification. Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference, 2013.

Stamatatos E, Fakotakis N, Kokkinakis G. Computer-Based Authorship Attribution Without Lexical Measures. Computers and the Humanities. 2001; 193–214p.

Diederich Joachim, Kindermann Jörg, Leopold Edda, Paass Gerhard. Authorship attribution with Support Vector Machines. Applied Intelligence. 2003; 109–123p.

Zhao Y, Zobel J. Effective and scalable authorship attribution using function words. Proceedings of the 2nd Asia Information Retrieval Symposium, 2005.

Chen H, Huang Z, Li J, Zheng R. A Framework for Authorship Identification of Online Messages: Writing-Style Features and Classification Techniques. JASIST. 2006; 378–393p.

Sean Stanko, Devin Lu, Irving Hsu. Whose Book is it Anyway? Using Machine Learning to Identify the Author of Unknown Texts. Computer Science Department, Stanford University, Stanford, CA 94305; 2013.

Frederick Mosteller, David Wallace. Inference and Disputed Authorship: The Federalist. 1964.



Copyright (c) 2018 Current Trends in Information Technology

  • eISSN: 2249-4707
  • ISSN: 2348-7895