
A Novel Approach of Tesseract-OCR Usage for Newspaper Article Images

Chaitanya Tejaswi, Bhargav Goradiya, Ripal Patel

Abstract


A novel approach to optical character recognition of newspaper article images captured with a smartphone camera is presented. The approach is evaluated on two sets of images, both captured with the same camera under varying lighting conditions.

Cite this Article

Chaitanya Tejaswi, Bhargav Goradiya, Ripal Patel. A Novel Approach of Tesseract-OCR Usage for Newspaper Article Images. Journal of Computer Technology & Applications. 2018; 9(3): 24–29p.


Keywords


Optical Character Recognition (OCR), Binarization, Tesseract-OCR, OpenCV






Copyright (c) 2018 Journal of Computer Technology & Applications