
Efficient Information Retrieval based on Name and Aliases from the Web using Anchor Text Mining

Likhitha S. R., Veerappa B. N., Rafi M.

Abstract


Information Retrieval (IR) plays a central role in search engines and in day-to-day activities. Its major task is to search for and retrieve complete information from the web effectively and efficiently, which makes it an enabling technology for realizing the web's full potential. The major challenge is to retrieve the relevant natural-language text documents. Search engines typically answer a query with low precision, retrieving many useless web pages and missing some important ones. The proposed system uses Natural Language Processing (NLP) to extract words from anchor links. The web search engine expands the search query by tagging it with aliases of the queried name so that complete information is obtained, thereby improving the performance of the search engine. NLP includes phrase extraction and named entity recognition, and it provides advanced text analytics tools for extracting words from sentences. In our project, the word extraction module therefore uses NLP to extract the associated words. The association of words is determined by statistical clustering. Classification is the act of placing cases into categories based on attributes, or into clusters based on best fit. In this step we perform parsing and chunking to extract the noun phrases, which are then used as key phrases to rank the documents. Organizing search results into clusters lets users browse them quickly. Our method orders the aliases by their association with the name, as defined by the anchor texts that link them, and represents the result as a graph to improve readability.
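
The alias ordering and query expansion described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that anchor texts pointing to the same URL as the name are candidate aliases, and it replaces the statistical clustering mentioned in the abstract with a plain co-occurrence count. The function names `rank_aliases` and `expand_query`, the `top_k` parameter, the `OR` query syntax, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def rank_aliases(name, anchors):
    """Rank candidate aliases of `name` by anchor-text co-occurrence.

    `anchors` is an iterable of (anchor_text, target_url) pairs.
    Assumption: an anchor text that points to the same URL as `name`
    is a candidate alias; its score is the number of such shared URLs.
    """
    name = name.strip().lower()
    texts_by_url = defaultdict(set)
    for text, url in anchors:
        texts_by_url[url].add(text.strip().lower())

    scores = Counter()
    for texts in texts_by_url.values():
        if name in texts:
            for text in texts - {name}:
                scores[text] += 1

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def expand_query(name, ranked_aliases, top_k=2):
    """Build an expanded query from the name and its top_k aliases."""
    terms = [name] + [alias for alias, _ in ranked_aliases[:top_k]]
    return " OR ".join(f'"{term}"' for term in terms)


# Toy data (hypothetical): three pages and the anchor texts that link to them.
anchors = [
    ("Jane Doe", "http://example.com/a"),
    ("JD", "http://example.com/a"),
    ("JD", "http://example.com/b"),
    ("Jane Doe", "http://example.com/b"),
    ("J. Doe", "http://example.com/b"),
    ("site news", "http://example.com/c"),
]

ranked = rank_aliases("Jane Doe", anchors)
print(ranked)
print(expand_query("Jane Doe", ranked))
```

The ranked (alias, score) pairs could also serve as weighted edges from the name to each alias in a graph representation of the kind the abstract mentions.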


Keywords: Anchor text mining, Google anchor links, word extraction, natural language processing (NLP), graphical representation




Copyright (c) 2019 Journal of Computer Technology & Applications