https://github.com/alinajafi1998/query-expansion

Query Expansion via thesaurus

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary

Repository

Query Expansion via thesaurus

Basic Info
  • Host: GitHub
  • Owner: AliNajafi1998
  • Default Branch: master
  • Homepage:
  • Size: 5.17 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of adelra/query-expansion
Created about 6 years ago · Last pushed about 7 years ago

https://github.com/AliNajafi1998/query-expansion/blob/master/

**Query Expansion via thesaurus**

*Introduction:*

Query expansion is a major part of information retrieval and of query understanding. Query understanding means inferring what the user wants from the query so that the most relevant documents are retrieved. In this project, I used several NLP packages and libraries for query expansion, such as Hazm.

*Methods:*

Query expansion draws on several Natural Language Processing steps. Most query understanding systems start with a spell checker, which can be simple or very complex. In this project, however, the first step is normalization, for which we use the Hazm library's normalizer. The normalized output then goes through a stemming algorithm that is based on Hazm's stemmer with some modifications: some suffixes were removed from the algorithm. A simple tokenizer is then applied to split the output into tokens. The most important part of the research is the lexicon. I extracted the vocabulary and its synonyms from the book:         written by:  
After preprocessing and cleaning the text from the book, I dumped the words and their synonyms into a Python pickle file. pickle is a Python module for serializing and de-serializing Python objects: it converts objects into byte streams and back. I then wrote a module that loads this pickle file and takes the input query. It processes the query, looks up the synonyms, and shows the queries nearest in meaning.
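
The lexicon round trip and the synonym lookup described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual module: the names `TOY_LEXICON`, `save_lexicon`, `load_lexicon`, `preprocess`, and `expand_query` are hypothetical, the lexicon is a tiny English stand-in for the one extracted from the book, and `preprocess` only lowercases and splits on whitespace where the project uses Hazm-based normalization, stemming, and tokenization.

```python
import pickle
import tempfile
from itertools import product
from pathlib import Path

# Hypothetical toy lexicon (word -> synonyms); the project's real lexicon
# is built from a book and is not reproduced here.
TOY_LEXICON = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}


def save_lexicon(lexicon, path):
    """Serialize the lexicon dict to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(lexicon, f)


def load_lexicon(path):
    """Load the lexicon dict back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def preprocess(query):
    """Stand-in for the normalize -> stem -> tokenize steps (assumption)."""
    return query.lower().split()


def expand_query(query, lexicon, limit=10):
    """Return up to `limit` variants of the query built from synonyms.

    Each token may be kept or replaced by one of its synonyms; the first
    variant is always the preprocessed original query.
    """
    tokens = preprocess(query)
    alternatives = [[tok] + lexicon.get(tok, []) for tok in tokens]
    expanded = []
    for combo in product(*alternatives):
        expanded.append(" ".join(combo))
        if len(expanded) >= limit:
            break
    return expanded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "lexicon.pickle"
        save_lexicon(TOY_LEXICON, path)
        lexicon = load_lexicon(path)
    for variant in expand_query("Cheap car", lexicon):
        print(variant)
```

The `limit` parameter is an illustrative choice: the number of variants grows multiplicatively with the number of tokens that have synonyms, so some cap is usually needed. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.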

![Alt text](Picture.png?raw=true "diagram")

*Paper:*

https://arxiv.org/pdf/1811.00854.pdf

Owner

  • Name: Ali Najafi
  • Login: AliNajafi1998
  • Kind: user
  • Location: Istanbul, Turkey

Machine Learning Engineer - MSc Computer Science - Sabanci University
