https://github.com/captaincodercool/optimized-tfidf-search-with-query-handling
This Python program implements a TF-IDF-based document retrieval system with advanced query optimization. It preprocesses text files by tokenizing, removing stopwords, and stemming. It builds posting lists, normalizes vectors, and retrieves the most relevant documents using cosine similarity, optimized with top-10 element filtering for efficiency.
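The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`preprocess`, `build_index`, `search`), the tiny stopword set, and the crude suffix-stripping stemmer are all hypothetical stand-ins. The description does not say what "top-10 element filtering" means; the sketch assumes it refers to truncating each term's posting list to its 10 highest-weighted documents.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "is", "are", "to", "with"}


def stem(token):
    # Crude suffix stripping; a stand-in for a real stemmer such as Porter's.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    # Tokenize on alphanumerics, drop stopwords, then stem.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


def build_index(docs, top_k=10):
    # docs maps a document id to its raw text.
    n = len(docs)
    term_counts = {doc_id: Counter(preprocess(text)) for doc_id, text in docs.items()}
    df = Counter(term for counts in term_counts.values() for term in counts)
    postings = defaultdict(list)
    for doc_id, counts in term_counts.items():
        # Log-scaled term frequency times inverse document frequency.
        weights = {t: (1 + math.log10(c)) * math.log10(n / df[t]) for t, c in counts.items()}
        # Normalize to unit length so a dot product equals cosine similarity.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        for t, w in weights.items():
            postings[t].append((doc_id, w / norm))
    # Assumed meaning of "top-10 filtering": keep the top_k heaviest postings per term.
    return {
        t: sorted(plist, key=lambda p: p[1], reverse=True)[:top_k]
        for t, plist in postings.items()
    }


def search(query, postings, top_n=10):
    q_counts = Counter(preprocess(query))
    q_weights = {t: 1 + math.log10(c) for t, c in q_counts.items() if t in postings}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values())) or 1.0
    scores = defaultdict(float)
    for t, qw in q_weights.items():
        for doc_id, dw in postings[t]:
            scores[doc_id] += (qw / q_norm) * dw
    # Highest cosine score first.
    return sorted(scores.items(), key=lambda s: s[1], reverse=True)[:top_n]
```

Calling `build_index` on a dictionary of document texts and passing the result to `search` with a query string returns up to ten `(doc_id, score)` pairs. Truncating posting lists trades some recall for speed, since documents outside a term's top entries are never scored for that term.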
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file
  Found codemeta.json file
- ✓ .zenodo.json file
  Found .zenodo.json file
- ○ DOI references
- ○ Academic links in README
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity
  Low similarity (2.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CAPTAINCODERCOOL
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: master
- Size: 201 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Owner
- Login: CAPTAINCODERCOOL
- Kind: user
- Repositories: 1
- Profile: https://github.com/CAPTAINCODERCOOL
GitHub Events
Total
- Push event: 1
- Create event: 1
Last Year
- Push event: 1
- Create event: 1