https://github.com/agrover112/awesome-semantic-search
A curated list of awesome resources related to Semantic Search and Semantic Similarity tasks.
Science Score: 33.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file
- codemeta.json file
- .zenodo.json file
- DOI references: Found 5 DOI reference(s) in README
- Academic publication links: Links to arxiv.org, researchgate.net, ieee.org, acm.org
- Committers with academic emails: 1 of 22 committers (4.5%) from academic institutions
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: Low similarity (9.0%) to scientific vocabulary
Repository
A curated list of awesome resources related to Semantic Search and Semantic Similarity tasks.
Statistics
- Stars: 356
- Watchers: 9
- Forks: 29
- Open Issues: 7
- Releases: 0
README.md
Awesome Semantic-Search

Logo made by @createdbytango.
Looking for More Paper Additions. PS: Raise a PR
This repository aims to serve as a meta-repository for Semantic Search and Semantic Similarity related tasks.
Semantic Search isn't limited to text! It can be done with images, speech, etc. There are numerous use cases and applications of semantic search.
Feel free to raise a PR on this repo!
Contents
Papers
2010
2014
2015
2016
- Bag of Tricks for Efficient Text Classification
- Enriching Word Vectors with Subword Information
- Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
- On Approximately Searching for Similar Word Embeddings
- Learning Distributed Representations of Sentences from Unlabelled Data
- Approximate Nearest Neighbor Search on High Dimensional Data --- Experiments, Analyses, and Improvement
2017
- Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
- Semantic Textual Similarity For Hindi
- Efficient Natural Language Response Suggestion for Smart Reply
2018
- Universal Sentence Encoder
- Learning Semantic Textual Similarity from Conversations
- Google AI Blog: Advances in Semantic Textual Similarity
- Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech
- Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data
- Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph
- The Case for Learned Index Structures
2019
- LASER: Language Agnostic Sentence Representations
- Document Expansion by Query Prediction
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
- Multi-Stage Document Ranking with BERT
- Latent Retrieval for Weakly Supervised Open Domain Question Answering
- End-to-End Open-Domain Question Answering with BERTserini
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining
- Analyzing and Improving Representations with the Soft Nearest Neighbor Loss
- DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
2020
- Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned
- PASSAGE RE-RANKING WITH BERT
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization
- LaBSE: Language-agnostic BERT Sentence Embedding
- Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
- DeText: A deep NLP framework for intelligent text understanding
- Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
- Pretrained Transformers for Text Ranking: BERT and Beyond
- REALM: Retrieval-Augmented Language Model Pre-Training
- ELECTRA: PRE-TRAINING TEXT ENCODERS AS DISCRIMINATORS RATHER THAN GENERATORS
- Improving Deep Learning For Airbnb Search
- Managing Diversity in Airbnb Search
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
- Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
2021
- Hybrid approach for semantic similarity calculation between Tamil words
- Augmented SBERT
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
- Compatibility-aware Heterogeneous Visual Search
- Learning Personal Style from Few Examples
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
- A Survey of Transformers
- SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking
- High Quality Related Search Query Suggestions using Deep Reinforcement Learning
- Embedding-based Product Retrieval in Taobao Search
- TPRM: A Topic-based Personalized Ranking Model for Web Search
- mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset
- Database Reasoning Over Text
- How Does Adversarial Fine-Tuning Benefit BERT?
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
- Primer: Searching for Efficient Transformers for Language Modeling
- How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings
- SimCSE: Simple Contrastive Learning of Sentence Embeddings
- Compositional Attention: Disentangling Search and Retrieval
- SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search
- GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
- Generative Search Engines: Initial Experiments
- Rethinking Search: Making Domain Experts out of Dilettantes
- WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach
2022
- Text and Code Embeddings by Contrastive Pre-Training
- RELIC: Retrieving Evidence for Literary Claims
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations
- SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
- An Analysis of Fusion Functions for Hybrid Retrieval
- Out-of-distribution Detection with Deep Nearest Neighbors
- ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
- Analyzing Acoustic Word Embeddings From Pre-Trained Self-Supervised Speech Models
- Rethinking with Retrieval: Faithful Large Language Model Inference
- Precise Zero-Shot Dense Retrieval without Relevance Labels
- Transformer Memory as a Differentiable Search Index
2023
- FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search
- "Low-Resource" Text Classification: A Parameter-Free Classification Method with Compressors
- SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval
Articles
- Tackling Semantic Search
- Semantic search in Azure Cognitive Search
- How we used semantic search to make our search 10x smarter
- Stanford AI Blog: Building Scalable, Explainable, and Adaptive NLP Models with Retrieval
- Building a semantic search engine with dual space word embeddings
- Billion-scale semantic similarity search with FAISS+SBERT
- Some observations about similarity search thresholds
- Near Duplicate Image Search using Locality Sensitive Hashing
- Free Course on Vector Similarity Search and Faiss
- Comprehensive Guide To Approximate Nearest Neighbors Algorithms
- Introducing the hybrid index to enable keyword-aware semantic search
- Argilla Semantic Search
- Co:here's Multilingual Text Understanding Model
- Simplify Search with Multilingual Embedding Models
Libraries and Tools
- fastText
- Universal Sentence Encoder
- SBERT
- ELECTRA
- LaBSE
- LASER
- Relevance AI - Vector Platform From Experimentation To Deployment
- Haystack
- Jina.AI
- pinecone
- SentEval Toolkit
- ranx
- BEIR: Benchmarking IR
- RELiC: Retrieving Evidence for Literary Claims Dataset
- matchzoo-py
- deeptextmatching
- Which Frame?
- lexica.art
- emoji semantic search
- PySerini
- BERTSerini
- BERTSimilarity
- milvus
- NeuroNLP++
- weaviate
- semantic-search-through-wikipedia-with-weaviate
- natural-language-youtube-search
- same.energy
- ann benchmarks
- scaNN
- REALM
- annoy
- pynndescent
- nsg
- FALCONN
- redis HNSW
- autofaiss
- DPR
- rank_BM25
- FlashRank
- nearPy
- vearch
- vespa
- PyNNDescent
- pgANN
- Tensorflow Similarity
- opensemanticsearch.org
- GPT3 Semantic Search
- searchy
- txtai
- HyperTag
- vectorai
- embeddinghub
- AquilaDb
- STripNet
Datasets
- Semantic Text Similarity Dataset Hub
- Facebook AI Image Similarity Challenge
- WIT: Wikipedia-based Image Text Dataset
- BEIR
- MTEB
Milestones
Have a look at the project board for the task list to contribute to any of the open issues.
Owner
- Login: Agrover112
- Kind: user
- Repositories: 113
- Profile: https://github.com/Agrover112
Humans trying to understand machines and people.
GitHub Events
Total
- Watch event: 20
- Pull request event: 1
Last Year
- Watch event: 20
- Pull request event: 1
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Agrover112 | 4****2 | 169 |
| Shivam Dubey | s****6@g****m | 13 |
| allcontributors[bot] | 4****] | 4 |
| Ankit Grover | a****2@g****m | 3 |
| Shreyas S | 9****u | 2 |
| David Chuan-En Lin | d****4@g****m | 2 |
| Joshua T | b****n@g****m | 2 |
| boba_and_beer | j****g@g****m | 1 |
| Yiyou Sun | s****u@c****u | 1 |
| Aditya Thakur | 4****o | 1 |
| Ali Faizan | 6****a | 1 |
| Divyanshu Singh | 5****7 | 1 |
| Karuna Tata | 7****n | 1 |
| Meenal | m****1 | 1 |
| Vanessasaurus | 8****h | 1 |
| schmelto | 3****o | 1 |
| Michael Floering | m****g@g****m | 1 |
| Krish Dev DB | k****b@g****m | 1 |
| Ana | a****a@g****m | 1 |
| Devvrat1010 | r****7@g****m | 1 |
| Eddie Jaoude | e****e@j****m | 1 |
| Janmejay Chatterjee | j****e@g****m | 1 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 22
- Total pull requests: 43
- Average time to close issues: 3 months
- Average time to close pull requests: 7 days
- Total issue authors: 7
- Total pull request authors: 23
- Average comments per issue: 2.59
- Average comments per pull request: 0.56
- Merged pull requests: 35
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Agrover112 (14)
- vinzvinci (2)
- Vyvy-vi (2)
- toth2000 (1)
- vikasganiga05 (1)
- meenal21 (1)
- shubhendumadhukar (1)
Pull Request Authors
- Agrover112 (14)
- WebShivam (3)
- radiantly (2)
- chuanenlin (2)
- SamuelGong (2)
- brownpanthera (2)
- Devvrat1010 (2)
- allcontributors[bot] (2)
- Zhreyu (1)
- sunyiyou (1)
- ghost (1)
- parth-gpt (1)
- NotTheRightGuy (1)
- divyanshu887 (1)
- Alisha-786 (1)
Dependencies
- actions/first-interaction v1 composite
- actions/labeler v3 composite
- tgymnich/fork-sync v1.4 composite
- actions/checkout v2 composite
- urlstechie/urlchecker-action 0.0.27 composite