Science Score: 41.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.5%) to scientific vocabulary
Keywords
Repository
SGPT: GPT Sentence Embeddings for Semantic Search
Basic Info
- Host: GitHub
- Owner: Muennighoff
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://arxiv.org/abs/2202.08904
- Size: 17.4 MB
Statistics
- Stars: 867
- Watchers: 8
- Forks: 54
- Open Issues: 28
- Releases: 0
Topics
Metadata Files
README.md
SGPT: GPT Sentence Embeddings for Semantic Search
This repository contains code, results & pre-trained models for the paper SGPT: GPT Sentence Embeddings for Semantic Search.
**************************** Updates ****************************
- 2024-02: We released GRIT & GritLM - These models unify SGPT Bi-Encoders, Cross-Encoders, symmetric, asymmetric, and regular GPT (i.e. generation) all in 1 single model at much better performance on all accounts. We recommend switching to these new models :)
- 2022-09: SGPT Bi-Encoders are now easy to use with Sentence Transformers, see new scripts
- 2022-08: Multilingual BLOOM SGPT models were released: Asymmetric, 7.1B parameters & Symmetric, 1.7B parameters. Feel free to open an issue if you need a different model.
- 2022-06: OpenAI released the mechanism of their Search Endpoint that we compared to SGPT Cross-Encoders in the paper. Our methods are very similar. Feel free to test their prompt as seen in crossencoder/beir/openai_search_endpoint_functionality.py!
- 2022-03: 5.8B Bi-Encoder models are now 4% & 1% better on USEB & BEIR, respectively. Paper & models on HF have been updated. This has been done by using larger batch sizes with GradCache, see the paper for more info. If you have previously downloaded them, we recommend replacing them with the new version.
- 2022-02: We released our paper. Check it out! :)
Quick Links
- Overview
- Structure
- Use SGPT with Huggingface
- Use SGPT with Sentence Transformers
- Acknowledgements
- Citation
Overview
We present SGPT-BE and SGPT-CE for applying GPT models as Bi-Encoders or Cross-Encoders to symmetric or asymmetric search. SGPT-BE produces semantically meaningful sentence embeddings by contrastive fine-tuning of only bias tensors and position-weighted mean pooling. SGPT-CE uses log probabilities from GPT models without any fine-tuning. An illustration of the methods follows.

Feel free to open an issue should you have any questions~
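The position-weighted mean pooling mentioned in the Overview can be illustrated with a minimal, dependency-free sketch. It assumes right-padded inputs and weights each token by its 1-based position, so later tokens (which have seen more context in a causal model) contribute more. The function name `position_weighted_mean` and the toy vectors are made up for illustration and are not part of the repository.

```python
def position_weighted_mean(hidden_states, attention_mask):
    """Pool per-token vectors into one vector, weighting token i (1-based) by i.

    hidden_states: list of seq_len vectors (lists of floats), one per token.
    attention_mask: list of seq_len ints, 1 for real tokens and 0 for padding.
    """
    hid_dim = len(hidden_states[0])
    pooled = [0.0] * hid_dim
    weight_sum = 0.0
    for position, (vector, mask) in enumerate(zip(hidden_states, attention_mask), start=1):
        weight = position * mask  # padding tokens get weight 0
        weight_sum += weight
        for d in range(hid_dim):
            pooled[d] += weight * vector[d]
    # Normalize so that the weights of the real tokens sum to 1
    return [value / weight_sum for value in pooled]


# Toy input: two real tokens followed by one padding token
toy_hidden_states = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
toy_attention_mask = [1, 1, 0]
toy_pooled = position_weighted_mean(toy_hidden_states, toy_attention_mask)
print(toy_pooled)
```

In this toy case the first token receives weight 1/3 and the second 2/3, while the padding vector is ignored entirely.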
Structure
bash
.
├── biencoder # Training & Inference of Bi-Encoders
│   ├── beir
│   │   ├── custommodels # Directory providing BEIR compatibility for asymmetric models & models with special tokens
│   │   │   └── ...
│   │   ├── io_utils # Exclusively used for beir_openai_embeddings_batched_parallel.py
│   │   │   └── ...
│   │   ├── parallelizer # Exclusively used for beir_openai_embeddings_batched_parallel.py
│   │   │   └── ...
│   │   ├── beir_dense_retriever.py
│   │   ├── beir_openai_embeddings_batched_parallel.py
│   │   ├── requirements.txt
│   │   ├── *.bash # Bash scripts to run multiple experiments
│   │   └── README.md
│   ├── nli_msmarco
│   │   ├── sentence-transformers # An adapted version of sentence-transformers - Install this version for all biencoder experiments
│   │   │   └── ...
│   │   └── README.md
│   └── useb
│       ├── useb
│       │   └── ...
│       ├── *.bash # Bash scripts to run multiple experiments
│       ├── useb_dense_retriever.py
│       └── README.md
├── crossencoder # Inference of Cross-Encoders
│   └── beir
│       ├── *.ipynb # Notebooks explained in the README
│       └── README.md
├── other
│   ├── sgpt_graphic.png
│   └── sgpt_utils.ipynb # Code for creating the graphs in the paper & other
├── requirements.txt
└── README.md
Each data sub-directory provides its own README with an overview of its Structure, Downloads (Datasets, Models) & Commands used to produce the datasets, models & other things. Generally, you can find all models at https://huggingface.co/Muennighoff and json results in various datasets at https://www.kaggle.com/muennighoff/datasets. Model names are explained in their Huggingface READMEs. Dataset names are explained in the sub-folders of this repository.
Use SGPT with Huggingface
Below we provide Python examples for using the pre-trained models in your own semantic search use case.
We highly recommend replacing the model names with larger models, e.g. Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit for biencoder/symmetric.
Bi-Encoder
Symmetric Semantic Search BE
```python
import torch
from transformers import AutoModel, AutoTokenizer
from scipy.spatial.distance import cosine

# Get our models - The package will take care of downloading the models automatically
# For best performance: Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit
tokenizer = AutoTokenizer.from_pretrained("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")
model = AutoModel.from_pretrained("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")
# Deactivate Dropout (There is no dropout in the above models so it makes no difference here but other SGPT models may have dropout)
model.eval()

# Tokenize input texts
texts = [
    "deep learning",
    "artificial intelligence",
    "deep diving",
    "artificial snow",
]
batch_tokens = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get the embeddings
with torch.no_grad():
    # Get hidden state of shape [bs, seq_len, hid_dim]
    last_hidden_state = model(**batch_tokens, output_hidden_states=True, return_dict=True).last_hidden_state

# Get weights of shape [bs, seq_len, hid_dim]
weights = (
    torch.arange(start=1, end=last_hidden_state.shape[1] + 1)
    .unsqueeze(0)
    .unsqueeze(-1)
    .expand(last_hidden_state.size())
    .float().to(last_hidden_state.device)
)

# Get attn mask of shape [bs, seq_len, hid_dim]
input_mask_expanded = (
    batch_tokens["attention_mask"]
    .unsqueeze(-1)
    .expand(last_hidden_state.size())
    .float()
)

# Perform weighted mean pooling across seq_len: bs, seq_len, hidden_dim -> bs, hidden_dim
sum_embeddings = torch.sum(last_hidden_state * input_mask_expanded * weights, dim=1)
sum_mask = torch.sum(input_mask_expanded * weights, dim=1)

embeddings = sum_embeddings / sum_mask

# Calculate cosine similarities
# Cosine similarities are in [-1, 1]. Higher means more similar
cosine_sim_0_1 = 1 - cosine(embeddings[0], embeddings[1])
cosine_sim_0_2 = 1 - cosine(embeddings[0], embeddings[2])
cosine_sim_0_3 = 1 - cosine(embeddings[0], embeddings[3])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[1], cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[2], cosine_sim_0_2))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[3], cosine_sim_0_3))
```
Asymmetric Semantic Search BE
```python
import torch
from transformers import AutoModel, AutoTokenizer
from scipy.spatial.distance import cosine

# Get our models - The package will take care of downloading the models automatically
# For best performance: Muennighoff/SGPT-5.8B-weightedmean-msmarco-specb-bitfit
tokenizer = AutoTokenizer.from_pretrained("Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit")
model = AutoModel.from_pretrained("Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit")
# Deactivate Dropout (There is no dropout in the above models so it makes no difference here but other SGPT models may have dropout)
model.eval()

queries = [
    "I'm searching for a planet not too far from Earth.",
]

docs = [
    "Neptune is the eighth and farthest-known Solar planet from the Sun. In the Solar System, it is the fourth-largest planet by diameter, the third-most-massive planet, and the densest giant planet. It is 17 times the mass of Earth, slightly more massive than its near-twin Uranus.",
    "TRAPPIST-1d, also designated as 2MASS J23062928-0502285 d, is a small exoplanet (about 30% the mass of the earth), which orbits on the inner edge of the habitable zone of the ultracool dwarf star TRAPPIST-1 approximately 40 light-years (12.1 parsecs, or nearly 3.7336×1014 km) away from Earth in the constellation of Aquarius.",
    "A harsh desert world orbiting twin suns in the galaxy’s Outer Rim, Tatooine is a lawless place ruled by Hutt gangsters. Many settlers scratch out a living on moisture farms, while spaceport cities such as Mos Eisley and Mos Espa serve as home base for smugglers, criminals, and other rogues.",
]

SPECB_QUE_BOS = tokenizer.encode("[", add_special_tokens=False)[0]
SPECB_QUE_EOS = tokenizer.encode("]", add_special_tokens=False)[0]

SPECB_DOC_BOS = tokenizer.encode("{", add_special_tokens=False)[0]
SPECB_DOC_EOS = tokenizer.encode("}", add_special_tokens=False)[0]


def tokenize_with_specb(texts, is_query):
    # Tokenize without padding
    batch_tokens = tokenizer(texts, padding=False, truncation=True)
    # Add special brackets & pay attention to them
    for seq, att in zip(batch_tokens["input_ids"], batch_tokens["attention_mask"]):
        if is_query:
            seq.insert(0, SPECB_QUE_BOS)
            seq.append(SPECB_QUE_EOS)
        else:
            seq.insert(0, SPECB_DOC_BOS)
            seq.append(SPECB_DOC_EOS)
        att.insert(0, 1)
        att.append(1)
    # Add padding
    batch_tokens = tokenizer.pad(batch_tokens, padding=True, return_tensors="pt")
    return batch_tokens


def get_weightedmean_embedding(batch_tokens, model):
    # Get the embeddings
    with torch.no_grad():
        # Get hidden state of shape [bs, seq_len, hid_dim]
        last_hidden_state = model(**batch_tokens, output_hidden_states=True, return_dict=True).last_hidden_state

    # Get weights of shape [bs, seq_len, hid_dim]
    weights = (
        torch.arange(start=1, end=last_hidden_state.shape[1] + 1)
        .unsqueeze(0)
        .unsqueeze(-1)
        .expand(last_hidden_state.size())
        .float().to(last_hidden_state.device)
    )

    # Get attn mask of shape [bs, seq_len, hid_dim]
    input_mask_expanded = (
        batch_tokens["attention_mask"]
        .unsqueeze(-1)
        .expand(last_hidden_state.size())
        .float()
    )

    # Perform weighted mean pooling across seq_len: bs, seq_len, hidden_dim -> bs, hidden_dim
    sum_embeddings = torch.sum(last_hidden_state * input_mask_expanded * weights, dim=1)
    sum_mask = torch.sum(input_mask_expanded * weights, dim=1)

    embeddings = sum_embeddings / sum_mask

    return embeddings


query_embeddings = get_weightedmean_embedding(tokenize_with_specb(queries, is_query=True), model)
doc_embeddings = get_weightedmean_embedding(tokenize_with_specb(docs, is_query=False), model)

# Calculate cosine similarities
# Cosine similarities are in [-1, 1]. Higher means more similar
cosine_sim_0_1 = 1 - cosine(query_embeddings[0], doc_embeddings[0])
cosine_sim_0_2 = 1 - cosine(query_embeddings[0], doc_embeddings[1])
cosine_sim_0_3 = 1 - cosine(query_embeddings[0], doc_embeddings[2])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[0][:20] + "...", cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[1][:20] + "...", cosine_sim_0_2))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[2][:20] + "...", cosine_sim_0_3))
```
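For more than a handful of documents, the pairwise similarities are typically turned into a ranking. The following is a minimal, dependency-free sketch of that step, assuming embeddings are available as plain lists of floats. The helper names `cosine_similarity` and `rank_documents` and the toy vectors are illustrative only and do not come from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_embedding, doc_embeddings):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [cosine_similarity(query_embedding, doc) for doc in doc_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy 2-dimensional embeddings standing in for real model outputs
toy_query = [1.0, 0.0]
toy_docs = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
toy_ranking = rank_documents(toy_query, toy_docs)
for doc_index, score in toy_ranking:
    print(f"doc {doc_index}: {score:.3f}")
```

Because cosine similarity ignores vector length, the second toy document ranks first even though its norm differs from the query's.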
Cross-Encoder
Asymmetric Semantic Search CE
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.spatial.distance import cosine

# Get models - The package will take care of downloading the models automatically
# For best performance: EleutherAI/gpt-j-6B
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
# Deactivate Dropout (There is no dropout in the above models so it makes no difference here but other SGPT models may have dropout)
model.eval()

prompt = 'Documents are searched to find matches with the same content.\nThe document "{}" is a good search result for "'

queries = [
    "I'm searching for a planet not too far from Earth.",
]

docs = [
    "Neptune is the eighth and farthest-known Solar planet from the Sun. In the Solar System, it is the fourth-largest planet by diameter, the third-most-massive planet, and the densest giant planet. It is 17 times the mass of Earth, slightly more massive than its near-twin Uranus.",
    "TRAPPIST-1d, also designated as 2MASS J23062928-0502285 d, is a small exoplanet (about 30% the mass of the earth), which orbits on the inner edge of the habitable zone of the ultracool dwarf star TRAPPIST-1 approximately 40 light-years (12.1 parsecs, or nearly 3.7336×1014 km) away from Earth in the constellation of Aquarius.",
    "A harsh desert world orbiting twin suns in the galaxy’s Outer Rim, Tatooine is a lawless place ruled by Hutt gangsters. Many settlers scratch out a living on moisture farms, while spaceport cities such as Mos Eisley and Mos Espa serve as home base for smugglers, criminals, and other rogues.",
]

for query in queries:
    print(f"Query: {query}")
    for doc in docs:
        context = prompt.format(doc)

        context_enc = tokenizer.encode(context, add_special_tokens=False)
        continuation_enc = tokenizer.encode(query, add_special_tokens=False)
        # Slice off the last token, as we take its probability from the one before
        model_input = torch.tensor(context_enc+continuation_enc[:-1])
        continuation_len = len(continuation_enc)
        input_len, = model_input.shape

        # [seq_len] -> [seq_len, vocab]
        logprobs = torch.nn.functional.log_softmax(model(model_input)[0], dim=-1).cpu()
        # [seq_len, vocab] -> [continuation_len, vocab]
        logprobs = logprobs[input_len-continuation_len:]
        # Gather the log probabilities of the continuation tokens -> [continuation_len]
        logprobs = torch.gather(logprobs, 1, torch.tensor(continuation_enc).unsqueeze(-1)).squeeze(-1)
        score = torch.sum(logprobs)
        # The higher (closer to 0), the more similar
        print(f"Document: {doc[:20] + '...'} Score: {score}")
```
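The slicing and gathering steps in the Cross-Encoder example can be easier to follow on toy numbers. The sketch below is a dependency-free illustration under stated assumptions: logits are given as plain lists, row `t` of the logits scores the token at position `t + 1`, and the model input is the context followed by the continuation without its last token. The names `log_softmax` and `continuation_score` and the uniform toy logits are hypothetical and not taken from the repository.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of vocabulary logits."""
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return [x - log_norm for x in logits]


def continuation_score(logits_per_position, continuation_ids):
    """Sum the log probabilities the model assigns to the continuation tokens.

    logits_per_position: one row of logits per input token, where the input is
    context + continuation[:-1], so the last len(continuation_ids) rows are the
    predictions for the continuation tokens.
    """
    continuation_len = len(continuation_ids)
    rows = logits_per_position[len(logits_per_position) - continuation_len:]
    return sum(log_softmax(row)[token] for row, token in zip(rows, continuation_ids))


# Toy setup: vocabulary of 4 tokens, 2 context tokens, 3 continuation tokens,
# so the input has 2 + 3 - 1 = 4 positions. Uniform logits give each token p = 1/4.
toy_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(4)]
toy_score = continuation_score(toy_logits, [2, 0, 3])
print(toy_score)
```

With uniform logits every continuation token has probability 1/4, so the score equals three times log(1/4). A model that finds the query likely given the document would produce a score closer to 0.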
Symmetric Semantic Search CE
You can use the same code as in the above CE-Asym section but change the prompt. Feel free to share prompts that work well :)
Use SGPT with Sentence Transformers
Bi-Encoder ST
Symmetric Semantic Search BE ST
Symmetric models are now 100% compatible with the latest sentence-transformers via pip install git+https://github.com/UKPLab/sentence-transformers.git. You should get the same results as in the HuggingFace script above.
```python
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

texts = [
    "deep learning",
    "artificial intelligence",
    "deep diving",
    "artificial snow",
]

model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")
embeddings = model.encode(texts)

cosine_sim_0_1 = 1 - cosine(embeddings[0], embeddings[1])
cosine_sim_0_2 = 1 - cosine(embeddings[0], embeddings[2])
cosine_sim_0_3 = 1 - cosine(embeddings[0], embeddings[3])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[1], cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[2], cosine_sim_0_2))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[3], cosine_sim_0_3))
```
Asymmetric Semantic Search BE ST
SGPT Sentence Transformers
Install: pip install --upgrade git+https://github.com/Muennighoff/sentence-transformers.git@sgpt_poolings_specb
Use the below, which produces the exact same scores as the HuggingFace solution above.
```python
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

queries = [
    "I'm searching for a planet not too far from Earth.",
]

docs = [
    "Neptune is the eighth and farthest-known Solar planet from the Sun. In the Solar System, it is the fourth-largest planet by diameter, the third-most-massive planet, and the densest giant planet. It is 17 times the mass of Earth, slightly more massive than its near-twin Uranus.",
    "TRAPPIST-1d, also designated as 2MASS J23062928-0502285 d, is a small exoplanet (about 30% the mass of the earth), which orbits on the inner edge of the habitable zone of the ultracool dwarf star TRAPPIST-1 approximately 40 light-years (12.1 parsecs, or nearly 3.7336×1014 km) away from Earth in the constellation of Aquarius.",
    "A harsh desert world orbiting twin suns in the galaxy’s Outer Rim, Tatooine is a lawless place ruled by Hutt gangsters. Many settlers scratch out a living on moisture farms, while spaceport cities such as Mos Eisley and Mos Espa serve as home base for smugglers, criminals, and other rogues.",
]


class SentenceTransformerSpecb(SentenceTransformer):
    # Requires:
    # pip install git+https://github.com/Muennighoff/sentence-transformers.git@sgpt_poolings_specb
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        tokens = ["[SOS]", "{SOS}"]
        self._first_module().tokenizer.add_tokens(tokens, special_tokens=True)
        self._first_module().auto_model.resize_token_embeddings(len(self._first_module().tokenizer))
        # Will be replaced with the rep tokens in the model ones
        # The problem is we don't know if a text is query or document when tokenizing in the Transformer.py module,
        # so we use the SOS tokens as an identifier if we have a query or document at hand & then replace them
        # If we would directly use the brackets here, they may become part of another token
        self._first_module().bos_spec_token_q = self._first_module().tokenizer.encode("[SOS]", add_special_tokens=False)[0]
        self._first_module().bos_spec_token_d = self._first_module().tokenizer.encode("{SOS}", add_special_tokens=False)[0]
        self._first_module().bos_spec_token_q_rep = self._first_module().tokenizer.encode("[", add_special_tokens=False)[0]
        self._first_module().eos_spec_token_q = self._first_module().tokenizer.encode("]", add_special_tokens=False)[0]
        self._first_module().bos_spec_token_d_rep = self._first_module().tokenizer.encode("{", add_special_tokens=False)[0]
        self._first_module().eos_spec_token_d = self._first_module().tokenizer.encode("}", add_special_tokens=False)[0]
        self._first_module().replace_bos = True

    def encode(self, sentences, **kwargs):
        is_query = kwargs.pop("is_query", True)
        if is_query:
            sentences = "[SOS]" + sentences if isinstance(sentences, str) else ["[SOS]" + sent for sent in sentences]
        else:
            sentences = "{SOS}" + sentences if isinstance(sentences, str) else ["{SOS}" + sent for sent in sentences]
        return super().encode(sentences, **kwargs)


model = SentenceTransformerSpecb("Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit")

query_embeddings = model.encode(queries, is_query=True)
doc_embeddings = model.encode(docs, is_query=False)

# Calculate cosine similarities
# Cosine similarities are in [-1, 1]. Higher means more similar
cosine_sim_0_1 = 1 - cosine(query_embeddings[0], doc_embeddings[0])
cosine_sim_0_2 = 1 - cosine(query_embeddings[0], doc_embeddings[1])
cosine_sim_0_3 = 1 - cosine(query_embeddings[0], doc_embeddings[2])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[0][:20] + "...", cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[1][:20] + "...", cosine_sim_0_2))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[2][:20] + "...", cosine_sim_0_3))
```
Original Sentence Transformers
If you want to use the Sentence Transformers at https://github.com/UKPLab/sentence-transformers, you can use the below. Make sure to use the latest version (pip install --upgrade git+https://github.com/UKPLab/sentence-transformers.git).
Note that this will produce slightly worse scores than SGPT Sentence Transformers, as the special brackets may get intermingled with other tokens upon tokenization. On SciFact (BEIR) NDCG@10 of the below decreases to 0.566 from 0.569 for SGPT-125M-weightedmean-msmarco-specb-bitfit.
```python
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

queries = [
    "I'm searching for a planet not too far from Earth.",
]

docs = [
    "Neptune is the eighth and farthest-known Solar planet from the Sun. In the Solar System, it is the fourth-largest planet by diameter, the third-most-massive planet, and the densest giant planet. It is 17 times the mass of Earth, slightly more massive than its near-twin Uranus.",
    "TRAPPIST-1d, also designated as 2MASS J23062928-0502285 d, is a small exoplanet (about 30% the mass of the earth), which orbits on the inner edge of the habitable zone of the ultracool dwarf star TRAPPIST-1 approximately 40 light-years (12.1 parsecs, or nearly 3.7336×1014 km) away from Earth in the constellation of Aquarius.",
    "A harsh desert world orbiting twin suns in the galaxy’s Outer Rim, Tatooine is a lawless place ruled by Hutt gangsters. Many settlers scratch out a living on moisture farms, while spaceport cities such as Mos Eisley and Mos Espa serve as home base for smugglers, criminals, and other rogues.",
]


class SentenceTransformerSpecb(SentenceTransformer):
    def encode(self, sentences, **kwargs):
        is_query = kwargs.pop("is_query", True)
        if is_query:
            sentences = "[" + sentences + "]" if isinstance(sentences, str) else ["[" + sent + "]" for sent in sentences]
        else:
            sentences = "{" + sentences + "}" if isinstance(sentences, str) else ["{" + sent + "}" for sent in sentences]
        return super().encode(sentences, **kwargs)


model = SentenceTransformerSpecb("Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit")

query_embeddings = model.encode(queries, is_query=True)
doc_embeddings = model.encode(docs, is_query=False)

# Calculate cosine similarities
# Cosine similarities are in [-1, 1]. Higher means more similar
cosine_sim_0_1 = 1 - cosine(query_embeddings[0], doc_embeddings[0])
cosine_sim_0_2 = 1 - cosine(query_embeddings[0], doc_embeddings[1])
cosine_sim_0_3 = 1 - cosine(query_embeddings[0], doc_embeddings[2])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[0][:20] + "...", cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[1][:20] + "...", cosine_sim_0_2))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[2][:20] + "...", cosine_sim_0_3))
```
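The remark that brackets may get intermingled with other tokens can be illustrated with a toy tokenizer. The sketch below is purely hypothetical: `TOY_VOCAB` is an invented four-entry vocabulary and `toy_tokenize` is a greedy longest-match tokenizer, whereas the real models use BPE with a much larger vocabulary. It only shows why adding brackets to the text before tokenization can differ from adding bracket token ids after tokenization.

```python
# Hypothetical toy vocabulary; "mars]" stands in for a merged token that a
# real subword vocabulary might contain.
TOY_VOCAB = {"[": 0, "]": 1, "mars": 2, "mars]": 3}


def toy_tokenize(text):
    """Greedy longest-match tokenization over TOY_VOCAB."""
    ids = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


def wrap_as_text(text):
    """Add brackets to the string, then tokenize (string-level wrapping)."""
    return toy_tokenize("[" + text + "]")


def wrap_as_ids(text):
    """Tokenize first, then add the bracket token ids (id-level wrapping)."""
    return [TOY_VOCAB["["]] + toy_tokenize(text) + [TOY_VOCAB["]"]]


print(wrap_as_text("mars"))  # closing bracket is absorbed into the "mars]" token
print(wrap_as_ids("mars"))   # brackets stay separate tokens
```

In the string-level variant the closing bracket never appears as its own token, so the model sees a different sequence than the one it was trained on. Inserting the bracket ids after tokenization sidesteps this.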
Acknowledgements
We thank Constantin Eichenberg and Samuel Weinbach for insightful discussions and valuable feedback throughout the project. We thank Robert Baldock, Marco Bellagente and Koen Oostermeijer for reading drafts of the paper. This work has been supported by OpenAI under the academic access program. This work would not have been possible without:
- UKPLab: SBERT, BEIR, USEB
- Eleuther AI Models
- Huggingface Transformers
Citation
Feel free to cite our paper if SGPT is helpful to you :)
bibtex
@article{muennighoff2022sgpt,
title={SGPT: GPT Sentence Embeddings for Semantic Search},
author={Muennighoff, Niklas},
journal={arXiv preprint arXiv:2202.08904},
year={2022}
}
Owner
- Name: Niklas Muennighoff
- Login: Muennighoff
- Kind: user
- Location: Beijing
- Company: PKU
- Website: muennighoff.github.io
- Twitter: Muennighoff
- Repositories: 17
- Profile: https://github.com/Muennighoff
Citation (CITATION.bib)
@article{muennighoff2022sgpt,
title={SGPT: GPT Sentence Embeddings for Semantic Search},
author={Muennighoff, Niklas},
journal={arXiv preprint arXiv:2202.08904},
year={2022}
}
GitHub Events
Total
- Watch event: 27
- Fork event: 3
Last Year
- Watch event: 27
- Fork event: 3
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Muennighoff | n****f@g****m | 54 |
| Akshaj Jain | a****n@g****m | 2 |
| Oracle Public Cloud User | o****c@r****m | 2 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 83
- Total pull requests: 10
- Average time to close issues: 25 days
- Average time to close pull requests: 2 days
- Total issue authors: 36
- Total pull request authors: 3
- Average comments per issue: 3.28
- Average comments per pull request: 0.8
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- rajarajanvakil (2)
- Kartali-Mohamed (2)
- guotong1988 (2)
- aksj98 (2)
- regstuff (2)
- asenasen123 (2)
- ashokrajab (1)
- ttjjlw (1)
- shafkat-07 (1)
- wing7171 (1)
- ennioferreirab (1)
- hongshanli23 (1)
- cm2435 (1)
- rut00 (1)
- KnutJaegersberg (1)
Pull Request Authors
- Muennighoff (4)
- aksj98 (1)
- TrellixVulnTeam (1)
Dependencies
- more-itertools ==8.8.0
- tqdm ==4.61.0
- beir ==0.2.3
- more-itertools ==8.8.0
- retry ==0.9.2
- tqdm ==4.61.0
- huggingface-hub *
- nltk *
- numpy *
- scikit-learn *
- scipy *
- sentencepiece *
- tokenizers >=0.10.3
- torch >=1.6.0
- torchvision *
- tqdm *
- transformers >=4.6.0,<5.0.0
- huggingface-hub *
- nltk *
- numpy *
- scikit-learn *
- scipy *
- sentencepiece *
- tokenizers >=0.10.3
- torch >=1.6.0
- torchvision *
- tqdm *
- transformers >=4.6.0,<5.0.0
- pytrec_eval *
- sentence-transformers >=1.2.0
- beir ==0.2.3
- openai ==0.11.4
- pytorch ==1.10.1