Science Score: 62.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ✓ Institutional organization owner: organization emory-irlab has institutional domain (ir.mathcs.emory.edu)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary
Repository
Generative Reranker PyTerrier
Basic Info
- Host: GitHub
- Owner: emory-irlab
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 206 KB
Statistics
- Stars: 14
- Watchers: 4
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
The PyTerrier🐕 plugin for listwise, pointwise, and reasoning-based (long CoT) generative rerankers such as RankGPT, RankVicuna, RankZephyr, and RankLLama. A PyTerrier wrapper over the implementations available in RankLLM and Rank1.
Installation
```bash
pip install --upgrade git+https://github.com/emory-irlab/pyterrier_genrank.git
```
Example Usage
Since this implementation uses listwise reranking, it is used somewhat differently from other rerankers.
```python
import pyterrier as pt
from rerank import LLMReRanker

dataset = pt.get_dataset("irds:vaswani")
bm25 = pt.terrier.Retriever.from_dataset("vaswani", "terrier_stemmed", wmodel="BM25")
llm_reranker = LLMReRanker("castorini/rank_vicuna_7b_v1")

genrank_pipeline = bm25 % 100 >> pt.text.get_text(dataset, 'text') >> llm_reranker

genrank_pipeline.search('best places to have Indian food')
```
If you want to use RankGPT, ensure that your API key is set in an environment file, then load the reranker with the OpenAI model string.
```python
llm_reranker = LLMReRanker("gpt-35-turbo-1106", use_azure_openai=True)
```
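The README does not show the environment file's format. Below is a minimal sketch of loading `KEY=VALUE` pairs into the process environment; a library such as python-dotenv handles quoting and edge cases more robustly, and the variable name shown is a placeholder, since the exact name the plugin reads (e.g. `OPENAI_API_KEY`) is not documented here:

```python
import os
import tempfile

def load_env_file(path):
    """Minimal .env loader: put KEY=VALUE lines into os.environ.

    Sketch only: skips blank lines and comments, does not handle quoting.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Demo with a throwaway file and a placeholder variable name;
# substitute the key name your OpenAI client actually expects.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("# credentials\nEXAMPLE_API_KEY=sk-placeholder\n")
    demo_path = fh.name
load_env_file(demo_path)
os.remove(demo_path)
```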
We recently added support for the pointwise reranker RankLLama and reasoning-based rerankers such as Rank1:
```python
from rerank import PointwiseReranker
llm_reranker = PointwiseReranker('castorini/rankllama-v1-7b-lora-passage')
```
```python
from rerank import Rank1Reranker
llm_reranker = Rank1Reranker("jhu-clsp/rank1-7b")
```
The LLMReRanker function can take any 🤗 Hugging Face model id. It has been tested with the following reranking models on TREC-DL 2019:
| Model                                | nDCG@10 |
|--------------------------------------|---------|
| BM25                                 | .48     |
| BM25 + rank_vicuna_7b_v1             | .67     |
| BM25 + rank_zephyr_7b_v1_full        | .71     |
| BM25 + gpt-35-turbo-1106             | .66     |
| BM25 + gpt-4-turbo-0409              | .71     |
| BM25 + gpt-4o-mini                   | .71     |
| BM25 + Llama-Spark (8B zero-shot)    | .61     |
Read the paper for detailed results here.
The reranker interface takes additional parameters that can be modified.
```python
llm_reranker = LLMReRanker(model_path="castorini/rank_vicuna_7b_v1",
                           num_few_shot_examples=0,
                           top_k_candidates=100,
                           window_size=20,
                           shuffle_candidates=False,
                           print_prompts_responses=False,
                           step_size=10,
                           variable_passages=True,
                           system_message='You are RankLLM, an intelligent assistant that can rank passages based on their relevancy to the query.',
                           prefix_instruction_fn=lambda num, query: f"I will provide you with {num} passages, each indicated by number identifier []. \nRank the passages based on their relevance to query: {query}.",
                           suffix_instruction_fn=lambda num, query: f"Search Query: {query}. \nRank the {num} passages above. You should rank them based on their relevance to the search query. The passages should be listed in descending order using identifiers. The most relevant passages should be listed first. The output format should be [] > [], e.g., [1] > [2]. Only response the ranking results, do not say any word or explain.",
                           prompt_mode=PromptMode.RANK_GPT,
                           context_size=4096,
                           num_gpus=1,
                           text_key='text',
                           use_azure_openai=False)
```
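The `top_k_candidates`, `window_size`, and `step_size` parameters govern RankGPT-style sliding-window reranking, in which the LLM reorders overlapping windows of candidates from the bottom of the ranked list toward the top. A back-of-the-envelope sketch of how the defaults above tile 100 candidates (my illustration of the common scheme, not the plugin's actual code):

```python
def sliding_windows(top_k=100, window_size=20, step_size=10):
    """Index ranges a listwise reranker would visit, sliding from the
    bottom of the candidate list toward the top (RankGPT-style).

    Illustration only, not the plugin's implementation.
    """
    windows = []
    start = top_k - window_size
    while start >= 0:
        windows.append((start, start + window_size))
        start -= step_size
    return windows
```

With these defaults the reranker would visit 9 overlapping windows, from `(80, 100)` up to `(0, 20)`, so every candidate appears in at least one window and high-ranked candidates are re-examined last.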
Reference
```bibtex
@software{Dhole_PyTerrier_Genrank,
  author      = {Dhole, Kaustubh},
  license     = {Apache-2.0},
  institution = {Emory University},
  title       = {{PyTerrier-GenRank: The PyTerrier Plugin for Reranking with Large Language Models}},
  url         = {https://github.com/emory-irlab/pyterrier_genrank},
  year        = {2024}
}
```
Owner
- Name: Emory Intelligent Information Access Lab (IR Lab)
- Login: emory-irlab
- Kind: organization
- Email: emory.irlab@gmail.com
- Website: http://ir.mathcs.emory.edu/
- Repositories: 24
- Profile: https://github.com/emory-irlab
Citation (CITATION.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  PyTerrier_Genrank: The PyTerrier Plugin for generative
  rerankers
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Kaustubh
    family-names: Dhole
    email: kdhole@emory.edu
    affiliation: Emory University
    orcid: 'https://orcid.org/0009-0006-6907-2530'
repository-code: 'https://github.com/emory-irlab/pyterrier_genrank'
url: 'https://github.com/emory-irlab/pyterrier_genrank'
abstract: >
  The PyTerrier Plugin for Generative Rerankers like
  RankVicuna and RankZephyr.
keywords:
  - large language models
  - information retrieval
  - generative reranker
license: Apache-2.0
```
GitHub Events
Total
- Watch event: 10
- Delete event: 1
- Push event: 7
- Pull request event: 4
- Create event: 3
Last Year
- Watch event: 10
- Delete event: 1
- Push event: 7
- Pull request event: 4
- Create event: 3
Dependencies
- accelerate *
- dacite >=1.8.1
- fschat >=0.2.36
- ftfy >=6.2.0
- protobuf *
- python-terrier *
- sentencepiece *
- torch *
- transformers *