pyterrier_genrank

Generative Reranker PyTerrier

https://github.com/emory-irlab/pyterrier_genrank

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
    Organization emory-irlab has institutional domain (ir.mathcs.emory.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Generative Reranker PyTerrier

Basic Info
  • Host: GitHub
  • Owner: emory-irlab
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 206 KB
Statistics
  • Stars: 14
  • Watchers: 4
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md


The PyTerrier 🐕 plugin for listwise, pointwise, and reasoning-based (long chain-of-thought) generative rerankers such as RankGPT, RankVicuna, RankZephyr, and RankLLama. It is a PyTerrier wrapper over the implementations available in RankLLM and Rank1.

Installation

```bash
pip install --upgrade git+https://github.com/emory-irlab/pyterrier_genrank.git
```

Example Usage

Since listwise reranking operates on a ranked list of candidates rather than on individual documents, this reranker is used a bit differently than other PyTerrier rerankers.

```python
import pyterrier as pt

from rerank import LLMReRanker

dataset = pt.get_dataset("irds:vaswani")

bm25 = pt.terrier.Retriever.from_dataset("vaswani", "terrier_stemmed", wmodel="BM25")
llm_reranker = LLMReRanker("castorini/rank_vicuna_7b_v1")

genrank_pipeline = bm25 % 100 >> pt.text.get_text(dataset, 'text') >> llm_reranker

genrank_pipeline.search('best places to have Indian food')
```

If you want to use RankGPT, ensure that you have your API key set in an environment file. Then load the reranker with the OpenAI model string:

```python
llm_reranker = LLMReRanker("gpt-35-turbo-1106", use_azure_openai=True)
```
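Before constructing an OpenAI-backed reranker, it can help to fail fast if the key is missing. The sketch below is a hypothetical helper, not part of this package; the environment variable name is an assumption and may differ for Azure OpenAI deployments:

```python
import os

def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    # Hypothetical helper: raise early, before any model call, if the key
    # is absent. The variable name here is an assumption; check the RankLLM
    # docs for the exact one your deployment expects.
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it or add it to your .env file"
        )
    return key
```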

We recently added support for the pointwise reranker RankLLama and reasoning-based rerankers like Rank1:

```python
from rerank import PointwiseReranker

llm_reranker = PointwiseReranker('castorini/rankllama-v1-7b-lora-passage')
```

```python
from rerank import Rank1Reranker

llm_reranker = Rank1Reranker("jhu-clsp/rank1-7b")
```
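The difference between the pointwise and listwise modes above can be illustrated without any model: a pointwise reranker scores each (query, passage) pair independently and sorts by score, while a listwise reranker sees the whole candidate list and emits a permutation. This is an illustrative sketch, not the RankLLM API; `overlap_score` is a toy stand-in for an LLM relevance score:

```python
def pointwise_rerank(query, passages, score_fn):
    # Score each passage independently, then sort highest-score first.
    return sorted(passages, key=lambda p: score_fn(query, p), reverse=True)

def overlap_score(query, passage):
    # Toy relevance: number of shared lowercase tokens.
    return len(set(query.lower().split()) & set(passage.lower().split()))

docs = ["indian food in atlanta", "weather forecast", "best indian restaurants"]
ranked = pointwise_rerank("indian food", docs, overlap_score)
# ranked[0] is "indian food in atlanta" (2 overlapping tokens)
```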

The LLMReRanker function can take any 🤗 HuggingFace model id. It has been tested using the following reranking models on TREC-DL 2019:

| Model                                | nDCG@10 |
|--------------------------------------|---------|
| BM25                                 | .48     |
| BM25 + rank_vicuna_7b_v1             | .67     |
| BM25 + rank_zephyr_7b_v1_full        | .71     |
| BM25 + gpt-35-turbo-1106             | .66     |
| BM25 + gpt-4-turbo-0409              | .71     |
| BM25 + gpt-4o-mini                   | .71     |
| BM25 + Llama-Spark (8B zero-shot)    | .61     |
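For reference, the nDCG@10 metric reported above can be computed from graded relevance labels with the standard formula (discounted gain over the ideal ordering). This is a generic sketch of the metric, not code from this repository:

```python
import math

def dcg(rels, k):
    # Discounted cumulative gain over the top-k relevance labels,
    # using the exponential-gain form common in TREC evaluation.
    return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg(rels, k=10):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0
```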

Read the paper for detailed results here.

The reranker interface accepts additional parameters that can be modified:

```python
llm_reranker = LLMReRanker(
    model_path="castorini/rank_vicuna_7b_v1",
    num_few_shot_examples=0,
    top_k_candidates=100,
    window_size=20,
    shuffle_candidates=False,
    print_prompts_responses=False,
    step_size=10,
    variable_passages=True,
    system_message='You are RankLLM, an intelligent assistant that can rank passages based on their relevancy to the query.',
    prefix_instruction_fn=lambda num, query: f"I will provide you with {num} passages, each indicated by number identifier []. \nRank the passages based on their relevance to query: {query}.",
    suffix_instruction_fn=lambda num, query: f"Search Query: {query}. \nRank the {num} passages above. You should rank them based on their relevance to the search query. The passages should be listed in descending order using identifiers. The most relevant passages should be listed first. The output format should be [] > [], e.g., [1] > [2]. Only response the ranking results, do not say any word or explain.",
    prompt_mode=PromptMode.RANK_GPT,
    context_size=4096,
    num_gpus=1,
    text_key='text',
    use_azure_openai=False,
)
```
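The `window_size` and `step_size` parameters control sliding-window listwise reranking: the model reranks `window_size` candidates at a time, sliding from the bottom of the list toward the top in steps of `step_size`, so relevant documents can bubble upward across overlapping windows. The sketch below illustrates that strategy in plain Python; `rerank_window` stands in for the actual LLM call and the loop structure is an assumption about the general technique, not this package's internals:

```python
def sliding_window_rerank(docs, rerank_window, window_size=20, step_size=10):
    # Rerank overlapping windows from the tail of the list to the head.
    docs = list(docs)
    end = len(docs)
    while end > 0:
        start = max(0, end - window_size)
        docs[start:end] = rerank_window(docs[start:end])
        if start == 0:
            break
        end -= step_size
    return docs
```

With `sorted` as a stand-in window reranker and enough overlap between windows, highly "relevant" items percolate all the way to the front even when they start near the bottom of the candidate list.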

Reference

```bibtex
@software{Dhole_PyTerrier_Genrank,
  author      = {Dhole, Kaustubh},
  license     = {Apache-2.0},
  institution = {Emory University},
  title       = {{PyTerrier-GenRank: The PyTerrier Plugin for Reranking with Large Language Models}},
  url         = {https://github.com/emory-irlab/pyterrier_genrank},
  year        = {2024}
}
```

Owner

  • Name: Emory Intelligent Information Access Lab (IR Lab)
  • Login: emory-irlab
  • Kind: organization
  • Email: emory.irlab@gmail.com

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  PyTerrier_Genrank: The PyTerrier Plugin for generative
  rerankers
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Kaustubh
    family-names: Dhole
    email: kdhole@emory.edu
    affiliation: Emory University
    orcid: 'https://orcid.org/0009-0006-6907-2530'
repository-code: 'https://github.com/emory-irlab/pyterrier_genrank'
url: 'https://github.com/emory-irlab/pyterrier_genrank'
abstract: >
  The PyTerrier Plugin for Generative Rerankers like
  RankVicuna and RankZephyr.
keywords:
  - large language models
  - information retrieval
  - generative reranker
license: Apache-2.0

GitHub Events

Total
  • Watch event: 10
  • Delete event: 1
  • Push event: 7
  • Pull request event: 4
  • Create event: 3
Last Year
  • Watch event: 10
  • Delete event: 1
  • Push event: 7
  • Pull request event: 4
  • Create event: 3

Dependencies

requirements.txt pypi
  • accelerate *
  • dacite >=1.8.1
  • fschat >=0.2.36
  • ftfy >=6.2.0
  • protobuf *
  • python-terrier *
  • sentencepiece *
  • torch *
  • transformers *
setup.py pypi