pyterrier_genrank

Generative Reranker PyTerrier

https://github.com/emory-irlab/pyterrier_genrank

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
    Organization emory-irlab has institutional domain (ir.mathcs.emory.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Generative Reranker PyTerrier

Basic Info
  • Host: GitHub
  • Owner: emory-irlab
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 206 KB
Statistics
  • Stars: 14
  • Watchers: 4
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md


The PyTerrier 🐕 plugin for listwise, pointwise, and reasoning-based (long chain-of-thought) generative rerankers such as RankGPT, RankVicuna, RankZephyr, and RankLLama. It is a PyTerrier wrapper over the implementations available in RankLLM and Rank1.

Installation

```bash
pip install --upgrade git+https://github.com/emory-irlab/pyterrier_genrank.git
```

Example Usage

Since listwise reranking operates on a ranked list of candidates rather than on individual documents, this reranker is used a bit differently than other PyTerrier rerankers.

```python
import pyterrier as pt

from rerank import LLMReRanker

dataset = pt.get_dataset("irds:vaswani")

bm25 = pt.terrier.Retriever.from_dataset("vaswani", "terrier_stemmed", wmodel="BM25")
llm_reranker = LLMReRanker("castorini/rank_vicuna_7b_v1")

genrank_pipeline = bm25 % 100 >> pt.text.get_text(dataset, 'text') >> llm_reranker

genrank_pipeline.search('best places to have Indian food')
```

If you want to use RankGPT, ensure that you have your API key set in an environment file. Then load the reranker with the OpenAI model string:

```python
llm_reranker = LLMReRanker("gpt-35-turbo-1106", use_azure_openai=True)
```
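Before constructing an OpenAI-backed reranker, it can help to fail fast if the key is missing. The sketch below is a hypothetical helper, not part of this package; the environment variable name is an assumption and may differ for Azure OpenAI deployments:

```python
import os

def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    # Hypothetical helper: raise early, before any model call, if the key
    # is absent. The variable name here is an assumption; check the RankLLM
    # docs for the exact one your deployment expects.
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it or add it to your .env file"
        )
    return key
```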

We recently added support for the pointwise reranker RankLLama and reasoning-based rerankers like Rank1:

```python
from rerank import PointwiseReranker

llm_reranker = PointwiseReranker('castorini/rankllama-v1-7b-lora-passage')
```

```python
from rerank import Rank1Reranker

llm_reranker = Rank1Reranker("jhu-clsp/rank1-7b")
```
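The difference between the pointwise and listwise modes above can be illustrated without any model: a pointwise reranker scores each (query, passage) pair independently and sorts by score, while a listwise reranker sees the whole candidate list and emits a permutation. This is an illustrative sketch, not the RankLLM API; `overlap_score` is a toy stand-in for an LLM relevance score:

```python
def pointwise_rerank(query, passages, score_fn):
    # Score each passage independently, then sort highest-score first.
    return sorted(passages, key=lambda p: score_fn(query, p), reverse=True)

def overlap_score(query, passage):
    # Toy relevance: number of shared lowercase tokens.
    return len(set(query.lower().split()) & set(passage.lower().split()))

docs = ["indian food in atlanta", "weather forecast", "best indian restaurants"]
ranked = pointwise_rerank("indian food", docs, overlap_score)
# ranked[0] is "indian food in atlanta" (2 overlapping tokens)
```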

The LLMReRanker function can take any 🤗 HuggingFace model id. It has been tested using the following reranking models on TREC-DL 2019:

| Model                                | nDCG@10 |
|--------------------------------------|---------|
| BM25                                 | .48     |
| BM25 + rank_vicuna_7b_v1             | .67     |
| BM25 + rank_zephyr_7b_v1_full        | .71     |
| BM25 + gpt-35-turbo-1106             | .66     |
| BM25 + gpt-4-turbo-0409              | .71     |
| BM25 + gpt-4o-mini                   | .71     |
| BM25 + Llama-Spark (8B zero-shot)    | .61     |
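For reference, the nDCG@10 metric reported above can be computed from graded relevance labels with the standard formula (discounted gain over the ideal ordering). This is a generic sketch of the metric, not code from this repository:

```python
import math

def dcg(rels, k):
    # Discounted cumulative gain over the top-k relevance labels,
    # using the exponential-gain form common in TREC evaluation.
    return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg(rels, k=10):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0
```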

Read the paper for detailed results here.

The reranker interface accepts additional parameters that can be modified:

```python
llm_reranker = LLMReRanker(
    model_path="castorini/rank_vicuna_7b_v1",
    num_few_shot_examples=0,
    top_k_candidates=100,
    window_size=20,
    shuffle_candidates=False,
    print_prompts_responses=False,
    step_size=10,
    variable_passages=True,
    system_message='You are RankLLM, an intelligent assistant that can rank passages based on their relevancy to the query.',
    prefix_instruction_fn=lambda num, query: f"I will provide you with {num} passages, each indicated by number identifier []. \nRank the passages based on their relevance to query: {query}.",
    suffix_instruction_fn=lambda num, query: f"Search Query: {query}. \nRank the {num} passages above. You should rank them based on their relevance to the search query. The passages should be listed in descending order using identifiers. The most relevant passages should be listed first. The output format should be [] > [], e.g., [1] > [2]. Only response the ranking results, do not say any word or explain.",
    prompt_mode=PromptMode.RANK_GPT,
    context_size=4096,
    num_gpus=1,
    text_key='text',
    use_azure_openai=False,
)
```
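The `window_size` and `step_size` parameters control sliding-window listwise reranking: the model reranks `window_size` candidates at a time, sliding from the bottom of the list toward the top in steps of `step_size`, so relevant documents can bubble upward across overlapping windows. The sketch below illustrates that strategy in plain Python; `rerank_window` stands in for the actual LLM call and the loop structure is an assumption about the general technique, not this package's internals:

```python
def sliding_window_rerank(docs, rerank_window, window_size=20, step_size=10):
    # Rerank overlapping windows from the tail of the list to the head.
    docs = list(docs)
    end = len(docs)
    while end > 0:
        start = max(0, end - window_size)
        docs[start:end] = rerank_window(docs[start:end])
        if start == 0:
            break
        end -= step_size
    return docs
```

With `sorted` as a stand-in window reranker and enough overlap between windows, highly "relevant" items percolate all the way to the front even when they start near the bottom of the candidate list.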

Reference

```bibtex
@software{Dhole_PyTerrier_Genrank,
  author      = {Dhole, Kaustubh},
  license     = {Apache-2.0},
  institution = {Emory University},
  title       = {{PyTerrier-GenRank: The PyTerrier Plugin for Reranking with Large Language Models}},
  url         = {https://github.com/emory-irlab/pyterrier_genrank},
  year        = {2024}
}
```

Owner

  • Name: Emory Intelligent Information Access Lab (IR Lab)
  • Login: emory-irlab
  • Kind: organization
  • Email: emory.irlab@gmail.com

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  PyTerrier_Genrank: The PyTerrier Plugin for generative
  rerankers
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Kaustubh
    family-names: Dhole
    email: kdhole@emory.edu
    affiliation: Emory University
    orcid: 'https://orcid.org/0009-0006-6907-2530'
repository-code: 'https://github.com/emory-irlab/pyterrier_genrank'
url: 'https://github.com/emory-irlab/pyterrier_genrank'
abstract: >
  The PyTerrier Plugin for Generative Rerankers like
  RankVicuna and RankZephyr.
keywords:
  - large language models
  - information retrieval
  - generative reranker
license: Apache-2.0

GitHub Events

Total
  • Watch event: 10
  • Delete event: 1
  • Push event: 7
  • Pull request event: 4
  • Create event: 3
Last Year
  • Watch event: 10
  • Delete event: 1
  • Push event: 7
  • Pull request event: 4
  • Create event: 3

Dependencies

requirements.txt pypi
  • accelerate *
  • dacite >=1.8.1
  • fschat >=0.2.36
  • ftfy >=6.2.0
  • protobuf *
  • python-terrier *
  • sentencepiece *
  • torch *
  • transformers *
setup.py pypi