rankify

πŸ”₯ Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation πŸ”₯. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7+ retrieval techniques, 24+ state-of-the-art Reranking models, and multiple RAG methods.

https://github.com/datascienceuibk/rankify

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • βœ“ CITATION.cff file: found
  • βœ“ codemeta.json file: found
  • βœ“ .zenodo.json file: found
  • β—‹ DOI references
  • βœ“ Academic publication links: arxiv.org
  • β—‹ Academic email domains
  • β—‹ Institutional organization owner
  • β—‹ JOSS paper metadata
  • β—‹ Scientific vocabulary similarity: low similarity (13.0%) to scientific vocabulary

Keywords

agent ai chatgpt information-retrieval llm nlp question-answering rag ranked-retrieval reranking retrieval retrival-augmented-generation
Last synced: 6 months ago

Repository

πŸ”₯ Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation πŸ”₯. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7+ retrieval techniques, 24+ state-of-the-art Reranking models, and multiple RAG methods.

Basic Info
Statistics
  • Stars: 499
  • Watchers: 12
  • Forks: 36
  • Open Issues: 1
  • Releases: 7
Topics
agent ai chatgpt information-retrieval llm nlp question-answering rag ranked-retrieval reranking retrieval retrival-augmented-generation
Created about 1 year ago · Last pushed 6 months ago
Metadata Files
  • Readme
  • Citation

README-PyPI.md

πŸ”₯ Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation πŸ”₯


A modular and efficient framework for retrieval, re-ranking, and retrieval-augmented generation (RAG), designed to work with state-of-the-art models.

Rankify is a Python toolkit designed for unified retrieval, re-ranking, and retrieval-augmented generation (RAG) research. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7 retrieval techniques, 24 state-of-the-art re-ranking models, and multiple RAG methods. Rankify provides a modular and extensible framework, enabling seamless experimentation and benchmarking across retrieval pipelines. Comprehensive documentation, open-source implementation, and pre-built evaluation tools make Rankify a powerful resource for researchers and practitioners in the field.

✨ Features

  • Comprehensive Retrieval & Reranking Framework: Rankify unifies retrieval, re-ranking, and retrieval-augmented generation (RAG) into a single modular Python toolkit, enabling seamless experimentation and benchmarking.

  • Extensive Dataset Support: Includes 40 benchmark datasets with pre-retrieved documents, covering diverse domains such as question answering, dialogue, entity linking, and fact verification.

  • Diverse Retriever Integration: Supports 7 retrieval techniques, including BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever, providing flexibility for various retrieval strategies.

  • Advanced Re-ranking Models: Implements 24 primary re-ranking models with 41 sub-methods, covering pointwise, pairwise, and listwise re-ranking approaches for enhanced ranking performance.

  • Prebuilt Retrieval Indices: Provides precomputed Wikipedia and MS MARCO corpora for multiple retrieval models, eliminating indexing overhead and accelerating experiments.

  • Seamless RAG Integration: Bridges retrieval and generative models (e.g., GPT, LLAMA, T5), enabling retrieval-augmented generation with zero-shot, Fusion-in-Decoder (FiD), and in-context learning strategies.

  • Modular & Extensible Design: Easily integrates custom datasets, retrievers, re-rankers, and generation models using Rankify’s structured Python API.

  • Comprehensive Evaluation Suite: Offers automated performance evaluation with retrieval, ranking, and RAG metrics, ensuring reproducible benchmarking.

  • User-Friendly Documentation: Detailed πŸ“– online documentation, example notebooks, and tutorials for easy adoption.

πŸ” Roadmap

Rankify is still under development, and this is our first release (v0.1.0). While it already supports a wide range of retrieval, re-ranking, and RAG techniques, we are actively enhancing its capabilities by adding more retrievers, rankers, datasets, and features.

πŸš€ Planned Improvements

  • Retrievers

    • [x] Support for BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever
    • [ ] Add missing retrievers: Spar, MSS, MSS-DPR
    • [ ] Enable custom index loading and support for user-defined retrieval corpora
  • Re-Rankers

    • [x] 24 primary re-ranking models with 41 sub-methods
    • [ ] Expand the list by adding more advanced ranking models
  • Datasets

    • [x] 40 benchmark datasets for retrieval, ranking, and RAG
    • [ ] Add more datasets
    • [ ] Support for custom dataset integration
  • Retrieval-Augmented Generation (RAG)

    • [x] Integration with GPT, LLAMA, and T5
    • [ ] Extend support for more generative models
  • Evaluation & Usability

    • [x] Standard retrieval and ranking evaluation metrics (Top-K, EM, Recall, ...)
    • [ ] Add advanced evaluation metrics (NDCG, MAP) for retrievers
  • Pipeline Integration

    • [ ] Add a pipeline module for streamlined retrieval, re-ranking, and RAG workflows

πŸ”§ Installation

Set up the virtual environment

First, create and activate a conda environment with Python 3.10:

```bash
conda create -n rankify python=3.10
conda activate rankify
```

Install PyTorch 2.5.1

We recommend installing Rankify with PyTorch 2.5.1. Refer to the PyTorch installation page for platform-specific installation commands.

If you have access to GPUs, we recommend installing the CUDA 12.4 or 12.6 build of PyTorch, as many of the evaluation metrics are optimized for GPU use.

To install PyTorch 2.5.1, run:

```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```
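As a quick sanity check after installation, you can confirm the PyTorch version and CUDA visibility (a minimal sketch; the expected version string assumes the cu124 build above):

```python
# Verify the installed PyTorch version and that the CUDA build can see a GPU.
import torch

print(torch.__version__)          # e.g. "2.5.1+cu124"
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible
```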

Basic Installation

To install Rankify, simply use pip (requires Python 3.10+):
```bash
pip install rankify
```

Or, to install from GitHub for the latest development version:

```bash
git clone https://github.com/DataScienceUIBK/rankify.git
cd rankify
pip install -e .

# For full functionality, we recommend installing Rankify with all dependencies:
pip install -e ".[all]"

# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)
pip install -e ".[retriever]"

# Install dependencies for base re-ranking only (excluding vLLM)
pip install -e ".[base]"

# Install base re-ranking with vLLM support for FirstModelReranker, LiT5ScoreReranker,
# LiT5DistillReranker, VicunaReranker, and ZephyrReranker
pip install -e ".[reranking]"

# Install dependencies for retrieval-augmented generation (RAG)
pip install -e ".[rag]"
```

This will install the base functionality required for retrieval, re-ranking, and retrieval-augmented generation (RAG).

Recommended Installation

For full functionality, we recommend installing Rankify with all dependencies:

```bash
pip install "rankify[all]"
```

This ensures you have all necessary modules, including retrieval, re-ranking, and RAG support.

Optional Dependencies

If you prefer to install only specific components, choose from the following:

```bash
# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)
pip install "rankify[retriever]"

# Install dependencies for base re-ranking only (excluding vLLM)
pip install "rankify[base]"

# Install base re-ranking with vLLM support for FirstModelReranker, LiT5ScoreReranker,
# LiT5DistillReranker, VicunaReranker, and ZephyrReranker
pip install "rankify[reranking]"

# Install dependencies for retrieval-augmented generation (RAG)
pip install "rankify[rag]"
```

Using ColBERT Retriever

If you want to use the ColBERT retriever, follow these additional setup steps:

```bash
# Install GCC and required libraries
conda install -c conda-forge gcc=9.4.0 gxx=9.4.0
conda install -c conda-forge libstdcxx-ng

# Export necessary environment variables
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
export CC=gcc
export CXX=g++
export PATH=$CONDA_PREFIX/bin:$PATH

# Clear cached torch extensions
rm -rf ~/.cache/torch_extensions/*
```


πŸš€ Quick Start

1️⃣. Pre-retrieved Datasets

We provide 1,000 pre-retrieved documents per dataset, which you can download from:

πŸ”— Hugging Face Dataset Repository

Dataset Format

The pre-retrieved documents are structured as follows:

```json
[
  {
    "question": "...",
    "answers": ["...", "...", ...],
    "ctxs": [
      {
        "id": "...",              // Passage ID from database TSV file
        "score": "...",           // Retriever score
        "has_answer": true|false  // Whether the passage contains the answer
      }
    ]
  }
]
```
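Because the files are plain JSON in the schema above, you can also inspect them without Rankify. The following is a minimal sketch (not part of the Rankify API; the file path is hypothetical) that computes how often the answer appears in the top-k retrieved passages:

```python
import json

def top_k_hit_rate(path: str, k: int = 10) -> float:
    """Fraction of questions whose top-k passages contain the answer."""
    with open(path) as f:
        data = json.load(f)
    hits = sum(any(ctx["has_answer"] for ctx in item["ctxs"][:k]) for item in data)
    return hits / len(data)

# Hypothetical example file:
# print(top_k_hit_rate("bm25/nq-dev.json", k=10))
```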

Access Datasets in Rankify

You can easily download and use pre-retrieved datasets through Rankify.

List Available Datasets

To see all available datasets:

```python
from rankify.dataset.dataset import Dataset

# Display available datasets
Dataset.avaiable_dataset()
```

BM25 Retriever

```python
from rankify.dataset.dataset import Dataset

# Download BM25-retrieved documents for nq-dev
dataset = Dataset(retriever="bm25", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="bm25", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="bm25", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for archivialqa-test
dataset = Dataset(retriever="bm25", dataset_name="archivialqa-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for chroniclingamericaqa-test
dataset = Dataset(retriever="bm25", dataset_name="chroniclingamericaqa-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for chroniclingamericaqa-dev
dataset = Dataset(retriever="bm25", dataset_name="chroniclingamericaqa-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for entityquestions-test
dataset = Dataset(retriever="bm25", dataset_name="entityquestions-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for ambig_qa-dev
dataset = Dataset(retriever="bm25", dataset_name="ambig_qa-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for ambig_qa-train
dataset = Dataset(retriever="bm25", dataset_name="ambig_qa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for arc-test
dataset = Dataset(retriever="bm25", dataset_name="arc-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for arc-dev
dataset = Dataset(retriever="bm25", dataset_name="arc-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

BGE Retriever

```python
from rankify.dataset.dataset import Dataset

# Download BGE-retrieved documents for nq-dev
dataset = Dataset(retriever="bge", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BGE-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="bge", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download BGE-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="bge", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

ColBERT Retriever

```python
from rankify.dataset.dataset import Dataset

# Download ColBERT-retrieved documents for nq-dev
dataset = Dataset(retriever="colbert", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download ColBERT-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="colbert", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download ColBERT-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="colbert", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

MSS-DPR Retriever

```python
from rankify.dataset.dataset import Dataset

# Download MSS-DPR-retrieved documents for nq-dev
dataset = Dataset(retriever="mss-dpr", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-DPR-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="mss-dpr", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-DPR-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="mss-dpr", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

MSS Retriever

```python
from rankify.dataset.dataset import Dataset

# Download MSS-retrieved documents for nq-dev
dataset = Dataset(retriever="mss", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="mss", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="mss", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

Contriever Retriever

```python
from rankify.dataset.dataset import Dataset

# Download Contriever-retrieved documents for nq-dev
dataset = Dataset(retriever="contriever", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download Contriever-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="contriever", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download Contriever-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="contriever", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

ANCE Retriever

```python
from rankify.dataset.dataset import Dataset

# Download ANCE-retrieved documents for nq-dev
dataset = Dataset(retriever="ance", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download ANCE-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="ance", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download ANCE-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="ance", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```

Load Pre-retrieved Dataset from File

If you have already downloaded a dataset, you can load it directly:

```python
from rankify.dataset.dataset import Dataset

# Load a pre-downloaded BM25 dataset for WebQuestions
documents = Dataset.load_dataset('./tests/out-datasets/bm25/webquestions/test.json', 100)
```

Now you can integrate retrieved documents with re-ranking and RAG workflows! πŸš€
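For example, you can inspect a loaded document before re-ranking (a minimal sketch, assuming `Document` exposes the `question`, `answers`, and `contexts` attributes used throughout this README):

```python
# Inspect the first loaded document and its retrieved passages.
doc = documents[0]
print(doc.question)
print(doc.answers)
print(len(doc.contexts), "retrieved passages")
```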


2️⃣. Running Retrieval

To perform retrieval using Rankify, you can choose from various retrieval methods such as BM25, DPR, ANCE, Contriever, ColBERT, and BGE.

Example: Running Retrieval on Sample Queries

```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.retrievers.retriever import Retriever

# Sample documents
documents = [
    Document(
        question=Question("the cast of a good day to die hard?"),
        answers=Answer([
            "Jai Courtney", "Sebastian Koch", "Radivoje Bukvić", "Yuliya Snigir",
            "Sergei Kolesnikov", "Mary Elizabeth Winstead", "Bruce Willis"
        ]),
        contexts=[]
    ),
    Document(
        question=Question("Who wrote Hamlet?"),
        answers=Answer(["Shakespeare"]),
        contexts=[]
    )
]
```

```python
# BM25 retrieval on Wikipedia
bm25_retriever_wiki = Retriever(method="bm25", n_docs=5, index_type="wiki")

# BM25 retrieval on MS MARCO
bm25_retriever_msmarco = Retriever(method="bm25", n_docs=5, index_type="msmarco")

# DPR (multi-encoder) retrieval on Wikipedia
dpr_retriever_wiki = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="wiki")

# DPR (multi-encoder) retrieval on MS MARCO
dpr_retriever_msmarco = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="msmarco")

# DPR (single-encoder) retrieval on Wikipedia
dpr_retriever_wiki = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="wiki")

# DPR (single-encoder) retrieval on MS MARCO
dpr_retriever_msmarco = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="msmarco")

# ANCE retrieval on Wikipedia
ance_retriever_wiki = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="wiki")

# ANCE retrieval on MS MARCO
ance_retriever_msmarco = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="msmarco")

# Contriever retrieval on Wikipedia
contriever_retriever_wiki = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="wiki")

# Contriever retrieval on MS MARCO
contriever_retriever_msmarco = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="msmarco")

# ColBERT retrieval on Wikipedia
colbert_retriever_wiki = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="wiki")

# ColBERT retrieval on MS MARCO
colbert_retriever_msmarco = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="msmarco")

# BGE retrieval on Wikipedia
bge_retriever_wiki = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="wiki")

# BGE retrieval on MS MARCO
bge_retriever_msmarco = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="msmarco")
```

Running Retrieval

After defining a retriever, you can retrieve documents using:

```python
retrieved_documents = bm25_retriever_wiki.retrieve(documents)

for i, doc in enumerate(retrieved_documents):
    print(f"\nDocument {i+1}:")
    print(doc)
```


3️⃣. Running Reranking

Rankify provides support for multiple reranking models. Below are examples of how to use each model.

**Example: Reranking a Document**
```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.models.reranking import Reranking

# Sample document setup
question = Question("When did Thomas Edison invent the light bulb?")
answers = Answer(["1879"])
contexts = [
    Context(text="Lightning strike at Seoul National University", id=1),
    Context(text="Thomas Edison tried to invent a device for cars but failed", id=2),
    Context(text="Coffee is good for diet", id=3),
    Context(text="Thomas Edison invented the light bulb in 1879", id=4),
    Context(text="Thomas Edison worked with electricity", id=5),
]
document = Document(question=question, answers=answers, contexts=contexts)

# Initialize the reranker
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")

# Apply reranking
reranker.rank([document])

# Print reordered contexts
for context in document.reorder_contexts:
    print(f" - {context.text}")
```

Examples of Using Different Reranking Models
```python
# UPR
model = Reranking(method='upr', model_name='t5-base')

# API-Based Rerankers
model = Reranking(method='apiranker', model_name='voyage', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='jina', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='mixedbread.ai', api_key='your-api-key')

# Blender Reranker
model = Reranking(method='blender_reranker', model_name='PairRM')

# ColBERT Reranker
model = Reranking(method='colbert_ranker', model_name='Colbert')

# EchoRank
model = Reranking(method='echorank', model_name='flan-t5-large')

# First Ranker
model = Reranking(method='first_ranker', model_name='base')

# FlashRank
model = Reranking(method='flashrank', model_name='ms-marco-TinyBERT-L-2-v2')

# InContext Reranker
model = Reranking(method='incontext_reranker', model_name='llamav3.1-8b')

# InRanker
model = Reranking(method='inranker', model_name='inranker-small')

# ListT5
model = Reranking(method='listt5', model_name='listt5-base')

# LiT5 Distill
model = Reranking(method='lit5distill', model_name='LiT5-Distill-base')

# LiT5 Score
model = Reranking(method='lit5score', model_name='LiT5-Distill-base')

# LLM Layerwise Ranker
model = Reranking(method='llm_layerwise_ranker', model_name='bge-multilingual-gemma2')

# LLM2Vec
model = Reranking(method='llm2vec', model_name='Meta-Llama-31-8B')

# MonoBERT
model = Reranking(method='monobert', model_name='monobert-large')

# MonoT5
model = Reranking(method='monot5', model_name='monot5-base-msmarco')

# RankGPT
model = Reranking(method='rankgpt', model_name='llamav3.1-8b')

# RankGPT API
model = Reranking(method='rankgpt-api', model_name='gpt-3.5', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='gpt-4', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='llamav3.1-8b', api_key="together-api-key")
model = Reranking(method='rankgpt-api', model_name='claude-3-5', api_key="claude-api-key")

# RankT5
model = Reranking(method='rankt5', model_name='rankt5-base')

# Sentence Transformer Reranker
model = Reranking(method='sentence_transformer_reranker', model_name='all-MiniLM-L6-v2')
model = Reranking(method='sentence_transformer_reranker', model_name='gtr-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='sentence-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='distilbert-multilingual-nli-stsb-quora-ranking')
model = Reranking(method='sentence_transformer_reranker', model_name='msmarco-bert-co-condensor')

# SPLADE
model = Reranking(method='splade', model_name='splade-cocondenser')

# Transformer Ranker
model = Reranking(method='transformer_ranker', model_name='mxbai-rerank-xsmall')
model = Reranking(method='transformer_ranker', model_name='bge-reranker-base')
model = Reranking(method='transformer_ranker', model_name='bce-reranker-base')
model = Reranking(method='transformer_ranker', model_name='jina-reranker-tiny')
model = Reranking(method='transformer_ranker', model_name='gte-multilingual-reranker-base')
model = Reranking(method='transformer_ranker', model_name='nli-deberta-v3-large')
model = Reranking(method='transformer_ranker', model_name='ms-marco-TinyBERT-L-6')
model = Reranking(method='transformer_ranker', model_name='msmarco-MiniLM-L12-en-de-v1')

# TwoLAR
model = Reranking(method='twolar', model_name='twolar-xl')

# Vicuna Reranker
model = Reranking(method='vicuna_reranker', model_name='rank_vicuna_7b_v1')

# Zephyr Reranker
model = Reranking(method='zephyr_reranker', model_name='rank_zephyr_7b_v1_full')
```
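Retrieval and re-ranking compose directly, since rerankers consume the same `Document` objects that retrievers return. Below is a minimal end-to-end sketch using only methods and model names shown above (BM25 plus MonoT5); treat it as an illustration rather than a fixed recipe:

```python
from rankify.dataset.dataset import Document, Question, Answer
from rankify.retrievers.retriever import Retriever
from rankify.models.reranking import Reranking

# One sample query with no pre-retrieved contexts
docs = [Document(question=Question("Who wrote Hamlet?"),
                 answers=Answer(["Shakespeare"]), contexts=[])]

# Retrieve candidate passages with BM25, then re-rank them with MonoT5
retriever = Retriever(method="bm25", n_docs=5, index_type="wiki")
docs = retriever.retrieve(docs)

reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")
reranker.rank(docs)

# Print the re-ranked passages for the first query
for ctx in docs[0].reorder_contexts:
    print(ctx.text)
```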

4️⃣. Using Generator Module

Rankify provides a Generator Module to facilitate retrieval-augmented generation (RAG) by integrating retrieved documents into generative models for producing answers. Below is an example of how to use different generator methods.

```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator

# Define question and answers
question = Question("What is the capital of France?")
answers = Answer(["Paris"])
contexts = [
    Context(id=1, title="France", text="The capital of France is Paris.", score=0.9),
    Context(id=2, title="Germany", text="Berlin is the capital of Germany.", score=0.5)
]

# Construct the document
doc = Document(question=question, answers=answers, contexts=contexts)

# Initialize the generator (e.g., Meta Llama)
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate an answer
generated_answers = generator.generate([doc])
print(generated_answers)  # Output: ["Paris"]
```


5️⃣ Evaluating with Metrics

Rankify provides built-in evaluation metrics for retrieval, re-ranking, and retrieval-augmented generation (RAG). These metrics help assess the quality of retrieved documents, the effectiveness of ranking models, and the accuracy of generated answers.

Evaluating Generated Answers

You can evaluate the quality of retrieval-augmented generation (RAG) results by comparing generated answers with ground-truth answers.

```python
from rankify.metrics.metrics import Metrics
from rankify.dataset.dataset import Dataset
from rankify.generator.generator import Generator

# Load dataset
dataset = Dataset('bm25', 'nq-test', 100)
documents = dataset.download(force_download=False)

# Initialize the generator
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate answers
generated_answers = generator.generate(documents)

# Evaluate generated answers
metrics = Metrics(documents)
print(metrics.calculate_generation_metrics(generated_answers))
```

Evaluating Retrieval Performance

```python
# Calculate retrieval metrics before re-ranking
metrics = Metrics(documents)
before_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=False)

print(before_ranking_metrics)
```

Evaluating Reranked Results
```python
# Calculate retrieval metrics after re-ranking
after_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=True)
print(after_ranking_metrics)
```
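Printing the two results side by side makes the effect of re-ranking easy to read off. A minimal sketch, assuming both metric results are flat dicts mapping metric names to numeric scores:

```python
# Compare each metric shared by the before/after results.
for name in sorted(set(before_ranking_metrics) & set(after_ranking_metrics)):
    print(f"{name}: {before_ranking_metrics[name]} -> {after_ranking_metrics[name]}")
```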

πŸ“œ Supported Models

1️⃣ Retrievers

  • βœ… BM25
  • βœ… DPR
  • βœ… ColBERT
  • βœ… ANCE
  • βœ… BGE
  • βœ… Contriever
  • βœ… BPR

2️⃣ Rerankers

  • βœ… Cross-Encoders
  • βœ… RankGPT
  • βœ… RankGPT-API
  • βœ… MonoT5
  • βœ… MonoBERT
  • βœ… RankT5
  • βœ… ListT5
  • βœ… LiT5Score
  • βœ… LiT5Dist
  • βœ… Vicuna Reranker
  • βœ… Zephyr Reranker
  • βœ… Sentence Transformer-based
  • βœ… FlashRank Models
  • βœ… API-Based Rerankers
  • βœ… ColBERT Reranker
  • βœ… LLM Layerwise Ranker
  • βœ… SPLADE Reranker
  • βœ… UPR Reranker
  • βœ… InRanker Reranker
  • βœ… Transformer Reranker
  • βœ… FIRST Reranker
  • βœ… Blender Reranker
  • βœ… LLM2Vec Reranker
  • βœ… ECHO Reranker
  • βœ… InContext Reranker

3️⃣ Generators

  • βœ… Fusion-in-Decoder (FiD) with T5
  • βœ… In-Context Learning (RALM)

πŸ“– Documentation

For full API documentation, visit the Rankify Docs.


πŸ’‘ Contributing

Follow these steps to get involved:

  1. Fork this repository to your GitHub account.

  2. Create a new branch for your feature or fix:

     ```bash
     git checkout -b feature/YourFeatureName
     ```

  3. Make your changes and commit them:

     ```bash
     git commit -m "Add YourFeatureName"
     ```

  4. Push the changes to your branch:

     ```bash
     git push origin feature/YourFeatureName
     ```

  5. Submit a Pull Request to propose your changes.

Thank you for helping make this project better!


πŸ”– License

Rankify is licensed under the Apache-2.0 License - see the LICENSE file for details.

🌟 Citation

Please kindly cite our paper if it helps your research:

```bibtex
@article{abdallah2025rankify,
  title={Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Ali, Mohammed and Jatowt, Adam},
  journal={arXiv preprint arXiv:2502.02464},
  year={2025}
}
```

Owner

  • Name: DataScienceUIBK
  • Login: DataScienceUIBK
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
date-released: 2025-02
message: "If you use this software, please cite it as below."
authors:
- family-names: "Abdallah"
  given-names: "Abdelrahman"
- family-names: "Mozafari"
  given-names: "Jamshid"
- family-names: "Piryani"
  given-names: "Bhawna"
- family-names: "Ali"
  given-names: "Mohammed"
- family-names: "Jatowt"
  given-names: "Adam"
title: "Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation"
url: "https://arxiv.org/abs/2502.02464"
preferred-citation:
  type: article
  authors:
    - family-names: "Abdallah"
      given-names: "Abdelrahman"
    - family-names: "Mozafari"
      given-names: "Jamshid"
    - family-names: "Piryani"
      given-names: "Bhawna"
    - family-names: "Ali"
      given-names: "Mohammed"
    - family-names: "Jatowt"
      given-names: "Adam"
  title: "Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation"
  journal: "CoRR"
  volume: "abs/2502.02464"
  year: 2025
  url: "https://arxiv.org/abs/2502.02464"
  eprinttype: "arXiv"
  eprint: "2502.02464"

GitHub Events

Total
  • Create event: 11
  • Issues event: 9
  • Release event: 8
  • Watch event: 474
  • Issue comment event: 5
  • Member event: 4
  • Push event: 155
  • Pull request event: 45
  • Fork event: 33
Last Year
  • Create event: 11
  • Issues event: 9
  • Release event: 8
  • Watch event: 474
  • Issue comment event: 5
  • Member event: 4
  • Push event: 155
  • Pull request event: 45
  • Fork event: 33

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 25
  • Average time to close issues: about 1 month
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 4
  • Total pull request authors: 7
  • Average comments per issue: 0.75
  • Average comments per pull request: 0.0
  • Merged pull requests: 21
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 25
  • Average time to close issues: about 1 month
  • Average time to close pull requests: about 1 hour
  • Issue authors: 4
  • Pull request authors: 7
  • Average comments per issue: 0.75
  • Average comments per pull request: 0.0
  • Merged pull requests: 21
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Ahmedg2021 (1)
  • vcchain2019 (1)
  • qiyuxinlin (1)
  • xhuiyan (1)
Pull Request Authors
  • abdoelsayed2016 (10)
  • aherzinger (6)
  • baraayusry (4)
  • MohammedAli9330 (1)
  • eltociear (1)
  • MahmoudElsayedMahmoud (1)
  • tobias124 (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 129 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 7
  • Total maintainers: 1
pypi.org: rankify

A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 129 Last month
Rankings
  • Dependent packages count: 9.6%
  • Average: 32.0%
  • Dependent repos count: 54.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • mkdocs *
  • mkdocs-material *
  • mkdocs-rtd-dropdown *
  • mkdocstrings *
.github/workflows/python-publish.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
pyproject.toml pypi
  • datasets ==3.2.0
  • httpx ==0.27.2
  • pandas ==2.2.3
  • prettytable ==3.11.0
  • requests ==2.32.3
  • tqdm ==4.66.5
  • transformers ==4.45.2
rankify/requirements.txt pypi
  • anthropic ==0.37.1
  • dacite ==1.8.1
  • faiss-cpu ==1.9.0.post1
  • flash-attn ==2.5.0
  • fschat >=0.2.36
  • ftfy ==6.3.1
  • h5py ==3.12.1
  • litellm ==1.50.4
  • llama-cpp-python ==0.2.76
  • llm-blender ==0.0.2
  • ninja ==1.11.1.3
  • omegaconf ==2.3.0
  • onnxruntime ==1.19.2
  • openai ==1.52.2
  • pandas ==2.2.3
  • prettytable ==3.11.0
  • py7zr ==0.22.0
  • pyserini ==0.43.0
  • requests ==2.32.3
  • sentence_transformers ==3.3.0
  • sentencepiece ==0.2.0
  • together ==1.3.3
  • torch ==2.5.0
  • tqdm ==4.66.5
  • transformers ==4.45.2
  • ujson ==5.10.0
  • vllm ==0.6.3