rankify
🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7+ retrieval techniques, 24+ state-of-the-art re-ranking models, and multiple RAG methods.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file: found
- codemeta.json file: found
- .zenodo.json file: found
- DOI references
- Academic publication links: arxiv.org
- Academic email domains
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: low similarity (13.0%) to scientific vocabulary
Keywords
Repository
🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7+ retrieval techniques, 24+ state-of-the-art re-ranking models, and multiple RAG methods.
Basic Info
- Host: GitHub
- Owner: DataScienceUIBK
- Language: Python
- Default Branch: main
- Homepage: https://rankify.readthedocs.io/
- Size: 28.3 MB
Statistics
- Stars: 499
- Watchers: 12
- Forks: 36
- Open Issues: 1
- Releases: 7
Topics
Metadata Files
README-PyPI.md
🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥
A modular and efficient framework for retrieval, re-ranking, and retrieval-augmented generation (RAG), designed to work with state-of-the-art models.
Rankify is a Python toolkit designed for unified retrieval, re-ranking, and retrieval-augmented generation (RAG) research. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7 retrieval techniques, 24 state-of-the-art re-ranking models, and multiple RAG methods. Rankify provides a modular and extensible framework, enabling seamless experimentation and benchmarking across retrieval pipelines. Comprehensive documentation, open-source implementation, and pre-built evaluation tools make Rankify a powerful resource for researchers and practitioners in the field.
✨ Features
Comprehensive Retrieval & Reranking Framework: Rankify unifies retrieval, re-ranking, and retrieval-augmented generation (RAG) into a single modular Python toolkit, enabling seamless experimentation and benchmarking.
Extensive Dataset Support: Includes 40 benchmark datasets with pre-retrieved documents, covering diverse domains such as question answering, dialogue, entity linking, and fact verification.
Diverse Retriever Integration: Supports 7 retrieval techniques, including BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever, providing flexibility for various retrieval strategies.
Advanced Re-ranking Models: Implements 24 primary re-ranking models with 41 sub-methods, covering pointwise, pairwise, and listwise re-ranking approaches for enhanced ranking performance.
Prebuilt Retrieval Indices: Provides precomputed Wikipedia and MS MARCO corpora for multiple retrieval models, eliminating indexing overhead and accelerating experiments.
Seamless RAG Integration: Bridges retrieval and generative models (e.g., GPT, LLAMA, T5), enabling retrieval-augmented generation with zero-shot, Fusion-in-Decoder (FiD), and in-context learning strategies.
Modular & Extensible Design: Easily integrates custom datasets, retrievers, re-rankers, and generation models using Rankifyβs structured Python API.
Comprehensive Evaluation Suite: Offers automated performance evaluation with retrieval, ranking, and RAG metrics, ensuring reproducible benchmarking.
User-Friendly Documentation: Detailed 📖 online documentation, example notebooks, and tutorials for easy adoption.
🚀 Roadmap
Rankify is still under development, and this is our first release (v0.1.0). While it already supports a wide range of retrieval, re-ranking, and RAG techniques, we are actively enhancing its capabilities by adding more retrievers, rankers, datasets, and features.
📌 Planned Improvements
Retrievers
- [x] Support for BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever
- [ ] Add missing retrievers: Spar, MSS, MSS-DPR
- [ ] Enable custom index loading and support for user-defined retrieval corpora
Re-Rankers
- [x] 24 primary re-ranking models with 41 sub-methods
- [ ] Expand the list by adding more advanced ranking models
Datasets
- [x] 40 benchmark datasets for retrieval, ranking, and RAG
- [ ] Add more datasets
- [ ] Support for custom dataset integration
Retrieval-Augmented Generation (RAG)
- [x] Integration with GPT, LLAMA, and T5
- [ ] Extend support for more generative models
Evaluation & Usability
- [x] Standard retrieval and ranking evaluation metrics (Top-K, EM, Recall, ...)
- [ ] Add advanced evaluation metrics (NDCG, MAP) for retrievers
Pipeline Integration
- [ ] Add a pipeline module for streamlined retrieval, re-ranking, and RAG workflows
🔧 Installation
Set up the virtual environment
First, create and activate a conda environment with Python 3.10:
```bash
conda create -n rankify python=3.10
conda activate rankify
```
Install PyTorch 2.5.1
We recommend installing PyTorch 2.5.1 for Rankify. Refer to the PyTorch installation page for platform-specific installation commands.
If you have access to GPUs, install the CUDA 12.4 or 12.6 build of PyTorch, as many of the evaluation metrics are optimized for GPU use.
To install PyTorch 2.5.1, run:
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```
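To verify that the GPU build was picked up before going further, a quick sanity check using PyTorch alone (no Rankify APIs involved):
```python
import torch

# Should print 2.5.1 (with a +cu124 suffix for the CUDA build)
print(torch.__version__)

# True only if a CUDA-enabled build and a working GPU driver are present
print(torch.cuda.is_available())
```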
Basic Installation
To install Rankify, simply use pip (requires Python 3.10+):
```bash
pip install rankify
```
Or, to install from GitHub for the latest development version:
```bash
git clone https://github.com/DataScienceUIBK/rankify.git
cd rankify
pip install -e .

# For full functionality, we recommend installing Rankify with all dependencies:
pip install -e ".[all]"

# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.):
pip install -e ".[retriever]"

# Install dependencies for base re-ranking only (excluding vLLM):
pip install -e ".[base]"

# Install base re-ranking with vLLM support for FirstModelReranker, LiT5ScoreReranker,
# LiT5DistillReranker, VicunaReranker, and ZephyrReranker:
pip install -e ".[reranking]"

# Install dependencies for retrieval-augmented generation (RAG):
pip install -e ".[rag]"
```
This will install the base functionality required for retrieval, re-ranking, and retrieval-augmented generation (RAG).
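To confirm the package itself resolved correctly, a pure-stdlib check (it assumes nothing about Rankify's API):
```python
from importlib.metadata import version

# Prints the installed Rankify version, e.g. 0.1.3
print(version("rankify"))
```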
Recommended Installation
For full functionality, we recommend installing Rankify with all dependencies:
```bash
pip install "rankify[all]"
```
This ensures you have all necessary modules, including retrieval, re-ranking, and RAG support.
Optional Dependencies
If you prefer to install only specific components, choose from the following:
```bash
# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)
pip install "rankify[retriever]"

# Install dependencies for base re-ranking only (excluding vLLM)
pip install "rankify[base]"

# Install base re-ranking with vLLM support for FirstModelReranker, LiT5ScoreReranker,
# LiT5DistillReranker, VicunaReranker, and ZephyrReranker
pip install "rankify[reranking]"

# Install dependencies for retrieval-augmented generation (RAG)
pip install "rankify[rag]"
```
Using ColBERT Retriever
If you want to use ColBERT Retriever, follow these additional setup steps:
```bash
# Install GCC and required libraries
conda install -c conda-forge gcc=9.4.0 gxx=9.4.0
conda install -c conda-forge libstdcxx-ng

# Export necessary environment variables
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
export CC=gcc
export CXX=g++
export PATH=$CONDA_PREFIX/bin:$PATH

# Clear cached torch extensions
rm -rf ~/.cache/torch_extensions/*
```
🚀 Quick Start
1️⃣ Pre-retrieved Datasets
We provide 1,000 pre-retrieved documents per dataset, which you can download from:
🤗 Hugging Face Dataset Repository
Dataset Format
The pre-retrieved documents are structured as follows:
```json
[
  {
    "question": "...",
    "answers": ["...", "...", ...],
    "ctxs": [
      {
        "id": "...",              // Passage ID from database TSV file
        "score": "...",           // Retriever score
        "has_answer": true|false  // Whether the passage contains the answer
      }
    ]
  }
]
```
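Because each file is plain JSON in this format, you can sanity-check a downloaded dataset with the standard library alone. A minimal sketch; the file path is hypothetical, and the field names follow the schema above:
```python
import json

# Hypothetical path to one downloaded pre-retrieved file
with open("bm25/nq-dev/nq-dev.json") as f:
    data = json.load(f)

# Fraction of questions with at least one answer-bearing passage in the top-k
k = 10
hits = sum(any(ctx["has_answer"] for ctx in entry["ctxs"][:k]) for entry in data)
print(f"Top-{k} hit rate: {hits / len(data):.3f}")
```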
Access Datasets in Rankify
You can easily download and use pre-retrieved datasets through Rankify.
List Available Datasets
To see all available datasets:
```python
from rankify.dataset.dataset import Dataset

# Display available datasets
Dataset.avaiable_dataset()
```
BM25 Retriever
```python
from rankify.dataset.dataset import Dataset

# Download BM25-retrieved documents for nq-dev
dataset = Dataset(retriever="bm25", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="bm25", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="bm25", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for archivialqa-test
dataset = Dataset(retriever="bm25", dataset_name="archivialqa-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for chroniclingamericaqa-test
dataset = Dataset(retriever="bm25", dataset_name="chroniclingamericaqa-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for chroniclingamericaqa-dev
dataset = Dataset(retriever="bm25", dataset_name="chroniclingamericaqa-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for entityquestions-test
dataset = Dataset(retriever="bm25", dataset_name="entityquestions-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for ambig_qa-dev
dataset = Dataset(retriever="bm25", dataset_name="ambig_qa-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for ambig_qa-train
dataset = Dataset(retriever="bm25", dataset_name="ambig_qa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for arc-test
dataset = Dataset(retriever="bm25", dataset_name="arc-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for arc-dev
dataset = Dataset(retriever="bm25", dataset_name="arc-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
BGE Retriever
```python
from rankify.dataset.dataset import Dataset

# Download BGE-retrieved documents for nq-dev
dataset = Dataset(retriever="bge", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BGE-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="bge", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download BGE-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="bge", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
ColBERT Retriever
```python
from rankify.dataset.dataset import Dataset

# Download ColBERT-retrieved documents for nq-dev
dataset = Dataset(retriever="colbert", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download ColBERT-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="colbert", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download ColBERT-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="colbert", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
MSS-DPR Retriever
```python
from rankify.dataset.dataset import Dataset

# Download MSS-DPR-retrieved documents for nq-dev
dataset = Dataset(retriever="mss-dpr", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-DPR-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="mss-dpr", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-DPR-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="mss-dpr", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
MSS Retriever
```python
from rankify.dataset.dataset import Dataset

# Download MSS-retrieved documents for nq-dev
dataset = Dataset(retriever="mss", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="mss", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="mss", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
Contriever Retriever
```python
from rankify.dataset.dataset import Dataset

# Download Contriever-retrieved documents for nq-dev
dataset = Dataset(retriever="contriever", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download Contriever-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="contriever", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download Contriever-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="contriever", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
ANCE Retriever
```python
from rankify.dataset.dataset import Dataset

# Download ANCE-retrieved documents for nq-dev
dataset = Dataset(retriever="ance", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download ANCE-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="ance", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download ANCE-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="ance", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
Load Pre-retrieved Dataset from File
If you have already downloaded a dataset, you can load it directly:
```python
from rankify.dataset.dataset import Dataset

# Load pre-downloaded BM25 dataset for WebQuestions
documents = Dataset.load_dataset('./tests/out-datasets/bm25/webquestions/test.json', 100)
```
Now, you can integrate retrieved documents with re-ranking and RAG workflows! π
2️⃣ Running Retrieval
To perform retrieval using Rankify, you can choose from various retrieval methods such as BM25, DPR, ANCE, Contriever, ColBERT, and BGE.
Example: Running Retrieval on Sample Queries
```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.retrievers.retriever import Retriever
# Sample Documents
documents = [
    Document(question=Question("the cast of a good day to die hard?"), answers=Answer([
        "Jai Courtney",
        "Sebastian Koch",
        "Radivoje Bukvić",
        "Yuliya Snigir",
        "Sergei Kolesnikov",
        "Mary Elizabeth Winstead",
        "Bruce Willis"
    ]), contexts=[]),
    Document(question=Question("Who wrote Hamlet?"), answers=Answer(["Shakespeare"]), contexts=[])
]
```
```python
# BM25 retrieval on Wikipedia
bm25_retriever_wiki = Retriever(method="bm25", n_docs=5, index_type="wiki")

# BM25 retrieval on MS MARCO
bm25_retriever_msmarco = Retriever(method="bm25", n_docs=5, index_type="msmarco")

# DPR (multi-encoder) retrieval on Wikipedia
dpr_retriever_wiki = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="wiki")

# DPR (multi-encoder) retrieval on MS MARCO
dpr_retriever_msmarco = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="msmarco")

# DPR (single-encoder) retrieval on Wikipedia
dpr_retriever_wiki = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="wiki")

# DPR (single-encoder) retrieval on MS MARCO
dpr_retriever_msmarco = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="msmarco")

# ANCE retrieval on Wikipedia
ance_retriever_wiki = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="wiki")

# ANCE retrieval on MS MARCO
ance_retriever_msmarco = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="msmarco")

# Contriever retrieval on Wikipedia
contriever_retriever_wiki = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="wiki")

# Contriever retrieval on MS MARCO
contriever_retriever_msmarco = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="msmarco")

# ColBERT retrieval on Wikipedia
colbert_retriever_wiki = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="wiki")

# ColBERT retrieval on MS MARCO
colbert_retriever_msmarco = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="msmarco")

# BGE retrieval on Wikipedia
bge_retriever_wiki = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="wiki")

# BGE retrieval on MS MARCO
bge_retriever_msmarco = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="msmarco")
```
Running Retrieval
After defining the retriever, you can retrieve documents using:
```python
retrieved_documents = bm25_retriever_wiki.retrieve(documents)

for i, doc in enumerate(retrieved_documents):
    print(f"\nDocument {i+1}:")
    print(doc)
```
3️⃣ Running Reranking
Rankify provides support for multiple reranking models. Below are examples of how to use each model.
**Example: Reranking a Document**
```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.models.reranking import Reranking

# Sample document setup
question = Question("When did Thomas Edison invent the light bulb?")
answers = Answer(["1879"])
contexts = [
    Context(text="Lightning strike at Seoul National University", id=1),
    Context(text="Thomas Edison tried to invent a device for cars but failed", id=2),
    Context(text="Coffee is good for diet", id=3),
    Context(text="Thomas Edison invented the light bulb in 1879", id=4),
    Context(text="Thomas Edison worked with electricity", id=5),
]
document = Document(question=question, answers=answers, contexts=contexts)

# Initialize the reranker
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")

# Apply reranking
reranker.rank([document])

# Print reordered contexts
for context in document.reorder_contexts:
    print(f" - {context.text}")
```
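To make the effect of reranking visible, you can compare passage ids before and after. A small sketch reusing `document` from the example above; it assumes the original order stays available on `document.contexts` (the constructor argument) while `reorder_contexts` holds the reranked order:
```python
# Original (retrieval) order vs. order after reranking
before_ids = [c.id for c in document.contexts]
after_ids = [c.id for c in document.reorder_contexts]
print("before:", before_ids)
print("after: ", after_ids)  # the answer-bearing passage (id 4) should move up
```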
Examples of Using Different Reranking Models
```python
# UPR
model = Reranking(method='upr', model_name='t5-base')

# API-Based Rerankers
model = Reranking(method='apiranker', model_name='voyage', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='jina', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='mixedbread.ai', api_key='your-api-key')

# Blender Reranker
model = Reranking(method='blender_reranker', model_name='PairRM')

# ColBERT Reranker
model = Reranking(method='colbert_ranker', model_name='Colbert')

# EchoRank
model = Reranking(method='echorank', model_name='flan-t5-large')

# First Ranker
model = Reranking(method='first_ranker', model_name='base')

# FlashRank
model = Reranking(method='flashrank', model_name='ms-marco-TinyBERT-L-2-v2')

# InContext Reranker
model = Reranking(method='incontext_reranker', model_name='llamav3.1-8b')

# InRanker
model = Reranking(method='inranker', model_name='inranker-small')

# ListT5
model = Reranking(method='listt5', model_name='listt5-base')

# LiT5 Distill
model = Reranking(method='lit5distill', model_name='LiT5-Distill-base')

# LiT5 Score
model = Reranking(method='lit5score', model_name='LiT5-Distill-base')

# LLM Layerwise Ranker
model = Reranking(method='llm_layerwise_ranker', model_name='bge-multilingual-gemma2')

# LLM2Vec
model = Reranking(method='llm2vec', model_name='Meta-Llama-31-8B')

# MonoBERT
model = Reranking(method='monobert', model_name='monobert-large')

# MonoT5
model = Reranking(method='monot5', model_name='monot5-base-msmarco')

# RankGPT
model = Reranking(method='rankgpt', model_name='llamav3.1-8b')

# RankGPT API
model = Reranking(method='rankgpt-api', model_name='gpt-3.5', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='gpt-4', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='llamav3.1-8b', api_key="together-api-key")
model = Reranking(method='rankgpt-api', model_name='claude-3-5', api_key="claude-api-key")

# RankT5
model = Reranking(method='rankt5', model_name='rankt5-base')

# Sentence Transformer Reranker
model = Reranking(method='sentence_transformer_reranker', model_name='all-MiniLM-L6-v2')
model = Reranking(method='sentence_transformer_reranker', model_name='gtr-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='sentence-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='distilbert-multilingual-nli-stsb-quora-ranking')
model = Reranking(method='sentence_transformer_reranker', model_name='msmarco-bert-co-condensor')

# SPLADE
model = Reranking(method='splade', model_name='splade-cocondenser')

# Transformer Ranker
model = Reranking(method='transformer_ranker', model_name='mxbai-rerank-xsmall')
model = Reranking(method='transformer_ranker', model_name='bge-reranker-base')
model = Reranking(method='transformer_ranker', model_name='bce-reranker-base')
model = Reranking(method='transformer_ranker', model_name='jina-reranker-tiny')
model = Reranking(method='transformer_ranker', model_name='gte-multilingual-reranker-base')
model = Reranking(method='transformer_ranker', model_name='nli-deberta-v3-large')
model = Reranking(method='transformer_ranker', model_name='ms-marco-TinyBERT-L-6')
model = Reranking(method='transformer_ranker', model_name='msmarco-MiniLM-L12-en-de-v1')

# TwoLAR
model = Reranking(method='twolar', model_name='twolar-xl')

# Vicuna Reranker
model = Reranking(method='vicuna_reranker', model_name='rank_vicuna_7b_v1')

# Zephyr Reranker
model = Reranking(method='zephyr_reranker', model_name='rank_zephyr_7b_v1_full')
```
4️⃣ Using Generator Module
Rankify provides a Generator Module to facilitate retrieval-augmented generation (RAG) by integrating retrieved documents into generative models for producing answers. Below is an example of how to use different generator methods.
```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator

# Define question and answer
question = Question("What is the capital of France?")
answers = Answer(["Paris"])
contexts = [
    Context(id=1, title="France", text="The capital of France is Paris.", score=0.9),
    Context(id=2, title="Germany", text="Berlin is the capital of Germany.", score=0.5)
]

# Construct document
doc = Document(question=question, answers=answers, contexts=contexts)

# Initialize Generator (e.g., Meta Llama)
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate answer
generated_answers = generator.generate([doc])
print(generated_answers)  # Output: ["Paris"]
```
5️⃣ Evaluating with Metrics
Rankify provides built-in evaluation metrics for retrieval, re-ranking, and retrieval-augmented generation (RAG). These metrics help assess the quality of retrieved documents, the effectiveness of ranking models, and the accuracy of generated answers.
Evaluating Generated Answers
You can evaluate the quality of retrieval-augmented generation (RAG) results by comparing generated answers with ground-truth answers.
```python
from rankify.metrics.metrics import Metrics
from rankify.dataset.dataset import Dataset
from rankify.generator.generator import Generator

# Load dataset
dataset = Dataset('bm25', 'nq-test', 100)
documents = dataset.download(force_download=False)

# Initialize Generator
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate answers
generated_answers = generator.generate(documents)

# Evaluate generated answers
metrics = Metrics(documents)
print(metrics.calculate_generation_metrics(generated_answers))
```
Evaluating Retrieval Performance
```python
# Calculate retrieval metrics before reranking
metrics = Metrics(documents)
before_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=False)
print(before_ranking_metrics)
```
Evaluating Reranked Results
```python
# Calculate retrieval metrics after reranking
after_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=True)
print(after_ranking_metrics)
```
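Putting the pieces together, here is a compact end-to-end sketch built only from the calls shown above: retrieve, rerank, then compare retrieval metrics before and after reranking. The retriever and reranker choices are illustrative:
```python
from rankify.dataset.dataset import Document, Question, Answer
from rankify.retrievers.retriever import Retriever
from rankify.models.reranking import Reranking
from rankify.metrics.metrics import Metrics

# 1. Retrieve candidate passages for a sample query
documents = [Document(question=Question("Who wrote Hamlet?"),
                      answers=Answer(["Shakespeare"]), contexts=[])]
retriever = Retriever(method="bm25", n_docs=100, index_type="wiki")
documents = retriever.retrieve(documents)

# 2. Rerank the retrieved contexts in place
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")
reranker.rank(documents)

# 3. Compare retrieval quality before and after reranking
metrics = Metrics(documents)
print(metrics.calculate_retrieval_metrics(ks=[1, 5, 10], use_reordered=False))
print(metrics.calculate_retrieval_metrics(ks=[1, 5, 10], use_reordered=True))
```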
🏆 Supported Models
1️⃣ Retrievers
- ✅ BM25
- ✅ DPR
- ✅ ColBERT
- ✅ ANCE
- ✅ BGE
- ✅ Contriever
- ✅ BPR
2️⃣ Rerankers
- ✅ Cross-Encoders
- ✅ RankGPT
- ✅ RankGPT-API
- ✅ MonoT5
- ✅ MonoBERT
- ✅ RankT5
- ✅ ListT5
- ✅ LiT5Score
- ✅ LiT5Dist
- ✅ Vicuna Reranker
- ✅ Zephyr Reranker
- ✅ Sentence Transformer-based
- ✅ FlashRank Models
- ✅ API-Based Rerankers
- ✅ ColBERT Reranker
- ✅ LLM Layerwise Ranker
- ✅ Splade Reranker
- ✅ UPR Reranker
- ✅ InRanker Reranker
- ✅ Transformer Reranker
- ✅ FIRST Reranker
- ✅ Blender Reranker
- ✅ LLM2Vec Reranker
- ✅ ECHO Reranker
- ✅ InContext Reranker
3️⃣ Generators
- ✅ Fusion-in-Decoder (FiD) with T5
- ✅ In-Context Learning RALM
📖 Documentation
For full API documentation, visit the Rankify Docs.
💡 Contributing
Follow these steps to get involved:
1. Fork this repository to your GitHub account.
2. Create a new branch for your feature or fix:
```bash
git checkout -b feature/YourFeatureName
```
3. Make your changes and commit them:
```bash
git commit -m "Add YourFeatureName"
```
4. Push the changes to your branch:
```bash
git push origin feature/YourFeatureName
```
5. Submit a Pull Request to propose your changes.
Thank you for helping make this project better!
📜 License
Rankify is licensed under the Apache-2.0 License - see the LICENSE file for details.
🌟 Citation
Please kindly cite our paper if it helps your research:
```bibtex
@article{abdallah2025rankify,
  title={Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Ali, Mohammed and Jatowt, Adam},
  journal={arXiv preprint arXiv:2502.02464},
  year={2025}
}
```
Owner
- Name: DataScienceUIBK
- Login: DataScienceUIBK
- Kind: organization
- Repositories: 1
- Profile: https://github.com/DataScienceUIBK
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
date-released: 2025-02
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Abdallah"
    given-names: "Abdelrahman"
  - family-names: "Mozafari"
    given-names: "Jamshid"
  - family-names: "Piryani"
    given-names: "Bhawna"
  - family-names: "Ali"
    given-names: "Mohammed"
  - family-names: "Jatowt"
    given-names: "Adam"
title: "Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation"
url: "https://arxiv.org/abs/2502.02464"
preferred-citation:
  type: article
  authors:
    - family-names: "Abdallah"
      given-names: "Abdelrahman"
    - family-names: "Mozafari"
      given-names: "Jamshid"
    - family-names: "Piryani"
      given-names: "Bhawna"
    - family-names: "Ali"
      given-names: "Mohammed"
    - family-names: "Jatowt"
      given-names: "Adam"
  title: "Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation"
  journal: "CoRR"
  volume: "abs/2502.02464"
  year: 2025
  url: "https://arxiv.org/abs/2502.02464"
  eprinttype: "arXiv"
  eprint: "2502.02464"
```
GitHub Events
Total
- Create event: 11
- Issues event: 9
- Release event: 8
- Watch event: 474
- Issue comment event: 5
- Member event: 4
- Push event: 155
- Pull request event: 45
- Fork event: 33
Last Year
- Create event: 11
- Issues event: 9
- Release event: 8
- Watch event: 474
- Issue comment event: 5
- Member event: 4
- Push event: 155
- Pull request event: 45
- Fork event: 33
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 25
- Average time to close issues: about 1 month
- Average time to close pull requests: about 1 hour
- Total issue authors: 4
- Total pull request authors: 7
- Average comments per issue: 0.75
- Average comments per pull request: 0.0
- Merged pull requests: 21
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 25
- Average time to close issues: about 1 month
- Average time to close pull requests: about 1 hour
- Issue authors: 4
- Pull request authors: 7
- Average comments per issue: 0.75
- Average comments per pull request: 0.0
- Merged pull requests: 21
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Ahmedg2021 (1)
- vcchain2019 (1)
- qiyuxinlin (1)
- xhuiyan (1)
Pull Request Authors
- abdoelsayed2016 (10)
- aherzinger (6)
- baraayusry (4)
- MohammedAli9330 (1)
- eltociear (1)
- MahmoudElsayedMahmoud (1)
- tobias124 (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 129 last-month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 7
- Total maintainers: 1
pypi.org: rankify
A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation
- Homepage: https://github.com/DataScienceUIBK/rankify
- Documentation: http://rankify.readthedocs.io/
- Latest release: 0.1.3 (published 12 months ago)
Rankings
- Dependent packages count: 9.6%
- Average: 32.0%
- Dependent repos count: 54.3%
Maintainers (1)
Last synced: 6 months ago
Dependencies
docs/requirements.txt (pypi)
- mkdocs *
- mkdocs-material *
- mkdocs-rtd-dropdown *
- mkdocstrings *
.github/workflows/python-publish.yml (actions)
- actions/checkout v4 (composite)
- actions/setup-python v3 (composite)
- pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 (composite)
pyproject.toml (pypi)
- datasets ==3.2.0
- httpx ==0.27.2
- pandas ==2.2.3
- prettytable ==3.11.0
- requests ==2.32.3
- tqdm ==4.66.5
- transformers ==4.45.2
rankify/requirements.txt (pypi)
- anthropic ==0.37.1
- dacite ==1.8.1
- faiss-cpu ==1.9.0.post1
- flash-attn ==2.5.0
- fschat >=0.2.36
- ftfy ==6.3.1
- h5py ==3.12.1
- litellm ==1.50.4
- llama-cpp-python ==0.2.76
- llm-blender ==0.0.2
- ninja ==1.11.1.3
- omegaconf ==2.3.0
- onnxruntime ==1.19.2
- openai ==1.52.2
- pandas ==2.2.3
- prettytable ==3.11.0
- py7zr ==0.22.0
- pyserini ==0.43.0
- requests ==2.32.3
- sentence_transformers ==3.3.0
- sentencepiece ==0.2.0
- together ==1.3.3
- torch ==2.5.0
- tqdm ==4.66.5
- transformers ==4.45.2
- ujson ==5.10.0
- vllm ==0.6.3