rankify
🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7+ retrieval techniques, 24+ state-of-the-art re-ranking models, and multiple RAG methods.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file: found
- codemeta.json file: found
- .zenodo.json file: found
- DOI references
- Academic publication links: arxiv.org
- Academic email domains
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: low similarity (13.0%) to scientific vocabulary
Keywords
Repository
🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7+ retrieval techniques, 24+ state-of-the-art re-ranking models, and multiple RAG methods.
Basic Info
- Host: GitHub
- Owner: DataScienceUIBK
- Language: Python
- Default Branch: main
- Homepage: https://rankify.readthedocs.io/
- Size: 28.3 MB
Statistics
- Stars: 499
- Watchers: 12
- Forks: 36
- Open Issues: 1
- Releases: 7
Topics
Metadata Files
README-PyPI.md
🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥
A modular and efficient framework for retrieval, re-ranking, and retrieval-augmented generation (RAG), designed to work with state-of-the-art models.
Rankify is a Python toolkit designed for unified retrieval, re-ranking, and retrieval-augmented generation (RAG) research. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7 retrieval techniques, 24 state-of-the-art re-ranking models, and multiple RAG methods. Rankify provides a modular and extensible framework, enabling seamless experimentation and benchmarking across retrieval pipelines. Comprehensive documentation, open-source implementation, and pre-built evaluation tools make Rankify a powerful resource for researchers and practitioners in the field.
✨ Features
Comprehensive Retrieval & Reranking Framework: Rankify unifies retrieval, re-ranking, and retrieval-augmented generation (RAG) into a single modular Python toolkit, enabling seamless experimentation and benchmarking.
Extensive Dataset Support: Includes 40 benchmark datasets with pre-retrieved documents, covering diverse domains such as question answering, dialogue, entity linking, and fact verification.
Diverse Retriever Integration: Supports 7 retrieval techniques, including BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever, providing flexibility for various retrieval strategies.
Advanced Re-ranking Models: Implements 24 primary re-ranking models with 41 sub-methods, covering pointwise, pairwise, and listwise re-ranking approaches for enhanced ranking performance.
Prebuilt Retrieval Indices: Provides precomputed Wikipedia and MS MARCO corpora for multiple retrieval models, eliminating indexing overhead and accelerating experiments.
Seamless RAG Integration: Bridges retrieval and generative models (e.g., GPT, LLAMA, T5), enabling retrieval-augmented generation with zero-shot, Fusion-in-Decoder (FiD), and in-context learning strategies.
Modular & Extensible Design: Easily integrates custom datasets, retrievers, re-rankers, and generation models using Rankifyβs structured Python API.
Comprehensive Evaluation Suite: Offers automated performance evaluation with retrieval, ranking, and RAG metrics, ensuring reproducible benchmarking.
User-Friendly Documentation: Detailed 📖 online documentation, example notebooks, and tutorials for easy adoption.
🚀 Roadmap
Rankify is still under development, and this is our first release (v0.1.0). While it already supports a wide range of retrieval, re-ranking, and RAG techniques, we are actively enhancing its capabilities by adding more retrievers, rankers, datasets, and features.
📌 Planned Improvements
Retrievers
- [x] Support for BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever
- [ ] Add missing retrievers: Spar, MSS, MSS-DPR
- [ ] Enable custom index loading and support for user-defined retrieval corpora
Re-Rankers
- [x] 24 primary re-ranking models with 41 sub-methods
- [ ] Expand the list by adding more advanced ranking models
Datasets
- [x] 40 benchmark datasets for retrieval, ranking, and RAG
- [ ] Add more datasets
- [ ] Support for custom dataset integration
Retrieval-Augmented Generation (RAG)
- [x] Integration with GPT, LLAMA, and T5
- [ ] Extend support for more generative models
Evaluation & Usability
- [x] Standard retrieval and ranking evaluation metrics (Top-K, EM, Recall, ...)
- [ ] Add advanced evaluation metrics (NDCG, MAP) for retrievers
Pipeline Integration
- [ ] Add a pipeline module for streamlined retrieval, re-ranking, and RAG workflows
🔧 Installation
Set up the virtual environment
First, create and activate a conda environment with Python 3.10:
```bash
conda create -n rankify python=3.10
conda activate rankify
```
Install PyTorch 2.5.1
We recommend installing PyTorch 2.5.1 for Rankify. Refer to the PyTorch installation page for platform-specific installation commands.
If you have access to GPUs, install the CUDA 12.4 or 12.6 build of PyTorch, as many of the evaluation metrics are optimized for GPU use.
To install PyTorch 2.5.1, run:
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```
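To verify that the GPU build was picked up before going further, a quick sanity check using PyTorch alone (no Rankify APIs involved):
```python
import torch

# Should print 2.5.1 (with a +cu124 suffix for the CUDA build)
print(torch.__version__)

# True only if a CUDA-enabled build and a working GPU driver are present
print(torch.cuda.is_available())
```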
Basic Installation
To install Rankify, simply use pip (requires Python 3.10+):
```bash
pip install rankify
```
Or, to install from GitHub for the latest development version:
```bash
git clone https://github.com/DataScienceUIBK/rankify.git
cd rankify
pip install -e .

# For full functionality, we recommend installing Rankify with all dependencies:
pip install -e ".[all]"

# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.):
pip install -e ".[retriever]"

# Install dependencies for base re-ranking only (excluding vLLM):
pip install -e ".[base]"

# Install base re-ranking with vLLM support for FirstModelReranker, LiT5ScoreReranker,
# LiT5DistillReranker, VicunaReranker, and ZephyrReranker:
pip install -e ".[reranking]"

# Install dependencies for retrieval-augmented generation (RAG):
pip install -e ".[rag]"
```
This will install the base functionality required for retrieval, re-ranking, and retrieval-augmented generation (RAG).
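To confirm the package itself resolved correctly, a pure-stdlib check (it assumes nothing about Rankify's API):
```python
from importlib.metadata import version

# Prints the installed Rankify version, e.g. 0.1.3
print(version("rankify"))
```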
Recommended Installation
For full functionality, we recommend installing Rankify with all dependencies:
```bash
pip install "rankify[all]"
```
This ensures you have all necessary modules, including retrieval, re-ranking, and RAG support.
Optional Dependencies
If you prefer to install only specific components, choose from the following:
```bash
# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)
pip install "rankify[retriever]"

# Install dependencies for base re-ranking only (excluding vLLM)
pip install "rankify[base]"

# Install base re-ranking with vLLM support for FirstModelReranker, LiT5ScoreReranker,
# LiT5DistillReranker, VicunaReranker, and ZephyrReranker
pip install "rankify[reranking]"

# Install dependencies for retrieval-augmented generation (RAG)
pip install "rankify[rag]"
```
Using ColBERT Retriever
If you want to use ColBERT Retriever, follow these additional setup steps:
```bash
# Install GCC and required libraries
conda install -c conda-forge gcc=9.4.0 gxx=9.4.0
conda install -c conda-forge libstdcxx-ng

# Export necessary environment variables
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
export CC=gcc
export CXX=g++
export PATH=$CONDA_PREFIX/bin:$PATH

# Clear cached torch extensions
rm -rf ~/.cache/torch_extensions/*
```
🚀 Quick Start
1️⃣ Pre-retrieved Datasets
We provide 1,000 pre-retrieved documents per dataset, which you can download from:
🤗 Hugging Face Dataset Repository
Dataset Format
The pre-retrieved documents are structured as follows:
```json
[
  {
    "question": "...",
    "answers": ["...", "...", ...],
    "ctxs": [
      {
        "id": "...",              // Passage ID from database TSV file
        "score": "...",           // Retriever score
        "has_answer": true|false  // Whether the passage contains the answer
      }
    ]
  }
]
```
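Because each file is plain JSON in this format, you can sanity-check a downloaded dataset with the standard library alone. A minimal sketch; the file path is hypothetical, and the field names follow the schema above:
```python
import json

# Hypothetical path to one downloaded pre-retrieved file
with open("bm25/nq-dev/nq-dev.json") as f:
    data = json.load(f)

# Fraction of questions with at least one answer-bearing passage in the top-k
k = 10
hits = sum(any(ctx["has_answer"] for ctx in entry["ctxs"][:k]) for entry in data)
print(f"Top-{k} hit rate: {hits / len(data):.3f}")
```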
Access Datasets in Rankify
You can easily download and use pre-retrieved datasets through Rankify.
List Available Datasets
To see all available datasets:
```python
from rankify.dataset.dataset import Dataset

# Display available datasets
Dataset.avaiable_dataset()
```
BM25 Retriever
```python
from rankify.dataset.dataset import Dataset

# Download BM25-retrieved documents for nq-dev
dataset = Dataset(retriever="bm25", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="bm25", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="bm25", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for archivialqa-test
dataset = Dataset(retriever="bm25", dataset_name="archivialqa-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for chroniclingamericaqa-test
dataset = Dataset(retriever="bm25", dataset_name="chroniclingamericaqa-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for chroniclingamericaqa-dev
dataset = Dataset(retriever="bm25", dataset_name="chroniclingamericaqa-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for entityquestions-test
dataset = Dataset(retriever="bm25", dataset_name="entityquestions-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for ambig_qa-dev
dataset = Dataset(retriever="bm25", dataset_name="ambig_qa-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for ambig_qa-train
dataset = Dataset(retriever="bm25", dataset_name="ambig_qa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for arc-test
dataset = Dataset(retriever="bm25", dataset_name="arc-test", n_docs=100)
documents = dataset.download(force_download=False)

# Download BM25-retrieved documents for arc-dev
dataset = Dataset(retriever="bm25", dataset_name="arc-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
BGE Retriever
```python
from rankify.dataset.dataset import Dataset

# Download BGE-retrieved documents for nq-dev
dataset = Dataset(retriever="bge", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download BGE-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="bge", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download BGE-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="bge", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
ColBERT Retriever
```python
from rankify.dataset.dataset import Dataset

# Download ColBERT-retrieved documents for nq-dev
dataset = Dataset(retriever="colbert", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download ColBERT-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="colbert", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download ColBERT-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="colbert", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
MSS-DPR Retriever
```python
from rankify.dataset.dataset import Dataset

# Download MSS-DPR-retrieved documents for nq-dev
dataset = Dataset(retriever="mss-dpr", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-DPR-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="mss-dpr", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-DPR-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="mss-dpr", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
MSS Retriever
```python
from rankify.dataset.dataset import Dataset

# Download MSS-retrieved documents for nq-dev
dataset = Dataset(retriever="mss", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="mss", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download MSS-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="mss", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
Contriever Retriever
```python
from rankify.dataset.dataset import Dataset

# Download Contriever-retrieved documents for nq-dev
dataset = Dataset(retriever="contriever", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download Contriever-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="contriever", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download Contriever-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="contriever", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
ANCE Retriever
```python
from rankify.dataset.dataset import Dataset

# Download ANCE-retrieved documents for nq-dev
dataset = Dataset(retriever="ance", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)

# Download ANCE-retrieved documents for 2wikimultihopqa-train
dataset = Dataset(retriever="ance", dataset_name="2wikimultihopqa-train", n_docs=100)
documents = dataset.download(force_download=False)

# Download ANCE-retrieved documents for archivialqa-dev
dataset = Dataset(retriever="ance", dataset_name="archivialqa-dev", n_docs=100)
documents = dataset.download(force_download=False)
```
Load Pre-retrieved Dataset from File
If you have already downloaded a dataset, you can load it directly:
```python
from rankify.dataset.dataset import Dataset

# Load pre-downloaded BM25 dataset for WebQuestions
documents = Dataset.load_dataset('./tests/out-datasets/bm25/webquestions/test.json', 100)
```
Now, you can integrate retrieved documents with re-ranking and RAG workflows! π
2️⃣ Running Retrieval
To perform retrieval using Rankify, you can choose from various retrieval methods such as BM25, DPR, ANCE, Contriever, ColBERT, and BGE.
Example: Running Retrieval on Sample Queries
```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.retrievers.retriever import Retriever
# Sample Documents
documents = [
    Document(question=Question("the cast of a good day to die hard?"), answers=Answer([
        "Jai Courtney",
        "Sebastian Koch",
        "Radivoje Bukvić",
        "Yuliya Snigir",
        "Sergei Kolesnikov",
        "Mary Elizabeth Winstead",
        "Bruce Willis"
    ]), contexts=[]),
    Document(question=Question("Who wrote Hamlet?"), answers=Answer(["Shakespeare"]), contexts=[])
]
```
```python
# BM25 retrieval on Wikipedia
bm25_retriever_wiki = Retriever(method="bm25", n_docs=5, index_type="wiki")

# BM25 retrieval on MS MARCO
bm25_retriever_msmarco = Retriever(method="bm25", n_docs=5, index_type="msmarco")

# DPR (multi-encoder) retrieval on Wikipedia
dpr_retriever_wiki = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="wiki")

# DPR (multi-encoder) retrieval on MS MARCO
dpr_retriever_msmarco = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="msmarco")

# DPR (single-encoder) retrieval on Wikipedia
dpr_retriever_wiki = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="wiki")

# DPR (single-encoder) retrieval on MS MARCO
dpr_retriever_msmarco = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="msmarco")

# ANCE retrieval on Wikipedia
ance_retriever_wiki = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="wiki")

# ANCE retrieval on MS MARCO
ance_retriever_msmarco = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="msmarco")

# Contriever retrieval on Wikipedia
contriever_retriever_wiki = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="wiki")

# Contriever retrieval on MS MARCO
contriever_retriever_msmarco = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="msmarco")

# ColBERT retrieval on Wikipedia
colbert_retriever_wiki = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="wiki")

# ColBERT retrieval on MS MARCO
colbert_retriever_msmarco = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="msmarco")

# BGE retrieval on Wikipedia
bge_retriever_wiki = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="wiki")

# BGE retrieval on MS MARCO
bge_retriever_msmarco = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="msmarco")
```
Running Retrieval
After defining the retriever, you can retrieve documents using:
```python
retrieved_documents = bm25_retriever_wiki.retrieve(documents)

for i, doc in enumerate(retrieved_documents):
    print(f"\nDocument {i+1}:")
    print(doc)
```
3️⃣ Running Reranking
Rankify provides support for multiple reranking models. Below are examples of how to use each model.
**Example: Reranking a Document**
```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.models.reranking import Reranking

# Sample document setup
question = Question("When did Thomas Edison invent the light bulb?")
answers = Answer(["1879"])
contexts = [
    Context(text="Lightning strike at Seoul National University", id=1),
    Context(text="Thomas Edison tried to invent a device for cars but failed", id=2),
    Context(text="Coffee is good for diet", id=3),
    Context(text="Thomas Edison invented the light bulb in 1879", id=4),
    Context(text="Thomas Edison worked with electricity", id=5),
]
document = Document(question=question, answers=answers, contexts=contexts)

# Initialize the reranker
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")

# Apply reranking
reranker.rank([document])

# Print reordered contexts
for context in document.reorder_contexts:
    print(f" - {context.text}")
```
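To make the effect of reranking visible, you can compare passage ids before and after. A small sketch reusing `document` from the example above; it assumes the original order stays available on `document.contexts` (the constructor argument) while `reorder_contexts` holds the reranked order:
```python
# Original (retrieval) order vs. order after reranking
before_ids = [c.id for c in document.contexts]
after_ids = [c.id for c in document.reorder_contexts]
print("before:", before_ids)
print("after: ", after_ids)  # the answer-bearing passage (id 4) should move up
```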
Examples of Using Different Reranking Models
```python
# UPR
model = Reranking(method='upr', model_name='t5-base')

# API-Based Rerankers
model = Reranking(method='apiranker', model_name='voyage', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='jina', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='mixedbread.ai', api_key='your-api-key')

# Blender Reranker
model = Reranking(method='blender_reranker', model_name='PairRM')

# ColBERT Reranker
model = Reranking(method='colbert_ranker', model_name='Colbert')

# EchoRank
model = Reranking(method='echorank', model_name='flan-t5-large')

# First Ranker
model = Reranking(method='first_ranker', model_name='base')

# FlashRank
model = Reranking(method='flashrank', model_name='ms-marco-TinyBERT-L-2-v2')

# InContext Reranker
model = Reranking(method='incontext_reranker', model_name='llamav3.1-8b')

# InRanker
model = Reranking(method='inranker', model_name='inranker-small')

# ListT5
model = Reranking(method='listt5', model_name='listt5-base')

# LiT5 Distill
model = Reranking(method='lit5distill', model_name='LiT5-Distill-base')

# LiT5 Score
model = Reranking(method='lit5score', model_name='LiT5-Distill-base')

# LLM Layerwise Ranker
model = Reranking(method='llm_layerwise_ranker', model_name='bge-multilingual-gemma2')

# LLM2Vec
model = Reranking(method='llm2vec', model_name='Meta-Llama-31-8B')

# MonoBERT
model = Reranking(method='monobert', model_name='monobert-large')

# MonoT5
model = Reranking(method='monot5', model_name='monot5-base-msmarco')

# RankGPT
model = Reranking(method='rankgpt', model_name='llamav3.1-8b')

# RankGPT API
model = Reranking(method='rankgpt-api', model_name='gpt-3.5', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='gpt-4', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='llamav3.1-8b', api_key="together-api-key")
model = Reranking(method='rankgpt-api', model_name='claude-3-5', api_key="claude-api-key")

# RankT5
model = Reranking(method='rankt5', model_name='rankt5-base')

# Sentence Transformer Reranker
model = Reranking(method='sentence_transformer_reranker', model_name='all-MiniLM-L6-v2')
model = Reranking(method='sentence_transformer_reranker', model_name='gtr-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='sentence-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='distilbert-multilingual-nli-stsb-quora-ranking')
model = Reranking(method='sentence_transformer_reranker', model_name='msmarco-bert-co-condensor')

# SPLADE
model = Reranking(method='splade', model_name='splade-cocondenser')

# Transformer Ranker
model = Reranking(method='transformer_ranker', model_name='mxbai-rerank-xsmall')
model = Reranking(method='transformer_ranker', model_name='bge-reranker-base')
model = Reranking(method='transformer_ranker', model_name='bce-reranker-base')
model = Reranking(method='transformer_ranker', model_name='jina-reranker-tiny')
model = Reranking(method='transformer_ranker', model_name='gte-multilingual-reranker-base')
model = Reranking(method='transformer_ranker', model_name='nli-deberta-v3-large')
model = Reranking(method='transformer_ranker', model_name='ms-marco-TinyBERT-L-6')
model = Reranking(method='transformer_ranker', model_name='msmarco-MiniLM-L12-en-de-v1')

# TwoLAR
model = Reranking(method='twolar', model_name='twolar-xl')

# Vicuna Reranker
model = Reranking(method='vicuna_reranker', model_name='rank_vicuna_7b_v1')

# Zephyr Reranker
model = Reranking(method='zephyr_reranker', model_name='rank_zephyr_7b_v1_full')
```
4️⃣ Using Generator Module
Rankify provides a Generator Module to facilitate retrieval-augmented generation (RAG) by integrating retrieved documents into generative models for producing answers. Below is an example of how to use different generator methods.
```python
from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator

# Define question and answer
question = Question("What is the capital of France?")
answers = Answer(["Paris"])
contexts = [
    Context(id=1, title="France", text="The capital of France is Paris.", score=0.9),
    Context(id=2, title="Germany", text="Berlin is the capital of Germany.", score=0.5)
]

# Construct document
doc = Document(question=question, answers=answers, contexts=contexts)

# Initialize Generator (e.g., Meta Llama)
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate answer
generated_answers = generator.generate([doc])
print(generated_answers)  # Output: ["Paris"]
```
5️⃣ Evaluating with Metrics
Rankify provides built-in evaluation metrics for retrieval, re-ranking, and retrieval-augmented generation (RAG). These metrics help assess the quality of retrieved documents, the effectiveness of ranking models, and the accuracy of generated answers.
Evaluating Generated Answers
You can evaluate the quality of retrieval-augmented generation (RAG) results by comparing generated answers with ground-truth answers.
```python
from rankify.metrics.metrics import Metrics
from rankify.dataset.dataset import Dataset
from rankify.generator.generator import Generator

# Load dataset
dataset = Dataset('bm25', 'nq-test', 100)
documents = dataset.download(force_download=False)

# Initialize Generator
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate answers
generated_answers = generator.generate(documents)

# Evaluate generated answers
metrics = Metrics(documents)
print(metrics.calculate_generation_metrics(generated_answers))
```
Evaluating Retrieval Performance
```python
# Calculate retrieval metrics before reranking
metrics = Metrics(documents)
before_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=False)
print(before_ranking_metrics)
```
Evaluating Reranked Results
```python
# Calculate retrieval metrics after reranking
after_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=True)
print(after_ranking_metrics)
```
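Putting the pieces together, here is a compact end-to-end sketch built only from the calls shown above: retrieve, rerank, then compare retrieval metrics before and after reranking. The retriever and reranker choices are illustrative:
```python
from rankify.dataset.dataset import Document, Question, Answer
from rankify.retrievers.retriever import Retriever
from rankify.models.reranking import Reranking
from rankify.metrics.metrics import Metrics

# 1. Retrieve candidate passages for a sample query
documents = [Document(question=Question("Who wrote Hamlet?"),
                      answers=Answer(["Shakespeare"]), contexts=[])]
retriever = Retriever(method="bm25", n_docs=100, index_type="wiki")
documents = retriever.retrieve(documents)

# 2. Rerank the retrieved contexts in place
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")
reranker.rank(documents)

# 3. Compare retrieval quality before and after reranking
metrics = Metrics(documents)
print(metrics.calculate_retrieval_metrics(ks=[1, 5, 10], use_reordered=False))
print(metrics.calculate_retrieval_metrics(ks=[1, 5, 10], use_reordered=True))
```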
🏆 Supported Models
1️⃣ Retrievers
- ✅ BM25
- ✅ DPR
- ✅ ColBERT
- ✅ ANCE
- ✅ BGE
- ✅ Contriever
- ✅ BPR
2️⃣ Rerankers
- ✅ Cross-Encoders
- ✅ RankGPT
- ✅ RankGPT-API
- ✅ MonoT5
- ✅ MonoBERT
- ✅ RankT5
- ✅ ListT5
- ✅ LiT5Score
- ✅ LiT5Dist
- ✅ Vicuna Reranker
- ✅ Zephyr Reranker
- ✅ Sentence Transformer-based
- ✅ FlashRank Models
- ✅ API-Based Rerankers
- ✅ ColBERT Reranker
- ✅ LLM Layerwise Ranker
- ✅ Splade Reranker
- ✅ UPR Reranker
- ✅ InRanker Reranker
- ✅ Transformer Reranker
- ✅ FIRST Reranker
- ✅ Blender Reranker
- ✅ LLM2Vec Reranker
- ✅ ECHO Reranker
- ✅ InContext Reranker
3️⃣ Generators
- ✅ Fusion-in-Decoder (FiD) with T5
- ✅ In-Context Learning RALM
📖 Documentation
For full API documentation, visit the Rankify Docs.
💡 Contributing
Follow these steps to get involved:
1. Fork this repository to your GitHub account.
2. Create a new branch for your feature or fix:
```bash
git checkout -b feature/YourFeatureName
```
3. Make your changes and commit them:
```bash
git commit -m "Add YourFeatureName"
```
4. Push the changes to your branch:
```bash
git push origin feature/YourFeatureName
```
5. Submit a Pull Request to propose your changes.
Thank you for helping make this project better!
📜 License
Rankify is licensed under the Apache-2.0 License - see the LICENSE file for details.
🌟 Citation
Please kindly cite our paper if it helps your research:
```bibtex
@article{abdallah2025rankify,
  title={Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Ali, Mohammed and Jatowt, Adam},
  journal={arXiv preprint arXiv:2502.02464},
  year={2025}
}
```
Owner
- Name: DataScienceUIBK
- Login: DataScienceUIBK
- Kind: organization
- Repositories: 1
- Profile: https://github.com/DataScienceUIBK
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
date-released: 2025-02
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Abdallah"
    given-names: "Abdelrahman"
  - family-names: "Mozafari"
    given-names: "Jamshid"
  - family-names: "Piryani"
    given-names: "Bhawna"
  - family-names: "Ali"
    given-names: "Mohammed"
  - family-names: "Jatowt"
    given-names: "Adam"
title: "Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation"
url: "https://arxiv.org/abs/2502.02464"
preferred-citation:
  type: article
  authors:
    - family-names: "Abdallah"
      given-names: "Abdelrahman"
    - family-names: "Mozafari"
      given-names: "Jamshid"
    - family-names: "Piryani"
      given-names: "Bhawna"
    - family-names: "Ali"
      given-names: "Mohammed"
    - family-names: "Jatowt"
      given-names: "Adam"
  title: "Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation"
  journal: "CoRR"
  volume: "abs/2502.02464"
  year: 2025
  url: "https://arxiv.org/abs/2502.02464"
  eprinttype: "arXiv"
  eprint: "2502.02464"
```
GitHub Events
Total
- Create event: 11
- Issues event: 9
- Release event: 8
- Watch event: 474
- Issue comment event: 5
- Member event: 4
- Push event: 155
- Pull request event: 45
- Fork event: 33
Last Year
- Create event: 11
- Issues event: 9
- Release event: 8
- Watch event: 474
- Issue comment event: 5
- Member event: 4
- Push event: 155
- Pull request event: 45
- Fork event: 33
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 25
- Average time to close issues: about 1 month
- Average time to close pull requests: about 1 hour
- Total issue authors: 4
- Total pull request authors: 7
- Average comments per issue: 0.75
- Average comments per pull request: 0.0
- Merged pull requests: 21
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 25
- Average time to close issues: about 1 month
- Average time to close pull requests: about 1 hour
- Issue authors: 4
- Pull request authors: 7
- Average comments per issue: 0.75
- Average comments per pull request: 0.0
- Merged pull requests: 21
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Ahmedg2021 (1)
- vcchain2019 (1)
- qiyuxinlin (1)
- xhuiyan (1)
Pull Request Authors
- abdoelsayed2016 (10)
- aherzinger (6)
- baraayusry (4)
- MohammedAli9330 (1)
- eltociear (1)
- MahmoudElsayedMahmoud (1)
- tobias124 (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 129 last-month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 7
- Total maintainers: 1
pypi.org: rankify
A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation
- Homepage: https://github.com/DataScienceUIBK/rankify
- Documentation: http://rankify.readthedocs.io/
- Latest release: 0.1.3 (published 12 months ago)
Rankings
- Dependent packages count: 9.6%
- Average: 32.0%
- Dependent repos count: 54.3%
Maintainers (1)
Last synced: 6 months ago
Dependencies
docs/requirements.txt (pypi)
- mkdocs *
- mkdocs-material *
- mkdocs-rtd-dropdown *
- mkdocstrings *
.github/workflows/python-publish.yml (actions)
- actions/checkout v4 (composite)
- actions/setup-python v3 (composite)
- pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 (composite)
pyproject.toml (pypi)
- datasets ==3.2.0
- httpx ==0.27.2
- pandas ==2.2.3
- prettytable ==3.11.0
- requests ==2.32.3
- tqdm ==4.66.5
- transformers ==4.45.2
rankify/requirements.txt (pypi)
- anthropic ==0.37.1
- dacite ==1.8.1
- faiss-cpu ==1.9.0.post1
- flash-attn ==2.5.0
- fschat >=0.2.36
- ftfy ==6.3.1
- h5py ==3.12.1
- litellm ==1.50.4
- llama-cpp-python ==0.2.76
- llm-blender ==0.0.2
- ninja ==1.11.1.3
- omegaconf ==2.3.0
- onnxruntime ==1.19.2
- openai ==1.52.2
- pandas ==2.2.3
- prettytable ==3.11.0
- py7zr ==0.22.0
- pyserini ==0.43.0
- requests ==2.32.3
- sentence_transformers ==3.3.0
- sentencepiece ==0.2.0
- together ==1.3.3
- torch ==2.5.0
- tqdm ==4.66.5
- transformers ==4.45.2
- ujson ==5.10.0
- vllm ==0.6.3