ragthoven
RAGthoven, a Retrieval Augmented Generation Toolkit that helps you easily set up and execute your RAG experiments
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (17.1%) to scientific vocabulary
Keywords
Repository
RAGthoven, a Retrieval Augmented Generation Toolkit that helps you easily set up and execute your RAG experiments
Basic Info
Statistics
- Stars: 14
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
RAGthoven
Features:
- Run end-to-end retrieval-augmented generation (RAG) experiments with minimal setup.
- Configure indexing, retrieval, re-ranking, and generation using YAML or CLI.
- Process the validation dataset in parallel for lightning-fast experiments.
- Support multiple retrievers and rerankers for scalable experimentation.
- Use matrixable options to define multiple pipeline configurations for batch evaluation.
- Automate benchmarking for consistent and reproducible RAG-based evaluations.
- Adapt and extend configurations for different NLP tasks.
- Integrate with popular LLMs for generation and in-context learning.
- Log results for experiment tracking and evaluation.
- ...and more!
Demo
https://github.com/user-attachments/assets/1bd037a7-0af5-4f93-9347-64143d74e5c6
How to get started
Prepare the repository
- Install `poetry` (package manager for Python), following the official instructions.
- Install project dependencies:
  ```sh
  poetry install
  ```
- Activate the virtual env:
  ```sh
  poetry shell
  ```
- Create a `.env` file:
  ```sh
  cp .env.sample .env
  ```
- Set up the LLM by setting the correct environment variables. For instance, you can set your OpenAI API token to `OPENAI_API_KEY` in your `.env` file (see `.env.sample`) or use Ollama by providing `OLLAMA_API_BASE`.
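For reference, a minimal `.env` might look like the following sketch (placeholder values; `.env.sample` is the authoritative template):

```sh
# Hypothetical .env contents; see .env.sample for the exact variable names.
OPENAI_API_KEY=sk-your-key-here
# Or, when using a local Ollama instance instead of OpenAI:
# OLLAMA_API_BASE=http://localhost:11434
```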
Run the project
First, follow the instructions in the Prepare the repository section.
Run the project
```sh
ragthoven <config file path>
```
Sample config files can be found in the `config/` directory. For example, suppose we want to evaluate the example config `config/example_ag_news.yaml`, whose contents are listed below:

```yaml
name: "AG news"
training_data:
  dataset: "fancyzhx/ag_news"
  input_feature: "text"
  label_feature: "label"
  split_name: "train"
  textual_labels: ["World", "Sports", "Business", "Sci/Tech"]
validation_data:
  input_feature: "text"
  split_name: "test"
results:
  output_filename: "example_ag_news_results"
embed:
  k: 10
rerank:
  k: 3
llm:
  model: "azure/gpt-4o"
  sprompt: |
    You are a journalist. You would like to categorise each of the articles by its headline.
    There are these 4 categories you can choose from:
    - World - news about the world, general news
    - Sports - news about sports and related stuff
    - Business - news from the business world
    - Sci/Tech - news about new science and technology
    As a busy journalist, please answer with a single word.

    Here are some of the headlines you labeled: {{ examples }}
  uprompt: |
    Please determine the category of the article based on this heading: {{ text }}
```
As the configuration suggests, this is a text classification task in which article headlines are classified into four classes. In this particular case, we utilize both the embedding and the re-ranking mechanisms to aid the LLM (here GPT-4o, accessed via the Azure OpenAI endpoints) with the classification.
To obtain the actual results for the test split, you can run the following:

```sh
ragthoven config/example_ag_news.yaml
```

Optional: Select output format

You can also select the output format with the CLI option `--output`. See the example below:

```sh
ragthoven <config file path> --output csv
```

The default output format is `.jsonl` if the output format is not specified. You can always use the `--help` option to know which output formats are supported.
YAML configuration
To configure RAGthoven to run an experiment on a dataset, it needs a valid configuration file. The structure of the configuration file is logically split into sections and is described below. The configuration file described below can be found in `config/comp_case2024-climate_matrix_multiprompt_custom_examples.yaml`.
Config file sections
Every RAGthoven run requires a name, making it a required argument.
```yaml
name: "Shared task on Climate Activism Stance and Hate Event Detection at CASE 2024"
```

The training dataset is used to provide examples to the LLM. We need to specify the dataset source, textual input, labels and split.

- `dataset` - this can be any Hugging Face dataset (e.g. `nyu-mll/glue`), a `csv` file (as described below) or a `json` file. The latter two are specified by prefixing the path with `csv:` or `json:` respectively.
- `input_feature` - which feature to use for embedding/reranking.
- `output_feature` (or `label_feature`) - which feature to use as `input_feature`'s classification label/regression score/summarization summary/translation target.
- `split_name` - useful especially with Hugging Face datasets; leave as `train` when loading `json` or `csv`.
- `textual_labels` - some labels (especially in classification tasks) may come as indexes (0, 1, 2, ...). In order for the LLM to make sense of these labels, RAGthoven can translate indexes into textual labels. In the example below this would mean `(0 => "No hatespeech", 1 => "Hate speech")`. Provide an empty array to pass the label through directly.
- `dataset_version` - (Optional) some datasets have multiple versions. Use this parameter to select the desired version.
A full example:
```yaml
training_data:
  dataset: "csv:./data/SubTask-A-train.csv"
  input_feature: "tweet"
  output_feature: "label"
  split_name: "train"
  textual_labels: ["No hatespeech", "Hate speech"]
  dataset_version: "001"
```
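The `json:` prefix works the same way; a hypothetical sketch (the file path and feature names are illustrative):

```yaml
training_data:
  dataset: "json:./data/train.json"
  input_feature: "text"
  output_feature: "label"
  split_name: "train"
  textual_labels: []
```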
- The validation dataset is the primary source of input RAGthoven uses during its execution. This is the text that is used as the query for retrieval/reranking and which is then judged by the LLM.

- `dataset` - (Optional) the same applies as for `training_data.dataset`. When not provided, `training_data.dataset` is used instead.
- `input_feature` - which feature is used for retrieval/reranking. This feature can be passed into prompts as `{{ text }}`.
- `split_name` - the same applies as for `training_data.split_name`.

Here is a full example:
```yaml
validation_data:
  dataset: "csv:./data/SubTask-A-test.csv"
  input_feature: "tweet"
  split_name: "train"
```
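Since `dataset` falls back to `training_data.dataset` when omitted, a minimal validation block can be as short as the following sketch (the split name is illustrative):

```yaml
validation_data:
  input_feature: "tweet"
  split_name: "test"
```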
- Output-specific configurations:

- `output_cached` - (Optional) when running validation on huge datasets (e.g. 30k validation examples), a crash (e.g. due to no API credits being available) means only partial data is written into the results file by default. When running RAGthoven once again, the results file is recreated and all existing results data is lost. To prevent this, set `output_cached` to `true`. When run again, RAGthoven loads the results file and performs inference only on `id`s that are not yet present in it.
- `output_cache_id` - (Optional) the name of the feature to be used as `id` in the output.
- `output_filename` - (Optional) the output file name (without extension).
- `bad_request_default_value` - (Optional) the desired output when a request ends in an error (e.g. because the LLM API refuses to respond).

A full example:
```yaml
results:
  output_cached: true
  output_cache_id: "index"
  output_filename: "results"
  bad_request_default_value: -1
```
- Retrieval is a crucial part of RAG and is configured in the next section. When retrieval is not used, you may omit this configuration section entirely (see `ragthoven/config/comp_case2024-climate_matrix_no_retrieval.yaml`).

- `k` - number of relevant training samples to be retrieved for every query. Matrixable: provide an array of values (e.g. `[20, 10, 5]`) to run in matrix mode.
- `training_size_limit` - (Optional) when testing, it is best not to start by embedding all training samples, as this can be time-consuming. Setting this option reduces the training dataset to a more manageable number.
- `model` - (Optional) choose any HuggingFace-provided model for retrieval. Basic retrieval is done using [Chroma](https://github.com/chroma-core/chroma). Matrixable: specify models in an array.
- `embedder` - currently there are two embedding methods available: `sbert`, which utilizes `sentence-transformers` models supported by `chromadb`, and `cde`, which can use Contextual Document Embeddings (CDE) models, e.g. `jxm/cde-small-v1`. In both cases `chromadb` is used as the vector database.
- `device` - when utilizing `sbert` as the embedder, this option sets the device used for the embedding computation (e.g. `cuda`; defaults to `cpu`).
- `docs_embedding_count` - used only in `cde` mode. Number of documents (randomly sampled) to use for creating dataset embeddings. See https://huggingface.co/jxm/cde-small-v1 for more information. When `docs_embedding_count > training_size_limit`, then `docs_embedding_count = training_size_limit`.

A full example can be seen below:
```yaml
embed:
  k: 20
  training_size_limit: 20
  model: "sentence-transformers/all-MiniLM-L6-v2"
  embedder: "sbert"
  device: "cpu"
  docs_embedding_count: 512
```
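For the `cde` embedder described above, a configuration might look like this sketch (the limits are illustrative; the model name is the one referenced in the text):

```yaml
embed:
  k: 10
  training_size_limit: 1000
  model: "jxm/cde-small-v1"
  embedder: "cde"
  docs_embedding_count: 512  # capped at training_size_limit
```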
- In order to obtain high performance on the retrieved results, it is often a good idea to rerank them and select only the most relevant ones. This is done in a cross-encoder fashion. When reranking is not used, omit this configuration entry completely (see `ragthoven/config/comp_case2024-climate_matrix_no_rerank.yaml`).

- `k` - select only the top-`k` reranked from the retrieved samples. Matrixable: specify `k`s in an array.
- `training_size_limit` - limits the number of documents used for retrieval; the first documents are used.
- `model` - (Optional) choose a model for reranking. The basic reranker utilizes FlashRank, which you can consult for more details on available models. Matrixable: specify models in an array.

A full example can be seen below:
```yaml
rerank:
  k: [5, 1]
  model: "ms-marco-MiniLM-L-12-v2"
```
- At the heart of RAGthoven, there is always a call to an LLM API, be it local or commercial.
- `model` - which model to use. Matrixable: provide an array of models.
- `temperature` - (Optional) temperature for the model's decoder. Defaults to `0`. Matrixable: provide an array of temperature values.
- `examples` - (Optional) examples can be ordered and arranged, and arbitrary textual information can be added. The `text` and `label` are as provided in the configuration for the training dataset. In addition, the variable `examples[i].data.feature`, where `feature` is a feature present in the original dataset, allows access to any feature of the given example.
A full example can be seen below:
```yaml
llm:
  model: "gpt-4o"
  temperature: 0
  examples: |
    This is example number one: {{ examples[0].text }} and its corresponding label: {{ examples[0].label }}
    ...
    This is example number two: {{ examples[10].text }} and its corresponding label: {{ examples[10].label }}
```

If you request more examples than were retrieved/reranked, RAGthoven will fail and throw an exception!
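To pull an arbitrary dataset feature into the examples template, the `examples[i].data.<feature>` accessor described above can be used; a sketch (`tweet_id` is a hypothetical feature name):

```yaml
examples: |
  Example tweet (id {{ examples[0].data.tweet_id }}): {{ examples[0].text }}
  Label: {{ examples[0].label }}
```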
Special features
Matrix configuration
Some parameters are Matrixable, i.e. they support matrix configuration. RAGthoven then runs the pipeline once for every combination of the matrixed values.
E.g. when two variables are set to `k: [20, 10]` and `n: [5, 1]`, there are 4 combinations evaluated:
1. k: 20, n: 5
2. k: 20, n: 1
3. k: 10, n: 5
4. k: 10, n: 1
Adding another variable with two options to the matrix would result in 8 combinations.
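For instance, matrixing the retrieval and reranking depths from the sections above produces four pipeline runs (a sketch reusing parameters shown earlier):

```yaml
embed:
  k: [20, 10]  # two retrieval depths
rerank:
  k: [5, 1]    # two reranking depths
# 2 x 2 = 4 configurations are evaluated
```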
Multiple prompts
Prompts can be specified in two ways:
- Multiprompt version: specify prompts in an array named `prompts` with the following structure:
  - `name` - name of the prompt; this is used to later access the output of a specific prompt as input to another. One special prompt called `system` with role `system` is added to every call.
  - `role` - select the `system` or `user` role (or any other role supported by the model).
  - `prompt` - a jinja2-formattable prompt. Each prompt is provided with the `examples` and `text` variables, the output of previous prompts accessible via `{{ previous_prompt_name.out }}`, and `{{ data.<feature> }}`, where `<feature>` is a feature available in the validation dataset.
- Single prompt version (see `ragthoven/config/comp_case2024-climate_matrix_with_custom_examples.yaml`):
  - `sprompt` - single system prompt. Matrixable: provide an array of prompts.
  - `uprompt` - single user prompt. Matrixable: provide an array of prompts.

Both prompts can make use of `{{ examples }}`, `{{ text }}` and `{{ data.<feature> }}`, where `<feature>` is a feature available in the validation dataset.
An example of the multiprompt version of the configuration:

```yaml
...
prompts:
  - name: "system"
    role: "system"
    prompt: |
      Analyze the input tweet to determine if it is hate speech or not, based on the following criteria:

      1. Some instructions here
      ...
      4. Some instructions there

      ## Examples
      {{ examples }}
  - name: "select_key_phrases"
    role: "user"
    prompt: |
      Try to reason and think of the words and phrases that best describe the mood/behaviour of the person writing
      that text. Try to list at least 5 - 6 words/phrases.
      Text: {{ text }}
  - name: "result"
    role: "user"
    prompt: |
      Please determine the category of the text given the text and the words/phrases describing its sentiment.
      Use "1" for hatespeech tweets and "0" for all others:
      Text: {{ text }}
      Sentiment keywords: {{ select_key_phrases.out }}
      ANSWER ONLY WITH A SINGLE NUMBER!
```
An example of the single prompt version of the configuration:

```yaml
...
sprompt: |
  Analyze the input tweet to determine if it is hate speech or not, based on the following criteria:

  1. Some instructions here
  ...
  4. Some instructions there

  ## Examples
  {{ examples }}
uprompt: |
  Please determine the category of the text; use "1" for hatespeech tweets and "0" for all others:
  {{ text }}
  ANSWER ONLY WITH A SINGLE NUMBER!
```
Preprocessing
RAGthoven provides a way to run custom Python code on every example in the validation dataset. Please refer to `ragthoven/tools/example_tool.py` for how to write a tool for RAGthoven preprocessing. To specify which tools to run, list them in the YAML config as follows:
(Note that the tools are run sequentially, in the order of appearance in the `entries` key of the YAML config.)
```yaml
preprocessor:
  entries: ["example_tool.fizzbuzz", "example_tool.count_ands"]
```
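As a rough illustration only (the authoritative interface is whatever `ragthoven/tools/example_tool.py` defines), a preprocessing tool is conceptually a function in a module under `ragthoven/tools/` that receives a validation example and returns it, possibly modified; the body below is hypothetical:

```python
# Hypothetical sketch; consult ragthoven/tools/example_tool.py for the
# actual preprocessing interface RAGthoven expects.

def count_ands(example: dict) -> dict:
    """Add a feature counting occurrences of the token 'and' in the input text."""
    example["and_count"] = sum(
        1 for token in example["text"].lower().split() if token == "and"
    )
    return example
```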
Function calling
RAGthoven provides a way to use function calling. LLMs can be provided with tools (e.g. functions to call) which can help them fetch fresh data from various sources or take actions (e.g. call an API, send an email etc). The LLM can then decide on its own whether it needs to call any of the provided functions and what arguments to pass to those functions. You can read more about function calling here.
Note that the LLM does not actually perform the function call: it merely notes in its response that it wants the function to be executed with specific arguments, and then RAGthoven executes the function. Finally, the results of the function call can be used in the two following ways:
1. If you set `llm.messages: true` in the config, the results of the function call will be forwarded to the LLM in every API call for any future prompts (for the current validation example). The messages are passed along as part of the array of messages containing the whole conversation that the LLM API accepts.
2. If you set `llm.messages: false` in the config, no messages array is passed to the LLM, and the results of the function call can be used manually in a subsequent prompt to the LLM. You need to specify in the YAML config where they will be used inside a prompt.
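A minimal sketch of these two modes, showing only the `llm.messages` flag from the description above (everything else elided):

```yaml
llm:
  messages: true   # forward function-call results to the LLM in subsequent calls
  # messages: false  # reference the results manually in a later prompt instead
  ...
```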
Please refer to `ragthoven/tools/example_fun_calling.py` for how to write functions for function calling. Also, in order to use function calling, please use multiprompt in your YAML config (example: `config/single-shot-example-function-calling.yaml`). In order to use this example, please install the `wikipedia` package:
```yaml
llm:
  ...
  tools: ["example_fun_calling.WikipediaPageSearch", "example_fun_calling.WikipediaPageSummary"]
  prompts:
    - name: "system"
      role: "system"
      prompt: You are the best at knowing ...
    - name: "wikipedia_search"
      role: "user"
      tools: ["WikipediaPageSearch"]
      prompt: |
        First, let's have a look at the Wikipedia page about this movie. This is the text of the review:
        {{ data.text }}

        Please first find some useful information online about this movie.
    - name: "wikipedia_summary"
      role: "user"
      tools: ["WikipediaPageSummary"]
      prompt: |
        Now, you have obtained the following list of results for your search:
        {{ wikipedia_search.out }}

        Please obtain a summary of this movie.
    ...
```
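The tool implementations themselves live in `ragthoven/tools/example_fun_calling.py`; the real interface may differ, but a Wikipedia search tool built on the `wikipedia` package could conceptually look like this hypothetical sketch:

```python
# Hypothetical sketch; consult ragthoven/tools/example_fun_calling.py for the
# actual tool interface RAGthoven expects for function calling.
import wikipedia  # pip install wikipedia

class WikipediaPageSearch:
    """Search Wikipedia and return candidate page titles for a query."""

    def __call__(self, query: str) -> str:
        # wikipedia.search() returns a list of page titles matching the query
        return ", ".join(wikipedia.search(query))
```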
Tests
In order to run tests:
- First, follow the instructions in the Prepare the repository section.
- Run the tests (uses only 10 data entries for embedding and runs on 2 validation data entries):
  ```sh
  make test
  ```
Tutorials
News headlines classification
This is a tutorial on how to use RAGthoven to classify news headlines into categories.
First, follow the instructions in the Prepare the repository section. This takes care of installing the relevant dependencies, activating the virtual env, and preparing a `.env` file with the relevant env variables needed to interact with the LLM API of your choice.
In order to use RAGthoven, you need to specify your RAG pipeline and any other parameters in a YAML configuration file.
For this tutorial, we are going to use an existing example yaml configuration file available here.
Let's explore some parts of the yaml file we will be using (open the file from the previous point to follow along if you like):
- Under the `training_data` section we specify `dataset: "fancyzhx/ag_news"`, a Hugging Face dataset containing news article headlines and their respective labels. We will use this dataset to populate our vector database with training examples, which will be retrieved at inference time and included in the LLM prompt as examples for in-context learning.
- Under the `results` section we specify `output_filename: "example_ag_news_results"`. RAGthoven will use the string `example_ag_news_results` as the prefix of the output files it generates.
- Under the `embed` section we specify `training_size_limit: 10`, which means that only the first 10 examples from the dataset are inserted into our vector database and made available for querying at inference time.
- Under the `rerank` section we specify `k: 3`. This means that, at inference time, we rerank the training examples retrieved from the vector database and include the 3 best matches in our LLM prompt.
- Under the `llm` section we specify various parameters related to the generation step of the RAG pipeline. For example, we can specify the system and user prompts passed to the LLM API (`sprompt` and `uprompt` respectively).
Run RAGthoven against the YAML file:

```sh
ragthoven config/example_ag_news.yaml
```

Note that this step will take a long time to run, since it needs to process the whole validation dataset. Feel free to terminate the program prematurely after about 20 seconds to move on to the next step of this tutorial.
RAGthoven should have created 2 output files, both named with the prefix `example_ag_news_results`. One of the files has the suffix `.metadata.json`; it is a metadata file that contains all the parameters used for that RAG pipeline run. The second file has the suffix `.jsonl` and contains the output of the classification task for some of the validation dataset examples.
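After the run (or an early termination), the output files described above can be inspected; a sketch, with names assembled from the prefix and suffixes mentioned:

```sh
ls example_ag_news_results*
# example_ag_news_results.metadata.json  (parameters of the pipeline run)
# example_ag_news_results.jsonl          (classification outputs)
```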
License
RAGthoven is licensed under the terms of the MIT license.
Paper & Citation
You can find more info on RAGthoven, as well as a few case studies, in the associated paper: https://aclanthology.org/2025.coling-demos.12/
If you found RAGthoven helpful in your work, please consider citing it:
@inproceedings{karetka-etal-2025-ragthoven,
title = "{RAG}thoven: A Configurable Toolkit for {RAG}-enabled {LLM} Experimentation",
author = "Karetka, Gregor and
Skottis, Demetris and
Dutkov{\'a}, Lucia and
Hra{\v{s}}ka, Peter and
Suppa, Marek",
editor = "Rambow, Owen and
Wanner, Leo and
Apidianaki, Marianna and
Al-Khalifa, Hend and
Eugenio, Barbara Di and
Schockaert, Steven and
Mather, Brodie and
Dras, Mark",
booktitle = "Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations",
month = jan,
year = "2025",
address = "Abu Dhabi, UAE",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.coling-demos.12/",
pages = "117--125"
}
Owner
- Name: ragthoven-dev
- Login: ragthoven-dev
- Kind: organization
- Repositories: 1
- Profile: https://github.com/ragthoven-dev
GitHub Events
Total
- Issues event: 2
- Watch event: 11
- Delete event: 2
- Issue comment event: 3
- Push event: 54
- Pull request review event: 41
- Pull request review comment event: 32
- Pull request event: 34
- Fork event: 1
- Create event: 19
Last Year
- Issues event: 2
- Watch event: 11
- Delete event: 2
- Issue comment event: 3
- Push event: 54
- Pull request review event: 41
- Pull request review comment event: 32
- Pull request event: 34
- Fork event: 1
- Create event: 19
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 19
- Average time to close issues: 15 days
- Average time to close pull requests: 4 days
- Total issue authors: 1
- Total pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.11
- Merged pull requests: 15
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 19
- Average time to close issues: 15 days
- Average time to close pull requests: 4 days
- Issue authors: 1
- Pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.11
- Merged pull requests: 15
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- giuliosmall (1)
Pull Request Authors
- gkaretka (13)
- trimitris (5)
- mrshu (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 21 last-month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 3
- Total maintainers: 1
pypi.org: ragthoven
- Documentation: https://ragthoven.readthedocs.io/
- License: mit
- Latest release: 0.2.3 (published 10 months ago)