ragthoven

RAGthoven, a Retrieval Augmented Generation Toolkit that helps you easily set up and execute your RAG experiments

https://github.com/ragthoven-dev/ragthoven

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.1%) to scientific vocabulary

Keywords

experiment llm rag
Last synced: 6 months ago

Repository

RAGthoven, a Retrieval Augmented Generation Toolkit that helps you easily set up and execute your RAG experiments

Basic Info
  • Host: GitHub
  • Owner: ragthoven-dev
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 3.7 MB
Statistics
  • Stars: 14
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
experiment llm rag
Created over 1 year ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

RAGthoven logo


[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) ![Supported Python Versions](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue) [![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/ragthoven-dev/ragthoven?branch=master&label=Code%20Size&logo=GitHub&logoColor=ffffff&labelColor=282828&style=flat)]() [![GitHub repo size](https://img.shields.io/github/repo-size/ragthoven-dev/ragthoven?branch=master&label=Repo%20Size&logo=GitHub&logoColor=ffffff&labelColor=282828&style=flat)]() [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](https://makeapullrequest.com)


RAGthoven

_**[`RAGthoven`](https://github.com/ragthoven-dev/ragthoven) is to Retrieval Augmented Generation (RAG) what [`axolotl`](https://github.com/axolotl-ai-cloud/axolotl) is to model finetuning.**_

Features:

  • Run end-to-end retrieval-augmented generation (RAG) experiments with minimal setup.
  • Configure indexing, retrieval, re-ranking, and generation using YAML or CLI.
  • Process the validation dataset in parallel for lightning-fast experiments.
  • Combine multiple retrievers and rerankers for scalable experimentation.
  • Use matrixable options to define multiple pipeline configurations for batch evaluation.
  • Automate benchmarking for consistent and reproducible RAG-based evaluations.
  • Adapt and extend configurations for different NLP tasks.
  • Integrate with popular LLMs for generation and in-context learning.
  • Log results for experiment tracking and evaluation.
  • ...and more!

Demo

https://github.com/user-attachments/assets/1bd037a7-0af5-4f93-9347-64143d74e5c6

How to get started

Prepare the repository

  1. Install poetry (package manager for Python), following the official instructions
  2. Install project dependencies:

    ```sh
    poetry install
    ```

  3. Activate the virtual env:

    ```sh
    poetry shell
    ```

  4. Create a .env file:

    ```sh
    cp .env.sample .env
    ```

  5. Set up the LLM by setting the correct environment variables. For instance, you can set your OpenAI API token as OPENAI_API_KEY in your .env file (see .env.sample), or use ollama by providing OLLAMA_API_BASE.
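
    For example, a minimal .env might look like the sketch below. The variable names come from .env.sample; the values are placeholders, and the ollama URL is only an assumed local default:

    ```sh
    # Use an OpenAI-compatible API key (placeholder value)...
    OPENAI_API_KEY=sk-...
    # ...or point RAGthoven at a local ollama instance (adjust host/port to your setup)
    OLLAMA_API_BASE=http://localhost:11434
    ```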

Run the project

  1. First, follow the instructions in Prepare the repository section

  2. Run the project

    ```sh
    ragthoven <config file path>
    ```

    Sample config files can be found in the config/ directory.

    For example, suppose we want to evaluate the example config config/example_ag_news.yaml whose contents are listed below:

    ```yaml name: "AG news" trainingdata: dataset: "fancyzhx/agnews" inputfeature: "text" labelfeature: "label" splitname: "train" textuallabels: ["World", "Sports", "Business", "Sci/Tech"] validationdata: inputfeature: "text" splitname: "test" results: outputfilename: "exampleagnews_results" embed: k: 10 rerank: k: 3 llm: model: "azure/gpt-4o" sprompt: | You are a journalist. You would like to categorise each of the articles by it's headline. There are these 4 categories you can choose from: - World - news about world, general news - Sports - news about sports and related stuff - Business - news from business world - Sci/Tech - news about new science and technology As a busy journalist please answer with single world:

    Here are some of the headlines you labeled:
    {{ examples }}
    

    uprompt: | Please determine the category of the article based of this heading: {{ text }} ```

    As the configuration suggests, this is a text classification task in which article headlines are classified into four classes. In this particular case, we use both embedding-based retrieval and a re-ranking mechanism to aid the LLM (here GPT-4o, accessed via the Azure OpenAI endpoints) with the classification.

    To obtain the actual results for the test split, you can run the following:

    ```sh
    ragthoven config/example_ag_news.yaml
    ```

  3. Optional: Select output format

    You can also select the output format with the CLI option --output. See example below:

    ```sh
    ragthoven <config file path> --output csv
    ```

    If no output format is specified, the default is .jsonl.

    You can always use the --help option to see which output formats are supported.
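
    For instance, to rerun the AG News example from above and write the results as CSV instead:

    ```sh
    ragthoven config/example_ag_news.yaml --output csv
    ```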

YAML configuration

To configure RAGthoven to run an experiment on a dataset, you need a valid configuration file. The structure of the configuration file is logically split into sections, which are described below. The configuration file described below can be found in config/comp_case2024-climate_matrix_multiprompt_custom_examples.yaml.

Config file sections

  • Every RAGthoven run requires a name, making it a required argument.

    ```yaml
    name: "Shared task on Climate Activism Stance and Hate Event Detection at CASE 2024"
    ```

  • The training dataset is used to provide examples to the LLM. We need to specify the dataset source, textual input, labels, and split.

    • dataset - This can be any huggingface dataset (e.g. nyu-mll/glue), a csv file (as described below) or a json file. These can be specified by prefixing the path with csv: or json: respectively.
    • input_feature - which feature to use for embedding/reranking
    • output_feature (or label_feature) - which feature to use as input_feature's classification label/regression score/summarization summary/translation target.
    • split_name - useful especially with Hugging Face datasets; leave as train when loading json or csv.
    • textual_labels - some labels (especially in classification tasks) may come as indexes (0, 1, 2, ...). For the LLM to make sense of these labels, RAGthoven can translate indexes into textual labels. In the example below this would mean (0 => "No hatespeech", 1 => "Hate speech"). Provide an empty array to pass the labels through directly.
    • dataset_version - (Optional) some datasets have multiple versions. Use this parameter to select the desired version.

A full example:

```yaml
training_data:
  dataset: "csv:./data/SubTask-A-train.csv"
  input_feature: "tweet"
  output_feature: "label"
  split_name: "train"
  textual_labels: ["No hatespeech", "Hate speech"]
  dataset_version: "001"
```

  • The validation dataset is the primary source of input RAGthoven uses during its execution. This is the text used as the query for retrieval/reranking, which is then judged by the LLM.
    • dataset - (Optional) the same applies as for training_data.dataset. When not provided, training_data.dataset is used instead.
    • input_feature - which feature is used for retrieval/reranking. This feature can be passed into prompts as {{ text }}.
    • split_name - the same applies as for training_data.split_name.

Here is a full example:

```yaml
validation_data:
  dataset: "csv:./data/SubTask-A-test.csv"
  input_feature: "tweet"
  split_name: "train"
```

  • Output-specific configurations
    • output_cached - (Optional) when running validation on huge datasets (e.g. 30k validation examples), a crash (e.g. due to exhausted API credits) means that only partial data has been written to the results file. By default, when RAGthoven is run again, the results file is recreated and all existing results are lost. To prevent this, set output_cached to true: on a rerun, RAGthoven loads the results file and performs inference only on ids that are not yet present in it.
    • output_cache_id - (Optional) the name of the feature to be used as the id in the output.
    • output_filename - (Optional) the output file name (without extension).
    • bad_request_default_value - (Optional) the desired output when a request ends in an error (e.g. because the LLM API refuses to respond).

A full example:

```yaml
results:
  output_cached: true
  output_cache_id: "index"
  output_filename: "results"
  bad_request_default_value: -1
```

  • Retrieval is a crucial part of RAG. It is configured in the embed section. When retrieval is not used, you may omit this configuration section entirely (see ragthoven/config/comp_case2024-climate_matrix_no_retrieval.yaml).
    • k - number of relevant training samples to be retrieved for every query. Matrixable parameter: provide an array of values (e.g. [20, 10, 5]) to run in matrix mode.
    • training_size_limit - (Optional) when testing, it is best not to start by embedding all training samples, as this can be time-consuming. Setting this option reduces the training dataset to a more manageable size.
    • model - (Optional) choose any HuggingFace-provided model for retrieval. Basic retrieval is done using [Chroma](https://github.com/chroma-core/chroma). Matrixable, specify models in an array.
    • embedder - currently there are two embedding methods available: sbert, which utilizes the sentence-transformers models supported by Chromadb, and cde, which can use Contextual Document Embeddings (CDE) models, e.g. jxm/cde-small-v1. In both cases chromadb is used as the vector database.
    • device - when utilizing sbert as the embedder, this option sets the device used for the embedding computation (e.g. cuda; defaults to cpu).
    • docs_embedding_count - used only in cde mode. Number of documents (randomly sampled) to use for creating the dataset embeddings. See https://huggingface.co/jxm/cde-small-v1 for more information. When docs_embedding_count > training_size_limit, then docs_embedding_count = training_size_limit.

A full example can be seen below:

```yaml
embed:
  k: 20
  training_size_limit: 20
  model: "sentence-transformers/all-MiniLM-L6-v2"
  embedder: "sbert"
  device: "cpu"
  docs_embedding_count: 512
```

  • In order to obtain high performance on the retrieved results, it is often a good idea to rerank them and select only the most relevant ones. This is done in a cross-encoder fashion. Reranking is configured in the rerank section; when it is not used, omit this configuration entry completely (see ragthoven/config/comp_case2024-climate_matrix_no_rerank.yaml).
    • k - select only the top-k reranked samples from the retrieved ones. Matrixable, specify ks in an array.
    • training_size_limit - limits the number of documents used for retrieval; the first documents are used.
    • model - (Optional) choose a model for reranking. The basic reranker utilizes FlashRank, which you can consult for more details on available models. Matrixable, specify models in an array.

A full example can be seen below:

```yaml
rerank:
  k: [5, 1]
  model: "ms-marco-MiniLM-L-12-v2"
```

  • At the heart of RAGthoven, there is always a call to an LLM API, be it local or commercial.
    • model - which model to use. Matrixable, provide an array of models.
    • temperature - (Optional) temperature for the model's decoder. Defaults to 0. Matrixable, provide an array of temperature values.
    • examples - (Optional) examples can be ordered and arranged, and arbitrary textual information can be added. The text and label are as provided in the training dataset configuration. In addition, the variable examples[i].data.feature, where feature is a feature present in the original dataset, allows access to any feature of the given example.

A full example can be seen below:

```yaml
llm:
  model: "gpt-4o"
  temperature: 0
  examples: |
    This is example number one: {{ examples[0].text }} and its corresponding label: {{ examples[0].label }}
    ...
    This is example number two: {{ examples[10].text }} and its corresponding label: {{ examples[10].label }}
    If you ask for more examples than there are retrieved/reranked, it will fail and throw an exception!
  ...
```

Special features

Matrix configuration

Some parameters are Matrixable, i.e. they support matrix configuration. In matrix mode, RAGthoven runs the pipeline once for every combination of the matrixed values.

For example, when two variables are set as k: [20, 10] and n: [5, 1], there are 4 combinations evaluated:

  1. k: 20, n: 5
  2. k: 20, n: 1
  3. k: 10, n: 5
  4. k: 10, n: 1

Adding another variable with two options to the matrix would result in 8 combinations.
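
As a concrete sketch, matrixing the retrieval and rerank depths (both documented as Matrixable above) yields four pipeline runs:

```yaml
embed:
  k: [20, 10]   # retrieval depth, matrixed
rerank:
  k: [5, 1]     # rerank depth, matrixed
# RAGthoven evaluates all four combinations:
# (embed k=20, rerank k=5), (20, 1), (10, 5), (10, 1)
```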

Multiple prompts

Prompts can be specified in two ways:

  • Multiprompt version: specify prompts in an array named prompts with the following structure:
    • name - name of the prompt, used to later access the output of a specific prompt as input in another. One special prompt called system, with role system, is added to every call.
    • role - select the system or user role (or any other role supported by the model).
    • prompt - a Jinja2-formattable prompt. Each prompt is provided with the examples and text variables, the output of previous prompts accessible as {{ previous_prompt_name.out }}, and {{ data.<feature> }} where <feature> is a feature available in the validation dataset.
  • Single prompt version (see ragthoven/config/comp_case2024-climate_matrix_with_custom_examples.yaml):
    • sprompt - single system prompt. Matrixable, provide an array of prompts.
    • uprompt - single user prompt. Matrixable, provide an array of prompts.

    Both prompts can make use of {{ examples }}, {{ text }} and {{ data.<feature> }} where <feature> is a feature available in the validation dataset.

An example of the multiprompt version of the configuration:

```yaml
...
prompts:
  -
    name: "system"
    role: "system"
    prompt: |
      Analyze the input tweet to determine if it is hate speech or not, based on the following criteria:

      1. Some instructions here
      ...
      4. Some instructions there

      ## Examples

      {{ examples }}
  -
    name: "select_key_phrases"
    role: "user"
    prompt: |
      Try to reason and think of the words and phrases that best describe the mood/behaviour of the person writing
      that text. Try to list at least 5 - 6 words/phrases.
      Text: {{ text }}
  -
    name: "result"
    role: "user"
    prompt: |
      Please determine the category of the text given the text and words/phrases describing its sentiment.
      Use "1" for hatespeech tweets and "0" for all other:
      Text: {{ text }}
      Sentiment keywords: {{ select_key_phrases.out }}
      ANSWER ONLY WITH A SINGLE NUMBER!
```

An example of the single prompt version of the configuration:

```yaml
...
sprompt: |
  Analyze the input tweet to determine if it is hate speech or not, based on the following criteria:

  1. Some instructions here
  ...
  4. Some instructions there

  ## Examples

  {{ examples }}
uprompt: |
  Please determine the category of the text, use "1" for hatespeech tweets and "0" for all other:
  {{ text }}
  ANSWER ONLY WITH A SINGLE NUMBER!
```

Preprocessing

RAGthoven provides a way to run custom Python code on every example in the validation dataset. Please refer to ragthoven/tools/example_tool.py for how to write a tool for RAGthoven preprocessing. To specify which tools to run, list them in the YAML config as follows:

(Note that the tools are run sequentially, in the order of appearance in the entries key of the YAML config.)

```yaml
preprocessor:
  entries: ["example_tool.fizzbuzz", "example_tool.count_ands"]
```

Function calling

RAGthoven provides a way to use function calling. LLMs can be provided with tools (e.g. functions to call) which can help them fetch fresh data from various sources or take actions (e.g. call an API, send an email etc). The LLM can then decide on its own whether it needs to call any of the provided functions and what arguments to pass to those functions. You can read more about function calling here.

Note that the LLM does not actually perform the function call - it merely notes in its response that it wants the function to be executed with specific arguments, and then RAGthoven executes the function. Finally, the results of the function call can be used in the following two ways (see the sketch after this list):

  1. If you set llm.messages: true in the config, the results of the function call are forwarded to the LLM in every subsequent API call for the current validation example. The messages are passed along as part of the messages array, containing the whole conversation, that the LLM API accepts.
  2. If you set llm.messages: false in the config, no messages array is passed to the LLM, and the results of the function call can be used manually in a subsequent prompt. You need to specify in the YAML config where they will be used inside a prompt.
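
A minimal sketch of the first variant, assuming the tool entry from the example further below (all other llm keys omitted):

```yaml
llm:
  ...
  # Forward function-call results to the LLM in subsequent
  # API calls for the current validation example:
  messages: true
  tools: ["example_fun_calling.WikipediaPageSearch"]
```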

Please refer to ragthoven/tools/example_fun_calling.py for how to write functions for function calling. Also, to use function calling, please use the multiprompt format in your YAML config (example: config/single-shot-example-function-calling.yaml). To run this example, first install the wikipedia package. The relevant configuration looks as follows:

```yaml
llm:
  ...
  tools: ["example_fun_calling.WikipediaPageSearch", "example_fun_calling.WikipediaPageSummary"]
  prompts:
    -
      name: "system"
      role: "system"
      prompt: You are the best at knowing ...
    -
      name: "wikipedia_search"
      role: "user"
      tools: ["WikipediaPageSearch"]
      prompt: |
        First, let's have a look at wikipedia page about this movie.
        This is the text of the review: {{ data.text }}

        Please first find some useful information online about this movie.
    -
      name: "wikipedia_summary"
      role: "user"
      tools: ["WikipediaPageSummary"]
      prompt: |
        Now, you have obtained following list of results for your search:
        {{ wikipedia_search.out }}
        Please obtain a summary of this movie.
  ...
```

Tests

In order to run tests:

  1. First, follow the instructions in Prepare the repository section

  2. Run tests (uses only 10 data entries for embedding and runs on 2 validation data entries):

    ```sh
    make test
    ```

Tutorials

News headlines classification

This is a tutorial about how to use RAGthoven to classify News headlines into categories.

  1. First, follow the instructions in Prepare the repository section. This takes care of installing the relevant dependencies, activating the virtual env, and preparing a .env file with the relevant env variables needed to interact with the LLM API of your choice.

  2. In order to use RAGthoven, you need to specify your RAG pipeline and any other parameters in a YAML configuration file.

    For this tutorial, we are going to use an existing example yaml configuration file available here.

  3. Let's explore some parts of the yaml file we will be using (open the file from the previous point to follow along if you like):

    1. Under the section training_data we specify dataset: "fancyzhx/ag_news", which is a Hugging Face dataset containing news article headlines and their respective labels. We will use this dataset to populate our vector database with training examples, which will be retrieved at inference time and included in the LLM prompt as examples for in-context learning.
    2. Under the results section we specify output_filename: "example_ag_news_results". RAGthoven will use the string example_ag_news_results as the prefix for the output files it generates.
    3. Under the embed section we specify training_size_limit: 10, which means that we will insert only the first 10 examples from the dataset into our vector database; they will be available for querying at inference time.
    4. Under the rerank section we specify k: 3. This means that, at inference time, we will rerank the training examples retrieved from the vector database and include the 3 best matches in our LLM prompt.
    5. Under the llm section we specify various parameters related to the generation step of the RAG pipeline. For example, we can specify the system and user prompts that are passed to the LLM API (sprompt and uprompt respectively).
  4. Run RAGthoven against the yaml file:

    ```sh
    ragthoven config/example_ag_news.yaml
    ```

    Note that this step will take a lot of time to run, since it needs to process the whole validation dataset. Feel free to terminate the program prematurely after about 20 seconds to move on to the next step of this tutorial.

  5. RAGthoven should have created two output files, both with names prefixed by example_ag_news_results. One file has the suffix .metadata.json; it is a metadata file containing all the parameters used for that RAG pipeline run. The second file has the suffix .jsonl and contains the output of the classification task for some of the validation dataset examples.

License

RAGthoven is licensed under the terms of the MIT license.

Paper & Citation

You can find more info on RAGthoven, as well as a few case studies, in the associated paper: https://aclanthology.org/2025.coling-demos.12/

If you found RAGthoven helpful in your work, please consider citing it:

```bibtex
@inproceedings{karetka-etal-2025-ragthoven,
    title = "{RAG}thoven: A Configurable Toolkit for {RAG}-enabled {LLM} Experimentation",
    author = "Karetka, Gregor and Skottis, Demetris and Dutkov{\'a}, Lucia and Hra{\v{s}}ka, Peter and Suppa, Marek",
    editor = "Rambow, Owen and Wanner, Leo and Apidianaki, Marianna and Al-Khalifa, Hend and Eugenio, Barbara Di and Schockaert, Steven and Mather, Brodie and Dras, Mark",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-demos.12/",
    pages = "117--125"
}
```

Owner

  • Name: ragthoven-dev
  • Login: ragthoven-dev
  • Kind: organization

GitHub Events

Total
  • Issues event: 2
  • Watch event: 11
  • Delete event: 2
  • Issue comment event: 3
  • Push event: 54
  • Pull request review event: 41
  • Pull request review comment event: 32
  • Pull request event: 34
  • Fork event: 1
  • Create event: 19
Last Year
  • Issues event: 2
  • Watch event: 11
  • Delete event: 2
  • Issue comment event: 3
  • Push event: 54
  • Pull request review event: 41
  • Pull request review comment event: 32
  • Pull request event: 34
  • Fork event: 1
  • Create event: 19

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 19
  • Average time to close issues: 15 days
  • Average time to close pull requests: 4 days
  • Total issue authors: 1
  • Total pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.11
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 19
  • Average time to close issues: 15 days
  • Average time to close pull requests: 4 days
  • Issue authors: 1
  • Pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.11
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • giuliosmall (1)
Pull Request Authors
  • gkaretka (13)
  • trimitris (5)
  • mrshu (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 21 last month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
pypi.org: ragthoven
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 21 last month
Rankings
Dependent packages count: 9.3%
Average: 30.9%
Dependent repos count: 52.5%
Maintainers (1)
Last synced: 6 months ago