rag_fundamentals

Retrieval-Augmented Generation (RAG) Fundamentals and Semantic Chunking

https://github.com/dcirne/rag_fundamentals

Science Score: 36.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file (found)
  • DOI references
  • Academic publication links (links to: arxiv.org)
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low similarity, 13.5%, to scientific vocabulary)

Keywords

artificial-intelligence machine-learning rag semantic-chunking
Last synced: 6 months ago

Repository

Retrieval-Augmented Generation (RAG) Fundamentals and Semantic Chunking

Basic Info
  • Host: GitHub
  • Owner: dcirne
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 27.3 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Topics
artificial-intelligence machine-learning rag semantic-chunking
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

RAG Fundamentals and Semantic Chunking

The material in this repository was initially prepared for a lecture I gave at The 2024 IARIA Annual Congress on Frontiers in Science, Technology, Services, and Applications, on the topics of Retrieval-Augmented Generation (RAG) and Semantic Chunking.

RAG is a technique used to optimize the output of a Large Language Model (LLM). The expectation is that In-Context Learning (ICL) takes place, leading the LLM to produce better results.

RAG can be more effective when semantic chunking is used. The basic idea is to retrieve and compile small "chunks" of data to augment the prompt sent to an LLM, rather than inserting entire documents that cover the topic of interest but also contain information irrelevant to the user interaction. For example, imagine a book about how to assemble a computer. It contains sections about CPUs, motherboards, displays, and so on. Now suppose you have a question about how to install a hard drive. Would you read the section about keyboards, or would you go straight to the one about hard drives?

The same idea is applicable to LLMs. In addition, the context window is limited, so use the available space wisely.
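To make the retrieval step concrete, here is a minimal sketch (not the notebook's actual code) using chromadb, which this repository pins in requirements.txt; the collection name, chunk texts, and question are hypothetical:

```python
import chromadb

# In-memory vector store; chromadb's default embedding function is used.
client = chromadb.Client()
collection = client.create_collection(name="assembly_book")

# Store small, topic-focused chunks instead of whole documents.
collection.add(
    ids=["cpu-1", "hdd-1", "kbd-1"],
    documents=[
        "Seat the CPU in the socket, matching the alignment triangle.",
        "Mount the hard drive in the 3.5-inch bay and connect the SATA cable.",
        "Plug the keyboard into any free USB port.",
    ],
)

# Retrieve only the chunks relevant to the question...
question = "How do I install a hard drive?"
results = collection.query(query_texts=[question], n_results=1)

# ...and splice them into the prompt sent to the LLM.
context = "\n".join(results["documents"][0])
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Only the hard-drive chunk ends up in the prompt, leaving the rest of the context window free.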

One challenge that emerges from semantic chunking is determining the optimal chunk size. Chunks that are too large produce embeddings that try to encode too many meanings at once; chunks that are too small cannot express enough meaning.
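One common way to implement semantic chunking (an illustration, not necessarily the notebook's approach) is to embed consecutive sentences and start a new chunk wherever the similarity between neighbors drops, letting the data rather than a fixed size decide the boundaries. The 0.75 threshold below is an arbitrary placeholder, and numpy is assumed even though it is not pinned in requirements.txt:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    # text-embedding-3-small is the embedding model named in the prerequisites.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [np.array(item.embedding) for item in response.data]

def semantic_chunks(sentences, threshold=0.75):
    """Group consecutive sentences into chunks; start a new chunk when the
    cosine similarity between neighboring sentences falls below `threshold`."""
    vectors = embed(sentences)
    chunks, current = [], [sentences[0]]
    for prev, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        cosine = prev @ vec / (np.linalg.norm(prev) * np.linalg.norm(vec))
        if cosine < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

Raising the threshold yields smaller, more focused chunks; lowering it yields larger ones, which is exactly the trade-off described above.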

Context for RAG and semantic chunking comes from the paper "Fostering Trust and Quantifying Value of AI and ML," of which I am an author and which I presented at the same conference. The PDF was converted to plain text using pdftotext.
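For reference, that conversion is a one-liner; the file names here are illustrative:

```bash
# pdftotext ships with poppler-utils
pdftotext paper.pdf paper.txt
```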

.........

Context window: the number of tokens a model can receive as input. Its capacity influences how much information can be leveraged to run inferences.
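A quick way to see how much of the context window a chunk consumes is to count its tokens. This sketch assumes the tiktoken library, which is not pinned in requirements.txt and would need to be installed separately:

```python
import tiktoken  # assumption: extra dependency, not in requirements.txt

# Tokenizer used by gpt-3.5-turbo.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
chunk = "Mount the hard drive in the 3.5-inch bay and connect the SATA cable."
print(len(encoding.encode(chunk)), "tokens")
```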

Doing it yourself

This repository contains all the files you need to experiment with RAG and semantic chunking. Everything is siloed in a Docker image, so you can run the code without messing up any configuration on your computer.

There are a few prerequisites and assumptions:

  • You have either Docker or Podman installed, configured, and running
  • You have an OpenAI developer account and an API key for your project
    • Make sure you have access to the text-embedding-3-small and gpt-3.5-turbo models (a quick sanity check is sketched after this list)
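The sanity check below is a sketch, not part of the repository's code; it raises an error if the key is invalid or a model is unavailable:

```python
from openai import OpenAI

client = OpenAI()  # uses the OPENAI_API_KEY environment variable

# Verify the key works and both required models are accessible.
for model in ("text-embedding-3-small", "gpt-3.5-turbo"):
    print(client.models.retrieve(model).id)
```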

Now that you're ready, follow these steps to access and run the code:

Build the Docker image

```bash
docker build -f Dockerfile -t rag_fundamentals --rm .
```

Export the OpenAI API Key to an environment variable

```bash
export OPENAI_API_KEY="your-openai-api-key"
```

Run the Docker image

The Docker container mounts your local copy of this project as a shared volume inside the container.

```bash
docker run --rm -p 8024:8024 -e OPENAI_API_KEY="$OPENAI_API_KEY" -v "$(pwd)":/workspace rag_fundamentals
```
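If you are using Podman instead, its CLI is broadly compatible with Docker's, so the equivalent command should be (an assumption, not tested against this repository):

```bash
podman run --rm -p 8024:8024 -e OPENAI_API_KEY="$OPENAI_API_KEY" -v "$(pwd)":/workspace rag_fundamentals
```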

Access the Jupyter Notebook

The Jupyter Server requires an authentication token for access. Once the container is running, you will see a log message in the terminal similar to the one shown below. Copy the URL and paste it into a browser.

```
http://127.0.0.1:8024/tree?token=5f0ccbf63ee6dc8151240fae2828d94e3ebf21d892cd6822
```

Once you can access the Jupyter Server, double-click rag_fundamentals.ipynb to launch it, and follow the documentation and instructions in the notebook.

Owner

  • Name: Dalmo Cirne
  • Login: dcirne
  • Kind: user
  • Location: Boulder, Colorado

Accomplished, results-driven professional with over ten years of experience leading engineering teams in building the most innovative products.


Dependencies

Dockerfile (docker)
  • python 3.11-bookworm (build)
requirements.txt (pypi)
  • chromadb ==0.5.0
  • datasets ==2.19.1
  • jupyter ==1.0.0
  • openai ==1.30.5