quick-start-creating-a-vector-database-for-rag

A playground for vector database exploration using Chroma

https://github.com/joshuapowell/quick-start-creating-a-vector-database-for-rag

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (14.9%) to scientific vocabulary

Keywords

llm rag vector-database

Last synced: 6 months ago

Repository


Basic Info

Host: GitHub
Owner: joshuapowell
License: apache-2.0
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 43.9 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Topics

llm rag vector-database

Created 8 months ago · Last pushed 8 months ago

Metadata Files

Readme License Citation

Quick Start for Creating a Vector Database for LLM with RAG

I'm working to learn more about how I can use large language models (LLMs) with retrieval-augmented generation (RAG) to improve existing processes and work pipelines by offloading repetitive tasks. Part of expanding my knowledge of LLMs and RAG is being able to generate a body of knowledge that is specific to the process or work pipeline I'm targeting at the time.

To help me focus on getting to a working RAG setup without being swallowed by the vast amount of information on the topic, or by alternatives such as knowledge graphs, I've decided to begin my exploration using Chroma. Chroma is an open-source (Apache 2.0) vector database that, from what I can tell, provides a simple Python SDK and CLI that I can use to build out my knowledge base.

Prerequisites

Python 3.13.2 or later
ChromaDB 1.0.15 or later
ipykernel 6.29.5 or later
Poetry
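The Python and package prerequisites listed above can be checked from Python before opening a notebook. This is a minimal sketch that uses only the standard library (`importlib.metadata`), with the minimums copied from the list. The helper names are hypothetical. Version parsing is deliberately simple: it compares only the leading numeric release segment, so `1.0.15rc1` is treated as `1.0.15`. The `packaging` library's version handling would be the more robust choice. Poetry is not checked because it is normally installed as a separate CLI tool rather than inside the project environment.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimums copied from the prerequisites list.
MIN_PYTHON = (3, 13, 2)
MIN_PACKAGES = {"chromadb": "1.0.15", "ipykernel": "6.29.5"}


def parse_version(text: str) -> tuple[int, ...]:
    """Keep only the leading numeric release segment, e.g. '1.0.15rc1' -> (1, 0, 15)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def check_prerequisites() -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        wanted = ".".join(map(str, MIN_PYTHON))
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")
    for name, minimum in MIN_PACKAGES.items():
        try:
            installed = version(name)  # Reads the installed distribution's metadata.
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{name} {minimum}+ required, found {installed}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites satisfied.")
```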

Using

Launch one of the notebooks in the notebooks directory in your preferred Jupyter Notebook environment.

Explore how to use a vector database: exploring-chromadb.ipynb
Explore how to ETL PDF data into a vector database: exploring-data-extraction-from-pdf.ipynb
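A common step when loading PDF content into a vector database is to split the extracted text into overlapping chunks. Each chunk then gets a stable id and some metadata. The sketch below assumes the text has already been extracted page by page; the extraction itself is not shown, and the notebook may use a different approach. `chunk_text`, `prepare_records`, the id format, and the default sizes are hypothetical choices for illustration, measured in characters rather than tokens.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reaches the end of the text.
    return chunks


def prepare_records(
    source: str, pages: list[str], chunk_size: int = 500, overlap: int = 50
) -> dict[str, list]:
    """Turn per-page text into parallel ids/documents/metadatas lists."""
    ids: list[str] = []
    documents: list[str] = []
    metadatas: list[dict] = []
    for page_number, page_text in enumerate(pages, start=1):
        if not page_text.strip():
            continue  # Skip blank pages (e.g. scanned images with no text layer).
        for index, chunk in enumerate(chunk_text(page_text, chunk_size, overlap)):
            ids.append(f"{source}-p{page_number}-c{index}")
            documents.append(chunk)
            metadatas.append({"source": source, "page": page_number, "chunk": index})
    return {"ids": ids, "documents": documents, "metadatas": metadatas}
```

Because a Chroma collection's `add` method accepts `ids`, `documents`, and `metadatas` keyword arguments, a dictionary shaped like this should be passable as `collection.add(**records)`. The collection's embedding function then embeds each chunk. The overlap helps a sentence that straddles a chunk boundary remain retrievable from at least one chunk.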

Development

Virtual Environment via `pyenv`

The project

Install the required version of Python for this project (currently >=3.13).
Create a new virtual environment for this project using pyenv:

pyenv virtualenv <PYTHON_VERSION> audit_webpage_metadata

Activate the virtual environment:

pyenv activate audit_webpage_metadata
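Because the notebooks run in a Jupyter kernel, it can be worth confirming that the kernel actually uses the activated environment rather than a system Python. A minimal check compares `sys.prefix` with `sys.base_prefix`, which differ inside environments created by `venv` or `virtualenv`. This sketch assumes the pyenv-virtualenv environment is built on one of those; `in_virtualenv` is a hypothetical helper name. Run it in a notebook cell or from the activated shell.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter's prefix differs from its base installation."""
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```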

Install Dependencies via `poetry`

The project is managed using Poetry, a Python packaging and dependency manager. More information can be found on the official Poetry project website.

Install the project's dependencies (`--no-root` tells Poetry to skip installing the project itself):

poetry install --no-root

Disclaimer

The content, including but not limited to code, text, images, audio, and/or video, hereafter referred to as "content", in this document is provided for informational and educational purposes only. TO THE EXTENT PERMITTED BY APPLICABLE LAW, THE AUTHOR PROVIDES THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. In no event shall the author or their employer be liable for any claim, damages or other liability, direct or indirect, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the code and content or the use or other dealings in the code and content. Use this code and all other content at your own risk.

Third-party API Disclaimer: Additionally, the code examples in this post may interact with third-party APIs and services. The availability and functionality of these APIs are subject to change without notice. The author is not responsible for any issues arising from changes to these APIs or any downtime or limitations imposed by the service providers. You are responsible for complying with the terms of service and usage policies of any third-party APIs you use in conjunction with this code. Use this code at your own risk, and be aware of potential security implications when connecting to external services.

Product Link Disclaimer: This blog post may contain links to products or services available for purchase. These links are provided to offer readers additional information and resources. The author's opinions expressed in this post are independent and not influenced by any potential commercial relationships. No compensation is received for including these links, and their presence does not constitute an endorsement. Readers are encouraged to conduct their own research before making any purchasing decisions.

Copyright

Owner

Name: Joshua Powell
Login: joshuapowell
Kind: user
Location: Pittsburgh, PA
Company: @broadcom

Website: https://www.joshuapowell.io/
Twitter: joshuapowell_io
Repositories: 3
Profile: https://github.com/joshuapowell

Researcher and engineer with deep expertise developing data products

Citation (CITATION.cff)

cff-version: 1.2.0
message: "Powell, Joshua I. (2025, July 3). Exploring vector database usage as it applies to LLM and RAG. United States."
authors:
- family-names: "Powell"
  given-names: "Joshua"
  orcid: "https://orcid.org/0000-0002-0894-2399"
title: "Exploring vector database usage as it applies to LLM and RAG"
version: 1.0.0
doi: "00.0000/00000000.0000.0000000"
date-released: 2025-07-03
url: "https://github.com/joshuapowell/quick-start-creating-a-vector-database-for-RAG"

GitHub Events

Total

Push event: 9

Last Year

Push event: 9

Dependencies

pyproject.toml pypi

chromadb (>=1.0.15,<2.0.0)
ipykernel (>=6.29.5,<7.0.0)
