quick-start-creating-a-vector-database-for-rag
A playground for vector database exploration using Chroma
https://github.com/joshuapowell/quick-start-creating-a-vector-database-for-rag
Repository
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Quick Start for Creating a Vector Database for LLM with RAG
I'm working to learn more about how I can use large language models (LLMs) with retrieval-augmented generation (RAG) to improve existing processes and work pipelines by offloading repetitive tasks. Part of expanding my knowledge of LLMs and RAG is being able to generate a body of knowledge that is specific to the process or work pipeline I'm targeting at the time.
To help me focus on getting to a working RAG setup without being swallowed by the vast amount of information on the topic, or by alternatives like knowledge graphs, I've decided to begin my exploration with Chroma. Chroma is an open-source (Apache 2.0) vector database that, from what I can tell, provides a simple Python SDK and CLI that I can use to build out my knowledge base.
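To make the idea of a vector database concrete, here is a minimal sketch of the core operation: store documents alongside embedding vectors, then return the documents whose vectors are most similar to a query vector. This is not Chroma's API. `ToyVectorStore`, its methods, and the hand-made 3-dimensional embeddings are all hypothetical, and a real setup would compute embeddings with an embedding model and use an indexed search rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store for illustration; not Chroma's API."""

    def __init__(self):
        self._records = []  # list of (id, embedding, document)

    def add(self, doc_id, embedding, document):
        self._records.append((doc_id, list(embedding), document))

    def query(self, embedding, n_results=1):
        # Score every stored record against the query, highest similarity first.
        scored = [
            (cosine_similarity(embedding, emb), doc_id, document)
            for doc_id, emb, document in self._records
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(doc_id, document) for _, doc_id, document in scored[:n_results]]


store = ToyVectorStore()
# Hand-made embeddings (an assumption); real ones come from an embedding model.
store.add("doc-1", [1.0, 0.0, 0.0], "Invoice approval steps")
store.add("doc-2", [0.0, 1.0, 0.0], "Onboarding checklist")
store.add("doc-3", [0.9, 0.1, 0.0], "Invoice escalation policy")

results = store.query([1.0, 0.05, 0.0], n_results=2)
print(results)
```

In a RAG pipeline, the documents returned by a query like this are what get added to the LLM prompt as context.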
Prerequisites
- Python 3.13.2 or later
- ChromaDB 1.0.15 or later
- ipykernel 6.29.5 or later
- Poetry
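One way to confirm that an environment meets the minimum versions listed above is a small check script. This is a sketch using only the standard library. The helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the version parsing is deliberately naive: it compares numeric components only and ignores pre-release tags.

```python
import sys
from importlib import metadata

# Minimum versions taken from the prerequisites list.
MINIMUMS = {"chromadb": "1.0.15", "ipykernel": "6.29.5"}
MINIMUM_PYTHON = "3.13.2"


def parse_version(text):
    """Turn '1.0.15' into (1, 0, 15), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Return a dict mapping each requirement to True or False."""
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    report = {"python": meets_minimum(python_version, MINIMUM_PYTHON)}
    for package, minimum in MINIMUMS.items():
        try:
            report[package] = meets_minimum(metadata.version(package), minimum)
        except metadata.PackageNotFoundError:
            report[package] = False  # package is not installed
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

Poetry is not checked here because it is a command-line tool rather than a dependency imported by the notebooks.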
Using
Launch one of the notebooks in the notebooks directory in your preferred Jupyter Notebook environment.
- Explore how to use a vector database: exploring-chromadb.ipynb
- Explore how to ETL PDF data into a vector database: exploring-data-extraction-from-pdf.ipynb
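The "transform" step of a PDF ETL flow usually means splitting extracted text into chunks small enough to embed and retrieve individually. The sketch below is not taken from the notebooks. It assumes the text has already been extracted from the PDF, and the function names, the word-based chunk sizes, and the record layout (`id`, `document`, `metadata`) are hypothetical choices for illustration.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of up to chunk_size words, where consecutive
    chunks share `overlap` words so context is not cut at a hard boundary."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


def to_records(source_name, chunks):
    """Pair each chunk with a stable id and metadata, ready to load into a store."""
    return [
        {
            "id": f"{source_name}-{index}",
            "document": chunk,
            "metadata": {"source": source_name, "chunk": index},
        }
        for index, chunk in enumerate(chunks)
    ]


# Stand-in for text extracted from a PDF (hypothetical content).
sample_text = " ".join(str(n) for n in range(10))
sample_chunks = chunk_text(sample_text, chunk_size=4, overlap=1)
sample_records = to_records("report.pdf", sample_chunks)
print(sample_chunks)
```

Overlap trades some duplicated storage for a better chance that a sentence split across a boundary still appears whole in at least one chunk.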
Development
Virtual Environment via pyenv
The project
- Install the required version of Python for this project, currently at >=3.13.
- Create a new virtual environment for this project using pyenv
pyenv virtualenv <PYTHON_VERSION> audit_webpage_metadata
- Activate the virtual environment
pyenv activate audit_webpage_metadata
Install Dependencies via poetry
The project is managed using Poetry, a Python packaging and dependency manager. More information can be found on the official Poetry project website.
- Install the package with dependencies
poetry install --no-root
Disclaimer
The content, including but not limited to code, text, images, audio, and/or video, hereafter referred to as "content", in this document is provided for informational and educational purposes only. TO THE EXTENT PERMITTED BY APPLICABLE LAW, THE AUTHOR PROVIDES THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. In no event shall the author or their employer be liable for any claim, damages or other liability, direct or indirect, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the code and content or the use or other dealings in the code and content. Use this code and all other content at your own risk.
Third-party API Disclaimer: Additionally, the code examples in this post may interact with third-party APIs and services. The availability and functionality of these APIs are subject to change without notice. The author is not responsible for any issues arising from changes to these APIs or any downtime or limitations imposed by the service providers. You are responsible for complying with the terms of service and usage policies of any third-party APIs you use in conjunction with this code. Use this code at your own risk, and be aware of potential security implications when connecting to external services.
Product Link Disclaimer: This blog post may contain links to products or services available for purchase. These links are provided to offer readers additional information and resources. The author's opinions expressed in this post are independent and not influenced by any potential commercial relationships. No compensation is received for including these links, and their presence does not constitute an endorsement. Readers are encouraged to conduct their own research before making any purchasing decisions.
Copyright
Copyright © 2025 J.I. Powell. All rights reserved.
Owner
- Name: Joshua Powell
- Login: joshuapowell
- Kind: user
- Location: Pittsburgh, PA
- Company: @broadcom
- Website: https://www.joshuapowell.io/
- Twitter: joshuapowell_io
- Repositories: 3
- Profile: https://github.com/joshuapowell
Researcher and engineer with deep expertise developing data products
Citation (CITATION.cff)
cff-version: 1.2.0
message: "Powell, Joshua I. (2025, July 3). Exploring vector database usage as it applies to LLM and RAG. United States."
authors:
  - family-names: "Powell"
    given-names: "Joshua"
    orcid: "https://orcid.org/0000-0002-0894-2399"
title: "Exploring vector database usage as it applies to LLM and RAG"
version: 1.0.0
doi: "00.0000/00000000.0000.0000000"
date-released: 2025-07-03
url: "https://github.com/joshuapowell/quick-start-creating-a-vector-database-for-RAG"
GitHub Events
Total
- Push event: 9
Last Year
- Push event: 9
Dependencies
- chromadb (>=1.0.15,<2.0.0)
- ipykernel (>=6.29.5,<7.0.0)