trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
vidore-benchmark
Vision Document Retrieval (ViDoRe) benchmark: evaluation code for the ColPali paper.
flashrank
Lite & super-fast re-ranking for your search & retrieval pipelines. Supports SoTA listwise and pairwise reranking based on LLMs, cross-encoders, and more. Created by Prithivi Da, open to PRs & collaborations.
txtai
💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
llama_index
LlamaIndex is the leading framework for building LLM-powered agents over your data.
farm-haystack
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
deepsearch-toolkit
Interact with the Deep Search platform for new knowledge explorations and discoveries
@llm-tools/embedjs
A Node.js RAG framework to easily work with LLMs and embeddings
https://github.com/khoj-ai/khoj
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
ragoon
High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡
odin-slides
An advanced Python tool for drafting customizable PowerPoint slides with the Generative Pre-trained Transformer (GPT) model of your choice. Using large language models (LLMs), odin-slides turns even lengthy Word documents into well-organized presentations.
https://github.com/csinva/interpretable-embeddings
Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024)
odinrunes
Odin Runes, a Java-based GPT client, lets you interact with your preferred GPT model directly from your favorite text editor. It also supports prompt engineering by extracting context from diverse sources using technologies such as OCR, improving productivity and reducing costs.
https://github.com/apache/hamilton
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
odin-tabs
Odin Tabs is a browser extension that lets you navigate your browser tabs using speech recognition and the large language model (LLM) of your choice.
https://github.com/superduper-io/superduper
Superduper: End-to-end framework for building custom AI applications and agents.
tax-retrieval-benchmark
An implementation of the TaxRetrievalBenchmark task for the 🤗 Massive Text Embedding Benchmark (MTEB) framework.
https://github.com/epsilla-cloud/vectordb
Epsilla is a high-performance vector database management system.
https://github.com/adithya-s-k/varag
Vision-Augmented Retrieval and Generation (VARAG) - Vision-first RAG Engine
https://github.com/adithya-s-k/rag-saas
⚡ Ship RAG solutions quickly and effortlessly
https://github.com/daniel-furman/chat-all-in
RAG chatbot built on top of trending M&A news.
fed-rag
A framework for fine-tuning retrieval-augmented generation (RAG) systems.
https://github.com/deepset-ai/haystack-cookbook
👩🏻🍳 A collection of example notebooks using Haystack
https://github.com/deepset-ai/haystack-rag-app
An example of a RAG backend plus UI
quick-start-creating-a-vector-database-for-rag
A playground for vector database exploration using Chroma
rag-evaluation-harnesses
An evaluation suite for Retrieval-Augmented Generation (RAG).
https://github.com/amazon-science/memerag
MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation
ragthoven
RAGthoven, a Retrieval Augmented Generation Toolkit that helps you easily set up and execute your RAG experiments
https://github.com/chirayu-tripathi/mongodb-querifier
Improving LLMs' MongoDB query generation with advanced retrieval-augmented generation.
geospatial-rag
AI Framework for Remote Sensing Image Analysis using RAG - 88%+ accuracy, multi-modal queries, ChatGPT-like interface
promptfoo
Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
rankify
🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7+ retrieval techniques, 24+ state-of-the-art Reranking models, and multiple RAG methods.
rage
RagE (RAG Engine) - A tool for building and training the components of a Retrieval-Augmented Generation (RAG) model. It also supports rapid development of Q&A systems and chatbots based on RAG.
janus-llm
Leveraging LLMs for modernization through intelligent chunking, iterative prompting and reflection, and retrieval augmented generation (RAG).
rag_fundamentals
Retrieval-Augmented Generation (RAG) Fundamentals and Semantic Chunking
https://github.com/amberlee2427/nancy-brain
Nancy's RAG backend and HTTP API/MCP server connectors.
rag-constitucion-chile
Platform to compare Chile's current constitution with its new proposed constitution using LLMs.
smyth-docs
Everything you need to build, deploy, and collaborate with agents. Ride the llama, avoid the drama.
https://github.com/svilupp/aihelpme.jl
Harnessing Julia's Rich Documentation for Tailored AI-Assisted Coding Guidance
wovensnips
WovenSnips: A Lightweight, Free, and Open-source Implementation of Retrieval-Augmented Generation (RAG) using Straico API