Projects | Open Source Science

Updated 10 months ago

trafilatura • Rank 26.3 • Science 77%

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

article-extractor corpus-builder corpus-tools crawler html-to-markdown html2text llm news-aggregator news-crawler nlp rag readability rss-feed scraping tei text-cleaning text-extraction text-mining text-preprocessing web-scraping

Updated 10 months ago

paper-qa • Rank 22.0 • Science 77%

High accuracy RAG for answering questions from scientific documents with citations

ai rag science search

Updated 10 months ago

vidore-benchmark • Rank 14.9 • Science 77%

Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.

colpali rag retrieval search vision-language-model

Updated 10 months ago

flashrank • Rank 20.1 • Science 67%

Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cross-encoders and more. Created by Prithivi Da, open for PRs & Collaborations.

cross-encoder full-text-search hybrid-search lexical-search rag ranking reranking retrieval-augmented-generation semantic-search vector-database vector-search

Updated 10 months ago

txtai • Rank 22.2 • Science 64%

💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows

ai artificial-intelligence embeddings information-retrieval language-model large-language-models llm machine-learning nlp python rag retrieval-augmented-generation search search-engine semantic-search sentence-embeddings transformers txtai vector-database vector-search

Updated 10 months ago

llama_index • Rank 33.9 • Science 49%

LlamaIndex is the leading framework for building LLM-powered agents over your data.

agents application data fine-tuning framework llamaindex llm multi-agents rag vector-database

Engineering (40%)

Updated 10 months ago

farm-haystack • Rank 28.7 • Science 54%

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

agent agents ai gemini generative-ai gpt-4 information-retrieval large-language-models llm machine-learning nlp orchestration python pytorch question-answering rag retrieval-augmented-generation semantic-search summarization transformers

Updated 10 months ago

flexrag • Rank 12.8 • Science 67%

FlexRAG: A RAG Framework for Information Retrieval and Generation.

llms nlp rag

Updated 10 months ago

rdocdump • Rank 8.1 • Science 67%

rdocdump: Dump ‘R’ Package Source, Documentation, and Vignettes into One File

llm r-package rag text

Updated 10 months ago

deepsearch-toolkit • Rank 16.5 • Science 54%

Interact with the Deep Search platform for new knowledge explorations and discoveries

accelerated-discovery deepsearch knowledge-extraction knowledge-graph nlp pdf-converter python rag semantic-retrieval

Updated 10 months ago

beyondllm • Rank 12.5 • Science 54%

Build, evaluate and observe LLM apps

ai artificial-intelligence embeddings evaluate-llm genai generative-ai hacktoberfest hacktoberfest-accepted hacktoberfest2024 large-language-models llm llms rag

Updated 10 months ago

@llm-tools/embedjs • Rank 17.7 • Science 44%

A NodeJS RAG framework to easily work with LLMs and embeddings

ai chatgpt claude cohere embedding embeddings gpt gpt-4 gpt-4o huggingface large-language-models llm mistral ollama openai pinecone rag vector-database vertex-ai

Updated 9 months ago

https://github.com/khoj-ai/khoj • Rank 24.1 • Science 36%

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

agent ai assistant chat chatgpt emacs image-generation llama3 llamacpp llm obsidian obsidian-md offline-llm productivity rag research self-hosted semantic-search stt whatsapp-ai

Updated 9 months ago

vearch • Rank 18.2 • Science 36%

Distributed vector search for AI-native applications

ai-native ai-native-database cloud-native document-retrieval embeddings hybrid-search rag retrieval-augmented-generation vector-database vector-search vectors

Updated 10 months ago

ragoon • Rank 9.3 • Science 44%

High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡

ai embeddings embeddings-similarity faiss generative-ai groq groqapi llama llama-index llm nlp rag retrieval-augmented-generation vector-database vector-search vectorization

Updated 10 months ago

odin-slides • Rank 7.8 • Science 44%

This is an advanced Python tool that empowers you to effortlessly draft customizable PowerPoint slides using the Generative Pre-trained Transformer (GPT) of your choice. Leveraging the capabilities of Large Language Models (LLM), odin-slides enables you to turn the lengthiest Word documents into well organized presentations.

ai-assistant chatgpt-api generative-ai hacktoberfest large-language-models machine-learning natural-language-processing openai-api portfolio powerpoint powerpoint-automation pptx presentation-tools productivity-tool productivity-tools prompt-engineering rag slide-generator slides writing-tool

Updated 9 months ago

https://github.com/csinva/interpretable-embeddings • Rank 4.7 • Science 46%

Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024)

ai artificial-intelligence embeddings encoding-models explainability fmri huggingface language-model llm neural-network neuroscience rag retrieval-augmented-generation transformer xai

Updated 9 months ago

porag • Rank 4.9 • Science 44%

Fully Configurable RAG Pipeline for Bengali Language RAG Applications. Supports both Local and Huggingface Models, Built with Langchain.

ai bengali bengali-nlp chromadb langchain llama3 llm nlp rag transformers

Updated 10 months ago

odinrunes • Rank 4.5 • Science 44%

Odin Runes, a Java-based GPT client, lets you interact with your preferred GPT model directly from your favorite text editor. It also supports prompt engineering by extracting context from diverse sources using technologies such as OCR, improving productivity and reducing costs.

ai-assistant chatbot-ui chatgpt-app code-assistant custom-gpt gpt-client hacktoberfest llama3 natural-language-processing ollama ollama-client ollama-gui ollama-interface ollama-ui productivity productivity-tool prompt-engineering prompt-toolkit rag writing-tool

Updated 9 months ago

https://github.com/apache/hamilton • Rank 12.1 • Science 36%

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering hacktoberfest lineage llmops machine-learning mlops orchestration pandas python rag software-engineering

Updated 10 months ago

odin-tabs • Rank 3.0 • Science 44%

The Odin Tabs extension is a browser extension that allows you to navigate through your browser tabs using speech recognition and the Large Language Model (LLM) of your choice.

accessibility artificial-intelligence assistive-technology chatgpt-api chrome-extension interaction-design large-language-models machine-learning natural-language-processing openai-api portfolio productivity-tools rag speech-to-text tab-management tab-navigation ui user-interface web-accessibility web-automation

Updated 9 months ago

https://github.com/superduper-io/superduper • Rank 20.6 • Science 26%

Superduper: End-to-end framework for building custom AI applications and agents.

ai chatbot data database distributed-ml inference llm-inference llm-serving llmops ml mlops mongodb pretrained-models python pytorch rag semantic-search torch transformers vector-search

Updated 10 months ago

tax-retrieval-benchmark • Rank 0.7 • Science 44%

An implementation of the TaxRetrievalBenchmark task for the 🤗 Massive Text Embedding Benchmark (MTEB) framework.

benchmark droit embeddings fiscal fiscalite information-retrieval mteb rag retrieval retrieval-augmented-generation sbert semantic-search sentence-embeddings sentence-transformers stp tax taxation

Updated 9 months ago

https://github.com/epsilla-cloud/vectordb • Rank 16.8 • Science 26%

Epsilla is a high performance Vector Database Management System

ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search

Updated 10 months ago

ragxplorer • Rank 13.3 • Science 26%

Open-source tool to visualise your RAG 🔮

interactive llm python rag streamlit visualization

Updated 9 months ago

https://github.com/casperdcl/brace • Rank 1.1 • Science 36%

📖 BRACE: Bible retrieval-augmented (Catholic edition)

ai bible llm pages rag study

Updated 9 months ago

https://github.com/adithya-s-k/varag • Rank 7.6 • Science 23%

Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine

colpali multimodal-retrieval rag

Updated 9 months ago

https://github.com/adithya-s-k/rag-saas • Rank 4.9 • Science 13%

⚡ Ship RAG solutions quickly and effortlessly

ai-saas arize-phoenix llamaindex mongodb qdrant rag saas saas-boilerplate

Updated 9 months ago

https://github.com/daniel-furman/chat-all-in • Rank 1.4 • Science 13%

RAG chatbot built on top of trending M&A news.

chatbot in-context-learning openai rag retrieval-augmented-generation

Updated 8 months ago

https://github.com/deepset-ai/haystack-rag-app • Science 26%

An example of a RAG backend plus UI

generative-ai haystack-ai llm python rag

Updated 10 months ago

scrapegraph-ai • Science 54%

Python scraper based on AI

ai ai-scraping automated-scraper crawler html-to-markdown llm markdown rag scraping scraping-python web-crawler web-crawlers web-scraping

Updated 10 months ago

ColBERT • Science 41%

Efficient late-interaction retrieval systems in Julia!

colbert gen-ai information-retrieval llms machine-learning rag

Updated 10 months ago

geospatial-rag • Science 26%

AI Framework for Remote Sensing Image Analysis using RAG - 88%+ accuracy, multi-modal queries, ChatGPT-like interface

academic-research clip computer-vision earth-observation embeddings geospatial langchain machine-learning multimodal-ai pytorch rag remote-sensing

Updated 10 months ago

parteipapagei • Science 44%

bundestagswahl elections llm rag streamlit

Updated 10 months ago

quick-start-creating-a-vector-database-for-rag • Science 44%

A playground for vector database exploration using Chroma

llm rag vector-database

Updated 10 months ago

wovensnips • Science 44%

WovenSnips: A Lightweight, Free, and Open-source Implementation of Retrieval-Augmented Generation (RAG) using Straico API

api corpus csv markdown pdf rag retrieval-augmented-generation straico txt

Updated 10 months ago

rage • Science 26%

RagE (RAG Engine) - A tool supporting the construction and training of components of the Retrieval-Augmented-Generation (RAG) model. It also facilitates the rapid development of Q&A systems and chatbots following the RAG model.

chatbot-framework embedding llm nlp qa-system rag ranker retrieval-augmented-generation vietnamese-nlp

Updated 10 months ago

ragthoven • Science 26%

RAGthoven, a Retrieval Augmented Generation Toolkit that helps you easily set up and execute your RAG experiments

experiment llm rag

Updated 10 months ago

sre • Science 44%

The Operating System for Agents

agent-framework agents agi ai ai-agents artificial-intelligence autogpt autonomous-agents chatgpt langchain llm llmops mcp multi-agent multi-agent-systems n8n openai orchestration rag retrieval-augmented-generation

Updated 9 months ago

https://github.com/amazon-science/memerag • Science 36%

MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation

benchmark evaluation rag

Updated 9 months ago

https://github.com/akaiko1/langchain_examples • Science 26%

Practical, minimal examples for building with LangChain

langchain llm llms ollama rag

Updated 9 months ago

https://github.com/chirayu-tripathi/mongodb-querifier • Science 13%

Improving LLMs' MongoDB query generation capability with the help of advanced retrieval-augmented generation.

database llms mongodb rag retrieval retrieval-augmented-generation

Updated 10 months ago

janus-llm • Science 44%

Leveraging LLMs for modernization through intelligent chunking, iterative prompting and reflection, and retrieval augmented generation (RAG).

chroma chromadb cli langchain llm modernization python rag tree-sitter

Updated 10 months ago

fed-rag • Science 75%

A framework for fine-tuning retrieval-augmented generation (RAG) systems.

deep-learning federated-learning llms machine-learning rag

Updated 8 months ago

https://github.com/deepset-ai/haystack-cookbook • Science 26%

👩🏻‍🍳 A collection of example notebooks using Haystack

agentic agentic-ai agents ai ai-tools genai genai-usecases haystack-ai python rag

Updated 9 months ago

promptfoo • Science 26%

Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

ci ci-cd cicd evaluation evaluation-framework llm llm-eval llm-evaluation llm-evaluation-framework llmops pentesting prompt-engineering prompt-testing prompts rag red-teaming testing vulnerability-scanners

Updated 9 months ago

https://github.com/buaadreamer/easyrag • Science 10%

Easy-to-Use RAG Framework; CCF AIOps International Challenge 2024 Top 3 Solution (third place)

aiops bge bm25 ccf glm4 gte hyde llm minicpm network-operations qa rag retrieval retrieval-augmented-generation rrf

Updated 9 months ago

https://github.com/cuc-zihang-liu/text-based-rag-framework • Science 26%

Text-based RAG framework (combining multiple encoders)

chatglm embeddings rag text-to-speech

Updated 9 months ago

https://github.com/svilupp/aihelpme.jl • Science 13%

Harnessing Julia's Rich Documentation for Tailored AI-Assisted Coding Guidance

generative-ai julia rag

Updated 10 months ago

rag_fundamentals • Science 36%

Retrieval-Augmented Generation (RAG) Fundamentals and Semantic Chunking

artificial-intelligence machine-learning rag semantic-chunking

Updated 10 months ago

rag-evaluation-harnesses • Science 54%

An evaluation suite for Retrieval-Augmented Generation (RAG).

evaluation lm-evaluation rag retrieval-augmented-generation

Updated 9 months ago

https://github.com/amberlee2427/nancy-brain • Science 36%

Nancy's RAG backend and HTTP API/MCP server connectors.

embeddings http mcp mcp-server python rag rag-chatbot sql

Updated 10 months ago

smyth-docs • Science 26%

Everything you need to build, deploy, and collaborate with agents. Ride the llama, avoid the drama.

a2a agent-builder agent-building agent-framework agentic-ai ai ai-agents llmops mcp rag smyth smythos sre vibe-coding

Updated 9 months ago

https://github.com/csiro/stdm • Science 26%

Self Thinking Data Manifest

ai chatgpt claude llm prompt-engineering rag science-communication

Updated 10 months ago

rag-constitucion-chile • Science 31%

Platform to compare Chile's current constitution with its new proposed constitution using LLMs.

llm openai rag

Updated 10 months ago

rankify • Science 54%

🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7+ retrieval techniques, 24+ state-of-the-art Reranking models, and multiple RAG methods.

agent ai chatgpt information-retrieval llm nlp question-answering rag ranked-retrieval reranking retrieval retrival-augmented-generation