Turftopic
Turftopic: Topic Modelling with Contextual Representations from Sentence Transformers - Published in JOSS (2025)
UralicNLP
UralicNLP: An NLP Library for Uralic Languages - Published in JOSS (2019)
LangFair
LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases - Published in JOSS (2025)
EcoLogits
EcoLogits: Evaluating the Environmental Impacts of Generative AI - Published in JOSS (2025)
trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
lazyllm-llamafactory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
MM-PoE
MM-PoE: Multiple Choice Reasoning via Process of Elimination using Multi-Modal Models - Published in JOSS (2025)
datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
maidr-legacy
[DEPRECATED prototype] Multimodal Access and Interactive Data Representation
gpullama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
embed
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
lida
Automatic Generation of Visualizations and Infographics using Large Language Models
txtai
💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
llama_index
LlamaIndex is the leading framework for building LLM-powered agents over your data.
farm-haystack
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
bentoml
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
cntext
Text analysis package supporting multiple methods, including word count, readability, document similarity, sentiment analysis, Word2Vec/GloVe, and Large Language Models (LLMs).
tiny_qa_benchmark_pp
Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing & regression-catching.
pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
medusa-llm
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.
learn_prompting
Prompt Engineering, Generative AI, and LLM Guide by Learn Prompting | Join our discord for the largest Prompt Engineering learning community
tree-of-thought-prompting
Using Tree-of-Thought Prompting to boost ChatGPT's reasoning
hallucination-leaderboard
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
vulntrain
A tool to generate datasets and models based on vulnerability descriptions from @Vulnerability-Lookup.
chronos-forecasting
Chronos: Pretrained Models for Probabilistic Time Series Forecasting
mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
mllmcelltype
🏆 #1 Multi-LLM consensus framework | 550+ stars | 95% accuracy | 10+ LLM providers | Leading cell annotation tool
alignment-handbook
Robust recipes to align language models with human and AI preferences
llms-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
eval-suite
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.
openllm
Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.
banks
LLM prompt language based on Jinja. Banks provides tools and functions to build prompt text and chat messages from generic blueprints. It allows attaching metadata to prompts to ease their management, and versioning is a first-class citizen. Banks provides ways to store prompts on disk along with their metadata.
beeai-framework
Build production-ready AI agents in both Python and TypeScript.
https://github.com/modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
chinese-llama-alpaca-2
Chinese LLaMA-2 & Alpaca-2 LLMs (phase two of the project), with 64K long-context models
medicalgpt
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.
multilspy
multilspy is an LSP client library in Python intended to be used to build applications around language servers.
libre-chat
🦙 Free and Open Source Large Language Model (LLM) chatbot web UI and API. Self-hosted, offline capable and easy to set up.
sparql-llm
🦜✨ Chat system and reusable components to improve LLM capabilities when generating SPARQL queries
llm-lct-sequencing
AI Semantic Insights: LLM Toolkit for Analysing Educational Practices and Knowledge Building.
https://github.com/flyteorg/flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
basic-memory
AI conversations that actually remember. Never re-explain your project to Claude again. Local-first, integrates with Obsidian. Join our Discord: https://discord.gg/tyvKNccgqN
@llm-tools/embedjs
A NodeJS RAG framework to easily work with LLMs and embeddings
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
monitors4codegen
Code and Data artifact for NeurIPS 2023 paper - "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context". `multilspy` is an LSP client library in Python intended to be used to build applications around language servers.
x-anylabeling
Effortless data labeling with AI support from Segment Anything and other awesome models.
https://github.com/khoj-ai/khoj
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
https://github.com/zenml-io/zenml
ZenML 🙏: MLOps for Reliable AI: from Classical AI to Agents. https://zenml.io.
https://github.com/gradio-app/fastrtc
The Python library for real-time communication
https://github.com/floneum/floneum
Instant, controllable, local pre-trained AI models in Rust
https://github.com/nixtla/nixtla
TimeGPT-1: production-ready pre-trained Time Series Foundation Model for forecasting and anomaly detection. Generative pretrained transformer for time series trained on over 100B data points. It can produce accurate forecasts across domains such as retail, electricity, finance, and IoT with just a few lines of code 🚀.
rl-llm-calibration-test
Attempt to replicate parts of the paper "Language models (mostly) know what they know" on open datasets and models.
tap_llm_course
Materials for the TAP (Tendencias en Aprendizaje Profundo) subject of the Master in Robotics and Artificial Intelligence at the Universidad de León
chinese-llama-alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
xfinder
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
canada-labour-research-assistant
The Canada Labour Research Assistant (CLaRA) is a privacy-first LLM-powered RAG AI assistant proposing Easily Verifiable Direct Quotations (EVDQ) to mitigate hallucinations in answering questions about Canadian labour laws, standards, and regulations. It works entirely offline and locally, guaranteeing the confidentiality of your conversations.
langchain-rdf
🦉 Utilities to improve LLM capabilities when working with SPARQL endpoints and RDF knowledge graphs, compatible with LangChain
agentica
Agentica: Effortlessly Build Intelligent, Reflective, and Collaborative Multimodal AI Agents!
https://github.com/mobiusml/hqq
Official implementation of Half-Quadratic Quantization (HQQ)