Scientific Software
Updated 10 months ago

LangFair — Peer-reviewed • Rank 14.9 • Science 95%

LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases - Published in JOSS (2025)

Updated 10 months ago

kitsune • Rank 4.2 • Science 85%

Kitsune is a next-generation data steward and harmonization tool.

Updated 10 months ago

litgpt • Rank 23.6 • Science 64%

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Updated 10 months ago

kani • Rank 8.3 • Science 77%

kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)

Updated 10 months ago

spezillm • Rank 8.0 • Science 77%

Large Language Model (LLM) module for the Spezi Ecosystem

Updated 10 months ago

datastew • Rank 8.9 • Science 75%

Python library for intelligent data stewardship using Large Language Model (LLM) embeddings

Updated 10 months ago

dolma • Rank 19.6 • Science 64%

Data and tools for generating and inspecting OLMo pre-training data.

Updated 10 months ago

farm-haystack • Rank 28.7 • Science 54%

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

Updated 10 months ago

tensorzero • Rank 23.4 • Science 54%

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

Updated 10 months ago

learn_prompting • Rank 13.3 • Science 64%

Prompt Engineering, Generative AI, and LLM Guide by Learn Prompting | Join our Discord for the largest Prompt Engineering learning community

Updated 10 months ago

tree-of-thought-prompting • Rank 8.0 • Science 67%

Using Tree-of-Thought Prompting to boost ChatGPT's reasoning

Updated 10 months ago

symbolicai • Rank 17.3 • Science 54%

A neurosymbolic perspective on LLMs

Artificial Intelligence and Machine Learning (38%)
Updated 10 months ago

optimate • Rank 12.8 • Science 54%

A collection of libraries to optimise AI model performance

Updated 10 months ago

agentlego • Rank 12.6 • Science 54%

Enhance LLM agents with rich tool APIs

Updated 10 months ago

chinese-llama-alpaca-2 • Rank 11.0 • Science 54%

Chinese LLaMA-2 & Alpaca-2 LLMs (phase two of the project), with 64K long-context models

Updated 10 months ago

llmebench • Rank 10.5 • Science 54%

Benchmarking Large Language Models

Updated 10 months ago

libre-chat • Rank 10.3 • Science 54%

🦙 Free and Open Source Large Language Model (LLM) chatbot web UI and API. Self-hosted, offline capable and easy to set up.

Updated 10 months ago

surprisal • Rank 10.1 • Science 54%

A unified interface for computing surprisal (log probabilities) from language models! Supports neural, symbolic, and black-box API models.

Updated 10 months ago

magicoder • Rank 9.0 • Science 54%

[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct

Updated 10 months ago

llm-sandbox • Rank 17.7 • Science 44%

Lightweight and portable LLM sandbox runtime (code interpreter) Python library.

Updated 10 months ago

deeplake • Rank 25.6 • Science 36%

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Updated 10 months ago

chinese-mixtral • Rank 7.1 • Science 54%

Chinese Mixtral mixture-of-experts (MoE) LLMs

Updated 10 months ago

monitors4codegen • Rank 7.0 • Science 54%

Code and data artifact for the NeurIPS 2023 paper "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context". `multispy` is an LSP client library in Python intended for building applications around language servers.

Updated 10 months ago

q-galore • Rank 5.3 • Science 54%

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.

Updated 10 months ago

py-alpaca-eval • Rank 12.6 • Science 46%

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

Updated 10 months ago

chinese-llama-alpaca • Rank 12.2 • Science 46%

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment

Updated 10 months ago

openqa-eval • Rank 3.8 • Science 54%

ACL 2023: Evaluating Open-Domain Question Answering in the Era of Large Language Models

Updated 10 months ago

nutcracker • Rank 1.9 • Science 54%

Large Model Evaluation Experiments

Updated 10 months ago

odin-slides • Rank 7.8 • Science 44%

An advanced Python tool for drafting customizable PowerPoint slides using the Generative Pre-trained Transformer (GPT) of your choice. Leveraging the capabilities of Large Language Models (LLMs), odin-slides turns even very long Word documents into well-organized presentations.

Updated 10 months ago

stormtrooper • Rank 7.6 • Science 44%

Zero/few shot learning components for scikit-learn pipelines with LLMs and transformers.

Updated 10 months ago

climsight • Rank 9.3 • Science 39%

A next-generation climate information system that uses large language models (LLMs) alongside high-resolution climate model data, scientific literature, and diverse databases to deliver accurate, localized, and context-aware climate assessments.

Updated 10 months ago

gemgpt • Rank 1.1 • Science 44%

Explore the power of Gemma model with GemGPT, a project leveraging AI for innovative solutions. Join us in shaping the future of AI!

Updated 10 months ago

upgini • Rank 17.5 • Science 26%

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs

Updated 10 months ago

napolab • Rank 7.0 • Science 36%

The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest advancements in Portuguese language models and their performance across carefully curated Portuguese language tasks.

Updated 10 months ago

https://github.com/bowang-lab/bioreason • Rank 5.8 • Science 36%

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

Updated 10 months ago

statlingua • Rank 8.0 • Science 26%

Explain Statistical Output with Large Language Models

Updated 10 months ago

https://github.com/amazon-science/auto-cot • Rank 9.1 • Science 23%

Official implementation for "Automatic Chain of Thought Prompting in Large Language Models" (stay tuned & more will be updated)

Updated 10 months ago

cellama • Rank 5.0 • Science 26%

Cell type annotation with local Large Language Models (LLMs) - Ensuring privacy and speed with extensive customized reports

Updated 10 months ago

https://github.com/altunenes/calcarine • Rank 1.4 • Science 26%

Desktop VLM: Real-time FastVLM analysis of video & textures with live compute shaders

Updated 10 months ago

https://github.com/astorfi/llm-alignment-project • Rank 3.5 • Science 23%

A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer learning, and more. Build your own customizable LLM alignment solution with ease.

Updated 10 months ago

https://github.com/bigscience-workshop/data-preparation • Rank 8.1 • Science 13%

Code used for sourcing and cleaning the BigScience ROOTS corpus

Updated 10 months ago

https://github.com/astorfi/large-scale-ai-blueprint • Rank 4.0 • Science 13%

A comprehensive guide designed to empower readers with advanced strategies and practical insights for developing, optimizing, and deploying scalable AI models in real-world applications.

Updated 10 months ago

llms-from-scratch • Science 26%

Build your own Large Language Model from scratch with this code repository. Learn the ins and outs of LLMs like GPT. 🚀💻

Updated 10 months ago

https://github.com/chen-yang-liu/promptcc • Science 36%

PyTorch implementation of 'A Decoupling Paradigm With Prompt Learning for Remote Sensing Image Change Captioning'

Updated 10 months ago

llm_seminar_series • Science 67%

Material for the series of seminars on Large Language Models

Updated 10 months ago

xrayglm • Science 54%

🩺 The first Chinese multimodal medical large model that can read chest X-rays and summarize chest radiographs.

Updated 10 months ago

loogle • Science 41%

ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models

Updated 10 months ago

phenomics-assistant • Science 44%

LLM retrieval augmented generation agent for the Monarch Knowledge graph.

Updated 10 months ago

mergekit • Science 44%

Tools for merging pretrained Large Language Models and creating Mixture of Experts (MoE) models from open-source models.

Updated 10 months ago

autonomousssiishare • Science 44%

Implementation and evaluation of an autonomous SSI-enhanced iSHARE framework, enabling decentralized identity management, schema alignment for property matching, and automated generation of alternative verification requests, improving scalability, privacy, and flexibility in data spaces and SSI ecosystems.

Updated 10 months ago

longmem • Science 54%

Official implementation of our NeurIPS 2023 paper "Augmenting Language Models with Long-Term Memory".

Updated 10 months ago

dpl • Science 44%

[NeurIPS 2023] Multi-fidelity hyperparameter optimization with deep power laws that achieves state-of-the-art results across diverse benchmarks.

Updated 10 months ago

aphra • Science 54%

An open-source translation agent designed to enhance the quality of text translations by leveraging large language models

Updated 10 months ago

moe-infinity • Science 54%

PyTorch library for cost-effective, fast and easy serving of MoE models.

Updated 10 months ago

flashrag • Science 67%

⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)

Updated 10 months ago

fuzi.mingcha • Science 57%

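Fuzi-Mingcha is a Chinese judicial large language model jointly developed by Shandong University, Inspur Cloud, and China University of Political Science and Law. It uses ChatGLM as its base model and is trained on a large corpus of unsupervised Chinese judicial text together with supervised judicial fine-tuning data. The model supports statute retrieval, case analysis, syllogistic reasoning for judgments, and judicial dialogue, and aims to provide users with comprehensive, high-precision legal consultation and answers.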

Updated 10 months ago

https://github.com/bigcode-project/selfcodealign • Science 13%

[NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation

Updated 10 months ago

https://github.com/agnostiqhq/multi-agent-llm • Science 23%

Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)

Updated 10 months ago

author-profiling-pan2023 • Science 44%

Symbol Team model for PAN@AP 2023 shared task on Profiling Cryptocurrency Influencers with Few-shot Learning

Updated 10 months ago

folktexts • Science 52%

Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data!

Updated 10 months ago

vellmes-ai-deception-framework • Science 44%

Interactive, dynamic, and realistic LLM honeypots