Projects | Open Source Science

Scientific Software

Updated 11 months ago

ollamar — Peer-reviewed • Rank 12.7 • Science 93%

ollamar: An R package for running large language models - Published in JOSS (2025)

ai api llm llms ollama ollama-api r

Scientific Software · Peer-reviewed

Updated 11 months ago

litgpt • Rank 23.6 • Science 64%

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

ai artificial-intelligence deep-learning large-language-models llm llm-inference llms

Updated 11 months ago

kani • Rank 8.3 • Science 77%

kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)

chatgpt framework function-calling gpt-4 large-language-models llama llms microframework openai tool-use

Updated 11 months ago

oumi • Rank 20.3 • Science 64%

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

dpo evaluation fine-tuning gpt-oss gpt-oss-120b gpt-oss-20b inference llama llms sft vlms

Updated 11 months ago

flexrag • Rank 12.8 • Science 67%

FlexRAG: A RAG Framework for Information Retrieval and Generation.

llms nlp rag

Updated 11 months ago

gpt-researcher • Rank 15.2 • Science 64%

LLM-based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations.

agent ai automation deepresearch llms mcp mcp-server python research search webscraping

Updated 11 months ago

tensorzero • Rank 23.4 • Science 54%

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

ai ai-engineering anthropic artificial-intelligence deep-learning genai generative-ai gpt large-language-models llama llm llmops llms machine-learning ml ml-engineering mlops openai python rust

Mathematics (40%)

Updated 11 months ago

guardrails • Rank 1.9 • Science 67%

VSCode extension to help developers set up guardrails around their functions, by helping them disambiguate purpose statements.

llms python specifications vscode-extension

Updated 11 months ago

beyondllm • Rank 12.5 • Science 54%

Build, evaluate and observe LLM apps

ai artificial-intelligence embeddings evaluate-llm genai generative-ai hacktoberfest hacktoberfest-accepted hacktoberfest2024 large-language-models llm llms rag

Updated 11 months ago

distilabel • Rank 11.6 • Science 54%

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

ai huggingface llms openai python rlaif rlhf synthetic-data synthetic-dataset-generation

Updated 11 months ago

https://github.com/modelscope/data-juicer • Rank 19.2 • Science 46%

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

data data-analysis data-pipeline data-processing data-science data-visualization foundation-models instruction-tuning large-language-models llm llms multi-modal pre-training synthetic-data

Updated 11 months ago

https://github.com/lancedb/lance • Rank 28.1 • Science 36%

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

apache-arrow computer-vision data-analysis data-analytics data-centric data-format data-science dataops deep-learning duckdb embeddings llms machine-learning mlops python rust

Updated 11 months ago

cambrian • Rank 9.0 • Science 54%

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

chatbot clip computer-vision dino instruction-tuning large-language-models llms mllm multimodal-large-language-models representation-learning

Updated 11 months ago

linear-relational • Rank 7.8 • Science 54%

Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch

ai huggingface-transformers llms pytorch transformers

Updated 11 months ago

langfun • Rank 17.8 • Science 44%

OO for LLMs

framework llms nlp

Updated 11 months ago

opennmt-py • Rank 24.6 • Science 36%

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

deep-learning language-model llms machine-translation neural-machine-translation pytorch

Updated 11 months ago

contextgem • Rank 15.2 • Science 44%

ContextGem: Effortless LLM extraction from documents

ai contract-analysis data-extraction document-intelligence docx docx2md docx2txt generative-ai legaltech llm llm-extraction llm-framework llm-pipeline llms nlp prompt-engineering text-analysis unstructured-data

Updated 11 months ago

ml-keyframer • Rank 3.8 • Science 54%

Keyframer: Empowering Animation Design Using Large Language Models

animation llms machinelearning

Updated 11 months ago

https://github.com/adaptivemotorcontrollab/amadeusgpt • Rank 11.4 • Science 46%

We turn natural language descriptions of behaviors into machine-executable code

amadeusgpt cebra chatgpt deeplabcut llms segment-anything

Updated 11 months ago

https://github.com/assert-kth/repairllama • Rank 5.5 • Science 46%

RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair http://arxiv.org/pdf/2312.15698

apr codellama llama llms lora repair

Updated 11 months ago

tamingllms • Rank 7.2 • Science 44%

Taming LLMs: A Practical Guide to LLM Pitfalls with Open Source Software

book chatgpt genai huggingface langchain llama llm llms open-source outlines python qwen taming-llms

Updated 11 months ago

symmetry-cli • Rank 4.4 • Science 44%

The client for the Symmetry peer-to-peer inference network, enabling users to connect with each other, share computational resources, and collect valuable machine learning data.

ai artificial-intelligence encryption holepunching hyperswarm inference llms localllm network p2p vscode-extension

Updated 11 months ago

boilerbot • Rank 1.9 • Science 44%

Official Open-Source Implementation of BoilerBot: A Reliable Task-Oriented Chatbot Enhanced with Large Language Models.

conversational-agent llms nlp

Updated 11 months ago

gemgpt • Rank 1.1 • Science 44%

Explore the power of Gemma model with GemGPT, a project leveraging AI for innovative solutions. Join us in shaping the future of AI!

fine-tuning gem-gpt gemgpt gemma gemma-2b gemma-2b-it gemma-7b gemma-7b-it gpt gpt-gem gptgem large-language-model large-language-models llm llms openai python pytorch

Updated 11 months ago

token-wars-dataviz • Rank 0.0 • Science 44%

A data visualisation in `matplotlib` of the number of parameters in major LLMs as well as the number of tokens of text they were trained on.

llm llms tokenization tokenizer tokens tokenwars

Updated 11 months ago

https://github.com/epsilla-cloud/vectordb • Rank 16.8 • Science 26%

Epsilla is a high performance Vector Database Management System

ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search

Updated 11 months ago

https://github.com/amazon-science/refchecker • Rank 8.2 • Science 33%

RefChecker provides an automatic checking pipeline and a benchmark dataset for detecting fine-grained hallucinations generated by Large Language Models.

factuality hallucination llms

Updated 11 months ago

https://github.com/bminixhofer/tokenkit • Rank 3.5 • Science 36%

A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.

distillation jax llms machine-learning tokenization tokenizer-transfer transfer-learning

Updated 10 months ago

https://github.com/google-deepmind/simulation_streams • Rank 3.0 • Science 36%

Simulation Streams is a programming paradigm designed to efficiently control and leverage Large Language Models (LLMs) for complex, dynamic simulations and agentic workflows.

agents ai llms simulations

Updated 11 months ago

keras-gpt-copilot • Rank 6.7 • Science 31%

Integrate an LLM copilot within your Keras model development workflow

deep-learning deep-neural-networks gpt keras llms openai skynet

Updated 11 months ago

statlingua • Rank 8.0 • Science 26%

Explain Statistical Output with Large Language Models

data-science explainability large-language-models llm llms statistics teaching-tools

Updated 11 months ago

https://github.com/spencerpresley/academicmetrics • Rank 7.7 • Science 26%

AI-powered toolkit for analyzing and classifying academic research publications using LLMs and automated data collection. Output options: a MongoDB database (by providing your database URL), JSON, or an Excel spreadsheet. See the README for quick setup and the documentation for implementation details.

ai automation data-science developer-tools llm llms research-analytics

Updated 10 months ago

https://github.com/deepset-ai/haystack-demos • Rank 7.4 • Science 26%

Fully working applications that demonstrate how to use Haystack to implement various use cases

demo-app haystack haystack-ai llms nlp python question-answering rest-api semantic-search

Updated 11 months ago

tarsier • Rank 20.1 • Science 13%

Vision utilities for web interaction agents 👀

gpt4v llms ocr playwright pypi-package python selenium webscraping

Materials Science (40%)

Updated 11 months ago

https://github.com/amazon-science/mezo_svrg • Rank 3.7 • Science 26%

Code for the ICML 2024 paper: "Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models"

deep-learning fine-tuning language-model large-language-models llm-training llms machine-learning machine-learning-algorithms optimization optimization-algorithms svrg variance-reduction zero-order-methods

Updated 11 months ago

https://github.com/bminixhofer/zett • Rank 5.6 • Science 23%

Code for Zero-Shot Tokenizer Transfer

language-model llm llms multilingual tokenization transfer-learning

Updated 11 months ago

https://github.com/adaptivemotorcontrollab/llavaction • Rank 6.9 • Science 20%

action-recognition behavioral-analysis llms mmlms

Updated 11 months ago

https://github.com/citiususc/blinkg • Rank 0.7 • Science 26%

BLINKG: Benchmark for LLM-Integrated Knowledge Graph Generation

benchmark knowledge-graph llms llms-benchmarking mappings

Updated 11 months ago

https://github.com/astorfi/llm-alignment-project • Rank 3.5 • Science 23%

A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer learning, and more. Build your own customizable LLM alignment solution with ease.

ai alignment deep-learning generative-ai large-language-models llms machine-learning rlhf template

Updated 11 months ago

https://github.com/ahwang16/grounded-intuition-gpt-vision • Rank 1.6 • Science 20%

Resources for Grounded Intuition of GPT-Vision's Abilities with Scientific Images

cv gpt-4 grounded-theory hci images llms nlp qualitative-analysis thematic-analysis vision-language

Updated 11 months ago

https://github.com/astorfi/large-scale-ai-blueprint • Rank 4.0 • Science 13%

A comprehensive guide designed to empower readers with advanced strategies and practical insights for developing, optimizing, and deploying scalable AI models in real-world applications.

deep-learning large-language-models large-scale large-scale-ai large-scale-machine-learning llms machinel-learning production-ml

Updated 11 months ago

pkgmatch • Science 26%

Find R packages matching either descriptions or other R packages

embeddings llms natural-language-processing r

Updated 11 months ago

https://github.com/asanchezyali/talking-avatar-with-ai • Science 13%

This project is a digital human that can talk and listen to you. It uses OpenAI's GPT to generate responses, OpenAI's Whisper to transcribe the audio, Eleven Labs to generate the voice, and Rhubarb Lip Sync to generate the lip sync.

ai-avatars digital-human elevenlabs lip-sync lipsync llms openai-api rhubarb-lip-sync visemes whisper

Updated 11 months ago

https://github.com/chonghin33/semantic-logic-system-1.0 • Science 39%

Semantic Logic System v1.0 — A modular prompt-based semantic framework for LLMs.

future language-as-structure llm-framework llms meta-prompt modular-prompting semantic-logic-system

Updated 11 months ago

https://github.com/cosmaadrian/strawberry-problem • Science 36%

Official repository for "The Strawberry Problem 🍓: Emergence of Character-level Understanding in Tokenized Language Models"

character-understanding cross-attention llms paper tokenization transformer

Updated 10 months ago

https://github.com/dimits-ts/synthetic_moderation_experiments • Science 26%

Experiments relating to synthetic LLM user-agents and LLM facilitators in online discussions

data-analysis dataset-generation llms llms-reasoning nlp

Updated 11 months ago

llms4subjects • Science 49%

The official SemEval 2025 Task 5 - LLMs4Subjects - Shared Task Dataset repository

ai artificial-intelligence dataset large-language-models llms natural-language-processing natural-language-understanding nlp semeval shared-task subject-indexing

Updated 11 months ago

awesome-llm-papers.github.io • Science 26%

A curated collection of the most impactful papers, tools, and resources on Large Language Models (LLMs). Continuously updated to help researchers, developers, and enthusiasts stay on top of LLM advancements.

academic-resources ai-research curated-papers deep-learning foundation-models generative-ai language-models large-language-models llm-tools llms machine-learning nlp open-source transformer-models

Updated 11 months ago

fuzi.mingcha • Science 57%

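Fuzi-Mingcha (夫子•明察) is a Chinese judicial large language model jointly developed by Shandong University, Inspur Cloud, and China University of Political Science and Law. It uses ChatGLM as its base model and is trained on a large corpus of unsupervised Chinese judicial text and supervised judicial fine-tuning data. The model supports statute retrieval, case analysis, syllogistic reasoning for judgments, and judicial dialogue, and aims to provide users with comprehensive, high-precision legal consultation and answers.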

chatglm-6b judicial large-language-models legal legal-ai legalai llms nlp pretrained-models

Updated 11 months ago

https://github.com/pixeltable/pixelagent • Science 26%

Pixelagent — Build your own stateful agent framework

agent-engineering agents ai llms python

Updated 11 months ago

https://github.com/safellmhub/hguard-go • Science 26%

Guardrails for LLMs: detect and block hallucinated tool calls to improve safety and reliability.

agent-safety ai ai-safety hallucination-detection language-models llms machine-learning middleware prompt-engineering tool-calling toolformer

Updated 11 months ago

awesome-rlaif • Science 54%

A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)

alignment llms rl rlaif rlhf

Updated 11 months ago

backdoorllm • Science 36%

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models

attack backdoor defense llms llms-benchmarking

Updated 11 months ago

https://github.com/amazon-science/factual-confidence-of-llms • Science 13%

Code for paper "Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators"

confidence factual factuality llm llms robustness

Updated 11 months ago

https://github.com/aielte-research/hacksynth • Science 23%

LLM Agent and Evaluation Framework for Autonomous Penetration Testing

ai autonomous-pentesting ctf ctf-tools cybersecurity llms penetration-testing

Updated 11 months ago

https://github.com/akaiko1/langchain_examples • Science 26%

Practical, minimal examples for building with LangChain

langchain llm llms ollama rag

Updated 11 months ago

llmswitcher • Science 44%

Routes to the most performant and cost-efficient LLM based on your prompt [ 🚧 WIP ]

llm-agent llm-router llms prompt-engineering

Updated 11 months ago

awesome-japanese-llm • Science 75%

Overview of Japanese LLMs (日本語LLMまとめ)

foundation-models generative-ai generative-model generative-models japanese japanese-language japanese-language-model japanese-llm language-model language-models large-language-model large-language-models llm llm-japanese llms multimodal vision-and-language vision-language vision-language-model

Updated 11 months ago

fed-rag • Science 75%

A framework for fine-tuning retrieval-augmented generation (RAG) systems.

deep-learning federated-learning llms machine-learning rag

Updated 11 months ago

llms4subjects • Science 39%

The official GermEval 2025 Task - LLMs4Subjects - Shared Task Dataset Repository

ai artifical-intelligence dataset germeval large-language-models llms natural-language-processing natural-language-understanding nlp shared-task subject-classifications subject-indexing

Updated 11 months ago

ColBERT • Science 41%

Efficient late-interaction retrieval systems in Julia!

colbert gen-ai information-retrieval llms machine-learning rag

Updated 11 months ago

gold-standard-toxicity • Science 67%

Gold Standard for Toxicity and Incivility Project

gold-standard incivil-language llms toxicity toxicity-classification transformers

Updated 11 months ago

self-refine • Science 54%

LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.

chatgpt few-shot-learning gpt-35 gpt-4 language-generation large-language-models llms prompting prompts reasoning

Updated 11 months ago

xrayglm • Science 54%

🩺 The first Chinese multimodal medical large model that can read and summarize chest X-rays.

large-language-models llms medical multimodal visualglm-6b xray

Updated 11 months ago

mdi-llm • Science 54%

Implementation of Model-Distributed Inference for Large Language Models, built on top of LitGPT

ai llm-inference llms torch

Updated 11 months ago

ai-for-grant-writing • Science 67%

A curated list of resources for using LLMs to develop more competitive grant applications.

generative-ai grant-proposals grants llms scientific-writing

Updated 11 months ago

https://github.com/alan-turing-institute/prompto • Science 26%

An open source library for asynchronous querying of LLM endpoints

deep-learning hut23 large-language-models llm-eval llm-evaluation llms machine-learning natural-language-processing nlp python transformer transformers

Updated 11 months ago

intro-llms-python • Science 44%

Introduction to Large Language Models in Python

carpentries-incubator english large-language-models lesson llms pre-alpha python

Updated 11 months ago

chembench • Science 36%

How good are LLMs at chemistry?

benchmark chemistry llm llms llms-benchmarking machine-learning materials-science safety

Updated 11 months ago

curlora • Science 67%

The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation.

ai catastrophic-forgetting continual-learning fine-tuning generative-ai llms matrix-decompositions

Updated 11 months ago

pie-perf • Science 36%

Training language models to make programs faster

code-generation code-optimization llms optimization software-engineering

Updated 11 months ago

gptlint • Science 44%

A linter with superpowers! 🔥 Use LLMs to enforce best practices across your codebase.

best-practices gpt linter llms static-analysis

Updated 11 months ago

euroeval • Science 54%

The robust European language model benchmark.

danish-language dutch-language english-language european evaluation-framework faroese-language finnish-language french-language german-language icelandic-language italian-language llms nlp-machine-learning norwegian-language portuguese-language spanish-language swedish-language

Updated 11 months ago

https://github.com/aiplanethub/ai-stacks • Science 26%

Explore a variety of use cases and example stacks powered by LLMs, seamlessly integrated into the GenAI Stack platform.

genai genaistack llm llms llmstack

Updated 11 months ago

lcm-1.13-whitepaper • Science 57%

This project contains the original white paper for Language Construct Modeling (LCM) v1.13, authored by Vincent Shing Hin Chong. It introduces a novel framework for prompt-layered semantic control in large language models (LLMs), built upon the Meta Prompt Layering (MPL) structure. LCM formalizes a modular system of prompt orchestration, enabling

artificial-intelligence language-construct-modeling llm-framework llms meta-prompt modular-prompting

Updated 11 months ago

awesome-llms • Science 44%

🤓 A collection of AWESOME structured summaries of Large Language Models (LLMs)

ai-models awesome awesome-list knowledge-graph large-language-models llms structured-data structured-text

Updated 11 months ago

https://github.com/alexisvassquez/juniper2.0 • Science 13%

Conversational AI model and educational tool

ai annotation django-framework educational-tool finetuning-llms gpt-2 gpt-4 huggingface-transformers json llms python python3

Updated 11 months ago

https://github.com/chirayu-tripathi/mongodb-querifier • Science 13%

Improving LLMs' MongoDB query generation capability with the help of advanced retrieval-augmented generation.

database llms mongodb rag retrieval retrieval-augmented-generation

Updated 11 months ago

https://github.com/asanchezyali/library • Science 13%

deep-learning llms machine-learning npl

Updated 11 months ago

llmstack • Science 26%

No-code multi-agent framework to build LLM Agents, workflows and applications with your data

agents ai ai-agents-framework generative-ai llm-agents llm-chain llm-framework llmops llms no-code-ai platform

Updated 11 months ago

scisum • Science 31%

Resources for the paper "Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?" (ACL 2024).

llms machine-learning nlp summarization

Updated 11 months ago

llms4om • Science 54%

LLMs4OM: Matching Ontologies with Large Language Models

llms ontology-mapping ontology-matching transformers

Updated 11 months ago

hackathon-leaderboard • Science 44%

Automated Leaderboard System for Hackathon Evaluation Using Large Language Models

ai bedrock hackathon llms text-classification text-clustering

Updated 11 months ago

search-agents • Science 54%

Code for the paper 🌳 Tree Search for Language Model Agents

agents llms machine-learning

Updated 11 months ago

interactive-learning-duo • Science 44%

Unlock Your Coding Potential: Explore, Learn, and Code with Our Interactive Python and SQL ChatBot!

1st-semester 1st-year llms python sql sqlite uw wne

Updated 11 months ago

semexp-thesis • Science 77%

The Semantic Workspace: Augmenting Exploratory Programming with Integrated Generative AI Tools [Master's Thesis]

ai exploratory-programming gpt-4o latex llms programming-tools smalltalk squeak thesis

Updated 11 months ago

https://github.com/OpenDCAI/DataFlow • Science 26%

Easy data preparation with the latest LLM-based operators and pipelines.

data data-agent data-cleaning data-pipelines data-processing data-science data-synthesis gradio-interface llms operators quick-data-processing sglang-bankend vllm-backend