Scientific Software
Updated 9 months ago

LangFair — Peer-reviewed • Rank 14.9 • Science 95%

LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases - Published in JOSS (2025)

Updated 9 months ago

mlflow • Rank 35.0 • Science 36%

The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

Updated 9 months ago

propertyeval • Rank 0.7 • Science 57%

PropertyEval: Synthesizing Thorough Test Cases for LLM Code Generation Benchmarks using Property-Based Testing

Updated 9 months ago

nutcracker • Rank 1.9 • Science 54%

Large Model Evaluation Experiments

Updated 9 months ago

milu • Science 41%

MILU (Multi-task Indic Language Understanding Benchmark) is a comprehensive evaluation dataset designed to assess the performance of LLMs across 11 Indic languages.

Updated 9 months ago

promptfoo • Science 26%

Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

Updated 9 months ago

https://github.com/cedrickchee/vibe-jet • Science 26%

A browser-based 3D multiplayer flying game with arcade-style mechanics, created using Gemini 2.5 Pro through a technique called "vibe coding".

Updated 9 months ago

tgcsm-circuit • Science 44%

The original containment framework for recursion-stable cognition, collapse-resistant logic, and LLM self-reflection.

Updated 9 months ago

https://github.com/ai4bharat/anudesh • Science 13%

An open source platform to annotate data for large language models at scale.