Projects

Scientific Software

Updated 11 months ago

LangFair — Peer-reviewed • Rank 14.9 • Science 95%

LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases - Published in JOSS (2025)

Tags: ai, ai-safety, artificial-intelligence, bias, bias-detection, ethical-ai, fairness, fairness-ai, fairness-ml, fairness-testing, large-language-models, llm, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics, python, responsible-ai

Engineering Mathematics (42%)

Scientific Software · Peer-reviewed

Updated 11 months ago

deepeval • Rank 28.2 • Science 54%

The LLM Evaluation Framework

Tags: evaluation-framework, evaluation-metrics, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics

Updated 11 months ago

promptfoo • Science 26%

Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

Tags: ci, ci-cd, cicd, evaluation, evaluation-framework, llm, llm-eval, llm-evaluation, llm-evaluation-framework, llmops, pentesting, prompt-engineering, prompt-testing, prompts, rag, red-teaming, testing, vulnerability-scanners

Updated 11 months ago

llm-fhir-eval • Science 57%

Benchmarking Large Language Models for FHIR

Tags: evals, fhir, fhir-llm, fhir-resources, fhirpath, llm, llm-evaluation-framework
