Jury
Jury: A Comprehensive Evaluation Toolkit - Published in JOSS (2024)
sahi
Framework agnostic sliced/tiled inference + interactive ui + error analysis plots
swarms
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
tasksource
Dataset collection and preprocessing framework for NLP extreme multitask learning
chronos-forecasting
Chronos: Pretrained Models for Probabilistic Time Series Forecasting
mlm-bias
Measuring Biases in Masked Language Models for PyTorch Transformers. Support for multiple social biases and evaluation measures.
eval-suite
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.
knn-transformers
PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
detoxify
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
@llm-tools/embedjs
A NodeJS RAG framework to easily work with LLMs and embeddings
hugsvision
HugsVision is an easy-to-use Hugging Face wrapper for state-of-the-art computer vision
https://github.com/csinva/tree-prompt
Tree prompting: easy-to-use scikit-learn interface for improved prompting.
tamingllms
Taming LLMs: A Practical Guide to LLM Pitfalls with Open Source Software
https://github.com/huggingface/agents-course
This repository contains the Hugging Face Agents Course.
https://github.com/csinva/interpretable-embeddings
Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024)
pizzly
Pizzly: financial market analysis combining technical indicators with LLMs, featuring real-time data processing and AI-powered market insights ⚡️
https://github.com/awslabs/generative-ai-cdk-constructs
AWS Generative AI CDK Constructs are sample implementations of AWS CDK for common generative AI patterns.
logfire-callback
A callback for logging training events from Hugging Face's Transformers to Logfire 🤗
napolab
The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest advancements in Portuguese language models and their performance across carefully curated Portuguese language tasks.
https://github.com/cahya-wirawan/rwkv-tokenizer
A fast RWKV Tokenizer written in Rust
spacy-wrap
spaCy-wrap is a wrapper library for spaCy that lets you include existing fine-tuned transformers from Hugging Face in your spaCy pipeline.
https://github.com/cakecrusher/mimicbot
Mimicbot enables the effortless yet modular creation of an AI chat bot model that imitates another person's manner of speech.
https://github.com/buaadreamer/dlpy
Programming Language for Deep Learning in Python
https://github.com/explosion/spacy-transformers
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
https://github.com/adithya-s-k/companionllm
CompanionLLM - A framework to finetune LLMs to be your own sentient conversational companion
https://github.com/beomi/gemma-easylm
Train GEMMA on TPU/GPU! (Codebase for training Gemma-Ko Series)
mergekit
Tools for merging pretrained Large Language Models and creating Mixture of Experts (MoE) models from open-source models.
minicorpus
Investigating, reproducing, and improving MiniPile with PyTorch and HuggingFace
partial-embedding-matrix-adaptation
Vocabulary-level memory efficiency for language model fine-tuning.
layoutreader
A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes into reading order.
https://github.com/ammarlodhi255/chest-xray-report-generation-app-with-chatbot-end-to-end-implementation
AI-powered Chest X-ray report generation app using VLM (Swin-T5) and LLM (LLaMA-3) for multilingual Q&A and medical education support.
speechbrain
A PyTorch-based Speech Toolkit
hfcommunity
HFCommunity offers an offline, up-to-date relational database built from the data available on the Hugging Face Hub, providing queryable data about the repositories hosted on the Hub
open-text-embeddings
Open Source Text Embedding Models with OpenAI Compatible API
simplyretrieve
Lightweight chat AI platform featuring custom knowledge, open-source LLMs, prompt-engineering, retrieval analysis. Highly customizable. For Retrieval-Centric & Retrieval-Augmented Generation.
https://github.com/andstor/verified-smart-contracts
📄 Verified Ethereum Smart Contract dataset
https://github.com/andstor/verified-smart-contracts-audit
🐛 Verified smart contract dataset with vulnerability labeling
legalkit-pipeline
Publication pipeline for French legal codes on 🤗 Datasets from LegiFrance with concurrent upload and dynamic README.md.
automated-brain-explanations
Generating and validating natural-language explanations for the brain.
FMAT
😷 The Fill-Mask Association Test (FMAT): Measuring Propositions in Natural Language.
moe-infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
2uml
A user-friendly tool that generates UML diagrams from natural language inputs.
https://github.com/amazon-science/lc-plm
LC-PLM: long-context protein language model based on BiMamba-S architecture