lazyllm-llamafactory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
alignment-handbook
Robust recipes to align language models with human and AI preferences
distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
chinese-llama-alpaca-2
Phase 2 of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models
py-alpaca-eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://github.com/astorfi/llm-alignment-project
A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer learning, and more. Build your own customizable LLM alignment solution with ease.
llm-reliability
Code for the paper "Larger and more instructable language models become less reliable"
https://github.com/cyberagentailab/annotation-efficient-po
Code for "Annotation-Efficient Preference Optimization for Language Model Alignment"
https://github.com/cyberagentailab/filtered-dpo
Filtered Direct Preference Optimization (fDPO), which improves language model alignment with human preferences by discarding training samples that are lower in quality than those generated by the model being trained
cogment-verse
Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)