Updated 9 months ago

mlora-cli • Rank 11.3 • Science 64%

An Efficient "Factory" to Build Multiple LoRA Adapters

Updated 9 months ago

alignment-handbook • Rank 16.9 • Science 54%

Robust recipes to align language models with human and AI preferences

Updated 9 months ago

distilabel • Rank 11.6 • Science 54%

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Updated 9 months ago

chinese-llama-alpaca-2 • Rank 11.0 • Science 54%

Phase two of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models

Updated 9 months ago

py-alpaca-eval • Rank 12.6 • Science 46%

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

Updated 9 months ago

https://github.com/astorfi/llm-alignment-project • Rank 3.5 • Science 23%

A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer learning, and more. Build your own customizable LLM alignment solution with ease.

Updated 9 months ago

llm-reliability • Science 49%

Code for the paper "Larger and more instructable language models become less reliable"

Updated 9 months ago

https://github.com/cyberagentailab/annotation-efficient-po • Science 23%

Code for "Annotation-Efficient Preference Optimization for Language Model Alignment"

Updated 9 months ago

https://github.com/cyberagentailab/filtered-dpo • Science 23%

Filtered Direct Preference Optimization (fDPO), which improves language model alignment with human preferences by discarding training samples of lower quality than those generated by the model being trained

Updated 9 months ago

cogment-verse • Science 44%

Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)

Updated 9 months ago

awesome-rlaif • Science 54%

A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)