Projects

Updated 11 months ago

lazyllm-llamafactory • Rank 25.6 • Science 77%

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

agent ai deepseek fine-tuning gemma gpt instruction-tuning large-language-models llama llama3 llm lora moe nlp peft qlora quantization qwen rlhf transformers

Updated 11 months ago

mlora-cli • Rank 11.3 • Science 64%

An Efficient "Factory" to Build Multiple LoRA Adapters

baichuan chatglm dpo finetune gpu llama llama2 llm lora mlora peft rlhf

Updated 11 months ago

alignment-handbook • Rank 16.9 • Science 54%

Robust recipes to align language models with human and AI preferences

llm rlhf transformers

Updated 11 months ago

distilabel • Rank 11.6 • Science 54%

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

ai huggingface llms openai python rlaif rlhf synthetic-data synthetic-dataset-generation

Updated 11 months ago

chinese-llama-alpaca-2 • Rank 11.0 • Science 54%

Chinese LLaMA-2 & Alpaca-2 LLMs (phase 2 project), with 64K long-context models

64k alpaca alpaca-2 alpaca2 flash-attention large-language-models llama llama-2 llama2 llm nlp rlhf yarn

Updated 11 months ago

py-alpaca-eval • Rank 12.6 • Science 46%

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf

Updated 11 months ago

argilla • Rank 13.1 • Science 36%

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

active-learning ai annotation-tool developer-tools gpt-4 human-in-the-loop langchain llm machine-learning mlops natural-language-processing nlp rlhf text-annotation text-labeling weak-supervision weakly-supervised-learning

Updated 10 months ago

https://github.com/astorfi/llm-alignment-project • Rank 3.5 • Science 23%

A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer learning, and more. Build your own customizable LLM alignment solution with ease.

ai alignment deep-learning generative-ai large-language-models llms machine-learning rlhf template

Updated 11 months ago

cogment-verse • Science 44%

Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)

cogment human-in-the-loop-learning reinforcement-learning rlhf

Updated 11 months ago

awesome-rlaif • Science 54%

A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)

alignment llms rl rlaif rlhf

Updated 11 months ago

llm-reliability • Science 49%

Code for the paper "Larger and more instructable language models become less reliable"

bloom evaluation gpt llama llm reliability rlhf scaling supervision

Updated 10 months ago

https://github.com/cyberagentailab/annotation-efficient-po • Science 23%

Code for "Annotation-Efficient Preference Optimization for Language Model Alignment"

alignment llm rlhf

Updated 10 months ago

https://github.com/cyberagentailab/filtered-dpo • Science 23%

Filtered Direct Preference Optimization (fDPO) improves language model alignment with human preferences by discarding samples that are lower in quality than those generated by the model being trained

alignment dpo rlhf

