https://github.com/cyberagentailab/annotation-efficient-po

Code of "Annotation-Efficient Preference Optimization for Language Model Alignment"


Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.3%) to scientific vocabulary

Keywords

alignment llm rlhf
Last synced: 5 months ago

Repository

Code of "Annotation-Efficient Preference Optimization for Language Model Alignment"

Basic Info
Statistics
  • Stars: 8
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Topics
alignment llm rlhf
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License

README.md

Annotation-Efficient Preference Optimization


This repository implements the Annotation-Efficient Preference Optimization (AEPO) algorithm.

The code is tested on Ubuntu 20.04 using Python 3.9 and CUDA 11.0 (Docker image nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04).

Install

You can install aepo via pip:

pip install aepo

Source install is also available. Clone this repository and run pip install .:

git clone git@github.com:CyberAgentAILab/annotation-efficient-po.git
cd annotation-efficient-po
pip install .

Usage

A command line interface is available. The input dataset can be a CSV file or a dataset uploaded to the Hugging Face Hub. The dataset should have a column named prompt or instruction; aepo recognizes it as the user prompt given to the system and treats the rest of the columns as responses generated by the system.
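For illustration, here is a minimal sketch of an input CSV in the expected format (the response column names and file path are made up; aepo only needs the prompt/instruction column and treats every other column as a response):

# Build a tiny example CSV in the expected layout (illustrative values only).
import pandas as pd

df = pd.DataFrame(
    {
        "prompt": ["What is the capital of France?"],
        "response_1": ["The capital of France is Paris."],
        "response_2": ["Paris."],
    }
)
df.to_csv("my_dataset.csv", index=False)  # pass this path to the aepo command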

I prepared an example dataset in dataset/alpaca_samples.csv. The CSV file includes 128 responses generated by HuggingFaceH4/mistral-7b-sft-beta for each instruction of the alpaca_human_preference split of tatsu-lab/alpaca_farm. You can try aepo on this dataset with the following command:

aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 2 --num_instructions 10

--num_responses is the number of input responses to use. The dataset must contain at least --num_responses responses per instruction. --num_annotations is the number of responses kept after the subsampling process; it is also the number of times the reward model is queried per instruction.
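As a quick sanity check on the annotation budget implied by the command above (plain arithmetic, not part of aepo):

# 10 instructions, 8 candidate responses each, but only 2 responses per
# instruction are scored by the reward model.
num_instructions = 10
num_responses = 8
num_annotations = 2

print(num_instructions * num_annotations)  # 20 reward-model queries with subsampling
print(num_instructions * num_responses)    # 80 queries if every response were annotated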

Example: Running AEPO

You can generate a pair of responses for each instruction with aepo using the following command.

aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 2 --num_instructions 10

To subsample four responses (e.g., for LiPO), set --num_annotations to 4.

aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 4 --num_instructions 10

Example: Running West-of-N over 8 samples

West-of-N is a strategy that picks the Best-of-N sample as the chosen response and the Worst-of-N sample as the rejected response. It has been shown to be effective for DPO and reward modeling. You can run West-of-N with this package by setting --num_annotations equal to --num_responses.

aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 8 --num_instructions 10

This command will generate a dataset with 8 responses per instruction, ranked by their rewards. If you only need the best and worst of the N samples, use the --west_of_n option.

aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 8 --num_instructions 10 --west_of_n

This will pick the best and worst responses as the chosen and rejected pair and discard the rest, which is useful for constructing a pairwise preference dataset.
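For intuition, a minimal sketch of the West-of-N selection over already-scored responses (not the aepo internals; the reward values are made up):

# Keep only the extremes of the reward ranking: best -> chosen, worst -> rejected.
def west_of_n(responses, rewards):
    best = max(range(len(rewards)), key=rewards.__getitem__)
    worst = min(range(len(rewards)), key=rewards.__getitem__)
    return {"chosen": responses[best], "rejected": responses[worst]}

print(west_of_n(["resp A", "resp B", "resp C"], [0.1, 0.9, -0.3]))
# {'chosen': 'resp B', 'rejected': 'resp C'}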

Reference

Jinnai, Y., Honda, U. (2024). Annotation-Efficient Preference Optimization for Language Model Alignment. arXiv preprint arXiv:2405.13541.

Bibtex:

@misc{jinnai2024annotationefficient,
  title={Annotation-Efficient Preference Optimization for Language Model Alignment},
  author={Yuu Jinnai and Ukyo Honda},
  year={2024},
  eprint={2405.13541},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Contact

For any questions, feel free to raise an issue or contact me at jinnai_yu@cyberagent.co.jp.

Acknowledgements

The AlpacaFarm dataset is licensed under Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

Owner

  • Name: CyberAgent AI Lab
  • Login: CyberAgentAILab
  • Kind: organization
  • Location: Japan

GitHub Events

Total
  • Watch event: 6
Last Year
  • Watch event: 6

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 42 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
pypi.org: aepo

Annotation Efficient Preference Optimization

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 42 Last month
Rankings
  • Dependent packages count: 10.9%
  • Average: 36.3%
  • Dependent repos count: 61.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

pyproject.toml pypi
  • pytest ^5.2 develop
  • datasets ^2.10.0
  • evaluate ^0.4.2
  • llm-blender ^0.0.2
  • numpy ^1.26.4
  • pandas ^2.0.0
  • python ^3.9
  • torch ^2.3.0
  • tqdm ^4.66.4
  • transformers ^4.40.2