https://github.com/cyberagentailab/annotation-efficient-po
Code of "Annotation-Efficient Preference Optimization for Language Model Alignment"
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity (10.3%) to scientific vocabulary)
Keywords
Repository
Code of "Annotation-Efficient Preference Optimization for Language Model Alignment"
Basic Info
- Host: GitHub
- Owner: CyberAgentAILab
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://arxiv.org/abs/2405.13541
- Size: 19.5 MB
Statistics
- Stars: 8
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
Annotation-Efficient Preference Optimization

This repository implements the Annotation-Efficient Preference Optimization (AEPO) algorithm.
The code is tested on Ubuntu 20.04 using Python 3.9 and CUDA 11.0 (Docker image nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04).
Install
You can install aepo via pip.
pip install aepo
Installing from source is also supported: clone this repository and run pip install .
git clone git@github.com:CyberAgentAILab/annotation-efficient-po.git
cd annotation-efficient-po
pip install .
Usage
A command line interface is provided. The input dataset can be a CSV file or a dataset uploaded to the Hugging Face Hub. The dataset should have a column named prompt or instruction; aepo recognizes that column as the user prompt given to the system and treats the rest of the columns as the responses generated by the system (see the input sketch after the option descriptions below).
I prepared an example dataset in dataset/alpaca_samples.csv.
The CSV file includes 128 responses generated by HuggingFaceH4/mistral-7b-sft-beta for each instruction of the alpaca_human_preference split of tatsu-lab/alpaca_farm.
You can try aepo using this dataset with the following command:
aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 2 --num_instructions 10
--num_responses is the number of input responses to use; the dataset must contain at least --num_responses responses per instruction. --num_annotations is the number of responses kept after the subsampling process; it is also the number of times the reward model is queried per instruction.
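For reference, here is a minimal sketch of a custom input CSV in the expected layout. The column names response_0 and response_1 and the file name my_samples.csv are illustrative, not part of the package:

```python
# Build a toy input CSV: one prompt column plus one column per candidate response.
# Every non-prompt column is treated as a response to subsample from.
import pandas as pd

df = pd.DataFrame(
    {
        "instruction": [
            "Explain the difference between supervised and unsupervised learning.",
            "Write a haiku about autumn.",
        ],
        # Hypothetical response columns; in practice these come from sampling your SFT model.
        "response_0": [
            "Supervised learning trains on labeled examples ...",
            "Crimson leaves drift down ...",
        ],
        "response_1": [
            "In unsupervised learning there are no labels ...",
            "Autumn wind whispers ...",
        ],
    }
)
df.to_csv("my_samples.csv", index=False)
```

You could then run, for example, aepo my_samples.csv --num_responses 2 --num_annotations 2.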
Example: Running AEPO
You can generate a pair of responses for each instruction with aepo using the following command.
aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 2 --num_instructions 10
To subsample four responses (e.g., for LiPO), set --num_annotations to 4.
aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 4 --num_instructions 10
Example: Running West-of-N over 8 samples
West-of-N is a strategy that picks the Best-of-N response as the chosen response and the Worst-of-N response as the rejected response. It has been shown to be effective for DPO and reward modeling.
You can run West-of-N with this package by setting --num_annotations equal to --num_responses.
aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 8 --num_instructions 10
This command will generate a dataset with 8 responses, ranked by their rewards. If you only need the best and worst of the N samples, use the --west_of_n option.
aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 8 --num_instructions 10 --west_of_n
This picks the best and worst responses as the chosen and rejected responses, respectively, and discards the rest. It is useful for constructing a pairwise preference dataset.
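For intuition, here is a minimal sketch of the West-of-N selection rule; it is not the package's internal implementation, and score_fn stands in for whichever reward model you query:

```python
# Illustrative West-of-N selection: Best-of-N becomes the chosen response,
# Worst-of-N becomes the rejected response, and everything else is discarded.
def west_of_n(prompt, responses, score_fn):
    """Return (chosen, rejected) from N candidate responses."""
    scores = [score_fn(prompt, r) for r in responses]  # one reward query per response
    ranked = sorted(zip(scores, responses), key=lambda sr: sr[0])
    rejected = ranked[0][1]   # lowest-reward response (Worst-of-N)
    chosen = ranked[-1][1]    # highest-reward response (Best-of-N)
    return chosen, rejected
```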
Reference
Bibtex:
@misc{jinnai2024annotationefficient,
title={Annotation-Efficient Preference Optimization for Language Model Alignment},
author={Yuu Jinnai and Ukyo Honda},
year={2024},
eprint={2405.13541},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Contact
For any questions, feel free to raise an issue or contact me at jinnai_yu@cyberagent.co.jp.
Acknowledgements
The AlpacaFarm dataset is licensed under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
Owner
- Name: CyberAgent AI Lab
- Login: CyberAgentAILab
- Kind: organization
- Location: Japan
- Website: https://cyberagent.ai/ailab/
- Twitter: cyberagent_ai
- Repositories: 7
- Profile: https://github.com/CyberAgentAILab
GitHub Events
Total
- Watch event: 6
Last Year
- Watch event: 6
Packages
- Total packages: 1
- Total downloads: 42 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 4
- Total maintainers: 1
pypi.org: aepo
Annotation Efficient Preference Optimization
- Homepage: https://github.com/CyberAgentAILab/annotation-efficient-po
- Documentation: https://CyberAgentAILab.github.io/annotation-efficient-po
- License: mit
- Latest release: 0.1.6 (published over 1 year ago)
Rankings
Maintainers (1)
Dependencies
- pytest ^5.2 develop
- datasets ^2.10.0
- evaluate ^0.4.2
- llm-blender ^0.0.2
- numpy ^1.26.4
- pandas ^2.0.0
- python ^3.9
- torch ^2.3.0
- tqdm ^4.66.4
- transformers ^4.40.2