https://github.com/cyberagentailab/regularized-bon
Code of "Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment" (2025).
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CyberAgentAILab
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://arxiv.org/abs/2404.01054
- Size: 55.7 KB
Statistics
- Stars: 14
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Regularized Best-of-N
Implementation of Regularized Best-of-N (RBoN).
The code is tested on Ubuntu 20.04 using Python 3.8 and CUDA 11.0 (Docker image nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04).
git clone git@github.com:CyberAgentAILab/regularized-bon
cd regularized-bon
pip install -r requirements.txt
Usage
Running RBoN involves the following steps:
- First, generate a set of responses using sample.sh. The same set of samples is used for all algorithms to ensure a fair comparison.
- Compute the Wasserstein distance and the KL divergence using compute_wd.sh and compute_logprob.sh.
- Compute the reward of the responses using compute_reward.sh.
- Run mbr/compute_rbon.py to compute MBR-BoN (RBoN-WD) and RBoN-KL.
The results are written as a CSV file to the results/ directory.
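Because each step reads the files written by the previous one, the order matters. The following is a minimal driver sketch that builds the exact command lines documented in the sections of this README, in order. The function name rbon_pipeline_commands is hypothetical and not part of the repository; it only assembles commands and prints them as a dry run.

```python
import shlex
from typing import List

# Reward models listed in the "Computing the reward of the samples" section.
REWARD_MODELS = [
    "stanfordnlp/SteamSHP-flan-t5-large",
    "OpenAssistant/reward-model-deberta-v3-large-v2",
]


def rbon_pipeline_commands(dataset: str, n_samples: int) -> List[List[str]]:
    """Hypothetical helper: return the README's commands in execution order."""
    n = str(n_samples)
    cmds = [
        ["./experiments/sample.sh", "-d", dataset, "-s", n],          # 1. sample candidates
        ["./experiments/compute_wd.sh", "-d", dataset, "-s", n],      # 2a. Wasserstein distance
        ["./experiments/compute_logprob.sh", "-d", dataset, "-s", n], # 2b. log probabilities (KL)
    ]
    for rm in REWARD_MODELS:                                          # 3. rewards, one per model
        cmds.append(["./experiments/compute_reward.sh", "-d", dataset, "-s", n, "-i", rm])
    # 4. combine everything into MBR-BoN (RBoN-WD) and RBoN-KL results
    cmds.append(["python3", "mbr/compute_rbon.py", "--dataset", dataset, "--ncandidates", n])
    return cmds


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in rbon_pipeline_commands("alpaca", 32):
        print(shlex.join(cmd))
```

To actually execute the steps from the repository root, each command list could be passed to subprocess.run(cmd, check=True) so that the pipeline stops at the first failing step.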
Sampling candidates
By default, sampling uses openai-community/gpt2. To use a different language model, add -m [MODEL NAME ON THE HUGGING FACE HUB].
./experiments/sample.sh -d alpaca -s [NUMBER OF SAMPLES]
For backward compatibility with the rest of the codebase, sample.py must be given a prompt file even for tasks such as AlpacaFarm that have no shared prompt. For such tasks, select dummy.txt, a blank file included for this purpose, to indicate that there is no shared prompt.
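The convention of a blank prompt file meaning "no shared prompt" can be illustrated with a small sketch. The helpers load_shared_prompt and build_input below are hypothetical and are not taken from sample.py; they assume the shared prompt, when present, is simply prepended to each instruction.

```python
import tempfile
from pathlib import Path


def load_shared_prompt(prompt_path: str) -> str:
    """Hypothetical: read a shared prompt file; a blank file yields an empty string."""
    return Path(prompt_path).read_text(encoding="utf-8").strip()


def build_input(shared_prompt: str, instruction: str) -> str:
    """Hypothetical: prepend the shared prompt only when one exists."""
    if not shared_prompt:
        return instruction  # blank file (e.g. dummy.txt): use the instruction as-is
    return shared_prompt + "\n\n" + instruction


if __name__ == "__main__":
    # Simulate a blank dummy.txt in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        dummy = Path(tmp) / "dummy.txt"
        dummy.write_text("", encoding="utf-8")
        print(repr(build_input(load_shared_prompt(str(dummy)), "List three fruits.")))
```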
Computing Wasserstein distance
./experiments/compute_wd.sh -d alpaca -s [NUMBER OF SAMPLES]
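One way to see what this step estimates: if the reference policy is approximated by the uniform empirical distribution over the N sampled responses, the Wasserstein distance between a point mass on candidate y_i and that distribution has only one feasible transport plan, so it reduces to the average cost (1/N) Σ_j C(y_i, y_j). The sketch below assumes the cost is 1 minus the cosine similarity of precomputed sentence embeddings; the embedding model and the exact cost used by compute_wd.sh are not specified here, and the function names are hypothetical.

```python
import numpy as np


def cosine_cost_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cost C[i, j] = 1 - cos(e_i, e_j) for row-wise embeddings (assumed cost)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return 1.0 - unit @ unit.T


def wd_to_empirical(cost: np.ndarray) -> np.ndarray:
    """WD between a point mass on each candidate and the uniform empirical distribution.

    A point mass must ship its whole unit of mass to every sample in proportion
    1/N, so the optimal (and only) plan gives WD_i = mean_j C[i, j].
    """
    return cost.mean(axis=1)


if __name__ == "__main__":
    emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # toy embeddings
    print(wd_to_empirical(cosine_cost_matrix(emb)))
```

Candidates close to many other samples get a small WD; ignoring the reward and picking the argmin of this quantity is exactly MBR decoding.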
Computing log probability
./experiments/compute_logprob.sh -d alpaca -s [NUMBER OF SAMPLES]
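The log probability is what the KL term needs: for a point mass on a response y, KL(δ_y ‖ π_ref(·|x)) = −log π_ref(y|x). A minimal sketch of the sequence log probability from per-position logits follows. It assumes the logits have already been extracted for the response positions only (prompt tokens excluded); the function name is hypothetical and this is not the code in compute_logprob.sh.

```python
import numpy as np


def sequence_logprob(logits: np.ndarray, token_ids: np.ndarray) -> float:
    """Sum over t of log pi(y_t | x, y_<t).

    logits:    (T, V) scores the reference model assigns at each response position.
    token_ids: (T,)   the response tokens that were actually sampled.
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log probability of the observed token at each position and sum.
    return float(log_probs[np.arange(len(token_ids)), token_ids].sum())


if __name__ == "__main__":
    # Uniform logits over a 4-token vocabulary: each token has probability 1/4.
    print(sequence_logprob(np.zeros((2, 4)), np.array([0, 3])))
```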
Computing the reward of the samples
./experiments/compute_reward.sh -d alpaca -s [NUMBER OF SAMPLES] -i stanfordnlp/SteamSHP-flan-t5-large
./experiments/compute_reward.sh -d alpaca -s [NUMBER OF SAMPLES] -i OpenAssistant/reward-model-deberta-v3-large-v2
Computing MBR-BoN and RBoN-KL
python3 mbr/compute_rbon.py --dataset alpaca --ncandidates [NUMBER OF SAMPLES]
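Both variants pick one candidate by trading reward against closeness to the reference policy. As we read the objectives, RBoN-KL maximizes R(x, y) − β·KL(δ_y ‖ π_ref), which equals R(x, y) + β·log π_ref(y|x), and RBoN-WD (MBR-BoN) maximizes R(x, y) − β·WD(δ_y, π_ref). The sketch below assumes the rewards, log probabilities, and WD values from the previous steps are already loaded as arrays; the function names are hypothetical, and mbr/compute_rbon.py may normalize rewards or sweep β differently.

```python
import numpy as np


def rbon_kl(rewards, logprobs, beta: float) -> int:
    """Index maximizing R(x, y) - beta * KL(delta_y || pi_ref), with KL = -log pi_ref(y|x)."""
    rewards = np.asarray(rewards, dtype=float)
    logprobs = np.asarray(logprobs, dtype=float)
    return int(np.argmax(rewards + beta * logprobs))


def rbon_wd(rewards, wd, beta: float) -> int:
    """Index maximizing R(x, y) - beta * WD(delta_y, empirical reference distribution)."""
    rewards = np.asarray(rewards, dtype=float)
    wd = np.asarray(wd, dtype=float)
    return int(np.argmax(rewards - beta * wd))


if __name__ == "__main__":
    rewards = [1.0, 0.9, 0.2]
    print(rbon_kl(rewards, [-50.0, -10.0, -5.0], beta=0.1))
    print(rbon_wd(rewards, [0.8, 0.2, 0.1], beta=1.0))
```

With beta = 0 both functions reduce to plain Best-of-N (the highest-reward candidate); larger beta shifts the choice toward candidates the reference model finds likely (KL) or that are central among the samples (WD), which is what makes the selection more robust to reward hacking.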
Reference
Jinnai, Y., Morimura, T., Ariu, K., and Abe, K. Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment. 2025.
Bibtex:
@misc{jinnai2025regularizedbestofnsamplingminimum,
title={Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment},
author={Yuu Jinnai and Tetsuro Morimura and Kaito Ariu and Kenshi Abe},
year={2025},
eprint={2404.01054},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2404.01054},
}
Contact
For any questions, feel free to raise an issue or contact me at jinnai_yu@cyberagent.co.jp.
Owner
- Name: CyberAgent AI Lab
- Login: CyberAgentAILab
- Kind: organization
- Location: Japan
- Website: https://cyberagent.ai/ailab/
- Twitter: cyberagent_ai
- Repositories: 7
- Profile: https://github.com/CyberAgentAILab
GitHub Events
Total
- Issues event: 2
- Watch event: 9
- Delete event: 1
- Issue comment event: 2
- Push event: 6
- Pull request event: 2
- Create event: 1
Last Year
- Issues event: 2
- Watch event: 9
- Delete event: 1
- Issue comment event: 2
- Push event: 6
- Pull request event: 2
- Create event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 1 minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 1 minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- a7217339 (1)
Pull Request Authors
- dependabot[bot] (1)
Dependencies
- absl-py *
- accelerate *
- bert_score *
- bitsandbytes *
- datasets *
- einops *
- evaluate *
- google-cloud-storage *
- nltk ==3.8.1
- peft *
- py7zr *
- rouge-score ==0.1.2
- sacremoses ==0.0.53
- scikit-learn-extra *
- sortedcontainers *
- subword-nmt *
- torchmetrics *
- transformers *
- unbabel-comet *