https://github.com/cyberagentailab/srbon

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: CyberAgentAILab
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 31.3 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License

README.md

Stochastic Regularized Best-of-N Sampling (SRBoN)

This repository contains the implementation of Stochastic Regularized Best-of-N Sampling (SRBoN).

The code has been tested with Python 3.8 and CUDA 11.0.

Setup Instructions

Step 1: Environment Setup

Create a virtual environment, then install the required dependencies:

```bash
# Create virtual environment
python3 -m venv env
source env/bin/activate

# Install dependencies
pip install -r requirements.txt
```
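After installation, a quick sanity check can confirm that the interpreter and a few of the listed dependencies are visible from the virtual environment. The following is a minimal sketch, not part of the repository: the helper name `check_environment` and the choice of packages to probe are assumptions made for illustration (POT is imported as `ot`).

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8), packages=("transformers", "datasets", "ot", "tqdm")):
    """Report the Python version and whether selected packages are importable.

    Hypothetical helper: the package list is an illustrative subset of
    requirements.txt, given as import names (POT is imported as `ot`).
    """
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # find_spec returns None for a missing top-level module instead of raising.
        "packages": {name: importlib.util.find_spec(name) is not None for name in packages},
    }


report = check_environment()
print("Python", report["python_version"], "ok" if report["python_ok"] else "too old")
for name, found in report["packages"].items():
    print(f"{name}: {'found' if found else 'missing'}")
```

This only checks that modules can be located; it does not verify versions or CUDA availability.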

Step 2: Sample Collection and Metric Computation

Collect Samples

To collect samples from the model, use the following command. You can specify the dataset, model, and number of samples:

```bash
bash ./experiments/sample.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]
```

Compute Metrics

You can compute various utility metrics such as log probability, Wasserstein distance, and token length using the following scripts:

```bash
# Compute log probability
bash ./experiments/compute_logprob.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]

# Compute Wasserstein distance
bash ./experiments/compute_wd.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]

# Compute token length
bash ./experiments/compute_length.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]
```
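This README does not say which distributions the Wasserstein script compares, so the following is only an illustration of the metric itself in its simplest setting: two one-dimensional samples of equal size, where the Wasserstein-1 distance is the mean absolute difference between the sorted values. The function name `wasserstein_1d` is hypothetical; the general case would typically be delegated to an optimal transport library such as POT, which appears in requirements.txt.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    Illustrative only: with uniform weights and equal sample counts, the
    optimal transport plan matches the i-th smallest x to the i-th smallest y.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Shifting every point by 5 moves the whole distribution by 5.
print(wasserstein_1d([0.0, 1.0, 3.0], [8.0, 5.0, 6.0]))
```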

Compute Reward Values

To compute reward values for a dataset, use the following command, specifying the dataset, the number of samples, and the reward model (passed with `-i`):

```bash
bash ./experiments/compute_reward.sh -d [DATASETS] -s [NUMBER_OF_SAMPLES] -i [REWARD_TYPE]
```

Step 3: Running SRBoN

Finally, to compute the SRBoN values, run the following script:

```bash
python3 stochastic_rbon/stochastic_rbon.py --dataset [DATASETS] --ncandidates [NUMBER_OF_SAMPLES]
```
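The internals of the script are not described in this README. As a rough illustration of what a stochastic, KL-regularized choice among N candidates can look like, the sketch below samples a candidate with probability proportional to `pi_ref(y) * exp(R(y) / beta)`, the standard closed-form solution of KL-regularized reward maximization. Whether the repository implements exactly this rule is an assumption, and the function names, the `beta` parameter, and the use of per-candidate log probabilities as the reference policy are all illustrative.

```python
import math
import random


def kl_regularized_probs(rewards, ref_logprobs, beta):
    """Selection probabilities proportional to pi_ref(y) * exp(R(y) / beta).

    Hypothetical sketch, not the repository's code. `ref_logprobs` are log
    probabilities of each candidate under the reference model, and `beta` > 0
    is the regularization strength.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if len(rewards) != len(ref_logprobs) or not rewards:
        raise ValueError("need one reward and one log probability per candidate")
    scores = [lp + r / beta for r, lp in zip(rewards, ref_logprobs)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_candidate(candidates, probs, seed=None):
    """Draw one candidate according to the given probabilities."""
    return random.Random(seed).choices(candidates, weights=probs, k=1)[0]


candidates = ["answer A", "answer B", "answer C"]
probs = kl_regularized_probs(
    rewards=[1.0, 0.2, -0.5],
    ref_logprobs=[math.log(1 / 3)] * 3,
    beta=0.5,
)
print(sample_candidate(candidates, probs, seed=0))
```

Under this rule, a small `beta` concentrates the probability mass on the highest-reward candidate, which approaches plain Best-of-N, while a large `beta` keeps the selection close to the reference distribution.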

Examples

Below is an example of running the SRBoN pipeline using the alpaca dataset, the HuggingFaceH4/mistral-7b-sft-beta model, and 100 samples.

```bash
# Collect 100 samples from the model
bash ./experiments/sample.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100

# Compute log probabilities for the collected samples
bash ./experiments/compute_logprob.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100

# Compute Wasserstein distance for the samples
bash ./experiments/compute_wd.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100

# Compute the token length of the samples
bash ./experiments/compute_length.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100
```

To compute reward values with different reward models:

```bash
# Compute reward values using the OpenAssistant reward model
bash ./experiments/compute_reward.sh -d alpaca -s 100 -i OpenAssistant/reward-model-deberta-v3-large-v2

# Compute reward values using the openbmb reward model
bash ./experiments/compute_reward.sh -d alpaca -s 100 -i openbmb/Eurus-RM-7b
```

Finally, run the SRBoN computation with 100 candidates:

```bash
python3 stochastic_rbon/stochastic_rbon.py --dataset alpaca --ncandidates 100
```
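The full example can also be assembled programmatically. The sketch below only builds and prints the command lines for the steps shown in this section; the helper `build_pipeline_commands` is hypothetical and not part of the repository, while the script paths and flags are the ones used in the examples.

```python
import shlex


def build_pipeline_commands(dataset, model, n_samples, reward_models):
    """Return the example pipeline as a list of argv lists, in execution order.

    Hypothetical helper: it mirrors the commands in this README and does not
    run anything itself.
    """
    n = str(n_samples)
    commands = [
        ["bash", f"./experiments/{script}", "-d", dataset, "-m", model, "-s", n]
        for script in ("sample.sh", "compute_logprob.sh", "compute_wd.sh", "compute_length.sh")
    ]
    for reward_model in reward_models:
        commands.append(
            ["bash", "./experiments/compute_reward.sh", "-d", dataset, "-s", n, "-i", reward_model]
        )
    commands.append(
        ["python3", "stochastic_rbon/stochastic_rbon.py", "--dataset", dataset, "--ncandidates", n]
    )
    return commands


pipeline = build_pipeline_commands(
    dataset="alpaca",
    model="HuggingFaceH4/mistral-7b-sft-beta",
    n_samples=100,
    reward_models=["OpenAssistant/reward-model-deberta-v3-large-v2", "openbmb/Eurus-RM-7b"],
)
for argv in pipeline:
    print(shlex.join(argv))
```

To execute the steps instead of printing them, each argv list could be passed to `subprocess.run(argv, check=True)` from the repository root, so that a failing step stops the pipeline.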

Reference

Ichihara, Y., Jinnai, Y., Morimura, T., Abe, K., Ariu, K., Sakamoto, M., and Uchibe, E. "Evaluation of Best-of-N Sampling Strategies for Language Model Alignment." Transactions on Machine Learning Research, 2025.

Bibtex:

```bibtex
@article{ichihara2025evaluation,
  title={Evaluation of Best-of-N Sampling Strategies for Language Model Alignment},
  author={Yuki Ichihara and Yuu Jinnai and Tetsuro Morimura and Kenshi Abe and Kaito Ariu and Mitsuki Sakamoto and Eiji Uchibe},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=H4S4ETc8c9},
  note={}
}
```

Owner

  • Name: CyberAgent AI Lab
  • Login: CyberAgentAILab
  • Kind: organization
  • Location: Japan

GitHub Events

Total
  • Watch event: 5
  • Push event: 1
  • Public event: 1
Last Year
  • Watch event: 5
  • Push event: 1
  • Public event: 1

Dependencies

requirements.txt pypi
  • POT *
  • absl-py *
  • accelerate *
  • bert_score *
  • bitsandbytes *
  • datasets *
  • einops *
  • evaluate *
  • google-cloud-storage *
  • llm-blender *
  • nltk ==3.8.1
  • peft *
  • py7zr *
  • rouge-score ==0.1.2
  • sacremoses ==0.0.53
  • scikit-learn-extra *
  • sortedcontainers *
  • subword-nmt *
  • torchmetrics *
  • tqdm *
  • transformers *
  • unbabel-comet *