https://github.com/cyberagentailab/diverse-mbr
Code of "Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding" 2024
Science Score: 13.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (codemeta.json file found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 7.3%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: CyberAgentAILab
- License: MIT
- Language: Python
- Default Branch: master
- Homepage: https://aclanthology.org/2024.findings-acl.503/
- Size: 141 KB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Diverse Minimum Bayes Risk Decoding
This repository contains the code for the experiments in Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding.
The code was tested on Ubuntu 20.04 with Python 3.8 and CUDA 11.0 (Docker image nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04). The code is provided mostly as-is, with little refactoring.
Installation
git clone git@github.com:CyberAgentAILab/diverse-mbr
cd diverse-mbr
pip install -r requirements.txt
Usage
The code runs in two steps.
1. sample.sh samples candidate outputs from the model.
2. run_mbr.sh computes the MBR output from the sampled candidates.
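The MBR step can be illustrated with a minimal sketch (this is not the repository's implementation): given candidates sampled from the model, MBR selects the one with the highest expected utility, treating the other samples as pseudo-references. A toy unigram-F1 utility stands in here for the BLEU/BERTScore-style metrics used in practice.

```python
def unigram_f1(hyp, ref):
    """Toy utility: unigram F1 overlap between two token lists."""
    hs, rs = set(hyp), set(ref)
    if not hs or not rs:
        return 0.0
    overlap = len(hs & rs)
    p, r = overlap / len(hs), overlap / len(rs)
    return 2 * p * r / (p + r) if p + r else 0.0

def mbr_decode(candidates, utility=unigram_f1):
    """Return the candidate with the highest mean utility
    against all samples (pseudo-references)."""
    def expected_utility(hyp):
        return sum(utility(hyp, ref) for ref in candidates) / len(candidates)
    return max(candidates, key=expected_utility)

samples = [
    "the cat sat on the mat".split(),
    "a cat sat on a mat".split(),
    "the cat is on the mat".split(),
    "dogs run fast".split(),
]
best = mbr_decode(samples)  # the candidate closest to the consensus
```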
Sampling candidates
./experiments/sample.sh -d [DATASET] -s [NUMBER OF SAMPLES]
Computing Diverse MBR and KMBR
./experiments/run_mbr.sh -d [DATASET] -s [NUMBER OF SAMPLES] -a [ALGORITHM]
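The diversity-aware variants select a set of outputs rather than a single one. As a rough illustration of the idea (a greedy sketch under a quality-minus-redundancy objective, not the repository's algorithm), one can pick k candidates that score high on expected utility while penalising similarity to outputs already selected:

```python
def unigram_f1(hyp, ref):
    """Toy utility: unigram F1 overlap between two token lists."""
    hs, rs = set(hyp), set(ref)
    if not hs or not rs:
        return 0.0
    overlap = len(hs & rs)
    p, r = overlap / len(hs), overlap / len(rs)
    return 2 * p * r / (p + r) if p + r else 0.0

def greedy_diverse_mbr(candidates, k, utility=unigram_f1, diversity_weight=1.0):
    """Greedily pick k candidates, trading off expected utility (quality)
    against similarity to already-selected outputs (diversity)."""
    n = len(candidates)
    exp_util = [sum(utility(h, r) for r in candidates) / n for h in candidates]
    selected, remaining = [], list(range(n))
    while remaining and len(selected) < k:
        def gain(i):
            penalty = sum(utility(candidates[i], candidates[j]) for j in selected)
            return exp_util[i] - diversity_weight * penalty
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]

samples = [
    "the cat sat on the mat".split(),
    "a cat sat on a mat".split(),
    "the cat is on the mat".split(),
    "dogs run fast".split(),
]
picked = greedy_diverse_mbr(samples, k=2)
```

With the toy data above, the second pick skips the near-duplicate paraphrases in favour of the dissimilar candidate, which is the behaviour the diversity penalty is meant to induce.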
Example on WMT'19 En-De
Use sacrebleu to prepare the benchmark dataset.
mkdir -p ./dataset/wmt19-text
sacrebleu -t wmt19 -l en-de --echo src > ./dataset/wmt19-text/wmt19.en-de.en
sacrebleu -t wmt19 -l en-de --echo ref > ./dataset/wmt19-text/wmt19.en-de.de
Sample candidates on WMT'19 En-De
./experiments/sample.sh -d wmt19.en-de
Computing Diverse MBR and K-Medoid MBR on WMT'19 En-De
./experiments/run_mbr.sh -d wmt19.en-de -m wmt19-en-de -a diverse
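One common way to quantify the diversity of a selected output set is distinct-n, the fraction of unique n-grams across the outputs (the paper's exact evaluation metrics may differ; this is only an illustrative sketch):

```python
def distinct_n(texts, n=2):
    """Fraction of unique n-grams across a set of outputs;
    higher values indicate more diverse outputs."""
    ngrams = []
    for t in texts:
        toks = t.split()
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

For example, two identical outputs score 0.5 under distinct-2, while two outputs sharing no bigrams score 1.0.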
Reference
Bibtex:
@inproceedings{jinnai-etal-2024-generating,
title = "Generating Diverse and High-Quality Texts by Minimum {B}ayes Risk Decoding",
author = "Jinnai, Yuu and
Honda, Ukyo and
Morimura, Tetsuro and
Zhang, Peinan",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.503",
pages = "8494--8525",
}
Contact
For any questions, feel free to raise an issue or contact me at jinnai_yu@cyberagent.co.jp.
Acknowledgements
The MS COCO dataset is licensed under a Creative Commons Attribution 4.0 License.
Owner
- Name: CyberAgent AI Lab
- Login: CyberAgentAILab
- Kind: organization
- Location: Japan
- Website: https://cyberagent.ai/ailab/
- Twitter: cyberagent_ai
- Repositories: 7
- Profile: https://github.com/CyberAgentAILab
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Dependencies
- absl-py *
- accelerate *
- bert_score ==0.3.13
- bitsandbytes ==0.40.2
- datasets *
- einops *
- evaluate *
- google-cloud-storage *
- nltk ==3.8.1
- peft ==0.7.1
- py7zr *
- rouge-score ==0.1.2
- sacremoses ==0.0.53
- scikit-learn-extra ==0.3.0
- sortedcontainers *
- subword-nmt ==0.3.8
- torchmetrics ==0.10.3
- transformers *
- unbabel-comet *