Science Score: 67.0%
This score indicates how likely this project is to be science-related, based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.2%) to scientific vocabulary
Keywords
Repository
A library for minimum Bayes risk (MBR) decoding
Basic Info
Statistics
- Stars: 45
- Watchers: 4
- Forks: 7
- Open Issues: 0
- Releases: 7
Topics
Metadata Files
README.md
mbrs is a library for minimum Bayes risk (MBR) decoding.
Paper | Reference docs | Citation | Release notes
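Conceptually, MBR decoding selects the hypothesis with the highest *expected utility* over a set of pseudo-references, rather than the single most probable model output. A toy sketch of that rule, using a simple unigram-F1 utility instead of one of the library's real metrics (all names here are illustrative, not mbrs API):

```python
def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility: F1 overlap of whitespace-separated tokens."""
    h, r = set(hyp.split()), set(ref.split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    p, rec = overlap / len(h), overlap / len(r)
    return 2 * p * rec / (p + rec)

def mbr_decode(hypotheses, pseudo_references):
    """Return the hypothesis maximizing average utility over pseudo-references."""
    def expected_utility(hyp):
        return sum(unigram_f1(hyp, ref) for ref in pseudo_references) / len(pseudo_references)
    return max(hypotheses, key=expected_utility)

hyps = ["Thank you", "Thanks", "Thank you so much"]
# As in the library's examples, the hypotheses double as pseudo-references.
best = mbr_decode(hyps, hyps)
```

Real MBR decoders replace the toy utility with metrics such as BLEU or COMET and estimate the expectation over sampled references.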
Installation
You can install from PyPI:

```bash
pip install mbrs
```
For developers, it can be installed from source:

```bash
git clone https://github.com/naist-nlp/mbrs.git
cd mbrs/
pip install ./
```
For uv users:

```bash
git clone https://github.com/naist-nlp/mbrs.git
cd mbrs/
uv sync
```
Quick start
mbrs provides two interfaces: a command-line interface (CLI) and a Python API.
Command-line interface
The CLI runs MBR decoding directly from the command line. Before running
MBR decoding, generate hypothesis sentences with `mbrs-generate`:

```bash
mbrs-generate \
  sources.txt \
  --output hypotheses.txt \
  --lang_pair en-de \
  --model facebook/m2m100_418M \
  --num_candidates 1024 \
  --sampling eps --epsilon 0.02 \
  --batch_size 8 --sampling_size 8 --fp16 \
  --report_format rounded_outline
```
Beam search can also be used by replacing
`--sampling eps --epsilon 0.02` with `--beam_size 10`.
Next, MBR decoding and other decoding methods can be executed with
mbrs-decode. This example regards the hypothesis set as the
pseudo-reference set.
```bash
mbrs-decode \
  hypotheses.txt \
  --num_candidates 1024 \
  --nbest 1 \
  --source sources.txt \
  --references hypotheses.txt \
  --output translations.txt \
  --report report.txt --report_format rounded_outline \
  --decoder mbr \
  --metric comet \
  --metric.model Unbabel/wmt22-comet-da \
  --metric.batch_size 64 --metric.fp16 true
```
You can also pass the arguments via a YAML configuration file using the
`--config_path` option. See the docs for details.
Finally, you can evaluate the score with mbrs-score:
```bash
mbrs-score \
  hypotheses.txt \
  --sources sources.txt \
  --references hypotheses.txt \
  --format json \
  --metric bleurt \
  --metric.batch_size 64 --metric.fp16 true
```
Python API
This is an example of COMET-MBR via the Python API.

```python
from mbrs.metrics import MetricCOMET
from mbrs.decoders import DecoderMBR

SOURCE = "ありがとう"
HYPOTHESES = ["Thanks", "Thank you", "Thank you so much", "Thank you.", "thank you"]

# Setup COMET.
metric_cfg = MetricCOMET.Config(
    model="Unbabel/wmt22-comet-da",
    batch_size=64,
    fp16=True,
)
metric = MetricCOMET(metric_cfg)

# Setup MBR decoding.
decoder_cfg = DecoderMBR.Config()
decoder = DecoderMBR(decoder_cfg, metric)

# Decode by COMET-MBR.
# This example regards the hypotheses themselves as the pseudo-references.
# Args: (hypotheses, pseudo-references, source)
output = decoder.decode(HYPOTHESES, HYPOTHESES, source=SOURCE, nbest=1)

print(f"Selected index: {output.idx}")
print(f"Output sentence: {output.sentence}")
print(f"Expected score: {output.score}")
```
List of implemented methods
Metrics
Currently, the following metrics are supported:

- BLEU (Papineni et al., 2002): `bleu`
- TER (Snover et al., 2006): `ter`
- chrF (Popović et al., 2015): `chrf`
- COMET (Rei et al., 2020): `comet`
- COMETkiwi (Rei et al., 2022): `cometkiwi`
- XCOMET (Guerreiro et al., 2023): `xcomet`
- XCOMET-lite (Larionov et al., 2024): `xcomet` with `--metric.model="myyycroft/XCOMET-lite"`
- BLEURT (Sellam et al., 2020): `bleurt` (thanks to @lucadiliello)
- MetricX (Juraska et al., 2023; Juraska et al., 2024): `metricx`
- BERTScore (Zhang et al., 2020): `bertscore`
Decoders
The following decoding methods are implemented:

- N-best reranking: `rerank`
- MBR decoding: `mbr`

Specifically, the following methods of MBR decoding are included:

- Expectation estimation:
  - Monte Carlo estimation (Eikema and Aziz, 2020; Eikema and Aziz, 2022)
  - Model-based estimation (Jinnai et al., 2024): `--reference_lprobs` option
- Efficient methods:
  - Confidence-based pruning (Cheng and Vlachos, 2023): `pruning_mbr`
  - Reference aggregation (DeNero et al., 2009; Vamvas and Sennrich, 2024): `aggregate_mbr`
    - N-gram aggregation on BLEU (DeNero et al., 2009)
    - N-gram aggregation on chrF (Vamvas and Sennrich, 2024)
    - Embedding aggregation on COMET (Vamvas and Sennrich, 2024; Deguchi et al., 2024)
  - Centroid-based MBR (Deguchi et al., 2024): `centroid_mbr`
  - Probabilistic MBR (Trabelsi et al., 2024): `probabilistic_mbr`
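The efficiency gain from reference aggregation comes from collapsing the pseudo-reference set into a single aggregate representation, so each hypothesis is scored once instead of once per reference. A toy sketch of the idea using averaged unigram counts (illustrative only, not the library's BLEU/chrF implementation):

```python
from collections import Counter

def aggregate_refs(references):
    """Average unigram count vectors over all pseudo-references."""
    agg = Counter()
    for ref in references:
        agg.update(Counter(ref.split()))
    return {tok: c / len(references) for tok, c in agg.items()}

def clipped_precision(hyp, agg_counts):
    """Score a hypothesis once against the aggregated counts."""
    counts = Counter(hyp.split())
    matched = sum(min(c, agg_counts.get(tok, 0.0)) for tok, c in counts.items())
    return matched / max(sum(counts.values()), 1)

refs = ["the cat sat", "the cat sat down", "a cat sat"]
agg = aggregate_refs(refs)  # one pass over the references
# Each candidate now needs only a single metric call against `agg`.
best = max(refs, key=lambda h: clipped_precision(h, agg))
```

This turns the O(|hypotheses| × |references|) pairwise scoring of vanilla MBR into O(|hypotheses|) scoring passes.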
Selectors
The final output list is selected according to these selectors:

- N-best selection: `nbest`
- Diverse selection (Jinnai et al., 2024): `diverse`
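To illustrate the idea behind diverse selection, here is a toy greedy trade-off between utility and similarity to already-selected outputs. This is only a sketch of the concept, not the algorithm of Jinnai et al. (2024) or mbrs's implementation; all names are hypothetical:

```python
def select_diverse(candidates, scores, k=2, lam=0.5):
    """Greedily pick k outputs, penalizing overlap with earlier picks."""
    def sim(a, b):
        # Jaccard similarity over whitespace tokens.
        sa, sb = set(a.split()), set(b.split())
        return len(sa & sb) / max(len(sa | sb), 1)

    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < k:
        def gain(i):
            penalty = max((sim(candidates[i], candidates[j]) for j in selected),
                          default=0.0)
            return scores[i] - lam * penalty
        pick = max(remaining, key=gain)
        selected.append(pick)
        remaining.remove(pick)
    return [candidates[i] for i in selected]

cands = ["Thank you", "Thank you.", "Thanks a lot"]
out = select_diverse(cands, [0.9, 0.85, 0.7], k=2)
# Despite its lower score, "Thanks a lot" is chosen second because
# "Thank you." overlaps too heavily with the first pick.
```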
Related projects
- mbr
  - Highly integrated with Hugging Face transformers by customizing the
    generate() method of the model implementation.
  - If you are looking for an MBR decoding library that is fully integrated
    into transformers, this might be a good choice.
  - Our mbrs works standalone; thus, not only transformers but also fairseq
    or LLM outputs obtained via API can be used.
Citation
If you use this software, please cite:
```bibtex
@inproceedings{deguchi-etal-2024-mbrs,
    title = "mbrs: A Library for Minimum {B}ayes Risk Decoding",
    author = "Deguchi, Hiroyuki and
      Sakai, Yusuke and
      Kamigaito, Hidetaka and
      Watanabe, Taro",
    editor = "Hernandez Farias, Delia Irazu and
      Hope, Tom and
      Li, Manling",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.37",
    pages = "351--362",
}
```
License
This library is mainly developed by Hiroyuki Deguchi and published under the MIT license.
Owner
- Name: naist-nlp
- Login: naist-nlp
- Kind: organization
- Repositories: 1
- Profile: https://github.com/naist-nlp
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Deguchi
given-names: Hiroyuki
orcid: https://orcid.org/0000-0003-2127-6607
- family-names: Sakai
  given-names: Yusuke
- family-names: Kamigaito
  given-names: Hidetaka
- family-names: Watanabe
  given-names: Taro
title: "mbrs: A Library for Minimum Bayes Risk Decoding"
date-released: 2024-06-16
preferred-citation:
type: misc
authors:
- family-names: Deguchi
given-names: Hiroyuki
orcid: https://orcid.org/0000-0003-2127-6607
  - family-names: Sakai
    given-names: Yusuke
  - family-names: Kamigaito
    given-names: Hidetaka
  - family-names: Watanabe
    given-names: Taro
title: "mbrs: A Library for Minimum Bayes Risk Decoding"
eprint: 2408.04167
archivePrefix: arXiv
primaryClass: cs.CL
url: https://arxiv.org/abs/2408.04167
month: 8
year: 2024
GitHub Events
Total
- Release event: 4
- Watch event: 20
- Delete event: 10
- Push event: 29
- Pull request event: 25
- Fork event: 7
- Create event: 17
Last Year
- Release event: 4
- Watch event: 20
- Delete event: 10
- Push event: 29
- Pull request event: 25
- Fork event: 7
- Create event: 17
Packages
- Total packages: 1
- Total downloads:
  - pypi: 154 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 8
- Total maintainers: 1
pypi.org: mbrs
A library for minimum Bayes risk (MBR) decoding.
- Documentation: https://mbrs.readthedocs.io/
- License: mit
- Latest release: 0.1.7 (published 9 months ago)
Rankings
Maintainers (1)
Dependencies
- mypy ^1.8.0 develop
- ptpython ^3.0.25 develop
- pytest ^7.4.4 develop
- pytest-cov ^4.1.0 develop
- ruff ^0.4.4 develop
- numpy ^1.26.3
- python ^3.10
- sacrebleu ^2.4.0
- simple-parsing ^0.1.5
- tabulate ^0.9.0
- torch ^2.1.2
- tqdm ^4.66.1
- unbabel-comet ^2.2.1
- actions/checkout v4 composite
- actions/setup-python v3 composite
- JRubics/poetry-publish v2.0 composite
- actions/checkout v4 composite