https://github.com/awslabs/mlm-scoring
Python library & examples for Masked Language Model Scoring (ACL 2020)
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.4%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: awslabs
- License: apache-2.0
- Language: Python
- Default Branch: master
- Homepage: https://www.aclweb.org/anthology/2020.acl-main.240/
- Size: 22.9 MB
Statistics
- Stars: 342
- Watchers: 14
- Forks: 60
- Open Issues: 11
- Releases: 0
Topics
Metadata Files
README.md
Masked Language Model Scoring
This package uses masked LMs like BERT, RoBERTa, and XLM to score sentences and rescore n-best lists via pseudo-log-likelihood (PLL) scores, which are computed by masking individual words. We also support autoregressive LMs like GPT-2. Example uses include:
- Speech Recognition: rescoring an ESPnet LAS model (LibriSpeech)
- Machine Translation: rescoring a Transformer NMT model (IWSLT'15 en-vi)
- Linguistic Acceptability: unsupervised ranking within linguistic minimal pairs (BLiMP)
Paper: Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff. "Masked Language Model Scoring", ACL 2020.
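For intuition, PLL masks one position at a time and sums the log-probabilities the model assigns to the true tokens. A minimal standalone sketch with 🤗 Transformers, illustrative only and not this package's (batched, multi-GPU) implementation:

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased").eval()

def pll(sentence: str) -> float:
    """Pseudo-log-likelihood: mask each token in turn and sum the
    log-probabilities of the true tokens (illustrative, unoptimized)."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

print(pll("Hello world!"))  # should land close to the scorer outputs below
```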

Installation
Python 3.6+ is required. Clone this repository and install:
```bash
pip install -e .
pip install torch mxnet-cu102mkl  # Replace w/ your CUDA version; mxnet-mkl if CPU only.
```
Some models are provided via GluonNLP and others via 🤗 Transformers, so for now both MXNet and PyTorch are required. You can then import the library directly:
```python
from mlm.scorers import MLMScorer, MLMScorerPT, LMScorer
from mlm.models import get_pretrained
import mxnet as mx

ctxs = [mx.cpu()]  # or, e.g., [mx.gpu(0), mx.gpu(1)]

# MXNet MLMs (use names from mlm.models.SUPPORTED_MLMS)
model, vocab, tokenizer = get_pretrained(ctxs, 'bert-base-en-cased')
scorer = MLMScorer(model, vocab, tokenizer, ctxs)
print(scorer.score_sentences(["Hello world!"]))
# >> [-12.410664200782776]
print(scorer.score_sentences(["Hello world!"], per_token=True))
# >> [[None, -6.126736640930176, -5.501412391662598, -0.7825151681900024, None]]

# EXPERIMENTAL: PyTorch MLMs (use names from https://huggingface.co/transformers/pretrained_models.html)
model, vocab, tokenizer = get_pretrained(ctxs, 'bert-base-cased')
scorer = MLMScorerPT(model, vocab, tokenizer, ctxs)
print(scorer.score_sentences(["Hello world!"]))
# >> [-12.411025047302246]
print(scorer.score_sentences(["Hello world!"], per_token=True))
# >> [[None, -6.126738548278809, -5.501765727996826, -0.782496988773346, None]]

# MXNet LMs (use names from mlm.models.SUPPORTED_LMS)
model, vocab, tokenizer = get_pretrained(ctxs, 'gpt2-117m-en-cased')
scorer = LMScorer(model, vocab, tokenizer, ctxs)
print(scorer.score_sentences(["Hello world!"]))
# >> [-15.995375633239746]
print(scorer.score_sentences(["Hello world!"], per_token=True))
# >> [[-8.293947219848633, -6.387561798095703, -1.3138668537139893]]
```

(MXNet and PyTorch interfaces will be unified soon!)
Scoring
Run `mlm score --help` to see supported models, etc. See `examples/demo/format.json` for the file format. For inputs, the "score" field is optional; outputs will add "score" fields containing PLL scores.
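For illustration only, such a file maps utterance IDs to hypotheses carrying text and optional scores. The key names below are hypothetical; the authoritative schema is `examples/demo/format.json` in the repository:

```python
import json

# HYPOTHETICAL sketch of an n-best input file for `mlm score`; consult
# examples/demo/format.json for the actual schema and key names.
nbest = {
    "utt-001": {
        "hyp-1": {"text": "hello world", "score": -3.2},  # "score" optional on input
        "hyp-2": {"text": "hello word"},                   # PLL score added on output
    },
}
print(json.dumps(nbest, indent=2))
```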
There are three score types, depending on the model:
- Pseudo-log-likelihood score (PLL): BERT, RoBERTa, multilingual BERT, XLM, ALBERT, DistilBERT
- Maskless PLL score: same models as above (add `--no-mask`)
- Log-probability score: GPT-2
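In the paper's notation, the PLL of a sentence $W$ sums conditional log-probabilities with each token masked in turn, while an autoregressive LM like GPT-2 conditions only on the left context:

```latex
\mathrm{PLL}(W) = \sum_{t=1}^{|W|} \log P_{\mathrm{MLM}}(w_t \mid W_{\setminus t}),
\qquad
\log P(W) = \sum_{t=1}^{|W|} \log P_{\mathrm{LM}}(w_t \mid W_{<t}).
```

The maskless variant computes the same sum in a single pass, using a model finetuned so that no actual masking is needed (see Maskless finetuning below).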
We score hypotheses for 3 utterances of LibriSpeech dev-other on GPU 0 using BERT base (uncased):
```bash
mlm score \
    --mode hyp \
    --model bert-base-en-uncased \
    --max-utts 3 \
    --gpus 0 \
    examples/asr-librispeech-espnet/data/dev-other.am.json \
    > examples/demo/dev-other-3.lm.json
```
Rescoring
One can rescore n-best lists via log-linear interpolation; a minimal sketch of the combination follows the example below. Run `mlm rescore --help` to see all options. The first input is a file with the original scores; the second is the score file produced by `mlm score`.
We rescore the acoustic scores (from `dev-other.am.json`) using BERT's scores (from the previous section), under different LM weights:
```bash
for weight in 0 0.5 ; do
    echo "lambda=${weight}"
    mlm rescore \
        --model bert-base-en-uncased \
        --weight ${weight} \
        examples/asr-librispeech-espnet/data/dev-other.am.json \
        examples/demo/dev-other-3.lm.json \
        > examples/demo/dev-other-3.lambda-${weight}.json
done
```
The original WER is 12.2% while the rescored WER is 8.5%.
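The combination itself is a one-liner. A minimal sketch, assuming the convention combined = (1 − λ)·acoustic + λ·LM (check the package source for the exact convention behind `--weight`; the hypothesis scores below are made up):

```python
# Log-linear interpolation of acoustic and (pseudo-)LM scores.
# ASSUMPTION: combined = (1 - weight) * am + weight * lm.
def rescore(am_score: float, lm_score: float, weight: float) -> float:
    return (1.0 - weight) * am_score + weight * lm_score

# Hypothetical per-hypothesis (acoustic, LM) scores; with weight=0 the
# acoustic ranking is unchanged, with weight=0.5 the LM can flip it.
hyps = {"hello world": (-4.1, -12.4), "hello word": (-3.9, -18.7)}
for weight in (0.0, 0.5):
    best = max(hyps, key=lambda h: rescore(*hyps[h], weight))
    print(f"lambda={weight}: {best}")
```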
Maskless finetuning
One can finetune masked LMs to give usable PLL scores without masking. See LibriSpeech maskless finetuning.
Development
Run `pip install -e .[dev]` to install extra testing packages. Then:
- To run unit tests and coverage, run `pytest --cov=src/mlm` in the root directory.
Owner
- Name: Amazon Web Services - Labs
- Login: awslabs
- Kind: organization
- Location: Seattle, WA
- Website: http://amazon.com/aws/
- Repositories: 914
- Profile: https://github.com/awslabs
GitHub Events
Total
- Watch event: 10
- Fork event: 2
Last Year
- Watch event: 10
- Fork event: 2
Issues and Pull Requests
Last synced: almost 2 years ago
All Time
- Total issues: 21
- Total pull requests: 3
- Average time to close issues: about 1 month
- Average time to close pull requests: 2 days
- Total issue authors: 18
- Total pull request authors: 3
- Average comments per issue: 1.48
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: 18 days
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- gerardb7 (2)
- mfelice (2)
- Tanesan (2)
- pstroe (1)
- ajd12342 (1)
- BarahFazili (1)
- david-waterworth (1)
- trangtv57 (1)
- orenpapers (1)
- ksoky (1)
- yuchenlin (1)
- dsorato (1)
- VP007-py (1)
- aflah02 (1)
- sb1992 (1)
Pull Request Authors
- ju-resplande (1)
- zolastro (1)
- ruanchaves (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- gluonnlp *
- mosestokenizer *
- regex *
- sacrebleu *
- transformers *