https://github.com/amazon-science/machine-translation-gender-eval
Data and code for the MT-GenEval benchmark
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.6%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 9
- Watchers: 12
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
MT-GenEval
This repository contains the data and code for the MT-GenEval benchmark, which evaluates gender translation accuracy on English -> {Arabic, French, German, Hindi, Italian, Portuguese, Russian, Spanish}. The MT-GenEval benchmark was released in the EMNLP 2022 paper MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation by Anna Currey, Maria Nadejde, Raghavendra Pappagari, Mia Mayer, Stanislas Lauly, Xing Niu, Benjamin Hsu, and Georgiana Dinu.
Citing
@inproceedings{currey-etal-2022-mtgeneval,
title = "{MT-GenEval}: {A} Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation",
author = "Currey, Anna and
Nadejde, Maria and
Pappagari, Raghavendra and
Mayer, Mia and
Lauly, Stanislas and
Niu, Xing and
Hsu, Benjamin and
Dinu, Georgiana",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/pdf/2211.01355.pdf",
}
Data
The data is originally sourced from Wikipedia.
We include sentence-level development and test segments in data/sentences/ and inter-sentence test segments in data/context/.
Compute accuracy
To compute accuracy, use the accuracy_metric.py script.
Example usage for the English-Russian contextual test dataset:
python3 accuracy_metric.py \
--target_lang ru \
--dataset contextual \
--data_split test \
--hyp PATH_FOR_YOUR_SYSTEM_TRANSLATIONS
Example usage for the English-Russian counterfactual test dataset:
python3 accuracy_metric.py \
--target_lang ru \
--dataset counterfactual \
--data_split test \
--hyp_masculine PATH_FOR_YOUR_SYSTEM_TRANSLATIONS_FOR_MASCULINE_SEGMENTS \
--hyp_feminine PATH_FOR_YOUR_SYSTEM_TRANSLATIONS_FOR_FEMININE_SEGMENTS
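The two invocations above differ only in which hypothesis flags they pass. When scoring several systems, it can help to assemble the command programmatically. The following is a minimal sketch, not part of the repository: the helper name build_command and its parameters are hypothetical, and it uses only the flags shown in the examples above (--target_lang, --dataset, --data_split, --hyp, --hyp_masculine, --hyp_feminine). It does not validate language codes or split names, since the values accepted by accuracy_metric.py beyond those shown are not documented here.

```python
from typing import List, Optional


def build_command(
    target_lang: str,
    dataset: str,
    data_split: str,
    hyp: Optional[str] = None,
    hyp_masculine: Optional[str] = None,
    hyp_feminine: Optional[str] = None,
    script: str = "accuracy_metric.py",
) -> List[str]:
    """Build the argument list for accuracy_metric.py (hypothetical helper).

    Only the flags that appear in the README examples are used.
    """
    cmd = [
        "python3", script,
        "--target_lang", target_lang,
        "--dataset", dataset,
        "--data_split", data_split,
    ]
    if dataset == "contextual":
        # The contextual example takes a single hypothesis file.
        if hyp is None:
            raise ValueError("contextual evaluation requires hyp")
        cmd += ["--hyp", hyp]
    elif dataset == "counterfactual":
        # The counterfactual example takes one file per gender.
        if hyp_masculine is None or hyp_feminine is None:
            raise ValueError(
                "counterfactual evaluation requires hyp_masculine and hyp_feminine"
            )
        cmd += ["--hyp_masculine", hyp_masculine, "--hyp_feminine", hyp_feminine]
    else:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root. Building a list rather than a shell string avoids quoting problems when hypothesis paths contain spaces.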
Security
See CONTRIBUTING for more information.
License
The data and code are released under the CC-BY-SA-3.0 License. See LICENSE for details.
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
- Issues event: 1
- Watch event: 1
- Issue comment event: 1
- Fork event: 1
Last Year
- Issues event: 1
- Watch event: 1
- Issue comment event: 1
- Fork event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 2
- Total pull requests: 1
- Average time to close issues: 16 days
- Average time to close pull requests: 16 days
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 0.5
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- st-vincent1 (1)
- Sijia324 (1)
Pull Request Authors
- pappagari (1)