https://github.com/ai4bharat/indicmt-eval

IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages, ACL 2023

Repository

Basic Info
  • Host: GitHub
  • Owner: AI4Bharat
  • Language: HTML
  • Default Branch: master
  • Homepage:
  • Size: 8.64 MB
Statistics
  • Stars: 4
  • Watchers: 4
  • Forks: 0
  • Open Issues: 3
  • Releases: 0
Created about 3 years ago · Last pushed almost 2 years ago

README.md

IndicMT-Eval

This repository contains the code for the paper "IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages", to appear at ACL 2023.

Overview

We contribute a Multidimensional Quality Metrics (MQM) dataset for Indian languages, created by taking the outputs of 7 popular MT systems and asking human annotators to judge the quality of the translations using MQM-style guidelines. Using this annotated data, we report how well 16 metrics of various types evaluate en-xx translations for 5 Indian languages. We also provide an updated metric, Indic-COMET, which not only correlates more strongly with human judgements on Indian languages but is also more robust to perturbations.

Please find more details of this work in our paper (link coming soon).

MQM Dataset

The MQM-annotated dataset, collected with the help of language experts for the 5 Indian languages (Hindi, Tamil, Marathi, Malayalam, Gujarati), can be downloaded from here (link coming soon).

An MQM annotation contains the source, the reference, and the translated output, with error spans demarcated by the annotator. An example looks like the following:

[Figure: MQM-example]
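
As a rough illustration of the record described above, the sketch below models one annotated segment in Python. The class and field names (`ErrorSpan`, `MQMAnnotation`, `category`, `severity`) and the toy strings are hypothetical and are not the schema of the released dataset; check the data folder for the actual format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ErrorSpan:
    """One annotator-marked error; all field names are hypothetical."""
    start: int      # character offset into the translation (inclusive)
    end: int        # character offset into the translation (exclusive)
    category: str   # an error type label, e.g. "accuracy" or "fluency"
    severity: str   # a severity label, e.g. "minor" or "major"


@dataclass
class MQMAnnotation:
    """Source, reference, and system output plus the marked error spans."""
    source: str
    reference: str
    translation: str
    errors: List[ErrorSpan] = field(default_factory=list)

    def error_texts(self) -> List[str]:
        # Recover the text of each marked span from its offsets.
        return [self.translation[e.start:e.end] for e in self.errors]


# Toy placeholder strings (invented, not taken from the dataset).
example = MQMAnnotation(
    source="SOURCE TEXT",
    reference="REFERENCE TEXT",
    translation="OUTPUT WITH WRONG WORD",
    errors=[ErrorSpan(start=12, end=17, category="accuracy", severity="major")],
)
```

Storing offsets rather than copied substrings keeps each span tied to its position in the output, which matters when the same word occurs more than once.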

More details regarding the instructions provided and the procedures followed for annotations are present in the paper.

Setup

Load the data

The easiest way to access or view the data is to visit this link. More details are in the data folder (`cd data`).

Indic Comet

We load the pretrained encoder and initialize it with XLM-RoBERTa, COMET-DA, or COMET-MQM weights. During training, we divide the model parameters into two groups: the encoder parameters, which belong to the encoder model, and the regressor parameters, which belong to the top feed-forward network. We apply gradual unfreezing and discriminative learning rates: the encoder is frozen for the first epoch while only the feed-forward network is optimized with its own learning rate, and after the first epoch the entire model is fine-tuned with a different learning rate. Since we are fine-tuning on a small dataset, we use early stopping with a patience of 3. The best checkpoint is selected using the overall Kendall-tau correlation on the test set. We use the COMET repository for training, and our checkpoints are compatible with its setup.
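
The schedule described above can be sketched in plain Python, independent of any training framework. This is only one reading of the description, not the actual training code: the learning-rate values and the names `HEAD_LR`, `ENCODER_LR`, `learning_rates`, and `best_epoch_with_early_stopping` are assumptions, since the README does not state the learning rates.

```python
from typing import Dict, List, Optional

# Hypothetical values; the README does not give the actual learning rates.
HEAD_LR = 1e-5       # regressor (feed-forward) learning rate
ENCODER_LR = 1e-6    # encoder learning rate once unfrozen
FROZEN_EPOCHS = 1    # encoder is frozen for the first epoch
PATIENCE = 3         # early-stopping patience stated in the README


def learning_rates(epoch: int) -> Dict[str, Optional[float]]:
    """Return the learning rate per parameter group for a 0-indexed epoch.

    None means the group is frozen (it receives no gradient updates).
    """
    if epoch < FROZEN_EPOCHS:
        return {"encoder": None, "regressor": HEAD_LR}
    return {"encoder": ENCODER_LR, "regressor": HEAD_LR}


def best_epoch_with_early_stopping(scores: List[float],
                                   patience: int = PATIENCE) -> int:
    """Simulate early stopping over per-epoch Kendall-tau scores.

    Returns the index of the best epoch seen before training would stop.
    """
    best_epoch, best_score, bad_epochs = 0, float("-inf"), 0
    for epoch, score in enumerate(scores):
        if score > best_score:
            best_epoch, best_score, bad_epochs = epoch, score, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch


# Toy scores: training stops after epoch 4, so the late 0.40 is never reached.
chosen = best_epoch_with_early_stopping([0.30, 0.35, 0.34, 0.33, 0.32, 0.40])
```

In a real run the two groups would map to separate optimizer parameter groups, and the frozen state would be implemented by disabling gradients on the encoder.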

Download the best checkpoint here

| MQM | DA |
| ---- | --- |
| indic-comet-mqm | indic-comet-da |
| hparams.yaml | hparams.yaml |
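
Because the checkpoints are compatible with the COMET setup, scoring with a downloaded checkpoint might look like the sketch below. It assumes a COMET 2.x installation (`unbabel-comet`), where `load_from_checkpoint` and `predict` are available and the prediction exposes `scores` and `system_score`. The helper names and the checkpoint path are hypothetical, and the matching `hparams.yaml` needs to be placed where COMET expects it relative to the checkpoint.

```python
from typing import Dict, List


def build_comet_inputs(sources: List[str], translations: List[str],
                       references: List[str]) -> List[Dict[str, str]]:
    """Pack parallel lists into the list-of-dicts format COMET expects."""
    if not (len(sources) == len(translations) == len(references)):
        raise ValueError("sources, translations, and references must align")
    return [{"src": s, "mt": m, "ref": r}
            for s, m, r in zip(sources, translations, references)]


def score_with_checkpoint(checkpoint_path: str,
                          samples: List[Dict[str, str]]):
    """Score samples with a downloaded checkpoint (needs unbabel-comet)."""
    # Imported lazily so the helper above works without COMET installed.
    from comet import load_from_checkpoint

    model = load_from_checkpoint(checkpoint_path)
    # gpus=0 runs on CPU; batch_size=8 is an arbitrary choice.
    output = model.predict(samples, batch_size=8, gpus=0)
    return output.scores, output.system_score


samples = build_comet_inputs(["Hello."], ["नमस्ते।"], ["नमस्ते।"])
# Hypothetical path; replace with the location of the downloaded checkpoint:
# scores, system_score = score_with_checkpoint("path/to/checkpoint.ckpt", samples)
```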

Other Metrics

We use the metric implementations from the following repositories. For BLEU, METEOR, ROUGE-L, CIDEr, Embedding Averaging, Greedy Matching, and Vector Extrema, we use the implementation provided by Sharma et al. (2017). For chrF++, TER, BERTScore, and BLEURT, we use the repository of Castro Ferreira et al. (2020). For SMS, WMDo, and Mover-Score, we use the implementation provided by Fabbri et al. (2020). For all the remaining task-specific metrics, we use the official code from the respective papers.


The Python script `code/evaluate.py` runs all of these metrics on the given dataset.
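
Once metric scores are available, meta-evaluation amounts to correlating them with the human scores. The sketch below is a dependency-free Kendall tau-b, written for illustration only; it is not taken from `code/evaluate.py`, and the function name and toy scores are assumptions.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-b between two equally long score lists (O(n^2) pairs)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: counted in neither tie term
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    denom = sqrt((pairs + ties_x) * (pairs + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Toy scores (invented): one metric score and one human score per segment.
metric_scores = [0.9, 0.4, 0.7, 0.2]
human_scores = [4.5, 2.0, 5.0, 1.0]
tau = kendall_tau_b(metric_scores, human_scores)
```

For large datasets, `scipy.stats.kendalltau` is a faster alternative to this quadratic loop.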

Citation

If you find IndicMTEval useful in your research or work, please consider citing our paper.

```
@article{DBLP:journals/corr/abs-2212-10180,
  author  = {Ananya B. Sai and Tanay Dixit and Vignesh Nagarajan and Anoop Kunchukuttan and Pratyush Kumar and Mitesh M. Khapra and Raj Dabre},
  title   = {IndicMT Eval: {A} Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages},
  journal = {CoRR},
  volume  = {abs/2212.10180},
  year    = {2022}
}

@article{singh2024good,
  title   = {How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?},
  author  = {Singh, Anushka and Sai, Ananya B and Dabre, Raj and Puduppully, Ratish and Kunchukuttan, Anoop and Khapra, Mitesh M},
  journal = {arXiv preprint arXiv:2406.03893},
  year    = {2024}
}
```

Owner

  • Name: AI4Bhārat
  • Login: AI4Bharat
  • Kind: organization
  • Email: opensource@ai4bharat.org
  • Location: India

Artificial-Intelligence-For-Bhārat : Building open-source AI solutions for India!


Dependencies

COMET/poetry.lock pypi
  • aiohttp 3.9.1
  • aiosignal 1.3.1
  • async-timeout 4.0.3
  • attrs 23.2.0
  • certifi 2023.11.17
  • charset-normalizer 3.3.2
  • colorama 0.4.6
  • coverage 5.5
  • entmax 1.2
  • filelock 3.13.1
  • frozenlist 1.4.1
  • fsspec 2023.12.2
  • huggingface-hub 0.20.2
  • idna 3.6
  • importlib-metadata 7.0.1
  • jinja2 3.1.2
  • joblib 1.3.2
  • jsonargparse 3.13.1
  • lightning-utilities 0.10.0
  • lxml 5.1.0
  • markdown 3.5.1
  • markupsafe 2.1.3
  • mpmath 1.3.0
  • multidict 6.0.4
  • networkx 3.1
  • numpy 1.24.4
  • nvidia-cublas-cu12 12.1.3.1
  • nvidia-cuda-cupti-cu12 12.1.105
  • nvidia-cuda-nvrtc-cu12 12.1.105
  • nvidia-cuda-runtime-cu12 12.1.105
  • nvidia-cudnn-cu12 8.9.2.26
  • nvidia-cufft-cu12 11.0.2.54
  • nvidia-curand-cu12 10.3.2.106
  • nvidia-cusolver-cu12 11.4.5.107
  • nvidia-cusparse-cu12 12.1.0.106
  • nvidia-nccl-cu12 2.18.1
  • nvidia-nvjitlink-cu12 12.3.101
  • nvidia-nvtx-cu12 12.1.105
  • packaging 23.2
  • pandas 2.0.3
  • portalocker 2.8.2
  • protobuf 4.25.1
  • python-dateutil 2.8.2
  • pytorch-lightning 2.1.3
  • pytz 2023.3.post1
  • pywin32 306
  • pyyaml 6.0.1
  • regex 2023.12.25
  • requests 2.31.0
  • sacrebleu 2.4.0
  • safetensors 0.4.1
  • scikit-learn 1.3.2
  • scipy 1.9.3
  • sentencepiece 0.1.99
  • setuptools 69.0.3
  • six 1.16.0
  • sphinx-markdown-tables 0.0.15
  • sympy 1.12
  • tabulate 0.9.0
  • threadpoolctl 3.2.0
  • tokenizers 0.15.0
  • torch 2.1.2
  • torchmetrics 0.10.3
  • tqdm 4.66.1
  • transformers 4.36.2
  • triton 2.1.0
  • typing-extensions 4.9.0
  • tzdata 2023.4
  • urllib3 2.1.0
  • yarl 1.9.4
  • zipp 3.17.0
COMET/pyproject.toml pypi
  • coverage ^5.5 develop
  • scikit-learn ^1.0 develop
  • sphinx-markdown-tables 0.0.15 develop
  • entmax ^1.1
  • huggingface-hub >=0.19.3,<1.0
  • jsonargparse 3.13.1
  • numpy ^1.20.0
  • pandas >=1.4.1
  • protobuf ^4.24.4
  • python ^3.8.0
  • pytorch-lightning ^2.0.0
  • sacrebleu ^2.0.0
  • scipy ^1.5.4
  • sentencepiece ^0.1.96
  • torch >=1.6.0
  • torchmetrics ^0.10.2
  • transformers ^4.17