https://github.com/alan-turing-institute/arc-mtqe
Critical Error Detection for Machine Translation
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README
- ✓ Academic publication links: arxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.8%) to scientific vocabulary
Repository
Critical Error Detection for Machine Translation
Basic Info
- Host: GitHub
- Owner: alan-turing-institute
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: https://zenodo.org/records/14639667
- Size: 9.06 MB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Critical Error Detection for Machine Translation
Code to train and evaluate models for detecting critical errors in machine translations using only the original source text and the machine-translated text, as described in Knight et al. (2025).
Table of contents
- Background
- Approaches
- Structure of this repository
- Getting started
- Useful links and files
- Development
Background
The goal of critical error detection (CED) is to identify translated text that deviates in meaning from the original text. CED was introduced at the Conference on Machine Translation (WMT) 2021 quality estimation (QE) subtask (Specia et al., 2021), which also released a unique dataset of authentic critical error annotations in translations of Wikipedia comments. See also Knight et al. (2024) for a literature review on machine translation quality estimation (MTQE), including CED.
Approaches
Trained models
We used COMETKiwi-22 (Rei et al., 2022), which outputs quality scores between 0 and 1 (1=perfect translation).
For the baseline, we picked a binarisation threshold using the WMT dev data and used it to binarise COMETKiwi-22 predictions on the test data.
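As an illustration, the threshold search for the baseline might look like the following sketch. The selection criterion here (Matthews correlation coefficient) and all numbers are assumptions for demonstration; the repository may use a different metric.

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

def pick_threshold(dev_scores, dev_labels, grid=None):
    """Pick the binarisation threshold that maximises MCC on the dev data.

    Labels: 1 = critical error, 0 = no error. COMETKiwi-22 scores lie in
    [0, 1] with 1 meaning a perfect translation, so a *low* score suggests
    an error; we predict 1 when score < threshold.
    (Illustrative sketch; the repo's actual selection criterion may differ.)
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    best_t, best_mcc = 0.5, -1.0
    for t in grid:
        preds = (np.asarray(dev_scores) < t).astype(int)
        mcc = matthews_corrcoef(dev_labels, preds)
        if mcc > best_mcc:
            best_t, best_mcc = t, mcc
    return best_t

# Made-up dev scores/labels; the threshold chosen on dev is then applied
# unchanged to binarise test predictions.
dev_scores = [0.9, 0.8, 0.3, 0.2, 0.85, 0.1]
dev_labels = [0, 0, 1, 1, 0, 1]
t = pick_threshold(dev_scores, dev_labels)
test_preds = (np.asarray([0.95, 0.15]) < t).astype(int)
```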
We also adapted COMETKiwi-22 for binary classification in the CEDModel class. Broadly, we tried two main training strategies:
- Fine-tune CEDModel with the WMT released authentic training data
- Pre-train the CEDModel with synthetic data from the DEMETR dataset (Karpinska et al., 2022) and then fine-tune with the WMT authentic data
See the notes/ directory for an overview of the different training strategies and the scripts/README file on how to train models.
LLM prompts
We tried three LLM prompts:
- A basic prompt that asks if the translation has the same meaning as the original text
- GEMBA-MQM from Kocmi and Federmann (2024)
- The original WMT annotator guidelines from Specia et al. (2021)
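For illustration, the basic prompt could be implemented with the OpenAI client roughly as follows. The prompt wording, model name, and answer format here are assumptions, not the repository's exact implementation.

```python
# Hypothetical wording of the "basic" same-meaning prompt.
BASIC_PROMPT = (
    "Source text: {src}\n"
    "Translation: {mt}\n"
    "Does the translation have the same meaning as the source text? "
    "Answer ERROR or NOT ERROR."
)

def detect_critical_error(src: str, mt: str, model: str = "gpt-4o-mini") -> str:
    """Ask an LLM whether the translation deviates in meaning (sketch)."""
    # Lazy import so the prompt itself can be inspected without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": BASIC_PROMPT.format(src=src, mt=mt)}
        ],
        temperature=0.0,
    )
    return response.choices[0].message.content.strip()
```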
Structure of this repository
```
├── configs/              -- configs used for training experiments
│   ├── ...
├── notes/                -- includes overview of training strategies
│   ├── ...
├── notebooks/            -- plots and tables of results
│   ├── ...
├── predictions/ced_data/ -- predictions on the test (and dev) data
│   ├── ...
├── scripts/              -- training, prediction and evaluation code
│   ├── ...
├── src/                  -- model and prompt implementations
│   ├── ...
```
Getting started
Set up
Clone this repository and change the current working directory.
```bash
git clone https://github.com/alan-turing-institute/ARC-MTQE.git
cd ARC-MTQE
```
Install dependencies and pre-commit hooks with Poetry:
```bash
make setup
```
Data
Download and preprocess datasets:
```bash
make data
```
This adds the following directories:
```
├── data/
│   ├── ...           -- downloaded data files
│   ├── preprocessed/ -- preprocessed data used in experiments
```
See the notes/ directory for an overview of the datasets that will be downloaded when this command is run.
HuggingFace
To use COMETKiwi, you need a HuggingFace account and an access token (available at https://huggingface.co/settings/tokens in your account settings). Log in to the HuggingFace CLI, which will request the token:
```bash
poetry run huggingface-cli login
```
To use any of the COMET models, you must also acknowledge the license on each model's HuggingFace page:
- COMETKiwi-22
WandB
We use WandB to track experiments. You need to log in first (you should only need to do this once). The code below will prompt you for an API key, which you can find in your WandB User Settings:
```python
import wandb
wandb.login()
```
OpenAI
To make predictions using GPT, you need an OpenAI API key saved as an environment variable named OPENAI_API_KEY. To set it in a Mac terminal:

```bash
export OPENAI_API_KEY="your_api_key"
```
Training, predictions and evaluation
Follow instructions in the scripts/README.
Useful links and files
- Overview of available COMET models.
- Notes on the COMET codebase that our trained CEDModel inherits from.
- Instructions for using Baskerville's Tier 2 HPC service to train models.
Development
The code base could be updated to use models other than COMETKiwi-22. This would require updating the load_model_from_file function, which is currently hard-coded to download COMETKiwi-22:
```python
model_path = download_model("Unbabel/wmt22-cometkiwi-da")
```
This could be updated to allow for the pre-trained QE model to be changed to, for example, COMETKiwi-23-XL or COMETKiwi-23-XXL.
This would also require updating the encoder-related hyperparameters in the config file (e.g., encoder_model: XLM-RoBERTa-XL).
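A minimal sketch of that change, assuming the unbabel-comet API (download_model and load_from_checkpoint); the actual function signature in this repository may differ:

```python
def load_qe_model(hf_name: str = "Unbabel/wmt22-cometkiwi-da"):
    """Download any COMET QE checkpoint by its HuggingFace name instead of
    hard-coding COMETKiwi-22 (illustrative; not the repo's actual function).
    """
    # Lazy import: unbabel-comet is only needed when a model is loaded.
    from comet import download_model, load_from_checkpoint

    model_path = download_model(hf_name)
    return load_from_checkpoint(model_path)

# e.g. load_qe_model("Unbabel/wmt23-cometkiwi-da-xl")
```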
Owner
- Name: The Alan Turing Institute
- Login: alan-turing-institute
- Kind: organization
- Email: info@turing.ac.uk
- Website: https://turing.ac.uk
- Repositories: 477
- Profile: https://github.com/alan-turing-institute
The UK's national institute for data science and artificial intelligence.
GitHub Events
Total
- Push event: 2
Last Year
- Push event: 2
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 121
- Total pull requests: 74
- Average time to close issues: 19 days
- Average time to close pull requests: 9 days
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 1.83
- Average comments per pull request: 1.43
- Merged pull requests: 64
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 108
- Pull requests: 72
- Average time to close issues: 17 days
- Average time to close pull requests: 9 days
- Issue authors: 2
- Pull request authors: 2
- Average comments per issue: 1.81
- Average comments per pull request: 1.42
- Merged pull requests: 62
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- radka-j (42)
- joannacknight (16)
Pull Request Authors
- radka-j (25)
- joannacknight (19)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- black ^23.9.1 develop
- flake8 ^6.1.0 develop
- ipykernel ^6.25.2 develop
- isort ^5.12.0 develop
- pre-commit ^3.4.0 develop
- pytest ^7.4.2 develop
- pytest-mock ^3.11.1 develop
- GitPython ^3.1.43
- huggingface-hub ^0.21.3
- ipywidgets ^8.1.1
- jupyter ^1.0.0
- matplotlib ^3.8.0
- openai ^1.14.0
- python >=3.9,<4.0.0
- pyyaml ^6.0.1
- scikit-learn ^1.4.0
- unbabel-comet ^2.2.1
- wandb ^0.16.4