mateo-demo

MAchine Translation Evaluation Online (MATEO)

https://github.com/bramvanroy/mateo-demo

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

bertscore bleu bleurt chrf clarin comet machine-translation machine-translation-evaluation machine-translation-metrics streamlit ter
Last synced: 6 months ago

Repository

MAchine Translation Evaluation Online (MATEO)

Basic Info
  • Host: GitHub
  • Owner: BramVanroy
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://mateo.ivdnt.org/
  • Size: 451 KB
Statistics
  • Stars: 21
  • Watchers: 1
  • Forks: 3
  • Open Issues: 3
  • Releases: 2
Topics
bertscore bleu bleurt chrf clarin comet machine-translation machine-translation-evaluation machine-translation-metrics streamlit ter
Created almost 3 years ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

MAchine Translation Evaluation Online (MATEO)


We present MAchine Translation Evaluation Online (MATEO), a project that aims to facilitate machine translation (MT) evaluation by means of an easy-to-use interface that can evaluate given machine translations with a battery of automatic metrics. It caters to both experienced and novice users working with MT, such as MT system builders, researchers from the Social Sciences and Humanities, and teachers and students of (machine) translation.

MATEO can be accessed at https://mateo.ivdnt.org/, hosted by the CLARIN B-centre at the Instituut voor de Nederlandse Taal. It is also available on Hugging Face Spaces.

If you use the MATEO interface for your work, please cite our project paper!

Vanroy, B., Tezcan, A., & Macken, L. (2023). MATEO: MAchine Translation Evaluation Online. In M. Nurminen, J. Brenner, M. Koponen, S. Latomaa, M. Mikhailov, F. Schierl, … H. Moniz (Eds.), Proceedings of the 24th Annual Conference of the European Association for Machine Translation (pp. 499–500). Tampere, Finland: European Association for Machine Translation (EAMT).

```bibtex
@inproceedings{vanroy-etal-2023-mateo,
    title = "{MATEO}: {MA}chine {T}ranslation {E}valuation {O}nline",
    author = "Vanroy, Bram  and
      Tezcan, Arda  and
      Macken, Lieve",
    booktitle = "Proceedings of the 24th Annual Conference of the European Association for Machine Translation",
    month = jun,
    year = "2023",
    address = "Tampere, Finland",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2023.eamt-1.52",
    pages = "499--500",
}
```

Self-hosting

The MATEO website is provided for free as a hosted application, which means that you, or anyone else, can use it. Because the service is shared, it may be slow depending on how heavily the system is being used. For that reason, specific attention was paid to making it easy for you to set up your own instance!

Duplicating a Hugging Face Space

MATEO also runs on the free platform of 🤗 Hugging Face in a so-called 'Space'. If you have a (free) account on that platform, you can easily duplicate the running MATEO instance to your own profile. That means you can create a private duplicate of the MATEO interface, just for you and free of charge! You can simply click this link or, if that does not work, follow these steps (a programmatic alternative is sketched after the list):

  1. Go to the Space;
  2. in the top right (below your profile picture) you should click on the three vertical dots;
  3. choose 'Duplicate space' and, et voilà, a new Space should now be running under your own profile.
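
If you prefer to duplicate the Space programmatically rather than through the web interface, the huggingface_hub library offers a duplicate_space helper. The snippet below is a minimal sketch, not part of MATEO itself; the source Space id and token handling are assumptions, and it requires a reasonably recent huggingface_hub version with write access to your account.

```python
# Minimal sketch: duplicate the public MATEO Space to your own profile.
# Assumes `pip install huggingface_hub` and a token with write access.
from huggingface_hub import duplicate_space

# "BramVanroy/mateo-demo" is an assumed Space id; verify the actual id
# on the Hugging Face hub before running this.
repo_url = duplicate_space(
    from_id="BramVanroy/mateo-demo",
    private=True,        # keep your copy private
    token="hf_...",      # or rely on `huggingface-cli login`
)
print(f"Your private copy lives at: {repo_url}")
```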

Install locally with Python

You can clone and install the library on your own device (laptop, desktop, server). I recommend running this in a new virtual environment. It requires Python >= 3.10.

Run the following commands:

```shell
git clone https://github.com/BramVanroy/mateo-demo.git
cd mateo-demo
python -m pip install .
```

Added in v1.6: an optional, advanced feature allows you to insert arbitrary HTML into the <head> of your web app. If you do not need that functionality, you can skip this step.

This can be useful if you want to add analytics tracking, for instance. Due to the nature of Streamlit, adding st.markdown may not work as you might expect, since every user interaction triggers a re-render of the page; when counting page views, this can incorrectly inflate the numbers. Therefore, we directly patch Streamlit's index.html file. Note that this is a grave security risk: never add anything to the HTML file that you do not understand!

To add your HTML contents to the head, e.g. a <script src="..."> tag, create an HTML file with your content, e.g. my_content.html. Then run this script:

```shell
python scripts/patch_index_html.py --input_file my_content.html
```

This will also back up the original file. If you ever want to restore that backup, just run:

```shell
python scripts/patch_index_html.py --restore
```
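
For reference, the patching approach boils down to locating Streamlit's bundled static index.html and splicing your snippet in before the closing </head> tag. The sketch below only illustrates that idea; it is not the actual scripts/patch_index_html.py, and the real script's behaviour may differ.

```python
# Illustration only: splice custom HTML into Streamlit's bundled index.html.
# The real scripts/patch_index_html.py may differ in details.
import shutil
from pathlib import Path

import streamlit


def patch_streamlit_head(snippet_file: str) -> None:
    # Streamlit ships its frontend in a `static` folder inside the package.
    index_html = Path(streamlit.__file__).parent / "static" / "index.html"
    backup = index_html.parent / "index.html.bak"

    if not backup.exists():
        shutil.copy2(index_html, backup)  # keep a pristine copy for a restore

    snippet = Path(snippet_file).read_text(encoding="utf-8")
    html = index_html.read_text(encoding="utf-8")
    index_html.write_text(html.replace("</head>", f"{snippet}</head>"), encoding="utf-8")


patch_streamlit_head("my_content.html")
```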

Now we can run MATEO!

```shell
cd src/mateo_st
streamlit run 01_🎈_MATEO.py
```

The Streamlit server will then start on your own computer. You can access the website via a local address, http://localhost:8501 by default.

Configuration options specific to Streamlit can be found in the Streamlit documentation. These are mostly server-side configurations that you typically do not need when running directly through Python, but they can be useful when using Docker, e.g. to set the --server.port that Streamlit runs on (see Docker).

A number of command-line arguments are available to change the interface to your needs.

```shell
--use_cuda     whether to use CUDA for the translation task; CUDA for metrics is not supported (default: False)
--demo_mode    when demo mode is enabled, only a limited range of neural checkpoints is available: all metrics are available, but not all checkpoints (default: False)
```

These can be passed to the Streamlit launcher by adding a bare -- after the streamlit command and any Streamlit-specific options, followed by any of the options above.

For instance, if you want to run streamlit specifically on port 1234 and you want to use the demo mode, you can modify your command to look like this:

```shell
streamlit run 01_🎈_MATEO.py --server.port 1234 -- --demo_mode
```

Note the separating -- in the middle, which lets Streamlit distinguish between its own options and the MATEO configuration parameters.
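
Under the hood, everything after the bare -- ends up in the script's sys.argv, so the app can read it like any other command-line program. The following is a hedged sketch of how such flags could be parsed with argparse; MATEO's real option parsing may be implemented differently.

```python
# Sketch of parsing MATEO-style flags inside a Streamlit script.
# Streamlit forwards everything after the bare `--` via sys.argv.
import argparse
import sys


def parse_app_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="MATEO interface options (illustrative)")
    parser.add_argument("--use_cuda", action="store_true",
                        help="use CUDA for the translation task (metrics stay on CPU)")
    parser.add_argument("--demo_mode", action="store_true",
                        help="restrict the available neural checkpoints")
    return parser.parse_args(sys.argv[1:])


args = parse_app_args()
print(f"use_cuda={args.use_cuda}, demo_mode={args.demo_mode}")
```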

Running with Docker

MATEO is easily run with Docker. For more information see the instructions in docker/instructions.md.

Tests

The tests are run using pytest and Playwright. To ensure that the right dependencies are installed, run:

```shell
python -m pip install -e .[dev]
```

Then install the appropriate Chromium version for Playwright by running the following command:

```shell
playwright install --with-deps chromium
```

Now you can run the tests from the root directory of the project:

```shell
python -m pytest
```
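
As an illustration of what a Playwright-based check can look like (the repository's own tests will differ), the snippet below uses the pytest-playwright page fixture to open a running local instance and verify that the page loads. It assumes the Streamlit server is already up at http://localhost:8501 and that the app title contains "MATEO".

```python
# Illustrative pytest-playwright test; the project's real tests differ.
# Assumes a MATEO instance is already running at http://localhost:8501.
from playwright.sync_api import Page, expect


def test_homepage_loads(page: Page) -> None:
    page.goto("http://localhost:8501")
    # Streamlit usually renders the app title as a top-level heading;
    # adjust the expected text if your instance uses a different title.
    expect(page.get_by_role("heading").first).to_contain_text("MATEO")
```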

Notes

Using CUDA

Using CUDA for the metrics is currently not supported, but it is possible to use CUDA for the translation task by setting the --use_cuda flag when running the Streamlit server. This enables CUDA for translation only, not for the metrics. The reason is memory consumption: because Streamlit creates a separate session for each user, the GPU may quickly run out of memory, and moving models on and off the device is not feasible.

I have not found a solution for this yet. A queueing system with a separate backend and dedicated workers would solve the issue, but that defeats the purpose of having a simple, easy-to-use interface. It would also mean storing user data for longer, which many users may not want, considering that I've received many questions about whether I save their data on disk (I don't; the current approach processes everything in memory).
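
To make the device handling concrete, here is a minimal sketch of placing a translation model on the GPU only when --use_cuda is set, while metrics stay on CPU. The model name and structure are illustrative assumptions, not MATEO's actual code.

```python
# Illustrative device handling: GPU for translation only, never for metrics.
# The checkpoint name and structure are assumptions, not MATEO's actual code.
import torch
from transformers import pipeline


def load_translator(use_cuda: bool):
    # device=0 selects the first GPU, -1 keeps the pipeline on CPU
    device = 0 if use_cuda and torch.cuda.is_available() else -1
    return pipeline(
        "translation",
        model="facebook/m2m100_418M",  # example multilingual MT checkpoint
        device=device,
    )


translator = load_translator(use_cuda=False)
print(translator("Dit is een test.", src_lang="nl", tgt_lang="en")[0]["translation_text"])
```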

Acknowledgements

This project was kickstarted by a sponsorship project from the European Association for Machine Translation (EAMT) and a substantial follow-up grant supported by CLARIN.eu.


Owner

  • Name: Bram Vanroy
  • Login: BramVanroy
  • Kind: user
  • Location: Belgium
  • Company: @CCL-KULeuven @instituutnederlandsetaal

👋 My name is Bram and I work on natural language processing and machine translation (evaluation) but I also spend a lot of time in this open-source world 🌍

Citation (CITATION)

@inproceedings{vanroy-etal-2023-mateo,
    title = "{MATEO}: {MA}chine {T}ranslation {E}valuation {O}nline",
    author = "Vanroy, Bram  and
      Tezcan, Arda  and
      Macken, Lieve",
    booktitle = "Proceedings of the 24th Annual Conference of the European Association for Machine Translation",
    month = jun,
    year = "2023",
    address = "Tampere, Finland",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2023.eamt-1.52",
    pages = "499--500",
}

GitHub Events

Total
  • Create event: 11
  • Release event: 3
  • Issues event: 3
  • Watch event: 6
  • Issue comment event: 16
  • Push event: 22
  • Fork event: 2
Last Year
  • Create event: 11
  • Release event: 3
  • Issues event: 3
  • Watch event: 6
  • Issue comment event: 16
  • Push event: 22
  • Fork event: 2

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 7
  • Total pull requests: 9
  • Average time to close issues: about 19 hours
  • Average time to close pull requests: less than a minute
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 1.14
  • Average comments per pull request: 0.0
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 8.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • BramVanroy (4)
  • clang88 (1)
Pull Request Authors
  • BramVanroy (13)
Top Labels
Issue Labels
enhancement (4) help wanted (1)
Pull Request Labels

Dependencies

docker/hf-spaces/Dockerfile docker
  • ubuntu latest build
pyproject.toml pypi
  • BLEURT @ git+https://github.com/google-research/bleurt.git@cebe7e6f996b40910cfaa520a63db47807e3bf5c
  • Levenshtein *
  • XlsxWriter ==3.1.7
  • altair ==5.1.2
  • bert-score ==0.3.13
  • datasets ==2.14.5
  • evaluate ==0.4.1
  • optimum ==1.13.2
  • pandas ==2.1.1
  • plotly ==5.17.0
  • sacrebleu [ja,ko]==2.3.1
  • sentencepiece ==0.1.97
  • streamlit ==1.27.2
  • tensorflow ==2.14.0
  • torch ==2.1.0
  • transformers ==4.33.3
  • unbabel-comet ==2.1.0
docker/cpu/Dockerfile docker
  • python 3.11-slim-bookworm build
docker/gpu/Dockerfile docker
  • nvidia/cuda 12.1.1-runtime-ubuntu22.04 build