ml-translate-vis

Angler: Machine Translation Visualization (CHI 2023)

https://github.com/apple/ml-translate-vis

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.7%) to scientific vocabulary

Keywords

data-visualization machine-translation visual-analytics

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: apple
License: other
Language: TypeScript
Default Branch: main
Homepage: https://apple.github.io/ml-translate-vis/
Size: 40.6 MB

Statistics

Stars: 67
Watchers: 7
Forks: 0
Open Issues: 0
Releases: 0

Created almost 3 years ago · Last pushed over 2 years ago

Metadata Files

Readme Contributing License Code of conduct Citation

Angler: Machine Translation Visualization

Angler is an interactive visualization system that helps machine translation (MT) engineers and researchers explore and curate challenge sets to improve their models and data. Challenge sets (sometimes called "golden sets" or "aggressor tests") are often small, curated sets of important data samples, which ML practitioners use to validate and monitor an ML model's behavior. We used Angler to understand how ML practitioners prioritize model improvements when the input space is infinite and obtaining reliable signals of model quality is expensive.

This code accompanies the research paper:

Angler: Helping Machine Translation Practitioners Prioritize Model Improvements
Samantha Robertson, Zijie J. Wang, Dominik Moritz, Mary Beth Kery, Fred Hohman
ACM Conference on Human Factors in Computing Systems (CHI), 2023.
Paper, Interactive demo, Code, *Contributed equally

How to Use Angler?

Main Features

- Visually explore machine translation data over time
- Compare translation datasets (e.g., usage log data versus training data)
- Surface potentially interesting and critical data samples from two sources:
  - Topics the model is unfamiliar with
  - Failure cases of model unit tests

Table View

Each row in the table represents a challenge set. Each set contains English-to-Chinese translation pairs drawn from a translation dataset composed of four open-source datasets: scientific_papers (a corpus of full-text scientific articles), tatoeba (open-source translation data), umass_global (English-language tweets from 2014 to 2016), and wmt_chat (customer service chat histories). A set is generated either because it contains topics unfamiliar to the model or because its samples failed certain unit tests.
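To make the shape of a challenge set concrete, here is a minimal TypeScript sketch of one possible data model. The four dataset names come from the description above; every type name, field name, and the `countBySource` helper are hypothetical and are not taken from Angler's source code.

```typescript
// Hypothetical data model: names are illustrative, not Angler's actual schema.
type SourceDataset =
  | 'scientific_papers'
  | 'tatoeba'
  | 'umass_global'
  | 'wmt_chat';

// Why a set was generated, following the two sources named in the README.
type ChallengeSetReason = 'unfamiliar-topic' | 'unit-test-failure';

interface TranslationPair {
  source: string; // English input sentence
  translation: string; // Chinese output
  dataset: SourceDataset; // which open-source dataset the pair came from
}

interface ChallengeSet {
  name: string;
  reason: ChallengeSetReason;
  pairs: TranslationPair[];
}

// Count how many pairs in a challenge set come from each source dataset.
function countBySource(set: ChallengeSet): Record<SourceDataset, number> {
  const counts: Record<SourceDataset, number> = {
    scientific_papers: 0,
    tatoeba: 0,
    umass_global: 0,
    wmt_chat: 0
  };
  for (const pair of set.pairs) {
    counts[pair.dataset] += 1;
  }
  return counts;
}

// Made-up example data, for illustration only.
const exampleSet: ChallengeSet = {
  name: 'example-set',
  reason: 'unfamiliar-topic',
  pairs: [
    { source: 'Where is my order?', translation: '我的订单在哪里？', dataset: 'wmt_chat' },
    { source: 'Good morning.', translation: '早上好。', dataset: 'tatoeba' }
  ]
};

console.log(countBySource(exampleSet));
```

A per-dataset count like this is one simple way to see which source dominates a set before opening it in the interface.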

To sort the challenge sets, click any metric in the table header. The metrics include:

Challenge Set Preview

To preview a challenge set, click any row in the Table View. The preview includes 100 sentences and the most representative keywords from the challenge set.
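The two table interactions described above, sorting by a metric and previewing a row, can be sketched in TypeScript as follows. This is an illustration under assumptions, not Angler's implementation: the `ChallengeSetRow` shape, the metric names in the example, and the helper functions are all hypothetical, and the keywords are treated as precomputed because the README does not say how they are chosen.

```typescript
// Hypothetical row shape for the Table View; not Angler's actual types.
interface ChallengeSetRow {
  name: string;
  metrics: Record<string, number>; // metric name -> value (names are assumed)
  sentences: string[];
  keywords: string[]; // assumed to be precomputed elsewhere
}

type SortDirection = 'ascending' | 'descending';

// Return a sorted copy of the rows; the input array is left untouched.
// Rows missing the metric are treated as having a value of 0.
function sortByMetric(
  rows: ChallengeSetRow[],
  metric: string,
  direction: SortDirection = 'descending'
): ChallengeSetRow[] {
  const sign = direction === 'ascending' ? 1 : -1;
  return [...rows].sort(
    (a, b) => sign * ((a.metrics[metric] ?? 0) - (b.metrics[metric] ?? 0))
  );
}

interface RowPreview {
  sentences: string[];
  keywords: string[];
}

// Build a preview with at most `limit` sentences (100, as in the README).
function previewRow(row: ChallengeSetRow, limit = 100): RowPreview {
  return { sentences: row.sentences.slice(0, limit), keywords: row.keywords };
}

// Made-up rows, for illustration only.
const tableRows: ChallengeSetRow[] = [
  { name: 'set-a', metrics: { sentenceCount: 120 }, sentences: ['Hello.'], keywords: ['greeting'] },
  { name: 'set-b', metrics: { sentenceCount: 450 }, sentences: ['Refund please.'], keywords: ['refund'] }
];

const sortedRows = sortByMetric(tableRows, 'sentenceCount');
console.log(sortedRows.map((row) => row.name));
console.log(previewRow(sortedRows[0]));
```

Sorting a copy rather than the original array keeps the table's source data stable, so switching metrics or directions never depends on the previous sort.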

Detail View

To see more details about a particular challenge set, click the Show Details button to open the Detail View.

The Detail View provides multiple visualizations to help users explore a particular challenge set. The visualizations include:

To focus on sentences with interesting attributes (e.g., from a particular time, with low familiarity, or from a specific dataset), users can create filters by brushing or clicking throughout the visualizations.
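Conceptually, each brush or click adds one condition, and a sentence stays visible only if it satisfies all of them. The TypeScript sketch below illustrates that idea with composable predicates. It assumes hypothetical fields (`timestamp`, `familiarity`, `dataset`) and hypothetical helper names; Angler's real filtering code may be organized quite differently.

```typescript
// Hypothetical sentence record; field names are assumptions for illustration.
interface LoggedSentence {
  text: string;
  timestamp: Date;
  familiarity: number; // assumed score where lower means less familiar
  dataset: string;
}

type SentenceFilter = (sentence: LoggedSentence) => boolean;

// Keep sentences whose timestamp falls inside [start, end].
const inTimeRange = (start: Date, end: Date): SentenceFilter => (s) =>
  s.timestamp.getTime() >= start.getTime() &&
  s.timestamp.getTime() <= end.getTime();

// Keep sentences whose familiarity score is below a threshold.
const belowFamiliarity = (threshold: number): SentenceFilter => (s) =>
  s.familiarity < threshold;

// Keep sentences from one specific dataset.
const fromDataset = (dataset: string): SentenceFilter => (s) =>
  s.dataset === dataset;

// A sentence passes only if every active filter accepts it (logical AND).
function applyFilters(
  sentences: LoggedSentence[],
  filters: SentenceFilter[]
): LoggedSentence[] {
  return sentences.filter((s) => filters.every((accept) => accept(s)));
}

// Made-up data, for illustration only.
const loggedSentences: LoggedSentence[] = [
  { text: 'Where is my package?', timestamp: new Date('2022-03-01'), familiarity: 0.2, dataset: 'wmt_chat' },
  { text: 'Good morning.', timestamp: new Date('2022-03-02'), familiarity: 0.9, dataset: 'tatoeba' },
  { text: 'Cancel my order.', timestamp: new Date('2021-01-15'), familiarity: 0.1, dataset: 'wmt_chat' }
];

const visibleSentences = applyFilters(loggedSentences, [
  inTimeRange(new Date('2022-01-01'), new Date('2022-12-31')),
  belowFamiliarity(0.5),
  fromDataset('wmt_chat')
]);

console.log(visibleSentences.map((s) => s.text));
```

Representing each filter as a plain predicate means filters from different visualizations can be added or removed independently, and an empty filter list simply shows every sentence.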

Development

To build and develop Angler locally:

```bash
# Install dependencies
npm install

# Start a localhost server
npm run dev

# Navigate to localhost:5173 in any browser
```

Contributing

When making contributions, refer to the CONTRIBUTING guidelines and read the CODE OF CONDUCT.

BibTeX

To cite our paper, please use:

@inproceedings{robertson2023angler,
  title={Angler: Helping Machine Translation Practitioners Prioritize Model Improvements},
  author={Robertson, Samantha and Wang, Zijie J. and Moritz, Dominik and Kery, Mary Beth and Hohman, Fred},
  booktitle={Proceedings of the SIGCHI Conference on Human Factors in Computing Systems},
  year={2023},
  organization={ACM},
  doi={10.1145/3544548.3580790}
}

License

This code is released under the LICENSE terms.

Owner

Name: Apple
Login: apple
Kind: organization
Location: Cupertino, CA

Website: https://apple.com
Repositories: 305
Profile: https://github.com/apple

Citation (CITATION.cff)

cff-version: 1.2.0
message: 'If you use this software, please cite it as below.'
authors:
  - family-names: 'Robertson'
    given-names: 'Samantha'
  - family-names: 'Wang'
    given-names: 'Zijie J.'
  - family-names: 'Moritz'
    given-names: 'Dominik'
  - family-names: 'Kery'
    given-names: 'Mary Beth'
  - family-names: 'Hohman'
    given-names: 'Fred'
title: 'Angler: Helping Machine Translation Practitioners Prioritize Model Improvements'
version: 1.0.0
date-released: 2023-04-07
url: 'https://github.com/apple/ml-translate-vis'
preferred-citation:
  type: conference-article
  authors:
    - family-names: 'Robertson'
      given-names: 'Samantha'
    - family-names: 'Wang'
      given-names: 'Zijie J.'
    - family-names: 'Moritz'
      given-names: 'Dominik'
    - family-names: 'Kery'
      given-names: 'Mary Beth'
    - family-names: 'Hohman'
      given-names: 'Fred'
  doi: '10.1145/3544548.3580790'
  journal: 'Proceedings of the SIGCHI Conference on Human Factors in Computing Systems'
  title: 'Angler: Helping Machine Translation Practitioners Prioritize Model Improvements'
  year: 2023
  organization: 'ACM'

GitHub Events

Total

Watch event: 3

Last Year

Watch event: 3

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 5
Average time to close issues: N/A
Average time to close pull requests: 2 minutes
Total issue authors: 0
Total pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0