argos-train

Training scripts for Argos Translate

https://github.com/argosopentech/argos-train

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary
Last synced: 7 months ago

Repository

Statistics
  • Stars: 138
  • Watchers: 3
  • Forks: 24
  • Open Issues: 20
  • Releases: 0
Created almost 6 years ago · Last pushed 7 months ago

README.md

Argos Train

Argos Train trains an OpenNMT PyTorch Transformer model and a SentencePiece tokenizer and packages them with Stanza data as an Argos Translate package. Argos Translate packages, which are zip archives with a .argosmodel extension, can be used with Argos Translate and LibreTranslate.
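Because a package is an ordinary zip archive, it can be inspected with the Python standard library. The following is a minimal sketch, not part of Argos Train itself. It assumes the metadata file is named `metadata.json` and may sit inside a top-level directory. The helper names `list_package_contents` and `read_package_metadata` are hypothetical.

```python
import json
import zipfile
from pathlib import Path


def list_package_contents(package_path):
    """Return the sorted member names of a .argosmodel zip archive."""
    with zipfile.ZipFile(package_path) as archive:
        return sorted(archive.namelist())


def read_package_metadata(package_path):
    """Return the parsed metadata.json from the package, or None if absent.

    Assumption: the metadata file is called metadata.json and may be nested
    inside a top-level directory, so only the file name is matched.
    """
    with zipfile.ZipFile(package_path) as archive:
        for name in archive.namelist():
            if Path(name).name == "metadata.json":
                return json.loads(archive.read(name).decode("utf-8"))
    return None
```

For example, `list_package_contents("en_es.argosmodel")` lists the files in a trained package without extracting it.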

Pre-trained Argos Translate packages are available for download. If you have trained packages you're willing to share, please get in touch so that they can be published on the Argos Translate package index.

LibreTranslate/Locomotive has similar functionality to Argos Train and can also be used to train translation models.

Training example

From inside the argosopentech/argostrain Docker container:

```
su argosopentech
cd /home/argosopentech
export HOME="/home/argosopentech"
source ~/argos-train-init

...

$ argos-train
From code (ISO 639): en
To code (ISO 639): es
From name: English
To name: Spanish
Version: 1.0

...

Package saved to /home/argosopentech/argos-train/run/en_es.argosmodel
```

Data

Data from data-index.json is used for training. Argos Translate primarily uses data from the Opus project.

To train a model with custom data, run argos-train-init and then add an entry to data-index.json with a link to download your custom data package. Data packages are zipped directories with a .argosdata extension (example) that contain a source file and a target file with parallel data on corresponding lines, plus a metadata.json file. Data packages are downloaded over HTTP, so you will need to run a web server such as Nginx to host custom data.
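As a rough illustration of the data package layout described here, the sketch below zips a source file, a target file, and a metadata.json into a `.argosdata` archive. The helper name `build_data_package`, the single top-level directory, and the metadata keys are assumptions for illustration. Check an existing data package and the entries in data-index.json for the actual schema.

```python
import json
import zipfile
from pathlib import Path


def build_data_package(source_lines, target_lines, metadata, out_path):
    """Write a .argosdata zip holding source, target and metadata.json.

    Assumptions: the three files sit in one top-level directory named after
    the archive and are called "source", "target" and "metadata.json".
    """
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    out_path = Path(out_path)
    root = out_path.stem  # archive name without the .argosdata extension
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Line i of "source" corresponds to line i of "target".
        archive.writestr(f"{root}/source", "\n".join(source_lines) + "\n")
        archive.writestr(f"{root}/target", "\n".join(target_lines) + "\n")
        archive.writestr(f"{root}/metadata.json", json.dumps(metadata, indent=4))
    return out_path


# Hypothetical metadata: these keys are illustrative, not a confirmed schema.
EXAMPLE_METADATA = {
    "name": "custom-en-es",
    "type": "data",
    "from_code": "en",
    "to_code": "es",
}
```

Calling `build_data_package(["Hello"], ["Hola"], EXAMPLE_METADATA, "data-custom-en_es.argosdata")` produces an archive that can then be placed behind the web server referenced from data-index.json.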

You can also manually load data by putting your data at run/source and run/target and setting data_exists=True in bin/argos-train.
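One way to prepare these two files is sketched below. It writes sentence pairs to `run/source` and `run/target` and keeps them line-aligned. The helper name `write_parallel_files` and the filtering rules are assumptions, not something Argos Train provides. You still need to set `data_exists=True` in bin/argos-train as described above.

```python
from pathlib import Path


def write_parallel_files(pairs, run_dir="run"):
    """Write (source, target) sentence pairs to run_dir/source and run_dir/target.

    Line i of each file holds one side of pair i. Pairs with an empty side
    or an embedded newline are skipped so the two files stay aligned.
    Returns the number of pairs written.
    """
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    kept = 0
    with open(run_dir / "source", "w", encoding="utf-8") as src, \
            open(run_dir / "target", "w", encoding="utf-8") as tgt:
        for source, target in pairs:
            source, target = source.strip(), target.strip()
            if not source or not target or "\n" in source or "\n" in target:
                continue
            src.write(source + "\n")
            tgt.write(target + "\n")
            kept += 1
    return kept
```

Skipping a bad pair on both sides at once matters: dropping a line from only one file would shift every later line and silently misalign the training data.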

You can use this project to automatically download data from Opus.

Docker

A Docker image is available at argosopentech/argostrain.

```
docker run -it argosopentech/argostrain /bin/bash
```

Run training

```
argos-train
```

Environment

CUDA is required. Tested on vast.ai.

Vast.ai seems to recognize the CUDA version of the Docker container incorrectly, so you may need to check the "Incompatible Machines" option if you're using vast.ai.

Manually creating an Argos Translate package

If you don't want to use Argos Train, you can manually train a model with OpenNMT and package it for Argos Translate. Argos Translate packages are zip archives with a .argosmodel extension containing a CTranslate2 model, a SentencePiece model, a Stanza 1.1.1 model, and a metadata file. See the training script at bin/argos-train for more information.
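The packaging step can be sketched with the standard library as follows. This is a sketch under stated assumptions rather than the method bin/argos-train uses. The component names in `EXPECTED_COMPONENTS` (`model`, `sentencepiece.model`, `stanza`, `metadata.json`) and the helper names are assumptions, so compare them against the training script or an existing package before relying on them.

```python
import zipfile
from pathlib import Path

# Assumed names for the four components listed above; verify against
# bin/argos-train or a published package.
EXPECTED_COMPONENTS = ("model", "sentencepiece.model", "stanza", "metadata.json")


def missing_components(package_dir):
    """Return the expected component names that are absent from package_dir."""
    package_dir = Path(package_dir)
    return [name for name in EXPECTED_COMPONENTS if not (package_dir / name).exists()]


def package_argosmodel(package_dir, out_path):
    """Zip package_dir, keeping its top-level folder name, into out_path.

    out_path should be outside package_dir so the archive does not
    include itself.
    """
    package_dir = Path(package_dir)
    out_path = Path(out_path)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(package_dir.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(package_dir.parent).as_posix()
                archive.write(path, arcname)
    return out_path
```

Running `missing_components` before `package_argosmodel` gives a quick check that nothing was left out of the directory before it is zipped.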

Documentation

Contributing

Contributions are welcome! Please make a pull request.

Roadmap

License

Licensed under either the MIT License or the Creative Commons CC0 License.

Owner

  • Name: Argos Open Tech
  • Login: argosopentech
  • Kind: user
  • Location: Ithaca, NY

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Argos Translate
message: >-
  Open-source offline translation library written in
  Python
type: software
authors:
  - given-names: P.J.
    family-names: Finlay
    email: admin@argosopentech.com
    affiliation: Argos Open Tech

GitHub Events

Total
  • Issues event: 4
  • Watch event: 20
  • Issue comment event: 4
  • Push event: 25
  • Fork event: 3
Last Year
  • Issues event: 4
  • Watch event: 20
  • Issue comment event: 4
  • Push event: 25
  • Fork event: 3

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 498
  • Total Committers: 7
  • Avg Commits per committer: 71.143
  • Development Distribution Score (DDS): 0.078
Past Year
  • Commits: 24
  • Committers: 3
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.125
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| P.J. Finlay | p****t@g****m | 459 |
| Argos Open Technologies, LLC | a****n@a****m | 32 |
| mmokhi | m****i@f****g | 3 |
| Talles Airan | a****s@g****m | 1 |
| Piero Toffanin | pt@m****m | 1 |
| Dingedi | 6****i | 1 |
| Andrii Liakh | l****i@g****m | 1 |
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 52
  • Total pull requests: 12
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 28
  • Total pull request authors: 5
  • Average comments per issue: 1.63
  • Average comments per pull request: 0.25
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 6
  • Pull request authors: 0
  • Average comments per issue: 0.17
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • PJ-Finlay (5)
  • argosopentech (4)
  • johnpaulbin (2)
  • Allesanddro (2)
  • robinj (1)
  • endingisnight (1)
  • MordicusEtCubitus (1)
  • geekscrapy (1)
  • ahirsbrunner (1)
  • ayushi-kukreja47 (1)
  • dingedi (1)
  • JanCizmar (1)
  • menachem-dev (1)
  • Mamooshe (1)
  • leogitpro (1)
Pull Request Authors
  • mmokhi (3)
  • tallesairan (1)
  • dingedi (1)
  • liakhandrii (1)
Top Labels
Issue Labels
  • enhancement (6)
  • help wanted (6)
  • good first issue (6)
  • bug (2)
Pull Request Labels

Dependencies

requirements.txt pypi
  • ctranslate2 ==2.10.1
  • stanza ==1.1.1
Dockerfile docker
  • ubuntu latest build
requirements-dev.txt pypi
  • black * development
  • isort * development
setup.py pypi