Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.1%) to scientific vocabulary
Repository
Training scripts for Argos Translate
Basic Info
- Host: GitHub
- Owner: argosopentech
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://www.argosopentech.com
- Size: 341 KB
Statistics
- Stars: 138
- Watchers: 3
- Forks: 24
- Open Issues: 20
- Releases: 0
Metadata Files
README.md
Argos Train
Argos Train trains an OpenNMT PyTorch Transformer model and a SentencePiece tokenizer and packages them with Stanza data as an Argos Translate package. Argos Translate packages, which are zip archives with a .argosmodel extension, can be used with Argos Translate and LibreTranslate.
Pre-trained Argos Translate packages are available for download. If you have trained packages that you're willing to share, please get in contact so that they can be published on the Argos Translate package index.
LibreTranslate/Locomotive has similar functionality to Argos Train and can also be used to train translation models.
Training example
From inside the argosopentech/argostrain Docker container:
```
su argosopentech
cd /home/argosopentech
export HOME="/home/argosopentech"
source ~/argos-train-init

...

$ argos-train
From code (ISO 639): en
To code (ISO 639): es
From name: English
To name: Spanish
Version: 1.0

...

Package saved to /home/argosopentech/argos-train/run/en_es.argosmodel
```
Data
Data from data-index.json is used for training. Argos Translate primarily uses data from the Opus project.
To train a model with custom data, run argos-train-init and then add an entry to data-index.json with a link to download your custom data package. Data packages are zipped directories with a .argosdata extension (example) that contain a source file and a target file, with parallel data on corresponding lines, plus a metadata.json file. Data packages are downloaded over HTTP, so you will need to run a web server such as Nginx to host custom data.
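The package layout described above can be sketched in Python using only the standard library. This is a minimal illustration, not the project's own tooling: the helper names `build_argosdata` and `add_to_data_index` are hypothetical, the metadata keys are whatever you pass in (check an existing entry in data-index.json for the real ones), and the sketch assumes data-index.json is a JSON list of objects.

```python
import json
import zipfile
from pathlib import Path


def build_argosdata(name, source_lines, target_lines, metadata, out_dir="."):
    """Write <out_dir>/<name>.argosdata, a zip holding one directory with
    source, target, and metadata.json (hypothetical helper)."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    for line in list(source_lines) + list(target_lines):
        if "\n" in line:
            # An embedded newline would shift every following sentence pair.
            raise ValueError("sentences must not contain newline characters")

    out_path = Path(out_dir) / f"{name}.argosdata"
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{name}/source", "\n".join(source_lines) + "\n")
        archive.writestr(f"{name}/target", "\n".join(target_lines) + "\n")
        archive.writestr(f"{name}/metadata.json", json.dumps(metadata, indent=4))
    return out_path


def add_to_data_index(index_path, entry):
    """Append one entry to the data index.

    Assumption: the index file is a JSON list of objects.
    """
    index_path = Path(index_path)
    entries = json.loads(index_path.read_text(encoding="utf-8"))
    entries.append(entry)
    index_path.write_text(json.dumps(entries, indent=4), encoding="utf-8")
```

After building the archive, host it on your web server and pass an entry containing its download link to `add_to_data_index`, copying the field names from an existing entry rather than from this sketch.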
You can also manually load data by putting your data at run/source and run/target and setting data_exists=True in bin/argos-train.
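Because the source and target files are aligned line by line, it can help to check the line counts before copying them into place. The following is a rough sketch under stated assumptions: `stage_parallel_data` is a hypothetical helper, and the default `run` directory is assumed to be relative to where argos-train is launched. You still need to set data_exists=True in bin/argos-train yourself.

```python
import shutil
from pathlib import Path


def count_lines(path):
    """Count the lines in a UTF-8 text file without loading it into memory."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def stage_parallel_data(source_file, target_file, run_dir="run"):
    """Copy parallel files to <run_dir>/source and <run_dir>/target.

    Raises ValueError if the two files differ in line count, since a
    mismatch means the sentence pairs are no longer aligned.
    Returns the number of sentence pairs staged.
    """
    n_source = count_lines(source_file)
    n_target = count_lines(target_file)
    if n_source != n_target:
        raise ValueError(
            f"line count mismatch: {n_source} source vs {n_target} target"
        )

    run_path = Path(run_dir)
    run_path.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source_file, run_path / "source")
    shutil.copyfile(target_file, run_path / "target")
    return n_source
```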
You can use this project to automatically download data from Opus.
Docker
A Docker image is available at argosopentech/argostrain.
```
docker run -it argosopentech/argostrain /bin/bash
```
Run training
```
argos-train
```
Environment
CUDA is required. Tested on vast.ai.
Vast.ai seems to recognize the CUDA version of the Docker container incorrectly, so you may need to check the "Incompatible Machines" option if you're using vast.ai.
Manually creating an Argos Translate package
If you don't want to use Argos Train, you can manually train a model with OpenNMT and package it for Argos Translate. Argos Translate packages are zip archives with a .argosmodel extension containing a CTranslate2 model, a SentencePiece model, a Stanza 1.1.1 model, and a metadata file. See the training script at bin/argos-train for more information.
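The final packaging step amounts to zipping a directory and giving the archive a .argosmodel extension. Here is a minimal sketch, assuming you have already assembled the four components in one directory. `package_argosmodel` is a hypothetical helper, and the entry names you pass in `expected_entries` are your own responsibility: this sketch does not know the names Argos Translate expects, so take them from bin/argos-train.

```python
import zipfile
from pathlib import Path


def package_argosmodel(package_dir, expected_entries, out_dir="."):
    """Zip package_dir into <out_dir>/<package_dir name>.argosmodel.

    expected_entries lists files or directories (relative to package_dir)
    that must exist before packaging; the correct names are not defined
    here and should come from the training script.
    """
    package_dir = Path(package_dir)
    missing = [e for e in expected_entries if not (package_dir / e).exists()]
    if missing:
        raise FileNotFoundError(f"missing package entries: {missing}")

    out_path = Path(out_dir) / f"{package_dir.name}.argosmodel"
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(package_dir.rglob("*")):
            if path.is_file():
                # Keep the package directory itself as the top-level folder.
                arcname = path.relative_to(package_dir.parent).as_posix()
                archive.write(path, arcname)
    return out_path
```

Write the archive to a directory outside `package_dir` so that the output file is not picked up while the directory is being walked.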
Documentation
Contributing
Contributions are welcome! Please make a pull request.
Roadmap
License
Licensed under either the MIT License or the Creative Commons CC0 License.
Owner
- Name: Argos Open Tech
- Login: argosopentech
- Kind: user
- Location: Ithaca, NY
- Website: www.argosopentech.com
- Twitter: argosopentech
- Repositories: 85
- Profile: https://github.com/argosopentech
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Argos Translate
message: >-
  Open-source offline translation library written in
  Python
type: software
authors:
  - given-names: P.J.
    family-names: Finlay
    email: admin@argosopentech.com
    affiliation: Argos Open Tech
GitHub Events
Total
- Issues event: 4
- Watch event: 20
- Issue comment event: 4
- Push event: 25
- Fork event: 3
Last Year
- Issues event: 4
- Watch event: 20
- Issue comment event: 4
- Push event: 25
- Fork event: 3
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| P.J. Finlay | p****t@g****m | 459 |
| Argos Open Technologies, LLC | a****n@a****m | 32 |
| mmokhi | m****i@f****g | 3 |
| Talles Airan | a****s@g****m | 1 |
| Piero Toffanin | pt@m****m | 1 |
| Dingedi | 6****i | 1 |
| Andrii Liakh | l****i@g****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 52
- Total pull requests: 12
- Average time to close issues: 2 months
- Average time to close pull requests: about 1 month
- Total issue authors: 28
- Total pull request authors: 5
- Average comments per issue: 1.63
- Average comments per pull request: 0.25
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 6
- Pull request authors: 0
- Average comments per issue: 0.17
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- PJ-Finlay (5)
- argosopentech (4)
- johnpaulbin (2)
- Allesanddro (2)
- robinj (1)
- endingisnight (1)
- MordicusEtCubitus (1)
- geekscrapy (1)
- ahirsbrunner (1)
- ayushi-kukreja47 (1)
- dingedi (1)
- JanCizmar (1)
- menachem-dev (1)
- Mamooshe (1)
- leogitpro (1)
Pull Request Authors
- mmokhi (3)
- tallesairan (1)
- dingedi (1)
- liakhandrii (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- ctranslate2 ==2.10.1
- stanza ==1.1.1
- ubuntu latest build
- black * development
- isort * development