https://github.com/cdli-gh/unsupervised_nmt
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cdli-gh
- Language: Python
- Default Branch: master
- Size: 98.3 MB
Statistics
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Sumerian - English Machine Translation
An implementation of Dual Learning for Machine Translation, applied to Sumerian-English machine translation, using PyTorch. The NMT models used here depend heavily on pcyin/pytorch_nmt.
Dataset
The parallel dataset for the project is taken from the CDLI GSoC-2019 Sumerian-English NMT project. Monolingual Sumerian data is available from the CDLI Daily Bulk Data Dump, and monolingual English data was extracted from Europarl: Parallel Corpus.
Installation
Clone the repository. We assume that Python 3.6.x and pip are installed on your Linux system. (Optional) If they are not, use the commands below to create a new environment with conda.
conda create -n env python=3.6
conda activate env
All dependencies can be installed via:
pip install -r requirements.txt
NOTE: If you get a MemoryError during installation, try:
pip install -r requirements.txt --no-cache-dir
Note that the project currently supports PyTorch >= 1.4.
Please check the version before proceeding.
python -c "import torch; print(torch.__version__)"
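If you would rather enforce the requirement from a script than read the printed version by eye, a minimal sketch follows. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of this repository. The sketch assumes the version string starts with `major.minor`, as in `1.4.0` or `1.4.0+cu92`.

```python
import re


def parse_version(version):
    """Return the leading (major, minor) pair of a version string."""
    match = re.match(r"(\d+)\.(\d+)", str(version))
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(1, 4)):
    """True if the version is at least the given (major, minor) minimum."""
    # Tuples compare element by element, so (1, 10) >= (1, 4) holds,
    # unlike a plain string comparison of "1.10" and "1.4".
    return parse_version(version) >= minimum
```

With PyTorch installed, this could be called as `meets_minimum(torch.__version__)`.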
Dual Learning Step
During the reinforcement learning process, the system receives rewards from the language models and the translation models, and uses them to update the translation models. You can find more details in the paper.
- Training: You can simply use this script; you have to modify the paths and names to point to your models.
- Test: To use the trained models, you can treat them as ordinary NMT models.
Test (Basic)
First, we trained our basic model on 10K bilingual Sumerian-English pairs. Then we set up a dual-learning game and trained the two models using a reinforcement learning technique.
- Reward
  - language model reward: average over square rooted length of string
  - final reward: rk = 0.06 × r1 + 0.94 × r2
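The reward combination above can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's code. It assumes that r1 is the language model reward and r2 is the translation (reconstruction) reward, as in the Dual Learning paper, and it reads "average over square rooted length" as the summed token log-probability divided by the square root of the token count. The function names are hypothetical.

```python
import math

ALPHA = 0.06  # weight on r1, taken from the formula above


def language_model_reward(token_log_probs):
    """Assumed reading: summed log-probability over sqrt(sentence length)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return sum(token_log_probs) / math.sqrt(len(token_log_probs))


def final_reward(r1, r2, alpha=ALPHA):
    """Combine the two rewards: alpha * r1 + (1 - alpha) * r2."""
    # Assumption: r1 is the language model reward, r2 the translation reward.
    return alpha * r1 + (1.0 - alpha) * r2
```

For example, `final_reward(language_model_reward([-1.0] * 4), -1.0)` weights a language model reward of -2.0 by 0.06 and a translation reward of -1.0 by 0.94.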
Owner
- Name: CDLI
- Login: cdli-gh
- Kind: organization
- Email: cdli@orinst.ox.ac.uk
- Location: Los Angeles, Oxford, Berlin
- Website: https://cdli.ucla.edu
- Repositories: 83
- Profile: https://github.com/cdli-gh
Issues and Pull Requests
Last synced: 12 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0