https://github.com/cdli-gh/unsupervised_nmt

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: cdli-gh
  • Language: Python
  • Default Branch: master
  • Size: 98.3 MB
Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 6 years ago · Last pushed about 6 years ago
Metadata Files
Readme

README.md

Sumerian - English Machine Translation

A PyTorch implementation of Dual Learning for Machine Translation, applied to Sumerian-English machine translation. The NMT models used here depend heavily on pcyin/pytorch_nmt.

Dataset

The parallel dataset for this project is taken from the CDLI GSoC-2019 Sumerian-English NMT project. Monolingual Sumerian data is available from the CDLI Daily Bulk Data Dump, and the monolingual English data was extracted from Europarl: Parallel Corpus.

Installation

Clone the repository. We assume that Python 3.6.x and pip are installed on your Linux system. (Optional) If they are not, the commands below create and activate a new environment using conda:

    conda create -n env python=3.6
    conda activate env

All dependencies can be installed via:

    pip install -r requirements.txt

NOTE: If you get a MemoryError during installation, try:

    pip install -r requirements.txt --no-cache-dir

The project currently supports PyTorch >= 1.4. Please check your version before proceeding:

    python -c "import torch; print(torch.__version__)"
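The PyTorch version check above can also be done programmatically. The following is a minimal sketch, not part of the repository: the helper name `meets_minimum` is hypothetical, and it only assumes that the version string starts with `major.minor`, as PyTorch version strings such as `1.4.0` or `1.10.0+cu113` do.

```python
import re


def meets_minimum(version, minimum=(1, 4)):
    """Return True if a version string such as '1.4.0' is at least `minimum`.

    Hypothetical helper for illustration only. Comparing integer tuples
    avoids the pitfall of plain string comparison, where '1.10' sorts
    before '1.4'.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor >= minimum
```

With PyTorch installed, the check would be `import torch; meets_minimum(torch.__version__)`.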

Dual Learning Step

During the reinforcement learning process, the system collects rewards from the language models and the translation models and uses them to update the translation models. You can find more details in the paper.

  • Training: You can simply use this script; you have to modify the paths and names so that they point to your models.
  • Test: To use the trained models, treat them as ordinary NMT models.
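As a rough illustration of how rewards can drive the update, the sketch below computes the two surrogate losses for one round of the dual-learning game, following the policy-gradient form described in the Dual Learning for Machine Translation paper. It is not taken from this repository: the function name and arguments are hypothetical, plain floats stand in for what would be PyTorch tensors, and the default weight 0.06 is the one listed in the Reward section of this README.

```python
def surrogate_losses(rewards, fwd_log_probs, bwd_log_probs, alpha=0.06):
    """Surrogate losses for one dual-learning round over K sampled translations.

    Hypothetical sketch for illustration.
    rewards:        final reward r_k for each sampled translation
    fwd_log_probs:  log P(sample_k | source) under the forward model
    bwd_log_probs:  log P(source | sample_k) under the backward model
    alpha:          weight on the language-model reward
    """
    k = len(rewards)
    if k == 0 or len(fwd_log_probs) != k or len(bwd_log_probs) != k:
        raise ValueError("all inputs must be non-empty and of equal length")
    # Forward model: REINFORCE-style loss, each log-prob weighted by its reward.
    fwd_loss = -sum(r * lp for r, lp in zip(rewards, fwd_log_probs)) / k
    # Backward model: reconstruction log-likelihood scaled by (1 - alpha).
    bwd_loss = -sum((1.0 - alpha) * lp for lp in bwd_log_probs) / k
    return fwd_loss, bwd_loss
```

In an actual PyTorch training loop, the rewards would be treated as constants (detached from the graph), the log-probabilities would be tensors, and each loss would be backpropagated into its own model.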

Test (Basic)

First, we trained our basic model on 10K bilingual Sumerian-English pairs. Then we set up a dual-learning game and trained the two models using reinforcement learning.

  • Reward
    • language model reward: average over square rooted length of string
    • final reward: r_k = 0.06 × r_1 + 0.94 × r_2
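The reward combination can be written out as a short sketch. Two points here are assumptions rather than statements from this README: that r1 is the language-model reward and r2 is the reconstruction (communication) reward, as in the dual-learning paper, and that "average over square rooted length" means the summed token log-probabilities divided by the square root of the token count. The function names are hypothetical.

```python
import math


def lm_reward(token_log_probs):
    """Language-model reward under the assumed interpretation:
    total log-probability divided by sqrt(number of tokens)."""
    if not token_log_probs:
        raise ValueError("token_log_probs must not be empty")
    return sum(token_log_probs) / math.sqrt(len(token_log_probs))


def final_reward(r1, r2, alpha=0.06):
    """Final reward: alpha * r1 + (1 - alpha) * r2.

    With alpha = 0.06 this matches the weights 0.06 and 0.94 listed above.
    r1 is assumed to be the language-model reward and r2 the
    reconstruction reward.
    """
    return alpha * r1 + (1.0 - alpha) * r2
```

For example, four tokens with log-probability -1.0 each give a language-model reward of -4 / sqrt(4) = -2.0; combined with a reconstruction reward of -1.0, the final reward is 0.06 × (-2.0) + 0.94 × (-1.0) = -1.06.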

Owner

  • Name: CDLI
  • Login: cdli-gh
  • Kind: organization
  • Email: cdli@orinst.ox.ac.uk
  • Location: Los Angeles, Oxford, Berlin

Committers

Last synced: 12 months ago

All Time
  • Total Commits: 4
  • Total Committers: 1
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Your Name y****u@e****m 4

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0