interlinear-translation
Morphology-enhanced neural models for Ancient Greek interlinear translation, achieving 35-38% BLEU improvements for English and Polish translations. Includes custom T5 implementations and training code. [LoResLM@COLING2025]
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.0%) to scientific vocabulary
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
loreslm-interlinear-translation
This repository contains resources for the paper "Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for Ancient Greek" presented at the LoResLM@COLING2025 workshop.
Quick Links
Overview
We present a novel approach to interlinear translation from Ancient Greek to English and Polish using morphology-enhanced neural models. Our experiments involved fine-tuning T5-family models in 144 configurations, achieving significant improvements:
- English: 35% BLEU score improvement (44.67 → 60.40)
- Polish: 38% BLEU score improvement (42.92 → 59.33)
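The figures above are corpus-level BLEU scores. As a rough illustration of how such scores can be computed with sacrebleu (which appears in the repository's dependency list), the sketch below shows a minimal evaluation call; it is not the repository's evaluation script, and the hypothesis and reference sentences are invented placeholders rather than corpus data.

```python
# Minimal sketch: corpus-level BLEU with sacrebleu (listed as a dependency).
# The sentences below are invented placeholders, not data from the paper.
import sacrebleu

# Hypothetical word-aligned interlinear outputs and their references.
hypotheses = ["in beginning was the word", "and the word was with the god"]
references = [["in the beginning was the word", "and the word was with God"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```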
Resources
Code
- morpht5 - A package including morphology-enhanced T5 model implementations (an illustrative sketch of the embedding idea follows this list)
- Training Code - Scripts used for model training and evaluation
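The morpht5 package's API is not documented in this summary, so the following is only an illustrative sketch of the general idea behind a dedicated morphological embedding layer: learn an embedding for each token's morphological tag and add it to the ordinary token embedding before encoding. The checkpoint choice, the tag-set size (NUM_MORPH_TAGS), and the morph_ids alignment are assumptions made for the example, not the package's actual interface.

```python
# Illustrative sketch only (not the morpht5 package's API): sum a learned
# morphological-tag embedding with the token embedding before the encoder.
import torch
import torch.nn as nn
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/mt5-base"  # one of the base models evaluated in the paper
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

NUM_MORPH_TAGS = 64  # assumed tag-set size; the paper compares several tag sets
morph_embedding = nn.Embedding(NUM_MORPH_TAGS, model.config.d_model)

text = "ἐν ἀρχῇ ἦν ὁ λόγος"  # placeholder Ancient Greek input
inputs = tokenizer(text, return_tensors="pt")

# One (hypothetical) morphological tag id per subword, aligned with input_ids.
morph_ids = torch.zeros_like(inputs["input_ids"])

# Combine token and morph-tag embeddings and pass them in as inputs_embeds.
token_embeds = model.get_input_embeddings()(inputs["input_ids"])
inputs_embeds = token_embeds + morph_embedding(morph_ids)

generated = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=inputs["attention_mask"],
    max_length=32,
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

In the actual system the tag embedding table and its alignment with subwords would be learned during fine-tuning; this snippet only shows where such an embedding enters the model input.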
Models
Model performance summaries by target language:
- English Models
- Polish Models
License
This work is licensed under CC BY-NC-SA 4.0.
Citation
```bibtex
@inproceedings{rapacz-smywinski-pohl-2025-low,
    title = "Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for {A}ncient {G}reek",
    author = "Rapacz, Maciej and
      Smywi{\'n}ski-Pohl, Aleksander",
    editor = "Hettiarachchi, Hansi and
      Ranasinghe, Tharindu and
      Rayson, Paul and
      Mitkov, Ruslan and
      Gaber, Mohamed and
      Premasiri, Damith and
      Tan, Fiona Anting and
      Uyangodage, Lasitha",
    booktitle = "Proceedings of the First Workshop on Language Models for Low-Resource Languages",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.loreslm-1.11/",
    pages = "145--165",
    abstract = "Contemporary machine translation systems prioritize fluent, natural-sounding output with flexible word ordering. In contrast, interlinear translation maintains the source text's syntactic structure by aligning target language words directly beneath their source counterparts. Despite its importance in classical scholarship, automated approaches to interlinear translation remain understudied. We evaluated neural interlinear translation from Ancient Greek to English and Polish using four transformer-based models: two Ancient Greek-specialized (GreTa and PhilTa) and two general-purpose multilingual models (mT5-base and mT5-large). Our approach introduces novel morphological embedding layers and evaluates text preprocessing and tag set selection across 144 experimental configurations using a word-aligned parallel corpus of the Greek New Testament. Results show that morphological features through dedicated embedding layers significantly enhance translation quality, improving BLEU scores by 35{\%} (44.67 {\textrightarrow} 60.40) for English and 38{\%} (42.92 {\textrightarrow} 59.33) for Polish compared to baseline models. PhilTa achieves state-of-the-art performance for English, while mT5-large does so for Polish. Notably, PhilTa maintains stable performance using only 10{\%} of training data. Our findings challenge the assumption that modern neural architectures cannot benefit from explicit morphological annotations. While preprocessing strategies and tag set selection show minimal impact, the substantial gains from morphological embeddings demonstrate their value in low-resource scenarios."
}
```
Owner
- Name: Maciej Rapacz
- Login: mrapacz
- Kind: user
- Repositories: 10
- Profile: https://github.com/mrapacz
Citation (citation.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for Ancient Greek"
authors:
  - given-names: "Maciej"
    family-names: "Rapacz"
  - given-names: "Aleksander"
    family-names: "Smywiński-Pohl"
version: "1.0.0"
date-released: "2025-01"
abstract: "Contemporary machine translation systems prioritize fluent, natural-sounding output with flexible word ordering. In contrast, interlinear translation maintains the source text's syntactic structure by aligning target language words directly beneath their source counterparts. Despite its importance in classical scholarship, automated approaches to interlinear translation remain understudied. We evaluated neural interlinear translation from Ancient Greek to English and Polish using four transformer-based models: two Ancient Greek-specialized (GreTa and PhilTa) and two general-purpose multilingual models (mT5-base and mT5-large). Our approach introduces novel morphological embedding layers and evaluates text preprocessing and tag set selection across 144 experimental configurations using a word-aligned parallel corpus of the Greek New Testament. Results show that morphological features through dedicated embedding layers significantly enhance translation quality, improving BLEU scores by 35% (44.67 → 60.40) for English and 38% (42.92 → 59.33) for Polish compared to baseline models. PhilTa achieves state-of-the-art performance for English, while mT5-large does so for Polish. Notably, PhilTa maintains stable performance using only 10% of training data. Our findings challenge the assumption that modern neural architectures cannot benefit from explicit morphological annotations. While preprocessing strategies and tag set selection show minimal impact, the substantial gains from morphological embeddings demonstrate their value in low-resource scenarios."
keywords:
  - "low-resource"
  - "interlinear translation"
  - "neural models"
  - "Ancient Greek"
  - "morphology"
  - "machine translation"
  - "natural language processing"
  - "nlp"
  - "deep-learning"
  - "transformers"
  - "neural-networks"
  - "coling"
  - "t5"
  - "coling2025"
license: "CC-BY-NC-SA-4.0"
repository-code: "https://github.com/mrapacz/interlinear-translation"
preferred-citation:
  type: conference-paper
  authors:
    - given-names: "Maciej"
      family-names: "Rapacz"
    - given-names: "Aleksander"
      family-names: "Smywiński-Pohl"
  title: "Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for Ancient Greek"
  year: 2025
  booktitle: "Proceedings of the First Workshop on Language Models for Low-Resource Languages"
  publisher: "Association for Computational Linguistics"
  address: "Abu Dhabi, United Arab Emirates"
  editors:
    - given-names: "Hansi"
      family-names: "Hettiarachchi"
    - given-names: "Tharindu"
      family-names: "Ranasinghe"
    - given-names: "Paul"
      family-names: "Rayson"
    - given-names: "Ruslan"
      family-names: "Mitkov"
    - given-names: "Mohamed"
      family-names: "Gaber"
    - given-names: "Damith"
      family-names: "Premasiri"
    - given-names: "Fiona Anting"
      family-names: "Tan"
    - given-names: "Lasitha"
      family-names: "Uyangodage"
  month: 1
  pages: "145--165"
  url: "https://aclanthology.org/2025.loreslm-1.11/"
```
GitHub Events
Total
- Watch event: 1
- Push event: 1
Last Year
- Watch event: 1
- Push event: 1
Dependencies
- 204 dependencies
- ipdb ^0.13.13 develop
- ipython ^8.29.0 develop
- jupyterlab ^4.2.5 develop
- matplotlib ^3.9.2 develop
- mypy ^1.13.0 develop
- pre-commit ^4.0.1 develop
- pytest ^8.3.3 develop
- ruff ^0.7.1 develop
- seaborn ^0.13.2 develop
- datasets ^3.0.2
- evaluate ^0.4.3
- levenshtein ^0.26.1
- loguru ^0.7.2
- more-itertools ^10.5.0
- neptune ^1.12.0
- numpy <2
- prettytable ^3.12.0
- python ^3.11
- rouge-score ^0.1.2
- sacrebleu ^2.4.3
- sentencepiece ^0.2.0
- tensorboard ^2.18.0
- torch 2.2.2
- transformers 4.31.0
- typer ^0.12.5