https://github.com/bramvanroy/mantis

Segmentation interface for the TPR-DB to manually tokenize and sentence segment

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: BramVanroy
  • License: apache-2.0
  • Language: JavaScript
  • Default Branch: main
  • Size: 381 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md

Mantis

Segmentation interface for the TPR-DB to manually tokenize and sentence segment your data

Some parts of this repository are still under construction. For questions, feel free to reach out.

Installation

For the backend, install the dependencies listed in requirements.txt (for example, with pip install -r requirements.txt). For the frontend, run npm start in the frontend's folder.

Citation

Vanroy, B. and Macken, L. (2022). LeConTra: A Learner Corpus of English-to-Dutch News Translation. In Proceedings of the Language Resources and Evaluation Conference, pages 1807-1816, Marseille, France. European Language Resources Association.

```bibtex
@InProceedings{vanroy-macken:2022:LREC,
  author    = {Vanroy, Bram and Macken, Lieve},
  title     = {LeConTra: A Learner Corpus of English-to-Dutch News Translation},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {1807--1816},
  abstract  = {We present LeConTra, a learner corpus consisting of English-to-Dutch news translations enriched with translation process data. Three students of a Master's programme in Translation were asked to translate 50 different English journalistic texts of approximately 250 tokens each. Because we also collected translation process data in the form of keystroke logging, our dataset can be used as part of different research strands such as translation process research, learner corpus research, and corpus-based translation studies. Reference translations, without process data, are also included. The data has been manually segmented and tokenized, and manually aligned at both segment and word level, leading to a high-quality corpus with token-level process data. The data is freely accessible via the Translation Process Research DataBase, which emphasises our commitment of distributing our dataset. The tool that was built for manual sentence segmentation and tokenization, Mantis, is also available as an open-source aid for data processing.},
  url       = {https://aclanthology.org/2022.lrec-1.192}
}
```

Owner

  • Name: Bram Vanroy
  • Login: BramVanroy
  • Kind: user
  • Location: Belgium
  • Company: @CCL-KULeuven @instituutnederlandsetaal

👋 My name is Bram and I work on natural language processing and machine translation (evaluation) but I also spend a lot of time in this open-source world 🌍

Citation (CITATION)

@InProceedings{vanroy-macken:2022:LREC,
  author    = {Vanroy, Bram  and  Macken, Lieve},
  title     = {LeConTra: A Learner Corpus of English-to-Dutch News Translation},
  booktitle      = {Proceedings of the Language Resources and Evaluation Conference},
  month          = {June},
  year           = {2022},
  address        = {Marseille, France},
  publisher      = {European Language Resources Association},
  pages     = {1807--1816},
  abstract  = {We present LeConTra, a learner corpus consisting of English-to-Dutch news translations enriched with translation process data. Three students of a Master’s programme in Translation were asked to translate 50 different English journalistic texts of approximately 250 tokens each. Because we also collected translation process data in the form of keystroke logging, our dataset can be used as part of different research strands such as translation process research, learner corpus research, and corpus-based translation studies. Reference translations, without process data, are also included. The data has been manually segmented and tokenized, and manually aligned at both segment and word level, leading to a high-quality corpus with token-level process data. The data is freely accessible via the Translation Process Research DataBase, which emphasises our commitment of distributing our dataset. The tool that was built for manual sentence segmentation and tokenization, Mantis, is also available as an open-source aid for data processing.},
  url       = {https://aclanthology.org/2022.lrec-1.192}
}

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • BramVanroy (2)