https://github.com/amazon-science/idioms-incontext-mt

idioms in context dataset

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.7%) to scientific vocabulary

Keywords

idiomatic-expressions llm-evaluation machine-translation
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: amazon-science
  • License: other
  • Default Branch: main
  • Homepage:
  • Size: 225 KB
Statistics
  • Stars: 5
  • Watchers: 10
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
idiomatic-expressions llm-evaluation machine-translation
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Contributing License Code of conduct

README.md

Idioms in Context Dataset

This repository contains the "Idioms in Context" dataset used in our ACL 2024 paper: The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities.

Description

The dataset consists of idiomatic expressions in context and their human-written translations. It covers 2 language pairs (English-German and English-Russian) with 3 translation directions:

1. English → German
2. German → English
3. Russian → English

The dataset is designed to evaluate the performance of large language models and machine translation systems in handling idiomatic expressions, which can be challenging due to their non-literal meanings.
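
As a rough illustration of how the three translation directions might be wired into an evaluation loop, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `IdiomExample` record layout, the `build_prompt` helper, and the prompt wording are hypothetical and are not taken from the repository or the paper. The sample sentence is invented and is not drawn from the dataset.

```python
from dataclasses import dataclass

# The three translation directions listed in the README,
# written as (source, target) ISO 639-1 language codes.
DIRECTIONS = [("en", "de"), ("de", "en"), ("ru", "en")]

LANGUAGE_NAMES = {"en": "English", "de": "German", "ru": "Russian"}


@dataclass
class IdiomExample:
    """Hypothetical record layout; the actual dataset schema may differ."""
    source_lang: str
    target_lang: str
    source_text: str  # sentence containing an idiom, in context
    reference: str    # human-written translation


def build_prompt(example: IdiomExample) -> str:
    """Build a plain zero-shot translation prompt (an assumed format)."""
    if (example.source_lang, example.target_lang) not in DIRECTIONS:
        raise ValueError(
            f"unsupported direction: {example.source_lang}->{example.target_lang}"
        )
    src = LANGUAGE_NAMES[example.source_lang]
    tgt = LANGUAGE_NAMES[example.target_lang]
    return (
        f"Translate the following {src} text into {tgt}.\n"
        f"{src}: {example.source_text}\n"
        f"{tgt}:"
    )


# Invented example for illustration only (not from the dataset).
sample = IdiomExample("en", "de", "It's raining cats and dogs.", "Es regnet in Strömen.")
print(build_prompt(sample))
```

The model output for each prompt would then be compared against `reference`; which metric to use is left open here.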

Usage

If you use this dataset in your work, please cite our paper:

@misc{stap2024-idioms,
  title={The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities},
  author={David Stap and Eva Hasler and Bill Byrne and Christof Monz and Ke Tran},
  year={2024},
  eprint={2405.20089},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2405.20089},
}

Security

See CONTRIBUTING for more information.

License

This dataset is licensed under the CC-BY-NC-4.0 License.

Owner

  • Name: Amazon Science
  • Login: amazon-science
  • Kind: organization

GitHub Events

Total
  • Watch event: 5
Last Year
  • Watch event: 5

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0