https://github.com/google-deepmind/xquad

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.4%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: google-deepmind
  • Default Branch: master
  • Size: 3.13 MB
Statistics
  • Stars: 195
  • Watchers: 11
  • Forks: 39
  • Open Issues: 3
  • Releases: 0
Created over 6 years ago · Last pushed over 4 years ago
Metadata Files
Readme

README.md

XQuAD

XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel across 11 languages.

For more information on how the dataset was created, refer to our paper, On the Cross-lingual Transferability of Monolingual Representations.

All files are in JSON format, following the SQuAD dataset format. A parallel XQuAD example in English, Spanish, and Chinese can be seen in the image below. The full dataset consists of 240 such parallel instances in 11 languages.
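Because the files follow the SQuAD v1.1 schema, they can be read with the standard library alone. A minimal sketch (the sample record below is illustrative, not taken from the dataset; the same loop works on any `xquad.<lang>.json` file loaded with `json.load`):

```python
import json

# Minimal document in the SQuAD v1.1 schema that XQuAD files follow:
# a list of articles, each with paragraphs, each with question-answer pairs.
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Example_Article",
            "paragraphs": [
                {
                    "context": "XQuAD covers eleven languages in parallel.",
                    "qas": [
                        {
                            "id": "example-0",
                            "question": "How many languages does XQuAD cover?",
                            "answers": [
                                # answer_start is a character offset into context
                                {"text": "eleven", "answer_start": 13}
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}

def iter_qa_pairs(dataset):
    """Yield (context, question, answers) triples from a SQuAD-format dict."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa["question"], qa["answers"]

pairs = list(iter_qa_pairs(sample))
print(len(pairs))  # 1
```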

Update: Added SQuAD v1.1 professionally translated to Romanian.

[Image: An example from XQuAD]

Data

This directory contains files in the following languages:

- Arabic: xquad.ar.json
- German: xquad.de.json
- Greek: xquad.el.json
- English: xquad.en.json
- Spanish: xquad.es.json
- Hindi: xquad.hi.json
- Russian: xquad.ru.json
- Thai: xquad.th.json
- Turkish: xquad.tr.json
- Vietnamese: xquad.vi.json
- Chinese: xquad.zh.json
- Romanian: xquad.ro.json (newly added; not included in the original XQuAD)

As the dataset is based on SQuAD v1.1, there are no unanswerable questions in the data. We chose this setting so that models can focus on cross-lingual transfer.

We show the average number of tokens per paragraph, question, and answer for each language in the table below. The statistics were obtained using Jieba for Chinese and the Moses tokenizer for the other languages.

|           |  en   |  es   |  de   |  el   |  ru   |  tr   |  ar   |  vi   |  th   |  zh   |  hi   |
|-----------|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| Paragraph | 142.4 | 160.7 | 139.5 | 149.6 | 133.9 | 126.5 | 128.2 | 191.2 | 158.7 | 147.6 | 232.4 |
| Question  | 11.5  | 13.4  | 11.0  | 11.7  | 10.0  | 9.8   | 10.7  | 14.8  | 11.5  | 10.5  | 18.7  |
| Answer    | 3.1   | 3.6   | 3.0   | 3.3   | 3.1   | 3.1   | 3.1   | 4.5   | 4.1   | 3.5   | 5.6   |
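Statistics of this shape can be recomputed from the files. A rough sketch using naive whitespace splitting as a stand-in tokenizer (the table above was produced with Jieba for Chinese and the Moses tokenizer for the other languages, so whitespace counts will not reproduce those numbers; the example strings are illustrative):

```python
# Illustrative recomputation of average token counts per text field.
# Whitespace splitting is a stand-in for the Jieba/Moses tokenizers
# actually used for the table, so the output is only approximate.

def avg_tokens(texts):
    """Mean whitespace-token count over a list of strings."""
    counts = [len(t.split()) for t in texts]
    return sum(counts) / len(counts)

paragraphs = [
    "The quick brown fox jumps over the lazy dog .",
    "XQuAD is parallel across eleven languages .",
]
print(round(avg_tokens(paragraphs), 1))  # 8.5
```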

Training and evaluation

In order to evaluate on XQuAD, models should be trained on the SQuAD v1.1 training file, which can be downloaded from here. Model validation should similarly be conducted on the SQuAD v1.1 validation file.

For evaluation, we use the official SQuAD evaluate-v1.1.py script, which can be obtained from here. Note that the SQuAD evaluation script normalises the answer based on heuristics that are specific to English. We have observed that language-specific normalisation heuristics have only a marginal impact on performance, which is why we use the English SQuAD v1.1 evaluation script for all languages for convenience.
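The English-specific heuristics are visible in the script's answer normalisation, which lowercases, strips punctuation, and removes the English articles a/an/the before comparing a prediction to a reference. A close paraphrase of that logic, reimplemented here for illustration (use the official evaluate-v1.1.py for reported numbers):

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase, strip punctuation, drop English articles, collapse
    whitespace (mirrors the official SQuAD v1.1 evaluation script)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)  # English-only heuristic
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    """1-to-1 match after normalisation."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)

def f1_score(prediction, ground_truth):
    """Token-level F1 between a predicted and a gold answer span."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # True
print(round(f1_score("the Eiffel Tower in Paris", "Eiffel Tower"), 2))  # 0.67
```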

Baselines

We show results using baseline methods in the tables below. We directly fine-tune mBERT and XLM-R Large on the English SQuAD v1.1 training data and evaluate them via zero-shot transfer on the XQuAD test datasets. For translate-train, we fine-tune mBERT on the SQuAD v1.1 training data, which we automatically translate to the target language. For translate-test, we fine-tune BERT-Large on the SQuAD v1.1 training set and evaluate it on the XQuAD test set of the target language, which we automatically translate to English. Note that results with translate-test are not directly comparable as we drop a small number (less than 3%) of the test examples.

F1 scores:

| Model                 | en   | ar   | de   | el   | es   | hi   | ru   | th   | tr   | vi   | zh   | ro   | avg  |
|-----------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| mBERT                 | 83.5 | 61.5 | 70.6 | 62.6 | 75.5 | 59.2 | 71.3 | 42.7 | 55.4 | 69.5 | 58.0 | 72.7 | 65.2 |
| XLM-R Large           | 86.5 | 68.6 | 80.4 | 79.8 | 82.0 | 76.7 | 80.1 | 74.2 | 75.9 | 79.1 | 59.3 | 83.6 | 77.2 |
| Translate-train mBERT | 83.5 | 68.0 | 75.6 | 70.0 | 80.2 | 69.6 | 75.0 | 36.9 | 68.9 | 75.6 | 66.2 | -    | 70.0 |
| Translate-test BERT-L | 87.9 | 73.7 | 79.8 | 79.4 | 82.0 | 74.9 | 79.9 | 64.6 | 67.4 | 76.3 | 73.7 | -    | 76.3 |

EM scores:

| Model                 | en   | ar   | de   | el   | es   | hi   | ru   | th   | tr   | vi   | zh   | ro   | avg  |
|-----------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| mBERT                 | 72.2 | 45.1 | 54.0 | 44.9 | 56.9 | 46.0 | 53.3 | 33.5 | 40.1 | 49.6 | 48.3 | 59.9 | 50.3 |
| XLM-R Large           | 75.7 | 49.0 | 63.4 | 61.7 | 63.9 | 59.7 | 64.3 | 62.8 | 59.3 | 59.0 | 50.0 | 69.7 | 61.5 |
| Translate-train mBERT | 72.2 | 51.1 | 60.7 | 53.0 | 63.1 | 55.4 | 59.7 | 33.5 | 54.8 | 56.2 | 56.6 | -    | 56.0 |
| Translate-test BERT-L | 77.1 | 58.8 | 66.7 | 65.5 | 68.4 | 60.1 | 66.7 | 50.0 | 49.6 | 61.5 | 59.1 | -    | 62.1 |

Best practices

XQuAD is intended as an evaluation corpus for zero-shot cross-lingual transfer. Evaluation on the test data should ideally only be conducted at the very end of the experimentation in order to avoid overfitting to the data.

If you are evaluating on XQuAD in the zero-shot setting, please state explicitly your experimental settings, particularly what monolingual and cross-lingual data you used for pre-training and fine-tuning your model.

Reference

If you use this dataset, please cite [1]:

[1] Artetxe, M., Ruder, S., & Yogatama, D. (2019). On the cross-lingual transferability of monolingual representations. arXiv preprint arXiv:1910.11856.

@article{Artetxe:etal:2019,
  author        = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
  title         = {On the cross-lingual transferability of monolingual representations},
  journal       = {CoRR},
  volume        = {abs/1910.11856},
  year          = {2019},
  archivePrefix = {arXiv},
  eprint        = {1910.11856}
}

The Romanian version of this data is part of LiRo, a benchmark for Romanian natural language understanding tasks:

@inproceedings{dumitrescu2021liro,
  title     = {LiRo: Benchmark and leaderboard for Romanian language tasks},
  author    = {Stefan Daniel Dumitrescu and Petru Rebeja and Beata Lorincz and Mihaela Gaman and Andrei Avram and Mihai Ilie and Andrei Pruteanu and Adriana Stan and Lorena Rosia and Cristina Iacobescu and Luciana Morogan and George Dima and Gabriel Marchidan and Traian Rebedea and Madalina Chitez and Dani Yogatama and Sebastian Ruder and Radu Tudor Ionescu and Razvan Pascanu and Viorica Patraucean},
  booktitle = {Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
  year      = {2021},
  url       = {https://openreview.net/forum?id=JH61CD7afTv}
}

License

This dataset is distributed under the CC BY-SA 4.0 license.

This is not an officially supported Google product.

Owner

  • Name: Google DeepMind
  • Login: google-deepmind
  • Kind: organization

GitHub Events

Total
  • Watch event: 23
Last Year
  • Watch event: 22

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 10
  • Total Committers: 3
  • Avg Commits per committer: 3.333
  • Development Distribution Score (DDS): 0.5
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Sebastian Ruder r****r@g****m 5
dyogatama d****a 4
Louise Deason d****n@g****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 5
  • Total pull requests: 0
  • Average time to close issues: 5 months
  • Average time to close pull requests: N/A
  • Total issue authors: 5
  • Total pull request authors: 0
  • Average comments per issue: 0.8
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tungnt55 (1)
  • josecannete (1)
  • ofrimasad (1)
  • Liangtaiwan (1)
  • aiquinones (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels