https://github.com/google-deepmind/xquad

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.4%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: google-deepmind
  • Default Branch: master
  • Size: 3.13 MB
Statistics
  • Stars: 195
  • Watchers: 11
  • Forks: 39
  • Open Issues: 3
  • Releases: 0
Created over 6 years ago · Last pushed over 4 years ago
Metadata Files
Readme

README.md

XQuAD

XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel across 11 languages.

For more information on how the dataset was created, refer to our paper, On the Cross-lingual Transferability of Monolingual Representations.

All files are in JSON format, following the SQuAD dataset format. A parallel XQuAD example in English, Spanish, and Chinese can be seen in the image below. The full dataset consists of 240 such parallel instances in 11 languages.
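Because the files follow the SQuAD v1.1 schema, they can be read with the standard library alone. A minimal sketch (the sample record below is illustrative, not taken from the dataset; the same loop works on any `xquad.<lang>.json` file loaded with `json.load`):

```python
import json

# Minimal document in the SQuAD v1.1 schema that XQuAD files follow:
# a list of articles, each with paragraphs, each with question-answer pairs.
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Example_Article",
            "paragraphs": [
                {
                    "context": "XQuAD covers eleven languages in parallel.",
                    "qas": [
                        {
                            "id": "example-0",
                            "question": "How many languages does XQuAD cover?",
                            "answers": [
                                # answer_start is a character offset into context
                                {"text": "eleven", "answer_start": 13}
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}

def iter_qa_pairs(dataset):
    """Yield (context, question, answers) triples from a SQuAD-format dict."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa["question"], qa["answers"]

pairs = list(iter_qa_pairs(sample))
print(len(pairs))  # 1
```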

Update: Added SQuAD v1.1 professionally translated to Romanian.

[Image: An example from XQuAD]

Data

This directory contains files in the following languages:

- Arabic: xquad.ar.json
- German: xquad.de.json
- Greek: xquad.el.json
- English: xquad.en.json
- Spanish: xquad.es.json
- Hindi: xquad.hi.json
- Russian: xquad.ru.json
- Thai: xquad.th.json
- Turkish: xquad.tr.json
- Vietnamese: xquad.vi.json
- Chinese: xquad.zh.json
- Romanian: xquad.ro.json (newly added; not included in the original XQuAD)

As the dataset is based on SQuAD v1.1, there are no unanswerable questions in the data. We chose this setting so that models can focus on cross-lingual transfer.

We show the average number of tokens per paragraph, question, and answer for each language in the table below. The statistics were obtained using Jieba for Chinese and the Moses tokenizer for the other languages.

|           |  en   |  es   |  de   |  el   |  ru   |  tr   |  ar   |  vi   |  th   |  zh   |  hi   |
|-----------|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| Paragraph | 142.4 | 160.7 | 139.5 | 149.6 | 133.9 | 126.5 | 128.2 | 191.2 | 158.7 | 147.6 | 232.4 |
| Question  | 11.5  | 13.4  | 11.0  | 11.7  | 10.0  | 9.8   | 10.7  | 14.8  | 11.5  | 10.5  | 18.7  |
| Answer    | 3.1   | 3.6   | 3.0   | 3.3   | 3.1   | 3.1   | 3.1   | 4.5   | 4.1   | 3.5   | 5.6   |
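Statistics of this shape can be recomputed from the files. A rough sketch using naive whitespace splitting as a stand-in tokenizer (the table above was produced with Jieba for Chinese and the Moses tokenizer for the other languages, so whitespace counts will not reproduce those numbers; the example strings are illustrative):

```python
# Illustrative recomputation of average token counts per text field.
# Whitespace splitting is a stand-in for the Jieba/Moses tokenizers
# actually used for the table, so the output is only approximate.

def avg_tokens(texts):
    """Mean whitespace-token count over a list of strings."""
    counts = [len(t.split()) for t in texts]
    return sum(counts) / len(counts)

paragraphs = [
    "The quick brown fox jumps over the lazy dog .",
    "XQuAD is parallel across eleven languages .",
]
print(round(avg_tokens(paragraphs), 1))  # 8.5
```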

Training and evaluation

In order to evaluate on XQuAD, models should be trained on the SQuAD v1.1 training file, which can be downloaded from here. Model validation should similarly be conducted on the SQuAD v1.1 validation file.

For evaluation, we use the official SQuAD evaluate-v1.1.py script, which can be obtained from here. Note that the SQuAD evaluation script normalises the answer based on heuristics that are specific to English. We have observed that language-specific normalisation heuristics have only a marginal impact on performance, which is why we use the English SQuAD v1.1 evaluation script for all languages for convenience.
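The English-specific heuristics are visible in the script's answer normalisation, which lowercases, strips punctuation, and removes the English articles a/an/the before comparing a prediction to a reference. A close paraphrase of that logic, reimplemented here for illustration (use the official evaluate-v1.1.py for reported numbers):

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase, strip punctuation, drop English articles, collapse
    whitespace (mirrors the official SQuAD v1.1 evaluation script)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)  # English-only heuristic
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    """1-to-1 match after normalisation."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)

def f1_score(prediction, ground_truth):
    """Token-level F1 between a predicted and a gold answer span."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # True
print(round(f1_score("the Eiffel Tower in Paris", "Eiffel Tower"), 2))  # 0.67
```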

Baselines

We show results using baseline methods in the tables below. We directly fine-tune mBERT and XLM-R Large on the English SQuAD v1.1 training data and evaluate them via zero-shot transfer on the XQuAD test datasets. For translate-train, we fine-tune mBERT on the SQuAD v1.1 training data, which we automatically translate to the target language. For translate-test, we fine-tune BERT-Large on the SQuAD v1.1 training set and evaluate it on the XQuAD test set of the target language, which we automatically translate to English. Note that results with translate-test are not directly comparable as we drop a small number (less than 3%) of the test examples.

F1 scores:

| Model                 | en   | ar   | de   | el   | es   | hi   | ru   | th   | tr   | vi   | zh   | ro   | avg  |
|-----------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| mBERT                 | 83.5 | 61.5 | 70.6 | 62.6 | 75.5 | 59.2 | 71.3 | 42.7 | 55.4 | 69.5 | 58.0 | 72.7 | 65.2 |
| XLM-R Large           | 86.5 | 68.6 | 80.4 | 79.8 | 82.0 | 76.7 | 80.1 | 74.2 | 75.9 | 79.1 | 59.3 | 83.6 | 77.2 |
| Translate-train mBERT | 83.5 | 68.0 | 75.6 | 70.0 | 80.2 | 69.6 | 75.0 | 36.9 | 68.9 | 75.6 | 66.2 | -    | 70.0 |
| Translate-test BERT-L | 87.9 | 73.7 | 79.8 | 79.4 | 82.0 | 74.9 | 79.9 | 64.6 | 67.4 | 76.3 | 73.7 | -    | 76.3 |

EM scores:

| Model                 | en   | ar   | de   | el   | es   | hi   | ru   | th   | tr   | vi   | zh   | ro   | avg  |
|-----------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| mBERT                 | 72.2 | 45.1 | 54.0 | 44.9 | 56.9 | 46.0 | 53.3 | 33.5 | 40.1 | 49.6 | 48.3 | 59.9 | 50.3 |
| XLM-R Large           | 75.7 | 49.0 | 63.4 | 61.7 | 63.9 | 59.7 | 64.3 | 62.8 | 59.3 | 59.0 | 50.0 | 69.7 | 61.5 |
| Translate-train mBERT | 72.2 | 51.1 | 60.7 | 53.0 | 63.1 | 55.4 | 59.7 | 33.5 | 54.8 | 56.2 | 56.6 | -    | 56.0 |
| Translate-test BERT-L | 77.1 | 58.8 | 66.7 | 65.5 | 68.4 | 60.1 | 66.7 | 50.0 | 49.6 | 61.5 | 59.1 | -    | 62.1 |

Best practices

XQuAD is intended as an evaluation corpus for zero-shot cross-lingual transfer. Evaluation on the test data should ideally only be conducted at the very end of the experimentation in order to avoid overfitting to the data.

If you are evaluating on XQuAD in the zero-shot setting, please state explicitly your experimental settings, particularly what monolingual and cross-lingual data you used for pre-training and fine-tuning your model.

Reference

If you use this dataset, please cite [1]:

[1] Artetxe, M., Ruder, S., & Yogatama, D. (2019). On the cross-lingual transferability of monolingual representations. arXiv preprint arXiv:1910.11856.

@article{Artetxe:etal:2019,
  author        = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
  title         = {On the cross-lingual transferability of monolingual representations},
  journal       = {CoRR},
  volume        = {abs/1910.11856},
  year          = {2019},
  archivePrefix = {arXiv},
  eprint        = {1910.11856}
}

The Romanian version of this data is part of LiRo, a benchmark for Romanian natural language understanding tasks:

@inproceedings{dumitrescu2021liro,
  title     = {LiRo: Benchmark and leaderboard for Romanian language tasks},
  author    = {Stefan Daniel Dumitrescu and Petru Rebeja and Beata Lorincz and Mihaela Gaman and Andrei Avram and Mihai Ilie and Andrei Pruteanu and Adriana Stan and Lorena Rosia and Cristina Iacobescu and Luciana Morogan and George Dima and Gabriel Marchidan and Traian Rebedea and Madalina Chitez and Dani Yogatama and Sebastian Ruder and Radu Tudor Ionescu and Razvan Pascanu and Viorica Patraucean},
  booktitle = {Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
  year      = {2021},
  url       = {https://openreview.net/forum?id=JH61CD7afTv}
}

License

This dataset is distributed under the CC BY-SA 4.0 license.

This is not an officially supported Google product.

Owner

  • Name: Google DeepMind
  • Login: google-deepmind
  • Kind: organization

GitHub Events

Total
  • Watch event: 23
Last Year
  • Watch event: 22

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 10
  • Total Committers: 3
  • Avg Commits per committer: 3.333
  • Development Distribution Score (DDS): 0.5
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Sebastian Ruder r****r@g****m 5
dyogatama d****a 4
Louise Deason d****n@g****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 5
  • Total pull requests: 0
  • Average time to close issues: 5 months
  • Average time to close pull requests: N/A
  • Total issue authors: 5
  • Total pull request authors: 0
  • Average comments per issue: 0.8
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tungnt55 (1)
  • josecannete (1)
  • ofrimasad (1)
  • Liangtaiwan (1)
  • aiquinones (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels