https://github.com/amazon-science/nearest-neighbor-crosslingual-classification
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: amazon-science
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 43.9 KB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Nearest Neighbour Few-Shot Learning for Cross-lingual Classification
Even though large pre-trained multilingual models (e.g. mBERT, XLM-R) have led to significant performance gains on a wide range of cross-lingual NLP tasks, success on many downstream tasks still relies on the availability of sufficient annotated data. Traditional fine-tuning of pre-trained models using only a few target samples can cause over-fitting. This can be quite limiting, as most languages in the world are under-resourced. In this work, we investigate cross-lingual adaptation using a simple nearest neighbor few-shot (<15 samples) inference technique for classification tasks. We experiment with a total of 16 distinct languages across two NLP tasks, XNLI and PAWS-X. Our approach consistently improves over traditional fine-tuning using only a handful of labeled samples in target locales. We also demonstrate its generalization capability across tasks.
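As a generic illustration only (not the repository's implementation), the sketch below shows nearest neighbor few-shot inference in its simplest form. It assumes sentence embeddings for the query and for the few labeled target-language samples have already been extracted, for example from the English-fine-tuned XLM-R model, and labels the query by majority vote over its k most cosine-similar support samples. The names cosine and nn_predict are hypothetical, and the paper's exact similarity and normalization steps may differ.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nn_predict(query: Vector, support: List[Vector],
               labels: List[Hashable], k: int = 1) -> Hashable:
    """Majority label among the k support embeddings most similar to the query."""
    if not support or len(support) != len(labels):
        raise ValueError("support and labels must be non-empty and aligned")
    # Rank the few-shot support samples by similarity to the query (most similar first).
    ranked = sorted(zip(support, labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    top = [label for _, label in ranked[:k]]
    # most_common keeps first-seen order on ties, so the closest neighbor breaks ties.
    return Counter(top).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-d "embeddings"; real ones would come from the encoder.
    support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    labels = ["entailment", "entailment", "contradiction"]
    print(nn_predict([0.8, 0.2], support, labels))  # entailment
    print(nn_predict([0.1, 0.9], support, labels))  # contradiction
```

With k=1 this is plain 1-nearest-neighbor classification; a larger k trades locality for robustness, which matters when only a handful of labeled samples exist per class.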
Implementation
This repository contains the code for Nearest Neighbour Few-Shot Learning for Cross-lingual Classification.
The implementation of the Nearest Neighbor Few-Shot Learning approach and instructions for running the code will be available here soon.
Environment Creation Commands
We provide a conda environment file (*.yml). Create the environment with:
conda env create -f scripts/few-shot.yml
Download data
To download the XNLI and PAWS-X data, run:
bash scripts/download_data.sh
Download Pretrained Models
Model | Description | Dataset | Checkpoints
---|---|---|---
XLM-R large | Full-model fine-tuning on English data | XNLI | Please contact 1/2
XLM-R large | Full-model fine-tuning on English data | PAWS-X | Please contact 1/2
Inside the project root, create a folder named dumped:
mkdir -p dumped
Move the downloaded pretrained model checkpoints into dumped:
mv pawsx-xlmr-baseline-fp16 dumped/
mv xnli-xlmr-baseline-fp16 dumped/
Run experiment
To reproduce the Table 1 results, run:
bash scripts/exp_scripts/xnli/FewShotBenchmark/xlmr-baseline-few-shot-benchmark.sh
To reproduce the Table 2 results, run:
bash scripts/exp_scripts/pawsx/FewShotBenchmark/xlmr-baseline-few-shot-benchmark.sh
To reproduce the Table 3 results, run:
bash scripts/exp_scripts/pawsx/FewShotBenchmark/xlmr-baseline-cross-task-xnli-benchmark.sh
To evaluate multiple seed models on multiple GPUs, run:
bash scripts/run.sh
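The contents of scripts/run.sh are not shown here. As a hypothetical illustration of spreading seed runs over several GPUs, the sketch below assigns seeds round-robin and builds a per-GPU environment with CUDA_VISIBLE_DEVICES set. The seed values, GPU count, and the helper names assign_seeds_to_gpus and gpu_env are assumptions, and the sketch launches nothing.

```python
import os
from typing import Dict, List


def assign_seeds_to_gpus(seeds: List[int], num_gpus: int) -> Dict[int, List[int]]:
    """Round-robin assignment: the i-th seed goes to GPU i % num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    plan: Dict[int, List[int]] = {gpu: [] for gpu in range(num_gpus)}
    for i, seed in enumerate(seeds):
        plan[i % num_gpus].append(seed)
    return plan


def gpu_env(gpu: int) -> Dict[str, str]:
    """Copy of the current environment with only one GPU visible."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return env


if __name__ == "__main__":
    # Hypothetical: five seeds spread over two GPUs.
    for gpu, gpu_seeds in assign_seeds_to_gpus([1, 2, 3, 4, 5], num_gpus=2).items():
        print(f"GPU {gpu}: seeds {gpu_seeds}")
```

Each per-GPU seed list could then be run sequentially in its own process (for example with subprocess.run(..., env=gpu_env(gpu))), so that different GPUs work in parallel without sharing memory.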
Accumulate results
You can accumulate results with:
python scripts/extract_answer.py --folder_path dumped/xnli-xlmr-baseline-cross-lingual-transfer --shot 5 --lang en es --pkl_res_file_name "task-xnli-src_lang-en-tgt_lang-{}-lr_rate-0.0000075-shot-{}-seed-{}"
python scripts/extract_answer.py --folder_path dumped/-xlmr-baseline-cross-lingual-transfer --shot 5 --lang "en" "de" "fr" --pkl_res_file_name "task-pawsx-src_lang-en-tgt_lang-{}-lr_rate-0.0000075-shot-{}-seed-{}"
python scripts/extract_answer.py --folder_path dumped/pawsx-xlmr-cross-task-fewshot-benchmark --shot 5 --lang "en" "de" "fr" --pkl_res_file_name "task-pawsx-src_lang-en-tgt_lang-{}-lr_rate-0.0000075-shot-{}-seed-{}"
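The --pkl_res_file_name value is a Python format string with three {} placeholders. Judging only from the template text, they appear to be filled, in order, with the target language, the shot count, and the seed. That ordering, the seed values, and the idea of reporting a mean and standard deviation over seeds are assumptions about what scripts/extract_answer.py does; result_names and summarize are hypothetical helpers.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

# Template copied from the XNLI command above.
TEMPLATE = "task-xnli-src_lang-en-tgt_lang-{}-lr_rate-0.0000075-shot-{}-seed-{}"


def result_names(template: str, langs: Sequence[str], shot: int,
                 seeds: Sequence[int]) -> Dict[str, List[str]]:
    """Fill the template positionally with (target language, shot, seed)."""
    return {lang: [template.format(lang, shot, seed) for seed in seeds]
            for lang in langs}


def summarize(scores: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Per-language mean and sample standard deviation over seeds."""
    summary = {}
    for lang, values in scores.items():
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[lang] = (mean(values), spread)
    return summary


if __name__ == "__main__":
    names = result_names(TEMPLATE, ["en", "es"], shot=5, seeds=[1, 2])
    print(names["es"][0])
    # -> task-xnli-src_lang-en-tgt_lang-es-lr_rate-0.0000075-shot-5-seed-1
```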
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.
Citation
When using this repository, please cite the following work (BibTeX):
@misc{bari2021nearest,
title={Nearest Neighbour Few-Shot Learning for Cross-lingual Classification},
author={M Saiful Bari and Batool Haider and Saab Mansour},
year={2021},
eprint={2109.02221},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- czq1999 (1)