https://github.com/amazon-science/nearest-neighbor-crosslingual-classification

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: amazon-science
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 43.9 KB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 2
  • Open Issues: 1
  • Releases: 0
Created almost 5 years ago · Last pushed over 3 years ago
Metadata Files
Readme, Contributing, License, Code of conduct

README.md

Nearest Neighbour Few-Shot Learning for Cross-lingual Classification

Even though large pre-trained multilingual models (e.g., mBERT, XLM-R) have led to significant performance gains on a wide range of cross-lingual NLP tasks, success on many downstream tasks still relies on the availability of sufficient annotated data. Traditional fine-tuning of pre-trained models using only a few target samples can cause overfitting. This can be quite limiting, as most languages in the world are under-resourced. In this work, we investigate cross-lingual adaptation using a simple nearest neighbor few-shot (fewer than 15 samples) inference technique for classification tasks. We experiment with a total of 16 distinct languages across two NLP tasks: XNLI and PAWS-X. Our approach consistently improves over traditional fine-tuning using only a handful of labeled samples in target locales. We also demonstrate its generalization capability across tasks.
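The general idea of nearest neighbor few-shot inference can be illustrated with a generic k-nearest-neighbor classifier. Each query is labeled by the classes of the most similar labeled support examples. This is a minimal sketch under stated assumptions and is not the repository's implementation. The cosine similarity, the majority vote, the function names cosine and knn_predict, and the toy 3-dimensional vectors are all illustrative choices. The vectors stand in for sentence representations from a fine-tuned encoder such as XLM-R.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_predict(query, support, k=1):
    """Label `query` by majority vote over its k most similar support items.

    `support` is a list of (vector, label) pairs, i.e. the handful of
    labeled target-language examples available in the few-shot setting.
    """
    ranked = sorted(support, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy "sentence embeddings" for a tiny NLI support set.
support_set = [
    ([0.9, 0.1, 0.0], "entailment"),
    ([0.8, 0.2, 0.1], "entailment"),
    ([0.0, 0.9, 0.1], "contradiction"),
    ([0.1, 0.8, 0.2], "contradiction"),
    ([0.1, 0.1, 0.9], "neutral"),
]

print(knn_predict([0.85, 0.15, 0.05], support_set, k=3))  # → entailment
```

With k=1 this reduces to plain 1-nearest-neighbor classification. A larger k makes the prediction less sensitive to a single noisy support example, at the cost of a coarser vote when each class has only a few samples.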

Implementation

This repository contains the code for Nearest Neighbour Few-Shot Learning for Cross-lingual Classification.

Implementation of the Nearest Neighbor Few-Shot Learning approach and instructions on running the code will be available here soon.

Environment Creation Commands

We provide a *.yml file describing our conda environment. Create the environment with:

conda env create -f scripts/few-shot.yml

Download data

To download XNLI and PAWS-X, run:

bash scripts/download_data.sh

Download Pretrained Models

Model | Description | Dataset | Checkpoints
---|---|---|---
XLM-R large | Full-model fine-tuning with English data | XNLI | Please Contact 1/2
XLM-R large | Full-model fine-tuning with English data | PAWS-X | Please Contact 1/2

Inside the project, create a folder named dumped:

mkdir -p dumped

Move the downloaded pretrained models to dumped:

mv pawsx-xlmr-baseline-fp16 dumped/
mv xnli-xlmr-baseline-fp16 dumped/

Run experiment

To reproduce the results in Table 1, run:

bash scripts/exp_scripts/xnli/FewShotBenchmark/xlmr-baseline-few-shot-benchmark.sh

To reproduce the results in Table 2, run:

bash scripts/exp_scripts/pawsx/FewShotBenchmark/xlmr-baseline-few-shot-benchmark.sh

To reproduce the results in Table 3, run:

bash scripts/exp_scripts/pawsx/FewShotBenchmark/xlmr-baseline-cross-task-xnli-benchmark.sh

To evaluate models trained with multiple seeds on multiple GPUs, run:

bash scripts/run.sh
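The contents of scripts/run.sh are not described here. The sketch below is only a rough illustration of spreading several seed runs across GPUs: it assigns seeds to GPU indices round-robin. The function name assign_seeds_to_gpus, the seed values, and the round-robin policy are hypothetical and are not taken from the repository. CUDA_VISIBLE_DEVICES is the standard CUDA environment variable that restricts a process to particular devices.

```python
from itertools import cycle


def assign_seeds_to_gpus(seeds, num_gpus):
    """Distribute seed runs over GPU indices in round-robin order.

    Returns {gpu_index: [seed, ...]}. Runs listed under one GPU would be
    executed one after another, while different GPUs run in parallel.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    schedule = {gpu: [] for gpu in range(num_gpus)}
    # zip stops when the seeds are exhausted; cycle repeats the GPU indices.
    for seed, gpu in zip(seeds, cycle(range(num_gpus))):
        schedule[gpu].append(seed)
    return schedule


# Hypothetical example: five seeds shared between two GPUs.
schedule = assign_seeds_to_gpus([1, 2, 3, 4, 5], num_gpus=2)
print(schedule)  # {0: [1, 3, 5], 1: [2, 4]}

# Each GPU's jobs would be launched with CUDA_VISIBLE_DEVICES=<gpu> so that
# the process only sees that one device.
for gpu, gpu_seeds in schedule.items():
    print(f"CUDA_VISIBLE_DEVICES={gpu} -> seeds {gpu_seeds}")
```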

Accumulate results

To accumulate the results, run:

python scripts/extract_answer.py --folder_path dumped/xnli-xlmr-baseline-cross-lingual-transfer --shot 5 --lang en es --pkl_res_file_name "task-xnli-src_lang-en-tgt_lang-{}-lr_rate-0.0000075-shot-{}-seed-{}"

python scripts/extract_answer.py --folder_path dumped/-xlmr-baseline-cross-lingual-transfer --shot 5 --lang "en" "de" "fr" --pkl_res_file_name "task-pawsx-src_lang-en-tgt_lang-{}-lr_rate-0.0000075-shot-{}-seed-{}"

python scripts/extract_answer.py --folder_path dumped/pawsx-xlmr-cross-task-fewshot-benchmark --shot 5 --lang "en" "de" "fr" --pkl_res_file_name "task-pawsx-src_lang-en-tgt_lang-{}-lr_rate-0.0000075-shot-{}-seed-{}"
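The --pkl_res_file_name argument is a template with three {} placeholders. The surrounding field names suggest they are filled with the target language, the shot count, and the seed, in that order. That reading is an assumption, and so are the seed values, the helper names result_names and summarize, and the use of mean and standard deviation across seeds. The sketch does not describe what scripts/extract_answer.py actually does, including whether it appends a file extension.

```python
from statistics import mean, stdev

# Template copied from the XNLI command; the slot order is an assumption.
TEMPLATE = "task-xnli-src_lang-en-tgt_lang-{}-lr_rate-0.0000075-shot-{}-seed-{}"


def result_names(template, langs, shot, seeds):
    """Fill the (target language, shot, seed) slots for every language/seed pair."""
    return {lang: [template.format(lang, shot, seed) for seed in seeds] for lang in langs}


def summarize(scores_by_lang):
    """Mean and sample standard deviation of per-seed scores for each language."""
    return {
        lang: (mean(scores), stdev(scores) if len(scores) > 1 else 0.0)
        for lang, scores in scores_by_lang.items()
    }


names = result_names(TEMPLATE, ["en", "es"], shot=5, seeds=[1, 2, 3])
print(names["es"][0])
# task-xnli-src_lang-en-tgt_lang-es-lr_rate-0.0000075-shot-5-seed-1

# Hypothetical per-seed accuracies; real values would come from the result files.
scores = {"en": [80.0, 82.0, 81.0], "es": [75.0, 77.0, 76.0]}
print(summarize(scores))  # {'en': (81.0, 1.0), 'es': (76.0, 1.0)}
```

Averaging over seeds is a common way to report few-shot results, because the sampled support set can change scores noticeably from one seed to the next.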

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Citation

When using this repository, please cite the following work:

@misc{bari2021nearest,
  title={Nearest Neighbour Few-Shot Learning for Cross-lingual Classification},
  author={M Saiful Bari and Batool Haider and Saab Mansour},
  year={2021},
  eprint={2109.02221},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Owner

  • Name: Amazon Science
  • Login: amazon-science
  • Kind: organization

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • czq1999 (1)