https://github.com/bigscience-workshop/xmtf

Crosslingual Generalization through Multitask Finetuning

Keywords

bloom bloomz instruction-tuning language-models large-language-models mt0 multilingual-nlp multitask-learning t5 zero-shot-learning

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: bigscience-workshop
License: apache-2.0
Language: Jupyter Notebook
Default Branch: master
Homepage: https://arxiv.org/abs/2211.01786
Size: 28.6 MB

Statistics

Stars: 533
Watchers: 6
Forks: 39
Open Issues: 11
Releases: 0

Created almost 4 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License

Crosslingual Generalization through Multitask Finetuning

This repository provides an overview of all components used to create BLOOMZ, mT0 & xP3, introduced in the paper Crosslingual Generalization through Multitask Finetuning. Two videos on the paper are available: a 25-minute one by Samuel Albanie and a 4-minute one by Niklas Muennighoff.

Data
Models
Create xP3
Train models
- BLOOMZ
- mT0
Evaluate models
- Rank Evaluation
- Generation Evaluation
Plots & Tables
- Plots
- Tables
Citation

Data

Name	Explanation	Example models
xP3x	Mixture of 17 tasks in 277 languages with English prompts	WIP - Join us at Project Aya @C4AI to help!
xP3	Mixture of 13 training tasks in 46 languages with English prompts	BLOOMZ & mT0-13B
xP3mt	Mixture of 13 training tasks in 46 languages with prompts in 20 languages (machine-translated from English)	BLOOMZ-MT & mT0-13B-MT
xP3all	xP3 + our evaluation datasets adding an additional 3 tasks for a total of 16 tasks in 46 languages with English prompts
xP3megds	Megatron-DeepSpeed processed version of xP3	BLOOMZ
P3	Repreprocessed version of the English-only P3 with 8 training tasks	BLOOMZ-P3 & mT0-13B-P3

Models

Multitask finetuned on xP3. Recommended for prompting in English.
Parameters	300M	580M	1.2B	3.7B	13B	560M	1.1B	1.7B	3B	7.1B	176B
Finetuned Model	mt0-small	mt0-base	mt0-large	mt0-xl	mt0-xxl	bloomz-560m	bloomz-1b1	bloomz-1b7	bloomz-3b	bloomz-7b1	bloomz
Multitask finetuned on xP3mt. Recommended for prompting in non-English.
Finetuned Model					mt0-xxl-mt					bloomz-7b1-mt	bloomz-mt
Multitask finetuned on P3. Released for research purposes only. Strictly inferior to above models!
Finetuned Model					mt0-xxl-p3					bloomz-7b1-p3	bloomz-p3
Original pretrained checkpoints. Not recommended.
Pretrained Model	mt5-small	mt5-base	mt5-large	mt5-xl	mt5-xxl	bloom-560m	bloom-1b1	bloom-1b7	bloom-3b	bloom-7b1	bloom
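
As a rough illustration of the recommendations in the table above, the sketch below maps a model family and prompt language to candidate checkpoints and shows one way to run a prompt through them. The checkpoint names come from the table; the `bigscience/` Hub namespace, the helper names `candidate_checkpoints` and `generate`, and the use of the `transformers` Auto classes are assumptions made for this example, not something the repository prescribes.

```python
# Checkpoint names are copied from the table above.
# ASSUMPTION: the checkpoints live under the "bigscience" Hub namespace.
HUB_NAMESPACE = "bigscience"

XP3_MODELS = {
    "mt0": ["mt0-small", "mt0-base", "mt0-large", "mt0-xl", "mt0-xxl"],
    "bloomz": ["bloomz-560m", "bloomz-1b1", "bloomz-1b7",
               "bloomz-3b", "bloomz-7b1", "bloomz"],
}
XP3MT_MODELS = {
    "mt0": ["mt0-xxl-mt"],
    "bloomz": ["bloomz-7b1-mt", "bloomz-mt"],
}


def candidate_checkpoints(family, english_prompts=True):
    """Hypothetical helper: xP3 models for English prompts, xP3mt otherwise."""
    table = XP3_MODELS if english_prompts else XP3MT_MODELS
    if family not in table:
        raise ValueError(f"unknown model family: {family!r}")
    return [f"{HUB_NAMESPACE}/{name}" for name in table[family]]


def generate(checkpoint, prompt, max_new_tokens=20):
    """Hypothetical helper: run one prompt through a checkpoint.

    ASSUMPTION: mT0 is loaded as a seq2seq model and BLOOMZ as a causal LM.
    The import is local so the lookup above works without transformers.
    """
    from transformers import (AutoModelForCausalLM, AutoModelForSeq2SeqLM,
                              AutoTokenizer)

    model_cls = AutoModelForSeq2SeqLM if "mt0" in checkpoint else AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = model_cls.from_pretrained(checkpoint)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

For instance, `candidate_checkpoints("bloomz", english_prompts=False)` lists the two BLOOMZ-MT checkpoints; the smallest entries of the xP3 lists are the cheapest ones to try `generate` on.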

Create xP3(x)

We have processed & uploaded xP3. If you want to recreate it, follow these steps:

Get promptsource: for xP3mt, run git clone -b xp3mt https://github.com/Muennighoff/promptsource.git; for xP3, run git clone -b tr13 https://github.com/Muennighoff/promptsource.git. Then install it with cd promptsource; pip install -e .
Get the required packages: pip install -q datasets iso-639
Get the creation script & edit it if necessary:
- For xP3mt, set USE_ENGLISH_PROMPTS = False at the beginning of the script
- For xP3, set USE_ENGLISH_PROMPTS = True at the beginning of the script
Run the script, e.g. via python prepare_xp3.py or a SLURM script

For the new extension of xP3, xP3x, the process is largely the same except:

Install the xp3x branch of promptsource instead, i.e. pip install git+https://github.com/Muennighoff/promptsource.git@xp3x
The creation script is in this repository & named create_xp3x.py.

xP3x is a superset of xP3, so unless you want to reproduce the paper, we recommend always using xP3x (or xP3mt if you want machine-translated prompts).
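
The per-variant differences described in this section can be summarized in a small lookup. The sketch below only assembles the commands and the script setting as strings; it runs nothing. The function name `creation_plan` is hypothetical, and for xP3x two details are assumptions: that the same `datasets` and `iso-639` packages are needed, and that the script is launched as `python create_xp3x.py`.

```python
PROMPTSOURCE = "https://github.com/Muennighoff/promptsource.git"
PACKAGES = "pip install -q datasets iso-639"


def creation_plan(variant):
    """Hypothetical helper: commands and script setting for one variant."""
    if variant in ("xP3", "xP3mt"):
        # xP3 uses the tr13 branch, xP3mt the xp3mt branch.
        branch = "tr13" if variant == "xP3" else "xp3mt"
        return {
            "commands": [
                f"git clone -b {branch} {PROMPTSOURCE}",
                "cd promptsource; pip install -e .",
                PACKAGES,
                "python prepare_xp3.py",
            ],
            # Edited at the beginning of the creation script.
            "USE_ENGLISH_PROMPTS": variant == "xP3",
        }
    if variant == "xP3x":
        return {
            "commands": [
                f"pip install git+{PROMPTSOURCE}@xp3x",
                PACKAGES,                 # ASSUMPTION: same packages as xP3
                "python create_xp3x.py",  # ASSUMPTION: launched like prepare_xp3.py
            ],
            # The text does not say whether xP3x uses this setting.
            "USE_ENGLISH_PROMPTS": None,
        }
    raise ValueError(f"unknown variant: {variant!r}")


if __name__ == "__main__":
    for step in creation_plan("xP3mt")["commands"]:
        print(step)
```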

Train models

BLOOMZ

Download the pretrained model checkpoint, which is of shape PP=12, TP=4, DP=4. If you'd like to reshape the model, you will also need to download the universal checkpoint. If you want to continue finetuning, you should use our finetuned checkpoint, which is of shape PP=72, TP=1, DP=4.
Set up the training code: git clone -b t0loading https://github.com/bigscience-workshop/Megatron-DeepSpeed & follow its setup guide to create an environment with the necessary packages.
Download the Megatron-DeepSpeed processed xP3megds, or repreprocess it for Megatron-DeepSpeed yourself by downloading xP3, removing the merged_{lang}.jsonl files & preprocessing it using the script here.
Set up & run the training script: We use SLURM scripts available at bigscience-workshop/bigscience/train/tr13-mtf and referred to as xp3capmixnewcodelonglossseq. E.g. this is the script launched to train bloomz. Important parts of the script to modify are:
- #SBATCH variables, such as nodes, gpus, time, etc. - Our SLURM guide is here
- source $six_ALL_CCFRWORK/start-tr13f-6B3-ml-t0, which should point to your own conda environment set up via Megatron-DeepSpeed
- PATH environment variables, notably TRAIN_DATA_PATH & VALID_DATA_PATH, which point to files that in turn point to your processed training and validation data. We provide our files in this repository (xp3capmixnewcodelong_train.txt & xp3capmixnewcodelong_validation.txt), but you will likely want to change the paths inside. The percentages per language are based on how much each language makes up in xP3, with code being slightly upsampled.
- PPSIZE=72, TPSIZE=1 & BATCH SIZE & co, which specify the layout. This will depend on the hardware available to you. If you change them, you may have to reshape the model. For reshaping, you need to use the universal checkpoint and pass the --universal flag in the script. We recommend saving a new checkpoint right after & then continuing training without --universal, which will be faster.
- If you want to restart from a saved checkpoint (e.g. after training a few steps like above), make sure to remove the --no-load-optim & --reset-progress flags
After training, you can convert the checkpoint to transformers format using the script here
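
To make the layout arithmetic in the steps above concrete, here is a minimal sketch. It assumes the usual Megatron-style convention that the number of GPUs equals PP x TP x DP, and it treats any layout difference as a case that "may" need reshaping, as the text puts it. `Layout`, `may_need_reshape` and `restart_args` are hypothetical names for this example, not part of Megatron-DeepSpeed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Parallel layout of a checkpoint or a training run."""
    pp: int  # pipeline-parallel size (PPSIZE)
    tp: int  # tensor-parallel size (TPSIZE)
    dp: int  # data-parallel size

    @property
    def gpus(self) -> int:
        # ASSUMPTION: one GPU per (pp, tp, dp) coordinate.
        return self.pp * self.tp * self.dp


PRETRAINED = Layout(pp=12, tp=4, dp=4)  # pretrained checkpoint shape
FINETUNED = Layout(pp=72, tp=1, dp=4)   # finetuned checkpoint shape


def may_need_reshape(checkpoint: Layout, target: Layout) -> bool:
    """True if the layouts differ, so the universal checkpoint may be needed."""
    return checkpoint != target


# Flags the text says to drop once a new checkpoint has been saved.
FIRST_LAUNCH_ONLY = {"--universal", "--no-load-optim", "--reset-progress"}


def restart_args(args):
    """Hypothetical helper: strip first-launch flags before restarting."""
    return [arg for arg in args if arg not in FIRST_LAUNCH_ONLY]


if __name__ == "__main__":
    print(PRETRAINED.gpus, FINETUNED.gpus)
    print(may_need_reshape(PRETRAINED, FINETUNED))
```

Under these assumptions, going from the pretrained shape to PPSIZE=72, TPSIZE=1 changes the layout, which is the case where the text recommends the universal checkpoint and a one-off launch with --universal.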

Helpful resources:
- Blog post
- BLOOM community tab, such as here

mT0

Follow the finetuning instructions here, making sure to use pretrained mT5 models & the xP3 dataset.

Helpful resources:
- T5X paper

Evaluate models

Evaluation results are all available in this repository: https://huggingface.co/datasets/bigscience/evaluation-results under the respective models. Below we explain how to run evaluation.

Rank Evaluation

We evaluate the models on Rank Evaluation on XCOPA, XNLI, XStoryCloze & XWinograd:

Get promptsource fork: git clone -b xp3mt https://github.com/Muennighoff/promptsource.git & cd promptsource; pip install -e .
Get t-zero fork: git clone -b muennighoff/upgrdps https://github.com/Muennighoff/t-zero.git & cd t-zero; pip install -e .
Download model & run evaluation script, for example for bloomz.
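
For readers unfamiliar with the term, the sketch below shows the general idea usually meant by rank evaluation: score every answer option under the model and predict the highest-scoring one. This is an assumption about the recipe, not a description of the t-zero fork, whose scoring details may differ. The names `rank_classify`, `accuracy` and `toy_score` are hypothetical, and `toy_score` is a word-overlap stand-in for a real model log-likelihood.

```python
from typing import Callable, Sequence, Tuple

# A scorer maps (prompt, answer option) to a score; with a real model this
# would typically be the log-likelihood of the option given the prompt.
Scorer = Callable[[str, str], float]


def rank_classify(prompt: str, choices: Sequence[str], score: Scorer) -> int:
    """Return the index of the highest-scoring answer option."""
    scores = [score(prompt, choice) for choice in choices]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(examples: Sequence[Tuple[str, Sequence[str], int]],
             score: Scorer) -> float:
    """Fraction of (prompt, choices, gold index) examples predicted correctly."""
    correct = sum(rank_classify(prompt, choices, score) == gold
                  for prompt, choices, gold in examples)
    return correct / len(examples)


def toy_score(prompt: str, choice: str) -> float:
    """Stand-in scorer: number of words shared by prompt and option."""
    return float(len(set(prompt.lower().split()) & set(choice.lower().split())))


if __name__ == "__main__":
    examples = [("the sky is blue", ["blue sky", "green grass"], 0)]
    print(accuracy(examples, toy_score))
```

Swapping `toy_score` for a function that queries a model is what turns this into an actual evaluation; ranking needs only relative scores, so no free-form generation is involved.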

Generation Evaluation

We evaluate generation on translation & summarization during training for validation:

Get promptsource fork: git clone -b xp3mt https://github.com/Muennighoff/promptsource & cd promptsource; pip install -e .
Get bigscience-workshop/lm-evaluation-harness: git clone https://github.com/bigscience-workshop/lm-evaluation-harness. The script for the 7.1B model, for example, is here.

We also evaluate code generation on HumanEval:

Get the code-evaluation code: git clone https://github.com/loubnabnl/bloom-code-evaluation & go through its setup.
In code_eval.py, set prepend_eos to False in the signature of complete_code, i.e. change complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=True, **gen_kwargs) to complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=False, **gen_kwargs).
Download the model & run the evaluation script, swapping out MODEL_CKPT for your path; for example, for bloomz use this.

Plots & Tables

Plots

Figure 1: plotstables/xp3_taxonomy.drawio & plotstables/xp3_taxonomy.pdf
Figure 2: plotstables/xp3_languages.ipynb & colab
Figure 3: plotstables/xp3_variants.pdf & drawings
Figure 4: plotstables/xp3_generalization_bar.pdf & colab
Figure 5: plotstables/lang_generalization & colab
Figure 6: plotstables/scale.pdf & colab
Figure 7: plotstables/validation.pdf & colab
Figure 8: plotstables/pretraining_sizes.pdf & colab
Figure 9: plotstables/english_task_generalization.pdf & colab
Figure 10: plotstables/task_generalization.pdf & colab
Figure 11: plotstables/roots_xp3_languages.pdf & colab requiring some of the files in plotstables/contamination
Figure 12: plotstables/examples/bloom_code_example.py & plotstables/examples/bloom_code_light.pdf & plotstables/examples/bloomz_code_light.pdf; The raw code files can be found here & here
Figure 13 - Figure 16: plotstables/examples/*.pdf & plotstables/examples/generations.drawio

Tables

Table 1: Colab & Colab for complex version
Table 2: Adapted from the Codex paper
Table 3: Manual
Table 4: plotstables/compute_codegen_len.ipynb for generations & plotstables/countcode.py for xP3
Table 5: Manual
Table 6: Manual
Table 7: plotstables/levenshtein.py
Table 8: Same as Table 1 with languages swapped from L1 to L2
Table 9: Colab
Table 10: Colab
Prompt Appendix: https://github.com/albanie/prompt_formatting_in_latex

Citation

@article{muennighoff2022crosslingual,
  title={Crosslingual generalization through multitask finetuning},
  author={Muennighoff, Niklas and Wang, Thomas and Sutawika, Lintang and Roberts, Adam and Biderman, Stella and Scao, Teven Le and Bari, M Saiful and Shen, Sheng and Yong, Zheng-Xin and Schoelkopf, Hailey and others},
  journal={arXiv preprint arXiv:2211.01786},
  year={2022}
}

Owner

Name: BigScience Workshop
Login: bigscience-workshop
Kind: organization
Email: bigscience-contact@googlegroups.com

Website: https://bigscience.huggingface.co
Twitter: BigScienceW
Repositories: 28
Profile: https://github.com/bigscience-workshop

Research workshop on large language models - The Summer of Language Models 21

GitHub Events

Total

Watch event: 20
Fork event: 3

Last Year

Watch event: 20
Fork event: 3

Committers

Last synced: about 1 year ago

All Time

Total Commits: 30
Total Committers: 3
Avg Commits per committer: 10.0
Development Distribution Score (DDS): 0.067

Past Year

Commits: 1
Committers: 1
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Niklas Muennighoff	n**f@g**m	28
Alham Fikri Aji	a**1@g**m	1
Niklas Muennighoff	m**f@N**e	1

Committer Domains (Top 20 + Academic)

niklass-air.home: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 32
Total pull requests: 3
Average time to close issues: 9 days
Average time to close pull requests: about 1 month
Total issue authors: 18
Total pull request authors: 2
Average comments per issue: 3.16
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

sh0tcall3r (3)
MattYoon (2)
mkw18 (2)
huybery (1)
LiuShixing (1)
junwang-wish (1)
hatvn (1)
Mahyar-Ali (1)
lbourdois (1)
qazwsx042 (1)
fpcsong (1)
raihan0824 (1)
noanti (1)
hmubarak (1)
dsj96 (1)

Science Score: 23.0%
