https://github.com/amazon-science/univeral-prompt-production-systems


Repository

Basic Info

Host: GitHub
Owner: amazon-science
License: apache-2.0
Language: Python
Default Branch: main
Size: 74.2 KB

Statistics

Stars: 5
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created almost 3 years ago · Last pushed over 2 years ago


On Conditional and Compositional Language Model Differentiable Prompting --- PRompt Production System (PRoPS) code

The source code uses the Hugging Face implementation of transformers, adapted for multitask training. The code in this repo replicates the results of the paper On Conditional and Compositional Language Model Differentiable Prompting, which was accepted at IJCAI 2023 (https://arxiv.org/abs/2307.01446).

Requirements

Python 3.7
Pytorch 1.9
Huggingface transformers 4.9.1

Note: Newer versions of the requirements should work, but were not tested.
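To see whether an existing environment matches the versions listed above, the version strings can be compared before launching a run. The sketch below is only illustrative: `parse_version` and `check_versions` are hypothetical helpers that are not part of this repository, and the comparison only looks at the major and minor components.

```python
import re

# Versions listed in the Requirements section.
PINNED = {"python": "3.7", "torch": "1.9", "transformers": "4.9.1"}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    # "1.9.0+cu111" -> (1, 9, 0); anything after the numeric prefix is ignored.
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_versions(installed, pinned=PINNED):
    """Map each package to 'tested', 'newer (untested)', 'older' or 'missing'."""
    report = {}
    for name, wanted in pinned.items():
        if name not in installed:
            report[name] = "missing"
            continue
        # Compare major.minor only, since the README pins at that granularity.
        have = parse_version(installed[name])[:2]
        want = parse_version(wanted)[:2]
        if have == want:
            report[name] = "tested"
        elif have > want:
            report[name] = "newer (untested)"
        else:
            report[name] = "older"
    return report
```

In practice the `installed` dictionary could be filled from `platform.python_version()`, `torch.__version__` and `transformers.__version__`.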

Using a virtual environment

```bash
# Create a virtual environment
python3.7 -m venv promptxp
source promptxp/bin/activate

# Install the requirements
pip install -r requirements.txt

# If you are using an environment that already has torch installed, use "requirements.txt"
```

Run files

There are two equivalent run scripts:

1. run.py, which runs the huggingface trainer
2. run_lightning.py, which uses the huggingface trainer to set up the data and model but trains with the lightning trainer

We found the lightning version to be faster.

Run scripts

All run scripts can be found in the ./scripts/launch_{cnndm, xsum, scan, multi_nmt} directories. For example:

chmod +x lightning_bartlarge_prefix_upps_xsum_summ.sh
./lightning_bartlarge_prefix_upps_xsum_summ.sh

To run the out-of-topic xsum_news dataset, you need to replace

--tasks_file_path ./task_files/xsum_trg_summ.yml

by

--tasks_file_path ./task_files/xsum_trg_summ_news.yml

in the launch scripts.
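The replacement can also be scripted instead of edited by hand. The following is a minimal sketch, assuming the launch script passes the flag as `--tasks_file_path <value>` separated by whitespace; `swap_tasks_file` is a hypothetical helper, not something shipped with this repository.

```python
import re


def swap_tasks_file(script_text, new_path):
    """Replace the value passed to --tasks_file_path in a launch script."""
    # Match the flag, the whitespace after it, and its non-whitespace value.
    pattern = re.compile(r"(--tasks_file_path\s+)(\S+)")
    # A function replacement keeps any backslashes in new_path literal.
    updated, count = pattern.subn(lambda m: m.group(1) + new_path, script_text)
    if count == 0:
        raise ValueError("--tasks_file_path not found in script")
    return updated


if __name__ == "__main__":
    # Illustrative script text; the real launch scripts contain more flags.
    sample = "python run_lightning.py --tasks_file_path ./task_files/xsum_trg_summ.yml"
    print(swap_tasks_file(sample, "./task_files/xsum_trg_summ_news.yml"))
```

Reading the launch script into a string, passing it through the helper and writing the result to a new file keeps the original script untouched.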

All scripts with the word inter in their name are used to pretrain the prompt generator on intermediate tasks. See ./task_files/cnndm_inter_all.yml and ./task_files/xsum_inter_all.yml for the list of tasks used. Make sure to change your --output_dir.

Once trained, this model is used for the 0, 5, 50, 100, 200 and 500 shot experiments using --max_train_samples. Note that we increase --max_source_length to 768. Make sure to add the path of the intermediate-task pretrained model checkpoint in --model_name_or_path. See lightning_bartlarge_xsum_summ_low_res.sh for an example. For an example of how to test the intermediate-task finetuned model, see /scripts/launch_cnndm/lightning_bartlarge_prefix_cnndm_summ_test.sh.

All run scripts assume that the data is in /home/Datasets and that the models/log files are saved in /home/Models. If that's not the case, you will need to manually change the paths in the yml files in the task_files directory and in the sh files in the scripts directory.
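The low-resource sweep can be sketched as a loop over the shot counts. This is a hedged illustration rather than the repository's launch logic: `few_shot_commands` is a hypothetical helper, the checkpoint and output paths are placeholders, and the handling of the 0-shot case (skipping training and only predicting) is an assumption. lightning_bartlarge_xsum_summ_low_res.sh remains the reference for the actual flags.

```python
SHOTS = (0, 5, 50, 100, 200, 500)


def few_shot_commands(checkpoint, tasks_file, output_root, script="run_lightning.py"):
    """Build one argument list per shot count listed in the README."""
    commands = []
    for shots in SHOTS:
        args = [
            "python", script,
            "--model_name_or_path", checkpoint,   # intermediate-task checkpoint
            "--tasks_file_path", tasks_file,
            "--max_source_length", "768",
            "--output_dir", f"{output_root}/{shots}_shot",
        ]
        if shots > 0:
            # Assumption: few-shot runs train on the first `shots` examples.
            args += ["--do_train", "--max_train_samples", str(shots)]
        # Assumption: every run, including 0-shot, ends with prediction.
        args += ["--do_predict"]
        commands.append(args)
    return commands


if __name__ == "__main__":
    for cmd in few_shot_commands("/home/Models/inter_checkpoint",
                                 "./task_files/xsum_trg_summ.yml",
                                 "/home/Models/low_res"):
        print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run` as is; writing every run to its own `--output_dir` avoids overwriting checkpoints between shot counts.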

Downloading Datasets

The datasets below require additional preprocessing (see the "Preprocessing Data" section). You can skip the next two sections and find the preprocessed data links in the "Preprocessed Data" section.

| Type | Link |
| ------ | ------ |
| XSUM and XSUM-news | lisacolab/xsumdata |
| Topic CNN-DM | Amazon Alexa S3 link |
| Extractive CNN-DM | Microsoft download |
| NewsQA | Google Drive |
| SCAN | Github link |

Translation and semantic parsing datasets can be accessed directly via the Hugging Face datasets hub.

Preprocessing Data

Scripts to preprocess data can be found in ./scripts/preprocessing. You will need to download domain information or article types (sports, business, politics, ...) from here: TODO: please make this link public

The preprocessing scripts mainly convert all data to CSV format and attach the metadata (article label, news outlet). However, for the new xsum_ner, xsum_paraphrase and xsum_extractive datasets that we used to test task composition, further data processing was done:

xsum_ner requires extracting the most common entities, which are used as the output, and the corresponding entity labels, which are used as the input.
xsum_paraphrase requires a pretrained translation model. We randomly extract passages from the input article and perform backtranslation (en>ar>fr>zh>en) to create abstractive passages.
xsum_extractive requires using BERTScore to extract the most relevant sentence.
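The three extra processing steps can be outlined as follows. This is a rough sketch under stated assumptions, not the code in ./scripts/preprocessing: the function names are hypothetical, the NER tagger, the translation model and the BERTScore scorer are passed in as plain callables, only the single most common entity is kept, and the reference text used for the relevance score is left to the caller.

```python
from collections import Counter


def ner_example(entities):
    """Build an (input label, output entity) pair from (entity, label) tuples.

    `entities` is assumed to come from an external NER tagger.
    """
    if not entities:
        return None
    # Keep the most frequent (entity, label) pair in the article.
    (entity, label), _ = Counter(entities).most_common(1)[0]
    return {"input_label": label, "output_entity": entity}


def backtranslate(passage, translate, chain=("en", "ar", "fr", "zh", "en")):
    """Pass `passage` through each consecutive language pair in `chain`.

    `translate(text, src, tgt)` stands in for a pretrained translation model.
    """
    for src, tgt in zip(chain, chain[1:]):
        passage = translate(passage, src, tgt)
    return passage


def most_relevant_sentence(sentences, reference, score):
    """Return the sentence with the highest score against `reference`.

    `score(sentence, reference)` stands in for a BERTScore computation.
    """
    return max(sentences, key=lambda sentence: score(sentence, reference))
```

Keeping the models behind callables makes each step easy to try with a cheap stand-in, such as a word-overlap score, before running the real models.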

Preprocessed Data

All preprocessed datasets can be found here: TODO: please make this link public.

Usage

```
usage: run.py [-h] --modelnameorpath MODELNAMEORPATH [--modelvariation MODELVARIATION]
              [--configname CONFIGNAME] [--tokenizername TOKENIZERNAME]
              [--lengthpenalty LENGTHPENALTY] [--numbeams NUMBEAMS] [--cachedir CACHEDIR]
              [--freezeencoderlayers FREEZEENCODERLAYERS]
              [--freezedecoderlayers FREEZEDECODERLAYERS]
              [--unfreezeupproj [UNFREEZEUPPROJ]] [--unfreezeqproj [UNFREEZEQPROJ]]
              [--unfreezevproj [UNFREEZEVPROJ]] [--unfreezeattn [UNFREEZEATTN]]
              [--nousefasttokenizer] [--usefasttokenizer [USEFASTTOKENIZER]]
              [--numprompts NUMPROMPTS] [--promptlayers PROMPTLAYERS]
              [--promptattentionheads PROMPTATTENTIONHEADS] [--promptdmodel PROMPTDMODEL]
              [--promptffndim PROMPTFFNDIM] [--promptdropout PROMPTDROPOUT]
              [--prompthiddencondition [PROMPTHIDDENCONDITION]]
              [--encoderselfprefixlayerids ENCODERSELFPREFIXLAYERIDS [ENCODERSELFPREFIXLAYERIDS ...]]
              [--decoderselfprefixlayerids DECODERSELFPREFIXLAYERIDS [DECODERSELFPREFIXLAYERIDS ...]]
              [--decodercrossprefixlayerids DECODERCROSSPREFIXLAYERIDS [DECODERCROSSPREFIXLAYERIDS ...]]
              [--tokkrules TOKKRULES] [--nencrecurrence NENCRECURRENCE]
              [--ndecrecurrence NDECRECURRENCE] [--datasetname DATASETNAME]
              [--datasetconfigname DATASETCONFIGNAME] [--tasksfilepath TASKSFILEPATH]
              [--textcolumn TEXTCOLUMN] [--summarycolumn SUMMARYCOLUMN]
              [--overwritecache [OVERWRITECACHE]]
              [--preprocessingnumworkers PREPROCESSINGNUMWORKERS]
              [--maxsourcelength MAXSOURCELENGTH] [--maxtargetlength MAXTARGETLENGTH]
              [--mintargetlength MINTARGETLENGTH] [--maxdescriptorlength MAXDESCRIPTORLENGTH]
              [--valmaxtargetlength VALMAXTARGETLENGTH] [--padtomaxlength [PADTOMAXLENGTH]]
              [--forcedbostoken FORCEDBOSTOKEN] [--maxtrainsamples MAXTRAINSAMPLES]
              [--maxevalsamples MAXEVALSAMPLES] [--maxpredictsamples MAXPREDICTSAMPLES]
              [--noignorepadtokenforloss] [--ignorepadtokenforloss [IGNOREPADTOKENFOR_LOSS]]
              [--sourceprefix SOURCEPREFIX] [--removedomains [REMOVEDOMAINS]]
              --outputdir OUTPUTDIR [--overwriteoutputdir [OVERWRITEOUTPUTDIR]]
              [--dotrain [DOTRAIN]] [--doeval [DOEVAL]] [--dopredict [DOPREDICT]]
              [--evaluationstrategy {no,steps,epoch}]
              [--predictionlossonly [PREDICTIONLOSS_ONLY]]
              [--perdevicetrainbatchsize PERDEVICETRAINBATCHSIZE]
              [--perdeviceevalbatchsize PERDEVICEEVALBATCHSIZE]
              [--pergputrainbatchsize PERGPUTRAINBATCHSIZE]
              [--pergpuevalbatchsize PERGPUEVALBATCHSIZE]
              [--gradientaccumulationsteps GRADIENTACCUMULATIONSTEPS]
              [--evalaccumulationsteps EVALACCUMULATIONSTEPS] [--learningrate LEARNINGRATE]
              [--weightdecay WEIGHTDECAY] [--adambeta1 ADAMBETA1] [--adambeta2 ADAMBETA2]
              [--adamepsilon ADAMEPSILON] [--maxgradnorm MAXGRADNORM]
              [--numtrainepochs NUMTRAINEPOCHS] [--maxsteps MAXSTEPS]
              [--lrschedulertype {linear,cosine,cosinewithrestarts,polynomial,constant,constantwithwarmup}]
              [--warmupratio WARMUPRATIO] [--warmupsteps WARMUPSTEPS]
              [--log_level {debug,info,warning,error,critical,passive}]
              [--loglevelreplica {debug,info,warning,error,critical,passive}]
              [--nologoneachnode] [--logoneachnode [LOGONEACHNODE]]
              [--loggingdir LOGGINGDIR] [--logging_strategy {no,steps,epoch}]
              [--loggingfirststep [LOGGINGFIRSTSTEP]] [--loggingsteps LOGGINGSTEPS]
              [--save_strategy {no,steps,epoch}] [--savesteps SAVESTEPS]
              [--savetotallimit SAVETOTALLIMIT] [--saveoneachnode [SAVEONEACHNODE]]
              [--nocuda [NOCUDA]] [--seed SEED] [--fp16 [FP16]]
              [--fp16optlevel FP16OPTLEVEL] [--fp16_backend {auto,amp,apex}]
              [--fp16fulleval [FP16FULLEVAL]] [--localrank LOCALRANK]
              [--tpunumcores TPUNUMCORES] [--tpumetricsdebug [TPUMETRICSDEBUG]]
              [--debug DEBUG] [--dataloaderdroplast [DATALOADERDROPLAST]]
              [--evalsteps EVALSTEPS] [--dataloadernumworkers DATALOADERNUMWORKERS]
              [--pastindex PASTINDEX] [--runname RUNNAME] [--disabletqdm DISABLETQDM]
              [--noremoveunused_columns] [--removeunusedcolumns [REMOVEUNUSEDCOLUMNS]]
              [--labelnames LABELNAMES [LABELNAMES ...]]
              [--loadbestmodelatend [LOADBESTMODELATEND]]
              [--metricforbestmodel METRICFORBESTMODEL] [--greaterisbetter GREATERISBETTER]
              [--ignoredataskip [IGNOREDATASKIP]] [--shardedddp SHARDEDDDP]
              [--deepspeed DEEPSPEED] [--labelsmoothingfactor LABELSMOOTHINGFACTOR]
              [--adafactor [ADAFACTOR]] [--groupbylength [GROUPBYLENGTH]]
              [--lengthcolumnname LENGTHCOLUMNNAME] [--reportto REPORTTO [REPORTTO ...]]
              [--ddpfindunusedparameters DDPFINDUNUSEDPARAMETERS]
              [--nodataloaderpin_memory] [--dataloaderpinmemory [DATALOADERPINMEMORY]]
              [--noskipmemorymetrics] [--skipmemorymetrics [SKIPMEMORY_METRICS]]
              [--uselegacypredictionloop [USELEGACYPREDICTIONLOOP]]
              [--pushtohub [PUSHTOHUB]] [--resumefromcheckpoint RESUMEFROMCHECKPOINT]
              [--pushtohubmodelid PUSHTOHUBMODELID]
              [--pushtohuborganization PUSHTOHUBORGANIZATION]
              [--pushtohubtoken PUSHTOHUBTOKEN] [--mpparameters MPPARAMETERS]
              [--sortishsampler [SORTISHSAMPLER]]
              [--predictwithgenerate [PREDICTWITHGENERATE]]
              [--uselightning [USELIGHTNING]] [--lightningcheckpoint LIGHTNINGCHECKPOINT]
              [--everynepochs EVERYNEPOCHS] [--disableexplogger [DISABLEEXP_LOGGER]]
              [--expworkspacename EXPWORKSPACENAME] [--expprojectname EXPPROJECTNAME]
              [--exploggerapikey EXPLOGGERAPIKEY] [--expname EXPNAME]

optional arguments:
  -h, --help
        show this help message and exit
  --modelnameorpath MODELNAMEORPATH
        Path to pretrained model or model identifier from: CA-MTL-base, CA-MTL-large,
        bert-base-cased bert-base-uncased, bert-large-cased, bert-large-uncased
  --datadir DATADIR
        The input data dir. Should contain the .tsv files (or other data files) for the task.
  --tasksfilepath yml file containing task(s) information
        The task file that contains the tasks to train on. If None all tasks will be used
  --overwritecache
        Overwrite the cached training and evaluation sets
  --maxseqlength MAXSEQLENGTH
        The maximum total input sequence length after tokenization. Sequences longer than
        this will be truncated, sequences shorter will be padded.
  --outputdir OUTPUTDIR
        The output directory where the model predictions and checkpoints will be written.
  --overwriteoutputdir
        Overwrite the content of the output directory. Use this to continue training if
        outputdir points to a checkpoint directory.
  --dotrain
        Whether to run training.
  --doeval
        Whether to run eval on the dev set.
  --dopredict
        Whether to run predictions on the test set.
  --evaluateduringtraining
        Run evaluation during training at each logging step.
  --perdevicetrainbatchsize PERDEVICETRAINBATCHSIZE
        Batch size per GPU/TPU core/CPU for training.
  --perdeviceevalbatchsize PERDEVICEEVALBATCHSIZE
        Batch size per GPU/TPU core/CPU for evaluation.
  --pergputrainbatchsize PERGPUTRAINBATCHSIZE
        Deprecated, the use of `--perdevicetrainbatchsize` is preferred. Batch size per
        GPU/TPU core/CPU for training.
  --pergpuevalbatchsize PERGPUEVALBATCHSIZE
        Deprecated, the use of `--perdeviceevalbatchsize` is preferred. Batch size per
        GPU/TPU core/CPU for evaluation.
  --gradientaccumulationsteps GRADIENTACCUMULATIONSTEPS
        Number of updates steps to accumulate before performing a backward/update pass.
  --learningrate LEARNINGRATE
        The initial learning rate for Adam.
  --weightdecay WEIGHTDECAY
        Weight decay if we apply some.
  --adamepsilon ADAMEPSILON
        Epsilon for Adam optimizer.
  --maxgradnorm MAXGRADNORM
        Max gradient norm.
  --numtrainepochs NUMTRAINEPOCHS
        Total number of training epochs to perform.
  --maxsteps MAXSTEPS
        If > 0: set total number of training steps to perform. Override numtrainepochs.
  --warmupsteps WARMUPSTEPS
        Linear warmup over warmupsteps.
  --loggingdir LOGGINGDIR
        Tensorboard log dir.
  --loggingfirststep
        Log and eval the first globalstep
  --loggingsteps LOGGINGSTEPS
        Log every X updates steps.
  --savesteps SAVESTEPS
        Save checkpoint every X updates steps.
  --savetotallimit SAVETOTALLIMIT
        Limit the total amount of checkpoints. Deletes the older checkpoints in the
        outputdir. Default is unlimited checkpoints
  --nocuda
        Do not use CUDA even when it is available
  --seed SEED
        random seed for initialization
  --fp16
        Whether to use 16-bit (mixed) precision (through NVIDIA apex) instead of 32-bit
  --fp16optlevel FP16OPTLEVEL
        For fp16: Apex AMP optimization level selected in ['O0', 'O1', 'O2', and 'O3'].
        See details at https://nvidia.github.io/apex/amp.html
  --localrank LOCALRANK
        For distributed training: localrank
  --tpunumcores TPUNUMCORES
        TPU: Number of TPU cores (automatically passed by launcher script)
```

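Since our code is based on the Hugging Face implementation, all parameters are described in their documentation. On the command line, flag names are written with underscores, as in the --tasks_file_path, --max_train_samples, --max_source_length, --model_name_or_path and --output_dir flags used in the run scripts.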
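The Hugging Face example scripts build their command line from dataclasses with `HfArgumentParser`, so each flag is named after a dataclass field. Assuming run.py follows the same pattern (the usage output suggests it, but this is not confirmed here), the mapping can be imitated with the standard library alone. `DataArgs` and `build_parser` below are hypothetical, and the fields, help strings and defaults are placeholders rather than the repository's actual definitions.

```python
import argparse
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class DataArgs:
    """Hypothetical subset of the data arguments, for illustration only."""
    tasks_file_path: Optional[str] = field(
        default=None, metadata={"help": "yml file containing task(s) information"})
    max_source_length: int = field(
        default=768, metadata={"help": "placeholder default, not the repo's"})
    max_train_samples: Optional[int] = field(
        default=None, metadata={"help": "number of training examples to keep"})


def build_parser(dataclass_type):
    """Create one --flag per dataclass field, named after the field."""
    parser = argparse.ArgumentParser()
    for f in fields(dataclass_type):
        # Optional[int] -> int, Optional[str] -> str, plain types unchanged.
        type_args = getattr(f.type, "__args__", None)
        arg_type = type_args[0] if type_args else f.type
        parser.add_argument(f"--{f.name}", type=arg_type, default=f.default,
                            help=f.metadata.get("help"))
    return parser


if __name__ == "__main__":
    namespace = build_parser(DataArgs).parse_args(
        ["--tasks_file_path", "./task_files/xsum_trg_summ.yml",
         "--max_train_samples", "50"])
    print(DataArgs(**vars(namespace)))
```

Because the flag is derived from the field name, the underscore spelling on the command line always matches the attribute used inside the code.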

How do I cite?

@inproceedings{ijcai2023p0460,
  title     = {On Conditional and Compositional Language Model Differentiable Prompting},
  author    = {Pilault, Jonathan and Liu, Can and Bansal, Mohit and Dreyer, Markus},
  booktitle = {Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, {IJCAI-23}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Edith Elkind},
  pages     = {4136--4144},
  year      = {2023},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2023/460},
  url       = {https://doi.org/10.24963/ijcai.2023/460},
}

Contact and Contribution

For any question or request, please create a GitHub issue in this repository.

Owner

Name: Amazon Science
Login: amazon-science
Kind: organization

Website: https://amazon.science
Twitter: AmazonScience
Repositories: 80
Profile: https://github.com/amazon-science

Issues and Pull Requests


All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0