https://github.com/amazon-science/transformers-data-augmentation

Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file: found
  • .zenodo.json file: found
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity: low (10.4%)

Keywords

bart bert bert-model data-augmentation gpt
Last synced: 5 months ago

Repository

Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Basic Info
  • Host: GitHub
  • Owner: amazon-science
  • License: other
  • Language: Python
  • Default Branch: main
  • Size: 820 KB
Statistics
  • Stars: 52
  • Watchers: 2
  • Forks: 7
  • Open Issues: 2
  • Releases: 0
Topics
bart bert bert-model data-augmentation gpt
Created about 5 years ago · Last pushed over 2 years ago
Metadata Files
Readme, Contributing, License, Code of conduct

README.md

Data Augmentation using Pre-trained Transformer Models

Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

The code contains implementations of the following data augmentation methods (a sketch of the shared prepend idea follows the list):
  • EDA (Baseline)
  • Backtranslation (Baseline)
  • CBERT (Baseline)
  • BERT Prepend (Our paper)
  • GPT-2 Prepend (Our paper)
  • BART Prepend (Our paper)
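The three prepend methods share one idea: prepend the class label to each training example, fine-tune a pre-trained language model on those label-prefixed lines, and then sample new examples conditioned on a label prefix. The snippet below is a minimal sketch of that conditioning step using the transformers library; the "SEP" separator string, the un-fine-tuned "gpt2" checkpoint, and the sampling settings are illustrative assumptions, not the repo's exact setup.

```python
# Minimal sketch of label-prepend generation (illustrative, not the repo's code).
# In the paper's setup the model is first fine-tuned on lines of the form
# "<label> SEP <text>"; here we only show generation conditioned on a label.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # assume fine-tuned on task data

prompt = "positive SEP"  # class label prepended as a generation prefix
input_ids = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(
    input_ids,
    max_length=40,
    do_sample=True,       # sample rather than greedy decode, for diversity
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The generated sentence (with the label prefix stripped) becomes a new synthetic training example for that class.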

DataSets

In the paper, we use three datasets from the following resources (an illustrative download snippet follows the list):
  • STSA-2: https://github.com/1024er/cbert_aug/tree/crayon/datasets/stsa.binary
  • TREC: https://github.com/1024er/cbert_aug/tree/crayon/datasets/TREC
  • SNIPS: https://github.com/MiuLab/SlotGated-SLU/tree/master/data/snips
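If you want to fetch a single file by hand rather than using the preparation script below, something like this works; the exact file name ("train") under the linked dataset directory is an assumption about the cbert_aug repo layout, so check the repo tree first.

```python
# Illustrative download of one raw data file; the file name "train" is an
# assumption about the dataset directory's contents.
import requests

url = ("https://raw.githubusercontent.com/1024er/cbert_aug/"
       "crayon/datasets/stsa.binary/train")
resp = requests.get(url, timeout=30)
resp.raise_for_status()
print(resp.text.splitlines()[:3])  # peek at the first few examples
```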

Low-data regime experiment setup

Run the src/utils/download_and_prepare_datasets.sh script to prepare all datasets.
download_and_prepare_datasets.sh performs the following steps:
  1. Download the data from GitHub.
  2. Replace numeric labels with text labels for the STSA-2 and TREC datasets.
  3. For a given dataset, create 15 random splits of the train and dev data (see the sketch below).
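Step 3 amounts to repeatedly shuffling and partitioning the labeled data. A rough Python equivalent is below; the dev fraction and seed handling are illustrative assumptions, and the actual script may differ.

```python
# Rough equivalent of step 3: build 15 random train/dev splits.
# dev_fraction and seed are illustrative assumptions.
import random

def make_splits(examples, n_splits=15, dev_fraction=0.1, seed=0):
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        shuffled = examples[:]      # copy so each split reshuffles fresh
        rng.shuffle(shuffled)
        n_dev = int(len(shuffled) * dev_fraction)
        splits.append((shuffled[n_dev:], shuffled[:n_dev]))  # (train, dev)
    return splits
```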

Dependencies

To run this code, you need the following dependencies:
  • PyTorch 1.5
  • fairseq 0.9
  • transformers 2.9
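A quick, purely illustrative way to confirm that the pinned versions are what your Python environment actually sees:

```python
# Sanity-check installed versions against the pins above.
import torch
import fairseq
import transformers

print(torch.__version__)         # expect 1.5.x
print(fairseq.__version__)       # expect 0.9.x
print(transformers.__version__)  # expect 2.9.x
```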

How to run

To run a data augmentation experiment for a given dataset, run the corresponding bash script in the scripts folder. For example, to run data augmentation on the SNIPS dataset:
  • run scripts/bart_snips_lower.sh for the BART experiment
  • run scripts/bert_snips_lower.sh for the rest of the data augmentation methods
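If you prefer to drive the experiments from Python rather than a shell, the scripts can be launched with subprocess; this is just a thin wrapper around the same bash invocation.

```python
# Launch one of the repo's experiment scripts from Python.
import subprocess

subprocess.run(["bash", "scripts/bart_snips_lower.sh"], check=True)
```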

How to cite

```bibtex
@inproceedings{kumar-etal-2020-data,
    title = "Data Augmentation using Pre-trained Transformer Models",
    author = "Kumar, Varun and Choudhary, Ashutosh and Cho, Eunah",
    booktitle = "Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems",
    month = dec,
    year = "2020",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.lifelongnlp-1.3",
    pages = "18--26",
}
```

Contact

Please reach out to kuvrun@amazon.com with any questions related to this code.

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

Owner

  • Name: Amazon Science
  • Login: amazon-science
  • Kind: organization

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 1
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • SouLeo (1)
Pull Request Authors
  • dependabot[bot] (1)
Top Labels
  • Issue Labels: none
  • Pull Request Labels: dependencies (1)