https://github.com/beomi/transformers-language-modeling

Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3

https://github.com/beomi/transformers-language-modeling

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • â—‹
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • â—‹
    DOI references
  • â—‹
    Academic publication links
  • â—‹
    Committers with academic emails
  • â—‹
    Institutional organization owner
  • â—‹
    JOSS paper metadata
  • â—‹
    Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary

Keywords

bert deepspeed language-model transformers
Last synced: 5 months ago · JSON representation

Repository

Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3

Basic Info
  • Host: GitHub
  • Owner: Beomi
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 609 KB
Statistics
  • Stars: 23
  • Watchers: 2
  • Forks: 4
  • Open Issues: 0
  • Releases: 0
Topics
bert deepspeed language-model transformers
Created almost 5 years ago · Last pushed almost 5 years ago
Metadata Files
Readme

README.md

Fork from https://github.com/huggingface/transformers/tree/86d5fb0b360e68de46d40265e7c707fe68c8015b/examples/pytorch/language-modeling at 2021.05.17.

Language model training

Fine-tuning (or training from scratch) the library models for language modeling on a text dataset for GPT, GPT-2, ALBERT, BERT, DistilBERT, RoBERTa, XLNet... GPT and GPT-2 are trained or fine-tuned using a causal language modeling (CLM) loss while ALBERT, BERT, DistilBERT and RoBERTa are trained or fine-tuned using a masked language modeling (MLM) loss. XLNet uses permutation language modeling (PLM), you can find more information about the differences between those objectives in our model summary.

There are two sets of scripts provided. The first set leverages the Trainer API. The second set with no_trainer in the suffix uses a custom training loop and leverages the 🤗 Accelerate library . Both sets use the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.

Note: The old script run_language_modeling.py is still available here.

The following examples, will run on datasets hosted on our hub or with your own text files for training and validation. We give examples of both below.

GPT-2/GPT and causal language modeling

The following example fine-tunes GPT-2 on WikiText-2. We're using the raw WikiText-2 (no tokens were replaced before the tokenization). The loss here is that of causal language modeling.

bash python run_clm.py \ --model_name_or_path gpt2 \ --dataset_name wikitext \ --dataset_config_name wikitext-2-raw-v1 \ --do_train \ --do_eval \ --output_dir /tmp/test-clm

This takes about half an hour to train on a single K80 GPU and about one minute for the evaluation to run. It reaches a score of ~20 perplexity once fine-tuned on the dataset.

To run on your own training and validation files, use the following command:

bash python run_clm.py \ --model_name_or_path gpt2 \ --train_file path_to_train_file \ --validation_file path_to_validation_file \ --do_train \ --do_eval \ --output_dir /tmp/test-clm

This uses the built in HuggingFace Trainer for training. If you want to use a custom training loop, you can utilize or adapt the run_clm_no_trainer.py script. Take a look at the script for a list of supported arguments. An example is shown below:

bash python run_clm_no_trainer.py \ --dataset_name wikitext \ --dataset_config_name wikitext-2-raw-v1 \ --model_name_or_path gpt2 \ --output_dir /tmp/test-clm

RoBERTa/BERT/DistilBERT and masked language modeling

The following example fine-tunes RoBERTa on WikiText-2. Here too, we're using the raw WikiText-2. The loss is different as BERT/RoBERTa have a bidirectional mechanism; we're therefore using the same loss that was used during their pre-training: masked language modeling.

In accordance to the RoBERTa paper, we use dynamic masking rather than static masking. The model may, therefore, converge slightly slower (over-fitting takes more epochs).

bash python run_mlm.py \ --model_name_or_path roberta-base \ --dataset_name wikitext \ --dataset_config_name wikitext-2-raw-v1 \ --do_train \ --do_eval \ --output_dir /tmp/test-mlm

To run on your own training and validation files, use the following command:

bash python run_mlm.py \ --model_name_or_path roberta-base \ --train_file path_to_train_file \ --validation_file path_to_validation_file \ --do_train \ --do_eval \ --output_dir /tmp/test-mlm

If your dataset is organized with one sample per line, you can use the --line_by_line flag (otherwise the script concatenates all texts and then splits them in blocks of the same length).

This uses the built in HuggingFace Trainer for training. If you want to use a custom training loop, you can utilize or adapt the run_mlm_no_trainer.py script. Take a look at the script for a list of supported arguments. An example is shown below:

bash python run_mlm_no_trainer.py \ --dataset_name wikitext \ --dataset_config_name wikitext-2-raw-v1 \ --model_name_or_path roberta-base \ --output_dir /tmp/test-mlm

Note: On TPU, you should use the flag --pad_to_max_length in conjunction with the --line_by_line flag to make sure all your batches have the same length.

Whole word masking

This part was moved to examples/research_projects/mlm_wwm.

XLNet and permutation language modeling

XLNet uses a different training objective, which is permutation language modeling. It is an autoregressive method to learn bidirectional contexts by maximizing the expected likelihood over all permutations of the input sequence factorization order.

We use the --plm_probability flag to define the ratio of length of a span of masked tokens to surrounding context length for permutation language modeling.

The --max_span_length flag may also be used to limit the length of a span of masked tokens used for permutation language modeling.

Here is how to fine-tune XLNet on wikitext-2:

bash python run_plm.py \ --model_name_or_path=xlnet-base-cased \ --dataset_name wikitext \ --dataset_config_name wikitext-2-raw-v1 \ --do_train \ --do_eval \ --output_dir /tmp/test-plm

To fine-tune it on your own training and validation file, run:

bash python run_plm.py \ --model_name_or_path=xlnet-base-cased \ --train_file path_to_train_file \ --validation_file path_to_validation_file \ --do_train \ --do_eval \ --output_dir /tmp/test-plm

If your dataset is organized with one sample per line, you can use the --line_by_line flag (otherwise the script concatenates all texts and then splits them in blocks of the same length).

Note: On TPU, you should use the flag --pad_to_max_length in conjunction with the --line_by_line flag to make sure all your batches have the same length.

Owner

  • Name: Junbum Lee
  • Login: Beomi
  • Kind: user
  • Location: Seoul, South Korea

AI/ML GDE @ml-gde. Korean AI/NLP Researcher and creator of multiple Korean PLMs. Focused on advancing Open LLMs.

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 3
  • Total Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Junbum Lee b****i@s****r 3
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • datasets >=1.1.3
  • protobuf *
  • sentencepiece *