fmengine-torch

FMEngine [PyTorch version]

https://github.com/lorrinwww/fmengine-torch

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

FMEngine [PyTorch version]

Basic Info
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme Citation Roadmap

README.md

FMEngine

Training preparation

  • Prepare checkpoints. As the first step, you will need to split a large model checkpoint into smaller pieces for each layer. This can be done by running the following command:

```bash
python scripts/conversions/llama/from_hf.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --output_dir path_to_outdir/llama2-7b \
    --mp_world_size 1
```

You can download pre-configured checkpoints here: Google Drive.
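Conceptually, the per-layer split groups the checkpoint's state dict by transformer layer index before saving each group to its own file. A rough sketch of that grouping logic (the regex, helper names, and `"shared"` bucket are illustrative assumptions, not the actual behavior of `from_hf.py`):

```python
import re
from collections import defaultdict

def layer_key(param_name):
    # e.g. "model.layers.3.self_attn.q_proj.weight" -> "layer_3";
    # params outside any layer (embeddings, final norm) -> "shared"
    m = re.search(r"layers\.(\d+)\.", param_name)
    return f"layer_{m.group(1)}" if m else "shared"

def group_state_dict(state_dict):
    # bucket every parameter tensor under its layer's key, so each
    # bucket can be saved as a separate per-layer checkpoint file
    groups = defaultdict(dict)
    for name, tensor in state_dict.items():
        groups[layer_key(name)][name] = tensor
    return groups
```

Each resulting bucket would then be saved separately (e.g. with `torch.save`), which is what lets the training side load only the layers a given pipeline stage owns.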

  • Prepare datasets. Only the .jsonl format is currently supported: a file with one JSON object per line, each containing a text field. For example, a sample dataset can look like:

```json
{"text": "I love this movie!"}
{"text": "I hate this movie!"}
{"text": "I don't know."}
```
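Producing and sanity-checking a dataset in this format takes only a few lines of Python; `write_jsonl` and `validate_jsonl` below are illustrative helpers, not part of FMEngine:

```python
import json

# illustrative samples in the expected format: one JSON object
# per line, each with a "text" field
samples = [
    {"text": "I love this movie!"},
    {"text": "I hate this movie!"},
    {"text": "I don't know."},
]

def write_jsonl(path, records):
    # serialize each record onto its own line
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def validate_jsonl(path):
    # every non-empty line must parse as JSON and contain a "text" key
    with open(path, encoding="utf-8") as f:
        return all("text" in json.loads(line) for line in f if line.strip())
```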

Training

In /scripts, we show some examples of training scripts. For example, to finetune a Pythia model, you can run the following command:

```bash
deepspeed --num_gpus 4 --num_nodes 1 starter.py \
    --output_dir .cache/models \
    --init_ckpt /pretrained_weights/pythia-160m-deduped \
    --data_path /datasets/quantitative_natural_instructions/train/all.train.jsonl \
    --max_seq_len 1024 \
    --train_steps 1000 \
    --eval_steps 10 \
    --save_steps 100 \
    --log_steps 1 \
    --pipe_parallel_size 1 \
    --model_parallel_size 1 \
    --use_flash_attn true \
    --deepspeed_config ./configs/pythia.json
```

You are also advised to read ./configs/pythia.json for the DeepSpeed configuration, which covers the learning rate, batch size, etc.
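For reference, a minimal DeepSpeed configuration in this spirit might look like the following; all values here are illustrative, not the contents of the repository's actual ./configs/pythia.json:

```json
{
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 8,
  "gradient_accumulation_steps": 1,
  "optimizer": {
    "type": "Adam",
    "params": { "lr": 1e-5, "betas": [0.9, 0.95] }
  },
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 1 }
}
```

Note that `train_batch_size` must equal `train_micro_batch_size_per_gpu` × `gradient_accumulation_steps` × the number of GPUs (here 8 × 1 × 4).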

Supported Models

(We have only tried finetuning, not pretraining, but pretraining should work as well.)

| Model | #Params | #Layers | #Heads | #Dim | Pretrained Checkpoint | Flash Attention |
| --- | --- | --- | --- | --- | --- | --- |
| Pythia-160M | 85M | 12 | 12 | 768 | Download | Yes |
| Pythia-1.4B | 1.2B | 24 | 16 | 2048 | Download | Yes |
| Pythia-2.8B | 2.5B | 32 | 32 | 2560 | Download | Yes |
| OpenLlama-3B | tba | tba | tba | tba | Download | Yes |

Multi-host training

We support multi-host training with DeepSpeed. To run multi-host training, you first need to install pdsh, which can be built from source as follows:

```bash
git clone https://github.com/chaos/pdsh.git
cd pdsh
./configure --enable-static-modules --without-rsh --with-ssh \
    --without-ssh-connect-timeout-option --prefix=/your/preferred/path
make
make install
```

If you have root access, installing pdsh through your system's package manager might be easier.
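With pdsh installed, DeepSpeed's launcher can target multiple nodes through a hostfile that lists each node and its GPU count. A minimal sketch (hostnames and slot counts are placeholders):

```
# hostfile: one node per line, slots = number of GPUs on that node
worker-1 slots=4
worker-2 slots=4
```

The launcher is then invoked with `deepspeed --hostfile=hostfile starter.py ...` in place of the single-node command above, keeping the same training arguments.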

Owner

  • Name: Jue WANG
  • Login: LorrinWWW
  • Kind: user
  • Location: Hangzhou
  • Company: Zhejiang University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Yao"
  given-names: "Xiaozhe"
  orcid: "https://orcid.org/0000-0002-4661-533X"
title: "FMEngine: Library for Training/Serving Foundation Models"
version: 0.0.1
doi: 10.5281/zenodo.8314779
date-released: 2023-09-04
url: "https://github.com/eth-easl/fmengine"


Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 55
  • Total Committers: 2
  • Avg Commits per committer: 27.5
  • Development Distribution Score (DDS): 0.018
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Xiaozhe Yao a****o@g****m 54
Jue Wang j****e@c****i 1

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 minutes
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • LorrinWWW (1)

Dependencies

build/Dockerfile docker
  • nvcr.io/nvidia/pytorch 23.08-py3 build
requirements.txt pypi
  • Cython *
  • accelerate *
  • datasets *
  • deepspeed *
  • diffusers *
  • evaluate *
  • loguru *
  • numpy *
  • pandas *
  • peft *
  • scikit-build *
  • scikit-learn *
  • sentencepiece *
  • tabulate *
  • tokenizers *
  • transformers *
  • wandb *