fmengine-torch

FMEngine [PyTorch version]

https://github.com/lorrinwww/fmengine-torch

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

FMEngine [PyTorch version]

Basic Info
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme Citation Roadmap

README.md

FMEngine

Training preparation

  • Prepare checkpoints. As the first step, you will need to split a large model checkpoint into smaller pieces for each layer. This can be done by running the following command:

```bash
python scripts/conversions/llama/from_hf.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --output_dir path_to_outdir/llama2-7b \
    --mp_world_size 1
```

You can download pre-configured checkpoints here: Google Drive.
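Conceptually, the per-layer split groups the checkpoint's state dict by transformer layer index before saving each group to its own file. A rough sketch of that grouping logic (the regex, helper names, and `"shared"` bucket are illustrative assumptions, not the actual behavior of `from_hf.py`):

```python
import re
from collections import defaultdict

def layer_key(param_name):
    # e.g. "model.layers.3.self_attn.q_proj.weight" -> "layer_3";
    # params outside any layer (embeddings, final norm) -> "shared"
    m = re.search(r"layers\.(\d+)\.", param_name)
    return f"layer_{m.group(1)}" if m else "shared"

def group_state_dict(state_dict):
    # bucket every parameter tensor under its layer's key, so each
    # bucket can be saved as a separate per-layer checkpoint file
    groups = defaultdict(dict)
    for name, tensor in state_dict.items():
        groups[layer_key(name)][name] = tensor
    return groups
```

Each resulting bucket would then be saved separately (e.g. with `torch.save`), which is what lets the training side load only the layers a given pipeline stage owns.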

  • Prepare datasets. Only the .jsonl format is currently supported: a file with one JSON object per line, each containing a text field. For example, a sample dataset can look like:

```json
{"text": "I love this movie!"}
{"text": "I hate this movie!"}
{"text": "I don't know."}
```
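Producing and sanity-checking a dataset in this format takes only a few lines of Python; `write_jsonl` and `validate_jsonl` below are illustrative helpers, not part of FMEngine:

```python
import json

# illustrative samples in the expected format: one JSON object
# per line, each with a "text" field
samples = [
    {"text": "I love this movie!"},
    {"text": "I hate this movie!"},
    {"text": "I don't know."},
]

def write_jsonl(path, records):
    # serialize each record onto its own line
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def validate_jsonl(path):
    # every non-empty line must parse as JSON and contain a "text" key
    with open(path, encoding="utf-8") as f:
        return all("text" in json.loads(line) for line in f if line.strip())
```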

Training

In /scripts, we show some examples of training scripts. For example, to finetune a Pythia model, you can run the following command:

```bash
deepspeed --num_gpus 4 --num_nodes 1 starter.py \
    --output_dir .cache/models \
    --init_ckpt /pretrained_weights/pythia-160m-deduped \
    --data_path /datasets/quantitative_natural_instructions/train/all.train.jsonl \
    --max_seq_len 1024 \
    --train_steps 1000 \
    --eval_steps 10 \
    --save_steps 100 \
    --log_steps 1 \
    --pipe_parallel_size 1 \
    --model_parallel_size 1 \
    --use_flash_attn true \
    --deepspeed_config ./configs/pythia.json
```

You are also advised to read ./configs/pythia.json for the DeepSpeed configuration, which covers the learning rate, batch size, etc.
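For reference, a minimal DeepSpeed configuration in this spirit might look like the following; all values here are illustrative, not the contents of the repository's actual ./configs/pythia.json:

```json
{
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 8,
  "gradient_accumulation_steps": 1,
  "optimizer": {
    "type": "Adam",
    "params": { "lr": 1e-5, "betas": [0.9, 0.95] }
  },
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 1 }
}
```

Note that `train_batch_size` must equal `train_micro_batch_size_per_gpu` × `gradient_accumulation_steps` × the number of GPUs (here 8 × 1 × 4).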

Supported Models

(We have only tried finetuning, not pretraining, but pretraining should work as well.)

| Model | #Params | #Layers | #Heads | #Dim | Pretrained Checkpoint | Flash Attention |
| --- | --- | --- | --- | --- | --- | --- |
| Pythia-160M | 85M | 12 | 12 | 768 | Download | Yes |
| Pythia-1.4B | 1.2B | 24 | 16 | 2048 | Download | Yes |
| Pythia-2.8B | 2.5B | 32 | 32 | 2560 | Download | Yes |
| OpenLlama-3B | tba | tba | tba | tba | Download | Yes |

Multi-host training

We support multi-host training with DeepSpeed. To run multi-host training, you first need to install pdsh, which can be built from source as follows:

```bash
git clone https://github.com/chaos/pdsh.git
cd pdsh
./configure --enable-static-modules --without-rsh --with-ssh \
    --without-ssh-connect-timeout-option --prefix=/your/preferred/path
make
make install
```

If you have root access, installing pdsh through your system's package manager might be easier.
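With pdsh installed, DeepSpeed's launcher can target multiple nodes through a hostfile that lists each node and its GPU count. A minimal sketch (hostnames and slot counts are placeholders):

```
# hostfile: one node per line, slots = number of GPUs on that node
worker-1 slots=4
worker-2 slots=4
```

The launcher is then invoked with `deepspeed --hostfile=hostfile starter.py ...` in place of the single-node command above, keeping the same training arguments.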

Owner

  • Name: Jue WANG
  • Login: LorrinWWW
  • Kind: user
  • Location: Hangzhou
  • Company: Zhejiang University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Yao"
  given-names: "Xiaozhe"
  orcid: "https://orcid.org/0000-0002-4661-533X"
title: "FMEngine: Library for Training/Serving Foundation Models"
version: 0.0.1
doi: 10.5281/zenodo.8314779
date-released: 2023-09-04
url: "https://github.com/eth-easl/fmengine"


Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 55
  • Total Committers: 2
  • Avg Commits per committer: 27.5
  • Development Distribution Score (DDS): 0.018
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Xiaozhe Yao a****o@g****m 54
Jue Wang j****e@c****i 1

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 minutes
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • LorrinWWW (1)

Dependencies

build/Dockerfile docker
  • nvcr.io/nvidia/pytorch 23.08-py3 build
requirements.txt pypi
  • Cython *
  • accelerate *
  • datasets *
  • deepspeed *
  • diffusers *
  • evaluate *
  • loguru *
  • numpy *
  • pandas *
  • peft *
  • scikit-build *
  • scikit-learn *
  • sentencepiece *
  • tabulate *
  • tokenizers *
  • transformers *
  • wandb *