LightSeq: A High Performance Library for Sequence Processing and Generation

https://github.com/bytedance/lightseq

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 18 committers (5.6%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary

Keywords

accelerate bart beam-search bert cuda diverse-decoding gpt inference multilingual-nmt sampling training transformer
Last synced: 5 months ago

Repository

LightSeq: A High Performance Library for Sequence Processing and Generation

Basic Info
  • Host: GitHub
  • Owner: bytedance
  • License: other
  • Language: C++
  • Default Branch: master
  • Homepage:
  • Size: 11.9 MB
Statistics
  • Stars: 3,290
  • Watchers: 56
  • Forks: 332
  • Open Issues: 180
  • Releases: 13
Archived
Topics
accelerate bart beam-search bert cuda diverse-decoding gpt inference multilingual-nmt sampling training transformer
Created about 6 years ago · Last pushed almost 3 years ago
Metadata Files
Readme Contributing License Codeowners

README.md

LightSeq: A High Performance Library for Sequence Processing and Generation

Release Notes

[2022.10.25] Released version v3.0.0, which supports int8 mixed-precision training and inference. [Chinese introduction]

[2021.06.18] Released version v2.0.0, which supports fp16 mixed-precision training. [Chinese introduction]

[2019.12.06] Released version v1.0.0, which supports fp16 mixed-precision inference. [Chinese introduction]

Introduction

LightSeq is a high-performance training and inference library for sequence processing and generation, implemented in CUDA. It enables highly efficient computation of modern NLP and CV models such as BERT, GPT, and Transformer. It is therefore well suited for machine translation, text generation, image classification, and other sequence-related tasks.

The library is built on top of the official CUDA libraries (cuBLAS, Thrust, CUB) and custom kernel functions that are specially fused and optimized for the Transformer model family. In addition to model components, the inference library also provides an easy-to-deploy model management and serving backend based on TensorRT Inference Server. With LightSeq, one can easily develop modified Transformer architectures with little additional code.

LightSeq training and inference are very fast. Below is the overall performance:

* LightSeq fp16 training achieves a speedup of up to 3x, compared to PyTorch fp16 training.
* LightSeq int8 training achieves a speedup of up to 5x, compared to PyTorch QAT (i.e., quantization-aware training).
* LightSeq fp16 and int8 inference achieve speedups of up to 12x and 15x respectively, compared to PyTorch fp16 inference.

Support Matrix

LightSeq supports multiple features, which are shown in the table below.

| Features | Support List |
| ------------------ | ------------------------------------------------------------------------ |
| Model | Transformer, BERT, BART, GPT2, ViT, T5, MT5, XGLM, VAE, Multilingual, MoE |
| Layer | embedding, encoder, decoder, criterion, optimizer |
| Precision | fp32, fp16, int8 |
| Mode | training, inference |
| Compatibility | Fairseq, Hugging Face, DeepSpeed |
| Decoding Algorithm | beam search, diverse beam search, sampling, CRF |
| Others | gradient communication quantization, auto-tune GEMM algorithm |

The table below shows the running modes and precision currently supported by different models.

| Models | fp16 Training | fp16 Inference | int8 Training | int8 Inference |
| ------------ | ------------- | -------------- | ------------- | -------------- |
| Transformer | Yes | Yes | Yes | Yes |
| BERT | Yes | Yes | Yes | Yes |
| GPT2 | Yes | Yes | Yes | Yes |
| BART | Yes | Yes | - | - |
| T5 | - | Yes | - | - |
| MT5 | - | Yes | - | - |
| XGLM | - | Yes | - | - |
| ViT | Yes | Yes | Yes | Yes |
| VAE | - | Yes | - | - |
| Multilingual | - | Yes | - | Yes |
| MoE | - | Yes | - | - |

Performance

We test the speedup of LightSeq training and inference using both fp16 and int8 mixed precision on Transformer and BERT models. The baseline is PyTorch fp16 mixed precision. Training experiments are run on one A100 GPU, and inference experiments are run on eight A100 GPUs.

More performance results are available here.

Speedup of Transformer Training

| Batch Token Size | PyTorch QAT | LightSeq fp16 | LightSeq int8 |
| ---------------- | ----------- | ------------- | ------------- |
| 512 | 0.36 | 1.99 | 1.86 |
| 1024 | 0.37 | 1.78 | 1.69 |
| 2048 | 0.37 | 1.56 | 1.50 |
| 4096 | 0.39 | 1.47 | 1.44 |
| 8192 | 0.41 | 1.44 | 1.44 |
| 15000 | 0.43 | 1.44 | 1.44 |

Speedup of BERT Training

| Batch Token Size | PyTorch QAT | LightSeq fp16 | LightSeq int8 |
| ---------------- | ----------- | ------------- | ------------- |
| 8 | 0.45 | 2.12 | 1.99 |
| 16 | 0.44 | 1.92 | 1.80 |
| 32 | 0.42 | 1.59 | 1.52 |
| 64 | 0.46 | 1.62 | 1.58 |
| 128 | 0.46 | 1.74 | 1.70 |
| 256 | 0.46 | 1.68 | 1.73 |

Speedup of Transformer Inference

| Batch Size | Sequence Length | LightSeq fp16 | LightSeq int8 |
| ---------- | --------------- | ------------- | ------------- |
| 1 | 8 | 8.00 | 9.33 |
| 1 | 32 | 6.48 | 7.38 |
| 1 | 128 | 6.24 | 6.19 |
| 8 | 8 | 9.38 | 10.71 |
| 8 | 32 | 8.24 | 8.75 |
| 8 | 128 | 6.83 | 7.28 |
| 32 | 8 | 11.82 | 14.44 |
| 32 | 32 | 9.68 | 11.15 |
| 32 | 128 | 6.68 | 7.74 |

Speedup of BERT Inference

| Batch Size | Sequence Length | LightSeq fp16 | LightSeq int8 |
| ---------- | --------------- | ------------- | ------------- |
| 1 | 8 | 9.22 | 9.87 |
| 1 | 32 | 10.51 | 11.30 |
| 1 | 128 | 9.96 | 10.85 |
| 8 | 8 | 9.88 | 10.33 |
| 8 | 32 | 7.79 | 8.22 |
| 8 | 128 | 4.04 | 4.35 |
| 32 | 8 | 10.60 | 11.02 |
| 32 | 32 | 8.11 | 8.85 |
| 32 | 128 | 1.82 | 2.04 |

Installation

Install from PyPI

You can install LightSeq from PyPI, which only supports Python 3.6 to 3.8 on Linux:

```shell
pip install lightseq
```

Build from Source

You can also build from source:

```shell
PATH=/usr/local/hdf5/:$PATH ENABLE_FP32=0 ENABLE_DEBUG=0 pip install -e $PROJECT_DIR
```

Detailed building introduction is available here.
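Whichever installation path you choose, a quick sanity check is to import the two entry points used throughout this README. This is only a minimal sketch; it just verifies that the package and its extensions load.

```python
# Minimal post-install sanity check: the training and inference entry points
# used in the examples below should import without errors.
import lightseq.training as lst
import lightseq.inference as lsi

print("lightseq modules loaded:", lst.__name__, lsi.__name__)
```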

Getting Started

We provide several samples here to show the usage of LightSeq. Refer to the complete user guide and examples for more details.

LightSeq Training from Scratch

You can use the modules provided by LightSeq to build your own models. The following is an example of building a Transformer encoder layer.

First, import the LightSeq Transformer encoder module:

```python
from lightseq.training import LSTransformerEncoderLayer
```

Then create an encoder configuration, and create a LightSeq Transformer encoder layer initialized with the configuration:

```python
config = LSTransformerEncoderLayer.get_config(
    max_batch_tokens=4096,
    max_seq_len=512,
    hidden_size=1024,
    intermediate_size=4096,
    nhead=16,
    attn_prob_dropout_ratio=0.1,
    activation_dropout_ratio=0.1,
    hidden_dropout_ratio=0.1,
    pre_layer_norm=True,
    activation_fn="relu",
    fp16=True,
    local_rank=0,
)
layer = LSTransformerEncoderLayer(config)
```

In addition to encoder layers, the other modules can be created using similar methods, and then be trained as normal PyTorch models.
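As a rough illustration of that point, here is a minimal sketch that runs a forward and backward pass through the layer created above like any other PyTorch module. The call signature (hidden states plus a padding mask), the mask convention, and the dummy shapes are assumptions for illustration only; see the official examples for the exact usage.

```python
import torch

# Sketch only: assumes the layer's forward takes (hidden_states, padding_mask)
# and that it already lives on the GPU in fp16 (fp16=True, local_rank=0 above).
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-4)

batch_size, seq_len, hidden_size = 8, 128, 1024
hidden_states = torch.randn(
    batch_size, seq_len, hidden_size, dtype=torch.half, device="cuda"
)
# Assumed convention: 1 marks padded positions, 0 marks real tokens.
padding_mask = torch.zeros(batch_size, seq_len, dtype=torch.half, device="cuda")

out = layer(hidden_states, padding_mask)  # assumed to match the input shape
loss = out.float().pow(2).mean()          # dummy loss, just to exercise backward
loss.backward()
optimizer.step()
```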

More usage is available here.

LightSeq Training from Fairseq

LightSeq integrates all the fast and lightning modules into Fairseq.

First install the following two requirements:

```shell
pip install fairseq==0.10.2 sacremoses
```

You can train an fp16 mixed-precision translation task on the WMT14 en2de dataset by running:

```shell
sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh
```

(Optional) Then you can start int8 mixed-precision training on the basis of the fp16 pre-trained models by running:

```shell
sh examples/training/fairseq/ls_fairseq_quant_wmt14en2de.sh
```

More usage is available here.

LightSeq Training from Hugging Face BERT

LightSeq replaces the encoder layers of Hugging Face BERT with LightSeq fast layers.
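Conceptually, the replacement looks something like the sketch below, where each Hugging Face BertLayer is swapped for a thin wrapper around LSTransformerEncoderLayer. This is only an illustration: the wrapper class, its mask conversion, and the layer's call signature are assumptions, and the provided examples rely on LightSeq's own integration code, which also copies the pretrained weights into the fast layers (omitted here).

```python
import torch
from transformers import BertForTokenClassification
from lightseq.training import LSTransformerEncoderLayer

class LSBertLayer(torch.nn.Module):
    """Hypothetical adapter giving LSTransformerEncoderLayer the
    (hidden_states, attention_mask, ...) interface HF's encoder calls."""

    def __init__(self, ls_layer):
        super().__init__()
        self.ls_layer = ls_layer

    def forward(self, hidden_states, attention_mask=None, *args, **kwargs):
        # Assumption: the LightSeq layer expects a (batch, seq_len) padding mask
        # with 1 at padded positions; HF passes an extended additive mask here.
        padding_mask = None
        if attention_mask is not None:
            padding_mask = (attention_mask.squeeze(1).squeeze(1) < 0).to(hidden_states.dtype)
        return (self.ls_layer(hidden_states, padding_mask),)

model = BertForTokenClassification.from_pretrained("bert-base-uncased")
config = LSTransformerEncoderLayer.get_config(
    max_batch_tokens=4096, max_seq_len=512, hidden_size=768,
    intermediate_size=3072, nhead=12, attn_prob_dropout_ratio=0.1,
    activation_dropout_ratio=0.1, hidden_dropout_ratio=0.1,
    pre_layer_norm=False, activation_fn="gelu", fp16=True, local_rank=0,
)
for i in range(len(model.bert.encoder.layer)):
    # A real integration would initialize each fast layer from the matching
    # pretrained weights instead of from scratch, and handle dtype/device.
    model.bert.encoder.layer[i] = LSBertLayer(LSTransformerEncoderLayer(config))
```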

First you should install these requirements:

```shell
pip install transformers seqeval datasets
```

Before starting the next training step, you need to switch to the following directory:

```shell
cd examples/training/huggingface/bert
```

Then you can easily fine-tune BERT for different tasks. Taking the named entity recognition task as an example, you can train BERT with fp16 mixed precision using:

```shell
sh task_ner/run_ner.sh
```

(Optional) You can also start int8 mixed-precision training on the basis of the fp16 pre-trained models by running:

```shell
sh task_ner/run_quant_ner.sh
```

More usage is available here.

LightSeq Inference from Fairseq

After training using the above scripts, you can quickly infer the models using LightSeq.

You should transform the fp16 PyTorch weights to LightSeq protobuf or HDF5:

```shell
python export/fairseq/ls_fs_transformer_export.py
```

(Optional) You can also transform the int8 PyTorch weights to LightSeq protobuf or HDF5:

```shell
python export/fairseq/ls_fs_quant_transformer_export.py
```

Once you have obtained the LightSeq weights, you can quickly run inference on them using the following code:

```python
import lightseq.inference as lsi

model = lsi.Transformer(MODEL_PATH, MAX_BATCH_SIZE)
results = model.infer([[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1]])
```

Here MODEL_PATH is the path to your LightSeq weights and MAX_BATCH_SIZE is the maximal batch size of your input sentences.

You can also quickly infer the int8 LightSeq weights by replacing the lsi.Transformer with lsi.QuantTransformer.
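For example, with the same MODEL_PATH and MAX_BATCH_SIZE as above:

```python
import lightseq.inference as lsi

# Same call pattern as the fp16 example, but using the int8 engine.
model = lsi.QuantTransformer(MODEL_PATH, MAX_BATCH_SIZE)
results = model.infer([[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1]])
```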

More usage is available here.

LightSeq Inference from Hugging Face BERT

We provide an end-to-end bert-base example to show how fast LightSeq is compared to the original Hugging Face implementation.

First you should install the requirements and switch to the specified directory:

```shell
pip install transformers
cd examples/inference/python
```

Then you can check the performance by simply running the following commands. hf_bert_export.py is used to transform PyTorch weights to LightSeq protobuf or HDF5.

```shell
python export/huggingface/hf_bert_export.py
python test/ls_bert.py
```

More usage is available here.

LightSeq Deployment Using Inference Server

We provide a docker image which contains tritonserver and LightSeq's dynamic link library, and you can deploy an inference server by simply replacing the model file with your own model file.

```shell
sudo docker pull hexisyztem/tritonserver_lightseq:22.01-1
```

More usage is available here.

Cite Us

If you use LightSeq in your research, please cite the following papers.

```
@InProceedings{wang2021lightseq,
    title     = "{L}ight{S}eq: A High Performance Inference Library for Transformers",
    author    = "Wang, Xiaohui and Xiong, Ying and Wei, Yang and Wang, Mingxuan and Li, Lei",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers (NAACL-HLT)",
    month     = jun,
    year      = "2021",
    publisher = "Association for Computational Linguistics",
    pages     = "113--120",
}

@article{wang2021lightseq2,
    title   = {LightSeq2: Accelerated Training for Transformer-based Models on GPUs},
    author  = {Wang, Xiaohui and Xiong, Ying and Qian, Xian and Wei, Yang and Li, Lei and Wang, Mingxuan},
    journal = {arXiv preprint arXiv:2110.05722},
    year    = {2021}
}
```

We are Hiring!

The LightSeq team is hiring interns and FTEs with backgrounds in deep learning systems, natural language processing, computer vision, speech, etc. We are based in Beijing and Shanghai. If you are interested, please send your resume to wangxiaohui.neo@bytedance.com.

Owner

  • Name: Bytedance Inc.
  • Login: bytedance
  • Kind: organization
  • Location: Singapore

GitHub Events

Total
  • Issues event: 2
  • Watch event: 105
  • Issue comment event: 3
  • Fork event: 8
Last Year
  • Issues event: 2
  • Watch event: 105
  • Issue comment event: 3
  • Fork event: 8

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 244
  • Total Committers: 18
  • Avg Commits per committer: 13.556
  • Development Distribution Score (DDS): 0.701
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Ying Xiong x****a@b****m 73
Yang Wei g****g@g****m 69
Xiaohui Wang w****o@b****m 40
zhoubofan z****n@b****m 34
Xingyao Wang w****o@b****m 5
aachong 3****g 5
Lei Li l****c@g****m 3
Jersey 3****y 2
Xiong Ying x****2@g****m 2
Ying Zhang 4****8 2
AnYang 4****3@q****m 2
naivebird 2****d 1
lszxb 7****b@g****m 1
Xingyao Wang x****w@u****u 1
Kangmo Kim k****m@g****m 1
nomadlx n****x@l****n 1
nullday a****y@h****m 1
xian8 8****8 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 5 months ago

All Time
  • Total issues: 74
  • Total pull requests: 57
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 63
  • Total pull request authors: 10
  • Average comments per issue: 2.35
  • Average comments per pull request: 0.12
  • Merged pull requests: 34
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Kangmo (3)
  • MeJerry215 (3)
  • xiao2mo (2)
  • frankxyy (2)
  • Youggls (2)
  • dengcunqin (2)
  • moseshu (2)
  • xiao12mm (2)
  • Csinclair0 (2)
  • lileilai (1)
  • Wayne-Bfx (1)
  • GongCQ (1)
  • Mi-Peng (1)
  • wzh232894 (1)
  • quancq (1)
Pull Request Authors
  • hexisyztem (28)
  • neopro12 (12)
  • Anychnn (6)
  • 401qingkong (3)
  • Taka152 (2)
  • szha (2)
  • godweiyang (1)
  • Kangmo (1)
  • aseaday (1)
  • Csinclair0 (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 2,074 last-month
  • Total dependent packages: 2
    (may contain duplicates)
  • Total dependent repositories: 9
    (may contain duplicates)
  • Total versions: 34
  • Total maintainers: 2
pypi.org: lightseq

LightSeq is a high performance library for sequence processing and generation

  • Versions: 21
  • Dependent Packages: 2
  • Dependent Repositories: 9
  • Downloads: 2,074 Last month
Rankings
Stargazers count: 1.4%
Forks count: 2.9%
Dependent packages count: 3.2%
Average: 3.8%
Dependent repos count: 4.9%
Downloads: 6.9%
Maintainers (2)
Last synced: 5 months ago
proxy.golang.org: github.com/bytedance/lightseq
  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 0.8%
Forks count: 1.0%
Average: 4.5%
Dependent packages count: 7.0%
Dependent repos count: 9.3%
Last synced: 5 months ago

Dependencies

examples/training/fairseq/requirements.txt pypi
  • fairseq *
  • lightseq *
  • ninja *
  • numpy ==1.19.5
  • sacrebleu ==1.5.1
  • sacremoses *
examples/training/huggingface/bart/summarization/requirements.txt pypi
  • accelerate *
  • datasets >=1.8.0
  • nltk *
  • protobuf *
  • py7zr *
  • rouge-score *
  • sentencepiece *
  • torch >=1.3
examples/training/huggingface/gpt/requirements.txt pypi
  • datasets >=1.8.0
  • protobuf *
  • sentencepiece *
  • torch >=1.3
  • transformers ==4.16.2
setup.py pypi
  • ninja *
  • numpy *
  • scipy *
.github/workflows/build_check.yml actions
  • actions/checkout v2 composite
  • actions/checkout v1 composite
docker/Pypi/Dockerfile docker
  • quay.io/pypa/manylinux2014_x86_64 latest build
docker/Tritonserver/Dockerfile docker
  • nvcr.io/nvidia/tritonserver 22.01-py3 build