aitemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 98 committers (1.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Keywords from Contributors
Repository
Basic Info
Statistics
- Stars: 4,660
- Watchers: 82
- Forks: 383
- Open Issues: 155
- Releases: 0
Metadata Files
README.md
AITemplate
AITemplate (AIT) is a Python framework that transforms deep neural networks into CUDA (NVIDIA GPU) / HIP (AMD GPU) C++ code for lightning-fast inference serving. AITemplate highlights include:
High performance: close to roofline fp16 TensorCore (NVIDIA GPU) / MatrixCore (AMD GPU) performance on major models, including ResNet, MaskRCNN, BERT, VisionTransformer, Stable Diffusion, etc.
Unified, open, and flexible: seamless fp16 deep neural network deployment on NVIDIA and AMD GPUs. Fully open source, with Lego-style, easily extendable high-performance primitives for new model support. Supports a significantly more comprehensive range of fusions than existing solutions on both GPU platforms.
More about AITemplate
Excellent Backward Capability
AITemplate doesn't depend on third-party libraries or runtimes such as cuBLAS, cuDNN, rocBLAS, MIOpen, TensorRT, or MIGraphX. Each model is compiled into a self-contained portable binary that can be used in any software environment with the same hardware.
Horizontal Fusion
AITemplate provides unique advanced horizontal fusion. AITemplate can fuse parallel GEMM, LayerNorm, and other operators with different input shapes into a single GPU kernel.
Vertical Fusion
AITemplate provides strong vertical fusion. AITemplate can fuse a large range of operations into TensorCore/MatrixCore operations, such as elementwise operations, reductions, and layout permutations. AITemplate also provides back-to-back style TensorCore / MatrixCore operation fusion.
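A minimal sketch of the epilogue-fusion idea (illustrative only, not AITemplate code): the elementwise bias add and activation are applied inside the GEMM loop, so the intermediate product matrix is never written to main memory.

```python
# Illustrative sketch: a bias add and ReLU are fused into the GEMM epilogue,
# so each output element is finished in a single pass.

def gemm_relu_bias_fused(a, b, bias):
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = sum(a[i][k] * b[k][j] for k in range(inner))
            out[i][j] = max(acc + bias[j], 0.0)  # epilogue fused into same pass
    return out

a = [[1.0, -2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, 1.0]]   # identity, so a @ b == a
bias = [-2.0, 1.0]
assert gemm_relu_bias_fused(a, b, bias) == [[0.0, 0.0], [1.0, 5.0]]
```

The unfused version would need three passes (GEMM, bias add, ReLU) and two round trips of the intermediate through memory; the fused version needs one.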
Memory Fusion
AITemplate provides innovative memory fusions. AITemplate can fuse GEMM, LayerNorm, and other operators, followed by memory operations such as concatenation, split, and slice into a single operator.
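The memory-fusion idea can be sketched in plain Python (illustrative only, not AITemplate code): each GEMM writes its result directly into its slice of the final concatenated buffer, eliminating the separate concat pass and its extra read/write of the data.

```python
# Illustrative sketch: GEMM + concat fused into one pass that writes straight
# into the final output buffer.

def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def gemms_concat_fused(problems, total_cols):
    """Compute several GEMMs (same row count) and concatenate along columns,
    writing directly into the final buffer instead of materializing each
    result and copying it afterwards."""
    rows = len(problems[0][0])
    out = [[0] * total_cols for _ in range(rows)]
    col = 0
    for a, b in problems:
        cols = len(b[0])
        for i in range(rows):
            for j in range(cols):
                out[i][col + j] = sum(a[i][k] * b[k][j] for k in range(len(b)))
        col += cols
    return out

a = [[1, 2], [3, 4]]
b1 = [[1], [1]]          # 2x2 @ 2x1 -> 2x1
b2 = [[1, 0], [0, 1]]    # 2x2 @ 2x2 -> 2x2
fused = gemms_concat_fused([(a, b1), (a, b2)], total_cols=3)
# Same result as computing separately and concatenating column-wise.
unfused = [r1 + r2 for r1, r2 in zip(matmul(a, b1), matmul(a, b2))]
assert fused == unfused
```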
Working with or without PyTorch
The AITemplate-generated Python runtime can take PyTorch tensors as inputs and outputs without an extra copy. For environments without PyTorch, the AITemplate Python/C++ runtime is self-contained.
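The zero-copy contract can be illustrated with a tiny stand-in (not AITemplate's actual runtime API): the caller preallocates the output buffer, and the runtime fills it in place rather than allocating and copying a new one.

```python
# Illustrative sketch: the runtime writes results into a caller-provided
# output buffer in place, so no extra copy is needed between the caller's
# tensors and the engine. `run_with_buffers` is a hypothetical stand-in.

def run_with_buffers(inputs, out):
    """Fill `out` in place; pretend 'model' doubles each input element."""
    for i, x in enumerate(inputs):
        out[i] = x * 2.0

x = [1.0, 2.0, 3.0]
y = [0.0] * 3          # preallocated by the caller (think: a torch tensor)
run_with_buffers(x, y)
assert y == [2.0, 4.0, 6.0]  # caller's own buffer was filled, no reallocation
```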
Extensions without suffering
AITemplate provides a straightforward approach for making an extension in codegen. To add a new operator or a new fused kernel into AITemplate, most of the time one only needs to add two Python files: one for a graph node definition and another for the backend codegen. The CUDA/HIP kernel in a text header file can be directly utilized in the codegen.
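The two-file pattern can be sketched generically. Everything below is invented for illustration; these class and function names are NOT AITemplate's actual APIs. The point is the shape of the extension: a frontend graph-node definition plus a backend codegen function that renders a CUDA kernel from a text template.

```python
# Hypothetical illustration of the two-file extension pattern described above.

# --- file 1: graph node definition (frontend) ---
class MyAddOp:
    """Graph node: records the op name and its attributes."""
    op_name = "my_add"

    def __init__(self, n):
        self.n = n  # number of elements

# --- file 2: backend codegen ---
KERNEL_TEMPLATE = """\
__global__ void {name}(const half* a, const half* b, half* c) {{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < {n}) c[i] = __hadd(a[i], b[i]);
}}
"""

def gen_source(node):
    """Render the CUDA kernel text for one graph node."""
    return KERNEL_TEMPLATE.format(name=node.op_name, n=node.n)

src = gen_source(MyAddOp(1024))
assert "__global__ void my_add" in src
assert "i < 1024" in src
```

Because the kernel lives in a text template, a hand-written CUDA/HIP kernel header can be dropped into the codegen step largely unchanged.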
FX2AIT
FX2AIT is a Python-based tool that converts PyTorch models into AITemplate (AIT) engine for lightning-fast inference serving. Using FX2AIT's built-in AITLowerer, partial AIT acceleration can be achieved for models with unsupported operators in AITemplate.
Key features of FX2AIT include:
- Easy Conversion: FX2AIT requires only a PyTorch model and input for conversion, generating an "AITModule" output for inference serving.
- Expanded Support: AITemplate does not support all PyTorch operators. FX2AIT's AITLowerer offers a solution for partial AIT conversion of models with unsupported operators. See fx2ait/fx2ait/example/03_lowering_split for more information.
More information can be found at https://github.com/facebookincubator/AITemplate/tree/main/fx2ait.
Installation
Hardware requirements:
- NVIDIA: AIT is only tested on SM80+ GPUs (Ampere etc). Not all kernels work with old SM75/SM70 (T4/V100) GPUs.
- AMD: AIT is only tested on CDNA2 (MI-210/250) GPUs. There may be compiler issues for old CDNA1 (MI-100) GPUs.
Clone the code
When cloning the code, please use the following command to also clone the submodules:
git clone --recursive https://github.com/facebookincubator/AITemplate
Docker Image
We highly recommend using AITemplate with Docker to avoid accidentally using a wrong version of NVCC or HIPCC.
- CUDA: ./docker/build.sh cuda
- ROCm: DOCKER_BUILDKIT=1 ./docker/build.sh rocm
This will build a Docker image with the tag ait:latest.
From Source
The following commands will create a Python wheel for AITemplate. Please ensure you have the correct CUDA/ROCm compiler installed:
- CUDA: CUDA 11.6
- ROCm: tested on ROCm 5.2.3 with a customized HIPCC build, using the command in docker/Dockerfile.rocm#L87-L96
An incorrect compiler will lead to performance regressions.
Please check that all submodules are cloned correctly before going to the next step.
cd python
python setup.py bdist_wheel
pip install dist/*.whl --force-reinstall
Getting Started
Check out the AITemplate Documentation for API reference.
There are a few tutorials for onboarding:
- 01: How to inference a PyTorch model with AIT
- 02: How to add an op to AIT codegen
- 03: How to visualize AIT's optimization
Examples & Performance
AITemplate provides the following model templates & reference performance data on A100/MI-250:
- 01_ResNet-50 with PyTorch Image Models (TIMM)
- 02_MaskRCNN-FPN with Detectron2
- 03_BERT with Hugging Face Transformer
- 04_Vision Transformer with PyTorch Image Models (TIMM)
- 05_Stable Diffusion with Hugging Face Diffusers
Release
All current development updates can be seen in the AITemplate repository. Releases are not on a set schedule and will only be tagged for significant feature releases.
Mid-term plan:
- Better dynamic shape support: Focus on the dynamic sequence in Transformers. Add symbolic shape support.
- More automatic graph passes: Reduce the need to rewrite models manually to obtain the best performance.
- Quantization: fp8/int8/int4.
- Sparsity pruning for Gemm.
- PT2 integration: Aten2AIT is under active development.
Long-term plan:
- Automatic model conversion from ONNX, Open-XLA, and other formats.
- Composable Kernel CPU extension on AVX2/AVX-512 for AMD Epyc CPU.
Contributing
Check our contributing guide to learn about how to contribute to the project.
The Team
AITemplate is currently maintained by Meta engineers: Ying Zhang, Yang Chen, Terry Chen, Mu-Chu Lee, Max Podkorytov, Adnan Akhundov.
AITemplate was co-created by Meta engineers Bing Xu, Ying Zhang, Hao Lu, Yang Chen, and Terry Chen, with major contributions from other talented engineers, including (non-exhaustively) Mike Iovine, Mu-Chu Lee, Scott Wolchok, Oleg Khabinov, Shirong Wu, Huamin Li, Hui Guo, Zhijing Li, and Max Podkorytov. We also want to thank Andrew Tulloch, Yinghai Lu, and Lu Fang for the valuable discussions.
FX2AIT and Aten2AIT are co-created and maintained by Meta engineers: Wei Wei, Shirong Wu and Zhijing Li.
Acknowledgements
The AITemplate team works closely with the NVIDIA CUTLASS team (led by Andrew Kerr and Haicheng Wu) and the AMD Composable Kernel team (led by Chao Liu and Jing Zhang). We co-designed many advanced GPU optimizations specialized for each platform, and none of this would be possible without our close collaboration.
License
AITemplate is licensed under the Apache 2.0 License.
Owner
- Name: Meta Incubator
- Login: facebookincubator
- Kind: organization
- Location: Menlo Park, California
- Website: https://opensource.fb.com
- Repositories: 81
- Profile: https://github.com/facebookincubator
We work hard to contribute our work back to the web, mobile, big data, & infrastructure communities. NB: members must have two-factor auth.
Citation (CITATION.cff)
cff-version: 1.2.0
title: AITemplate
message: >-
If you use this software, please cite using the
following metadata.
type: software
authors:
- given-names: Bing
family-names: Xu
affiliation: Meta
email: bingxu@meta.com
- given-names: Ying
family-names: Zhang
affiliation: Meta
email: yingz@meta.com
- given-names: Hao
family-names: Lu
affiliation: Meta
email: hlu@meta.com
- given-names: Yang
family-names: Chen
affiliation: Meta
email: yangche@meta.com
- given-names: Terry
family-names: Chen
affiliation: Meta
email: terrychen@meta.com
- given-names: Mike
family-names: Iovine
affiliation: Meta
email: mikeiovine@meta.com
- given-names: Mu-Chu
family-names: Lee
affiliation: Meta
email: mlee8@meta.com
- given-names: Zhijing
family-names: Li
affiliation: Meta
email: tissue030@meta.com
repository-code: 'https://github.com/facebookincubator/AITemplate'
abstract: >-
AITemplate (AIT) is a unified inference framework with separate acceleration backends for both AMD and NVIDIA GPU hardware. It delivers close to hardware-native Tensor Core (NVIDIA GPU) and Matrix Core (AMD GPU) performance on a variety of widely used AI models such as convolutional neural networks, transformers, and diffusers.
keywords:
- 'neural network, cutlass, composable kernel, cuda, rocm'
license: Apache 2.0
license-url: https://github.com/facebookincubator/AITemplate/LICENSE
version: '0.1'
date-released: '2022-10-03'
identifiers:
- type: url
value: "https://github.com/facebookincubator/AITemplate/tree/v0.1.0"
description: The GitHub release URL of tag 0.1.0
GitHub Events
Total
- Issues event: 1
- Watch event: 145
- Issue comment event: 17
- Push event: 4
- Pull request event: 10
- Fork event: 21
Last Year
- Issues event: 1
- Watch event: 145
- Issue comment event: 17
- Push event: 4
- Pull request event: 10
- Fork event: 21
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Adnan Akhundov | a****v@m****m | 102 |
| Max Podkorytov | m****p@m****m | 58 |
| Yang Chen | y****e@f****m | 55 |
| Alexander Pivovarov | a****v@g****m | 51 |
| Mu-Chu Lee | m****8@m****m | 39 |
| Ying Zhang | y****z@m****m | 29 |
| Wei Wei | w****6@m****m | 28 |
| Zhijing Li (Accelerator Enablement) | t****0@m****m | 27 |
| Alexandr Guzhva | a****a@m****m | 25 |
| Kai Londenberg | k****g@m****m | 21 |
| Colin Chan | c****c@m****m | 17 |
| Huamin Li | h****i@m****m | 16 |
| Henry Hu | h****h@m****m | 16 |
| Janet Yang | q****1@m****m | 15 |
| Shirong Wu | s****g@m****m | 13 |
| Jez Ng | j****g@m****m | 11 |
| Grigory Sizov | g****v@m****m | 10 |
| Cheng Cai | c****i@m****m | 10 |
| Terry Chen | h****u@h****m | 10 |
| Colin Peppler | c****r@m****m | 10 |
| Terry Chen | t****n@m****m | 9 |
| fsx950223 | f****3@o****m | 9 |
| Mor Tzur | m****r@m****m | 7 |
| hlky | 1****y | 7 |
| Bing Xu | a****n@g****m | 6 |
| Richard Barnes | r****s@m****m | 4 |
| Oleg Khabinov | k****v@m****m | 4 |
| chengscott | 6****t@g****m | 4 |
| generatedunixname89002005232357 | g****7@f****m | 4 |
| Yanming Wang | y****g@a****m | 4 |
| and 68 more... | ||
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 95
- Total pull requests: 113
- Average time to close issues: 11 days
- Average time to close pull requests: 8 days
- Total issue authors: 67
- Total pull request authors: 55
- Average comments per issue: 3.47
- Average comments per pull request: 3.15
- Merged pull requests: 25
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 22
- Average time to close issues: 17 days
- Average time to close pull requests: 13 days
- Issue authors: 5
- Pull request authors: 15
- Average comments per issue: 1.4
- Average comments per pull request: 2.68
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- dashesy (7)
- OrangeSodahub (4)
- jiangwei221 (4)
- antinucleon (3)
- Suhail (3)
- ericlormul (3)
- causten (2)
- ecilay (2)
- gexahedron (2)
- msaroufim (2)
- mvpatel2000 (2)
- jonpryai (2)
- pommedeterresautee (2)
- ADongGu (2)
- chengscott (1)
Pull Request Authors
- aakhundov (13)
- frank-wei (7)
- muchulee8 (6)
- henryhu6 (6)
- chenyang78 (6)
- r-barnes (6)
- terrychenism (6)
- hlky (6)
- bradleyhd (5)
- 22quinn (5)
- antinucleon (5)
- zoranzhao (5)
- hl475 (4)
- tpolasek (3)
- chengscott (3)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- ROCmSoftwarePlatform * development
- danmar * development
- jinja2 *