colossalai

https://github.com/hpcaitech/colossalai - Version v0.5.0 Release Today!

What's Changed

[HotFix] update load lora model Readme; by @duanjunwen in https://github.com/hpcaitech/ColossalAI/pull/6240
Update README.md by @Yanjia0 in https://github.com/hpcaitech/ColossalAI/pull/6268
[ci] update ci by @flybird11111 in https://github.com/hpcaitech/ColossalAI/pull/6254
[upgrade]Upgrade transformers by @flybird11111 in https://github.com/hpcaitech/ColossalAI/pull/6320
[release] release version by @flybird11111 in https://github.com/hpcaitech/ColossalAI/pull/6330

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.9...v0.5.0

- Python
Published by github-actions[bot] about 1 year ago

https://github.com/hpcaitech/colossalai - Version v0.4.9 Release Today!

What's Changed

Release

[release] update version (#6236) by Hongxin Liu

Hotfix

[hotfix] fix lora load (#6231) by Hongxin Liu

Misc

[misc] update torch version (#6206) by Hongxin Liu

Chat

Merge pull request #6208 from hpcaitech/grpo_dev by YeAnbang

Pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.9...v0.4.8

- Python
Published by github-actions[bot] over 1 year ago

https://github.com/hpcaitech/colossalai - Version v0.4.8 Release Today!

What's Changed

Release

[release] update version (#6195) by Hongxin Liu

Doc

[doc] DeepSeek V3/R1 news (#6199) by binmakeswell

Application

[application] add lora sft example data (#6198) by Hongxin Liu
[application] Update README (#6196) by Tong Li
[application] add lora sft example (#6192) by Hongxin Liu

Pre-commit.ci

Add GRPO and Support RLVR for PPO (#6186) by YeAnbang

Checkpointio

[checkpointio] fix for async io (#6189) by flybird11111
[checkpointio] fix checkpoint for 3d (#6187) by flybird11111
[checkpointio] gather tensor before unpad it if the tensor is both padded and distributed (#6168) by Lemon Qin
[checkpointio] support load-pin overlap (#6177) by Hongxin Liu

Hotfix

[hotfix] fix zero optim save (#6191) by Hongxin Liu
[hotfix] fix hybrid checkpointio for sp+dp (#6184) by flybird11111

Shardformer

[shardformer] support pipeline for deepseek v3 and optimize lora save (#6188) by Hongxin Liu
[shardformer] support ep for deepseek v3 (#6185) by Hongxin Liu

Ci

[CI] Cleanup Dist Optim tests with shared helper funcs (#6125) by Wenxuan Tan

Issue template

[Issue template] Add checkbox asking for details to reproduce error (#6104) by Wenxuan Tan

Inference

[Inference]Fix example in readme (#6178) by Guangyao Zhang

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.8...v0.4.7

- Python
Published by github-actions[bot] over 1 year ago

https://github.com/hpcaitech/colossalai - Version v0.4.7 Release Today!

What's Changed

Release

[release] update version (#6174) by Hongxin Liu

Pre-commit.ci

[pre-commit.ci] pre-commit autoupdate (#6113) by pre-commit-ci[bot]

Sharderformer

[Sharderformer] Support zbv in Sharderformer Policy (#6150) by duanjunwen

Checkpointio

[checkpointio] support non blocking pin load (#6172) by Hongxin Liu
[checkpointio]support asyncio for 3d (#6152) by flybird11111
[checkpointio] fix async io (#6155) by flybird11111
[checkpointio] support debug log (#6153) by Hongxin Liu
[checkpointio] fix zero optimizer async save memory (#6151) by Hongxin Liu
Merge pull request #6149 from ver217/hotfix/ckpt by Wang Binluo
[checkpointio] disable buffering by ver217
[checkpointio] fix pinned state dict by ver217
[checkpointio] fix size compute by ver217
[checkpointio] fix performance issue (#6139) by Hongxin Liu
[checkpointio] support async model save (#6131) by Hongxin Liu

News

[news] release colossalai for sora (#6166) by binmakeswell

Hotfix

[hotfix] improve compatibility (#6165) by Hongxin Liu
[Hotfix] hotfix normalization (#6163) by duanjunwen
[hotfix] fix zero comm buffer init (#6154) by Hongxin Liu
[hotfix] fix flash attn window_size err (#6132) by duanjunwen

Doc

[doc] add bonus event (#6164) by binmakeswell
[doc] update cloud link (#6148) by Sze-qq
[doc] add hpc cloud intro (#6147) by Sze-qq

Device

[Device]Support npu (#6159) by flybird11111

Fix

[fix] fix bug caused by perf version (#6156) by duanjunwen
[fix] multi-node backward slowdown (#6134) by Hanks

Optim

[optim] hotfix adam load (#6146) by Hongxin Liu

Zerobubble

[Zerobubble] merge main. (#6142) by duanjunwen

Async io

[async io]supoort async io (#6137) by flybird11111

Ckpt

[ckpt] Add async ckpt api (#6136) by Wang Binluo

Cli

[cli] support run as module option (#6135) by Hongxin Liu

Zero

[zero] support extra dp (#6123) by Hongxin Liu

Coati

[Coati] Refine prompt for better inference (#6117) by Tong Li

Plugin

[plugin] support getgradnorm (#6115) by Hongxin Liu

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.7...v0.4.6

- Python
Published by github-actions[bot] over 1 year ago

https://github.com/hpcaitech/colossalai - Version v0.4.6 Release Today!

What's Changed

Release

[release] update version (#6109) by Hongxin Liu

Pre-commit.ci

[pre-commit.ci] pre-commit autoupdate (#6078) by pre-commit-ci[bot]

Checkpointio

[checkpointio] fix hybrid plugin model save (#6106) by Hongxin Liu

Mcts

[MCTS] Add self-refined MCTS (#6098) by Tong Li

Doc

[doc] sora solution news (#6100) by binmakeswell

Extension

[extension] hotfix compile check (#6099) by Hongxin Liu

Hotfix

Merge pull request #6096 from BurkeHulk/hotfix/lora_ckpt by Hanks

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.6...v0.4.5

- Python
Published by github-actions[bot] over 1 year ago

https://github.com/hpcaitech/colossalai - Version v0.4.5 Release Today!

What's Changed

Release

[release] update version (#6094) by Hongxin Liu

Misc

[misc] fit torch api upgradation and remove legecy import (#6093) by Hongxin Liu

Fp8

[fp8] add fallback and make compile option configurable (#6092) by Hongxin Liu

Chore

[chore] refactor by botbw

Ckpt

[ckpt] add safetensors util by botbw

Pipeline

[pipeline] hotfix backward for multiple outputs (#6090) by Hongxin Liu

Ring attention

[Ring Attention] Improve comments (#6085) by Wenxuan Tan
Merge pull request #6071 from wangbluo/ring_attention by Wang Binluo

Coati

[Coati] Train DPO using PP (#6054) by Tong Li

Shardformer

[shardformer] optimize seq parallelism (#6086) by Hongxin Liu
[shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084) by Hongxin Liu

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.5...v0.4.4

- Python
Published by github-actions[bot] over 1 year ago

https://github.com/hpcaitech/colossalai - Version v0.4.4 Release Today!

What's Changed

Release

[release] update version (#6062) by Hongxin Liu

Colossaleval

[ColossalEval] support for vllm (#6056) by Camille Zhong

Moe

[moe] add parallel strategy for shared_expert && fix test for deepseek (#6063) by botbw

Sp

Merge pull request #6064 from wangbluo/fix_attn by Wang Binluo
Merge pull request #6061 from wangbluo/sp_fix by Wang Binluo

Doc

[doc] FP8 training and communication document (#6050) by Guangyao Zhang
[doc] update sp doc (#6055) by flybird11111

Fp8

[fp8] Disable allgather intranode. Disable Redundant allgather fp8 (#6059) by Guangyao Zhang
[fp8] fix missing fp8_comm flag in mixtral (#6057) by botbw
[fp8] hotfix backward hook (#6053) by Hongxin Liu

Pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]

Hotfix

[hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048) by botbw

Feature

[Feature] Split cross-entropy computation in SP (#5959) by Wenxuan Tan

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.4...v0.4.3

- Python
Published by github-actions[bot] almost 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.4.3 Release Today!

What's Changed

Release

[release] update version (#6041) by Hongxin Liu

Fp8

[fp8] disable alltoall_fp8 in intranode (#6045) by Hanks
[fp8] fix linear hook (#6046) by Hongxin Liu
[fp8] optimize all-gather (#6043) by Hongxin Liu
[FP8] unsqueeze scale to make it compatible with torch.compile (#6040) by Guangyao Zhang
Merge pull request #6012 from hpcaitech/feature/fp8_comm by Hongxin Liu
Merge pull request #6033 from wangbluo/fix by Wang Binluo
Merge pull request #6024 from wangbluo/fix_merge by Wang Binluo
Merge pull request #6023 from wangbluo/fp8_merge by Wang Binluo
[fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016) by Wang Binluo
[fp8] zero support fp8 linear. (#6006) by flybird11111
[fp8] add use_fp8 option for MoeHybridParallelPlugin (#6009) by Wang Binluo
[fp8]update reduce-scatter test (#6002) by flybird11111
[fp8] linear perf enhancement by botbw
[fp8] update torch.compile for linear_fp8 to >= 2.4.0 (#6004) by botbw
[fp8] support asynchronous FP8 communication (#5997) by flybird11111
[fp8] refactor fp8 linear with compile (#5993) by Hongxin Liu
[fp8] support hybrid parallel plugin (#5982) by Wang Binluo
[fp8]Moe support fp8 communication (#5977) by flybird11111
[fp8] use torch compile (torch >= 2.3.0) (#5979) by botbw
[fp8] support gemini plugin (#5978) by Hongxin Liu
[fp8] support fp8 amp for hybrid parallel plugin (#5975) by Hongxin Liu
[fp8] add fp8 linear (#5967) by Hongxin Liu
[fp8]support all2all fp8 (#5953) by flybird11111
[FP8] rebase main (#5963) by flybird11111
Merge pull request #5961 from ver217/feature/zeor-fp8 by Hanks
[fp8] add fp8 comm for low level zero by ver217

Hotfix

[Hotfix] Remove deprecated install (#6042) by Tong Li
[Hotfix] Fix llama fwd replacement bug (#6031) by Wenxuan Tan
[Hotfix] Avoid fused RMSnorm import error without apex (#5985) by Edenzzzz
[Hotfix] README link (#5966) by Tong Li
[hotfix] Remove unused plan section (#5957) by Tong Li

Colossalai/checkpoint_io/...

[colossalai/checkpointio/...] fix bug in loadstatedictinto_model; format error msg (#6020) by Gao, Ruiyuan

Colossal-llama

[Colossal-LLaMA] Refactor latest APIs (#6030) by Tong Li

Plugin

[plugin] hotfix zero plugin (#6036) by Hongxin Liu
[plugin] add cast inputs option for zero (#6003) (#6022) by Hongxin Liu
[plugin] add cast inputs option for zero (#6003) by Hongxin Liu

Ci

[CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018) by Wenxuan Tan

Pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] pre-commit autoupdate (#5995) by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]

Colossalchat

[ColossalChat] Add PP support (#6001) by Tong Li

Misc

[misc] Use dist logger in plugins (#6011) by Edenzzzz
[misc] update compatibility (#6008) by Hongxin Liu
[misc] Bypass the huggingface bug to solve the mask mismatch problem (#5991) by Haze188
[misc] remove useless condition by haze188
[misc] fix ci failure: change default value to false in moe plugin by haze188
[misc] remove incompatible test config by haze188
[misc] remove debug/print code by haze188
[misc] skip redunant test by haze188
[misc] solve booster hang by rename the variable by haze188

Feature

[Feature] Zigzag Ring attention (#5905) by Edenzzzz
[Feature]: support FP8 communication in DDP, FSDP, Gemini (#5928) by Hanks
[Feature] llama shardformer fp8 support (#5938) by Guangyao Zhang
[Feature] MoE Ulysses Support (#5918) by Haze188

Chat

[Chat] fix readme (#5989) by YeAnbang
Merge pull request #5962 from hpcaitech/colossalchat by YeAnbang
[Chat] Fix lora (#5946) by YeAnbang

Test ci

[test ci]Feature/fp8 comm (#5981) by flybird11111

Docs

[Docs] clarify launch port by Edenzzzz

Test

[test] add zero fp8 test case by ver217
[test] add check by hxwang
[test] fix test: testzero12 by hxwang
[test] add mixtral modelling test by botbw
[test] pass mixtral shardformer test by botbw
[test] mixtra pp shard test by hxwang
[test] add mixtral transformer test by hxwang
[test] add mixtral for sequence classification by hxwang

Lora

[lora] lora support hybrid parallel plugin (#5956) by Wang Binluo

Feat

[feat] Dist Loader for Eval (#5950) by Tong Li

Chore

[chore] remove redundant test case, print string & reduce test tokens by botbw
[chore] docstring by hxwang
[chore] change moepgmesh to private by hxwang
[chore] solve moe ckpt test failure and some other arg pass failure by hxwang
[chore] minor fix after rebase by hxwang
[chore] minor fix by hxwang
[chore] arg pass & remove drop token by hxwang
[chore] trivial fix by botbw
[chore] manually revert unintended commit by botbw
[chore] handle non member group by hxwang

Moe

[moe] solve dp axis issue by botbw
[moe] remove forceoverlapcomm flag and add warning instead by hxwang
Revert "[moe] implement submesh initialization" by hxwang
[moe] refactor mesh assignment by hxwang
[moe] deepseek moe sp support by haze188
[moe] remove ops by hxwang
[moe] full test for deepseek and mixtral (pp + sp to fix) by hxwang
[moe] finalize test (no pp) by hxwang
[moe] init moe plugin comm setting with sp by hxwang
[moe] clean legacy code by hxwang
[moe] test deepseek by hxwang
[moe] implement tp by botbw
[moe] add mixtral dp grad scaling when not all experts are activated by botbw
[moe] implement submesh initialization by botbw
[moe] implement transit between non moe tp and ep by botbw
[moe] fix plugin by hxwang

Doc

[doc] add MoeHybridParallelPlugin docstring by botbw

Deepseek

[deepseek] replace attn (a workaround for bug in transformers) by hxwang

Bug

[bug] fix: somehow logger hangs the program by botbw

Zero

[zero] solve hang by botbw
[zero] solve hang by hxwang

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.3...v0.4.2

- Python
Published by github-actions[bot] almost 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.4.2 Release Today!

What's Changed

Release

[release] update version (#5952) by Hongxin Liu

Zero

[zero] hotfix update master params (#5951) by Hongxin Liu

Feat

[Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895) by Runyu Lu

Shardformer

[shardformer] hotfix attn mask (#5947) by Hongxin Liu
[shardformer] hotfix attn mask (#5945) by Hongxin Liu

Chat

Merge pull request #5922 from hpcaitech/kto by YeAnbang

Feature

[Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (#5941) by zhurunhua

Hotfix

[Hotfix] Fix ZeRO typo #5936 by Edenzzzz

Fix bug

[FIX BUG] convert env param to int in (#5934) by Gao, Ruiyuan
[FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (#5931) by zhurunhua

Colossalchat

[ColossalChat] Hotfix for ColossalChat (#5910) by Tong Li

Examples

[Examples] Add lazy init to OPT and GPT examples (#5924) by Edenzzzz

Plugin

[plugin] support all-gather overlap for hybrid parallel (#5919) by Hongxin Liu

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.2...v0.4.1

- Python
Published by github-actions[bot] almost 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.4.1 Release Today!

What's Changed

Release

[release] update version (#5912) by Hongxin Liu

Misc

[misc] support torch2.3 (#5893) by Hongxin Liu

Compatibility

[compatibility] support torch 2.2 (#5875) by Guangyao Zhang

Chat

Merge pull request #5901 from hpcaitech/colossalchat by YeAnbang
Merge pull request #5850 from hpcaitech/rlhf_SimPO by YeAnbang

Shardformer

[ShardFormer] fix qwen2 sp (#5903) by Guangyao Zhang
[ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (#5897) by Guangyao Zhang
[shardformer] DeepseekMoE support (#5871) by Haze188
[shardformer] fix the moe (#5883) by Wang Binluo
[Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) by Jianghai
[shardformer]delete xformers (#5859) by flybird11111

Auto parallel

[Auto Parallel]: Speed up intra-op plan generation by 44% (#5446) by Stephan Kö

Zero

[zero] support all-gather overlap (#5898) by Hongxin Liu

Pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] pre-commit autoupdate (#5878) by pre-commit-ci[bot]
[pre-commit.ci] pre-commit autoupdate (#5572) by pre-commit-ci[bot]

Feature

[Feature] Enable PP + SP for llama (#5868) by Edenzzzz

Hotfix

[HotFix] CI,import,requirements-test for #5838 (#5892) by Runyu Lu
[Hotfix] Fix OPT gradient checkpointing forward by Edenzzzz
[hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879) by Haze188

Feat

[Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838) by Runyu Lu

Hoxfix

[Hoxfix] Fix CUDADEVICEMAX_CONNECTIONS for comm overlap by Edenzzzz

Quant

[quant] fix bitsandbytes version check (#5882) by Hongxin Liu

Doc

[doc] Update llama + sp compatibility; fix dist optim table by Edenzzzz

Moe/zero

[MoE/ZeRO] Moe refactor with zero refactor (#5821) by Haze188

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.1...v0.4.0

- Python
Published by github-actions[bot] about 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.4.0 Release Today!

What's Changed

Release

[release] update version (#5864) by Hongxin Liu

Inference

[Inference]Lazy Init Support (#5785) by Runyu Lu

Shardformer

[shardformer] Support the T5ForTokenClassification model (#5816) by Guangyao Zhang

Zero

[zero] use bucket during allgather (#5860) by Hongxin Liu

Gemini

[gemini] fixes for benchmarking (#5847) by botbw
[gemini] fix missing return (#5845) by botbw

Feature

[Feature] optimize PP overlap (#5735) by Edenzzzz

Doc

[doc] add GPU cloud playground (#5851) by binmakeswell
[doc] fix open sora model weight link (#5848) by binmakeswell
[doc] opensora v1.2 news (#5846) by binmakeswell

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.0...v0.3.9

- Python
Published by github-actions[bot] about 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.3.9 Release Today!

What's Changed

Release

[release] update version (#5833) by Hongxin Liu

Fix

[Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837) by Yuanheng Zhao

Shardformer

[shardformer] Change atol in test command-r weight-check to pass pytest (#5835) by Guangyao Zhang
Merge pull request #5818 from GuangyaoZhang/command-r by Guangyao Zhang
[shardformer] upgrade transformers to 4.39.3 (#5815) by flybird11111
[shardformer] fix modeling of bloom and falcon (#5796) by Hongxin Liu
[shardformer] fix import (#5788) by Hongxin Liu

Devops

[devops] Remove building on PR when edited to avoid skip issue (#5836) by Guangyao Zhang
[devops] fix docker ci (#5780) by Hongxin Liu

Launch

[launch] Support IPv4 host initialization in launch (#5822) by Kai Lv

Misc

[misc] Add dist optim to doc sidebar (#5806) by Edenzzzz
[misc] update requirements (#5787) by Hongxin Liu
[misc] fix dist logger (#5782) by Hongxin Liu
[misc] Accelerate CI for zero and dist optim (#5758) by Edenzzzz
[misc] update dockerfile (#5776) by Hongxin Liu

Pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]

Gemini

[gemini] quick fix on possible async operation (#5803) by botbw
[Gemini] Use async stream to prefetch and h2d data moving (#5781) by Haze188
[gemini] optimize reduce scatter d2h copy (#5760) by botbw

Inference

[Inference] Fix flash-attn import and add model test (#5794) by Li Xingjian
[Inference]refactor baichuan (#5791) by Runyu Lu
Merge pull request #5771 from char-1ee/refactor/modeling by Li Xingjian
[Inference]Add Streaming LLM (#5745) by yuehuayingxueluo

Test

[test] fix qwen2 pytest distLarge (#5797) by Guangyao Zhang
[test] fix chatglm test kit (#5793) by Hongxin Liu
[test] Fix/fix testcase (#5770) by duanjunwen

Colossalchat

Merge pull request #5759 from hpcaitech/colossalchat_upgrade by YeAnbang

Install

[install]fix setup (#5786) by flybird11111

Hotfix

[hotfix] fix testcase in testfx/testtracer (#5779) by duanjunwen
[hotfix] fix llama flash attention forward (#5777) by flybird11111
[Hotfix] Add missing init file in inference.executor (#5774) by Yuanheng Zhao

Test/ci

[Test/CI] remove test cases to reduce CI duration (#5753) by botbw

Ci/tests

[CI/tests] simplify some test case to reduce testing time (#5755) by Haze188

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.9...v0.3.8

- Python
Published by github-actions[bot] about 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.3.8 Release Today!

What's Changed

Release

[release] update version (#5752) by Hongxin Liu

Fix/example

[Fix/Example] Fix Llama Inference Loading Data Type (#5763) by Yuanheng Zhao

Gemini

Merge pull request #5749 from hpcaitech/prefetch by botbw
Merge pull request #5754 from Hz188/prefetch by botbw
[Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
[gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
Merge pull request #5733 from Hz188/feature/prefetch by botbw
Merge pull request #5731 from botbw/prefetch by botbw
[gemini] init auto policy prefetch by hxwang
Merge pull request #5722 from botbw/prefetch by botbw
[gemini] maxprefetch means maximum work to keep by hxwang
[gemini] use compute_chunk to find next chunk by hxwang
[gemini] prefetch chunks by hxwang
[gemini]remove registered gradients hooks (#5696) by flybird11111

Chore

[chore] refactor profiler utils by hxwang
[chore] remove unnecessary assert since compute list might not be recorded by hxwang
[chore] remove unnecessary test & changes by hxwang
Merge pull request #5738 from botbw/prefetch by Haze188
[chore] fix init error by hxwang
[chore] Update placement_policy.py by botbw
[chore] remove debugging info by hxwang
[chore] remove print by hxwang
[chore] refactor & sync by hxwang
[chore] sync by hxwang

Bug

[bug] continue fix by hxwang
[bug] workaround for idx fix by hxwang
[bug] fix early return (#5740) by botbw

Bugs

[bugs] fix args.profile=False DummyProfiler errro by genghaozhe

Inference

[inference] Fix running time of testcontinuousbatching (#5750) by Yuanheng Zhao
[Inference]Fix readme and example for API server (#5742) by Jianghai
[inference] release (#5747) by binmakeswell
[Inference] Fix Inference Generation Config and Sampling (#5710) by Yuanheng Zhao
[Inference] Fix API server, test and example (#5712) by Jianghai
[Inference] Delete duplicated copy_vector (#5716) by 傅剑寒
[Inference]Adapt repetitionpenalty and norepeatngramsize (#5708) by yuehuayingxueluo
[Inference] Add example test_ci script by CjhHa1
[Inference] Fix bugs and docs for feat/online-server (#5598) by Jianghai
[Inference] resolve rebase conflicts by CjhHa1
[Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) by Jianghai
[Inference] ADD async and sync Api server using FastAPI (#5396) by Jianghai
[Inference] Support the logic related to ignoring EOS token (#5693) by yuehuayingxueluo
[Inference]Adapt temperature processing logic (#5689) by yuehuayingxueluo
[Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679) by Steve Luo
[Inference] Fix quant bits order (#5681) by 傅剑寒
[inference]Add alibi to flash attn function (#5678) by yuehuayingxueluo
[Inference] Adapt Baichuan2-13B TP (#5659) by yuehuayingxueluo

Feature

[Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz
[Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz
Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
[Feature] qlora support (#5586) by linsj20

Example

[example] add profile util for llama by hxwang
[example] Update Inference Example (#5725) by Yuanheng Zhao

Colossal-inference

Colossal-Inference Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao

Nfc

[NFC] fix requirements (#5744) by Yuanheng Zhao
[NFC] Fix code factors on inference triton kernels (#5743) by Yuanheng Zhao

Ci

[ci] Temporary fix for build on pr (#5741) by Yuanheng Zhao
[ci] Fix example tests (#5714) by Yuanheng Zhao

Sync

Merge pull request #5737 from yuanheng-zhao/inference/sync/main by Yuanheng Zhao
[sync] Sync feature/colossal-infer with main by Yuanheng Zhao
[Sync] Update from main to feature/colossal-infer (Merge pull request #5685) by Yuanheng Zhao
[sync] resolve conflicts of merging main by Yuanheng Zhao

Shardformer

[Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
[Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
[Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
[Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo
[shardformer] support biasgelujit_fused for models (#5647) by flybird11111

Pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
[pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]

Doc

[doc] Update Inference Readme (#5736) by Yuanheng Zhao

Fix/inference

[Fix/Inference] Add unsupported auto-policy error message (#5730) by Yuanheng Zhao

Lazy

[lazy] fix lazy cls init (#5720) by flybird11111

Misc

[misc] Update PyTorch version in docs (#5724) by binmakeswell
[misc] Update PyTorch version in docs (#5711) by Edenzzzz
[misc] Add an existing issue checkbox in bug report (#5691) by Edenzzzz
[misc] refactor launch API and tensor constructor (#5666) by Hongxin Liu

Colossal-llama

[Colossal-LLaMA] Fix sft issue for llama2 (#5719) by Tong Li

Fix

[Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
[Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
[Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao

Feat

[Feat]Inference RPC Server Support (#5705) by Runyu Lu

Hotfix

[hotfix] fix inference typo (#5438) by hugo-syn
[hotfix] fix OpenMOE example import path (#5697) by Yuanheng Zhao
[hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695) by Yuanheng Zhao

Inference/feat

[Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706) by 傅剑寒
[Inference/Feat] Add quant kvcache interface (#5700) by 傅剑寒
[Inference/Feat] Add quant kvcache support for decodekvcache_memcpy (#5686) by 傅剑寒
[Inference/Feat] Add kvcache quant support for fusedrotaryembeddingcachecopy (#5680) by 傅剑寒
[Inference/Feat] Feat quant kvcache step2 (#5674) by 傅剑寒

Online server

[Online Server] Chat Api for streaming and not streaming response (#5470) by Jianghai

Zero

[zero]remove registered gradients hooks (#5687) by flybird11111

Kernel

[kernel] Support New KCache Layout - Triton Kernel (#5677) by Yuanheng Zhao

Inference/kernel

[Inference/Kernel] refactor kvcache manager and rotaryembedding and kvcachememcpy oper… (#5663) by Steve Luo

Lowlevelzero

[LowLevelZero] low level zero support lora (#5153) by flybird11111

Lora

[lora] add lora APIs for booster, support lora for TorchDDP (#4981) by Baizhou Zhang

Devops

[devops] fix release docker ci (#5665) by Hongxin Liu

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.8...v0.3.7

- Python
Published by github-actions[bot] about 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.3.7 Release Today!

What's Changed

Release

[release] update version (#5654) by Hongxin Liu
[release] grok-1 inference benchmark (#5500) by binmakeswell
[release] grok-1 314b inference (#5490) by binmakeswell

Hotfix

[hotfix] add soft link to support required files (#5661) by Tong Li
[hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
[hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) by Edenzzzz
[hotfix] fix typo s/getdefualtparser /getdefaultparser (#5548) by digger yu
[hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
[hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
[hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu

News

[news] llama3 and open-sora v1.1 (#5655) by binmakeswell

Lazyinit

[lazyinit] skip whisper test (#5653) by Hongxin Liu

Shardformer

[shardformer] refactor pipeline grad ckpt config (#5646) by Hongxin Liu
[shardformer] fix chatglm implementation (#5644) by Hongxin Liu
[shardformer] remove useless code (#5645) by flybird11111
[shardformer] update transformers (#5583) by Wang Binluo
[shardformer] fix pipeline grad ckpt (#5620) by Hongxin Liu
[shardformer] refactor embedding resize (#5603) by flybird11111
[shardformer] Sequence Parallelism Optimization (#5533) by Zhongkai Zhao
[shardformer] fix pipeline forward error if custom layer distribution is used (#5189) by Insu Jang
[shardformer] update colo attention to support custom mask (#5510) by Hongxin Liu
[shardformer]Fix lm parallel. (#5480) by flybird11111
[shardformer] fix gathering output when using tensor parallelism (#5431) by flybird11111

Fix

[Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season
[fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
[Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
[fix] fix grok-1 example typo (#5506) by Yuanheng Zhao

Coloattention

[coloattention]modify coloattention (#5627) by flybird11111

Example

[example] llama3 (#5631) by binmakeswell
[example] update Grok-1 inference (#5495) by Yuanheng Zhao
[example] add grok-1 inference (#5485) by Hongxin Liu

Exampe

[exampe] update llama example (#5626) by Hongxin Liu

Feature

[Feature] Support LLaMA-3 CPT and ST (#5619) by Tong Li

Zero

[zero] support multiple (partial) backward passes (#5596) by Hongxin Liu

Doc

[doc] fix ColossalMoE readme (#5599) by Camille Zhong
[doc] update open-sora demo (#5479) by binmakeswell
[doc] release Open-Sora 1.0 with model weights (#5468) by binmakeswell

Devops

[devops] remove post commit ci (#5566) by Hongxin Liu
[devops] fix example test ci (#5504) by Hongxin Liu
[devops] fix compatibility (#5444) by Hongxin Liu

Shardformer, pipeline

[shardformer, pipeline] add gradient_checkpointing_ratio and heterogenous shard policy for llama (#5508) by Wenhao Chen

Colossalchat

[ColossalChat] Update RLHF V2 (#5286) by YeAnbang

Format

[format] applied code formatting on changed files in pull request 5510 (#5517) by github-actions[bot]

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.7...v0.3.6

- Python
Published by github-actions[bot] about 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.3.6 Release Today!

What's Changed

Release

[release] update version (#5411) by Hongxin Liu

Colossal-llama2

[colossal-llama2] add stream chat examlple for chat version model (#5428) by Camille Zhong

Hotfix

[hotfix] fix stable diffusion inference bug. (#5289) by Youngon
[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) by digger yu
[hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) by digger yu
[hotfix] fix typo change _descrption to _description (#5331) by digger yu
[hotfix] fix typo of openmoe model source (#5403) by Luo Yihang
[hotfix] fix sd vit import error (#5420) by MickeyCHAN
[hotfix] Fix wrong import in meta_registry (#5392) by Stephan Kölker
[hotfix] fix variable type for top_p (#5313) by CZYCW

Doc

[doc] Fix typo s/infered/inferred/ (#5288) by hugo-syn
[doc] update some translations with README-zh-Hans.md (#5382) by digger yu
[doc] sora release (#5425) by binmakeswell
[doc] fix blog link by binmakeswell
[doc] fix blog link by binmakeswell
[doc] updated installation command (#5389) by Frank Lee
[doc] Fix typo (#5361) by yixiaoer

Eval-hotfix

[eval-hotfix] set fewshotdata to None when few shot is disabled (#5422) by Dongruixuan Li

Devops

[devops] fix extention building (#5427) by Hongxin Liu

Example

[example]add gpt2 benchmark example script. (#5295) by flybird11111
[example] reuse flash attn patch (#5400) by Hongxin Liu

Workflow

[workflow] added pypi channel (#5412) by Frank Lee

Shardformer

[shardformer]gather llama logits (#5398) by flybird11111

Setup

[setup] fixed nightly release (#5388) by Frank Lee

Fsdp

[fsdp] impl save/load shard model/optimizer (#5357) by QinLuo

Extension

[extension] hotfix jit extension setup (#5402) by Hongxin Liu

Llama

[llama] fix training and inference scripts (#5384) by Hongxin Liu

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.6...v0.3.5

- Python
Published by github-actions[bot] over 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.3.5 Release Today!

What's Changed

Release

[release] update version (#5380) by Hongxin Liu

Llama

Merge pull request #5377 from hpcaitech/example/llama-npu by Frank Lee
[llama] fix memory issue (#5371) by Hongxin Liu
[llama] polish training script and fix optim ckpt (#5368) by Hongxin Liu
[llama] fix neftune & pbar with start_step (#5364) by Camille Zhong
[llama] add flash attn patch for npu (#5362) by Hongxin Liu
[llama] update training script (#5360) by Hongxin Liu
[llama] fix dataloader for hybrid parallel (#5358) by Hongxin Liu

Moe

[moe] fix tests by ver217
[moe] fix mixtral optim checkpoint (#5344) by Hongxin Liu
[moe] fix mixtral forward default value (#5329) by Hongxin Liu
[moe] fix mixtral checkpoint io (#5314) by Hongxin Liu
[moe] support mixtral (#5309) by Hongxin Liu
[moe] update capacity computing (#5253) by Hongxin Liu
[moe] init mixtral impl by Xuanlei Zhao
[moe]: fix ep/tp tests, add hierarchical all2all (#4982) by Wenhao Chen
[moe] support optimizer checkpoint (#5015) by Xuanlei Zhao
[moe] merge moe into main (#4978) by Xuanlei Zhao

Lr-scheduler

[lr-scheduler] fix load state dict and add test (#5369) by Hongxin Liu

Eval

[eval] update llama npu eval (#5366) by Camille Zhong

Gemini

[gemini] fix param op hook when output is tuple (#5355) by Hongxin Liu
[gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) by flybird11111
[gemini]fix gemini optimzer, saving Shardformer in Gemini got list assignment index out of range (#5085) by flybird11111
[gemini] gemini support extra-dp (#5043) by flybird11111
[gemini] gemini support tensor parallelism. (#4942) by flybird11111

Fix

[fix] remove unnecessary dp_size assert (#5351) by Wenhao Chen

Checkpointio

[checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu

Chat

[Chat] fix sft loss nan (#5345) by YeAnbang

Extension

[extension] fixed exception catch (#5342) by Frank Lee

Doc

[doc] added docs for extensions (#5324) by Frank Lee
[doc] add llama2-13B disyplay (#5285) by Desperado-Jia
[doc] fix doc typo (#5256) by binmakeswell
[doc] fix typo in Colossal-LLaMA-2/README.md (#5247) by digger yu
[doc] SwiftInfer release (#5236) by binmakeswell
[doc] add Colossal-LLaMA-2-13B (#5234) by binmakeswell
[doc] Make leaderboard format more uniform and good-looking (#5231) by JIMMY ZHAO
[doc] Update README.md of Colossal-LLAMA2 (#5233) by Camille Zhong
[doc] Update required third-party library list for testing and torch comptibility checking (#5207) by Zhongkai Zhao
[doc] update pytorch version in documents. (#5177) by flybird11111
[doc] fix colossalqa document (#5146) by Michelle
[doc] updated paper citation (#5131) by Frank Lee
[doc] add moe news (#5128) by binmakeswell

Tests

[tests] fix t5 test. (#5322) by flybird11111

Accelerator

Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api by Frank Lee
[accelerator] fixed npu api by FrankLeeeee
[accelerator] init the accelerator module (#5129) by Frank Lee

Workflow

[workflow] updated CI image (#5318) by Frank Lee
[workflow] fixed oom tests (#5275) by Frank Lee
[workflow] fixed incomplete bash command (#5272) by Frank Lee
[workflow] fixed build CI (#5240) by Frank Lee

Feat

[feat] refactored extension module (#5298) by Frank Lee

Nfc

[NFC] polish applications/Colossal-LLaMA-2/colossalllama2/tokenizer/inittokenizer.py code style (#5228) by 李文军
[nfc] fix typo colossalai/shardformer/ (#5133) by digger yu
[nfc] fix typo change directoty to directory (#5111) by digger yu
[nfc] fix typo and author name (#5089) by digger yu
[nfc] fix typo in docs/ (#4972) by digger yu

Hotfix

[hotfix] fix 3d plugin test (#5292) by Hongxin Liu
[hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) by Zhongkai Zhao
[hotfix]: add pp sanity check and fix mbs arg (#5268) by Wenhao Chen
[hotfix] removed unused flag (#5242) by Frank Lee
[hotfix] fixed memory usage of shardformer module replacement (#5122) by アマデウス
[Hotfix] Fix model policy matching strategy in ShardFormer (#5064) by Zhongkai Zhao
[hotfix]: modify createephierarchical_group and add test (#5032) by Wenhao Chen
[hotfix] Suport extra_kwargs in ShardConfig (#5031) by Zhongkai Zhao
[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) by littsk
[hotfix] fix grad accumulation plus clipping for gemini (#5002) by Baizhou Zhang

Sync

Merge pull request #5278 from ver217/sync/npu by Frank Lee

Shardformer

[shardformer] hybridparallelplugin support gradients accumulation. (#5246) by flybird11111
[shardformer] llama support DistCrossEntropy (#5176) by flybird11111
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
[shardformer]fix flash attention, when mask is casual, just don't unpad it (#5084) by flybird11111
[shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
[shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao

Ci

[ci] fix testhybridparallelplugincheckpoint_io.py (#5276) by flybird11111
[ci] fix shardformer tests. (#5255) by flybird11111
[ci] fixed ddp test (#5254) by Frank Lee
[ci] fixed booster test (#5251) by Frank Lee

Npu

[npu] change device to accelerator api (#5239) by Hongxin Liu
[npu] use extension for op builder (#5172) by Xuanlei Zhao
[npu] support triangle attention for llama (#5130) by Xuanlei Zhao
[npu] add npu support for hybrid plugin and llama (#5090) by Xuanlei Zhao
[npu] add npu support for gemini and zero (#5067) by Hongxin Liu

Pipeline

[pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
[pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
[pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen

Format

[format] applied code formatting on changed files in pull request 5234 (#5235) by github-actions[bot]
[format] applied code formatting on changed files in pull request 5115 (#5118) by github-actions[bot]
[format] applied code formatting on changed files in pull request 5124 (#5125) by github-actions[bot]
[format] applied code formatting on changed files in pull request 5088 (#5127) by github-actions[bot]
[format] applied code formatting on changed files in pull request 5067 (#5072) by github-actions[bot]
[format] applied code formatting on changed files in pull request 4926 (#5007) by github-actions[bot]

Colossal-llama-2

[Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224) by Tong Li
[Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878) by Yuanchen

Devops

[devops] update torch versoin in ci (#5217) by Hongxin Liu

Colossaleval

[ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169) by Yuanchen

Colossalqa

[colossalqa] fix pangu api (#5170) by Michelle
[ColossalQA] refactor server and webui & add new feature (#5138) by Michelle

Plugin

[plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) by flybird11111

Feature

[FEATURE] Add Safety Eval Datasets to ColossalEval (#5095) by Zian(Andy) Zheng
[Feature] Add document retrieval QA (#5020) by YeAnbang

Inference

[inference] refactor examples and fix schedule (#5077) by Hongxin Liu
[inference] update examples and engine (#5073) by Xu Kai
[inference] Refactor inference architecture (#5057) by Xu Kai
[Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) by Jianghai

Hotfix/hybridengine

[hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) by Bin Jia
[hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) by Bin Jia

Misc

[misc] remove outdated submodule (#5070) by Hongxin Liu
[misc] add code owners (#5024) by Hongxin Liu

Kernels

[Kernels]added flash-decoidng of triton (#5063) by Cuiqing Li (李崔卿)
[Kernels]Update triton kernels into 2.1.0 (#5046) by Cuiqing Li (李崔卿)

Exampe

[exampe] fix llama example' loss error when using gemini plugin (#5060) by flybird11111

Pipeline,shardformer

[pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when strict=False, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) by Elsa Granger

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.5...v0.3.4

- Python
Published by github-actions[bot] over 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.3.4 Release Today!

What's Changed

Release

[release] update version (#4995) by Hongxin Liu

Pipeline inference

[Pipeline Inference] Merge pp with tp (#4993) by Bin Jia
[Pipeline inference] Combine kvcache with pipeline inference (#4938) by Bin Jia
[Pipeline Inference] Sync pipeline inference branch to main (#4820) by Bin Jia

Doc

[doc] add supported feature diagram for hybrid parallel plugin (#4996) by ppt0011
[doc]Update doc for colossal-inference (#4989) by Cuiqing Li (李崔卿)
Merge pull request #4889 from ppt0011/main by ppt0011
[doc] add reminder for issue encountered with hybrid adam by ppt0011
[doc] update advanced tutorials, training gpt with hybrid parallelism (#4866) by flybird11111
Merge pull request #4858 from Shawlleyw/main by ppt0011
[doc] update slack link (#4823) by binmakeswell
[doc] add lazy init docs (#4808) by Hongxin Liu
Merge pull request #4805 from TongLi3701/docs/fix by Desperado-Jia
[doc] polish shardformer doc (#4779) by Baizhou Zhang
[doc] add llama2 domain-specific solution news (#4789) by binmakeswell

Hotfix

[hotfix] fix the bug of repeatedly storing param group (#4951) by Baizhou Zhang
[hotfix] Fix the bug where process groups were not being properly released. (#4940) by littsk
[hotfix] fix torch 2.0 compatibility (#4936) by Hongxin Liu
[hotfix] fix lr scheduler bug in torch 2.0 (#4864) by Baizhou Zhang
[hotfix] fix bug in sequence parallel test (#4887) by littsk
[hotfix] Correct several erroneous code comments (#4794) by littsk
[hotfix] fix norm type error in zero optimizer (#4795) by littsk
[hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800) by Chandler-Bing

Kernels

[Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965) by Cuiqing Li

Inference

[Inference] Dynamic Batching Inference, online and offline (#4953) by Jianghai
[Inference]ADD Bench Chatglm2 script (#4963) by Jianghai
[inference] add reference and fix some bugs (#4937) by Xu Kai
[inference] Add smmoothquant for llama (#4904) by Xu Kai
[inference] add llama2 support (#4898) by Xu Kai
[inference]fix import bug and delete down useless init (#4830) by Jianghai

Test

[test] merge old components to test to model zoo (#4945) by Hongxin Liu
[test] add no master test for low level zero plugin (#4934) by Zhongkai Zhao
Merge pull request #4856 from KKZ20/test/modelsupportforlowlevel_zero by ppt0011
[test] modify model supporting part of lowlevelzero plugin (including correspoding docs) by Zhongkai Zhao

Refactor

[Refactor] Integrated some lightllm kernels into token-attention (#4946) by Cuiqing Li

Nfc

[nfc] fix some typo with colossalai/ docs/ etc. (#4920) by digger yu
[nfc] fix minor typo in README (#4846) by Blagoy Simandoff
[NFC] polish code style (#4799) by Camille Zhong
[NFC] polish colossalai/inference/quant/gptq/caigptq/init_.py code style (#4792) by Michelle

Format

[format] applied code formatting on changed files in pull request 4820 (#4886) by github-actions[bot]
[format] applied code formatting on changed files in pull request 4908 (#4918) by github-actions[bot]
[format] applied code formatting on changed files in pull request 4595 (#4602) by github-actions[bot]

Gemini

[gemini] support gradient accumulation (#4869) by Baizhou Zhang
[gemini] support amp o3 for gemini (#4872) by Hongxin Liu

Kernel

[kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) by Hongxin Liu

Feature

[feature] support no master weights option for low level zero plugin (#4816) by Zhongkai Zhao
[feature] Add clipgradnorm for hybridparallelplugin (#4837) by littsk
[feature] ColossalEval: Evaluation Pipeline for LLMs (#4786) by Yuanchen

Checkpointio

[checkpointio] hotfix torch 2.0 compatibility (#4824) by Hongxin Liu
[checkpointio] support unsharded checkpointIO for hybrid parallel (#4774) by Baizhou Zhang

Infer

[infer] fix test bug (#4838) by Xu Kai
[Infer] Serving example w/ ray-serve (multiple GPU case) (#4841) by Yuanheng Zhao
[Infer] Colossal-Inference serving example w/ TorchServe (single GPU case) (#4771) by Yuanheng Zhao

Chat

[chat] fix gemini strategy (#4698) by flybird11111

Misc

[misc] add last_epoch in CosineAnnealingWarmupLR (#4778) by Yan haixu

Lazy

[lazy] support from_pretrained (#4801) by Hongxin Liu

Fix

[fix] fix weekly runing example (#4787) by flybird11111

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.4...v0.3.3

- Python
Published by github-actions[bot] over 2 years ago

https://github.com/hpcaitech/colossalai - Version v0.3.3 Release Today!

What's Changed

Release

[release] update version (#4775) by Hongxin Liu

Inference

[inference] chatglm2 infer demo (#4724) by Jianghai

Feature

[feature] add gptq for inference (#4754) by Xu Kai
[Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) by Cuiqing Li

Bug

[bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) by littsk
[bug] fix getdefaultparser in examples (#4764) by Baizhou Zhang

Lazy

[lazy] support torch 2.0 (#4763) by Hongxin Liu

Chat

[chat]: add lora merge weights config (#4766) by Wenhao Chen
[chat]: update rm, add wandb and fix bugs (#4471) by Wenhao Chen

Doc

[doc] add shardformer doc to sidebar (#4768) by Baizhou Zhang
[doc] clean up outdated docs (#4765) by Hongxin Liu
Merge pull request #4757 from ppt0011/main by ppt0011
[doc] put native colossalai plugins first in description section by Pengtai Xu
[doc] add model examples for each plugin by Pengtai Xu
[doc] put individual plugin explanation in front by Pengtai Xu
[doc] explain suitable use case for each plugin by Pengtai Xu
[doc] explaination of loading large pretrained models (#4741) by Baizhou Zhang
[doc] polish shardformer doc (#4735) by Baizhou Zhang
[doc] add shardformer support matrix/update tensor parallel documents (#4728) by Baizhou Zhang
[doc] Add user document for Shardformer (#4702) by Baizhou Zhang
[doc] fix llama2 code link (#4726) by binmakeswell
[doc] add potential solution for OOM in llama2 example (#4699) by Baizhou Zhang
[doc] Update booster user documents. (#4669) by Baizhou Zhang

Shardformer

[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) by Baizhou Zhang
[shardformer] add custom policy in hybrid parallel plugin (#4718) by Xuanlei Zhao
[shardformer] update seq parallel document (#4730) by Bin Jia
[shardformer] update pipeline parallel document (#4725) by flybird11111
[shardformer] to fix whisper test failed due to significant accuracy differences. (#4710) by flybird11111
[shardformer] fix GPT2DoubleHeadsModel (#4703) by flybird11111
[shardformer] update shardformer readme (#4689) by flybird11111
[shardformer]fix gpt2 double head (#4663) by flybird11111
[shardformer] update llama2/opt finetune example and fix llama2 policy (#4645) by flybird11111
[shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624) by eric8607242

Misc

[misc] update pre-commit and run all files (#4752) by Hongxin Liu

Format

[format] applied code formatting on changed files in pull request 4743 (#4750) by github-actions[bot]
[format] applied code formatting on changed files in pull request 4726 (#4727) by github-actions[bot]

Legacy

[legacy] clean up legacy code (#4743) by Hongxin Liu
Merge pull request #4738 from ppt0011/main by ppt0011
[legacy] remove deterministic data loader test by Pengtai Xu
[legacy] move communication and nn to legacy and refactor logger (#4671) by Hongxin Liu

Kernel

[kernel] update triton init #4740 (#4740) by Xuanlei Zhao

Example

[example] llama2 add fine-tune example (#4673) by flybird11111
[example] add gpt2 HybridParallelPlugin example (#4653) by Bin Jia
[example] update vit example for hybrid parallel plugin (#4641) by Baizhou Zhang

Hotfix

[hotfix] Fix import error: colossal.kernel without triton installed (#4722) by Yuanheng Zhao
[hotfix] fix typo in hybrid parallel io (#4697) by Baizhou Zhang

Devops

[devops] fix concurrency group (#4667) by Hongxin Liu
[devops] fix concurrency group and compatibility test (#4665) by Hongxin Liu

Pipeline

[pipeline] set optimizer to optional in execute_pipeline (#4630) by Baizhou Zhang

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.3...v0.3.2

- Python
Published by github-actions[bot] almost 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.3.2 Release Today!

What's Changed

Release

[release] update version (#4623) by Hongxin Liu

Shardformer

Merge pull request #4612 from hpcaitech/feature/shardformer by Hongxin Liu
[shardformer] update shardformer readme (#4617) by flybird11111
[shardformer] Add overlap optional for HybridParallelPlugin (#4615) by Bin Jia
[shardformer] update bert finetune example with HybridParallelPlugin (#4584) by flybird11111
[shardformer] Pytree fix (#4533) by Jianghai
[shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) by Baizhou Zhang
[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) by Baizhou Zhang
[shardformer] fix submodule replacement bug when enabling pp (#4544) by Baizhou Zhang
[shardformer] support pp+tp+zero1 tests (#4531) by flybird11111
[shardformer] fix opt test hanging (#4521) by flybird11111
[shardformer] Add overlap support for gpt2 (#4535) by Bin Jia
[shardformer] fix emerged bugs after updating transformers (#4526) by Baizhou Zhang
[shardformer] zero1+pp and the corresponding tests (#4517) by Jianghai
[shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) by Baizhou Zhang
[shardformer] opt fix. (#4514) by flybird11111
[shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498) by flybird11111
[shardformer] tests for 3d parallel (#4493) by Jianghai
[shardformer] chatglm support sequence parallel (#4482) by flybird11111
[shardformer] support tp+zero for shardformer (#4472) by Baizhou Zhang
[shardformer] Pipeline/whisper (#4456) by Jianghai
[shardformer] bert support sequence parallel. (#4455) by flybird11111
[shardformer] bloom support sequence parallel (#4465) by flybird11111
[shardformer] support interleaved pipeline (#4448) by LuGY
[shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446) by Baizhou Zhang
[shardformer] fix import by ver217
[shardformer] fix embedding by ver217
[shardformer] update bloom/llama/vit/chatglm tests (#4420) by flybird11111
[shardformer]update t5 tests for using all optimizations. (#4407) by flybird11111
[shardformer] update tests for all optimization (#4413) by flybird11111
[shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395) by Baizhou Zhang
[shardformer]fix, test gpt2 for AMP+TP (#4403) by flybird11111
[shardformer] test all optimizations (#4399) by flybird1111
[shardformer] update shardformer to use flash attention 2 (#4392) by flybird1111
[Shardformer] Merge flash attention branch to pipeline branch (#4362) by flybird1111
[shardformer] add util functions for shardformer tests/fix syncsharedparam (#4366) by Baizhou Zhang
[shardformer] support Blip2 (#4243) by FoolPlayer
[shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit by klhhhhh
[shardformer] pre-commit check files by klhhhhh
[shardformer] register without auto policy by klhhhhh
[shardformer] ChatGLM support layernorm sharding by klhhhhh
[shardformer] delete some file by klhhhhh
[shardformer] support chatglm without layernorm by klhhhhh
[shardformer] polish code by klhhhhh
[shardformer] polish chatglm code by klhhhhh
[shardformer] add test kit in model zoo for chatglm by klhhhhh
[shardformer] vit test finish and support by klhhhhh
[shardformer] added tests by klhhhhh
Feature/chatglm (#4240) by Kun Lin
[shardformer] support whisper (#4212) by FoolPlayer
[shardformer] support SAM (#4231) by FoolPlayer
Feature/vit support (#4182) by Kun Lin
[shardformer] support pipeline base vit model (#4284) by FoolPlayer
[shardformer] support inplace sharding (#4251) by Hongxin Liu
[shardformer] fix base policy (#4229) by Hongxin Liu
[shardformer] support lazy init (#4202) by Hongxin Liu
[shardformer] fix type hint by ver217
[shardformer] rename policy file name by ver217

Legacy

[legacy] move builder and registry to legacy (#4603) by Hongxin Liu
[legacy] move engine to legacy (#4560) by Hongxin Liu
[legacy] move trainer to legacy (#4545) by Hongxin Liu

Test

[test] fix gemini checkpoint and gpt test (#4620) by Hongxin Liu
[test] ignore gpt2 shardformer test (#4619) by Hongxin Liu
[test] Hotfix/fix some model test and refactor check util api (#4369) by Bin Jia
[test] skip some not compatible models by FoolPlayer
[test] add shard util tests by ver217
[test] update shardformer tests by ver217
[test] remove useless tests (#4359) by Hongxin Liu

Zero

[zero] hotfix master param sync (#4618) by Hongxin Liu
[zero]fix zero ckptIO with offload (#4529) by LuGY
[zero]support zero2 with gradient accumulation (#4511) by LuGY

Checkpointio

[checkpointio] support huggingface from_pretrained for all plugins (#4606) by Baizhou Zhang
[checkpointio] optimize zero optim checkpoint io (#4591) by Hongxin Liu

Coati

Merge pull request #4542 from hpcaitech/chatglm by yingliu-hpc
Merge pull request #4541 from ver217/coati/chatglm by yingliu-hpc
[coati] update ci by ver217
[coati] add chatglm model (#4539) by yingliu-hpc

Doc

[doc] add llama2 benchmark (#4604) by binmakeswell
[DOC] hotfix/llama2news (#4595) by binmakeswell
[doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430) by Tian Siyuan
[doc] update Coati README (#4405) by Wenhao Chen
[doc] add Series A Funding and NeurIPS news (#4377) by binmakeswell
[doc] Fix gradient accumulation doc. (#4349) by flybird1111

Pipeline

[pipeline] 1f1b schedule receive microbatch size (#4589) by Hongxin Liu
[pipeline] rewrite bert tests and fix some bugs (#4409) by Jianghai
[pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) by Baizhou Zhang
[pipeline] add chatglm (#4363) by Jianghai
[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) by Baizhou Zhang
[pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) by Jianghai
[pipeline] add unit test for 1f1b (#4303) by LuGY
[pipeline] fix returndict/fix purepipeline_test (#4331) by Baizhou Zhang
[pipeline] add pipeline support for all T5 models (#4310) by Baizhou Zhang
[pipeline] test pure pipeline process using llama (#4218) by Jianghai
[pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) by Baizhou Zhang
[pipeline] reformat for unified design (#4283) by Jianghai
[pipeline] OPT model pipeline (#4258) by Jianghai
[pipeline] refactor gpt2 pipeline forwards (#4287) by Baizhou Zhang
[pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) by Baizhou Zhang
[pipeline] finish bloom models pipeline and tests (#4223) by Jianghai
[pipeline] All bert models (#4233) by Jianghai
[pipeline] add pipeline forward for variants of gpt2 (#4238) by Baizhou Zhang
[pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224) by Baizhou Zhang
[pipeline] add bloom model pipeline (#4210) by Jianghai
[pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) by Jianghai
[pipeline] Llama pipeline (#4205) by Jianghai
[pipeline] Bert pipeline for shardformer and its tests (#4197) by Jianghai
[pipeline] move bert related pipeline components to shardformer (#4187) by Jianghai
[pipeline] add bertforpretraining bert_lmhead forward and policy (#4172) by Jianghai
[pipeline] update shardformer docstring by ver217
[pipeline] update shardformer policy by ver217
[pipeline] build bloom model and policy , revise the base class of policy (#4161) by Jianghai
[pipeline]add pipeline policy and bert forward (#4130) by Jianghai
[pipeline] add stage manager (#4093) by Hongxin Liu
[pipeline]add pipeline policy and bert forward (#4130) by Jianghai
[pipeline] refactor 1f1b schedule (#4115) by Hongxin Liu
[pipeline] implement p2p communication (#4100) by Hongxin Liu
[pipeline] add stage manager (#4093) by Hongxin Liu

Fix

[Fix] Fix compile error (#4357) by Mashiro
[fix] coloattention support flash attention 2 (#4347) by flybird1111

Devops

[devops] cancel previous runs in the PR (#4546) by Hongxin Liu
[devops] add large-scale distributed test marker (#4452) by Hongxin Liu

Example

[example] change accelerate version (#4431) by Tian Siyuan
[example] update streamlit 0.73.1 to 1.11.1 (#4386) by ChengDaqi2023
[example] add llama2 example (#4527) by Hongxin Liu

Shardformer/fix overlap bug

[shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516) by Bin Jia

Format

[format] applied code formatting on changed files in pull request 4479 (#4504) by github-actions[bot]
[format] applied code formatting on changed files in pull request 4441 (#4445) by github-actions[bot]

Gemini

[gemini] improve compatibility and add static placement policy (#4479) by Hongxin Liu
[gemini] fix tensor storage cleaning in state dict collection (#4396) by Baizhou Zhang

Shardformer/sequence parallel

[shardformer/sequence parallel] not support opt of seq-parallel, add warning and fix a bug in gpt2 pp (#4488) by Bin Jia
[shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460) by Bin Jia
[shardformer/sequence parallel] Cherry pick commit to new branch (#4450) by Bin Jia

Chat

[chat] update config and prompt (#4139) by Michelle
[chat] fix bugs and add unit tests (#4213) by Wenhao Chen

Misc

[misc] update requirements by ver217
[misc] resolve code factor issues (#4433) by Hongxin Liu

Sharformer

[sharformer] add first version of policy of chatglm by klhhhhh

Hotfix

[hotfix] fix gemini and zero test (#4333) by Hongxin Liu
[hotfix] fix opt pipeline (#4293) by Jianghai
[hotfix] fix unsafe async comm in zero (#4404) by LuGY
[hotfix] update gradio 3.11 to 3.34.0 (#4329) by caption

Plugin

[plugin] add 3d parallel plugin (#4295) by Hongxin Liu

Bugs

[bugs] hot fix some testing bugs for new models (#4268) by Jianghai

Cluster

[cluster] add process group mesh (#4039) by Hongxin Liu

Kernel

[kernel] updated unittests for coloattention (#4389) by flybird1111

Coloattention

[coloattention] fix import error (#4380) by flybird1111

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.2...v0.3.1

- Python
Published by github-actions[bot] almost 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.3.1 Release Today!

What's Changed

Release

[release] update version (#4332) by Hongxin Liu

Chat

[chat] fix computeapproxkl (#4338) by Wenhao Chen
[chat] removed cache file (#4155) by Frank Lee
[chat] use official transformers and fix some issues (#4117) by Wenhao Chen
[chat] remove naive strategy and split colossalai strategy (#4094) by Wenhao Chen
[chat] refactor trainer class (#4080) by Wenhao Chen
[chat]: fix chat evaluation possible bug (#4064) by Michelle
[chat] refactor strategy class with booster api (#3987) by Wenhao Chen
[chat] refactor actor class (#3968) by Wenhao Chen
[chat] add distributed PPO trainer (#3740) by Hongxin Liu

Zero

[zero] optimize the optimizer step time (#4221) by LuGY
[zero] support shard optimizer state dict of zero (#4194) by LuGY
[zero] add state dict for low level zero (#4179) by LuGY
[zero] allow passing process group to zero12 (#4153) by LuGY
[zero]support no_sync method for zero1 plugin (#4138) by LuGY
[zero] refactor low level zero for shard evenly (#4030) by LuGY

Nfc

[NFC] polish applications/Chat/coati/models/utils.py codestyle (#4277) by yuxuan-lou
[NFC] polish applications/Chat/coati/trainer/strategies/base.py code style (#4278) by Zirui Zhu
[NFC] polish applications/Chat/coati/models/generation.py code style (#4275) by RichardoLuo
[NFC] polish applications/Chat/inference/server.py code style (#4274) by Yuanchen
[NFC] fix format of application/Chat/coati/trainer/utils.py (#4273) by アマデウス
[NFC] polish applications/Chat/examples/trainrewardmodel.py code style (#4271) by Xu Kai
[NFC] fix: format (#4270) by dayellow
[NFC] polish runtimepreparationpass style (#4266) by Wenhao Chen
[NFC] polish unaryelementwisegenerator.py code style (#4267) by YeAnbang
[NFC] polish applications/Chat/coati/trainer/base.py code style (#4260) by shenggan
[NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (#4259) by Zheng Zangwei (Alex Zheng)
[NFC] polish colossalai/booster/plugin/lowlevelzero_plugin.py code style (#4256) by 梁爽
[NFC] polish colossalai/autoparallel/offload/ampoptimizer.py code style (#4255) by Yanjia0
[NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) by ocdwithnaming
[NFC] policy applications/Chat/examples/ray/mmmt_prompt.py code style (#4250) by CZYCW
[NFC] polish applications/Chat/coati/models/base/actor.py code style (#4248) by Junming Wu
[NFC] polish applications/Chat/inference/requirements.txt code style (#4265) by Camille Zhong
[NFC] Fix format for mixed precision (#4253) by Jianghai
[nfc]fix ColossalaiOptimizer is not defined (#4122) by digger yu
[nfc] fix dim not defined and fix typo (#3991) by digger yu
[nfc] fix typo colossalai/zero (#3923) by digger yu
[nfc]fix typo colossalai/pipeline tensor nn (#3899) by digger yu
[nfc] fix typo colossalai/nn (#3887) by digger yu
[nfc] fix typo colossalai/cli fx kernel (#3847) by digger yu

Example

Fix/format (#4261) by Michelle
[example] add llama pretraining (#4257) by binmakeswell
[example] fix bucket size in example of gpt gemini (#4028) by LuGY
[example] update ViT example using booster api (#3940) by Baizhou Zhang
Merge pull request #3905 from MaruyamaAya/dreambooth by Liu Ziming
[example] update opt example using booster api (#3918) by Baizhou Zhang
[example] Modify palm example with the new booster API (#3913) by Liu Ziming
[example] update gemini examples (#3868) by jiangmingyan

Ci

[ci] support testmon core pkg change detection (#4305) by Hongxin Liu

Checkpointio

[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302) by Baizhou Zhang
Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141) by Baizhou Zhang
[checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) by Baizhou Zhang
[checkpointio] General Checkpointing of Sharded Optimizers (#3984) by Baizhou Zhang

Lazy

[lazy] support init on cuda (#4269) by Hongxin Liu
[lazy] fix compatibility problem on torch 1.13 (#3911) by Hongxin Liu
[lazy] refactor lazy init (#3891) by Hongxin Liu

Kernels

[Kernels] added triton-implemented of self attention for colossal-ai (#4241) by Cuiqing Li

Docker

[docker] fixed ninja build command (#4203) by Frank Lee
[docker] added ssh and rdma support for docker (#4192) by Frank Lee

Dtensor

[dtensor] fixed readme file name and removed deprecated file (#4162) by Frank Lee
[dtensor] updated api and doc (#3845) by Frank Lee

Workflow

[workflow] show test duration (#4159) by Frank Lee
[workflow] added status check for test coverage workflow (#4106) by Frank Lee
[workflow] cover all public repositories in weekly report (#4069) by Frank Lee
[workflow] fixed the directory check in build (#3980) by Frank Lee
[workflow] cancel duplicated workflow jobs (#3960) by Frank Lee
[workflow] cancel duplicated workflow jobs (#3960) by Frank Lee
[workflow] added docker latest tag for release (#3920) by Frank Lee
[workflow] fixed workflow check for docker build (#3849) by Frank Lee

Cli

[cli] hotfix launch command for multi-nodes (#4165) by Hongxin Liu

Format

[format] applied code formatting on changed files in pull request 4152 (#4157) by github-actions[bot]
[format] applied code formatting on changed files in pull request 4021 (#4022) by github-actions[bot]

Shardformer

[shardformer] added development protocol for standardization (#4149) by Frank Lee
[shardformer] made tensor parallelism configurable (#4144) by Frank Lee
[shardformer] refactored some doc and api (#4137) by Frank Lee
[shardformer] write an shardformer example with bert finetuning (#4126) by jiangmingyan
[shardformer] added embedding gradient check (#4124) by Frank Lee
[shardformer] import huggingface implicitly (#4101) by Frank Lee
[shardformer] integrate with data parallelism (#4103) by Frank Lee
[shardformer] supported fused normalization (#4112) by Frank Lee
[shardformer] supported bloom model (#4098) by Frank Lee
[shardformer] support vision transformer (#4096) by Kun Lin
[shardformer] shardformer support opt models (#4091) by jiangmingyan
[shardformer] refactored layernorm (#4086) by Frank Lee
[shardformer] Add layernorm (#4072) by FoolPlayer
[shardformer] supported fused qkv checkpoint (#4073) by Frank Lee
[shardformer] add linearconv1d test (#4067) by FoolPlayer
[shardformer] support module saving and loading (#4062) by Frank Lee
[shardformer] refactored the shardformer layer structure (#4053) by Frank Lee
[shardformer] adapted T5 and LLaMa test to use kit (#4049) by Frank Lee
[shardformer] add gpt2 test and layer class refactor (#4041) by FoolPlayer
[shardformer] supported T5 and its variants (#4045) by Frank Lee
[shardformer] adapted llama to the new API (#4036) by Frank Lee
[shardformer] fix bert and gpt downstream with new api (#4024) by FoolPlayer
[shardformer] updated doc (#4016) by Frank Lee
[shardformer] removed inplace tensor sharding (#4018) by Frank Lee
[shardformer] refactored embedding and dropout to parallel module (#4013) by Frank Lee
[shardformer] integrated linear 1D with dtensor (#3996) by Frank Lee
[shardformer] Refactor shardformer api (#4001) by FoolPlayer
[shardformer] fix an error in readme (#3988) by FoolPlayer
[Shardformer] Downstream bert (#3979) by FoolPlayer
[shardformer] shardformer support t5 model (#3994) by wukong1992
[shardformer] support llama model using shardformer (#3969) by wukong1992
[shardformer] Add dropout layer in shard model and refactor policy api (#3949) by FoolPlayer
[shardformer] Unit test (#3928) by FoolPlayer
[shardformer] Align bert value (#3907) by FoolPlayer
[shardformer] add gpt2 policy and modify shard and slicer to support (#3883) by FoolPlayer
[shardformer] add Dropout layer support different dropout pattern (#3856) by FoolPlayer
[shardformer] update readme with modules implement doc (#3834) by FoolPlayer
[shardformer] refactored the user api (#3828) by Frank Lee
[shardformer] updated readme (#3827) by Frank Lee
[shardformer]: Feature/shardformer, add some docstring and readme (#3816) by FoolPlayer
[shardformer] init shardformer code structure (#3731) by FoolPlayer
[shardformer] add gpt2 policy and modify shard and slicer to support (#3883) by FoolPlayer
[shardformer] add Dropout layer support different dropout pattern (#3856) by FoolPlayer
[shardformer] update readme with modules implement doc (#3834) by FoolPlayer
[shardformer] refactored the user api (#3828) by Frank Lee
[shardformer] updated readme (#3827) by Frank Lee
[shardformer]: Feature/shardformer, add some docstring and readme (#3816) by FoolPlayer
[shardformer] init shardformer code structure (#3731) by FoolPlayer

Test

[test] fixed tests failed due to dtensor change (#4082) by Frank Lee
[test] fixed codefactor format report (#4026) by Frank Lee

Device

[device] support init device mesh from process group (#3990) by Frank Lee

Hotfix

[hotfix] fix import bug in checkpoint_io (#4142) by Baizhou Zhang
[hotfix]fix argument naming in docs and examples (#4083) by Baizhou Zhang

Doc

[doc] update and revise some typos and errs in docs (#4107) by Jianghai
[doc] add a note about unit-testing to CONTRIBUTING.md (#3970) by Baizhou Zhang
[doc] add lazy init tutorial (#3922) by Hongxin Liu
[doc] fix docs about booster api usage (#3898) by Baizhou Zhang
[doc]update moe chinese document. (#3890) by jiangmingyan
[doc] update document of zero with chunk. (#3855) by jiangmingyan
[doc] update nvme offload documents. (#3850) by jiangmingyan

Examples

[examples] copy resnet example to image (#4090) by Jianghai

Testing

[testing] move pytest to be inside the function (#4087) by Frank Lee

Gemini

Merge pull request #4056 from Fridge003/hotfix/fixgeminichunkconfigsearching by Baizhou Zhang
[gemini] fix argument naming during chunk configuration searching by Baizhou Zhang
[gemini] fixed the gemini checkpoint io (#3934) by Frank Lee
[gemini] fixed the gemini checkpoint io (#3934) by Frank Lee

Devops

[devops] fix build on pr ci (#4043) by Hongxin Liu
[devops] update torch version in compability test (#3919) by Hongxin Liu
[devops] hotfix testmon cache clean logic (#3917) by Hongxin Liu
[devops] hotfix CI about testmon cache (#3910) by Hongxin Liu
[devops] improving testmon cache (#3902) by Hongxin Liu

Sync

Merge pull request #4025 from hpcaitech/develop by Frank Lee
Merge pull request #3967 from ver217/update-develop by Frank Lee
Merge pull request #3942 from hpcaitech/revert-3931-sync/develop-to-shardformer by FoolPlayer
Revert "[sync] sync feature/shardformer with develop" by Frank Lee
Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer by FoolPlayer
Merge pull request #3916 from FrankLeeeee/sync/dtensor-with-develop by Frank Lee
Merge pull request #3915 from FrankLeeeee/update/develop by Frank Lee

Booster

[booster] make optimizer argument optional for boost (#3993) by Wenhao Chen
[booster] update bert example, using booster api (#3885) by wukong1992

Evaluate

[evaluate] support gpt evaluation with reference (#3972) by Yuanchen

Feature

Merge pull request #3926 from hpcaitech/feature/dtensor by Frank Lee

Bf16

[bf16] add bf16 support (#3882) by Hongxin Liu

Evaluation

[evaluation] improvement on evaluation (#3862) by Yuanchen

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.1...v0.3.0

- Python
Published by github-actions[bot] almost 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.3.0 Release Today!

What's Changed

Release

[release] bump to v0.3.0 (#3830) by Frank Lee

Nfc

[nfc] fix typo colossalai/ applications/ (#3831) by digger yu
[NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779) by digger yu
[NFC] fix typo colossalai/amp auto_parallel autochunk (#3756) by digger yu
[NFC] fix typo with colossalai/autoparallel/tensorshard (#3742) by digger yu
[NFC] fix typo applications/ and colossalai/ (#3735) by digger-yu
[NFC] polish colossalai/engine/gradienthandler/init_.py code style (#3329) by Ofey Chan
[NFC] polish colossalai/context/random/init.py code style (#3327) by yuxuan-lou
[NFC] polish colossalai/fx/tracer/tracerutils.py (#3323) by Michelle
[NFC] polish colossalai/gemini/paramhooks/paramhookmgr.py code style by Xu Kai
[NFC] polish initializer_data.py code style (#3287) by RichardoLuo
[NFC] polish colossalai/cli/benchmark/models.py code style (#3290) by Ziheng Qin
[NFC] polish initializer_3d.py code style (#3279) by Kai Wang (Victor Kai)
[NFC] polish colossalai/engine/gradientaccumulation/gradient_accumulation.py code style (#3277) by Sze-qq
[NFC] polish colossalai/context/parallel_context.py code style (#3276) by Arsmart1
[NFC] polish colossalai/engine/schedule/pipelineschedule_v2.py code style (#3275) by Zirui Zhu
[NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) by Tong Li
[NFC] polish colossalai/amp/init.py code style (#3272) by lucasliunju
[NFC] polish code style (#3273) by Xuanlei Zhao
[NFC] policy colossalai/fx/proxy.py code style (#3269) by CZYCW
[NFC] polish code style (#3268) by Yuanchen
[NFC] polish tensorplacementpolicy.py code style (#3265) by Camille Zhong
[NFC] polish colossalai/fx/passes/split_module.py code style (#3263) by CsRic
[NFC] polish colossalai/global_variables.py code style (#3259) by jiangmingyan
[NFC] polish colossalai/engine/gradienthandler/moegradienthandler.py (#3260) by LuGY
[NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256) by dayellow

Doc

[doc] update document of gemini instruction. (#3842) by jiangmingyan
Merge pull request #3810 from jiangmingyan/amp by jiangmingyan
[doc]fix by jiangmingyan
[doc]fix by jiangmingyan
[doc] add warning about fsdp plugin (#3813) by Hongxin Liu
[doc] add removed change of config.py by jiangmingyan
[doc] add removed warning by jiangmingyan
[doc] update amp document by Mingyan Jiang
[doc] update amp document by Mingyan Jiang
[doc] update amp document by Mingyan Jiang
[doc] update gradient accumulation (#3771) by jiangmingyan
[doc] update gradient cliping document (#3778) by jiangmingyan
[doc] add deprecated warning on doc Basics section (#3754) by Yanjia0
[doc] add booster docstring and fix autodoc (#3789) by Hongxin Liu
[doc] add tutorial for booster checkpoint (#3785) by Hongxin Liu
[doc] add tutorial for booster plugins (#3758) by Hongxin Liu
[doc] add tutorial for cluster utils (#3763) by Hongxin Liu
[doc] update hybrid parallelism doc (#3770) by jiangmingyan
[doc] update booster tutorials (#3718) by jiangmingyan
[doc] fix chat spelling error (#3671) by digger-yu
[Doc] enhancement on README.md for chat examples (#3646) by Camille Zhong
[doc] Fix typo under colossalai and doc(#3618) by digger-yu
[doc] .github/workflows/README.md (#3605) by digger-yu
[doc] fix setup.py typo (#3603) by digger-yu
[doc] fix op_builder/README.md (#3597) by digger-yu
[doc] Update .github/workflows/README.md (#3577) by digger-yu
[doc] Update 1Dtensorparallel.md (#3573) by digger-yu
[doc] Update 1Dtensorparallel.md (#3563) by digger-yu
[doc] Update README.md (#3549) by digger-yu
[doc] Update README-zh-Hans.md (#3541) by digger-yu
[doc] hide diffusion in application path (#3519) by binmakeswell
[doc] add requirement and highlight application (#3516) by binmakeswell
[doc] Add docs for clip args in zero optim (#3504) by YH
[doc] updated contributor list (#3474) by Frank Lee
[doc] polish diffusion example (#3386) by Jan Roudaut
[doc] add Intel cooperation news (#3333) by binmakeswell
[doc] added authors to the chat application (#3307) by Fazzie-Maqianli

Workflow

[workflow] supported test on CUDA 10.2 (#3841) by Frank Lee
[workflow] fixed testmon cache in build CI (#3806) by Frank Lee
[workflow] changed to doc build to be on schedule and release (#3825) by Frank Lee
[workflow] enblaed doc build from a forked repo (#3815) by Frank Lee
[workflow] enable testing for develop & feature branch (#3801) by Frank Lee
[workflow] fixed the docker build workflow (#3794) by Frank Lee

Booster

[booster] add warning for torch fsdp plugin doc (#3833) by wukong1992
[booster] torch fsdp fix ckpt (#3788) by wukong1992
[booster] removed models that don't support fsdp (#3744) by wukong1992
[booster] support torch fsdp plugin in booster (#3697) by wukong1992
[booster] add tests for ddp and low level zero's checkpointio (#3715) by jiangmingyan
[booster] fix no_sync method (#3709) by Hongxin Liu
[booster] update prepare dataloader method for plugin (#3706) by Hongxin Liu
[booster] refactor all dp fashion plugins (#3684) by Hongxin Liu
[booster] gemini plugin support shard checkpoint (#3610) by jiangmingyan
[booster] add low level zero plugin (#3594) by Hongxin Liu
[booster] fixed the torch ddp plugin with the new checkpoint api (#3442) by Frank Lee
[booster] implement Gemini plugin (#3352) by ver217

Docs

[docs] change placememtpolicy to placementpolicy (#3829) by digger yu

Evaluation

[evaluation] add automatic evaluation pipeline (#3821) by Yuanchen

Docker

[Docker] Fix a couple of build issues (#3691) by Yanming W
Fix/docker action (#3266) by liuzeming

Api

[API] add docstrings and initialization to apex amp, naive amp (#3783) by jiangmingyan

Test

[test] fixed lazy init test import error (#3799) by Frank Lee
Update test_ci.sh by Camille Zhong
[test] refactor tests with spawn (#3452) by Frank Lee
[test] reorganize zero/gemini tests (#3445) by ver217
[test] fixed gemini plugin test (#3411) by Frank Lee

Format

[format] applied code formatting on changed files in pull request 3786 (#3787) by github-actions[bot]
[format] Run lint on colossalai.engine (#3367) by Hakjin Lee

Plugin

[plugin] a workaround for zero plugins' optimizer checkpoint (#3780) by Hongxin Liu
[plugin] torch ddp plugin supports sharded model checkpoint (#3775) by Hongxin Liu

Chat

[chat] add performance and tutorial (#3786) by binmakeswell
[chat] fix bugs in stage 3 training (#3759) by Yuanchen
[chat] fix community example ray (#3719) by MisterLin1995
[chat] fix train_prompts.py gemini strategy bug (#3666) by zhang-yi-chi
[chat] PPO stage3 doc enhancement (#3679) by Camille Zhong
[chat] add opt attn kernel (#3655) by Hongxin Liu
[chat] typo accimulationsteps -> accumulationsteps (#3662) by tanitna
Merge pull request #3656 from TongLi3701/chat/update_eval by Tong Li
[chat] set default zero2 strategy (#3667) by binmakeswell
[chat] refactor model save/load logic (#3654) by Hongxin Liu
[chat] remove lm model class (#3653) by Hongxin Liu
[chat] refactor trainer (#3648) by Hongxin Liu
[chat] polish performance evaluator (#3647) by Hongxin Liu
Merge pull request #3621 from zhang-yi-chi/fix/chat-train-prompts-single-gpu by Tong Li
[Chat] Remove duplicate functions (#3625) by ddobokki
[chat] fix enable single gpu training bug by zhang-yi-chi
[chat] polish code note typo (#3612) by digger-yu
[chat] update reward model sh (#3578) by binmakeswell
[chat] ChatGPT train prompts on ray example (#3309) by MisterLin1995
[chat] polish tutorial doc (#3551) by binmakeswell
[chat]add examples of training with limited resources in chat readme (#3536) by Yuanchen
[chat]: add vf_coef argument for PPOTrainer (#3318) by zhang-yi-chi
[chat] add zero2 cpu strategy for sft training (#3520) by ver217
[chat] fix stage3 PPO sample sh command (#3477) by binmakeswell
[Chat]Add Peft support & fix the ptx bug (#3433) by YY Lin
[chat]fix save_model(#3377) by Dr-Corgi
[chat]fix readme (#3429) by kingkingofall
[Chat] fix the tokenizer "int too big to convert" error in SFT training (#3453) by Camille Zhong
[chat]fix sft training for bloom, gpt and opt (#3418) by Yuanchen
[chat] correcting a few obvious typos and grammars errors (#3338) by Andrew

Devops

[devops] fix doc test on pr (#3782) by Hongxin Liu
[devops] fix ci for document check (#3751) by Hongxin Liu
[devops] make build on PR run automatically (#3748) by Hongxin Liu
[devops] update torch version of CI (#3725) by Hongxin Liu
[devops] fix chat ci (#3628) by Hongxin Liu

Amp

[amp] Add naive amp demo (#3774) by jiangmingyan

Auto

[auto] fix install cmd (#3772) by binmakeswell

Fix

[fix] Add init to fix import error when importing _analyzer (#3668) by Ziyue Jiang

Ci

[CI] fix typo with tests/ etc. (#3727) by digger-yu
[CI] fix typo with tests components (#3695) by digger-yu
[CI] fix some spelling errors (#3707) by digger-yu
[CI] Update testshardedoptimwithsync_bn.py (#3688) by digger-yu

Example

[example] add train resnet/vit with booster example (#3694) by Hongxin Liu
[example] add finetune bert with booster example (#3693) by Hongxin Liu
[example] fix community doc (#3586) by digger-yu
[example] reorganize for community examples (#3557) by binmakeswell
[example] remove redundant texts & update roberta (#3493) by mandoxzhang
[example] update roberta with newer ColossalAI (#3472) by mandoxzhang
[example] update examples related to zero/gemini (#3431) by ver217

Tensor

[tensor] Refactor handletransspec in DistSpecManager by YH

Zero

[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. (#3173) by YH
[zero] reorganize zero/gemini folder structure (#3424) by ver217

Gemini

[gemini] accelerate inference (#3641) by Hongxin Liu
[gemini] state dict supports fp16 (#3590) by Hongxin Liu
[gemini] support save state dict in shards (#3581) by Hongxin Liu
[gemini] gemini supports lazy init (#3379) by Hongxin Liu

Bot

[bot] Automated submodule synchronization (#3596) by github-actions[bot]

Misc

[misc] op_builder/builder.py (#3593) by digger-yu
[misc] add verbose arg for zero and op builder (#3552) by Hongxin Liu

Coati

[coati] fix install cmd (#3592) by binmakeswell
[coati] add costom model suppor tguide (#3579) by Fazzie-Maqianli
[coati] Fix LlamaCritic (#3475) by gongenlei

Fx

[fx] fix meta tensor registration (#3589) by Hongxin Liu

Chatgpt

[chatgpt] Detached PPO Training (#3195) by csric
[chatgpt] add pre-trained model RoBERTa for RLHF stage 2 & 3 (#3223) by Camille Zhong

Lazyinit

[lazyinit] fix clone and deepcopy (#3553) by Hongxin Liu

Checkpoint

[checkpoint] Shard saved checkpoint need to be compatible with the naming format of hf checkpoint files (#3479) by jiangmingyan
[checkpoint] support huggingface style sharded checkpoint (#3461) by jiangmingyan
[checkpoint] refactored the API and added safetensors support (#3427) by Frank Lee

Chat community

[Chat Community] Update README.md (fixed#3487) (#3506) by NatalieC323

Dreambooth

Revert "[dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378)" (#3481) by NatalieC323
[dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378) by NatalieC323

Autoparallel

[autoparallel]integrate auto parallel feature with new tracer (#3408) by YuliangLiu0306
[autoparallel] adapt autoparallel with new analyzer (#3261) by YuliangLiu0306

Moe

[moe] add checkpoint for moe models (#3354) by HELSON

Hotfix

[hotfix] metatensorcompatibilitywithtorch2 by YuliangLiu0306

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.0...v0.2.8

- Python
Published by github-actions[bot] about 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.2.8 Release Today!

What's Changed

Release

[release] v0.2.8 (#3305) by Frank Lee

Format

[format] applied code formatting on changed files in pull request 3300 (#3302) by github-actions[bot]
[format] applied code formatting on changed files in pull request 3296 (#3298) by github-actions[bot]

Doc

[doc] add ColossalChat news (#3304) by binmakeswell
[doc] add ColossalChat (#3297) by binmakeswell
[doc] fix typo (#3222) by binmakeswell
[doc] update chatgpt doc paper link (#3229) by Camille Zhong
[doc] add community contribution guide (#3153) by binmakeswell
[doc] add Intel cooperation for biomedicine (#3108) by binmakeswell

Application

[application] updated the README (#3301) by Frank Lee

Chat

[chat]polish prompts training (#3300) by BlueRum
[chat]Update Readme (#3296) by BlueRum

Coati

[coati] fix inference profanity check (#3299) by ver217
[coati] inference supports profanity check (#3295) by ver217
[coati] add repetition_penalty for inference (#3294) by ver217
[coati] fix inference output (#3285) by ver217
[Coati] first commit (#3283) by Fazzie-Maqianli

Colossalchat

[ColossalChat]add cite for datasets (#3292) by Fazzie-Maqianli

Examples

[examples] polish AutoParallel readme (#3270) by YuliangLiu0306
[examples] Solving the diffusion issue of incompatibility issue#3169 (#3170) by NatalieC323

Fx

[fx] meta registration compatibility (#3253) by HELSON
[FX] refactor experimental tracer and adapt it with hf models (#3157) by YuliangLiu0306

Booster

[booster] implemented the torch ddd + resnet example (#3232) by Frank Lee
[booster] implemented the cluster module (#3191) by Frank Lee
[booster] added the plugin base and torch ddp plugin (#3180) by Frank Lee
[booster] added the accelerator implementation (#3159) by Frank Lee
[booster] implemented mixed precision class (#3151) by Frank Lee

Ci

[CI] Fix pre-commit workflow (#3238) by Hakjin Lee

Api

[API] implement device mesh manager (#3221) by YuliangLiu0306
[api] implemented the checkpoint io module (#3205) by Frank Lee

Hotfix

[hotfix] skip torchaudio tracing test (#3211) by YuliangLiu0306
[hotfix] layout converting issue (#3188) by YuliangLiu0306

Chatgpt

[chatgpt] add precision option for colossalai (#3233) by ver217
[chatgpt] unnify datasets (#3218) by Fazzie-Maqianli
[chatgpt] support instuct training (#3216) by Fazzie-Maqianli
[chatgpt]add reward model code for deberta (#3199) by Yuanchen
[chatgpt]support llama (#3070) by Fazzie-Maqianli
[chatgpt] add supervised learning fine-tune code (#3183) by pgzhang
[chatgpt]Reward Model Training Process update (#3133) by BlueRum
[chatgpt] fix trainer generate kwargs (#3166) by ver217
[chatgpt] fix ppo training hanging problem with gemini (#3162) by ver217
[chatgpt]update ci (#3087) by BlueRum
[chatgpt]Fix examples (#3116) by BlueRum
[chatgpt] fix lora support for gpt (#3113) by BlueRum
[chatgpt] type miss of kwargs (#3107) by hiko2MSP
[chatgpt] fix lora save bug (#3099) by BlueRum

Lazyinit

[lazyinit] combine lazy tensor with dtensor (#3204) by ver217
[lazyinit] add correctness verification (#3147) by ver217
[lazyinit] refactor lazy tensor and lazy init ctx (#3131) by ver217

Auto

[auto] fix requirements typo for issue #3125 (#3209) by Yan Fang

Analyzer

[Analyzer] fix analyzer tests (#3197) by YuliangLiu0306

Dreambooth

[dreambooth] fixing the incompatibity in requirements.txt (#3190) by NatalieC323

Auto-parallel

[auto-parallel] add auto-offload feature (#3154) by Zihao

Zero

[zero] Refactor ZeroContextConfig class using dataclass (#3186) by YH

Test

[test] fixed torchrec registration in model zoo (#3177) by Frank Lee
[test] fixed torchrec model test (#3167) by Frank Lee
[test] add torchrec models to test model zoo (#3139) by YuliangLiu0306
[test] added transformers models to test model zoo (#3135) by Frank Lee
[test] added torchvision models to test model zoo (#3132) by Frank Lee
[test] added timm models to test model zoo (#3129) by Frank Lee

Refactor

[refactor] update docs (#3174) by Saurav Maheshkar

Tests

[tests] model zoo add torchaudio models (#3138) by ver217
[tests] diffuser models in model zoo (#3136) by HELSON

Docker

[docker] Add opencontainers image-spec to Dockerfile (#3006) by Saurav Maheshkar

Dtensor

[DTensor] refactor dtensor with new components (#3089) by YuliangLiu0306

Workflow

[workflow] purged extension cache before GPT test (#3128) by Frank Lee

Autochunk

[autochunk] support complete benchmark (#3121) by Xuanlei Zhao

Tutorial

[tutorial] update notes for TransformerEngine (#3098) by binmakeswell

Nvidia

[NVIDIA] Add FP8 example using TE (#3080) by Kirthi Shankar Sivamani

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.8...v0.2.7

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.2.6 Release Today!

What's Changed

Release

[release] v0.2.6 (#3057) by Frank Lee

Doc

[doc] moved doc test command to bottom (#3075) by Frank Lee
[doc] specified operating system requirement (#3019) by Frank Lee
[doc] update nvme offload doc (#3014) by ver217
[doc] add ISC tutorial (#2997) by binmakeswell
[doc] add deepspeed citation and copyright (#2996) by ver217
[doc] added reference to related works (#2994) by Frank Lee
[doc] update news (#2983) by binmakeswell
[doc] fix chatgpt inference typo (#2964) by binmakeswell
[doc] add env scope (#2933) by binmakeswell
[doc] added readme for documentation (#2935) by Frank Lee
[doc] removed read-the-docs (#2932) by Frank Lee
[doc] update installation for GPT (#2922) by binmakeswell
[doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
[doc] fix GPT tutorial (#2860) by dawei-wang
[doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
[doc] update OPT serving (#2804) by binmakeswell
[doc] update example and OPT serving link (#2769) by binmakeswell
[doc] add opt service doc (#2747) by Frank Lee
[doc] fixed a typo in GPT readme (#2736) by cloudhuang
[doc] updated documentation version list (#2730) by Frank Lee

Workflow

[workflow] fixed doc build trigger condition (#3072) by Frank Lee
[workflow] supported conda package installation in doc test (#3028) by Frank Lee
[workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
[workflow] added auto doc test on PR (#2929) by Frank Lee
[workflow] moved pre-commit to post-commit (#2895) by Frank Lee

Booster

[booster] init module structure and definition (#3056) by Frank Lee

Example

[example] fix redundant note (#3065) by binmakeswell
[example] fixed opt model downloading from huggingface by Tomek
[example] add LoRA support (#2821) by Haofan Wang

Autochunk

[autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao

Chatgpt

[chatgpt] change critic input as state (#3042) by wenjunyang
[chatgpt] fix readme (#3025) by BlueRum
[chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
[chatgpt]fix inference model load (#2988) by BlueRum
[chatgpt] allow shard init and display warning (#2986) by ver217
[chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
[chatgpt] making experience support dp (#2971) by ver217
[chatgpt]fix lora bug (#2974) by BlueRum
[chatgpt] fix inference demo loading bug (#2969) by BlueRum
[ChatGPT] fix README (#2966) by Fazzie-Maqianli
[chatgpt]add inference example (#2944) by BlueRum
[chatgpt]support opt & gpt for rm training (#2876) by BlueRum
[chatgpt] Support saving ckpt in examples (#2846) by BlueRum
[chatgpt] fix rm eval (#2829) by BlueRum
[chatgpt] add test checkpoint (#2797) by ver217
[chatgpt] update readme about checkpoint (#2792) by ver217
[chatgpt] startegy add prepare method (#2766) by ver217
[chatgpt] disable shard init for colossalai (#2767) by ver217
[chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
[chatgpt]fix train_rm bug with lora (#2741) by BlueRum

Dtensor

[DTensor] refactor CommSpec (#3034) by YuliangLiu0306
[DTensor] refactor sharding spec (#2987) by YuliangLiu0306
[DTensor] implementation of dtensor (#2946) by YuliangLiu0306

Hotfix

[hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
[hotfix] add shard dim to aviod backward communication error (#2954) by YuliangLiu0306
[hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
[hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
[hotfix] fix chunk size can not be divided (#2867) by HELSON
Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
[hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
[hotfix] add correct device for fake_param (#2796) by HELSON

Revert] recover "[refactor

[revert] recover "[refactor] restructure configuration files (#2977)" (#3022) by Frank Lee

Format

[format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]

Pipeline

[pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang

Fx

[fx] remove depreciated algorithms. (#2312) (#2313) by Super Daniel

Refactor

[refactor] restructure configuration files (#2977) by Saurav Maheshkar

Kernel

[kernel] cached the op kernel and fixed version check (#2886) by Frank Lee

Misc

[misc] add reference (#2930) by ver217

Autoparallel

[autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
[autoparallel] find repeat blocks (#2854) by YuliangLiu0306
[autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
[autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
[autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
[autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
[autoparallel] rotor solver refactor (#2813) by Boyuan Yao
[autoparallel] Patch meta information of torch.nn.Embedding (#2760) by Boyuan Yao
[autoparallel] distinguish different parallel strategies (#2699) by YuliangLiu0306

Zero

[zero] trivial zero optimizer refactoring (#2869) by YH
[zero] fix wrong import (#2777) by Boyuan Yao

Cli

[cli] handled version check exceptions (#2848) by Frank Lee

Triton

[triton] added copyright information for flash attention (#2835) by Frank Lee

Nfc

[NFC] polish colossalai/engine/schedule/pipelineschedule.py code style (#2744) by Michelle
[NFC] polish code format by binmakeswell
[NFC] polish colossalai/autoparallel/tensorshard/deprecated/graph_analysis.py code style (#2737) by xyupeng
[NFC] polish colossalai/context/processgroupinitializer/initializer_2d.py code style (#2726) by Zirui Zhu
[NFC] polish colossalai/autoparallel/tensorshard/deprecated/ophandler/batchnorm_handler.py code style (#2728) by Zangwei Zheng
[NFC] polish colossalai/cli/cli.py code style (#2734) by Wangbo Zhao(黑色枷锁)

Exmaple

[exmaple] add bert and albert (#2824) by Jiarui Fang

Ci/cd

[CI/CD] fix nightly release CD running on forked repo (#2812) by LuGY

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.6...v0.2.5

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.2.7 Release Today!

What's Changed

Release

[release] v0.2.7 (#3094) by Frank Lee
[release] v0.2.6 (#3057) by Frank Lee

Chatgpt

[chatgpt]add flag of action mask in critic(#3086) by Fazzie-Maqianli
[chatgpt] change critic input as state (#3042) by wenjunyang
[chatgpt] fix readme (#3025) by BlueRum
[chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
[chatgpt]fix inference model load (#2988) by BlueRum
[chatgpt] allow shard init and display warning (#2986) by ver217
[chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
[chatgpt] making experience support dp (#2971) by ver217
[chatgpt]fix lora bug (#2974) by BlueRum
[chatgpt] fix inference demo loading bug (#2969) by BlueRum
[ChatGPT] fix README (#2966) by Fazzie-Maqianli
[chatgpt]add inference example (#2944) by BlueRum
[chatgpt]support opt & gpt for rm training (#2876) by BlueRum
[chatgpt] Support saving ckpt in examples (#2846) by BlueRum
[chatgpt] fix rm eval (#2829) by BlueRum
[chatgpt] add test checkpoint (#2797) by ver217
[chatgpt] update readme about checkpoint (#2792) by ver217
[chatgpt] startegy add prepare method (#2766) by ver217
[chatgpt] disable shard init for colossalai (#2767) by ver217
[chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
[chatgpt]fix train_rm bug with lora (#2741) by BlueRum

Kernel

[kernel] added kernel loader to softmax autograd function (#3093) by Frank Lee
[kernel] cached the op kernel and fixed version check (#2886) by Frank Lee

Analyzer

[analyzer] a minimal implementation of static graph analyzer (#2852) by Super Daniel

Diffusers

[diffusers] fix ci and docker (#3085) by Fazzie-Maqianli

Doc

[doc] fixed typos in docs/README.md (#3082) by Frank Lee
[doc] moved doc test command to bottom (#3075) by Frank Lee
[doc] specified operating system requirement (#3019) by Frank Lee
[doc] update nvme offload doc (#3014) by ver217
[doc] add ISC tutorial (#2997) by binmakeswell
[doc] add deepspeed citation and copyright (#2996) by ver217
[doc] added reference to related works (#2994) by Frank Lee
[doc] update news (#2983) by binmakeswell
[doc] fix chatgpt inference typo (#2964) by binmakeswell
[doc] add env scope (#2933) by binmakeswell
[doc] added readme for documentation (#2935) by Frank Lee
[doc] removed read-the-docs (#2932) by Frank Lee
[doc] update installation for GPT (#2922) by binmakeswell
[doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
[doc] fix GPT tutorial (#2860) by dawei-wang
[doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
[doc] update OPT serving (#2804) by binmakeswell
[doc] update example and OPT serving link (#2769) by binmakeswell
[doc] add opt service doc (#2747) by Frank Lee
[doc] fixed a typo in GPT readme (#2736) by cloudhuang
[doc] updated documentation version list (#2730) by Frank Lee

Autochunk

[autochunk] support vit (#3084) by Xuanlei Zhao
[autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao

Dtensor

[DTensor] implement layout converter (#3055) by YuliangLiu0306
[DTensor] refactor CommSpec (#3034) by YuliangLiu0306
[DTensor] refactor sharding spec (#2987) by YuliangLiu0306
[DTensor] implementation of dtensor (#2946) by YuliangLiu0306

Workflow

[workflow] fixed doc build trigger condition (#3072) by Frank Lee
[workflow] supported conda package installation in doc test (#3028) by Frank Lee
[workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
[workflow] added auto doc test on PR (#2929) by Frank Lee
[workflow] moved pre-commit to post-commit (#2895) by Frank Lee

Booster

[booster] init module structure and definition (#3056) by Frank Lee

Example

[example] fix redundant note (#3065) by binmakeswell
[example] fixed opt model downloading from huggingface by Tomek
[example] add LoRA support (#2821) by Haofan Wang

Hotfix

[hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
[hotfix] add shard dim to aviod backward communication error (#2954) by YuliangLiu0306
[hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
[hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
[hotfix] fix chunk size can not be divided (#2867) by HELSON
Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
[hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
[hotfix] add correct device for fake_param (#2796) by HELSON

Revert] recover "[refactor

[revert] recover "[refactor] restructure configuration files (#2977)" (#3022) by Frank Lee

Format

[format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]

Pipeline

[pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang

Fx

[fx] remove depreciated algorithms. (#2312) (#2313) by Super Daniel

Refactor

[refactor] restructure configuration files (#2977) by Saurav Maheshkar

Misc

[misc] add reference (#2930) by ver217

Autoparallel

[autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
[autoparallel] find repeat blocks (#2854) by YuliangLiu0306
[autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
[autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
[autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
[autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
[autoparallel] rotor solver refactor (#2813) by Boyuan Yao
[autoparallel] Patch meta information of torch.nn.Embedding (#2760) by Boyuan Yao
[autoparallel] distinguish different parallel strategies (#2699) by YuliangLiu0306

Zero

[zero] trivial zero optimizer refactoring (#2869) by YH
[zero] fix wrong import (#2777) by Boyuan Yao

Cli

[cli] handled version check exceptions (#2848) by Frank Lee

Triton

[triton] added copyright information for flash attention (#2835) by Frank Lee

Nfc

[NFC] polish colossalai/engine/schedule/pipelineschedule.py code style (#2744) by Michelle
[NFC] polish code format by binmakeswell
[NFC] polish colossalai/autoparallel/tensorshard/deprecated/graph_analysis.py code style (#2737) by xyupeng
[NFC] polish colossalai/context/processgroupinitializer/initializer_2d.py code style (#2726) by Zirui Zhu
[NFC] polish colossalai/autoparallel/tensorshard/deprecated/ophandler/batchnorm_handler.py code style (#2728) by Zangwei Zheng
[NFC] polish colossalai/cli/cli.py code style (#2734) by Wangbo Zhao(黑色枷锁)

Exmaple

[exmaple] add bert and albert (#2824) by Jiarui Fang

Ci/cd

[CI/CD] fix nightly release CD running on forked repo (#2812) by LuGY

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.7...v0.2.5

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.2.5 Release Today!

What's Changed

Chatgpt

[chatgpt] optimize generation kwargs (#2717) by ver217

Autoparallel

[autoparallel] add shard option (#2696) by YuliangLiu0306
[autoparallel] fix parameters sharding bug (#2716) by YuliangLiu0306
[autoparallel] refactor runtime pass (#2644) by YuliangLiu0306
[autoparallel] remove deprecated codes (#2664) by YuliangLiu0306
[autoparallel] test compatibility for gemini and auto parallel (#2700) by YuliangLiu0306

Doc

[doc] updated documentation version list (#2715) by Frank Lee
[doc] add open-source contribution invitation (#2714) by binmakeswell
[doc] add Quick Preview (#2706) by binmakeswell
[doc] resize figure (#2705) by binmakeswell
[doc] add ChatGPT (#2703) by binmakeswell

Devops

[devops] add chatgpt ci (#2713) by ver217

Workflow

[workflow] fixed tensor-nvme build caching (#2711) by Frank Lee

App

[app] fix ChatGPT requirements (#2704) by binmakeswell
[app] add chatgpt application (#2698) by ver217

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.5...v0.2.4

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.2.4 Release Today!

What's Changed

Release

[release] update version (#2691) by ver217

Doc

[doc] update auto parallel paper link (#2686) by binmakeswell
[doc] added documentation sidebar translation (#2670) by Frank Lee

Zero1&2

[zero1&2] only append parameters with gradients (#2681) by HELSON

Gemini

[gemini] fix coloinitcontext (#2683) by ver217
[gemini] add fakereleasechunk for keep-gathered chunk in the inference mode (#2671) by HELSON

Workflow

[workflow] fixed communtity report ranking (#2680) by Frank Lee
[workflow] added trigger to build doc upon release (#2678) by Frank Lee
[workflow] added doc build test (#2675) by Frank Lee

Autoparallel

[autoparallel] Patch meta information of torch.nn.functional.softmax and torch.nn.Softmax (#2674) by Boyuan Yao

Dooc

[dooc] fixed the sidebar itemm key (#2672) by Frank Lee

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.4...v0.2.3

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.2.3 Release Today!

What's Changed

Release

[release] v0.2.3 (#2669) by Frank Lee

Doc

[doc] add CVPR tutorial (#2666) by binmakeswell

Docs

[Docs] layout converting management (#2665) by YuliangLiu0306

Autoparallel

[autoparallel] Patch meta information of torch.nn.LayerNorm (#2647) by Boyuan Yao

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.3...v0.2.2

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.2.2 Release Today!

What's Changed

Release

[release] v0.2.2 (#2661) by Frank Lee

Workflow

[workflow] fixed gpu memory check condition (#2659) by Frank Lee
[workflow] fixed the test coverage report (#2614) by Frank Lee
[workflow] fixed test coverage report (#2611) by Frank Lee

Example

[example] Polish README.md (#2658) by Jiatong (Julius) Han

Doc

[doc] fixed compatiblity with docusaurus (#2657) by Frank Lee
[doc] added docusaurus-based version control (#2656) by Frank Lee
[doc] migrate the markdown files (#2652) by Frank Lee
[doc] fix typo of BLOOM (#2643) by binmakeswell
[doc] removed pre-built wheel installation from readme (#2637) by Frank Lee
[doc] updated the sphinx theme (#2635) by Frank Lee
[doc] fixed broken badge (#2623) by Frank Lee

Autoparallel

[autoparallel] refactor handlers which reshape input tensors (#2615) by YuliangLiu0306
[autoparallel] adapt autoparallel tests with latest api (#2626) by YuliangLiu0306
[autoparallel] Patch meta information of torch.matmul (#2584) by Boyuan Yao

Tutorial

[tutorial] added energonai to opt inference requirements (#2625) by Frank Lee
[tutorial] add video link (#2619) by binmakeswell

Autochunk

[autochunk] support diffusion for autochunk (#2621) by oahzxl

Build

[build] fixed the doc build process (#2618) by Frank Lee

Test

[test] fixed the triton version for testing (#2608) by Frank Lee

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.2...v0.2.1

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.2.1 Release Today!

What's Changed

Workflow

[workflow] fixed broken rellease workflows (#2604) by Frank Lee
[workflow] added cuda extension build test before release (#2598) by Frank Lee
[workflow] hooked pypi release with lark (#2596) by Frank Lee
[workflow] hooked docker release with lark (#2594) by Frank Lee
[workflow] added test-pypi check before release (#2591) by Frank Lee
[workflow] fixed the typo in the example check workflow (#2589) by Frank Lee
[workflow] hook compatibility test failure to lark (#2586) by Frank Lee
[workflow] hook example test alert with lark (#2585) by Frank Lee
[workflow] added notification if scheduled build fails (#2574) by Frank Lee
[workflow] added discussion stats to community report (#2572) by Frank Lee
[workflow] refactored compatibility test workflow for maintenability (#2560) by Frank Lee
[workflow] adjust the GPU memory threshold for scheduled unit test (#2558) by Frank Lee
[workflow] fixed example check workflow (#2554) by Frank Lee
[workflow] fixed typos in the leaderboard workflow (#2567) by Frank Lee
[workflow] added contributor and user-engagement report (#2564) by Frank Lee
[workflow] only report coverage for changed files (#2524) by Frank Lee
[workflow] fixed the precommit CI (#2525) by Frank Lee
[workflow] fixed changed file detection (#2515) by Frank Lee
[workflow] fixed the skip condition of example weekly check workflow (#2481) by Frank Lee
[workflow] automated bdist wheel build (#2459) by Frank Lee
[workflow] automated the compatiblity test (#2453) by Frank Lee
[workflow] fixed the on-merge condition check (#2452) by Frank Lee
[workflow] make test coverage report collapsable (#2436) by Frank Lee
[workflow] report test coverage even if below threshold (#2431) by Frank Lee
[workflow]auto comment with test coverage report (#2419) by Frank Lee
[workflow] auto comment if precommit check fails (#2417) by Frank Lee
[workflow] added translation for non-english comments (#2414) by Frank Lee
[workflow] added precommit check for code consistency (#2401) by Frank Lee
[workflow] refactored the example check workflow (#2411) by Frank Lee
[workflow] added nightly release to pypi (#2403) by Frank Lee
[workflow] added missing file change detection output (#2387) by Frank Lee
[workflow]New version: Create workflow files for examples' auto check (#2298) by ziyuhuang123
[workflow] fixed pypi release workflow error (#2328) by Frank Lee
[workflow] fixed pypi release workflow error (#2327) by Frank Lee
[workflow] added workflow to release to pypi upon version change (#2320) by Frank Lee
[workflow] removed unused assign reviewer workflow (#2318) by Frank Lee
[workflow] rebuild cuda kernels when kernel-related files change (#2317) by Frank Lee

Release

[release] v0.2.1 (#2602) by Frank Lee

Doc

[doc] updated readme for CI/CD (#2600) by Frank Lee
[doc] fixed issue link in pr template (#2577) by Frank Lee
[doc] updated the CHANGE_LOG.md for github release page (#2552) by Frank Lee
[doc] fixed the typo in pr template (#2556) by Frank Lee
[doc] added pull request template (#2550) by Frank Lee
[doc] update example link (#2520) by binmakeswell
[doc] update opt and tutorial links (#2509) by binmakeswell
[doc] added documentation for CI/CD (#2420) by Frank Lee
[doc] updated kernel-related optimisers' docstring (#2385) by Frank Lee
[doc] updated readme regarding pypi installation (#2406) by Frank Lee
[doc] hotfix #2377 by Jiarui Fang
[doc] hotfix #2377 by jiaruifang
[doc] update stable diffusion link (#2322) by binmakeswell
[doc] update diffusion doc (#2296) by binmakeswell
[doc] update news (#2295) by binmakeswell
[doc] update news by binmakeswell

Setup

[setup] fixed inconsistent version meta (#2578) by Frank Lee
[setup] refactored setup.py for dependency graph (#2413) by Frank Lee
[setup] support pre-build and jit-build of cuda kernels (#2374) by Frank Lee
[setup] make cuda extension build optional (#2336) by Frank Lee
[setup] remove torch dependency (#2333) by Frank Lee
[setup] removed the build dependency on colossalai (#2307) by Frank Lee

Tutorial

[tutorial] polish README (#2568) by binmakeswell
[tutorial] update fastfold tutorial (#2565) by oahzxl

Polish

[polish] polish ColoTensor and its submodules (#2537) by HELSON
[polish] polish code for getstatictorch_model (#2405) by HELSON

Kernel

[kernel] fixed repeated loading of kernels (#2549) by Frank Lee

Hotfix

[hotfix] fix zero ddp warmup check (#2545) by ver217
[hotfix] fix autoparallel demo (#2533) by YuliangLiu0306
[hotfix] fix lightning error (#2529) by HELSON
[hotfix] meta tensor default device. (#2510) by Super Daniel
[hotfix] gpt example titans bug #2493 (#2494) by Jiarui Fang
[hotfix] gpt example titans bug #2493 by jiaruifang
[hotfix] add norm clearing for the overflow step (#2416) by HELSON
[hotfix] add DISTPAN argument for benchmark (#2412) by HELSON
[hotfix] fix gpt gemini example (#2404) by HELSON
[hotfix] issue #2388 by Jiarui Fang
[hotfix] issue #2388 by jiaruifang
[hotfix] fix implement error in diffusers by Jiarui Fang
[hotfix] fix implement error in diffusers by 1SAA

Autochunk

[autochunk] add benchmark for transformer and alphafold (#2543) by oahzxl
[autochunk] support multi outputs chunk search (#2538) by oahzxl
[autochunk] support transformer (#2526) by oahzxl
[autochunk] support parsing blocks (#2506) by oahzxl
[autochunk] support autochunk on evoformer (#2497) by oahzxl
[autochunk] support evoformer tracer (#2485) by oahzxl
[autochunk] add autochunk feature by Jiarui Fang

Git

[git] remove invalid submodule (#2540) by binmakeswell

Gemini

[gemini] add profiler in the demo (#2534) by HELSON
[gemini] update the gpt example (#2527) by HELSON
[gemini] update ddp strict mode (#2518) by HELSON
[gemini] add get static torch model (#2356) by HELSON

Example

[example] Add fastfold tutorial (#2528) by LuGY
[example] update lightning dependency for stable diffusion (#2522) by Jiarui Fang
Merge pull request #2499 from feifeibear/dev0116_10 by Fazzie-Maqianli
[example] dreambooth example by jiaruifang
[example] fix requirements (#2488) by binmakeswell
[example] titans for gpt (#2484) by Jiarui Fang
[example] titans for gpt by jiaruifang
[example] stable diffusion add roadmap (#2482) by Jiarui Fang
[example] stable diffusion add roadmap by jiaruifang
[example] update gpt gemini example ci test (#2477) by ver217
[example] integrate seq-parallel tutorial with CI (#2463) by Frank Lee
[example] update vit ci script (#2469) by ver217
[example] integrate autoparallel demo with CI (#2466) by Frank Lee
[example] fixed seed error in traindreamboothcolossalai.py (#2445) by Haofan Wang
[example] updated large-batch optimizer tutorial (#2448) by Frank Lee
[example] updated the hybrid parallel tutorial (#2444) by Frank Lee
[example] improved the clarity yof the example readme (#2427) by Frank Lee
[example] removed duplicated stable diffusion example (#2424) by Frank Lee
[example] gpt, shard init on all processes (#2366) by Jiarui Fang
[example] upload auto parallel gpt2 demo (#2354) by YuliangLiu0306
[example] add google doc for benchmark results of GPT (#2355) by Jiarui Fang
[example] make gpt example directory more clear (#2353) by Jiarui Fang
[example] simplify opt example (#2344) by Jiarui Fang
[example] add example requirement (#2345) by binmakeswell
[example] diffusion update diffusion,Dreamblooth (#2329) by Fazzie-Maqianli
[example] update diffusion readme with official lightning (#2304) by Jiarui Fang
[example] update gemini benchmark bash (#2306) by HELSON

Zero

[zero] add zero wrappers (#2523) by HELSON
[zero] fix gradient clipping in hybrid parallelism (#2521) by HELSON
[zero] add strict ddp mode (#2508) by HELSON
[zero] add unit testings for hybrid parallelism (#2486) by HELSON
[zero] add unit test for low-level zero init (#2474) by HELSON
[zero] polish low level optimizer (#2473) by HELSON
[zero] low level optim supports ProcessGroup (#2464) by Jiarui Fang
[zero] add warning for ignored parameters (#2446) by HELSON
[zero] fix statedict and loadstate_dict for ddp ignored parameters (#2443) by HELSON
[zero] add inference mode and its unit test (#2418) by HELSON

Autoparallel

[autoparallel] accelerate gpt2 training (#2495) by YuliangLiu0306
[autoparallel] support origin activation ckpt on autoprallel system (#2468) by YuliangLiu0306
[autoparallel] update binary elementwise handler (#2451) by YuliangLiu0306
[autoparallel] integrate device mesh initialization into autoparallelize (#2393) by YuliangLiu0306
[autoparallel] add shard option (#2423) by YuliangLiu0306
[autoparallel] bypass MetaInfo when unavailable and modify BCASTFUNCOP metainfo (#2293) by Boyuan Yao

Utils

[utils] lazy init. (#2148) by Super Daniel

Auto-chunk

[auto-chunk] support extramsa (#3) (#2504) by oahzxl

Fx

[fx] allow control of ckpt_codegen init (#2498) by oahzxl
[fx] allow native ckpt trace and codegen. (#2438) by Super Daniel

Ci

[CI] add test_ci.sh for palm, opt and gpt (#2475) by Jiarui Fang

Cli

[cli] fixed hostname mismatch error (#2465) by Frank Lee
[cli] provided more details if colossalai run fail (#2442) by Frank Lee
[cli] updated installation check cli for aot/jit build (#2395) by Frank Lee

Examples

[examples] update autoparallel tutorial demo (#2449) by YuliangLiu0306
[examples] adding tflops to PaLM (#2365) by ZijianYY
[examples]adding tp to PaLM (#2319) by ZijianYY
[exmaple] fix dreamblooth format (#2315) by Fazzie-Maqianli

Ddp

[ddp] add isddpignored (#2434) by HELSON

Docker

[docker] updated Dockerfile and release workflow (#2410) by Frank Lee

Worfklow

[worfklow] added coverage test (#2399) by Frank Lee

Device

[device] find best logical mesh by Jiarui Fang
[device] find best logical mesh by YuliangLiu0306
[device] alpha beta profiler (#2311) by YuliangLiu0306

Pipeline

[Pipeline] Refine GPT PP Example by Jiarui Fang

Builder

[builder] correct readme (#2375) by Jiarui Fang
[builder] reconfig op_builder for pypi install (#2314) by Jiarui Fang
[builder] MOE builder (#2277) by Jiarui Fang

Auto-parallel

[auto-parallel] refactoring ColoTracer (#2118) by Zihao

Amp

[amp] add gradient clipping for unit tests (#2283) by HELSON

Autockpt

Merge pull request #2258 from hpcaitech/debug/ckpt-autoparallel by Boyuan Yao

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.1...v0.2.0

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.2.0 Release Today!

What's Changed

Version

[version] 0.1.14 -> 0.2.0 (#2286) by Jiarui Fang

Examples

[examples] using args and combining two versions for PaLM (#2284) by ZijianYY
[examples] replace einsum with matmul (#2210) by ZijianYY

Doc

[doc] add feature diffusion v2, bloom, auto-parallel (#2282) by binmakeswell
[doc] updated the stable diffussion on docker usage (#2244) by Frank Lee

Zero

[zero] polish low level zero optimizer (#2275) by HELSON
[zero] fix error for BEiT models (#2169) by HELSON

Example

[example] add benchmark (#2276) by Ziyue Jiang
[example] fix save_load bug for dreambooth (#2280) by BlueRum
[example] GPT polish readme (#2274) by Jiarui Fang
[example] fix gpt example with 0.1.10 (#2265) by HELSON
[example] clear diffuser image (#2262) by Fazzie-Maqianli
[example] diffusion install from docker (#2239) by Jiarui Fang
[example] fix benchmark.sh for gpt example (#2229) by HELSON
[example] make palm + GeminiDPP work (#2227) by Jiarui Fang
[example] Palm adding gemini, still has bugs (#2221) by ZijianYY
[example] update gpt example (#2225) by HELSON
[example] add benchmark.sh for gpt (#2226) by Jiarui Fang
[example] update gpt benchmark (#2219) by HELSON
[example] update GPT example benchmark results (#2212) by Jiarui Fang
[example] update gpt example for larger model scale (#2211) by Jiarui Fang
[example] update gpt readme with performance (#2206) by Jiarui Fang
[example] polish doc (#2201) by ziyuhuang123
[example] Change some training settings for diffusion (#2195) by BlueRum
[example] support Dreamblooth (#2188) by Fazzie-Maqianli
[example] gpt demo more accuracy tflops (#2178) by Jiarui Fang
[example] add palm pytorch version (#2172) by Jiarui Fang
[example] update vit readme (#2155) by Jiarui Fang
[example] add zero1, zero2 example in GPT examples (#2146) by HELSON

Hotfix

[hotfix] fix fp16 optimzier bug (#2273) by YuliangLiu0306
[hotfix] fix error for torch 2.0 (#2243) by xcnick
[hotfix] Fixing the bug related to ipv6 support by Tongping Liu
[hotfix] correcnt cpu_optim runtime compilation (#2197) by Jiarui Fang
[hotfix] add kwargs for colo_addmm (#2171) by Tongping Liu
[hotfix] Jit type hint #2161 (#2164) by アマデウス
[hotfix] fix auto policy of testshardedoptim_v2 (#2157) by Jiarui Fang
[hotfix] fix aten default bug (#2158) by YuliangLiu0306

Autoparallel

[autoparallel] fix spelling error (#2270) by YuliangLiu0306
[autoparallel] gpt2 autoparallel examples (#2267) by YuliangLiu0306
[autoparallel] patch torch.flatten metainfo for autoparallel (#2247) by Boyuan Yao
[autoparallel] autoparallel initialize (#2238) by YuliangLiu0306
[autoparallel] fix construct meta info. (#2245) by Super Daniel
[autoparallel] record parameter attribute in colotracer (#2217) by YuliangLiu0306
[autoparallel] Attach input, buffer and output tensor to MetaInfo class (#2162) by Boyuan Yao
[autoparallel] new metainfoprop based on metainfo class (#2179) by Boyuan Yao
[autoparallel] update getitem handler (#2207) by YuliangLiu0306
[autoparallel] updategetattrhandler (#2193) by YuliangLiu0306
[autoparallel] add gpt2 performance test code (#2194) by YuliangLiu0306
[autoparallel] integrategptrelated_tests (#2134) by YuliangLiu0306
[autoparallel] memory estimation for shape consistency (#2144) by Boyuan Yao
[autoparallel] use metainfo in handler (#2149) by YuliangLiu0306

Gemini

[Gemini] fix the converttotorch_module bug (#2269) by Jiarui Fang

Pipeline middleware

[Pipeline Middleware] Reduce comm redundancy by getting accurate output (#2232) by Ziyue Jiang

Builder

[builder] builder for scaleduppertriangmaskedsoftmax (#2234) by Jiarui Fang
[builder] polish builder with better base class (#2216) by Jiarui Fang
[builder] raise Error when CUDA_HOME is not set (#2213) by Jiarui Fang
[builder] multihead attn runtime building (#2203) by Jiarui Fang
[builder] unified cpuoptim fusedoptim inferface (#2190) by Jiarui Fang
[builder] use runtime builder for fused_optim (#2189) by Jiarui Fang
[builder] runtime adam and fused_optim builder (#2184) by Jiarui Fang
[builder] use builder() for cpu adam and fused optim in setup.py (#2187) by Jiarui Fang

Logger

[logger] hotfix, missing _FORMAT (#2231) by Super Daniel

Diffusion

[diffusion] update readme (#2214) by HELSON

Testing

[testing] add beit model for unit testings (#2196) by HELSON

NFC

[NFC] fix some typos' (#2175) by ziyuhuang123
[NFC] update news link (#2191) by binmakeswell
[NFC] fix a typo 'stable-diffusion-typo-fine-tune' by Arsmart1

Exmaple

[exmaple] diffuser, support quant inference for stable diffusion (#2186) by BlueRum
[exmaple] add vit missing functions (#2154) by Jiarui Fang

Pipeline middleware

[Pipeline Middleware ] Fix deadlock when nummicrobatch=numstage (#2156) by Ziyue Jiang

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.0...v0.1.13

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.13 Release Today!

What's Changed

Version

[version] 0.1.13 (#2152) by Jiarui Fang
Revert "[version] version to v0.1.13 (#2139)" (#2153) by Jiarui Fang
[version] version to v0.1.13 (#2139) by Jiarui Fang

Gemini

[Gemini] GeminiDPP convert to PyTorch Module. (#2151) by Jiarui Fang
[Gemini] Update coloinitctx to support metatensor (#2147) by BlueRum
[Gemini] revert ZeROInitCtx related tracer (#2138) by Jiarui Fang
[Gemini] update API of the chunkmemstatscollector. (#2129) by Jiarui Fang
[Gemini] update the non model data record method in runtime memory tracer (#2128) by Jiarui Fang
[Gemini] test step-tensor mapping using repeatedcomputedlayers.py (#2127) by Jiarui Fang
[Gemini] update non model data calculation method (#2126) by Jiarui Fang
[Gemini] hotfix the unittest bugs (#2125) by Jiarui Fang
[Gemini] mapping of preop timestep and param (#2124) by Jiarui Fang
[Gemini] chunk init using runtime visited param order (#2115) by Jiarui Fang
[Gemini] chunk init use OrderedParamGenerator (#2110) by Jiarui Fang

Nfc

[NFC] remove useless graph node code (#2150) by Jiarui Fang
[NFC] update chunk manager API (#2119) by Jiarui Fang
[NFC] polish comments for Chunk class (#2116) by Jiarui Fang

Autoparallel

[autoparallel] process size nodes in runtime pass (#2130) by YuliangLiu0306
[autoparallel] implement softmax handler (#2132) by YuliangLiu0306
[autoparallel] gpt2lp runtimee test (#2113) by YuliangLiu0306

Example

Merge pull request #2120 from Fazziekey/example/stablediffusion-v2 by Fazzie-Maqianli

Optimizer

[optimizer] add div_scale for optimizers (#2117) by HELSON

Pp middleware

[PP Middleware] Add bwd and step for PP middleware (#2111) by Ziyue Jiang

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.13...v0.1.12

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.12 Release Today!

What's Changed

Zero

[zero] add L2 gradient clipping for ZeRO (#2112) by HELSON

Gemini

[gemini] get the param visited order during runtime (#2108) by Jiarui Fang
[Gemini] NFC, polish searchchunkconfiguration (#2107) by Jiarui Fang
[Gemini] gemini use the runtime memory tracer (RMT) (#2099) by Jiarui Fang
[Gemini] make RuntimeMemTracer work correctly (#2096) by Jiarui Fang
[Gemini] remove eval in gemini unittests! (#2092) by Jiarui Fang
[Gemini] remove GLOBALMODELDATA_TRACER (#2091) by Jiarui Fang
[Gemini] remove GLOBALCUDAMEM_INFO (#2090) by Jiarui Fang
[Gemini] use MemStats in Runtime Memory tracer (#2088) by Jiarui Fang
[Gemini] use MemStats to store the tracing data. Seperate it from Collector. (#2084) by Jiarui Fang
[Gemini] remove static tracer (#2083) by Jiarui Fang
[Gemini] ParamOpHook -> ColoParamOpHook (#2080) by Jiarui Fang
[Gemini] polish runtime tracer tests (#2077) by Jiarui Fang
[Gemini] rename hooks related to runtime mem tracer (#2076) by Jiarui Fang
[Gemini] add albert in test models. (#2075) by Jiarui Fang
[Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073) by Jiarui Fang
[Gemini] remove not used MemtracerWrapper (#2072) by Jiarui Fang
[Gemini] fix grad unreleased issue and param recovery issue (#2052) by Zihao

Hotfix

[hotfix] fix a type in ColoInitContext (#2106) by Jiarui Fang
[hotfix] update test for latest version (#2060) by YuliangLiu0306
[hotfix] skip gpt tracing test (#2064) by YuliangLiu0306

Colotensor

[ColoTensor] throw error when ColoInitContext meets meta parameter. (#2105) by Jiarui Fang

Autoparallel

[autoparallel] support linear function bias addition (#2104) by YuliangLiu0306
[autoparallel] support addbmm computation (#2102) by YuliangLiu0306
[autoparallel] add sum handler (#2101) by YuliangLiu0306
[autoparallel] add bias addtion function class (#2098) by YuliangLiu0306
[autoparallel] complete gpt related module search (#2097) by YuliangLiu0306
[autoparallel]add embedding handler (#2089) by YuliangLiu0306
[autoparallel] add tensor constructor handler (#2082) by YuliangLiu0306
[autoparallel] add non_split linear strategy (#2078) by YuliangLiu0306
[autoparallel] Add F.conv metainfo (#2069) by Boyuan Yao
[autoparallel] complete gpt block searching (#2065) by YuliangLiu0306
[autoparallel] add binary elementwise metainfo for auto parallel (#2058) by Boyuan Yao
[autoparallel] fix forward memory calculation (#2062) by Boyuan Yao
[autoparallel] adapt solver with self attention (#2037) by YuliangLiu0306

Version

[version] 0.1.11rc5 -> 0.1.12 (#2103) by Jiarui Fang

Pipeline middleware

[Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087) by Ziyue Jiang
[Pipeline Middleware] Adapt scheduler for Topo (#2066) by Ziyue Jiang

Fx

[fx] An experimental version of ColoTracer.' (#2002) by Super Daniel

Example

[example] update GPT README (#2095) by ZijianYY

Device

[device] update flatten device mesh usage (#2079) by YuliangLiu0306

Test

[test] bert test in non-distributed way (#2074) by Jiarui Fang

Pipeline

[Pipeline] Add Topo Class (#2059) by Ziyue Jiang

Examples

[examples] update autoparallel demo (#2061) by YuliangLiu0306

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.12...v0.1.11rc5

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.11rc5 Release Today!

What's Changed

Release

[release] update to 0.1.11rc5 (#2053) by Frank Lee

Cli

[cli] updated installation cheheck with more inforamtion (#2050) by Frank Lee

Gemini

[gemini] fix init bugs for modules (#2047) by HELSON
[gemini] add arguments (#2046) by HELSON
[Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040) by Zihao
[Gemini] more tests for Gemini (#2038) by Jiarui Fang
[Gemini] more rigorous unit tests for runfwdbwd (#2034) by Jiarui Fang
[Gemini] paramWrapper paramTracerHook unitest (#2030) by Zihao
[Gemini] patch for supporting orch.add_ function for ColoTensor (#2003) by Jiarui Fang
[gemini] paramtracehook (#2020) by Zihao
[Gemini] add unitests to check gemini correctness (#2015) by Jiarui Fang
[Gemini] ParamMemHook (#2008) by Zihao
[Gemini] paramtracerwrapper and test case (#2009) by Zihao

Setup

[setup] supported conda-installed torch (#2048) by Frank Lee

Test

[test] align model name with the file name. (#2045) by Jiarui Fang

Hotfix

[hotfix] hotfix Gemini for no leaf modules bug (#2043) by Jiarui Fang
[hotfix] add bert test for gemini fwd bwd (#2035) by Jiarui Fang
[hotfix] revert bug PRs (#2016) by Jiarui Fang

Zero

[zero] fix testing parameters (#2042) by HELSON
[zero] fix unit-tests (#2039) by HELSON
[zero] test gradient accumulation (#1964) by HELSON

Testing

[testing] fix testing models (#2036) by HELSON

Rpc

[rpc] split with dag (#2028) by Ziyue Jiang

Autoparallel

[autoparallel] add split handler (#2032) by YuliangLiu0306
[autoparallel] add experimental permute handler (#2029) by YuliangLiu0306
[autoparallel] add runtime pass and numerical test for view handler (#2018) by YuliangLiu0306
[autoparallel] add experimental view handler (#2011) by YuliangLiu0306
[autoparallel] mix gather (#1977) by Genghan Zhang

Fx

[fx]Split partition with DAG information (#2025) by Ziyue Jiang

Github

[GitHub] update issue template (#2023) by binmakeswell

Workflow

[workflow] removed unused pypi release workflow (#2022) by Frank Lee

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc5...v0.1.11rc4

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.11rc4 Release Today!

What's Changed

Workflow

[workflow] fixed the python and cpu arch mismatch (#2010) by Frank Lee
[workflow] fixed the typo in condarc (#2006) by Frank Lee
[workflow] added conda cache and fixed no-compilation bug in release (#2005) by Frank Lee

Gemini

[Gemini] add an inlineopmodule to common test models and polish unitests. (#2004) by Jiarui Fang
[Gemini] open grad checkpoint when model building (#1984) by Jiarui Fang
[Gemini] add bert for MemtracerWrapper unintests (#1982) by Jiarui Fang
[Gemini] MemtracerWrapper unittests (#1981) by Jiarui Fang
[Gemini] memory trace hook (#1978) by Jiarui Fang
[Gemini] independent runtime tracer (#1974) by Jiarui Fang
[Gemini] ZeROHookV2 -> GeminiZeROHook (#1972) by Jiarui Fang
[Gemini] clean no used MemTraceOp (#1970) by Jiarui Fang
[Gemini] polish memstats collector (#1962) by Jiarui Fang
[Gemini] add GeminiAdamOptimizer (#1960) by Jiarui Fang

Autoparallel

[autoparallel] Add metainfo support for F.linear (#1987) by Boyuan Yao
[autoparallel] use pytree map style to process data (#1989) by YuliangLiu0306
[autoparallel] adapt handlers with attention block (#1990) by YuliangLiu0306
[autoparallel] support more flexible data type (#1967) by YuliangLiu0306
[autoparallel] add pooling metainfo (#1968) by Boyuan Yao
[autoparallel] support distributed dataloader option (#1906) by YuliangLiu0306
[autoparallel] Add alpha beta (#1973) by Genghan Zhang
[autoparallel] add torch.nn.ReLU metainfo (#1868) by Boyuan Yao
[autoparallel] support addmm in tracer and solver (#1961) by YuliangLiu0306
[autoparallel] remove redundancy comm node (#1893) by YuliangLiu0306

Fx

[fx] add more meta_registry for MetaTensor execution. (#2000) by Super Daniel

Hotfix

[hotfix] make Gemini work for conv DNN (#1998) by Jiarui Fang

Example

[example] add diffusion inference (#1986) by Fazzie-Maqianli
[example] enhance GPT demo (#1959) by Jiarui Fang
[example] add vit (#1942) by Jiarui Fang

Kernel

[kernel] move all symlinks of kernel to colossalai._C (#1971) by ver217

Polish

[polish] remove useless file memtracer_hook.py (#1963) by Jiarui Fang

Zero

[zero] fix memory leak for zero2 (#1955) by HELSON

Colotensor

[ColoTensor] reconfig ColoInitContext, decouple defaultpg and defaultdist_spec. (#1953) by Jiarui Fang
[ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) by Jiarui Fang

Tutorial

[tutorial] polish all README (#1946) by binmakeswell
[tutorial] added missing dummy dataloader (#1944) by Frank Lee
[tutorial] fixed pipeline bug for sequence parallel (#1943) by Frank Lee

Tensorparallel

[tensorparallel] fixed tp layers (#1938) by アマデウス

Sc demo

[sc demo] add requirements to spmd README (#1941) by YuliangLiu0306

Sc

[SC] remove redundant hands on (#1939) by Boyuan Yao

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc4...v0.1.11rc3

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.11rc3 Release Today!

What's Changed

Release

[release] update version (#1931) by ver217

Tutorial

[tutorial] polish README and OPT files (#1930) by binmakeswell
[tutorial] add synthetic dataset for opt (#1924) by ver217
[tutorial] updated hybrid parallel readme (#1928) by Frank Lee
[tutorial] added synthetic data for sequence parallel (#1927) by Frank Lee
[tutorial] removed huggingface model warning (#1925) by Frank Lee
Hotfix/tutorial readme index (#1922) by Frank Lee
[tutorial] modify hands-on of auto activation checkpoint (#1920) by Boyuan Yao
[tutorial] added synthetic data for hybrid parallel (#1921) by Frank Lee
[tutorial] added synthetic data for hybrid parallel (#1919) by Frank Lee
[tutorial] added synthetic dataset for auto parallel demo (#1918) by Frank Lee
[tutorial] updated auto parallel demo with latest data path (#1917) by Frank Lee
[tutorial] added data script and updated readme (#1916) by Frank Lee
[tutorial] add cifar10 for diffusion (#1907) by binmakeswell
[tutorial] removed duplicated tutorials (#1904) by Frank Lee
[tutorial] edited hands-on practices (#1899) by BoxiangW

Example

[example] update auto_parallel img path (#1910) by binmakeswell
[example] add cifar10 dadaset for diffusion (#1902) by Fazzie-Maqianli
[example] migrate diffusion and auto_parallel hands-on (#1871) by binmakeswell
[example] initialize tutorial (#1865) by binmakeswell
Merge pull request #1842 from feifeibear/jiarui/polish by Fazzie-Maqianli
[example] polish diffusion readme by jiaruifang

Sc

[SC] add GPT example for auto checkpoint (#1889) by Boyuan Yao
[sc] add examples for auto checkpoint. (#1880) by Super Daniel

Nfc

[NFC] polish colossalai/amp/naiveamp/init_.py code style (#1905) by Junming Wu
[NFC] remove redundant dependency (#1869) by binmakeswell
[NFC] polish .github/workflows/scripts/buildcolossalaiwheel.py code style (#1856) by yuxuan-lou
[NFC] polish .github/workflows/scripts/generatereleasedraft.py code style (#1855) by Ofey Chan
[NFC] polish workflows code style (#1854) by Kai Wang (Victor Kai)
[NFC] polish colossalai/amp/apexamp/init_.py code style (#1853) by LuGY
[NFC] polish .readthedocs.yaml code style (#1852) by nuszzh
[NFC] polish <.github/workflows/release_nightly.yml> code style (#1851) by RichardoLuo
[NFC] polish amp.naiveamp.gradscaler code style by zbian
[NFC] polish colossalai/autoparallel/tensorshard/deprecated/ophandler/operatorhandler.py code style (#1845) by HELSON
[NFC] polish ./colossalai/amp/torchamp/init_.py code style (#1836) by Genghan Zhang
[NFC] polish .github/workflows/build.yml code style (#1837) by xyupeng
[NFC] polish colossalai/autoparallel/tensorshard/deprecated/ophandler/convhandler.py code style (#1829) by Sze-qq
[NFC] polish colossalai/amp/torchamp/grad_scaler.py code style (#1823) by Ziyue Jiang
[NFC] polish .github/workflows/release_docker.yml code style by Maruyama_Aya
[NFC] polish .github/workflows/submodule.yml code style (#1822) by shenggan
[NFC] polish .github/workflows/draftgithubrelease_post.yml code style (#1820) by Arsmart1
[NFC] polish colossalai/amp/naiveamp/fp16_optimizer.py code style (#1819) by Fazzie-Maqianli
[NFC] polish colossalai/amp/naiveamp/utils.py code style (#1816) by CsRic
[NFC] polish .github/workflows/buildgpu8.yml code style (#1813) by Zangwei Zheng
[NFC] polish MANIFEST.in code style (#1814) by Zirui Zhu
[NFC] polish strategies_constructor.py code style (#1806) by binmakeswell

Doc

[doc] add news (#1901) by binmakeswell

Zero

[zero] migrate zero1&2 (#1878) by HELSON

Autoparallel

[autoparallel] user-friendly API for CheckpointSolver. (#1879) by Super Daniel
[autoparallel] fix linear logical convert issue (#1857) by YuliangLiu0306

Fx

[fx] metainfo_trace as an API. (#1873) by Super Daniel

Hotfix

[hotfix] pass testcompleteworkflow (#1877) by Jiarui Fang

Inference

[inference] overlap comm and compute in Linear1DRow when streamchunk_num > 1 (#1876) by Jiarui Fang
[inference] streaming Linear 1D Row inference (#1874) by Jiarui Fang

Amp

[amp] add torch amp test (#1860) by xcnick

Diffusion

[diffusion] fix package conflicts (#1875) by HELSON

Utils

[utils] fixed lazy init context (#1867) by Frank Lee
[utils] remove lazymemoryallocate from ColoInitContext (#1844) by Jiarui Fang

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc3...v0.1.11rc2

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.11rc2 Release Today!

What's Changed

Autoparallel

[autoparallel] fix bugs caused by negative dim key (#1808) by YuliangLiu0306
[autoparallel] fix bias addition module (#1800) by YuliangLiu0306
[autoparallel] add batch norm metainfo (#1815) by Boyuan Yao
[autoparallel] add conv metainfo class for auto parallel (#1796) by Boyuan Yao
[autoparallel]add essential CommActions for broadcast oprands (#1793) by YuliangLiu0306
[autoparallel] refactor and add rotorc. (#1789) by Super Daniel
[autoparallel] add getattr handler (#1767) by YuliangLiu0306
[autoparallel] added matmul handler (#1763) by Frank Lee
[autoparallel] fix conv handler numerical test (#1771) by YuliangLiu0306
[autoparallel] move ckpt solvers to autoparallel folder / refactor code (#1764) by Super Daniel
[autoparallel] add numerical test for handlers (#1769) by YuliangLiu0306
[autoparallel] update CommSpec to CommActions (#1768) by YuliangLiu0306
[autoparallel] add numerical test for node strategies (#1760) by YuliangLiu0306
[autoparallel] refactor the runtime apply pass and add docstring to passes (#1757) by YuliangLiu0306
[autoparallel] added binary elementwise node handler (#1758) by Frank Lee
[autoparallel] fix param hook issue in transform pass (#1755) by YuliangLiu0306
[autoparallel] added addbmm handler (#1751) by Frank Lee
[autoparallel] shard param and buffer as expected (#1753) by YuliangLiu0306
[autoparallel] add sequential order to communication actions (#1735) by YuliangLiu0306
[autoparallel] recovered skipped test cases (#1748) by Frank Lee
[autoparallel] fixed wrong sharding strategy in conv handler (#1747) by Frank Lee
[autoparallel] fixed wrong generated strategy for dot op (#1746) by Frank Lee
[autoparallel] handled illegal sharding strategy in shape consistency (#1744) by Frank Lee
[autoparallel] handled illegal strategy in node handler (#1743) by Frank Lee
[autoparallel] handled illegal sharding strategy (#1728) by Frank Lee

Kernel

[kernel] added jit warmup (#1792) by アマデウス
[kernel] more flexible flashatt interface (#1804) by oahzxl
[kernel] skip tests of flash_attn and triton when they are not available (#1798) by Jiarui Fang

Gemini

[Gemini] make gemini usage simple (#1821) by Jiarui Fang

Checkpointio

[CheckpointIO] a uniform checkpoint I/O module (#1689) by ver217

Doc

[doc] polish diffusion README (#1840) by binmakeswell
[doc] remove obsolete API demo (#1833) by binmakeswell
[doc] add diffusion (#1827) by binmakeswell
[doc] add FastFold (#1766) by binmakeswell

Example

[example] remove useless readme in diffusion (#1831) by Jiarui Fang
[example] add TP to GPT example (#1828) by Jiarui Fang
[example] add stable diffuser (#1825) by Fazzie-Maqianli
[example] simplify the GPT2 huggingface example (#1826) by Jiarui Fang
[example] opt does not depend on Titans (#1811) by Jiarui Fang
[example] add GPT by Jiarui Fang
[example] add opt model in lauguage (#1809) by Jiarui Fang
[example] add diffusion to example (#1805) by Jiarui Fang

Nfc

[NFC] update gitignore remove DS_Store (#1830) by Jiarui Fang
[NFC] polish type hint for shape consistency (#1801) by Jiarui Fang
[NFC] polish tests/testlayers/test3d/test_3d.py code style (#1740) by Ziheng Qin
[NFC] polish tests/testlayers/test3d/checks_3d/common.py code style (#1733) by lucasliunju
[NFC] polish colossalai/nn/metric/_utils.py code style (#1727) by Sze-qq
[NFC] polish tests/testlayers/test3d/checks3d/checklayer_3d.py code style (#1731) by Xue Fuzhao
[NFC] polish tests/testlayers/testsequence/checksseq/checklayer_seq.py code style (#1723) by xyupeng
[NFC] polish accuracy_2d.py code style (#1719) by Ofey Chan
[NFC] polish .github/workflows/scripts/buildcolossalaiwheel.py code style (#1721) by Arsmart1
[NFC] polish checkpointhook.py code style (#1722) by LuGY
[NFC] polish test2p5d/checks2p5d/checkoperation2p5d.py code style (#1718) by Kai Wang (Victor Kai)
[NFC] polish colossalai/zero/shardedparam/init_.py code style (#1717) by CsRic
[NFC] polish colossalai/nn/lr_scheduler/linear.py code style (#1716) by yuxuan-lou
[NFC] polish tests/testlayers/test2d/checks2d/checkoperation_2d.py code style (#1715) by binmakeswell
[NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) by shenggan

Fx

[fx] add a symbolic_trace api. (#1812) by Super Daniel
[fx] skip diffusers unitest if it is not installed (#1799) by Jiarui Fang
[fx] Add linear metainfo class for auto parallel (#1783) by Boyuan Yao
[fx] support module with bias addition (#1780) by YuliangLiu0306
[fx] refactor memory utils and extend shard utils. (#1754) by Super Daniel
[fx] test tracer on diffuser modules. (#1750) by Super Daniel

Hotfix

[hotfix] fix build error when torch version >= 1.13 (#1803) by xcnick
[hotfix] polish flash attention (#1802) by oahzxl
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786) by HELSON
[hotfix] polish chunk import (#1787) by Jiarui Fang
[hotfix] autoparallel unit test (#1752) by YuliangLiu0306

Pipeline

[Pipeline]Adapt to Pipelinable OPT (#1782) by Ziyue Jiang

Ci

[CI] downgrade fbgemm. (#1778) by Super Daniel

Compatibility

[compatibility] ChunkMgr import error (#1772) by Jiarui Fang

Feat

[feat] add flash attention (#1762) by oahzxl

Fx/profiler

[fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730) by Super Daniel

Workflow

[workflow] handled the git directory ownership error (#1741) by Frank Lee

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc2...v0.1.11rc1

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.11rc1 Release Today!

What's Changed

Hotfix

[hotfix] resharding cost issue (#1742) by YuliangLiu0306
[hotfix] solver bug caused by dict type comm cost (#1686) by YuliangLiu0306
[hotfix] fix wrong type name in profiler (#1678) by Boyuan Yao
[hotfix]unit test (#1670) by YuliangLiu0306
[hotfix] add recompile after graph manipulatation (#1621) by YuliangLiu0306
[hotfix] got sliced types (#1614) by YuliangLiu0306

Release

[release] update to v0.1.11 (#1736) by Frank Lee

Doc

[doc] update recommendation system catalogue (#1732) by binmakeswell
[doc] update recommedation system urls (#1725) by Jiarui Fang

Zero

[zero] add chunk init function for users (#1729) by HELSON
[zero] add constant placement policy (#1705) by HELSON

Pre-commit

[pre-commit] update pre-commit (#1726) by HELSON

Autoparallel

[autoparallel] runtimebackwardapply (#1720) by YuliangLiu0306
[autoparallel] moved tests to testtensorshard (#1713) by Frank Lee
[autoparallel] resnet block runtime apply (#1709) by YuliangLiu0306
[autoparallel] fixed broken node handler tests (#1708) by Frank Lee
[autoparallel] refactored the autoparallel module for organization (#1706) by Frank Lee
[autoparallel] adapt runtime passes (#1703) by YuliangLiu0306
[autoparallel] collated all deprecated files (#1700) by Frank Lee
[autoparallel] init new folder structure (#1696) by Frank Lee
[autoparallel] adapt solver and CostGraph with new handler (#1695) by YuliangLiu0306
[autoparallel] add output handler and placeholder handler (#1694) by YuliangLiu0306
[autoparallel] add pooling handler (#1690) by YuliangLiu0306
[autoparallel] wherehandlerv2 (#1688) by YuliangLiu0306
[autoparallel] fix C version rotor inconsistency (#1691) by Boyuan Yao
[autoparallel] added sharding spec conversion for linear handler (#1687) by Frank Lee
[autoparallel] add reshape handler v2 and fix some previous bug (#1683) by YuliangLiu0306
[autoparallel] add unary element wise handler v2 (#1674) by YuliangLiu0306
[autoparallel] add following node generator (#1673) by YuliangLiu0306
[autoparallel] add layer norm handler v2 (#1671) by YuliangLiu0306
[autoparallel] fix insecure subprocess (#1680) by Boyuan Yao
[autoparallel] add rotor C version (#1658) by Boyuan Yao
[autoparallel] added utils for broadcast operation (#1665) by Frank Lee
[autoparallel] update CommSpec (#1667) by YuliangLiu0306
[autoparallel] added bias comm spec to matmul strategy (#1664) by Frank Lee
[autoparallel] add batch norm handler v2 (#1666) by YuliangLiu0306
[autoparallel] remove no strategy nodes (#1652) by YuliangLiu0306
[autoparallel] added compute resharding costs for node handler (#1662) by Frank Lee
[autoparallel] added new strategy constructor template (#1661) by Frank Lee
[autoparallel] added node handler for bmm (#1655) by Frank Lee
[autoparallel] add conv handler v2 (#1663) by YuliangLiu0306
[autoparallel] adapt solver with gpt (#1653) by YuliangLiu0306
[autoparallel] implemented all matmul strategy generator (#1650) by Frank Lee
[autoparallel] change the following nodes strategies generation logic (#1636) by YuliangLiu0306
[autoparallel] where handler (#1651) by YuliangLiu0306
[autoparallel] implemented linear projection strategy generator (#1639) by Frank Lee
[autoparallel] adapt solver with mlp (#1638) by YuliangLiu0306
[autoparallel] Add pofo sequence annotation (#1637) by Boyuan Yao
[autoparallel] add elementwise handler (#1622) by YuliangLiu0306
[autoparallel] add embedding handler (#1620) by YuliangLiu0306
[autoparallel] protect bcast handler from invalid strategies (#1631) by YuliangLiu0306
[autoparallel] add layernorm handler (#1629) by YuliangLiu0306
[autoparallel] recover the merged node strategy index (#1613) by YuliangLiu0306
[autoparallel] added new linear module handler (#1616) by Frank Lee
[autoparallel] added new node handler (#1612) by Frank Lee
[autoparallel]add bcast matmul strategies (#1605) by YuliangLiu0306
[autoparallel] refactored the data structure for sharding strategy (#1610) by Frank Lee
[autoparallel] add bcast op handler (#1600) by YuliangLiu0306
[autoparallel] added all non-bcast matmul strategies (#1603) by Frank Lee
[autoparallel] added strategy generator and bmm strategies (#1602) by Frank Lee
[autoparallel] add reshape handler (#1594) by YuliangLiu0306
[autoparallel] refactored shape consistency to remove redundancy (#1591) by Frank Lee
[autoparallel] add resnet autoparallel unit test and add backward weight communication cost (#1589) by YuliangLiu0306
[autoparallel] added generateshardingspec to utils (#1590) by Frank Lee
[autoparallel] added solver option dataclass (#1588) by Frank Lee
[autoparallel] adapt solver with resnet (#1583) by YuliangLiu0306

Fx/meta/rpc

[fx/meta/rpc] move metaregistration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710) by Super Daniel

Embeddings

[embeddings] add doc in readme (#1711) by Jiarui Fang
[embeddings] more detailed timer (#1692) by Jiarui Fang
[embeddings] cache option (#1635) by Jiarui Fang
[embeddings] use cacheratio instead of cudarow_num (#1611) by Jiarui Fang
[embeddings] add alreadysplitalong_rank flag for tablewise mode (#1584) by CsRic

Unittest

[unittest] added doc for the pytest wrapper (#1704) by Frank Lee
[unittest] supported condititonal testing based on env var (#1701) by Frank Lee

Embedding

[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699) by Jiarui Fang
[embedding] polish async copy (#1657) by Jiarui Fang
[embedding] add more detail profiling (#1656) by Jiarui Fang
[embedding] print profiling results (#1654) by Jiarui Fang
[embedding] non-blocking cpu-gpu copy (#1647) by Jiarui Fang
[embedding] isolate cache_op from forward (#1645) by CsRic
[embedding] rollback for better FAW performance (#1625) by Jiarui Fang
[embedding] updates some default parameters by Jiarui Fang

Fx/profiler

[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679) by Super Daniel
[fx/profiler] provide a table of summary. (#1634) by Super Daniel
[fx/profiler] tuned the calculation of memory estimation (#1619) by Super Daniel

Pipeline/fix-bug

[pipeline/fix-bug] num_microbatches support any integrate | stable chimera | launch tool for rpc pp framework (#1684) by Kirigaya Kazuto

Pipeline/rank_recorder

[pipeline/rank_recorder] fix bug when process data before backward | add a tool for multiple ranks debug (#1681) by Kirigaya Kazuto

Feature

[feature] A new ZeRO implementation (#1644) by HELSON
Revert "[feature] new zero implementation (#1623)" (#1643) by Jiarui Fang
[feature] new zero implementation (#1623) by HELSON

Fx

[fx] Add concrete info prop (#1677) by Boyuan Yao
[fx] refactor code for profiler / enable fake tensor movement. (#1646) by Super Daniel
[fx] fix offload codegen test (#1648) by Boyuan Yao
[fx] Modify offload codegen (#1618) by Boyuan Yao
[fx] PoC of runtime shape consistency application (#1607) by YuliangLiu0306
[fx] Add pofo solver (#1608) by Boyuan Yao
[fx] Add offload codegen (#1598) by Boyuan Yao
[fx] provide an accurate estimation of memory. (#1587) by Super Daniel
[fx] Improve linearize and rotor solver (#1586) by Boyuan Yao
[fx] Add nested checkpoint in activation checkpoint codegen (#1585) by Boyuan Yao

Pipeline/pytree

[pipeline/pytree] add pytree to process args and kwargs | provide data_process_func to process args and kwargs after forward (#1642) by Kirigaya Kazuto

Fix

[fix] fixed the collective pattern name for consistency (#1649) by Frank Lee

Moe

[moe] initialize MoE groups by ProcessGroup (#1640) by HELSON
[moe] fix moe bugs (#1633) by HELSON
[moe] fix MoE bugs (#1628) by HELSON

Tensor

[tensor] use communication autograd func (#1617) by YuliangLiu0306

Pipeline/chimera

[pipeline/chimera] test chimera | fix bug of initializing (#1615) by Kirigaya Kazuto
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595) by Kirigaya Kazuto

Workflow

[workflow] deactivate conda environment before removing (#1606) by Frank Lee

Fx/tuning

[fx/tuning] tune performance on rotor with meta info. (#1599) by Super Daniel

Hotfix/rotor

[hotfix/rotor] fix variable names (#1597) by Super Daniel

Nfc

[NFC] add OPT serving (#1581) by binmakeswell
[NFC] polish ./colossalai/trainer/hooks/lrscheduler_hook.py code style (#1576) by Boyuan Yao
[NFC] polish colossalai/zero/shardedmodel/reducescatter.py code style (#1554) by Fazzie-Maqianli
[NFC] polish utils/tensordetector/init_.py code style (#1573) by CsRic
[NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572) by Sze-qq
[NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571) by superhao1995
[NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570) by Jiatong Han
[NFC] polish colossalai/pipeline/utils.py code style (#1562) by Zirui Zhu
[NFC] polish colossalai/fx/tracer/metapatch/patchedmodule/convolution.py code style (#1563) by Xue Fuzhao
[NFC] polish colossalai/gemini/update/chunkv2.py code style (#1565) by Zangwei Zheng
[NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568) by DouJS
[NFC] polish colossalai/utils/tensordetector/tensordetector.py code style (#1566) by LuGY
[NFC] polish colossalai/nn/_ops/embedding.py code style (#1561) by BigOneLiXiaoMing
[NFC] polish colossalai/builder/init.py code style (#1560) by Ziheng Qin
[NFC] polish colossalai/testing/comparison.py code style. (#1558) by Super Daniel
[NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556) by Ofey Chan
[NFC] polish code colossalai/gemini/update/search_utils.py (#1557) by Kai Wang (Victor Kai)
[NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555) by yuxuan-lou
[NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553) by shenggan
[NFC] polish colossalai/nn/ops/embeddingbag.py code style (#1552) by Maruyama_Aya
[NFC] polish colossalai/nn/lr_scheduler/cosine.py code style by binmakeswell
[NFC] polish colossalai/utils/multitensorapply/multitensorapply.py code style (#1559) by Kirigaya Kazuto

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc1...v0.1.10

- Python
Published by github-actions[bot] over 3 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.10 Release Today!

What's Changed

Embedding

[embedding] cache_embedding small improvement (#1564) by CsRic
[embedding] polish parallel embedding tablewise (#1545) by Jiarui Fang
[embedding] freqawareembedding: add small functions for caller application (#1537) by CsRic
[embedding] fix a bug in table wise sharding (#1538) by Jiarui Fang
[embedding] tablewise sharding polish (#1535) by Jiarui Fang
[embedding] add tablewise sharding for FAW (#1526) by CsRic

Nfc

[NFC] polish test component gpt code style (#1567) by アマデウス
[NFC] polish doc style for ColoTensor (#1457) by Jiarui Fang
[NFC] global vars should be upper case (#1456) by Jiarui Fang

Pipeline/tuning

[pipeline/tuning] improve dispatch performance both time and space cost (#1544) by Kirigaya Kazuto

Fx

[fx] provide a stable but not accurate enough version of profiler. (#1547) by Super Daniel
[fx] Add common node in model linearize (#1542) by Boyuan Yao
[fx] support meta tracing for aten level computation graphs like functorch. (#1536) by Super Daniel
[fx] Modify solver linearize and add corresponding test (#1531) by Boyuan Yao
[fx] add test for meta tensor. (#1527) by Super Daniel
[fx]patch nn.functional convolution (#1528) by YuliangLiu0306
[fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521) by Boyuan Yao
[fx] hack torch_dispatch for meta tensor and autograd. (#1515) by Super Daniel
[fx] Fix activation codegen dealing with checkpointing first op (#1510) by Boyuan Yao
[fx] fix the discretize bug (#1506) by Boyuan Yao
[fx] fix wrong variable name in solver rotor (#1502) by Boyuan Yao
[fx] Add activation checkpoint solver rotor (#1496) by Boyuan Yao
[fx] add more op patches for profiler and error message for unsupported ops. (#1495) by Super Daniel
[fx] fixed adapative pooling size concatenation error (#1489) by Frank Lee
[fx] add profiler for fx nodes. (#1480) by Super Daniel
[fx] Fix ckpt functions' definitions in forward (#1476) by Boyuan Yao
[fx] fix MetaInfoProp for incorrect calculations and add detections for inplace op. (#1466) by Super Daniel
[fx] add rules to linearize computation graphs for searching. (#1461) by Super Daniel
[fx] Add use_reentrant=False to checkpoint in codegen (#1463) by Boyuan Yao
[fx] fix test and algorithm bugs in activation checkpointing. (#1451) by Super Daniel
[fx] Use colossalai checkpoint and add offload recognition in codegen (#1439) by Boyuan Yao
[fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174. (#1446) by Super Daniel

Autoparallel

[autoparallel]add backward cost info into strategies (#1524) by YuliangLiu0306
[autoparallel] support fucntion in operator handler (#1529) by YuliangLiu0306
[autoparallel] change the merge node logic (#1533) by YuliangLiu0306
[autoparallel] added liveness analysis (#1516) by Frank Lee
[autoparallel] add more sharding strategies to conv (#1487) by YuliangLiu0306
[autoparallel] add cost graph class (#1481) by YuliangLiu0306
[autoparallel] added namespace constraints (#1490) by Frank Lee
[autoparallel] integrate auto parallel with torch fx (#1479) by Frank Lee
[autoparallel] added dot handler (#1475) by Frank Lee
[autoparallel] introduced baseclass for op handler and reduced code redundancy (#1471) by Frank Lee
[autoparallel] standardize the code structure (#1469) by Frank Lee
[autoparallel] Add conv handler to generate strategies and costs info for conv (#1467) by YuliangLiu0306

Utils

[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548) by ver217
[utils] optimize partitiontensorparallelstatedict (#1546) by ver217
[utils] Add usereetrant=False in utils.activationcheckpoint (#1460) by Boyuan Yao
[utils] Impl clipgradnorm for ColoTensor and ZeroOptimizer (#1442) by ver217

Hotfix

[hotfix] change namespace for meta_trace. (#1541) by Super Daniel
[hotfix] fix init context (#1543) by ver217
[hotfix] avoid conflict of meta registry with torch 1.13.0. (#1530) by Super Daniel
[hotfix] fix coloproxy typos. (#1519) by Super Daniel

Pipeline/piplelineprocessgroup

[pipeline/piplelineprocessgroup] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508) by Kirigaya Kazuto

Doc

[doc] docstring for FreqAwareEmbeddingBag (#1525) by Jiarui Fang
[doc] update readme with the new xTrimoMultimer project (#1477) by Sze-qq
[doc] update docstring in ProcessGroup (#1468) by Jiarui Fang
[Doc] add more doc for ColoTensor. (#1458) by Jiarui Fang

Autoparellel

[autoparellel]add strategies constructor (#1505) by YuliangLiu0306

Faw

[FAW] cpu caching operations (#1520) by Jiarui Fang
[FAW] refactor reorder() for CachedParamMgr (#1514) by Jiarui Fang
[FAW] LFU initialize with dataset freq (#1513) by Jiarui Fang
[FAW] shrink freq_cnter size (#1509) by CsRic
[FAW] remove code related to chunk (#1501) by Jiarui Fang
[FAW] add more docs and fix a warning (#1500) by Jiarui Fang
[FAW] FAW embedding use LRU as eviction strategy intialized with dataset stats (#1494) by CsRic
[FAW] LFU cache for the FAW by CsRic
[FAW] init an LFU implementation for FAW (#1488) by Jiarui Fang
[FAW] reorganize the inheritance struct of FreqCacheEmbedding (#1448) by Geng Zhang

Pipeline/rpc

[pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497) by Kirigaya Kazuto
[pipeline/rpc] implement distributed optimizer | test with assert_close (#1486) by Kirigaya Kazuto
[pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B (#1483) by Kirigaya Kazuto
[pipeline/rpc] implement a demo for PP with cuda rpc framework (#1470) by Kirigaya Kazuto

Tensor

[tensor]add 1D device mesh (#1492) by YuliangLiu0306
[tensor] support runtime ShardingSpec apply (#1453) by YuliangLiu0306
[tensor] shape consistency generate transform path and communication cost (#1435) by YuliangLiu0306
[tensor] added linear implementation for the new sharding spec (#1416) by Frank Lee

Fce

[FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462) by Geng Zhang

Workflow

[workflow] added TensorNVMe to compatibility test (#1449) by Frank Lee

Test

[test] fixed the activation codegen test (#1447) by Frank Lee

Engin/schedule

[engin/schedule] use p2pv2 to recontruct pipelineschedule (#1408) by Kirigaya Kazuto

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.10...v0.1.9

- Python
Published by github-actions[bot] almost 4 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.9 Release Today!

What's Changed

Zero

[zero] add chunk_managerV2 for all-gather chunk (#1441) by HELSON
[zero] add chunk size searching algorithm for parameters in different groups (#1436) by HELSON
[zero] add hasinfor_nan in AgChunk; enhance the unit test of AgChunk (#1426) by HELSON
[zero] add unit test for AgChunk's append, close, access (#1423) by HELSON
[zero] add AgChunk (#1417) by HELSON
[zero] ZeroDDP supports controlling outputs' dtype (#1399) by ver217
[zero] alleviate memory usage in ZeRODDP state_dict (#1398) by HELSON
[zero] chunk manager allows filtering ex-large params (#1393) by ver217
[zero] zero optim statedict takes onlyrank_0 (#1384) by ver217

Fx

[fx] add vanilla activation checkpoint search with test on resnet and densenet (#1433) by Super Daniel
[fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages (#1425) by Super Daniel
[fx] fixed torchaudio conformer tracing (#1392) by Frank Lee
[fx] patched torch.max and data movement operator (#1391) by Frank Lee
[fx] fixed indentation error in checkpointing codegen (#1385) by Frank Lee
[fx] patched torch.full for huggingface opt (#1386) by Frank Lee
[fx] update split module pass and add customized policy (#1373) by YuliangLiu0306
[fx] add torchaudio test (#1369) by Super Daniel
[fx] Add colotracer compatibility test on torchrec (#1370) by Boyuan Yao
[fx]add gpt2 passes for pipeline performance test (#1366) by YuliangLiu0306
[fx] added activation checkpoint codegen support for torch < 1.12 (#1359) by Frank Lee
[fx] added activation checkpoint codegen (#1355) by Frank Lee
[fx] fixed apex normalization patch exception (#1352) by Frank Lee
[fx] added activation checkpointing annotation (#1349) by Frank Lee
[fx] update MetaInforProp pass to process more complex node.meta (#1344) by YuliangLiu0306
[fx] refactor tracer to trace complete graph (#1342) by YuliangLiu0306
[fx] tested the complete workflow for auto-parallel (#1336) by Frank Lee
[fx]refactor tracer (#1335) by YuliangLiu0306
[fx] recovered skipped pipeline tests (#1338) by Frank Lee
[fx] fixed compatiblity issue with torch 1.10 (#1331) by Frank Lee
[fx] fixed unit tests for torch 1.12 (#1327) by Frank Lee
[fx] add balanced policy v2 (#1251) by YuliangLiu0306
[fx] Add unit test and fix bugs for transformmlppass (#1299) by XYE
[fx] added apex normalization to patched modules (#1300) by Frank Lee

Recommendation System

[FAW] export FAW in _ops (#1438) by Jiarui Fang
[FAW] move coloparam setting in test code. (#1429) by Jiarui Fang
[FAW] parallel FreqAwareEmbedding (#1424) by Jiarui Fang
[FAW] add cache manager for the cached embedding (#1419) by Jiarui Fang

Global Tensor

[tensor] add shape consistency feature to support auto spec transform (#1418) by YuliangLiu0306
[tensor]build sharding spec to replace distspec in future. (#1405) by YuliangLiu0306

Hotfix

[hotfix] zero optim prevents calling inner optim.zero_grad (#1422) by ver217
[hotfix] fix CPUAdam kernel nullptr (#1410) by ver217
[hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) by HELSON
[hotfix] fix a running error in testcolocheckpoint.py (#1387) by HELSON
[hotfix] fix some bugs during gpt2 testing (#1379) by YuliangLiu0306
[hotfix] fix zero optim save/load state dict (#1381) by ver217
[hotfix] fix zero ddp buffer cast (#1376) by ver217
[hotfix] fix no optimizer in save/load (#1363) by HELSON
[hotfix] fix megatroninit in testgpt2.py (#1357) by HELSON
[hotfix] ZeroDDP use new process group (#1333) by ver217
[hotfix] shared model returns cpu state_dict (#1328) by ver217
[hotfix] fix ddp for unit test test_gpt2 (#1326) by HELSON
[hotfix] fix unit test testmodulespec (#1321) by HELSON
[hotfix] fix PipelineSharedModuleGradientHandler (#1314) by ver217
[hotfix] fix ColoTensor GPT2 unitest (#1309) by HELSON
[hotfix] add missing file (#1308) by Jiarui Fang
[hotfix] remove potiential circle import (#1307) by Jiarui Fang
[hotfix] skip some unittest due to CI environment. (#1301) by YuliangLiu0306
[hotfix] fix shape error in backward when using ColoTensor (#1298) by HELSON
[hotfix] Dist Mgr gather torch version (#1284) by Jiarui Fang

Communication

[communication] add p2p_v2.py to support communication with ListAny by Kirigaya Kazuto

Device

[device] add DeviceMesh class to support logical device layout (#1394) by YuliangLiu0306

Chunk

[chunk] add PG check for tensor appending (#1383) by Jiarui Fang

DDP

[DDP] test ddp state dict uses more strict threshold (#1382) by ver217

Checkpoint

[checkpoint] add kwargs for loadstatedict (#1374) by HELSON
[checkpoint] use args, kwargs in savecheckpoint, loadcheckpoint (#1368) by HELSON
[checkpoint] sharded optim save/load grad scaler (#1350) by ver217
[checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) by HELSON
[checkpoint] add ColoOptimizer checkpointing (#1316) by Jiarui Fang
[checkpoint] add test for bert and hotfix save bugs (#1297) by Jiarui Fang

Util

[util] standard checkpoint function naming (#1377) by Frank Lee

Nvme

[nvme] CPUAdam and HybridAdam support NVMe offload (#1360) by ver217

Colotensor

[colotensor] use cpu memory to store state_dict (#1367) by HELSON
[colotensor] add Tensor.view op and its unit test (#1343) by HELSON

Unit test

[unit test] add megatron init test in zero_optim (#1358) by HELSON

Docker

[docker] add tensornvme in docker (#1354) by ver217

Doc

[doc] update rst and docstring (#1351) by ver217

Refactor

[refactor] refactor ColoTensor's unit tests (#1340) by HELSON

Workflow

[workflow] update docker build workflow to use proxy (#1334) by Frank Lee
[workflow] update 8-gpu test to use torch 1.11 (#1332) by Frank Lee
[workflow] roll back to use torch 1.11 for unit testing (#1325) by Frank Lee
[workflow] fixed trigger condition for 8-gpu unit test (#1323) by Frank Lee
[workflow] updated release bdist workflow (#1318) by Frank Lee
[workflow] disable SHM for compatibility CI on rtx3080 (#1315) by Frank Lee
[workflow] updated pytorch compatibility test (#1311) by Frank Lee

Test

[test] removed outdated unit test for meta context (#1329) by Frank Lee

Utils

[utils] integrated colotensor with lazy init context (#1324) by Frank Lee

Optimizer

[Optimizer] Remove useless ColoOptimizer (#1312) by Jiarui Fang
[Optimizer] polish the init method of ColoOptimizer (#1310) by Jiarui Fang

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.9...v0.1.8

- Python
Published by github-actions[bot] almost 4 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.8 Release Today!

What's Changed

Hotfix

[hotfix] torchvison fx unittests miss import pytest (#1277) by Jiarui Fang
[hotfix] fix an assertion bug in base schedule. (#1250) by YuliangLiu0306
[hotfix] fix sharded optim step and clipgradnorm (#1226) by ver217
[hotfix] fx get comm size bugs (#1233) by Jiarui Fang
[hotfix] fx shard 1d pass bug fixing (#1220) by Jiarui Fang
[hotfix]fixed p2p process send stuck (#1181) by YuliangLiu0306
[hotfix]different overflow status lead to communication stuck. (#1175) by YuliangLiu0306
[hotfix]fix some bugs caused by refactored schedule. (#1148) by YuliangLiu0306

Tensor

[tensor] distributed checkpointing for parameters (#1240) by Jiarui Fang
[tensor] redistribute among different process groups (#1247) by Jiarui Fang
[tensor] a shorter shard and replicate spec (#1245) by Jiarui Fang
[tensor] redirect .data.get to a tensor instance (#1239) by HELSON
[tensor] add zero_like colo op, important for Optimizer (#1236) by Jiarui Fang
[tensor] fix some unittests (#1234) by Jiarui Fang
[tensor] fix a assertion in colotensor crossentropy (#1232) by HELSON
[tensor] add unitest for colotensor 1DTP crossentropy (#1230) by HELSON
[tensor] torch function return colotensor (#1229) by Jiarui Fang
[tensor] improve robustness of class 'ProcessGroup' (#1223) by HELSON
[tensor] sharded global process group (#1219) by Jiarui Fang
[Tensor] add cpu group to ddp (#1200) by Jiarui Fang
[tensor] remove gpc in tensor tests (#1186) by Jiarui Fang
[tensor] revert local view back (#1178) by Jiarui Fang
[Tensor] rename some APIs in TensorSpec and Polish view unittest (#1176) by Jiarui Fang
[Tensor] rename parallel_action (#1174) by Ziyue Jiang
[Tensor] distributed view supports inter-process hybrid parallel (#1169) by Jiarui Fang
[Tensor] remove ParallelAction, use ComputeSpec instread (#1166) by Jiarui Fang
[tensor] add embedding bag op (#1156) by ver217
[tensor] add more element-wise ops (#1155) by ver217
[tensor] fixed non-serializable colo parameter during model checkpointing (#1153) by Frank Lee
[tensor] dist spec s2s uses all-to-all (#1136) by ver217
[tensor] added repr to spec (#1147) by Frank Lee

Fx

[fx] added ndim property to proxy (#1253) by Frank Lee
[fx] fixed tracing with apex-based T5 model (#1252) by Frank Lee
[fx] refactored the file structure of patched function and module (#1238) by Frank Lee
[fx] methods to get fx graph property. (#1246) by YuliangLiu0306
[fx]add split module pass and unit test from pipeline passes (#1242) by YuliangLiu0306
[fx] fixed huggingface OPT and T5 results misalignment (#1227) by Frank Lee
[fx]get communication size between partitions (#1224) by YuliangLiu0306
[fx] added patches for tracing swin transformer (#1228) by Frank Lee
[fx] fixed timm tracing result misalignment (#1225) by Frank Lee
[fx] added timm model tracing testing (#1221) by Frank Lee
[fx] added torchvision model tracing testing (#1216) by Frank Lee
[fx] temporarily used (#1215) by XYE
[fx] added testing for all albert variants (#1211) by Frank Lee
[fx] added testing for all gpt variants (#1210) by Frank Lee
[fx]add uniform policy (#1208) by YuliangLiu0306
[fx] added testing for all bert variants (#1207) by Frank Lee
[fx] supported model tracing for huggingface bert (#1201) by Frank Lee
[fx] added module patch for pooling layers (#1197) by Frank Lee
[fx] patched conv and normalization (#1188) by Frank Lee
[fx] supported data-dependent control flow in model tracing (#1185) by Frank Lee

Rename

[rename] converttodist -> redistribute (#1243) by Jiarui Fang

Checkpoint

[checkpoint] save sharded optimizer states (#1237) by Jiarui Fang
[checkpoint]support generalized scheduler (#1222) by Yi Zhao
[checkpoint] make unitest faster (#1217) by Jiarui Fang
[checkpoint] checkpoint for ColoTensor Model (#1196) by Jiarui Fang

Polish

[polish] polish repr for ColoTensor, DistSpec, ProcessGroup (#1235) by HELSON

Refactor

[refactor] move process group from _DistSpec to ColoTensor. (#1203) by Jiarui Fang
[refactor] remove gpc dependency in colotensor's _ops (#1189) by Jiarui Fang
[refactor] move chunk and chunkmgr to directory gemini (#1182) by Jiarui Fang

Context

[context]support arbitary module materialization. (#1193) by YuliangLiu0306
[context]use meta tensor to init model lazily. (#1187) by YuliangLiu0306

Ddp

[ddp] ColoDDP uses bucket all-reduce (#1177) by ver217
[ddp] refactor ColoDDP and ZeroDDP (#1146) by ver217

Colotensor

[ColoTensor] add independent process group (#1179) by Jiarui Fang
[ColoTensor] rename APIs and add output_replicate to ComputeSpec (#1168) by Jiarui Fang
[ColoTensor] improves init functions. (#1150) by Jiarui Fang

Zero

[zero] sharded optim supports loading local state dict (#1170) by ver217
[zero] zero optim supports loading local state dict (#1171) by ver217

Workflow

[workflow] polish readme and dockerfile (#1165) by Frank Lee
[workflow] auto-publish docker image upon release (#1164) by Frank Lee
[workflow] fixed release post workflow (#1154) by Frank Lee
[workflow] fixed format error in yaml file (#1145) by Frank Lee
[workflow] added workflow to auto draft the release post (#1144) by Frank Lee

Gemini

[gemini] refactor gemini mgr (#1151) by ver217

Pipeline

[pipeline]add customized policy (#1139) by YuliangLiu0306
[pipeline]support more flexible pipeline (#1138) by YuliangLiu0306

Ci

[ci] added scripts to auto-generate release post text (#1142) by Frank Lee

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.8...v0.1.7

- Python
Published by github-actions[bot] about 4 years ago

https://github.com/hpcaitech/colossalai - Version v0.1.7 Released Today

Version v0.1.7 Released Today

Highlights

Started torch.fx for auto-parallel training
Update the zero mechanism with ColoTensor
Fixed various bugs

What's Changed

Hotfix

[hotfix] prevent nested ZeRO (#1140) by ver217
[hotfix]fix bugs caused by refactored pipeline (#1133) by YuliangLiu0306
[hotfix] fix param op hook (#1131) by ver217
[hotfix] fix zero init ctx numel (#1128) by ver217
[hotfix]change to fit latest p2p (#1100) by YuliangLiu0306
[hotfix] fix chunk comm src rank (#1072) by ver217

Zero

[zero] avoid zero hook spam by changing log to debug level (#1137) by Frank Lee
[zero] added error message to handle on-the-fly import of torch Module class (#1135) by Frank Lee
[zero] fixed api consistency (#1098) by Frank Lee
[zero] zero optim copy chunk rather than copy tensor (#1070) by ver217

Optim

[optim] refactor fused sgd (#1134) by ver217

Ddp

[ddp] add save/load state dict for ColoDDP (#1127) by ver217
[ddp] add setparamsto_ignore for ColoDDP (#1122) by ver217
[ddp] supported customized torch ddp configuration (#1123) by Frank Lee

Pipeline

[pipeline]support List of Dict data (#1125) by YuliangLiu0306
[pipeline] supported more flexible dataflow control for pipeline parallel training (#1108) by Frank Lee
[pipeline] refactor the pipeline module (#1087) by Frank Lee

Fx

[fx]add autoparallel passes (#1121) by YuliangLiu0306
[fx] added unit test for coloproxy (#1119) by Frank Lee
[fx] added coloproxy (#1115) by Frank Lee

Gemini

[gemini] gemini mgr supports "cpu" placement policy (#1118) by ver217
[gemini] zero supports gemini (#1093) by ver217

Test

[test] fixed hybrid parallel test case on 8 GPUs (#1106) by Frank Lee
[test] skip tests when not enough GPUs are detected (#1090) by Frank Lee
[test] ignore 8 gpu test (#1080) by Frank Lee

Release

[release] update version.txt (#1103) by Frank Lee

Tensor

[tensor] refactor param op hook (#1097) by ver217
[tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077) by ver217
[Tensor] fix equal assert (#1091) by Ziyue Jiang
[Tensor] 1d row embedding (#1075) by Ziyue Jiang
[tensor] chunk manager monitor mem usage (#1076) by ver217
[Tensor] fix optimizer for CPU parallel (#1069) by Ziyue Jiang
[Tensor] add hybrid device demo and fix bugs (#1059) by Ziyue Jiang

Amp

[amp] included dict for type casting of model output (#1102) by Frank Lee

Workflow

[workflow] fixed 8-gpu test workflow (#1101) by Frank Lee
[workflow] added regular 8 GPU testing (#1099) by Frank Lee
[workflow] disable p2p via shared memory on non-nvlink machine (#1086) by Frank Lee

Engine

[engine] fixed empty op hook check (#1096) by Frank Lee

Doc

[doc] added documentation to chunk and chunk manager (#1094) by Frank Lee

Context

[context] support lazy init of module (#1088) by Frank Lee
[context] maintain the context object in with statement (#1073) by Frank Lee

Refactory

[refactory] add nn.parallel module (#1068) by Jiarui Fang

Cudnn

[cudnn] set False to cudnn benchmark by default (#1063) by Frank Lee

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.7...v0.1.6

- Python
Published by FrankLeeeee about 4 years ago

https://github.com/hpcaitech/colossalai - v0.1.6 Released!

Main features

ColoTensor supports hybrid parallel (tensor parallel and data parallel)
ColoTensor supports ZeRO (with chunk)
Config tensor parallel by module via ColoTensor
ZeroInitContext and ShardedModelV2 support loading checkpoint and hugging face from_pretrain()

What's Changed

ColoTensor

[tensor] refactor colo-tensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/992
[tensor] refactor parallel action by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1007
[tensor] impl ColoDDP for ColoTensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1009
[Tensor] add module handler for linear by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/1021
[Tensor] add module check and bert test by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/1031
[Tensor] add Parameter inheritance for ColoParameter by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/1041
[tensor] ColoTensor supports ZeRo by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1015
[zero] add chunk size search for chunk manager by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1052

Zero

[zero] add loadstatedict for sharded model by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/894
[zero] add zero optimizer for ColoTensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1046

Hotfix

[hotfix] fix colo init context by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1026
[hotfix] fix some bugs caused by size mismatch. by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1011
[kernel] fixed the include bug in dropout kernel by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/999
fix typo in constants by @ryanrussell in https://github.com/hpcaitech/ColossalAI/pull/1027
[engine] fixed bug in gradient accumulation dataloader to keep the last step by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1030
[hotfix] fix dist spec mgr by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1045
[hotfix] fix import error in sharded model v2 by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1053

Unit test

[unit test] refactor test tensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1005

CI

[ci] update the docker image name by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1017
[ci] added nightly build (#1018) by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1019
[ci] fixed nightly build workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1022
[ci] fixed nightly build workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1029
[ci] fixed nightly build workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1040

CLI

[cli] remove unused imports by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1001

Documentation

Hotfix/format by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/987
[doc] update docker instruction by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1020

Misc

[NFC] Hotfix/format by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/984
Revert "[NFC] Hotfix/format" by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/986
remove useless import in tensor dir by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/997
[NFC] fix download link by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/998
[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/1003
[NFC] polish colossalai/kernel/cudanative/csrc/colossalC_frontend.c… by @zhengzangw in https://github.com/hpcaitech/ColossalAI/pull/1010
[NFC] fix paper link by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/1012
[p2p]add object list send/recv by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1024
[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/1034
[NFC] add inference by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/1044
[titans]remove model zoo by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1042
[NFC] add inference submodule in path by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/1047
[release] update version.txt by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1048
[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/1049
updated collective ops api by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/1054
[pipeline]refactor ppschedule to support tensor list by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1050

New Contributors

@ryanrussell made their first contribution in https://github.com/hpcaitech/ColossalAI/pull/1027

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.5...v0.1.6

- Python
Published by ver217 about 4 years ago

https://github.com/hpcaitech/colossalai - v0.1.5 Released!

Main Features

Enhance ColoTensor and build a demo to train BERT (from hugging face) using Tensor Parallelism without modifying model.

What's Changed

ColoTensor

[Tensor] add ColoTensor TP1Dcol Embedding by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/899
[Tensor] add embedding tp1d row by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/904
[Tensor] update pytest.mark.parametrize in tensor tests by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/913
[Tensor] init ColoParameter by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/914
[Tensor] add a basic bert. by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/911
[Tensor] polish model test by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/915
[Tensor] fix test_model by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/916
[Tensor] add 1d vocab loss by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/918
[Graph] building computing graph with ColoTensor, Linear only by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/917
[Tensor] add from_pretrained support and bert pretrained test by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/921
[Tensor] test pretrain loading on multi-process by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/922
[tensor] hijack addmm for colo tensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/923
[tensor] colo tensor overrides mul by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/927
[Tensor] simplify named param by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/928
[Tensor] fix init context by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/931
[Tensor] add optimizer to bert test by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/933
[tensor] design DistSpec and DistSpecManager for ColoTensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/934
[Tensor] add DistSpec for loss and test_model by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/947
[tensor] derive compute pattern from dist spec by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/971

Pipeline Parallelism

[pipelinable]use pipelinable to support GPT model. by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/903

CI

[CI] add CI for releasing bdist wheel by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/901
[CI] fix release bdist CI by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/902
[ci] added wheel build scripts by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/910

Misc

[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/907
[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/912
[setup] update cuda ext cc flags by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/919
[setup] support more cuda architectures by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/920
[NFC] update results on a single GPU, highlight quick view by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/981

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.4...v0.1.5

- Python
Published by ver217 about 4 years ago

https://github.com/hpcaitech/colossalai - v0.1.4 Released!

Main Features

Here are the main improvements of this release: 1. ColoTensor: A data structure that unifies the Tensor representation of different parallel methods. 2. Gemini: More efficient Genimi implementation reduces the overhead of model data statistic collection. 3. CLI: a command-line tool that helps users launch distributed training tasks more easily. 4. Pipeline Parallelism (PP): a more user-friendly API for PP.

What's Changed

ColoTensor

[tensor]fix colotensor torchfunction by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/825
[tensor]fix test_linear by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/826
[tensor] ZeRO use ColoTensor as the base class. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/828
[tensor] revert zero tensors back by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/829
[Tensor] overriding paramters() for Module using ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/889
[tensor] refine linear and add gather for laynorm by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/893
[Tensor] test parameters() as member function by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/896
[Tensor] activation is an attr of ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/897
[Tensor] initialize the ColoOptimizer by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/898
[tensor] reorganize files by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/820
[Tensor] apply ColoTensor on Torch functions by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/821
[Tensor] update ColoTensor torch_function by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/822
[tensor] lazy init by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/823
[WIP] Applying ColoTensor on TP-1D-row Linear. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/831
Init Conext supports lazy allocate model memory by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/842
[Tensor] TP Linear 1D row by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/843
[Tensor] add assert for colo_tensor 1Drow by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/846
[Tensor] init a simple network training with ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/849
[Tensor ] Add 1Drow weight reshard by spec by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/854
[Tensor] add layer norm Op by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/852
[tensor] an initial dea of tensor spec by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/865
[Tensor] colo init context add device attr. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/866
[tensor] add crossentropyloss by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/868
[Tensor] Add function to spec and update linear 1Drow and unit tests by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/869
[tensor] customized op returns ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/875
[Tensor] get named parameters for model using ColoTensors by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/874
[Tensor] Add some attributes to ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/877
[Tensor] make a simple net works with 1D row TP by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/879
[tensor] wrap function in the torch_tensor to ColoTensor by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/881
[Tensor] make ColoTensor more robust for getattr by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/886
[Tensor] test model check results for a simple net by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/887
[tensor] add ColoTensor 1Dcol by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/888

Gemini + ZeRO

[zero] add zero tensor shard strategy by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/793
Revert "[zero] add zero tensor shard strategy" by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/806
[gemini] a new tensor structure by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/818
[gemini] APIs to set cpu memory capacity by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/809
[DO NOT MERGE] [zero] init fp16 params directly in ZeroInitContext by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/808
[gemini] collect cpu-gpu moving volume in each iteration by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/813
[gemini] add GeminiMemoryManger by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/832
[zero] use GeminiMemoryManager when sampling model data by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/850
[gemini] polish code by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/855
[gemini] add stateful tensor container by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/867
[gemini] polish statefultensormgr by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/876
[gemini] accelerate adjust_layout() by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/878

CLI

[cli] added distributed launcher command by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/791
[cli] added micro benchmarking for tp by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/789
[cli] add missing requirement by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/805
[cli] fixed a bug in user args and refactored the module structure by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/807
[cli] fixed single-node process launching by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/812
[cli] added check installation cli by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/815
[CLI] refactored the launch CLI and fixed bugs in multi-node launching by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/844
[cli] refactored micro-benchmarking cli and added more metrics by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/858

Pipeline Parallelism

[pipelinable]use pipelinable context to initialize non-pipeline model by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/816
[pipelinable]use ColoTensor to replace dummy tensor. by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/853
### Misc
[hotfix] fix auto tensor placement policy by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/775
[hotfix] change the check assert in split batch 2d by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/772
[hotfix] fix bugs in zero by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/781
[hotfix] fix grad offload when enabling reusefp16shard by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/784
[refactor] moving memtracer to gemini by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/801
[log] display tflops if available by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/802
[refactor] moving grad acc logic to engine by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/804
[log] local throughput metrics by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/811
[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/810
[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/819
[refactor] moving InsertPostInitMethodToModuleSubClasses to utils. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/824
[setup] allow installation with python 3.6 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/834
Revert "[WIP] Applying ColoTensor on TP-1D-row Linear." by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/835
[dependency] removed torchvision by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/833
[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/827
[unittest] refactored unit tests for change in dependency by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/838
[setup] use env var instead of option for cuda ext by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/839
[hotfix] ColoTensor pin_memory by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/840
modefied the pp build for ckpt adaptation by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/803
[hotfix] the bug of numel() in ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/845
[hotfix] fix postinit_method of zero init ctx by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/847
[hotfix] add deconstructor for stateful tensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/848
[utils] refactor profiler by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/837
[ci] cache cuda extension by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/860
hotfix tensor unittest bugs by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/862
[usability] added assertion message in registry by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/864
[doc] improved docstring in the communication module by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/863
[doc] improved docstring in the logging module by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/861
[doc] improved docstring in the amp module by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/857
[usability] improved error messages in the context module by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/856
[doc] improved error messages in initialize by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/872
[doc] improved assertion messages in trainer by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/873
[doc] improved docstring and assertion messages for the engine module by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/871
[hotfix] fix import error by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/880
[setup] add local version label by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/890
[model_zoo] change qkv processing by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/870

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.3...v0.1.4

- Python
Published by feifeibear about 4 years ago

https://github.com/hpcaitech/colossalai - V0.1.3 Released!

Overview

Here are the main improvements of this release: 1. Gemini: Heterogeneous memory space manager 2. Refactor the API of pipeline parallelism

What's Changed

Features

[zero] initialize a stateful tensor manager by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/614
[pipeline] refactor pipeline by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/679
[zero] stateful tensor manager by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/687
[zero] adapt zero hooks for unsharded module by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/699
[zero] refactor memstats collector by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/706
[zero] improve adaptability for not-shard parameters by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/708
[zero] check whether gradients have inf and nan in gpu by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/712
[refactor] refactor the memory utils by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/715
[util] support detection of number of processes on current node by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/723
[utils] add synchronized cuda memory monitor by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/740
[zero] refactor ShardedParamV2 by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/742
[zero] add tensor placement policies by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/743
[zero] use factory pattern for tensorplacementpolicy by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/752
[zero] refactor memstats_collector by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/746
[gemini] init genimi individual directory by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/754
refactor shard and gather operation by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/773

Bug Fix

[zero] fix init bugs in zero context by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/686
[hotfix] update requirements-test by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/701
[hotfix] fix a bug in 3d vocab parallel embedding by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/707
[compatibility] fixed tensor parallel compatibility with torch 1.9 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/700
[hotfix]fixed bugs of assigning grad states to non leaf nodes by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/711
[hotfix] fix stateful tensor manager's cuda model data size by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/710
[bug] fixed broken testfoundinf by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/725
[util] fixed activation checkpointing on torch 1.9 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/719
[util] fixed communication API with PyTorch 1.9 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/721
[bug] removed zero installation requirements by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/731
[hotfix] remove duplicated param register to stateful tensor manager by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/728
[utils] correct cpu memory used and capacity in the context of multi-process by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/726
[bug] fixed grad scaler compatibility with torch 1.8 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/735
[bug] fixed DDP compatibility with torch 1.8 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/739
[hotfix] fix memory leak in backward of sharded model by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/741
[hotfix] fix initialize about zero by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/748
[hotfix] fix prepare grads in sharded optim by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/749
[hotfix] layernorm by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/750
[hotfix] fix auto tensor placement policy by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/753
[hotfix] fix reusefp16shard of sharded model by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/756
[hotfix] fix teststatefultensor_mgr by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/762
[compatibility] used backward-compatible API for global process group by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/758
[hotfix] fix the ckpt hook bugs when using DDP by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/769
[hotfix] polish sharded optim docstr and warning by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/770

Unit Testing

[ci] replace the ngc docker image with self-built pytorch image by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/672
[ci] fixed compatibility workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/678
[ci] update workflow trigger condition and support options by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/691
[ci] added missing field in workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/692
[ci] remove ipc config for rootless docker by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/694
[test] added missing decorators to model checkpointing tests by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/727
[unitest] add checkpoint for moe zero test by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/729
[test] added a decorator for address already in use error with backward compatibility by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/760
[test] refactored with the new rerun decorator by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/763

Documentation

add PaLM link by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/704
[doc] removed outdated installation command by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/730
add video by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/732
[readme] polish readme by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/764
[readme] sync CN readme by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/766

Miscellaneous

[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/556
[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/695
[refactor] zero directory by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/724
[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/751

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.2...v0.1.3

- Python
Published by ver217 over 4 years ago

https://github.com/hpcaitech/colossalai - V0.1.2 Released!

Overview

Here are the main improvements of this release: 1. MOE and BERT models can be trained with ZeRO. 2. Provide a uniform checkpoint for all kinds of parallelism. 3. Optimize ZeRO-offload, and improve model scaling. 4. Design a uniform model memory tracer. 5. Implement an efficient hybrid Adam (CPU and CUDA kernels). 6. Improve activation offloading. 7. Profiler TensorBoard plugin of Beta version. 8. Refactor pipeline module for closer integration with engine. 9. Chinese tutorials, WeChat and Slack user groups.

What's Changed

Features

[zero] get memory usage for sharded param by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/536
[zero] improve the accuracy of getmemoryusage of sharded param by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/538
[zero] refactor model data tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/537
[zero] get memory usage of sharded optim v2. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/542
[zero] polish ZeroInitContext by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/540
[zero] optimize grad offload by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/539
[zero] non model data tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/545
[zero] add zero config to neutralize zero context init by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/546
[zero] dump memory stats for sharded model by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/548
[zero] add stateful tensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/549
[zero] label state for param fp16 and grad by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/551
[zero] hijack p.grad in sharded model by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/554
[utils] update colo tensor moving APIs by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/553
[polish] rename colattr -> coloattr by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/558
[zero] trace states of fp16/32 grad and fp32 param by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/571
[zero] adapt zero for unsharded parameters by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/561
[refactor] memory utils by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/577
Feature/checkpoint gloo by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/589
[zero] add sampling time for memstats collector by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/610
[model checkpoint] checkpoint utils by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/592
[model checkpoint][hotfix] unified layers for save&load by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/593
Feature/checkpoint 2D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/595
Feature/checkpoint 1D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/594
[model checkpoint] CPU communication ops by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/590
Feature/checkpoint 2.5D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/596
Feature/Checkpoint 3D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/597
[model checkpoint] checkpoint hook by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/598
Feature/Checkpoint tests by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/599
[zero] adapt zero for unsharded parameters (Optimizer part) by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/601
[zero] polish init context by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/645
refactor pipeline---put runtime schedule into engine. by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/627

Bug Fix

[Zero] process no-leaf-module in Zero by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/535
Add gather_out arg to Linear by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/541
[hoxfix] fix parallelinput flag for Linear1DCol gather_output by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/579
[hotfix] add hybrid adam to init by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/584
Hotfix/path check util by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/591
[hotfix] fix sharded optim zero grad by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/604
Add tensor parallel input check by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/621
[hotfix] Raise messages for indivisible batch sizes with tensor parallelism by @number1roy in https://github.com/hpcaitech/ColossalAI/pull/622
[zero] fixed the activation offload by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/647
fixed bugs in CPU adam by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/633
Revert "[zero] polish init context" by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/657
[hotfix] fix a bug in model data stats tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/655
fix bugs for unsharded parameters when restore data by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/664

Unit Testing

[zero] test zero tensor utils by @FredHuang99 in https://github.com/hpcaitech/ColossalAI/pull/609
remove hybrid adam in testmoezero_optim by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/659

Documentation

Refactored docstring to google style by @number1roy in https://github.com/hpcaitech/ColossalAI/pull/532
[docs] updatad docs of hybrid adam and cpu adam by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/552
html refactor by @number1roy in https://github.com/hpcaitech/ColossalAI/pull/555
[doc] polish docstring of zero by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/612
[doc] update rst by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/615
[doc] polish amp docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/616
[doc] polish moe docsrting by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/618
[doc] polish optimizer docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/619
[doc] polish utils docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/620
[NFC] polish colossalai/kernel/cudanative/csrc/kernels/cudautil.cu … by @GaryGky in https://github.com/hpcaitech/ColossalAI/pull/625
[doc] polish checkpoint docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/637
update GPT-2 experiment result by @Sze-qq in https://github.com/hpcaitech/ColossalAI/pull/666
[NFC] polish code by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/646

Model Zoo

[model zoo] add activation offload for gpt model by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/582

Miscellaneous

[logging] polish logger format by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/543
[profiler] add MemProfiler by @raejaf in https://github.com/hpcaitech/ColossalAI/pull/356
[Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/501
[tool] create .clang-format for pre-commit by @BoxiangW in https://github.com/hpcaitech/ColossalAI/pull/578
[GitHub] Add prefix and label in issue template by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/652

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.1...v0.1.2

- Python
Published by ver217 over 4 years ago

https://github.com/hpcaitech/colossalai - V0.1.1 Released Today!

What's Changed

Features

[MOE] changed parallelmode to dist process group by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/460
[MOE] redirect moeenv from globalvariables to core by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/467
[zero] zero init ctx receives a dp process group by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/471
[zero] ZeRO supports pipeline parallel by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/477
add LinearGate for MOE in NaiveAMP context by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/480
[zero] polish sharded param name by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/484
[zero] sharded optim support hybrid cpu adam by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/486
[zero] polish sharded optimizer v2 by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/490
[MOE] support PR-MOE by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/488
[zero] sharded model manages ophooks individually by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/492
[MOE] remove old MoE legacy by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/493
[zero] sharded model support the reuse of fp16 shard by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/495
[polish] polish singleton and global context by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/500
[memory] add model data tensor moving api by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/503
[memory] set cuda mem frac by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/506
[zero] use colo model data api in sharded optimv2 by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/511
[MOE] add MOEGPT model by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/510
[zero] zero init ctx enable rmtorchpayloadonthe_fly by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/512
[zero] show model data cuda memory usage after zero context init. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/515
[log] polish disableexistingloggers by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/519
[zero] add model data tensor inline moving API by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/521
[cuda] modify the fused adam, support hybrid of fp16 and fp32 by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/497
[zero] refactor model data tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/522
[zero] added hybrid adam, removed loss scale in adam by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/527

Bug Fix

fix discussion buttom in issue template by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/504
[zero] fix grad offload by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/528

Unit Testing

[MOE] add unitest for MOE experts layout, gradient handler and kernel by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/469
[test] added rerun on exception for testing by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/475
[zero] fix init device bug in zero init context unittest by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/516
[test] fixed rerunonexception and adapted test cases by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/487

CI/CD

[devops] remove tsinghua source for pip by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/505
[devops] remove tsinghua source for pip by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/507
[devops] recover tsinghua pip source due to proxy issue by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/509

Documentation

[doc] update rst by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/470
Update Experiment result about Colossal-AI with ZeRO by @Sze-qq in https://github.com/hpcaitech/ColossalAI/pull/479
[doc] docs get correct release version by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/489
Update README.md by @fastalgo in https://github.com/hpcaitech/ColossalAI/pull/514
[doc] update apidoc by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/530

Model Zoo

[model zoo] fix attn mask shape of gpt by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/472
[model zoo] gpt embedding remove attn mask by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/474

Miscellaneous

[install] run with out rich by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/513
[refactor] remove old zero code by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/517
[format] polish name format for MOE by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/481

New Contributors

@fastalgo made their first contribution in https://github.com/hpcaitech/ColossalAI/pull/514

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.0...v0.1.1

- Python
Published by ver217 over 4 years ago

https://github.com/hpcaitech/colossalai - V0.1.0 Released Today!

Overview

We are happy to release the version v0.1.0 today. Compared to the previous version, we have a brand new zero module and updated many aspects of our system for better performance and usability. The latest version can be installed by pip install colossalai now. We will update our examples and documentation in the next few days accordingly.

Highlights:

Note: a. Only the major base commits are chosen to display. Successive commits which enhance/update the base commit are not shown.
b. Some commits do not have associated pull request ID for some unknown reasons.
c. The list is ordered by time.

Features - add moe context, moe utilities and refactor gradient handler (#455 )By @1SAA - [zero] Update initialize for ZeRO (#458 ) By @ver217 - [zero] hybrid cpu adam (#445 ) By @feifeibear - added Multiply Jitter and capacity factor eval for MOE (#434 ) By @1SAA - [fp16] refactored fp16 optimizer (#392 ) By @FrankLeeeee - [zero] memtracer to record cuda memory usage of model data and overall system (#395 ) By @feifeibear - Added tensor detector (#393 ) By @Gy-Lu - Added activation offload (#331 ) By @Gy-Lu - [zero] zero init context collect numel of model (#375 ) By @feifeibear - Added PCIE profiler to dectect data transmission (#373 ) By @1SAA - Added Profiler Context to manage all profilers (#340 ) By @1SAA - set criterion as optional in colossalai initialize (#336 ) By @FrankLeeeee - [zero] Update sharded model v2 using sharded param v2 (#323 ) By @ver217 - [zero] zero init context (#321 ) By @feifeibear - Added profiler communication operations By @1SAA - added buffer sync to naive amp model wrapper (#291 ) By @FrankLeeeee - [zero] cpu adam kernel (#288 ) By @Gy-Lu - Feature/zero (#279 ) By @feifeibear @FrankLeeeee @ver217 - impl shard optim v2 and add unit test By @ver217 - [profiler] primary memory tracer By @raejaf - add sharded adam By @ver217

Unit Testing - [test] fixed amp convergence comparison test (#454 ) By @FrankLeeeee - [test] optimized zero data parallel test (#452 ) By @FrankLeeeee - [test] make zero engine test really work (#447 ) By @feifeibear - optimized context test time consumption (#446 ) By @FrankLeeeee - [unitest] polish zero config in unittest (#438 ) By @feifeibear - added testing module (#435 ) By @FrankLeeeee - [zero] polish ShardedOptimV2 unittest (#385 ) By @feifeibear - [unit test] Refactored test cases with component func (#339 ) By @FrankLeeeee

Documentation - [doc] Update docstring for ZeRO (#459 ) By @ver217 - update README and images path (#384 ) By @binmakeswell - add badge and contributor list By @FrankLeeeee - add community group and update issue template (#271 ) By @binmakeswell - update experimental visualization (#253 ) By @Sze-qq - add Chinese README By @binmakeswell

CI/CD - update github CI with the current workflow (#441 ) By @FrankLeeeee - update unit testing CI rules By @FrankLeeeee - added compatibility CI and options for release ci By @FrankLeeeee - added pypi publication CI and remove formatting CI By @FrankLeeeee

Bug Fix - fix gpt attention mask (#461 ) By @ver217 - [bug] Fixed device placement bug in memory monitor thread (#433 ) By @FrankLeeeee - fixed fp16 optimizer none grad bug (#432 ) By @FrankLeeeee - fixed gpt attention mask in pipeline (#430 ) By @FrankLeeeee - [hotfix] fixed bugs in ShardStrategy and PcieProfiler (#394 ) By @1SAA - fixed bug in activation checkpointing test (#387 ) By @FrankLeeeee - [profiler] Fixed bugs in CommProfiler and PcieProfiler (#377 ) By @1SAA - fixed CI dataset directory; fixed import error of 2.5d accuracy (#255 ) By @kurisusnowdeng - fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial By @kurisusnowdeng

Miscellaneous - [log] better logging display with rich (#426 ) By @feifeibear

- Python
Published by FrankLeeeee over 4 years ago

https://github.com/hpcaitech/colossalai - V0.0.2 Released Today!

Change Log

Added

Unifed distributed layers
MoE support
DevOps tools such as github action, code review automation, etc.
New project official website

Changes

refactored the APIs for usability, flexibility and modularity
adapted PyTorch AMP for tensor parallel
refactored utilities for tensor parallel and pipeline parallel
Separated benchmarks and examples as independent repositories
Updated pipeline parallelism to support non-interleaved and interleaved versions
refactored installation scripts for convenience

Fixed

zero level 3 runtime error
incorrect calculation in gradient clipping

- Python
Published by FrankLeeeee over 4 years ago

https://github.com/hpcaitech/colossalai - v0.0.1 Colossal-AI Beta Release

Features

Data Parallelism
Pipeline Parallelism (experimental)
1D, 2D, 2.5D, 3D and sequence tensor parallelism
Easy-to-use trainer and engine
Extensibility for user-defined parallelism
Mixed Precision Training
Zero Redundancy Optimizer (ZeRO)

- Python
Published by kurisusnowdeng over 4 years ago