Recent Releases of https://github.com/hpcaitech/colossalai
https://github.com/hpcaitech/colossalai - Version v0.5.0 Release Today!
What's Changed
- [HotFix] update load lora model Readme by @duanjunwen in https://github.com/hpcaitech/ColossalAI/pull/6240
- Update README.md by @Yanjia0 in https://github.com/hpcaitech/ColossalAI/pull/6268
- [ci] update ci by @flybird11111 in https://github.com/hpcaitech/ColossalAI/pull/6254
- [upgrade] Upgrade transformers by @flybird11111 in https://github.com/hpcaitech/ColossalAI/pull/6320
- [release] release version by @flybird11111 in https://github.com/hpcaitech/ColossalAI/pull/6330
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.9...v0.5.0
Published by github-actions[bot] 9 months ago
https://github.com/hpcaitech/colossalai - Version v0.4.9 Release Today!
What's Changed
Release
- [release] update version (#6236) by Hongxin Liu
Hotfix
- [hotfix] fix lora load (#6231) by Hongxin Liu
Misc
- [misc] update torch version (#6206) by Hongxin Liu
Chat
- Merge pull request #6208 from hpcaitech/grpo_dev by YeAnbang
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.8...v0.4.9
Published by github-actions[bot] 12 months ago
https://github.com/hpcaitech/colossalai - Version v0.4.8 Release Today!
What's Changed
Release
- [release] update version (#6195) by Hongxin Liu
Doc
- [doc] DeepSeek V3/R1 news (#6199) by binmakeswell
Application
- [application] add lora sft example data (#6198) by Hongxin Liu
- [application] Update README (#6196) by Tong Li
- [application] add lora sft example (#6192) by Hongxin Liu
Chat
- Add GRPO and Support RLVR for PPO (#6186) by YeAnbang
Checkpointio
- [checkpointio] fix for async io (#6189) by flybird11111
- [checkpointio] fix checkpoint for 3d (#6187) by flybird11111
- [checkpointio] gather tensor before unpad it if the tensor is both padded and distributed (#6168) by Lemon Qin
- [checkpointio] support load-pin overlap (#6177) by Hongxin Liu
Hotfix
- [hotfix] fix zero optim save (#6191) by Hongxin Liu
- [hotfix] fix hybrid checkpointio for sp+dp (#6184) by flybird11111
Shardformer
- [shardformer] support pipeline for deepseek v3 and optimize lora save (#6188) by Hongxin Liu
- [shardformer] support ep for deepseek v3 (#6185) by Hongxin Liu
Ci
- [CI] Cleanup Dist Optim tests with shared helper funcs (#6125) by Wenxuan Tan
Issue template
- [Issue template] Add checkbox asking for details to reproduce error (#6104) by Wenxuan Tan
Inference
- [Inference]Fix example in readme (#6178) by Guangyao Zhang
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.7...v0.4.8
Published by github-actions[bot] about 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.4.7 Release Today!
What's Changed
Release
- [release] update version (#6174) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] pre-commit autoupdate (#6113) by pre-commit-ci[bot]
Shardformer
- [Shardformer] Support zbv in Shardformer Policy (#6150) by duanjunwen
Checkpointio
- [checkpointio] support non blocking pin load (#6172) by Hongxin Liu
- [checkpointio] support asyncio for 3d (#6152) by flybird11111
- [checkpointio] fix async io (#6155) by flybird11111
- [checkpointio] support debug log (#6153) by Hongxin Liu
- [checkpointio] fix zero optimizer async save memory (#6151) by Hongxin Liu
- Merge pull request #6149 from ver217/hotfix/ckpt by Wang Binluo
- [checkpointio] disable buffering by ver217
- [checkpointio] fix pinned state dict by ver217
- [checkpointio] fix size compute by ver217
- [checkpointio] fix performance issue (#6139) by Hongxin Liu
- [checkpointio] support async model save (#6131) by Hongxin Liu
News
- [news] release colossalai for sora (#6166) by binmakeswell
Hotfix
- [hotfix] improve compatibility (#6165) by Hongxin Liu
- [Hotfix] hotfix normalization (#6163) by duanjunwen
- [hotfix] fix zero comm buffer init (#6154) by Hongxin Liu
- [hotfix] fix flash attn window_size err (#6132) by duanjunwen
Doc
- [doc] add bonus event (#6164) by binmakeswell
- [doc] update cloud link (#6148) by Sze-qq
- [doc] add hpc cloud intro (#6147) by Sze-qq
Device
- [Device]Support npu (#6159) by flybird11111
Fix
- [fix] fix bug caused by perf version (#6156) by duanjunwen
- [fix] multi-node backward slowdown (#6134) by Hanks
Optim
- [optim] hotfix adam load (#6146) by Hongxin Liu
Zerobubble
- [Zerobubble] merge main. (#6142) by duanjunwen
Async io
- [async io] support async io (#6137) by flybird11111
Ckpt
- [ckpt] Add async ckpt api (#6136) by Wang Binluo
Cli
- [cli] support run as module option (#6135) by Hongxin Liu
Zero
- [zero] support extra dp (#6123) by Hongxin Liu
Coati
- [Coati] Refine prompt for better inference (#6117) by Tong Li
Plugin
- [plugin] support getgradnorm (#6115) by Hongxin Liu
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.6...v0.4.7
Published by github-actions[bot] about 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.4.6 Release Today!
What's Changed
Release
- [release] update version (#6109) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] pre-commit autoupdate (#6078) by pre-commit-ci[bot]
Checkpointio
- [checkpointio] fix hybrid plugin model save (#6106) by Hongxin Liu
Mcts
- [MCTS] Add self-refined MCTS (#6098) by Tong Li
Doc
- [doc] sora solution news (#6100) by binmakeswell
Extension
- [extension] hotfix compile check (#6099) by Hongxin Liu
Hotfix
- Merge pull request #6096 from BurkeHulk/hotfix/lora_ckpt by Hanks
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.5...v0.4.6
Published by github-actions[bot] over 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.4.5 Release Today!
What's Changed
Release
- [release] update version (#6094) by Hongxin Liu
Misc
- [misc] fit torch API upgrade and remove legacy import (#6093) by Hongxin Liu
Fp8
- [fp8] add fallback and make compile option configurable (#6092) by Hongxin Liu
Chore
- [chore] refactor by botbw
Ckpt
- [ckpt] add safetensors util by botbw
Pipeline
- [pipeline] hotfix backward for multiple outputs (#6090) by Hongxin Liu
Ring attention
- [Ring Attention] Improve comments (#6085) by Wenxuan Tan
- Merge pull request #6071 from wangbluo/ring_attention by Wang Binluo
Coati
- [Coati] Train DPO using PP (#6054) by Tong Li
Shardformer
- [shardformer] optimize seq parallelism (#6086) by Hongxin Liu
- [shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084) by Hongxin Liu
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.4...v0.4.5
Published by github-actions[bot] over 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.4.4 Release Today!
What's Changed
Release
- [release] update version (#6062) by Hongxin Liu
Colossaleval
- [ColossalEval] support for vllm (#6056) by Camille Zhong
Moe
- [moe] add parallel strategy for shared_expert && fix test for deepseek (#6063) by botbw
Sp
- Merge pull request #6064 from wangbluo/fix_attn by Wang Binluo
- Merge pull request #6061 from wangbluo/sp_fix by Wang Binluo
Doc
- [doc] FP8 training and communication document (#6050) by Guangyao Zhang
- [doc] update sp doc (#6055) by flybird11111
Fp8
- [fp8] Disable allgather intranode. Disable Redundant allgather fp8 (#6059) by Guangyao Zhang
- [fp8] fix missing fp8_comm flag in mixtral (#6057) by botbw
- [fp8] hotfix backward hook (#6053) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Hotfix
- [hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048) by botbw
Feature
- [Feature] Split cross-entropy computation in SP (#5959) by Wenxuan Tan
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.3...v0.4.4
Published by github-actions[bot] over 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.4.3 Release Today!
What's Changed
Release
- [release] update version (#6041) by Hongxin Liu
Fp8
- [fp8] disable alltoall_fp8 in intranode (#6045) by Hanks
- [fp8] fix linear hook (#6046) by Hongxin Liu
- [fp8] optimize all-gather (#6043) by Hongxin Liu
- [FP8] unsqueeze scale to make it compatible with torch.compile (#6040) by Guangyao Zhang
- Merge pull request #6012 from hpcaitech/feature/fp8_comm by Hongxin Liu
- Merge pull request #6033 from wangbluo/fix by Wang Binluo
- Merge pull request #6024 from wangbluo/fix_merge by Wang Binluo
- Merge pull request #6023 from wangbluo/fp8_merge by Wang Binluo
- [fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016) by Wang Binluo
- [fp8] zero support fp8 linear. (#6006) by flybird11111
- [fp8] add use_fp8 option for MoeHybridParallelPlugin (#6009) by Wang Binluo
- [fp8]update reduce-scatter test (#6002) by flybird11111
- [fp8] linear perf enhancement by botbw
- [fp8] update torch.compile for linear_fp8 to >= 2.4.0 (#6004) by botbw
- [fp8] support asynchronous FP8 communication (#5997) by flybird11111
- [fp8] refactor fp8 linear with compile (#5993) by Hongxin Liu
- [fp8] support hybrid parallel plugin (#5982) by Wang Binluo
- [fp8]Moe support fp8 communication (#5977) by flybird11111
- [fp8] use torch compile (torch >= 2.3.0) (#5979) by botbw
- [fp8] support gemini plugin (#5978) by Hongxin Liu
- [fp8] support fp8 amp for hybrid parallel plugin (#5975) by Hongxin Liu
- [fp8] add fp8 linear (#5967) by Hongxin Liu
- [fp8]support all2all fp8 (#5953) by flybird11111
- [FP8] rebase main (#5963) by flybird11111
- Merge pull request #5961 from ver217/feature/zeor-fp8 by Hanks
- [fp8] add fp8 comm for low level zero by ver217
Hotfix
- [Hotfix] Remove deprecated install (#6042) by Tong Li
- [Hotfix] Fix llama fwd replacement bug (#6031) by Wenxuan Tan
- [Hotfix] Avoid fused RMSnorm import error without apex (#5985) by Edenzzzz
- [Hotfix] README link (#5966) by Tong Li
- [hotfix] Remove unused plan section (#5957) by Tong Li
Colossalai/checkpoint_io/...
- [colossalai/checkpointio/...] fix bug in load_state_dict_into_model; format error msg (#6020) by Gao, Ruiyuan
Colossal-llama
- [Colossal-LLaMA] Refactor latest APIs (#6030) by Tong Li
Plugin
- [plugin] hotfix zero plugin (#6036) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) (#6022) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) by Hongxin Liu
Ci
- [CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018) by Wenxuan Tan
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5995) by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Colossalchat
- [ColossalChat] Add PP support (#6001) by Tong Li
Misc
- [misc] Use dist logger in plugins (#6011) by Edenzzzz
- [misc] update compatibility (#6008) by Hongxin Liu
- [misc] Bypass the huggingface bug to solve the mask mismatch problem (#5991) by Haze188
- [misc] remove useless condition by haze188
- [misc] fix ci failure: change default value to false in moe plugin by haze188
- [misc] remove incompatible test config by haze188
- [misc] remove debug/print code by haze188
- [misc] skip redunant test by haze188
- [misc] solve booster hang by rename the variable by haze188
Feature
- [Feature] Zigzag Ring attention (#5905) by Edenzzzz
- [Feature]: support FP8 communication in DDP, FSDP, Gemini (#5928) by Hanks
- [Feature] llama shardformer fp8 support (#5938) by Guangyao Zhang
- [Feature] MoE Ulysses Support (#5918) by Haze188
Chat
- [Chat] fix readme (#5989) by YeAnbang
- Merge pull request #5962 from hpcaitech/colossalchat by YeAnbang
- [Chat] Fix lora (#5946) by YeAnbang
Test ci
- [test ci]Feature/fp8 comm (#5981) by flybird11111
Docs
- [Docs] clarify launch port by Edenzzzz
Test
- [test] add zero fp8 test case by ver217
- [test] add check by hxwang
- [test] fix test: test_zero1_2 by hxwang
- [test] add mixtral modelling test by botbw
- [test] pass mixtral shardformer test by botbw
- [test] mixtral pp shard test by hxwang
- [test] add mixtral transformer test by hxwang
- [test] add mixtral for sequence classification by hxwang
Lora
- [lora] lora support hybrid parallel plugin (#5956) by Wang Binluo
Feat
- [feat] Dist Loader for Eval (#5950) by Tong Li
Chore
- [chore] remove redundant test case, print string & reduce test tokens by botbw
- [chore] docstring by hxwang
- [chore] change moe_pg_mesh to private by hxwang
- [chore] solve moe ckpt test failure and some other arg pass failure by hxwang
- [chore] minor fix after rebase by hxwang
- [chore] minor fix by hxwang
- [chore] arg pass & remove drop token by hxwang
- [chore] trivial fix by botbw
- [chore] manually revert unintended commit by botbw
- [chore] handle non member group by hxwang
Moe
- [moe] solve dp axis issue by botbw
- [moe] remove force_overlap_comm flag and add warning instead by hxwang
- Revert "[moe] implement submesh initialization" by hxwang
- [moe] refactor mesh assignment by hxwang
- [moe] deepseek moe sp support by haze188
- [moe] remove ops by hxwang
- [moe] full test for deepseek and mixtral (pp + sp to fix) by hxwang
- [moe] finalize test (no pp) by hxwang
- [moe] init moe plugin comm setting with sp by hxwang
- [moe] clean legacy code by hxwang
- [moe] test deepseek by hxwang
- [moe] implement tp by botbw
- [moe] add mixtral dp grad scaling when not all experts are activated by botbw
- [moe] implement submesh initialization by botbw
- [moe] implement transit between non moe tp and ep by botbw
- [moe] fix plugin by hxwang
Doc
- [doc] add MoeHybridParallelPlugin docstring by botbw
Deepseek
- [deepseek] replace attn (a workaround for bug in transformers) by hxwang
Bug
- [bug] fix: somehow logger hangs the program by botbw
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.2...v0.4.3
Published by github-actions[bot] over 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.4.2 Release Today!
What's Changed
Release
- [release] update version (#5952) by Hongxin Liu
Zero
- [zero] hotfix update master params (#5951) by Hongxin Liu
Feat
- [Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895) by Runyu Lu
Shardformer
- [shardformer] hotfix attn mask (#5947) by Hongxin Liu
- [shardformer] hotfix attn mask (#5945) by Hongxin Liu
Chat
- Merge pull request #5922 from hpcaitech/kto by YeAnbang
Feature
- [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (#5941) by zhurunhua
Hotfix
- [Hotfix] Fix ZeRO typo (#5936) by Edenzzzz
Fix bug
- [FIX BUG] convert env param to int in (#5934) by Gao, Ruiyuan
- [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (#5931) by zhurunhua
Colossalchat
- [ColossalChat] Hotfix for ColossalChat (#5910) by Tong Li
Examples
- [Examples] Add lazy init to OPT and GPT examples (#5924) by Edenzzzz
Plugin
- [plugin] support all-gather overlap for hybrid parallel (#5919) by Hongxin Liu
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.1...v0.4.2
Published by github-actions[bot] over 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.4.1 Release Today!
What's Changed
Release
- [release] update version (#5912) by Hongxin Liu
Misc
- [misc] support torch2.3 (#5893) by Hongxin Liu
Compatibility
- [compatibility] support torch 2.2 (#5875) by Guangyao Zhang
Chat
- Merge pull request #5901 from hpcaitech/colossalchat by YeAnbang
- Merge pull request #5850 from hpcaitech/rlhf_SimPO by YeAnbang
Shardformer
- [ShardFormer] fix qwen2 sp (#5903) by Guangyao Zhang
- [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (#5897) by Guangyao Zhang
- [shardformer] DeepseekMoE support (#5871) by Haze188
- [shardformer] fix the moe (#5883) by Wang Binluo
- [Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) by Jianghai
- [shardformer]delete xformers (#5859) by flybird11111
Auto parallel
- [Auto Parallel]: Speed up intra-op plan generation by 44% (#5446) by Stephan Kö
Zero
- [zero] support all-gather overlap (#5898) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5878) by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5572) by pre-commit-ci[bot]
Feature
- [Feature] Enable PP + SP for llama (#5868) by Edenzzzz
Hotfix
- [HotFix] CI,import,requirements-test for #5838 (#5892) by Runyu Lu
- [Hotfix] Fix OPT gradient checkpointing forward by Edenzzzz
- [hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879) by Haze188
Feat
- [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838) by Runyu Lu
Hotfix
- [Hotfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap by Edenzzzz
Quant
- [quant] fix bitsandbytes version check (#5882) by Hongxin Liu
Doc
- [doc] Update llama + sp compatibility; fix dist optim table by Edenzzzz
Moe/zero
- [MoE/ZeRO] Moe refactor with zero refactor (#5821) by Haze188
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.4.0...v0.4.1
Published by github-actions[bot] over 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.4.0 Release Today!
What's Changed
Release
- [release] update version (#5864) by Hongxin Liu
Inference
- [Inference]Lazy Init Support (#5785) by Runyu Lu
Shardformer
- [shardformer] Support the T5ForTokenClassification model (#5816) by Guangyao Zhang
Zero
- [zero] use bucket during allgather (#5860) by Hongxin Liu
Feature
- [Feature] optimize PP overlap (#5735) by Edenzzzz
Doc
- [doc] add GPU cloud playground (#5851) by binmakeswell
- [doc] fix open sora model weight link (#5848) by binmakeswell
- [doc] opensora v1.2 news (#5846) by binmakeswell
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.9...v0.4.0
Published by github-actions[bot] over 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.3.9 Release Today!
What's Changed
Release
- [release] update version (#5833) by Hongxin Liu
Fix
- [Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837) by Yuanheng Zhao
Shardformer
- [shardformer] Change atol in test command-r weight-check to pass pytest (#5835) by Guangyao Zhang
- Merge pull request #5818 from GuangyaoZhang/command-r by Guangyao Zhang
- [shardformer] upgrade transformers to 4.39.3 (#5815) by flybird11111
- [shardformer] fix modeling of bloom and falcon (#5796) by Hongxin Liu
- [shardformer] fix import (#5788) by Hongxin Liu
Devops
- [devops] Remove building on PR when edited to avoid skip issue (#5836) by Guangyao Zhang
- [devops] fix docker ci (#5780) by Hongxin Liu
Launch
- [launch] Support IPv4 host initialization in launch (#5822) by Kai Lv
Misc
- [misc] Add dist optim to doc sidebar (#5806) by Edenzzzz
- [misc] update requirements (#5787) by Hongxin Liu
- [misc] fix dist logger (#5782) by Hongxin Liu
- [misc] Accelerate CI for zero and dist optim (#5758) by Edenzzzz
- [misc] update dockerfile (#5776) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Gemini
- [gemini] quick fix on possible async operation (#5803) by botbw
- [Gemini] Use async stream to prefetch and h2d data moving (#5781) by Haze188
- [gemini] optimize reduce scatter d2h copy (#5760) by botbw
Inference
- [Inference] Fix flash-attn import and add model test (#5794) by Li Xingjian
- [Inference]refactor baichuan (#5791) by Runyu Lu
- Merge pull request #5771 from char-1ee/refactor/modeling by Li Xingjian
- [Inference]Add Streaming LLM (#5745) by yuehuayingxueluo
Test
- [test] fix qwen2 pytest distLarge (#5797) by Guangyao Zhang
- [test] fix chatglm test kit (#5793) by Hongxin Liu
- [test] Fix/fix testcase (#5770) by duanjunwen
Colossalchat
- Merge pull request #5759 from hpcaitech/colossalchat_upgrade by YeAnbang
Install
- [install]fix setup (#5786) by flybird11111
Hotfix
- [hotfix] fix testcase in test_fx/test_tracer (#5779) by duanjunwen
- [hotfix] fix llama flash attention forward (#5777) by flybird11111
- [Hotfix] Add missing init file in inference.executor (#5774) by Yuanheng Zhao
Test/ci
- [Test/CI] remove test cases to reduce CI duration (#5753) by botbw
Ci/tests
- [CI/tests] simplify some test case to reduce testing time (#5755) by Haze188
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.8...v0.3.9
Published by github-actions[bot] over 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.3.8 Release Today!
What's Changed
Release
- [release] update version (#5752) by Hongxin Liu
Fix/example
- [Fix/Example] Fix Llama Inference Loading Data Type (#5763) by Yuanheng Zhao
Gemini
- Merge pull request #5749 from hpcaitech/prefetch by botbw
- Merge pull request #5754 from Hz188/prefetch by botbw
- [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
- [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
- Merge pull request #5733 from Hz188/feature/prefetch by botbw
- Merge pull request #5731 from botbw/prefetch by botbw
- [gemini] init auto policy prefetch by hxwang
- Merge pull request #5722 from botbw/prefetch by botbw
- [gemini] max_prefetch means maximum work to keep by hxwang
- [gemini] use compute_chunk to find next chunk by hxwang
- [gemini] prefetch chunks by hxwang
- [gemini]remove registered gradients hooks (#5696) by flybird11111
Chore
- [chore] refactor profiler utils by hxwang
- [chore] remove unnecessary assert since compute list might not be recorded by hxwang
- [chore] remove unnecessary test & changes by hxwang
- Merge pull request #5738 from botbw/prefetch by Haze188
- [chore] fix init error by hxwang
- [chore] Update placement_policy.py by botbw
- [chore] remove debugging info by hxwang
- [chore] remove print by hxwang
- [chore] refactor & sync by hxwang
- [chore] sync by hxwang
Bug
- [bug] continue fix by hxwang
- [bug] workaround for idx fix by hxwang
- [bug] fix early return (#5740) by botbw
Bugs
- [bugs] fix args.profile=False DummyProfiler error by genghaozhe
Inference
- [inference] Fix running time of test_continuous_batching (#5750) by Yuanheng Zhao
- [Inference]Fix readme and example for API server (#5742) by Jianghai
- [inference] release (#5747) by binmakeswell
- [Inference] Fix Inference Generation Config and Sampling (#5710) by Yuanheng Zhao
- [Inference] Fix API server, test and example (#5712) by Jianghai
- [Inference] Delete duplicated copy_vector (#5716) by 傅剑寒
- [Inference] Adapt repetition_penalty and no_repeat_ngram_size (#5708) by yuehuayingxueluo
- [Inference] Add example test_ci script by CjhHa1
- [Inference] Fix bugs and docs for feat/online-server (#5598) by Jianghai
- [Inference] resolve rebase conflicts by CjhHa1
- [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) by Jianghai
- [Inference] ADD async and sync Api server using FastAPI (#5396) by Jianghai
- [Inference] Support the logic related to ignoring EOS token (#5693) by yuehuayingxueluo
- [Inference]Adapt temperature processing logic (#5689) by yuehuayingxueluo
- [Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679) by Steve Luo
- [Inference] Fix quant bits order (#5681) by 傅剑寒
- [inference]Add alibi to flash attn function (#5678) by yuehuayingxueluo
- [Inference] Adapt Baichuan2-13B TP (#5659) by yuehuayingxueluo
Feature
- [Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz
- [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz
- Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
- [Feature] qlora support (#5586) by linsj20
Example
- [example] add profile util for llama by hxwang
- [example] Update Inference Example (#5725) by Yuanheng Zhao
Colossal-inference
- Colossal-Inference Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao
Nfc
- [NFC] fix requirements (#5744) by Yuanheng Zhao
- [NFC] Fix code factors on inference triton kernels (#5743) by Yuanheng Zhao
Ci
- [ci] Temporary fix for build on pr (#5741) by Yuanheng Zhao
- [ci] Fix example tests (#5714) by Yuanheng Zhao
Sync
- Merge pull request #5737 from yuanheng-zhao/inference/sync/main by Yuanheng Zhao
- [sync] Sync feature/colossal-infer with main by Yuanheng Zhao
- [Sync] Update from main to feature/colossal-infer (Merge pull request #5685) by Yuanheng Zhao
- [sync] resolve conflicts of merging main by Yuanheng Zhao
Shardformer
- [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
- [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
- [Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
- Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
- [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo
- [shardformer] support bias_gelu_jit_fused for models (#5647) by flybird11111
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
Doc
- [doc] Update Inference Readme (#5736) by Yuanheng Zhao
Fix/inference
- [Fix/Inference] Add unsupported auto-policy error message (#5730) by Yuanheng Zhao
Lazy
- [lazy] fix lazy cls init (#5720) by flybird11111
Misc
- [misc] Update PyTorch version in docs (#5724) by binmakeswell
- [misc] Update PyTorch version in docs (#5711) by Edenzzzz
- [misc] Add an existing issue checkbox in bug report (#5691) by Edenzzzz
- [misc] refactor launch API and tensor constructor (#5666) by Hongxin Liu
Colossal-llama
- [Colossal-LLaMA] Fix sft issue for llama2 (#5719) by Tong Li
Fix
- [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
- [Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
- [Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao
Feat
- [Feat]Inference RPC Server Support (#5705) by Runyu Lu
Hotfix
- [hotfix] fix inference typo (#5438) by hugo-syn
- [hotfix] fix OpenMOE example import path (#5697) by Yuanheng Zhao
- [hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695) by Yuanheng Zhao
Inference/feat
- [Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706) by 傅剑寒
- [Inference/Feat] Add quant kvcache interface (#5700) by 傅剑寒
- [Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686) by 傅剑寒
- [Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680) by 傅剑寒
- [Inference/Feat] Feat quant kvcache step2 (#5674) by 傅剑寒
Online server
- [Online Server] Chat Api for streaming and not streaming response (#5470) by Jianghai
Zero
- [zero]remove registered gradients hooks (#5687) by flybird11111
Kernel
- [kernel] Support New KCache Layout - Triton Kernel (#5677) by Yuanheng Zhao
Inference/kernel
- [Inference/Kernel] refactor kvcache manager and rotary_embedding and kv_cache_memcpy oper… (#5663) by Steve Luo
Lowlevelzero
- [LowLevelZero] low level zero support lora (#5153) by flybird11111
Lora
- [lora] add lora APIs for booster, support lora for TorchDDP (#4981) by Baizhou Zhang
Devops
- [devops] fix release docker ci (#5665) by Hongxin Liu
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.7...v0.3.8
Published by github-actions[bot] over 1 year ago
https://github.com/hpcaitech/colossalai - Version v0.3.7 Release Today!
What's Changed
Release
- [release] update version (#5654) by Hongxin Liu
- [release] grok-1 inference benchmark (#5500) by binmakeswell
- [release] grok-1 314b inference (#5490) by binmakeswell
Hotfix
- [hotfix] add soft link to support required files (#5661) by Tong Li
- [hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
- [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) by Edenzzzz
- [hotfix] fix typo s/get_defualt_parser/get_default_parser (#5548) by digger yu
- [hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
- [hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
- [hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu
News
- [news] llama3 and open-sora v1.1 (#5655) by binmakeswell
Lazyinit
- [lazyinit] skip whisper test (#5653) by Hongxin Liu
Shardformer
- [shardformer] refactor pipeline grad ckpt config (#5646) by Hongxin Liu
- [shardformer] fix chatglm implementation (#5644) by Hongxin Liu
- [shardformer] remove useless code (#5645) by flybird11111
- [shardformer] update transformers (#5583) by Wang Binluo
- [shardformer] fix pipeline grad ckpt (#5620) by Hongxin Liu
- [shardformer] refactor embedding resize (#5603) by flybird11111
- [shardformer] Sequence Parallelism Optimization (#5533) by Zhongkai Zhao
- [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) by Insu Jang
- [shardformer] update colo attention to support custom mask (#5510) by Hongxin Liu
- [shardformer]Fix lm parallel. (#5480) by flybird11111
- [shardformer] fix gathering output when using tensor parallelism (#5431) by flybird11111
Fix
- [Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season
- [fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
- [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
- [fix] fix grok-1 example typo (#5506) by Yuanheng Zhao
Coloattention
- [coloattention]modify coloattention (#5627) by flybird11111
Example
- [example] llama3 (#5631) by binmakeswell
- [example] update Grok-1 inference (#5495) by Yuanheng Zhao
- [example] add grok-1 inference (#5485) by Hongxin Liu
- [example] update llama example (#5626) by Hongxin Liu
Feature
- [Feature] Support LLaMA-3 CPT and ST (#5619) by Tong Li
Zero
- [zero] support multiple (partial) backward passes (#5596) by Hongxin Liu
Doc
- [doc] fix ColossalMoE readme (#5599) by Camille Zhong
- [doc] update open-sora demo (#5479) by binmakeswell
- [doc] release Open-Sora 1.0 with model weights (#5468) by binmakeswell
Devops
- [devops] remove post commit ci (#5566) by Hongxin Liu
- [devops] fix example test ci (#5504) by Hongxin Liu
- [devops] fix compatibility (#5444) by Hongxin Liu
Shardformer, pipeline
- [shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen
Colossalchat
- [ColossalChat] Update RLHF V2 (#5286) by YeAnbang
Format
- [format] applied code formatting on changed files in pull request 5510 (#5517) by github-actions[bot]
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.6...v0.3.7
Published by github-actions[bot] almost 2 years ago
https://github.com/hpcaitech/colossalai - Version v0.3.6 Release Today!
What's Changed
Release
- [release] update version (#5411) by Hongxin Liu
Colossal-llama2
- [colossal-llama2] add stream chat example for chat version model (#5428) by Camille Zhong
Hotfix
- [hotfix] fix stable diffusion inference bug. (#5289) by Youngon
- [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) by digger yu
- [hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) by digger yu
- [hotfix] fix typo change _descrption to _description (#5331) by digger yu
- [hotfix] fix typo of openmoe model source (#5403) by Luo Yihang
- [hotfix] fix sd vit import error (#5420) by MickeyCHAN
- [hotfix] Fix wrong import in meta_registry (#5392) by Stephan Kölker
- [hotfix] fix variable type for top_p (#5313) by CZYCW
Doc
- [doc] Fix typo s/infered/inferred/ (#5288) by hugo-syn
- [doc] update some translations with README-zh-Hans.md (#5382) by digger yu
- [doc] sora release (#5425) by binmakeswell
- [doc] fix blog link by binmakeswell
- [doc] updated installation command (#5389) by Frank Lee
- [doc] Fix typo (#5361) by yixiaoer
Eval-hotfix
- [eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) by Dongruixuan Li
Devops
- [devops] fix extension building (#5427) by Hongxin Liu
Example
- [example]add gpt2 benchmark example script. (#5295) by flybird11111
- [example] reuse flash attn patch (#5400) by Hongxin Liu
Workflow
- [workflow] added pypi channel (#5412) by Frank Lee
Shardformer
- [shardformer]gather llama logits (#5398) by flybird11111
Setup
- [setup] fixed nightly release (#5388) by Frank Lee
Fsdp
- [fsdp] impl save/load shard model/optimizer (#5357) by QinLuo
Extension
- [extension] hotfix jit extension setup (#5402) by Hongxin Liu
Llama
- [llama] fix training and inference scripts (#5384) by Hongxin Liu
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.5...v0.3.6
Published by github-actions[bot] almost 2 years ago
https://github.com/hpcaitech/colossalai - Version v0.3.5 Release Today!
What's Changed
Release
- [release] update version (#5380) by Hongxin Liu
Llama
- Merge pull request #5377 from hpcaitech/example/llama-npu by Frank Lee
- [llama] fix memory issue (#5371) by Hongxin Liu
- [llama] polish training script and fix optim ckpt (#5368) by Hongxin Liu
- [llama] fix neftune & pbar with start_step (#5364) by Camille Zhong
- [llama] add flash attn patch for npu (#5362) by Hongxin Liu
- [llama] update training script (#5360) by Hongxin Liu
- [llama] fix dataloader for hybrid parallel (#5358) by Hongxin Liu
Moe
- [moe] fix tests by ver217
- [moe] fix mixtral optim checkpoint (#5344) by Hongxin Liu
- [moe] fix mixtral forward default value (#5329) by Hongxin Liu
- [moe] fix mixtral checkpoint io (#5314) by Hongxin Liu
- [moe] support mixtral (#5309) by Hongxin Liu
- [moe] update capacity computing (#5253) by Hongxin Liu
- [moe] init mixtral impl by Xuanlei Zhao
- [moe]: fix ep/tp tests, add hierarchical all2all (#4982) by Wenhao Chen
- [moe] support optimizer checkpoint (#5015) by Xuanlei Zhao
- [moe] merge moe into main (#4978) by Xuanlei Zhao
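PR #5253 above updates capacity computing for MoE routing. As a generic sketch of what expert-capacity computation involves (the usual fair-share-times-slack rule, not necessarily the PR's exact formula):

```python
import math

def expert_capacity(num_tokens: int, num_experts: int,
                    capacity_factor: float = 1.25, min_capacity: int = 4) -> int:
    """Generic MoE capacity rule: each expert accepts roughly its fair
    share of the routed tokens, scaled by a slack factor, with a floor so
    tiny batches still route. Tokens beyond an expert's capacity are
    typically dropped or rerouted by the router."""
    cap = math.ceil(num_tokens / num_experts * capacity_factor)
    return max(cap, min_capacity)
```

For example, 1024 tokens over 8 experts with a 1.25 factor gives each expert room for 160 tokens.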
Lr-scheduler
- [lr-scheduler] fix load state dict and add test (#5369) by Hongxin Liu
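The lr-scheduler fix in #5369 concerns state-dict round-trips when resuming. A minimal scheduler sketch showing the warmup-plus-cosine math and why `state_dict`/`load_state_dict` must restore `last_epoch` (this is an illustrative class, not ColossalAI's CosineAnnealingWarmupLR):

```python
import math

class CosineWarmupLR:
    """Minimal scheduler sketch: linear warmup then cosine decay."""
    def __init__(self, base_lr, warmup_steps, total_steps, last_epoch=-1):
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        self.total_steps = total_steps
        self.last_epoch = last_epoch

    def get_lr(self):
        step = max(self.last_epoch, 0)
        if step < self.warmup_steps:
            return self.base_lr * (step + 1) / self.warmup_steps
        progress = (step - self.warmup_steps) / max(1, self.total_steps - self.warmup_steps)
        return 0.5 * self.base_lr * (1 + math.cos(math.pi * min(progress, 1.0)))

    def step(self):
        self.last_epoch += 1

    def state_dict(self):
        return {"last_epoch": self.last_epoch}

    def load_state_dict(self, state):
        # Restoring last_epoch is what makes resume-from-checkpoint
        # continue the schedule instead of restarting the warmup.
        self.last_epoch = state["last_epoch"]
```

A restored scheduler must report the same learning rate as the one it was saved from.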
Eval
- [eval] update llama npu eval (#5366) by Camille Zhong
Gemini
- [gemini] fix param op hook when output is tuple (#5355) by Hongxin Liu
- [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) by flybird11111
- [gemini]fix gemini optimzer, saving Shardformer in Gemini got list assignment index out of range (#5085) by flybird11111
- [gemini] gemini support extra-dp (#5043) by flybird11111
- [gemini] gemini support tensor parallelism. (#4942) by flybird11111
Fix
- [fix] remove unnecessary dp_size assert (#5351) by Wenhao Chen
Checkpointio
- [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu
Chat
- [Chat] fix sft loss nan (#5345) by YeAnbang
Extension
- [extension] fixed exception catch (#5342) by Frank Lee
Doc
- [doc] added docs for extensions (#5324) by Frank Lee
- [doc] add llama2-13B display (#5285) by Desperado-Jia
- [doc] fix doc typo (#5256) by binmakeswell
- [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) by digger yu
- [doc] SwiftInfer release (#5236) by binmakeswell
- [doc] add Colossal-LLaMA-2-13B (#5234) by binmakeswell
- [doc] Make leaderboard format more uniform and good-looking (#5231) by JIMMY ZHAO
- [doc] Update README.md of Colossal-LLAMA2 (#5233) by Camille Zhong
- [doc] Update required third-party library list for testing and torch compatibility checking (#5207) by Zhongkai Zhao
- [doc] update pytorch version in documents. (#5177) by flybird11111
- [doc] fix colossalqa document (#5146) by Michelle
- [doc] updated paper citation (#5131) by Frank Lee
- [doc] add moe news (#5128) by binmakeswell
Tests
- [tests] fix t5 test. (#5322) by flybird11111
Accelerator
- Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api by Frank Lee
- [accelerator] fixed npu api by FrankLeeeee
- [accelerator] init the accelerator module (#5129) by Frank Lee
Workflow
- [workflow] updated CI image (#5318) by Frank Lee
- [workflow] fixed oom tests (#5275) by Frank Lee
- [workflow] fixed incomplete bash command (#5272) by Frank Lee
- [workflow] fixed build CI (#5240) by Frank Lee
Feat
- [feat] refactored extension module (#5298) by Frank Lee
Nfc
- [NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228) by 李文军
- [nfc] fix typo colossalai/shardformer/ (#5133) by digger yu
- [nfc] fix typo change directoty to directory (#5111) by digger yu
- [nfc] fix typo and author name (#5089) by digger yu
- [nfc] fix typo in docs/ (#4972) by digger yu
Hotfix
- [hotfix] fix 3d plugin test (#5292) by Hongxin Liu
- [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) by Zhongkai Zhao
- [hotfix]: add pp sanity check and fix mbs arg (#5268) by Wenhao Chen
- [hotfix] removed unused flag (#5242) by Frank Lee
- [hotfix] fixed memory usage of shardformer module replacement (#5122) by アマデウス
- [Hotfix] Fix model policy matching strategy in ShardFormer (#5064) by Zhongkai Zhao
- [hotfix]: modify create_ep_hierarchical_group and add test (#5032) by Wenhao Chen
- [hotfix] Support extra_kwargs in ShardConfig (#5031) by Zhongkai Zhao
- [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) by littsk
- [hotfix] fix grad accumulation plus clipping for gemini (#5002) by Baizhou Zhang
Sync
- Merge pull request #5278 from ver217/sync/npu by Frank Lee
Shardformer
- [shardformer] hybridparallelplugin support gradients accumulation. (#5246) by flybird11111
- [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
- [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
- [shardformer]fix flash attention, when mask is causal, just don't unpad it (#5084) by flybird11111
- [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
- [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao
Ci
- [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) by flybird11111
- [ci] fix shardformer tests. (#5255) by flybird11111
- [ci] fixed ddp test (#5254) by Frank Lee
- [ci] fixed booster test (#5251) by Frank Lee
Npu
- [npu] change device to accelerator api (#5239) by Hongxin Liu
- [npu] use extension for op builder (#5172) by Xuanlei Zhao
- [npu] support triangle attention for llama (#5130) by Xuanlei Zhao
- [npu] add npu support for hybrid plugin and llama (#5090) by Xuanlei Zhao
- [npu] add npu support for gemini and zero (#5067) by Hongxin Liu
Pipeline
- [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
- [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
- [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
- [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen
Format
- [format] applied code formatting on changed files in pull request 5234 (#5235) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 5115 (#5118) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 5124 (#5125) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 5088 (#5127) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 5067 (#5072) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4926 (#5007) by github-actions[bot]
Colossal-llama-2
- [Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224) by Tong Li
- [Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878) by Yuanchen
Devops
- [devops] update torch version in ci (#5217) by Hongxin Liu
Colossaleval
- [ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169) by Yuanchen
Colossalqa
- [colossalqa] fix pangu api (#5170) by Michelle
- [ColossalQA] refactor server and webui & add new feature (#5138) by Michelle
Plugin
- [plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) by flybird11111
Feature
- [FEATURE] Add Safety Eval Datasets to ColossalEval (#5095) by Zian(Andy) Zheng
- [Feature] Add document retrieval QA (#5020) by YeAnbang
Inference
- [inference] refactor examples and fix schedule (#5077) by Hongxin Liu
- [inference] update examples and engine (#5073) by Xu Kai
- [inference] Refactor inference architecture (#5057) by Xu Kai
- [Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) by Jianghai
Hotfix/hybridengine
- [hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) by Bin Jia
- [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) by Bin Jia
Misc
- [misc] remove outdated submodule (#5070) by Hongxin Liu
- [misc] add code owners (#5024) by Hongxin Liu
Kernels
- [Kernels] added flash-decoding of triton (#5063) by Cuiqing Li (李崔卿)
- [Kernels]Update triton kernels into 2.1.0 (#5046) by Cuiqing Li (李崔卿)
Example
- [example] fix llama example's loss error when using gemini plugin (#5060) by flybird11111
Pipeline,shardformer
- [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when strict=False, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) by Elsa Granger
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.4...v0.3.5
Published by github-actions[bot] almost 2 years ago
https://github.com/hpcaitech/colossalai - Version v0.3.4 Release Today!
What's Changed
Release
- [release] update version (#4995) by Hongxin Liu
Pipeline inference
- [Pipeline Inference] Merge pp with tp (#4993) by Bin Jia
- [Pipeline inference] Combine kvcache with pipeline inference (#4938) by Bin Jia
- [Pipeline Inference] Sync pipeline inference branch to main (#4820) by Bin Jia
Doc
- [doc] add supported feature diagram for hybrid parallel plugin (#4996) by ppt0011
- [doc]Update doc for colossal-inference (#4989) by Cuiqing Li (李崔卿)
- Merge pull request #4889 from ppt0011/main by ppt0011
- [doc] add reminder for issue encountered with hybrid adam by ppt0011
- [doc] update advanced tutorials, training gpt with hybrid parallelism (#4866) by flybird11111
- Merge pull request #4858 from Shawlleyw/main by ppt0011
- [doc] update slack link (#4823) by binmakeswell
- [doc] add lazy init docs (#4808) by Hongxin Liu
- Merge pull request #4805 from TongLi3701/docs/fix by Desperado-Jia
- [doc] polish shardformer doc (#4779) by Baizhou Zhang
- [doc] add llama2 domain-specific solution news (#4789) by binmakeswell
Hotfix
- [hotfix] fix the bug of repeatedly storing param group (#4951) by Baizhou Zhang
- [hotfix] Fix the bug where process groups were not being properly released. (#4940) by littsk
- [hotfix] fix torch 2.0 compatibility (#4936) by Hongxin Liu
- [hotfix] fix lr scheduler bug in torch 2.0 (#4864) by Baizhou Zhang
- [hotfix] fix bug in sequence parallel test (#4887) by littsk
- [hotfix] Correct several erroneous code comments (#4794) by littsk
- [hotfix] fix norm type error in zero optimizer (#4795) by littsk
- [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800) by Chandler-Bing
Kernels
- [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965) by Cuiqing Li
Inference
- [Inference] Dynamic Batching Inference, online and offline (#4953) by Jianghai
- [Inference]ADD Bench Chatglm2 script (#4963) by Jianghai
- [inference] add reference and fix some bugs (#4937) by Xu Kai
- [inference] Add smoothquant for llama (#4904) by Xu Kai
- [inference] add llama2 support (#4898) by Xu Kai
- [inference]fix import bug and delete down useless init (#4830) by Jianghai
Test
- [test] merge old components to test to model zoo (#4945) by Hongxin Liu
- [test] add no master test for low level zero plugin (#4934) by Zhongkai Zhao
- Merge pull request #4856 from KKZ20/test/modelsupportforlowlevel_zero by ppt0011
- [test] modify model supporting part of low level zero plugin (including corresponding docs) by Zhongkai Zhao
Refactor
- [Refactor] Integrated some lightllm kernels into token-attention (#4946) by Cuiqing Li
Nfc
- [nfc] fix some typo with colossalai/ docs/ etc. (#4920) by digger yu
- [nfc] fix minor typo in README (#4846) by Blagoy Simandoff
- [NFC] polish code style (#4799) by Camille Zhong
- [NFC] polish colossalai/inference/quant/gptq/cai_gptq/__init__.py code style (#4792) by Michelle
Format
- [format] applied code formatting on changed files in pull request 4820 (#4886) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4908 (#4918) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4595 (#4602) by github-actions[bot]
Gemini
- [gemini] support gradient accumulation (#4869) by Baizhou Zhang
- [gemini] support amp o3 for gemini (#4872) by Hongxin Liu
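PR #4869 above adds gradient accumulation support to Gemini. As a toy, framework-free sketch of the control flow such plugins have to get right (plain numbers stand in for tensors; this is not Gemini's internals):

```python
def train_with_accumulation(grads_per_microbatch, accumulation_steps, lr=0.1):
    """Toy gradient accumulation: sum per-microbatch gradients and apply
    one optimizer step per `accumulation_steps` microbatches, scaling
    gradients so the update matches a single large-batch step."""
    param, grad_buffer, steps = 1.0, 0.0, 0
    for i, g in enumerate(grads_per_microbatch):
        grad_buffer += g / accumulation_steps   # scale to match the full batch
        if (i + 1) % accumulation_steps == 0:
            param -= lr * grad_buffer           # one real optimizer step
            grad_buffer = 0.0                   # zero grads between steps
            steps += 1
    return param, steps
```

Four microbatches with `accumulation_steps=2` thus yield exactly two optimizer steps.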
Kernel
- [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) by Hongxin Liu
Feature
- [feature] support no master weights option for low level zero plugin (#4816) by Zhongkai Zhao
- [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) by littsk
- [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786) by Yuanchen
Checkpointio
- [checkpointio] hotfix torch 2.0 compatibility (#4824) by Hongxin Liu
- [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774) by Baizhou Zhang
Infer
- [infer] fix test bug (#4838) by Xu Kai
- [Infer] Serving example w/ ray-serve (multiple GPU case) (#4841) by Yuanheng Zhao
- [Infer] Colossal-Inference serving example w/ TorchServe (single GPU case) (#4771) by Yuanheng Zhao
Chat
- [chat] fix gemini strategy (#4698) by flybird11111
Misc
- [misc] add last_epoch in CosineAnnealingWarmupLR (#4778) by Yan haixu
Lazy
- [lazy] support from_pretrained (#4801) by Hongxin Liu
Fix
- [fix] fix weekly running example (#4787) by flybird11111
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.3...v0.3.4
Published by github-actions[bot] over 2 years ago
https://github.com/hpcaitech/colossalai - Version v0.3.3 Release Today!
What's Changed
Release
- [release] update version (#4775) by Hongxin Liu
Inference
- [inference] chatglm2 infer demo (#4724) by Jianghai
Feature
- [feature] add gptq for inference (#4754) by Xu Kai
- [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) by Cuiqing Li
Bug
- [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) by littsk
- [bug] fix get_default_parser in examples (#4764) by Baizhou Zhang
Lazy
- [lazy] support torch 2.0 (#4763) by Hongxin Liu
Chat
- [chat]: add lora merge weights config (#4766) by Wenhao Chen
- [chat]: update rm, add wandb and fix bugs (#4471) by Wenhao Chen
Doc
- [doc] add shardformer doc to sidebar (#4768) by Baizhou Zhang
- [doc] clean up outdated docs (#4765) by Hongxin Liu
- Merge pull request #4757 from ppt0011/main by ppt0011
- [doc] put native colossalai plugins first in description section by Pengtai Xu
- [doc] add model examples for each plugin by Pengtai Xu
- [doc] put individual plugin explanation in front by Pengtai Xu
- [doc] explain suitable use case for each plugin by Pengtai Xu
- [doc] explanation of loading large pretrained models (#4741) by Baizhou Zhang
- [doc] polish shardformer doc (#4735) by Baizhou Zhang
- [doc] add shardformer support matrix/update tensor parallel documents (#4728) by Baizhou Zhang
- [doc] Add user document for Shardformer (#4702) by Baizhou Zhang
- [doc] fix llama2 code link (#4726) by binmakeswell
- [doc] add potential solution for OOM in llama2 example (#4699) by Baizhou Zhang
- [doc] Update booster user documents. (#4669) by Baizhou Zhang
Shardformer
- [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) by Baizhou Zhang
- [shardformer] add custom policy in hybrid parallel plugin (#4718) by Xuanlei Zhao
- [shardformer] update seq parallel document (#4730) by Bin Jia
- [shardformer] update pipeline parallel document (#4725) by flybird11111
- [shardformer] to fix whisper test failed due to significant accuracy differences. (#4710) by flybird11111
- [shardformer] fix GPT2DoubleHeadsModel (#4703) by flybird11111
- [shardformer] update shardformer readme (#4689) by flybird11111
- [shardformer]fix gpt2 double head (#4663) by flybird11111
- [shardformer] update llama2/opt finetune example and fix llama2 policy (#4645) by flybird11111
- [shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624) by eric8607242
Misc
- [misc] update pre-commit and run all files (#4752) by Hongxin Liu
Format
- [format] applied code formatting on changed files in pull request 4743 (#4750) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4726 (#4727) by github-actions[bot]
Legacy
- [legacy] clean up legacy code (#4743) by Hongxin Liu
- Merge pull request #4738 from ppt0011/main by ppt0011
- [legacy] remove deterministic data loader test by Pengtai Xu
- [legacy] move communication and nn to legacy and refactor logger (#4671) by Hongxin Liu
Kernel
- [kernel] update triton init #4740 (#4740) by Xuanlei Zhao
Example
- [example] llama2 add fine-tune example (#4673) by flybird11111
- [example] add gpt2 HybridParallelPlugin example (#4653) by Bin Jia
- [example] update vit example for hybrid parallel plugin (#4641) by Baizhou Zhang
Hotfix
- [hotfix] Fix import error: colossal.kernel without triton installed (#4722) by Yuanheng Zhao
- [hotfix] fix typo in hybrid parallel io (#4697) by Baizhou Zhang
Devops
- [devops] fix concurrency group (#4667) by Hongxin Liu
- [devops] fix concurrency group and compatibility test (#4665) by Hongxin Liu
Pipeline
- [pipeline] set optimizer to optional in execute_pipeline (#4630) by Baizhou Zhang
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.2...v0.3.3
Published by github-actions[bot] over 2 years ago
https://github.com/hpcaitech/colossalai - Version v0.3.2 Release Today!
What's Changed
Release
- [release] update version (#4623) by Hongxin Liu
Shardformer
- Merge pull request #4612 from hpcaitech/feature/shardformer by Hongxin Liu
- [shardformer] update shardformer readme (#4617) by flybird11111
- [shardformer] Add overlap optional for HybridParallelPlugin (#4615) by Bin Jia
- [shardformer] update bert finetune example with HybridParallelPlugin (#4584) by flybird11111
- [shardformer] Pytree fix (#4533) by Jianghai
- [shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) by Baizhou Zhang
- [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) by Baizhou Zhang
- [shardformer] fix submodule replacement bug when enabling pp (#4544) by Baizhou Zhang
- [shardformer] support pp+tp+zero1 tests (#4531) by flybird11111
- [shardformer] fix opt test hanging (#4521) by flybird11111
- [shardformer] Add overlap support for gpt2 (#4535) by Bin Jia
- [shardformer] fix emerged bugs after updating transformers (#4526) by Baizhou Zhang
- [shardformer] zero1+pp and the corresponding tests (#4517) by Jianghai
- [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) by Baizhou Zhang
- [shardformer] opt fix. (#4514) by flybird11111
- [shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498) by flybird11111
- [shardformer] tests for 3d parallel (#4493) by Jianghai
- [shardformer] chatglm support sequence parallel (#4482) by flybird11111
- [shardformer] support tp+zero for shardformer (#4472) by Baizhou Zhang
- [shardformer] Pipeline/whisper (#4456) by Jianghai
- [shardformer] bert support sequence parallel. (#4455) by flybird11111
- [shardformer] bloom support sequence parallel (#4465) by flybird11111
- [shardformer] support interleaved pipeline (#4448) by LuGY
- [shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446) by Baizhou Zhang
- [shardformer] fix import by ver217
- [shardformer] fix embedding by ver217
- [shardformer] update bloom/llama/vit/chatglm tests (#4420) by flybird11111
- [shardformer]update t5 tests for using all optimizations. (#4407) by flybird11111
- [shardformer] update tests for all optimization (#4413) by flybird11111
- [shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395) by Baizhou Zhang
- [shardformer]fix, test gpt2 for AMP+TP (#4403) by flybird11111
- [shardformer] test all optimizations (#4399) by flybird1111
- [shardformer] update shardformer to use flash attention 2 (#4392) by flybird1111
- [Shardformer] Merge flash attention branch to pipeline branch (#4362) by flybird1111
- [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366) by Baizhou Zhang
- [shardformer] support Blip2 (#4243) by FoolPlayer
- [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit by klhhhhh
- [shardformer] pre-commit check files by klhhhhh
- [shardformer] register without auto policy by klhhhhh
- [shardformer] ChatGLM support layernorm sharding by klhhhhh
- [shardformer] delete some file by klhhhhh
- [shardformer] support chatglm without layernorm by klhhhhh
- [shardformer] polish code by klhhhhh
- [shardformer] polish chatglm code by klhhhhh
- [shardformer] add test kit in model zoo for chatglm by klhhhhh
- [shardformer] vit test finish and support by klhhhhh
- [shardformer] added tests by klhhhhh
- Feature/chatglm (#4240) by Kun Lin
- [shardformer] support whisper (#4212) by FoolPlayer
- [shardformer] support SAM (#4231) by FoolPlayer
- Feature/vit support (#4182) by Kun Lin
- [shardformer] support pipeline base vit model (#4284) by FoolPlayer
- [shardformer] support inplace sharding (#4251) by Hongxin Liu
- [shardformer] fix base policy (#4229) by Hongxin Liu
- [shardformer] support lazy init (#4202) by Hongxin Liu
- [shardformer] fix type hint by ver217
- [shardformer] rename policy file name by ver217
Legacy
- [legacy] move builder and registry to legacy (#4603) by Hongxin Liu
- [legacy] move engine to legacy (#4560) by Hongxin Liu
- [legacy] move trainer to legacy (#4545) by Hongxin Liu
Test
- [test] fix gemini checkpoint and gpt test (#4620) by Hongxin Liu
- [test] ignore gpt2 shardformer test (#4619) by Hongxin Liu
- [test] Hotfix/fix some model test and refactor check util api (#4369) by Bin Jia
- [test] skip some not compatible models by FoolPlayer
- [test] add shard util tests by ver217
- [test] update shardformer tests by ver217
- [test] remove useless tests (#4359) by Hongxin Liu
Zero
- [zero] hotfix master param sync (#4618) by Hongxin Liu
- [zero]fix zero ckptIO with offload (#4529) by LuGY
- [zero]support zero2 with gradient accumulation (#4511) by LuGY
Checkpointio
- [checkpointio] support huggingface from_pretrained for all plugins (#4606) by Baizhou Zhang
- [checkpointio] optimize zero optim checkpoint io (#4591) by Hongxin Liu
Coati
- Merge pull request #4542 from hpcaitech/chatglm by yingliu-hpc
- Merge pull request #4541 from ver217/coati/chatglm by yingliu-hpc
- [coati] update ci by ver217
- [coati] add chatglm model (#4539) by yingliu-hpc
Doc
- [doc] add llama2 benchmark (#4604) by binmakeswell
- [DOC] hotfix/llama2news (#4595) by binmakeswell
- [doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430) by Tian Siyuan
- [doc] update Coati README (#4405) by Wenhao Chen
- [doc] add Series A Funding and NeurIPS news (#4377) by binmakeswell
- [doc] Fix gradient accumulation doc. (#4349) by flybird1111
Pipeline
- [pipeline] 1f1b schedule receive microbatch size (#4589) by Hongxin Liu
- [pipeline] rewrite bert tests and fix some bugs (#4409) by Jianghai
- [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) by Baizhou Zhang
- [pipeline] add chatglm (#4363) by Jianghai
- [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) by Baizhou Zhang
- [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) by Jianghai
- [pipeline] add unit test for 1f1b (#4303) by LuGY
- [pipeline] fix return_dict/fix pure_pipeline_test (#4331) by Baizhou Zhang
- [pipeline] add pipeline support for all T5 models (#4310) by Baizhou Zhang
- [pipeline] test pure pipeline process using llama (#4218) by Jianghai
- [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) by Baizhou Zhang
- [pipeline] reformat for unified design (#4283) by Jianghai
- [pipeline] OPT model pipeline (#4258) by Jianghai
- [pipeline] refactor gpt2 pipeline forwards (#4287) by Baizhou Zhang
- [pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) by Baizhou Zhang
- [pipeline] finish bloom models pipeline and tests (#4223) by Jianghai
- [pipeline] All bert models (#4233) by Jianghai
- [pipeline] add pipeline forward for variants of gpt2 (#4238) by Baizhou Zhang
- [pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224) by Baizhou Zhang
- [pipeline] add bloom model pipeline (#4210) by Jianghai
- [pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) by Jianghai
- [pipeline] Llama pipeline (#4205) by Jianghai
- [pipeline] Bert pipeline for shardformer and its tests (#4197) by Jianghai
- [pipeline] move bert related pipeline components to shardformer (#4187) by Jianghai
- [pipeline] add bertforpretraining bert_lmhead forward and policy (#4172) by Jianghai
- [pipeline] update shardformer docstring by ver217
- [pipeline] update shardformer policy by ver217
- [pipeline] build bloom model and policy , revise the base class of policy (#4161) by Jianghai
- [pipeline]add pipeline policy and bert forward (#4130) by Jianghai
- [pipeline] add stage manager (#4093) by Hongxin Liu
- [pipeline] refactor 1f1b schedule (#4115) by Hongxin Liu
- [pipeline] implement p2p communication (#4100) by Hongxin Liu
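Several of the pipeline PRs above (e.g. the 1F1B schedule refactor in #4115) revolve around the 1F1B schedule. A generic sketch of how a stage's microbatches split into its three phases (a textbook description of 1F1B, not ColossalAI's scheduler code):

```python
def one_f_one_b_phases(num_stages: int, stage_id: int, num_microbatches: int):
    """Split a pipeline stage's work into the three 1F1B phases.

    Warmup: forward-only passes until the pipeline is full; steady state:
    alternate one forward with one backward; cooldown: drain the
    remaining backward passes. Earlier stages need more warmup because
    their activations must travel further before a backward returns.
    """
    warmup = min(num_stages - stage_id - 1, num_microbatches)
    steady = num_microbatches - warmup   # 1F1B forward/backward pairs
    cooldown = warmup                    # backwards left to drain
    return warmup, steady, cooldown
```

With 4 stages and 8 microbatches, stage 0 runs 3 warmup forwards while the last stage starts pairing immediately.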
Fix
- [Fix] Fix compile error (#4357) by Mashiro
- [fix] coloattention support flash attention 2 (#4347) by flybird1111
Devops
- [devops] cancel previous runs in the PR (#4546) by Hongxin Liu
- [devops] add large-scale distributed test marker (#4452) by Hongxin Liu
Example
- [example] change accelerate version (#4431) by Tian Siyuan
- [example] update streamlit 0.73.1 to 1.11.1 (#4386) by ChengDaqi2023
- [example] add llama2 example (#4527) by Hongxin Liu
Shardformer/fix overlap bug
- [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516) by Bin Jia
Format
- [format] applied code formatting on changed files in pull request 4479 (#4504) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4441 (#4445) by github-actions[bot]
Gemini
- [gemini] improve compatibility and add static placement policy (#4479) by Hongxin Liu
- [gemini] fix tensor storage cleaning in state dict collection (#4396) by Baizhou Zhang
Shardformer/sequence parallel
- [shardformer/sequence parallel] not support opt of seq-parallel, add warning and fix a bug in gpt2 pp (#4488) by Bin Jia
- [shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460) by Bin Jia
- [shardformer/sequence parallel] Cherry pick commit to new branch (#4450) by Bin Jia
Chat
- [chat] update config and prompt (#4139) by Michelle
- [chat] fix bugs and add unit tests (#4213) by Wenhao Chen
Misc
- [misc] update requirements by ver217
- [misc] resolve code factor issues (#4433) by Hongxin Liu
Sharformer
- [sharformer] add first version of policy of chatglm by klhhhhh
Hotfix
- [hotfix] fix gemini and zero test (#4333) by Hongxin Liu
- [hotfix] fix opt pipeline (#4293) by Jianghai
- [hotfix] fix unsafe async comm in zero (#4404) by LuGY
- [hotfix] update gradio 3.11 to 3.34.0 (#4329) by caption
Plugin
- [plugin] add 3d parallel plugin (#4295) by Hongxin Liu
Bugs
- [bugs] hot fix some testing bugs for new models (#4268) by Jianghai
Cluster
- [cluster] add process group mesh (#4039) by Hongxin Liu
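The process group mesh in #4039 lays global ranks out on an n-D grid (e.g. data-parallel by tensor-parallel axes). A toy sketch of that rank-to-coordinate bookkeeping (real implementations also create the communicator groups along each axis):

```python
class ProcessGroupMesh:
    """Toy rank mesh: map global ranks to coordinates on an n-D grid
    and back. Row-major layout, so the last axis varies fastest (e.g.
    tensor-parallel ranks land on adjacent devices)."""
    def __init__(self, *sizes):
        self.sizes = sizes

    def coord(self, rank):
        coords = []
        for size in reversed(self.sizes):
            rank, r = divmod(rank, size)
            coords.append(r)
        return tuple(reversed(coords))

    def rank(self, *coords):
        rank = 0
        for size, c in zip(self.sizes, coords):
            rank = rank * size + c
        return rank
```

On a 2x4 mesh (dp=2, tp=4), global rank 5 sits at coordinate (1, 1), and the mapping round-trips for every rank.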
Kernel
- [kernel] updated unittests for coloattention (#4389) by flybird1111
Coloattention
- [coloattention] fix import error (#4380) by flybird1111
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.1...v0.3.2
Published by github-actions[bot] over 2 years ago
https://github.com/hpcaitech/colossalai - Version v0.3.1 Release Today!
What's Changed
Release
- [release] update version (#4332) by Hongxin Liu
Chat
- [chat] fix compute_approx_kl (#4338) by Wenhao Chen
- [chat] removed cache file (#4155) by Frank Lee
- [chat] use official transformers and fix some issues (#4117) by Wenhao Chen
- [chat] remove naive strategy and split colossalai strategy (#4094) by Wenhao Chen
- [chat] refactor trainer class (#4080) by Wenhao Chen
- [chat]: fix chat evaluation possible bug (#4064) by Michelle
- [chat] refactor strategy class with booster api (#3987) by Wenhao Chen
- [chat] refactor actor class (#3968) by Wenhao Chen
- [chat] add distributed PPO trainer (#3740) by Hongxin Liu
Zero
- [zero] optimize the optimizer step time (#4221) by LuGY
- [zero] support shard optimizer state dict of zero (#4194) by LuGY
- [zero] add state dict for low level zero (#4179) by LuGY
- [zero] allow passing process group to zero12 (#4153) by LuGY
- [zero]support no_sync method for zero1 plugin (#4138) by LuGY
- [zero] refactor low level zero for shard evenly (#4030) by LuGY
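The "shard evenly" refactor in #4030 partitions flat parameter buffers into equal per-rank shards. A sketch of that partitioning idea, with tail padding so every rank holds the same element count (which keeps collectives like reduce-scatter simple); this is illustrative, not the actual low-level ZeRO code:

```python
def shard_evenly(numel: int, world_size: int):
    """Split a flat buffer of `numel` elements into equal-sized per-rank
    shards. Returns (per-rank shard size, number of padding elements,
    list of [start, end) ranges into the real data)."""
    padded = -(-numel // world_size) * world_size   # round up to a multiple
    per_rank = padded // world_size
    shards = [(r * per_rank, min((r + 1) * per_rank, numel))
              for r in range(world_size)]
    return per_rank, padded - numel, shards
```

For example, 10 elements over 4 ranks pad out to 12, giving each rank a 3-element shard with the last rank holding 2 padding slots.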
Nfc
- [NFC] polish applications/Chat/coati/models/utils.py codestyle (#4277) by yuxuan-lou
- [NFC] polish applications/Chat/coati/trainer/strategies/base.py code style (#4278) by Zirui Zhu
- [NFC] polish applications/Chat/coati/models/generation.py code style (#4275) by RichardoLuo
- [NFC] polish applications/Chat/inference/server.py code style (#4274) by Yuanchen
- [NFC] fix format of application/Chat/coati/trainer/utils.py (#4273) by アマデウス
- [NFC] polish applications/Chat/examples/trainrewardmodel.py code style (#4271) by Xu Kai
- [NFC] fix: format (#4270) by dayellow
- [NFC] polish runtime_preparation_pass style (#4266) by Wenhao Chen
- [NFC] polish unary_elementwise_generator.py code style (#4267) by YeAnbang
- [NFC] polish applications/Chat/coati/trainer/base.py code style (#4260) by shenggan
- [NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (#4259) by Zheng Zangwei (Alex Zheng)
- [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) by 梁爽
- [NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255) by Yanjia0
- [NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) by ocdwithnaming
- [NFC] polish applications/Chat/examples/ray/mmmt_prompt.py code style (#4250) by CZYCW
- [NFC] polish applications/Chat/coati/models/base/actor.py code style (#4248) by Junming Wu
- [NFC] polish applications/Chat/inference/requirements.txt code style (#4265) by Camille Zhong
- [NFC] Fix format for mixed precision (#4253) by Jianghai
- [nfc]fix ColossalaiOptimizer is not defined (#4122) by digger yu
- [nfc] fix dim not defined and fix typo (#3991) by digger yu
- [nfc] fix typo colossalai/zero (#3923) by digger yu
- [nfc]fix typo colossalai/pipeline tensor nn (#3899) by digger yu
- [nfc] fix typo colossalai/nn (#3887) by digger yu
- [nfc] fix typo colossalai/cli fx kernel (#3847) by digger yu
Example
- Fix/format (#4261) by Michelle
- [example] add llama pretraining (#4257) by binmakeswell
- [example] fix bucket size in example of gpt gemini (#4028) by LuGY
- [example] update ViT example using booster api (#3940) by Baizhou Zhang
- Merge pull request #3905 from MaruyamaAya/dreambooth by Liu Ziming
- [example] update opt example using booster api (#3918) by Baizhou Zhang
- [example] Modify palm example with the new booster API (#3913) by Liu Ziming
- [example] update gemini examples (#3868) by jiangmingyan
Ci
- [ci] support testmon core pkg change detection (#4305) by Hongxin Liu
Checkpointio
- [checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302) by Baizhou Zhang
- Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141) by Baizhou Zhang
- [checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) by Baizhou Zhang
- [checkpointio] General Checkpointing of Sharded Optimizers (#3984) by Baizhou Zhang
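The sharded-optimizer checkpointing entries above follow the general sharded-checkpoint scheme: split the flat state dict into size-capped shard files and write an index mapping each key to the shard that holds it. A simplified sketch under assumed names (the actual Colossal-AI file layout and on-disk format may differ):

```python
def shard_state_dict(state_dict, max_shard_size):
    """Split a flat state dict into size-capped shards plus an index
    mapping each key to its shard file (illustrative only)."""
    shards, current, current_size = [], {}, 0
    for key, tensor_bytes in state_dict.items():
        size = len(tensor_bytes)
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = {}, 0
        current[key] = tensor_bytes
        current_size += size
    if current:
        shards.append(current)
    index = {
        key: f"optim-{i + 1:05d}-of-{len(shards):05d}.bin"
        for i, shard in enumerate(shards)
        for key in shard
    }
    return shards, index

states = {"w1": b"x" * 6, "w2": b"x" * 6, "b1": b"x" * 2}
shards, index = shard_state_dict(states, max_shard_size=8)
print(len(shards), index["b1"])  # 2 shards; "b1" lands in the second file
```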
Lazy
- [lazy] support init on cuda (#4269) by Hongxin Liu
- [lazy] fix compatibility problem on torch 1.13 (#3911) by Hongxin Liu
- [lazy] refactor lazy init (#3891) by Hongxin Liu
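The lazy-init refactor defers tensor construction so a huge model can be described without allocating its weights up front, then materialized later (e.g. directly on CUDA, per #4269). A toy illustration of the record-then-materialize idea, not the real torch-based implementation:

```python
class LazyTensor:
    """Records how to build a tensor instead of building it immediately,
    so construction cost is paid only on first use (illustrative only)."""
    def __init__(self, factory, *args, **kwargs):
        self._ctor = (factory, args, kwargs)
        self.materialized = None

    def materialize(self):
        # Build once, then cache the concrete value.
        if self.materialized is None:
            factory, args, kwargs = self._ctor
            self.materialized = factory(*args, **kwargs)
        return self.materialized

# "allocate" a large weight without paying for it up front
weight = LazyTensor(lambda n: [0.0] * n, 4)
assert weight.materialized is None   # nothing built yet
data = weight.materialize()          # built on first use
print(len(data))  # 4
```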
Kernels
- [Kernels] added triton-implemented of self attention for colossal-ai (#4241) by Cuiqing Li
Docker
- [docker] fixed ninja build command (#4203) by Frank Lee
- [docker] added ssh and rdma support for docker (#4192) by Frank Lee
Dtensor
- [dtensor] fixed readme file name and removed deprecated file (#4162) by Frank Lee
- [dtensor] updated api and doc (#3845) by Frank Lee
Workflow
- [workflow] show test duration (#4159) by Frank Lee
- [workflow] added status check for test coverage workflow (#4106) by Frank Lee
- [workflow] cover all public repositories in weekly report (#4069) by Frank Lee
- [workflow] fixed the directory check in build (#3980) by Frank Lee
- [workflow] cancel duplicated workflow jobs (#3960) by Frank Lee
- [workflow] added docker latest tag for release (#3920) by Frank Lee
- [workflow] fixed workflow check for docker build (#3849) by Frank Lee
Cli
- [cli] hotfix launch command for multi-nodes (#4165) by Hongxin Liu
Format
- [format] applied code formatting on changed files in pull request 4152 (#4157) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4021 (#4022) by github-actions[bot]
Shardformer
- [shardformer] added development protocol for standardization (#4149) by Frank Lee
- [shardformer] made tensor parallelism configurable (#4144) by Frank Lee
- [shardformer] refactored some doc and api (#4137) by Frank Lee
- [shardformer] write an shardformer example with bert finetuning (#4126) by jiangmingyan
- [shardformer] added embedding gradient check (#4124) by Frank Lee
- [shardformer] import huggingface implicitly (#4101) by Frank Lee
- [shardformer] integrate with data parallelism (#4103) by Frank Lee
- [shardformer] supported fused normalization (#4112) by Frank Lee
- [shardformer] supported bloom model (#4098) by Frank Lee
- [shardformer] support vision transformer (#4096) by Kun Lin
- [shardformer] shardformer support opt models (#4091) by jiangmingyan
- [shardformer] refactored layernorm (#4086) by Frank Lee
- [shardformer] Add layernorm (#4072) by FoolPlayer
- [shardformer] supported fused qkv checkpoint (#4073) by Frank Lee
- [shardformer] add linearconv1d test (#4067) by FoolPlayer
- [shardformer] support module saving and loading (#4062) by Frank Lee
- [shardformer] refactored the shardformer layer structure (#4053) by Frank Lee
- [shardformer] adapted T5 and LLaMa test to use kit (#4049) by Frank Lee
- [shardformer] add gpt2 test and layer class refactor (#4041) by FoolPlayer
- [shardformer] supported T5 and its variants (#4045) by Frank Lee
- [shardformer] adapted llama to the new API (#4036) by Frank Lee
- [shardformer] fix bert and gpt downstream with new api (#4024) by FoolPlayer
- [shardformer] updated doc (#4016) by Frank Lee
- [shardformer] removed inplace tensor sharding (#4018) by Frank Lee
- [shardformer] refactored embedding and dropout to parallel module (#4013) by Frank Lee
- [shardformer] integrated linear 1D with dtensor (#3996) by Frank Lee
- [shardformer] Refactor shardformer api (#4001) by FoolPlayer
- [shardformer] fix an error in readme (#3988) by FoolPlayer
- [Shardformer] Downstream bert (#3979) by FoolPlayer
- [shardformer] shardformer support t5 model (#3994) by wukong1992
- [shardformer] support llama model using shardformer (#3969) by wukong1992
- [shardformer] Add dropout layer in shard model and refactor policy api (#3949) by FoolPlayer
- [shardformer] Unit test (#3928) by FoolPlayer
- [shardformer] Align bert value (#3907) by FoolPlayer
- [shardformer] add gpt2 policy and modify shard and slicer to support (#3883) by FoolPlayer
- [shardformer] add Dropout layer support different dropout pattern (#3856) by FoolPlayer
- [shardformer] update readme with modules implement doc (#3834) by FoolPlayer
- [shardformer] refactored the user api (#3828) by Frank Lee
- [shardformer] updated readme (#3827) by Frank Lee
- [shardformer]: Feature/shardformer, add some docstring and readme (#3816) by FoolPlayer
- [shardformer] init shardformer code structure (#3731) by FoolPlayer
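Many of the Shardformer entries above (linear 1D with dtensor, fused qkv checkpoints, the gpt2/bert/llama policies) build on one core operation: partitioning a weight along one dimension across tensor-parallel ranks, then gathering the partial outputs. A pure-Python sketch of a column-parallel linear split across two hypothetical ranks:

```python
def matvec(weight_rows, x):
    # Plain matrix-vector product over a list-of-rows weight.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]

def shard_rows(weight, tp_size, rank):
    # Column-parallel linear: each rank keeps a contiguous block of output rows.
    per_rank = len(weight) // tp_size
    return weight[rank * per_rank:(rank + 1) * per_rank]

weight = [[1, 0], [0, 1], [2, 0], [0, 2]]  # 4 outputs, 2 inputs
x = [3, 5]
full = matvec(weight, x)
tp = [matvec(shard_rows(weight, 2, r), x) for r in range(2)]
gathered = tp[0] + tp[1]  # stands in for all-gather along the output dim
print(gathered == full)  # True
```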
Test
- [test] fixed tests failed due to dtensor change (#4082) by Frank Lee
- [test] fixed codefactor format report (#4026) by Frank Lee
Device
- [device] support init device mesh from process group (#3990) by Frank Lee
Hotfix
- [hotfix] fix import bug in checkpoint_io (#4142) by Baizhou Zhang
- [hotfix]fix argument naming in docs and examples (#4083) by Baizhou Zhang
Doc
- [doc] update and revise some typos and errs in docs (#4107) by Jianghai
- [doc] add a note about unit-testing to CONTRIBUTING.md (#3970) by Baizhou Zhang
- [doc] add lazy init tutorial (#3922) by Hongxin Liu
- [doc] fix docs about booster api usage (#3898) by Baizhou Zhang
- [doc]update moe chinese document. (#3890) by jiangmingyan
- [doc] update document of zero with chunk. (#3855) by jiangmingyan
- [doc] update nvme offload documents. (#3850) by jiangmingyan
Examples
- [examples] copy resnet example to image (#4090) by Jianghai
Testing
- [testing] move pytest to be inside the function (#4087) by Frank Lee
Gemini
- Merge pull request #4056 from Fridge003/hotfix/fixgeminichunkconfigsearching by Baizhou Zhang
- [gemini] fix argument naming during chunk configuration searching by Baizhou Zhang
- [gemini] fixed the gemini checkpoint io (#3934) by Frank Lee
Devops
- [devops] fix build on pr ci (#4043) by Hongxin Liu
- [devops] update torch version in compability test (#3919) by Hongxin Liu
- [devops] hotfix testmon cache clean logic (#3917) by Hongxin Liu
- [devops] hotfix CI about testmon cache (#3910) by Hongxin Liu
- [devops] improving testmon cache (#3902) by Hongxin Liu
Sync
- Merge pull request #4025 from hpcaitech/develop by Frank Lee
- Merge pull request #3967 from ver217/update-develop by Frank Lee
- Merge pull request #3942 from hpcaitech/revert-3931-sync/develop-to-shardformer by FoolPlayer
- Revert "[sync] sync feature/shardformer with develop" by Frank Lee
- Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer by FoolPlayer
- Merge pull request #3916 from FrankLeeeee/sync/dtensor-with-develop by Frank Lee
- Merge pull request #3915 from FrankLeeeee/update/develop by Frank Lee
Booster
- [booster] make optimizer argument optional for boost (#3993) by Wenhao Chen
- [booster] update bert example, using booster api (#3885) by wukong1992
Evaluate
- [evaluate] support gpt evaluation with reference (#3972) by Yuanchen
Feature
- Merge pull request #3926 from hpcaitech/feature/dtensor by Frank Lee
Bf16
- [bf16] add bf16 support (#3882) by Hongxin Liu
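bf16 keeps float32's sign bit and 8 exponent bits but only 7 mantissa bits, so conversion amounts to keeping the top 16 bits of the float32 encoding. An illustrative stdlib conversion (round-to-nearest-even on the dropped half; real frameworks do this in hardware or inside torch):

```python
import struct

def fp32_to_bf16_bits(x):
    """Round a Python float to bfloat16 by keeping the top 16 bits of its
    float32 encoding, rounding to nearest with ties to even."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    lower = bits & 0xFFFF
    upper = bits >> 16
    if lower > 0x8000 or (lower == 0x8000 and upper & 1):
        upper += 1  # carry may ripple into the exponent, which is correct
    return upper & 0xFFFF

def bf16_to_float(bits16):
    # Re-expand: bf16 is exactly representable as a float32 with zero low bits.
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]

print(bf16_to_float(fp32_to_bf16_bits(1.0)))        # 1.0
print(bf16_to_float(fp32_to_bf16_bits(3.1415926)))  # 3.140625
```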
Evaluation
- [evaluation] improvement on evaluation (#3862) by Yuanchen
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.0...v0.3.1
- Python
Published by github-actions[bot] over 2 years ago
https://github.com/hpcaitech/colossalai - Version v0.3.0 Release Today!
What's Changed
Release
- [release] bump to v0.3.0 (#3830) by Frank Lee
Nfc
- [nfc] fix typo colossalai/ applications/ (#3831) by digger yu
- [NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779) by digger yu
- [NFC] fix typo colossalai/amp auto_parallel autochunk (#3756) by digger yu
- [NFC] fix typo with colossalai/autoparallel/tensorshard (#3742) by digger yu
- [NFC] fix typo applications/ and colossalai/ (#3735) by digger-yu
- [NFC] polish colossalai/engine/gradient_handler/__init__.py code style (#3329) by Ofey Chan
- [NFC] polish colossalai/context/random/__init__.py code style (#3327) by yuxuan-lou
- [NFC] polish colossalai/fx/tracer/tracerutils.py (#3323) by Michelle
- [NFC] polish colossalai/gemini/paramhooks/paramhookmgr.py code style by Xu Kai
- [NFC] polish initializer_data.py code style (#3287) by RichardoLuo
- [NFC] polish colossalai/cli/benchmark/models.py code style (#3290) by Ziheng Qin
- [NFC] polish initializer_3d.py code style (#3279) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/engine/gradientaccumulation/gradient_accumulation.py code style (#3277) by Sze-qq
- [NFC] polish colossalai/context/parallel_context.py code style (#3276) by Arsmart1
- [NFC] polish colossalai/engine/schedule/pipelineschedule_v2.py code style (#3275) by Zirui Zhu
- [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) by Tong Li
- [NFC] polish colossalai/amp/__init__.py code style (#3272) by lucasliunju
- [NFC] polish code style (#3273) by Xuanlei Zhao
- [NFC] policy colossalai/fx/proxy.py code style (#3269) by CZYCW
- [NFC] polish code style (#3268) by Yuanchen
- [NFC] polish tensor_placement_policy.py code style (#3265) by Camille Zhong
- [NFC] polish colossalai/fx/passes/split_module.py code style (#3263) by CsRic
- [NFC] polish colossalai/global_variables.py code style (#3259) by jiangmingyan
- [NFC] polish colossalai/engine/gradienthandler/moegradienthandler.py (#3260) by LuGY
- [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256) by dayellow
Doc
- [doc] update document of gemini instruction. (#3842) by jiangmingyan
- Merge pull request #3810 from jiangmingyan/amp by jiangmingyan
- [doc]fix by jiangmingyan
- [doc]fix by jiangmingyan
- [doc] add warning about fsdp plugin (#3813) by Hongxin Liu
- [doc] add removed change of config.py by jiangmingyan
- [doc] add removed warning by jiangmingyan
- [doc] update amp document by Mingyan Jiang
- [doc] update amp document by Mingyan Jiang
- [doc] update amp document by Mingyan Jiang
- [doc] update gradient accumulation (#3771) by jiangmingyan
- [doc] update gradient cliping document (#3778) by jiangmingyan
- [doc] add deprecated warning on doc Basics section (#3754) by Yanjia0
- [doc] add booster docstring and fix autodoc (#3789) by Hongxin Liu
- [doc] add tutorial for booster checkpoint (#3785) by Hongxin Liu
- [doc] add tutorial for booster plugins (#3758) by Hongxin Liu
- [doc] add tutorial for cluster utils (#3763) by Hongxin Liu
- [doc] update hybrid parallelism doc (#3770) by jiangmingyan
- [doc] update booster tutorials (#3718) by jiangmingyan
- [doc] fix chat spelling error (#3671) by digger-yu
- [Doc] enhancement on README.md for chat examples (#3646) by Camille Zhong
- [doc] Fix typo under colossalai and doc(#3618) by digger-yu
- [doc] .github/workflows/README.md (#3605) by digger-yu
- [doc] fix setup.py typo (#3603) by digger-yu
- [doc] fix op_builder/README.md (#3597) by digger-yu
- [doc] Update .github/workflows/README.md (#3577) by digger-yu
- [doc] Update 1Dtensorparallel.md (#3573) by digger-yu
- [doc] Update 1Dtensorparallel.md (#3563) by digger-yu
- [doc] Update README.md (#3549) by digger-yu
- [doc] Update README-zh-Hans.md (#3541) by digger-yu
- [doc] hide diffusion in application path (#3519) by binmakeswell
- [doc] add requirement and highlight application (#3516) by binmakeswell
- [doc] Add docs for clip args in zero optim (#3504) by YH
- [doc] updated contributor list (#3474) by Frank Lee
- [doc] polish diffusion example (#3386) by Jan Roudaut
- [doc] add Intel cooperation news (#3333) by binmakeswell
- [doc] added authors to the chat application (#3307) by Fazzie-Maqianli
Workflow
- [workflow] supported test on CUDA 10.2 (#3841) by Frank Lee
- [workflow] fixed testmon cache in build CI (#3806) by Frank Lee
- [workflow] changed to doc build to be on schedule and release (#3825) by Frank Lee
- [workflow] enblaed doc build from a forked repo (#3815) by Frank Lee
- [workflow] enable testing for develop & feature branch (#3801) by Frank Lee
- [workflow] fixed the docker build workflow (#3794) by Frank Lee
Booster
- [booster] add warning for torch fsdp plugin doc (#3833) by wukong1992
- [booster] torch fsdp fix ckpt (#3788) by wukong1992
- [booster] removed models that don't support fsdp (#3744) by wukong1992
- [booster] support torch fsdp plugin in booster (#3697) by wukong1992
- [booster] add tests for ddp and low level zero's checkpointio (#3715) by jiangmingyan
- [booster] fix no_sync method (#3709) by Hongxin Liu
- [booster] update prepare dataloader method for plugin (#3706) by Hongxin Liu
- [booster] refactor all dp fashion plugins (#3684) by Hongxin Liu
- [booster] gemini plugin support shard checkpoint (#3610) by jiangmingyan
- [booster] add low level zero plugin (#3594) by Hongxin Liu
- [booster] fixed the torch ddp plugin with the new checkpoint api (#3442) by Frank Lee
- [booster] implement Gemini plugin (#3352) by ver217
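The prepare-dataloader plugin method (#3706) addresses a standard data-parallel need: every rank must shuffle identically, then draw a disjoint shard of the dataset. A stdlib sketch of that sampler logic (the function name is illustrative, not the plugin API):

```python
import random

def shard_indices(num_samples, dp_size, dp_rank, seed=0):
    """Per-rank sampler sketch: shuffle with a shared seed, then stride
    so each data-parallel rank draws a disjoint, equally sized shard."""
    idx = list(range(num_samples))
    random.Random(seed).shuffle(idx)   # identical shuffle on every rank
    return idx[dp_rank::dp_size]       # disjoint strided shards

shards = [shard_indices(10, dp_size=2, dp_rank=r) for r in range(2)]
print(sorted(shards[0] + shards[1]))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```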
Docs
- [docs] change placememtpolicy to placementpolicy (#3829) by digger yu
Evaluation
- [evaluation] add automatic evaluation pipeline (#3821) by Yuanchen
Api
- [API] add docstrings and initialization to apex amp, naive amp (#3783) by jiangmingyan
Test
- [test] fixed lazy init test import error (#3799) by Frank Lee
- Update test_ci.sh by Camille Zhong
- [test] refactor tests with spawn (#3452) by Frank Lee
- [test] reorganize zero/gemini tests (#3445) by ver217
- [test] fixed gemini plugin test (#3411) by Frank Lee
Format
- [format] applied code formatting on changed files in pull request 3786 (#3787) by github-actions[bot]
- [format] Run lint on colossalai.engine (#3367) by Hakjin Lee
Plugin
- [plugin] a workaround for zero plugins' optimizer checkpoint (#3780) by Hongxin Liu
- [plugin] torch ddp plugin supports sharded model checkpoint (#3775) by Hongxin Liu
Chat
- [chat] add performance and tutorial (#3786) by binmakeswell
- [chat] fix bugs in stage 3 training (#3759) by Yuanchen
- [chat] fix community example ray (#3719) by MisterLin1995
- [chat] fix train_prompts.py gemini strategy bug (#3666) by zhang-yi-chi
- [chat] PPO stage3 doc enhancement (#3679) by Camille Zhong
- [chat] add opt attn kernel (#3655) by Hongxin Liu
- [chat] typo accimulationsteps -> accumulationsteps (#3662) by tanitna
- Merge pull request #3656 from TongLi3701/chat/update_eval by Tong Li
- [chat] set default zero2 strategy (#3667) by binmakeswell
- [chat] refactor model save/load logic (#3654) by Hongxin Liu
- [chat] remove lm model class (#3653) by Hongxin Liu
- [chat] refactor trainer (#3648) by Hongxin Liu
- [chat] polish performance evaluator (#3647) by Hongxin Liu
- Merge pull request #3621 from zhang-yi-chi/fix/chat-train-prompts-single-gpu by Tong Li
- [Chat] Remove duplicate functions (#3625) by ddobokki
- [chat] fix enable single gpu training bug by zhang-yi-chi
- [chat] polish code note typo (#3612) by digger-yu
- [chat] update reward model sh (#3578) by binmakeswell
- [chat] ChatGPT train prompts on ray example (#3309) by MisterLin1995
- [chat] polish tutorial doc (#3551) by binmakeswell
- [chat]add examples of training with limited resources in chat readme (#3536) by Yuanchen
- [chat]: add vf_coef argument for PPOTrainer (#3318) by zhang-yi-chi
- [chat] add zero2 cpu strategy for sft training (#3520) by ver217
- [chat] fix stage3 PPO sample sh command (#3477) by binmakeswell
- [Chat]Add Peft support & fix the ptx bug (#3433) by YY Lin
- [chat]fix save_model(#3377) by Dr-Corgi
- [chat]fix readme (#3429) by kingkingofall
- [Chat] fix the tokenizer "int too big to convert" error in SFT training (#3453) by Camille Zhong
- [chat]fix sft training for bloom, gpt and opt (#3418) by Yuanchen
- [chat] correcting a few obvious typos and grammars errors (#3338) by Andrew
Devops
- [devops] fix doc test on pr (#3782) by Hongxin Liu
- [devops] fix ci for document check (#3751) by Hongxin Liu
- [devops] make build on PR run automatically (#3748) by Hongxin Liu
- [devops] update torch version of CI (#3725) by Hongxin Liu
- [devops] fix chat ci (#3628) by Hongxin Liu
Amp
- [amp] Add naive amp demo (#3774) by jiangmingyan
Auto
- [auto] fix install cmd (#3772) by binmakeswell
Fix
- [fix] Add init to fix import error when importing _analyzer (#3668) by Ziyue Jiang
Ci
- [CI] fix typo with tests/ etc. (#3727) by digger-yu
- [CI] fix typo with tests components (#3695) by digger-yu
- [CI] fix some spelling errors (#3707) by digger-yu
- [CI] Update testshardedoptimwithsync_bn.py (#3688) by digger-yu
Example
- [example] add train resnet/vit with booster example (#3694) by Hongxin Liu
- [example] add finetune bert with booster example (#3693) by Hongxin Liu
- [example] fix community doc (#3586) by digger-yu
- [example] reorganize for community examples (#3557) by binmakeswell
- [example] remove redundant texts & update roberta (#3493) by mandoxzhang
- [example] update roberta with newer ColossalAI (#3472) by mandoxzhang
- [example] update examples related to zero/gemini (#3431) by ver217
Tensor
- [tensor] Refactor handle_trans_spec in DistSpecManager by YH
Zero
- [zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. (#3173) by YH
- [zero] reorganize zero/gemini folder structure (#3424) by ver217
Gemini
- [gemini] accelerate inference (#3641) by Hongxin Liu
- [gemini] state dict supports fp16 (#3590) by Hongxin Liu
- [gemini] support save state dict in shards (#3581) by Hongxin Liu
- [gemini] gemini supports lazy init (#3379) by Hongxin Liu
Bot
- [bot] Automated submodule synchronization (#3596) by github-actions[bot]
Misc
- [misc] op_builder/builder.py (#3593) by digger-yu
- [misc] add verbose arg for zero and op builder (#3552) by Hongxin Liu
Coati
- [coati] fix install cmd (#3592) by binmakeswell
- [coati] add costom model suppor tguide (#3579) by Fazzie-Maqianli
- [coati] Fix LlamaCritic (#3475) by gongenlei
Fx
- [fx] fix meta tensor registration (#3589) by Hongxin Liu
Chatgpt
- [chatgpt] Detached PPO Training (#3195) by csric
- [chatgpt] add pre-trained model RoBERTa for RLHF stage 2 & 3 (#3223) by Camille Zhong
Lazyinit
- [lazyinit] fix clone and deepcopy (#3553) by Hongxin Liu
Checkpoint
- [checkpoint] Shard saved checkpoint need to be compatible with the naming format of hf checkpoint files (#3479) by jiangmingyan
- [checkpoint] support huggingface style sharded checkpoint (#3461) by jiangmingyan
- [checkpoint] refactored the API and added safetensors support (#3427) by Frank Lee
Chat community
- [Chat Community] Update README.md (fixed#3487) (#3506) by NatalieC323
Dreambooth
- Revert "[dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378)" (#3481) by NatalieC323
- [dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378) by NatalieC323
Autoparallel
- [autoparallel]integrate auto parallel feature with new tracer (#3408) by YuliangLiu0306
- [autoparallel] adapt autoparallel with new analyzer (#3261) by YuliangLiu0306
Moe
- [moe] add checkpoint for moe models (#3354) by HELSON
Hotfix
- [hotfix] metatensorcompatibilitywithtorch2 by YuliangLiu0306
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.8...v0.3.0
- Python
Published by github-actions[bot] almost 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.2.8 Release Today!
What's Changed
Release
- [release] v0.2.8 (#3305) by Frank Lee
Format
- [format] applied code formatting on changed files in pull request 3300 (#3302) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 3296 (#3298) by github-actions[bot]
Doc
- [doc] add ColossalChat news (#3304) by binmakeswell
- [doc] add ColossalChat (#3297) by binmakeswell
- [doc] fix typo (#3222) by binmakeswell
- [doc] update chatgpt doc paper link (#3229) by Camille Zhong
- [doc] add community contribution guide (#3153) by binmakeswell
- [doc] add Intel cooperation for biomedicine (#3108) by binmakeswell
Application
- [application] updated the README (#3301) by Frank Lee
Coati
- [coati] fix inference profanity check (#3299) by ver217
- [coati] inference supports profanity check (#3295) by ver217
- [coati] add repetition_penalty for inference (#3294) by ver217
- [coati] fix inference output (#3285) by ver217
- [Coati] first commit (#3283) by Fazzie-Maqianli
Colossalchat
- [ColossalChat]add cite for datasets (#3292) by Fazzie-Maqianli
Examples
- [examples] polish AutoParallel readme (#3270) by YuliangLiu0306
- [examples] Solving the diffusion issue of incompatibility issue#3169 (#3170) by NatalieC323
Fx
- [fx] meta registration compatibility (#3253) by HELSON
- [FX] refactor experimental tracer and adapt it with hf models (#3157) by YuliangLiu0306
Booster
- [booster] implemented the torch ddd + resnet example (#3232) by Frank Lee
- [booster] implemented the cluster module (#3191) by Frank Lee
- [booster] added the plugin base and torch ddp plugin (#3180) by Frank Lee
- [booster] added the accelerator implementation (#3159) by Frank Lee
- [booster] implemented mixed precision class (#3151) by Frank Lee
Ci
- [CI] Fix pre-commit workflow (#3238) by Hakjin Lee
Api
- [API] implement device mesh manager (#3221) by YuliangLiu0306
- [api] implemented the checkpoint io module (#3205) by Frank Lee
Hotfix
- [hotfix] skip torchaudio tracing test (#3211) by YuliangLiu0306
- [hotfix] layout converting issue (#3188) by YuliangLiu0306
Chatgpt
- [chatgpt] add precision option for colossalai (#3233) by ver217
- [chatgpt] unnify datasets (#3218) by Fazzie-Maqianli
- [chatgpt] support instuct training (#3216) by Fazzie-Maqianli
- [chatgpt]add reward model code for deberta (#3199) by Yuanchen
- [chatgpt]support llama (#3070) by Fazzie-Maqianli
- [chatgpt] add supervised learning fine-tune code (#3183) by pgzhang
- [chatgpt]Reward Model Training Process update (#3133) by BlueRum
- [chatgpt] fix trainer generate kwargs (#3166) by ver217
- [chatgpt] fix ppo training hanging problem with gemini (#3162) by ver217
- [chatgpt]update ci (#3087) by BlueRum
- [chatgpt]Fix examples (#3116) by BlueRum
- [chatgpt] fix lora support for gpt (#3113) by BlueRum
- [chatgpt] type miss of kwargs (#3107) by hiko2MSP
- [chatgpt] fix lora save bug (#3099) by BlueRum
Lazyinit
- [lazyinit] combine lazy tensor with dtensor (#3204) by ver217
- [lazyinit] add correctness verification (#3147) by ver217
- [lazyinit] refactor lazy tensor and lazy init ctx (#3131) by ver217
Auto
- [auto] fix requirements typo for issue #3125 (#3209) by Yan Fang
Analyzer
- [Analyzer] fix analyzer tests (#3197) by YuliangLiu0306
Dreambooth
- [dreambooth] fixing the incompatibity in requirements.txt (#3190) by NatalieC323
Auto-parallel
- [auto-parallel] add auto-offload feature (#3154) by Zihao
Zero
- [zero] Refactor ZeroContextConfig class using dataclass (#3186) by YH
Test
- [test] fixed torchrec registration in model zoo (#3177) by Frank Lee
- [test] fixed torchrec model test (#3167) by Frank Lee
- [test] add torchrec models to test model zoo (#3139) by YuliangLiu0306
- [test] added transformers models to test model zoo (#3135) by Frank Lee
- [test] added torchvision models to test model zoo (#3132) by Frank Lee
- [test] added timm models to test model zoo (#3129) by Frank Lee
Refactor
- [refactor] update docs (#3174) by Saurav Maheshkar
Tests
- [tests] model zoo add torchaudio models (#3138) by ver217
- [tests] diffuser models in model zoo (#3136) by HELSON
Docker
- [docker] Add opencontainers image-spec to Dockerfile (#3006) by Saurav Maheshkar
Dtensor
- [DTensor] refactor dtensor with new components (#3089) by YuliangLiu0306
Workflow
- [workflow] purged extension cache before GPT test (#3128) by Frank Lee
Autochunk
- [autochunk] support complete benchmark (#3121) by Xuanlei Zhao
Tutorial
- [tutorial] update notes for TransformerEngine (#3098) by binmakeswell
Nvidia
- [NVIDIA] Add FP8 example using TE (#3080) by Kirthi Shankar Sivamani
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.7...v0.2.8
- Python
Published by github-actions[bot] almost 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.2.6 Release Today!
What's Changed
Release
- [release] v0.2.6 (#3057) by Frank Lee
Doc
- [doc] moved doc test command to bottom (#3075) by Frank Lee
- [doc] specified operating system requirement (#3019) by Frank Lee
- [doc] update nvme offload doc (#3014) by ver217
- [doc] add ISC tutorial (#2997) by binmakeswell
- [doc] add deepspeed citation and copyright (#2996) by ver217
- [doc] added reference to related works (#2994) by Frank Lee
- [doc] update news (#2983) by binmakeswell
- [doc] fix chatgpt inference typo (#2964) by binmakeswell
- [doc] add env scope (#2933) by binmakeswell
- [doc] added readme for documentation (#2935) by Frank Lee
- [doc] removed read-the-docs (#2932) by Frank Lee
- [doc] update installation for GPT (#2922) by binmakeswell
- [doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
- [doc] fix GPT tutorial (#2860) by dawei-wang
- [doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
- [doc] update OPT serving (#2804) by binmakeswell
- [doc] update example and OPT serving link (#2769) by binmakeswell
- [doc] add opt service doc (#2747) by Frank Lee
- [doc] fixed a typo in GPT readme (#2736) by cloudhuang
- [doc] updated documentation version list (#2730) by Frank Lee
Workflow
- [workflow] fixed doc build trigger condition (#3072) by Frank Lee
- [workflow] supported conda package installation in doc test (#3028) by Frank Lee
- [workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
- [workflow] added auto doc test on PR (#2929) by Frank Lee
- [workflow] moved pre-commit to post-commit (#2895) by Frank Lee
Booster
- [booster] init module structure and definition (#3056) by Frank Lee
Example
- [example] fix redundant note (#3065) by binmakeswell
- [example] fixed opt model downloading from huggingface by Tomek
- [example] add LoRA support (#2821) by Haofan Wang
Autochunk
- [autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao
Chatgpt
- [chatgpt] change critic input as state (#3042) by wenjunyang
- [chatgpt] fix readme (#3025) by BlueRum
- [chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
- [chatgpt]fix inference model load (#2988) by BlueRum
- [chatgpt] allow shard init and display warning (#2986) by ver217
- [chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
- [chatgpt] making experience support dp (#2971) by ver217
- [chatgpt]fix lora bug (#2974) by BlueRum
- [chatgpt] fix inference demo loading bug (#2969) by BlueRum
- [ChatGPT] fix README (#2966) by Fazzie-Maqianli
- [chatgpt]add inference example (#2944) by BlueRum
- [chatgpt]support opt & gpt for rm training (#2876) by BlueRum
- [chatgpt] Support saving ckpt in examples (#2846) by BlueRum
- [chatgpt] fix rm eval (#2829) by BlueRum
- [chatgpt] add test checkpoint (#2797) by ver217
- [chatgpt] update readme about checkpoint (#2792) by ver217
- [chatgpt] startegy add prepare method (#2766) by ver217
- [chatgpt] disable shard init for colossalai (#2767) by ver217
- [chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
- [chatgpt]fix train_rm bug with lora (#2741) by BlueRum
Dtensor
- [DTensor] refactor CommSpec (#3034) by YuliangLiu0306
- [DTensor] refactor sharding spec (#2987) by YuliangLiu0306
- [DTensor] implementation of dtensor (#2946) by YuliangLiu0306
Hotfix
- [hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
- [hotfix] add shard dim to aviod backward communication error (#2954) by YuliangLiu0306
- [hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
- [hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
- [hotfix] fix chunk size can not be divided (#2867) by HELSON
- Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
- [hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
- [hotfix] add correct device for fake_param (#2796) by HELSON
Revert
- [revert] recover "[refactor] restructure configuration files (#2977)" (#3022) by Frank Lee
Format
- [format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]
Pipeline
- [pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang
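The Simplified Alpa DP Partition (#2507) concerns splitting a layer sequence into contiguous pipeline stages; a natural objective is minimizing the slowest stage, which the classic linear-partition dynamic program solves. A simplified sketch (the PR's actual cost model may differ):

```python
from functools import lru_cache

def partition(costs, num_stages):
    """Split a sequence of per-layer costs into contiguous stages,
    minimizing the cost of the slowest stage (the pipeline bottleneck)."""
    n = len(costs)
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(None)
    def best(i, k):
        # Minimum bottleneck for placing costs[i:] into k stages.
        if k == 1:
            return prefix[n] - prefix[i]
        return min(
            max(prefix[j] - prefix[i], best(j, k - 1))
            for j in range(i + 1, n - k + 2)
        )

    return best(0, num_stages)

print(partition([1, 2, 3, 4, 5, 6], 3))  # 9, e.g. stages [1,2,3] | [4,5] | [6]
```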
Fx
- [fx] remove depreciated algorithms. (#2312) (#2313) by Super Daniel
Refactor
- [refactor] restructure configuration files (#2977) by Saurav Maheshkar
Kernel
- [kernel] cached the op kernel and fixed version check (#2886) by Frank Lee
Misc
- [misc] add reference (#2930) by ver217
Autoparallel
- [autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
- [autoparallel] find repeat blocks (#2854) by YuliangLiu0306
- [autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
- [autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
- [autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
- [autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
- [autoparallel] rotor solver refactor (#2813) by Boyuan Yao
- [autoparallel] Patch meta information of torch.nn.Embedding (#2760) by Boyuan Yao
- [autoparallel] distinguish different parallel strategies (#2699) by YuliangLiu0306
Zero
- [zero] trivial zero optimizer refactoring (#2869) by YH
- [zero] fix wrong import (#2777) by Boyuan Yao
Cli
- [cli] handled version check exceptions (#2848) by Frank Lee
Triton
- [triton] added copyright information for flash attention (#2835) by Frank Lee
Nfc
- [NFC] polish colossalai/engine/schedule/pipelineschedule.py code style (#2744) by Michelle
- [NFC] polish code format by binmakeswell
- [NFC] polish colossalai/autoparallel/tensorshard/deprecated/graph_analysis.py code style (#2737) by xyupeng
- [NFC] polish colossalai/context/processgroupinitializer/initializer_2d.py code style (#2726) by Zirui Zhu
- [NFC] polish colossalai/autoparallel/tensorshard/deprecated/ophandler/batchnorm_handler.py code style (#2728) by Zangwei Zheng
- [NFC] polish colossalai/cli/cli.py code style (#2734) by Wangbo Zhao(黑色枷锁)
Example
- [exmaple] add bert and albert (#2824) by Jiarui Fang
Ci/cd
- [CI/CD] fix nightly release CD running on forked repo (#2812) by LuGY
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.5...v0.2.6
- Python
Published by github-actions[bot] almost 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.2.7 Release Today!
What's Changed
Chatgpt
- [chatgpt]add flag of action mask in critic(#3086) by Fazzie-Maqianli
- [chatgpt] change critic input as state (#3042) by wenjunyang
- [chatgpt] fix readme (#3025) by BlueRum
- [chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
- [chatgpt]fix inference model load (#2988) by BlueRum
- [chatgpt] allow shard init and display warning (#2986) by ver217
- [chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
- [chatgpt] making experience support dp (#2971) by ver217
- [chatgpt]fix lora bug (#2974) by BlueRum
- [chatgpt] fix inference demo loading bug (#2969) by BlueRum
- [ChatGPT] fix README (#2966) by Fazzie-Maqianli
- [chatgpt]add inference example (#2944) by BlueRum
- [chatgpt]support opt & gpt for rm training (#2876) by BlueRum
- [chatgpt] Support saving ckpt in examples (#2846) by BlueRum
- [chatgpt] fix rm eval (#2829) by BlueRum
- [chatgpt] add test checkpoint (#2797) by ver217
- [chatgpt] update readme about checkpoint (#2792) by ver217
- [chatgpt] strategy add prepare method (#2766) by ver217
- [chatgpt] disable shard init for colossalai (#2767) by ver217
- [chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
- [chatgpt]fix train_rm bug with lora (#2741) by BlueRum
Kernel
- [kernel] added kernel loader to softmax autograd function (#3093) by Frank Lee
- [kernel] cached the op kernel and fixed version check (#2886) by Frank Lee
Analyzer
- [analyzer] a minimal implementation of static graph analyzer (#2852) by Super Daniel
Diffusers
- [diffusers] fix ci and docker (#3085) by Fazzie-Maqianli
Doc
- [doc] fixed typos in docs/README.md (#3082) by Frank Lee
- [doc] moved doc test command to bottom (#3075) by Frank Lee
- [doc] specified operating system requirement (#3019) by Frank Lee
- [doc] update nvme offload doc (#3014) by ver217
- [doc] add ISC tutorial (#2997) by binmakeswell
- [doc] add deepspeed citation and copyright (#2996) by ver217
- [doc] added reference to related works (#2994) by Frank Lee
- [doc] update news (#2983) by binmakeswell
- [doc] fix chatgpt inference typo (#2964) by binmakeswell
- [doc] add env scope (#2933) by binmakeswell
- [doc] added readme for documentation (#2935) by Frank Lee
- [doc] removed read-the-docs (#2932) by Frank Lee
- [doc] update installation for GPT (#2922) by binmakeswell
- [doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
- [doc] fix GPT tutorial (#2860) by dawei-wang
- [doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
- [doc] update OPT serving (#2804) by binmakeswell
- [doc] update example and OPT serving link (#2769) by binmakeswell
- [doc] add opt service doc (#2747) by Frank Lee
- [doc] fixed a typo in GPT readme (#2736) by cloudhuang
- [doc] updated documentation version list (#2730) by Frank Lee
Autochunk
- [autochunk] support vit (#3084) by Xuanlei Zhao
- [autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao
Dtensor
- [DTensor] implement layout converter (#3055) by YuliangLiu0306
- [DTensor] refactor CommSpec (#3034) by YuliangLiu0306
- [DTensor] refactor sharding spec (#2987) by YuliangLiu0306
- [DTensor] implementation of dtensor (#2946) by YuliangLiu0306
Workflow
- [workflow] fixed doc build trigger condition (#3072) by Frank Lee
- [workflow] supported conda package installation in doc test (#3028) by Frank Lee
- [workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
- [workflow] added auto doc test on PR (#2929) by Frank Lee
- [workflow] moved pre-commit to post-commit (#2895) by Frank Lee
Booster
- [booster] init module structure and definition (#3056) by Frank Lee
Example
- [example] fix redundant note (#3065) by binmakeswell
- [example] fixed opt model downloading from huggingface by Tomek
- [example] add LoRA support (#2821) by Haofan Wang
Hotfix
- [hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
- [hotfix] add shard dim to avoid backward communication error (#2954) by YuliangLiu0306
- [hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
- [hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
- [hotfix] fix chunk size can not be divided (#2867) by HELSON
- Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
- [hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
- [hotfix] add correct device for fake_param (#2796) by HELSON
Revert
- [revert] recover "[refactor] restructure configuration files (#2977)" (#3022) by Frank Lee
Format
- [format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]
Pipeline
- [pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang
Fx
- [fx] remove depreciated algorithms. (#2312) (#2313) by Super Daniel
Refactor
- [refactor] restructure configuration files (#2977) by Saurav Maheshkar
Misc
- [misc] add reference (#2930) by ver217
Autoparallel
- [autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
- [autoparallel] find repeat blocks (#2854) by YuliangLiu0306
- [autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
- [autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
- [autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
- [autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
- [autoparallel] rotor solver refactor (#2813) by Boyuan Yao
- [autoparallel] Patch meta information of torch.nn.Embedding (#2760) by Boyuan Yao
- [autoparallel] distinguish different parallel strategies (#2699) by YuliangLiu0306
Zero
- [zero] trivial zero optimizer refactoring (#2869) by YH
- [zero] fix wrong import (#2777) by Boyuan Yao
Cli
- [cli] handled version check exceptions (#2848) by Frank Lee
Triton
- [triton] added copyright information for flash attention (#2835) by Frank Lee
Nfc
- [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style (#2744) by Michelle
- [NFC] polish code format by binmakeswell
- [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_analysis.py code style (#2737) by xyupeng
- [NFC] polish colossalai/context/process_group_initializer/initializer_2d.py code style (#2726) by Zirui Zhu
- [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/batch_norm_handler.py code style (#2728) by Zangwei Zheng
- [NFC] polish colossalai/cli/cli.py code style (#2734) by Wangbo Zhao(黑色枷锁)
Example
- [example] add bert and albert (#2824) by Jiarui Fang
Ci/cd
- [CI/CD] fix nightly release CD running on forked repo (#2812) by LuGY
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.7...v0.2.5
- Python
Published by github-actions[bot] almost 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.2.5 Release Today!
What's Changed
Chatgpt
- [chatgpt] optimize generation kwargs (#2717) by ver217
Autoparallel
- [autoparallel] add shard option (#2696) by YuliangLiu0306
- [autoparallel] fix parameters sharding bug (#2716) by YuliangLiu0306
- [autoparallel] refactor runtime pass (#2644) by YuliangLiu0306
- [autoparallel] remove deprecated codes (#2664) by YuliangLiu0306
- [autoparallel] test compatibility for gemini and auto parallel (#2700) by YuliangLiu0306
Doc
- [doc] updated documentation version list (#2715) by Frank Lee
- [doc] add open-source contribution invitation (#2714) by binmakeswell
- [doc] add Quick Preview (#2706) by binmakeswell
- [doc] resize figure (#2705) by binmakeswell
- [doc] add ChatGPT (#2703) by binmakeswell
Devops
- [devops] add chatgpt ci (#2713) by ver217
Workflow
- [workflow] fixed tensor-nvme build caching (#2711) by Frank Lee
App
- [app] fix ChatGPT requirements (#2704) by binmakeswell
- [app] add chatgpt application (#2698) by ver217
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.5...v0.2.4
- Python
Published by github-actions[bot] about 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.2.4 Release Today!
What's Changed
Release
- [release] update version (#2691) by ver217
Doc
- [doc] update auto parallel paper link (#2686) by binmakeswell
- [doc] added documentation sidebar translation (#2670) by Frank Lee
Zero1&2
- [zero1&2] only append parameters with gradients (#2681) by HELSON
Gemini
- [gemini] fix coloinitcontext (#2683) by ver217
- [gemini] add fakereleasechunk for keep-gathered chunk in the inference mode (#2671) by HELSON
Workflow
- [workflow] fixed community report ranking (#2680) by Frank Lee
- [workflow] added trigger to build doc upon release (#2678) by Frank Lee
- [workflow] added doc build test (#2675) by Frank Lee
Autoparallel
- [autoparallel] Patch meta information of torch.nn.functional.softmax and torch.nn.Softmax (#2674) by Boyuan Yao
Doc
- [doc] fixed the sidebar item key (#2672) by Frank Lee
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.4...v0.2.3
- Python
Published by github-actions[bot] about 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.2.3 Release Today!
What's Changed
Release
- [release] v0.2.3 (#2669) by Frank Lee
Doc
- [doc] add CVPR tutorial (#2666) by binmakeswell
Docs
- [Docs] layout converting management (#2665) by YuliangLiu0306
Autoparallel
- [autoparallel] Patch meta information of torch.nn.LayerNorm (#2647) by Boyuan Yao
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.3...v0.2.2
- Python
Published by github-actions[bot] about 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.2.2 Release Today!
What's Changed
Release
- [release] v0.2.2 (#2661) by Frank Lee
Workflow
- [workflow] fixed gpu memory check condition (#2659) by Frank Lee
- [workflow] fixed the test coverage report (#2614) by Frank Lee
- [workflow] fixed test coverage report (#2611) by Frank Lee
Example
- [example] Polish README.md (#2658) by Jiatong (Julius) Han
Doc
- [doc] fixed compatibility with docusaurus (#2657) by Frank Lee
- [doc] added docusaurus-based version control (#2656) by Frank Lee
- [doc] migrate the markdown files (#2652) by Frank Lee
- [doc] fix typo of BLOOM (#2643) by binmakeswell
- [doc] removed pre-built wheel installation from readme (#2637) by Frank Lee
- [doc] updated the sphinx theme (#2635) by Frank Lee
- [doc] fixed broken badge (#2623) by Frank Lee
Autoparallel
- [autoparallel] refactor handlers which reshape input tensors (#2615) by YuliangLiu0306
- [autoparallel] adapt autoparallel tests with latest api (#2626) by YuliangLiu0306
- [autoparallel] Patch meta information of torch.matmul (#2584) by Boyuan Yao
Tutorial
- [tutorial] added energonai to opt inference requirements (#2625) by Frank Lee
- [tutorial] add video link (#2619) by binmakeswell
Autochunk
- [autochunk] support diffusion for autochunk (#2621) by oahzxl
Build
- [build] fixed the doc build process (#2618) by Frank Lee
Test
- [test] fixed the triton version for testing (#2608) by Frank Lee
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.2...v0.2.1
- Python
Published by github-actions[bot] about 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.2.1 Release Today!
What's Changed
Workflow
- [workflow] fixed broken release workflows (#2604) by Frank Lee
- [workflow] added cuda extension build test before release (#2598) by Frank Lee
- [workflow] hooked pypi release with lark (#2596) by Frank Lee
- [workflow] hooked docker release with lark (#2594) by Frank Lee
- [workflow] added test-pypi check before release (#2591) by Frank Lee
- [workflow] fixed the typo in the example check workflow (#2589) by Frank Lee
- [workflow] hook compatibility test failure to lark (#2586) by Frank Lee
- [workflow] hook example test alert with lark (#2585) by Frank Lee
- [workflow] added notification if scheduled build fails (#2574) by Frank Lee
- [workflow] added discussion stats to community report (#2572) by Frank Lee
- [workflow] refactored compatibility test workflow for maintainability (#2560) by Frank Lee
- [workflow] adjust the GPU memory threshold for scheduled unit test (#2558) by Frank Lee
- [workflow] fixed example check workflow (#2554) by Frank Lee
- [workflow] fixed typos in the leaderboard workflow (#2567) by Frank Lee
- [workflow] added contributor and user-engagement report (#2564) by Frank Lee
- [workflow] only report coverage for changed files (#2524) by Frank Lee
- [workflow] fixed the precommit CI (#2525) by Frank Lee
- [workflow] fixed changed file detection (#2515) by Frank Lee
- [workflow] fixed the skip condition of example weekly check workflow (#2481) by Frank Lee
- [workflow] automated bdist wheel build (#2459) by Frank Lee
- [workflow] automated the compatibility test (#2453) by Frank Lee
- [workflow] fixed the on-merge condition check (#2452) by Frank Lee
- [workflow] make test coverage report collapsable (#2436) by Frank Lee
- [workflow] report test coverage even if below threshold (#2431) by Frank Lee
- [workflow] auto comment with test coverage report (#2419) by Frank Lee
- [workflow] auto comment if precommit check fails (#2417) by Frank Lee
- [workflow] added translation for non-english comments (#2414) by Frank Lee
- [workflow] added precommit check for code consistency (#2401) by Frank Lee
- [workflow] refactored the example check workflow (#2411) by Frank Lee
- [workflow] added nightly release to pypi (#2403) by Frank Lee
- [workflow] added missing file change detection output (#2387) by Frank Lee
- [workflow] New version: Create workflow files for examples' auto check (#2298) by ziyuhuang123
- [workflow] fixed pypi release workflow error (#2328) by Frank Lee
- [workflow] fixed pypi release workflow error (#2327) by Frank Lee
- [workflow] added workflow to release to pypi upon version change (#2320) by Frank Lee
- [workflow] removed unused assign reviewer workflow (#2318) by Frank Lee
- [workflow] rebuild cuda kernels when kernel-related files change (#2317) by Frank Lee
Release
- [release] v0.2.1 (#2602) by Frank Lee
Doc
- [doc] updated readme for CI/CD (#2600) by Frank Lee
- [doc] fixed issue link in pr template (#2577) by Frank Lee
- [doc] updated the CHANGE_LOG.md for github release page (#2552) by Frank Lee
- [doc] fixed the typo in pr template (#2556) by Frank Lee
- [doc] added pull request template (#2550) by Frank Lee
- [doc] update example link (#2520) by binmakeswell
- [doc] update opt and tutorial links (#2509) by binmakeswell
- [doc] added documentation for CI/CD (#2420) by Frank Lee
- [doc] updated kernel-related optimisers' docstring (#2385) by Frank Lee
- [doc] updated readme regarding pypi installation (#2406) by Frank Lee
- [doc] hotfix #2377 by Jiarui Fang
- [doc] hotfix #2377 by jiaruifang
- [doc] update stable diffusion link (#2322) by binmakeswell
- [doc] update diffusion doc (#2296) by binmakeswell
- [doc] update news (#2295) by binmakeswell
- [doc] update news by binmakeswell
Setup
- [setup] fixed inconsistent version meta (#2578) by Frank Lee
- [setup] refactored setup.py for dependency graph (#2413) by Frank Lee
- [setup] support pre-build and jit-build of cuda kernels (#2374) by Frank Lee
- [setup] make cuda extension build optional (#2336) by Frank Lee
- [setup] remove torch dependency (#2333) by Frank Lee
- [setup] removed the build dependency on colossalai (#2307) by Frank Lee
Tutorial
- [tutorial] polish README (#2568) by binmakeswell
- [tutorial] update fastfold tutorial (#2565) by oahzxl
Polish
- [polish] polish ColoTensor and its submodules (#2537) by HELSON
- [polish] polish code for get_static_torch_model (#2405) by HELSON
Kernel
- [kernel] fixed repeated loading of kernels (#2549) by Frank Lee
Hotfix
- [hotfix] fix zero ddp warmup check (#2545) by ver217
- [hotfix] fix autoparallel demo (#2533) by YuliangLiu0306
- [hotfix] fix lightning error (#2529) by HELSON
- [hotfix] meta tensor default device. (#2510) by Super Daniel
- [hotfix] gpt example titans bug #2493 (#2494) by Jiarui Fang
- [hotfix] gpt example titans bug #2493 by jiaruifang
- [hotfix] add norm clearing for the overflow step (#2416) by HELSON
- [hotfix] add DISTPAN argument for benchmark (#2412) by HELSON
- [hotfix] fix gpt gemini example (#2404) by HELSON
- [hotfix] issue #2388 by Jiarui Fang
- [hotfix] issue #2388 by jiaruifang
- [hotfix] fix implement error in diffusers by Jiarui Fang
- [hotfix] fix implement error in diffusers by 1SAA
Autochunk
- [autochunk] add benchmark for transformer and alphafold (#2543) by oahzxl
- [autochunk] support multi outputs chunk search (#2538) by oahzxl
- [autochunk] support transformer (#2526) by oahzxl
- [autochunk] support parsing blocks (#2506) by oahzxl
- [autochunk] support autochunk on evoformer (#2497) by oahzxl
- [autochunk] support evoformer tracer (#2485) by oahzxl
- [autochunk] add autochunk feature by Jiarui Fang
Git
- [git] remove invalid submodule (#2540) by binmakeswell
Gemini
- [gemini] add profiler in the demo (#2534) by HELSON
- [gemini] update the gpt example (#2527) by HELSON
- [gemini] update ddp strict mode (#2518) by HELSON
- [gemini] add get static torch model (#2356) by HELSON
Example
- [example] Add fastfold tutorial (#2528) by LuGY
- [example] update lightning dependency for stable diffusion (#2522) by Jiarui Fang
- Merge pull request #2499 from feifeibear/dev0116_10 by Fazzie-Maqianli
- [example] dreambooth example by jiaruifang
- [example] fix requirements (#2488) by binmakeswell
- [example] titans for gpt (#2484) by Jiarui Fang
- [example] titans for gpt by jiaruifang
- [example] stable diffusion add roadmap (#2482) by Jiarui Fang
- [example] stable diffusion add roadmap by jiaruifang
- [example] update gpt gemini example ci test (#2477) by ver217
- [example] integrate seq-parallel tutorial with CI (#2463) by Frank Lee
- [example] update vit ci script (#2469) by ver217
- [example] integrate autoparallel demo with CI (#2466) by Frank Lee
- [example] fixed seed error in train_dreambooth_colossalai.py (#2445) by Haofan Wang
- [example] updated large-batch optimizer tutorial (#2448) by Frank Lee
- [example] updated the hybrid parallel tutorial (#2444) by Frank Lee
- [example] improved the clarity of the example readme (#2427) by Frank Lee
- [example] removed duplicated stable diffusion example (#2424) by Frank Lee
- [example] gpt, shard init on all processes (#2366) by Jiarui Fang
- [example] upload auto parallel gpt2 demo (#2354) by YuliangLiu0306
- [example] add google doc for benchmark results of GPT (#2355) by Jiarui Fang
- [example] make gpt example directory more clear (#2353) by Jiarui Fang
- [example] simplify opt example (#2344) by Jiarui Fang
- [example] add example requirement (#2345) by binmakeswell
- [example] diffusion update diffusion, Dreambooth (#2329) by Fazzie-Maqianli
- [example] update diffusion readme with official lightning (#2304) by Jiarui Fang
- [example] update gemini benchmark bash (#2306) by HELSON
Zero
- [zero] add zero wrappers (#2523) by HELSON
- [zero] fix gradient clipping in hybrid parallelism (#2521) by HELSON
- [zero] add strict ddp mode (#2508) by HELSON
- [zero] add unit testings for hybrid parallelism (#2486) by HELSON
- [zero] add unit test for low-level zero init (#2474) by HELSON
- [zero] polish low level optimizer (#2473) by HELSON
- [zero] low level optim supports ProcessGroup (#2464) by Jiarui Fang
- [zero] add warning for ignored parameters (#2446) by HELSON
- [zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443) by HELSON
- [zero] add inference mode and its unit test (#2418) by HELSON
Autoparallel
- [autoparallel] accelerate gpt2 training (#2495) by YuliangLiu0306
- [autoparallel] support origin activation ckpt on autoprallel system (#2468) by YuliangLiu0306
- [autoparallel] update binary elementwise handler (#2451) by YuliangLiu0306
- [autoparallel] integrate device mesh initialization into autoparallelize (#2393) by YuliangLiu0306
- [autoparallel] add shard option (#2423) by YuliangLiu0306
- [autoparallel] bypass MetaInfo when unavailable and modify BCAST_FUNC_OP metainfo (#2293) by Boyuan Yao
Utils
- [utils] lazy init. (#2148) by Super Daniel
Auto-chunk
- [auto-chunk] support extramsa (#3) (#2504) by oahzxl
Fx
- [fx] allow control of ckpt_codegen init (#2498) by oahzxl
- [fx] allow native ckpt trace and codegen. (#2438) by Super Daniel
Ci
- [CI] add test_ci.sh for palm, opt and gpt (#2475) by Jiarui Fang
Cli
- [cli] fixed hostname mismatch error (#2465) by Frank Lee
- [cli] provided more details if colossalai run fail (#2442) by Frank Lee
- [cli] updated installation check cli for aot/jit build (#2395) by Frank Lee
Examples
- [examples] update autoparallel tutorial demo (#2449) by YuliangLiu0306
- [examples] adding tflops to PaLM (#2365) by ZijianYY
- [examples] adding tp to PaLM (#2319) by ZijianYY
- [example] fix dreambooth format (#2315) by Fazzie-Maqianli
Ddp
- [ddp] add is_ddp_ignored (#2434) by HELSON
Docker
- [docker] updated Dockerfile and release workflow (#2410) by Frank Lee
Workflow
- [workflow] added coverage test (#2399) by Frank Lee
Device
- [device] find best logical mesh by Jiarui Fang
- [device] find best logical mesh by YuliangLiu0306
- [device] alpha beta profiler (#2311) by YuliangLiu0306
Pipeline
- [Pipeline] Refine GPT PP Example by Jiarui Fang
Builder
- [builder] correct readme (#2375) by Jiarui Fang
- [builder] reconfig op_builder for pypi install (#2314) by Jiarui Fang
- [builder] MOE builder (#2277) by Jiarui Fang
Auto-parallel
- [auto-parallel] refactoring ColoTracer (#2118) by Zihao
Amp
- [amp] add gradient clipping for unit tests (#2283) by HELSON
Autockpt
- Merge pull request #2258 from hpcaitech/debug/ckpt-autoparallel by Boyuan Yao
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.1...v0.2.0
- Python
Published by github-actions[bot] about 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.2.0 Release Today!
What's Changed
Version
- [version] 0.1.14 -> 0.2.0 (#2286) by Jiarui Fang
Examples
- [examples] using args and combining two versions for PaLM (#2284) by ZijianYY
- [examples] replace einsum with matmul (#2210) by ZijianYY
Doc
- [doc] add feature diffusion v2, bloom, auto-parallel (#2282) by binmakeswell
- [doc] updated the stable diffusion on docker usage (#2244) by Frank Lee
Zero
- [zero] polish low level zero optimizer (#2275) by HELSON
- [zero] fix error for BEiT models (#2169) by HELSON
Example
- [example] add benchmark (#2276) by Ziyue Jiang
- [example] fix save_load bug for dreambooth (#2280) by BlueRum
- [example] GPT polish readme (#2274) by Jiarui Fang
- [example] fix gpt example with 0.1.10 (#2265) by HELSON
- [example] clear diffuser image (#2262) by Fazzie-Maqianli
- [example] diffusion install from docker (#2239) by Jiarui Fang
- [example] fix benchmark.sh for gpt example (#2229) by HELSON
- [example] make palm + GeminiDDP work (#2227) by Jiarui Fang
- [example] Palm adding gemini, still has bugs (#2221) by ZijianYY
- [example] update gpt example (#2225) by HELSON
- [example] add benchmark.sh for gpt (#2226) by Jiarui Fang
- [example] update gpt benchmark (#2219) by HELSON
- [example] update GPT example benchmark results (#2212) by Jiarui Fang
- [example] update gpt example for larger model scale (#2211) by Jiarui Fang
- [example] update gpt readme with performance (#2206) by Jiarui Fang
- [example] polish doc (#2201) by ziyuhuang123
- [example] Change some training settings for diffusion (#2195) by BlueRum
- [example] support Dreambooth (#2188) by Fazzie-Maqianli
- [example] gpt demo more accuracy tflops (#2178) by Jiarui Fang
- [example] add palm pytorch version (#2172) by Jiarui Fang
- [example] update vit readme (#2155) by Jiarui Fang
- [example] add zero1, zero2 example in GPT examples (#2146) by HELSON
Hotfix
- [hotfix] fix fp16 optimizer bug (#2273) by YuliangLiu0306
- [hotfix] fix error for torch 2.0 (#2243) by xcnick
- [hotfix] Fixing the bug related to ipv6 support by Tongping Liu
- [hotfix] correct cpu_optim runtime compilation (#2197) by Jiarui Fang
- [hotfix] add kwargs for colo_addmm (#2171) by Tongping Liu
- [hotfix] Jit type hint #2161 (#2164) by アマデウス
- [hotfix] fix auto policy of test_sharded_optim_v2 (#2157) by Jiarui Fang
- [hotfix] fix aten default bug (#2158) by YuliangLiu0306
Autoparallel
- [autoparallel] fix spelling error (#2270) by YuliangLiu0306
- [autoparallel] gpt2 autoparallel examples (#2267) by YuliangLiu0306
- [autoparallel] patch torch.flatten metainfo for autoparallel (#2247) by Boyuan Yao
- [autoparallel] autoparallel initialize (#2238) by YuliangLiu0306
- [autoparallel] fix construct meta info. (#2245) by Super Daniel
- [autoparallel] record parameter attribute in colotracer (#2217) by YuliangLiu0306
- [autoparallel] Attach input, buffer and output tensor to MetaInfo class (#2162) by Boyuan Yao
- [autoparallel] new metainfoprop based on metainfo class (#2179) by Boyuan Yao
- [autoparallel] update getitem handler (#2207) by YuliangLiu0306
- [autoparallel] update getattr handler (#2193) by YuliangLiu0306
- [autoparallel] add gpt2 performance test code (#2194) by YuliangLiu0306
- [autoparallel] integrate gpt related tests (#2134) by YuliangLiu0306
- [autoparallel] memory estimation for shape consistency (#2144) by Boyuan Yao
- [autoparallel] use metainfo in handler (#2149) by YuliangLiu0306
Gemini
- [Gemini] fix the convert_to_torch_module bug (#2269) by Jiarui Fang
Pipeline middleware
- [Pipeline Middleware] Reduce comm redundancy by getting accurate output (#2232) by Ziyue Jiang
Builder
- [builder] builder for scaled_upper_triang_masked_softmax (#2234) by Jiarui Fang
- [builder] polish builder with better base class (#2216) by Jiarui Fang
- [builder] raise Error when CUDA_HOME is not set (#2213) by Jiarui Fang
- [builder] multihead attn runtime building (#2203) by Jiarui Fang
- [builder] unified cpu_optim fused_optim interface (#2190) by Jiarui Fang
- [builder] use runtime builder for fused_optim (#2189) by Jiarui Fang
- [builder] runtime adam and fused_optim builder (#2184) by Jiarui Fang
- [builder] use builder() for cpu adam and fused optim in setup.py (#2187) by Jiarui Fang
Logger
- [logger] hotfix, missing _FORMAT (#2231) by Super Daniel
Diffusion
- [diffusion] update readme (#2214) by HELSON
Testing
- [testing] add beit model for unit testings (#2196) by HELSON
NFC
- [NFC] fix some typos' (#2175) by ziyuhuang123
- [NFC] update news link (#2191) by binmakeswell
- [NFC] fix a typo 'stable-diffusion-typo-fine-tune' by Arsmart1
Example
- [example] diffuser, support quant inference for stable diffusion (#2186) by BlueRum
- [example] add vit missing functions (#2154) by Jiarui Fang
Pipeline middleware
- [Pipeline Middleware] Fix deadlock when num_microbatch=num_stage (#2156) by Ziyue Jiang
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.0...v0.1.13
- Python
Published by github-actions[bot] about 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.13 Release Today!
What's Changed
Version
- [version] 0.1.13 (#2152) by Jiarui Fang
- Revert "[version] version to v0.1.13 (#2139)" (#2153) by Jiarui Fang
- [version] version to v0.1.13 (#2139) by Jiarui Fang
Gemini
- [Gemini] GeminiDDP convert to PyTorch Module. (#2151) by Jiarui Fang
- [Gemini] Update ColoInitContext to support meta tensor (#2147) by BlueRum
- [Gemini] revert ZeROInitCtx related tracer (#2138) by Jiarui Fang
- [Gemini] update API of the ChunkMemStatsCollector. (#2129) by Jiarui Fang
- [Gemini] update the non model data record method in runtime memory tracer (#2128) by Jiarui Fang
- [Gemini] test step-tensor mapping using repeated_computed_layers.py (#2127) by Jiarui Fang
- [Gemini] update non model data calculation method (#2126) by Jiarui Fang
- [Gemini] hotfix the unittest bugs (#2125) by Jiarui Fang
- [Gemini] mapping of preop timestep and param (#2124) by Jiarui Fang
- [Gemini] chunk init using runtime visited param order (#2115) by Jiarui Fang
- [Gemini] chunk init use OrderedParamGenerator (#2110) by Jiarui Fang
Nfc
- [NFC] remove useless graph node code (#2150) by Jiarui Fang
- [NFC] update chunk manager API (#2119) by Jiarui Fang
- [NFC] polish comments for Chunk class (#2116) by Jiarui Fang
Autoparallel
- [autoparallel] process size nodes in runtime pass (#2130) by YuliangLiu0306
- [autoparallel] implement softmax handler (#2132) by YuliangLiu0306
- [autoparallel] gpt2lp runtime test (#2113) by YuliangLiu0306
Example
- Merge pull request #2120 from Fazziekey/example/stablediffusion-v2 by Fazzie-Maqianli
Optimizer
- [optimizer] add div_scale for optimizers (#2117) by HELSON
Pp middleware
- [PP Middleware] Add bwd and step for PP middleware (#2111) by Ziyue Jiang
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.13...v0.1.12
- Python
Published by github-actions[bot] about 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.12 Release Today!
What's Changed
Zero
- [zero] add L2 gradient clipping for ZeRO (#2112) by HELSON
Gemini
- [gemini] get the param visited order during runtime (#2108) by Jiarui Fang
- [Gemini] NFC, polish search_chunk_configuration (#2107) by Jiarui Fang
- [Gemini] gemini use the runtime memory tracer (RMT) (#2099) by Jiarui Fang
- [Gemini] make RuntimeMemTracer work correctly (#2096) by Jiarui Fang
- [Gemini] remove eval in gemini unittests! (#2092) by Jiarui Fang
- [Gemini] remove GLOBAL_MODEL_DATA_TRACER (#2091) by Jiarui Fang
- [Gemini] remove GLOBAL_CUDA_MEM_INFO (#2090) by Jiarui Fang
- [Gemini] use MemStats in Runtime Memory tracer (#2088) by Jiarui Fang
- [Gemini] use MemStats to store the tracing data. Separate it from Collector. (#2084) by Jiarui Fang
- [Gemini] remove static tracer (#2083) by Jiarui Fang
- [Gemini] ParamOpHook -> ColoParamOpHook (#2080) by Jiarui Fang
- [Gemini] polish runtime tracer tests (#2077) by Jiarui Fang
- [Gemini] rename hooks related to runtime mem tracer (#2076) by Jiarui Fang
- [Gemini] add albert in test models. (#2075) by Jiarui Fang
- [Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073) by Jiarui Fang
- [Gemini] remove not used MemtracerWrapper (#2072) by Jiarui Fang
- [Gemini] fix grad unreleased issue and param recovery issue (#2052) by Zihao
Hotfix
- [hotfix] fix a type in ColoInitContext (#2106) by Jiarui Fang
- [hotfix] update test for latest version (#2060) by YuliangLiu0306
- [hotfix] skip gpt tracing test (#2064) by YuliangLiu0306
Colotensor
- [ColoTensor] throw error when ColoInitContext meets meta parameter. (#2105) by Jiarui Fang
Autoparallel
- [autoparallel] support linear function bias addition (#2104) by YuliangLiu0306
- [autoparallel] support addbmm computation (#2102) by YuliangLiu0306
- [autoparallel] add sum handler (#2101) by YuliangLiu0306
- [autoparallel] add bias addtion function class (#2098) by YuliangLiu0306
- [autoparallel] complete gpt related module search (#2097) by YuliangLiu0306
- [autoparallel]add embedding handler (#2089) by YuliangLiu0306
- [autoparallel] add tensor constructor handler (#2082) by YuliangLiu0306
- [autoparallel] add non_split linear strategy (#2078) by YuliangLiu0306
- [autoparallel] Add F.conv metainfo (#2069) by Boyuan Yao
- [autoparallel] complete gpt block searching (#2065) by YuliangLiu0306
- [autoparallel] add binary elementwise metainfo for auto parallel (#2058) by Boyuan Yao
- [autoparallel] fix forward memory calculation (#2062) by Boyuan Yao
- [autoparallel] adapt solver with self attention (#2037) by YuliangLiu0306
Version
- [version] 0.1.11rc5 -> 0.1.12 (#2103) by Jiarui Fang
Pipeline middleware
- [Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087) by Ziyue Jiang
- [Pipeline Middleware] Adapt scheduler for Topo (#2066) by Ziyue Jiang
Fx
- [fx] An experimental version of ColoTracer. (#2002) by Super Daniel
Example
- [example] update GPT README (#2095) by ZijianYY
Device
- [device] update flatten device mesh usage (#2079) by YuliangLiu0306
Test
- [test] bert test in non-distributed way (#2074) by Jiarui Fang
Pipeline
- [Pipeline] Add Topo Class (#2059) by Ziyue Jiang
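The Topo class above tracks producer/consumer relations between pipeline partitions so the scheduler can order them. A minimal sketch of the underlying idea, ordering a DAG of partitions with Kahn's algorithm (hypothetical names, not the actual ColossalAI API):

```python
from collections import deque

def topo_order(partitions, edges):
    """Return a valid execution order for pipeline partitions.

    partitions: list of partition ids
    edges: list of (producer, consumer) pairs
    """
    indegree = {p: 0 for p in partitions}
    children = {p: [] for p in partitions}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    ready = deque(p for p in partitions if indegree[p] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(partitions):
        raise ValueError("cycle detected: not a DAG")
    return order
```

Any order this produces respects the data dependencies, which is the precondition the DAG scheduler in the entries above relies on.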
Examples
- [examples] update autoparallel demo (#2061) by YuliangLiu0306
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc5...v0.1.12
- Python
Published by github-actions[bot] about 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.11rc5 Release Today!
What's Changed
Release
- [release] update to 0.1.11rc5 (#2053) by Frank Lee
Cli
- [cli] updated installation check with more information (#2050) by Frank Lee
Gemini
- [gemini] fix init bugs for modules (#2047) by HELSON
- [gemini] add arguments (#2046) by HELSON
- [Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040) by Zihao
- [Gemini] more tests for Gemini (#2038) by Jiarui Fang
- [Gemini] more rigorous unit tests for run_fwd_bwd (#2034) by Jiarui Fang
- [Gemini] paramWrapper paramTracerHook unit test (#2030) by Zihao
- [Gemini] patch for supporting torch.add_ function for ColoTensor (#2003) by Jiarui Fang
- [gemini] paramtracehook (#2020) by Zihao
- [Gemini] add unit tests to check gemini correctness (#2015) by Jiarui Fang
- [Gemini] ParamMemHook (#2008) by Zihao
- [Gemini] paramtracerwrapper and test case (#2009) by Zihao
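Several of the Gemini entries above wire hooks around each operation to record runtime memory usage. A toy illustration of that pattern, wrapping callables so every call books a byte count and takes a snapshot (pure Python, not the actual RuntimeMemTracer):

```python
class ToyMemTracer:
    """Record a running total of 'allocated' bytes around each traced op."""

    def __init__(self):
        self.allocated = 0
        self.samples = []  # snapshot taken after every traced op

    def trace(self, fn, nbytes):
        """Wrap fn so each call books nbytes and records a snapshot."""
        def wrapped(*args, **kwargs):
            out = fn(*args, **kwargs)
            self.allocated += nbytes
            self.samples.append(self.allocated)
            return out
        return wrapped

tracer = ToyMemTracer()
linear = tracer.trace(lambda x: [2 * v for v in x], nbytes=4096)
relu = tracer.trace(lambda x: [max(0, v) for v in x], nbytes=1024)
out = relu(linear([1, -2, 3]))
```

The real tracer hooks into module forward/backward and records device memory, but the shape — wrap, run, sample — is the same.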
Setup
- [setup] supported conda-installed torch (#2048) by Frank Lee
Test
- [test] align model name with the file name. (#2045) by Jiarui Fang
Hotfix
- [hotfix] hotfix Gemini for no leaf modules bug (#2043) by Jiarui Fang
- [hotfix] add bert test for gemini fwd bwd (#2035) by Jiarui Fang
- [hotfix] revert bug PRs (#2016) by Jiarui Fang
Zero
- [zero] fix testing parameters (#2042) by HELSON
- [zero] fix unit-tests (#2039) by HELSON
- [zero] test gradient accumulation (#1964) by HELSON
Testing
- [testing] fix testing models (#2036) by HELSON
Rpc
- [rpc] split with dag (#2028) by Ziyue Jiang
Autoparallel
- [autoparallel] add split handler (#2032) by YuliangLiu0306
- [autoparallel] add experimental permute handler (#2029) by YuliangLiu0306
- [autoparallel] add runtime pass and numerical test for view handler (#2018) by YuliangLiu0306
- [autoparallel] add experimental view handler (#2011) by YuliangLiu0306
- [autoparallel] mix gather (#1977) by Genghan Zhang
Fx
- [fx] Split partition with DAG information (#2025) by Ziyue Jiang
Github
- [GitHub] update issue template (#2023) by binmakeswell
Workflow
- [workflow] removed unused pypi release workflow (#2022) by Frank Lee
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc4...v0.1.11rc5
- Python
Published by github-actions[bot] about 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.11rc4 Release Today!
What's Changed
Workflow
- [workflow] fixed the python and cpu arch mismatch (#2010) by Frank Lee
- [workflow] fixed the typo in condarc (#2006) by Frank Lee
- [workflow] added conda cache and fixed no-compilation bug in release (#2005) by Frank Lee
Gemini
- [Gemini] add an inline_op_module to common test models and polish unit tests. (#2004) by Jiarui Fang
- [Gemini] open grad checkpoint when model building (#1984) by Jiarui Fang
- [Gemini] add bert for MemtracerWrapper unintests (#1982) by Jiarui Fang
- [Gemini] MemtracerWrapper unittests (#1981) by Jiarui Fang
- [Gemini] memory trace hook (#1978) by Jiarui Fang
- [Gemini] independent runtime tracer (#1974) by Jiarui Fang
- [Gemini] ZeROHookV2 -> GeminiZeROHook (#1972) by Jiarui Fang
- [Gemini] clean no used MemTraceOp (#1970) by Jiarui Fang
- [Gemini] polish memstats collector (#1962) by Jiarui Fang
- [Gemini] add GeminiAdamOptimizer (#1960) by Jiarui Fang
Autoparallel
- [autoparallel] Add metainfo support for F.linear (#1987) by Boyuan Yao
- [autoparallel] use pytree map style to process data (#1989) by YuliangLiu0306
- [autoparallel] adapt handlers with attention block (#1990) by YuliangLiu0306
- [autoparallel] support more flexible data type (#1967) by YuliangLiu0306
- [autoparallel] add pooling metainfo (#1968) by Boyuan Yao
- [autoparallel] support distributed dataloader option (#1906) by YuliangLiu0306
- [autoparallel] Add alpha beta (#1973) by Genghan Zhang
- [autoparallel] add torch.nn.ReLU metainfo (#1868) by Boyuan Yao
- [autoparallel] support addmm in tracer and solver (#1961) by YuliangLiu0306
- [autoparallel] remove redundancy comm node (#1893) by YuliangLiu0306
Fx
- [fx] add more meta_registry for MetaTensor execution. (#2000) by Super Daniel
Hotfix
- [hotfix] make Gemini work for conv DNN (#1998) by Jiarui Fang
Example
- [example] add diffusion inference (#1986) by Fazzie-Maqianli
- [example] enhance GPT demo (#1959) by Jiarui Fang
- [example] add vit (#1942) by Jiarui Fang
Kernel
- [kernel] move all symlinks of kernel to colossalai._C (#1971) by ver217
Polish
- [polish] remove useless file memtracer_hook.py (#1963) by Jiarui Fang
Zero
- [zero] fix memory leak for zero2 (#1955) by HELSON
Colotensor
- [ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953) by Jiarui Fang
- [ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) by Jiarui Fang
Tutorial
- [tutorial] polish all README (#1946) by binmakeswell
- [tutorial] added missing dummy dataloader (#1944) by Frank Lee
- [tutorial] fixed pipeline bug for sequence parallel (#1943) by Frank Lee
Tensorparallel
- [tensorparallel] fixed tp layers (#1938) by アマデウス
Sc demo
- [sc demo] add requirements to spmd README (#1941) by YuliangLiu0306
Sc
- [SC] remove redundant hands on (#1939) by Boyuan Yao
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc3...v0.1.11rc4
- Python
Published by github-actions[bot] over 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.11rc3 Release Today!
What's Changed
Release
- [release] update version (#1931) by ver217
Tutorial
- [tutorial] polish README and OPT files (#1930) by binmakeswell
- [tutorial] add synthetic dataset for opt (#1924) by ver217
- [tutorial] updated hybrid parallel readme (#1928) by Frank Lee
- [tutorial] added synthetic data for sequence parallel (#1927) by Frank Lee
- [tutorial] removed huggingface model warning (#1925) by Frank Lee
- Hotfix/tutorial readme index (#1922) by Frank Lee
- [tutorial] modify hands-on of auto activation checkpoint (#1920) by Boyuan Yao
- [tutorial] added synthetic data for hybrid parallel (#1921) by Frank Lee
- [tutorial] added synthetic data for hybrid parallel (#1919) by Frank Lee
- [tutorial] added synthetic dataset for auto parallel demo (#1918) by Frank Lee
- [tutorial] updated auto parallel demo with latest data path (#1917) by Frank Lee
- [tutorial] added data script and updated readme (#1916) by Frank Lee
- [tutorial] add cifar10 for diffusion (#1907) by binmakeswell
- [tutorial] removed duplicated tutorials (#1904) by Frank Lee
- [tutorial] edited hands-on practices (#1899) by BoxiangW
Example
- [example] update auto_parallel img path (#1910) by binmakeswell
- [example] add cifar10 dadaset for diffusion (#1902) by Fazzie-Maqianli
- [example] migrate diffusion and auto_parallel hands-on (#1871) by binmakeswell
- [example] initialize tutorial (#1865) by binmakeswell
- Merge pull request #1842 from feifeibear/jiarui/polish by Fazzie-Maqianli
- [example] polish diffusion readme by jiaruifang
Sc
- [SC] add GPT example for auto checkpoint (#1889) by Boyuan Yao
- [sc] add examples for auto checkpoint. (#1880) by Super Daniel
Nfc
- [NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905) by Junming Wu
- [NFC] remove redundant dependency (#1869) by binmakeswell
- [NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1856) by yuxuan-lou
- [NFC] polish .github/workflows/scripts/generate_release_draft.py code style (#1855) by Ofey Chan
- [NFC] polish workflows code style (#1854) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/amp/apex_amp/__init__.py code style (#1853) by LuGY
- [NFC] polish .readthedocs.yaml code style (#1852) by nuszzh
- [NFC] polish <.github/workflows/release_nightly.yml> code style (#1851) by RichardoLuo
- [NFC] polish amp.naive_amp.grad_scaler code style by zbian
- [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/operator_handler.py code style (#1845) by HELSON
- [NFC] polish ./colossalai/amp/torch_amp/__init__.py code style (#1836) by Genghan Zhang
- [NFC] polish .github/workflows/build.yml code style (#1837) by xyupeng
- [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/conv_handler.py code style (#1829) by Sze-qq
- [NFC] polish colossalai/amp/torch_amp/grad_scaler.py code style (#1823) by Ziyue Jiang
- [NFC] polish .github/workflows/release_docker.yml code style by Maruyama_Aya
- [NFC] polish .github/workflows/submodule.yml code style (#1822) by shenggan
- [NFC] polish .github/workflows/draft_github_release_post.yml code style (#1820) by Arsmart1
- [NFC] polish colossalai/amp/naive_amp/fp16_optimizer.py code style (#1819) by Fazzie-Maqianli
- [NFC] polish colossalai/amp/naive_amp/utils.py code style (#1816) by CsRic
- [NFC] polish .github/workflows/build_gpu_8.yml code style (#1813) by Zangwei Zheng
- [NFC] polish MANIFEST.in code style (#1814) by Zirui Zhu
- [NFC] polish strategies_constructor.py code style (#1806) by binmakeswell
Doc
- [doc] add news (#1901) by binmakeswell
Zero
- [zero] migrate zero1&2 (#1878) by HELSON
Autoparallel
- [autoparallel] user-friendly API for CheckpointSolver. (#1879) by Super Daniel
- [autoparallel] fix linear logical convert issue (#1857) by YuliangLiu0306
Fx
- [fx] metainfo_trace as an API. (#1873) by Super Daniel
Hotfix
- [hotfix] pass test_complete_workflow (#1877) by Jiarui Fang
Inference
- [inference] overlap comm and compute in Linear1DRow when stream_chunk_num > 1 (#1876) by Jiarui Fang
- [inference] streaming Linear 1D Row inference (#1874) by Jiarui Fang
Amp
- [amp] add torch amp test (#1860) by xcnick
Diffusion
- [diffusion] fix package conflicts (#1875) by HELSON
Utils
- [utils] fixed lazy init context (#1867) by Frank Lee
- [utils] remove lazy_memory_allocate from ColoInitContext (#1844) by Jiarui Fang
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc2...v0.1.11rc3
- Python
Published by github-actions[bot] over 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.11rc2 Release Today!
What's Changed
Autoparallel
- [autoparallel] fix bugs caused by negative dim key (#1808) by YuliangLiu0306
- [autoparallel] fix bias addition module (#1800) by YuliangLiu0306
- [autoparallel] add batch norm metainfo (#1815) by Boyuan Yao
- [autoparallel] add conv metainfo class for auto parallel (#1796) by Boyuan Yao
- [autoparallel] add essential CommActions for broadcast operands (#1793) by YuliangLiu0306
- [autoparallel] refactor and add rotorc. (#1789) by Super Daniel
- [autoparallel] add getattr handler (#1767) by YuliangLiu0306
- [autoparallel] added matmul handler (#1763) by Frank Lee
- [autoparallel] fix conv handler numerical test (#1771) by YuliangLiu0306
- [autoparallel] move ckpt solvers to autoparallel folder / refactor code (#1764) by Super Daniel
- [autoparallel] add numerical test for handlers (#1769) by YuliangLiu0306
- [autoparallel] update CommSpec to CommActions (#1768) by YuliangLiu0306
- [autoparallel] add numerical test for node strategies (#1760) by YuliangLiu0306
- [autoparallel] refactor the runtime apply pass and add docstring to passes (#1757) by YuliangLiu0306
- [autoparallel] added binary elementwise node handler (#1758) by Frank Lee
- [autoparallel] fix param hook issue in transform pass (#1755) by YuliangLiu0306
- [autoparallel] added addbmm handler (#1751) by Frank Lee
- [autoparallel] shard param and buffer as expected (#1753) by YuliangLiu0306
- [autoparallel] add sequential order to communication actions (#1735) by YuliangLiu0306
- [autoparallel] recovered skipped test cases (#1748) by Frank Lee
- [autoparallel] fixed wrong sharding strategy in conv handler (#1747) by Frank Lee
- [autoparallel] fixed wrong generated strategy for dot op (#1746) by Frank Lee
- [autoparallel] handled illegal sharding strategy in shape consistency (#1744) by Frank Lee
- [autoparallel] handled illegal strategy in node handler (#1743) by Frank Lee
- [autoparallel] handled illegal sharding strategy (#1728) by Frank Lee
Kernel
- [kernel] added jit warmup (#1792) by アマデウス
- [kernel] more flexible flashatt interface (#1804) by oahzxl
- [kernel] skip tests of flash_attn and triton when they are not available (#1798) by Jiarui Fang
Gemini
- [Gemini] make gemini usage simple (#1821) by Jiarui Fang
Checkpointio
- [CheckpointIO] a uniform checkpoint I/O module (#1689) by ver217
Doc
- [doc] polish diffusion README (#1840) by binmakeswell
- [doc] remove obsolete API demo (#1833) by binmakeswell
- [doc] add diffusion (#1827) by binmakeswell
- [doc] add FastFold (#1766) by binmakeswell
Example
- [example] remove useless readme in diffusion (#1831) by Jiarui Fang
- [example] add TP to GPT example (#1828) by Jiarui Fang
- [example] add stable diffuser (#1825) by Fazzie-Maqianli
- [example] simplify the GPT2 huggingface example (#1826) by Jiarui Fang
- [example] opt does not depend on Titans (#1811) by Jiarui Fang
- [example] add GPT by Jiarui Fang
- [example] add opt model in language (#1809) by Jiarui Fang
- [example] add diffusion to example (#1805) by Jiarui Fang
Nfc
- [NFC] update gitignore remove DS_Store (#1830) by Jiarui Fang
- [NFC] polish type hint for shape consistency (#1801) by Jiarui Fang
- [NFC] polish tests/test_layers/test_3d/test_3d.py code style (#1740) by Ziheng Qin
- [NFC] polish tests/test_layers/test_3d/checks_3d/common.py code style (#1733) by lucasliunju
- [NFC] polish colossalai/nn/metric/_utils.py code style (#1727) by Sze-qq
- [NFC] polish tests/test_layers/test_3d/checks_3d/check_layer_3d.py code style (#1731) by Xue Fuzhao
- [NFC] polish tests/test_layers/test_sequence/checks_seq/check_layer_seq.py code style (#1723) by xyupeng
- [NFC] polish accuracy_2d.py code style (#1719) by Ofey Chan
- [NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1721) by Arsmart1
- [NFC] polish checkpoint_hook.py code style (#1722) by LuGY
- [NFC] polish test_2p5d/checks_2p5d/check_operation_2p5d.py code style (#1718) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/zero/sharded_param/__init__.py code style (#1717) by CsRic
- [NFC] polish colossalai/nn/lr_scheduler/linear.py code style (#1716) by yuxuan-lou
- [NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style (#1715) by binmakeswell
- [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) by shenggan
Fx
- [fx] add a symbolic_trace api. (#1812) by Super Daniel
- [fx] skip diffusers unit test if it is not installed (#1799) by Jiarui Fang
- [fx] Add linear metainfo class for auto parallel (#1783) by Boyuan Yao
- [fx] support module with bias addition (#1780) by YuliangLiu0306
- [fx] refactor memory utils and extend shard utils. (#1754) by Super Daniel
- [fx] test tracer on diffuser modules. (#1750) by Super Daniel
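The tracer entries above (ColoTracer, symbolic tracing of diffuser modules) all rest on the same mechanism: run the model on proxy objects that record operations instead of computing them. A self-contained toy version of that idea — not torch.fx itself, and all names here are illustrative:

```python
class Proxy:
    """Record arithmetic applied to it instead of computing eagerly."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph  # list of (op, lhs, rhs, result_name)

    def _record(self, op, other):
        result = Proxy(f"v{len(self.graph)}", self.graph)
        rhs = other.name if isinstance(other, Proxy) else other
        self.graph.append((op, self.name, rhs, result.name))
        return result

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)

def symbolic_trace(fn):
    """Run fn on a Proxy input and return the recorded op list."""
    graph = []
    fn(Proxy("x", graph))
    return graph

graph = symbolic_trace(lambda x: x * 3 + 1)
```

The recorded graph is what the downstream passes (profiling, partitioning, sharding) operate on; torch.fx does the same with a far richer node/graph IR.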
Hotfix
- [hotfix] fix build error when torch version >= 1.13 (#1803) by xcnick
- [hotfix] polish flash attention (#1802) by oahzxl
- [hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786) by HELSON
- [hotfix] polish chunk import (#1787) by Jiarui Fang
- [hotfix] autoparallel unit test (#1752) by YuliangLiu0306
Pipeline
- [Pipeline] Adapt to Pipelinable OPT (#1782) by Ziyue Jiang
Ci
- [CI] downgrade fbgemm. (#1778) by Super Daniel
Compatibility
- [compatibility] ChunkMgr import error (#1772) by Jiarui Fang
Feat
- [feat] add flash attention (#1762) by oahzxl
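The flash attention entries above (here and under Kernel) avoid materializing the full attention matrix by computing softmax in a streaming, tiled fashion. The numerical core is the "online softmax": keep a running max and rescale partial sums when it grows. A minimal single-row sketch of that trick in pure Python (the real kernels tile over blocks on GPU):

```python
import math

def online_softmax_weighted_sum(scores, values):
    """Streaming softmax(scores) . values without storing the exp'd row.

    Maintains a running max and rescales the partial numerator and
    denominator whenever the max grows - the rescaling trick that lets
    flash attention process keys block by block.
    """
    running_max = float("-inf")
    denom = 0.0  # running softmax denominator
    acc = 0.0    # running weighted sum (numerator)
    for s, v in zip(scores, values):
        new_max = max(running_max, s)
        scale = math.exp(running_max - new_max) if denom else 0.0
        denom = denom * scale + math.exp(s - new_max)
        acc = acc * scale + math.exp(s - new_max) * v
        running_max = new_max
    return acc / denom
```

The result matches the two-pass softmax exactly, which is why the tiled kernel can be numerically identical to the naive one while using O(block) memory.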
Fx/profiler
- [fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730) by Super Daniel
Workflow
- [workflow] handled the git directory ownership error (#1741) by Frank Lee
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc1...v0.1.11rc2
- Python
Published by github-actions[bot] over 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.11rc1 Release Today!
What's Changed
Hotfix
- [hotfix] resharding cost issue (#1742) by YuliangLiu0306
- [hotfix] solver bug caused by dict type comm cost (#1686) by YuliangLiu0306
- [hotfix] fix wrong type name in profiler (#1678) by Boyuan Yao
- [hotfix] unit test (#1670) by YuliangLiu0306
- [hotfix] add recompile after graph manipulation (#1621) by YuliangLiu0306
- [hotfix] got sliced types (#1614) by YuliangLiu0306
Release
- [release] update to v0.1.11 (#1736) by Frank Lee
Doc
- [doc] update recommendation system catalogue (#1732) by binmakeswell
- [doc] update recommendation system urls (#1725) by Jiarui Fang
Zero
- [zero] add chunk init function for users (#1729) by HELSON
- [zero] add constant placement policy (#1705) by HELSON
Pre-commit
- [pre-commit] update pre-commit (#1726) by HELSON
Autoparallel
- [autoparallel] runtime_backward_apply (#1720) by YuliangLiu0306
- [autoparallel] moved tests to test_tensor_shard (#1713) by Frank Lee
- [autoparallel] resnet block runtime apply (#1709) by YuliangLiu0306
- [autoparallel] fixed broken node handler tests (#1708) by Frank Lee
- [autoparallel] refactored the autoparallel module for organization (#1706) by Frank Lee
- [autoparallel] adapt runtime passes (#1703) by YuliangLiu0306
- [autoparallel] collated all deprecated files (#1700) by Frank Lee
- [autoparallel] init new folder structure (#1696) by Frank Lee
- [autoparallel] adapt solver and CostGraph with new handler (#1695) by YuliangLiu0306
- [autoparallel] add output handler and placeholder handler (#1694) by YuliangLiu0306
- [autoparallel] add pooling handler (#1690) by YuliangLiu0306
- [autoparallel] where_handler_v2 (#1688) by YuliangLiu0306
- [autoparallel] fix C version rotor inconsistency (#1691) by Boyuan Yao
- [autoparallel] added sharding spec conversion for linear handler (#1687) by Frank Lee
- [autoparallel] add reshape handler v2 and fix some previous bug (#1683) by YuliangLiu0306
- [autoparallel] add unary element wise handler v2 (#1674) by YuliangLiu0306
- [autoparallel] add following node generator (#1673) by YuliangLiu0306
- [autoparallel] add layer norm handler v2 (#1671) by YuliangLiu0306
- [autoparallel] fix insecure subprocess (#1680) by Boyuan Yao
- [autoparallel] add rotor C version (#1658) by Boyuan Yao
- [autoparallel] added utils for broadcast operation (#1665) by Frank Lee
- [autoparallel] update CommSpec (#1667) by YuliangLiu0306
- [autoparallel] added bias comm spec to matmul strategy (#1664) by Frank Lee
- [autoparallel] add batch norm handler v2 (#1666) by YuliangLiu0306
- [autoparallel] remove no strategy nodes (#1652) by YuliangLiu0306
- [autoparallel] added compute resharding costs for node handler (#1662) by Frank Lee
- [autoparallel] added new strategy constructor template (#1661) by Frank Lee
- [autoparallel] added node handler for bmm (#1655) by Frank Lee
- [autoparallel] add conv handler v2 (#1663) by YuliangLiu0306
- [autoparallel] adapt solver with gpt (#1653) by YuliangLiu0306
- [autoparallel] implemented all matmul strategy generator (#1650) by Frank Lee
- [autoparallel] change the following nodes strategies generation logic (#1636) by YuliangLiu0306
- [autoparallel] where handler (#1651) by YuliangLiu0306
- [autoparallel] implemented linear projection strategy generator (#1639) by Frank Lee
- [autoparallel] adapt solver with mlp (#1638) by YuliangLiu0306
- [autoparallel] Add pofo sequence annotation (#1637) by Boyuan Yao
- [autoparallel] add elementwise handler (#1622) by YuliangLiu0306
- [autoparallel] add embedding handler (#1620) by YuliangLiu0306
- [autoparallel] protect bcast handler from invalid strategies (#1631) by YuliangLiu0306
- [autoparallel] add layernorm handler (#1629) by YuliangLiu0306
- [autoparallel] recover the merged node strategy index (#1613) by YuliangLiu0306
- [autoparallel] added new linear module handler (#1616) by Frank Lee
- [autoparallel] added new node handler (#1612) by Frank Lee
- [autoparallel] add bcast matmul strategies (#1605) by YuliangLiu0306
- [autoparallel] refactored the data structure for sharding strategy (#1610) by Frank Lee
- [autoparallel] add bcast op handler (#1600) by YuliangLiu0306
- [autoparallel] added all non-bcast matmul strategies (#1603) by Frank Lee
- [autoparallel] added strategy generator and bmm strategies (#1602) by Frank Lee
- [autoparallel] add reshape handler (#1594) by YuliangLiu0306
- [autoparallel] refactored shape consistency to remove redundancy (#1591) by Frank Lee
- [autoparallel] add resnet autoparallel unit test and add backward weight communication cost (#1589) by YuliangLiu0306
- [autoparallel] added generate_sharding_spec to utils (#1590) by Frank Lee
- [autoparallel] added solver option dataclass (#1588) by Frank Lee
- [autoparallel] adapt solver with resnet (#1583) by YuliangLiu0306
Fx/meta/rpc
- [fx/meta/rpc] move meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710) by Super Daniel
Embeddings
- [embeddings] add doc in readme (#1711) by Jiarui Fang
- [embeddings] more detailed timer (#1692) by Jiarui Fang
- [embeddings] cache option (#1635) by Jiarui Fang
- [embeddings] use cache_ratio instead of cuda_row_num (#1611) by Jiarui Fang
- [embeddings] add already_split_along_rank flag for tablewise mode (#1584) by CsRic
Unittest
- [unittest] added doc for the pytest wrapper (#1704) by Frank Lee
- [unittest] supported conditional testing based on env var (#1701) by Frank Lee
Embedding
- [embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699) by Jiarui Fang
- [embedding] polish async copy (#1657) by Jiarui Fang
- [embedding] add more detail profiling (#1656) by Jiarui Fang
- [embedding] print profiling results (#1654) by Jiarui Fang
- [embedding] non-blocking cpu-gpu copy (#1647) by Jiarui Fang
- [embedding] isolate cache_op from forward (#1645) by CsRic
- [embedding] rollback for better FAW performance (#1625) by Jiarui Fang
- [embedding] updates some default parameters by Jiarui Fang
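The CachedEmbedding work above (renamed from FreqAwareEmbedding) keeps hot embedding rows on the device and evicts cold ones back to host memory. A minimal LRU sketch of that row cache using an OrderedDict — illustrative only, with hypothetical names, not the ColossalAI implementation:

```python
from collections import OrderedDict

class LRURowCache:
    """Keep at most `capacity` embedding rows 'on device' (here: a dict)."""

    def __init__(self, capacity, fetch_row):
        self.capacity = capacity
        self.fetch_row = fetch_row   # loads a row from 'CPU' storage on a miss
        self.rows = OrderedDict()    # row_id -> row data, in LRU order
        self.evictions = []          # ids pushed back to host storage

    def get(self, row_id):
        if row_id in self.rows:
            self.rows.move_to_end(row_id)  # mark as most recently used
            return self.rows[row_id]
        if len(self.rows) >= self.capacity:
            victim, _ = self.rows.popitem(last=False)  # evict the LRU row
            self.evictions.append(victim)
        self.rows[row_id] = self.fetch_row(row_id)
        return self.rows[row_id]

cache = LRURowCache(capacity=2, fetch_row=lambda i: [float(i)] * 4)
for rid in [0, 1, 0, 2]:  # row 1 is least recently used when 2 arrives
    cache.get(rid)
```

Entries like "non-blocking cpu-gpu copy" and "isolate cache_op from forward" optimize exactly this miss path, overlapping the fetch with compute.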
Fx/profiler
- [fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679) by Super Daniel
- [fx/profiler] provide a table of summary. (#1634) by Super Daniel
- [fx/profiler] tuned the calculation of memory estimation (#1619) by Super Daniel
Pipeline/fix-bug
- [pipeline/fix-bug] num_microbatches support any integer | stable chimera | launch tool for rpc pp framework (#1684) by Kirigaya Kazuto
Pipeline/rank_recorder
- [pipeline/rank_recorder] fix bug when process data before backward | add a tool for multiple ranks debug (#1681) by Kirigaya Kazuto
Feature
- [feature] A new ZeRO implementation (#1644) by HELSON
- Revert "[feature] new zero implementation (#1623)" (#1643) by Jiarui Fang
- [feature] new zero implementation (#1623) by HELSON
Fx
- [fx] Add concrete info prop (#1677) by Boyuan Yao
- [fx] refactor code for profiler / enable fake tensor movement. (#1646) by Super Daniel
- [fx] fix offload codegen test (#1648) by Boyuan Yao
- [fx] Modify offload codegen (#1618) by Boyuan Yao
- [fx] PoC of runtime shape consistency application (#1607) by YuliangLiu0306
- [fx] Add pofo solver (#1608) by Boyuan Yao
- [fx] Add offload codegen (#1598) by Boyuan Yao
- [fx] provide an accurate estimation of memory. (#1587) by Super Daniel
- [fx] Improve linearize and rotor solver (#1586) by Boyuan Yao
- [fx] Add nested checkpoint in activation checkpoint codegen (#1585) by Boyuan Yao
Pipeline/pytree
- [pipeline/pytree] add pytree to process args and kwargs | provide data_process_func to process args and kwargs after forward (#1642) by Kirigaya Kazuto
Fix
- [fix] fixed the collective pattern name for consistency (#1649) by Frank Lee
Moe
- [moe] initialize MoE groups by ProcessGroup (#1640) by HELSON
- [moe] fix moe bugs (#1633) by HELSON
- [moe] fix MoE bugs (#1628) by HELSON
Tensor
- [tensor] use communication autograd func (#1617) by YuliangLiu0306
Pipeline/chimera
- [pipeline/chimera] test chimera | fix bug of initializing (#1615) by Kirigaya Kazuto
- [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595) by Kirigaya Kazuto
Workflow
- [workflow] deactivate conda environment before removing (#1606) by Frank Lee
Fx/tuning
- [fx/tuning] tune performance on rotor with meta info. (#1599) by Super Daniel
Hotfix/rotor
- [hotfix/rotor] fix variable names (#1597) by Super Daniel
Nfc
- [NFC] add OPT serving (#1581) by binmakeswell
- [NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style (#1576) by Boyuan Yao
- [NFC] polish colossalai/zero/sharded_model/reduce_scatter.py code style (#1554) by Fazzie-Maqianli
- [NFC] polish utils/tensor_detector/__init__.py code style (#1573) by CsRic
- [NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572) by Sze-qq
- [NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571) by superhao1995
- [NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570) by Jiatong Han
- [NFC] polish colossalai/pipeline/utils.py code style (#1562) by Zirui Zhu
- [NFC] polish colossalai/fx/tracer/meta_patch/patched_module/convolution.py code style (#1563) by Xue Fuzhao
- [NFC] polish colossalai/gemini/update/chunk_v2.py code style (#1565) by Zangwei Zheng
- [NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568) by DouJS
- [NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style (#1566) by LuGY
- [NFC] polish colossalai/nn/_ops/embedding.py code style (#1561) by BigOneLiXiaoMing
- [NFC] polish colossalai/builder/__init__.py code style (#1560) by Ziheng Qin
- [NFC] polish colossalai/testing/comparison.py code style. (#1558) by Super Daniel
- [NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556) by Ofey Chan
- [NFC] polish code colossalai/gemini/update/search_utils.py (#1557) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555) by yuxuan-lou
- [NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553) by shenggan
- [NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552) by Maruyama_Aya
- [NFC] polish colossalai/nn/lr_scheduler/cosine.py code style by binmakeswell
- [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style (#1559) by Kirigaya Kazuto
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.10...v0.1.11rc1
- Python
Published by github-actions[bot] over 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.10 Release Today!
What's Changed
Embedding
- [embedding] cache_embedding small improvement (#1564) by CsRic
- [embedding] polish parallel embedding tablewise (#1545) by Jiarui Fang
- [embedding] freq_aware_embedding: add small functions for caller application (#1537) by CsRic
- [embedding] fix a bug in table wise sharding (#1538) by Jiarui Fang
- [embedding] tablewise sharding polish (#1535) by Jiarui Fang
- [embedding] add tablewise sharding for FAW (#1526) by CsRic
Nfc
- [NFC] polish test component gpt code style (#1567) by アマデウス
- [NFC] polish doc style for ColoTensor (#1457) by Jiarui Fang
- [NFC] global vars should be upper case (#1456) by Jiarui Fang
Pipeline/tuning
- [pipeline/tuning] improve dispatch performance both time and space cost (#1544) by Kirigaya Kazuto
Fx
- [fx] provide a stable but not accurate enough version of profiler. (#1547) by Super Daniel
- [fx] Add common node in model linearize (#1542) by Boyuan Yao
- [fx] support meta tracing for aten level computation graphs like functorch. (#1536) by Super Daniel
- [fx] Modify solver linearize and add corresponding test (#1531) by Boyuan Yao
- [fx] add test for meta tensor. (#1527) by Super Daniel
- [fx] patch nn.functional convolution (#1528) by YuliangLiu0306
- [fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521) by Boyuan Yao
- [fx] hack torch_dispatch for meta tensor and autograd. (#1515) by Super Daniel
- [fx] Fix activation codegen dealing with checkpointing first op (#1510) by Boyuan Yao
- [fx] fix the discretize bug (#1506) by Boyuan Yao
- [fx] fix wrong variable name in solver rotor (#1502) by Boyuan Yao
- [fx] Add activation checkpoint solver rotor (#1496) by Boyuan Yao
- [fx] add more op patches for profiler and error message for unsupported ops. (#1495) by Super Daniel
- [fx] fixed adaptive pooling size concatenation error (#1489) by Frank Lee
- [fx] add profiler for fx nodes. (#1480) by Super Daniel
- [fx] Fix ckpt functions' definitions in forward (#1476) by Boyuan Yao
- [fx] fix MetaInfoProp for incorrect calculations and add detections for inplace op. (#1466) by Super Daniel
- [fx] add rules to linearize computation graphs for searching. (#1461) by Super Daniel
- [fx] Add use_reentrant=False to checkpoint in codegen (#1463) by Boyuan Yao
- [fx] fix test and algorithm bugs in activation checkpointing. (#1451) by Super Daniel
- [fx] Use colossalai checkpoint and add offload recognition in codegen (#1439) by Boyuan Yao
- [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174. (#1446) by Super Daniel
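The solver-rotor and linearize entries above pick which activations to keep on a linearized graph so the rest can be recomputed in the backward pass. The classic baseline the rotor solver refines is checkpointing roughly every sqrt(n)-th node (the Chen et al. scheme the cited arXiv paper builds on). A minimal sketch of that baseline:

```python
import math

def sqrt_checkpoints(num_nodes):
    """Pick checkpoint indices roughly every sqrt(n) nodes.

    Peak activation memory then scales as O(sqrt(n)) at the cost of one
    extra forward pass per segment; the rotor solver improves on this
    by using measured per-node compute and memory costs.
    """
    stride = max(1, math.isqrt(num_nodes))
    return list(range(0, num_nodes, stride))
```

Everything between two checkpoints is recomputed from the earlier checkpoint during backward, which is why the linearization of the graph (the "linearize" PRs) must come first.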
Autoparallel
- [autoparallel] add backward cost info into strategies (#1524) by YuliangLiu0306
- [autoparallel] support function in operator handler (#1529) by YuliangLiu0306
- [autoparallel] change the merge node logic (#1533) by YuliangLiu0306
- [autoparallel] added liveness analysis (#1516) by Frank Lee
- [autoparallel] add more sharding strategies to conv (#1487) by YuliangLiu0306
- [autoparallel] add cost graph class (#1481) by YuliangLiu0306
- [autoparallel] added namespace constraints (#1490) by Frank Lee
- [autoparallel] integrate auto parallel with torch fx (#1479) by Frank Lee
- [autoparallel] added dot handler (#1475) by Frank Lee
- [autoparallel] introduced baseclass for op handler and reduced code redundancy (#1471) by Frank Lee
- [autoparallel] standardize the code structure (#1469) by Frank Lee
- [autoparallel] Add conv handler to generate strategies and costs info for conv (#1467) by YuliangLiu0306
Utils
- [utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548) by ver217
- [utils] optimize partition_tensor_parallel_state_dict (#1546) by ver217
- [utils] Add use_reentrant=False in utils.activation_checkpoint (#1460) by Boyuan Yao
- [utils] Impl clip_grad_norm for ColoTensor and ZeroOptimizer (#1442) by ver217
Hotfix
- [hotfix] change namespace for meta_trace. (#1541) by Super Daniel
- [hotfix] fix init context (#1543) by ver217
- [hotfix] avoid conflict of meta registry with torch 1.13.0. (#1530) by Super Daniel
- [hotfix] fix coloproxy typos. (#1519) by Super Daniel
Pipeline/pipelineprocessgroup
- [pipeline/pipelineprocessgroup] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP (#1508) by Kirigaya Kazuto
Doc
- [doc] docstring for FreqAwareEmbeddingBag (#1525) by Jiarui Fang
- [doc] update readme with the new xTrimoMultimer project (#1477) by Sze-qq
- [doc] update docstring in ProcessGroup (#1468) by Jiarui Fang
- [Doc] add more doc for ColoTensor. (#1458) by Jiarui Fang
Autoparallel
- [autoparallel] add strategies constructor (#1505) by YuliangLiu0306
Faw
- [FAW] cpu caching operations (#1520) by Jiarui Fang
- [FAW] refactor reorder() for CachedParamMgr (#1514) by Jiarui Fang
- [FAW] LFU initialize with dataset freq (#1513) by Jiarui Fang
- [FAW] shrink freq_cnter size (#1509) by CsRic
- [FAW] remove code related to chunk (#1501) by Jiarui Fang
- [FAW] add more docs and fix a warning (#1500) by Jiarui Fang
- [FAW] FAW embedding use LRU as eviction strategy initialized with dataset stats (#1494) by CsRic
- [FAW] LFU cache for the FAW by CsRic
- [FAW] init an LFU implementation for FAW (#1488) by Jiarui Fang
- [FAW] reorganize the inheritance struct of FreqCacheEmbedding (#1448) by Geng Zhang
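The FAW entries above move the cache from LRU to LFU eviction, with the frequency counters pre-seeded from dataset statistics ("LFU initialize with dataset freq"). A minimal, hypothetical sketch of that idea — not the actual CachedParamMgr code:

```python
class LFUCache:
    """Evict the least-frequently-used row; counts may be pre-seeded."""

    def __init__(self, capacity, seed_freq=None):
        self.capacity = capacity
        self.freq = dict(seed_freq or {})  # dataset stats as a warm start
        self.rows = {}                     # row_id -> cached row data

    def get(self, row_id, load=lambda i: i):
        self.freq[row_id] = self.freq.get(row_id, 0) + 1
        if row_id not in self.rows:
            if len(self.rows) >= self.capacity:
                # evict the cached row with the smallest access count
                victim = min(self.rows, key=lambda r: self.freq.get(r, 0))
                del self.rows[victim]
            self.rows[row_id] = load(row_id)
        return self.rows[row_id]
```

Seeding `freq` from dataset statistics means genuinely hot rows survive the cold-start phase instead of being evicted before their counts build up, which is the motivation for initializing LFU with dataset frequency.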
Pipeline/rpc
- [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497) by Kirigaya Kazuto
- [pipeline/rpc] implement distributed optimizer | test with assert_close (#1486) by Kirigaya Kazuto
- [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B (#1483) by Kirigaya Kazuto
- [pipeline/rpc] implement a demo for PP with cuda rpc framework (#1470) by Kirigaya Kazuto
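The pipeline/rpc entries above rework dispatching to keep a steady 1F1B (one-forward-one-backward) pattern. Ignoring communication overlap, the schedule is just an ordering of micro-batch operations per stage, which can be sketched as (a simplification, not the ColossalAI implementation):

```python
def one_f_one_b(stage, num_stages, num_microbatches):
    """Return the op sequence for one pipeline stage under 1F1B.
    Each op is ('F', i) or ('B', i) for micro-batch i."""
    warmup = min(num_stages - stage - 1, num_microbatches)
    ops, f, b = [], 0, 0
    for _ in range(warmup):          # warm-up: forwards only
        ops.append(('F', f)); f += 1
    while f < num_microbatches:      # steady state: alternate F and B
        ops.append(('F', f)); f += 1
        ops.append(('B', b)); b += 1
    while b < num_microbatches:      # cool-down: drain remaining backwards
        ops.append(('B', b)); b += 1
    return ops
```

Because at most `warmup + 1` forwards run before their matching backwards, activation memory per stage stays bounded regardless of the number of micro-batches.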
Tensor
- [tensor]add 1D device mesh (#1492) by YuliangLiu0306
- [tensor] support runtime ShardingSpec apply (#1453) by YuliangLiu0306
- [tensor] shape consistency generate transform path and communication cost (#1435) by YuliangLiu0306
- [tensor] added linear implementation for the new sharding spec (#1416) by Frank Lee
Fce
- [FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462) by Geng Zhang
Workflow
- [workflow] added TensorNVMe to compatibility test (#1449) by Frank Lee
Test
- [test] fixed the activation codegen test (#1447) by Frank Lee
Engine/schedule
- [engine/schedule] use p2p_v2 to reconstruct pipeline_schedule (#1408) by Kirigaya Kazuto
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.9...v0.1.10
Published by github-actions[bot] over 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.9 Release Today!
What's Changed
Zero
- [zero] add chunk_managerV2 for all-gather chunk (#1441) by HELSON
- [zero] add chunk size searching algorithm for parameters in different groups (#1436) by HELSON
- [zero] add has_inf_or_nan in AgChunk; enhance the unit test of AgChunk (#1426) by HELSON
- [zero] add unit test for AgChunk's append, close, access (#1423) by HELSON
- [zero] add AgChunk (#1417) by HELSON
- [zero] ZeroDDP supports controlling outputs' dtype (#1399) by ver217
- [zero] alleviate memory usage in ZeRODDP state_dict (#1398) by HELSON
- [zero] chunk manager allows filtering ex-large params (#1393) by ver217
- [zero] zero optim state_dict takes only_rank_0 (#1384) by ver217
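The "chunk size searching algorithm" entry above picks a chunk size that wastes little memory when parameters of different groups are packed into fixed-size chunks. A toy greedy search conveys the idea (illustrative; the real search also respects group boundaries and minimum chunk sizes):

```python
def waste(param_sizes, chunk_size):
    """Padding wasted when packing params first-fit into fixed-size chunks."""
    used = 0    # free space left in the currently open chunk
    total = 0
    for s in param_sizes:
        if s > chunk_size:            # oversized params get dedicated chunks
            total += (-s) % chunk_size
            continue
        if s > used:                  # open a new chunk, waste the remainder
            total += used
            used = chunk_size
        used -= s
    return total + used               # leftover in the last chunk is waste too

def search_chunk_size(param_sizes, candidates):
    """Pick the candidate chunk size with minimal padding waste."""
    return min(candidates, key=lambda c: waste(param_sizes, c))
```

A chunk size that divides the parameter sizes evenly yields zero waste, which is why the search matters when parameter groups have mixed shapes.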
Fx
- [fx] add vanilla activation checkpoint search with test on resnet and densenet (#1433) by Super Daniel
- [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages (#1425) by Super Daniel
- [fx] fixed torchaudio conformer tracing (#1392) by Frank Lee
- [fx] patched torch.max and data movement operator (#1391) by Frank Lee
- [fx] fixed indentation error in checkpointing codegen (#1385) by Frank Lee
- [fx] patched torch.full for huggingface opt (#1386) by Frank Lee
- [fx] update split module pass and add customized policy (#1373) by YuliangLiu0306
- [fx] add torchaudio test (#1369) by Super Daniel
- [fx] Add colotracer compatibility test on torchrec (#1370) by Boyuan Yao
- [fx]add gpt2 passes for pipeline performance test (#1366) by YuliangLiu0306
- [fx] added activation checkpoint codegen support for torch < 1.12 (#1359) by Frank Lee
- [fx] added activation checkpoint codegen (#1355) by Frank Lee
- [fx] fixed apex normalization patch exception (#1352) by Frank Lee
- [fx] added activation checkpointing annotation (#1349) by Frank Lee
- [fx] update MetaInforProp pass to process more complex node.meta (#1344) by YuliangLiu0306
- [fx] refactor tracer to trace complete graph (#1342) by YuliangLiu0306
- [fx] tested the complete workflow for auto-parallel (#1336) by Frank Lee
- [fx]refactor tracer (#1335) by YuliangLiu0306
- [fx] recovered skipped pipeline tests (#1338) by Frank Lee
- [fx] fixed compatiblity issue with torch 1.10 (#1331) by Frank Lee
- [fx] fixed unit tests for torch 1.12 (#1327) by Frank Lee
- [fx] add balanced policy v2 (#1251) by YuliangLiu0306
- [fx] Add unit test and fix bugs for transform_mlp_pass (#1299) by XYE
- [fx] added apex normalization to patched modules (#1300) by Frank Lee
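Several fx entries above generate activation-checkpointing code. The principle: a checkpointed segment stores only its input during the forward pass and recomputes its internal activations during backward, trading compute for memory. A bare-bones illustration in plain Python (no autograd, not the generated codegen itself):

```python
def run_with_checkpointing(segments, x):
    """Forward through a list of functions, saving only each segment's input.
    Internal activations are dropped and must be recomputed for backward."""
    saved_inputs = []
    for seg in segments:
        saved_inputs.append(x)   # checkpoint: keep the input, drop internals
        x = seg(x)
    return x, saved_inputs

def recompute_segment(segments, saved_inputs, i):
    """During 'backward', re-run segment i from its saved input."""
    return segments[i](saved_inputs[i])
```

With n segments, memory holds n inputs instead of every intermediate, at the cost of one extra forward per segment during backward.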
Recommendation System
- [FAW] export FAW in _ops (#1438) by Jiarui Fang
- [FAW] move coloparam setting in test code. (#1429) by Jiarui Fang
- [FAW] parallel FreqAwareEmbedding (#1424) by Jiarui Fang
- [FAW] add cache manager for the cached embedding (#1419) by Jiarui Fang
Global Tensor
- [tensor] add shape consistency feature to support auto spec transform (#1418) by YuliangLiu0306
- [tensor]build sharding spec to replace distspec in future. (#1405) by YuliangLiu0306
Hotfix
- [hotfix] zero optim prevents calling inner optim.zero_grad (#1422) by ver217
- [hotfix] fix CPUAdam kernel nullptr (#1410) by ver217
- [hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) by HELSON
- [hotfix] fix a running error in test_colo_checkpoint.py (#1387) by HELSON
- [hotfix] fix some bugs during gpt2 testing (#1379) by YuliangLiu0306
- [hotfix] fix zero optim save/load state dict (#1381) by ver217
- [hotfix] fix zero ddp buffer cast (#1376) by ver217
- [hotfix] fix no optimizer in save/load (#1363) by HELSON
- [hotfix] fix megatron_init in test_gpt2.py (#1357) by HELSON
- [hotfix] ZeroDDP use new process group (#1333) by ver217
- [hotfix] shared model returns cpu state_dict (#1328) by ver217
- [hotfix] fix ddp for unit test test_gpt2 (#1326) by HELSON
- [hotfix] fix unit test test_module_spec (#1321) by HELSON
- [hotfix] fix PipelineSharedModuleGradientHandler (#1314) by ver217
- [hotfix] fix ColoTensor GPT2 unitest (#1309) by HELSON
- [hotfix] add missing file (#1308) by Jiarui Fang
- [hotfix] remove potiential circle import (#1307) by Jiarui Fang
- [hotfix] skip some unittest due to CI environment. (#1301) by YuliangLiu0306
- [hotfix] fix shape error in backward when using ColoTensor (#1298) by HELSON
- [hotfix] Dist Mgr gather torch version (#1284) by Jiarui Fang
Communication
- [communication] add p2p_v2.py to support communication with List[Any] by Kirigaya Kazuto
Device
- [device] add DeviceMesh class to support logical device layout (#1394) by YuliangLiu0306
Chunk
- [chunk] add PG check for tensor appending (#1383) by Jiarui Fang
DDP
- [DDP] test ddp state dict uses more strict threshold (#1382) by ver217
Checkpoint
- [checkpoint] add kwargs for load_state_dict (#1374) by HELSON
- [checkpoint] use args, kwargs in save_checkpoint, load_checkpoint (#1368) by HELSON
- [checkpoint] sharded optim save/load grad scaler (#1350) by ver217
- [checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) by HELSON
- [checkpoint] add ColoOptimizer checkpointing (#1316) by Jiarui Fang
- [checkpoint] add test for bert and hotfix save bugs (#1297) by Jiarui Fang
Util
- [util] standard checkpoint function naming (#1377) by Frank Lee
Nvme
- [nvme] CPUAdam and HybridAdam support NVMe offload (#1360) by ver217
Colotensor
- [colotensor] use cpu memory to store state_dict (#1367) by HELSON
- [colotensor] add Tensor.view op and its unit test (#1343) by HELSON
Unit test
- [unit test] add megatron init test in zero_optim (#1358) by HELSON
Docker
- [docker] add tensornvme in docker (#1354) by ver217
Doc
- [doc] update rst and docstring (#1351) by ver217
Refactor
- [refactor] refactor ColoTensor's unit tests (#1340) by HELSON
Workflow
- [workflow] update docker build workflow to use proxy (#1334) by Frank Lee
- [workflow] update 8-gpu test to use torch 1.11 (#1332) by Frank Lee
- [workflow] roll back to use torch 1.11 for unit testing (#1325) by Frank Lee
- [workflow] fixed trigger condition for 8-gpu unit test (#1323) by Frank Lee
- [workflow] updated release bdist workflow (#1318) by Frank Lee
- [workflow] disable SHM for compatibility CI on rtx3080 (#1315) by Frank Lee
- [workflow] updated pytorch compatibility test (#1311) by Frank Lee
Test
- [test] removed outdated unit test for meta context (#1329) by Frank Lee
Utils
- [utils] integrated colotensor with lazy init context (#1324) by Frank Lee
Optimizer
- [Optimizer] Remove useless ColoOptimizer (#1312) by Jiarui Fang
- [Optimizer] polish the init method of ColoOptimizer (#1310) by Jiarui Fang
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.8...v0.1.9
Published by github-actions[bot] over 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.8 Release Today!
What's Changed
Hotfix
- [hotfix] torchvision fx unittests miss import pytest (#1277) by Jiarui Fang
- [hotfix] fix an assertion bug in base schedule. (#1250) by YuliangLiu0306
- [hotfix] fix sharded optim step and clip_grad_norm (#1226) by ver217
- [hotfix] fx get comm size bugs (#1233) by Jiarui Fang
- [hotfix] fx shard 1d pass bug fixing (#1220) by Jiarui Fang
- [hotfix]fixed p2p process send stuck (#1181) by YuliangLiu0306
- [hotfix]different overflow status lead to communication stuck. (#1175) by YuliangLiu0306
- [hotfix]fix some bugs caused by refactored schedule. (#1148) by YuliangLiu0306
Tensor
- [tensor] distributed checkpointing for parameters (#1240) by Jiarui Fang
- [tensor] redistribute among different process groups (#1247) by Jiarui Fang
- [tensor] a shorter shard and replicate spec (#1245) by Jiarui Fang
- [tensor] redirect .data.get to a tensor instance (#1239) by HELSON
- [tensor] add zero_like colo op, important for Optimizer (#1236) by Jiarui Fang
- [tensor] fix some unittests (#1234) by Jiarui Fang
- [tensor] fix an assertion in colotensor crossentropy (#1232) by HELSON
- [tensor] add unitest for colotensor 1DTP crossentropy (#1230) by HELSON
- [tensor] torch function return colotensor (#1229) by Jiarui Fang
- [tensor] improve robustness of class 'ProcessGroup' (#1223) by HELSON
- [tensor] sharded global process group (#1219) by Jiarui Fang
- [Tensor] add cpu group to ddp (#1200) by Jiarui Fang
- [tensor] remove gpc in tensor tests (#1186) by Jiarui Fang
- [tensor] revert local view back (#1178) by Jiarui Fang
- [Tensor] rename some APIs in TensorSpec and Polish view unittest (#1176) by Jiarui Fang
- [Tensor] rename parallel_action (#1174) by Ziyue Jiang
- [Tensor] distributed view supports inter-process hybrid parallel (#1169) by Jiarui Fang
- [Tensor] remove ParallelAction, use ComputeSpec instread (#1166) by Jiarui Fang
- [tensor] add embedding bag op (#1156) by ver217
- [tensor] add more element-wise ops (#1155) by ver217
- [tensor] fixed non-serializable colo parameter during model checkpointing (#1153) by Frank Lee
- [tensor] dist spec s2s uses all-to-all (#1136) by ver217
- [tensor] added repr to spec (#1147) by Frank Lee
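The "dist spec s2s uses all-to-all" entry above converts one shard layout to another with a single all-to-all instead of a gather followed by a scatter. What all-to-all does can be simulated in-process: rank i sends its j-th slice to rank j, and each rank ends up holding one slice from every peer (a single-process sketch, not the torch.distributed call):

```python
def all_to_all(per_rank_slices):
    """Simulate all_to_all: per_rank_slices[i][j] is what rank i sends to
    rank j; output[j] collects slice j from every rank, in rank order."""
    world = len(per_rank_slices)
    return [[per_rank_slices[i][j] for i in range(world)]
            for j in range(world)]
```

Compared with gather-then-scatter, all-to-all moves each slice over the network exactly once, which is why it is the natural primitive for shard-to-shard layout conversion.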
Fx
- [fx] added ndim property to proxy (#1253) by Frank Lee
- [fx] fixed tracing with apex-based T5 model (#1252) by Frank Lee
- [fx] refactored the file structure of patched function and module (#1238) by Frank Lee
- [fx] methods to get fx graph property. (#1246) by YuliangLiu0306
- [fx]add split module pass and unit test from pipeline passes (#1242) by YuliangLiu0306
- [fx] fixed huggingface OPT and T5 results misalignment (#1227) by Frank Lee
- [fx]get communication size between partitions (#1224) by YuliangLiu0306
- [fx] added patches for tracing swin transformer (#1228) by Frank Lee
- [fx] fixed timm tracing result misalignment (#1225) by Frank Lee
- [fx] added timm model tracing testing (#1221) by Frank Lee
- [fx] added torchvision model tracing testing (#1216) by Frank Lee
- [fx] temporarily used (#1215) by XYE
- [fx] added testing for all albert variants (#1211) by Frank Lee
- [fx] added testing for all gpt variants (#1210) by Frank Lee
- [fx]add uniform policy (#1208) by YuliangLiu0306
- [fx] added testing for all bert variants (#1207) by Frank Lee
- [fx] supported model tracing for huggingface bert (#1201) by Frank Lee
- [fx] added module patch for pooling layers (#1197) by Frank Lee
- [fx] patched conv and normalization (#1188) by Frank Lee
- [fx] supported data-dependent control flow in model tracing (#1185) by Frank Lee
Rename
- [rename] convert_to_dist -> redistribute (#1243) by Jiarui Fang
Checkpoint
- [checkpoint] save sharded optimizer states (#1237) by Jiarui Fang
- [checkpoint]support generalized scheduler (#1222) by Yi Zhao
- [checkpoint] make unitest faster (#1217) by Jiarui Fang
- [checkpoint] checkpoint for ColoTensor Model (#1196) by Jiarui Fang
Polish
- [polish] polish repr for ColoTensor, DistSpec, ProcessGroup (#1235) by HELSON
Refactor
- [refactor] move process group from _DistSpec to ColoTensor. (#1203) by Jiarui Fang
- [refactor] remove gpc dependency in colotensor's _ops (#1189) by Jiarui Fang
- [refactor] move chunk and chunkmgr to directory gemini (#1182) by Jiarui Fang
Context
- [context]support arbitrary module materialization. (#1193) by YuliangLiu0306
- [context]use meta tensor to init model lazily. (#1187) by YuliangLiu0306
Ddp
- [ddp] ColoDDP uses bucket all-reduce (#1177) by ver217
- [ddp] refactor ColoDDP and ZeroDDP (#1146) by ver217
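"ColoDDP uses bucket all-reduce" refers to the standard DDP trick: instead of launching one collective per gradient tensor, gradients are packed into fixed-size buckets and reduced bucket-by-bucket, amortizing per-call latency. A sketch with plain lists standing in for tensors (the sum across replicas stands in for an all-reduce):

```python
def bucket_allreduce(replica_grads, bucket_size):
    """replica_grads: one flat gradient list per replica (equal lengths).
    Reduce (sum) in fixed-size buckets and return the reduced flat list."""
    length = len(replica_grads[0])
    reduced = []
    for start in range(0, length, bucket_size):
        end = min(start + bucket_size, length)
        # One 'collective' per bucket instead of one per tensor.
        bucket = [sum(r[i] for r in replica_grads) for i in range(start, end)]
        reduced.extend(bucket)
    return reduced
```

Bucketing also lets the reduction of early buckets overlap with backward computation of later layers, which is where most of the real speedup comes from.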
Colotensor
- [ColoTensor] add independent process group (#1179) by Jiarui Fang
- [ColoTensor] rename APIs and add output_replicate to ComputeSpec (#1168) by Jiarui Fang
- [ColoTensor] improves init functions. (#1150) by Jiarui Fang
Zero
- [zero] sharded optim supports loading local state dict (#1170) by ver217
- [zero] zero optim supports loading local state dict (#1171) by ver217
Workflow
- [workflow] polish readme and dockerfile (#1165) by Frank Lee
- [workflow] auto-publish docker image upon release (#1164) by Frank Lee
- [workflow] fixed release post workflow (#1154) by Frank Lee
- [workflow] fixed format error in yaml file (#1145) by Frank Lee
- [workflow] added workflow to auto draft the release post (#1144) by Frank Lee
Gemini
- [gemini] refactor gemini mgr (#1151) by ver217
Pipeline
- [pipeline]add customized policy (#1139) by YuliangLiu0306
- [pipeline]support more flexible pipeline (#1138) by YuliangLiu0306
Ci
- [ci] added scripts to auto-generate release post text (#1142) by Frank Lee
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.7...v0.1.8
Published by github-actions[bot] over 3 years ago
https://github.com/hpcaitech/colossalai - Version v0.1.7 Released Today
Highlights
- Started integrating torch.fx for auto-parallel training
- Updated the ZeRO mechanism with ColoTensor
- Fixed various bugs
What's Changed
Hotfix
- [hotfix] prevent nested ZeRO (#1140) by ver217
- [hotfix]fix bugs caused by refactored pipeline (#1133) by YuliangLiu0306
- [hotfix] fix param op hook (#1131) by ver217
- [hotfix] fix zero init ctx numel (#1128) by ver217
- [hotfix]change to fit latest p2p (#1100) by YuliangLiu0306
- [hotfix] fix chunk comm src rank (#1072) by ver217
Zero
- [zero] avoid zero hook spam by changing log to debug level (#1137) by Frank Lee
- [zero] added error message to handle on-the-fly import of torch Module class (#1135) by Frank Lee
- [zero] fixed api consistency (#1098) by Frank Lee
- [zero] zero optim copy chunk rather than copy tensor (#1070) by ver217
Optim
- [optim] refactor fused sgd (#1134) by ver217
Ddp
- [ddp] add save/load state dict for ColoDDP (#1127) by ver217
- [ddp] add set_params_to_ignore for ColoDDP (#1122) by ver217
- [ddp] supported customized torch ddp configuration (#1123) by Frank Lee
Pipeline
- [pipeline]support List of Dict data (#1125) by YuliangLiu0306
- [pipeline] supported more flexible dataflow control for pipeline parallel training (#1108) by Frank Lee
- [pipeline] refactor the pipeline module (#1087) by Frank Lee
Fx
- [fx]add autoparallel passes (#1121) by YuliangLiu0306
- [fx] added unit test for coloproxy (#1119) by Frank Lee
- [fx] added coloproxy (#1115) by Frank Lee
Gemini
- [gemini] gemini mgr supports "cpu" placement policy (#1118) by ver217
- [gemini] zero supports gemini (#1093) by ver217
Test
- [test] fixed hybrid parallel test case on 8 GPUs (#1106) by Frank Lee
- [test] skip tests when not enough GPUs are detected (#1090) by Frank Lee
- [test] ignore 8 gpu test (#1080) by Frank Lee
Release
- [release] update version.txt (#1103) by Frank Lee
Tensor
- [tensor] refactor param op hook (#1097) by ver217
- [tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077) by ver217
- [Tensor] fix equal assert (#1091) by Ziyue Jiang
- [Tensor] 1d row embedding (#1075) by Ziyue Jiang
- [tensor] chunk manager monitor mem usage (#1076) by ver217
- [Tensor] fix optimizer for CPU parallel (#1069) by Ziyue Jiang
- [Tensor] add hybrid device demo and fix bugs (#1059) by Ziyue Jiang
Amp
- [amp] included dict for type casting of model output (#1102) by Frank Lee
Workflow
- [workflow] fixed 8-gpu test workflow (#1101) by Frank Lee
- [workflow] added regular 8 GPU testing (#1099) by Frank Lee
- [workflow] disable p2p via shared memory on non-nvlink machine (#1086) by Frank Lee
Engine
- [engine] fixed empty op hook check (#1096) by Frank Lee
Doc
- [doc] added documentation to chunk and chunk manager (#1094) by Frank Lee
Context
- [context] support lazy init of module (#1088) by Frank Lee
- [context] maintain the context object in with statement (#1073) by Frank Lee
Refactory
- [refactory] add nn.parallel module (#1068) by Jiarui Fang
Cudnn
- [cudnn] set False to cudnn benchmark by default (#1063) by Frank Lee
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.6...v0.1.7
Published by FrankLeeeee over 3 years ago
https://github.com/hpcaitech/colossalai - v0.1.6 Released!
Main features
- ColoTensor supports hybrid parallel (tensor parallel and data parallel)
- ColoTensor supports ZeRO (with chunk)
- Config tensor parallel by module via ColoTensor
- ZeroInitContext and ShardedModelV2 support loading checkpoints and Hugging Face from_pretrained()
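"ColoTensor supports hybrid parallel" means each rank belongs to both a tensor-parallel (TP) group and a data-parallel (DP) group carved out of the world grid. The usual grouping can be sketched as follows (illustrative only; ColossalAI's ProcessGroup derives these groups internally):

```python
def hybrid_groups(world_size, tp_size):
    """Split ranks into TP groups (contiguous) and DP groups (strided)."""
    assert world_size % tp_size == 0
    dp_size = world_size // tp_size
    tp_groups = [list(range(i * tp_size, (i + 1) * tp_size))
                 for i in range(dp_size)]
    dp_groups = [list(range(j, world_size, tp_size))
                 for j in range(tp_size)]
    return tp_groups, dp_groups
```

Keeping TP groups contiguous places them on the fastest interconnect (typically within a node), while DP gradient averaging crosses nodes.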
What's Changed
ColoTensor
- [tensor] refactor colo-tensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/992
- [tensor] refactor parallel action by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1007
- [tensor] impl ColoDDP for ColoTensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1009
- [Tensor] add module handler for linear by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/1021
- [Tensor] add module check and bert test by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/1031
- [Tensor] add Parameter inheritance for ColoParameter by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/1041
- [tensor] ColoTensor supports ZeRO by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1015
- [zero] add chunk size search for chunk manager by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1052
Zero
- [zero] add load_state_dict for sharded model by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/894
- [zero] add zero optimizer for ColoTensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1046
Hotfix
- [hotfix] fix colo init context by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1026
- [hotfix] fix some bugs caused by size mismatch. by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1011
- [kernel] fixed the include bug in dropout kernel by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/999
- fix typo in constants by @ryanrussell in https://github.com/hpcaitech/ColossalAI/pull/1027
- [engine] fixed bug in gradient accumulation dataloader to keep the last step by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1030
- [hotfix] fix dist spec mgr by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1045
- [hotfix] fix import error in sharded model v2 by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1053
Unit test
- [unit test] refactor test tensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1005
CI
- [ci] update the docker image name by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1017
- [ci] added nightly build (#1018) by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1019
- [ci] fixed nightly build workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1022
- [ci] fixed nightly build workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1029
- [ci] fixed nightly build workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1040
CLI
- [cli] remove unused imports by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1001
Documentation
- Hotfix/format by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/987
- [doc] update docker instruction by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1020
Misc
- [NFC] Hotfix/format by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/984
- Revert "[NFC] Hotfix/format" by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/986
- remove useless import in tensor dir by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/997
- [NFC] fix download link by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/998
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/1003
- [NFC] polish colossalai/kernel/cudanative/csrc/colossalC_frontend.c… by @zhengzangw in https://github.com/hpcaitech/ColossalAI/pull/1010
- [NFC] fix paper link by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/1012
- [p2p]add object list send/recv by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1024
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/1034
- [NFC] add inference by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/1044
- [titans]remove model zoo by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1042
- [NFC] add inference submodule in path by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/1047
- [release] update version.txt by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1048
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/1049
- updated collective ops api by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/1054
- [pipeline]refactor ppschedule to support tensor list by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1050
New Contributors
- @ryanrussell made their first contribution in https://github.com/hpcaitech/ColossalAI/pull/1027
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.5...v0.1.6
Published by ver217 over 3 years ago
https://github.com/hpcaitech/colossalai - v0.1.5 Released!
Main Features
- Enhanced ColoTensor and built a demo that trains BERT (from Hugging Face) with tensor parallelism, without modifying the model.
What's Changed
ColoTensor
- [Tensor] add ColoTensor TP1Dcol Embedding by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/899
- [Tensor] add embedding tp1d row by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/904
- [Tensor] update pytest.mark.parametrize in tensor tests by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/913
- [Tensor] init ColoParameter by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/914
- [Tensor] add a basic bert. by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/911
- [Tensor] polish model test by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/915
- [Tensor] fix test_model by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/916
- [Tensor] add 1d vocab loss by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/918
- [Graph] building computing graph with ColoTensor, Linear only by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/917
- [Tensor] add from_pretrained support and bert pretrained test by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/921
- [Tensor] test pretrain loading on multi-process by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/922
- [tensor] hijack addmm for colo tensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/923
- [tensor] colo tensor overrides mul by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/927
- [Tensor] simplify named param by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/928
- [Tensor] fix init context by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/931
- [Tensor] add optimizer to bert test by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/933
- [tensor] design DistSpec and DistSpecManager for ColoTensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/934
- [Tensor] add DistSpec for loss and test_model by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/947
- [tensor] derive compute pattern from dist spec by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/971
Pipeline Parallelism
- [pipelinable]use pipelinable to support GPT model. by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/903
CI
- [CI] add CI for releasing bdist wheel by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/901
- [CI] fix release bdist CI by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/902
- [ci] added wheel build scripts by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/910
Misc
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/907
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/912
- [setup] update cuda ext cc flags by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/919
- [setup] support more cuda architectures by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/920
- [NFC] update results on a single GPU, highlight quick view by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/981
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.4...v0.1.5
Published by ver217 almost 4 years ago
https://github.com/hpcaitech/colossalai - v0.1.4 Released!
Main Features
Here are the main improvements of this release:
1. ColoTensor: a data structure that unifies the tensor representation of different parallel methods.
2. Gemini: a more efficient Gemini implementation that reduces the overhead of model data statistic collection.
3. CLI: a command-line tool that helps users launch distributed training tasks more easily.
4. Pipeline Parallelism (PP): a more user-friendly API for PP.
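Gemini's statistics feed a placement policy that decides which model tensors live on GPU and which are offloaded to CPU within a measured budget. A greedy toy version of such a policy (hypothetical names; the real Gemini adjusts placement dynamically per iteration from sampled memory stats):

```python
def auto_placement(tensor_sizes, gpu_budget):
    """Greedy sketch of an 'auto' placement policy: keep the largest tensors
    on GPU until the budget is exhausted, offload the rest to CPU."""
    placement, used = {}, 0
    for name, size in sorted(tensor_sizes.items(), key=lambda kv: -kv[1]):
        if used + size <= gpu_budget:
            placement[name] = 'cuda'
            used += size
        else:
            placement[name] = 'cpu'
    return placement
```

The point of cheap statistic collection is precisely to keep `gpu_budget` accurate without slowing down every iteration.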
What's Changed
ColoTensor
- [tensor]fix colotensor torch_function by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/825
- [tensor]fix test_linear by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/826
- [tensor] ZeRO use ColoTensor as the base class. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/828
- [tensor] revert zero tensors back by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/829
- [Tensor] overriding parameters() for Module using ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/889
- [tensor] refine linear and add gather for layernorm by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/893
- [Tensor] test parameters() as member function by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/896
- [Tensor] activation is an attr of ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/897
- [Tensor] initialize the ColoOptimizer by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/898
- [tensor] reorganize files by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/820
- [Tensor] apply ColoTensor on Torch functions by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/821
- [Tensor] update ColoTensor torch_function by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/822
- [tensor] lazy init by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/823
- [WIP] Applying ColoTensor on TP-1D-row Linear. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/831
- Init Context supports lazy allocate model memory by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/842
- [Tensor] TP Linear 1D row by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/843
- [Tensor] add assert for colo_tensor 1Drow by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/846
- [Tensor] init a simple network training with ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/849
- [Tensor ] Add 1Drow weight reshard by spec by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/854
- [Tensor] add layer norm Op by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/852
- [tensor] an initial idea of tensor spec by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/865
- [Tensor] colo init context add device attr. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/866
- [tensor] add crossentropyloss by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/868
- [Tensor] Add function to spec and update linear 1Drow and unit tests by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/869
- [tensor] customized op returns ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/875
- [Tensor] get named parameters for model using ColoTensors by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/874
- [Tensor] Add some attributes to ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/877
- [Tensor] make a simple net works with 1D row TP by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/879
- [tensor] wrap function in the torch_tensor to ColoTensor by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/881
- [Tensor] make ColoTensor more robust for getattr by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/886
- [Tensor] test model check results for a simple net by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/887
- [tensor] add ColoTensor 1Dcol by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/888
Gemini + ZeRO
- [zero] add zero tensor shard strategy by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/793
- Revert "[zero] add zero tensor shard strategy" by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/806
- [gemini] a new tensor structure by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/818
- [gemini] APIs to set cpu memory capacity by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/809
- [DO NOT MERGE] [zero] init fp16 params directly in ZeroInitContext by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/808
- [gemini] collect cpu-gpu moving volume in each iteration by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/813
- [gemini] add GeminiMemoryManger by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/832
- [zero] use GeminiMemoryManager when sampling model data by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/850
- [gemini] polish code by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/855
- [gemini] add stateful tensor container by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/867
- [gemini] polish stateful_tensor_mgr by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/876
- [gemini] accelerate adjust_layout() by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/878
CLI
- [cli] added distributed launcher command by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/791
- [cli] added micro benchmarking for tp by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/789
- [cli] add missing requirement by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/805
- [cli] fixed a bug in user args and refactored the module structure by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/807
- [cli] fixed single-node process launching by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/812
- [cli] added check installation cli by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/815
- [CLI] refactored the launch CLI and fixed bugs in multi-node launching by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/844
- [cli] refactored micro-benchmarking cli and added more metrics by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/858
Pipeline Parallelism
- [pipelinable]use pipelinable context to initialize non-pipeline model by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/816
- [pipelinable]use ColoTensor to replace dummy tensor. by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/853
Misc
- [hotfix] fix auto tensor placement policy by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/775
- [hotfix] change the check assert in split batch 2d by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/772
- [hotfix] fix bugs in zero by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/781
- [hotfix] fix grad offload when enabling reuse_fp16_shard by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/784
- [refactor] moving memtracer to gemini by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/801
- [log] display tflops if available by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/802
- [refactor] moving grad acc logic to engine by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/804
- [log] local throughput metrics by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/811
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/810
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/819
- [refactor] moving InsertPostInitMethodToModuleSubClasses to utils. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/824
- [setup] allow installation with python 3.6 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/834
- Revert "[WIP] Applying ColoTensor on TP-1D-row Linear." by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/835
- [dependency] removed torchvision by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/833
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/827
- [unittest] refactored unit tests for change in dependency by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/838
- [setup] use env var instead of option for cuda ext by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/839
- [hotfix] ColoTensor pin_memory by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/840
- modified the pp build for ckpt adaptation by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/803
- [hotfix] fix the numel() bug in ColoTensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/845
- [hotfix] fix postinit_method of zero init ctx by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/847
- [hotfix] add destructor for stateful tensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/848
- [utils] refactor profiler by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/837
- [ci] cache cuda extension by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/860
- hotfix tensor unittest bugs by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/862
- [usability] added assertion message in registry by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/864
- [doc] improved docstring in the communication module by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/863
- [doc] improved docstring in the logging module by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/861
- [doc] improved docstring in the amp module by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/857
- [usability] improved error messages in the context module by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/856
- [doc] improved error messages in initialize by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/872
- [doc] improved assertion messages in trainer by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/873
- [doc] improved docstring and assertion messages for the engine module by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/871
- [hotfix] fix import error by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/880
- [setup] add local version label by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/890
- [model_zoo] change qkv processing by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/870
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.3...v0.1.4
Published by feifeibear almost 4 years ago
https://github.com/hpcaitech/colossalai - V0.1.3 Released!
Overview
Here are the main improvements of this release:
1. Gemini: a heterogeneous memory space manager
2. Refactored API for pipeline parallelism
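Gemini's stateful tensor manager keeps model data in a mixed CPU/CUDA memory space and adjusts the layout as tensors are needed for compute. As a rough conceptual sketch only (plain Python; the class and method names below are hypothetical, not ColossalAI's actual API), an eviction-based layout manager could look like:

```python
from enum import Enum, auto

class TensorState(Enum):
    HOLD = auto()     # payload may live on CPU or CUDA
    COMPUTE = auto()  # payload must stay on the accelerator

class StatefulTensor:
    """A record of where a tensor's payload currently lives."""
    def __init__(self, name, numel):
        self.name = name
        self.numel = numel
        self.device = "cpu"
        self.state = TensorState.HOLD

class NaiveLayoutManager:
    """Evicts HOLD tensors back to CPU when the CUDA budget is exceeded."""
    def __init__(self, cuda_budget):
        self.cuda_budget = cuda_budget
        self.tensors = []

    def register(self, t):
        self.tensors.append(t)

    def cuda_usage(self):
        return sum(t.numel for t in self.tensors if t.device == "cuda")

    def prepare_for_compute(self, t):
        # Free CUDA space by evicting other HOLD tensors, then place t on CUDA.
        needed = t.numel if t.device == "cpu" else 0
        for victim in self.tensors:
            if self.cuda_usage() + needed <= self.cuda_budget:
                break
            if victim is not t and victim.device == "cuda" and victim.state is TensorState.HOLD:
                victim.device = "cpu"  # "offload" the victim
        t.device = "cuda"
        t.state = TensorState.COMPUTE

    def release(self, t):
        t.state = TensorState.HOLD
```

With a budget of 100 elements, preparing a second 60-element tensor for compute evicts the first (released) tensor back to CPU, keeping CUDA usage within budget.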
What's Changed
Features
- [zero] initialize a stateful tensor manager by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/614
- [pipeline] refactor pipeline by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/679
- [zero] stateful tensor manager by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/687
- [zero] adapt zero hooks for unsharded module by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/699
- [zero] refactor memstats collector by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/706
- [zero] improve adaptability for not-shard parameters by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/708
- [zero] check whether gradients have inf and nan in gpu by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/712
- [refactor] refactor the memory utils by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/715
- [util] support detection of number of processes on current node by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/723
- [utils] add synchronized cuda memory monitor by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/740
- [zero] refactor ShardedParamV2 by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/742
- [zero] add tensor placement policies by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/743
- [zero] use factory pattern for tensorplacementpolicy by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/752
- [zero] refactor memstats_collector by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/746
- [gemini] init gemini individual directory by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/754
- refactor shard and gather operation by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/773
Bug Fix
- [zero] fix init bugs in zero context by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/686
- [hotfix] update requirements-test by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/701
- [hotfix] fix a bug in 3d vocab parallel embedding by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/707
- [compatibility] fixed tensor parallel compatibility with torch 1.9 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/700
- [hotfix]fixed bugs of assigning grad states to non leaf nodes by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/711
- [hotfix] fix stateful tensor manager's cuda model data size by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/710
- [bug] fixed broken test_found_inf by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/725
- [util] fixed activation checkpointing on torch 1.9 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/719
- [util] fixed communication API with PyTorch 1.9 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/721
- [bug] removed zero installation requirements by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/731
- [hotfix] remove duplicated param register to stateful tensor manager by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/728
- [utils] correct cpu memory used and capacity in the context of multi-process by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/726
- [bug] fixed grad scaler compatibility with torch 1.8 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/735
- [bug] fixed DDP compatibility with torch 1.8 by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/739
- [hotfix] fix memory leak in backward of sharded model by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/741
- [hotfix] fix initialize about zero by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/748
- [hotfix] fix prepare grads in sharded optim by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/749
- [hotfix] layernorm by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/750
- [hotfix] fix auto tensor placement policy by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/753
- [hotfix] fix reuse_fp16_shard of sharded model by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/756
- [hotfix] fix test_stateful_tensor_mgr by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/762
- [compatibility] used backward-compatible API for global process group by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/758
- [hotfix] fix the ckpt hook bugs when using DDP by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/769
- [hotfix] polish sharded optim docstr and warning by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/770
Unit Testing
- [ci] replace the ngc docker image with self-built pytorch image by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/672
- [ci] fixed compatibility workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/678
- [ci] update workflow trigger condition and support options by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/691
- [ci] added missing field in workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/692
- [ci] remove ipc config for rootless docker by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/694
- [test] added missing decorators to model checkpointing tests by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/727
- [unittest] add checkpoint for moe zero test by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/729
- [test] added a decorator for address already in use error with backward compatibility by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/760
- [test] refactored with the new rerun decorator by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/763
Documentation
- add PaLM link by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/704
- [doc] removed outdated installation command by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/730
- add video by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/732
- [readme] polish readme by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/764
- [readme] sync CN readme by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/766
Miscellaneous
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/556
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/695
- [refactor] zero directory by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/724
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/751
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.2...v0.1.3
Published by ver217 almost 4 years ago
https://github.com/hpcaitech/colossalai - V0.1.2 Released!
Overview
Here are the main improvements of this release:
1. MoE and BERT models can be trained with ZeRO.
2. Provide a uniform checkpoint for all kinds of parallelism.
3. Optimize ZeRO-offload and improve model scaling.
4. Design a uniform model memory tracer.
5. Implement an efficient hybrid Adam (CPU and CUDA kernels).
6. Improve activation offloading.
7. Release a beta TensorBoard plugin for the profiler.
8. Refactor the pipeline module for closer integration with the engine.
9. Add Chinese tutorials, WeChat and Slack user groups.
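The hybrid Adam mentioned above splits work between CPU and CUDA kernels: fp32 master weights and optimizer states are updated on the host, while the device works on an fp16 copy. A plain-Python sketch of that split (illustrative only; the function names are hypothetical and this is not the library's kernel implementation):

```python
import math
import struct

def to_fp16(x):
    """Round-trip a float through IEEE half precision (the device-side copy)."""
    return struct.unpack('e', struct.pack('e', x))[0]

def adam_step(params32, grads, m, v, step, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on the fp32 master weights (the CPU-side part)."""
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** step)   # bias-corrected first moment
        v_hat = v[i] / (1 - beta2 ** step)   # bias-corrected second moment
        params32[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    # Return the reduced-precision copy that would be sent back to the device.
    return [to_fp16(p) for p in params32]
```

Keeping the master weights in fp32 on the CPU avoids the precision loss of accumulating small updates directly in fp16.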
What's Changed
Features
- [zero] get memory usage for sharded param by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/536
- [zero] improve the accuracy of get_memory_usage of sharded param by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/538
- [zero] refactor model data tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/537
- [zero] get memory usage of sharded optim v2. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/542
- [zero] polish ZeroInitContext by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/540
- [zero] optimize grad offload by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/539
- [zero] non model data tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/545
- [zero] add zero config to neutralize zero context init by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/546
- [zero] dump memory stats for sharded model by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/548
- [zero] add stateful tensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/549
- [zero] label state for param fp16 and grad by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/551
- [zero] hijack p.grad in sharded model by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/554
- [utils] update colo tensor moving APIs by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/553
- [polish] rename col_attr -> colo_attr by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/558
- [zero] trace states of fp16/32 grad and fp32 param by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/571
- [zero] adapt zero for unsharded parameters by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/561
- [refactor] memory utils by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/577
- Feature/checkpoint gloo by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/589
- [zero] add sampling time for memstats collector by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/610
- [model checkpoint] checkpoint utils by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/592
- [model checkpoint][hotfix] unified layers for save&load by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/593
- Feature/checkpoint 2D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/595
- Feature/checkpoint 1D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/594
- [model checkpoint] CPU communication ops by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/590
- Feature/checkpoint 2.5D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/596
- Feature/Checkpoint 3D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/597
- [model checkpoint] checkpoint hook by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/598
- Feature/Checkpoint tests by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/599
- [zero] adapt zero for unsharded parameters (Optimizer part) by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/601
- [zero] polish init context by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/645
- refactor pipeline: put runtime schedule into engine by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/627
Bug Fix
- [Zero] process no-leaf-module in Zero by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/535
- Add gather_out arg to Linear by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/541
- [hotfix] fix parallel_input flag for Linear1D_Col gather_output by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/579
- [hotfix] add hybrid adam to init by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/584
- Hotfix/path check util by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/591
- [hotfix] fix sharded optim zero grad by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/604
- Add tensor parallel input check by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/621
- [hotfix] Raise messages for indivisible batch sizes with tensor parallelism by @number1roy in https://github.com/hpcaitech/ColossalAI/pull/622
- [zero] fixed the activation offload by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/647
- fixed bugs in CPU adam by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/633
- Revert "[zero] polish init context" by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/657
- [hotfix] fix a bug in model data stats tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/655
- fix bugs for unsharded parameters when restore data by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/664
Unit Testing
- [zero] test zero tensor utils by @FredHuang99 in https://github.com/hpcaitech/ColossalAI/pull/609
- remove hybrid adam in test_moe_zero_optim by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/659
Documentation
- Refactored docstring to google style by @number1roy in https://github.com/hpcaitech/ColossalAI/pull/532
- [docs] updated docs of hybrid adam and cpu adam by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/552
- html refactor by @number1roy in https://github.com/hpcaitech/ColossalAI/pull/555
- [doc] polish docstring of zero by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/612
- [doc] update rst by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/615
- [doc] polish amp docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/616
- [doc] polish moe docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/618
- [doc] polish optimizer docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/619
- [doc] polish utils docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/620
- [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cuda_util.cu … by @GaryGky in https://github.com/hpcaitech/ColossalAI/pull/625
- [doc] polish checkpoint docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/637
- update GPT-2 experiment result by @Sze-qq in https://github.com/hpcaitech/ColossalAI/pull/666
- [NFC] polish code by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/646
Model Zoo
- [model zoo] add activation offload for gpt model by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/582
Miscellaneous
- [logging] polish logger format by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/543
- [profiler] add MemProfiler by @raejaf in https://github.com/hpcaitech/ColossalAI/pull/356
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/501
- [tool] create .clang-format for pre-commit by @BoxiangW in https://github.com/hpcaitech/ColossalAI/pull/578
- [GitHub] Add prefix and label in issue template by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/652
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.1...v0.1.2
Published by ver217 almost 4 years ago
https://github.com/hpcaitech/colossalai - V0.1.1 Released Today!
What's Changed
Features
- [MOE] changed ParallelMode to dist process group by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/460
- [MOE] redirect moe_env from global_variables to core by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/467
- [zero] zero init ctx receives a dp process group by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/471
- [zero] ZeRO supports pipeline parallel by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/477
- add LinearGate for MOE in NaiveAMP context by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/480
- [zero] polish sharded param name by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/484
- [zero] sharded optim support hybrid cpu adam by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/486
- [zero] polish sharded optimizer v2 by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/490
- [MOE] support PR-MOE by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/488
- [zero] sharded model manages ophooks individually by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/492
- [MOE] remove old MoE legacy by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/493
- [zero] sharded model support the reuse of fp16 shard by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/495
- [polish] polish singleton and global context by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/500
- [memory] add model data tensor moving api by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/503
- [memory] set cuda mem frac by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/506
- [zero] use colo model data api in sharded optim v2 by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/511
- [MOE] add MOEGPT model by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/510
- [zero] zero init ctx enable rm_torch_payload_on_the_fly by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/512
- [zero] show model data cuda memory usage after zero context init. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/515
- [log] polish disable_existing_loggers by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/519
- [zero] add model data tensor inline moving API by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/521
- [cuda] modify the fused adam, support hybrid of fp16 and fp32 by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/497
- [zero] refactor model data tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/522
- [zero] added hybrid adam, removed loss scale in adam by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/527
Bug Fix
- fix discussion button in issue template by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/504
- [zero] fix grad offload by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/528
Unit Testing
- [MOE] add unitest for MOE experts layout, gradient handler and kernel by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/469
- [test] added rerun on exception for testing by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/475
- [zero] fix init device bug in zero init context unittest by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/516
- [test] fixed rerun_on_exception and adapted test cases by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/487
CI/CD
- [devops] remove tsinghua source for pip by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/505
- [devops] remove tsinghua source for pip by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/507
- [devops] recover tsinghua pip source due to proxy issue by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/509
Documentation
- [doc] update rst by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/470
- Update Experiment result about Colossal-AI with ZeRO by @Sze-qq in https://github.com/hpcaitech/ColossalAI/pull/479
- [doc] docs get correct release version by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/489
- Update README.md by @fastalgo in https://github.com/hpcaitech/ColossalAI/pull/514
- [doc] update apidoc by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/530
Model Zoo
- [model zoo] fix attn mask shape of gpt by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/472
- [model zoo] gpt embedding remove attn mask by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/474
Miscellaneous
- [install] run without rich by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/513
- [refactor] remove old zero code by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/517
- [format] polish name format for MOE by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/481
New Contributors
- @fastalgo made their first contribution in https://github.com/hpcaitech/ColossalAI/pull/514
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.0...v0.1.1
Published by ver217 almost 4 years ago
https://github.com/hpcaitech/colossalai - V0.1.0 Released Today!
Overview
We are happy to release version v0.1.0 today. Compared to the previous version, it ships a brand-new zero module and updates many aspects of the system for better performance and usability. The latest version can now be installed with pip install colossalai. We will update our examples and documentation accordingly over the next few days.
Highlights:
Note: a. Only the major base commits are shown; successive commits that enhance or update a base commit are omitted.
b. Some commits have no associated pull request ID for unknown reasons.
c. The list is ordered by time.
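Several of the commits below concern the new zero module, whose core idea is to partition optimizer states (and, at higher stages, gradients and parameters) across data-parallel ranks instead of replicating them on every rank. A toy sketch of the stage-1 partitioning idea (a hypothetical helper, not ColossalAI code):

```python
def shard_params(numels, world_size):
    """Greedily assign each parameter (by element count) to the currently
    lightest rank, so optimizer states end up roughly balanced."""
    shards = [[] for _ in range(world_size)]
    loads = [0] * world_size
    # Placing the largest parameters first gives a better greedy balance.
    for idx, n in sorted(enumerate(numels), key=lambda p: -p[1]):
        r = loads.index(min(loads))
        shards[r].append(idx)
        loads[r] += n
    return shards, loads
```

Each rank then keeps Adam moments only for the parameters in its own shard, cutting optimizer-state memory by roughly the data-parallel degree.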
Features
- add moe context, moe utilities and refactor gradient handler (#455) by @1SAA
- [zero] Update initialize for ZeRO (#458) by @ver217
- [zero] hybrid cpu adam (#445) by @feifeibear
- added Multiply Jitter and capacity factor eval for MOE (#434) by @1SAA
- [fp16] refactored fp16 optimizer (#392) by @FrankLeeeee
- [zero] memtracer to record cuda memory usage of model data and overall system (#395) by @feifeibear
- Added tensor detector (#393) by @Gy-Lu
- Added activation offload (#331) by @Gy-Lu
- [zero] zero init context collect numel of model (#375) by @feifeibear
- Added PCIE profiler to detect data transmission (#373) by @1SAA
- Added Profiler Context to manage all profilers (#340) by @1SAA
- set criterion as optional in colossalai initialize (#336) by @FrankLeeeee
- [zero] Update sharded model v2 using sharded param v2 (#323) by @ver217
- [zero] zero init context (#321) by @feifeibear
- Added profiler communication operations by @1SAA
- added buffer sync to naive amp model wrapper (#291) by @FrankLeeeee
- [zero] cpu adam kernel (#288) by @Gy-Lu
- Feature/zero (#279) by @feifeibear @FrankLeeeee @ver217
- impl shard optim v2 and add unit test by @ver217
- [profiler] primary memory tracer by @raejaf
- add sharded adam by @ver217
Unit Testing
- [test] fixed amp convergence comparison test (#454) by @FrankLeeeee
- [test] optimized zero data parallel test (#452) by @FrankLeeeee
- [test] make zero engine test really work (#447) by @feifeibear
- optimized context test time consumption (#446) by @FrankLeeeee
- [unittest] polish zero config in unittest (#438) by @feifeibear
- added testing module (#435) by @FrankLeeeee
- [zero] polish ShardedOptimV2 unittest (#385) by @feifeibear
- [unit test] Refactored test cases with component func (#339) by @FrankLeeeee
Documentation
- [doc] Update docstring for ZeRO (#459) by @ver217
- update README and images path (#384) by @binmakeswell
- add badge and contributor list by @FrankLeeeee
- add community group and update issue template (#271) by @binmakeswell
- update experimental visualization (#253) by @Sze-qq
- add Chinese README by @binmakeswell
CI/CD
- update github CI with the current workflow (#441) by @FrankLeeeee
- update unit testing CI rules by @FrankLeeeee
- added compatibility CI and options for release CI by @FrankLeeeee
- added pypi publication CI and removed formatting CI by @FrankLeeeee
Bug Fix
- fix gpt attention mask (#461) by @ver217
- [bug] Fixed device placement bug in memory monitor thread (#433) by @FrankLeeeee
- fixed fp16 optimizer none grad bug (#432) by @FrankLeeeee
- fixed gpt attention mask in pipeline (#430) by @FrankLeeeee
- [hotfix] fixed bugs in ShardStrategy and PcieProfiler (#394) by @1SAA
- fixed bug in activation checkpointing test (#387) by @FrankLeeeee
- [profiler] Fixed bugs in CommProfiler and PcieProfiler (#377) by @1SAA
- fixed CI dataset directory; fixed import error of 2.5d accuracy (#255) by @kurisusnowdeng
- fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial by @kurisusnowdeng
Miscellaneous
- [log] better logging display with rich (#426) by @feifeibear
Published by FrankLeeeee almost 4 years ago
https://github.com/hpcaitech/colossalai - V0.0.2 Released Today!
Change Log
Added
- Unified distributed layers
- MoE support
- DevOps tools such as GitHub Actions, code review automation, etc.
- New project official website
Changes
- Refactored the APIs for usability, flexibility and modularity
- Adapted PyTorch AMP for tensor parallelism
- Refactored utilities for tensor parallelism and pipeline parallelism
- Separated benchmarks and examples into independent repositories
- Updated pipeline parallelism to support non-interleaved and interleaved schedules
- Refactored installation scripts for convenience
Fixed
- ZeRO level 3 runtime error
- incorrect calculation in gradient clipping
Published by FrankLeeeee about 4 years ago
https://github.com/hpcaitech/colossalai - v0.0.1 Colossal-AI Beta Release
Features
- Data Parallelism
- Pipeline Parallelism (experimental)
- 1D, 2D, 2.5D, 3D and sequence tensor parallelism
- Easy-to-use trainer and engine
- Extensibility for user-defined parallelism
- Mixed Precision Training
- Zero Redundancy Optimizer (ZeRO)
Published by kurisusnowdeng over 4 years ago