Recent Releases of https://github.com/bytedance/flux
https://github.com/bytedance/flux - v1.1.1
Release including the MoE part
- C++
Published by ZSL98 over 1 year ago
https://github.com/bytedance/flux - v1.0.4
Support torch 2.5.
- C++
Published by zheng-ningxin over 1 year ago
https://github.com/bytedance/flux - v1.0.3
Support torch 2.4 for flux.
- C++
Published by zheng-ningxin over 1 year ago
https://github.com/bytedance/flux - v1.0.2
What's Changed
- add issue template by @liwenchangbdbz in https://github.com/bytedance/flux/pull/1
- add cutlass submodule and patches by @liwenchangbdbz in https://github.com/bytedance/flux/pull/2
- All gather and reduce scatter on SM80 by @zheng-ningxin in https://github.com/bytedance/flux/pull/3
- Reorganize and deduplicate files by @wenlei-bao in https://github.com/bytedance/flux/pull/4
- Add arXiv paper link by @wenlei-bao in https://github.com/bytedance/flux/pull/5
- Update BibTex by @wenlei-bao in https://github.com/bytedance/flux/pull/6
- Support IPC && SM90 version of AG-GEMM, GEMM-RS by @zheng-ningxin in https://github.com/bytedance/flux/pull/9
- fix the allgatherbase backend issue (issue 11) by @zheng-ningxin in https://github.com/bytedance/flux/pull/12
- using c10::intrusive_ptr<c10d::ProcessGroup> as argument from python by @houqi in https://github.com/bytedance/flux/pull/13
- Add more device types for the time estimation. by @zheng-ningxin in https://github.com/bytedance/flux/pull/15
- Update README.md by @zheng-ningxin in https://github.com/bytedance/flux/pull/16
- zero out all the allocated shm buffer by @zheng-ningxin in https://github.com/bytedance/flux/pull/18
- feat: fix tuning for the all-gather gemm && move the reset-signal() to the forward critical path by @zheng-ningxin in https://github.com/bytedance/flux/pull/19
- Tune the AG performance for the llama-8b by @zheng-ningxin in https://github.com/bytedance/flux/pull/21
- Remove pynvshmem import in gemmrs80.py by @tlrmchlsmth in https://github.com/bytedance/flux/pull/22
- Support performance tuning for gemm-rs kernel on sm80 by @zheng-ningxin in https://github.com/bytedance/flux/pull/23
- add torch version to the whl name by @zheng-ningxin in https://github.com/bytedance/flux/pull/24
New Contributors
- @liwenchangbdbz made their first contribution in https://github.com/bytedance/flux/pull/1
- @zheng-ningxin made their first contribution in https://github.com/bytedance/flux/pull/3
- @wenlei-bao made their first contribution in https://github.com/bytedance/flux/pull/4
- @houqi made their first contribution in https://github.com/bytedance/flux/pull/13
- @tlrmchlsmth made their first contribution in https://github.com/bytedance/flux/pull/22
Full Changelog: https://github.com/bytedance/flux/commits/v1.0.2
- C++
Published by zheng-ningxin almost 2 years ago
https://github.com/bytedance/flux - v1.0.0
What's Changed
- add issue template by @liwenchangbdbz in https://github.com/bytedance/flux/pull/1
- add cutlass submodule and patches by @liwenchangbdbz in https://github.com/bytedance/flux/pull/2
- All gather and reduce scatter on SM80 by @zheng-ningxin in https://github.com/bytedance/flux/pull/3
- Reorganize and deduplicate files by @wenlei-bao in https://github.com/bytedance/flux/pull/4
- Add arXiv paper link by @wenlei-bao in https://github.com/bytedance/flux/pull/5
- Update BibTex by @wenlei-bao in https://github.com/bytedance/flux/pull/6
- Support IPC && SM90 version of AG-GEMM, GEMM-RS by @zheng-ningxin in https://github.com/bytedance/flux/pull/9
- fix the allgatherbase backend issue (issue 11) by @zheng-ningxin in https://github.com/bytedance/flux/pull/12
- using c10::intrusive_ptr<c10d::ProcessGroup> as argument from python by @houqi in https://github.com/bytedance/flux/pull/13
- Add more device types for the time estimation. by @zheng-ningxin in https://github.com/bytedance/flux/pull/15
- Update README.md by @zheng-ningxin in https://github.com/bytedance/flux/pull/16
- zero out all the allocated shm buffer by @zheng-ningxin in https://github.com/bytedance/flux/pull/18
- feat: fix tuning for the all-gather gemm && move the reset-signal() to the forward critical path by @zheng-ningxin in https://github.com/bytedance/flux/pull/19
- Tune the AG performance for the llama-8b by @zheng-ningxin in https://github.com/bytedance/flux/pull/21
- Remove pynvshmem import in gemmrs80.py by @tlrmchlsmth in https://github.com/bytedance/flux/pull/22
- Support performance tuning for gemm-rs kernel on sm80 by @zheng-ningxin in https://github.com/bytedance/flux/pull/23
- add torch version to the whl name by @zheng-ningxin in https://github.com/bytedance/flux/pull/24
Full Changelog: https://github.com/bytedance/flux/commits/v1.0.0
- C++
Published by zheng-ningxin almost 2 years ago
https://github.com/bytedance/flux - v1.0.0-alpha
This is a Pre-Release of flux.
- C++
Published by zheng-ningxin almost 2 years ago