Recent Releases of https://github.com/bytedance/flux

https://github.com/bytedance/flux - v1.1.1

Release including the MoE part

- C++
Published by ZSL98 over 1 year ago

https://github.com/bytedance/flux - v1.0.4

Support torch 2.5.

Published by zheng-ningxin over 1 year ago

https://github.com/bytedance/flux - v1.0.3

v1.0.3: Support torch 2.4 for flux.

Published by zheng-ningxin over 1 year ago

https://github.com/bytedance/flux - v1.0.2

What's Changed

  • add issue template by @liwenchangbdbz in https://github.com/bytedance/flux/pull/1
  • add cutlass submodule and patches by @liwenchangbdbz in https://github.com/bytedance/flux/pull/2
  • All gather and reduce scatter on SM80 by @zheng-ningxin in https://github.com/bytedance/flux/pull/3
  • Reorganize and deduplicate files by @wenlei-bao in https://github.com/bytedance/flux/pull/4
  • Add arXiv paper link by @wenlei-bao in https://github.com/bytedance/flux/pull/5
  • Update BibTex by @wenlei-bao in https://github.com/bytedance/flux/pull/6
  • Support IPC && SM90 version of AG-GEMM, GEMM-RS by @zheng-ningxin in https://github.com/bytedance/flux/pull/9
  • fix the allgatherbase backend issue (issue 11) by @zheng-ningxin in https://github.com/bytedance/flux/pull/12
  • using c10::intrusive_ptr<c10d::ProcessGroup> as argument from python by @houqi in https://github.com/bytedance/flux/pull/13
  • Add more device types for the time estimation. by @zheng-ningxin in https://github.com/bytedance/flux/pull/15
  • Update README.md by @zheng-ningxin in https://github.com/bytedance/flux/pull/16
  • zero out all the allocated shm buffer by @zheng-ningxin in https://github.com/bytedance/flux/pull/18
  • feat: fix tuning for the all-gather gemm && move the reset-signal() to the forward critical path by @zheng-ningxin in https://github.com/bytedance/flux/pull/19
  • Tune the AG performance for the llama-8b by @zheng-ningxin in https://github.com/bytedance/flux/pull/21
  • Remove pynvshmem import in gemmrs80.py by @tlrmchlsmth in https://github.com/bytedance/flux/pull/22
  • Support performance tuning for gemm-rs kernel on sm80 by @zheng-ningxin in https://github.com/bytedance/flux/pull/23
  • add torch version to the whl name by @zheng-ningxin in https://github.com/bytedance/flux/pull/24

New Contributors

  • @liwenchangbdbz made their first contribution in https://github.com/bytedance/flux/pull/1
  • @zheng-ningxin made their first contribution in https://github.com/bytedance/flux/pull/3
  • @wenlei-bao made their first contribution in https://github.com/bytedance/flux/pull/4
  • @houqi made their first contribution in https://github.com/bytedance/flux/pull/13
  • @tlrmchlsmth made their first contribution in https://github.com/bytedance/flux/pull/22

Full Changelog: https://github.com/bytedance/flux/commits/v1.0.2

Published by zheng-ningxin almost 2 years ago

https://github.com/bytedance/flux - v1.0.0

What's Changed

  • add issue template by @liwenchangbdbz in https://github.com/bytedance/flux/pull/1
  • add cutlass submodule and patches by @liwenchangbdbz in https://github.com/bytedance/flux/pull/2
  • All gather and reduce scatter on SM80 by @zheng-ningxin in https://github.com/bytedance/flux/pull/3
  • Reorganize and deduplicate files by @wenlei-bao in https://github.com/bytedance/flux/pull/4
  • Add arXiv paper link by @wenlei-bao in https://github.com/bytedance/flux/pull/5
  • Update BibTex by @wenlei-bao in https://github.com/bytedance/flux/pull/6
  • Support IPC && SM90 version of AG-GEMM, GEMM-RS by @zheng-ningxin in https://github.com/bytedance/flux/pull/9
  • fix the allgatherbase backend issue (issue 11) by @zheng-ningxin in https://github.com/bytedance/flux/pull/12
  • using c10::intrusive_ptr<c10d::ProcessGroup> as argument from python by @houqi in https://github.com/bytedance/flux/pull/13
  • Add more device types for the time estimation. by @zheng-ningxin in https://github.com/bytedance/flux/pull/15
  • Update README.md by @zheng-ningxin in https://github.com/bytedance/flux/pull/16
  • zero out all the allocated shm buffer by @zheng-ningxin in https://github.com/bytedance/flux/pull/18
  • feat: fix tuning for the all-gather gemm && move the reset-signal() to the forward critical path by @zheng-ningxin in https://github.com/bytedance/flux/pull/19
  • Tune the AG performance for the llama-8b by @zheng-ningxin in https://github.com/bytedance/flux/pull/21
  • Remove pynvshmem import in gemmrs80.py by @tlrmchlsmth in https://github.com/bytedance/flux/pull/22
  • Support performance tuning for gemm-rs kernel on sm80 by @zheng-ningxin in https://github.com/bytedance/flux/pull/23
  • add torch version to the whl name by @zheng-ningxin in https://github.com/bytedance/flux/pull/24

Full Changelog: https://github.com/bytedance/flux/commits/v1.0.0

Published by zheng-ningxin almost 2 years ago

https://github.com/bytedance/flux - v1.0.0-alpha

This is a pre-release of flux.

Published by zheng-ningxin almost 2 years ago