Recent Releases of recsys-examples
recsys-examples - v25.06
What's Changed
Features & Enhancements
LFU Eviction Strategy for Dynamic Embeddings: Added a new Least Frequently Used (LFU) eviction strategy to the dynamicemb module, improving memory management and embedding efficiency. (Contributed by @z52527, https://github.com/NVIDIA/recsys-examples/pull/52)
LayerNorm Recomputation for Fused HSTU Layer: Added support for recomputing LayerNorm in the fused HSTU layer to reduce memory usage during training. (Contributed by @JacoCheung, https://github.com/NVIDIA/recsys-examples/pull/59)
Embedding and Optimizer State Insertion to HKV During Backward Pass: When use_index_dedup is enabled, embeddings and optimizer states are now inserted into HKV during the backward pass, improving training efficiency. (Contributed by @jiashuy, https://github.com/NVIDIA/recsys-examples/pull/62)
Support for Non-Contiguous Input/Output in HSTU MHA and SiLU Recomputation: Enabled handling of non-contiguous tensors for multi-head attention and SiLU recomputation within HSTU layers. (Contributed by @JacoCheung, https://github.com/NVIDIA/recsys-examples/pull/64)
Customized CUDA Operation for Concatenating 2D Jagged Tensors: Introduced a new CUDA operator, concat_2d_jagged_tensors, to efficiently concatenate 2D jagged tensors. (Contributed by @z52527, https://github.com/NVIDIA/recsys-examples/pull/42)
Support for Training Pipeline: Added support for a streamlined training pipeline to make model training and experimentation easier. (Contributed by @JacoCheung, https://github.com/NVIDIA/recsys-examples/pull/68)
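To illustrate the 2D jagged-tensor concatenation operator listed above, here is a minimal pure-Python sketch of what such an operation computes. It assumes each jagged tensor is stored as a flat list of rows plus an offsets list of length batch_size + 1, and that concatenation happens per sample along the sequence dimension. The function name concat_2d_jagged_reference and its signature are hypothetical and are not the repository's API; the real operator runs on CUDA tensors.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def concat_2d_jagged_reference(
    values_list: Sequence[List[Row]],
    offsets_list: Sequence[List[int]],
) -> Tuple[List[Row], List[int]]:
    """Hypothetical reference: per-sample concatenation of jagged tensors.

    values_list[i] holds all rows of the i-th jagged tensor back to back;
    offsets_list[i][b]:offsets_list[i][b + 1] selects the rows of sample b.
    """
    if not values_list or len(values_list) != len(offsets_list):
        raise ValueError("need one offsets list per values list")
    batch_size = len(offsets_list[0]) - 1
    if any(len(offsets) - 1 != batch_size for offsets in offsets_list):
        raise ValueError("all inputs must share the same batch size")

    out_values: List[Row] = []
    out_offsets: List[int] = [0]
    for b in range(batch_size):
        # For sample b, append its rows from every input in order.
        for values, offsets in zip(values_list, offsets_list):
            out_values.extend(values[offsets[b]:offsets[b + 1]])
        out_offsets.append(len(out_values))
    return out_values, out_offsets


# Two jagged inputs with batch size 2 and row width 2.
a_values = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # sample 0: 2 rows, sample 1: 1 row
a_offsets = [0, 2, 3]
b_values = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]  # sample 0: 1 row, sample 1: 2 rows
b_offsets = [0, 1, 3]

merged_values, merged_offsets = concat_2d_jagged_reference(
    [a_values, b_values], [a_offsets, b_offsets]
)
print(merged_offsets)  # [0, 3, 6]
```

In the merged result, sample 0 holds its two rows from the first input followed by its one row from the second, and sample 1 holds one row followed by two. A dedicated kernel can produce this layout in a single pass instead of slicing and copying per sample on the host.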
Bug Fixes
Fixed HSTU Preprocess and Postprocess CI Issues: Resolved continuous integration issues in the HSTU preprocessing and postprocessing steps. (Contributed by @shijieliu, https://github.com/NVIDIA/recsys-examples/pull/76)
Documentation
Updated HSTU Installation Instructions: Clarified and expanded the README installation guide for the HSTU module to improve user onboarding. (Contributed by @z52527, https://github.com/NVIDIA/recsys-examples/pull/84)
Dependency Updates
Stable Dependency Upgrades: Updated key dependencies to stable versions: torchrec 1.2.0, fbgemm_gpu 1.2.0, and mcore 0.12.1. (Contributed by @shijieliu and @JacoCheung, https://github.com/NVIDIA/recsys-examples/pull/74, https://github.com/NVIDIA/recsys-examples/pull/75)
Published by shijieliu 8 months ago
recsys-examples - v25.05
Changelog
Dynamicemb example #16 #31 #58
EmbeddingBagCollection support in Dynamicemb #20
Dynamicemb functionality enhancement #45 #46 #53
HSTU CUTLASS kernel supports contextual features in Hopper backward #51
Decouple sharding and model definition in HSTU example #37
Fused HSTU layer #43
Fix KuaiRand dataset convergence issue #34
Doc enhancement #39
Full Changelog: https://github.com/NVIDIA/recsys-examples/commits/v25.05
Published by shijieliu 9 months ago