Recent Releases of recsys-examples

recsys-examples - v25.06

What's Changed

Features & Enhancements

LFU Eviction Strategy for Dynamic Embeddings: Added a new Least Frequently Used (LFU) eviction strategy to the dynamicemb module, improving memory management and embedding efficiency. (Contributed by @z52527, https://github.com/NVIDIA/recsys-examples/pull/52)
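For readers unfamiliar with LFU eviction, the idea can be sketched in a few lines of plain Python. This is only an illustration of the general policy, not the dynamicemb implementation: the class name `LFUTable`, its methods, and the tie-breaking rule (earliest-inserted key loses) are all assumptions made for this example.

```python
from collections import Counter


class LFUTable:
    """Toy fixed-capacity key -> vector table with LFU eviction (hypothetical)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.values = {}       # key -> embedding vector
        self.freq = Counter()  # key -> number of accesses

    def lookup(self, key):
        """Return the stored vector, or None on a miss; a hit bumps the count."""
        if key not in self.values:
            return None
        self.freq[key] += 1
        return self.values[key]

    def insert(self, key, value):
        """Insert or overwrite a key, evicting the least frequently used key if full."""
        if key not in self.values and len(self.values) >= self.capacity:
            # min() scans keys in insertion order, so ties evict the oldest key.
            victim = min(self.values, key=lambda k: self.freq[k])
            del self.values[victim]
            del self.freq[victim]
        self.values[key] = value
        self.freq[key] += 1


table = LFUTable(capacity=2)
table.insert("a", [0.1])
table.insert("b", [0.2])
table.lookup("a")         # "a" now has a higher access count than "b"
table.insert("c", [0.3])  # table is full, so "b" is evicted
```

Compared with LRU, this policy keeps keys that are accessed often even if they have not been touched recently, which tends to suit skewed ID distributions.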

LayerNorm Recomputation for Fused HSTU Layer: Added support for recomputing LayerNorm in the fused HSTU layer to reduce memory usage during training. (Contributed by @JacoCheung, https://github.com/NVIDIA/recsys-examples/pull/59)

Embedding and Optimizer State Insertion to HKV During Backward Pass: When use_index_dedup is enabled, embeddings and optimizer states are now inserted into the HKV during the backward pass, improving training efficiency. (Contributed by @jiashuy, https://github.com/NVIDIA/recsys-examples/pull/62)
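As background, index deduplication generally means collapsing repeated lookup keys in a batch to a unique set, so that each key is touched once and its gradients are summed before the update. The sketch below shows that general idea in plain Python. It is not the dynamicemb code path, and the function names `dedup_indices` and `accumulate_grads` are hypothetical.

```python
def dedup_indices(indices):
    """Return (unique, inverse) such that unique[inverse[i]] == indices[i]."""
    unique = []
    position = {}  # key -> slot in `unique`
    inverse = []
    for idx in indices:
        if idx not in position:
            position[idx] = len(unique)
            unique.append(idx)
        inverse.append(position[idx])
    return unique, inverse


def accumulate_grads(grads, inverse, num_unique):
    """Sum per-occurrence gradients into one gradient per unique key."""
    summed = [0.0] * num_unique
    for grad, slot in zip(grads, inverse):
        summed[slot] += grad
    return summed


indices = [7, 3, 7, 9, 3]
unique, inverse = dedup_indices(indices)
summed = accumulate_grads([1.0, 2.0, 3.0, 4.0, 5.0], inverse, len(unique))
```

With the unique keys and summed gradients in hand, a table update only needs one write per distinct key.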

Support for Non-Contiguous Input/Output in HSTU MHA and SiLU Recomputation: Enabled handling of non-contiguous tensors for multi-head attention and SiLU recomputation within HSTU layers. (Contributed by @JacoCheung, https://github.com/NVIDIA/recsys-examples/pull/64)

Customized CUDA Operation for Concatenating 2D Jagged Tensors: Introduced a new CUDA operator, concat_2d_jagged_tensors, to efficiently concatenate jagged tensors in 2D. (Contributed by @z52527, https://github.com/NVIDIA/recsys-examples/pull/42)
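To make the operation concrete, here is a minimal CPU reference in plain Python. It assumes a jagged tensor is stored as a flat list of rows plus an offsets list of length batch + 1, and that concatenation joins the two inputs sample by sample along the variable-length dimension. Both the layout and the function name `concat_jagged` are assumptions for illustration; the release notes do not spell out the exact semantics of the CUDA operator.

```python
def concat_jagged(values_a, offsets_a, values_b, offsets_b):
    """Concatenate two jagged tensors sample by sample.

    values_*: flat list of rows (each row is a list of floats).
    offsets_*: sample i owns rows offsets[i]:offsets[i + 1].
    """
    if len(offsets_a) != len(offsets_b):
        raise ValueError("inputs must have the same batch size")
    out_values = []
    out_offsets = [0]
    for i in range(len(offsets_a) - 1):
        # Rows of sample i from A first, then rows of sample i from B.
        out_values.extend(values_a[offsets_a[i]:offsets_a[i + 1]])
        out_values.extend(values_b[offsets_b[i]:offsets_b[i + 1]])
        out_offsets.append(len(out_values))
    return out_values, out_offsets


# Batch of 2 samples, hidden size 2.
a_values, a_offsets = [[1, 1], [2, 2], [3, 3]], [0, 2, 3]  # lengths 2 and 1
b_values, b_offsets = [[9, 9], [8, 8], [7, 7]], [0, 1, 3]  # lengths 1 and 2
out_values, out_offsets = concat_jagged(a_values, a_offsets, b_values, b_offsets)
```

A dedicated kernel avoids the padding and repeated copies that a dense-tensor version of this loop would need.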

Support for Training Pipeline: Added support for a streamlined training pipeline to make model training and experimentation easier. (Contributed by @JacoCheung, https://github.com/NVIDIA/recsys-examples/pull/68)

Bug Fixes

Fixed HSTU Preprocess and Postprocess CI Issues: Resolved continuous integration issues related to HSTU preprocessing and postprocessing steps. (Contributed by @shijieliu, https://github.com/NVIDIA/recsys-examples/pull/76)

Documentation

Updated HSTU Installation Instructions: Clarified and expanded the README installation guide for the HSTU module to improve user onboarding. (Contributed by @z52527, https://github.com/NVIDIA/recsys-examples/pull/84)

Dependency Updates

Stable Dependency Upgrades: Updated key dependencies to stable versions (torchrec 1.2.0, fbgemm_gpu 1.2.0, mcore 0.12.1). (Contributed by @shijieliu and @JacoCheung, https://github.com/NVIDIA/recsys-examples/pull/74, https://github.com/NVIDIA/recsys-examples/pull/75)

Published by shijieliu 8 months ago

recsys-examples - v25.05

Changelog

Dynamicemb example #16 #31 #58

EmbeddingBagCollection support in Dynamicemb #20

Dynamicemb functionality enhancement #45 #46 #53

HSTU CUTLASS kernel: support for contextual features in the Hopper backward pass #51

Decouple sharding and model definition in HSTU example #37

Fused HSTU layer #43

Fix KuaiRand dataset convergence issue #34

Doc enhancement #39

Full Changelog: https://github.com/NVIDIA/recsys-examples/commits/v25.05

Published by shijieliu 9 months ago