Khrushchev, M., Frolov, A., & Vasilev, R. (2024). YaFSDP: Yet another Fully Sharded Data Parallel [Computer software]. https://github.com/yandex/YaFSDP