Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.4%) to scientific vocabulary
Repository
YaFSDP: Yet another Fully Sharded Data Parallel
Basic Info
Statistics
- Stars: 967
- Watchers: 18
- Forks: 49
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
YaFSDP
Overview
YaFSDP is a Sharded Data Parallelism framework, designed to work well with transformer-like neural network architectures. YaFSDP is developed and maintained by Yandex.
You can find more info on YaFSDP internals in our blog posts on Medium and Habr.
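For readers new to the idea, the general pattern behind sharded data parallelism can be sketched in a few lines of plain Python. This is a toy, single-process illustration of the generic technique (each rank stores only a slice of the parameters, gathers the full set when it is needed, and receives only its slice of the summed gradients). It is not YaFSDP's implementation, and the helper names `shard`, `all_gather`, and `reduce_scatter` are hypothetical stand-ins for the corresponding collective operations.

```python
from typing import List


def shard(flat_params: List[float], world_size: int) -> List[List[float]]:
    """Split a flat parameter list into equal shards, zero-padding the tail."""
    shard_len = -(-len(flat_params) // world_size)  # ceiling division
    padding = shard_len * world_size - len(flat_params)
    padded = flat_params + [0.0] * padding
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    """Simulate all-gather: concatenate every rank's shard and drop the padding."""
    full = [x for s in shards for x in s]
    return full[:numel]


def reduce_scatter(per_rank_grads: List[List[float]], world_size: int) -> List[List[float]]:
    """Simulate reduce-scatter: sum full gradients across ranks, then re-shard the sum."""
    summed = [sum(vals) for vals in zip(*per_rank_grads)]
    return shard(summed, world_size)


if __name__ == "__main__":
    world_size = 4
    params = [float(i) for i in range(10)]

    # Each rank permanently stores only its own shard.
    shards = shard(params, world_size)

    # Before a layer's forward/backward pass, the full parameters are rebuilt.
    assert all_gather(shards, len(params)) == params

    # After backward, every rank holds a full local gradient; each rank keeps
    # only the summed slice that matches its parameter shard.
    local_grads = [[1.0] * len(params) for _ in range(world_size)]
    grad_shards = reduce_scatter(local_grads, world_size)
    print(grad_shards[0])
```

In a real framework these steps run as GPU collectives over many devices, which is where communication and memory-operation overhead comes from.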
Advantages over FSDP
YaFSDP is up to 20% faster for pre-training LLMs and performs better in high memory pressure conditions. It is designed to reduce communications and memory operations overhead.
YaFSDP:

FSDP:

Benchmarks
We've compared YaFSDP with FSDP on a variety of pre-training setups ranging from:
- 7B to 70B parameters
- 64 to 256 devices
- 2048 to 8192 tokens per sequence
| model | gpu-count | seq-len | num-ckpt-layers | speedup | YaFSDP iteration time (s) | FSDP iteration time (s) |
| :---------- | --------: | ------: | --------------: | ------: | ------------------------: | ----------------------: |
| Llama 2 7B | 64 | 2048 | 0 | 9.92% | 0.81 | 0.90 |
| Llama 2 7B | 64 | 4096 | 0 | 3.43% | 1.16 | 1.21 |
| Llama 2 7B | 64 | 8192 | 0 | 2.68% | 2.23 | 2.29 |
| Llama 2 7B | 128 | 2048 | 0 | 9.57% | 0.87 | 0.97 |
| Llama 2 7B | 128 | 4096 | 0 | 2.42% | 1.19 | 1.22 |
| Llama 2 7B | 128 | 8192 | 0 | 2.32% | 2.25 | 2.31 |
| Llama 2 13B | 128 | 2048 | 0 | 12.10% | 1.55 | 1.76 |
| Llama 2 13B | 128 | 4096 | 0 | 3.49% | 2.06 | 2.14 |
| Llama 2 34B | 128 | 2048 | 0 | 20.70% | 3.39 | 4.27 |
| Llama 2 34B | 256 | 2048 | 0 | 21.99% | 3.51 | 4.50 |
| Llama 2 34B | 256 | 4096 | 5 | 8.35% | 5.33 | 5.81 |
| Llama 2 70B | 256 | 2048 | 10 | 21.48% | 6.97 | 8.87 |
| Llama 2 70B | 256 | 4096 | 50 | 7.17% | 11.07 | 11.93 |
| Llama 3 8B | 64 | 2048 | 0 | 11.91% | 0.97 | 1.10 |
| Llama 3 8B | 64 | 4096 | 0 | 7.86% | 1.36 | 1.48 |
| Llama 3 70B | 256 | 2048 | 20 | 26.60% | 7.17 | 9.76 |
Details:
- In each run, the per-device batch size is set to 1.
- `speedup` represents the relative iteration time decrease between YaFSDP and FSDP runs.
- `num-ckpt-layers` refers to the number of transformer layers to which activation checkpointing was applied.
- Performance was measured using a cluster of hosts with A100 80 GB GPUs.
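As a rough check on how the table columns relate, the sketch below recomputes `speedup` and an aggregate throughput figure from a few rows. It assumes that "relative iteration time decrease" means (FSDP time - YaFSDP time) / FSDP time, which agrees with the table only approximately because the listed times are rounded to two decimals. The tokens-per-second figure is a derived quantity that is not reported in the benchmarks. The function names are illustrative.

```python
def speedup_percent(fsdp_time_s: float, yafsdp_time_s: float) -> float:
    """Relative iteration time decrease, in percent (assumed definition)."""
    return 100.0 * (fsdp_time_s - yafsdp_time_s) / fsdp_time_s


def tokens_per_second(gpu_count: int, seq_len: int, iteration_time_s: float,
                      per_device_batch: int = 1) -> float:
    """Cluster-wide tokens processed per second for one training iteration."""
    return gpu_count * per_device_batch * seq_len / iteration_time_s


# (model, gpu-count, seq-len, YaFSDP time (s), FSDP time (s)) copied from the table.
rows = [
    ("Llama 2 7B", 64, 2048, 0.81, 0.90),
    ("Llama 2 34B", 128, 2048, 3.39, 4.27),
    ("Llama 3 70B", 256, 2048, 7.17, 9.76),
]

for model, gpus, seq_len, t_yafsdp, t_fsdp in rows:
    print(
        f"{model}: speedup ~{speedup_percent(t_fsdp, t_yafsdp):.2f}%, "
        f"YaFSDP ~{tokens_per_second(gpus, seq_len, t_yafsdp):,.0f} tok/s, "
        f"FSDP ~{tokens_per_second(gpus, seq_len, t_fsdp):,.0f} tok/s"
    )
```

Because the inputs are rounded, the recomputed percentages differ slightly from the `speedup` column (for example, about 10.0% versus the reported 9.92% for the first row).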
Examples
You can find examples of LLM training using the 🤗 stack in the examples folder:
- clm.md for causal pre-training
- sft.md for supervised fine-tuning

Note that both examples require a Docker image, which can be built using the docker/build.sh script. The image is based on the NVIDIA PyTorch image with some patched 🤗 libraries. Patches for the libraries can be found in the patches folder.
Issues and questions
If you encounter any bugs or have any questions, feel free to open a GitHub issue.
Citation
If you use this codebase, please cite it by using the following BibTeX entry:
@misc{YaFSDP2024,
  author = {Mikhail Khrushchev and Anton Frolov and Ruslan Vasilev},
  title = {YaFSDP: Yet another Fully Sharded Data Parallel},
  howpublished = {\url{https://github.com/yandex/YaFSDP}},
  year = {2024}
}
Owner
- Name: Yandex
- Login: yandex
- Kind: organization
- Email: opensource-support@yandex-team.ru
- Location: Moscow, Russia
- Website: https://tech.yandex.com/
- Repositories: 85
- Profile: https://github.com/yandex
Yandex open source projects and technologies
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Khrushchev"
    given-names: "Mikhail"
  - family-names: "Frolov"
    given-names: "Anton"
  - family-names: "Vasilev"
    given-names: "Ruslan"
title: "YaFSDP: Yet another Fully Sharded Data Parallel"
date-released: 2024-06-11
url: "https://github.com/yandex/YaFSDP"
GitHub Events
Total
- Issues event: 2
- Watch event: 136
- Push event: 37
- Fork event: 10
Last Year
- Issues event: 2
- Watch event: 136
- Push event: 37
- Fork event: 10
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| lovanto | l****o@y****m | 34 |
| Ruslan Vasilev | 5****g | 5 |
| dfyz | d****z@y****m | 5 |
| as-bessonov | a****v@y****m | 3 |
| Ermakov Petr | e****d@g****m | 3 |
| robot-piglet | r****t@y****m | 2 |
| thefacetak | t****k@y****m | 1 |
| andreichernov | a****v@y****m | 1 |
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 5
- Total pull requests: 6
- Average time to close issues: 28 minutes
- Average time to close pull requests: 8 minutes
- Total issue authors: 5
- Total pull request authors: 3
- Average comments per issue: 1.4
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 6
- Average time to close issues: 28 minutes
- Average time to close pull requests: 8 minutes
- Issue authors: 5
- Pull request authors: 3
- Average comments per issue: 1.4
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- liuyunfeng2016 (1)
- MVoloshin71 (1)
- RunFMe (1)
- AZhurkin (1)
- pansershrek (1)
- 152334H (1)
Pull Request Authors
- antony-frolov (8)
- artnitolog (2)
- ermakovpetr (1)
Dependencies
- nvcr.io/nvidia/pytorch 24.02-py3 build