qsync
Official repository for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary
Repository
Statistics
- Stars: 19
- Watchers: 2
- Forks: 3
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
QSync
Official repository for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
Description
QSync aims to explore the potential of removing unnecessary quantized operations to improve training accuracy. It achieves this through the following components:
- Quantization perturbation indicator / Replayer for analyzing the memory and latency of the global data flow graph under mixed precision (Predictor)
- Allocator for selecting the optimal quantized operations for training (Allocator / Syncer)
- Support for low-precision backends (CUTLASS, CUDNN) (LP-PyTorch)
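To make the idea of quantization perturbation concrete, here is a minimal sketch that measures how much error a generic symmetric uniform quantizer introduces at different bit-widths. It is not QSync's indicator: the functions `fake_quantize` and `mean_squared_perturbation`, the synthetic Gaussian data, and the choice of mean squared error as the metric are all assumptions made for illustration.

```python
# Illustrative only: a generic uniform quantizer, NOT QSync's perturbation indicator.
import random


def fake_quantize(values, bits):
    """Symmetric uniform quantize-dequantize of a list of floats."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # step size of the integer grid
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_squared_perturbation(values, bits):
    """Mean squared difference between the values and their quantized form."""
    quantized = fake_quantize(values, bits)
    return sum((v - q) ** 2 for v, q in zip(values, quantized)) / len(values)


random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(1024)]  # synthetic stand-in data

perturbation = {
    bits: mean_squared_perturbation(activations, bits) for bits in (4, 8, 16)
}
for bits, err in perturbation.items():
    print(f"{bits:2d}-bit  mean squared perturbation = {err:.3e}")
```

Lower bit-widths yield a coarser grid and therefore a larger perturbation, which is the kind of signal an allocator could weigh against the latency and memory savings of a quantized operation.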
In particular, QSync targets a specific practical scenario: hybrid-cluster training, which combines inference GPUs with lower capabilities (memory, compute) and training GPUs with higher capabilities.
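In synchronous training every iteration waits for the slowest worker, so one way to read the hybrid-cluster goal is that the weaker GPU only needs enough quantization to keep pace with the stronger one. The toy sketch below starts from a fully quantized model and greedily restores operations to full precision while the step time still fits a budget. It is not QSync's allocator: the `Op` fields, the `sensitivity` score, the greedy ranking, and every number are hypothetical.

```python
# Toy numbers throughout; this is NOT QSync's allocator, only a greedy illustration.
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    latency_low_ms: float   # latency when run in low precision
    latency_high_ms: float  # latency when run in full precision
    sensitivity: float      # hypothetical accuracy gain from restoring full precision


def restore_precision(ops, budget_ms):
    """Start fully quantized, then restore ops while the step time fits the budget."""
    step_ms = sum(op.latency_low_ms for op in ops)
    restored = []

    def gain_per_ms(op):
        extra = op.latency_high_ms - op.latency_low_ms
        return op.sensitivity / max(extra, 1e-9)  # guard against zero extra cost

    # Consider the ops with the most accuracy gained per extra millisecond first.
    for op in sorted(ops, key=gain_per_ms, reverse=True):
        extra = op.latency_high_ms - op.latency_low_ms
        if step_ms + extra <= budget_ms:
            step_ms += extra
            restored.append(op.name)
    return restored, step_ms


training_gpu_step_ms = 10.0  # hypothetical step time of the faster training GPU
inference_gpu_ops = [
    Op("conv1", latency_low_ms=2.0, latency_high_ms=4.0, sensitivity=0.9),
    Op("conv2", latency_low_ms=3.0, latency_high_ms=6.0, sensitivity=0.3),
    Op("fc", latency_low_ms=1.0, latency_high_ms=1.5, sensitivity=0.5),
]

restored, step_ms = restore_precision(inference_gpu_ops, training_gpu_step_ms)
print("restored to full precision:", restored)
print("inference GPU step time (ms):", step_ms)
```

With these numbers the inference GPU keeps only `conv2` quantized and still finishes within the training GPU's step time, so it does not become the straggler.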
The provided scripts support both convolution-based and transformer-based models.
NOTE: This project is somewhat dated. The performance of its kernel implementations may not match that of the latest PyTorch.
Set Environment
Clone the repo
git clone --recursive https://github.com/bytedance/QSync.git
Docker
- Run build.sh under dockerfile.
- Run run.sh, specifying the necessary path mounts inside.
- Run pip install -e . from the root folder of QSync; this starts the kernel compilation.
Manual Installation
- Some libraries may be hard to install without a proxy.
- Change <abspath_to_root> in m_install.sh to the absolute path of the root folder, then run bash m_install.sh followed by make.
Usage
QSync is implemented under the qsync folder, composed of syncer, predictor and LpTorch.
- To use LpTorch and convert your model to a mixed-bitwidth model, use model = QModule(model).
- See the corresponding pages for details on using the predictor and syncer.
- See the samples under benchmark_convs / benchmark_transformers.
Note: cross-node cost modeling is not as accurate as single-node modeling. Extra effort is required to align the communication start.
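The model = QModule(model) call in the list above follows a wrap-in-place pattern. The sketch below illustrates that pattern with stand-in classes only, since the import path and the real interface of QModule are not given here. `ToyLayer`, `ToyModel`, `ToyQModule`, `set_bits`, and `bit_plan` are hypothetical names and do not exist in QSync.

```python
# Hypothetical stand-ins: ToyLayer, ToyModel and ToyQModule are NOT part of QSync.
class ToyLayer:
    def __init__(self, name):
        self.name = name
        self.bits = 32  # full precision by default


class ToyModel:
    def __init__(self, layer_names):
        self.layers = [ToyLayer(name) for name in layer_names]


class ToyQModule:
    """Wraps a model and tracks one bit-width per layer (a mixed-bitwidth plan)."""

    def __init__(self, model, default_bits=8):
        self.model = model
        for layer in model.layers:
            layer.bits = default_bits

    def set_bits(self, layer_name, bits):
        """Override the bit-width of a single named layer."""
        for layer in self.model.layers:
            if layer.name == layer_name:
                layer.bits = bits
                return
        raise KeyError(layer_name)

    def bit_plan(self):
        """Return the current layer-name to bit-width mapping."""
        return {layer.name: layer.bits for layer in self.model.layers}


model = ToyModel(["conv1", "conv2", "fc"])
model = ToyQModule(model)  # same call shape as model = QModule(model)
model.set_bits("fc", 32)   # keep one sensitive layer in full precision
print(model.bit_plan())
```

For the actual wrapper and its configuration options, refer to the LpTorch sources under the qsync folder and the benchmark samples.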
Owner
- Name: Bytedance Inc.
- Login: bytedance
- Kind: organization
- Location: Singapore
- Website: https://opensource.bytedance.com
- Twitter: ByteDanceOSS
- Repositories: 255
- Profile: https://github.com/bytedance
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: JUNTAO
    given-names: ZHAO
    orcid: https://orcid.org/0000-0003-3376-0607
repository-code: 'https://github.com/SpringWave1/QSync'
abstract: >-
  Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
keywords:
  - 'neural network, cutlass, composable kernel, cuda, rocm'
title: "QSync"
version: '0.1'
date-released: '2022-11-20'
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- accelerate *
- bokeh *
- cython *
- datasets *
- jupyterlab *
- matplotlib *
- ninja *
- pandas *
- pulp *
- pycocotools *
- pynvml *
- regex *
- scikit-learn *
- seaborn *
- tensorboard *
- tensorboardX *
- tensorflow *
- tokenizers ==0.12.1
- torch ==1.10.0
- tqdm *
- transformers *
- pytorch/pytorch 1.10.0-cuda11.3-cudnn8-devel build