qsync

Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".

https://github.com/bytedance/qsync

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Keywords

research
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bytedance
  • License: mit
  • Language: C++
  • Default Branch: main
  • Homepage:
  • Size: 1.81 MB
Statistics
  • Stars: 19
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Topics
research
Created over 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

QSync

Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".

Description

QSync explores removing unnecessary quantized operations to improve training accuracy. It consists of the following components:

  • Predictor: a quantization perturbation indicator and a replayer that analyzes the memory and latency of the global data flow graph under mixed precision.
  • Allocator / Syncer: an allocator that selects the optimal set of quantized operations for training.
  • LP-PyTorch: support for low-precision backends (CUTLASS, cuDNN).
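To make the idea of a quantization perturbation concrete, here is a minimal pure-Python sketch. It is not QSync's indicator: the mean squared error after a quantize/dequantize round trip is only a stand-in, and the names `fake_quantize` and `perturbation` as well as the sample weights are hypothetical.

```python
def fake_quantize(values, bits=8):
    """Symmetric uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)                 # nothing to scale
    scale = max_abs / qmax
    # Round to the nearest integer level, clamp, then map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def perturbation(values, bits):
    """Stand-in indicator: mean squared error introduced by `bits`-bit quantization."""
    dequantized = fake_quantize(values, bits)
    return sum((a - b) ** 2 for a, b in zip(values, dequantized)) / len(values)


# Hypothetical weights, for illustration only.
weights = [0.9, -0.31, 0.05, 0.62, -0.77]

if __name__ == "__main__":
    for bits in (4, 8, 16):
        print(bits, perturbation(weights, bits))
```

Lower bitwidths produce a larger error in this toy setting, which is the kind of signal an allocator could weigh against the memory and latency savings of quantizing an operation.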

In particular, QSync targets a practical scenario: hybrid-cluster training, in which inference GPUs with lower capabilities (memory, compute) train alongside training GPUs with higher capabilities.
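The hybrid-cluster constraint can be illustrated with a toy allocation problem: on a memory-limited inference GPU, quantize only as many operations as needed to fit the budget, and prefer those with the least perturbation. The sketch below uses brute-force search as a stand-in and is not QSync's allocator; the function `allocate`, the operation names, and all numbers are hypothetical.

```python
from itertools import combinations


def allocate(ops, memory_budget):
    """Choose which ops to quantize so the model fits `memory_budget`
    with the smallest total perturbation.

    ops maps name -> (full_precision_memory, quantized_memory, perturbation).
    Returns (total_perturbation, set_of_quantized_ops), or None if the model
    does not fit even when every op is quantized.
    """
    names = sorted(ops)
    best = None
    for k in range(len(names) + 1):
        for chosen in combinations(names, k):
            # Memory use when only the chosen ops run in low precision.
            memory = sum(ops[n][1] if n in chosen else ops[n][0] for n in names)
            if memory > memory_budget:
                continue
            cost = sum(ops[n][2] for n in chosen)
            if best is None or cost < best[0]:
                best = (cost, set(chosen))
    return best


# Hypothetical per-op numbers (memory in MB, perturbation in arbitrary integer units).
ops = {
    "conv1": (400, 100, 30),
    "conv2": (300, 75, 5),
    "linear": (200, 50, 10),
}

if __name__ == "__main__":
    for budget in (900, 700, 500, 100):
        print(budget, allocate(ops, budget))
```

With a budget that fits the full-precision model, nothing is quantized; as the budget shrinks, more operations are moved to low precision. Exhaustive search is exponential in the number of operations, so it only serves to show the objective and the constraint.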

The provided scripts support both convolution-based and transformer-based models.

NOTE: The project is somewhat dated. The kernel implementations may not match the performance of the latest PyTorch.

Set Environment

Clone the repo: git clone --recursive https://github.com/bytedance/QSync.git

Docker

  • Run build.sh in the dockerfile folder.
  • Run run.sh, specifying the necessary path mounts inside it.
  • Run pip install -e . in the QSync root folder; this starts the compilation of the kernels.

Manual Installation

  • Some libraries may be hard to install without a proxy. Change <abspath_to_root> in m_install.sh to the absolute path of the root folder, then run:
  • bash m_install.sh
  • make

Usage

QSync is implemented under the qsync folder and is composed of the syncer, the predictor, and LpTorch.

  • To use LpTorch and convert your model to a mixed-bitwidth model, use model = QModule(model).
  • See the corresponding pages for details on using the predictor and the syncer.
  • See the samples under benchmark_convs and benchmark_transformers.

Note that cross-node cost modeling is not as accurate as single-node cost modeling. Extra effort is required to align the start of communication.

Owner

  • Name: Bytedance Inc.
  • Login: bytedance
  • Kind: organization
  • Location: Singapore

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: JUNTAO
    given-names: ZHAO
    orcid: https://orcid.org/0000-0003-3376-0607

repository-code: 'https://github.com/SpringWave1/QSync'
abstract: >-
  Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
keywords:
  - 'neural network, cutlass, composable kernel, cuda, rocm'

title: QSync
version: '0.1'
date-released: '2022-11-20'

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

requirements.txt pypi
  • accelerate *
  • bokeh *
  • cython *
  • datasets *
  • jupyterlab *
  • matplotlib *
  • ninja *
  • pandas *
  • pulp *
  • pycocotools *
  • pynvml *
  • regex *
  • scikit-learn *
  • seaborn *
  • tensorboard *
  • tensorboardX *
  • tensorflow *
  • tokenizers ==0.12.1
  • torch ==1.10.0
  • tqdm *
  • transformers *
dockerfile/Dockerfile docker
  • pytorch/pytorch 1.10.0-cuda11.3-cudnn8-devel build
pytorch/cudnn_bn/setup.py pypi
pytorch/cudnn_conv/setup.py pypi
pytorch/cutlass-conv/setup.py pypi
pytorch/cutlass-linear/setup.py pypi
pytorch/int8pool-extension/setup.py pypi
pytorch/other_extension/setup.py pypi
pytorch/quantization/setup.py pypi
setup.py pypi
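
Only two entries in requirements.txt are pinned (torch ==1.10.0 and tokenizers ==0.12.1). The following sketch checks an environment against those pins using the standard library's importlib.metadata. It is not part of the repository; the helper names `installed_version` and `check_pins` are hypothetical, and treating a local build tag such as "+cu113" as matching the pin is an assumption.

```python
from importlib import metadata

# Pinned versions taken from the requirements.txt listing above.
PINS = {"torch": "1.10.0", "tokenizers": "0.12.1"}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    problems = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Assumption: a local build tag ("1.10.0+cu113") still counts as the pinned release.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means both pins are satisfied; the `lookup` parameter exists only so the check can be exercised without the packages installed.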