comscribe

ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.

https://github.com/parcorelab/comscribe

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 5 DOI reference(s) in README
✓ Academic publication links: arxiv.org, acm.org
○ Academic email domains
✓ Institutional organization owner: Organization parcorelab has institutional domain (parcorelab.ku.edu.tr)
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.2%) to scientific vocabulary

Repository

Basic Info

Host: GitHub
Owner: ParCoreLab
License: bsd-3-clause
Language: C++
Default Branch: master
Homepage: https://link.springer.com/chapter/10.1007/978-3-030-71058-3_10
Size: 583 KB

Statistics

Stars: 25
Watchers: 0
Forks: 4
Open Issues: 3
Releases: 0

Created almost 6 years ago · Last pushed almost 3 years ago

Metadata Files

Readme License Citation

ComScribe

ComScribe is a tool that identifies communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.

Installation
Usage
Benchmarks
Publication

Installation

You can run the install.sh script directly:

./install.sh

Alternatively, you can install it manually.

You will need the following programs:

Python: ComScribe is a Python script. It uses several packages listed in requirements.txt, which you can install via the command:

pip3 install -r requirements.txt

nvprof: ComScribe parses the output of nvprof, NVIDIA's lightweight command-line profiler, available since CUDA 5.
NCCL (optional): ComScribe modifies the NCCL library to profile collective communication primitives. If your application does not use any collective operations, you can skip this step.

cd nccl && make -j src.build

No further installation is required.
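
Before a manual install, it can help to confirm that the required command-line tools are on PATH. The sketch below is not part of ComScribe: the helper name `missing_prerequisites` is hypothetical, and it checks only for nvprof, the profiler named in the steps above.

```python
import shutil


def missing_prerequisites(tools=("nvprof",)):
    """Return the tools from `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; not part of ComScribe.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_prerequisites()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("nvprof found on PATH.")
```

If nvprof is reported as missing, adding the CUDA toolkit's binary directory to PATH is the usual remedy.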

Usage

P2P Communication Profiling

To obtain the communication matrices of your application (app):

python3 comscribe.py -g <num_gpus> -s log|linear -i <cmd_to_run>

-g tells ComScribe how many GPUs will be used. Note that if the application itself also requires such a parameter, it must be specified explicitly in the command passed to -i (see below).

-s sets the scale of the output figures: log for log scale or linear for linear scale.

-i takes the input command as a string such as: -i './app --foo 20 --bar 5'
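
When ComScribe is driven from another script, the options above can be assembled programmatically. The following is a minimal sketch, assuming comscribe.py is in the current directory; the helper name `build_comscribe_command` is hypothetical, and only the -g, -s, and -i flags described above are used.

```python
import shlex


def build_comscribe_command(num_gpus, app_cmd, scale="log", script="comscribe.py"):
    """Build the argument list for a P2P profiling run.

    Hypothetical helper; `script` is assumed to point at comscribe.py.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if scale not in ("log", "linear"):
        raise ValueError("scale must be 'log' or 'linear'")
    # The application command is passed to -i as one string, so it stays
    # a single list element and is quoted as a whole when printed.
    return ["python3", script, "-g", str(num_gpus), "-s", scale, "-i", app_cmd]


cmd = build_comscribe_command(4, "./app --foo 20 --bar 5", scale="linear")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)`. Keeping the application command as a single element avoids having to quote it by hand.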

The communication matrix for a communication type is only generated if it is detected, e.g. if there are no Unified Memory transfers then there will not be any output regarding Unified Memory transfers. For the types of communication detected, the generated figures are saved as PDF files in the directory of the script.
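
To make the notion of a communication matrix concrete, the sketch below aggregates a list of transfer records into a bytes matrix and a transfer-count matrix. It illustrates the idea only and is not ComScribe's implementation: the record format `(src, dst, nbytes)`, the function name, and the convention that index 0 is the host CPU with GPUs at 1..num_gpus are all assumptions, and the sample records are made up.

```python
def build_comm_matrices(transfers, num_gpus):
    """Aggregate (src, dst, nbytes) records into bytes and count matrices.

    Assumed convention (not taken from ComScribe): device 0 is the host
    CPU and devices 1..num_gpus are the GPUs.
    """
    size = num_gpus + 1
    bytes_matrix = [[0] * size for _ in range(size)]
    count_matrix = [[0] * size for _ in range(size)]
    for src, dst, nbytes in transfers:
        bytes_matrix[src][dst] += nbytes  # total bytes sent from src to dst
        count_matrix[src][dst] += 1       # number of transfers from src to dst
    return bytes_matrix, count_matrix


# Made-up transfer records for a 2-GPU system.
transfers = [(1, 2, 1024), (1, 2, 2048), (0, 1, 512)]
bytes_matrix, count_matrix = build_comm_matrices(transfers, num_gpus=2)
for row in bytes_matrix:
    print(row)
```

Under this sketch, a communication type whose count matrix is all zeros corresponds to the "not detected" case, for which no figure is produced.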

Collective Communication Profiling

python3 comscribe.py -n -c <collective_type> -g <num_gpus> -s log|linear -i <cmd_to_run>

-n enables the profiling of collectives.

-c selects the collective to profile; if it is not specified, all collectives are profiled. Options: broadcast, reduce, allgather, allreduce, reducescatter
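
To profile several collectives in separate runs, one command per collective can be generated. This is a sketch under assumptions: the helper `collective_commands` is hypothetical, comscribe.py is assumed to be in the current directory, and `./app` stands in for your application.

```python
import shlex

# Collective names accepted by -c, as listed above.
COLLECTIVES = ("broadcast", "reduce", "allgather", "allreduce", "reducescatter")


def collective_commands(num_gpus, app_cmd, collectives=COLLECTIVES, scale="log"):
    """Return one comscribe.py argument list per requested collective.

    Hypothetical helper for illustration; not part of ComScribe.
    """
    commands = []
    for name in collectives:
        if name not in COLLECTIVES:
            raise ValueError(f"unknown collective: {name}")
        commands.append(
            ["python3", "comscribe.py", "-n", "-c", name,
             "-g", str(num_gpus), "-s", scale, "-i", app_cmd]
        )
    return commands


for cmd in collective_commands(4, "./app", collectives=("allreduce", "broadcast")):
    print(shlex.join(cmd))
```

Validating the name before launching avoids spending a full profiling run on a mistyped collective.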

Benchmarks

We have used our tool on an NVIDIA V100 DGX2 system with up to 16 GPUs using CUDA v10.0.130 for the following benchmarks:

NVIDIA Monte Carlo Simulation of 2D Ising-GPU | GitHub
NVIDIA Multi-GPU Jacobi Solver | GitHub
Comm|Scope | Paper | GitHub
- Full-Duplex | GitHub
- Full-Duplex with Unified Memory | GitHub
- Half-Duplex with peer access | GitHub
- Half-Duplex without peer access | GitHub
- Zero-copy Memory (both Read and Write benchmarks) | GitHub
Note: To run a Comm|Scope benchmark with a fixed number of iterations, e.g. 100, replace its registration in the benchmark's source code with: benchmark::RegisterBenchmark(...)->SMALL_ARGS()->Iterations(100);
MGBench | GitHub
- Full-Duplex | GitHub
- Scatter-Gather | GitHub
- Game Of Life | GitHub
Eidetic 3D LSTM | Paper | GitHub
Transformer | Paper | GitHub

Example: Comm|Scope Zero-copy Memory Read Half-Duplex Micro-benchmark

python3 comscribe.py -g 4 -i './scope --benchmark_filter="Comm_ZeroCopy_GPUToGPU_Read.*18.*" -n 0' -s log

This produces a bar chart of the zero-copy memory transfers:

Example: Comm|Scope Unified Memory Full Duplex Micro-benchmark

python3 comscribe.py -g 4 -i './scope --benchmark_filter="Comm_Demand_Duplex_GPUGPU.*18.*"' -s linear

This produces two matrices, bytes transferred (left) and number of transfers made (right):

Example: MGBench Full Duplex Micro-benchmark

python3 comscribe.py -g 4 -i './fullduplex' -s linear

This produces two matrices, bytes transferred (left) and number of transfers made (right):

Example: NVIDIA Monte Carlo Simulation of 2D Ising-GPU

python3 comscribe.py -g 4 -i './cuIsing -y 32768 -x 65536 -n 128 -p 16 -d 4 -t 1.5' -s log

This produces two matrices, bytes transferred (left) and number of transfers made (right):

Publication:

Akhtar, P., Tezcan, E., Qararyah, F.M., Unat, D. (2021). ComScribe: Identifying Intra-node GPU Communication. In: Wolf, F., Gao, W. (eds) Benchmarking, Measuring, and Optimizing. Bench 2020. Lecture Notes in Computer Science, vol 12614. Springer, Cham. https://doi.org/10.1007/978-3-030-71058-3_10
Soytürk, M.A., Akhtar, P., Tezcan, E., Unat, D. (2022). Monitoring Collective Communication Among GPUs. In: , et al. Euro-Par 2021: Parallel Processing Workshops. Euro-Par 2021. Lecture Notes in Computer Science, vol 13098. Springer, Cham. https://doi.org/10.1007/978-3-031-06156-1_4

Owner

Name: ParCoreLab
Login: ParCoreLab
Kind: organization
Location: Istanbul

Website: https://parcorelab.ku.edu.tr/
Twitter: didemunat
Repositories: 20
Profile: https://github.com/ParCoreLab

Koç University - Parallel and Multicore Computing Laboratory

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Soyturk"
  given-names: "Muhammet Abdullah"
  orcid: "https://orcid.org/0000-0002-2880-0857"
- family-names: "Akhtar"
  given-names: "Palwisha"
  orcid: "https://orcid.org/0000-0003-0279-031X"
- family-names: "Tezcan"
  given-names: "Erhan"
  orcid: "https://orcid.org/0000-0001-5129-4166"
- family-names: "Unat"
  given-names: "Didem"
  orcid: "https://orcid.org/0000-0002-2351-0770"
title: "Monitoring Collective Communication Among GPUs"
doi: 10.1007/978-3-031-06156-1_4
date-released: 2022-06-09
url: "https://github.com/ParCoreLab/ComScribe"

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Dependencies

requirements.txt pypi

cycler ==0.10.0
kiwisolver ==1.1.0
matplotlib ==3.1.2
numpy ==1.17.4
pandas ==0.25.3
pyparsing ==2.4.5
python-dateutil ==2.8.1
pytz ==2019.3
scipy ==1.3.3
seaborn ==0.9.0
six ==1.13.0
