tutorial-multi-gpu

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

https://github.com/fzj-jsc/tutorial-multi-gpu

Science Score: 72.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to zenodo.org
✓ Committers with academic emails: 2 of 12 committers (16.7%) from academic institutions
✓ Institutional organization owner: Organization fzj-jsc has institutional domain (www.fz-juelich.de)
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (6.0%) to scientific vocabulary

Keywords

cuda exascale-computing gpu hpc isc22 isc23 isc24 mpi multi-gpu nccl nvshmem sc21 sc22 sc23 supercomputing

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: FZJ-JSC
License: mit
Language: Cuda
Default Branch: main
Homepage:
Size: 157 MB

Statistics

Stars: 269
Watchers: 12
Forks: 56
Open Issues: 0
Releases: 7

Created over 4 years ago · Last pushed 8 months ago

Metadata Files

Readme, License, Citation, Zenodo

ISC25 Tutorial: Efficient Distributed GPU Programming for Exascale

This repository contains the talks and exercises of our tutorial Efficient Distributed GPU Programming for Exascale, to be held at ISC25.

Coordinates

Date: 13 June 2025
Occasion: ISC25 Tutorial
Tutors: Simon Garcia de Gonzalo (SNL), Andreas Herten (JSC), Lena Oden (Uni Hagen), with support by Markus Hrywniak (NVIDIA) and Jiri Kraus (NVIDIA)

Setup

The tutorial is interactive: introductory lectures alternate with practical exercises in which participants apply what they have learned. The exercises are derived from the Jacobi solver implementations available in NVIDIA/multi-gpu-programming-models.
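To give a feel for the pattern behind these exercises, here is a minimal sketch of a Jacobi iteration with row-wise domain decomposition and halo exchange. It is a serial, pure-Python simulation written for illustration only: the actual exercises use CUDA and MPI, and all function names here (`jacobi_step`, `split_rows`, `exchange_halos`, `jacobi_decomposed`, `jacobi_serial`) as well as the fixed boundary rows and columns are assumptions of this sketch, not taken from the tutorial code.

```python
def jacobi_step(grid):
    """One Jacobi sweep over the interior; first/last rows and columns stay fixed."""
    ny, nx = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            new[y][x] = 0.25 * (grid[y - 1][x] + grid[y + 1][x]
                                + grid[y][x - 1] + grid[y][x + 1])
    return new


def jacobi_serial(grid, iterations):
    """Reference solution on the undivided grid."""
    for _ in range(iterations):
        grid = jacobi_step(grid)
    return grid


def split_rows(grid, num_ranks):
    """Give each simulated rank a block of interior rows plus one halo row per side.

    Assumes there are at least as many interior rows as ranks.
    """
    base, extra = divmod(len(grid) - 2, num_ranks)
    chunks, start = [], 1
    for rank in range(num_ranks):
        count = base + (1 if rank < extra else 0)
        chunks.append([row[:] for row in grid[start - 1:start + count + 1]])
        start += count
    return chunks


def exchange_halos(chunks):
    """Copy boundary rows between neighbours (stands in for MPI communication)."""
    for rank in range(len(chunks) - 1):
        upper, lower = chunks[rank], chunks[rank + 1]
        upper[-1] = lower[1][:]   # lower neighbour's first owned row -> bottom halo
        lower[0] = upper[-2][:]   # upper neighbour's last owned row -> top halo


def jacobi_decomposed(grid, num_ranks, iterations):
    """Run the same iteration on row blocks, refreshing halos before each sweep."""
    chunks = split_rows(grid, num_ranks)
    for _ in range(iterations):
        exchange_halos(chunks)
        chunks = [jacobi_step(chunk) for chunk in chunks]
    result = [grid[0][:]]
    for chunk in chunks:
        result.extend(row[:] for row in chunk[1:-1])  # drop halo rows when gathering
    result.append(grid[-1][:])
    return result


if __name__ == "__main__":
    demo = [[0.0] * 6 for _ in range(8)]
    demo[0] = [1.0] * 6  # fixed "hot" top boundary
    same = jacobi_decomposed(demo, 3, 10) == jacobi_serial(demo, 10)
    print("decomposed result matches serial result:", same)
```

In a real multi-GPU version, each block would live in the memory of a different GPU, and the copies in `exchange_halos` would become messages between processes; the sketch only shows why the exchanged rows are sufficient to reproduce the single-domain result.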

Walk-through:

Sign up at JuDoor
Open Jupyter JSC: https://jupyter.jsc.fz-juelich.de
Create new Jupyter instance on JUPITER, using training2526 account, on LoginNode
Source course environment: source $PROJECT_training2526/env.sh
Sync material: jsc-material-sync
Locally install NVIDIA Nsight Systems: https://developer.nvidia.com/nsight-systems

Curriculum (Note: square-bracketed sessions are skipped at ISC25 because only ½ day was allocated to the tutorial):

Lecture: Tutorial Overview, Introduction to System + Onboarding (Andreas)
Lecture: MPI-Distributed Computing with GPUs (Simon)
Hands-on: Multi-GPU Parallelization
[Lecture: Performance / Debugging Tools]
Lecture: Optimization Techniques for Multi-GPU Applications (Lena)
Hands-on: Overlap Communication and Computation with MPI
[Lecture: Overview of NCCL and NVSHMEM in MPI]
[Hands-on: Using NCCL and NVSHMEM]
[Lecture: Device-initiated Communication with NVSHMEM]
[Hands-on: Using Device-Initiated Communication with NVSHMEM]
Lecture: Conclusion and Outline of Advanced Topics (Andreas)

Owner

Name: Jülich Supercomputing Centre
Login: FZJ-JSC
Kind: organization
Location: Germany

Website: https://www.fz-juelich.de/en/ias/jsc
Twitter: fzj_jsc
Repositories: 29
Profile: https://github.com/FZJ-JSC

Jülich Supercomputing Centre provides HPC resources and expertise. Part of Forschungszentrum Jülich.

Citation (CITATION.cff)

cff-version: 1.2.0
title: Efficient Distributed GPU Programming for Exascale
message: >-
  If you use this software, please cite it using the
  metadata from this file.
authors:
  - given-names: Andreas
    family-names: Herten
    email: a.herten@fz-juelich.de
    affiliation: Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-7150-2505'
  - given-names: Lena
    family-names: Oden
    email: lena.oden@fernuni-hagen.de
    affiliation: FernUni Hagen
    orcid: 'https://orcid.org/0000-0002-9670-5296'
  - given-names: Simon
    family-names: Garcia de Gonzalo
    email: simgarc@sandia.gov
    affiliation: Sandia National Laboratories
    orcid: 'https://orcid.org/0000-0002-5699-1793'
  - given-names: Jiri
    family-names: Kraus
    email: jkraus@nvidia.com
    affiliation: NVIDIA
    orcid: 'https://orcid.org/0000-0002-5240-3317'
  - given-names: Markus
    family-names: Hrywniak
    email: mhrywniak@nvidia.com
    affiliation: NVIDIA
    orcid: 'https://orcid.org/0000-0002-6015-8788'
identifiers:
  - type: doi
    value: 10.5281/zenodo.5745504
    description: Year-agnostic Zenodo Identifier
repository-code: 'https://github.com/FZJ-JSC/tutorial-multi-gpu/'
abstract: >-
  Over the past decade, GPUs became ubiquitous in HPC installations around the world, delivering the majority of performance of some of the largest supercomputers (e.g. Summit, Sierra, JUWELS Booster). This trend continues in the recently deployed and upcoming Pre-Exascale and Exascale systems (JUPITER, LUMI, Leonardo; El Capitan, Frontier, Aurora): GPUs are chosen as the core computing devices to enter this next era of HPC.
  To take advantage of future GPU-accelerated systems with tens of thousands of devices, application developers need to have the proper skills and tools to understand, manage, and optimize distributed GPU applications.
  In this tutorial, participants will learn techniques to efficiently program large-scale multi-GPU systems. While programming multiple GPUs with MPI is explained in detail, also advanced tuning techniques and complementing programming models like NCCL and NVSHMEM are presented. Tools for analysis are shown and used to motivate and implement performance optimizations. The tutorial teaches fundamental concepts that apply to GPU-accelerated systems in general, taking the NVIDIA platform as an example. It is a combination of lectures and hands-on exercises, using a development system for JUPITER (JEDI), for interactive learning and discovery.
keywords:
  - NVIDIA
  - GPU
  - CUDA
  - Exascale
  - MPI
  - NCCL
  - NVSHMEM
  - Distributed Programming
license: MIT
version: '8.0-isc25'
date-released: '2025-06-13'

GitHub Events

Total

Create event: 5
Issues event: 3
Release event: 1
Watch event: 104
Delete event: 5
Issue comment event: 9
Push event: 35
Pull request event: 6
Fork event: 8

Last Year

Create event: 5
Issues event: 3
Release event: 1
Watch event: 104
Delete event: 5
Issue comment event: 9
Push event: 35
Pull request event: 6
Fork event: 8

Committers

Last synced: over 1 year ago

All Time

Total Commits: 206
Total Committers: 12
Avg Commits per committer: 17.167
Development Distribution Score (DDS): 0.51

Past Year

Commits: 37
Committers: 6
Avg Commits per committer: 6.167
Development Distribution Score (DDS): 0.378

Top Committers

Name	Email	Commits
Andreas Herten	a**n@f**e	101
Simon Garcia de Gonzalo	s**g@g**m	25
Lena Oden	m**l@l**e	20
Markus Hrywniak	m**k@n**m	16
Jiri Kraus	j**s@n**m	15
Simon Garcia de Gonzalo	g**1@j**s	9
Simon Garcia De Gonzalo	s**a@b**s	8
Andreas Herten	a**b@g**m	5
lena.oden	l**n@f**e	3
simgarc	s**c@a**v	2
Simon Garcia de Gonzalo	g**1@j**s	1
Markus Hrywniak	5****k	1
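
The all-time figures above can be reproduced from the commit counts in this table. The following sketch assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is an assumption here, chosen because it matches the listed value, and the variable names are illustrative.

```python
# Commit counts per committer identity, copied from the "Top Committers" table.
commits = [101, 25, 20, 16, 15, 9, 8, 5, 3, 2, 1, 1]

total_commits = sum(commits)
total_committers = len(commits)
avg_commits = total_commits / total_committers

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits) / total_commits

print("Total commits:", total_commits)
print("Total committers:", total_committers)
print("Avg commits per committer:", round(avg_commits, 3))
print("DDS:", round(dds, 2))
```

Note that each row of the table is counted as a separate committer identity, so the same person committing under several email addresses appears more than once.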

Committer Domains (Top 20 + Academic)

nvidia.com: 2, fz-juelich.de: 2, jwlogin24.juwels: 1, anchor.sandia.gov: 1, bsc.es: 1, jwlogin23.juwels: 1, lenaoden.de: 1

Issues and Pull Requests

Last synced: about 2 years ago

All Time

Total issues: 4
Total pull requests: 24
Average time to close issues: 7 days
Average time to close pull requests: 1 day
Total issue authors: 3
Total pull request authors: 5
Average comments per issue: 1.0
Average comments per pull request: 0.63
Merged pull requests: 24
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 4
Average time to close issues: 1 day
Average time to close pull requests: less than a minute
Issue authors: 1
Pull request authors: 1
Average comments per issue: 2.0
Average comments per pull request: 0.0
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

AndiH (2)
jbadwaik (1)
zzzlxhhh (1)
ydsumt (1)

Pull Request Authors

LenaO (10)
jirikraus (9)
mhrywniak (6)
simongdg (4)
AndiH (2)

Top Labels

Issue Labels

enhancement (1)
