tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Science Score: 72.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to zenodo.org
- ✓ Committers with academic emails: 2 of 12 committers (16.7%) from academic institutions
- ✓ Institutional organization owner: Organization fzj-jsc has institutional domain (www.fz-juelich.de)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.0%) to scientific vocabulary
Repository
Statistics
- Stars: 269
- Watchers: 12
- Forks: 56
- Open Issues: 0
- Releases: 7
README.md
ISC25 Tutorial: Efficient Distributed GPU Programming for Exascale
This repository contains the talks and exercises of our Efficient Distributed GPU Programming for Exascale tutorial, to be held at ISC25.
Coordinates
- Date: 13 June 2025
- Occasion: ISC25 Tutorial
- Tutors: Simon Garcia de Gonzalo (SNL), Andreas Herten (JSC), Lena Oden (Uni Hagen), with support by Markus Hrywniak (NVIDIA) and Jiri Kraus (NVIDIA)
Setup
The tutorial is interactive, combining introductory lectures with practical exercises that apply the material. The exercises are derived from the Jacobi solver implementations available in NVIDIA/multi-gpu-programming-models.
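For orientation, a Jacobi solver repeatedly replaces every interior grid point with the average of its four neighbours until the update becomes small. The following is a minimal single-process sketch in plain Python, not the tutorial's CUDA code; the function names `jacobi_step` and `solve`, the tolerance, and the fixed-boundary treatment are illustrative assumptions.

```python
def jacobi_step(a):
    """Run one Jacobi sweep; return (new grid, L2 norm of the update)."""
    ny, nx = len(a), len(a[0])
    a_new = [row[:] for row in a]  # boundary values are carried over unchanged
    norm_sq = 0.0
    for iy in range(1, ny - 1):
        for ix in range(1, nx - 1):
            # New value is the average of the four direct neighbours.
            val = 0.25 * (a[iy][ix - 1] + a[iy][ix + 1]
                          + a[iy - 1][ix] + a[iy + 1][ix])
            norm_sq += (val - a[iy][ix]) ** 2
            a_new[iy][ix] = val
    return a_new, norm_sq ** 0.5


def solve(a, tol=1e-6, max_iter=10_000):
    """Iterate until the update norm drops below tol (hypothetical defaults)."""
    iterations = 0
    for iterations in range(1, max_iter + 1):
        a, norm = jacobi_step(a)
        if norm < tol:
            break
    return a, iterations
```

Each sweep reads only the previous iterate and writes a separate output grid, so all interior points can be updated independently. That independence is what makes the method a convenient vehicle for GPU and multi-GPU exercises.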
Walk-through:
- Sign up at JuDoor
- Open Jupyter JSC: https://jupyter.jsc.fz-juelich.de
- Create a new Jupyter instance on JUPITER, using the training2526 account, on LoginNode
- Source course environment: `source $PROJECT_training2526/env.sh`
- Sync material: `jsc-material-sync`
- Locally install NVIDIA Nsight Systems: https://developer.nvidia.com/nsight-systems
Curriculum (Note: square-bracketed sessions are skipped at ISC25 because only ½ day was allocated to the tutorial):
- Lecture: Tutorial Overview, Introduction to System + Onboarding (Andreas)
- Lecture: MPI-Distributed Computing with GPUs (Simon)
- Hands-on: Multi-GPU Parallelization
- [Lecture: Performance / Debugging Tools]
- Lecture: Optimization Techniques for Multi-GPU Applications (Lena)
- Hands-on: Overlap Communication and Computation with MPI
- [Lecture: Overview of NCCL and NVSHMEM in MPI]
- [Hands-on: Using NCCL and NVSHMEM]
- [Lecture: Device-initiated Communication with NVSHMEM]
- [Hands-on: Using Device-Initiated Communication with NVSHMEM]
- Lecture: Conclusion and Outline of Advanced Topics (Andreas)
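The multi-GPU sessions listed above build on one common pattern: split the grid across processes, let each process update its own rows, and exchange boundary ("halo") rows with the neighbours after every iteration. The sketch below simulates that pattern inside a single Python process. It is an illustration under stated assumptions, not the tutorial's implementation: lists stand in for ranks, plain row copies stand in for MPI messages, the boundaries are fixed (non-periodic), every rank is assumed to own at least one row, and all function names are hypothetical.

```python
def sweep(a):
    """One Jacobi sweep over the interior; first/last row and column are kept."""
    new = [row[:] for row in a]
    for iy in range(1, len(a) - 1):
        for ix in range(1, len(a[0]) - 1):
            new[iy][ix] = 0.25 * (a[iy][ix - 1] + a[iy][ix + 1]
                                  + a[iy - 1][ix] + a[iy + 1][ix])
    return new


def split_rows(n_rows, n_ranks):
    """Split n_rows as evenly as possible; the first ranks get one extra row."""
    base, extra = divmod(n_rows, n_ranks)
    bounds, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def scatter(a, n_ranks):
    """Give each simulated rank its owned rows plus one halo row above and below."""
    return [[row[:] for row in a[lo:hi + 2]]
            for lo, hi in split_rows(len(a) - 2, n_ranks)]


def exchange_halos(chunks):
    """Copy each rank's outermost owned rows into the neighbours' halo rows."""
    for upper, lower in zip(chunks, chunks[1:]):
        upper[-1] = lower[1][:]  # bottom halo <- first owned row of the rank below
        lower[0] = upper[-2][:]  # top halo <- last owned row of the rank above


def gather(chunks):
    """Reassemble the global grid from the owned rows of every rank."""
    rows = [chunks[0][0]]            # global top boundary
    for chunk in chunks:
        rows.extend(chunk[1:-1])     # owned rows only, halos dropped
    rows.append(chunks[-1][-1])      # global bottom boundary
    return rows


def distributed_steps(a, n_ranks, n_steps):
    """Run n_steps Jacobi sweeps on a row-wise decomposed copy of a."""
    chunks = scatter(a, n_ranks)
    for _ in range(n_steps):
        chunks = [sweep(c) for c in chunks]  # independent local compute
        exchange_halos(chunks)               # stand-in for neighbour messages
    return gather(chunks)
```

Because each rank only needs its neighbours' outermost rows, the decomposed run performs the same arithmetic as `sweep` applied to the whole grid. In a real MPI program the two copies in `exchange_halos` become a send/receive pair per neighbour, and overlapping that communication with the interior update is one of the optimizations the curriculum covers.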
Owner
- Name: Jülich Supercomputing Centre
- Login: FZJ-JSC
- Kind: organization
- Location: Germany
- Website: https://www.fz-juelich.de/en/ias/jsc
- Twitter: fzj_jsc
- Repositories: 29
- Profile: https://github.com/FZJ-JSC
Jülich Supercomputing Centre provides HPC resources and expertise. Part of Forschungszentrum Jülich.
Citation (CITATION.cff)
cff-version: 1.2.0
title: Efficient Distributed GPU Programming for Exascale
message: >-
  If you use this software, please cite it using the
  metadata from this file.
authors:
  - given-names: Andreas
    family-names: Herten
    email: a.herten@fz-juelich.de
    affiliation: Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-7150-2505'
  - given-names: Lena
    family-names: Oden
    email: lena.oden@fernuni-hagen.de
    affiliation: FernUni Hagen
    orcid: 'https://orcid.org/0000-0002-9670-5296'
  - given-names: Simon
    family-names: Garcia de Gonzalo
    email: simgarc@sandia.gov
    affiliation: Sandia National Laboratories
    orcid: 'https://orcid.org/0000-0002-5699-1793'
  - given-names: Jiri
    family-names: Kraus
    email: jkraus@nvidia.com
    affiliation: NVIDIA
    orcid: 'https://orcid.org/0000-0002-5240-3317'
  - given-names: Markus
    family-names: Hrywniak
    email: mhrywniak@nvidia.com
    affiliation: NVIDIA
    orcid: 'https://orcid.org/0000-0002-6015-8788'
identifiers:
  - type: doi
    value: 10.5281/zenodo.5745504
    description: Year-agnostic Zenodo Identifier
repository-code: 'https://github.com/FZJ-JSC/tutorial-multi-gpu/'
abstract: >-
  Over the past decade, GPUs became ubiquitous in HPC installations around the world, delivering the majority of performance of some of the largest supercomputers (e.g. Summit, Sierra, JUWELS Booster). This trend continues in the recently deployed and upcoming Pre-Exascale and Exascale systems (JUPITER, LUMI, Leonardo; El Capitan, Frontier, Aurora): GPUs are chosen as the core computing devices to enter this next era of HPC.

  To take advantage of future GPU-accelerated systems with tens of thousands of devices, application developers need to have the proper skills and tools to understand, manage, and optimize distributed GPU applications.

  In this tutorial, participants will learn techniques to efficiently program large-scale multi-GPU systems. While programming multiple GPUs with MPI is explained in detail, also advanced tuning techniques and complementing programming models like NCCL and NVSHMEM are presented. Tools for analysis are shown and used to motivate and implement performance optimizations. The tutorial teaches fundamental concepts that apply to GPU-accelerated systems in general, taking the NVIDIA platform as an example. It is a combination of lectures and hands-on exercises, using a development system for JUPITER (JEDI), for interactive learning and discovery.
keywords:
  - NVIDIA
  - GPU
  - CUDA
  - Exascale
  - MPI
  - NCCL
  - NVSHMEM
  - Distributed Programming
license: MIT
version: '8.0-isc25'
date-released: '2025-06-13'
GitHub Events
Total
- Create event: 5
- Issues event: 3
- Release event: 1
- Watch event: 104
- Delete event: 5
- Issue comment event: 9
- Push event: 35
- Pull request event: 6
- Fork event: 8
Last Year
- Create event: 5
- Issues event: 3
- Release event: 1
- Watch event: 104
- Delete event: 5
- Issue comment event: 9
- Push event: 35
- Pull request event: 6
- Fork event: 8
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Andreas Herten | a****n@f****e | 101 |
| Simon Garcia de Gonzalo | s****g@g****m | 25 |
| Lena Oden | m****l@l****e | 20 |
| Markus Hrywniak | m****k@n****m | 16 |
| Jiri Kraus | j****s@n****m | 15 |
| Simon Garcia de Gonzalo | g****1@j****s | 9 |
| Simon Garcia De Gonzalo | s****a@b****s | 8 |
| Andreas Herten | a****b@g****m | 5 |
| lena.oden | l****n@f****e | 3 |
| simgarc | s****c@a****v | 2 |
| Simon Garcia de Gonzalo | g****1@j****s | 1 |
| Markus Hrywniak | 5****k | 1 |
Issues and Pull Requests
Last synced: about 2 years ago
All Time
- Total issues: 4
- Total pull requests: 24
- Average time to close issues: 7 days
- Average time to close pull requests: 1 day
- Total issue authors: 3
- Total pull request authors: 5
- Average comments per issue: 1.0
- Average comments per pull request: 0.63
- Merged pull requests: 24
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 4
- Average time to close issues: 1 day
- Average time to close pull requests: less than a minute
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 2.0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- AndiH (2)
- jbadwaik (1)
- zzzlxhhh (1)
- ydsumt (1)
Pull Request Authors
- LenaO (10)
- jirikraus (9)
- mhrywniak (6)
- simongdg (4)
- AndiH (2)