tutorial-multi-gpu

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

https://github.com/fzj-jsc/tutorial-multi-gpu

Science Score: 72.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    2 of 12 committers (16.7%) from academic institutions
  • Institutional organization owner
    Organization fzj-jsc has institutional domain (www.fz-juelich.de)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.0%) to scientific vocabulary

Keywords

cuda exascale-computing gpu hpc isc22 isc23 isc24 mpi multi-gpu nccl nvshmem sc21 sc22 sc23 supercomputing
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: FZJ-JSC
  • License: MIT
  • Language: CUDA
  • Default Branch: main
  • Size: 157 MB
Statistics
  • Stars: 269
  • Watchers: 12
  • Forks: 56
  • Open Issues: 0
  • Releases: 7
Created over 4 years ago · Last pushed 8 months ago
Metadata Files
Readme, License, Citation, Zenodo

README.md

ISC25 Tutorial: Efficient Distributed GPU Programming for Exascale

Repository with the talks and exercises of our tutorial Efficient Distributed GPU Programming for Exascale, to be held at ISC25.

Coordinates

  • Date: 13 June 2025
  • Occasion: ISC25 Tutorial
  • Tutors: Simon Garcia de Gonzalo (SNL), Andreas Herten (JSC), Lena Oden (Uni Hagen), with support by Markus Hrywniak (NVIDIA) and Jiri Kraus (NVIDIA)

Setup

The tutorial is interactive, combining introductory lectures with practical exercises that apply the material. The exercises have been derived from the Jacobi solver implementations available in NVIDIA/multi-gpu-programming-models.

Walk-through:

  • Sign up at JuDoor
  • Open Jupyter JSC: https://jupyter.jsc.fz-juelich.de
  • Create a new Jupyter instance on JUPITER, using the training2526 account, on a LoginNode
  • Source course environment: source $PROJECT_training2526/env.sh
  • Sync material: jsc-material-sync
  • Locally install NVIDIA Nsight Systems: https://developer.nvidia.com/nsight-systems

Curriculum (Note: square-bracketed sessions are skipped at ISC25 because only ½ day was allocated to the tutorial):

  1. Lecture: Tutorial Overview, Introduction to System + Onboarding (Andreas)
  2. Lecture: MPI-Distributed Computing with GPUs (Simon)
  3. Hands-on: Multi-GPU Parallelization
  4. [Lecture: Performance / Debugging Tools]
  5. Lecture: Optimization Techniques for Multi-GPU Applications (Lena)
  6. Hands-on: Overlap Communication and Computation with MPI
  7. [Lecture: Overview of NCCL and NVSHMEM in MPI]
  8. [Hands-on: Using NCCL and NVSHMEM]
  9. [Lecture: Device-initiated Communication with NVSHMEM]
  10. [Hands-on: Using Device-Initiated Communication with NVSHMEM]
  11. Lecture: Conclusion and Outline of Advanced Topics (Andreas)

Owner

  • Name: Jülich Supercomputing Centre
  • Login: FZJ-JSC
  • Kind: organization
  • Location: Germany

Jülich Supercomputing Centre provides HPC resources and expertise. Part of Forschungszentrum Jülich.

Citation (CITATION.cff)

cff-version: 1.2.0
title: Efficient Distributed GPU Programming for Exascale
message: >-
  If you use this software, please cite it using the
  metadata from this file.
authors:
  - given-names: Andreas
    family-names: Herten
    email: a.herten@fz-juelich.de
    affiliation: Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-7150-2505'
  - given-names: Lena
    family-names: Oden
    email: lena.oden@fernuni-hagen.de
    affiliation: FernUni Hagen
    orcid: 'https://orcid.org/0000-0002-9670-5296'
  - given-names: Simon
    family-names: Garcia de Gonzalo
    email: simgarc@sandia.gov
    affiliation: Sandia National Laboratories
    orcid: 'https://orcid.org/0000-0002-5699-1793'
  - given-names: Jiri
    family-names: Kraus
    email: jkraus@nvidia.com
    affiliation: NVIDIA
    orcid: 'https://orcid.org/0000-0002-5240-3317'
  - given-names: Markus
    family-names: Hrywniak
    email: mhrywniak@nvidia.com
    affiliation: NVIDIA
    orcid: 'https://orcid.org/0000-0002-6015-8788'
identifiers:
  - type: doi
    value: 10.5281/zenodo.5745504
    description: Year-agnostic Zenodo Identifier
repository-code: 'https://github.com/FZJ-JSC/tutorial-multi-gpu/'
abstract: >-
  Over the past decade, GPUs became ubiquitous in HPC installations around the world, delivering the majority of performance of some of the largest supercomputers (e.g. Summit, Sierra, JUWELS Booster). This trend continues in the recently deployed and upcoming Pre-Exascale and Exascale systems (JUPITER, LUMI, Leonardo; El Capitan, Frontier, Aurora): GPUs are chosen as the core computing devices to enter this next era of HPC.
  To take advantage of future GPU-accelerated systems with tens of thousands of devices, application developers need to have the proper skills and tools to understand, manage, and optimize distributed GPU applications.
  In this tutorial, participants will learn techniques to efficiently program large-scale multi-GPU systems. Programming multiple GPUs with MPI is explained in detail, and advanced tuning techniques and complementary programming models like NCCL and NVSHMEM are also presented. Tools for analysis are shown and used to motivate and implement performance optimizations. The tutorial teaches fundamental concepts that apply to GPU-accelerated systems in general, taking the NVIDIA platform as an example. It is a combination of lectures and hands-on exercises, using a development system for JUPITER (JEDI), for interactive learning and discovery.
keywords:
  - NVIDIA
  - GPU
  - CUDA
  - Exascale
  - MPI
  - NCCL
  - NVSHMEM
  - Distributed Programming
license: MIT
version: '8.0-isc25'
date-released: '2025-06-13'

GitHub Events

Total
  • Create event: 5
  • Issues event: 3
  • Release event: 1
  • Watch event: 104
  • Delete event: 5
  • Issue comment event: 9
  • Push event: 35
  • Pull request event: 6
  • Fork event: 8
Last Year
  • Create event: 5
  • Issues event: 3
  • Release event: 1
  • Watch event: 104
  • Delete event: 5
  • Issue comment event: 9
  • Push event: 35
  • Pull request event: 6
  • Fork event: 8

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 206
  • Total Committers: 12
  • Avg Commits per committer: 17.167
  • Development Distribution Score (DDS): 0.51
Past Year
  • Commits: 37
  • Committers: 6
  • Avg Commits per committer: 6.167
  • Development Distribution Score (DDS): 0.378
Top Committers
Name                      Email            Commits
Andreas Herten            a****n@f****e    101
Simon Garcia de Gonzalo   s****g@g****m    25
Lena Oden                 m****l@l****e    20
Markus Hrywniak           m****k@n****m    16
Jiri Kraus                j****s@n****m    15
Simon Garcia de Gonzalo   g****1@j****s    9
Simon Garcia De Gonzalo   s****a@b****s    8
Andreas Herten            a****b@g****m    5
lena.oden                 l****n@f****e    3
simgarc                   s****c@a****v    2
Simon Garcia de Gonzalo   g****1@j****s    1
Markus Hrywniak           5****k           1

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 4
  • Total pull requests: 24
  • Average time to close issues: 7 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 3
  • Total pull request authors: 5
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.63
  • Merged pull requests: 24
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 4
  • Average time to close issues: 1 day
  • Average time to close pull requests: less than a minute
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • AndiH (2)
  • jbadwaik (1)
  • zzzlxhhh (1)
  • ydsumt (1)
Pull Request Authors
  • LenaO (10)
  • jirikraus (9)
  • mhrywniak (6)
  • simongdg (4)
  • AndiH (2)
Top Labels
Issue Labels
enhancement (1)