Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: s769
  • License: mit
  • Language: C++
  • Default Branch: main
  • Size: 7.47 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 1 year ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

Multi-GPU Accelerated FFT-Based Matvec for Block Triangular Toeplitz Matrices

This repository contains the code for the paper: Sreeram Venkat, Milinda Fernando, Stefan Henneking, and Omar Ghattas. "Fast and Scalable FFT-Based GPU-Accelerated Algorithms for Hessian Actions Arising in Linear Inverse Problems Governed by Autonomous Dynamical Systems." arXiv preprint arXiv:2407.13066, July 18, 2024.

Note: This is the main branch, which only supports NVIDIA GPUs. For AMD and (experimental) Intel support as well as mixed-precision computation, use the mp branch.
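For orientation, the idea named in the title can be sketched in its simplest scalar form. The code below is not taken from this repository: it is a single-threaded CPU illustration with hypothetical function names (`fft`, `toeplitz_matvec_fft`, `toeplitz_matvec_direct`), and it assumes a scalar lower-triangular Toeplitz matrix rather than the block, multi-GPU setting the paper addresses. It zero-pads the first column and the input vector so that a circular convolution computed with FFTs matches the matrix-vector product.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 FFT; a.size() must be a power of two.
void fft(std::vector<cd>& a, bool inverse) {
    const std::size_t n = a.size();
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    // Butterfly passes over blocks of doubling length.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = (inverse ? 2.0 : -2.0) * pi / static_cast<double>(len);
        const cd wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd u = a[i + k];
                const cd v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
    if (inverse) {
        for (cd& z : a) z /= static_cast<double>(n);
    }
}

// y = T x, where T is lower-triangular Toeplitz with first column t,
// i.e. T[i][j] = t[i - j] for i >= j and 0 otherwise.
std::vector<double> toeplitz_matvec_fft(const std::vector<double>& t,
                                        const std::vector<double>& x) {
    if (t.size() != x.size()) throw std::invalid_argument("size mismatch");
    const std::size_t n = x.size();
    if (n == 0) return {};
    std::size_t m = 1;
    while (m < 2 * n) m <<= 1;  // pad so circular convolution equals linear
    std::vector<cd> ft(m), fx(m);
    for (std::size_t i = 0; i < n; ++i) {
        ft[i] = t[i];
        fx[i] = x[i];
    }
    fft(ft, false);
    fft(fx, false);
    for (std::size_t i = 0; i < m; ++i) ft[i] *= fx[i];  // pointwise product
    fft(ft, true);
    std::vector<double> y(n);
    for (std::size_t i = 0; i < n; ++i) y[i] = ft[i].real();
    return y;
}

// O(n^2) reference implementation used to check the FFT version.
std::vector<double> toeplitz_matvec_direct(const std::vector<double>& t,
                                           const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j <= i; ++j) y[i] += t[i - j] * x[j];
    return y;
}
```

Calling both functions with the same `t` and `x` should give results that agree up to floating-point rounding. The FFT version costs O(n log n) instead of O(n^2), which is the motivation for FFT-based matvecs in general.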

Documentation

The documentation for the code can be found here.

Installation

To build the code, the following dependencies are required:

  • CUDA (with cuFFT, cuBLAS, and cuTENSOR 2.x) and a CUDA-enabled GPU
  • NCCL
  • HDF5 (parallel version is required)

First, clone the repository:

```bash
git clone https://github.com/s769/FFTMatvec.git
cd FFTMatvec
```

Initialize the submodules:

```bash
git submodule update --init --recursive
```

Then, build the code:

```bash
cmake -B build -DNCCL_LIBRARIES=/path/to/nccl/lib -DNCCL_INCLUDE_DIRS=/path/to/nccl/include -DCMAKE_BUILD_TYPE=Release -DCUTENSOR_ROOT=/path/to/cutensor
cmake --build build
```

Note: the -DCUTENSOR_ROOT option is only needed if the cuTENSOR 2.x library is not in the usual CUDA library path. Some systems may have the cuTENSOR 1.x library in the CUDA library path, which is not compatible with this code. In that case, the cuTENSOR 2.x library must be installed, and the path to the cuTENSOR 2.x library must be provided to the build command.

Tests will build by default. If you don't want to build the tests, you can disable them by adding -DENABLE_TESTING=OFF to the cmake command. To run the tests, use ctest in the build directory. Tests require a minimum of 2 GPUs to run.

To build the documentation, the following dependencies are required:

Then, build the documentation by running mkdocs build or mkdocs serve in the docs directory. The built documentation will be in the site directory.

Usage

The main executable is fft_matvec. It takes the following arguments:

  • -pr (int): Number of processor rows (default: 1)
  • -pc (int): Number of processor columns (default: 1)
  • -g (bool): Use global sizes (default: false)
  • -Nm (int): Number of global block columns (default: 10, ignored if -g is false)
  • -Nd (int): Number of global block rows (default: 5, ignored if -g is false)
  • -Nt (int): Block size (default: 7)
  • -nm (int): Number of local block columns (default: 3, ignored if -g is true)
  • -nd (int): Number of local block rows (default: 2, ignored if -g is true)
  • -v (bool): Print input/output vectors (default: false)
  • -N (int): Number of matvecs to use for timing (default: 100)
  • -raw (bool): Print raw timing data instead of table (default: false)
  • -t (bool): Check matvec results (default: false)
  • -h (bool): Print help message

Note: pr x pc must equal the number of MPI processes used to run the code. If neither -pr nor -pc is provided, the code runs with pr = 1 and pc = num_mpi_procs.
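The grid constraint in the note above can be sketched as a small check. This is an illustration only, not code from the repository: `resolve_grid` is a hypothetical helper, a value of 0 is used here to mean "flag not provided", and the handling of the case where only one of the two flags is given (falling back to 1 for the missing one) is an assumption.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper (not part of FFTMatvec): resolve the processor grid
// from the -pr/-pc values and the number of MPI processes.
// Convention for this sketch: 0 means the flag was not provided.
std::pair<int, int> resolve_grid(int pr, int pc, int num_mpi_procs) {
    // Documented behavior: with neither flag given, use a 1 x num_mpi_procs grid.
    if (pr == 0 && pc == 0) return std::make_pair(1, num_mpi_procs);
    // Assumption: a single missing flag falls back to 1.
    const int rows = (pr == 0) ? 1 : pr;
    const int cols = (pc == 0) ? 1 : pc;
    // Documented constraint: pr x pc must equal the number of MPI processes.
    if (rows < 0 || cols < 0 || rows * cols != num_mpi_procs) {
        throw std::invalid_argument(
            "pr x pc (" + std::to_string(rows) + " x " + std::to_string(cols) +
            ") must equal the number of MPI processes (" +
            std::to_string(num_mpi_procs) + ")");
    }
    return std::make_pair(rows, cols);
}
```

With 4 MPI processes, `resolve_grid(2, 2, 4)` yields a 2x2 grid and `resolve_grid(0, 0, 4)` yields a 1x4 grid, while `resolve_grid(2, 3, 4)` throws because 2 x 3 does not equal 4.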

For boolean arguments, pass the flag without a value to enable it. For example:

```bash
mpiexec -np 4 ./build/fft_matvec -pr 2 -pc 2 -g -Nm 20 -Nd 10 -Nt 7 -nm 4 -nd 3 -v -N 100
```

runs the code with 4 MPI processes on a 2x2 processor grid using global sizes (20 global block columns and 10 global block rows) and a block size of 7, prints the input/output vectors, and uses 100 matvecs for timing. Because -g is set, the local sizes passed with -nm 4 and -nd 3 are ignored.

To reproduce the results in the paper, run with the configurations described in the Numerical Results section.

License

This code is released under the MIT License. See LICENSE for more information.

Owner

  • Name: Sreeram Venkat
  • Login: s769
  • Kind: user
  • Company: UT Austin

Graduate Research Student at the Oden Institute, UT Austin

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it as below.
authors:
  - family-names: "Venkat"
    given-names: "Sreeram"
    affiliation: "University of Texas at Austin"
  - family-names: "Fernando"
    given-names: "Milinda"
    affiliation: "University of Texas at Austin"
  - family-names: "Henneking"
    given-names: "Stefan"
    affiliation: "University of Texas at Austin"
  - family-names: "Ghattas"
    given-names: "Omar"
    affiliation: "University of Texas at Austin"
title: "FFTMatvec"
version: "0.1.0"
date-released: 2024-08-16
url: "https://github.com/s769/FFTMatvec"
preferred-citation:
  type: article
  authors:
    - family-names: "Venkat"
      given-names: "Sreeram"
      affiliation: "University of Texas at Austin"
    - family-names: "Fernando"
      given-names: "Milinda"
      affiliation: "University of Texas at Austin"
    - family-names: "Henneking"
      given-names: "Stefan"
      affiliation: "University of Texas at Austin"
    - family-names: "Ghattas"
      given-names: "Omar"
      affiliation: "University of Texas at Austin"
  title: "Fast and Scalable FFT-Based GPU-Accelerated Algorithms for Hessian Actions Arising in Linear Inverse Problems Governed by Autonomous Dynamical Systems"
  journal: "arXiv preprint arXiv:2407.13066"
  year: 2024

  

GitHub Events

Total
  • Release event: 1
  • Delete event: 1
  • Push event: 30
  • Create event: 2
Last Year
  • Release event: 1
  • Delete event: 1
  • Push event: 30
  • Create event: 2

Dependencies

docs/requirements.txt pypi
  • breathe *
  • sphinx-rtd-theme *