cudamatrixtranspose
Optimizing matrix transposition on GPU with CUDA (University of Trento, Italy)
Repository
Basic Info
- Host: GitHub
- Owner: LuCazzola
- License: mit
- Language: C
- Default Branch: master
- Homepage: https://www.lucazzola.it/cuda_programming.html
- Size: 1.81 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Matrix transposition : from sequential to parallel with CUDA
This repository contains all the material for both homework assignments on matrix transposition from the GPU Computing course at the University of Trento (Italy), a.y. 2023/2024.
To see the report and better understand what this work is about, click Here

How to use
Download the directory
git clone https://github.com/LuCazzola/cudaMatrixTranspose.git
The hierarchy of the project's relevant files is as follows:
```bash
.
├── bin # final executables
│ └── ...
├── obj # intermediate object files
│ └── ...
└── src # source code
│ ├── headers # header files
│ │ └── ...
│ ├── benchmark.c # produce an output file according to options in "run_benchmark.sh"
│ ├── benchmarkgpu.cu # produce an output file according to options in "launch_benchmark.sh"
│ .
│ ├── main.c # test the functions according to options in "run_main.sh"
│ ├── maingpu.cu # test the functions according to options in "launch_main.sh"
│ .
│ ├── transpose.c # functions to compute the transpose of a given matrix
│ ├── transposegpu.c
│ .
│ ├── matrix.c # definition of methods to handle matrices
│ ├── optparser.c # command line parameter parsing
│ .
│ └── commoncuda.cu # defines some common functions for cuda methods
│
├── run_benchmark.sh # set parameters related to "benchmark.c" and run the script
├── run_main.sh # set parameters related to "main.c" and run the script
├── run_cache_benchmark.sh # run cachegrind to benchmark cache miss % on specified function
│ .
├── launch_benchmark.sh # set parameters related to "benchmarkgpu.cu" and run the script on SLURM system
├── launch_main.sh # set parameters related to "maingpu.cu" and run the script on SLURM system
│ .
├── data # data gathered via "run_benchmark.sh" & "launch_benchmark.sh"
│ └── ...
├── plot_data.py # generates graphs using the data stored in "data" folder
│
├── Makefile
└── ...
```
Main commands
The Makefile defines 4 rules:
* make : builds the object files and the homework-1 + homework-2 executables
* make debug : builds the object files and ALL executables with debugging flags enabled
* make benchmark : builds the object files and the benchmark + benchmark_gpu executables
* make clean : removes all object files
There are several pre-set scripts to choose from:
* CPU scripts section ( Homework-1 )
* GPU scripts section ( Homework-2 )
CPU test commands ( Homework-1 )
NOTE
Move into the repository directory before running the scripts:
cd cudaMatrixTranspose
COMMANDS
The "run_main.sh" script sets the parameters for the homework-1 executable and runs it.
To change the run parameters and better understand what the script does, see run_main.sh.
make
./run_main.sh
The "run_benchmark.sh" script sets the parameters for the benchmark executable and runs it.
The extracted data can be found in the data folder.
To change the run parameters and better understand what the script does, see run_benchmark.sh.
make benchmark
./run_benchmark.sh
The "run_cache_benchmark.sh" script sets the parameters for the homework-1 executable and runs Cachegrind on it, extracting localized information about cache misses inside the transposenaive() or transposeblocks() functions (according to the chosen "method" parameter).
To change the run parameters and better understand what the script does, see run_cache_benchmark.sh.
make clean
make debug
./run_cache_benchmark.sh
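To make the cache-miss comparison above more concrete, here is a minimal sketch of a naive and a blocked in-place transpose for a square, row-major matrix. It is not the repository's code: the function names `transpose_naive_sketch` and `transpose_blocks_sketch`, the `float` element type, and the `bs` tile-size parameter are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two matrix elements. */
static void swap_elems(float *a, float *b) {
    float tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Naive in-place transpose of an n x n row-major matrix (hypothetical
 * name): every element above the diagonal is swapped with its mirror.
 * The accesses to m[j * n + i] stride through memory by n elements. */
void transpose_naive_sketch(float *m, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            swap_elems(&m[i * n + j], &m[j * n + i]);
        }
    }
}

/* Blocked variant (hypothetical name): the upper triangle is visited
 * tile by tile (bs x bs), so both the tile and its mirror tile are
 * more likely to stay in cache while they are being swapped. */
void transpose_blocks_sketch(float *m, size_t n, size_t bs) {
    assert(bs > 0);
    for (size_t bi = 0; bi < n; bi += bs) {
        for (size_t bj = bi; bj < n; bj += bs) {
            size_t i_end = (bi + bs < n) ? bi + bs : n;
            size_t j_end = (bj + bs < n) ? bj + bs : n;
            for (size_t i = bi; i < i_end; i++) {
                /* On diagonal tiles, start past the diagonal so that
                 * each pair is swapped exactly once. */
                size_t j_start = (bi == bj) ? i + 1 : bj;
                for (size_t j = j_start; j < j_end; j++) {
                    swap_elems(&m[i * n + j], &m[j * n + i]);
                }
            }
        }
    }
}
```

Both functions should produce the same result; only the memory access order differs, which is the kind of difference a tool such as Cachegrind is meant to expose.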
GPU test commands ( Homework-2 )
NOTE
Please note that the following commands are meant to be run on the Marzola DISI cluster; modify the launch_main.sh & launch_benchmark.sh scripts if you need to change the partition or the SLURM system.
From outside the cloned project folder, upload the project's directory to the login node:
scp -r cudaMatrixTranspose <YOUR USERNAME>@marzola.disi.unitn.it:/home/<YOUR USERNAME>
Then log in and move into the project's folder:
cd cudaMatrixTranspose
module load cuda
COMMANDS
The "launch_main.sh" script sets the parameters for the homework-2 executable and runs it.
To change the run parameters and better understand what the script does, see launch_main.sh.
make
sbatch launch_main.sh
To view the results once the job has finished, run:
cat output.out
The "launch_benchmark.sh" script sets the parameters for the benchmark_gpu executable and runs it.
The extracted data can be found in the data folder.
To change the run parameters and better understand what the script does, see launch_benchmark.sh.
make benchmark
sbatch launch_benchmark.sh
Graph Plotting
The project's directory also contains a Python script which takes the content of the data folder and generates 2 types of graphs:
- x : Matrix size - y : Mean execution time
- x : Matrix size - y : Mean effective bandwidth
Test it by running (on your own device) :
python3 plot_data.py
You can customize what information to plot inside the script
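For reference, effective bandwidth for a transpose is commonly computed as the bytes read plus the bytes written, divided by the execution time. The sketch below assumes that definition and a square n x n matrix; the function name `effective_bandwidth_gbs` and the GB/s unit (10^9 bytes per second) are assumptions, and the repository may compute the metric differently.

```c
#include <assert.h>
#include <stddef.h>

/* Effective bandwidth in GB/s for transposing an n x n matrix,
 * assuming each element is read once and written once
 * (hypothetical helper, not taken from the repository). */
double effective_bandwidth_gbs(size_t n, size_t elem_size, double seconds) {
    assert(seconds > 0.0);
    double bytes_moved = 2.0 * (double)n * (double)n * (double)elem_size;
    return bytes_moved / seconds / 1e9;
}
```

As a usage example, a 1024 x 1024 matrix of 4-byte elements transposed in 1 ms moves 8,388,608 bytes, which corresponds to roughly 8.39 GB/s.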
Extra Customization
It's also possible to change some other parameters at compile time (optimization level and matrix element data type) by changing some variables in the Makefile.
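One common way to make the element data type selectable at compile time is a preprocessor macro that the build passes with `-D`. The sketch below illustrates that general pattern only: the macro name `MATRIX_ELEM_TYPE`, the typedef `matrix_elem_t`, and the helper `matrix_bytes` are hypothetical, and the repository's Makefile may use different variable names or a different mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: falls back to float unless the build overrides
 * it, e.g. with a compiler flag such as -DMATRIX_ELEM_TYPE=double. */
#ifndef MATRIX_ELEM_TYPE
#define MATRIX_ELEM_TYPE float
#endif

typedef MATRIX_ELEM_TYPE matrix_elem_t;

/* Number of bytes needed to store a rows x cols matrix of the
 * selected element type. */
size_t matrix_bytes(size_t rows, size_t cols) {
    /* Guard against size_t overflow in the multiplication below. */
    assert(cols == 0 || rows <= SIZE_MAX / sizeof(matrix_elem_t) / cols);
    return rows * cols * sizeof(matrix_elem_t);
}
```

With this pattern, the rest of the code refers only to `matrix_elem_t`, so switching between element types requires a rebuild but no source changes.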
Owner
- Login: LuCazzola
- Kind: user
- Repositories: 1
- Profile: https://github.com/LuCazzola
Citation (CITATION.cff)
@software{matTrans_LucaC,
author = {Luca Cazzola},
month = {5},
title = {{cuda inplace matrix transpose}},
url = {https://github.com/LuCazzola/cudaMatrixTranspose},
version = {1.0},
year = {2024}
}