Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.7%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: dadeba
  • License: bsd-3-clause
  • Language: C++
  • Default Branch: main
  • Size: 58.6 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

GEMM Routines in Posit for GPUs

We have ported the addition and multiplication routines from SoftPosit to OpenCL kernels, and built GEMM routines in 32-bit Posit arithmetic on top of them. These programs were used for the benchmarks in our paper presented at HPC Asia 2024. The paper is also available on arXiv.

Part of the code is derived from MPLAPACK.
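
To illustrate the number format the kernels operate on, here is a minimal C++ sketch that decodes a Posit(32,2) bit pattern into a `double`. It is not taken from this repository or from SoftPosit; the function name `posit32_to_double` is hypothetical, and the code only follows the general posit layout (sign, regime, 2 exponent bits, fraction).

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of this repository or SoftPosit):
// decode a Posit(32,2) bit pattern into a double.
double posit32_to_double(std::uint32_t bits) {
    if (bits == 0u) return 0.0;                       // unique zero
    if (bits == 0x80000000u)                          // NaR ("not a real")
        return std::numeric_limits<double>::quiet_NaN();

    const bool negative = (bits & 0x80000000u) != 0u;
    if (negative) bits = ~bits + 1u;                  // two's complement gives the magnitude

    // Drop the sign bit; the remaining 31 bits are now left-aligned.
    std::uint32_t body = bits << 1;

    // Regime: a run of identical bits, ended by the opposite bit or the end of the word.
    const bool regime_bit = (body & 0x80000000u) != 0u;
    int run = 0;
    while (run < 31 && ((body & 0x80000000u) != 0u) == regime_bit) {
        body <<= 1;
        ++run;
    }
    if (run < 31) body <<= 1;                         // skip the terminating bit
    const int k = regime_bit ? run - 1 : -run;

    // Two exponent bits (es = 2); bits cut off by a long regime read as zero.
    const int exponent = static_cast<int>(body >> 30);
    body <<= 2;

    // The remaining bits are the fraction, with an implicit leading 1.
    const double fraction = static_cast<double>(body) / 4294967296.0;
    const double magnitude = std::ldexp(1.0 + fraction, 4 * k + exponent);
    return negative ? -magnitude : magnitude;
}
```

For example, the bit pattern `0x40000000` decodes to 1.0 and `0x48000000` to 2.0. Because the regime length varies, values near 1 keep more fraction bits than values with large or small magnitude.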

Build Instructions

  1. Clone the SoftPosit repository:

         $ git clone https://gitlab.com/cerlane/SoftPosit.git

  2. Apply the patch and build SoftPosit:

         $ cd SoftPosit
         $ patch -p1 < ../SoftPosit.patch
         $ cd build/Linux-x86_64-GCC
         $ make
         $ cd ../../..

  3. Build the project:

         $ make

Test Programs

All programs use GEMM routines in 32-bit Posit arithmetic. You can specify the blocking size for the GEMM routines through an environment variable.

Example of setting the block size to 16:

    $ export OPENCL_GEMM_BLOCKSIZE=16

The performance of all programs can vary slightly depending on the blocking size.
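
As a rough illustration of how a host program might pick up this setting, the following C++ sketch reads `OPENCL_GEMM_BLOCKSIZE` and falls back to a default when the variable is unset or invalid. The function names, the fallback value of 16, and the upper bound of 1024 are assumptions made for this example; the repository's actual parsing logic may differ.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse a block size from text, returning `fallback`
// when the text is missing, not a plain decimal integer, or out of range.
int parse_block_size(const char* text, int fallback) {
    if (text == nullptr || *text == '\0') return fallback;
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    // Reject overflow, trailing garbage, and non-positive or very large sizes.
    // The 1024 limit is an arbitrary bound chosen for this sketch.
    if (errno != 0 || *end != '\0' || value <= 0 || value > 1024) return fallback;
    return static_cast<int>(value);
}

// Hypothetical wrapper: read the block size from the environment.
// The default of 16 is an assumption, not a documented value.
int gemm_block_size_from_env(int fallback = 16) {
    return parse_block_size(std::getenv("OPENCL_GEMM_BLOCKSIZE"), fallback);
}
```

Validating the value before use keeps a mistyped setting from silently producing a zero or negative tile size.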

GEMM

  • run_gemm
  • run_gemm_trailing
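
To show what the blocking size controls, here is a minimal CPU sketch of a blocked GEMM in C++. It uses `float` as a stand-in for the Posit type and plain loops instead of OpenCL, so it is only an illustration of tiling under those assumptions, not the kernel used in this repository; the function name `blocked_gemm` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration: C (m x n) += A (m x k) * B (k x n), row-major.
// `block` is the tile edge length; float stands in for the Posit type.
void blocked_gemm(std::size_t m, std::size_t n, std::size_t k,
                  const std::vector<float>& a,
                  const std::vector<float>& b,
                  std::vector<float>& c,
                  std::size_t block) {
    if (block == 0) block = 1;  // guard against a zero step
    for (std::size_t i0 = 0; i0 < m; i0 += block) {
        const std::size_t i1 = std::min(i0 + block, m);
        for (std::size_t j0 = 0; j0 < n; j0 += block) {
            const std::size_t j1 = std::min(j0 + block, n);
            for (std::size_t p0 = 0; p0 < k; p0 += block) {
                const std::size_t p1 = std::min(p0 + block, k);
                // Multiply one tile of A by one tile of B into a tile of C.
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t p = p0; p < p1; ++p) {
                        const float aip = a[i * k + p];
                        for (std::size_t j = j0; j < j1; ++j) {
                            c[i * n + j] += aip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Each tile touches only a `block` by `block` window of the operands, which is why the best block size depends on the memory hierarchy of the device.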

LU decomposition

  • run_lu
  • run_lu_bench
  • run_lu_check
  • run_lu_power_bench

Cholesky decomposition

  • run_cho
  • run_cho_bench
  • run_cho_check

Reference

    @inproceedings{10.1145/3635035.3635046,
      author = {Nakasato, Naohito and Murakami, Yuki and Kono, Fumiya and Nakata, Maho},
      title = {Evaluation of POSIT Arithmetic with Accelerators},
      year = {2024},
      isbn = {9798400708893},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      url = {https://doi.org/10.1145/3635035.3635046},
      doi = {10.1145/3635035.3635046},
      booktitle = {Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region},
      pages = {62–72},
      numpages = {11},
      location = {Nagoya, Japan},
      series = {HPCAsia '24}
    }

Owner

  • Name: N.Nakasato
  • Login: dadeba
  • Kind: user
  • Location: Japan
  • Company: University of Aizu

Citation (CITATION.bib)

@inproceedings{10.1145/3635035.3635046,
author = {Nakasato, Naohito and Murakami, Yuki and Kono, Fumiya and Nakata, Maho},
title = {Evaluation of POSIT Arithmetic with Accelerators},
year = {2024},
isbn = {9798400708893},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3635035.3635046},
doi = {10.1145/3635035.3635046},
abstract = {We present an evaluation of 32-bit POSIT arithmetic through its implementation as accelerators on FPGAs and GPUs. POSIT, a floating-point number format, adaptively changes the size of its fractional part. We developed hardware designs for FPGAs and software for GPUs to accelerate linear algebra operations using Posit(32,2) arithmetic. Our FPGA- and GPU-based accelerators in Posit(32,2) arithmetic significantly accelerated the Cholesky and LU decomposition algorithms for dense matrices. In terms of numerical accuracy, Posit(32,2) arithmetic is approximately 0.5 - 1.0 digits more accurate than the standard 32-bit format, especially when the norm of the elements of the input matrix is close to 1. Evaluating power consumption, we observed that the power efficiency of the accelerators ranged between 0.043 - 0.076 Gflops/watts for the LU decomposition in Posit(32,2) arithmetic. The power efficiency of the latest GPUs as accelerators of Posit(32,2) arithmetic is better than that of the evaluated FPGA chip.},
booktitle = {Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region},
pages = {62–72},
numpages = {11},
location = {Nagoya, Japan},
series = {HPCAsia '24}
}
