unisparse

Code base for OOPSLA'24 paper: UniSparse: An Intermediate Language for General Sparse Format Customization

https://github.com/cornell-zhang/unisparse

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: acm.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
    Organization cornell-zhang has institutional domain (zhang.ece.cornell.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: cornell-zhang
  • License: apache-2.0
  • Language: MLIR
  • Default Branch: main
  • Homepage:
  • Size: 204 MB
Statistics
  • Stars: 30
  • Watchers: 4
  • Forks: 0
  • Open Issues: 4
  • Releases: 1
Created over 4 years ago · Last pushed over 1 year ago
Metadata Files

README.md

UniSparse: An Intermediate Language for General Sparse Format Customization

UniSparse is an intermediate language and compiler that provides a unified abstraction for representing and customizing sparse formats. Compared to prior sparse linear algebra compilers, UniSparse decouples the logical representation of the sparse tensor (i.e., the data structure) from its low-level memory layout, enabling the customization of both. UniSparse improves over current programming models that only provide limited support for customized sparse formats.

This repository implements UniSparse as an independent dialect on top of the MLIR infrastructure. The UniSparse dialect allows users to declaratively specify format conversion and compute kernels using UniSparse format encodings. The compiler tool automatically lowers the program and generates format conversion routines and sparse linear algebra kernels.

Installation

We provide two ways to install UniSparse:
- Docker. The docker image has prebuilt LLVM/MLIR and baseline projects such as TACO. The image requires 14.5GB of space.
- Build from source. We provide a bash script that builds LLVM/MLIR and UniSparse in your local environment. Note that it requires 11GB of space.

Docker

First, pull the docker image from Docker Hub:
$ docker pull sibylau/mlir-llvm:oopsla24-ae
Note that this docker image is 14.5GB, so the download may take some time.
Then run a container from this docker image:
$ docker run -it --entrypoint bash sibylau/mlir-llvm:oopsla24-ae
Inside this container, clone UniSparse:
$ git clone https://github.com/cornell-zhang/UniSparse.git -b oopsla24-ae
Source the build script from the UniSparse project directory:
$ cd UniSparse && source script/build.sh
Also export the following environment variable:
$ export LD_LIBRARY_PATH=/install/taco/build/lib:$LD_LIBRARY_PATH

For a test run, generate sparse kernels and run them via
$ cd evaluation/KernelGeneration && bash run.sh

Build from source

Prerequisites:
- cmake v3.18.0 or newer
- ninja v1.10.1 or newer (optional)

Clone the UniSparse repo and declare the root paths of your Eigen, LLVM, and UniSparse projects:
$ export EIGEN_PATH=$YOUR_EIGEN_PATH
$ export LLVM_PATH=$YOUR_LLVM_PATH
$ export UNISPARSE_PATH=$YOUR_UNISPARSE_PATH
$ git clone https://github.com/cornell-zhang/UniSparse.git $UNISPARSE_PATH/UniSparse
Then run the install.sh script:
$ cd $UNISPARSE_PATH/UniSparse
$ bash install.sh
Finally, export the environment variables for LLVM/MLIR and UniSparse:
$ export CPATH=$EIGEN_PATH:$CPATH
$ export PATH=$LLVM_PATH/build/bin/:$PATH
$ export LD_LIBRARY_PATH=$LLVM_PATH/build/lib:$LD_LIBRARY_PATH
$ export PATH=$UNISPARSE_PATH/build/bin/:$PATH
$ export LD_LIBRARY_PATH=$UNISPARSE_PATH/build/lib:$LD_LIBRARY_PATH
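
Before running install.sh, it can help to confirm that the three path variables are set and that cmake meets the 3.18.0 minimum listed in the prerequisites. The sketch below is not part of the UniSparse repository: missing_vars and version_at_least are hypothetical helper names, and version_at_least assumes a sort that supports the -V option (as in GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical pre-install checks; not shipped with UniSparse.

# Print the name of every listed environment variable that is unset or empty.
missing_vars() {
  local name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Succeed if version $1 is at least version $2 (assumes `sort -V`).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (commented out so the helpers can be sourced safely):
#   missing_vars EIGEN_PATH LLVM_PATH UNISPARSE_PATH
#   version_at_least "$(cmake --version | awk 'NR==1 {print $3}')" 3.18.0 \
#     || echo "cmake is older than 3.18.0"
```

The first helper prints nothing when every variable is set, so an empty result means the exports above took effect.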

Getting Started

Format Conversion

In the UniSparse project, go to the format conversion directory and run the script:
$ cd evaluation/FormatConversion
$ bash run.sh
This compiles the format conversion programs generated by UniSparse and two baselines, MLIR SparseTensor and TACO, into executables.

Export the environment variable TENSOR0 to be the path of the desired sparse matrix before running the experiments. The docker environment includes several sparse matrix datasets downloaded from the SuiteSparse matrix collection under the path /install/datasets. For example, to run the CSR to CSC format conversion on the matrix /install/datasets/row_major/wiki-Vote_row_major.mtx in the docker environment, export TENSOR0 before running the program executable:
$ export TENSOR0=/install/datasets/row_major/wiki-Vote_row_major.mtx
$ ./csr_csc # UniSparse executable
$ ./sparse_tensor_csr_csc # SparseTensor dialect executable
$ ./taco_format_conversion /install/datasets/row_major/wiki-Vote_row_major.mtx 2 # TACO executable
The command line output shows the execution time in seconds for each format conversion kernel and dataset.
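
To repeat a conversion over several matrices, the TENSOR0 step can be wrapped in a loop. This is a minimal sketch, not a script from the repository: run_on_matrices is a hypothetical helper, and it assumes the executable reads its input path from TENSOR0 as described above and that the matrices are .mtx files in one directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one executable once per .mtx file in a directory,
# setting TENSOR0 for each run.
run_on_matrices() {
  local exe="$1" dir="$2" mtx
  for mtx in "$dir"/*.mtx; do
    [ -e "$mtx" ] || continue        # skip when the glob matches nothing
    echo "== $(basename "$mtx")"     # label the timing output that follows
    TENSOR0="$mtx" "$exe"            # TENSOR0 is set only for this command
  done
}

# Example usage inside the docker environment (commented out):
#   run_on_matrices ./csr_csc /install/datasets/row_major
```

Setting TENSOR0 as a per-command prefix keeps the caller's environment unchanged between runs.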

The pre-built executables can be found in evaluation/FormatConversion/executables, and the expected output is shown in evaluation/FormatConversion/output.log. The results show that the execution time of format conversion kernels generated by UniSparse is comparable to or better than that of the two baselines, while UniSparse also supports automated conversion for more custom formats, such as DCSC to BCSR, CSB to DIA_Variant, COO to C2SR, and COO to CISR, as claimed in the paper.

Kernel Generation

In the UniSparse project, go to the kernel generation directory and run the script:
$ cd evaluation/KernelGeneration
$ bash run.sh
This compiles the sparse kernels generated by UniSparse and two baselines, MLIR SparseTensor and TACO, into executables.

Export the environment variable TENSOR0 to be the path of the desired sparse matrix before running the experiments. For example, to run the SpMM kernel using the CSR format on the matrix /install/datasets/row_major/wiki-Vote_row_major.mtx in the docker environment, export TENSOR0 before running the program executable:
$ export TENSOR0=/install/datasets/row_major/wiki-Vote_row_major.mtx
$ ./unisparse_csr_spmm_F64 # UniSparse executable
$ ./sparse_tensor_csr_spmm_F64 # SparseTensor dialect executable
$ ./taco_csr_spmm /install/datasets/row_major/wiki-Vote_row_major.mtx # TACO executable
The command line output shows the execution time in seconds for each kernel and dataset.
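
The three CSR SpMM executables take their input differently: the UniSparse and SparseTensor binaries read TENSOR0, while the TACO binary takes the matrix path as an argument. A small wrapper can run all three on one matrix for a side-by-side reading of the timings. This is a sketch under assumptions: compare_spmm is a hypothetical helper, and the optional second argument (the directory holding the executables) is introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the three CSR SpMM executables named in the README
# on the same matrix and label each block of output.
compare_spmm() {
  local mtx="$1" bindir="${2:-.}"
  echo "[UniSparse]"
  TENSOR0="$mtx" "$bindir/unisparse_csr_spmm_F64"
  echo "[SparseTensor]"
  TENSOR0="$mtx" "$bindir/sparse_tensor_csr_spmm_F64"
  echo "[TACO]"
  "$bindir/taco_csr_spmm" "$mtx"     # TACO takes the path as an argument
}

# Example usage inside the docker environment (commented out):
#   compare_spmm /install/datasets/row_major/wiki-Vote_row_major.mtx
```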

The pre-built executables can be found in evaluation/KernelGeneration/executables, and the expected output is shown in evaluation/KernelGeneration/output.log. The results show that the execution time of sparse linear algebra kernels generated by UniSparse is comparable to or better than that of the two baselines. Specifically, kernels generated by UniSparse achieve similar performance to those generated by MLIR SparseTensor, as UniSparse reuses the kernel generation passes of MLIR SparseTensor, while TACO-generated kernels are typically slower than both. Note that all kernels use double precision and execute in a single thread. As mentioned in the rebuttal, the CSCCSCCSC_SpGEMM kernel generated by TACO is not functionally correct, so we do not include its results in the evaluation here.

Format & Kernel Expressibility Extension

In the UniSparse project, go to the reusability directory and run the script:
$ cd evaluation/Reusability
$ bash run.sh
This compiles format conversion and compute kernels generated by UniSparse for other formats not presented in the paper.

Export the environment variable TENSOR0 to be the path of the desired sparse matrix before running the experiments. For example, to run the CSR to DIA variant format conversion on the matrix /install/datasets/row_major/wiki-Vote_row_major.mtx in the docker environment, export TENSOR0 before running the program executable:
$ export TENSOR0=/install/datasets/row_major/wiki-Vote_row_major.mtx
$ ./csr_dia_v # UniSparse executable
The command line output shows the execution time in seconds for each kernel and dataset.

The pre-built executables can be found in evaluation/Reusability/executables, and the expected output is shown in evaluation/Reusability/output.log. We show three examples of kernels generated by UniSparse: an automatic conversion from CSR to the DIA variant format, SpMM using the CSC format, and the CSRCSCCSC_SpGEMM kernel. These three kernels demonstrate that UniSparse can be reused to generate format conversion kernels with more source and target formats, and compute kernels with diverse combinations of formats.

Citation

Please refer to our OOPSLA'24 paper for more details. If you use UniSparse in your research, please use the following bibtex entry to cite us:
@article{liu-unisparse-oopsla2024,
  author = {Liu, Jie and Zhao, Zhongyuan and Ding, Zijian and Brock, Benjamin and Rong, Hongbo and Zhang, Zhiru},
  title = {UniSparse: An Intermediate Language for General Sparse Format Customization},
  year = {2024},
  issue_date = {April 2024},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {8},
  number = {OOPSLA1},
  url = {https://doi.org/10.1145/3649816},
  doi = {10.1145/3649816},
  journal = {Proceedings of the ACM on Programming Languages},
  month = {April},
  articleno = {99},
  numpages = {29},
  keywords = {compilers, heterogeneous systems, programming languages, sparse data formats}
}

Owner

  • Name: Cornell Zhang Research Group
  • Login: cornell-zhang
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Liu"
  given-names: "Jie"
  orcid: "https://orcid.org/0000-0003-1534-3500"
- family-names: "Zhao"
  given-names: "Zhongyuan"
  orcid: "https://orcid.org/0000-0002-6637-553X"
- family-names: "Ding"
  given-names: "Zijian"
  orcid: "https://orcid.org/0009-0000-4555-2077"
- family-names: "Brock"
  given-names: "Benjamin"
  orcid: "https://orcid.org/0000-0003-1488-1622"
- family-names: "Rong"
  given-names: "Hongbo"
  orcid: "https://orcid.org/0000-0002-3275-7791"
- family-names: "Zhang"
  given-names: "Zhiru"
  orcid: "https://orcid.org/0000-0002-0778-0308"
title: "UniSparse: An Intermediate Language for General Sparse Format Customization"
version: 1.0.0
doi: 10.5281/zenodo.10464499
date-released: 2024-01-05
url: "https://github.com/cornell-zhang/UniSparse"

GitHub Events

Total
  • Watch event: 2
  • Push event: 1
Last Year
  • Watch event: 2
  • Push event: 1

Dependencies

.github/workflows/github-actions-build-test.yml actions
  • actions/checkout v3 composite