https://github.com/cornell-zhang/allo-pldi24-artifact
Artifact evaluation of PLDI'24 paper "Allo: A Programming Model for Composable Accelerator Design"
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references — Found 3 DOI reference(s) in README
- ✓ Academic publication links — Links to: arxiv.org, ieee.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity — Low similarity (15.0%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 11
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Allo Artifact
This repository contains scripts for setting up environments and reproducing results presented in the PLDI 2024 paper entitled Allo: A Programming Model for Composable Accelerator Design. If you wish to access the core implementation, documentation, and tutorials for the Allo language, please refer to the following links. We encourage you to explore these resources if you are interested in using Allo for designing other hardware accelerators that are not presented in our paper.
- Allo repository: https://github.com/cornell-zhang/allo
- Allo documentation: https://cornell-zhang.github.io/allo
Clone the Repository
Please clone the repository following the instructions below. You can optionally include the --recursive flag to download all baseline systems under the 3rdparty directory. Notice this may take a while, since most of the baselines require a separate LLVM build. For AE, clone this repo without downloading the baseline systems; instead, use the provided docker image, which already has the required environment set up.
```bash
git clone https://github.com/cornell-zhang/allo-pldi24-artifact.git
cd allo-pldi24-artifact
```
Setup Environment (Est. Time: 30 mins)
We have already built a docker image that contains necessary packages, including the Allo library and the baseline systems. Since our experiments involve using the AMD Vitis toolchain for FPGA synthesis, we also require the reviewers to install the Vitis 2022.1 toolchain. Below are the instructions for setting up the environment.
Allo and Baseline Systems
Pull Docker Image from Docker Hub
We provide a pre-built docker image available on Docker Hub. Users can pull it with:
```bash
docker image pull alloprj/allo-container:latest
docker tag alloprj/allo-container:latest allo-container
```
Build from source
If you do not want to use the pre-built docker, we also provide detailed instructions on how to build the baseline systems from scratch. Please refer to 3rdparty/README.md for more information. For AE, there's no need to build from source. All tools are provided in the docker container.
Vitis Toolchain
The experiments require the Vitis 2022.1 toolchain for FPGA synthesis. We provide two options for users to obtain the Vitis toolchain. The first option is to use the pre-configured Vitis toolchain volume that we provide, which can be downloaded from our Drive, but it may take about 5 hours to download the whole package. The second option is to download the Vitis toolchain from the official website. The downloading speed is probably faster, but you need to register an AMD account and install it manually on your machine, which may also take about 5 hours to complete. Reviewers can choose either option based on their preference.
Vitis Toolchain Volume (For AE Only)
We provide a pre-configured Vitis toolchain installation volume that can be downloaded from this Drive. Notice this is only for artifact evaluation and should not be used for other purposes. This link will become invalid after the artifact evaluation period.
Please verify the md5 checksum of the downloaded zip file:
```bash
$ md5sum vitis-docker-volume.zip
a52f2b2cb5e6a6eae44243a0fa1774d5  vitis-docker-volume.zip
```
The image is about 66GB and can be unzipped using the following command:
```bash
unzip vitis-docker-volume.zip
```
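If `md5sum` is unavailable on your platform, the same integrity check can be scripted in Python. This is a minimal sketch: the expected digest is the one listed above, and the file is read in chunks so the ~66 GB archive never has to fit in memory.

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, streaming it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Expected checksum published above; uncomment to verify the download:
EXPECTED = "a52f2b2cb5e6a6eae44243a0fa1774d5"
# assert md5_of("vitis-docker-volume.zip") == EXPECTED
```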
If you need full access to the Vitis toolchain (not necessary for AE), please download it from the official website following the instruction below.
Download from Official Website
To fully utilize the Vitis toolchain, please download from the official website. We recommend using the Vitis 2022.1 version, which is the version we used for our experiments. Notice that you need to register an AMD account to download the toolchain.
Please visit the "Vitis Core Development Kit - 2022.1 Full Product Installation" webpage and download the "Xilinx Unified Installer 2022.1: Linux Self Extracting Web Installer (BIN - 266.73 MB)". You can follow the official instructions to install the Vitis toolchain.
After installing, you need to remember the path to the Vitis toolchain, which will be used to mount the volume to the docker image. For example, if you install the Vitis toolchain to /opt/xilinx/2022.1/, then the path to the Vitis toolchain is /opt/xilinx/2022.1/, and you can see Vitis_HLS and Vivado folders under this directory.
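A quick way to confirm you remembered the right path is to check for the two subdirectories mentioned above. This is a small sketch (the example path is illustrative, not required):

```python
from pathlib import Path

def looks_like_vitis_install(root):
    """Sanity check: a Vitis 2022.1 install directory should contain
    Vitis_HLS and Vivado subdirectories, as noted above."""
    base = Path(root)
    return all((base / sub).is_dir() for sub in ("Vitis_HLS", "Vivado"))

# Example (adjust to your actual install path):
# looks_like_vitis_install("/opt/xilinx/2022.1/")
```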
Kick-the-Tires (Est. Time: 10 mins)
To make sure the environment runs smoothly, we first examine if the docker and the required packages are installed correctly. You can launch the docker image with the following command. Notice that you need to replace /your/path/to/vitis-docker-volume/ with the path to the Vitis toolchain that you downloaded from the official website or the path to the unzipped volume.
```bash
docker run -v /your/path/to/vitis-docker-volume/:/tools/xilinx -it allo-container:latest /bin/bash
```
> [!NOTE]
> If you already have Vitis 2022.1 installed on your system and would like to use it inside the AE docker container, you must change the mounting destination to be exactly the same as your installation path. For example, if your Vitis installation path is `/tools/Xilinx/`, you have to change the docker command to `docker run -v /tools/Xilinx:/tools/Xilinx -it allo-container:latest /bin/bash`.
Check Vitis Toolchain
Inside the docker image, you can check if the Vitis toolchain is installed correctly by running the following command. The output should be the path to the vitis_hls binary.
```bash
(py312) root@00ad1aec9529:~/allo# which vitis_hls
/tools/xilinx/Vitis_HLS/2022.1/bin/vitis_hls
```
> [!NOTE]
> If you mounted your own Vitis installation to the docker container, you need to source the setup script before checking `vitis_hls`. For example, if your installation path is `/tools/Xilinx`, you will need to run `source /tools/Xilinx/Vitis_HLS/2022.1/settings64.sh` inside the container.
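The same PATH check can be done programmatically, which is handy in setup scripts. A minimal sketch (the helper name is our own, not part of the artifact):

```python
import shutil

def check_tool(name="vitis_hls"):
    """Return the resolved path of `name` if it is on PATH, else None.
    Inside the container this should point at the mounted toolchain."""
    path = shutil.which(name)
    if path is None:
        print(f"{name} not found on PATH; did you source settings64.sh?")
    return path
```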
Run Unit Tests
We will run some basic tests on the artifact to make sure the environment is set up correctly. Inside the docker image, you can use the following command to run the unit tests:
```bash
cd /root/allo && python3 -m pytest tests
```
You are expected to see the following outputs:
```
================================== 197 passed, 1 skipped, 62 warnings in 86.76s (0:01:26) ===================================
```
Congratulations! You have successfully set up the environment and passed the basic tests :) You can now proceed to the next step to reproduce the experiments in the paper.
Reproduce Experiments (Est. Time: 20 hours)
Below are the scripts to reproduce experiments in Allo paper. Each script will emit logging files that are used to generate the final results.
We provide scripts for reproducing the results of Figure 10, Table 3, and Table 4, which constitute the main conclusions of our paper. Figure 12 requires a U280 FPGA to run the experiment. Due to campus security settings, we are unable to grant reviewers access to the machine, so Figure 12 is not part of the artifact evaluation. However, we provide the code and instructions for users with access to FPGA machines. Additionally, the synthesis report is included for reference.
First, run the following command to enter the docker image. It mounts the Vitis toolchain and the AE folder into the docker container, so please make sure you have already cloned this repository and downloaded the Vitis toolchain. We highly recommend launching a tmux terminal before entering the docker image, as the experiments may take a long time to finish. Also remember to replace /your/path/to/vitis-docker-volume/ in the following command with the path to the Vitis toolchain, and make sure you are in the allo-pldi24-artifact directory.
```bash
docker run -v /your/path/to/vitis-docker-volume/:/tools/xilinx -v $(pwd):/root/allo-pldi24-artifact -it allo-container:latest /bin/bash
```
> [!NOTE]
> If you would like to use your own Vitis installation, please make sure the Vitis mounting destination path inside the docker container is the same as your system installation path. For example, if your Vitis installation path is `/tools/Xilinx/`, you have to change the docker command to `docker run -v /tools/Xilinx:/tools/Xilinx -v $(pwd):/root/allo-pldi24-artifact -it allo-container:latest /bin/bash`. You also need to source the setup script for Vitis HLS before reproducing the experiment results.
Figure 10 - PolyBench (Est. Time: 16 hours)
Since installing all the baseline packages requires more than 200 GB of disk space, we do not include them in the docker image, and thus do not perform end-to-end code generation, which is also time-consuming. Instead, we generate the optimized HLS C++ code from each baseline offline, so reviewers only need to run Vitis HLS to obtain the final report. Specifically, the HLS C++ code from ScaleHLS, HeteroCL, Pylog, and Dahlia is generated offline.
For end-to-end testing (not for AE), we also include the scripts and instructions to generate the optimized HLS C++ code from each baseline. Please refer to 3rdparty/README.md to install the required packages and generate the optimized HLS C++ code.
Experimental Settings
We evaluate the following benchmarks from PolyBench. The reference implementations are available in PolyBenchC-4.2.1:
- 2mm
- 3mm
- atax
- bicg
- correlation
- gemm
- gesummv
- jacobi-2d
- mvt
- symm
- syr2k
- syrk
- trmm
Each folder contains the original baseline implementation files and the generated HLS C++ files. You can go to each folder to check the details. For example, polybench/allo/2mm is the Allo implementation for the 2mm benchmark, and there are three files under this folder:
* two_mm.py: The Allo implementation
* two_mm.cpp: The generated HLS C++ file
* run.tcl: The Vitis HLS script to run the synthesis
Run the Experiment
Inside the docker, you can use the following command to run the experiment:
```bash
cd /root/allo-pldi24-artifact/polybench

# Allo (Est. Time: 6 hours)
cd allo && python3 run.py

# ScaleHLS (Est. Time: 4 hours)
cd ../scalehls && python3 run.py

# HeteroCL (Est. Time: 3 hours)
cd ../heterocl && python3 run.py

# Pylog (Est. Time: 1 hour)
cd ../pylog && python3 run.py

# Dahlia (Est. Time: 1 hour)
cd ../dahlia && python3 run.py

# Vitis (Est. Time: 1 hour)
cd ../vitis && python3 run.py
```
If you want to speed up the above experiments, you can invoke multiple terminals/processes to run the experiments in parallel. Since Vitis HLS only leverages one CPU core to run the synthesis, experiments of different frameworks can be run in parallel without impacting the final results. You can also call polybench/run.sh to automatically run all the above commands in one script.
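The per-framework runs are independent, so they can also be launched programmatically in parallel instead of via separate terminals. A minimal sketch, assuming the framework directory names from the layout above and a `run.py` in each (the `cmd` parameter is our own addition for flexibility):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Framework directories under polybench/, per the layout described above.
FRAMEWORKS = ["allo", "scalehls", "heterocl", "pylog", "dahlia", "vitis"]

def run_framework(name, cmd=("python3", "run.py")):
    """Launch one framework's run script inside its folder; return the exit code."""
    return subprocess.run(list(cmd), cwd=name).returncode

def run_all(frameworks=FRAMEWORKS, workers=6):
    """Run all frameworks concurrently. Vitis HLS uses one CPU core per
    synthesis job, so parallel campaigns do not skew each other's results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(frameworks, pool.map(run_framework, frameworks)))
```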
The results will be dumped to each folder. Lastly, we can call the following command to collect the results and generate the final figure.
```bash
cd /root/allo-pldi24-artifact/polybench/plot/

# First, build the result dataset with the results from the previous experiments
python build_dataset.py

# Then, we are ready to generate the latency plot
python latency_plot.py
```
The result figure will be generated to: /root/allo-pldi24-artifact/polybench/plot/polybench.pdf.
End-to-end Generation (Not for AE)
The polybench folder contains all the required code to generate the optimized HLS C++ code from each baseline. Please refer to 3rdparty/README.md to install the required packages; the optimized C++ code can then be generated using the following commands:
| Frameworks | Commands |
| --- | --- |
| Allo | python <allo_code>.py |
| ScaleHLS | bash run_dse.sh |
| Pylog | python <pylog_code>.py |
| Dahlia | Copy <dahlia_code>.fuse to the website |
| HeteroCL | python <heterocl_code>.py |
Table 3 (Est. Time: 10 min)
Please run the Figure 10 experiments first to obtain the necessary report files. For latency, II, and DSP usage, the results are obtained from the HLS report files produced by the above experiments. For frequency, since each design may take more than 2 hours to run placement and routing (PnR), we directly provide the PnR results under the polybench/allo/pnr and polybench/scalehls/pnr directories. Compilation time is not part of the AE, as it highly depends on the CPUs that run this artifact. Moreover, since ScaleHLS leverages randomized heuristic search, its compilation time is not deterministic and may vary from case to case.
To generate Table 3, please run the following commands inside the docker container:
```bash
cd /root/allo-pldi24-artifact/polybench
python3 report.py
```
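If you want to spot-check a single resource figure by hand instead of running report.py, it can be grepped out of a report file. This sketch assumes a tabular utilization section with rows like `|  DSP48E  |   12 | ... |`; the actual report layout may differ, and report.py in this repo is the authoritative extractor:

```python
import re

def dsp_usage(report_text):
    """Pull the DSP count from an HLS report's utilization table.
    Assumes a pipe-delimited row beginning with the DSP48E resource name."""
    m = re.search(r"^\|\s*DSP48E?\s*\|\s*(\d+)", report_text, re.MULTILINE)
    return int(m.group(1)) if m else None
```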
We also list the reference code below for counting the number of lines of customization code (LoC) for each benchmark. Empty lines and comments are not counted.
| Benchmark | LoC | Reference |
| --- | --- | --- |
| atax | 9 | https://github.com/cornell-zhang/allo-pldi24-artifact/blob/main/polybench/allo/atax/atax.py#L41-L53 |
| correlation | 19 | https://github.com/cornell-zhang/allo-pldi24-artifact/blob/main/polybench/allo/correlation/correlation.py#L114-L136 |
| jacobi-2d | 17 | https://github.com/cornell-zhang/allo-pldi24-artifact/blob/main/polybench/allo/jacobi2d/jacobi2d.py#L53-L71 |
| symm | 15 | https://github.com/cornell-zhang/allo-pldi24-artifact/blob/main/polybench/allo/symm/symm.py#L52-L68 |
| trmm | 12 | https://github.com/cornell-zhang/allo-pldi24-artifact/blob/main/polybench/allo/trmm/trmm.py#L37-L50 |
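The counting convention above (skip empty lines and comments over a 1-indexed, inclusive line range) can be reproduced with a short script. A sketch, assuming Python-style `#` comments as in the linked files:

```python
def count_loc(lines, start=None, end=None):
    """Count non-empty, non-comment lines in lines[start..end]
    (1-indexed, inclusive), mirroring the LoC convention above."""
    selected = lines[(start - 1) if start else 0 : end]
    return sum(1 for ln in selected
               if ln.strip() and not ln.strip().startswith("#"))
```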
Backend Synthesis (Not for AE)
If you indeed want to run the placement and routing for these designs, you can run the Allo code in a push-button way, which will automatically invoke the Vitis toolchain for synthesis. For example, running the PnR for the atax benchmark:
```bash
cd /root/allo-pldi24-artifact/polybench/allo/atax
python3 atax.py
```
Table 4 - CNN (Est. Time: 3 hours)
Next, we run the experiments for multiple kernels. We use three CNN models, MobileNet, ResNet18, and VGG16, to evaluate the performance of Allo. The scripts to run the experiments are provided below.
Please run the following commands in the docker container, and make sure you have launched a tmux session, as the experiments may take a long time to finish.
```bash
cd /root/allo-pldi24-artifact/cnn

# MobileNet
cd mobilenet
vitis_hls -f mobilenet_allo.tcl
vitis_hls -f mobilenet_scalehls.tcl

# ResNet18
cd ../resnet18
vitis_hls -f resnet18_allo.tcl
vitis_hls -f resnet18_scalehls.tcl

# VGG16
cd ../vgg16
vitis_hls -f vgg16_allo.tcl
vitis_hls -f vgg16_scalehls.tcl
```
After running all the neural network models, we can generate the result table by running:
```bash
cd /root/allo-pldi24-artifact/cnn
python3 plot.py
```
Figure 12 - LLM (Not for AE)
As this experiment requires a U280 FPGA for evaluation and takes approximately 24 hours to push the design from high-level synthesis through backend synthesis to a bitstream, it is NOT for AE purposes. However, we provide a reference HLS C++ code, generated from Allo, with modifications to fit on the chiplet-based FPGA. We also provide the place-and-route report under the reports directory. Reviewers can find the following results in this report, which match the right-hand-side table of Figure 12.
| Resources | Utilization | Report |
| --- | --- | --- |
| BRAM | 384 | https://github.com/cornell-zhang/allo-pldi24-artifact/blob/main/llm/reports/link/imp/impl_1_full_util_routed.rpt#L113 |
| DSP | 1780 | https://github.com/cornell-zhang/allo-pldi24-artifact/blob/main/llm/reports/link/imp/impl_1_full_util_routed.rpt#L129 |
| FF | 652K | https://github.com/cornell-zhang/allo-pldi24-artifact/blob/main/llm/reports/link/imp/impl_1_full_util_routed.rpt#L45 |
| LUT | 508K | https://github.com/cornell-zhang/allo-pldi24-artifact/blob/main/llm/reports/link/imp/impl_1_full_util_routed.rpt#L84 |
To generate the hardware accelerator, please make sure you have set up the entire Vitis/Vivado toolchain and have the device .xpfm platform file set as the XDEVICE environment variable. You may also need to prepare the exported parameters from a pretrained model, which should be put under a const folder. After setting up the environment, use the following command to invoke high-level synthesis and backend synthesis:
```bash
make all TARGET=hw PLATFORM=$XDEVICE
```
It may take around a day to generate the final bitstream. After obtaining the bitstream, you can run the following command to deploy the bitstream to the FPGA and run the experiment:
```bash
make run TARGET=hw PLATFORM=$XDEVICE EMU_PS=X86
```
For the Allo frontend code, please refer to the example folder in the Allo repository, which describes how to import models (e.g., GPT2) from PyTorch and generate the corresponding Allo code.
Further Usage
Examples
We provide a comprehensive set of examples under the Allo repository. Please check out these examples and test cases if you are interested!
Tutorials
We also provide detailed documentation and tutorials for users who are interested in using Allo for designing other hardware accelerators. Please refer to this webpage for more information.
We highly recommend that reviewers go through the Allo Vivado HLS Backend tutorial, which walks through the entire process of implementing the row-wise product GEMM optimizations from the paper.
More information
For AE reviewers, please contact the authors through HotCRP for any questions. For other users, please open an issue publicly or contact chhzh123 for any technical questions.
Owner
- Name: Cornell Zhang Research Group
- Login: cornell-zhang
- Kind: organization
- Website: https://zhang.ece.cornell.edu/
- Repositories: 12
- Profile: https://github.com/cornell-zhang
GitHub Events
Total
- Issues event: 1
- Watch event: 13
Last Year
- Issues event: 1
- Watch event: 13