hb-pytorch
Repo to hold HammerBlade PyTorch port. Based on PyTorch v1.4.0
Science Score: 18.0%
This score indicates how likely this project is to be science-related, based on these indicators:
- ✓ CITATION.cff file (found)
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 13.3%)
Basic Info
- Host: GitHub
- Owner: cornell-brg
- License: other
- Language: C++
- Default Branch: master
- Size: 242 MB
Statistics
- Stars: 13
- Watchers: 20
- Forks: 10
- Open Issues: 17
- Releases: 0
Metadata Files
README.md

PyTorch HammerBlade Port

This work aims to port PyTorch to HammerBlade.
How to build PyTorch to use COSIM
This assumes that you have a working HB cosimulation environment installed through bsg_bladerunner. Then:
- Enable devtoolset-8 or any toolchain that supports C++14.
- Set the following variable to point to your bsg_bladerunner clone:
export BRG_BSG_BLADERUNNER_DIR=<path to bsg_bladerunner that has been set up>
Clone hb-pytorch repo:
git clone -b hb-device git@github.com:cornell-brg/hb-pytorch.git
Create a Python virtual environment:
python3.6 -m venv ./venv_pytorch
Install dependencies:
pip install --upgrade pip
pip install numpy pyyaml mkl mkl-include setuptools cmake cffi typing sklearn tqdm pytest ninja hypothesis thop pillow
Remove automatically installed PyTorch:
pip uninstall torch
Init pytorch third party dependencies:
git submodule update --init --recursive
Set up the build environment variables:
cd hb-pytorch && source setup_cosim_build_env.sh
Build PyTorch. This step can take up to 15 minutes:
python setup.py develop
The above command also compiles the device kernels with the RISC-V toolchain and installs the kernel binary. Optionally, the kernels can be compiled with Clang by running the following instead:
CLANG=1 python setup.py develop
Note that CLANG=1 must be present every time we build or rebuild hb-pytorch if we want the kernels compiled with Clang. To check whether the current build compiled the kernels with Clang, run:
readelf -p .comment <hb-pytorch-root>/build/c10/hammerblade/kernel.riscv
The output when compiled with Clang should be something like this:
String dump of section '.comment':
[ 0] clang version 10.0.0 (https://github.com/bespoke-silicon-group/llvm-project.git 3ee81f3def2c4c2a818f9f939f4421b3f3af313e)
[ 7a] GCC: (GNU) 9.2.0
- PyTorch can be used with cosim by running one of the following executables instead of python:
  - pycosim: runs Python with the cosim backend
  - pycosim.trace: enables device instruction trace
  - pycosim.wave: enables device instruction trace AND waveform dumps
For example, a PyTorch program foo.py can be executed with hb-pytorch's cosim backend using one of the following:
pycosim foo.py
pycosim.trace foo.py # To get HB device execution trace
pycosim.wave foo.py # To get HB device execution trace and RTL simulation waveform.
How to build PyTorch with Emulation Layer
Clone this repository:
git clone git@github.com:cornell-brg/hb-pytorch.git
Create a Python virtual environment:
python3 -m venv ./venv_pytorch
source ./venv_pytorch/bin/activate
Install some dependencies:
pip install numpy pyyaml mkl mkl-include setuptools cmake cffi typing sklearn tqdm pytest ninja hypothesis
Init PyTorch third party dependencies:
git submodule update --init --recursive
Set up the build environment variables:
source setup_emul_build_env.sh
Build PyTorch. This step can take up to 15 minutes:
python setup.py develop
Turn on emulation debug info
export HBEMUL_DEBUG=1
Set up the emulated HB device size:
export HBEMUL_TILE_X_DIM=16
export HBEMUL_TILE_Y_DIM=8
Run Pytests
- Go to the hb-pytorch pytest directory:
cd hb-pytorch/hammerblade/torch
- Run pytest:
python pytest_runner.py
Important files and directories related to HammerBlade
- Files used to run pytest (adapted from Baseline): hammerblade/fragments/, hammerblade/environment.mk, baseline-README.md, run-hb-pytest.sh (source this one to run pytest!), hammerblade/torch/
- HammerBlade device code: hammerblade/torch/kernel
- Pytest tests: hammerblade/torch/tests/
- Files that interact with the HammerBlade CUDALite runtime: c10/hammerblade/
How to implement a new kernel
- Register the kernel for HammerBlade with PyTorch by editing aten/src/ATen/native/native_functions.yaml:
```diff
 - func: sigmoid(Tensor self) -> Tensor
   use_c10_dispatcher: full
   supports_named_tensor: True
   variants: function, method
   dispatch:
     CPU: sigmoid
     CUDA: sigmoid
+    HammerBlade: sigmoid
     MkldnnCPU: mkldnn_sigmoid
```
- Add host code to aten/src/ATen/native/hammerblade/Sigmoid.cpp. Start with the simplest possible host code, without calling the kernel.
- Add tests to hammerblade/torch/tests/test_sigmoid.py.
- With the Emulation Layer, make sure the code compiles and the tests fail only because of incorrect results.
- Add kernel code to hammerblade/torch/kernel/kernel_sigmoid.cpp, again as simple as possible.
- Change the host code to be more realistic: call the kernel and do nothing else.
- Implement both the host and kernel code for real, assuming a 1x1 tile group.
- Make sure everything passes on the Emulation Layer, and write more tests. Then you are ready to create a PR!
- Make sure your code works on COSIM.
- Apply optimizations, such as parallelization.
### Kernel Development Tips
1. Maintaining two clones, one for emulation and one for cosim (e.g., hb-pytorch/ and hb-pytorch-cosim/), eases the burden of cosim evaluation. This requires two separate PyTorch environments as well (e.g., venv_pytorch and venv_pytorch_cosim). Ideally, you would only ever need to run cosim once, to debug an issue.
2. Use gdb extensively with emulation:
$ gdb python
(gdb) b tensorlib_sigmoid
(gdb) r -m pytest test_sigmoid.py
Linking becomes a bottleneck when rebuilding in a tight loop, so gdb can save a lot of time compared to printf debugging.
3. Sometimes new cpp files are not taken into account by cmake. Since kernel authors only ever need to add new files to either aten/src/ATen/native or hammerblade/torch/, running the following commands might solve the failure:
touch aten/src/ATen/CMakeLists.txt # New host code sources
touch c10/hammerblade/CMakeLists.txt # New device code sources
Native Profiling Tools
Native profiling tools provide ATen operator-level info, including a per-operator execution time breakdown and information about unimplemented HB operators.
To enable profiling tools, call torch.hammerblade.profiler.enable()
To disable profiling tools, call torch.hammerblade.profiler.disable()
To test if the profiling tools are currently running, call torch.hammerblade.profiler.is_in_ROI()
```python
import torch

# start of ROI
torch.hammerblade.profiler.enable()
x = torch.randn(10)
y = x + x
# end of ROI
torch.hammerblade.profiler.disable()
```
To read profiling data, call torch.hammerblade.profiler.stats()
By default, this returns a string of per ATen operator execution time (ExecTime) and unimplemented operators (Unimpl).
One may also pass in a list via the keyword argument key. Available options are ExecTime, ExecTime-Latex, ExecTime-Raw, and Unimpl:
```python
import torch

torch.hammerblade.profiler.enable()
x = torch.randn(10)
torch.hammerblade.profiler.disable()

print(torch.hammerblade.profiler.stats(key=['ExecTime-Raw'], trimming=True))
```
Here trimming is a "simulated time" correction mechanism.
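The profiler surface described above can be pictured with a small self-contained sketch. This is a hypothetical pure-Python stand-in, not the real hb-pytorch implementation; RoiProfiler, record(), and the timings below exist only for illustration of the enable/disable/stats pattern:

```python
import json
from collections import defaultdict

class RoiProfiler:
    """Toy stand-in for torch.hammerblade.profiler (illustration only)."""

    def __init__(self):
        self._in_roi = False
        self._exec_time = defaultdict(float)  # op name -> accumulated seconds
        self._unimpl = set()                  # ops with no HB implementation

    def enable(self):
        self._in_roi = True

    def disable(self):
        self._in_roi = False

    def is_in_ROI(self):
        return self._in_roi

    def record(self, op, seconds, implemented=True):
        # In the real tool this would be driven by the ATen dispatcher.
        if self._in_roi:
            self._exec_time[op] += seconds
            if not implemented:
                self._unimpl.add(op)

    def stats(self, key=("ExecTime", "Unimpl")):
        out = {}
        if "ExecTime" in key:
            out["ExecTime"] = dict(self._exec_time)
        if "Unimpl" in key:
            out["Unimpl"] = sorted(self._unimpl)
        return json.dumps(out)

prof = RoiProfiler()
prof.enable()                                    # start of ROI
prof.record("aten::add", 0.002)
prof.record("aten::sigmoid", 0.001, implemented=False)
prof.disable()                                   # end of ROI
prof.record("aten::mul", 0.005)                  # outside ROI: ignored
print(prof.stats())
```

Only operators recorded between enable() and disable() land in the stats, which is why bracketing the region of interest matters.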
HB Profiling
HB Kernel Call Logs
HB emulation can output a file with the list of kernel calls, along with associated data, in JSON format. It can be used as follows:
```python
import torch
import torch.hammerblade.kernel_logger as hblog

x = torch.rand(2, 2).hammerblade()
y = torch.rand(2, 2).hammerblade()

# Enables the log
hblog.enable()
print(x + y)
# Disables the log
hblog.disable()

# This is excluded from the log
print(x - y)

# Logs only the tensor add
print(hblog.json())

# Clears the above operations from the logger
hblog.clear()

hblog.enable()
print(x * y)
hblog.disable()

# Logs only the tensor mul
print(hblog.json())
```
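The enable/disable/json/clear surface used above can be sketched in a few lines of plain Python. This is a hypothetical stand-in (KernelLogger, log(), and the shapes are illustrative), not the real emulation-layer logger:

```python
import json

class KernelLogger:
    """Toy stand-in for torch.hammerblade.kernel_logger (illustration only)."""

    def __init__(self):
        self._enabled = False
        self._calls = []

    def enable(self):
        self._enabled = True

    def disable(self):
        self._enabled = False

    def log(self, kernel, arg_shapes):
        # The real emulation layer records this when a kernel is launched.
        if self._enabled:
            self._calls.append({"kernel": kernel, "args": arg_shapes})

    def json(self):
        return json.dumps(self._calls)

    def clear(self):
        self._calls = []

hblog = KernelLogger()
hblog.enable()
hblog.log("tensorlib_add", [[2, 2], [2, 2]])  # logged
hblog.disable()
hblog.log("tensorlib_sub", [[2, 2], [2, 2]])  # excluded: log disabled
print(hblog.json())                           # only the add call
```

The point of the pattern is that only kernel launches between enable() and disable() appear in the JSON dump.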
HB Key Kernel Charting
Chart provides a way to log the "execution chart" of key kernels in a workload.
To use Chart, one needs to register one or more ATen operator signatures:
```python
import torch

M = torch.randn(2, 3)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)

# reset chart
torch.hammerblade.profiler.chart.clear()

# add signature
torch.hammerblade.profiler.chart.add("at::Tensor at::CPUType::{anonymous}::addmm(const at::Tensor&, const at::Tensor&, const at::Tensor&, c10::Scalar, c10::Scalar)")

# turn on profiling
torch.hammerblade.profiler.enable()

# run addmm
torch.addmm(M, mat1, mat2)

# end profiling
torch.hammerblade.profiler.disable()

# dump chart
print(torch.hammerblade.profiler.chart.json())
```
The output should be:
```json
[
  {
    "offload": false,
    "signature": "at::Tensor at::CPUType::{anonymous}::addmm(const at::Tensor&, const at::Tensor&, const at::Tensor&, c10::Scalar, c10::Scalar)"
  }
]
```
HB Key Kernel Redispatching
One may choose to redispatch a kernel that would normally run on the CPU to HB with Route. Route takes in the JSON produced by Chart. To redispatch a kernel, one just needs to change "offload": false to "offload": true:
```python
import json
import torch

M = torch.randn(2, 3)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)

route = """[
  {
    "offload": false,
    "signature": "at::Tensor at::CPUType::{anonymous}::addmm(const at::Tensor&, const at::Tensor&, const at::Tensor&, c10::Scalar, c10::Scalar)"
  },
  {
    "offload": true,
    "signature": "at::Tensor at::CPUType::{anonymous}::add(const at::Tensor&, const at::Tensor&, c10::Scalar)"
  }
]"""
data = json.loads(route)
torch.hammerblade.profiler.route.set_route_from_json(data)

torch.hammerblade.profiler.enable()
torch.addmm(M, mat1, mat2)
# this add should be redispatched to HB
torch.add(M, mat1)
torch.hammerblade.profiler.disable()
```
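Since Route consumes exactly the JSON that Chart emits, the offload flags can also be flipped programmatically instead of editing the string by hand. A stand-alone sketch (plain Python, no torch or HB backend needed; the signatures are copied from the example above):

```python
import json

# Chart output as shown above: one entry per registered signature
chart_json = """[
  {"offload": false, "signature": "at::Tensor at::CPUType::{anonymous}::addmm(const at::Tensor&, const at::Tensor&, const at::Tensor&, c10::Scalar, c10::Scalar)"},
  {"offload": false, "signature": "at::Tensor at::CPUType::{anonymous}::add(const at::Tensor&, const at::Tensor&, c10::Scalar)"}
]"""

route = json.loads(chart_json)

# Flip the add kernel to run on HB; leave addmm on the CPU
for entry in route:
    if "::add(" in entry["signature"]:
        entry["offload"] = True

print(json.dumps(route, indent=2))
```

The resulting list can then be handed to the route mechanism in place of a hand-written JSON string.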
Owner
- Name: Batten Research Group
- Login: cornell-brg
- Kind: organization
- Repositories: 46
- Profile: https://github.com/cornell-brg
Computer Systems Laboratory, Cornell University
Citation (CITATION)
@inproceedings{paszke2017automatic,
title={Automatic Differentiation in {PyTorch}},
author={Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and Antiga, Luca and Lerer, Adam},
booktitle={NIPS Autodiff Workshop},
year={2017}
}
Dependencies
- androidx.appcompat:appcompat 1.0.0 implementation
- com.android.support:appcompat-v7 28.0.0 implementation
- com.facebook.fbjni:fbjni-java-only 0.0.3 implementation
- com.facebook.soloader:nativeloader 0.8.0 implementation
- com.google.code.findbugs:jsr305 3.0.1 implementation
- com.google.code.findbugs:jsr305 3.0.1 compileOnly
- com.facebook.soloader:nativeloader 0.8.0 implementation
- com.android.support:appcompat-v7 28.0.0 implementation
- com.facebook.soloader:nativeloader 0.8.0 implementation
- com.google.code.findbugs:jsr305 3.0.1 compileOnly
- com.facebook.fbjni:fbjni-java-only 0.0.3 implementation
- com.facebook.soloader:nativeloader 0.8.0 implementation
- junit:junit 4.12 testImplementation
- com.android.support:appcompat-v7 28.0.0 implementation
- com.android.support:appcompat-v7 28.0.0 implementation
- actions/checkout v1 composite
- actions/setup-python v1 composite
- pytorch/add-annotations-github-action master composite
- ubuntu ${UBUNTU_VERSION} build
- nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION} build