hb-pytorch
Repo to hold HammerBlade PyTorch port. Based on PyTorch v1.4.0
Science Score: 18.0%
This score indicates how likely this project is to be science-related, based on these indicators:
- ✓ CITATION.cff file (found)
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 13.3%)
Basic Info
- Host: GitHub
- Owner: cornell-brg
- License: other
- Language: C++
- Default Branch: master
- Size: 242 MB
Statistics
- Stars: 13
- Watchers: 20
- Forks: 10
- Open Issues: 17
- Releases: 0
Metadata Files
README.md

PyTorch HammerBlade Port

This work aims to port PyTorch to HammerBlade.
How to build PyTorch to use COSIM
This assumes that you have a working HB cosimulation environment installed through bsg_bladerunner. Then:
- Enable devtoolset-8 or any toolchain that supports C++14.
- Set the following variable to point to your bsg_bladerunner clone:
export BRG_BSG_BLADERUNNER_DIR=<path to bsg_bladerunner that has been set up>
Clone hb-pytorch repo:
git clone -b hb-device git@github.com:cornell-brg/hb-pytorch.git
Create a Python virtual environment:
python3.6 -m venv ./venv_pytorch
Install dependencies:
pip install --upgrade pip
pip install numpy pyyaml mkl mkl-include setuptools cmake cffi typing sklearn tqdm pytest ninja hypothesis thop pillow
Remove automatically installed PyTorch:
pip uninstall torch
Init pytorch third party dependencies:
git submodule update --init --recursive
Set up the build environment variables:
cd hb-pytorch && source setup_cosim_build_env.sh
Build PyTorch. This step can take up to 15 minutes:
python setup.py develop
The above command also compiles the device kernels with the RISC-V toolchain and installs the kernel binary. Optionally, the kernels can be compiled with Clang by running the following instead:
CLANG=1 python setup.py develop
Note that CLANG=1 must be present every time we build or rebuild hb-pytorch if we want the kernels compiled with Clang. To check whether the current build compiled the kernels with Clang, run:
readelf -p .comment <hb-pytorch-root>/build/c10/hammerblade/kernel.riscv
The output when compiled with Clang should be something like this:
String dump of section '.comment':
[ 0] clang version 10.0.0 (https://github.com/bespoke-silicon-group/llvm-project.git 3ee81f3def2c4c2a818f9f939f4421b3f3af313e)
[ 7a] GCC: (GNU) 9.2.0
- PyTorch can be used with cosim by running one of the following executables instead of python:
  - pycosim: runs Python with the cosim backend
  - pycosim.trace: enables device instruction trace
  - pycosim.wave: enables device instruction trace AND waveform dumps
For example, a PyTorch program foo.py can be executed with hb-pytorch's cosim backend using one of the following:
pycosim foo.py
pycosim.trace foo.py # To get HB device execution trace
pycosim.wave foo.py # To get HB device execution trace and RTL simulation waveform.
How to build PyTorch with Emulation Layer
Clone this repository:
git clone git@github.com:cornell-brg/hb-pytorch.git
Create a Python virtual environment:
python3 -m venv ./venv_pytorch
source ./venv_pytorch/bin/activate
Install some dependencies:
pip install numpy pyyaml mkl mkl-include setuptools cmake cffi typing sklearn tqdm pytest ninja hypothesis
Init PyTorch third party dependencies:
git submodule update --init --recursive
Set up the build environment variables:
source setup_emul_build_env.sh
Build PyTorch. This step can take up to 15 minutes:
python setup.py develop
Turn on emulation debug info
export HBEMUL_DEBUG=1
Set up the emulated HB device size:
export HBEMUL_TILE_X_DIM=16
export HBEMUL_TILE_Y_DIM=8
Run Pytests
- Go to the hb-pytorch pytest directory:
cd hb-pytorch/hammerblade/torch
- Run pytest:
python pytest_runner.py
Important files and directories related to HammerBlade
- Files used to run pytest (adapted from Baseline): hammerblade/fragments/, hammerblade/environment.mk, baseline-README.md, run-hb-pytest.sh (source this one to run pytest!), hammerblade/torch/
- HammerBlade device code: hammerblade/torch/kernel
- Pytest tests: hammerblade/torch/tests/
- Files that interact with the HammerBlade CUDALite runtime: c10/hammerblade/
How to implement a new kernel
- Register the kernel for HammerBlade with PyTorch by editing aten/src/ATen/native/native_functions.yaml:
```diff
 - func: sigmoid(Tensor self) -> Tensor
   use_c10_dispatcher: full
   supports_named_tensor: True
   variants: function, method
   dispatch:
     CPU: sigmoid
     CUDA: sigmoid
+    HammerBlade: sigmoid
     MkldnnCPU: mkldnn_sigmoid
```
- Add host code to aten/src/ATen/native/hammerblade/Sigmoid.cpp. Start with the simplest possible host code, without calling the kernel.
- Add tests to hammerblade/torch/tests/test_sigmoid.py.
- With the Emulation Layer, make sure the code compiles and the tests fail only because of incorrect results.
- Add kernel code to hammerblade/torch/kernel/kernel_sigmoid.cpp, again as simple as possible.
- Change the host code to be more realistic: call the kernel and do nothing else.
- Implement both the host and kernel code for real, assuming a 1x1 tile group.
- Make sure everything passes on the Emulation Layer, and write more tests. Then you are ready to create a PR!
- Make sure your code works on COSIM.
- Apply optimizations, such as parallelization.
### Kernel Development Tips
1. Maintaining two clones, one for emulation and one for cosim (e.g., hb-pytorch/ and hb-pytorch-cosim/), eases the burden of cosim evaluation. This requires two separate PyTorch environments as well (e.g., venv_pytorch and venv_pytorch_cosim). Ideally, you would only ever need to run cosim once, to debug an issue.
2. Use gdb extensively with emulation:
$ gdb python
(gdb) b tensorlib_sigmoid
(gdb) r -m pytest test_sigmoid.py
Linking becomes a bottleneck when rebuilding in a tight loop, so gdb can save a lot of time compared to printf debugging.
3. Sometimes new cpp files are not taken into account by cmake. Since kernel authors only ever need to add new files to either aten/src/ATen/native or hammerblade/torch/, running the following commands might solve the failure:
touch aten/src/ATen/CMakeLists.txt # New host code sources
touch c10/hammerblade/CMakeLists.txt # New device code sources
Native Profiling Tools
Native profiling tools provide ATen operator-level info, including a per-operator execution time breakdown and information about unimplemented HB operators.
To enable profiling tools, call torch.hammerblade.profiler.enable()
To disable profiling tools, call torch.hammerblade.profiler.disable()
To test if the profiling tools are currently running, call torch.hammerblade.profiler.is_in_ROI()
```python
import torch

# start of ROI
torch.hammerblade.profiler.enable()
x = torch.randn(10)
y = x + x
# end of ROI
torch.hammerblade.profiler.disable()
```
To read profiling data, call torch.hammerblade.profiler.stats()
By default, this returns a string of per ATen operator execution time (ExecTime) and unimplemented operators (Unimpl).
One may also pass in a list via the keyword argument key. Available options are ExecTime, ExecTime-Latex, ExecTime-Raw, and Unimpl:
```python
import torch

torch.hammerblade.profiler.enable()
x = torch.randn(10)
torch.hammerblade.profiler.disable()

print(torch.hammerblade.profiler.stats(key=['ExecTime-Raw'], trimming=True))
```
Here trimming is a "simulated time" correction mechanism.
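The profiler surface described above can be pictured with a small self-contained sketch. This is a hypothetical pure-Python stand-in, not the real hb-pytorch implementation; RoiProfiler, record(), and the timings below exist only for illustration of the enable/disable/stats pattern:

```python
import json
from collections import defaultdict

class RoiProfiler:
    """Toy stand-in for torch.hammerblade.profiler (illustration only)."""

    def __init__(self):
        self._in_roi = False
        self._exec_time = defaultdict(float)  # op name -> accumulated seconds
        self._unimpl = set()                  # ops with no HB implementation

    def enable(self):
        self._in_roi = True

    def disable(self):
        self._in_roi = False

    def is_in_ROI(self):
        return self._in_roi

    def record(self, op, seconds, implemented=True):
        # In the real tool this would be driven by the ATen dispatcher.
        if self._in_roi:
            self._exec_time[op] += seconds
            if not implemented:
                self._unimpl.add(op)

    def stats(self, key=("ExecTime", "Unimpl")):
        out = {}
        if "ExecTime" in key:
            out["ExecTime"] = dict(self._exec_time)
        if "Unimpl" in key:
            out["Unimpl"] = sorted(self._unimpl)
        return json.dumps(out)

prof = RoiProfiler()
prof.enable()                                    # start of ROI
prof.record("aten::add", 0.002)
prof.record("aten::sigmoid", 0.001, implemented=False)
prof.disable()                                   # end of ROI
prof.record("aten::mul", 0.005)                  # outside ROI: ignored
print(prof.stats())
```

Only operators recorded between enable() and disable() land in the stats, which is why bracketing the region of interest matters.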
HB Profiling
HB Kernel Call Logs
HB emulation can output a file with the list of kernel calls, along with associated data, in JSON format. It can be used as follows:
```python
import torch
import torch.hammerblade.kernel_logger as hblog

x = torch.rand(2, 2).hammerblade()
y = torch.rand(2, 2).hammerblade()

# Enables the log
hblog.enable()
print(x + y)
# Disables the log
hblog.disable()

# This is excluded from the log
print(x - y)

# Logs only the tensor add
print(hblog.json())

# Clears the above operations from the logger
hblog.clear()

hblog.enable()
print(x * y)
hblog.disable()

# Logs only the tensor mul
print(hblog.json())
```
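The enable/disable/json/clear surface used above can be sketched in a few lines of plain Python. This is a hypothetical stand-in (KernelLogger, log(), and the shapes are illustrative), not the real emulation-layer logger:

```python
import json

class KernelLogger:
    """Toy stand-in for torch.hammerblade.kernel_logger (illustration only)."""

    def __init__(self):
        self._enabled = False
        self._calls = []

    def enable(self):
        self._enabled = True

    def disable(self):
        self._enabled = False

    def log(self, kernel, arg_shapes):
        # The real emulation layer records this when a kernel is launched.
        if self._enabled:
            self._calls.append({"kernel": kernel, "args": arg_shapes})

    def json(self):
        return json.dumps(self._calls)

    def clear(self):
        self._calls = []

hblog = KernelLogger()
hblog.enable()
hblog.log("tensorlib_add", [[2, 2], [2, 2]])  # logged
hblog.disable()
hblog.log("tensorlib_sub", [[2, 2], [2, 2]])  # excluded: log disabled
print(hblog.json())                           # only the add call
```

The point of the pattern is that only kernel launches between enable() and disable() appear in the JSON dump.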
HB Key Kernel Charting
Chart provides a way to log the "execution chart" of key kernels in a workload.
To use Chart, one needs to register one or more ATen operator signatures:
```python
import torch

M = torch.randn(2, 3)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)

# reset chart
torch.hammerblade.profiler.chart.clear()

# add signature
torch.hammerblade.profiler.chart.add("at::Tensor at::CPUType::{anonymous}::addmm(const at::Tensor&, const at::Tensor&, const at::Tensor&, c10::Scalar, c10::Scalar)")

# turn on profiling
torch.hammerblade.profiler.enable()

# run addmm
torch.addmm(M, mat1, mat2)

# end profiling
torch.hammerblade.profiler.disable()

# dump chart
print(torch.hammerblade.profiler.chart.json())
```
The output should be:
```json
[
  {
    "offload": false,
    "signature": "at::Tensor at::CPUType::{anonymous}::addmm(const at::Tensor&, const at::Tensor&, const at::Tensor&, c10::Scalar, c10::Scalar)"
  }
]
```
HB Key Kernel Redispatching
One may choose to redispatch a kernel that would normally run on the CPU to HB with Route. Route takes in the JSON produced by Chart. To redispatch a kernel, one just needs to change "offload": false to "offload": true:
```python
import json
import torch

M = torch.randn(2, 3)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)

route = """[
  {
    "offload": false,
    "signature": "at::Tensor at::CPUType::{anonymous}::addmm(const at::Tensor&, const at::Tensor&, const at::Tensor&, c10::Scalar, c10::Scalar)"
  },
  {
    "offload": true,
    "signature": "at::Tensor at::CPUType::{anonymous}::add(const at::Tensor&, const at::Tensor&, c10::Scalar)"
  }
]"""
data = json.loads(route)
torch.hammerblade.profiler.route.set_route_from_json(data)

torch.hammerblade.profiler.enable()
torch.addmm(M, mat1, mat2)
# this add should be redispatched to HB
torch.add(M, mat1)
torch.hammerblade.profiler.disable()
```
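Since Route consumes exactly the JSON that Chart emits, the offload flags can also be flipped programmatically instead of editing the string by hand. A stand-alone sketch (plain Python, no torch or HB backend needed; the signatures are copied from the example above):

```python
import json

# Chart output as shown above: one entry per registered signature
chart_json = """[
  {"offload": false, "signature": "at::Tensor at::CPUType::{anonymous}::addmm(const at::Tensor&, const at::Tensor&, const at::Tensor&, c10::Scalar, c10::Scalar)"},
  {"offload": false, "signature": "at::Tensor at::CPUType::{anonymous}::add(const at::Tensor&, const at::Tensor&, c10::Scalar)"}
]"""

route = json.loads(chart_json)

# Flip the add kernel to run on HB; leave addmm on the CPU
for entry in route:
    if "::add(" in entry["signature"]:
        entry["offload"] = True

print(json.dumps(route, indent=2))
```

The resulting list can then be handed to the route mechanism in place of a hand-written JSON string.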
Owner
- Name: Batten Research Group
- Login: cornell-brg
- Kind: organization
- Repositories: 46
- Profile: https://github.com/cornell-brg
Computer Systems Laboratory, Cornell University
Citation (CITATION)
@inproceedings{paszke2017automatic,
title={Automatic Differentiation in {PyTorch}},
author={Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and Antiga, Luca and Lerer, Adam},
booktitle={NIPS Autodiff Workshop},
year={2017}
}
Dependencies
- androidx.appcompat:appcompat 1.0.0 implementation
- com.android.support:appcompat-v7 28.0.0 implementation
- com.facebook.fbjni:fbjni-java-only 0.0.3 implementation
- com.facebook.soloader:nativeloader 0.8.0 implementation
- com.google.code.findbugs:jsr305 3.0.1 implementation
- com.google.code.findbugs:jsr305 3.0.1 compileOnly
- com.facebook.soloader:nativeloader 0.8.0 implementation
- com.android.support:appcompat-v7 28.0.0 implementation
- com.facebook.soloader:nativeloader 0.8.0 implementation
- com.google.code.findbugs:jsr305 3.0.1 compileOnly
- com.facebook.fbjni:fbjni-java-only 0.0.3 implementation
- com.facebook.soloader:nativeloader 0.8.0 implementation
- junit:junit 4.12 testImplementation
- com.android.support:appcompat-v7 28.0.0 implementation
- com.android.support:appcompat-v7 28.0.0 implementation
- actions/checkout v1 composite
- actions/setup-python v1 composite
- pytorch/add-annotations-github-action master composite
- ubuntu ${UBUNTU_VERSION} build
- nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION} build