daceml

A Data-Centric Compiler for Machine Learning

https://github.com/spcl/daceml

Science Score: 72.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 7 committers (14.3%) from academic institutions
  • Institutional organization owner
    Organization spcl has institutional domain (spcl.inf.ethz.ch)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary

Keywords

compiler cuda deep-learning fpga high-performance-computing machine-learning pytorch
Last synced: 6 months ago

Repository

A Data-Centric Compiler for Machine Learning

Basic Info
Statistics
  • Stars: 84
  • Watchers: 8
  • Forks: 14
  • Open Issues: 22
  • Releases: 0
Topics
compiler cuda deep-learning fpga high-performance-computing machine-learning pytorch
Created over 5 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md


DaCeML

Machine learning powered by data-centric parallel programming.

This project adds PyTorch and ONNX model loading support to DaCe, and extends the SDFG IR with ONNX operator library nodes. With access to DaCe's rich transformation library and productive development environment, DaCeML can generate highly efficient implementations that execute on CPUs, GPUs, and FPGAs.

The white-box approach exposes computation at all levels of granularity: from coarse operators, to kernel implementations, down to every scalar operation and memory access.

DaCeML can be used to achieve state-of-the-art GPU performance on highly competitive workloads, such as BERT-fp16 or EfficientNet-B0. For more details and other performance results, please see our ICS'22 publication.

IR visual example

Read more: Library Nodes

Integration

Converting PyTorch modules is as easy as adding a decorator...

```python
import torch.nn as nn
import torch.nn.functional as F
from daceml.torch import dace_module

@dace_module
class Model(nn.Module):
    def __init__(self, kernel_size):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size)
        self.conv2 = nn.Conv2d(4, 4, kernel_size)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))
```

... and ONNX models can also be directly imported using the model loader:

```python
import onnx
from daceml.onnx import ONNXModel

model = onnx.load(model_path)
dace_model = ONNXModel("mymodel", model)
```

Read more: PyTorch Integration and Importing ONNX models.

Training

DaCeML modules support training using a symbolic automatic differentiation engine:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from daceml.torch import dace_module

@dace_module(backward=True)
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 120)
        self.fc2 = nn.Linear(120, 32)
        self.fc3 = nn.Linear(32, 10)
        self.ls = nn.LogSoftmax(dim=-1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        x = self.ls(x)
        return x

x = torch.randn(8, 784)
y = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7], dtype=torch.long)

model = Net()

criterion = nn.NLLLoss()
prediction = model(x)
loss = criterion(prediction, y)

# gradients can flow through the model!
loss.backward()
```

Read more: Automatic Differentiation.
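
To make the idea behind a reverse-mode differentiation engine concrete, here is a minimal sketch in plain Python. This is only an illustration of the concept: DaCeML differentiates SDFGs symbolically, and the `Var` class below is a hypothetical teaching aid, not part of the DaCeML API.

```python
# Minimal reverse-mode automatic differentiation sketch (illustrative only).
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_var, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # Accumulate the upstream gradient, then propagate via the chain rule.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(3.0)
y = Var(4.0)
z = x * y + x   # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

The same chain-rule traversal, performed symbolically over the dataflow graph rather than over runtime objects, is what lets gradients "flow through" a DaCeML module as in the training example above.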

Library Nodes

DaCeML extends the DaCe IR with machine learning operators. The added nodes perform computation as specified by the ONNX specification. DaCeML leverages high-performance kernels from ONNXRuntime, as well as pure SDFG implementations that are introspectable and transformable with data-centric transformations.
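
As a concrete reference for the ONNX `Conv` semantics the library nodes follow, the shapes used in the program below can be checked with a direct NumPy sketch. This is plain Python for illustration, not DaCeML code, and `conv2d_ref` is a hypothetical helper name.

```python
import numpy as np

# Direct NCHW convolution matching ONNX Conv with strides and no padding:
# output spatial size = floor((in - kernel) / stride) + 1.
def conv2d_ref(X, W, strides=(2, 2)):
    N, C, H, Wi = X.shape           # X: [N, C, H, W]
    M, _, kH, kW = W.shape          # W: [M, C, kH, kW]
    sH, sW = strides
    oH = (H - kH) // sH + 1
    oW = (Wi - kW) // sW + 1
    Y = np.zeros((N, M, oH, oW), dtype=X.dtype)
    for i in range(oH):
        for j in range(oW):
            patch = X[:, :, i * sH:i * sH + kH, j * sW:j * sW + kW]
            # Contract channel and kernel dims: [N,C,kH,kW] x [M,C,kH,kW] -> [N,M]
            Y[:, :, i, j] = np.tensordot(patch, W, axes=([1, 2, 3], [1, 2, 3]))
    return Y

# Same shapes as the DaCe program: 10x10 input, 3x3 kernel, stride 2 -> 4x4 output.
X = np.random.rand(5, 3, 10, 10).astype(np.float32)
W = np.random.rand(16, 3, 3, 3).astype(np.float32)
print(conv2d_ref(X, W).shape)  # (5, 16, 4, 4)
```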

The nodes can be used from the DaCe Python frontend.

```python
import dace
import daceml.onnx as donnx
import numpy as np

@dace.program
def conv_program(X_arr: dace.float32[5, 3, 10, 10],
                 W_arr: dace.float32[16, 3, 3, 3]):
    output = dace.define_local([5, 16, 4, 4], dace.float32)
    donnx.ONNXConv(X=X_arr, W=W_arr, Y=output, strides=[2, 2])
    return output

X = np.random.rand(5, 3, 10, 10).astype(np.float32)
W = np.random.rand(16, 3, 3, 3).astype(np.float32)

result = conv_program(X_arr=X, W_arr=W)
```

Setup

The easiest way to get started is to run

make install

This will set up DaCeML in a newly created virtual environment.

For more detailed instructions, including ONNXRuntime installation, see Installation.

Development

Common development tasks are automated using the Makefile. See Development for more information.

Citing

If you use DaCeML, please cite us:

```bibtex
@inproceedings{daceml,
  author    = {Rausch, Oliver and Ben-Nun, Tal and Dryden, Nikoli and
               Ivanov, Andrei and Li, Shigang and Hoefler, Torsten},
  title     = {{DaCeML}: A Data-Centric Optimization Framework for Machine Learning},
  year      = {2022},
  booktitle = {Proceedings of the 36th ACM International Conference on Supercomputing},
  series    = {ICS '22}
}
```

Owner

  • Name: SPCL
  • Login: spcl
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: "DaCeML - A Data-Centric Optimization Framework for Machine Learning"
message: "Please cite as" 
authors:
  - family-names: Rausch
    given-names: Oliver
  - family-names: Ben-Nun
    given-names: Tal
  - family-names: Dryden
    given-names: Nikoli
  - family-names: Ivanov
    given-names: Andrei
  - family-names: Li
    given-names: Shigang
  - family-names: Hoefler
    given-names: Torsten
  - family-names: De Matteis
    given-names: Tiziano
  - family-names: Burger
    given-names: Manuel
preferred-citation:
  title: "DaCeML: A Data-Centric Optimization Framework for Machine Learning"
  doi: "10.1145/3524059.3532364"
  year: "2022"
  type: conference-paper
  collection-title: "Proceedings of the 36th ACM International Conference on Supercomputing"
  conference: 
    name: "ICS '22"
  authors:
    - family-names: Rausch
      given-names: Oliver
    - family-names: Ben-Nun
      given-names: Tal
    - family-names: Dryden
      given-names: Nikoli
    - family-names: Ivanov
      given-names: Andrei
    - family-names: Li
      given-names: Shigang
    - family-names: Hoefler
      given-names: Torsten

GitHub Events

Total
  • Watch event: 3
  • Fork event: 1
Last Year
  • Watch event: 3
  • Fork event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 655
  • Total Committers: 7
  • Avg Commits per committer: 93.571
  • Development Distribution Score (DDS): 0.179
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Oliver Rausch o****9@g****m 538
Tal Ben-Nun t****n@g****m 55
Manuel Burger b****m@s****h 33
Shigang Li s****s@g****m 19
am-ivanov a****v 8
Tiziano De Matteis 5****s 1
Julia Bazińska j****a@b****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 33
  • Total pull requests: 99
  • Average time to close issues: 8 months
  • Average time to close pull requests: 27 days
  • Total issue authors: 7
  • Total pull request authors: 7
  • Average comments per issue: 0.45
  • Average comments per pull request: 1.19
  • Merged pull requests: 71
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • orausch (17)
  • tbennun (10)
  • rohanrayan (2)
  • Shbinging (1)
  • HeyDavid633 (1)
  • vselhakim1337 (1)
  • ruck314 (1)
Pull Request Authors
  • orausch (75)
  • tbennun (11)
  • manuelburger (7)
  • lamyiowce (2)
  • Shigangli (1)
  • TizianoDeMatteis (1)
  • and-ivanov (1)
Top Labels
Issue Labels
ops (5) enhancement (2) documentation (2)
Pull Request Labels

Dependencies

doc/requirements.txt pypi
  • Sphinx ==3.2.1
  • matplotlib ==3.4.2
  • sphinx-autodoc-typehints ==1.11.1
  • sphinx-gallery ==0.9.0
  • sphinx-rtd-theme ==0.5.2
setup.py pypi
  • dace *
  • dataclasses *
  • onnx *
  • onnx-simplifier *
  • protobuf *
  • torch *
.github/workflows/cpu-ci.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v2 composite
.github/workflows/docs-no-trigger.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v2 composite
.github/workflows/docs.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v2 composite
.github/workflows/fpga-ci.yml actions
  • actions/checkout v2 composite
.github/workflows/gpu-ci.yml actions
  • actions/checkout v2 composite