go-ml-benchmarks

⏱ Benchmarks of machine learning inference for Go

https://github.com/nikolaydubina/go-ml-benchmarks

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: ieee.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary

Keywords

benchmarks cpp go grpc inference machine-learning python scikit-learn xgboost
Last synced: 6 months ago

Repository

⏱ Benchmarks of machine learning inference for Go

Basic Info
  • Host: GitHub
  • Owner: nikolaydubina
  • Language: Go
  • Default Branch: main
  • Homepage:
  • Size: 14.3 MB
Statistics
  • Stars: 32
  • Watchers: 2
  • Forks: 2
  • Open Issues: 6
  • Releases: 0
Topics
benchmarks cpp go grpc inference machine-learning python scikit-learn xgboost
Created about 5 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Funding Citation

README.md

Go Machine Learning Benchmarks

Given raw data in a Go service, how quickly can I get machine learning inference for it?

Typically, a Go service deals with structured, single-sample data. We therefore focus on tabular machine learning models only, such as the popular XGBoost. Go services commonly run as backend processes on Linux, so we do not consider other deployment options. In the work below, we compare typical implementations of this inference task.

diagram

host: AWS EC2 t2.xlarge (shared)
os: Ubuntu 20.04 LTS
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz

BenchmarkXGB_Go_GoFeatureProcessing_GoLeaves_noalloc               491 ns/op
BenchmarkXGB_Go_GoFeatureProcessing_GoLeaves                       575 ns/op
BenchmarkXGB_Go_GoFeatureProcessing_UDS_RawBytes_Python_XGB     243056 ns/op
BenchmarkXGB_CGo_GoFeatureProcessing_XGB                        244941 ns/op
BenchmarkXGB_Go_GoFeatureProcessing_UDS_gRPC_CPP_XGB            367433 ns/op
BenchmarkXGB_Go_GoFeatureProcessing_UDS_gRPC_Python_XGB         785147 ns/op
BenchmarkXGB_Go_UDS_gRPC_Python_sklearn_XGB                   21699830 ns/op
BenchmarkXGB_Go_HTTP_JSON_Python_Gunicorn_Flask_sklearn_XGB   21935237 ns/op
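These ns/op figures come from Go's standard benchmark harness. As a minimal sketch of how such numbers are produced, the following uses `testing.Benchmark` from the standard library; the `predict` function here is a dummy stand-in, not the repository's actual model:

```go
package main

import (
	"fmt"
	"testing"
)

// predict is a hypothetical stand-in for model inference; the real
// benchmarks call go-featureprocessing + leaves, gRPC clients, etc.
func predict(features []float64) float64 {
	var score float64
	for _, f := range features {
		score += f * 0.5 // dummy work
	}
	return score
}

func main() {
	features := []float64{3, 22, 1, 0, 7.25} // dummy sample

	// testing.Benchmark runs the closure b.N times and reports ns/op,
	// the same metric shown in the results above.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			predict(features)
		}
	})
	fmt.Printf("%d ns/op\n", res.NsPerOp())
}
```

In the repository itself these benchmarks run via `go test -bench`, which uses the same machinery.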

Abbreviations and Frameworks

Dataset and Model

We are using the classic Titanic dataset. It contains both numerical and categorical features, which makes it representative of the typical case. The data and the notebooks used to train the model and preprocessor are available in /data and /notebooks.
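Handling a mix of numerical and categorical features means the service must flatten each sample into a numeric vector before inference. A simplified, stdlib-only sketch of that kind of preprocessing (the repository uses go-featureprocessing, which generates this from struct tags; field names here are illustrative):

```go
package main

import "fmt"

// Passenger holds raw Titanic-style features (field names illustrative).
type Passenger struct {
	Age  float64
	Fare float64
	Sex  string // categorical: "male" / "female"
}

// Encode turns a sample into a flat numeric vector: numerical fields are
// passed through, the categorical field is one-hot encoded. This mimics,
// in miniature, what go-featureprocessing generates.
func (p Passenger) Encode() []float64 {
	sexMale, sexFemale := 0.0, 0.0
	switch p.Sex {
	case "male":
		sexMale = 1
	case "female":
		sexFemale = 1
	}
	return []float64{p.Age, p.Fare, sexMale, sexFemale}
}

func main() {
	p := Passenger{Age: 22, Fare: 7.25, Sex: "male"}
	fmt.Println(p.Encode()) // [22 7.25 1 0]
}
```

Doing this encoding in-place on a preallocated slice is what makes the `noalloc` benchmark variant above possible.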

Some numbers for reference

How fast do you need to get?

200ps - 4.6GHz single cycle time
1ns - L1 cache latency
10ns - L2/L3 cache SRAM latency
20ns - DDR4 CAS, first byte from memory latency
20ns - C++ raw hardcoded structs access
80ns - C++ FlatBuffers decode/traverse/dealloc
150ns - PCIe bus latency
171ns - cgo call boundary, 2015
200ns - HFT FPGA
475ns - 2020 MLPerf winner recommendation inference time per sample
----------> 500ns - go-featureprocessing + leaves
800ns - Go Protocol Buffers Marshal
837ns - Go json-iterator/go json unmarshal
1µs - Go protocol buffers unmarshal
3µs - Go JSON Marshal
7µs - Go JSON Unmarshal
10µs - PCIe/NVLink startup time
17µs - Python JSON encode/decode times
30µs - UNIX domain socket; eventfd; fifo pipes
100µs - Redis intrinsic latency; KDB+; HFT direct market access
200µs - 1GB/s network air latency; Go garbage collector pauses interval 2018
230µs - San Francisco to San Jose at speed of light
500µs - NGINX/Kong added latency
10ms - AWS DynamoDB; WIFI6 "air" latency
15ms - AWS Sagemaker latency; "Flash Boys" 300 million USD HFT drama
30ms - 5G "air" latency
36ms - San Francisco to Hong Kong at speed of light
100ms - typical roundtrip from mobile to backend
200ms - AWS RDS MySQL/PostgreSQL; AWS Aurora
10s - AWS Cloudfront 1MB transfer time
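To put the benchmark latencies in throughput terms, a per-op latency converts directly into a single-goroutine ceiling on inferences per second. A quick conversion using the numbers from the results above:

```go
package main

import "fmt"

// perSecond converts a per-op latency in nanoseconds into the maximum
// single-goroutine throughput (operations per second).
func perSecond(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	// Latencies taken from the benchmark results above.
	fmt.Printf("in-process leaves (575 ns/op): ~%.0f inferences/s\n", perSecond(575))
	fmt.Printf("UDS raw bytes (243056 ns/op):  ~%.0f inferences/s\n", perSecond(243056))
	fmt.Printf("HTTP+Flask (21935237 ns/op):   ~%.0f inferences/s\n", perSecond(21935237))
}
```

The spread is roughly four orders of magnitude between the in-process and the HTTP/JSON variants.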

Profiling and Analysis

[491ns/575ns] Leaves — we see that most of the time is spent in the Leaves random forest code. The Leaves code does not allocate. In-place preprocessing does not allocate either; with the non-in-place version a malloc happens and takes half of the preprocessing time. leaves

[243µs] UDS Raw bytes Python — we see that Python takes much longer than the preprocessing in Go; however, Go is at least visible on the chart. We also note that Python spends most of its time in a libgomp.so call; this library is GNU OpenMP, written in C, which performs parallel operations.

uds

[244µs] CGo version — similarly, we see that a call to libgomp.so is made. It takes a much smaller share relative to the rest of the CGo code than in the Python version above. Why, then, are the overall results not better? Likely this is due to the overhead of crossing the Go-to-CGo boundary. We also note that a malloc is made.

cgo

[367µs] gRPC over UDS to C++ — we see that the Go code takes around 50% of the time of the C++ side. In C++, 50% of the time is spent in gRPC code. Lastly, C++ also uses libgomp.so. We don't see it on this chart, but likely the Go code also spends considerable time in gRPC code.

cgo

[785µs] gRPC over UDS to Python without sklearn — we see that the Go code is visible in the chart. Python spends only a portion of its time in libgomp.so.

cgo

[21ms] gRPC over UDS to Python with sklearn — we see that the Go code (main.test) is no longer visible in the chart. Python spends only a small fraction of its time in libgomp.so.

cgo

[22ms] REST service version with sklearn — similarly, we see that the Go code (main.test) is no longer visible in the chart. Python spends more time in libgomp.so than the Python + gRPC + sklearn version does; however, it is not clear why the results are worse.

cgo

Future work

  • [ ] go-featureprocessing - gRPCFlatBuffers - C++ - XGB
  • [ ] batch mode
  • [ ] UDS - gRPC - C++ - ONNX (sklearn + XGBoost)
  • [ ] UDS - gRPC - Python - ONNX (sklearn + XGBoost)
  • [ ] cgo ONNX (sklearn + XGBoost) (examples: 1)
  • [ ] native Go ONNX (sklearn + XGBoost) — no official support, https://github.com/owulveryck/onnx-go is not complete
  • [ ] text
  • [ ] images
  • [ ] videos

Reference

Owner

  • Name: Nikolay Dubina
  • Login: nikolaydubina
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you reference this library in publication, please cite it as below.
title: Benchmarking machine learning inference in Go
abstract: Benchmarking machine learning inference in Go
authors:
- family-names: Dubina
  given-names: Nikolay
version: 2.1
date-released: 2020-12-21
license: MIT
repository-code: https://github.com/nikolaydubina/go-ml-benchmarks
url: https://github.com/nikolaydubina/go-ml-benchmarks

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 100
  • Total Committers: 1
  • Avg Commits per committer: 100.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Nikolay n****b@g****m 100

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • nikolaydubina (3)
Pull Request Authors
  • dependabot[bot] (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (2)

Packages

  • Total packages: 2
  • Total downloads: unknown
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 0
proxy.golang.org: github.com/nikolaydubina/go-ml-benchmarks/cgo-version
Rankings
Dependent packages count: 8.9%
Average: 9.5%
Dependent repos count: 10.0%
Last synced: 6 months ago
proxy.golang.org: github.com/nikolaydubina/go-ml-benchmarks/go-client
Rankings
Dependent packages count: 8.9%
Average: 9.5%
Dependent repos count: 10.1%
Last synced: 6 months ago

Dependencies

cgo-version/go.mod go
  • github.com/dmitryikh/leaves v0.0.0-20210121075304-82771f84c313
  • github.com/golang/protobuf v1.4.3
  • github.com/google/gofuzz v1.2.0
  • github.com/nikolaydubina/go-featureprocessing v1.0.1
  • github.com/stretchr/testify v1.7.0
  • google.golang.org/grpc v1.35.0
  • google.golang.org/protobuf v1.25.0
cgo-version/go.sum go
  • 113 dependencies
go-client/go.mod go
  • github.com/dmitryikh/leaves v0.0.0-20210121075304-82771f84c313
  • github.com/golang/protobuf v1.4.3
  • github.com/google/gofuzz v1.2.0
  • github.com/nikolaydubina/go-featureprocessing v1.0.1
  • github.com/stretchr/testify v1.7.0
  • google.golang.org/grpc v1.35.0
  • google.golang.org/protobuf v1.25.0
go-client/go.sum go
  • 113 dependencies
bench-gofeatureprocessing-uds-raw-python-xgb/requirements.txt pypi
  • numpy *
  • scikit-learn ==0.24.0
  • xgboost >=1.3.3
bench-http-json-python-gunicorn-flask-sklearn-xgb/requirements.txt pypi
  • flask *
  • gunicorn *
  • numpy *
  • pandas *
  • scikit-learn ==0.24.0
  • xgboost >=1.3.3
bench-uds-grpc-python-xgb/requirements.txt pypi
  • grpcio *
  • grpcio-tools *
  • numpy *
  • sklearn *
  • xgboost >=1.3.3