https://github.com/amd/convnet-benchmarks
Easy benchmarking of all public open-source implementations of convnets
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.0%) to scientific vocabulary
Repository
Easy benchmarking of all public open-source implementations of convnets
Basic Info
- Host: GitHub
- Owner: amd
- Language: Python
- Default Branch: master
- Size: 590 KB
Statistics
- Stars: 3
- Watchers: 9
- Forks: 5
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
convnet-benchmarks
Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below.
Machine: 6-core Intel Core i7-5930K CPU @ 3.50GHz + NVIDIA Titan X + Ubuntu 14.04 x86_64
Imagenet Winners Benchmarking
I picked some popular ImageNet models and clocked the time for a full forward + backward pass. Times are averaged over 10 runs. Dropout and softmax layers are ignored.
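The measurement procedure described above can be sketched in plain Python. This is only an illustration, not the harness used for the numbers in this README: the function names, the `sync` hook (a stand-in for a device synchronization call on a GPU backend), and the single warm-up pass are all assumptions.

```python
import time
from statistics import mean


def time_ms(fn, sync=lambda: None):
    """Return the wall-clock time of one call to fn, in milliseconds."""
    sync()                       # drain pending work before starting the clock
    start = time.perf_counter()
    fn()
    sync()                       # wait for asynchronous (e.g. GPU) work to finish
    return (time.perf_counter() - start) * 1000.0


def benchmark(forward, backward, runs=10, warmup=1, sync=lambda: None):
    """Average forward and backward times over `runs` repetitions."""
    for _ in range(warmup):      # untimed warm-up pass (an assumption, not from the README)
        forward()
        backward()
    fwd = mean(time_ms(forward, sync) for _ in range(runs))
    bwd = mean(time_ms(backward, sync) for _ in range(runs))
    return {"forward": fwd, "backward": bwd, "total": fwd + bwd}


if __name__ == "__main__":
    # Stand-in workloads; a real run would call the model's forward/backward.
    result = benchmark(lambda: sum(range(10_000)), lambda: sum(range(20_000)))
    print({k: round(v, 3) for k, v in result.items()})
```

Forward and backward are timed separately so that the three columns reported in the tables (total, forward, backward) can all be derived from one run.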
AlexNet (One Weird Trick paper) - Input 128x3x224x224
| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Nervana-fp16 | ConvLayer | 92 | 29 | 62 |
| CuDNN[R3]-fp16 | cudnn.SpatialConvolution | 96 | 30 | 66 |
| CuDNN[R3]-fp32 | cudnn.SpatialConvolution | 96 | 32 | 64 |
| Nervana-fp32 | ConvLayer | 101 | 32 | 69 |
| fbfft | fbnn.SpatialConvolution | 104 | 31 | 72 |
| cudaconvnet2* | ConvLayer | 177 | 42 | 135 |
| CuDNN[R2] * | cudnn.SpatialConvolution | 231 | 70 | 161 |
| Caffe (native) | ConvolutionLayer | 324 | 121 | 203 |
| Torch-7 (native) | SpatialConvolutionMM | 342 | 132 | 210 |
| CL-nn (Torch) | SpatialConvolutionMM | 963 | 388 | 574 |
| Caffe-CLGreenTea | ConvolutionLayer | 1442 | 210 | 1232 |
Overfeat [fast] - Input 128x3x231x231
| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| CuDNN[R3]-fp16 | cudnn.SpatialConvolution | 313 | 107 | 206 |
| CuDNN[R3]-fp32 | cudnn.SpatialConvolution | 326 | 113 | 213 |
| fbfft | SpatialConvolutionCuFFT | 342 | 114 | 227 |
| Nervana-fp16 | ConvLayer | 355 | 112 | 242 |
| Nervana-fp32 | ConvLayer | 398 | 124 | 273 |
| cudaconvnet2* | ConvLayer | 723 | 176 | 547 |
| CuDNN[R2] * | cudnn.SpatialConvolution | 810 | 234 | 576 |
| Caffe | ConvolutionLayer | 823 | 355 | 468 |
| Torch-7 (native) | SpatialConvolutionMM | 878 | 379 | 499 |
| CL-nn (Torch) | SpatialConvolutionMM | 963 | 388 | 574 |
| Caffe-CLGreenTea | ConvolutionLayer | 2857 | 616 | 2240 |
OxfordNet [Model-A] - Input 64x3x224x224
| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Nervana-fp16 | ConvLayer | 529 | 167 | 362 |
| Nervana-fp32 | ConvLayer | 590 | 180 | 410 |
| CuDNN[R3]-fp16 | cudnn.SpatialConvolution | 615 | 179 | 436 |
| CuDNN[R3]-fp32 | cudnn.SpatialConvolution | 615 | 196 | 418 |
| fbfft | SpatialConvolutionCuFFT | 1092 | 355 | 737 |
| cudaconvnet2* | ConvLayer | 1229 | 408 | 821 |
| CuDNN[R2] * | cudnn.SpatialConvolution | 1099 | 342 | 757 |
| Caffe | ConvolutionLayer | 1068 | 323 | 745 |
| Torch-7 (native) | SpatialConvolutionMM | 1105 | 350 | 755 |
| CL-nn (Torch) | SpatialConvolutionMM | 3437 | 875 | 2562 |
| Caffe-CLGreenTea | ConvolutionLayer | 5620 | 988 | 4632 |
GoogleNet V1 - Input 128x3x224x224
| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Nervana-fp16 | ConvLayer | 283 | 85 | 197 |
| Nervana-fp32 | ConvLayer | 322 | 90 | 232 |
| CuDNN[R3]-fp32 | cudnn.SpatialConvolution | 431 | 117 | 313 |
| CuDNN[R3]-fp16 | cudnn.SpatialConvolution | 501 | 109 | 392 |
| Caffe | ConvolutionLayer | 1935 | 786 | 1148 |
| CL-nn (Torch) | SpatialConvolutionMM | 7016 | 3027 | 3988 |
| Caffe-CLGreenTea | ConvolutionLayer | 9462 | 746 | 8716 |
Layer-wise Benchmarking (Last Updated April 2015)
Spatial Convolution layer (3D input 3D output, densely connected)
forward + backprop (wrt input and weights)
| Original Library | Class/Function Benchmarked | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 256 | 101 | 155 |
| cuda-convnet2 * | ConvLayer | 977 | 201 | 776 |
| cuda-convnet** | pylearn2.cuda_convnet | 1077 | 312 | 765 |
| CuDNN R2 * | cudnn.SpatialConvolution | 1019 | 269 | 750 |
| Theano | CorrMM | 1225 | 407 | 818 |
| Caffe | ConvolutionLayer | 1231 | 396 | 835 |
| Torch-7 | SpatialConvolutionMM | 1265 | 418 | 877 |
| DeepCL | ConvolutionLayer | 6280 | 2648 | 3632 |
| cherry-picking**** | best per layer | 235 | 79 | 155 |
The table below is NOT UPDATED for the Titan X. These numbers were measured on a Titan Black and are kept only for informational and legacy purposes.

| Original Library | Class/Function Benchmarked | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Theano (experimental)*** | conv2d_fft | 1178 | 304 | 874 |
| Torch-7 | nn.SpatialConvolutionBHWD | 1892 | 581 | 1311 |
| ccv | ccvconvnetlayer | 809+bw | 809 | |
| Theano (legacy) | conv2d | 70774 | 3833 | 66941 |

- * indicates that the library was tested with Torch bindings of the specific kernels.
- ** indicates that the library was tested with Pylearn2 bindings.
- *** This is an experimental module that uses FFT to calculate convolutions. According to @benanne, it uses a lot of memory.
- **** The last row shows results obtainable when choosing the best-performing library for each layer.
- L1 - Input: 128x128, Batch-size 128, Feature maps: 3->96, Kernel Size: 11x11, Stride: 1x1
- L2 - Input: 64x64, Batch-size 128, Feature maps: 64->128, Kernel Size: 9x9, Stride: 1x1
- L3 - Input: 32x32, Batch-size 128, Feature maps: 128->128, Kernel Size: 9x9, Stride: 1x1
- L4 - Input: 16x16, Batch-size 128, Feature maps: 128->128, Kernel Size: 7x7, Stride: 1x1
- L5 - Input: 13x13, Batch-size 128, Feature maps: 384->384, Kernel Size: 3x3, Stride: 1x1
- The table is ranked according to the total time of the forward+backward calls for layers (L1 + L2 + L3 + L4 + L5)
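To get a feel for how much work each of the five layer configurations represents, the sketch below writes them down as data and derives the output size and a forward multiply-accumulate (MAC) count. It assumes square inputs and no padding ("valid" convolution), which the README does not state; `LayerConfig` and its methods are hypothetical names, and the MAC counts are a back-of-the-envelope estimate rather than something measured by the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerConfig:
    name: str
    size: int      # input height = width
    batch: int
    c_in: int      # input feature maps
    c_out: int     # output feature maps
    kernel: int    # kernel height = width
    stride: int = 1

    def out_size(self) -> int:
        # "Valid" convolution with no padding (assumed, not stated in the README).
        return (self.size - self.kernel) // self.stride + 1

    def forward_macs(self) -> int:
        # One multiply-accumulate per kernel tap, per input map, per output element.
        out = self.out_size()
        return (self.batch * self.c_out * out * out
                * self.c_in * self.kernel * self.kernel)


LAYERS = [
    LayerConfig("L1", 128, 128, 3, 96, 11),
    LayerConfig("L2", 64, 128, 64, 128, 9),
    LayerConfig("L3", 32, 128, 128, 128, 9),
    LayerConfig("L4", 16, 128, 128, 128, 7),
    LayerConfig("L5", 13, 128, 384, 384, 3),
]

for layer in LAYERS:
    gmacs = layer.forward_macs() / 1e9
    print(f"{layer.name}: output {layer.out_size()}x{layer.out_size()}, {gmacs:.1f} GMAC forward")
```

Dividing a layer's MAC count by its measured forward time gives a rough throughput figure, which is one way to compare libraries across layers of very different sizes.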
Breakdown
forward
Columns L1, L2, L3, L4, L5, Total are times in milliseconds
| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|:---:|:---:|---:|---:|---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 57 | 27 | 6 | 2 | 9 | 101 |
| cuda-convnet2 * | ConvLayer | 36 | 113 | 40 | 4 | 8 | 201 |
| cuda-convnet** | pylearn2.cuda_convnet | 38 | 183 | 68 | 7 | 16 | 312 |
| CuDNN R2 | cudnn.SpatialConvolution | 56 | 143 | 53 | 6 | 11 | 269 |
| Theano | CorrMM | 91 | 143 | 121 | 24 | 28 | 407 |
| Caffe | ConvolutionLayer<Dtype> | 93 | 136 | 116 | 24 | 27 | 396 |
| Torch-7 | nn.SpatialConvolutionMM | 94 | 149 | 123 | 24 | 28 | 418 |
| DeepCL | ConvolutionLayer | 738 | 1241 | 518 | 47 | 104 | 2648 |
| cherry-picking**** | best per layer | 36 | 27 | 6 | 2 | 8 | 79 |
backward (gradInput + gradWeight)
Columns L1, L2, L3, L4, L5, Total are times in milliseconds
| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|:---:|:---:|---:|---:|---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 76 | 45 | 12 | 4 | 18 | 155 |
| cuda-convnet2 * | ConvLayer | 103 | 467 | 162 | 15 | 29 | 776 |
| cuda-convnet** | pylearn2.cuda_convnet | 136 | 433 | 147 | 15 | 34 | 765 |
| CuDNN R2 | cudnn.SpatialConvolution | 139 | 401 | 159 | 19 | 32 | 750 |
| Theano | CorrMM | 179 | 405 | 174 | 29 | 31 | 818 |
| Caffe | ConvolutionLayer<Dtype> | 200 | 405 | 172 | 28 | 30 | 835 |
| Torch-7 | nn.SpatialConvolutionMM | 206 | 432 | 178 | 29 | 32 | 877 |
| DeepCL | ConvolutionLayer | 484 | 2144 | 747 | 59 | 198 | 3632 |
| cherry-picking**** | best per layer | 76 | 45 | 12 | 4 | 18 | 155 |

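The "cherry-picking" rows can be reproduced from the per-layer numbers in the two breakdown tables. The sketch below copies those times into dictionaries and picks the fastest library for each layer; `best_per_layer` is a hypothetical helper written for this illustration, not code from the repository.

```python
# Per-layer times in ms (L1..L5), copied from the forward and backward breakdown tables.
FORWARD = {
    "fbfft":         [57, 27, 6, 2, 9],
    "cuda-convnet2": [36, 113, 40, 4, 8],
    "cuda-convnet":  [38, 183, 68, 7, 16],
    "CuDNN R2":      [56, 143, 53, 6, 11],
    "Theano":        [91, 143, 121, 24, 28],
    "Caffe":         [93, 136, 116, 24, 27],
    "Torch-7":       [94, 149, 123, 24, 28],
    "DeepCL":        [738, 1241, 518, 47, 104],
}
BACKWARD = {
    "fbfft":         [76, 45, 12, 4, 18],
    "cuda-convnet2": [103, 467, 162, 15, 29],
    "cuda-convnet":  [136, 433, 147, 15, 34],
    "CuDNN R2":      [139, 401, 159, 19, 32],
    "Theano":        [179, 405, 174, 29, 31],
    "Caffe":         [200, 405, 172, 28, 30],
    "Torch-7":       [206, 432, 178, 29, 32],
    "DeepCL":        [484, 2144, 747, 59, 198],
}


def best_per_layer(times):
    """For each layer, return (fastest library, its time in ms)."""
    n_layers = len(next(iter(times.values())))
    picks = []
    for i in range(n_layers):
        lib = min(times, key=lambda name: times[name][i])
        picks.append((lib, times[lib][i]))
    return picks


for label, table in (("forward", FORWARD), ("backward", BACKWARD)):
    picks = best_per_layer(table)
    total = sum(t for _, t in picks)
    print(f"{label}: {picks} total={total} ms")
```

The totals come out to 79 ms forward and 155 ms backward, matching the cherry-picking rows. In the forward pass the picks mix cuda-convnet2 (L1, L5) with fbfft (L2 to L4), while in the backward pass fbfft is fastest on every layer.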
Owner
- Name: AMD
- Login: amd
- Kind: organization
- Email: dl.DevSecOps-Github-Admin@amd.com
- Website: http://www.amd.com
- Repositories: 56
- Profile: https://github.com/amd
GitHub Events
Total
- Fork event: 3
Last Year
- Fork event: 3