https://github.com/amd/convnet-benchmarks

Easy benchmarking of all public open-source implementations of convnets

Repository


Basic Info
  • Host: GitHub
  • Owner: amd
  • Language: Python
  • Default Branch: master
  • Size: 590 KB
Statistics
  • Stars: 3
  • Watchers: 9
  • Forks: 5
  • Open Issues: 0
  • Releases: 0
Fork of soumith/convnet-benchmarks
Created over 10 years ago · Last pushed over 10 years ago

convnet-benchmarks

Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below.

Machine: 6-core Intel Core i7-5930K CPU @ 3.50GHz + NVIDIA Titan X + Ubuntu 14.04 x86_64

ImageNet Winners Benchmarking

I pick some popular ImageNet models and time a full forward + backward pass, averaging over 10 runs. I ignore dropout and softmax layers. Input sizes are given as batch x channels x height x width.
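A minimal sketch of this timing methodology, assuming each model is wrapped in two Python callables, `forward` and `backward` (hypothetical names, not the API of any benchmarked library). For GPU code, those callables must block until the device has finished; otherwise asynchronous kernel launches make wall-clock timings meaningless.

```python
import time
from statistics import mean
from typing import Callable, Dict


def time_pass(forward: Callable[[], None],
              backward: Callable[[], None],
              runs: int = 10,
              warmup: int = 1) -> Dict[str, float]:
    """Average forward, backward and total wall-clock time in milliseconds.

    `forward` and `backward` are hypothetical callables that each run one
    pass of a model (and, on a GPU, synchronize before returning).
    """
    # Untimed warm-up runs (an assumption, not stated in the text) absorb
    # one-off costs such as memory allocation or algorithm selection.
    for _ in range(warmup):
        forward()
        backward()

    fwd_ms, bwd_ms = [], []
    for _ in range(runs):
        t0 = time.perf_counter()
        forward()
        t1 = time.perf_counter()
        backward()
        t2 = time.perf_counter()
        fwd_ms.append((t1 - t0) * 1000.0)
        bwd_ms.append((t2 - t1) * 1000.0)

    return {
        "forward": mean(fwd_ms),
        "backward": mean(bwd_ms),
        "total": mean(fwd_ms) + mean(bwd_ms),
    }
```

With the default `runs=10` it mirrors the 10-run average described above; the single warm-up run is an assumption, since the text does not say whether one was used.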

AlexNet (One Weird Trick paper) - Input 128x3x224x224

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Nervana-fp16 | ConvLayer | 92 | 29 | 62 |
| CuDNN[R3]-fp16 | cudnn.SpatialConvolution | 96 | 30 | 66 |
| CuDNN[R3]-fp32 | cudnn.SpatialConvolution | 96 | 32 | 64 |
| Nervana-fp32 | ConvLayer | 101 | 32 | 69 |
| fbfft | fbnn.SpatialConvolution | 104 | 31 | 72 |
| cudaconvnet2* | ConvLayer | 177 | 42 | 135 |
| CuDNN[R2] * | cudnn.SpatialConvolution | 231 | 70 | 161 |
| Caffe (native) | ConvolutionLayer | 324 | 121 | 203 |
| Torch-7 (native) | SpatialConvolutionMM | 342 | 132 | 210 |
| CL-nn (Torch) | SpatialConvolutionMM | 963 | 388 | 574 |
| Caffe-CLGreenTea | ConvolutionLayer | 1442 | 210 | 1232 |

Overfeat [fast] - Input 128x3x231x231

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| CuDNN[R3]-fp16 | cudnn.SpatialConvolution | 313 | 107 | 206 |
| CuDNN[R3]-fp32 | cudnn.SpatialConvolution | 326 | 113 | 213 |
| fbfft | SpatialConvolutionCuFFT | 342 | 114 | 227 |
| Nervana-fp16 | ConvLayer | 355 | 112 | 242 |
| Nervana-fp32 | ConvLayer | 398 | 124 | 273 |
| cudaconvnet2* | ConvLayer | 723 | 176 | 547 |
| CuDNN[R2] * | cudnn.SpatialConvolution | 810 | 234 | 576 |
| Caffe | ConvolutionLayer | 823 | 355 | 468 |
| Torch-7 (native) | SpatialConvolutionMM | 878 | 379 | 499 |
| CL-nn (Torch) | SpatialConvolutionMM | 963 | 388 | 574 |
| Caffe-CLGreenTea | ConvolutionLayer | 2857 | 616 | 2240 |

OxfordNet [Model-A] - Input 64x3x224x224

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Nervana-fp16 | ConvLayer | 529 | 167 | 362 |
| Nervana-fp32 | ConvLayer | 590 | 180 | 410 |
| CuDNN[R3]-fp16 | cudnn.SpatialConvolution | 615 | 179 | 436 |
| CuDNN[R3]-fp32 | cudnn.SpatialConvolution | 615 | 196 | 418 |
| fbfft | SpatialConvolutionCuFFT | 1092 | 355 | 737 |
| cudaconvnet2* | ConvLayer | 1229 | 408 | 821 |
| CuDNN[R2] * | cudnn.SpatialConvolution | 1099 | 342 | 757 |
| Caffe | ConvolutionLayer | 1068 | 323 | 745 |
| Torch-7 (native) | SpatialConvolutionMM | 1105 | 350 | 755 |
| CL-nn (Torch) | SpatialConvolutionMM | 3437 | 875 | 2562 |
| Caffe-CLGreenTea | ConvolutionLayer | 5620 | 988 | 4632 |

GoogleNet V1 - Input 128x3x224x224

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Nervana-fp16 | ConvLayer | 283 | 85 | 197 |
| Nervana-fp32 | ConvLayer | 322 | 90 | 232 |
| CuDNN[R3]-fp32 | cudnn.SpatialConvolution | 431 | 117 | 313 |
| CuDNN[R3]-fp16 | cudnn.SpatialConvolution | 501 | 109 | 392 |
| Caffe | ConvolutionLayer | 1935 | 786 | 1148 |
| CL-nn (Torch) | SpatialConvolutionMM | 7016 | 3027 | 3988 |
| Caffe-CLGreenTea | ConvolutionLayer | 9462 | 746 | 8716 |

Layer-wise Benchmarking (Last Updated April 2015)

Spatial Convolution layer (3D input 3D output, densely connected)

forward + backprop (wrt input and weights)

| Original Library | Class/Function Benchmarked | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 256 | 101 | 155 |
| cuda-convnet2 * | ConvLayer | 977 | 201 | 776 |
| cuda-convnet** | pylearn2.cuda_convnet | 1077 | 312 | 765 |
| CuDNN R2 * | cudnn.SpatialConvolution | 1019 | 269 | 750 |
| Theano | CorrMM | 1225 | 407 | 818 |
| Caffe | ConvolutionLayer | 1231 | 396 | 835 |
| Torch-7 | SpatialConvolutionMM | 1295 | 418 | 877 |
| DeepCL | ConvolutionLayer | 6280 | 2648 | 3632 |
| cherry-picking**** | best per layer | 235 | 79 | 155 |

This table has NOT been updated for the Titan X. The numbers below were measured on a Titan Black and are kept only for informational and legacy purposes.

| Original Library | Class/Function Benchmarked | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Theano (experimental)*** | conv2d_fft | 1178 | 304 | 874 |
| Torch-7 | nn.SpatialConvolutionBHWD | 1892 | 581 | 1311 |
| ccv | ccvconvnetlayer | 809+bw | 809 | |
| Theano (legacy) | conv2d | 70774 | 3833 | 66941 |

  • * indicates that the library was tested with Torch bindings of the specific kernels.
  • ** indicates that the library was tested with Pylearn2 bindings.
  • *** This is an experimental module that uses FFTs to compute convolutions. According to @benanne, it uses a lot of memory.
  • **** The last row shows results obtainable when choosing the best-performing library for each layer.
  • L1 - Input: 128x128, batch size: 128, feature maps: 3->96, kernel size: 11x11, stride: 1x1
  • L2 - Input: 64x64, batch size: 128, feature maps: 64->128, kernel size: 9x9, stride: 1x1
  • L3 - Input: 32x32, batch size: 128, feature maps: 128->128, kernel size: 9x9, stride: 1x1
  • L4 - Input: 16x16, batch size: 128, feature maps: 128->128, kernel size: 7x7, stride: 1x1
  • L5 - Input: 13x13, batch size: 128, feature maps: 384->384, kernel size: 3x3, stride: 1x1
  • The table is ranked by total time, i.e. the forward + backward calls summed over layers L1 + L2 + L3 + L4 + L5.
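The layer configurations above determine how much arithmetic each layer performs. A rough sketch, assuming square inputs and kernels and no zero-padding (the list does not state the padding), computes each layer's output size and the number of multiply-accumulates (MACs) in one direct-convolution forward pass; `ConvLayer`, `output_size` and `forward_macs` are hypothetical names for illustration.

```python
from typing import NamedTuple


class ConvLayer(NamedTuple):
    name: str
    size: int     # input height == width
    batch: int
    c_in: int     # input feature maps
    c_out: int    # output feature maps
    kernel: int   # kernel height == width
    stride: int


# Configurations copied from the L1-L5 list; padding is assumed to be zero.
LAYERS = [
    ConvLayer("L1", 128, 128, 3, 96, 11, 1),
    ConvLayer("L2", 64, 128, 64, 128, 9, 1),
    ConvLayer("L3", 32, 128, 128, 128, 9, 1),
    ConvLayer("L4", 16, 128, 128, 128, 7, 1),
    ConvLayer("L5", 13, 128, 384, 384, 3, 1),
]


def output_size(layer: ConvLayer, padding: int = 0) -> int:
    """Spatial output size along one dimension."""
    return (layer.size + 2 * padding - layer.kernel) // layer.stride + 1


def forward_macs(layer: ConvLayer) -> int:
    """Multiply-accumulates for one forward pass of a direct convolution."""
    out = output_size(layer)
    return (layer.batch * layer.c_out * out * out
            * layer.c_in * layer.kernel * layer.kernel)


for layer in LAYERS:
    out = output_size(layer)
    print(f"{layer.name}: output {out}x{out}, "
          f"{forward_macs(layer) / 1e9:.1f} GMACs")
```

Under these assumptions L2 is the most expensive layer, which is consistent with it being the slowest forward column for every library in the breakdown except fbfft, whose FFT-based algorithm scales differently with kernel size.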
Breakdown
forward

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|:---:|:---:|---:|---:|---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 57 | 27 | 6 | 2 | 9 | 101 |
| cuda-convnet2 * | ConvLayer | 36 | 113 | 40 | 4 | 8 | 201 |
| cuda-convnet** | pylearn2.cuda_convnet | 38 | 183 | 68 | 7 | 16 | 312 |
| CuDNN R2 | cudnn.SpatialConvolution | 56 | 143 | 53 | 6 | 11 | 269 |
| Theano | CorrMM | 91 | 143 | 121 | 24 | 28 | 407 |
| Caffe | ConvolutionLayer<Dtype> | 93 | 136 | 116 | 24 | 27 | 396 |
| Torch-7 | nn.SpatialConvolutionMM | 94 | 149 | 123 | 24 | 28 | 418 |
| DeepCL | ConvolutionLayer | 738 | 1241 | 518 | 47 | 104 | 2648 |
| cherry-picking**** | best per layer | 36 | 27 | 6 | 2 | 8 | 79 |
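The cherry-picking row can be reproduced from the per-layer columns by taking, for each layer, the minimum time over all libraries. A small sketch with the forward times copied from the table above; `FORWARD_MS` and `cherry_pick` are hypothetical names, and the calculation ignores any cost of moving tensors between libraries, which a real mixed pipeline might pay.

```python
from typing import Dict, List, Tuple

# Forward-pass times (ms) for layers L1-L5, copied from the forward table.
FORWARD_MS: Dict[str, List[int]] = {
    "fbfft": [57, 27, 6, 2, 9],
    "cuda-convnet2": [36, 113, 40, 4, 8],
    "cuda-convnet": [38, 183, 68, 7, 16],
    "CuDNN R2": [56, 143, 53, 6, 11],
    "Theano": [91, 143, 121, 24, 28],
    "Caffe": [93, 136, 116, 24, 27],
    "Torch-7": [94, 149, 123, 24, 28],
    "DeepCL": [738, 1241, 518, 47, 104],
}


def cherry_pick(times: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Return the fastest (library, time) pair for each layer column."""
    n_layers = len(next(iter(times.values())))
    best = []
    for i in range(n_layers):
        # On a tie, min() keeps the first library in insertion order.
        lib = min(times, key=lambda name: times[name][i])
        best.append((lib, times[lib][i]))
    return best


picks = cherry_pick(FORWARD_MS)
print(picks, sum(t for _, t in picks))
```

Applying the same function to the backward columns gives the backward cherry-picking row.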

backward (gradInput + gradWeight)

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|:---:|:---:|---:|---:|---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 76 | 45 | 12 | 4 | 18 | 155 |
| cuda-convnet2 * | ConvLayer | 103 | 467 | 162 | 15 | 29 | 776 |
| cuda-convnet** | pylearn2.cuda_convnet | 136 | 433 | 147 | 15 | 34 | 765 |
| CuDNN R2 | cudnn.SpatialConvolution | 139 | 401 | 159 | 19 | 32 | 750 |
| Theano | CorrMM | 179 | 405 | 174 | 29 | 31 | 818 |
| Caffe | ConvolutionLayer<Dtype> | 200 | 405 | 172 | 28 | 30 | 835 |
| Torch-7 | nn.SpatialConvolutionMM | 206 | 432 | 178 | 29 | 32 | 877 |
| DeepCL | ConvolutionLayer | 484 | 2144 | 747 | 59 | 198 | 3632 |
| cherry-picking**** | best per layer | 76 | 45 | 12 | 4 | 18 | 155 |
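"Backward" here covers two gradients: gradInput (with respect to the layer input) and gradWeight (with respect to the kernels). The sketch below is a plain NumPy direct convolution (stride 1, no padding, computed as cross-correlation like most deep-learning libraries) written for clarity, not speed; it is not how any benchmarked library implements these kernels (fbfft, for example, uses FFTs), and `conv_forward` / `conv_backward` are hypothetical names.

```python
import numpy as np


def conv_forward(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Direct 2D convolution (cross-correlation), stride 1, no padding.

    x: input of shape (N, C_in, H, W); w: kernels of shape (C_out, C_in, K, K).
    Returns y of shape (N, C_out, H - K + 1, W - K + 1).
    """
    n, _, h, wd = x.shape
    c_out, _, k, _ = w.shape
    ho, wo = h - k + 1, wd - k + 1
    y = np.zeros((n, c_out, ho, wo), dtype=np.result_type(x, w))
    for i in range(k):
        for j in range(k):
            # Input window shifted by kernel offset (i, j): (N, C_in, Ho, Wo).
            patch = x[:, :, i:i + ho, j:j + wo]
            y += np.einsum("nchw,oc->nohw", patch, w[:, :, i, j])
    return y


def conv_backward(x: np.ndarray, w: np.ndarray, grad_y: np.ndarray):
    """Return (gradInput, gradWeight) given grad_y = dLoss/dy."""
    k = w.shape[2]
    ho, wo = grad_y.shape[2], grad_y.shape[3]
    grad_x = np.zeros_like(x)
    grad_w = np.zeros_like(w)
    for i in range(k):
        for j in range(k):
            patch = x[:, :, i:i + ho, j:j + wo]
            # gradWeight: correlate the output gradient with the input window.
            grad_w[:, :, i, j] = np.einsum("nohw,nchw->oc", grad_y, patch)
            # gradInput: scatter the output gradient back through this kernel tap.
            grad_x[:, :, i:i + ho, j:j + wo] += np.einsum(
                "nohw,oc->nchw", grad_y, w[:, :, i, j])
    return grad_x, grad_w
```

Each of the two backward computations performs as many multiply-accumulates as the forward pass, which is one reason the backward columns tend to be roughly twice the forward ones for libraries such as Caffe and Torch-7.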

Owner

  • Name: AMD
  • Login: amd
  • Kind: organization
  • Email: dl.DevSecOps-Github-Admin@amd.com
