mish
Official Repository for "Mish: A Self Regularized Non-Monotonic Neural Activation Function" [BMVC 2020]
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org, scholar.google
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.6%) to scientific vocabulary
Repository
Official Repository for "Mish: A Self Regularized Non-Monotonic Neural Activation Function" [BMVC 2020]
Basic Info
- Host: GitHub
- Owner: digantamisra98
- License: MIT
- Language: Jupyter Notebook
- Default Branch: master
- Homepage: https://www.bmvc2020-conference.com/assets/papers/0928.pdf
- Size: 186 MB
Statistics
- Stars: 1,302
- Watchers: 28
- Forks: 129
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Mish: Self Regularized Non-Monotonic Activation Function
BMVC 2020 (Official Paper)
Notes:
* A considerably faster CUDA-based version can be found here - [Mish CUDA](https://github.com/thomasbrandon/mish-cuda) (all credits to Thomas Brandon)
* A memory-efficient experimental version of Mish can be found [here](https://github.com/rwightman/gen-efficientnet-pytorch/blob/8795d3298d51ea5d993ab85a222dacffa8211f56/geffnet/activations/activations_autofn.py#L41)
* Faster variants of Mish and H-Mish by [Yashas Samaga](https://github.com/YashasSamaga) can be found here - [ConvolutionBuildingBlocks](https://github.com/YashasSamaga/ConvolutionBuildingBlocks)
* An alternative (experimental, improved) variant of H-Mish developed by [Páll Haraldsson](https://github.com/PallHaraldsson) can be found here - [H-Mish](https://github.com/PallHaraldsson/H-Mish/blob/master/README.md) (available in Julia)
* A variance-based initialization method for Mish (experimental) by [Federico Andres Lois](https://twitter.com/federicolois) can be found here - [Mish_init](https://gist.github.com/redknightlois/b5d36fd2ae306cb8b3484c1e3bcce253)

Changelogs/Updates:
* [07/17] Mish added to [OpenVino](https://github.com/openvinotoolkit/openvino) - [Open-1187](https://github.com/openvinotoolkit/openvino/pull/1187), [Merged-1125](https://github.com/openvinotoolkit/openvino/pull/1125)
* [07/17] Mish added to [BetaML.jl](https://github.com/sylvaticus/BetaML.jl)
* [07/17] Loss landscape exploration in progress in collaboration with [Javier Ideami](https://ideami.com/ideami/) and [Ajay Uppili Arasanipalai](https://github.com/iyaja)
* [07/17] Poster accepted for presentation at [DLRLSS](https://dlrlsummerschool.ca/) hosted by [MILA](https://mila.quebec/en/), [CIFAR](https://www.cifar.ca/), [Vector Institute](https://vectorinstitute.ai/) and [AMII](https://www.amii.ca/)
* [07/20] Mish added to [Google's AutoML](https://github.com/google/automl) - [502](https://github.com/google/automl/commit/28cf011689dacda90fe1ae6da59b92c0d3f2c9d9)
* [07/27] Mish paper accepted to the [31st British Machine Vision Conference (BMVC), 2020](https://bmvc2020.github.io/index.html). ArXiv version to be updated soon.
* [08/13] New updated PyTorch benchmarks and pretrained models available at [PyTorch Benchmarks](https://github.com/digantamisra98/Mish/tree/master/PyTorch%20Benchmarks).
* [08/14] New updated [ArXiv](https://arxiv.org/abs/1908.08681v3) version of the paper is out.
* [08/18] Mish added to [Sony Nnabla](https://github.com/sony/nnabla) - [Merged-700](https://github.com/sony/nnabla/pull/700)
* [09/02] Mish added to [TensorFlow Swift APIs](https://github.com/tensorflow/swift-apis) - [Merged - 1068](https://github.com/tensorflow/swift-apis/commit/c1d822535c458a925087298289aa63b3535c0196)
* [06/09] Official paper and presentation video for BMVC released at this [link](https://www.bmvc2020-conference.com/conference/papers/paper_0928.html).
* [23/09] CSP-p7 + Mish (multi-scale) is currently the SOTA in object detection on MS-COCO test-dev, while CSP-p7 + Mish (single-scale) is currently the 3rd-best model in object detection on MS-COCO test-dev. Further details on the [paperswithcode leaderboards](https://paperswithcode.com/sota/object-detection-on-coco).
* [11/11] Mish added to [TFLearn](https://github.com/tflearn/tflearn) - [Merged 1159 (Follow up 1141)](https://github.com/tflearn/tflearn/pull/1159)
* [17/11] Mish added to [MONAI](https://github.com/Project-MONAI/MONAI) - [Merged 1235](https://github.com/Project-MONAI/MONAI/pull/1235)
* [20/11] Mish added to [plaidml](https://github.com/plaidml/plaidml) - [Merged 1566](https://github.com/plaidml/plaidml/pull/1566)
* [10/12] Mish added to [Simd](http://ermig1979.github.io/Simd/) and [Synet](https://github.com/ermig1979/Synet) - [Docs](http://ermig1979.github.io/Simd/help/group__synet__activation.html#ga0dc8979a94ceaaf82dee82c4761e4afc)
* [14/12] Mish added to [OneFlow](https://github.com/Oneflow-Inc/oneflow) - [Merged 3972](https://github.com/Oneflow-Inc/oneflow/pull/3972)
* [24/12] Mish added to [GPT-Neo](https://github.com/EleutherAI/gpt-neo)
* [21/04] Mish added to [TensorFlow JS](https://github.com/tensorflow/tfjs/pull/4950)
* [02/05] Mish added to [Axon](https://github.com/elixir-nx/axon/commit/e2b5a46eb4d62d163256637692c43195d112a72b)
* [26/05] Mish added to [PyTorch](https://github.com/pytorch/pytorch/pull/58648). Will be included in PyTorch 1.9.
* [27/05] Mish added to [PyTorch YOLO v3](https://github.com/eriklindernoren/PyTorch-YOLOv3/commit/1c03ebe433ef8e085f99a7c147ac291ed224e578)
* [09/06] Mish added to [MXNet](https://github.com/apache/incubator-mxnet/pull/20320)
* [03/07] Mish added to [TorchSharp](https://github.com/xamarin/TorchSharp/commit/2b1068eada7a9ca1b5a67db3772807636683a46c)
* [05/08] Mish added to [KotlinDL](https://github.com/JetBrains/KotlinDL/pull/173)
News/Media Coverage:
- (02/2020): Podcast episode on Mish at Machine Learning Café is out now.
- (02/2020): Talk on Mish and Non-Linear Dynamics at Sicara is out now.
- (07/2020): CROWN: a comparison of the morphology of Mish, Swish, and ReLU, produced in collaboration with Javier Ideami.
- (08/2020): Talk on Mish and Non-Linear Dynamics at Computer Vision Talks.
- (12/2020): Talk on "From Smooth Activations to Robustness to Catastrophic Forgetting" at the Weights & Biases Salon is out now.
- (12/2020): Weights & Biases integration is now added.
- (08/2021): A comprehensive hardware-based computation performance benchmark for Mish has been conducted by Benjamin Warner (blog post).
MILA/CIFAR 2020 DLRLSS poster
Contents:
1. [Mish](https://github.com/digantamisra98/Mish/blob/master/README.md#mish)
   a. [Loss landscape](https://github.com/digantamisra98/Mish#loss-landscape)
2. [ImageNet Scores](https://github.com/digantamisra98/Mish#imagenet-scores)
3. [MS-COCO](https://github.com/digantamisra98/Mish#ms-coco)
4. [Variation of Parameter Comparison](https://github.com/digantamisra98/Mish#variation-of-parameter-comparison)
   a. [MNIST](https://github.com/digantamisra98/Mish#mnist)
   b. [CIFAR10](https://github.com/digantamisra98/Mish#cifar10)
5. [Significance Level](https://github.com/digantamisra98/Mish#significance-level)
6. [Results](https://github.com/digantamisra98/Mish#results)
   a. [Summary of Results (Vision Tasks)](https://github.com/digantamisra98/Mish#summary-of-results-vision-tasks)
   b. [Summary of Results (Language Tasks)](https://github.com/digantamisra98/Mish#summary-of-results-language-tasks)
7. [Try It!](https://github.com/digantamisra98/Mish#try-it)
8. Acknowledgements
9. [Cite this work](https://github.com/digantamisra98/Mish#cite-this-work)
Mish:
Mish is defined as $f(x) = x\tanh(\mathrm{softplus}(x)) = x\tanh(\ln(1 + e^{x}))$ and has a parametric order of continuity of $C^{\infty}$. Its derivative can be written in terms of Swish with a $\Delta(x)$ preconditioner:

$$f'(x) = \Delta(x)\,\mathrm{swish}(x) + \frac{f(x)}{x}, \qquad \Delta(x) = \mathrm{sech}^{2}(\mathrm{softplus}(x)), \qquad \mathrm{swish}(x) = \frac{x}{1 + e^{-x}}$$
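As a quick sanity check of the definition above, here is a minimal PyTorch sketch (not the repository's own implementation); it assumes PyTorch >= 1.9, where `F.mish` is available for comparison:

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    """Mish(x) = x * tanh(softplus(x))."""
    return x * torch.tanh(F.softplus(x))

x = torch.linspace(-5.0, 5.0, steps=11)
# F.mish ships with PyTorch >= 1.9 and should agree with the formula above.
print(torch.allclose(mish(x), F.mish(x)))  # True
```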
Mish provides better accuracy, lower overall loss, and a smoother, better-conditioned, easier-to-optimize loss landscape than both Swish and ReLU. For all loss landscape visualizations, please visit this [readme](https://github.com/digantamisra98/Mish/blob/master/landscapes/Landscape.md). We also investigate the output landscape of randomly initialized neural networks, as shown below. Mish has a much smoother profile than ReLU.

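A hedged sketch of how such an output landscape can be generated; the MLP widths, grid range, and seed here are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

def output_landscape(act: nn.Module, n: int = 200) -> torch.Tensor:
    """Pass a 2D coordinate grid through a randomly initialized MLP and
    return the scalar output surface for visualization."""
    torch.manual_seed(0)
    net = nn.Sequential(
        nn.Linear(2, 64), act,
        nn.Linear(64, 64), act,
        nn.Linear(64, 1),
    )
    xs = torch.linspace(-3, 3, n)
    grid = torch.cartesian_prod(xs, xs)   # (n*n, 2) coordinate grid
    with torch.no_grad():
        return net(grid).reshape(n, n)    # scalar output surface

relu_surface = output_landscape(nn.ReLU())
mish_surface = output_landscape(nn.Mish())  # visibly smoother than ReLU's
```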
MNIST:
Gaussian noise with varying standard deviation was added to the input for MNIST classification using a simple conv net, in order to observe the trend in test top-1 accuracy for Mish compared to ReLU and Swish. Mish mostly maintained a consistent lead over both Swish and ReLU (lower than ReLU in just 1 instance and lower than Swish in 3 instances), as shown below. The trend for test loss was observed following the same procedure (Mish has better loss than both Swish and ReLU except in 1 instance); see the sketch of the protocol below.
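A hedged sketch of this evaluation protocol, assuming a trained MNIST classifier `model` and a test `loader` (both hypothetical names here); the noise levels are illustrative:

```python
import torch

def accuracy_under_noise(model, loader, stds=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Add Gaussian noise of increasing standard deviation to the test
    inputs and record top-1 accuracy at each noise level."""
    model.eval()
    results = {}
    with torch.no_grad():
        for std in stds:
            correct = total = 0
            for x, y in loader:
                noisy = x + torch.randn_like(x) * std   # additive N(0, std^2)
                pred = model(noisy).argmax(dim=1)
                correct += (pred == y).sum().item()
                total += y.numel()
            results[std] = correct / total              # top-1 accuracy
    return results
```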
CIFAR10:
Significance Level:
P-values were computed for each activation function in comparison to Mish, in terms of top-1 test accuracy of a SqueezeNet model on CIFAR-10, trained for 50 epochs over 23 runs with the Adam optimizer at a learning rate of 0.001 and a batch size of 128. Mish beats most of the activation functions at a high significance level across the 23 runs; in particular, it beats ReLU at high significance (P < 0.0001). Mish also had a comparatively low standard deviation across the 23 runs, indicating consistent performance.
|Activation Function| Mean Accuracy | Mean Loss| Standard Deviation of Accuracy | P-value | Cohen's d Score | 95% CI|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|Mish|87.48%|4.13%|0.3967|-|-|-|
|Swish-1|87.32%|4.22%|0.414|P = 0.1973|0.386|-0.3975 to 0.0844|
|E-Swish (β=1.75)|87.49%|4.156%|0.411|P = 0.9075|0.034444|-0.2261 to 0.2539|
|GELU|87.37%|4.339%|0.472|P = 0.4003|0.250468|-0.3682 to 0.1499|
|ReLU|86.66%|4.398%|0.584|P < 0.0001|1.645536|-1.1179 to -0.5247|
|ELU(α=1.0)|86.41%|4.211%|0.3371|P < 0.0001|2.918232|-1.2931 to -0.8556|
|Leaky ReLU(α=0.3)|86.85%|4.112%|0.4569|P < 0.0001|1.47632|-0.8860 to -0.3774|
|RReLU|86.87%|4.138%|0.4478|P < 0.0001|1.444091|-0.8623 to -0.3595|
|SELU|83.91%|4.831%|0.5995|P < 0.0001|7.020812|-3.8713 to -3.2670|
|SoftPlus(β = 1)|83.004%|5.546%|1.4015|P < 0.0001|4.345453|-4.7778 to -4.1735|
|HardShrink(λ = 0.5)|75.03%|7.231%|0.98345|P < 0.0001|16.601747|-12.8948 to -12.0035|
|Hardtanh|82.78%|5.209%|0.4491|P < 0.0001|11.093842|-4.9522 to -4.4486|
|LogSigmoid|81.98%|5.705%|1.6751|P < 0.0001|4.517156|-6.2221 to -4.7753|
|PReLU|85.66%|5.101%|2.2406|P = 0.0004|1.128135|-2.7715 to -0.8590|
|ReLU6|86.75%|4.355%|0.4501|P < 0.0001|1.711482|-0.9782 to -0.4740|
|CELU(α=1.0)|86.23%|4.243%|0.50941|P < 0.0001|2.741669|-1.5231 to -0.9804|
|Sigmoid|74.82%|8.127%|5.7662|P < 0.0001|3.098289|-15.0915 to -10.2337|
|Softshrink(λ = 0.5)|82.35%|5.4915%|0.71959|P < 0.0001|8.830541|-5.4762 to -4.7856|
|Tanhshrink|82.35%|5.446%|0.94508|P < 0.0001|7.083564|-5.5646 to -4.7032|
|Tanh|83.15%|5.161%|0.6887|P < 0.0001|7.700198|-4.6618 to -3.9938|
|Softsign|82.66%|5.258%|0.6697|P < 0.0001|8.761157|-5.1493 to -4.4951|
|Aria-2(β = 1, α=1.5)|81.31%|6.0021%|2.35475|P < 0.0001|3.655362|-7.1757 to -5.1687|
|Bent's Identity|85.03%|4.531%|0.60404|P < 0.0001|4.80211|-2.7576 to -2.1502|
|SQNL|83.44%|5.015%|0.46819|P < 0.0001|9.317237|-4.3009 to -3.7852|
|ELisH|87.38%|4.288%|0.47731|P = 0.4283|0.235784|-0.3643 to 0.1573|
|Hard ELisH|85.89%|4.431%|0.62245|P < 0.0001|3.048849|-1.9015 to -1.2811|
|SReLU|85.05%|4.541%|0.5826|P < 0.0001|4.883831|-2.7306 to -2.1381|
|ISRU (α=1.0)|86.85%|4.669%|0.1106|P < 0.0001|5.302987|-4.4855 to -3.5815|
|Flatten T-Swish|86.93%|4.459%|0.40047|P < 0.0001|1.378742|-0.7865 to -0.3127|
|SineReLU (ε = 0.001)|86.48%|4.396%|0.88062|P < 0.0001|1.461675|-1.4041 to -0.5924|
|Weighted Tanh (Weight = 1.7145)|80.66%|5.985%|1.19868|P < 0.0001|7.638298|-7.3502 to -6.2890|
|LeCun's Tanh|82.72%|5.322%|0.58256|P < 0.0001|9.551812|-5.0566 to -4.4642|
|Soft Clipping (α=0.5)|55.21%|18.518%|10.831994|P < 0.0001|4.210373|-36.8255 to -27.7154|
|ISRLU (α=1.0)|86.69%|4.231%|0.5788|P < 0.0001|1.572874|-1.0753 to -0.4856|
Values are rounded, which might cause slight deviations in the statistical values when these tests are reproduced.
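For reference, P-values of this kind can be obtained with an unpaired two-sample t-test over the per-run accuracies. The sketch below uses synthetic stand-in data matching the reported means and standard deviations, not the actual run logs:

```python
import numpy as np
from scipy import stats

# Stand-in samples for 23 runs each (NOT the actual experiment data):
rng = np.random.default_rng(0)
mish_acc = rng.normal(87.48, 0.3967, size=23)  # Mish top-1 accuracies
relu_acc = rng.normal(86.66, 0.5840, size=23)  # ReLU top-1 accuracies

t, p = stats.ttest_ind(mish_acc, relu_acc)     # unpaired two-sample t-test
d = (mish_acc.mean() - relu_acc.mean()) / np.sqrt(
    (mish_acc.var(ddof=1) + relu_acc.var(ddof=1)) / 2)  # Cohen's d
print(f"t = {t:.3f}, P = {p:.2e}, d = {d:.3f}")
```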
Results:
News: Ajay Arasanipalai recently submitted a benchmark for CIFAR-10 training to the Stanford DAWN Benchmark using a custom ResNet-9 + Mish, which achieved 94.05% accuracy in just 10.7 seconds over 14 epochs on the HAL Computing Cluster. This is currently the fastest CIFAR-10 training on 4 GPUs and the 2nd-fastest CIFAR-10 training overall in the world.
Summary of Results (Vision Tasks):
Comparison is based on the highest-priority metric for each task: top-1 accuracy for image classification, and the loss metric for generative networks and image segmentation. For the latter, Mish > Baseline therefore indicates a better (lower) loss, and vice versa. For embeddings, the AUC metric is used.
|Activation Function| Mish > Baseline Model | Mish < Baseline Model |
|---|---|---|
|ReLU|55|20|
|Swish-1|53|22|
|SELU|26|1|
|Sigmoid|24|0|
|TanH|24|0|
|HardShrink(λ = 0.5)|23|0|
|Tanhshrink|23|0|
|PReLU(Default Parameters)|23|2|
|Softsign|22|1|
|Softshrink (λ = 0.5)|22|1|
|Hardtanh|21|2|
|ELU(α=1.0)|21|7|
|LogSigmoid|20|4|
|GELU|19|3|
|E-Swish (β=1.75)|19|7|
|CELU(α=1.0)|18|5|
|SoftPlus(β = 1)|17|7|
|Leaky ReLU(α=0.3)|17|8|
|Aria-2(β = 1, α=1.5)|16|2|
|ReLU6|16|8|
|SQNL|13|1|
|Weighted TanH (Weight = 1.7145)|12|1|
|RReLU|12|11|
|ISRU (α=1.0)|11|1|
|Le Cun's TanH|10|2|
|Bent's Identity|10|5|
|Hard ELisH|9|1|
|Flatten T-Swish|9|3|
|Soft Clipping (α=0.5)|9|3|
|SineReLU (ε = 0.001)|9|4|
|ISRLU (α=1.0)|9|4|
|ELisH|7|3|
|SReLU|7|6|
|Hard Sigmoid|1|0|
|Thresholded ReLU(θ=1.0)|1|0|
Summary of Results (Language Tasks):
Comparison is based on the best metric score (test accuracy) across 3 runs.
|Activation Function| Mish > Baseline Model | Mish < Baseline Model |
|---|---|---|
|Penalized TanH|5|0|
|ELU|5|0|
|Sigmoid|5|0|
|SReLU|4|0|
|TanH|4|1|
|Swish|3|2|
|ReLU|2|3|
|Leaky ReLU|2|3|
|GELU|1|2|
Try It!
|Torch|DarkNet|Julia|FastAI|TensorFlow|Keras|CUDA|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|Source|Source|Source|Source|Source|Source|Source|
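For PyTorch users, a minimal usage sketch: since PyTorch 1.9 (see the changelog above), `nn.Mish` works as a drop-in replacement for `nn.ReLU`. The toy model below is illustrative only:

```python
import torch
import torch.nn as nn

# Tiny illustrative CNN using Mish in place of ReLU (assumes PyTorch >= 1.9).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.Mish(),                      # instead of nn.ReLU()
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

x = torch.randn(1, 3, 32, 32)       # dummy CIFAR-10-sized input
print(model(x).shape)               # torch.Size([1, 10])
```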
Acknowledgments:
Thanks to everyone who has helped and supported me throughout this project, including:
1. [Sparsha Mishra](https://github.com/SparshaMishra)
2. [Alexandra Deis](https://github.com/Lexie88rus)
3. [Alexey Bochkovskiy](https://github.com/AlexeyAB)
4. [Chien-Yao Wang](https://github.com/WongKinYiu/CrossStagePartialNetworks)
5. [Thomas Brandon](https://github.com/thomasbrandon)
6. [Less Wright](https://github.com/lessw2020)
7. [Manjunath Bhat](https://github.com/thebhatman)
8. [Ajay Uppili Arasanipalai](https://github.com/iyaja)
9. [Federico Lois](https://github.com/redknightlois)
10. [Javier Ideami](https://github.com/javismiles)
11. [Ioannis Anifantakis](https://github.com/ioannisa)
12. [George Christopoulos](https://github.com/geochri)
13. [Miklos Toth](https://hu.linkedin.com/in/miklostoth)

And many more, including the [Fast AI community](https://forums.fast.ai/t/meet-mish-new-activation-function-possible-successor-to-relu/53299/647), [Weights and Biases Community](https://www.wandb.com/), [TensorFlow Addons team](https://www.tensorflow.org/addons), [SpaCy/Thinc team](https://explosion.ai/), [Sicara team](https://www.sicara.fr/), and [Udacity scholarships team](https://www.udacity.com/scholarships), to name a few. *Apologies if I missed anyone.*

Cite this work:
@article{misra2019mish,
title={Mish: A self regularized non-monotonic neural activation function},
author={Misra, Diganta},
journal={arXiv preprint arXiv:1908.08681},
year={2019}
}
Owner
- Name: Xa9aX ツ
- Login: digantamisra98
- Kind: user
- Repositories: 51
- Profile: https://github.com/digantamisra98
GitHub Events
Total
- Watch event: 16
- Push event: 79
- Fork event: 2
Last Year
- Watch event: 16
- Push event: 79
- Fork event: 2
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Diganta Misra | m****1@g****m | 2,878 |
| snyk-bot | s****t@s****o | 1 |
| dependabot[bot] | 4****] | 1 |
| Stark | 4****a | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 33
- Total pull requests: 7
- Average time to close issues: about 2 months
- Average time to close pull requests: 12 days
- Total issue authors: 27
- Total pull request authors: 6
- Average comments per issue: 5.0
- Average comments per pull request: 0.14
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: about 16 hours
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- DonaldTsang (4)
- chris-ha458 (2)
- njcurtis3 (2)
- DaniyarM (2)
- tranhungnghiep (1)
- mehmetalianil (1)
- senovr (1)
- henbucuoshanghai (1)
- philipturner (1)
- jpcenteno80 (1)
- evanatyourservice (1)
- SkeletonOne (1)
- aicrumb (1)
- DrewdropLife (1)
- LifeIsStrange (1)
Pull Request Authors
- imgbot[bot] (2)
- Vinay0508 (2)
- delzac (1)
- snyk-bot (1)
- dependabot[bot] (1)
- ashishbairwa (1)
Dependencies
- torch >1.9
- torchvision *
- wandb *


