https://github.com/anishacharya/double-descent
We investigate double descent more deeply and try to precisely characterize the phenomenon under different settings. Specifically, we focus on the impact of label noise and regularization on double descent. None of the existing works consider these aspects in detail and we hypothesize that these play an integral role in double descent.
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
○codemeta.json file
-
○.zenodo.json file
-
○DOI references
-
✓Academic publication links
Links to: arxiv.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (8.0%) to scientific vocabulary
Repository
We investigate double descent more deeply and try to precisely characterize the phenomenon under different settings. Specifically, we focus on the impact of label noise and regularization on double descent. None of the existing works consider these aspects in detail and we hypothesize that these play an integral role in double descent.
Basic Info
- Host: GitHub
- Owner: anishacharya
- License: mit
- Default Branch: master
- Homepage: https://anishacharya.github.io/double-descent/
- Size: 103 KB
Statistics
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
https://github.com/anishacharya/double-descent/blob/master/
# Understanding Double Descent We pose a particular question: can we mitigate double descent by applying adequate regularization in the case of noisy data? Further, we also plan to investigate if suitable regularization can bias the trajectory of gradient-based optimization algorithms in such a way that double descent can be mitigated, i.e. we do not observe an intermediate dip at all or observe a very small intermediate dip in the test performance. ### BackgroundWe focus on the phenomenon of double descent in deep learning wherein when we increase model size or the num-ber of epochs, performance on the test set initially improves(as expected), then worsens but again starts to improve andfinally saturates, which is against conventional wisdom. This phenomenon has already been demonstrated on traditional machine learning models [[Belikin et.al.](https://arxiv.org/pdf/1812.11118.pdf)], [[Belkin et.al.](https://arxiv.org/pdf/1903.07571.pdf)] but more recently was also observed in complex deep learning models [[Nakkiran et.al.](https://arxiv.org/pdf/1912.02292.pdf)]. There have also been attempts at mathematically explaining double descent for simple linear regression settings. ### DataSets We shall try to reproduce the limited existing results from OpenAI [[Nakkiran et.al.](https://arxiv.org/pdf/1912.02292.pdf)] which use commonly used datasets such as CIFAR-10, CIFAR-100, etc. on common architectures such as ResNets, VGG, etc. We shall also try to observe this in simple problems such as linear regression. More importantly, we shall try to observe its variation with different kinds and degrees of regularization as well as different noise levels. Existing works do not consider regularization in too much detail but we think it is a critical factor controlling the extent of double descent.
Owner
- Name: Anish Acharya
- Login: anishacharya
- Kind: user
- Location: Austin, Tx
- Company: University of Texas at Austin
- Website: http://anishacharya.github.io
- Repositories: 8
- Profile: https://github.com/anishacharya
We focus on the phenomenon of double descent in deep learning wherein when we increase model size or
the num-ber of epochs, performance on the test set initially improves(as expected),
then worsens but again starts to improve andfinally saturates, which is against conventional wisdom.
This phenomenon has already been demonstrated on traditional machine
learning models [[Belikin et.al.](https://arxiv.org/pdf/1812.11118.pdf)],
[[Belkin et.al.](https://arxiv.org/pdf/1903.07571.pdf)] but more recently was also observed in complex deep learning models
[[Nakkiran et.al.](https://arxiv.org/pdf/1912.02292.pdf)].
There have also been attempts at mathematically explaining
double descent for simple linear regression settings.
### DataSets
We shall try to reproduce the limited existing results from OpenAI
[[Nakkiran et.al.](https://arxiv.org/pdf/1912.02292.pdf)]
which use commonly used datasets such as CIFAR-10, CIFAR-100, etc. on common
architectures such as ResNets, VGG, etc. We shall also try to observe this
in simple problems such as linear regression.
More importantly, we shall try to observe its variation with different kinds and
degrees of regularization as well as different noise levels. Existing works
do not consider regularization in too much detail but we think it is a critical factor
controlling the extent of double descent.