Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to sciencedirect.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: dddong2
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 3.11 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
TADAM (Trust region ADAptive Moment estimation)
ℹ️ Summary:
- Tadam approximates the loss up to second order using the Fisher information matrix.
- Tadam approximates the Fisher and reduces the computational burden to $O(N)$.
- Tadam employs an adaptive trust region scheme to reduce approximation errors and guarantee stability.
- Tadam evaluates how well it minimizes the loss function and uses this information to adjust the trust region dynamically (a minimal sketch of these ideas follows below).
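To make the pieces concrete, here is a minimal numpy sketch of a Tadam-style step: an EMA gradient, a diagonal $O(N)$ Fisher-style second-moment estimate, and a trust-region cap on the step size. The update rule, constants, and state layout are illustrative assumptions, not the paper's exact Algorithm 1.

```python
import numpy as np

def tadam_like_step(theta, grad, state, lr=1e-3, beta=0.9, eps=1e-8):
    """One illustrative update combining the ideas above; NOT the paper's
    exact Algorithm 1. `state` holds the EMA gradient, the diagonal second
    moment, and the trust-region radius."""
    g_bar_prev = state.get("g_bar", np.zeros_like(theta))
    v_prev = state.get("v", np.ones_like(theta))
    delta = state.get("delta", 1.0)
    # EMA of the gradient (first moment)
    g_bar = beta * g_bar_prev + (1 - beta) * grad
    # Diagonal O(N) second-moment estimate in the spirit of
    # MA[(g_n - g_bar_{n-1})(g_n - g_bar_n)] (see the Q&A below)
    v = beta * v_prev + (1 - beta) * (grad - g_bar_prev) * (grad - g_bar)
    # Preconditioned step, capped at the trust-region radius delta
    step = lr * g_bar / (np.sqrt(np.abs(v)) + eps)
    norm = np.linalg.norm(step)
    if norm > delta:
        step *= delta / norm
    state.update(g_bar=g_bar, v=v, delta=delta)
    return theta - step, state

# Toy usage on a quadratic loss with gradient 2*theta:
theta, state = np.array([1.0, -2.0]), {}
theta, state = tadam_like_step(theta, grad=2 * theta, state=state)
```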
Experiment
- We use Tadam to train a deep auto-encoder. The training data sets are MNIST, Fashion-MNIST, CIFAR-10, and CelebA. We train each auto-encoder ten times and record the mean and standard deviation of the loss. Tadam exhibits a space and time complexity of $O(N)$, placing it on par with other widely used optimizers such as Adam, AMSGrad, RAdam, and NAdam. A sketch of this measurement protocol follows.
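A hedged sketch of the reporting protocol described above, where `train_autoencoder` is a hypothetical stand-in for the notebook's training code:

```python
import numpy as np

# Run the training ten times and record the mean and standard deviation
# of the final loss, as in the experiment description above.
def summarize(train_autoencoder, n_runs=10):
    losses = np.array([train_autoencoder(seed=i) for i in range(n_runs)])
    return losses.mean(), losses.std()

# Toy usage with a dummy "training" function:
print(summarize(lambda seed: 0.05 + 0.001 * seed))
```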
Validation loss per epoch

- Tadam converges faster than the benchmarks.
Validation loss by varying $\gamma$

- We use the hyper-parameter $\gamma \in (0, 0.25]$ to measure Tadam's training performance and to update $\delta_n$, which controls the trust-region size.
- To evaluate the impact of $\gamma$, we use $\gamma$ values of $0.1$, $0.2$, and $0.25$ while keeping the learning rate fixed at $\eta = 0.001$. We observe that Tadam maintains a relatively stable validation loss across these $\gamma$ values, suggesting that its performance is relatively insensitive to the specific choice of $\gamma$ (a sketch of such an update follows below).
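For illustration, here is a standard acceptance-ratio trust-region update of the kind described above; the exact thresholds and scaling factors are assumptions and may differ from the paper's rule.

```python
# Illustrative trust-region radius update using a classic acceptance-ratio
# rule; shrink/grow factors are conventional choices, not the paper's.
def update_delta(delta, rho, gamma=0.2, shrink=0.5, grow=2.0):
    """rho = (actual loss decrease) / (decrease predicted by the quadratic
    model); gamma in (0, 0.25] acts as the acceptance threshold."""
    if rho < gamma:            # model fit the loss poorly: shrink the region
        return shrink * delta
    if rho > 1.0 - gamma:      # model fit well: allow a larger region
        return grow * delta
    return delta               # otherwise keep the radius unchanged
```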
Q&A
Q. I don't quite understand the update equation for $v_n$ in your Algorithm 1. Why is the expression $\mathrm{MA}[(g_n - \bar{g}_{n-1})(g_n - \bar{g}_n)]$? The $\bar{g}_{n-1}$ term is a little surprising to me.
A. Initially, we searched for references on how others handle the moving average of the second moment, and we found both $\mathrm{MA}[(g_n - \bar{g}_n)(g_n - \bar{g}_n)]$ and $\mathrm{MA}[(g_n - \bar{g}_{n-1})(g_n - \bar{g}_n)]$. We experimented with both representations; the second performed better than the first, so we reported only the second in the paper. Here $g_n$ is the current gradient, $\bar{g}_n$ is the moving average including the current gradient, and $\bar{g}_{n-1}$ is the moving average excluding it. So $\mathrm{MA}[(g_n - \bar{g}_{n-1})(g_n - \bar{g}_n)]$ mixes a (backward) Nesterov-momentum moving average with the more traditional moving average. The sketch below contrasts the two estimators.
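The toy numpy sketch below contrasts the two second-moment estimators from the answer; `beta` and the synthetic gradient stream are illustrative choices, not the paper's setup.

```python
import numpy as np

def second_moments(grads, beta=0.9):
    g_bar = v_same = v_mixed = 0.0
    for g in grads:
        g_bar_prev = g_bar
        g_bar = beta * g_bar_prev + (1 - beta) * g               # EMA of g_n
        # Variant 1: MA[(g_n - g_bar_n)^2]
        v_same = beta * v_same + (1 - beta) * (g - g_bar) ** 2
        # Variant 2 (the paper's choice): MA[(g_n - g_bar_{n-1})(g_n - g_bar_n)]
        v_mixed = beta * v_mixed + (1 - beta) * (g - g_bar_prev) * (g - g_bar)
    return v_same, v_mixed

# Compare both estimates on a synthetic oscillating gradient stream:
print(second_moments(np.sin(0.1 * np.arange(100))))
```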
Q. The "for n = 1 to N" loop in Algorithm 1, does n represent the n-th sample, n-th mini-batch, or n-th epoch?
A. One typically uses Adam in the code when training a model. To use Tadam instead of Adam, just add a "t" in front of adam, i.e., change adam to tadam. That is the original intention of our algorithm, and one can interpret the for loop in Algorithm 1 in this light. In our experimental setting, however, to quickly observe the difference between Adam and Tadam, we update the model parameters once per mini-batch, as in the sketch below.
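A runnable sketch of the mini-batch reading of the loop, using `torch.optim.Adam` so it executes as-is; the commented `Tadam` line assumes an Adam-compatible constructor, which is an assumption rather than the repo's confirmed API.

```python
import torch

# Mini-batch loop illustrating "n = n-th mini-batch" in Algorithm 1.
model = torch.nn.Sequential(torch.nn.Linear(784, 32), torch.nn.Linear(32, 784))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = Tadam(model.parameters(), lr=1e-3)  # hypothetical drop-in swap

data = torch.randn(256, 784)               # toy stand-in for flattened images
for batch in data.split(32):               # one optimizer step per mini-batch
    loss = ((model(batch) - batch) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```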
Paper
https://www.sciencedirect.com/science/article/abs/pii/S089360802300504X
Owner
- Login: dddong2
- Kind: user
- Repositories: 1
- Profile: https://github.com/dddong2
Citation (CITATION.cff)
message: "If you use this software, please cite it as below." authors: - family-names: "Yang" given-names: "Donghee" orcid: "https://orcid.org/0000-0003-0734-9929" title: "TADAM(Trust region ADAptive Moment estimation)" version: 2.1.0 url: "https://github.com/dddong2/Tadam.git"
GitHub Events
Total
- Issues event: 1
- Watch event: 1
- Push event: 16
Last Year
- Issues event: 1
- Watch event: 1
- Push event: 16