Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to sciencedirect.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: dddong2
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 3.11 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
TADAM (Trust region ADAptive Moment estimation)
ℹ️ Summary:
- Tadam approximates the loss up to second order using the Fisher information matrix.
- Tadam approximates the Fisher and reduces the computational burden to $O(N)$.
- Tadam employs an adaptive trust region scheme to reduce approximation errors and guarantee stability.
- Tadam evaluates how well it minimizes the loss function and uses this information to adjust the trust region dynamically (a minimal sketch of these ideas follows below).
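To make the pieces concrete, here is a minimal numpy sketch of a Tadam-style step: an EMA gradient, a diagonal $O(N)$ Fisher-style second-moment estimate, and a trust-region cap on the step size. The update rule, constants, and state layout are illustrative assumptions, not the paper's exact Algorithm 1.

```python
import numpy as np

def tadam_like_step(theta, grad, state, lr=1e-3, beta=0.9, eps=1e-8):
    """One illustrative update combining the ideas above; NOT the paper's
    exact Algorithm 1. `state` holds the EMA gradient, the diagonal second
    moment, and the trust-region radius."""
    g_bar_prev = state.get("g_bar", np.zeros_like(theta))
    v_prev = state.get("v", np.ones_like(theta))
    delta = state.get("delta", 1.0)
    # EMA of the gradient (first moment)
    g_bar = beta * g_bar_prev + (1 - beta) * grad
    # Diagonal O(N) second-moment estimate in the spirit of
    # MA[(g_n - g_bar_{n-1})(g_n - g_bar_n)] (see the Q&A below)
    v = beta * v_prev + (1 - beta) * (grad - g_bar_prev) * (grad - g_bar)
    # Preconditioned step, capped at the trust-region radius delta
    step = lr * g_bar / (np.sqrt(np.abs(v)) + eps)
    norm = np.linalg.norm(step)
    if norm > delta:
        step *= delta / norm
    state.update(g_bar=g_bar, v=v, delta=delta)
    return theta - step, state

# Toy usage on a quadratic loss with gradient 2*theta:
theta, state = np.array([1.0, -2.0]), {}
theta, state = tadam_like_step(theta, grad=2 * theta, state=state)
```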
Experiment
- We use Tadam to train a deep auto-encoder. The training data sets are MNIST, Fashion-MNIST, CIFAR-10, and CelebA. We train each auto-encoder ten times and record the mean and standard deviation of the loss. Tadam exhibits a space and time complexity of $O(N)$, placing it on par with other widely used optimizers such as Adam, AMSGrad, RAdam, and NAdam. A sketch of this measurement protocol follows.
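A hedged sketch of the reporting protocol described above, where `train_autoencoder` is a hypothetical stand-in for the notebook's training code:

```python
import numpy as np

# Run the training ten times and record the mean and standard deviation
# of the final loss, as in the experiment description above.
def summarize(train_autoencoder, n_runs=10):
    losses = np.array([train_autoencoder(seed=i) for i in range(n_runs)])
    return losses.mean(), losses.std()

# Toy usage with a dummy "training" function:
print(summarize(lambda seed: 0.05 + 0.001 * seed))
```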
Validation loss per epoch

- Tadam converges faster than the benchmarks.
Validation loss by varying $\gamma$

- We use the hyper-parameter $\gamma \in (0, 0.25]$ to measure Tadam's training performance and to update $\delta_n$, which controls the trust-region size.
- To evaluate the impact of $\gamma$, we use $\gamma$ values of $0.1$, $0.2$, and $0.25$ while keeping the learning rate fixed at $\eta = 0.001$. We observe that Tadam maintains a relatively stable validation loss across these $\gamma$ values, suggesting that its performance is relatively insensitive to the specific choice of $\gamma$ (a sketch of such an update follows below).
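For illustration, here is a standard acceptance-ratio trust-region update of the kind described above; the exact thresholds and scaling factors are assumptions and may differ from the paper's rule.

```python
# Illustrative trust-region radius update using a classic acceptance-ratio
# rule; shrink/grow factors are conventional choices, not the paper's.
def update_delta(delta, rho, gamma=0.2, shrink=0.5, grow=2.0):
    """rho = (actual loss decrease) / (decrease predicted by the quadratic
    model); gamma in (0, 0.25] acts as the acceptance threshold."""
    if rho < gamma:            # model fit the loss poorly: shrink the region
        return shrink * delta
    if rho > 1.0 - gamma:      # model fit well: allow a larger region
        return grow * delta
    return delta               # otherwise keep the radius unchanged
```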
Q&A
Q. I don't quite understand the update equation for $v_n$ in your Algorithm 1. Why is the expression $\mathrm{MA}[(g_n - \bar{g}_{n-1})(g_n - \bar{g}_n)]$? The $\bar{g}_{n-1}$ term is a little surprising to me.
A. Initially, we searched for references on how others handle the moving average of the second moment, and we found both $\mathrm{MA}[(g_n - \bar{g}_n)(g_n - \bar{g}_n)]$ and $\mathrm{MA}[(g_n - \bar{g}_{n-1})(g_n - \bar{g}_n)]$. We experimented with both representations; the second performed better than the first, so we reported only the second in the paper. Here $g_n$ is the current gradient, $\bar{g}_n$ is the moving average including the current gradient, and $\bar{g}_{n-1}$ is the moving average excluding it. So $\mathrm{MA}[(g_n - \bar{g}_{n-1})(g_n - \bar{g}_n)]$ mixes a (backward) Nesterov-momentum moving average with the more traditional moving average. The sketch below contrasts the two estimators.
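The toy numpy sketch below contrasts the two second-moment estimators from the answer; `beta` and the synthetic gradient stream are illustrative choices, not the paper's setup.

```python
import numpy as np

def second_moments(grads, beta=0.9):
    g_bar = v_same = v_mixed = 0.0
    for g in grads:
        g_bar_prev = g_bar
        g_bar = beta * g_bar_prev + (1 - beta) * g               # EMA of g_n
        # Variant 1: MA[(g_n - g_bar_n)^2]
        v_same = beta * v_same + (1 - beta) * (g - g_bar) ** 2
        # Variant 2 (the paper's choice): MA[(g_n - g_bar_{n-1})(g_n - g_bar_n)]
        v_mixed = beta * v_mixed + (1 - beta) * (g - g_bar_prev) * (g - g_bar)
    return v_same, v_mixed

# Compare both estimates on a synthetic oscillating gradient stream:
print(second_moments(np.sin(0.1 * np.arange(100))))
```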
Q. The "for n = 1 to N" loop in Algorithm 1, does n represent the n-th sample, n-th mini-batch, or n-th epoch?
A. One typically uses Adam in the code when training a model. To use Tadam instead of Adam, just add a "t" in front of adam, i.e., change adam to tadam. That is the original intention of our algorithm, and one can interpret the for loop in Algorithm 1 in this light. In our experimental setting, however, to quickly observe the difference between Adam and Tadam, we update the model parameters once per mini-batch, as in the sketch below.
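A runnable sketch of the mini-batch reading of the loop, using `torch.optim.Adam` so it executes as-is; the commented `Tadam` line assumes an Adam-compatible constructor, which is an assumption rather than the repo's confirmed API.

```python
import torch

# Mini-batch loop illustrating "n = n-th mini-batch" in Algorithm 1.
model = torch.nn.Sequential(torch.nn.Linear(784, 32), torch.nn.Linear(32, 784))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = Tadam(model.parameters(), lr=1e-3)  # hypothetical drop-in swap

data = torch.randn(256, 784)               # toy stand-in for flattened images
for batch in data.split(32):               # one optimizer step per mini-batch
    loss = ((model(batch) - batch) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```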
Paper
https://www.sciencedirect.com/science/article/abs/pii/S089360802300504X
Owner
- Login: dddong2
- Kind: user
- Repositories: 1
- Profile: https://github.com/dddong2
Citation (CITATION.cff)
message: "If you use this software, please cite it as below." authors: - family-names: "Yang" given-names: "Donghee" orcid: "https://orcid.org/0000-0003-0734-9929" title: "TADAM(Trust region ADAptive Moment estimation)" version: 2.1.0 url: "https://github.com/dddong2/Tadam.git"
GitHub Events
Total
- Issues event: 1
- Watch event: 1
- Push event: 16
Last Year
- Issues event: 1
- Watch event: 1
- Push event: 16