frog

Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

https://github.com/helmholtz-ai-energy/frog

Science Score: 85.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
    Organization helmholtz-ai-energy has institutional domain (www.helmholtz.ai)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.4%) to scientific vocabulary
Last synced: 6 months ago

Repository

Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

Basic Info
  • Host: GitHub
  • Owner: Helmholtz-AI-Energy
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 337 KB
Statistics
  • Stars: 5
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

Forward gradients are an approach to approximate gradients from directional derivatives along random tangents. Multi-tangent forward gradients improve this approximation by aggregating over multiple tangents.

This repository provides experimental code to analyze multi-tangent forward gradients along with instructions on how to reproduce the results of our paper "Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients".

For more details, check out our preprint: https://arxiv.org/abs/2410.17764

If you find this repository useful, please cite our paper as:

Flügel, Katharina, Daniel Coquelin, Marie Weiel, Achim Streit, and Markus Götz. 2024. “Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients.” arXiv. https://doi.org/10.48550/arXiv.2410.17764.

Intro to Multi-Tangent Forward Gradients

Gradient descent is a powerful method for optimizing differentiable functions and drives the training of most modern neural networks. The necessary gradients are typically computed using backpropagation. However, backpropagation's alternating forward and backward passes make it biologically implausible and hinder parallelization.

What are Forward Gradients?

Forward gradients are a way to approximate the gradient using forward-mode automatic differentiation (AD) along a specific tangent direction. This is far more efficient than full forward-mode AD, which requires one pass per parameter, while still avoiding backward passes through the model.

These forward gradients have several useful properties, for example:

- For a random tangent $v \sim \mathcal{N}(0, I_n)$, the forward gradient is an unbiased estimator of the gradient.
- A forward gradient is always a descent direction.
- A forward gradient is always (anti-)parallel to its corresponding tangent.
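These properties follow from the definition $g_v = (\nabla f \cdot v)\, v$, where the directional derivative $\nabla f \cdot v$ comes from a single forward-mode AD pass. The sketch below illustrates this with a minimal hand-rolled dual-number AD (illustrative only, not the repository's implementation, which uses PyTorch):

```python
import numpy as np

# Minimal forward-mode AD via dual numbers: each Dual carries a value and
# its directional derivative along one tangent.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule for the derivative component
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def directional_derivative(f, x, v):
    # One forward pass with duals yields nabla f(x) . v
    duals = [Dual(xi, vi) for xi, vi in zip(x, v)]
    return f(duals).dot

def forward_gradient(f, x, v):
    # g_v = (nabla f . v) * v  -- always (anti-)parallel to v
    return directional_derivative(f, x, v) * v

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
v = rng.standard_normal(5)               # random tangent ~ N(0, I_n)
f = lambda z: sum(zi * zi for zi in z)   # f(x) = ||x||^2, gradient 2x
g = forward_gradient(f, x, v)
```

For this test function the true gradient is $2x$, so `g` equals $(2 x \cdot v)\, v$, and its inner product with the gradient is $(2 x \cdot v)^2 \geq 0$.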

The following figure illustrates a forward gradient $g_v$ for the tangent $v$ and the gradient $\nabla f$.

[Figure: A single-tangent forward gradient]

However, as the dimension increases, the variance of the forward gradient grows and the approximation quality degrades. Essentially, the forward gradient reduces the high-dimensional gradient to a single direction, that of the tangent. The quality of this approximation therefore depends strongly on how close the chosen tangent is to the gradient. In high-dimensional spaces, a random tangent is unlikely to be close to the gradient and is in fact expected to be near-orthogonal to it.

Multi-Tangent Forward Gradients

Multi-tangent forward gradients aggregate the forward gradients over multiple tangents. This improves the approximation quality and enables the optimization of higher-dimensional problems. This repository offers several aggregation approaches, from simple sums and averages to the provably most accurate orthogonal projection.

The following figure illustrates the orthogonal projection $P_U(\nabla f)$ of a gradient $\nabla f$ onto the subspace $U$ spanned by the tangents $v_1$ and $v_2$.

[Figure: The forward gradient as orthogonal projection]
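Concretely, stacking $k$ tangents as columns of $V \in \mathbb{R}^{n \times k}$, the projection $P_U(\nabla f) = V (V^\top V)^{-1} V^\top \nabla f$ needs only the $k$ directional derivatives $V^\top \nabla f$, one per forward pass. A minimal sketch (function names are illustrative, not the repository's API):

```python
import numpy as np

def project_onto_tangents(dir_derivs, V):
    # V: (n, k) tangents as columns; dir_derivs: (k,) values nabla f . v_i
    # obtained from k forward-mode passes. The orthogonal projection
    # P_U(grad) = V (V^T V)^{-1} V^T grad is the closest point to the
    # gradient within span(V).
    coeffs = np.linalg.solve(V.T @ V, dir_derivs)
    return V @ coeffs

rng = np.random.default_rng(0)
n, k = 50, 4
grad = rng.standard_normal(n)      # true gradient (known here for the demo)
V = rng.standard_normal((n, k))    # k random tangents
approx = project_onto_tangents(V.T @ grad, V)
```

By construction the residual `grad - approx` is orthogonal to every tangent, and with $k = n$ linearly independent tangents the projection recovers the gradient exactly.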

Installation

Our experiments were implemented using Python 3.9; newer versions of Python might work but have not yet been tested. We recommend creating a new virtual environment. Then install the requirements from requirements.txt, e.g. with

```bash
pip install -r requirements.txt
```

Usage

Reproducing our results

Here, we give instructions on how to reproduce the experimental results presented in Section 4. The outputs of all our experiments are automatically saved to results/ and ordered by date and experiment. All the following scripts come with a command-line interface. Use --help to find out more about additional parameters, e.g. to reduce the number of samples, seeds, or epochs.

Approximation Quality (Section 4.1)

To evaluate the cosine similarity and norm of the forward gradients compared to the true gradient $\nabla f$, call

```bash
PYTHONPATH=. python approximation_quality.py
```

You can reduce the number of samples via the --num_samples option to get faster results.

Optimization of Closed-Form Functions (Section 4.2)

To reproduce the optimization of the closed-form functions, set <function> to sphere, rosenbrock, or styblinski-tang and call

```bash
PYTHONPATH=. python function_optimization/math_experiments.py --function <function> math_experiments
```

This runs the optimization for all gradient approaches and all dimensions $n$, automatically reading the corresponding learning rate from lrs/math.csv.

Using Custom Tangents (Section 4.3)

To reproduce the approximation quality and optimization results for tangents with specific angles to the first tangent, call

```bash
PYTHONPATH=. python approximation_quality.py --tangents angle --angles 15 30 45 60 75 90
```

and

```bash
PYTHONPATH=. python function_optimization/math_experiments.py --function styblinski-tang custom_tangents
```

Neural Network Training (Sections 4.4 and 4.5)

The script nn_training/train.py provides the interface to train neural networks with different gradient computations. It automatically downloads the datasets to data/.

To specify the gradient, use

- <GRADIENT>=bp for the true gradient $\nabla f$ obtained via backpropagation
- <GRADIENT>=fg and <K>=1 for the single-tangent forward gradient baseline $g_v$
- <GRADIENT>=fg and <K> in 2, 4, 16 for the multi-tangent forward gradient with mean aggregation $\overline{g_V}$
- <GRADIENT>=frog and <K> in 2, 4, 16 for the multi-tangent forward gradient with orthogonal projection $P_U$

The learning rate <LR> should be set according to the tables given in the appendix or to lrs/fc_nns.csv and lrs/sota_nns.csv. The random seed is set via <SEED>; we used seeds 0 to 2. Pass --device cuda to use the GPU.

For the fully-connected neural networks trained in Section 4.4, use

```bash
PYTHONPATH=. python nn_training/train.py --model fc --model_hidden_size <WIDTH> --experiment_id fc_nn --output_name fc_w<WIDTH> --gradient_computation <GRADIENT> --num_directions <K> --initial_lr <LR> --seed <SEED>
```

with the hidden layer width <WIDTH> set to 256, 1024, or 4096.

For ResNet18 and ViT trained in Section 4.5, use

```bash
PYTHONPATH=. python nn_training/train.py --model <MODEL> --dataset <DATASET> --experiment_id sota_nn --output_name <MODEL>_<DATASET> --gradient_computation <GRADIENT> --num_directions <K> --initial_lr <LR> --seed <SEED>
```

with <MODEL> set to resnet18 or vit and <DATASET> set to mnist or cifar10.

Learning Rate Search

Code and instructions to run the learning rate search are given in lr_search/. The learning rates determined by the search are given as CSV files in lrs/.

Owner

  • Name: Helmholtz AI Energy
  • Login: Helmholtz-AI-Energy
  • Kind: organization
  • Email: consultant-helmholtz.ai@kit.edu
  • Location: Karlsruhe, Germany

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: >-
  Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients
type: software
url: http://arxiv.org/abs/2410.17764
repository-code: https://github.com/Helmholtz-AI-Energy/frog
authors:
  - family-names: Flügel
    given-names: Katharina
  - family-names: Coquelin
    given-names: Daniel
  - family-names: Weiel
    given-names: Marie
  - family-names: Streit
    given-names: Achim
  - family-names: Götz
    given-names: Markus
preferred-citation:
  type: article
  title: >-
    Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients
  abstract: >-
    The gradients used to train neural networks are typically computed using backpropagation. While an efficient way
    to obtain exact gradients, backpropagation is computationally expensive, hinders parallelization, and is
    biologically implausible. Forward gradients are an approach to approximate the gradients from directional
    derivatives along random tangents computed by forward-mode automatic differentiation. So far, research has focused
    on using a single tangent per step. This paper provides an in-depth analysis of multi-tangent forward gradients
    and introduces an improved approach to combining the forward gradients from multiple tangents based on orthogonal
    projections. We demonstrate that increasing the number of tangents improves both approximation quality and
    optimization performance across various tasks.
  keywords:
    - Computer Science - Artificial Intelligence
    - Computer Science - Machine Learning
  authors:
    - family-names: Flügel
      given-names: Katharina
    - family-names: Coquelin
      given-names: Daniel
    - family-names: Weiel
      given-names: Marie
    - family-names: Streit
      given-names: Achim
    - family-names: Götz
      given-names: Markus
  doi: 10.48550/arXiv.2410.17764
  year: 2024

GitHub Events

Total
  • Watch event: 5
  • Public event: 1
  • Push event: 5
Last Year
  • Watch event: 5
  • Public event: 1
  • Push event: 5

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 15
  • Total Committers: 1
  • Avg Commits per committer: 15.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 15
  • Committers: 1
  • Avg Commits per committer: 15.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Katharina Fluegel k****l@k****u 15
Committer Domains (Top 20 + Academic)
kit.edu: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

lr_search/requirements.txt pypi
  • einops *
  • mpi4py *
  • numpy *
  • pandas *
  • prefixed *
  • torch *
  • torchvision *
  • tqdm *
requirements.txt pypi
  • einops *
  • numpy *
  • pandas *
  • prefixed *
  • torch *
  • torchvision *
  • tqdm *