https://github.com/cheind/mingru

Torch MinGRU implementation based on "Were RNNs All We Needed?"

Repository

Basic Info
  • Host: GitHub
  • Owner: cheind
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 585 KB
Statistics
  • Stars: 13
  • Watchers: 1
  • Forks: 3
  • Open Issues: 2
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
README.md

torch-mingru

PyTorch (convolutional) MinGRU implementation based on

Feng, Leo, et al. "Were RNNs All We Needed?" (2024).

Convolutional MinGRU based on

Heindl, Christoph et al. "Convolutional MinGRU" (2024).

Features

In alignment with torch recurrent modules, mingru provides the following core modules:

- `mingru.MinGRUCell`: single-layer MinGRU
- `mingru.MinGRU`: multi-layer stacked MinGRU
- `mingru.MinConv2dGRUCell`: single-layer convolutional MinGRU
- `mingru.MinConv2dGRU`: multi-layer stacked convolutional MinGRU

Each module supports the following features (where applicable to its type):

- Parallel: Efficient log-space parallel evaluation, plus a sequential implementation for testing. Automatically dispatches to the most efficient implementation.
- Multilayer: Stack multiple MinGRU layers via the `hidden_sizes=` argument. When `len(hidden_sizes) > 1`, the output hidden states of layer $i$ are passed as inputs to layer $i+1$. Varying hidden sizes are supported.
- Dropout: Via the `dropout=` parameter; when > 0, dropout is applied to the inputs of every layer except the last.
- Residual: Residual connections between outputs of MinGRU layers via the `residual=` argument.
- Bias: Biases in linear layers can be enabled and disabled via the `bias=` argument.
- Bidirectional: Bidirectional processing can be enabled by wrapping RNNs in `mingru.Bidirectional`.
- Normalization: LayerNorm and GroupNorm between stacked MinGRUs via the `norm=` argument.
- Scripting: MinGRU is compatible with `torch.jit.script`.
- Compatibility: The interface of `mingru.*` is mostly compatible with that of `torch.nn.GRU`/`GRUCell`, except that sequence-first arguments are not supported and bidirectional processing is provided by the `mingru.Bidirectional` wrapper. Cells in mingru also accept sequence arguments to benefit from parallel computation.
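To illustrate why the parallel and sequential paths can agree, here is a minimal plain-Python sketch of the minGRU recurrence from Feng et al., $h_t = (1 - z_t)\,h_{t-1} + z_t\,\tilde{h}_t$, evaluated once step by step and once in log space. It is not mingru's implementation: the function names are hypothetical, the hidden state is a single scalar, and the log-space version is written as a loop for readability. It only relies on a running sum and a running log-sum-exp, which are both prefix operations that a tensor library can compute with parallel scans. The sketch assumes $0 < z_t < 1$, $\tilde{h}_t > 0$ and $h_0 \ge 0$.

```python
import math


def logaddexp(x, y):
    """Numerically stable log(exp(x) + exp(y)); tolerates -inf inputs."""
    m = max(x, y)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def mingru_sequential(z, h_tilde, h0):
    """Step-by-step recurrence h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t."""
    h, out = h0, []
    for z_t, c_t in zip(z, h_tilde):
        h = (1.0 - z_t) * h + z_t * c_t
        out.append(h)
    return out


def mingru_log_space(z, h_tilde, h0):
    """Same recurrence via a cumulative sum and a cumulative log-sum-exp.

    With A_t = sum_{k<=t} log(1 - z_k):
        h_t = exp(A_t) * (h0 + sum_{k<=t} z_k * h_tilde_k / exp(A_k))
    """
    a_star = 0.0  # running sum A_t
    acc = math.log(h0) if h0 > 0 else -math.inf  # running log-sum-exp
    out = []
    for z_t, c_t in zip(z, h_tilde):
        a_star += math.log1p(-z_t)
        acc = logaddexp(acc, math.log(z_t) + math.log(c_t) - a_star)
        out.append(math.exp(a_star + acc))
    return out


# Hypothetical gate and candidate values for a sequence of length 4.
z = [0.2, 0.9, 0.5, 0.7]
h_tilde = [1.0, 0.3, 2.0, 0.8]
seq = mingru_sequential(z, h_tilde, h0=0.5)
par = mingru_log_space(z, h_tilde, h0=0.5)
assert all(abs(s - p) < 1e-9 for s, p in zip(seq, par))
```

Working in log space turns the product of gate factors into a sum, which keeps long sequences numerically stable.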

Installation

```shell
# Install directly from GitHub
pip install git+https://github.com/cheind/mingru.git
```

Usage

MinGRU

The following snippet demonstrates a multi-layer stacked MinGRU.

```python
import torch
import mingru

# Instantiate
B, input_size, hidden_sizes, S = 10, 3, [32, 64], 128
rnn = mingru.MinGRU(
    input_size=input_size,
    hidden_sizes=hidden_sizes,
    dropout=0.0,
    residual=True,
).eval()

# Invoke for input x with sequence length S and batch-size B
# This will implicitly assume a 'zero' hidden state
# for each layer.
x = torch.randn(B, S, input_size)
out, h = rnn(x)
assert out.shape == (B, S, 64)
assert h[0].shape == (B, 1, 32)
assert h[1].shape == (B, 1, 64)

# Invoke with initial/previous hidden states.
h = rnn.init_hidden_state(x)
out, h = rnn(torch.randn(B, S, input_size), h=h)

# Sequential prediction pattern
h = rnn.init_hidden_state(x)
out_seq = []
for i in range(x.shape[1]):
    out, h = rnn(x[:, i : i + 1], h=h)
    out_seq.append(out)
out_seq = torch.cat(out_seq, 1)
assert out_seq.shape == (B, S, 64)

# Parallel prediction pattern
out_par, h = rnn(x, rnn.init_hidden_state(x))
assert torch.allclose(out_seq, out_par, atol=1e-4)
```

MinConv2dGRU

The following sample demonstrates a multi-layer stacked convolutional MinGRU.

```python
import torch
import mingru

B, S = 5, 10
input_size = 3
hidden_sizes = [16, 32, 64]
kernel_sizes = [3, 3, 3]
padding = 1
stride = 2

rnn = mingru.MinConv2dGRU(
    input_size=input_size,
    hidden_sizes=hidden_sizes,
    kernel_sizes=kernel_sizes,
    paddings=padding,
    strides=stride,
    dropout=0.0,
    residual=True,
).eval()

# Invoke for input x with sequence length S and batch-size B
# This will implicitly assume a 'zero' hidden state
# for each layer.
x = torch.randn(B, S, input_size, 64, 64)
out, h = rnn(x)
assert out.shape == (B, S, 64, 8, 8)
assert h[0].shape == (B, 1, 16, 32, 32)
assert h[1].shape == (B, 1, 32, 16, 16)
assert h[2].shape == (B, 1, 64, 8, 8)

# Invoke with initial/previous hidden states.
h = rnn.init_hidden_state(x)
out, h = rnn(x, h=h)

# Sequential prediction pattern
h = rnn.init_hidden_state(x)
out_seq = []
for i in range(x.shape[1]):
    out, h = rnn(x[:, i : i + 1], h=h)
    out_seq.append(out)
out_seq = torch.cat(out_seq, 1)
assert out_seq.shape == (B, S, 64, 8, 8)

# Parallel prediction pattern
out_par, h = rnn(x, rnn.init_hidden_state(x))
assert torch.allclose(out_seq, out_par, atol=1e-4)
```
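The spatial sizes in the hidden-state assertions above (64 to 32 to 16 to 8) follow from the standard convolution output-size formula with kernel size 3, padding 1 and stride 2. The helpers below are a hypothetical, torch-free sketch for checking such shapes by hand; they assume dilation 1 and are not part of mingru.

```python
def conv_output_size(size, kernel_size, padding, stride):
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


def layer_sizes(size, kernel_sizes, padding, stride):
    """Spatial size after each stacked convolutional layer."""
    sizes = []
    for k in kernel_sizes:
        size = conv_output_size(size, k, padding, stride)
        sizes.append(size)
    return sizes


print(layer_sizes(64, [3, 3, 3], padding=1, stride=2))  # [32, 16, 8]
```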

Examples

Selective Copying

For a more complete example, see examples/selective_copying.py, which learns to pick specific tokens, in order, from a generated sequence.

```shell
python -m examples.selective_copying
...
Step [1941/2000], Loss: 0.0002, Accuracy: 99.61%
Step [1961/2000], Loss: 0.0002, Accuracy: 100.00%
Step [1981/2000], Loss: 0.0002, Accuracy: 99.61%
Validation Accuracy: 100.00%
```

By default, the example is configured for a small use case (sequence length 64, vocab size 6, memorize 4), but you can switch to a much larger test by adapting the cfg dict at the end of the file.

The task is based on

Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." (2023).
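For readers unfamiliar with the task, the following sketch generates one selective-copying sample using the default sizes mentioned above (sequence length 64, vocab size 6, memorize 4). It is an illustration only, not the generator in examples/selective_copying.py: the function name is hypothetical, and reserving token 0 as the noise token is an assumption of this sketch.

```python
import random


def make_selective_copying_sample(seq_len=64, vocab_size=6, num_memorize=4, rng=None):
    """Return (sequence, target) for one hypothetical selective-copying sample.

    Token 0 is treated as noise (an assumption); tokens 1..vocab_size-1 carry data.
    """
    rng = rng or random.Random()
    # Choose where the data tokens sit, in increasing order of position.
    positions = sorted(rng.sample(range(seq_len), num_memorize))
    # The target is the data tokens in the order they appear in the sequence.
    target = [rng.randrange(1, vocab_size) for _ in range(num_memorize)]
    sequence = [0] * seq_len
    for pos, tok in zip(positions, target):
        sequence[pos] = tok
    return sequence, target


seq, tgt = make_selective_copying_sample(rng=random.Random(0))
assert [t for t in seq if t != 0] == tgt
```

A model solves the task when, after reading the whole sequence, it emits the data tokens in order while ignoring the noise tokens between them.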

Video Classification

Trains a video classification network with convolutional MinGRUs from scratch on the UCF101 train/test splits. Mimics the (first) architecture of

Ballas, Nicolas, Li Yao, Chris Pal, and Aaron Courville. "Delving deeper into convolutional networks for learning video representations." (2015).

On fold 1 this achieves a top-1 accuracy of 95% on validation and 78% on test, which replicates the results from the paper. The architecture uses a VGG16 backbone trained on ImageNet. One can expect better test results when pre-training is done on larger video action datasets.

First, register these environment variables

```shell
# Set path to UCF dataset and annotations
export UCF101PATH=/path/to/UCF/dir
export UCF101ANNPATH=/path/to/ann/dir
```

Train

```shell
python -m examples.video_classification train -f 1
...
2024-12-01 07:53:26,868: Epoch 7, Step 75961, Loss: 0.0042, Accuracy: 100.00%
2024-12-01 07:53:43,763: Epoch 7, Step 75981, Loss: 0.1159, Accuracy: 93.75%
2024-12-01 07:54:05,992: Epoch 7, Step 76000, Validation Accuracy: 99.50%, Validation Loss: 0.00
```

Test

The test protocol follows the paper: 25 clips are taken from each video, and the per-clip predictions are combined by average/majority voting.

```shell
python -m examples.video_classification test -f 1 tmp/video_classifier_best.pt
...
2024-12-01 08:19:27,585: Acc: 0.7048961511382305
2024-12-01 08:19:27,762: Acc: 0.7047927727099405
2024-12-01 08:19:27,799: Test accuracy 0.70
```
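The two ways of combining per-clip predictions named in the test protocol can be sketched as follows. This is a generic illustration with hypothetical function names and made-up probabilities, not the code in examples/video_classification; it assumes each clip yields a list of per-class probabilities.

```python
from collections import Counter


def average_vote(clip_probs):
    """Average per-class probabilities over clips, then take the argmax."""
    num_classes = len(clip_probs[0])
    mean = [sum(p[c] for p in clip_probs) / len(clip_probs) for c in range(num_classes)]
    return max(range(num_classes), key=mean.__getitem__)


def majority_vote(clip_probs):
    """Take each clip's argmax, then return the most frequent class."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in clip_probs]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical clips of one video, three classes.
clips = [
    [0.9, 0.1, 0.0],
    [0.4, 0.6, 0.0],
    [0.4, 0.6, 0.0],
]
print(average_vote(clips))   # 0: one very confident clip dominates the mean
print(majority_vote(clips))  # 1: two of three clips prefer class 1
```

The example is chosen so that the two rules disagree, which shows why the choice of voting scheme can shift the reported video-level accuracy.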

Generative Predictive Text

Trains and samples from a GPT2-like model, but uses stacked MinGRUs instead of transformers. Adapted from nanoGPT.

Train

The dataset is currently restricted to a single text file; we use Tiny-Shakespeare.

```shell
python -m examples.nlp train tmp/tinyshakespeare.txt
```

Sample

```shell
python -m examples.nlp sample --num-tokens 512 tmp/tinyshakespeare.nlp_best.pt

ISABELLA:
One of my sister must confess come,
And two spain under mine honour humbly out:
Yea, you'll be made in wicked Pompe. What, ho!
This is a gallful device shall rise.
I do beseech you, gentle my lord,
And bring him well, and nothing but my life,
But your beauty knows stands with your beauty,
In your mistress and your brother come.
...

```

Owner

  • Name: Christoph Heindl
  • Login: cheind
  • Kind: user
  • Location: Austrian area

I am a computer scientist working at the interface of perception, robotics and deep learning.

GitHub Events

Total
  • Issues event: 3
  • Watch event: 15
  • Issue comment event: 9
  • Push event: 34
  • Fork event: 4
  • Create event: 5
Last Year
  • Issues event: 3
  • Watch event: 15
  • Issue comment event: 9
  • Push event: 34
  • Fork event: 4
  • Create event: 5

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: 2 days
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 2.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 2 days
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 2.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Ttimofeyka (1)
  • yenhochen (1)
  • WhyAreYouJay (1)

Dependencies

dev_requirements.in pypi
  • numpy * development
  • pytest * development
pyproject.toml pypi
requirements.in pypi
  • torch *