stnls

Space-Time Attention with a Shifted Non-Local Search

https://github.com/gauenk/stnls

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary

Keywords

autograd cuda differentiable non-local non-local-search pytorch video
Last synced: 6 months ago

Repository

Space-Time Attention with a Shifted Non-Local Search

Basic Info
  • Host: GitHub
  • Owner: gauenk
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 22.2 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
autograd cuda differentiable non-local non-local-search pytorch video
Created almost 4 years ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

Space-Time Non-Local Search (stnls)

A PyTorch-friendly C++/CUDA library to support Space-Time Attention with a Shifted Non-Local Search. The shifted non-local search corrects the small spatial inaccuracies of predicted, long-range offsets such as optical flow (as in Guided Deformable Attention).

[arxiv]

Related Works & Module Summary

Our module corrects the small spatial errors of long-range predicted offsets to identify regions of high affinity between the query and keys within attention. The module executes a small grid search surrounding the predicted offset locations. The first figure compares our search method with other recent attention modules. The second figure outlines each conceptual step of our Shifted Non-Local Search.
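The grid search can be sketched in plain PyTorch. The snippet below is an illustrative, CPU-only sketch (single frame pair, per-pixel L2 distance, no patches or heads), not the library's CUDA kernel: for each query pixel, candidates in a `ws`×`ws` window centered at the flow-predicted location are scored, and the top-K matches are kept.

```python
import torch as th

def shifted_grid_search(q, k, flow, ws=3, K=2):
    """For each query pixel, search a ws x ws grid centered at the
    flow-predicted location in the key frame; return the top-K matches.
    q, k: (F, H, W) feature maps; flow: (2, H, W) predicted (dy, dx) offsets.
    Illustrative only; the real kernels also handle patches, heads,
    and temporal windows."""
    F_, H, W = q.shape
    r = ws // 2
    dists = th.full((ws * ws, H, W), float("inf"))
    gy, gx = th.meshgrid(th.arange(H), th.arange(W), indexing="ij")
    fy, fx = flow[0].round().long(), flow[1].round().long()
    for i, dy in enumerate(range(-r, r + 1)):
        for j, dx in enumerate(range(-r, r + 1)):
            # shifted location = base grid + predicted flow + small correction
            y = (gy + fy + dy).clamp(0, H - 1)
            x = (gx + fx + dx).clamp(0, W - 1)
            diff = q - k[:, y, x]                  # (F, H, W)
            dists[i * ws + j] = (diff ** 2).sum(0)
    top = dists.topk(K, dim=0, largest=False)      # smallest L2 distances
    return top.values, top.indices

# with identical frames and zero flow, the centered candidate matches exactly
q = th.randn(4, 8, 8)
vals, idx = shifted_grid_search(q, q.clone(), th.zeros(2, 8, 8), ws=3, K=1)
```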

[Figure: comparison with related attention modules]

[Figure: conceptual steps of the Shifted Non-Local Search]

Install & Usage

```bash
git clone git@github.com:gauenk/stnls.git
cd stnls
python -m pip install -e .
```

See "example_attn.py" for usage details. Another example is below:

```python
import torch as th
import stnls

# -- init --
B, T = 1, 5             # batch size, number of frames
F, H, W = 16, 128, 128  # number of features, height, width
device = "cuda"
q_vid = th.randn((B, T, F, H, W), device=device)
k_vid = th.randn((B, T, F, H, W), device=device)
v_vid = th.randn((B, T, F, H, W), device=device)

# -- search info --
ws = 5                      # spatial window size
wt = 2                      # temporal window size; searches W_t = 2*wt+1 total frames
W_t = 2 * wt + 1
ps, K, HD = 3, 10, 2        # patch size, number of neighbors, number of heads
stride0, stride1 = 1, 0.5   # query & key stride
# predicted offsets (e.g., optical flow); zero flow keeps the search centered
flows = th.zeros((B, T, W_t - 1, 2, H, W), device=device)

# -- run search --
search = stnls.search.NonLocalSearch(ws, wt, ps, K, nheads=HD,
                                     stride0=stride0, stride1=stride1,
                                     self_action="anchor", itype="float")
dists, srch_flows = search(q_vid, k_vid, flows)

print(srch_flows.shape)  # B,HD,T,nH,nW,K,3; nH = (H-1)//stride0 + 1

# -- normalize --
weights = th.nn.functional.softmax(10 * dists, -1)

# -- aggregate --
agg = stnls.agg.WeightedPatchSum(ps=ps, stride0=stride0, itype="float")
V_out = agg(v_vid, weights, srch_flows)
print("V_out.shape: ", V_out.shape)  # B,T,F,H,W
```

Snippet of Results

Video Alignment

The Shifted Non-Local Search (Shifted-NLS) corrects the small spatial errors of predicted offsets such as optical flow. This section illustrates the significant impact of these small spatial errors through video alignment. The experiment uses the first 10 frames of the DAVIS training dataset. When searching and when computing the TV-L1 optical flow, we add a small amount of Gaussian noise (σ = 15) to simulate the uncertainty of the trained query and key values of an attention module during network training.
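The noise model used in this experiment can be reproduced in a few lines; the sketch below assumes pixel intensities on a 0-255 scale, so σ = 15 is added directly:

```python
import torch as th

def add_gaussian_noise(vid, sigma=15.0):
    """Additive white Gaussian noise on a 0-255 intensity scale
    (the noise model described in the alignment experiment)."""
    return vid + sigma * th.randn_like(vid)

vid = 255 * th.rand(1, 10, 3, 64, 64)  # B,T,F,H,W on [0, 255]
noisy = add_gaussian_noise(vid, sigma=15.0)
# the empirical noise std should be close to sigma
noise_std = float((noisy - vid).std())
```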

[Figure: video alignment results]

Upgrading Existing Space-Time Attention

We upgrade Guided Deformable Attention (GDA) with our Shifted Non-Local Search (Shifted-NLS) module to show the value of correcting the errors of predicted offsets for video denoising with RVRT. GDA requires 9 offsets for each pixel in the image. In the original network, the 9 offsets are output from a small convolution network whose input includes optical flow. Our method omits this small network and instead searches the local region surrounding the optical flow. In this experiment, the spatial window is 9x9 and the temporal window is fixed to 1 by the architecture design. The 9 most similar locations are selected to replace the offsets from the network. Table 1 shows the denoising quality improves when using our search method compared to using predicted offsets. The improvement is between 0.20 and 0.40 dB across all noise levels, an increase often attributed to an entirely new architecture.
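In outline, the replacement step scores the 81 candidates of a 9x9 window and keeps the 9 best. A minimal sketch of that selection, using a hypothetical per-pixel distance tensor in place of the library's search output (patches and heads omitted):

```python
import torch as th

# hypothetical distances over a 9x9 search window: (ws*ws, H, W)
ws, H, W = 9, 16, 16
dists = th.rand(ws * ws, H, W)

# keep the 9 most similar locations, as in the GDA upgrade
K = 9
top = dists.topk(K, dim=0, largest=False)

# convert flat window indices back to (dy, dx) offsets in [-4, 4]
dy = th.div(top.indices, ws, rounding_mode="floor") - ws // 2
dx = top.indices % ws - ws // 2
offsets = th.stack([dy, dx], dim=-1)  # (K, H, W, 2); replaces the predicted offsets
```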

[Figure: RVRT denoising results with Shifted-NLS]

Citation

If you find this work useful, please cite our paper:

```bibtex
@article{gauen2023space,
  title={Space-Time Attention with Shifted Non-Local Search},
  author={Gauen, Kent and Chan, Stanley},
  journal={arXiv},
  year={2023}
}
```

Owner

  • Name: Kent Gauen
  • Login: gauenk
  • Kind: user
  • Company: Purdue University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Gauen"
  given-names: "Kent"
  orcid: "https://orcid.org/0000-0001-9582-8318"
- family-names: "Chan"
  given-names: "Stanley"
  orcid: "https://orcid.org/0000-0001-5876-2073"
title: "Space-Time Non-Local Search"
version: 1.0.0
doi: 10.48550/arXiv.2309.16849
date-released: 2023-12-04
url: "https://github.com/gauenk/stnls"

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1