https://github.com/amazon-science/long-short-term-transformer

[NeurIPS 2021 Spotlight] Official implementation of Long Short-Term Transformer for Online Action Detection

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

* ○ CITATION.cff file
* ✓ codemeta.json file: Found codemeta.json file
* ○ .zenodo.json file
* ○ DOI references
* ✓ Academic publication links: Links to: arxiv.org
* ○ Academic email domains
* ○ Institutional organization owner
* ○ JOSS paper metadata
* ○ Scientific vocabulary similarity: Low similarity (9.5%) to scientific vocabulary

Keywords

online-action-detection video-analysis video-transformer

Last synced: 9 months ago

Repository


Basic Info

Host: GitHub
Owner: amazon-science
License: apache-2.0
Language: Python
Default Branch: main
Homepage:
Size: 142 KB

Statistics

Stars: 132
Watchers: 7
Forks: 19
Open Issues: 13
Releases: 0

Topics

online-action-detection video-analysis video-transformer

Created over 4 years ago · Last pushed almost 2 years ago

Metadata Files

Readme, Contributing, License, Code of conduct

Long Short-Term Transformer for Online Action Detection

Introduction

This is a PyTorch implementation for our NeurIPS 2021 Spotlight paper "Long Short-Term Transformer for Online Action Detection".


Environment

The code was developed with CUDA 10.2, Python >= 3.7.7, and PyTorch >= 1.7.1.

0. [Optional but recommended] Create a new conda environment:
    ```
    conda create -n lstr python=3.7.7
    ```
    Then activate the environment:
    ```
    conda activate lstr
    ```

1. Install the requirements:
    ```
    pip install -r requirements.txt
    ```

Data Preparation

Option 1: Prepare the features and targets yourself.

1. Download the THUMOS'14 and TVSeries datasets.
2. Extract feature representations for video frames.

* For **ActivityNet** pretrained features, we use the [`ResNet-50`](https://arxiv.org/pdf/1512.03385.pdf) model for the RGB and optical flow inputs. We recommend using this [`checkpoint`](https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tsn/README.md#activitynet-v13) in [`MMAction2`](https://github.com/open-mmlab/mmaction2).

* For **Kinetics** pretrained features, we use the [`ResNet-50`](https://arxiv.org/pdf/1512.03385.pdf) model for the RGB inputs. We recommend using this [`checkpoint`](https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tsn/tsn_r50_320p_1x1x8_100e_kinetics400_rgb.py) in [`MMAction2`](https://github.com/open-mmlab/mmaction2). We use the [`BN-Inception`](https://arxiv.org/pdf/1502.03167.pdf) model for the optical flow inputs. We recommend using the model [`here`](https://drive.google.com/drive/folders/1Q8yf2u8YWkva-apAxW_9_TzvLGuWZaix?usp=sharing).

***Note:*** We compute the optical flow using [`DenseFlow`](https://github.com/xumingze0308/denseflow).

If you want to use our dataloaders, please organize the files in the following structure:

* THUMOS'14 dataset:
    ```
    $YOUR_PATH_TO_THUMOS_DATASET
    ├── rgb_kinetics_resnet50/
    │   ├── video_validation_0000051.npy (of size L x 2048)
    │   ├── ...
    ├── flow_kinetics_bninception/
    │   ├── video_validation_0000051.npy (of size L x 1024)
    │   ├── ...
    ├── target_perframe/
    │   ├── video_validation_0000051.npy (of size L x 22)
    │   ├── ...
    ```

* TVSeries dataset:
    ```
    $YOUR_PATH_TO_TVSERIES_DATASET
    ├── rgb_kinetics_resnet50/
    │   ├── Breaking_Bad_ep1.npy (of size L x 2048)
    │   ├── ...
    ├── flow_kinetics_bninception/
    │   ├── Breaking_Bad_ep1.npy (of size L x 1024)
    │   ├── ...
    ├── target_perframe/
    │   ├── Breaking_Bad_ep1.npy (of size L x 31)
    │   ├── ...
    ```
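
Before training, it can help to confirm that the three arrays for each video agree with the layout above. The following is a minimal sketch, not part of the repository: `check_session` and `FEATURE_DIMS` are hypothetical names, the feature widths are copied from the trees above, and the class count is 22 for THUMOS'14 and 31 for TVSeries, per the listed target shapes.

```python
import os

import numpy as np

# Feature widths taken from the directory trees above.
FEATURE_DIMS = {
    "rgb_kinetics_resnet50": 2048,
    "flow_kinetics_bninception": 1024,
}


def check_session(root: str, session: str, num_classes: int) -> int:
    """Check one video's RGB, flow, and target arrays; return their length L."""
    expected = dict(FEATURE_DIMS, target_perframe=num_classes)
    lengths = set()
    for folder, width in expected.items():
        path = os.path.join(root, folder, session + ".npy")
        # mmap_mode="r" memory-maps the file instead of loading it into RAM.
        arr = np.load(path, mmap_mode="r")
        if arr.ndim != 2 or arr.shape[1] != width:
            raise ValueError(f"{path}: expected shape (L, {width}), got {arr.shape}")
        lengths.add(arr.shape[0])
    if len(lengths) != 1:
        raise ValueError(f"{session}: arrays disagree on L: {sorted(lengths)}")
    return lengths.pop()


# Example, assuming the data/THUMOS soft link described in this README:
# check_session("data/THUMOS", "video_validation_0000051", num_classes=22)
```

All three arrays of a video must share the same L, since each row is one time step of features or per-frame targets.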

Create soft links to the datasets:
```
cd long-short-term-transformer
ln -s $YOUR_PATH_TO_THUMOS_DATASET data/THUMOS
ln -s $YOUR_PATH_TO_TVSERIES_DATASET data/TVSeries
```

Option 2: Directly download the pre-extracted features and targets from `TeSTra`.

If you want to skip data preprocessing and quickly try LSTR, please refer to TeSTra. The features and targets there follow LSTR's data structure exactly and should reproduce LSTR's performance. However, if you have any questions about how these features and targets were processed, please contact the authors of TeSTra directly.

Training

Training LSTR with 512 seconds of long-term memory and 8 seconds of short-term memory requires less than 3 GB of GPU memory.
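
The memory sizes above are given in seconds, while the model operates on feature steps. A small sketch of the conversion follows; the feature rate is an assumption (this section does not state it), and 4 FPS is used purely as an example. Whatever the rate, the long-term memory here is 512 / 8 = 64 times longer than the short-term memory.

```python
def memory_lengths(long_seconds: float, short_seconds: float, fps: float):
    """Convert memory durations in seconds into numbers of feature steps."""
    long_len = int(round(long_seconds * fps))
    short_len = int(round(short_seconds * fps))
    return long_len, short_len


# Hypothetical 4 FPS feature rate; substitute the rate of your own features.
print(memory_lengths(512, 8, fps=4))  # (2048, 32)
```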

The commands are as follows.

```
cd long-short-term-transformer
# Training from scratch
python tools/train_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES
# Finetuning from a pretrained model
python tools/train_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
    MODEL.CHECKPOINT $PATH_TO_CHECKPOINT
```
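
The trailing `MODEL.CHECKPOINT ...` pair is a config override passed on the command line. Because the project depends on yacs, we assume these `KEY VALUE` pairs are merged into the loaded config in the style of yacs' `CfgNode.merge_from_list`. The stdlib-only sketch below imitates that behavior on a plain dict; `apply_overrides` and the parser are hypothetical and are not the repository's code.

```python
import argparse
import ast
import copy


def apply_overrides(cfg: dict, opts: list) -> dict:
    """Return a copy of cfg with dotted KEY VALUE pairs applied."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    cfg = copy.deepcopy(cfg)
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            # "['train', 'test']" becomes a list; numerals become int/float.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # plain strings such as paths or "batch"
        node[leaf] = value
    return cfg


# Hypothetical command-line interface: everything after the named options
# is collected as KEY VALUE overrides.
parser = argparse.ArgumentParser()
parser.add_argument("--config_file", required=True)
parser.add_argument("--gpu", default="0")
parser.add_argument("opts", nargs=argparse.REMAINDER)
```

Unlike yacs, this sketch does not check that an override has the same type as the default value it replaces.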

Online Inference

There are three kinds of evaluation methods in our code.

First, you can set the config `SOLVER.PHASES "['train', 'test']"` during training. This process divides each test video into non-overlapping samples and makes predictions on all the frames in the short-term memory as if each of them were the latest frame. Note that this evaluation result is not the final performance, since (1) for most of the frames, the short-term memory is not fully utilized, and (2) for simplicity, samples at the boundaries are mostly ignored.

```
cd long-short-term-transformer
# Inference along with training
python tools/train_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
    SOLVER.PHASES "['train', 'test']"
```

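
To see why this estimate is pessimistic, here is a simplified sketch of non-overlapping windowing. The window length stands in for the short-term memory, the function names are hypothetical, and the sketch drops the incomplete trailing window entirely, whereas the repository only says boundary samples are "mostly" ignored.

```python
def non_overlapping_windows(num_frames: int, window: int):
    """Split [0, num_frames) into back-to-back windows of length `window`.

    A trailing remainder shorter than `window` is dropped.
    """
    return [(start, start + window)
            for start in range(0, num_frames - window + 1, window)]


def short_term_context(window_start: int, t: int) -> int:
    """Number of short-term frames (including t) visible to frame t in its window."""
    return t - window_start + 1
```

Only the last frame of each window sees a full short-term memory; on average a frame sees (W + 1) / 2 of the W frames, which is why these numbers are lower than the final results.
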
Second, you can run online inference in batch mode. This process evaluates all video frames by treating each of them as the latest frame and filling the long- and short-term memories by tracing back in time. This evaluation result matches the numbers reported in the paper, but batch mode cannot be further accelerated as described in Sec. 3.6 of the paper. On the other hand, this mode runs faster with a large batch size, and we recommend it for performance benchmarking.

```
cd long-short-term-transformer
# Online inference in batch mode
python tools/test_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
    MODEL.CHECKPOINT $PATH_TO_CHECKPOINT MODEL.LSTR.INFERENCE_MODE batch
```

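
In batch mode, each frame t is treated as the latest frame and both memories are filled by looking back in time. The sketch below shows the index arithmetic under stated assumptions: the short-term memory ends at t, the long-term memory immediately precedes it, and indices before the start of the video are clamped to frame 0. The function name and the clamping choice are hypothetical; the repository may pad or subsample differently.

```python
def memory_indices(t: int, long_len: int, short_len: int):
    """Frame indices of the long- and short-term memories for frame t."""
    short_start = t - short_len + 1
    long_start = short_start - long_len
    # Clamp indices that fall before the first frame (assumed padding).
    short_idx = [max(i, 0) for i in range(short_start, t + 1)]
    long_idx = [max(i, 0) for i in range(long_start, short_start)]
    return long_idx, short_idx
```

Because every frame gets its own independent pair of windows, many frames can be stacked into one large batch, consistent with the note that this mode runs faster with a large batch size.
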
Third, you can run online inference in stream mode. This process tests the entire video frame by frame, from beginning to end, as LSTR would be applied in real-world scenarios. This evaluation result matches both LSTR's performance and its runtime reported in the paper. However, it currently supports testing only one video at a time.

```
cd long-short-term-transformer
# Online inference in stream mode
python tools/test_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
    MODEL.CHECKPOINT $PATH_TO_CHECKPOINT MODEL.LSTR.INFERENCE_MODE stream DATA.TEST_SESSION_SET "['$VIDEO_NAME']"
```
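
Stream mode processes frames in order, so the two memories can be maintained incrementally instead of being rebuilt for every frame. A minimal sketch with `collections.deque` follows; the class is hypothetical, only the buffering is shown, and the acceleration described in Sec. 3.6 of the paper is not modeled.

```python
from collections import deque


class StreamingMemory:
    """Rolling long- and short-term memories updated one frame at a time."""

    def __init__(self, long_len: int, short_len: int):
        if short_len < 1:
            raise ValueError("short_len must be positive")
        self.short = deque(maxlen=short_len)
        self.long = deque(maxlen=long_len)

    def push(self, feature):
        # When the short-term memory is full, its oldest frame moves into
        # the long-term memory (which drops its own oldest frame if full).
        if len(self.short) == self.short.maxlen:
            self.long.append(self.short[0])
        self.short.append(feature)  # deque evicts short[0] automatically
        return list(self.long), list(self.short)
```

Each `push` costs O(1) buffer updates, which fits the one-frame-at-a-time, one-video-at-a-time nature of this mode.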

Evaluation

Evaluate LSTR's performance for online action detection using per-frame mAP or mcAP:

```
cd long-short-term-transformer
python tools/eval/eval_perframe --pred_scores_file $PRED_SCORES_FILE
```
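
For reference, here is a pure-Python sketch of the two metrics; it is not the code in `tools/eval`. Per-frame AP ranks all frames of one class by score. mcAP replaces precision with calibrated precision, w·TP / (w·TP + FP) with w = #negatives / #positives, as commonly defined for TVSeries. Which class columns to exclude (for example background) is an assumption here, exposed through the hypothetical `ignore_classes` parameter.

```python
import math
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int],
                      calibrated: bool = False) -> float:
    """Non-interpolated AP of one class over all frames.

    With calibrated=True, precision is replaced by calibrated precision
    w*TP / (w*TP + FP), where w = #negative frames / #positive frames.
    """
    num_pos = sum(1 for y in labels if y == 1)
    if num_pos == 0:
        return math.nan  # undefined: no positive frames for this class
    num_neg = len(labels) - num_pos
    if num_neg == 0:
        return 1.0  # every frame is positive, so every rank is correct
    w = num_neg / num_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    total = 0.0
    for i in order:
        if labels[i] == 1:
            tp += 1
            total += (w * tp / (w * tp + fp)) if calibrated else tp / (tp + fp)
        else:
            fp += 1
    return total / num_pos


def mean_ap(score_rows, label_rows, ignore_classes=(0,), calibrated=False):
    """Mean (c)AP over class columns; rows are frames, columns are classes."""
    aps = []
    for c in range(len(score_rows[0])):
        if c in ignore_classes:
            continue
        ap = average_precision([r[c] for r in score_rows],
                               [r[c] for r in label_rows], calibrated)
        if not math.isnan(ap):
            aps.append(ap)
    return sum(aps) / len(aps)
```

Calibration makes precision comparable across classes with different ratios of positive to negative frames.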

Evaluate LSTR's performance at different action stages by evaluating each decile (ten-percent interval) of the video frames separately.

```
cd long-short-term-transformer
python tools/eval/eval_perstage --pred_scores_file $PRED_SCORES_FILE
```
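
A sketch of how frames could be assigned to deciles. We assume stages are measured within each action instance (a contiguous run of positive frames for one class); the function names are hypothetical, and how `tools/eval/eval_perstage` groups frames and treats negatives is not shown here.

```python
def action_instances(labels):
    """Yield (start, end) for each run of consecutive positives (end exclusive)."""
    start = None
    for t, y in enumerate(labels):
        if y and start is None:
            start = t
        elif not y and start is not None:
            yield start, t
            start = None
    if start is not None:
        yield start, len(labels)


def stage_of_frames(labels, num_stages: int = 10):
    """Map each positive frame index to its stage (0 = first 10%, 9 = last 10%)."""
    stages = {}
    for start, end in action_instances(labels):
        length = end - start
        for t in range(start, end):
            stages[t] = (t - start) * num_stages // length
    return stages
```

A per-stage score can then be computed over each group of frames separately, for example with a per-frame AP routine.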

Citations

If you are using the data/code/model provided here in a publication, please cite our paper:

```
@inproceedings{xu2021long,
    title={Long Short-Term Transformer for Online Action Detection},
    author={Xu, Mingze and Xiong, Yuanjun and Chen, Hao and Li, Xinyu and Xia, Wei and Tu, Zhuowen and Soatto, Stefano},
    booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
    year={2021}
}
```

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Owner

Name: Amazon Science
Login: amazon-science
Kind: organization

Website: https://amazon.science
Twitter: AmazonScience
Repositories: 80
Profile: https://github.com/amazon-science

GitHub Events

Total

Issues event: 4
Watch event: 7
Issue comment event: 6

Last Year

Issues event: 4
Watch event: 7
Issue comment event: 6

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 48
Total pull requests: 7
Average time to close issues: about 1 month
Average time to close pull requests: about 13 hours
Total issue authors: 24
Total pull request authors: 1
Average comments per issue: 2.08
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 7

Past Year

Issues: 0
Pull requests: 5
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 5


Top Authors

Issue Authors

sqiangcao99 (3)
Prot-debug (2)
Echo0125 (2)
priyamdey (2)
007invictus (2)
568937537 (1)
junwenchen (1)
Quadwo (1)
ManuBenavent (1)
floriculture (1)
Chenhongchang (1)
wssxjtu (1)
takfate (1)
dqj5182 (1)
jbistanbul (1)

Pull Request Authors

dependabot[bot] (7)

Top Labels

Issue Labels

Pull Request Labels

dependencies (7)

Dependencies

requirements.txt pypi

opencv-python ==4.4.0.46
scikit-image ==0.16.2
scikit-learn ==0.23.1
torch ==1.7.1
torchvision ==0.8.2
tqdm ==4.44.1
yacs ==0.1.8
