https://github.com/amazon-science/tubelet-transformer
This is an official implementation of TubeR: Tubelet Transformer for Video Action Detection
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: amazon-science
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://openaccess.thecvf.com/content/CVPR2022/supplemental/Zhao_TubeR_Tubelet_Transformer_CVPR_2022_supplemental.pdf
- Size: 9.42 MB
Statistics
- Stars: 79
- Watchers: 1
- Forks: 20
- Open Issues: 15
- Releases: 0
Metadata Files
README.md
TubeR: Tubelet Transformer for Video Action Detection
This repo contains the supporting code to reproduce the spatio-temporal action detection results of TubeR: Tubelet Transformer for Video Action Detection.
Updates
08/08/2022 Initial commits
Results and Models
AVA 2.1 Dataset
| Backbone | Pretrain | #view | mAP | FLOPs | config | model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| CSN-50 | Kinetics-400 | 1 view | 27.2 | 78G | config | S3 |
| CSN-50 (with long-term context) | Kinetics-400 | 1 view | 28.8 | 78G | config | Coming soon |
| CSN-152 | Kinetics-400+IG65M | 1 view | 29.7 | 120G | config | S3 |
| CSN-152 (with long-term context) | Kinetics-400+IG65M | 1 view | 31.7 | 120G | config | Coming soon |
AVA 2.2 Dataset
| Backbone | Pretrain | #view | mAP | FLOPs | config | model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| CSN-152 | Kinetics-400+IG65M | 1 view | 31.1 | 120G | config | S3 |
| CSN-152 (with long-term context) | Kinetics-400+IG65M | 1 view | 33.4 | 120G | config | Coming soon |
JHMDB Dataset
| Backbone | #view | mAP@0.2 | mAP@0.5 | config | model |
| :---: | :---: | :---: | :---: | :---: | :---: |
| CSN-152 | 1 view | 87.4 | 82.3 | config | S3 |
Usage
The project is built on GluonCV-torch. Please refer to its tutorial for details.
Dependency
The project has been tested with:
- Torch 1.12 + CUDA 11.3
- timm==0.4.5
- tensorboardX
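To compare an environment against the versions listed above, a small check can help. This is a minimal sketch that is not part of the repository: `check_dependencies` is a hypothetical helper, and it assumes the packages are installed under the distribution names `torch`, `timm`, and `tensorboardX`. It does not check the CUDA version.

```python
from importlib import metadata

# Versions from the dependency list; None means no specific version is stated.
# The distribution names are assumptions about how the packages were installed.
REQUIRED = {"torch": "1.12", "timm": "0.4.5", "tensorboardX": None}


def check_dependencies(required=REQUIRED, get_version=metadata.version):
    """Return {package: status} comparing installed versions with the tested ones."""
    report = {}
    for name, wanted in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Accept exact matches, patch releases ("1.12.1") and local tags ("1.12+cu113").
        matches = (
            wanted is None
            or found == wanted
            or found.startswith(wanted + ".")
            or found.startswith(wanted + "+")
        )
        if matches:
            report[name] = f"ok ({found})"
        else:
            report[name] = f"version mismatch: found {found}, tested with {wanted}"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(f"{name}: {status}")
```

A mismatch is only a hint: other versions may work, but they are not the ones the project reports testing.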
Dataset
Please download asset.zip and unzip it into ./datasets.
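The unzip step can also be scripted. The following is a minimal sketch using only the Python standard library; `extract_assets` is a hypothetical helper, and it assumes `asset.zip` has already been downloaded into the current working directory.

```python
import zipfile
from pathlib import Path


def extract_assets(archive="asset.zip", target="./datasets"):
    """Extract `archive` into `target` and return the names of the archive members."""
    target_dir = Path(target)
    # Create ./datasets (and any parents) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


if __name__ == "__main__":
    if Path("asset.zip").exists():
        for name in extract_assets():
            print(name)
    else:
        print("asset.zip not found in the current directory")
```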
- [AVA] Please refer to DATASET.md for downloading and pre-processing the AVA dataset.
- [JHMDB] Please refer to JHMDB for the JHMDB dataset and to the Dataset Section for the UCF dataset. You can also refer to ACT-Detector to prepare the two datasets.
Inference
To run inference, first modify the config file:
- Set WORLD_SIZE, GPU_WORLD_SIZE, DIST_URL, and WOLRD_URLS according to your experiment setup.
- Set LABEL_PATH, ANNO_PATH, and DATA_PATH to the corresponding local directories.
- Download the pre-trained model and set PRETRAINED_PATH to the model path.
- Make sure LOAD and LOAD_FC are set to True.
Then run:
```
# run testing
python3 eval_tuber_ava.py <CONFIG_FILE>

# for example, to evaluate on AVA 2.1, run:
python3 eval_tuber_ava.py configuration/TubeR_CSN152_AVA21.yaml
```
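Before launching evaluation, the inference checklist can be sanity-checked programmatically. This is a minimal sketch that is not part of the repository: `find_key` and `check_inference_config` are hypothetical helpers, and the grouping keys (`MODEL`, `DATA`) and paths in the example dictionary are made up. Only the option names (`LOAD`, `LOAD_FC`, `LABEL_PATH`, `ANNO_PATH`, `DATA_PATH`, `PRETRAINED_PATH`) come from the steps above. A real run would first load the YAML config into a dictionary.

```python
def find_key(config, key):
    """Return the first value stored under `key` anywhere in a nested dict."""
    if key in config:
        return config[key]
    for value in config.values():
        if isinstance(value, dict):
            try:
                return find_key(value, key)
            except KeyError:
                continue
    raise KeyError(key)


def check_inference_config(config):
    """Return a list of problems; an empty list means the checklist passes."""
    problems = []
    # Evaluation loads a released checkpoint, so both flags must be True.
    for flag in ("LOAD", "LOAD_FC"):
        try:
            if find_key(config, flag) is not True:
                problems.append(f"{flag} should be True for inference")
        except KeyError:
            problems.append(f"{flag} is missing")
    # These entries must be filled in with local paths.
    for path_key in ("LABEL_PATH", "ANNO_PATH", "DATA_PATH", "PRETRAINED_PATH"):
        try:
            if not find_key(config, path_key):
                problems.append(f"{path_key} is empty")
        except KeyError:
            problems.append(f"{path_key} is missing")
    return problems


# Hypothetical config: grouping keys and paths are for illustration only.
example = {
    "MODEL": {"PRETRAINED_PATH": "/models/tuber.pth", "LOAD": True, "LOAD_FC": False},
    "DATA": {"LABEL_PATH": "/data/labels", "ANNO_PATH": "/data/anno", "DATA_PATH": ""},
}

if __name__ == "__main__":
    for problem in check_inference_config(example):
        print(problem)
```

Searching by key name rather than by a fixed path keeps the sketch independent of how the config file groups its options.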
Training
To train TubeR from scratch, first modify the config file:
- Set WORLD_SIZE, GPU_WORLD_SIZE, DIST_URL, and WOLRD_URLS according to your experiment setup.
- Set LABEL_PATH, ANNO_PATH, and DATA_PATH to the corresponding local directories.
- Download the pre-trained feature backbone and transformer weights and set PRETRAIN_BACKBONE_DIR (CSN50, CSN152) and PRETRAIN_TRANSFORMER_DIR (DETR) accordingly.
- Make sure LOAD and LOAD_FC are set to False.
Then run:
```
# run training from scratch
python3 train_tuber.py <CONFIG_FILE>

# for example, to train ava from scratch, run:
python3 train_tuber_ava.py configuration/TubeR_CSN152_AVA21.yaml
```
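The training settings differ from the inference ones only in a few options, so a training config can be derived from an existing one. This is a minimal sketch that is not part of the repository: `set_key` and `make_training_config` are hypothetical helpers, and the `MODEL` grouping and weight paths in the example are assumptions. Only the option names come from the steps above.

```python
import copy


def set_key(config, key, value):
    """Set every occurrence of `key` in a nested dict in place; return how many were set."""
    count = 0
    for name in list(config):
        if name == key:
            config[name] = value
            count += 1
        elif isinstance(config[name], dict):
            count += set_key(config[name], key, value)
    return count


def make_training_config(base, backbone_dir, transformer_dir):
    """Return a copy of `base` with the from-scratch training options applied."""
    config = copy.deepcopy(base)  # leave the caller's config untouched
    overrides = {
        "LOAD": False,      # do not load a full TubeR checkpoint
        "LOAD_FC": False,
        "PRETRAIN_BACKBONE_DIR": backbone_dir,        # CSN50 / CSN152 weights
        "PRETRAIN_TRANSFORMER_DIR": transformer_dir,  # DETR weights
    }
    missing = [key for key, value in overrides.items() if set_key(config, key, value) == 0]
    if missing:
        raise KeyError(f"keys not found in config: {missing}")
    return config


# Hypothetical base config: the grouping key is for illustration only.
base = {
    "MODEL": {
        "LOAD": True,
        "LOAD_FC": True,
        "PRETRAIN_BACKBONE_DIR": "",
        "PRETRAIN_TRANSFORMER_DIR": "",
    }
}

if __name__ == "__main__":
    print(make_training_config(base, "/weights/csn152", "/weights/detr"))
```

Raising on missing keys is a deliberate choice here: a silently ignored override would leave `LOAD` or `LOAD_FC` at their previous values.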
TODO
- [ ] Add tutorial and pre-trained weights for TubeR with long-term memory
- [ ] Add weights for UCF24
Citing TubeR
@inproceedings{zhao2022tuber,
title={TubeR: Tubelet transformer for video action detection},
author={Zhao, Jiaojiao and Zhang, Yanyi and Li, Xinyu and Chen, Hao and Shuai, Bing and Xu, Mingze and Liu, Chunhui and Kundu, Kaustav and Xiong, Yuanjun and Modolo, Davide and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={13598--13607},
year={2022}
}
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
- Watch event: 11
- Issue comment event: 3
- Fork event: 2
Last Year
- Watch event: 11
- Issue comment event: 3
- Fork event: 2
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 38
- Total pull requests: 7
- Average time to close issues: 23 days
- Average time to close pull requests: about 10 hours
- Total issue authors: 16
- Total pull request authors: 2
- Average comments per issue: 1.39
- Average comments per pull request: 0.43
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- AlexeyG (3)
- huang-chenhai (2)
- wenzhengzeng (2)
- lemonheadboy (1)
- jinsingsangsung (1)
- DanLuoNEU (1)
- quangtn266 (1)
- furqanabid412 (1)
- hongminglin08 (1)
- Tsunehiko (1)
- sqiangcao99 (1)
- DCBXZ66 (1)
- ykyk000 (1)
- sibonjia (1)
- yassouali (1)
Pull Request Authors
- coocoo90 (4)
- salmank255 (1)