uniformerv2
[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: innat
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 2.99 MB
Statistics
- Stars: 7
- Watchers: 2
- Forks: 3
- Open Issues: 1
- Releases: 2
Metadata Files
README.md
UniFormerV2

UniFormerV2 is a generic paradigm for building a powerful family of video networks by arming pre-trained ViTs with efficient UniFormer designs. It achieves state-of-the-art recognition performance on 8 popular video benchmarks, including the scene-related Kinetics-400/600/700 and Moments in Time, the temporal-related Something-Something V1/V2, and the untrimmed ActivityNet and HACS. In particular, it is the first model to achieve 90% top-1 accuracy on Kinetics-400.
This is an unofficial Keras implementation of UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer. The official PyTorch code is here.
News
- [24-10-2023]: The Kinetics-400 test set is available on Kaggle, link.
- [20-10-2023]: Fine-tuning on GPU(s) and TPU-VMs is supported, colab.
- [19-10-2023]: UFV2 checkpoints for HACS became available, link.
- [19-10-2023]: UFV2 checkpoints for ActivityNet became available, link.
- [18-10-2023]: UFV2 checkpoints for Moments in Time became available, link.
- [18-10-2023]: UFV2 checkpoints for K710 became available, link.
- [17-10-2023]: UFV2 checkpoints for SSV2 became available, link.
- [17-10-2023]: UFV2 checkpoints for Kinetics-600/700 became available, link.
- [16-10-2023]: UFV2 checkpoints for Kinetics-400 became available, link.
- [15-10-2023]: The Keras code of UniFormerV2 (UFV2) became available.
Install
```shell
git clone https://github.com/innat/UniFormerV2.git
cd UniFormerV2
pip install -e .
```
Usage
The UniFormerV2 checkpoints are available in both SavedModel and H5 formats for a total of 8 datasets: Kinetics-400/600/700/710, Something-Something V2, Moments in Time V1, ActivityNet and HACS. The model comes in base and large variants, and each variant may have further versions with different input sizes and numbers of input frames, giving around 35 checkpoints in total. Check this release and the model zoo page for details, and see model_configs.py for an overview of the available model configs. Some highlights follow.
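The config names used in the examples (e.g. `K400B168x224`, `ANETL1416x224`) appear to pack the dataset, variant, ViT patch size, frame count and input resolution into one string. The sketch below decodes them under that assumption; the pattern, the B/L meaning and the helper `parse_config_name` are inferred for illustration only, so treat model_configs.py as the authoritative list.

```python
import re

# ASSUMED naming pattern (inferred from the example names, not documented):
# <dataset><variant><patch><frames>x<resolution>, underscores optional,
# e.g. 'K400B168x224' -> Kinetics-400, ViT-B/16, 8 frames, 224 px.
_NAME_RE = re.compile(
    r"^(?P<dataset>[A-Z0-9]+?)_?"       # dataset tag, e.g. K400, SSV2, ANET
    r"(?P<variant>[BL])"                # B = base, L = large
    r"(?P<patch>1[46])_?"               # ViT patch size (16 or 14)
    r"(?P<frames>\d+)x(?P<size>\d+)$"   # frames x spatial resolution
)

VARIANTS = {"B": "base", "L": "large"}


def parse_config_name(name: str) -> dict:
    """Split a config name such as 'K400B168x224' into its assumed parts."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized config name: {name!r}")
    parts = match.groupdict()
    return {
        "dataset": parts["dataset"],
        "variant": VARIANTS[parts["variant"]],
        "patch_size": int(parts["patch"]),
        "num_frames": int(parts["frames"]),
        "input_size": int(parts["size"]),
    }


print(parse_config_name("K400B168x224"))
print(parse_config_name("ANETL1416x224"))
```

Under this reading, the `num_frames=8` used for `K400B168x224` in the inference example matches the frame count encoded in its name.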
Inference
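```python
import numpy as np
import tensorflow as tf
from uniformerv2 import UniFormerV2

# Config names and checkpoint files are listed in model_configs.py / MODEL_ZOO.md.
model = UniFormerV2(name='K400B168x224')
model.load_weights('TFUniFormerV2K400B168x224.h5')

# read_video, frame_sampling and label_map_inv (class index -> label name)
# are helpers that this snippet does not define.
container = read_video('sample.mp4')
frames = frame_sampling(container, num_frames=8)

y = model(frames)
y.shape  # TensorShape([1, 400])

# Convert logits to probabilities and sort labels by confidence.
probabilities = tf.nn.softmax(y)
probabilities = probabilities.numpy().squeeze(0)
confidences = {
    label_map_inv[i]: float(probabilities[i])
    for i in np.argsort(probabilities)[::-1]
}
confidences
```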
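The inference snippet relies on a video reader and a frame-sampling helper that it does not define. A minimal sketch of the sampling step, assuming segment-based uniform sampling (split the clip into `num_frames` equal segments and keep each segment's middle frame) and a `(1, T, H, W, 3)` input layout; both are assumptions, and `sample_frame_indices` / `to_model_batch` are hypothetical names, not the repo's helpers. With decord (listed in the dependencies), `decord.VideoReader(path)` and `get_batch(indices).asnumpy()` could supply the real frames.

```python
import numpy as np


def sample_frame_indices(num_total: int, num_frames: int) -> np.ndarray:
    """Return `num_frames` indices spread uniformly over `num_total` frames.

    The clip is split into `num_frames` equal segments and the middle frame
    of each segment is kept (segment-based uniform sampling).
    """
    if num_total <= 0 or num_frames <= 0:
        raise ValueError("num_total and num_frames must be positive")
    segment = num_total / num_frames
    indices = np.floor((np.arange(num_frames) + 0.5) * segment).astype(int)
    # Guard against rounding past the last frame.
    return np.clip(indices, 0, num_total - 1)


def to_model_batch(frames: np.ndarray) -> np.ndarray:
    """Cast (T, H, W, 3) uint8 frames to float32 and add a batch axis.

    Mean/std normalization is deliberately left out: apply whatever the
    checkpoints expect. The (1, T, H, W, 3) layout is an assumption.
    """
    return frames.astype(np.float32)[np.newaxis, ...]


# Stand-in for decoded frames: 100 blank 224x224 RGB frames.
video = np.zeros((100, 224, 224, 3), dtype=np.uint8)
idx = sample_frame_indices(len(video), num_frames=8)
batch = to_model_batch(video[idx])
print(idx)          # [ 6 18 31 43 56 68 81 93]
print(batch.shape)  # (1, 8, 224, 224, 3)
```

Taking segment midpoints keeps the sampled frames evenly spaced regardless of clip length, which suits checkpoints trained on a fixed frame count such as 8 or 16.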
Classification results on a sample from Kinetics-400.
| Top-5 label | Probability |
|:-----|:-----|
| playing cello | 0.9992249011 |
| playing violin | 0.00016990336 |
| playing clarinet | 6.66150512e-05 |
| playing harp | 4.858616014e-05 |
| playing bass guitar | 2.0927140212e-05 |
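The Top-5 scores are a softmax over the model's 400 logits followed by a descending sort. A small NumPy sketch of that post-processing, with toy logits and labels made up for illustration (`top_k_confidences` is a hypothetical helper; for the real model you would pass `y.numpy()[0]` and the Kinetics-400 label list):

```python
import numpy as np


def top_k_confidences(logits, labels, k=5):
    """Turn one clip's logits into a {label: probability} dict of the top-k classes."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax: subtracting the max leaves the result unchanged.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # Indices of the k largest probabilities, highest first.
    order = np.argsort(probs)[::-1][:k]
    return {labels[i]: float(probs[i]) for i in order}


labels = ["playing cello", "playing violin", "playing harp"]
print(top_k_confidences([4.0, 1.0, 0.5], labels, k=2))
```

Python dicts keep insertion order, so the returned mapping is already ranked like the table.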
Fine Tune
Each UniFormerV2 checkpoint returns logits, so a custom classifier can simply be added on top of it. A sample is shown below; see the above notebook for more details.
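```python
from tensorflow import keras
from tensorflow.keras import layers
from uniformerv2 import UniFormerV2

# Load a pretrained checkpoint, e.g.
model_name = 'ANETL1416x224'
uniformerv2 = UniFormerV2(name=model_name)
uniformerv2.load_weights(f'TFUniFormerV2{model_name}.h5')
uniformerv2.trainable = False  # freeze the backbone

# Downstream model: class_folders is your list of target classes (not defined here).
model = keras.Sequential([
    uniformerv2,
    layers.Dense(
        len(class_folders), dtype='float32', activation=None
    )
])

# Example settings (assumed): the head outputs logits and labels are integers.
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
# train_ds / val_ds / test_ds: your own tf.data pipelines (hypothetical names).
model.fit(train_ds, validation_data=val_ds, epochs=5)
model.predict(test_ds)
```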
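The fine-tuning snippet freezes the backbone and trains only a `Dense` head on the checkpoint's logits. The toy NumPy sketch below shows what that amounts to: plain gradient descent on a softmax cross-entropy loss (assumed here) where only the head's weights change. The 400-d "logits" are synthetic clusters, not real UniFormerV2 outputs, and this is not the repo's training code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for frozen-backbone outputs: 400-d logit vectors
# for a 3-class downstream task (one noisy cluster per class).
num_classes, dim, n = 3, 400, 300
centers = rng.normal(size=(num_classes, dim))
labels = rng.integers(0, num_classes, size=n)
features = centers[labels] + 0.5 * rng.normal(size=(n, dim))

# Trainable head: a dense layer without activation, so it outputs logits.
w = np.zeros((dim, num_classes))
b = np.zeros(num_classes)
onehot = np.eye(num_classes)[labels]

lr = 0.01
for _ in range(200):
    z = features @ w + b
    z -= z.max(axis=1, keepdims=True)     # stabilize the softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - onehot) / n               # d(mean cross-entropy)/d(logits)
    w -= lr * features.T @ grad           # only the head is updated;
    b -= lr * grad.sum(axis=0)            # the features stay fixed

accuracy = float(((features @ w + b).argmax(axis=1) == labels).mean())
print(f"train accuracy: {accuracy:.2f}")
```

Because the backbone is frozen, its outputs could even be computed once and cached, which is what makes this kind of head-only training cheap.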
Model Zoo
The uniformer-v2 checkpoints are listed in MODEL_ZOO.md.
TODO
- [x] Custom fine-tuning code.
- [ ] Publish on TF-Hub.
- [ ] Support Keras V3 to enable multi-framework backends.
Citation
If you use this uniformerv2 implementation in your research, please cite it using the metadata from our CITATION.cff file.
```bibtex
@misc{li2022uniformerv2,
      title={UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer},
      author={Kunchang Li and Yali Wang and Yinan He and Yizhuo Li and Yi Wang and Limin Wang and Yu Qiao},
      year={2022},
      eprint={2211.09552},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
Owner
- Name: Mohammed Innat
- Login: innat
- Kind: user
- Location: Dhaka, Bangladesh
- Company: 株式会社 調和技研 | CHOWA GIKEN Corp
- Website: https://www.linkedin.com/in/innat2k14/
- Twitter: m_innat
- Repositories: 139
- Profile: https://github.com/innat
AI Research Software Engineer | Kaggler
Citation (CITATION.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: uniformerv2-keras
message: >-
  If you use this implementation, please cite it using the
  metadata from this file
type: software
authors:
  - given-names: Mohammed
    family-names: Innat
    email: innat.dev@gmail.com
identifiers:
  - type: url
    value: 'https://github.com/innat/UniFormerV2'
description: Keras reimplementation of UniFormerV2
keywords:
  - software
license: MIT
version: 1.0.0
date-released: '2023-10-20'
```
GitHub Events
Total
- Watch event: 1
- Fork event: 2
Last Year
- Watch event: 1
- Fork event: 2
Dependencies
- decord *
- opencv-python >=4.1.2
- tensorflow >=2.12