uniformerv2

[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

https://github.com/innat/uniformerv2

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: innat
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 2.99 MB
Statistics
  • Stars: 7
  • Watchers: 2
  • Forks: 3
  • Open Issues: 1
  • Releases: 2
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

UniFormerV2


UniFormerV2 is a generic paradigm for building a powerful family of video networks by arming pre-trained ViTs with efficient UniFormer designs. It achieves state-of-the-art recognition performance on 8 popular video benchmarks, including the scene-related Kinetics-400/600/700 and Moments in Time, the temporal-related Something-Something V1/V2, and the untrimmed ActivityNet and HACS. In particular, it is the first model to achieve 90% top-1 accuracy on Kinetics-400.

This is an unofficial Keras implementation of UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer. The official PyTorch code is here.

News

  • [24-10-2023]: The Kinetics-400 test dataset is available on Kaggle, link.
  • [20-10-2023]: Fine-tuning on GPU(s) and TPU-VM is supported, colab.
  • [19-10-2023]: UFV2 checkpoints for HACS are available, link.
  • [19-10-2023]: UFV2 checkpoints for ActivityNet are available, link.
  • [18-10-2023]: UFV2 checkpoints for Moments in Time are available, link.
  • [18-10-2023]: UFV2 checkpoints for K710 are available, link.
  • [17-10-2023]: UFV2 checkpoints for SSV2 are available, link.
  • [17-10-2023]: UFV2 checkpoints for Kinetics-600/700 are available, link.
  • [16-10-2023]: UFV2 checkpoints for Kinetics-400 are available, link.
  • [15-10-2023]: The Keras code for UniFormerV2 (UFV2) is available.

Install

```bash
git clone https://github.com/innat/UniFormerV2.git
cd UniFormerV2
pip install -e .
```

Usage

The UniFormerV2 checkpoints are available in both SavedModel and H5 formats for a total of 8 datasets: Kinetics-400/600/700/710, Something-Something V2, Moments in Time V1, ActivityNet, and HACS. The model comes in base and large variants, and each variant may have further variations for different input sizes and numbers of input frames. That gives around 35 checkpoints for UniFormerV2. See the release and model zoo pages for details, and model_configs.py for an overview of the available model configurations. Some highlights follow.

Inference

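```python
import numpy as np
import tensorflow as tf
from uniformerv2 import UniFormerV2

# Build the model and load a checkpoint (see the model zoo for exact names).
model = UniFormerV2(name='K400B168x224')
model.load_weights('TFUniFormerV2K400B168x224.h5')

# read_video, frame_sampling, and label_map_inv are helpers defined
# outside this snippet.
container = read_video('sample.mp4')
frames = frame_sampling(container, num_frames=8)
y = model(frames)
y.shape  # TensorShape([1, 400])

# Convert logits to probabilities, ordered from most to least likely class.
probabilities = tf.nn.softmax(y)
probabilities = probabilities.numpy().squeeze(0)
confidences = {
    label_map_inv[i]: float(probabilities[i])
    for i in np.argsort(probabilities)[::-1]
}
confidences
```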
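The frame sampling helper used above is not shown in this README. As a rough illustration only, the sketch below picks evenly spaced frame indices from a clip; the function name `uniform_frame_indices` and the centre-of-segment strategy are assumptions, not the repository's actual implementation.

```python
import numpy as np


def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> np.ndarray:
    """Pick `num_frames` indices spread evenly over a clip (hypothetical helper)."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Split the clip into `num_frames` equal segments and take each segment's centre.
    edges = np.linspace(0, total_frames, num_frames + 1)
    centres = (edges[:-1] + edges[1:]) / 2.0
    # Truncate to integer indices; short clips simply repeat frames.
    return np.clip(centres.astype(np.int64), 0, total_frames - 1)


indices = uniform_frame_indices(total_frames=80, num_frames=8)
print(indices.tolist())
```

The selected frames would still need to be decoded, resized to the checkpoint's input size, and batched before being passed to the model.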

Classification results on a sample from Kinetics-400:

| Top-5 class | Probability |
|:------------|:------------|
| playing cello | 0.9992249011 |
| playing violin | 0.00016990336 |
| playing clarinet | 6.66150512e-05 |
| playing harp | 4.858616014e-05 |
| playing bass guitar | 2.0927140212e-05 |
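A top-5 listing like the one above can be derived from the model's logits. The following is a minimal NumPy-only sketch of that step; `top_k_confidences` and the toy labels are hypothetical names used for illustration and are not part of the uniformerv2 package.

```python
import numpy as np


def top_k_confidences(logits, labels, k=5):
    """Return the k most probable labels with their softmax probabilities."""
    logits = np.asarray(logits, dtype=np.float64).reshape(-1)
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # argsort is ascending, so reverse it and keep the first k entries.
    top = np.argsort(probs)[::-1][:k]
    return {labels[i]: float(probs[i]) for i in top}


# Toy example with three classes instead of the 400 Kinetics classes.
toy_labels = ["playing cello", "playing violin", "playing harp"]
print(top_k_confidences([4.0, 1.0, 0.5], toy_labels, k=2))
```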

Fine Tune

Each UniFormerV2 checkpoint returns logits, so a custom classifier can be added on top of it. A sample is shown below. See the notebook above for more details.

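```python
from tensorflow import keras
from tensorflow.keras import layers
from uniformerv2 import UniFormerV2

# Import a pretrained model, e.g.
model_name = 'ANETL1416x224'
uniformerv2 = UniFormerV2(name=model_name)
uniformerv2.load_weights(f'TFUniFormerV2{model_name}.h5')
uniformerv2.trainable = False  # freeze the pretrained backbone

# Downstream model: a new classification head on top of the backbone output.
# class_folders is defined outside this snippet (one entry per target class).
model = keras.Sequential([
    uniformerv2,
    layers.Dense(
        len(class_folders), dtype='float32', activation=None
    )
])
model.compile(...)
model.fit(...)
model.predict(...)
```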
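Because the classification head above uses `activation=None`, the downstream model outputs raw logits, so the training loss should be computed from logits rather than from probabilities. In Keras, one common choice for integer labels is `keras.losses.SparseCategoricalCrossentropy(from_logits=True)`; the README does not state which loss the author uses, so treat that as an assumption. The NumPy sketch below illustrates what such a loss computes.

```python
import numpy as np


def sparse_ce_from_logits(logits, labels):
    """Mean cross-entropy between integer labels and unnormalised logits."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    # Log-softmax computed stably: subtract the row maximum first.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample.
    picked = log_probs[np.arange(len(labels)), labels]
    return float(-picked.mean())


# Two samples, three classes; both samples favour their true class.
batch_logits = [[3.0, 0.5, 0.1], [0.2, 2.5, 0.3]]
batch_labels = [0, 1]
print(sparse_ce_from_logits(batch_logits, batch_labels))
```

Applying a softmax inside the head and also setting `from_logits=True` would normalise twice, which is why the head is usually left without an activation when this kind of loss is used.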

Model Zoo

The uniformer-v2 checkpoints are listed in MODEL_ZOO.md.

TODO

  • [x] Custom fine-tuning code.
  • [ ] Publish on TF-Hub.
  • [ ] Support Keras V3 to support multi-framework backend.

Citation

If you use this uniformerv2 implementation in your research, please cite it using the metadata from our CITATION.cff file.

```bibtex
@misc{li2022uniformerv2,
  title={UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer},
  author={Kunchang Li and Yali Wang and Yinan He and Yizhuo Li and Yi Wang and Limin Wang and Yu Qiao},
  year={2022},
  eprint={2211.09552},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

Owner

  • Name: Mohammed Innat
  • Login: innat
  • Kind: user
  • Location: Dhaka, Bangladesh
  • Company: 株式会社 調和技研 | CHOWA GIKEN Corp

AI Research Software Engineer | Kaggler

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: uniformerv2-keras
message: >-
  If you use this implementation, please cite it using the  
  metadata from this file
type: software
authors:
  - given-names: Mohammed
    family-names: Innat
    email: innat.dev@gmail.com
identifiers:
  - type: url
    value: 'https://github.com/innat/UniFormerV2'
    description: Keras reimplementation of UniFormerV2
keywords:
  - software
license: MIT
version: 1.0.0
date-released: '2023-10-20' 

GitHub Events

Total
  • Watch event: 1
  • Fork event: 2
Last Year
  • Watch event: 1
  • Fork event: 2

Dependencies

requirements.txt pypi
  • decord *
  • opencv-python >=4.1.2
  • tensorflow >=2.12
setup.py pypi
  • opencv-python >=4.1.2
  • tensorflow >=2.12