videoswin

Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling

https://github.com/innat/videoswin

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary

Keywords

keras tensorflow torch video-classification video-dataset
Last synced: 6 months ago

Repository

Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling

Basic Info
Statistics
  • Stars: 34
  • Watchers: 2
  • Forks: 4
  • Open Issues: 2
  • Releases: 3
Topics
keras tensorflow torch video-classification video-dataset
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

Video Swin Transformer



VideoSwin is a pure-transformer video modeling architecture that attained top accuracy on the major video recognition benchmarks. The authors advocate an inductive bias of locality in video transformers, which yields a better speed-accuracy trade-off than previous approaches that compute self-attention globally, even with spatial-temporal factorization. The locality of the proposed video architecture is realized by adapting the Swin Transformer, originally designed for the image domain, while continuing to leverage the power of pre-trained image models.
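As an illustrative sketch (not code from this repository), the locality bias can be seen in how a video feature map is partitioned into non-overlapping 3D windows, with self-attention then computed independently inside each window:

```python
import numpy as np

def window_partition_3d(x, window_size):
    """Partition a video feature map (T, H, W, C) into non-overlapping
    3D windows of shape window_size = (wt, wh, ww).

    Returns (num_windows, wt*wh*ww, C). Sketch only: the real model also
    handles shifted windows and padding.
    """
    t, h, w, c = x.shape
    wt, wh, ww = window_size
    # split each axis into (num_windows_along_axis, window_extent)
    x = x.reshape(t // wt, wt, h // wh, wh, w // ww, ww, c)
    # group the window-count axes together, then the window-extent axes
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, wt * wh * ww, c)

# an 8x56x56 feature map with 96 channels; (8, 7, 7) windows as in the paper
feats = np.zeros((8, 56, 56, 96), dtype=np.float32)
windows = window_partition_3d(feats, (8, 7, 7))
print(windows.shape)  # (64, 392, 96)
```

Attention cost then scales with the window size rather than with the full spatio-temporal token count, which is where the speed-accuracy gain comes from.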

This is an unofficial Keras 3 implementation of Video Swin Transformers. The official PyTorch implementation is here, based on mmaction2. The official PyTorch weights have been converted to be Keras 3 compatible. This implementation supports running the model on multiple backends, i.e. TensorFlow, PyTorch, and JAX. However, to work with tensorflow.keras, check the tfkeras branch.

Install

```python
!git clone https://github.com/innat/VideoSwin.git
%cd VideoSwin
!pip install -e .
```

Checkpoints

The VideoSwin checkpoints are available in .weights.h5 format for the Kinetics 400/600 and Something-Something V2 datasets. The model comes in tiny, small, and base variants. Check the model zoo page for details.

Inference

A sample usage with pretrained weights is shown below. We can pick any backend, i.e. tensorflow, torch, or jax.

```python
import os
import numpy as np
import torch

os.environ["KERAS_BACKEND"] = "torch"  # or any backend.
from videoswin import VideoSwinT

def vswin_tiny():
    !wget https://github.com/innat/VideoSwin/releases/download/v2.0/videoswin_tiny_kinetics400_classifier.weights.h5 -q
    model = VideoSwinT(
        num_classes=400,
        include_rescaling=False,
        activation=None
    )
    model.load_weights(
        'videoswin_tiny_kinetics400_classifier.weights.h5'
    )
    return model

model = vswin_tiny()
container = read_video('sample.mp4')
frames = frame_sampling(container, num_frames=32)
y_pred = model(frames)
y_pred.shape  # [1, 400]

probabilities = torch.nn.functional.softmax(y_pred, dim=-1).detach().numpy()
probabilities = probabilities.squeeze(0)
# label_map_inv: index -> class-name mapping (defined elsewhere)
confidences = {
    label_map_inv[i]: float(probabilities[i])
    for i in np.argsort(probabilities)[::-1]
}
confidences
```

A classification result on a sample from Kinetics-400:

```
{
  'playing_cello': 0.9941741824150085,
  'playing_violin': 0.0016851733671501279,
  'playing_recorder': 0.0011555481469258666,
  'playing_clarinet': 0.0009695519111119211,
  'playing_harp': 0.0007713600643910468
}
```
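The read_video and frame_sampling helpers used in the snippet above are not shown. As a hypothetical sketch (the names and shapes are assumptions, not this package's API), uniform temporal sampling of an already-decoded clip could look like this:

```python
import numpy as np

def frame_sampling(frames, num_frames=32):
    """Uniformly sample `num_frames` frames along the temporal axis and
    add a leading batch dimension.

    frames: array of shape (T, H, W, 3) -> returns (1, num_frames, H, W, 3).
    Hypothetical helper, not part of the package API.
    """
    total = frames.shape[0]
    # evenly spaced indices across the clip (indices repeat if T < num_frames)
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    return frames[indices][np.newaxis, ...]

# example: a dummy 100-frame clip of 224x224 RGB frames
clip = np.zeros((100, 224, 224, 3), dtype=np.uint8)
batch = frame_sampling(clip, num_frames=32)
print(batch.shape)  # (1, 32, 224, 224, 3)
```

Decoding `sample.mp4` into such an array could be done with opencv-python, which is already in the project's requirements.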

To get the backbone of Video Swin, we can pass include_top=False to exclude the classification layer. For example:

```python
from videoswin.backbone import VideoSwinBackbone

backbone = VideoSwinT(
    include_top=False,
    input_shape=(32, 224, 224, 3)
)
```

Or, use the VideoSwinBackbone API directly from videoswin.backbone.

Arbitrary Input Shape

By default, Video Swin is officially trained with an input shape of (32, 224, 224, 3). But we can load the model with a different shape, and also load the pretrained weights partially.

```python
model = VideoSwinT(
    input_shape=(8, 224, 256, 3),
    include_rescaling=False,
    num_classes=10,
)
model.load_weights('...weights.h5', skip_mismatch=True)
```

Guides

  1. Comparison of the Keras 3 implementation vs. the official PyTorch implementation.
  2. Full evaluation on the Kinetics 400 test set using the PyTorch backend.
  3. Fine-tune with the TensorFlow backend.
  4. Fine-tune with the JAX backend.
  5. Fine-tune with the native PyTorch backend.
  6. Fine-tune with PyTorch Lightning.
  7. Convert to ONNX format.

Citation

If you use this videoswin implementation in your research, please cite it using the metadata from our CITATION.cff file, along with the original paper:

```bibtex
@article{liu2021video,
  title={Video Swin Transformer},
  author={Liu, Ze and Ning, Jia and Cao, Yue and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Hu, Han},
  journal={arXiv preprint arXiv:2106.13230},
  year={2021}
}
```

Owner

  • Name: Mohammed Innat
  • Login: innat
  • Kind: user
  • Location: Dhaka, Bangladesh
  • Company: 株式会社 調和技研 | CHOWA GIKEN Corp

AI Research Software Engineer | Kaggler

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: videoswin-keras
message: >-
  If you use this implementation, please cite it using the  
  metadata from this file.
type: software
authors:
  - given-names: Mohammed
    family-names: Innat
    email: innat.dev@gmail.com
identifiers:
  - type: url
    value: 'https://github.com/innat/VideoSwin'
    description: Keras 3 implementation of VideoSwin
keywords:
  - software
license: Apache License
version: 2.0.0
date-released: '2024-04-06'

GitHub Events

Total
  • Issues event: 6
  • Watch event: 10
  • Issue comment event: 1
  • Push event: 1
Last Year
  • Issues event: 6
  • Watch event: 10
  • Issue comment event: 1
  • Push event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • innat (4)
  • bitabaroutian (1)
  • Yousra-Regaya (1)
Pull Request Authors
  • innat (1)
Top Labels
Issue Labels
enhancement (1) good first issue (1) type:feature (1) feature reviewing (1)
Pull Request Labels
enhancement (1)

Dependencies

requirements.txt pypi
  • opencv-python >=4.1.2
  • tensorflow >=2.12
setup.py pypi
  • opencv-python >=4.1.2
  • tensorflow >=2.12