videoswin
Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.6%) to scientific vocabulary
Keywords
Repository
Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling
Basic Info
- Host: GitHub
- Owner: innat
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://arxiv.org/abs/2106.13230
- Size: 7.69 MB
Statistics
- Stars: 34
- Watchers: 2
- Forks: 4
- Open Issues: 2
- Releases: 3
Topics
Metadata Files
README.md
Video Swin Transformer

VideoSwin is a pure-transformer video modeling architecture that attained top accuracy on the major video recognition benchmarks. Its authors advocate an inductive bias of locality in video transformers, which leads to a better speed-accuracy trade-off than previous approaches that compute self-attention globally, even with spatial-temporal factorization. The locality of the proposed video architecture is realized by adapting the Swin Transformer designed for the image domain, while continuing to leverage the power of pre-trained image models.
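To make the locality argument concrete, here is a back-of-the-envelope sketch that counts query-key pairs for one attention layer under global and windowed self-attention. The patch size (2, 4, 4) and window size (8, 7, 7) are assumptions taken from the default configuration described in the Video Swin paper, not from this README, and the function names are purely illustrative. The count covers only the first stage, before any patch merging.

```python
from math import prod

def token_grid(input_shape, patch_size=(2, 4, 4)):
    """Tokens along (time, height, width) after non-overlapping patch embedding.

    patch_size is an assumed default, not read from the library.
    """
    frames, height, width = input_shape
    pt, ph, pw = patch_size
    return (frames // pt, height // ph, width // pw)

def attention_pairs(grid, window=None):
    """Count query-key pairs for one attention layer on the given token grid."""
    tokens = prod(grid)
    if window is None:
        # Global attention: every token attends to every token.
        return tokens * tokens
    # Windowed attention: tokens attend only within their own 3D window.
    tokens_per_window = prod(window)
    num_windows = prod(g // w for g, w in zip(grid, window))
    return num_windows * tokens_per_window ** 2

grid = token_grid((32, 224, 224))
global_pairs = attention_pairs(grid)
local_pairs = attention_pairs(grid, window=(8, 7, 7))
print(grid, global_pairs, local_pairs, global_pairs // local_pairs)
```

Under these assumptions the token grid is (16, 56, 56), and restricting attention to windows cuts the number of pairs by a factor equal to the number of windows.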
This is an unofficial Keras 3 implementation of the Video Swin Transformer. The official PyTorch implementation is here, based on mmaction2. The official PyTorch weights have been converted to a Keras 3 compatible format. This implementation can run the model on multiple backends, i.e. TensorFlow, PyTorch, and JAX. To work with tensorflow.keras instead, check the tfkeras branch.
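Keras 3 reads its backend from the `KERAS_BACKEND` environment variable, and the variable has to be set before `keras` (or any package that imports it) is first imported. A minimal sketch of a guard around that step follows; `select_backend` and `VALID_BACKENDS` are hypothetical names introduced here for illustration and are not part of this package.

```python
import os

# The three backends named in this README (assumed to be the supported set here).
VALID_BACKENDS = ("tensorflow", "torch", "jax")

def select_backend(name):
    """Set KERAS_BACKEND, rejecting names outside the list above."""
    if name not in VALID_BACKENDS:
        raise ValueError(
            f"Unknown backend {name!r}; expected one of {VALID_BACKENDS}"
        )
    os.environ["KERAS_BACKEND"] = name
    return name

# Call this before the first `import keras` or `from videoswin import ...`.
select_backend("torch")
print(os.environ["KERAS_BACKEND"])
```

Setting the variable after Keras has already been imported has no effect in the running process, so in a notebook this belongs in the first cell.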
Install
```python
!git clone https://github.com/innat/VideoSwin.git
%cd VideoSwin
!pip install -e .
```
Checkpoints
The VideoSwin checkpoints are available in .weights.h5 format for the Kinetics 400/600 and Something Something V2 datasets. The model variants are tiny, small, and base. See the model zoo page for details.
Inference
A sample usage with pretrained weights is shown below. Any backend can be used, i.e. tensorflow, torch, or jax.
```python
import os
import numpy as np
import torch

os.environ["KERAS_BACKEND"] = "torch"  # or any backend.
from videoswin import VideoSwinT

def vswin_tiny():
    # Download the converted Kinetics-400 checkpoint (notebook shell command).
    !wget https://github.com/innat/VideoSwin/releases/download/v2.0/videoswin_tiny_kinetics400_classifier.weights.h5 -q

    model = VideoSwinT(
        num_classes=400,
        include_rescaling=False,
        activation=None
    )
    model.load_weights(
        'videoswin_tiny_kinetics400_classifier.weights.h5'
    )
    return model

model = vswin_tiny()
# read_video, frame_sampling, and label_map_inv are helpers not defined in this snippet.
container = read_video('sample.mp4')
frames = frame_sampling(container, num_frames=32)
y_pred = model(frames)
y_pred.shape  # [1, 400]

# Softmax over the class axis turns the raw logits into probabilities.
probabilities = torch.nn.functional.softmax(y_pred, dim=-1).detach().cpu().numpy()
probabilities = probabilities.squeeze(0)
confidences = {
    label_map_inv[i]: float(probabilities[i])
    for i in np.argsort(probabilities)[::-1]
}
confidences
```

Classification results on a sample from Kinetics-400:
| Video | Top-5 |
|:---:|:---|
| | `{'playing_cello': 0.9941741824150085, 'playing_violin': 0.0016851733671501279, 'playing_recorder': 0.0011555481469258666, 'playing_clarinet': 0.0009695519111119211, 'playing_harp': 0.0007713600643910468}` |
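The frame-sampling helper called in the inference snippet is not defined in this README. As a rough sketch of what such a helper might do, the function below picks evenly spaced frame indices from a clip. `uniform_frame_indices` is a hypothetical name, and uniform sampling is only one common strategy; the repository's own helper may sample differently, for example with a fixed stride.

```python
def uniform_frame_indices(total_frames, num_frames=32):
    """Pick num_frames indices spread evenly across a clip of total_frames.

    Hypothetical helper for illustration; not part of the videoswin package.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Split the clip into num_frames equal segments and take each segment's
    # centre. Clips shorter than num_frames end up repeating some frames.
    return [
        min(total_frames - 1, int((i + 0.5) * total_frames / num_frames))
        for i in range(num_frames)
    ]

indices = uniform_frame_indices(total_frames=64, num_frames=32)
print(indices[:4], indices[-1])
```

The decoded frames at these indices would then be resized to the model's spatial size and stacked with a leading batch axis, giving an array of shape (1, 32, 224, 224, 3) for the default configuration.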
To get the Video Swin backbone, pass include_top=False to exclude the classification layer. For example:
```python
from videoswin import VideoSwinT
from videoswin.backbone import VideoSwinBackbone  # alternative: the backbone API

backbone = VideoSwinT(
    include_top=False,
    input_shape=(32, 224, 224, 3)
)
```
Alternatively, use the VideoSwinBackbone API directly from videoswin.backbone.
Arbitrary Input Shape
By default, Video Swin is officially trained with an input shape of (32, 224, 224, 3), but the model can be built with a different input shape, and the pretrained weights can be loaded partially.
```python
model = VideoSwinT(
    input_shape=(8, 224, 256, 3),
    include_rescaling=False,
    num_classes=10,
)
model.load_weights('...weights.h5', skip_mismatch=True)
```
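Conceptually, loading with `skip_mismatch=True` keeps every variable whose shape agrees with the checkpoint and leaves the rest at their fresh initialization. The toy sketch below mimics that partition with plain dictionaries; the variable names and shapes are hypothetical, chosen only to illustrate a 400-class checkpoint loaded into a 10-class model, and Keras's actual matching logic is more involved.

```python
def partition_by_shape(checkpoint_shapes, model_shapes):
    """Split model variable names into loadable and skipped, by shape equality."""
    loadable, skipped = [], []
    for name, shape in model_shapes.items():
        if checkpoint_shapes.get(name) == shape:
            loadable.append(name)
        else:
            skipped.append(name)
    return loadable, skipped

# Hypothetical names and shapes, for illustration only.
checkpoint_shapes = {
    "patch_embed/kernel": (2, 4, 4, 3, 96),
    "head/kernel": (768, 400),
    "head/bias": (400,),
}
model_shapes = {
    "patch_embed/kernel": (2, 4, 4, 3, 96),
    "head/kernel": (768, 10),  # num_classes=10 changes the classifier shape
    "head/bias": (10,),
}

loadable, skipped = partition_by_shape(checkpoint_shapes, model_shapes)
print("loaded:", loadable)
print("skipped:", skipped)
```

In this toy case only the classification head is skipped, which is the usual situation when fine-tuning on a dataset with a different number of classes.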
Guides
- Comparison of the Keras 3 implementation vs. the official PyTorch implementation.
- Full Evaluation on Kinetics 400 Test Set using PyTorch backend
- Fine tune with TensorFlow backend.
- Fine tune with Jax backend
- Fine tune with native PyTorch backend
- Fine tune with PyTorch Lightning
- Convert to ONNX Format
Citation
If you use this videoswin implementation in your research, please cite it using the metadata from our CITATION.cff file, along with the literature.
```bibtex
@article{liu2021video,
  title={Video Swin Transformer},
  author={Liu, Ze and Ning, Jia and Cao, Yue and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Hu, Han},
  journal={arXiv preprint arXiv:2106.13230},
  year={2021}
}
```
Owner
- Name: Mohammed Innat
- Login: innat
- Kind: user
- Location: Dhaka, Bangladesh
- Company: 株式会社 調和技研 | CHOWA GIKEN Corp
- Website: https://www.linkedin.com/in/innat2k14/
- Twitter: m_innat
- Repositories: 139
- Profile: https://github.com/innat
AI Research Software Engineer | Kaggler
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: videoswin-keras
message: >-
  If you use this implementation, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Mohammed
    family-names: Innat
    email: innat.dev@gmail.com
identifiers:
  - type: url
    value: 'https://github.com/innat/VideoSwin'
    description: Keras 3 implementation of VideoSwin
keywords:
  - software
license: Apache License
version: 2.0.0
date-released: '2024-04-06'
GitHub Events
Total
- Issues event: 6
- Watch event: 10
- Issue comment event: 1
- Push event: 1
Last Year
- Issues event: 6
- Watch event: 10
- Issue comment event: 1
- Push event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 3
- Total pull requests: 0
- Average time to close issues: about 1 month
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.33
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 0
- Average time to close issues: about 1 month
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.33
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- innat (4)
- bitabaroutian (1)
- Yousra-Regaya (1)
Pull Request Authors
- innat (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- opencv-python >=4.1.2
- tensorflow >=2.12