videoswin

Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling

https://github.com/innat/videoswin

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary

Keywords

keras tensorflow torch video-classification video-dataset
Last synced: 6 months ago

Repository

Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling

Basic Info
Statistics
  • Stars: 34
  • Watchers: 2
  • Forks: 4
  • Open Issues: 2
  • Releases: 3
Topics
keras tensorflow torch video-classification video-dataset
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

Video Swin Transformer



VideoSwin is a pure-transformer video modeling architecture that attained top accuracy on the major video recognition benchmarks. The authors advocate an inductive bias of locality in video transformers, which yields a better speed-accuracy trade-off than previous approaches that compute self-attention globally, even with spatial-temporal factorization. The locality of the proposed video architecture is realized by adapting the Swin Transformer, originally designed for the image domain, while continuing to leverage the power of pre-trained image models.
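As an illustrative sketch (not code from this repository), the locality bias can be seen in how a video feature map is partitioned into non-overlapping 3D windows, with self-attention then computed independently inside each window:

```python
import numpy as np

def window_partition_3d(x, window_size):
    """Partition a video feature map (T, H, W, C) into non-overlapping
    3D windows of shape window_size = (wt, wh, ww).

    Returns (num_windows, wt*wh*ww, C). Sketch only: the real model also
    handles shifted windows and padding.
    """
    t, h, w, c = x.shape
    wt, wh, ww = window_size
    # split each axis into (num_windows_along_axis, window_extent)
    x = x.reshape(t // wt, wt, h // wh, wh, w // ww, ww, c)
    # group the window-count axes together, then the window-extent axes
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, wt * wh * ww, c)

# an 8x56x56 feature map with 96 channels; (8, 7, 7) windows as in the paper
feats = np.zeros((8, 56, 56, 96), dtype=np.float32)
windows = window_partition_3d(feats, (8, 7, 7))
print(windows.shape)  # (64, 392, 96)
```

Attention cost then scales with the window size rather than with the full spatio-temporal token count, which is where the speed-accuracy gain comes from.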

This is an unofficial Keras 3 implementation of Video Swin Transformers. The official PyTorch implementation is here, based on mmaction2. The official PyTorch weights have been converted to be Keras 3 compatible. This implementation supports running the model on multiple backends, i.e. TensorFlow, PyTorch, and JAX. However, to work with tensorflow.keras, check the tfkeras branch.

Install

```python
!git clone https://github.com/innat/VideoSwin.git
%cd VideoSwin
!pip install -e .
```

Checkpoints

The VideoSwin checkpoints are available in .weights.h5 format for the Kinetics 400/600 and Something-Something V2 datasets. The model comes in tiny, small, and base variants. Check the model zoo page for details.

Inference

A sample usage with pretrained weights is shown below. We can pick any backend, i.e. tensorflow, torch, or jax.

```python
import os
import numpy as np
import torch

os.environ["KERAS_BACKEND"] = "torch"  # or any backend.
from videoswin import VideoSwinT

def vswin_tiny():
    !wget https://github.com/innat/VideoSwin/releases/download/v2.0/videoswin_tiny_kinetics400_classifier.weights.h5 -q
    model = VideoSwinT(
        num_classes=400,
        include_rescaling=False,
        activation=None
    )
    model.load_weights(
        'videoswin_tiny_kinetics400_classifier.weights.h5'
    )
    return model

model = vswin_tiny()
container = read_video('sample.mp4')
frames = frame_sampling(container, num_frames=32)
y_pred = model(frames)
y_pred.shape  # [1, 400]

probabilities = torch.nn.functional.softmax(y_pred, dim=-1).detach().numpy()
probabilities = probabilities.squeeze(0)
# label_map_inv: index -> class-name mapping (defined elsewhere)
confidences = {
    label_map_inv[i]: float(probabilities[i])
    for i in np.argsort(probabilities)[::-1]
}
confidences
```

A classification result on a sample from Kinetics-400:

```
{
  'playing_cello': 0.9941741824150085,
  'playing_violin': 0.0016851733671501279,
  'playing_recorder': 0.0011555481469258666,
  'playing_clarinet': 0.0009695519111119211,
  'playing_harp': 0.0007713600643910468
}
```
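The read_video and frame_sampling helpers used in the snippet above are not shown. As a hypothetical sketch (the names and shapes are assumptions, not this package's API), uniform temporal sampling of an already-decoded clip could look like this:

```python
import numpy as np

def frame_sampling(frames, num_frames=32):
    """Uniformly sample `num_frames` frames along the temporal axis and
    add a leading batch dimension.

    frames: array of shape (T, H, W, 3) -> returns (1, num_frames, H, W, 3).
    Hypothetical helper, not part of the package API.
    """
    total = frames.shape[0]
    # evenly spaced indices across the clip (indices repeat if T < num_frames)
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    return frames[indices][np.newaxis, ...]

# example: a dummy 100-frame clip of 224x224 RGB frames
clip = np.zeros((100, 224, 224, 3), dtype=np.uint8)
batch = frame_sampling(clip, num_frames=32)
print(batch.shape)  # (1, 32, 224, 224, 3)
```

Decoding `sample.mp4` into such an array could be done with opencv-python, which is already in the project's requirements.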

To get the backbone of Video Swin, we can pass include_top=False to exclude the classification layer. For example:

```python
from videoswin.backbone import VideoSwinBackbone

backbone = VideoSwinT(
    include_top=False,
    input_shape=(32, 224, 224, 3)
)
```

Or, use the VideoSwinBackbone API directly from videoswin.backbone.

Arbitrary Input Shape

By default, Video Swin is officially trained with an input shape of (32, 224, 224, 3). But we can load the model with a different shape, and also load the pretrained weights partially.

```python
model = VideoSwinT(
    input_shape=(8, 224, 256, 3),
    include_rescaling=False,
    num_classes=10,
)
model.load_weights('...weights.h5', skip_mismatch=True)
```

Guides

  1. Comparison of the Keras 3 implementation vs. the official PyTorch implementation.
  2. Full evaluation on the Kinetics 400 test set using the PyTorch backend.
  3. Fine-tune with the TensorFlow backend.
  4. Fine-tune with the JAX backend.
  5. Fine-tune with the native PyTorch backend.
  6. Fine-tune with PyTorch Lightning.
  7. Convert to ONNX format.

Citation

If you use this videoswin implementation in your research, please cite it using the metadata from our CITATION.cff file, along with the original paper:

```bibtex
@article{liu2021video,
  title={Video Swin Transformer},
  author={Liu, Ze and Ning, Jia and Cao, Yue and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Hu, Han},
  journal={arXiv preprint arXiv:2106.13230},
  year={2021}
}
```

Owner

  • Name: Mohammed Innat
  • Login: innat
  • Kind: user
  • Location: Dhaka, Bangladesh
  • Company: 株式会社 調和技研 | CHOWA GIKEN Corp

AI Research Software Engineer | Kaggler

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: videoswin-keras
message: >-
  If you use this implementation, please cite it using the  
  metadata from this file.
type: software
authors:
  - given-names: Mohammed
    family-names: Innat
    email: innat.dev@gmail.com
identifiers:
  - type: url
    value: 'https://github.com/innat/VideoSwin'
    description: Keras 3 implementation of VideoSwin
keywords:
  - software
license: Apache License
version: 2.0.0
date-released: '2024-04-06'

GitHub Events

Total
  • Issues event: 6
  • Watch event: 10
  • Issue comment event: 1
  • Push event: 1
Last Year
  • Issues event: 6
  • Watch event: 10
  • Issue comment event: 1
  • Push event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • innat (4)
  • bitabaroutian (1)
  • Yousra-Regaya (1)
Pull Request Authors
  • innat (1)
Top Labels
Issue Labels
enhancement (1) good first issue (1) type:feature (1) feature reviewing (1)
Pull Request Labels
enhancement (1)

Dependencies

requirements.txt pypi
  • opencv-python >=4.1.2
  • tensorflow >=2.12
setup.py pypi
  • opencv-python >=4.1.2
  • tensorflow >=2.12