Recent Releases of videoswin
videoswin - v2.0
Summary
Keras 3 implementation of Video Swin Transformer. The official PyTorch weights have been converted to be Keras 3 compatible. This implementation supports running the model on multiple backends, i.e., TensorFlow, PyTorch, and JAX.
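For context, Keras 3 selects its backend through the `KERAS_BACKEND` environment variable, which must be set before `keras` is imported. A minimal sketch (the model-loading step is commented out and only illustrative):

```python
import os

# Keras 3 picks its compute backend from this environment variable;
# it must be set before the first `import keras`.
os.environ['KERAS_BACKEND'] = 'jax'  # or 'tensorflow', 'torch'

# import keras
# keras.backend.backend() would then report 'jax'
```

Switching the value to `'tensorflow'` or `'torch'` runs the same model code on those backends instead.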
Full Changelog: https://github.com/innat/VideoSwin/compare/v1.1...v2.0
- Jupyter Notebook
Published by innat almost 2 years ago
videoswin - v1.1
TensorFlow SavedModel format weights. Details.
Published by innat over 2 years ago
videoswin - v1.0
Checkpoints of VideoSwin in Keras
Checkpoints of VideoSwin: Video Swin Transformer model in Keras. The pretrained weights are ported from the official PyTorch model. The following is the list of all available models in `.h5` format.
Checkpoint Naming Style
To encode the model variations briefly, the general naming format is:
```python
dataset = 'K400'             # 'K400', 'SSV2'
pretrained_dataset = 'IN1K'  # 'IN1K', 'IN22K'
size = 'B'                   # 'T', 'S', 'B'
patch_size = (2, 4, 4)
window_size = (8, 7, 7)      # (8,7,7), (16,7,7)
num_frames = 32
input_size = 224

checkpoint_name = (
    f'TFVideoSwin{size}'
    f'{dataset}'
    f'{pretrained_dataset}'
    f'P{"".join(map(str, patch_size))}'
    f'W{"".join(map(str, window_size))}'
    f'{num_frames}x{input_size}.h5'
)
# checkpoint_name -> 'TFVideoSwinBK400IN1KP244W87732x224.h5'
```
Here, size represents tiny (T), small (S), and base (B). The pretrained_dataset refers to the pretrained weights used to initialize the video swin model while training. For example, IN22K (ImageNet-22K) pretrained 2D Swin image models are used to initialize the 3D video swin model. The dataset refers to the benchmark dataset, i.e., Kinetics or Something-Something-V2. The patch_size and window_size refer to internal parameters of the model architecture. The num_frames and input_size for video swin are 32 and 224 respectively. In the Keras implementation, the checkpoints are also available in SavedModel and H5 formats. Check the release page of v1.1 for the SavedModel checkpoints.
| Model Name |
|-------------------------------------|
| TFVideoSwinTK400IN1KP244W87732x224.h5 |
| TFVideoSwinSK400IN1KP244W87732x224.h5 |
| TFVideoSwinBSSV2K400P244W167732x224.h5 |
| TFVideoSwinBK600IN22KP244W87732x224.h5 |
| TFVideoSwinBK400IN22KP244W87732x224.h5 |
| TFVideoSwinBK400IN1KP244W87732x224.h5 |
Here, IN1K and IN22K refer to ImageNet-1K and ImageNet-22K. P244 refers to a patch_size of [2,4,4], and W877 refers to a window_size of [8,7,7]. All these models output logits, which makes it easy to add a custom head on top for further downstream tasks. Check the notebook.
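As an illustration of this naming scheme, here is a hypothetical helper (not part of the repo) that decodes one of the checkpoint file names above back into its components:

```python
import re

# Hypothetical helper: parses a checkpoint file name following the
# naming scheme described above. The alternations cover only the
# variants listed in the table (e.g., patch P244, windows 877/1677).
PATTERN = re.compile(
    r'^TFVideoSwin'
    r'(?P<size>T|S|B)'                  # tiny / small / base
    r'(?P<dataset>K400|K600|SSV2)'      # benchmark dataset
    r'(?P<pretrained>IN1K|IN22K|K400)'  # initialization weights
    r'P(?P<patch>244)'                  # patch_size (2,4,4)
    r'W(?P<window>877|1677)'            # window_size (8,7,7) or (16,7,7)
    r'(?P<frames>\d+)x(?P<input>\d+)\.h5$'
)

def parse_checkpoint_name(name):
    """Return the naming components of a checkpoint file name as a dict."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f'unrecognized checkpoint name: {name}')
    return m.groupdict()

info = parse_checkpoint_name('TFVideoSwinBSSV2K400P244W167732x224.h5')
# -> {'size': 'B', 'dataset': 'SSV2', 'pretrained': 'K400',
#     'patch': '244', 'window': '1677', 'frames': '32', 'input': '224'}
```

Note how, for the SSV2 checkpoint, the "pretrained" slot holds K400 rather than an ImageNet tag, since that model was initialized from Kinetics-400 weights.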
Published by innat over 2 years ago