Recent Releases of videomae
videomae - v1.1
Weights in TensorFlow SavedModel format. Details.
- Jupyter Notebook
Published by innat over 2 years ago
videomae - v1.0
This is a Keras implementation of the model from "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training". The pre-trained and fine-tuned weights are ported from the official PyTorch model. The table below lists all available models in .h5 format, covering both pre-trained and fine-tuned variants.
The naming style for these models is TFVideoMAE_{size}_{dataset}_{input_frame}x{input_size}_FT/PT. Here, size denotes the small, base, large, or huge variant of the available models. PT (pre-trained) is the video masked autoencoder, trained in a self-supervised manner; FT (fine-tuned) is the encoder part of the PT model plus a task-specific classification head. For the downstream tasks, the benchmark datasets Kinetics-400, Something-Something-V2, and UCF101 are used.
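As a small illustration of the naming style above, the following sketch assembles a checkpoint file name from its parts. The helper `checkpoint_name` and the lookup tables are hypothetical and not part of the released code; the abbreviations (S, B, L, K400, SSv2, UCF) are read off the file names listed in this release, and the huge variant is left out because no file for it is listed here.

```python
# Abbreviations as they appear in the released file names.
# These tables are illustrative, not part of the videomae package.
SIZES = {"small": "S", "base": "B", "large": "L"}
DATASETS = {
    "kinetics-400": "K400",
    "something-something-v2": "SSv2",
    "ucf101": "UCF",
}


def checkpoint_name(size, dataset, input_frame=16, input_size=224,
                    fine_tuned=True, ext=".h5"):
    """Build a file name of the form
    TFVideoMAE_{size}_{dataset}_{input_frame}x{input_size}_FT/PT."""
    size_tag = SIZES[size.lower()]
    dataset_tag = DATASETS[dataset.lower()]
    stage = "FT" if fine_tuned else "PT"
    return (
        f"TFVideoMAE_{size_tag}_{dataset_tag}_"
        f"{input_frame}x{input_size}_{stage}{ext}"
    )


# Fine-tuned small model on Kinetics-400, 16 frames at 224x224.
print(checkpoint_name("small", "kinetics-400"))
# Pre-trained base model on UCF101.
print(checkpoint_name("base", "ucf101", fine_tuned=False))
```

Unknown sizes or datasets raise a `KeyError`, which keeps typos from silently producing a file name that does not exist.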
In the Keras implementation, these models are available in both SavedModel and h5 formats; check the v1.1 release page for the other checkpoints. Note that the official repository also provides a huge variant for Kinetics-400. However, the official PT checkpoint is corrupted (https://github.com/MCG-NJU/VideoMAE/issues/89), and the FT checkpoint is larger than 2 GB, which makes it too large to upload to this release, but it can be found here.
| Model Name | arch | params |
|-------------------------------------|----------|----------|
| TFVideoMAE_S_K400_16x224_FT.h5 | encoder | 22 MB |
| TFVideoMAE_S_K400_16x224_PT.h5 | encoder + decoder | 24 MB |
| TFVideoMAE_B_K400_16x224_FT.h5 | encoder | 87 MB |
| TFVideoMAE_B_K400_16x224_PT.h5 | encoder + decoder | 94 MB |
| TFVideoMAE_L_K400_16x224_FT.h5 | encoder | 304 MB |
| TFVideoMAE_L_K400_16x224_PT.h5 | encoder + decoder | 343 MB |
| TFVideoMAE_S_SSv2_16x224_FT.h5 | encoder | 22 MB |
| TFVideoMAE_S_SSv2_16x224_PT.h5 | encoder + decoder | 24 MB |
| TFVideoMAE_B_SSv2_16x224_FT.h5 | encoder | 86 MB |
| TFVideoMAE_B_SSv2_16x224_PT.h5 | encoder + decoder | 94 MB |
| TFVideoMAE_B_UCF_16x224_FT.h5 | encoder | 86 MB |
| TFVideoMAE_B_UCF_16x224_PT.h5 | encoder + decoder | 94 MB |
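When choosing among the listed checkpoints programmatically, it can help to split a file name back into its fields. The sketch below does this with a regular expression; `Checkpoint` and `parse_checkpoint_name` are hypothetical helpers written for this example, and the pattern assumes the underscore-separated naming style described in these release notes.

```python
import re
from typing import NamedTuple


class Checkpoint(NamedTuple):
    size: str         # "S", "B" or "L"
    dataset: str      # "K400", "SSv2" or "UCF"
    input_frame: int  # number of input frames, e.g. 16
    input_size: int   # spatial resolution, e.g. 224
    stage: str        # "FT" (fine-tuned) or "PT" (pre-trained)


# Pattern for TFVideoMAE_{size}_{dataset}_{input_frame}x{input_size}_FT/PT.h5
_NAME_RE = re.compile(
    r"^TFVideoMAE_(?P<size>[SBL])_(?P<dataset>K400|SSv2|UCF)"
    r"_(?P<frames>\d+)x(?P<res>\d+)_(?P<stage>FT|PT)\.h5$"
)


def parse_checkpoint_name(filename: str) -> Checkpoint:
    """Split a checkpoint file name into its naming-style fields."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {filename!r}")
    return Checkpoint(
        size=match["size"],
        dataset=match["dataset"],
        input_frame=int(match["frames"]),
        input_size=int(match["res"]),
        stage=match["stage"],
    )


names = [
    "TFVideoMAE_S_K400_16x224_FT.h5",
    "TFVideoMAE_B_K400_16x224_PT.h5",
    "TFVideoMAE_B_UCF_16x224_FT.h5",
]

# Keep only the fine-tuned (encoder + classification head) checkpoints.
fine_tuned = [n for n in names if parse_checkpoint_name(n).stage == "FT"]
print(fine_tuned)
```

Names that do not match the pattern raise a `ValueError` rather than returning partial fields, so unexpected files are surfaced early.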
- Jupyter Notebook
Published by innat over 2 years ago