https://github.com/cyrilzakka/mae3d
Masked Auto-Encoding for Large Scale Pretraining of Video Data
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary
Repository
Masked Auto-Encoding for Large Scale Pretraining of Video Data
Basic Info
- Host: GitHub
- Owner: cyrilzakka
- License: other
- Language: Jupyter Notebook
- Default Branch: main
- Size: 1.01 MB
Statistics
- Stars: 18
- Watchers: 3
- Forks: 0
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Masked Autoencoders As Spatiotemporal Learners
This is an unofficial PyTorch/GPU implementation of the paper Masked Autoencoders As Spatiotemporal Learners.
@Article{STMaskedAutoencoders2022,
author = {Feichtenhofer, Christoph and Fan, Haoqi and Li, Yanghao and He, Kaiming},
journal = {arXiv:2205.09113},
title = {Masked Autoencoders As Spatiotemporal Learners},
year = {2022},
}
Getting Started
This repository runs on PyTorch 1.11 and above. To get started, clone the repository and install the required dependencies:
$ git clone https://github.com/cyrilzakka/MAE3D
$ cd MAE3D
$ pip install -r requirements.txt
Optionally, install wandb for training visualization:
$ pip install wandb
Pretraining
Dataset Preparation
To perform large-scale pretraining, organize your data as follows:
dataset
│
├───ledger.csv
└───train
    ├───video_1
    │   ├───img_00001.jpg
    │   │   .
    │   └───img_03117.jpg
    ├───video_2
    │   ├───img_00001.jpg
    │   │   .
    │   └───img_02744.jpg
    └───video_3
        ├───img_00001.jpg
        │   .
        └───img_00323.jpg
The accompanying ledger.csv contains one space-separated row per video, listing the video_name, start_frame, end_frame, and class/pseudoclass:
video_1 1 3117 1
video_2 1 2744 0
video_3 1 323 0
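Writing the ledger by hand becomes impractical for large datasets. The following is a minimal sketch of a hypothetical helper, build_ledger, which is not part of this repository. It scans the train folder and writes one row per video. It assumes frames follow the img_XXXXX.jpg naming shown above, and that labels come from an optional dictionary you supply. Videos that are missing from that dictionary get class 0.

```python
from pathlib import Path


def build_ledger(train_dir, ledger_path, labels=None):
    """Hypothetical helper: write 'video_name start_frame end_frame label' rows.

    Assumes frames are named img_XXXXX.jpg, as in the layout above.
    `labels` is an assumed dict mapping folder names to integer
    (pseudo)class labels; videos not listed default to 0.
    """
    labels = labels or {}
    rows = []
    for video_dir in sorted(p for p in Path(train_dir).iterdir() if p.is_dir()):
        frames = sorted(video_dir.glob("img_*.jpg"))
        if not frames:
            continue  # skip folders without frames
        # Parse frame numbers from file names, e.g. img_03117.jpg -> 3117.
        indices = [int(f.stem.split("_")[1]) for f in frames]
        label = labels.get(video_dir.name, 0)
        rows.append(f"{video_dir.name} {min(indices)} {max(indices)} {label}")
    # Space-separated rows, one video per line.
    Path(ledger_path).write_text("\n".join(rows) + "\n")
    return rows
```

Each row stores only the folder name. The dataset root passed to the loader would therefore presumably be the train directory rather than dataset itself.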
Dataloader
Video frames are loaded for training with the VideoFrameDataset library:
# root_path, annotationfile_path: str; num_segments, frames_per_segment: int; test_mode=False samples frames randomly (training)
dataset_train = VideoFrameDataset(root_path=root_path, annotationfile_path=annotationfile_path, num_segments=num_segments, frames_per_segment=frames_per_segment, transform=None, test_mode=False)
Each video is split into num_segments segments of equal length. Within each segment a random start index is sampled, and frames_per_segment consecutive frames are loaded from it.
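The segment-based sampling described above can be sketched in plain Python as follows. This is an illustrative re-implementation, not VideoFrameDataset's own code, and the function name sample_frame_indices is hypothetical. It assumes the following:

- Frame numbers are 1-based and inclusive, as in ledger.csv.
- The usable range is shortened by frames_per_segment - 1, so the last clip never runs past end_frame.

```python
import random


def sample_frame_indices(start_frame, end_frame, num_segments, frames_per_segment, rng=random):
    """Hypothetical sketch of training-time segment sampling.

    Splits the video into num_segments equal segments, picks a random start
    index in each, and returns frames_per_segment consecutive frame numbers
    per segment (num_segments * frames_per_segment indices in total).
    """
    num_frames = end_frame - start_frame + 1
    # Number of valid start offsets per segment, reserving room for the clip.
    seg_len = (num_frames - frames_per_segment + 1) // num_segments
    if seg_len < 1:
        raise ValueError("video too short for the requested sampling")
    indices = []
    for s in range(num_segments):
        # Random start inside segment s, then a run of consecutive frames.
        first = start_frame + s * seg_len + rng.randrange(seg_len)
        indices.extend(range(first, first + frames_per_segment))
    return indices
```

For example, `sample_frame_indices(1, 323, num_segments=4, frames_per_segment=8)` returns 32 frame numbers for video_3 from the ledger above. The result is four runs of eight consecutive frames, all within 1 to 323.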
Training
To train with the default settings (--model vit_large_patch16, --epochs 400, --batch_size 8, --input_size 224) on four GPUs, run:
$ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 main_pretrain.py
More training options and parameters can be viewed and modified in main_pretrain.py.
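The sketch below counts patch tokens, as a rough guide to the sequence length the model processes with these defaults. Only two of its inputs come from the command above: input_size=224 and the 16×16 patches of vit_large_patch16. The remaining values are assumptions taken from the MAE-ST paper, and they may differ from the values set in main_pretrain.py:

- 16-frame clips
- a temporal patch size of 2
- a 90% masking ratio

```python
def mae_st_token_counts(input_size=224, patch_size=16, num_frames=16,
                        t_patch_size=2, mask_ratio=0.9):
    """Count spatiotemporal patch tokens and how many remain visible.

    input_size and patch_size follow the README defaults; num_frames,
    t_patch_size and mask_ratio are assumed values from the MAE-ST paper.
    """
    if input_size % patch_size or num_frames % t_patch_size:
        raise ValueError("input must divide evenly into patches")
    spatial = (input_size // patch_size) ** 2   # patches per frame
    temporal = num_frames // t_patch_size       # patches along time
    total = spatial * temporal                  # tokens per clip
    # Tokens kept for the encoder, truncated to an integer.
    visible = int(total * (1 - mask_ratio))
    return spatial, total, visible
```

With these values the function returns (196, 1568, 156). That is 196 patches per frame and 1568 tokens per clip, of which only 156 would be passed to the encoder.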
Visualization
A visualization of MAE-3D can be found in the included interactive notebook.
License
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
Owner
- Name: Cyril Zakka, MD
- Login: cyrilzakka
- Kind: user
- Location: Palo Alto, California
- Company: @hiesingerlab
- Website: https://cyrilzakka.github.io
- Twitter: cyrilzakka
- Repositories: 4
- Profile: https://github.com/cyrilzakka
Medical Doctor, Postdoctoral Fellow at Stanford Medicine.
GitHub Events
Total
- Watch event: 3
Last Year
- Watch event: 3
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 4
- Total pull requests: 1
- Average time to close issues: 5 days
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 2.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- likemby (3)
- klinic (1)
Pull Request Authors
- akashc1 (1)
Dependencies
- einops *
- pandas *
- timm *
- wandb *