https://github.com/cyrilzakka/mae3d

Masked Auto-Encoding for Large Scale Pretraining of Video Data

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (Found codemeta.json file)
✓ .zenodo.json file (Found .zenodo.json file)
○ DOI references
✓ Academic publication links (Links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (Low similarity (11.8%) to scientific vocabulary)

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: cyrilzakka
License: other
Language: Jupyter Notebook
Default Branch: main
Size: 1.01 MB

Statistics

Stars: 18
Watchers: 3
Forks: 0
Open Issues: 2
Releases: 0

Created about 4 years ago · Last pushed about 3 years ago

Metadata Files

Readme License

Masked Autoencoders As Spatiotemporal Learners

3D MAE Concept

This is an unofficial PyTorch/GPU implementation of Masked Autoencoders As Spatiotemporal Learners:

@Article{STMaskedAutoencoders2022,
  author  = {Feichtenhofer, Christoph and Fan, Haoqi and Li, Yanghao and He, Kaiming},
  journal = {arXiv:2205.09113},
  title   = {Masked Autoencoders As Spatiotemporal Learners},
  year    = {2022},
}

Getting Started

This repository runs on PyTorch 11.1 and above. To get started, clone the repository and install the required dependencies:

$ git clone https://github.com/cyrilzakka/MAE3D
$ cd MAE3D
$ pip install -r requirements.txt

Optionally, install wandb for training visualization:

$ pip install wandb

Pretraining

Dataset Preparation

For large-scale pre-training, organize your data as follows:

dataset
│
├───ledger.csv
└───train
    ├───video_1
    │   ├───img_00001.jpg
    │   .
    │   └───img_03117.jpg
    ├───video_2
    │   ├───img_00001.jpg
    │   .
    │   └───img_02744.jpg
    └───video_3
        ├───img_00001.jpg
        .
        └───img_00323.jpg

The accompanying ledger.csv contains one row per video, listing the video_name, start_frame, end_frame and class/pseudoclass:

video_1 1 3117 1
video_2 1 2744 0
video_3 1 323 0
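The ledger can be generated from the frame folders rather than written by hand. The following is a minimal sketch, not part of the repository: the helper names build_ledger_rows and write_ledger are hypothetical, the img_<number>.jpg naming is taken from the layout above, and the space-separated output is an assumption based on the example rows.

```python
import csv
import re
from pathlib import Path

# Matches frame files such as img_00001.jpg (naming taken from the layout above).
FRAME_PATTERN = re.compile(r"img_(\d+)\.jpg$")


def build_ledger_rows(train_dir, labels=None):
    """Return [video_name, start_frame, end_frame, label] for each video folder.

    `labels` maps video_name -> class/pseudoclass; missing videos default to 0
    (an assumption made for this sketch).
    """
    labels = labels or {}
    rows = []
    for video_dir in sorted(p for p in Path(train_dir).iterdir() if p.is_dir()):
        frame_ids = sorted(
            int(m.group(1))
            for f in video_dir.iterdir()
            if (m := FRAME_PATTERN.match(f.name))
        )
        if not frame_ids:
            continue  # skip folders that contain no frames
        rows.append(
            [video_dir.name, frame_ids[0], frame_ids[-1], labels.get(video_dir.name, 0)]
        )
    return rows


def write_ledger(rows, ledger_path):
    """Write rows space-separated (assumed format, mirroring the example rows)."""
    with open(ledger_path, "w", newline="") as fh:
        csv.writer(fh, delimiter=" ").writerows(rows)
```

A typical call would be write_ledger(build_ledger_rows("dataset/train", labels={"video_1": 1}), "dataset/ledger.csv").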

Dataloader

Fast and efficient loading of video data for training is done using the VideoFrameDataset library:

dataset_train = VideoFrameDataset(root_path:str, annotationfile_path:str, num_segments:int, frames_per_segment:int, transform:None, test_mode:bool)

Each video is split into num_segments equal segments; within each segment a random start index is sampled, and frames_per_segment consecutive frames are loaded starting from it.
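The sampling scheme described above can be illustrated in a few lines of plain Python. This is a sketch of the idea only, not the VideoFrameDataset implementation: the function name sample_frame_indices is hypothetical, and details such as how leftover frames or short videos are handled are assumptions.

```python
import random


def sample_frame_indices(start_frame, end_frame, num_segments,
                         frames_per_segment, rng=random):
    """Pick frames_per_segment consecutive frame numbers from each segment."""
    num_frames = end_frame - start_frame + 1
    segment_length = num_frames // num_segments  # leftover frames are ignored
    if segment_length < frames_per_segment:
        raise ValueError("video too short for the requested segments")

    indices = []
    for seg in range(num_segments):
        seg_start = start_frame + seg * segment_length
        # Random offset that still leaves room for a full run of frames.
        offset = rng.randint(0, segment_length - frames_per_segment)
        first = seg_start + offset
        indices.extend(range(first, first + frames_per_segment))
    return indices
```

For the ledger row video_3 1 323 0, sample_frame_indices(1, 323, num_segments=4, frames_per_segment=8) returns 32 frame numbers: four runs of 8 consecutive frames, one from each 80-frame segment.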

Training

To train with the default --model vit_large_patch16 for --epochs 400 and a --batch_size 8 at an --input_size 224, run:

$ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 main_pretrain.py

More training options and parameters can be viewed and modified in main_pretrain.py.
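To get a feel for what these settings mean for the encoder, the token arithmetic can be worked through as below. This is a rough sketch under stated assumptions: the 16 x 16 spatial patch follows from the model name vit_large_patch16 and the 224 input size from the command above, while the 16-frame clip, temporal patch size of 2, and 90% masking ratio are the settings reported in the paper and are not confirmed as this repository's defaults. The function names are hypothetical.

```python
def num_patches(num_frames, input_size, patch_size=16, temporal_patch_size=2):
    """Number of spatiotemporal patches (tokens) for one video clip."""
    t = num_frames // temporal_patch_size   # patches along time
    s = input_size // patch_size            # patches along height and width
    return t * s * s


def num_visible(total_patches, mask_ratio):
    """Number of patches left visible to the encoder after random masking."""
    return int(total_patches * (1 - mask_ratio))


# Assumed paper settings: 16 frames at 224 x 224, 90% of patches masked.
total = num_patches(num_frames=16, input_size=224)
visible = num_visible(total, mask_ratio=0.9)
print(total, visible)
```

Under these assumptions the clip yields 8 x 14 x 14 = 1568 patches, of which roughly a tenth are passed to the encoder, which is where most of the compute saving of masked autoencoding comes from.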

Visualization

A visualization of MAE-3D can be found in the included interactive notebook.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Owner

Name: Cyril Zakka, MD
Login: cyrilzakka
Kind: user
Location: Palo Alto, California
Company: @hiesingerlab

Website: https://cyrilzakka.github.io
Twitter: cyrilzakka
Repositories: 4
Profile: https://github.com/cyrilzakka

Medical Doctor, Postdoctoral Fellow at Stanford Medicine.

GitHub Events

Total

Watch event: 3

Last Year

Watch event: 3

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 4
Total pull requests: 1
Average time to close issues: 5 days
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 1
Average comments per issue: 2.0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

likemby (3)
klinic (1)

Pull Request Authors

akashc1 (1)

Top Labels

Issue Labels

Pull Request Labels

Dependencies

requirements.txt pypi

einops *
pandas *
timm *
wandb *
