https://github.com/animesh/multimae
Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.0%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: animesh
- License: other
- Default Branch: main
- Homepage: https://multimae.epfl.ch
- Size: 2.12 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of EPFL-VILAB/MultiMAE
Created about 4 years ago · Last pushed over 4 years ago
https://github.com/animesh/MultiMAE/blob/main/
# MultiMAE: Multi-modal Multi-task Masked Autoencoders

[Roman Bachmann*](https://roman-bachmann.github.io/), [David Mizrahi*](https://dmizrahi.com), [Andrei Atanov](https://andrewatanov.github.io/), [Amir Zamir](https://vilab.epfl.ch/zamir/)

[`Website`](https://multimae.epfl.ch) | [`arXiv`](https://arxiv.org/abs/2204.01678) | [`BibTeX`](#citation)

Official PyTorch implementation and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders.

We introduce Multi-modal Multi-task Masked Autoencoders (**MultiMAE**), an efficient and effective pre-training strategy for Vision Transformers. Given a small random sample of visible patches from multiple modalities, the MultiMAE pre-training objective is to reconstruct the masked-out regions. Once pre-trained, a single MultiMAE encoder can be used for both single-modal and multi-modal downstream transfer, yielding results that are competitive with, or significantly better than, the baselines.

## Catalog

- [x] Pre-trained models
- [x] MultiMAE pre-training code
- [x] ImageNet-1K classification fine-tuning code
- [x] Semantic segmentation fine-tuning code (single-modal & multi-modal)
- [x] Depth estimation fine-tuning code
- [x] Taskonomy fine-tuning code
- [ ] Colab demo (coming soon)

## Pre-trained models

We provide the weights of our pre-trained MultiMAE ViT-B model in MultiViT (multi-modal) format and [timm](https://github.com/rwightman/pytorch-image-models/tree/master/timm) (RGB-only) format. For comparison, we also provide the weights of an MAE ViT-B model that we pre-trained using the [official MAE codebase](https://github.com/facebookresearch/mae) following the recommended settings.

| Method | Arch. | Pre-training modalities | Pre-training epochs | Weights (MultiViT) | Weights (timm) | Config |
|---|---|---|---|---|---|---|
| MAE | ViT-B | RGB | 1600 | [download](https://github.com/EPFL-VILAB/MultiMAE/releases/download/pretrained-weights/mae-b_dec512d8b_1600e_multivit-c477195b.pth) | [download](https://github.com/EPFL-VILAB/MultiMAE/releases/download/pretrained-weights/mae-b_dec512d8b_1600e_timm-f74f3a8d.pth) | See [MAE](https://github.com/facebookresearch/mae/blob/main/PRETRAIN.md) |
| **MultiMAE** | ViT-B | RGB+D+S | 1600 | [**download**](https://github.com/EPFL-VILAB/MultiMAE/releases/download/pretrained-weights/multimae-b_98_rgb+-depth-semseg_1600e_multivit-afff3f8c.pth) | [**download**](https://github.com/EPFL-VILAB/MultiMAE/releases/download/pretrained-weights/multimae-b_98_rgb+-depth-semseg_1600e_timm-bafa5499.pth) | [link](cfgs/pretrain/multimae-b_98_rgb+-depth-semseg_1600e.yaml) |

RGB+D+S denotes RGB, depth, and semantic segmentation.

These pre-trained models can then be fine-tuned using this codebase to reach the following performance:
| Method | ImageNet-1K classif. (@1), RGB | ADE20K seg. (mIoU), RGB | Hypersim seg. (mIoU), RGB | Hypersim seg. (mIoU), D | Hypersim seg. (mIoU), RGB + D | NYUv2 seg. (mIoU), RGB | NYUv2 seg. (mIoU), D | NYUv2 seg. (mIoU), RGB + D | NYUv2 depth (δ1), RGB |
|---|---|---|---|---|---|---|---|---|---|
| Sup. (DeiT) | 81.8 | 45.8 | 33.9 | - | - | 50.1 | - | - | 80.7 |
| MAE | 83.3 | 46.2 | 36.5 | - | - | 50.8 | - | - | 85.1 |
| MultiMAE | 83.3 | 46.2 | 37.0 | 38.5 | 47.6 | 52.0 | 41.4 | 56.0 | 86.4 |
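The pre-training objective described above (reconstructing masked-out regions from a small random sample of visible patches drawn from several modalities) begins with a masking step. The Python/NumPy snippet below is a minimal sketch of one way to perform that step; it is not this repository's code. It rests on assumptions that this README does not state: 224×224 inputs cut into 16×16 patches (196 tokens per modality), a shared budget of 98 visible tokens (the number that appears in the config name `multimae-b_98_rgb+-depth-semseg_1600e`), and a symmetric Dirichlet draw that splits that budget across modalities. The helper `sample_multimodal_mask` and its parameters are hypothetical.

```python
import numpy as np


def sample_multimodal_mask(modalities, num_patches=196, num_visible=98, alpha=1.0, seed=None):
    """Hypothetical helper: choose visible vs. masked patch indices per modality.

    A shared budget of `num_visible` tokens is split across modalities using
    symmetric Dirichlet proportions (an assumption of this sketch); patch
    positions are then sampled uniformly at random inside each modality.
    """
    if num_visible > num_patches:
        # Guarantees no modality is asked for more patches than its grid holds.
        raise ValueError("this sketch requires num_visible <= num_patches")
    rng = np.random.default_rng(seed)
    # Fraction of the visible budget given to each modality (sums to 1).
    proportions = rng.dirichlet([alpha] * len(modalities))
    # Integer counts that sum exactly to num_visible; using a multinomial
    # draw for the rounding is a simplification chosen here.
    counts = rng.multinomial(num_visible, proportions)
    visible, masked = {}, {}
    for name, k in zip(modalities, counts):
        order = rng.permutation(num_patches)
        visible[name] = np.sort(order[:k])  # tokens fed to the encoder
        masked[name] = np.sort(order[k:])   # regions the decoder must reconstruct
    return visible, masked


visible, masked = sample_multimodal_mask(["rgb", "depth", "semseg"], seed=0)
print({name: len(idx) for name, idx in visible.items()})
print(sum(len(idx) for idx in visible.values()))  # 98
```

With three modalities this keeps 98 of 588 tokens visible (one sixth). In this sketch a larger `alpha` spreads the budget more evenly across modalities, while a smaller one tends to concentrate it in fewer of them.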
Owner
- Name: Ani
- Login: animesh
- Kind: user
- Location: Norway
- Company: Norwegian University of Science and Technology
- Website: https://www.fuzzylife.org
- Twitter: animesh1977
- Repositories: 749
- Profile: https://github.com/animesh
A medical graduate from Delhi University with post-graduation in bioinformatics from Jawaharlal Nehru University, India.