https://github.com/animesh/multimae

Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of EPFL-VILAB/MultiMAE
Created about 4 years ago · Last pushed over 4 years ago

https://github.com/animesh/MultiMAE/blob/main/

# MultiMAE: Multi-modal Multi-task Masked Autoencoders

[Roman Bachmann*](https://roman-bachmann.github.io/), [David Mizrahi*](https://dmizrahi.com), [Andrei Atanov](https://andrewatanov.github.io/), [Amir Zamir](https://vilab.epfl.ch/zamir/)


 [`Website`](https://multimae.epfl.ch) | [`arXiv`](https://arxiv.org/abs/2204.01678) | [`BibTeX`](#citation)



Official PyTorch implementation and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders.


We introduce Multi-modal Multi-task Masked Autoencoders (**MultiMAE**), an efficient and effective pre-training strategy for Vision Transformers. Given a small random sample of visible patches from multiple modalities, the MultiMAE pre-training objective is to reconstruct the masked-out regions. Once pre-trained, a single MultiMAE encoder can be used for both single-modal and multi-modal downstream transfer, yielding results that are competitive with or significantly better than the baselines.

## Catalog

- [x] Pre-trained models
- [x] MultiMAE pre-training code
- [x] ImageNet-1K classification fine-tuning code
- [x] Semantic segmentation fine-tuning code (single-modal & multi-modal)
- [x] Depth estimation fine-tuning code
- [x] Taskonomy fine-tuning code
- [ ] Colab demo (coming soon)

## Pre-trained models

We provide the weights of our pre-trained MultiMAE ViT-B model, in MultiViT (multi-modal) format and [timm](https://github.com/rwightman/pytorch-image-models/tree/master/timm) (RGB-only) format.

For comparison, we also provide the weights of a MAE ViT-B model that we pre-trained using the [official MAE codebase](https://github.com/facebookresearch/mae) following the recommended settings.

| Method | Arch. | Pre-training modalities | Pre-training epochs | Weights (MultiViT) | Weights (timm) | Config |
|--------|-------|-------------------------|---------------------|--------------------|----------------|--------|
| MAE | ViT-B | RGB | 1600 | [download](https://github.com/EPFL-VILAB/MultiMAE/releases/download/pretrained-weights/mae-b_dec512d8b_1600e_multivit-c477195b.pth) | [download](https://github.com/EPFL-VILAB/MultiMAE/releases/download/pretrained-weights/mae-b_dec512d8b_1600e_timm-f74f3a8d.pth) | See [MAE](https://github.com/facebookresearch/mae/blob/main/PRETRAIN.md) |
| **MultiMAE** | ViT-B | RGB+D+S | 1600 | [**download**](https://github.com/EPFL-VILAB/MultiMAE/releases/download/pretrained-weights/multimae-b_98_rgb+-depth-semseg_1600e_multivit-afff3f8c.pth) | [**download**](https://github.com/EPFL-VILAB/MultiMAE/releases/download/pretrained-weights/multimae-b_98_rgb+-depth-semseg_1600e_timm-bafa5499.pth) | [link](cfgs/pretrain/multimae-b_98_rgb+-depth-semseg_1600e.yaml) |

These pre-trained models can then be fine-tuned using this codebase to reach the following performance:
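
| Method | Classif. (@1): ImageNet-1K (RGB) | Semantic Segmentation (mIoU): ADE20K (RGB) | Semantic Segmentation (mIoU): Hypersim (RGB / D / RGB + D) | Semantic Segmentation (mIoU): NYUv2 (RGB / D / RGB + D) | Depth (δ1): NYUv2 (RGB) |
|-------------|------|------|--------------------|--------------------|------|
| Sup. (DeiT) | 81.8 | 45.8 | 33.9 / - / - | 50.1 / - / - | 80.7 |
| MAE | 83.3 | 46.2 | 36.5 / - / - | 50.8 / - / - | 85.1 |
| MultiMAE | 83.3 | 46.2 | 37.0 / 38.5 / 47.6 | 52.0 / 41.4 / 56.0 | 86.4 |
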
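
Before fine-tuning, it can be useful to check what a downloaded checkpoint from the Pre-trained models table actually contains. The following is a minimal sketch, not part of this codebase: the helper names (`unwrap_state_dict`, `count_keys_by_prefix`, `load_checkpoint`) are hypothetical, and the idea that the weights may be nested under a `"model"` key is an assumption based on a common PyTorch convention rather than something this README states.

```python
from collections import Counter
from typing import Any, Mapping


def unwrap_state_dict(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the parameter dictionary stored in a checkpoint.

    Assumption: the weights sit either at the top level or under a "model"
    key. This layout is not confirmed for the files linked above.
    """
    nested = checkpoint.get("model")
    if isinstance(nested, Mapping):
        return nested
    return checkpoint


def count_keys_by_prefix(state_dict: Mapping[str, Any]) -> Counter:
    """Count entries per top-level module name (the text before the first dot)."""
    return Counter(key.split(".", 1)[0] for key in state_dict)


def load_checkpoint(path: str) -> Mapping[str, Any]:
    """Load a downloaded .pth file onto the CPU and return its parameters."""
    import torch  # imported lazily so the helpers above work without PyTorch

    checkpoint = torch.load(path, map_location="cpu")
    return unwrap_state_dict(checkpoint)


# Example usage, assuming the timm-format MAE file has been downloaded locally:
# state_dict = load_checkpoint("mae-b_dec512d8b_1600e_timm-f74f3a8d.pth")
# print(count_keys_by_prefix(state_dict))
```

Comparing the per-prefix counts of the MultiViT and timm files is a quick way to see how the two formats differ in their top-level module names, without instantiating a model.
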
### Model formats

We provide pre-trained weights in two different formats: the single-modal ViT / timm format, which is compatible with other popular ViT repositories (e.g., [timm](https://github.com/rwightman/pytorch-image-models/tree/master/timm), [DINO](https://github.com/facebookresearch/dino), [MAE](https://github.com/facebookresearch/mae)), and the multi-modal MultiMAE / MultiViT format, which is used throughout this codebase for multi-modal pre-training and fine-tuning. See [`multimae/multimae.py`](multimae/multimae.py) for the documentation and implementation of MultiMAE / MultiViT.

You can convert between these formats using the provided [`vit2multimae_converter.py`](tools/vit2multimae_converter.py) and [`multimae2vit_converter.py`](tools/multimae2vit_converter.py) scripts.

## Usage

### Set-up

See [SETUP.md](SETUP.md) for set-up instructions.

### Pre-training

See [PRETRAINING.md](PRETRAINING.md) for pre-training instructions.

### Fine-tuning

See [FINETUNING.md](FINETUNING.md) for fine-tuning instructions.

## Acknowledgement

This repository is built using the [timm](https://github.com/rwightman/pytorch-image-models/tree/master/timm), [DeiT](https://github.com/facebookresearch/deit), [DINO](https://github.com/facebookresearch/dino), [MoCo v3](https://github.com/facebookresearch/moco-v3), [BEiT](https://github.com/microsoft/unilm/tree/master/beit), [MAE-priv](https://github.com/BUPT-PRIV/MAE-priv), and [MAE](https://github.com/facebookresearch/mae) repositories.

## License

This project is under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for details.

## Citation

If you find this repository helpful, please consider citing our work:

```BibTeX
@article{bachmann2022multimae,
  author  = {Roman Bachmann and David Mizrahi and Andrei Atanov and Amir Zamir},
  title   = {{MultiMAE}: Multi-modal Multi-task Masked Autoencoders},
  journal = {arXiv preprint arXiv:2204.01678},
  year    = {2022},
}
```

Owner

  • Name: Ani
  • Login: animesh
  • Kind: user
  • Location: Norway
  • Company: Norwegian University of Science and Technology

A medical graduate from Delhi University with post-graduation in bioinformatics from Jawaharlal Nehru University, India.
