moe-jetpack

[NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

https://github.com/adlith/moe-jetpack

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, scholar.google
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary

Keywords

conditional-computation deep-learning mixture-of-experts vision-transformer
Last synced: 10 months ago

Repository

Basic Info
Statistics
  • Stars: 115
  • Watchers: 2
  • Forks: 1
  • Open Issues: 3
  • Releases: 0
Created about 2 years ago · Last pushed over 1 year ago

README.md

MoE Jetpack

From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

Xingkui Zhu*, Yiran Guan*, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai

Huazhong University of Science and Technology

* Equal Contribution      Corresponding Author

NeurIPS 2024 | arXiv | Chinese-language explainer
If you like our project, please give us a star ⭐ on GitHub for the latest update.

📣 News

  • 2024.09.26: MoE Jetpack has been accepted by NeurIPS 2024. 🎉
  • 2024.06.07: MoE Jetpack paper released. 🔥

⭐️ Highlights

  • 🔥 Strong performance. MoE Jetpack boosts accuracy across multiple vision tasks, outperforming both dense and Soft MoE models.

  • Fast convergence. By recycling dense checkpoints, MoE Jetpack converges faster, reaching target accuracies significantly sooner than training from scratch.

  • 🤝 Strong generalization. MoE Jetpack achieves significant performance improvements on both Transformer and CNN architectures across 8 downstream vision datasets.

  • 😮 Running efficiency. We provide an efficient implementation of expert parallelization, so FLOPs and training wall time remain nearly identical to those of a dense model.

⚡ Overview

We present MoE Jetpack, a framework that fine-tunes pre-trained dense models into Mixture of Experts with checkpoint recycling and SpheroMoE layers, improving convergence speed, accuracy, and computational efficiency across several downstream vision tasks.

📦 Download URL

| File Type | Description | Download Link (Google Drive) |
|---|---|---|
| Checkpoint Recycling | Sampling from Dense Checkpoints to Initialize MoE Weights | |
| Dense Checkpoint (ViT-T) | Pre-trained ViT-T weights on ImageNet-21k for checkpoint recycling | 🤗 ViT-T Weights |
| Dense Checkpoint (ViT-S) | Pre-trained ViT-S weights on ImageNet-21k for checkpoint recycling | 🤗 ViT-S Weights |
| MoE Jetpack Init Weights | Initialized weights using checkpoint recycling (ViT-T/ViT-S) | MoE Init Weights |
| MoE Jetpack | Fine-tuning initialized SpheroMoE on ImageNet-1k | |
| Config | Config file for fine-tuning SpheroMoE model using checkpoint recycling weights | MoE Jetpack Config |
| Fine-tuning Logs | Logs from fine-tuning SpheroMoE | MoE Jetpack Logs |
| MoE Jetpack Weights | Final weights after fine-tuning on ImageNet-1K | MoE Jetpack Weights |

📊 Main Results

Comparisons between MoE Jetpack, densely activated ViT, and Soft MoE

🚀 Getting Started

🔧 Installation

Follow these steps to set up the environment for MoE Jetpack:

1. Install PyTorch v2.1.0 with CUDA 12.1

```bash
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
```

2. Install MMCV 2.1.0

```bash
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
```

3. Install MoE Jetpack

Clone the repository and install it:

```bash
git clone https://github.com/Adlith/MoE-Jetpack.git
cd path/to/MoE-Jetpack
pip install -U openmim && mim install -e .
```

For more details and to prepare datasets, refer to the MMPretrain Installation guide.

4. Install Additional Dependencies

```bash
pip install timm einops entmax python-louvain scikit-learn pymetis
```

Now you're ready to run MoE Jetpack!
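To confirm that the versions pinned in the steps above are the ones actually installed, a small check along these lines may help. This is a sketch and not part of the repository: the `EXPECTED` table and the `check_environment` helper are hypothetical names, and the script only reads package metadata through the standard-library `importlib.metadata`, so it does not verify that CUDA itself works.

```python
from importlib import metadata

# Versions pinned in the installation steps; None means the package is
# installed without a pinned version. This table is an illustrative assumption.
EXPECTED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "mmcv": "2.1.0",
    "timm": None,
    "einops": None,
}


def check_environment(expected=EXPECTED):
    """Return {package: (status, installed_version)} for each expected package."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = ("missing", None)
            continue
        # Drop local build tags such as "+cu121" before comparing.
        base = installed.split("+")[0]
        ok = pinned is None or base == pinned
        report[name] = ("ok" if ok else "mismatch", installed)
    return report


if __name__ == "__main__":
    for name, (status, version) in check_environment().items():
        print(f"{name:12s} {status:9s} {version or '-'}")
```

Any row reported as `missing` or `mismatch` points back to the installation step that pins that package.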

📁 Project Directory Structure

Below is an overview of the MoE Jetpack project structure with descriptions of the key components:

```bash
MoE-Jetpack/
│
├── data/
│   ├── imagenet/
│   │   ├── train/
│   │   ├── val/
│   │   └── ...
│   └── ...
│
├── moejet/                          # Main project folder
│   ├── configs/                     # Configuration files
│   │   └── timm/
│   │       ├── vit_tiny_dual_moe_timm_21k_ft.py
│   │       └── ...
│   │
│   ├── models/                      # Contains the model definition files
│   │   └── ...
│   │
│   ├── tools/
│   │   └── gen_ViT_MoE_weight.py    # Script to convert ViT dense checkpoints into MoE format
│   │
│   ├── weights/                     # Folder for storing pre-trained weights
│   │   └── gen_weight/              # MoE initialization weights go here
│   │       └── ...
│   │
│   └── ...                          # Other project-related files and folders
│
├── README.md                        # Project readme and documentation
└── ...
```
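Before starting a run, it can be useful to check that a local checkout matches the layout described above. The following is a minimal sketch under that assumption: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the list contains only the paths named in the overview, not everything the training code may need.

```python
from pathlib import Path

# Paths taken from the directory overview, relative to the repository root.
EXPECTED_PATHS = [
    "data/imagenet/train",
    "data/imagenet/val",
    "moejet/configs/timm/vit_tiny_dual_moe_timm_21k_ft.py",
    "moejet/tools/gen_ViT_MoE_weight.py",
    "moejet/weights/gen_weight",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the repository root.
    for path in missing_paths("."):
        print(f"missing: {path}")
```

An empty result means every listed path is present; it says nothing about whether the dataset or weight files inside those folders are complete.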

🗝️ Training & Validating

1. Initialize MoE Weights (Checkpoint Recycling)

Run the following script to initialize the MoE weights from pre-trained ViT weights:

```bash
python moejet/tools/gen_ViT_MoE_weight.py
```
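The general idea of initializing experts by sampling from a dense checkpoint can be illustrated with a toy example. This is only a sketch and not the logic of `gen_ViT_MoE_weight.py`: it assumes each expert is a narrower feed-forward block built by picking a subset of the dense block's hidden units, and it picks them uniformly at random, whereas the actual script may select units differently. The function names and the nested-list weight format are hypothetical.

```python
import random


def sample_expert_from_dense(w_in, w_out, expert_hidden, rng):
    """Build one smaller expert FFN by selecting hidden units of a dense FFN.

    w_in:  hidden x d_model weights (one row per hidden unit)
    w_out: d_model x hidden weights (one column per hidden unit)
    """
    hidden = len(w_in)
    # Uniform sampling without replacement is an illustrative assumption.
    keep = sorted(rng.sample(range(hidden), expert_hidden))
    expert_in = [list(w_in[i]) for i in keep]
    expert_out = [[row[i] for i in keep] for row in w_out]
    return keep, expert_in, expert_out


def recycle_checkpoint(w_in, w_out, num_experts, expert_hidden, seed=0):
    """Initialize `num_experts` experts from a single dense FFN layer."""
    rng = random.Random(seed)
    return [
        sample_expert_from_dense(w_in, w_out, expert_hidden, rng)
        for _ in range(num_experts)
    ]


if __name__ == "__main__":
    d_model, hidden = 4, 8
    # Toy dense weights; a real run would read them from a ViT checkpoint.
    w_in = [[float(h * d_model + d) for d in range(d_model)] for h in range(hidden)]
    w_out = [[float(d * hidden + h) for h in range(hidden)] for d in range(d_model)]
    experts = recycle_checkpoint(w_in, w_out, num_experts=3, expert_hidden=2)
    for keep, e_in, e_out in experts:
        print(keep, len(e_in), "x", len(e_in[0]), "|", len(e_out), "x", len(e_out[0]))
```

Selecting the same hidden-unit indices for the input rows and the output columns keeps each expert a consistent slice of the dense layer, so the expert starts from pre-trained values rather than random ones.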

2. Start Training

For example, to train MoE Jetpack on ImageNet-1K, use:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh moejet/configs/timm/vit_tiny_dual_moe_timm_21k_ft.py 4
```

By default, we use 4 GPUs with a batch size of 256 per GPU. Gradient accumulation simulates a total batch size of 4096.

To customize hyperparameters, modify the relevant settings in the configuration file.
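When changing the number of GPUs or the per-GPU batch size, the accumulation count has to change as well to keep the effective batch size at 4096. The arithmetic can be sketched as follows; `accumulation_steps` is a hypothetical helper, and the value actually used should be read from the configuration file. In MMEngine-based configs, accumulation is typically set through the `accumulative_counts` argument of the optimizer wrapper.

```python
def accumulation_steps(target_batch, num_gpus, per_gpu_batch):
    """Number of micro-batches to accumulate before each optimizer step."""
    per_step = num_gpus * per_gpu_batch
    if target_batch % per_step != 0:
        raise ValueError("target batch must be a multiple of num_gpus * per_gpu_batch")
    return target_batch // per_step


# Values from the README: 4 GPUs, 256 samples per GPU, effective batch 4096.
print(accumulation_steps(4096, num_gpus=4, per_gpu_batch=256))  # prints 4
```

With the default setup, each forward/backward pass covers 4 × 256 = 1024 samples, so four passes are accumulated per optimizer step. Training on 2 GPUs at the same per-GPU batch size would need 8.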

🖊️ Citation

```bibtex
@article{zhu2024moe,
  title={MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks},
  author={Xingkui Zhu and Yiran Guan and Dingkang Liang and Yuchao Chen and Yuliang Liu and Xiang Bai},
  journal={Proceedings of Advances in Neural Information Processing Systems},
  year={2024}
}
```

👍 Acknowledgement

We thank the following great works and open-source repositories:

  • MMPreTrain
  • Official Soft MoE
  • Soft MoE PyTorch (by lucidrains)
  • Weight Selection

Owner

  • Login: Adlith
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks"
authors:
  - name: "Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai"
version: 1.0.0
date-released: 2024-10-21
repository-code: "https://github.com/Adlith/MoE-Jetpack"
license: Apache-2.0

GitHub Events

Total
  • Issues event: 3
  • Watch event: 130
  • Delete event: 1
  • Issue comment event: 6
  • Member event: 3
  • Push event: 23
  • Fork event: 1
  • Create event: 1
Last Year
  • Issues event: 3
  • Watch event: 130
  • Delete event: 1
  • Issue comment event: 6
  • Member event: 3
  • Push event: 23
  • Fork event: 1
  • Create event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • giangdip2410 (1)
  • zjYao36 (1)
  • sharkdrop (1)
Pull Request Authors
Top Labels
Issue Labels
enhancement (3)
Pull Request Labels

Dependencies

.github/workflows/lint.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/pr_stage_test.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v1.0.14 composite
.github/workflows/publish-to-pypi.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/test_mim.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
tests/data/meta.yml cpan
.circleci/docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve/Dockerfile docker
  • pytorch/torchserve latest-gpu build
projects/internimage_classification/ops_dcnv3/setup.py pypi
requirements/docs.txt pypi
  • docutils ==0.18.1
  • modelindex *
  • myst-parser *
  • pytorch_sphinx_theme *
  • sphinx ==6.1.3
  • sphinx-copybutton *
  • sphinx-notfound-page *
  • sphinx-tabs *
  • sphinxcontrib-jquery *
  • tabulate *
requirements/mminstall.txt pypi
  • mmcv >=2.0.0,<2.4.0
  • mmengine >=0.8.3,<1.0.0
requirements/multimodal.txt pypi
  • pycocotools *
  • transformers >=4.28.0
requirements/optional.txt pypi
  • albumentations >=0.3.2
  • grad-cam >=1.3.7,<1.5.0
  • requests *
  • scikit-learn *
requirements/readthedocs.txt pypi
  • mmcv-lite >=2.0.0rc4
  • mmengine *
  • pycocotools *
  • torch *
  • torchvision *
  • transformers *
requirements/runtime.txt pypi
  • einops *
  • importlib-metadata *
  • mat4py *
  • matplotlib *
  • modelindex *
  • numpy *
  • rich *
requirements/tests.txt pypi
  • coverage * test
  • interrogate * test
  • pytest * test
requirements.txt pypi
setup.py pypi