moe-jetpack

[NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

https://github.com/adlith/moe-jetpack

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org, scholar.google
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.8%) to scientific vocabulary

Keywords

conditional-computation deep-learning mixture-of-experts vision-transformer

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: Adlith
License: apache-2.0
Language: Python
Default Branch: main
Homepage: https://arxiv.org/abs/2406.04801
Size: 32.3 MB

Statistics

Stars: 115
Watchers: 2
Forks: 1
Open Issues: 3
Releases: 0

Created about 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme Contributing License Citation

MoE Jetpack

From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

Xingkui Zhu^*, Yiran Guan^*, Dingkang Liang, Yuchao Chen, Yuliang Liu^✉, Xiang Bai

Huazhong University of Science and Technology

^* Equal Contribution ^✉ Corresponding Author

NeurIPS 2024 | arXiv | Chinese-language walkthrough

If you like our project, please give us a star ⭐ on GitHub for the latest update.

📣 News

2024.09.26: MoE Jetpack has been accepted by NeurIPS 2024. 🎉
2024.06.07: MoE Jetpack paper released. 🔥

⭐️ Highlights

🔥 Strong performance. MoE Jetpack improves accuracy across multiple vision tasks, outperforming both dense and Soft MoE models.
⚡ Fast convergence. By leveraging checkpoint recycling, MoE Jetpack reaches target accuracies significantly faster than training from scratch.
🤝 Strong generalization. MoE Jetpack delivers significant gains on both Transformer and CNN architectures across 8 downstream vision datasets.
😮 Running efficiency. We provide an efficient implementation of expert parallelization, so FLOPs and training wall time remain nearly identical to those of a dense model.

⚡ Overview

We present MoE Jetpack, a framework that fine-tunes pre-trained dense models into Mixture of Experts with checkpoint recycling and SpheroMoE layers, improving convergence speed, accuracy, and computational efficiency across several downstream vision tasks.
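To make the idea of checkpoint recycling more concrete, here is a minimal sketch of initializing several smaller experts from one dense MLP layer by sampling its hidden units. Uniform random sampling is an assumption used purely for illustration; the repository's own weight-generation script may select units differently. The names `sample_expert` and `recycle_dense_mlp`, and the toy shapes, are hypothetical.

```python
import random


def sample_expert(w_in, w_out, hidden_idx):
    """Slice a dense MLP down to the chosen hidden units.

    w_in  has shape (hidden, d_model): one row per hidden unit.
    w_out has shape (d_model, hidden): one column per hidden unit.
    """
    small_in = [w_in[h] for h in hidden_idx]                     # keep selected rows
    small_out = [[row[h] for h in hidden_idx] for row in w_out]  # keep matching columns
    return small_in, small_out


def recycle_dense_mlp(w_in, w_out, num_experts, expert_hidden, seed=0):
    """Build `num_experts` smaller experts from one dense MLP.

    Assumption (not taken from the paper): each expert draws its hidden
    units uniformly at random, without replacement within the expert.
    """
    rng = random.Random(seed)
    hidden = len(w_in)
    experts = []
    for _ in range(num_experts):
        idx = sorted(rng.sample(range(hidden), expert_hidden))
        experts.append(sample_expert(w_in, w_out, idx))
    return experts


# Toy dense MLP with d_model=2 and hidden=8 (values are placeholders).
d_model, hidden = 2, 8
w_in = [[float(h * 10 + d) for d in range(d_model)] for h in range(hidden)]
w_out = [[float(d * 100 + h) for h in range(hidden)] for d in range(d_model)]

experts = recycle_dense_mlp(w_in, w_out, num_experts=4, expert_hidden=3)
print(len(experts), len(experts[0][0]), len(experts[0][1][0]))  # 4 3 3
```

Each expert reuses pre-trained weights instead of starting from random values, which is the intuition behind the faster convergence claimed above.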

📦 Download URL

| File Type | Description | Download Link (Google Drive) |
|---|---|---|
| Checkpoint Recycling | Sampling from Dense Checkpoints to Initialize MoE Weights | |
| Dense Checkpoint (ViT-T) | Pre-trained ViT-T weights on ImageNet-21k for checkpoint recycling | 🤗 ViT-T Weights |
| Dense Checkpoint (ViT-S) | Pre-trained ViT-S weights on ImageNet-21k for checkpoint recycling | 🤗 ViT-S Weights |
| MoE Jetpack Init Weights | Initialized weights using checkpoint recycling (ViT-T/ViT-S) | MoE Init Weights |
| MoE Jetpack | Fine-tuning initialized SpheroMoE on ImageNet-1k | |
| Config | Config file for fine-tuning SpheroMoE model using checkpoint recycling weights | MoE Jetpack Config |
| Fine-tuning Logs | Logs from fine-tuning SpheroMoE | MoE Jetpack Logs |
| MoE Jetpack Weights | Final weights after fine-tuning on ImageNet-1K | MoE Jetpack Weights |

📊 Main Results

Comparisons between MoE Jetpack, Densely activated ViT, and Soft MoE

🚀 Getting Started

🔧 Installation

Follow these steps to set up the environment for MoE Jetpack:

1. Install PyTorch v2.1.0 with CUDA 12.1

```bash
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
```

2. Install MMCV 2.1.0

```bash
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
```

3. Install MoE Jetpack

Clone the repository and install it:

```bash
git clone https://github.com/Adlith/MoE-Jetpack.git
cd path/to/MoE-Jetpack
pip install -U openmim && mim install -e .
```

For more details, and to prepare datasets, refer to MMPretrain Installation.

4. Install Additional Dependencies

```bash
pip install timm einops entmax python-louvain scikit-learn pymetis
```

Now you're ready to run MoE Jetpack!
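Before moving on, it may help to confirm that the pinned versions from the steps above were actually installed. The following is a minimal sketch using only the standard library; the helper name `check_versions` is hypothetical, and it only compares version strings without importing the packages themselves.

```python
from importlib import metadata

# Versions pinned in the installation steps above.
PINNED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "mmcv": "2.1.0",
}


def check_versions(pinned):
    """Return {package: (installed_or_None, expected, ok)} for each pinned package."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu121" are ignored when comparing.
        ok = installed is not None and installed.split("+")[0] == expected
        report[name] = (installed, expected, ok)
    return report


for name, (installed, expected, ok) in check_versions(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name:12s} installed={installed} expected={expected} {status}")
```

A `MISMATCH` line usually means a later `pip install` pulled in a different build, so it is worth re-running the pinned command for that package.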

📁 Project Directory Structure

Below is an overview of the MoE Jetpack project structure with descriptions of the key components:

```bash
MoE-Jetpack/
│
├── data/
│   ├── imagenet/
│   │   ├── train/
│   │   ├── val/
│   │   └── ...
│   └── ...
│
├── moejet/                          # Main project folder
│   ├── configs/                     # Configuration files
│   │   └── timm/
│   │       ├── vit_tiny_dual_moe_timm_21k_ft.py
│   │       └── ...
│   │
│   ├── models/                      # Contains the model definition files
│   │   └── ...
│   │
│   ├── tools/
│   │   └── gen_ViT_MoE_weight.py    # Script to convert ViT dense checkpoints into MoE format
│   │
│   ├── weights/                     # Folder for storing pre-trained weights
│   │   └── gen_weight/              # MoE initialization weights go here
│   │       └── ...
│   │
│   └── ...                          # Other project-related files and folders
│
├── README.md                        # Project readme and documentation
└── ...
```
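As a quick sanity check of the layout described above, a short script can report which of the named paths are missing from a checkout. This is a sketch only: the helper `missing_paths` is hypothetical, and the list covers just the paths spelled out in the overview, since the rest of the tree is elided there.

```python
from pathlib import Path

# Paths named explicitly in the directory overview.
EXPECTED = [
    "data/imagenet/train",
    "data/imagenet/val",
    "moejet/configs/timm/vit_tiny_dual_moe_timm_21k_ft.py",
    "moejet/tools/gen_ViT_MoE_weight.py",
    "moejet/weights/gen_weight",
]


def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in expected if not (root / p).exists()]


# Run from the repository root; prints nothing but the summary if all paths exist.
missing = missing_paths(".")
for p in missing:
    print(f"missing: {p}")
print(f"{len(EXPECTED) - len(missing)} of {len(EXPECTED)} expected paths found")
```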

🗝️ Training & Validating

1. Initialize MoE Weights (Checkpoint Recycling)

Run the following script to initialize the MoE weights from pre-trained ViT weights:

```bash
python moejet/tools/gen_ViT_MoE_weight.py
```

2. Start Training

The training and testing code is built on MMPretrain. Please refer to the Training Documentation for more details.

```bash
# For example, to train MoE Jetpack on ImageNet-1K, use:
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh moejet/configs/timm/vit_tiny_dual_moe_timm_21k_ft.py 4
```

By default, we use 4 GPUs with a batch size of 256 per GPU. Gradient accumulation simulates a total batch size of 4096.

To customize hyperparameters, modify the relevant settings in the configuration file.
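The batch-size figures above imply a specific number of gradient-accumulation steps, which is worth recomputing whenever the GPU count or per-GPU batch size changes. The sketch below shows the arithmetic; the function name `accumulation_steps` is hypothetical. In MMEngine-based configs this value is commonly set through the `accumulative_counts` field of the optimizer wrapper, but whether the provided config uses exactly that field is an assumption.

```python
def accumulation_steps(total_batch, num_gpus, per_gpu_batch):
    """Number of accumulation steps needed to reach the target effective batch size."""
    per_step = num_gpus * per_gpu_batch  # samples processed per optimizer-free step
    if total_batch % per_step != 0:
        raise ValueError(
            f"total batch {total_batch} is not a multiple of {per_step} samples per step"
        )
    return total_batch // per_step


# Default setup described above: 4 GPUs x 256 per GPU, effective batch 4096.
steps = accumulation_steps(total_batch=4096, num_gpus=4, per_gpu_batch=256)
print(steps)  # 4

# With only 2 GPUs, twice as many accumulation steps keep the effective batch size.
print(accumulation_steps(total_batch=4096, num_gpus=2, per_gpu_batch=256))  # 8
```

Keeping the effective batch size fixed this way avoids having to retune the learning rate when training on fewer GPUs.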

🖊️ Citation

@article{zhu2024moe,
  title={MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks},
  author={Xingkui Zhu and Yiran Guan and Dingkang Liang and Yuchao Chen and Yuliang Liu and Xiang Bai},
  journal={Proceedings of Advances in Neural Information Processing Systems},
  year={2024}
}

👍 Acknowledgement

We thank the following great works and open-source repositories:

- MMPreTrain
- Official Soft MoE
- Soft MoE PyTorch (by lucidrains)
- Weight Selection

Owner

Login: Adlith
Kind: user

Repositories: 1
Profile: https://github.com/Adlith

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks"
authors:
  - name: "Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai"
version: 1.0.0
date-released: 2024-10-21
repository-code: "https://github.com/Adlith/MoE-Jetpack"
license: Apache-2.0

GitHub Events

Total

Issues event: 3
Watch event: 130
Delete event: 1
Issue comment event: 6
Member event: 3
Push event: 23
Fork event: 1
Create event: 1

Last Year

Issues event: 3
Watch event: 130
Delete event: 1
Issue comment event: 6
Member event: 3
Push event: 23
Fork event: 1
Create event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 2
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

giangdip2410 (1)
zjYao36 (1)
sharkdrop (1)

Pull Request Authors

Top Labels

Issue Labels

enhancement (3)

Pull Request Labels

Dependencies

.github/workflows/lint.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/pr_stage_test.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite
codecov/codecov-action v1.0.14 composite

.github/workflows/publish-to-pypi.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/test_mim.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

tests/data/meta.yml cpan

.circleci/docker/Dockerfile docker

pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build

docker/Dockerfile docker

pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build

docker/serve/Dockerfile docker

pytorch/torchserve latest-gpu build

projects/internimage_classification/ops_dcnv3/setup.py pypi

requirements/docs.txt pypi

docutils ==0.18.1
modelindex *
myst-parser *
pytorch_sphinx_theme *
sphinx ==6.1.3
sphinx-copybutton *
sphinx-notfound-page *
sphinx-tabs *
sphinxcontrib-jquery *
tabulate *

requirements/mminstall.txt pypi

mmcv >=2.0.0,<2.4.0
mmengine >=0.8.3,<1.0.0

requirements/multimodal.txt pypi

pycocotools *
transformers >=4.28.0

requirements/optional.txt pypi

albumentations >=0.3.2
grad-cam >=1.3.7,<1.5.0
requests *
scikit-learn *

requirements/readthedocs.txt pypi

mmcv-lite >=2.0.0rc4
mmengine *
pycocotools *
torch *
torchvision *
transformers *

requirements/runtime.txt pypi

einops *
importlib-metadata *
mat4py *
matplotlib *
modelindex *
numpy *
rich *

requirements/tests.txt pypi

coverage * test
interrogate * test
pytest * test

requirements.txt pypi

setup.py pypi