moe-jetpack
[NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, scholar.google
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Adlith
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2406.04801
- Size: 32.3 MB
Statistics
- Stars: 115
- Watchers: 2
- Forks: 1
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
MoE Jetpack
From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Xingkui Zhu*, Yiran Guan*, Dingkang Liang, Yuchao Chen, Yuliang Liu✉, Xiang Bai
Huazhong University of Science and Technology
* Equal Contribution ✉ Corresponding Author
If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.
📣 News
- 2024.09.26: MoE Jetpack has been accepted by NeurIPS 2024. 🎉
- 2024.06.07: MoE Jetpack paper released. 🔥
⭐️ Highlights
🔥 Strong performance. MoE Jetpack boosts accuracy across multiple vision tasks, outperforming both dense and Soft MoE models.
⚡ Fast convergence. By leveraging checkpoint recycling, MoE Jetpack speeds up convergence, reaching target accuracies significantly faster than training from scratch.
🤝 Strong generalization. MoE Jetpack achieves significant performance improvements on both Transformers and CNNs across 8 downstream vision datasets.
😮 Running efficiency. We provide an efficient implementation of expert parallelization, so that FLOPs and training wall time remain nearly identical to those of a dense model.
⚡ Overview
We present MoE Jetpack, a framework that fine-tunes pre-trained dense models into Mixture of Experts with checkpoint recycling and SpheroMoE layers, improving convergence speed, accuracy, and computational efficiency across several downstream vision tasks.
📦 Download URL
| File Type | Description | Download Link (Google Drive) |
|---|---|---|
| Checkpoint Recycling | Sampling from Dense Checkpoints to Initialize MoE Weights | |
| Dense Checkpoint (ViT-T) | Pre-trained ViT-T weights on ImageNet-21k for checkpoint recycling | 🤗 ViT-T Weights |
| Dense Checkpoint (ViT-S) | Pre-trained ViT-S weights on ImageNet-21k for checkpoint recycling | 🤗 ViT-S Weights |
| MoE Jetpack Init Weights | Initialized weights using checkpoint recycling (ViT-T/ViT-S) | MoE Init Weights |
| MoE Jetpack | Fine-tuning initialized SpheroMoE on ImageNet-1k | |
| Config | Config file for fine-tuning SpheroMoE model using checkpoint recycling weights | MoE Jetpack Config |
| Fine-tuning Logs | Logs from fine-tuning SpheroMoE | MoE Jetpack Logs |
| MoE Jetpack Weights | Final weights after fine-tuning on ImageNet-1K | MoE Jetpack Weights |
📊 Main Results
Comparisons between MoE Jetpack, Densely activated ViT, and Soft MoE
🚀 Getting Started
🔧 Installation
Follow these steps to set up the environment for MoE Jetpack:
1. Install PyTorch v2.1.0 with CUDA 12.1
bash
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
2. Install MMCV 2.1.0
bash
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
3. Install MoE Jetpack
Clone the repository and install it:
bash
git clone https://github.com/Adlith/MoE-Jetpack.git
cd path/to/MoE-Jetpack
pip install -U openmim && mim install -e .
For more details, and to prepare datasets, refer to the MMPretrain Installation guide.
4. Install Additional Dependencies
bash
pip install timm einops entmax python-louvain scikit-learn pymetis
Now you're ready to run MoE Jetpack!
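Before launching a run, it can help to confirm that the pinned versions from the steps above were actually installed. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it only reads package metadata rather than importing the libraries.

```python
from importlib import metadata

# Versions pinned in the installation steps above.
EXPECTED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "mmcv": "2.1.0",
}


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Wheels from the PyTorch index carry a local suffix such as "+cu121",
        # so compare only the part before the "+".
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:12s} expected {EXPECTED[name]:8s} found {installed} [{status}]")
```

A `MISMATCH` line does not necessarily mean the setup is broken, only that it differs from the versions listed in this README.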
📁 Project Directory Structure
Below is an overview of the MoE Jetpack project structure with descriptions of the key components:
bash
MoE-Jetpack/
│
├── data/
│ ├── imagenet/
│ │ ├── train/
│ │ ├── val/
│ │ └── ...
│ └── ...
│
├── moejet/ # Main project folder
│ ├── configs/ # Configuration files
│ │ └── timm/
│ │ ├── vit_tiny_dual_moe_timm_21k_ft.py
│ │ └── ...
│ │
│ ├── models/ # Contains the model definition files
│ │ └── ...
│ │
│ ├── tools/
│ │ └── gen_ViT_MoE_weight.py # Script to convert ViT dense checkpoints into MoE format
│ │
│ │
│ ├── weights/ # Folder for storing pre-trained weights
│ │ └── gen_weight/ # MoE initialization weights go here
│ │ └── ...
│ │
│ └── ... # Other project-related files and folders
│
├── README.md # Project readme and documentation
└── ...
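To check a local checkout against the layout above, a small script along these lines may be useful. It is a sketch under assumptions: the helper `missing_dirs` is hypothetical, and the list contains only the directories named explicitly in the tree.

```python
from pathlib import Path

# Directories named in the tree above, relative to the repository root.
EXPECTED_DIRS = [
    "data/imagenet/train",
    "data/imagenet/val",
    "moejet/configs/timm",
    "moejet/weights/gen_weight",
]


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected directories are present.")
```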
🗝️ Training & Validating
1. Initialize MoE Weights (Checkpoint Recycling)
Run the following script to initialize the MoE weights from pre-trained ViT weights:
bash
python moejet/tools/gen_ViT_MoE_weight.py
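To give a rough intuition for what "sampling from dense checkpoints" can mean, the sketch below builds several smaller expert MLPs by selecting subsets of hidden units from one dense MLP. This is an illustration only, not the logic of `gen_ViT_MoE_weight.py`: the function names are hypothetical, plain Python lists stand in for tensors, and uniform random sampling is a placeholder for whatever selection strategy the script actually uses.

```python
import random


def sample_expert_from_dense(fc1_w, fc1_b, fc2_w, expert_hidden, rng):
    """Build one smaller expert MLP from a subset of a dense MLP's hidden units.

    fc1_w: [hidden][dim] weights, fc1_b: [hidden] biases,
    fc2_w: [dim][hidden] weights.
    """
    hidden = len(fc1_w)
    # Placeholder strategy: pick hidden units uniformly at random.
    idx = sorted(rng.sample(range(hidden), expert_hidden))
    return {
        "fc1_w": [list(fc1_w[i]) for i in idx],             # keep selected rows
        "fc1_b": [fc1_b[i] for i in idx],                   # keep matching biases
        "fc2_w": [[row[i] for i in idx] for row in fc2_w],  # keep matching columns
        "hidden_idx": idx,
    }


def init_moe_from_dense(fc1_w, fc1_b, fc2_w, num_experts, expert_hidden, seed=0):
    """Create num_experts experts, each sampled independently from the dense MLP."""
    rng = random.Random(seed)
    return [
        sample_expert_from_dense(fc1_w, fc1_b, fc2_w, expert_hidden, rng)
        for _ in range(num_experts)
    ]
```

The key constraint the sketch shows is that the rows kept from the first layer and the columns kept from the second layer must use the same hidden-unit indices, so each expert remains a consistent slice of the original MLP.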
2. Start Training
- The training and testing code is built on MMPretrain. Please refer to the Training Documentation for more details.
```bash
# For example, to train MoE Jet on ImageNet-1K, use:
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh moejet/configs/timm/vit_tiny_dual_moe_timm_21k_ft.py 4
```
By default, we use 4 GPUs with a batch size of 256 per GPU. Gradient accumulation simulates a total batch size of 4096.
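The batch-size figures above imply a specific number of gradient accumulation steps. A short worked calculation (the helper name `accumulation_steps` is hypothetical) makes the relationship explicit and can be reused when changing the GPU count:

```python
def accumulation_steps(total_batch, num_gpus, per_gpu_batch):
    """Number of accumulation steps needed to reach total_batch."""
    per_step = num_gpus * per_gpu_batch  # samples processed per forward/backward pass
    if total_batch % per_step != 0:
        raise ValueError(
            f"total batch {total_batch} is not a multiple of {per_step}"
        )
    return total_batch // per_step


# Default setup from this README: 4 GPUs x 256 per GPU, total batch size 4096.
print(accumulation_steps(4096, num_gpus=4, per_gpu_batch=256))  # 4
```

With 2 GPUs and the same per-GPU batch size, the same total would need 8 accumulation steps.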
To customize hyperparameters, modify the relevant settings in the configuration file.
🖊️ Citation
@article{zhu2024moe,
title={MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks},
author={Xingkui Zhu and Yiran Guan and Dingkang Liang and Yuchao Chen and Yuliang Liu and Xiang Bai},
journal={Proceedings of Advances in Neural Information Processing Systems},
year={2024}
}
👍 Acknowledgement
We thank the following great works and open-source repositories:
- MMPreTrain
- Official Soft MoE
- Soft MoE PyTorch (by lucidrains)
- Weight Selection
Owner
- Login: Adlith
- Kind: user
- Repositories: 1
- Profile: https://github.com/Adlith
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks"
authors:
  - name: "Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai"
version: 1.0.0
date-released: 2024-10-21
repository-code: "https://github.com/Adlith/MoE-Jetpack"
license: Apache-2.0
GitHub Events
Total
- Issues event: 3
- Watch event: 130
- Delete event: 1
- Issue comment event: 6
- Member event: 3
- Push event: 23
- Fork event: 1
- Create event: 1
Last Year
- Issues event: 3
- Watch event: 130
- Delete event: 1
- Issue comment event: 6
- Member event: 3
- Push event: 23
- Fork event: 1
- Create event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 2
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- giangdip2410 (1)
- zjYao36 (1)
- sharkdrop (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v1.0.14 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- pytorch/torchserve latest-gpu build
- docutils ==0.18.1
- modelindex *
- myst-parser *
- pytorch_sphinx_theme *
- sphinx ==6.1.3
- sphinx-copybutton *
- sphinx-notfound-page *
- sphinx-tabs *
- sphinxcontrib-jquery *
- tabulate *
- mmcv >=2.0.0,<2.4.0
- mmengine >=0.8.3,<1.0.0
- pycocotools *
- transformers >=4.28.0
- albumentations >=0.3.2
- grad-cam >=1.3.7,<1.5.0
- requests *
- scikit-learn *
- mmcv-lite >=2.0.0rc4
- mmengine *
- pycocotools *
- torch *
- torchvision *
- transformers *
- einops *
- importlib-metadata *
- mat4py *
- matplotlib *
- modelindex *
- numpy *
- rich *
- coverage * test
- interrogate * test
- pytest * test