zeroi2v
[ECCV 2024] ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.7%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 20
- Watchers: 4
- Forks: 0
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video (ECCV 2024)
This repo is the official implementation of "ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video" (ECCV 2024).
If you're interested in our work, check out our new video adaptation benchmark!
TODO
- [x] Release source codes
- [x] Pretrained model weights
Introduction
In this paper, we present a zero-cost adaptation paradigm (ZeroI2V) for transferring image transformers to video recognition tasks, i.e., one that introduces zero extra cost to the adapted models during inference.

Models
The released checkpoints are the weights before reparameterization; you can reparameterize them with tools/weight_reparam.py.
Kinetics 400
| Backbone | Pretrain | GFLOPs | Param (M) | New Param (M) | acc@1 | Views | Config | Checkpoint (before reparam) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViT-B/16 | CLIP | 422 | 86 | 0 | 83.0 | 8x1x3 | config | checkpoint |
| ViT-L/14 | CLIP | 1946 | 304 | 0 | 86.3 | 8x1x3 | config | checkpoint |
| ViT-L/14 | CLIP | 7783 | 304 | 0 | 87.2 | 32x1x3 | config | checkpoint |
Something Something V2
| Backbone | Pretrain | GFLOPs | Param (M) | New Param (M) | acc@1 | Views | Config | Checkpoint (before reparam) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViT-L/14 | CLIP | 7783 | 304 | 0 | 72.2 | 32x3x1 | config | checkpoint |
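The GFLOPs and Views columns can be related with a little arithmetic. The sketch below assumes (the README does not say so) that "Views" is written as frames x clips x crops and that GFLOPs is the total over all test views; under those assumptions it splits each total into a per-view and a per-frame cost.

```python
# Sketch: derive per-view and per-frame cost from the Models tables.
# ASSUMPTIONS (not stated in the README): "Views" is written as
# frames x clips x crops, and the GFLOPs column is the total over all views.


def parse_views(views: str) -> tuple[int, int, int]:
    """Split a string such as '8x1x3' into (frames, clips, crops)."""
    frames, clips, crops = (int(part) for part in views.split("x"))
    return frames, clips, crops


def per_view_gflops(total_gflops: float, views: str) -> tuple[float, float]:
    """Return (GFLOPs per view, GFLOPs per frame) under the assumptions above."""
    frames, clips, crops = parse_views(views)
    num_views = clips * crops
    per_view = total_gflops / num_views
    return per_view, per_view / frames


# Rows copied from the Kinetics 400 and Something Something V2 tables.
rows = [
    ("ViT-B/16", 422, "8x1x3"),
    ("ViT-L/14", 1946, "8x1x3"),
    ("ViT-L/14", 7783, "32x1x3"),
    ("ViT-L/14 (SSv2)", 7783, "32x3x1"),
]

for name, gflops, views in rows:
    view_cost, frame_cost = per_view_gflops(gflops, views)
    print(f"{name:16s} {views:7s} {view_cost:8.1f} GFLOPs/view {frame_cost:6.1f} GFLOPs/frame")
```

Under these assumptions, the 8-frame and 32-frame ViT-L/14 rows both work out to roughly 81 GFLOPs per frame, which is at least consistent with the cost growing linearly in the number of frames.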
Installation
```bash
pip install -U openmim
mim install mmengine 'mmcv>=2.0.0rc1'
mim install "mmdet>=3.0.0rc5"
mim install "mmpose>=1.0.0rc0"
git clone https://github.com/leexinhao/ZeroI2V.git
cd ZeroI2V
pip install -v -e .

# install CLIP
pip install git+https://github.com/openai/CLIP.git
```
Our project is based on MMAction2. Please refer to install.md for more detailed instructions.
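After installation, a quick sanity check is to confirm that the packages above can be found. This is a minimal sketch using only the Python standard library; the import and distribution names (in particular `mmaction`/`mmaction2` for the editable install of this repo and `clip` for CLIP) are assumptions inferred from the commands above.

```python
# Sketch: check that the packages installed above are importable and report versions.
# The import/distribution names below are assumptions based on the pip/mim commands.
import importlib.metadata
import importlib.util

# (import name, distribution name) pairs; the openai CLIP repo imports as "clip".
PACKAGES = [
    ("mmengine", "mmengine"),
    ("mmcv", "mmcv"),
    ("mmdet", "mmdet"),
    ("mmpose", "mmpose"),
    ("mmaction", "mmaction2"),
    ("clip", "clip"),
]


def report(packages):
    """Map each import name to its installed version, or None if it cannot be found."""
    result = {}
    for module_name, dist_name in packages:
        if importlib.util.find_spec(module_name) is None:
            result[module_name] = None
            continue
        try:
            result[module_name] = importlib.metadata.version(dist_name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under that name.
            result[module_name] = "unknown"
    return result


if __name__ == "__main__":
    for name, version in report(PACKAGES).items():
        print(f"{name:10s} {version or 'NOT FOUND'}")
```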
Data Preparation
All the datasets (K400, SSv2, UCF101 and HMDB51) used in this work are supported in MMAction2.
Training
The training configs for the different experiments are provided in configs/recognition/. To run an experiment, use the following command, where <PATH/TO/CONFIG> is the training config you want to use and <NUM_GPU> is the number of GPUs. The default training setting is 8 GPUs with a batch size of 64.
```shell
bash tools/dist_train.sh <PATH/TO/CONFIG> <NUM_GPU>
```
We also provide a training script in run_exp.sh. You can simply change the training config to train different models.
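If you train on fewer GPUs than the default, the effective batch size changes. The sketch below assumes that the 64 in the default setting is the total batch size across all 8 GPUs (8 samples per GPU), keeps the per-GPU batch fixed, and applies the widely used linear learning-rate scaling rule; `base_lr` is a hypothetical placeholder, not a value from the ZeroI2V configs.

```python
# Sketch: relate the default setting (8 GPUs, batch size 64) to other GPU counts.
# ASSUMPTIONS: 64 is the total batch across all GPUs, the per-GPU batch stays fixed,
# and the learning rate follows the linear scaling rule. base_lr is hypothetical.

DEFAULT_GPUS = 8
DEFAULT_TOTAL_BATCH = 64


def per_gpu_batch(total_batch: int = DEFAULT_TOTAL_BATCH, num_gpus: int = DEFAULT_GPUS) -> int:
    """Samples each GPU processes per iteration."""
    if total_batch % num_gpus:
        raise ValueError("total batch size must be divisible by the number of GPUs")
    return total_batch // num_gpus


def scaled_lr(base_lr: float, num_gpus: int, samples_per_gpu: int,
              base_batch: int = DEFAULT_TOTAL_BATCH) -> float:
    """Linear scaling rule: the learning rate grows in proportion to the effective batch size."""
    return base_lr * (num_gpus * samples_per_gpu) / base_batch


samples = per_gpu_batch()  # 64 / 8 = 8 samples per GPU
base_lr = 1e-3             # hypothetical placeholder learning rate
for gpus in (8, 4, 2):
    print(gpus, "GPUs -> total batch", gpus * samples, "lr", scaled_lr(base_lr, gpus, samples))
```

The chosen GPU count is what goes in the <NUM_GPU> argument of tools/dist_train.sh. MMEngine-based configs can apply the same rule automatically through the `auto_scale_lr` setting, if the config in use defines it.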
Evaluation
The code evaluates the model automatically after training. To only evaluate a model, use the following command:
```shell
bash tools/dist_test.sh <PATH/TO/CONFIG> <CHECKPOINT_FILE> <NUM_GPU> --eval top_k_accuracy
```
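The acc@1 numbers in the tables are top-1 accuracy under multi-view testing. As an illustration only (this is not MMAction2's implementation), the sketch below averages per-view class scores for each video and then computes top-k accuracy with NumPy; the toy scores and labels exist only for the example.

```python
# Sketch: multi-view testing followed by top-k accuracy, as a plain-NumPy illustration.
# ASSUMPTION: each video is scored on several views and the view scores are averaged
# before ranking; this is not MMAction2's own implementation.
import numpy as np


def average_views(scores: np.ndarray) -> np.ndarray:
    """scores: (num_videos, num_views, num_classes) -> (num_videos, num_classes)."""
    return scores.mean(axis=1)


def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of videos whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]  # indices of the k largest scores
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy example: 2 videos, 3 views each, 4 classes.
scores = np.array([
    [[0.1, 0.7, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1], [0.3, 0.4, 0.2, 0.1]],
    [[0.6, 0.1, 0.2, 0.1], [0.1, 0.2, 0.6, 0.1], [0.2, 0.1, 0.6, 0.1]],
])
labels = np.array([1, 0])
video_scores = average_views(scores)
print(top_k_accuracy(video_scores, labels, k=1))  # 0.5
print(top_k_accuracy(video_scores, labels, k=2))  # 1.0
```

In the toy data, the second video scores highest on class 0 in one view, but averaging over all three views ranks class 2 first, so it only counts as correct under top-2.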
Reparameterize the linear adapter
Please refer to tools/weight_reparam.py.
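The reason a linear adapter can be removed after training is that a composition of linear maps is itself a linear map. The sketch below assumes an adapter of the form h -> h + (h D + b_d) U + b_u with no nonlinearity, placed right after a linear layer y = x W + b; folding it in gives W' = W(I + DU) and b' = b(I + DU) + b_d U + b_u. It shows the general structural-reparameterization idea and is not the code in tools/weight_reparam.py, whose adapter placement may differ; D, U, b_d, b_u and the bottleneck width r are hypothetical names.

```python
# Sketch: fold a purely linear adapter into the preceding linear layer.
# ASSUMPTION: the adapter is h -> h + (h @ D + b_d) @ U + b_u with no nonlinearity,
# applied right after a linear layer y = x @ W + b (row-vector convention).
# Not the code in tools/weight_reparam.py; names here are hypothetical.
import numpy as np


def merge_linear_adapter(W, b, D, b_d, U, b_u):
    """Return (W_merged, b_merged) so that x @ W_merged + b_merged equals adapter(linear(x))."""
    M = np.eye(W.shape[1]) + D @ U  # (out, out): identity skip path + adapter path
    W_merged = W @ M
    b_merged = b @ M + b_d @ U + b_u
    return W_merged, b_merged


rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 4  # r: hypothetical adapter bottleneck width
W, b = rng.normal(size=(d_in, d_out)), rng.normal(size=d_out)
D, b_d = rng.normal(size=(d_out, r)), rng.normal(size=r)
U, b_u = rng.normal(size=(r, d_out)), rng.normal(size=d_out)

x = rng.normal(size=(5, d_in))
h = x @ W + b
with_adapter = h + (h @ D + b_d) @ U + b_u  # training-time computation
W_m, b_m = merge_linear_adapter(W, b, D, b_d, U, b_u)
merged = x @ W_m + b_m                      # inference-time computation

print(np.allclose(with_adapter, merged))    # True
print(W_m.shape == W.shape)                 # True: no extra parameters at inference
```

Because the merged layer has exactly the shape of the original one, inference cost and parameter count are unchanged, which is the sense in which the tables report 0 new parameters.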
Test speed and throughput
Please refer to tools/test_speed.py and tools/test_throughput.py.
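For reference, a generic way to time a model is to run a few warm-up iterations and then average over repeated calls. The sketch below is not the logic of tools/test_speed.py or tools/test_throughput.py; `fake_forward` is a hypothetical stand-in for a model forward pass, and on a GPU you would also synchronize the device before reading the clock.

```python
# Sketch: measure latency and throughput of any callable with warm-up iterations.
# Generic timing pattern, not the logic of tools/test_speed.py or tools/test_throughput.py.
# For GPU models, also call torch.cuda.synchronize() before reading the clock.
import time
from typing import Callable


def benchmark(fn: Callable[[], None], batch_size: int,
              warmup: int = 5, iters: int = 20) -> dict:
    """Return mean latency per batch (seconds) and throughput (samples/second)."""
    for _ in range(warmup):  # exclude one-off setup costs from the timing
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return {
        "latency_s": elapsed / iters,
        "throughput": batch_size * iters / elapsed,
    }


def fake_forward():
    """Hypothetical workload standing in for a forward pass on a batch of 8 clips."""
    sum(i * i for i in range(10_000))


stats = benchmark(fake_forward, batch_size=8)
print(f"{stats['latency_s'] * 1e3:.2f} ms/batch, {stats['throughput']:.1f} clips/s")
```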
If you find our work useful in your research, please cite:
```bibtex
@article{li2023zeroi2v,
  title={ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video},
  author={Li, Xinhao and Zhu, Yuhan and Wang, Limin},
  journal={arXiv preprint arXiv:2310.01324},
  year={2023}
}
```
Owner
- Name: white_windmills
- Login: leexinhao
- Kind: user
- Location: Shang Hai
- Company: SenseTime
- Repositories: 2
- Profile: https://github.com/leexinhao
I am currently working as an intern at SenseTime. My research interests mainly lie in effective/efficient/novel methods in image/video understanding.
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMAction2 Contributors"
title: "OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark"
date-released: 2020-07-21
url: "https://github.com/open-mmlab/mmaction2"
license: Apache-2.0
```
GitHub Events
Total
- Issues event: 2
- Watch event: 3
- Issue comment event: 3
Last Year
- Issues event: 2
- Watch event: 3
- Issue comment event: 3
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: 4 days
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: 4 days
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- TJQdoIt9527 (2)
- kie4280 (1)
- Ziwei-Zheng (1)
Pull Request Authors
- TJQdoIt9527 (1)