emiff
EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.1%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 77
- Watchers: 2
- Forks: 10
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
Project page | Paper | VIMI
EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang. ICRA 2024.
This repository contains the official PyTorch implementation of the training and evaluation code, along with the pretrained models, for EMIFF/VIMI.
Abstract
In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: (1) inherent pose errors when fusing multi-view images, caused by time asynchrony across cameras; and (2) information loss in the transmission process resulting from limited communication bandwidth. To address these issues, we propose a novel camera-based 3D detection framework for the VIC3D task, Enhanced Multi-scale Image Feature Fusion (EMIFF). To fully exploit the holistic perspectives of both vehicles and infrastructure, we propose Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules, which enhance infrastructure and vehicle features at the scale, spatial, and channel levels to correct the pose error introduced by camera asynchrony. We also introduce a Feature Compression (FC) module with channel and spatial compression blocks for transmission efficiency. Experiments show that EMIFF achieves state-of-the-art (SOTA) performance on the DAIR-V2X-C dataset, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.
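To give a feel for what attention "at the scale level" means, the snippet below shows a generic scale-level cross-attention step: a vehicle-side query scores one infrastructure feature per scale, and the softmax-normalized scores weight the fusion. This is a minimal sketch in plain Python, not the MCA module from the paper or this repository. The function names, the use of a single vector per scale, and reusing the same vectors as both keys and values are all simplifying assumptions made for illustration.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scale_cross_attention(
    query: List[float], scale_feats: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Fuse one feature vector per scale, weighted by similarity to the query.

    query: hypothetical vehicle-side feature vector of length d.
    scale_feats: hypothetical infrastructure feature vectors, one per scale,
    each of length d; used as both keys and values in this simplified sketch.
    Returns (per-scale attention weights, fused vector of length d).
    """
    d = len(query)
    # Scaled dot-product score between the query and each scale's feature.
    scores = [
        sum(q * k for q, k in zip(query, feat)) / math.sqrt(d) for feat in scale_feats
    ]
    weights = softmax(scores)
    # Weighted sum of the per-scale features (the "values").
    fused = [sum(w * feat[i] for w, feat in zip(weights, scale_feats)) for i in range(d)]
    return weights, fused


if __name__ == "__main__":
    vehicle_query = [1.0, 0.0, 0.5, 0.0]
    infra_scales = [
        [0.9, 0.1, 0.4, 0.0],  # scale 0 (hypothetical values)
        [0.0, 1.0, 0.0, 0.2],  # scale 1
        [0.5, 0.5, 0.5, 0.5],  # scale 2
    ]
    w, fused = scale_cross_attention(vehicle_query, infra_scales)
    print("scale weights:", [round(x, 3) for x in w])
    print("fused feature:", [round(x, 3) for x in fused])
```

In a real detector the query, keys, and values would be learned projections of image feature maps (for example PyTorch tensors); plain lists are used here only so the sketch runs without dependencies.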
Methods

Get Started
Benchmark and Model Zoo
Modality: Image
| Fusion | Method | Dataset | AP-3D (IoU=0.5) | AP-BEV (IoU=0.5) | Config | Download |
| :-----: | :--------: | :-------: | :----: | :----: | :----: | :-----: |
| Only-Veh | ImVoxelNet | VIC-Sync | 7.29 | 8.85 | config | - |
| Only-Inf | ImVoxelNet | VIC-Sync | 8.66 | 14.41 | config | - |
| Late-Fusion | ImVoxelNet | VIC-Sync | 11.08 | 14.76 | - | - |
| Early-Fusion | BEVFormer_S | VIC-Sync | 8.80 | 13.45 | config | model/log |
| Early-Fusion | ImVoxelNet | VIC-Sync | 12.72 | 18.17 | config | model/log |
| Intermediate-Fusion | EMIFF | VIC-Sync | 15.61 | 21.44 | config | model/log |
We evaluate the Only-Veh, Only-Inf, and Late-Fusion models following OpenDAIRV2X.
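To put the benchmark table in perspective, the margin between EMIFF and the strongest other entry can be computed directly from the reported numbers. The snippet copies the AP values from the table; the dictionary keys and the helper name are illustrative choices, and the gains are absolute differences in AP points.

```python
from typing import Dict, Tuple

# (AP-3D, AP-BEV) at IoU=0.5 on VIC-Sync, copied from the benchmark table.
RESULTS: Dict[str, Tuple[float, float]] = {
    "Only-Veh / ImVoxelNet": (7.29, 8.85),
    "Only-Inf / ImVoxelNet": (8.66, 14.41),
    "Late-Fusion / ImVoxelNet": (11.08, 14.76),
    "Early-Fusion / BEVFormer_S": (8.80, 13.45),
    "Early-Fusion / ImVoxelNet": (12.72, 18.17),
    "Intermediate-Fusion / EMIFF": (15.61, 21.44),
}


def gain_over_best_other(results: Dict[str, Tuple[float, float]], method: str, metric_idx: int) -> float:
    """Absolute AP gain of `method` over the strongest other entry for one metric."""
    best_other = max(v[metric_idx] for k, v in results.items() if k != method)
    return round(results[method][metric_idx] - best_other, 2)


if __name__ == "__main__":
    for idx, metric in enumerate(("AP-3D", "AP-BEV")):
        print(metric, gain_over_best_other(RESULTS, "Intermediate-Fusion / EMIFF", idx))
```

With these values the strongest non-EMIFF entry in both columns is Early-Fusion ImVoxelNet, so the margins come out to 2.89 AP-3D and 3.27 AP-BEV.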
Acknowledgement
This project would not be possible without the following codebases:
- OpenDAIRV2X
- MMDetection3D
Citation
If you find our work useful in your research, please consider citing:
```
@misc{wang2023vimi,
  title={VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection},
  author={Zhe Wang and Siqi Fan and Xiaoliang Huo and Tongda Xu and Yan Wang and Jingjing Liu and Yilun Chen and Ya-Qin Zhang},
  year={2023},
  eprint={2303.10975},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@inproceedings{wang2024emiff,
  title={EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection},
  author={Zhe Wang and Siqi Fan and Xiaoliang Huo and Tongda Xu and Yan Wang and Jingjing Liu and Yilun Chen and Ya-Qin Zhang},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2024}
}
```
Owner
- Login: Bosszhe
- Kind: user
- Repositories: 1
- Profile: https://github.com/Bosszhe
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection3D Contributors"
title: "OpenMMLab's Next-generation Platform for General 3D Object Detection"
date-released: 2020-07-23
url: "https://github.com/open-mmlab/mmdetection3d"
license: Apache-2.0
GitHub Events
Total
- Issues event: 2
- Watch event: 12
- Issue comment event: 6
- Fork event: 3
Last Year
- Issues event: 2
- Watch event: 12
- Issue comment event: 6
- Fork event: 3
Dependencies
- absl-py ==1.4.0
- addict ==2.4.0
- ansi2html ==1.9.1
- anyio ==3.6.2
- appdirs ==1.4.4
- argon2-cffi ==21.3.0
- argon2-cffi-bindings ==21.2.0
- attrs ==23.1.0
- backcall ==0.2.0
- beautifulsoup4 ==4.12.2
- black ==23.3.0
- bleach ==6.0.0
- cachetools ==5.3.0
- ccimport ==0.4.2
- cffi ==1.15.1
- charset-normalizer ==3.1.0
- click ==8.1.3
- colorama ==0.4.6
- configargparse ==1.7
- cumm ==0.4.11
- cycler ==0.11.0
- dash ==2.14.2
- dash-core-components ==2.0.0
- dash-html-components ==2.0.0
- dash-table ==5.0.0
- debugpy ==1.6.7
- decorator ==5.1.1
- defusedxml ==0.7.1
- descartes ==1.1.0
- docker-pycreds ==0.4.0
- entrypoints ==0.4
- exceptiongroup ==1.1.1
- fastjsonschema ==2.16.3
- fire ==0.5.0
- flake8 ==3.9.2
- flask ==2.2.5
- fonttools ==4.38.0
- fvcore ==0.1.5.post20221221
- gitdb ==4.0.10
- gitpython ==3.1.31
- google-auth ==2.17.3
- google-auth-oauthlib ==0.4.6
- grpcio ==1.54.0
- idna ==3.4
- imageio ==2.28.1
- importlib-metadata ==6.6.0
- importlib-resources ==5.12.0
- iniconfig ==2.0.0
- iopath ==0.1.10
- ipykernel ==6.16.2
- ipython ==7.34.0
- ipython-genutils ==0.2.0
- ipywidgets ==8.0.6
- itsdangerous ==2.1.2
- jedi ==0.18.2
- jinja2 ==3.1.2
- joblib ==1.2.0
- jsonschema ==4.17.3
- jupyter ==1.0.0
- jupyter-client ==7.4.9
- jupyter-console ==6.6.3
- jupyter-core ==4.12.0
- jupyter-server ==1.24.0
- jupyterlab-pygments ==0.2.2
- jupyterlab-widgets ==3.0.7
- kiwisolver ==1.4.4
- lark ==1.1.8
- llvmlite ==0.36.0
- lyft-dataset-sdk ==0.0.8
- markdown ==3.4.3
- markdown-it-py ==2.2.0
- markupsafe ==2.1.2
- matplotlib ==3.5.2
- matplotlib-inline ==0.1.6
- mccabe ==0.6.1
- mdurl ==0.1.2
- mistune ==2.0.5
- mmcls ==0.25.0
- mmcv-full ==1.6.2
- mmdet ==2.25.2
- mmengine ==0.7.3
- mmsegmentation ==0.29.0
- model-index ==0.1.11
- mypy-extensions ==1.0.0
- nbclassic ==1.0.0
- nbclient ==0.7.4
- nbconvert ==7.3.1
- nbformat ==5.7.0
- nest-asyncio ==1.5.6
- networkx ==2.2
- ninja ==1.11.1.1
- notebook ==6.5.4
- notebook-shim ==0.2.3
- numba ==0.53.0
- numpy ==1.21.6
- nuscenes-devkit ==1.1.10
- oauthlib ==3.2.2
- open3d ==0.17.0
- opencv-python ==4.7.0.72
- openmim ==0.3.7
- ordered-set ==4.1.0
- packaging ==23.1
- pandas ==1.3.5
- pandocfilters ==1.5.0
- parso ==0.8.3
- pathspec ==0.11.1
- pathtools ==0.1.2
- pccm ==0.4.11
- pexpect ==4.8.0
- pickleshare ==0.7.5
- pillow ==9.5.0
- pkgutil-resolve-name ==1.3.10
- platformdirs ==3.5.0
- plotly ==5.14.1
- pluggy ==1.0.0
- plyfile ==0.9
- portalocker ==2.7.0
- prettytable ==3.7.0
- prometheus-client ==0.16.0
- prompt-toolkit ==3.0.38
- protobuf ==3.9.2
- psutil ==5.9.5
- ptyprocess ==0.7.0
- pyasn1 ==0.5.0
- pyasn1-modules ==0.3.0
- pybind11 ==2.11.1
- pycocotools ==2.0.6
- pycodestyle ==2.7.0
- pycparser ==2.21
- pyflakes ==2.3.1
- pygments ==2.15.1
- pyparsing ==3.0.9
- pyquaternion ==0.9.9
- pyrsistent ==0.19.3
- pytest ==7.3.1
- python-dateutil ==2.8.2
- pytz ==2023.3
- pywavelets ==1.3.0
- pyyaml ==6.0
- pyzmq ==25.0.2
- qtconsole ==5.4.3
- qtpy ==2.3.1
- requests ==2.30.0
- requests-oauthlib ==1.3.1
- retrying ==1.3.4
- rich ==13.3.5
- rsa ==4.9
- scikit-image ==0.19.3
- scikit-learn ==1.0.2
- scipy ==1.7.3
- send2trash ==1.8.2
- sentry-sdk ==1.22.2
- setproctitle ==1.3.2
- setuptools ==59.5.0
- shapely ==1.8.5
- six ==1.16.0
- smmap ==5.0.0
- sniffio ==1.3.0
- soupsieve ==2.4.1
- spconv ==2.3.6
- tabulate ==0.9.0
- tenacity ==8.2.2
- tensorboard ==2.11.2
- tensorboard-data-server ==0.6.1
- tensorboard-plugin-wit ==1.8.1
- termcolor ==2.3.0
- terminado ==0.17.1
- terminaltables ==3.1.10
- threadpoolctl ==3.1.0
- tifffile ==2021.11.2
- tinycss2 ==1.2.1
- tomli ==2.0.1
- torch ==1.9.1
- torch-efficient-distloss ==0.1.3
- torch-scatter ==2.1.1
- torchaudio ==0.9.1
- torchvision ==0.10.1
- tornado ==6.2
- tqdm ==4.65.0
- traitlets ==5.9.0
- trimesh ==2.35.39
- typed-ast ==1.5.4
- typing-extensions ==4.5.0
- urllib3 ==1.26.15
- wandb ==0.15.2
- wcwidth ==0.2.6
- webencodings ==0.5.1
- websocket-client ==1.5.1
- werkzeug ==2.2.3
- widgetsnbextension ==4.0.7
- yacs ==0.1.8
- yapf ==0.33.0
- zipp ==3.15.0
- docutils ==0.16.0
- m2r *
- mistune ==0.8.4
- myst-parser *
- sphinx ==4.0.2
- sphinx-copybutton *
- sphinx_markdown_tables *
- mmcv-full >=1.4.8,<=1.6.0
- mmdet >=2.24.0,<=3.0.0
- mmsegmentation >=0.20.0,<=1.0.0
- open3d *
- spconv *
- waymo-open-dataset-tf-2-1-0 ==1.2.0
- mmcv >=1.4.8
- mmdet >=2.24.0
- mmsegmentation >=0.20.1
- torch *
- torchvision *
- lyft_dataset_sdk *
- networkx >=2.2,<2.3
- numba ==0.53.0
- numpy *
- nuscenes-devkit *
- plyfile *
- scikit-image *
- tensorboard *
- trimesh >=2.35.39,<2.35.40
- asynctest * test
- codecov * test
- flake8 * test
- interrogate * test
- isort * test
- kwarray * test
- pytest * test
- pytest-cov * test
- pytest-runner * test
- ubelt * test
- xdoctest >=0.10.0 test
- yapf * test
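The dependency list appears to merge entries from more than one requirements file (for example, mistune is listed both as ==2.0.5 and ==0.8.4), so some packages show an exact pin alongside a looser range. A quick check of each pin against its listed range can flag mismatches. The sketch below is a simplified, stdlib-only checker written for illustration: `parse_version`, `satisfies`, and `PAIRS` are hypothetical names, and the parser only handles plain dotted numeric versions (not suffixes such as `.post20221221`).

```python
from typing import Callable, Dict, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse a plain dotted numeric version such as '1.6.2' into a tuple of ints.

    Simplification: suffixes like '.post20221221' or 'rc1' are not supported.
    """
    return tuple(int(part) for part in v.split("."))


def _pad(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Right-pad the shorter tuple with zeros so '1.6' compares equal to '1.6.0'."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


# Insertion order puts two-character operators first, so '>=' is never read as '>'.
OPS: Dict[str, Callable[[tuple, tuple], bool]] = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every clause of a spec like '>=1.4.8,<=1.6.0'."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for op, cmp in OPS.items():
            if clause.startswith(op):
                a, b = _pad(v, parse_version(clause[len(op):].strip()))
                if not cmp(a, b):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# (pinned version, range) pairs taken from the dependency list above.
PAIRS = {
    "mmcv-full": ("1.6.2", ">=1.4.8,<=1.6.0"),
    "mmdet": ("2.25.2", ">=2.24.0,<=3.0.0"),
    "mmsegmentation": ("0.29.0", ">=0.20.0,<=1.0.0"),
    "networkx": ("2.2", ">=2.2,<2.3"),
    "trimesh": ("2.35.39", ">=2.35.39,<2.35.40"),
}

if __name__ == "__main__":
    for name, (pinned, spec) in PAIRS.items():
        print(f"{name} {pinned} in '{spec}': {satisfies(pinned, spec)}")
```

Run as a script, only mmcv-full is reported as outside its range (the pinned 1.6.2 exceeds the `<=1.6.0` upper bound); the other pairs satisfy their constraints. The listing alone does not say which file is authoritative, so treat this as a prompt to check rather than a confirmed install failure. For anything beyond a quick check, `packaging.specifiers.SpecifierSet` (the `packaging` package is already in the list) implements the full PEP 440 specifier grammar.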