emiff

EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

https://github.com/bosszhe/emiff

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Basic Info
  • Host: GitHub
  • Owner: Bosszhe
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 8.34 MB
Statistics
  • Stars: 77
  • Watchers: 2
  • Forks: 10
  • Open Issues: 3
  • Releases: 0
Created almost 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Project page | Paper | VIMI |

EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection. Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang. ICRA 2024.

This repository contains the official PyTorch implementation of EMIFF/VIMI, including the training and evaluation code and the pretrained models.

Abstract

In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: 1) inherent pose errors when fusing multi-view images, caused by time asynchrony across cameras; 2) information loss during transmission, resulting from limited communication bandwidth. To address these issues, we propose a novel camera-based 3D detection framework for the VIC3D task, Enhanced Multi-scale Image Feature Fusion (EMIFF). To fully exploit holistic perspectives from both vehicles and infrastructure, we propose Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules, which enhance infrastructure and vehicle features at the scale, spatial, and channel levels to correct the pose error introduced by camera asynchrony. We also introduce a Feature Compression (FC) module with channel and spatial compression blocks for transmission efficiency. Experiments show that EMIFF achieves state-of-the-art (SOTA) results on the DAIR-V2X-C dataset, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.
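To make the transmission-efficiency argument concrete, the following is a minimal sketch of how channel and spatial compression reduce the size of a feature map sent from infrastructure to vehicle. The feature shape, the compression ratios, and the helper names `feature_bytes` and `compressed_bytes` are all hypothetical and chosen for illustration; they are not taken from the EMIFF code or paper.

```python
def feature_bytes(channels: int, height: int, width: int, bytes_per_value: int = 4) -> int:
    """Size in bytes of a dense C x H x W feature map (float32 by default)."""
    return channels * height * width * bytes_per_value


def compressed_bytes(channels: int, height: int, width: int,
                     channel_ratio: int, spatial_ratio: int,
                     bytes_per_value: int = 4) -> int:
    """Size after dividing the channel count by channel_ratio and
    each spatial side (height and width) by spatial_ratio."""
    return feature_bytes(
        channels // channel_ratio,
        height // spatial_ratio,
        width // spatial_ratio,
        bytes_per_value,
    )


# Hypothetical shape and ratios, used only to illustrate the arithmetic.
raw = feature_bytes(256, 64, 64)
small = compressed_bytes(256, 64, 64, channel_ratio=4, spatial_ratio=2)
reduction = raw / small

print(raw, small, reduction)  # 4194304 262144 16.0
```

Under these assumed numbers the payload shrinks by the channel ratio times the square of the spatial ratio (4 x 2 x 2 = 16). The sketch only counts bytes; it says nothing about how much detection accuracy is retained after compression.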

Methods

Architecture

Get Started

Benchmark and Model Zoo

Modality: Image

| Fusion | Method | Dataset | AP-3D (IoU=0.5) | AP-BEV (IoU=0.5) | Config | Download |
| :-----: | :--------: | :-------: | :----: | :----: | :----: | :-----: |
| Only-Veh | ImVoxelNet | VIC-Sync | 7.29 | 8.85 | config | \ |
| Only-Inf | ImVoxelNet | VIC-Sync | 8.66 | 14.41 | config | \ |
| Late-Fusion | ImVoxelNet | VIC-Sync | 11.08 | 14.76 | \ | \ |
| Early-Fusion | BEVFormer_S | VIC-Sync | 8.80 | 13.45 | config | model/log |
| Early-Fusion | ImVoxelNet | VIC-Sync | 12.72 | 18.17 | config | model/log |
| Intermediate-Fusion | EMIFF | VIC-Sync | 15.61 | 21.44 | config | model/log |

We evaluate the Only-Veh, Only-Inf, and Late-Fusion models following OpenDAIRV2X.
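As a quick reading aid for the benchmark table, the sketch below computes EMIFF's absolute gains in AP-3D and AP-BEV over each baseline. The numbers are copied from the table; the dictionary layout and the helper name `gains_over` are illustrative choices, not part of the repository.

```python
# (AP-3D, AP-BEV) at IoU=0.5 on VIC-Sync, copied from the benchmark table.
results = {
    "Only-Veh ImVoxelNet": (7.29, 8.85),
    "Only-Inf ImVoxelNet": (8.66, 14.41),
    "Late-Fusion ImVoxelNet": (11.08, 14.76),
    "Early-Fusion BEVFormer_S": (8.80, 13.45),
    "Early-Fusion ImVoxelNet": (12.72, 18.17),
    "Intermediate-Fusion EMIFF": (15.61, 21.44),
}


def gains_over(baseline: str, target: str = "Intermediate-Fusion EMIFF") -> tuple:
    """Absolute (AP-3D, AP-BEV) difference, target minus baseline, rounded to 2 decimals."""
    base_3d, base_bev = results[baseline]
    tgt_3d, tgt_bev = results[target]
    return round(tgt_3d - base_3d, 2), round(tgt_bev - base_bev, 2)


for name in results:
    if name != "Intermediate-Fusion EMIFF":
        print(f"{name}: {gains_over(name)}")
```

For example, against the strongest baseline in the table (Early-Fusion ImVoxelNet), the gain is 2.89 points in AP-3D and 3.27 points in AP-BEV.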

Acknowledgement

This project would not be possible without the following codebases:

* OpenDAIRV2X
* MMDetection3D

Citation

If you find our work useful in your research, please consider citing:

```
@misc{wang2023vimi,
  title={VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection},
  author={Zhe Wang and Siqi Fan and Xiaoliang Huo and Tongda Xu and Yan Wang and Jingjing Liu and Yilun Chen and Ya-Qin Zhang},
  year={2023},
  eprint={2303.10975},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@inproceedings{wang2024emiff,
  title={EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection},
  author={Zhe Wang and Siqi Fan and Xiaoliang Huo and Tongda Xu and Yan Wang and Jingjing Liu and Yilun Chen and Ya-Qin Zhang},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2024}
}
```

Owner

  • Login: Bosszhe
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection3D Contributors"
title: "OpenMMLab's Next-generation Platform for General 3D Object Detection"
date-released: 2020-07-23
url: "https://github.com/open-mmlab/mmdetection3d"
license: Apache-2.0

GitHub Events

Total
  • Issues event: 2
  • Watch event: 12
  • Issue comment event: 6
  • Fork event: 3
Last Year
  • Issues event: 2
  • Watch event: 12
  • Issue comment event: 6
  • Fork event: 3

Dependencies

environment.yml pypi
  • absl-py ==1.4.0
  • addict ==2.4.0
  • ansi2html ==1.9.1
  • anyio ==3.6.2
  • appdirs ==1.4.4
  • argon2-cffi ==21.3.0
  • argon2-cffi-bindings ==21.2.0
  • attrs ==23.1.0
  • backcall ==0.2.0
  • beautifulsoup4 ==4.12.2
  • black ==23.3.0
  • bleach ==6.0.0
  • cachetools ==5.3.0
  • ccimport ==0.4.2
  • cffi ==1.15.1
  • charset-normalizer ==3.1.0
  • click ==8.1.3
  • colorama ==0.4.6
  • configargparse ==1.7
  • cumm ==0.4.11
  • cycler ==0.11.0
  • dash ==2.14.2
  • dash-core-components ==2.0.0
  • dash-html-components ==2.0.0
  • dash-table ==5.0.0
  • debugpy ==1.6.7
  • decorator ==5.1.1
  • defusedxml ==0.7.1
  • descartes ==1.1.0
  • docker-pycreds ==0.4.0
  • entrypoints ==0.4
  • exceptiongroup ==1.1.1
  • fastjsonschema ==2.16.3
  • fire ==0.5.0
  • flake8 ==3.9.2
  • flask ==2.2.5
  • fonttools ==4.38.0
  • fvcore ==0.1.5.post20221221
  • gitdb ==4.0.10
  • gitpython ==3.1.31
  • google-auth ==2.17.3
  • google-auth-oauthlib ==0.4.6
  • grpcio ==1.54.0
  • idna ==3.4
  • imageio ==2.28.1
  • importlib-metadata ==6.6.0
  • importlib-resources ==5.12.0
  • iniconfig ==2.0.0
  • iopath ==0.1.10
  • ipykernel ==6.16.2
  • ipython ==7.34.0
  • ipython-genutils ==0.2.0
  • ipywidgets ==8.0.6
  • itsdangerous ==2.1.2
  • jedi ==0.18.2
  • jinja2 ==3.1.2
  • joblib ==1.2.0
  • jsonschema ==4.17.3
  • jupyter ==1.0.0
  • jupyter-client ==7.4.9
  • jupyter-console ==6.6.3
  • jupyter-core ==4.12.0
  • jupyter-server ==1.24.0
  • jupyterlab-pygments ==0.2.2
  • jupyterlab-widgets ==3.0.7
  • kiwisolver ==1.4.4
  • lark ==1.1.8
  • llvmlite ==0.36.0
  • lyft-dataset-sdk ==0.0.8
  • markdown ==3.4.3
  • markdown-it-py ==2.2.0
  • markupsafe ==2.1.2
  • matplotlib ==3.5.2
  • matplotlib-inline ==0.1.6
  • mccabe ==0.6.1
  • mdurl ==0.1.2
  • mistune ==2.0.5
  • mmcls ==0.25.0
  • mmcv-full ==1.6.2
  • mmdet ==2.25.2
  • mmengine ==0.7.3
  • mmsegmentation ==0.29.0
  • model-index ==0.1.11
  • mypy-extensions ==1.0.0
  • nbclassic ==1.0.0
  • nbclient ==0.7.4
  • nbconvert ==7.3.1
  • nbformat ==5.7.0
  • nest-asyncio ==1.5.6
  • networkx ==2.2
  • ninja ==1.11.1.1
  • notebook ==6.5.4
  • notebook-shim ==0.2.3
  • numba ==0.53.0
  • numpy ==1.21.6
  • nuscenes-devkit ==1.1.10
  • oauthlib ==3.2.2
  • open3d ==0.17.0
  • opencv-python ==4.7.0.72
  • openmim ==0.3.7
  • ordered-set ==4.1.0
  • packaging ==23.1
  • pandas ==1.3.5
  • pandocfilters ==1.5.0
  • parso ==0.8.3
  • pathspec ==0.11.1
  • pathtools ==0.1.2
  • pccm ==0.4.11
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • pillow ==9.5.0
  • pkgutil-resolve-name ==1.3.10
  • platformdirs ==3.5.0
  • plotly ==5.14.1
  • pluggy ==1.0.0
  • plyfile ==0.9
  • portalocker ==2.7.0
  • prettytable ==3.7.0
  • prometheus-client ==0.16.0
  • prompt-toolkit ==3.0.38
  • protobuf ==3.9.2
  • psutil ==5.9.5
  • ptyprocess ==0.7.0
  • pyasn1 ==0.5.0
  • pyasn1-modules ==0.3.0
  • pybind11 ==2.11.1
  • pycocotools ==2.0.6
  • pycodestyle ==2.7.0
  • pycparser ==2.21
  • pyflakes ==2.3.1
  • pygments ==2.15.1
  • pyparsing ==3.0.9
  • pyquaternion ==0.9.9
  • pyrsistent ==0.19.3
  • pytest ==7.3.1
  • python-dateutil ==2.8.2
  • pytz ==2023.3
  • pywavelets ==1.3.0
  • pyyaml ==6.0
  • pyzmq ==25.0.2
  • qtconsole ==5.4.3
  • qtpy ==2.3.1
  • requests ==2.30.0
  • requests-oauthlib ==1.3.1
  • retrying ==1.3.4
  • rich ==13.3.5
  • rsa ==4.9
  • scikit-image ==0.19.3
  • scikit-learn ==1.0.2
  • scipy ==1.7.3
  • send2trash ==1.8.2
  • sentry-sdk ==1.22.2
  • setproctitle ==1.3.2
  • setuptools ==59.5.0
  • shapely ==1.8.5
  • six ==1.16.0
  • smmap ==5.0.0
  • sniffio ==1.3.0
  • soupsieve ==2.4.1
  • spconv ==2.3.6
  • tabulate ==0.9.0
  • tenacity ==8.2.2
  • tensorboard ==2.11.2
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • termcolor ==2.3.0
  • terminado ==0.17.1
  • terminaltables ==3.1.10
  • threadpoolctl ==3.1.0
  • tifffile ==2021.11.2
  • tinycss2 ==1.2.1
  • tomli ==2.0.1
  • torch ==1.9.1
  • torch-efficient-distloss ==0.1.3
  • torch-scatter ==2.1.1
  • torchaudio ==0.9.1
  • torchvision ==0.10.1
  • tornado ==6.2
  • tqdm ==4.65.0
  • traitlets ==5.9.0
  • trimesh ==2.35.39
  • typed-ast ==1.5.4
  • typing-extensions ==4.5.0
  • urllib3 ==1.26.15
  • wandb ==0.15.2
  • wcwidth ==0.2.6
  • webencodings ==0.5.1
  • websocket-client ==1.5.1
  • werkzeug ==2.2.3
  • widgetsnbextension ==4.0.7
  • yacs ==0.1.8
  • yapf ==0.33.0
  • zipp ==3.15.0
requirements/build.txt pypi
requirements/docs.txt pypi
  • docutils ==0.16.0
  • m2r *
  • mistune ==0.8.4
  • myst-parser *
  • sphinx ==4.0.2
  • sphinx-copybutton *
  • sphinx_markdown_tables *
requirements/mminstall.txt pypi
  • mmcv-full >=1.4.8,<=1.6.0
  • mmdet >=2.24.0,<=3.0.0
  • mmsegmentation >=0.20.0,<=1.0.0
requirements/optional.txt pypi
  • open3d *
  • spconv *
  • waymo-open-dataset-tf-2-1-0 ==1.2.0
requirements/readthedocs.txt pypi
  • mmcv >=1.4.8
  • mmdet >=2.24.0
  • mmsegmentation >=0.20.1
  • torch *
  • torchvision *
requirements/runtime.txt pypi
  • lyft_dataset_sdk *
  • networkx >=2.2,<2.3
  • numba ==0.53.0
  • numpy *
  • nuscenes-devkit *
  • plyfile *
  • scikit-image *
  • tensorboard *
  • trimesh >=2.35.39,<2.35.40
requirements/tests.txt pypi
  • asynctest * test
  • codecov * test
  • flake8 * test
  • interrogate * test
  • isort * test
  • kwarray * test
  • pytest * test
  • pytest-cov * test
  • pytest-runner * test
  • ubelt * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements.txt pypi
setup.py pypi
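
The listing above pins exact versions in environment.yml while requirements/mminstall.txt gives allowed ranges for the same OpenMMLab packages. The sketch below checks whether each pin falls inside its range. The helper names `parse` and `satisfies` are hypothetical, and the comparison is a simplified numeric one that only handles plain dotted versions (a real tool would use a full version-specifier parser).

```python
def parse(version: str) -> tuple:
    """Turn '1.6.2' into (1, 6, 2); only handles plain numeric versions."""
    return tuple(int(part) for part in version.split("."))


def satisfies(pinned: str, lower: str, upper: str) -> bool:
    """True if lower <= pinned <= upper under numeric tuple comparison."""
    return parse(lower) <= parse(pinned) <= parse(upper)


# Exact pins from environment.yml, as listed above.
pins = {"mmcv-full": "1.6.2", "mmdet": "2.25.2", "mmsegmentation": "0.29.0"}

# Inclusive (lower, upper) bounds from requirements/mminstall.txt, as listed above.
ranges = {
    "mmcv-full": ("1.4.8", "1.6.0"),
    "mmdet": ("2.24.0", "3.0.0"),
    "mmsegmentation": ("0.20.0", "1.0.0"),
}

report = {name: satisfies(pins[name], *ranges[name]) for name in pins}
print(report)  # {'mmcv-full': False, 'mmdet': True, 'mmsegmentation': True}
```

Under this simplified check, the mmcv-full pin (1.6.2) lies above the upper bound in requirements/mminstall.txt (1.6.0), which suggests the two files may have drifted apart. This is an observation from the listing only, not a statement about which version the authors intend users to install.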