diffbev

Official PyTorch implementation for a conditional diffusion probability model in BEV perception

https://github.com/jiayuzou2020/diffbev

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.6%) to scientific vocabulary

Keywords

3d-detection bev-perception diffusion-models semantic-segmentation
Last synced: 4 months ago

Repository

Basic Info
Statistics
  • Stars: 245
  • Watchers: 6
  • Forks: 12
  • Open Issues: 13
  • Releases: 0
Topics
3d-detection bev-perception diffusion-models semantic-segmentation
Created almost 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception

Conditional diffusion probability model for BEV perception

arXiv

https://arxiv.org/abs/2303.08333

Abstract

BEV perception is of great importance in the field of autonomous driving, serving as the cornerstone of planning, control, and motion prediction. The quality of the BEV feature strongly affects the performance of BEV perception. However, because of noise in camera parameters and LiDAR scans, the BEV representation we obtain is usually corrupted by harmful noise. Diffusion models are naturally able to denoise noisy samples toward the ideal data, which motivates us to use a diffusion model to obtain a better BEV representation. In this work, we propose an end-to-end framework, named DiffBEV, that exploits the potential of diffusion models to generate a more comprehensive BEV representation. To the best of our knowledge, we are the first to apply diffusion models to BEV perception. In practice, we design three types of conditions to guide the training of the diffusion model, which denoises the coarse samples and refines the semantic feature progressively. In addition, a cross-attention module fuses the context of the BEV feature with the semantic content of the conditional diffusion model. DiffBEV achieves 25.9% mIoU on the nuScenes dataset, 6.2% higher than the best-performing existing approach. Quantitative and qualitative results on multiple benchmarks demonstrate the effectiveness of DiffBEV in BEV semantic segmentation and 3D object detection tasks.

(Figure: DiffBEV framework.)
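As background for the abstract, the two generic building blocks it mentions can be sketched in NumPy: the standard DDPM forward (noising) process, x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, and single-head cross-attention in which BEV cells act as queries over features from the diffusion branch. This is a minimal illustration under our own assumptions (a linear beta schedule, random projection weights, flattened feature maps), not DiffBEV's actual module; the denoising network and the three condition types are omitted, and all function names are hypothetical.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (the default used in the original DDPM paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(bev, cond, w_q, w_k, w_v):
    """Single-head cross-attention: BEV cells (queries) attend to diffusion features.

    bev:  (N, C) flattened BEV feature map, N = H * W cells
    cond: (M, C) flattened features from the conditional diffusion branch
    """
    q, k, v = bev @ w_q, cond @ w_k, cond @ w_v      # (N, D), (M, D), (M, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N, M); each row sums to 1
    return attn @ v, attn                            # fused features (N, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C, D, T = 4, 4, 8, 8, 1000
    alpha_bar = np.cumprod(1.0 - linear_beta_schedule(T))
    bev = rng.standard_normal((H * W, C))
    # Stand-in for the denoiser's output: here simply a noised copy of the BEV feature.
    cond, _ = q_sample(bev, t=500, alpha_bar=alpha_bar, rng=rng)
    w_q, w_k, w_v = (0.1 * rng.standard_normal((C, D)) for _ in range(3))
    fused, attn = cross_attention(bev, cond, w_q, w_k, w_v)
    print(fused.shape)  # (16, 8)
```

In a real model the `cond` input would come from the trained denoising network rather than from `q_sample`, and the projections would be learned layers.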

Dataset

Download Datasets From Official Websites

Extensive experiments are conducted on the nuScenes, [KITTI Raw](https://www.cvlibs.net/datasets/kitti/rawdata.php), [KITTI Odometry](https://www.cvlibs.net/datasets/kitti/evalodometry.php), and [KITTI 3D Object](https://www.cvlibs.net/datasets/kitti/eval3dobject.php) benchmarks.

Prepare Depth Maps

Follow the script to generate depth maps for the KITTI datasets. The depth maps of the KITTI datasets are also available on Google Drive and Baidu Net Disk. We also provide a script to generate the depth maps for the nuScenes dataset. Replace the dataset path in the script according to your dataset directory.
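The depth-map scripts themselves are linked rather than shown here. For readers unfamiliar with this step, a common way to build a sparse depth map is to project LiDAR points into the image with the camera intrinsics and keep the nearest point per pixel. The sketch below assumes the points are already transformed into the camera frame (x right, y down, z forward); `lidar_to_depth_map` is a hypothetical name and this is not the repository's script.

```python
import numpy as np


def lidar_to_depth_map(points_cam, K, height, width, min_depth=0.1):
    """Project camera-frame 3D points to a sparse (height, width) depth map.

    points_cam: (N, 3) array of x (right), y (down), z (forward) in metres.
    K:          (3, 3) camera intrinsic matrix.
    Pixels without any projected point are set to 0.
    """
    pts = points_cam[points_cam[:, 2] > min_depth]   # keep points in front of the camera
    uvw = pts @ K.T                                  # homogeneous pixel coordinates (N, 3)
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts[:, 2]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)                  # nearest point wins when pixels collide
    depth[np.isinf(depth)] = 0.0                     # mark empty pixels with 0
    return depth
```

`np.minimum.at` is used instead of plain fancy-index assignment because it handles repeated pixel indices explicitly.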

Dataset Processing

After downloading these datasets, generate the BEV annotations by following the instructions below.

nuScenes

Run the script make_nuscenes_labels to get the BEV annotations for the nuScenes benchmark. Please follow the linked instructions to generate the BEV annotations (ann_bev_dir) for the KITTI datasets.
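The README does not describe the label format written by the nuScenes label script. One common convention, used by PON (listed in the acknowledgements), stores the per-class BEV masks as bits of a single integer image, so that overlapping classes (for example a vehicle on drivable area) fit in one file. We have not verified that DiffBEV uses the same encoding; the helper names below are illustrative only.

```python
import numpy as np


def encode_binary_labels(masks):
    """Pack C boolean masks of shape (C, H, W) into one integer image (H, W).

    Bit c of a pixel is set when the pixel belongs to class c.
    """
    bits = np.power(2, np.arange(len(masks), dtype=np.int64))
    return (masks.astype(np.int64) * bits.reshape(-1, 1, 1)).sum(axis=0)


def decode_binary_labels(labels, num_classes):
    """Inverse of encode_binary_labels: (H, W) integers -> (C, H, W) booleans."""
    bits = np.power(2, np.arange(num_classes, dtype=np.int64))
    return (labels[None, ...] & bits.reshape(-1, 1, 1)) > 0
```

With at most 16 classes the encoded image fits in a 16-bit PNG.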

KITTI Datasets

Follow the instructions to get the BEV annotations for the KITTI Raw, KITTI Odometry, and KITTI 3D Object datasets.

The datasets are organized as follows:

```
data
├── nuscenes
│   ├── img_dir
│   │   ├── train
│   │   ├── val
│   ├── ann_bev_dir
│   │   ├── train
│   │   ├── val
│   ├── train_depth
│   ├── val_depth
│   ├── calib.json
├── kitti_processed
│   ├── kitti_raw
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── ann_bev_dir
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── train_depth
│   │   ├── val_depth
│   │   ├── calib.json
│   ├── kitti_odometry
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── ann_bev_dir
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── train_depth
│   │   ├── val_depth
│   │   ├── calib.json
│   ├── kitti_object
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── ann_bev_dir
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── train_depth
│   │   ├── val_depth
│   │   ├── calib.json
```
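Before training, it can help to confirm that the local data folder matches the layout above. The following standard-library sketch (a hypothetical helper, not part of the repository) lists every expected entry that is missing under a given data root.

```python
from pathlib import Path

# Entries expected under each dataset root, following the layout above.
EXPECTED = [
    "img_dir/train", "img_dir/val",
    "ann_bev_dir/train", "ann_bev_dir/val",
    "train_depth", "val_depth", "calib.json",
]

DATASETS = [
    "nuscenes",
    "kitti_processed/kitti_raw",
    "kitti_processed/kitti_odometry",
    "kitti_processed/kitti_object",
]


def missing_entries(data_root):
    """Return every expected path (relative to data_root) that does not exist."""
    root = Path(data_root)
    return [
        str(Path(ds) / rel)
        for ds in DATASETS
        for rel in EXPECTED
        if not (root / ds / rel).exists()
    ]


if __name__ == "__main__":
    for path in missing_entries("data"):
        print("missing:", path)
```

Usage: run it from the repository root, or pass the absolute path of your data folder to `missing_entries`.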

Prepare Calibration Files

The camera parameters of each dataset are written into the corresponding calib.json file. The calib.json file for each dataset is available on Google Drive and Baidu Net Disk.
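The README does not document the schema of calib.json. Purely for illustration, the sketch below assumes it maps each image name to a 3×3 intrinsic matrix stored as nested lists; `load_intrinsics` and `scale_intrinsics` are hypothetical helpers. The second function shows why intrinsics must be updated whenever images are resized: the focal lengths and principal point scale with the image.

```python
import json


def load_intrinsics(path):
    """Load a calib.json ASSUMED to map image names to 3x3 intrinsic matrices."""
    with open(path) as f:
        calib = json.load(f)
    for name, K in calib.items():
        if len(K) != 3 or any(len(row) != 3 for row in K):
            raise ValueError(f"{name}: expected a 3x3 matrix")
    return calib


def scale_intrinsics(K, sx, sy):
    """Return diag(sx, sy, 1) @ K, i.e. K adjusted for an image resized by (sx, sy)."""
    return [
        [K[0][0] * sx, K[0][1] * sx, K[0][2] * sx],  # fx, skew, cx
        [K[1][0] * sy, K[1][1] * sy, K[1][2] * sy],  # 0, fy, cy
        [K[2][0], K[2][1], K[2][2]],                 # homogeneous row is unchanged
    ]
```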

Please change the dataset path to your actual data directory in the [nuScenes, KITTI Raw, KITTI Odometry, and KITTI 3D Object dataset configurations](https://github.com/JiayuZou2020/DiffBEV/tree/main/configs/base/datasets). Modify the path of the pretrained model in the model configurations.
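Since the project builds on mmsegmentation, its configurations are presumably Python files in the mmseg style. The fragment below is a hypothetical excerpt showing where such paths typically live; the variable and key names (`data_root`, `img_dir`, `ann_dir`, `pretrained`) follow common mmseg conventions and may not match the actual DiffBEV files.

```python
# Hypothetical excerpt of an mmseg-style config; check the real files for exact names.
data_root = "/path/to/data/nuscenes"  # edit to your local dataset directory

data = dict(
    train=dict(data_root=data_root, img_dir="img_dir/train", ann_dir="ann_bev_dir/train"),
    val=dict(data_root=data_root, img_dir="img_dir/val", ann_dir="ann_bev_dir/val"),
)

# In a model config: point to your local pretrained weights (placeholder path).
model = dict(pretrained="/path/to/pretrained_backbone.pth")
```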

Installation

DiffBEV is tested on:
  • Python 3.7/3.8
  • CUDA 11.1
  • Torch 1.9.1
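A quick way to compare an environment against the tested versions is a small check script. The sketch below is our own helper (not shipped with the repository); the accepted mmcv-full range comes from requirements/mminstall.txt in the dependency list below.

```python
import re
import sys

# Versions stated in this README and in requirements/mminstall.txt.
TESTED_PYTHON = {(3, 7), (3, 8)}
TESTED_TORCH = (1, 9, 1)
MMCV_MIN, MMCV_MAX = (1, 3, 1), (1, 4, 0)


def parse_version(text):
    """'1.9.1+cu111' -> (1, 9, 1); a missing patch number becomes 0."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(g or 0) for g in m.groups())


def check_versions(python, torch=None, mmcv=None):
    """Return human-readable mismatches; an empty list means all match."""
    problems = []
    if tuple(python[:2]) not in TESTED_PYTHON:
        problems.append(f"Python {python[0]}.{python[1]} is untested (tested: 3.7/3.8)")
    if torch is not None and parse_version(torch) != TESTED_TORCH:
        problems.append(f"torch {torch} differs from the tested 1.9.1")
    if mmcv is not None and not (MMCV_MIN <= parse_version(mmcv) <= MMCV_MAX):
        problems.append(f"mmcv-full {mmcv} is outside >=1.3.1,<=1.4.0")
    return problems


if __name__ == "__main__":
    versions = {}
    for name in ("torch", "mmcv"):
        try:
            versions[name] = __import__(name).__version__
        except ImportError:
            versions[name] = None  # not installed yet: skip that check
    for msg in check_versions(sys.version_info, **versions) or ["environment matches"]:
        print(msg)
```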

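Please check install for installation details.

  • Create a conda environment for the project.

```shell
conda create -n diffbev python=3.7
conda activate diffbev
```

  • Install PyTorch following the official instructions. The command below pins the tested versions (torch 1.9.1, torchvision 0.10.1, CUDA 11.1), matching requirements.txt.

```shell
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.1 -c pytorch -c conda-forge
```

  • Install mmcv-full. requirements/mminstall.txt requires mmcv-full >=1.3.1,<=1.4.0, and requirements.txt pins 1.3.15.

```shell
pip install -U openmim
mim install mmcv-full==1.3.15
```

  • Clone this repository.

```shell
git clone https://github.com/JiayuZou2020/DiffBEV.git
```

  • Install and compile the required packages.

```shell
cd DiffBEV
pip install -v -e .
```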
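After installation, a short smoke test can confirm that the key packages import. The dependency metadata lists `mmsegmentation.egg-info`, so the editable install presumably provides the `mmseg` module; the `probe` helper below is our own sketch, not part of the repository.

```python
import importlib


def probe(modules=("torch", "torchvision", "mmcv", "mmseg")):
    """Try importing each module; map name -> version string or failure message."""
    report = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
            report[name] = getattr(mod, "__version__", "unknown")
        except Exception as exc:  # ImportError, or a broken compiled extension
            report[name] = f"FAILED: {exc.__class__.__name__}: {exc}"
    return report


if __name__ == "__main__":
    for name, status in probe().items():
        print(f"{name:12s} {status}")
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        pass
```

Catching `Exception` rather than only `ImportError` is deliberate: a mismatched mmcv-full build often fails with other errors when its compiled ops load.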

Visualization

(Figure: visualization results.)

Citation

If you find our work helpful for your research, please consider citing it as follows.

```bibtex
@article{zou2023diffbev,
  title={DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception},
  author={Zou, Jiayu and Zhu, Zheng and Ye, Yun and Wang, Xingang},
  journal={arXiv preprint arXiv:2303.08333},
  year={2023}
}
```

Acknowledgement

Our work is partially based on the following open-source projects: mmsegmentation, VPN, PYVA, PON, LSS. We thank them for their contributions to the BEV perception research community.

Owner

  • Login: JiayuZou2020
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMSegmentation Contributors"
title: "OpenMMLab Semantic Segmentation Toolbox and Benchmark"
date-released: 2020-07-10
url: "https://github.com/open-mmlab/mmsegmentation"
license: Apache-2.0

GitHub Events

Total
  • Issues event: 2
  • Watch event: 23
  • Fork event: 2
Last Year
  • Issues event: 2
  • Watch event: 23
  • Fork event: 2

Dependencies

docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
mmsegmentation.egg-info/requires.txt pypi
  • cityscapesscripts *
  • codecov *
  • flake8 *
  • interrogate *
  • isort ==4.3.21
  • matplotlib *
  • numpy *
  • packaging *
  • prettytable *
  • pytest *
  • xdoctest >=0.10.0
  • yapf *
requirements/docs.txt pypi
  • docutils ==0.16.0
  • myst-parser *
  • sphinx ==4.0.2
  • sphinx_copybutton *
  • sphinx_markdown_tables *
requirements/mminstall.txt pypi
  • mmcv-full >=1.3.1,<=1.4.0
requirements/optional.txt pypi
  • cityscapesscripts *
requirements/readthedocs.txt pypi
  • mmcv *
  • prettytable *
  • torch *
  • torchvision *
requirements/runtime.txt pypi
  • matplotlib *
  • numpy *
  • packaging *
  • prettytable *
requirements/tests.txt pypi
  • codecov * test
  • flake8 * test
  • interrogate * test
  • isort ==4.3.21 test
  • pytest * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements.txt pypi
  • Babel ==2.7.0
  • Bottleneck ==1.2.1
  • Cython ==0.29.13
  • Flask ==1.1.1
  • HeapDict ==1.0.1
  • Jinja2 ==2.10.3
  • KNN-CUDA ==0.2
  • Markdown ==3.4.1
  • MarkupSafe ==2.1.1
  • PIMS ==0.6.0
  • Pillow ==8.4.0
  • PyOpenGL ==3.1.0
  • PySocks ==1.7.1
  • PyTurboJPEG ==1.6.6
  • PyWavelets ==1.0.3
  • PyYAML ==5.1.2
  • Pygments ==2.4.2
  • QtAwesome ==0.6.0
  • QtPy ==1.9.0
  • SQLAlchemy ==1.3.9
  • SecretStorage ==3.1.1
  • Send2Trash ==1.5.0
  • Shapely ==1.8.0
  • SoundFile ==0.10.3.post1
  • Sphinx ==5.0.2
  • Werkzeug ==2.2.0
  • XlsxWriter ==1.2.1
  • absl-py ==1.2.0
  • addict ==2.4.0
  • alabaster ==0.7.12
  • anaconda-client ==1.7.2
  • anaconda-navigator ==1.9.7
  • anaconda-project ==0.8.3
  • antlr4-python3-runtime ==4.8
  • appdirs ==1.4.4
  • asn1crypto ==1.0.1
  • astroid ==2.3.1
  • astropy ==3.2.2
  • atomicwrites ==1.3.0
  • attrs ==19.2.0
  • audioread ==2.1.9
  • autopep8 ==1.6.0
  • backcall ==0.1.0
  • backports.functools-lru-cache ==1.5
  • backports.os ==0.1.1
  • backports.shutil-get-terminal-size ==1.0.0
  • backports.tempfile ==1.0
  • backports.weakref ==1.0.post1
  • beautifulsoup4 ==4.8.0
  • bitarray ==1.0.1
  • bkcharts ==0.2
  • black ==22.3.0
  • bleach ==3.1.0
  • blobfile ==2.0.0
  • bokeh ==1.3.4
  • boto ==2.49.0
  • cPython ==0.0.6
  • cachetools ==4.2.4
  • certifi ==2022.9.24
  • cffi ==1.12.3
  • chamfer ==2.0.0
  • chardet ==3.0.4
  • chumpy ==0.70
  • click ==8.1.1
  • cloudpickle ==1.2.2
  • clyent ==1.2.2
  • colorama ==0.4.1
  • colour ==0.1.5
  • conda ==22.11.0
  • conda-build ==3.23.2
  • conda-package-handling ==1.6.0
  • conda-verify ==3.4.2
  • contextlib2 ==0.6.0
  • coverage ==6.3.2
  • cryptography ==2.7
  • cycler ==0.10.0
  • cytoolz ==0.10.0
  • dask ==2.5.2
  • decorator ==4.4.0
  • decord ==0.6.0
  • defusedxml ==0.6.0
  • descartes ==1.1.0
  • dgl-cu111 ==0.6.1
  • dglgo ==0.0.1
  • distributed ==2.5.2
  • docutils ==0.15.2
  • easydict ==1.9
  • efficientnet-pytorch ==0.7.1
  • einops ==0.3.2
  • emd-ext ==0.0.0
  • entrypoints ==0.3
  • et-xmlfile ==1.0.1
  • fastcache ==1.1.0
  • filelock ==3.0.12
  • fire ==0.4.0
  • freetype-py ==2.2.0
  • fsspec ==0.5.2
  • future ==0.17.1
  • gevent ==1.4.0
  • glob2 ==0.7
  • gmpy2 ==2.0.8
  • google-auth ==2.9.1
  • google-auth-oauthlib ==0.4.6
  • greenlet ==0.4.15
  • grpcio ==1.47.0
  • h5py ==2.9.0
  • html5lib ==1.0.1
  • hydra-core ==1.1.0
  • idna ==2.8
  • imageio ==2.6.0
  • imageio-ffmpeg ==0.4.7
  • imagesize ==1.1.0
  • importlib-metadata ==4.12.0
  • importlib-resources ==5.4.0
  • interrogate ==1.5.0
  • ipykernel ==5.1.2
  • ipython ==7.8.0
  • ipython-genutils ==0.2.0
  • ipywidgets ==7.5.1
  • isort ==5.10.1
  • itsdangerous ==1.1.0
  • jdcal ==1.4.1
  • jedi ==0.15.1
  • jeepney ==0.4.1
  • joblib ==1.1.1
  • json-tricks ==3.15.5
  • json5 ==0.8.5
  • jsonschema ==3.0.2
  • jupyter ==1.0.0
  • jupyter-client ==5.3.3
  • jupyter-console ==6.0.0
  • jupyter-core ==4.5.0
  • jupyterlab ==1.1.4
  • jupyterlab-server ==1.0.6
  • keyring ==18.0.0
  • kiwisolver ==1.1.0
  • lap ==0.4.0
  • lazy-object-proxy ==1.4.2
  • libarchive-c ==2.8
  • librosa ==0.9.1
  • lief ==0.9.0
  • llvmlite ==0.29.0
  • locket ==0.2.0
  • lxml ==4.9.1
  • matplotlib ==3.1.1
  • mccabe ==0.6.1
  • mistune ==0.8.4
  • mkl-fft ==1.0.14
  • mkl-random ==1.1.0
  • mkl-service ==2.3.0
  • mmcv-full ==1.3.15
  • mock ==3.0.5
  • more-itertools ==7.2.0
  • motmetrics ==1.1.3
  • moviepy ==1.0.3
  • mpi4py ==3.0.3
  • mpmath ==1.1.0
  • msgpack ==0.6.1
  • multipledispatch ==0.6.0
  • munkres ==1.1.4
  • mypy-extensions ==0.4.3
  • navigator-updater ==0.2.1
  • nbconvert ==5.6.0
  • nbformat ==4.4.0
  • networkx ==2.3
  • nltk ==3.4.5
  • nose ==1.3.7
  • notebook ==6.0.1
  • numba ==0.45.1
  • numexpr ==2.7.0
  • numpy ==1.21.6
  • numpydoc ==1.5.0
  • nuscenes-devkit ==1.1.9
  • oauthlib ==3.2.0
  • olefile ==0.46
  • omegaconf ==2.1.0
  • open3d ==0.9.0.0
  • opencv-contrib-python ==4.0.0.21
  • opencv-python ==4.1.0.25
  • openpyxl ==3.0.0
  • packaging ==21.3
  • pandas ==0.25.1
  • pandocfilters ==1.4.2
  • parso ==0.5.1
  • partd ==1.0.0
  • path.py ==12.0.1
  • pathlib2 ==2.3.5
  • pathspec ==0.9.0
  • patsy ==0.5.1
  • pep8 ==1.7.1
  • pexpect ==4.7.0
  • pickleshare ==0.7.5
  • pkginfo ==1.5.0.1
  • platformdirs ==2.5.2
  • pluggy ==1.0.0
  • ply ==3.11
  • polars ==0.11.0
  • pooch ==1.6.0
  • prettytable ==2.2.1
  • proglog ==0.1.9
  • progressbar ==2.5
  • prometheus-client ==0.7.1
  • prompt-toolkit ==2.0.10
  • protobuf ==3.19.1
  • psutil ==5.6.3
  • ptyprocess ==0.6.0
  • py ==1.8.0
  • pyOpenSSL ==19.0.0
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pycocotools ==2.0.4
  • pycodestyle ==2.8.0
  • pycosat ==0.6.3
  • pycparser ==2.19
  • pycrypto ==2.6.1
  • pycryptodomex ==3.16.0
  • pycurl ==7.43.0.3
  • pydantic ==1.9.1
  • pyflakes ==2.1.1
  • pyglet ==1.5.23
  • pylint ==2.4.2
  • pymongo ==4.1.1
  • pyntcloud ==0.1.5
  • pyodbc ==4.0.27
  • pyparsing ==2.4.2
  • pyquaternion ==0.9.9
  • pyrender ==0.1.45
  • pyrsistent ==0.15.4
  • pytest ==4.4.2
  • pytest-arraydiff ==0.3
  • pytest-astropy ==0.5.0
  • pytest-doctestplus ==0.4.0
  • pytest-openfiles ==0.4.0
  • pytest-remotedata ==0.3.2
  • python-dateutil ==2.8.0
  • pytz ==2019.3
  • pyzmq ==18.1.0
  • qtconsole ==4.5.5
  • regex ==2022.4.24
  • requests ==2.22.0
  • requests-oauthlib ==1.3.1
  • resampy ==0.2.2
  • rope ==0.14.0
  • rsa ==4.9
  • ruamel-yaml ==0.15.46
  • ruamel.yaml ==0.17.21
  • ruamel.yaml.clib ==0.2.6
  • scikit-image ==0.15.0
  • scikit-learn ==0.21.3
  • scipy ==1.7.3
  • seaborn ==0.9.0
  • shutup ==0.2.0
  • simplegeneric ==0.8.1
  • singledispatch ==3.4.0.3
  • six ==1.12.0
  • sklearn ==0.0
  • slicerator ==1.1.0
  • smplx ==0.1.28
  • snowballstemmer ==2.0.0
  • some-package ==0.1
  • sortedcollections ==1.1.2
  • sortedcontainers ==2.1.0
  • soupsieve ==1.9.3
  • sphinxcontrib-applehelp ==1.0.1
  • sphinxcontrib-devhelp ==1.0.1
  • sphinxcontrib-htmlhelp ==2.0.0
  • sphinxcontrib-jsmath ==1.0.1
  • sphinxcontrib-qthelp ==1.0.2
  • sphinxcontrib-serializinghtml ==1.1.5
  • sphinxcontrib-websupport ==1.1.2
  • spyder ==3.3.6
  • spyder-kernels ==0.5.2
  • statsmodels ==0.10.1
  • sympy ==1.4
  • tables ==3.5.2
  • tabulate ==0.8.9
  • tblib ==1.4.0
  • tensorboard ==2.9.1
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • tensorboardX ==2.1
  • termcolor ==1.1.0
  • terminado ==0.8.2
  • terminaltables ==3.1.0
  • testpath ==0.4.2
  • timm ==0.3.2
  • toml ==0.10.2
  • tomli ==2.0.1
  • toolz ==0.10.0
  • torch ==1.9.1
  • torchvision ==0.10.1
  • tornado ==6.0.3
  • tqdm ==4.36.1
  • traitlets ==4.3.3
  • transforms3d ==0.4.1
  • trimesh ==3.10.8
  • ttach ==0.0.3
  • typed-ast ==1.5.3
  • typer ==0.4.1
  • typing-extensions ==3.10.0.2
  • unicodecsv ==0.14.1
  • urllib3 ==1.26.13
  • wcwidth ==0.1.7
  • webcolors ==1.11.1
  • webencodings ==0.5.1
  • widgetsnbextension ==3.5.1
  • wrapt ==1.11.2
  • wurlitzer ==1.0.3
  • xdoctest ==1.0.0
  • xlrd ==1.2.0
  • xlwt ==1.3.0
  • xtcocotools ==1.11.5
  • yapf ==0.31.0
  • zict ==1.0.0
  • zipp ==3.8.0