diffbev
Official PyTorch implementation for a conditional diffusion probability model in BEV perception
Repository
Basic Info
- Host: GitHub
- Owner: JiayuZou2020
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2303.08333
- Size: 5 MB
Statistics
- Stars: 245
- Watchers: 6
- Forks: 12
- Open Issues: 13
- Releases: 0
Metadata Files
README.md
DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception
Arxiv
https://arxiv.org/abs/2303.08333
Abstract
BEV perception is of great importance in the field of autonomous driving, serving as the cornerstone of planning, controlling, and motion prediction. The quality of the BEV feature highly affects the performance of BEV perception. However, taking the noises in camera parameters and LiDAR scans into consideration, we usually obtain BEV representation with harmful noises. Diffusion models naturally have the ability to denoise noisy samples to the ideal data, which motivates us to utilize the diffusion model to get a better BEV representation. In this work, we propose an end-to-end framework, named DiffBEV, to exploit the potential of diffusion model to generate a more comprehensive BEV representation. To the best of our knowledge, we are the first to apply diffusion model to BEV perception. In practice, we design three types of conditions to guide the training of the diffusion model which denoises the coarse samples and refines the semantic feature in a progressive way. What's more, a cross-attention module is leveraged to fuse the context of BEV feature and the semantic content of conditional diffusion model. DiffBEV achieves a 25.9% mIoU on the nuScenes dataset, which is 6.2% higher than the best-performing existing approach. Quantitative and qualitative results on multiple benchmarks demonstrate the effectiveness of DiffBEV in BEV semantic segmentation and 3D object detection tasks.
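For readers unfamiliar with the metric quoted above, mIoU (mean intersection over union) averages the per-class IoU between predicted and ground-truth label maps. The sketch below is a minimal pure-Python illustration, not the evaluation code of this repository. The function name `mean_iou`, the flat integer label lists, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for the example; the benchmark protocol used in the paper (for instance per-class binary masks or visibility masks) may differ.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in the prediction or the target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious: List[float] = []
    for cls in range(num_classes):
        # Cells where both maps agree on this class.
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Cells where at least one map contains this class.
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union == 0:
            continue  # assumption: a class absent from both maps is ignored
        ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 6-cell BEV maps with three hypothetical classes.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    print(f"mIoU = {mean_iou(pred, target, num_classes=3):.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.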

Dataset
Download Datasets From Official Websites
Extensive experiments are conducted on the nuScenes, [KITTI Raw](https://www.cvlibs.net/datasets/kitti/rawdata.php), [KITTI Odometry](https://www.cvlibs.net/datasets/kitti/evalodometry.php), and [KITTI 3D Object](https://www.cvlibs.net/datasets/kitti/eval3dobject.php) benchmarks.
Prepare Depth Maps
Follow the script to generate depth maps for the KITTI datasets. The depth maps of the KITTI datasets are available on Google Drive and Baidu Net Disk. We also provide a script to generate depth maps for the nuScenes dataset. Replace the dataset path in the script according to your dataset directory.
Dataset Processing
After downloading these datasets, generate the annotations in BEV. Follow the instructions below to get the corresponding annotations.
nuScenes
Run the script makenusceneslabels to get the BEV annotations for the nuScenes benchmark. Please follow the linked instructions to generate the BEV annotations (ann_bev_dir) for the KITTI datasets.
KITTI Datasets
Follow the instructions to get the BEV annotations for the KITTI Raw, KITTI Odometry, and KITTI 3D Object datasets.
The datasets' structure is organized as follows.
data
├── nuscenes
    ├── img_dir
        ├── train
        ├── val
    ├── ann_bev_dir
        ├── train
        ├── val
        ├── train_depth
        ├── val_depth
    ├── calib.json
├── kitti_processed
    ├── kitti_raw
        ├── img_dir
            ├── train
            ├── val
        ├── ann_bev_dir
            ├── train
            ├── val
            ├── train_depth
            ├── val_depth
        ├── calib.json
    ├── kitti_odometry
        ├── img_dir
            ├── train
            ├── val
        ├── ann_bev_dir
            ├── train
            ├── val
            ├── train_depth
            ├── val_depth
        ├── calib.json
    ├── kitti_object
        ├── img_dir
            ├── train
            ├── val
        ├── ann_bev_dir
            ├── train
            ├── val
            ├── train_depth
            ├── val_depth
        ├── calib.json
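Before training, it can help to confirm that each dataset folder contains the entries named in the listing above. The following is a minimal sketch, not part of the repository: the helper names `find_missing` and `check_all` are hypothetical, and placing the three KITTI folders under `kitti_processed` is an assumption read from the listing. Because the exact nesting of the depth folders may differ on your machine, the check only looks for each expected name anywhere under the dataset root.

```python
from pathlib import Path
from typing import Dict, List, Sequence

# Names taken from the directory listing; their nesting is not assumed.
EXPECTED_NAMES = ("img_dir", "ann_bev_dir", "train_depth", "val_depth", "calib.json")


def find_missing(dataset_root: Path, expected: Sequence[str] = EXPECTED_NAMES) -> List[str]:
    """Return the expected names not found anywhere under dataset_root."""
    if not dataset_root.is_dir():
        return list(expected)
    present = {p.name for p in dataset_root.rglob("*")}
    return [name for name in expected if name not in present]


def check_all(data_root: Path, datasets: Sequence[str]) -> Dict[str, List[str]]:
    """Map each dataset's relative path to its list of missing names."""
    return {rel: find_missing(data_root / rel) for rel in datasets}


if __name__ == "__main__":
    # Assumption: the KITTI variants live under data/kitti_processed.
    datasets = [
        "nuscenes",
        "kitti_processed/kitti_raw",
        "kitti_processed/kitti_odometry",
        "kitti_processed/kitti_object",
    ]
    for rel, missing in check_all(Path("data"), datasets).items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{rel}: {status}")
```

Run it from the directory that contains `data`; any dataset that reports missing entries needs its download or annotation step repeated.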
Prepare Calibration Files
The camera parameters of each dataset are written into the corresponding calib.json file. The calib.json file for each dataset is available on Google Drive and Baidu Net Disk.
Please change the dataset path in the [nuScenes, KITTI Raw, KITTI Odometry, and KITTI 3D Object dataset configurations](https://github.com/JiayuZou2020/DiffBEV/tree/main/configs/base/datasets) to match your actual data directory. Also modify the path of the pretrained model in the model configurations.
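After downloading a calib.json file, a quick parse check can catch a truncated or misplaced download before training starts. The sketch below is illustrative only: the internal schema of calib.json is not documented here, so the hypothetical helper `summarize_calib` reports nothing beyond the top-level type and entry count, and the path `data/nuscenes/calib.json` is an assumption based on the dataset layout.

```python
import json
from pathlib import Path
from typing import Any


def summarize_calib(calib_path: Path) -> str:
    """Parse a calib.json file and describe its top-level structure."""
    with calib_path.open("r", encoding="utf-8") as f:
        calib: Any = json.load(f)  # raises json.JSONDecodeError if corrupted
    if isinstance(calib, dict):
        return f"{calib_path.name}: dict with {len(calib)} top-level keys"
    if isinstance(calib, list):
        return f"{calib_path.name}: list with {len(calib)} entries"
    return f"{calib_path.name}: unexpected top-level type {type(calib).__name__}"


if __name__ == "__main__":
    # Assumed location; adjust to your own data directory.
    path = Path("data/nuscenes/calib.json")
    if path.is_file():
        print(summarize_calib(path))
    else:
        print(f"{path} not found; check the dataset path in your configuration")
```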
Installation
DiffBEV is tested on:
* Python 3.7/3.8
* CUDA 11.1
* Torch 1.9.1
Please check install for installation.
* Create a conda environment for the project.
conda create -n diffbev python=3.7
conda activate diffbev
* Install PyTorch following the instructions.
conda install pytorch torchvision -c pytorch
* Install mmcv.
pip install -U openmim
mim install mmcv-full
* Clone this repository.
git clone https://github.com/JiayuZou2020/DiffBEV.git
* Install and compile the required packages.
cd DiffBEV
pip install -v -e .
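Once the steps above are done, the environment can be compared against the tested versions listed in this section (Python 3.7/3.8, Torch 1.9.1). The following is a minimal sketch, not a script shipped with the repository; the helper names `python_is_tested` and `installed_version` are hypothetical, and other versions may well work even though they are reported as untested.

```python
import sys
from importlib import import_module
from typing import Optional, Tuple

# Versions stated as tested in the Installation section.
TESTED_PYTHON = {(3, 7), (3, 8)}
TESTED_TORCH = "1.9.1"


def python_is_tested(version: Tuple[int, int]) -> bool:
    """True if the (major, minor) Python version is one of the tested ones."""
    return version in TESTED_PYTHON


def installed_version(module_name: str) -> Optional[str]:
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    py = sys.version_info[:2]
    label = "tested" if python_is_tested(py) else "untested"
    print(f"Python {py[0]}.{py[1]}: {label}")
    for name in ("torch", "torchvision", "mmcv"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
    torch_version = installed_version("torch")
    # Local builds may carry a suffix such as "+cu111", so compare by prefix.
    if torch_version is not None and not torch_version.startswith(TESTED_TORCH):
        print(f"note: torch {torch_version} differs from the tested {TESTED_TORCH}")
```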
Visualization

Citation
If you find our work helpful for your research, please consider citing it as follows.
@article{zou2023diffbev,
title={DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception},
author={Jiayu, Zou and Zheng, Zhu and Yun, Ye and Xingang, Wang},
journal={arXiv preprint arXiv:2303.08333},
year={2023}
}
Acknowledgement
Our work is partially based on the following open-sourced projects: mmsegmentation, VPN, PYVA, PON, LSS. Thanks for their contribution to the research community of BEV perception.
Owner
- Login: JiayuZou2020
- Kind: user
- Repositories: 1
- Profile: https://github.com/JiayuZou2020
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMSegmentation Contributors"
title: "OpenMMLab Semantic Segmentation Toolbox and Benchmark"
date-released: 2020-07-10
url: "https://github.com/open-mmlab/mmsegmentation"
license: Apache-2.0
GitHub Events
Total
- Issues event: 2
- Watch event: 23
- Fork event: 2
Last Year
- Issues event: 2
- Watch event: 23
- Fork event: 2
Dependencies
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- cityscapesscripts *
- codecov *
- flake8 *
- interrogate *
- isort ==4.3.21
- matplotlib *
- numpy *
- packaging *
- prettytable *
- pytest *
- xdoctest >=0.10.0
- yapf *
- docutils ==0.16.0
- myst-parser *
- sphinx ==4.0.2
- sphinx_copybutton *
- sphinx_markdown_tables *
- mmcv-full >=1.3.1,<=1.4.0
- cityscapesscripts *
- mmcv *
- prettytable *
- torch *
- torchvision *
- matplotlib *
- numpy *
- packaging *
- prettytable *
- codecov * test
- flake8 * test
- interrogate * test
- isort ==4.3.21 test
- pytest * test
- xdoctest >=0.10.0 test
- yapf * test
- Babel ==2.7.0
- Bottleneck ==1.2.1
- Cython ==0.29.13
- Flask ==1.1.1
- HeapDict ==1.0.1
- Jinja2 ==2.10.3
- KNN-CUDA ==0.2
- Markdown ==3.4.1
- MarkupSafe ==2.1.1
- PIMS ==0.6.0
- Pillow ==8.4.0
- PyOpenGL ==3.1.0
- PySocks ==1.7.1
- PyTurboJPEG ==1.6.6
- PyWavelets ==1.0.3
- PyYAML ==5.1.2
- Pygments ==2.4.2
- QtAwesome ==0.6.0
- QtPy ==1.9.0
- SQLAlchemy ==1.3.9
- SecretStorage ==3.1.1
- Send2Trash ==1.5.0
- Shapely ==1.8.0
- SoundFile ==0.10.3.post1
- Sphinx ==5.0.2
- Werkzeug ==2.2.0
- XlsxWriter ==1.2.1
- absl-py ==1.2.0
- addict ==2.4.0
- alabaster ==0.7.12
- anaconda-client ==1.7.2
- anaconda-navigator ==1.9.7
- anaconda-project ==0.8.3
- antlr4-python3-runtime ==4.8
- appdirs ==1.4.4
- asn1crypto ==1.0.1
- astroid ==2.3.1
- astropy ==3.2.2
- atomicwrites ==1.3.0
- attrs ==19.2.0
- audioread ==2.1.9
- autopep8 ==1.6.0
- backcall ==0.1.0
- backports.functools-lru-cache ==1.5
- backports.os ==0.1.1
- backports.shutil-get-terminal-size ==1.0.0
- backports.tempfile ==1.0
- backports.weakref ==1.0.post1
- beautifulsoup4 ==4.8.0
- bitarray ==1.0.1
- bkcharts ==0.2
- black ==22.3.0
- bleach ==3.1.0
- blobfile ==2.0.0
- bokeh ==1.3.4
- boto ==2.49.0
- cPython ==0.0.6
- cachetools ==4.2.4
- certifi ==2022.9.24
- cffi ==1.12.3
- chamfer ==2.0.0
- chardet ==3.0.4
- chumpy ==0.70
- click ==8.1.1
- cloudpickle ==1.2.2
- clyent ==1.2.2
- colorama ==0.4.1
- colour ==0.1.5
- conda ==22.11.0
- conda-build ==3.23.2
- conda-package-handling ==1.6.0
- conda-verify ==3.4.2
- contextlib2 ==0.6.0
- coverage ==6.3.2
- cryptography ==2.7
- cycler ==0.10.0
- cytoolz ==0.10.0
- dask ==2.5.2
- decorator ==4.4.0
- decord ==0.6.0
- defusedxml ==0.6.0
- descartes ==1.1.0
- dgl-cu111 ==0.6.1
- dglgo ==0.0.1
- distributed ==2.5.2
- docutils ==0.15.2
- easydict ==1.9
- efficientnet-pytorch ==0.7.1
- einops ==0.3.2
- emd-ext ==0.0.0
- entrypoints ==0.3
- et-xmlfile ==1.0.1
- fastcache ==1.1.0
- filelock ==3.0.12
- fire ==0.4.0
- freetype-py ==2.2.0
- fsspec ==0.5.2
- future ==0.17.1
- gevent ==1.4.0
- glob2 ==0.7
- gmpy2 ==2.0.8
- google-auth ==2.9.1
- google-auth-oauthlib ==0.4.6
- greenlet ==0.4.15
- grpcio ==1.47.0
- h5py ==2.9.0
- html5lib ==1.0.1
- hydra-core ==1.1.0
- idna ==2.8
- imageio ==2.6.0
- imageio-ffmpeg ==0.4.7
- imagesize ==1.1.0
- importlib-metadata ==4.12.0
- importlib-resources ==5.4.0
- interrogate ==1.5.0
- ipykernel ==5.1.2
- ipython ==7.8.0
- ipython-genutils ==0.2.0
- ipywidgets ==7.5.1
- isort ==5.10.1
- itsdangerous ==1.1.0
- jdcal ==1.4.1
- jedi ==0.15.1
- jeepney ==0.4.1
- joblib ==1.1.1
- json-tricks ==3.15.5
- json5 ==0.8.5
- jsonschema ==3.0.2
- jupyter ==1.0.0
- jupyter-client ==5.3.3
- jupyter-console ==6.0.0
- jupyter-core ==4.5.0
- jupyterlab ==1.1.4
- jupyterlab-server ==1.0.6
- keyring ==18.0.0
- kiwisolver ==1.1.0
- lap ==0.4.0
- lazy-object-proxy ==1.4.2
- libarchive-c ==2.8
- librosa ==0.9.1
- lief ==0.9.0
- llvmlite ==0.29.0
- locket ==0.2.0
- lxml ==4.9.1
- matplotlib ==3.1.1
- mccabe ==0.6.1
- mistune ==0.8.4
- mkl-fft ==1.0.14
- mkl-random ==1.1.0
- mkl-service ==2.3.0
- mmcv-full ==1.3.15
- mock ==3.0.5
- more-itertools ==7.2.0
- motmetrics ==1.1.3
- moviepy ==1.0.3
- mpi4py ==3.0.3
- mpmath ==1.1.0
- msgpack ==0.6.1
- multipledispatch ==0.6.0
- munkres ==1.1.4
- mypy-extensions ==0.4.3
- navigator-updater ==0.2.1
- nbconvert ==5.6.0
- nbformat ==4.4.0
- networkx ==2.3
- nltk ==3.4.5
- nose ==1.3.7
- notebook ==6.0.1
- numba ==0.45.1
- numexpr ==2.7.0
- numpy ==1.21.6
- numpydoc ==1.5.0
- nuscenes-devkit ==1.1.9
- oauthlib ==3.2.0
- olefile ==0.46
- omegaconf ==2.1.0
- open3d ==0.9.0.0
- opencv-contrib-python ==4.0.0.21
- opencv-python ==4.1.0.25
- openpyxl ==3.0.0
- packaging ==21.3
- pandas ==0.25.1
- pandocfilters ==1.4.2
- parso ==0.5.1
- partd ==1.0.0
- path.py ==12.0.1
- pathlib2 ==2.3.5
- pathspec ==0.9.0
- patsy ==0.5.1
- pep8 ==1.7.1
- pexpect ==4.7.0
- pickleshare ==0.7.5
- pkginfo ==1.5.0.1
- platformdirs ==2.5.2
- pluggy ==1.0.0
- ply ==3.11
- polars ==0.11.0
- pooch ==1.6.0
- prettytable ==2.2.1
- proglog ==0.1.9
- progressbar ==2.5
- prometheus-client ==0.7.1
- prompt-toolkit ==2.0.10
- protobuf ==3.19.1
- psutil ==5.6.3
- ptyprocess ==0.6.0
- py ==1.8.0
- pyOpenSSL ==19.0.0
- pyasn1 ==0.4.8
- pyasn1-modules ==0.2.8
- pycocotools ==2.0.4
- pycodestyle ==2.8.0
- pycosat ==0.6.3
- pycparser ==2.19
- pycrypto ==2.6.1
- pycryptodomex ==3.16.0
- pycurl ==7.43.0.3
- pydantic ==1.9.1
- pyflakes ==2.1.1
- pyglet ==1.5.23
- pylint ==2.4.2
- pymongo ==4.1.1
- pyntcloud ==0.1.5
- pyodbc ==4.0.27
- pyparsing ==2.4.2
- pyquaternion ==0.9.9
- pyrender ==0.1.45
- pyrsistent ==0.15.4
- pytest ==4.4.2
- pytest-arraydiff ==0.3
- pytest-astropy ==0.5.0
- pytest-doctestplus ==0.4.0
- pytest-openfiles ==0.4.0
- pytest-remotedata ==0.3.2
- python-dateutil ==2.8.0
- pytz ==2019.3
- pyzmq ==18.1.0
- qtconsole ==4.5.5
- regex ==2022.4.24
- requests ==2.22.0
- requests-oauthlib ==1.3.1
- resampy ==0.2.2
- rope ==0.14.0
- rsa ==4.9
- ruamel-yaml ==0.15.46
- ruamel.yaml ==0.17.21
- ruamel.yaml.clib ==0.2.6
- scikit-image ==0.15.0
- scikit-learn ==0.21.3
- scipy ==1.7.3
- seaborn ==0.9.0
- shutup ==0.2.0
- simplegeneric ==0.8.1
- singledispatch ==3.4.0.3
- six ==1.12.0
- sklearn ==0.0
- slicerator ==1.1.0
- smplx ==0.1.28
- snowballstemmer ==2.0.0
- some-package ==0.1
- sortedcollections ==1.1.2
- sortedcontainers ==2.1.0
- soupsieve ==1.9.3
- sphinxcontrib-applehelp ==1.0.1
- sphinxcontrib-devhelp ==1.0.1
- sphinxcontrib-htmlhelp ==2.0.0
- sphinxcontrib-jsmath ==1.0.1
- sphinxcontrib-qthelp ==1.0.2
- sphinxcontrib-serializinghtml ==1.1.5
- sphinxcontrib-websupport ==1.1.2
- spyder ==3.3.6
- spyder-kernels ==0.5.2
- statsmodels ==0.10.1
- sympy ==1.4
- tables ==3.5.2
- tabulate ==0.8.9
- tblib ==1.4.0
- tensorboard ==2.9.1
- tensorboard-data-server ==0.6.1
- tensorboard-plugin-wit ==1.8.1
- tensorboardX ==2.1
- termcolor ==1.1.0
- terminado ==0.8.2
- terminaltables ==3.1.0
- testpath ==0.4.2
- timm ==0.3.2
- toml ==0.10.2
- tomli ==2.0.1
- toolz ==0.10.0
- torch ==1.9.1
- torchvision ==0.10.1
- tornado ==6.0.3
- tqdm ==4.36.1
- traitlets ==4.3.3
- transforms3d ==0.4.1
- trimesh ==3.10.8
- ttach ==0.0.3
- typed-ast ==1.5.3
- typer ==0.4.1
- typing-extensions ==3.10.0.2
- unicodecsv ==0.14.1
- urllib3 ==1.26.13
- wcwidth ==0.1.7
- webcolors ==1.11.1
- webencodings ==0.5.1
- widgetsnbextension ==3.5.1
- wrapt ==1.11.2
- wurlitzer ==1.0.3
- xdoctest ==1.0.0
- xlrd ==1.2.0
- xlwt ==1.3.0
- xtcocotools ==1.11.5
- yapf ==0.31.0
- zict ==1.0.0
- zipp ==3.8.0