veon

[ECCV2024] VEON: Vocabulary-Enhanced Occupancy Prediction

https://github.com/vision-sjtu/veon

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
    Organization vision-sjtu has institutional domain (vision.sjtu.edu.cn)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

[ECCV2024] VEON: Vocabulary-Enhanced Occupancy Prediction

Basic Info
  • Host: GitHub
  • Owner: VISION-SJTU
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 11.7 MB
Statistics
  • Stars: 9
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 9 months ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

VEON

Introduction

This is the official implementation of VEON (ECCV2024).

This repo includes the reproduced version of VEON and its extended journal variant (the temporal VEON-T models). The models learn a **Vocabulary-Enhanced** 3D representation for **Open-vocabulary Occupancy PredictioN** in the autonomous driving scenario.

[Figure: Occupancy prediction results of our VEON models]

The recipe of VEON is to assemble and adapt both a depth foundation model and a vision-language foundation model for 3D open-vocabulary occupancy prediction. Please refer to our paper for the model details.
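For intuition only, here is a minimal sketch of the open-vocabulary classification idea behind this recipe: geometry comes from the depth model, while per-voxel semantics come from comparing vision-language features against text embeddings of the vocabulary. Everything below is a hypothetical illustration, not the repo's code.

```python
# Conceptual sketch only; all names and shapes are hypothetical illustrations,
# not the repo's actual code. The open-vocabulary step scores each voxel's
# vision-language feature against text embeddings of the class vocabulary.
import torch
import torch.nn.functional as F

def classify_voxels(voxel_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
    """Assign each voxel the vocabulary entry with the most similar text embedding."""
    voxel_feats = F.normalize(voxel_feats, dim=-1)   # (num_voxels, C)
    text_embeds = F.normalize(text_embeds, dim=-1)   # (num_classes, C)
    similarity = voxel_feats @ text_embeds.T         # (num_voxels, num_classes) cosine scores
    return similarity.argmax(dim=-1)                 # per-voxel open-vocabulary label

# Illustrative shapes: 10 voxels with 512-dim features, a 17-class vocabulary.
labels = classify_voxels(torch.randn(10, 512), torch.randn(17, 512))
```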

Get Started

Suppose the VEON codebase path is ${VEON_HOME}. Then, follow the subsequent procedures.

Step 1: Prepare Environment

Prepare the base environments (BEVDet & SAN & Depth).

Step 1.1 BEVDet Environment

Please prepare the environment as described in BEVDet. VEON directly adopts the BEVDet framework (v2.1) for development.
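As a rough sketch of what such an environment might look like (the version pins are taken from this repo's requirements/mminstall.txt; defer to the BEVDet instructions for the authoritative steps):

```shell
# A minimal sketch, not the official setup; see BEVDet (v2.1) for exact instructions.
conda create -n veon python=3.7 -y
conda activate veon
pip install torch torchvision  # choose versions matching your CUDA toolkit
pip install "mmcv-full>=1.4.8,<=1.6.0" "mmdet>=2.24.0,<=3.0.0" "mmsegmentation>=0.20.0,<=1.0.0"
pip install -e ${VEON_HOME}    # this repo ships a setup.py
```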

Step 1.2 SAN Environment

Please prepare the environment as described in SAN. VEON integrates SAN into the BEVDet framework for open-vocabulary recognition.

Then, download the pretrained SAN checkpoints (san_vit_b_16.pth and san_vit_large_14.pth) into the folder ${VEON_HOME}/ckpts/clipsan, and run the following script to reformat them:

```shell
cd ${VEON_HOME}
python tools/misc/process_san_pth.py
```

The reformatted checkpoints are named SAN_ViT-B.pth and SAN_ViT-L.pth and are also placed in the folder ${VEON_HOME}/ckpts/clipsan. The paths and names of these checkpoints can be revised if you are familiar with the config files.

Note 1: If you have network problems automatically downloading the OpenAI CLIP backbones (ViT-B-16.pt and ViT-L-14-336px.pt) in the function open_clip.create_model_and_transforms(), you may need to manually download the pretrained weights and load them offline from disk.
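For example, a hedged sketch of offline loading (recent open_clip versions accept a local checkpoint path for the `pretrained` argument; the path below is a placeholder for wherever you saved the weights):

```python
import open_clip

# The checkpoint path is a placeholder; download ViT-B-16.pt manually beforehand.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16",
    pretrained="/path/to/ViT-B-16.pt",  # local file instead of an automatic download
)
```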

Note 2: The SAN and BEVDet environments are largely compatible with each other, but you may need to install detectron2 via detectron2-xyz for compatibility with certain Python versions (e.g., Python 3.7).
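Assuming detectron2-xyz refers to the commonly used fork of that name (an assumption; verify against the SAN instructions), the install would be:

```shell
# Assumption: detectron2-xyz is the fork at github.com/MaureenZOU/detectron2-xyz.
pip install git+https://github.com/MaureenZOU/detectron2-xyz.git
```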

Step 1.3 Depth Environment

Prepare the depth environment. There are two branches, depending on which depth foundation model you leverage: Branch 1.3.1 for ZoeDepth and Branch 1.3.2 for DepthAnythingV2.

We used the ZoeDepth variants for most experiments in our paper, but the DepthAnythingV2 variants are often more stable and better-performing. We therefore strongly recommend using the DepthAnythingV2 variants.

Branch 1.3.1 MiDaS Environment

If you adopt MiDaS + ZoeDepth as the depth foundation model, please prepare the environment as described in ZoeDepth.

Then, download the pretrained ZoeDepth-NK model (ZoeD_M12_NK.pt) and place it into the folder ${VEON_HOME}/ckpts/zoedepth. Run the following script to reformat it:

```shell
cd ${VEON_HOME}
python tools/misc/process_zoe_pth.py
```

The reformatted checkpoint is named ZoeD_M12_NK_p.pt and is also placed in the folder ${VEON_HOME}/ckpts/zoedepth.

Branch 1.3.2 DepthAnythingV2 Environment

If you adopt DepthAnythingV2 as the depth foundation model, please prepare the environment as described in Depth-Anything-V2/metric.

Then, download the pretrained DA-V2 models (the outdoor metric models depth_anything_v2_metric_vkitti_vitb.pth and depth_anything_v2_metric_vkitti_vitl.pth) and place them into the folder ${VEON_HOME}/ckpts/depthanythingv2.

Step 2: Prepare nuScenes Dataset

Prepare the nuScenes dataset folder as introduced in nuscenes_det.md and create the pkl files for BEVDet by running the following script:

```shell
cd ${VEON_HOME}
python tools/create_data_bevdet.py
```

Please refer to the BEVDet issues if you encounter any problems with nuScenes.

Step 3: Prepare Task Materials

This repository supports both the Occ3D-nuScenes closed-set occupancy dataset and the POP-3D language-driven retrieval benchmark.

Step 3.1 Occ3D-nuScenes Dataset

For the closed-set Occ3D-nuScenes occupancy prediction task, download (only) the 'gts' from CVPR2023-3D-Occupancy-Prediction and arrange the nuScenes dataset folder ${VEON_HOME}/data/nuscenes as:

```shell
└── data
    └── nuscenes
        ├── v1.0-trainval (existing)
        ├── sweeps (existing)
        ├── samples (existing)
        └── gts (new)
```

Step 3.2 POP-3D Retrieval Benchmark

For the language-driven object retrieval task, please download the materials as introduced in POP-3D; the corresponding download script is download_retrieval_benchmark.sh. After downloading, place the materials in the folder ${VEON_HOME}/data/nuscenes/retrieval_benchmark/. The folder structure is:

```shell
└── data
    └── nuscenes
        └── retrieval_benchmark
            ├── annotations
            ├── matching_points
            ├── retrieval_anns_all.csv
            ├── retrieval_anns_eval.csv
            ├── retrieval_anns_test.csv
            ├── retrieval_anns_train.csv
            └── retrieval_anns_val.csv
```

Step 4: Check the Folder Structure

First, the ${VEON_HOME}/ckpts folder should have the following structure before training:

```shell
├── clipsan (necessary)
│   ├── SAN_ViT-B.pth
│   └── SAN_ViT-L.pth
├── depth_pretrain (empty)
├── depthanythingv2 (branch 1.3.2)
│   ├── depth_anything_v2_metric_vkitti_vitb.pth
│   └── depth_anything_v2_metric_vkitti_vitl.pth
└── zoedepth (branch 1.3.1)
    └── ZoeD_M12_NK_p.pt
```

Second, the ${VEON_HOME}/data/nuscenes folder should be organized as follows:

```shell
└── data
    └── nuscenes
        ├── bevdetv2-nuscenes_infos_train.pkl
        ├── bevdetv2-nuscenes_infos_val.pkl
        ├── gts
        ├── lidarseg
        ├── maps
        ├── retrieval_benchmark (optional)
        ├── samples
        ├── sweeps
        ├── v1.0-test
        └── v1.0-trainval
```

Not all components of the nuScenes dataset are necessary, but the folder structure above is sufficient.
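As a quick sanity check (illustrative only; adjust to what you actually downloaded), you can confirm that the key entries exist:

```shell
cd ${VEON_HOME}
ls ckpts/clipsan ckpts/depth_pretrain
ls data/nuscenes/gts data/nuscenes/bevdetv2-nuscenes_infos_train.pkl data/nuscenes/bevdetv2-nuscenes_infos_val.pkl
```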

Training and Testing

Now we introduce how to train and test the VEON models. Training is divided into two stages: Depth Pretraining (Stage 1) and Occupancy Prediction (Stage 2).

By default, we use 8 NVIDIA V100 GPUs with 32 GB of memory each.

Training Stage 1: Depth Pretraining

For adapting the depth foundation model, run the following script. We recommend using the DepthAnythingV2 variants instead of the MiDaS variants, but we first take the ZoeDepth variant as an example.

```shell
# Branch 1.3.1: MiDaS + ZoeDepth variant
# Script format: bash ./tools/dist_train.sh $config $num_gpu
cd ${VEON_HOME}
bash tools/dist_train.sh configs/veon/veon-pretrain-zoedepth.py 8
```

The output checkpoints are stored in the folder ${VEON_HOME}/work_dirs/veon-pretrain-zoedepth/. Before starting training stage 2, you should: (1) select one checkpoint (e.g., epoch_48.pth); (2) place it in ${VEON_HOME}/ckpts/depth_pretrain/; and (3) rename it zoedepth_pretrain.pth. Note: The file names of the adapted depth models can be revised in the stage-2 config files.
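For instance, steps (1)-(3) amount to the following (assuming epoch_48.pth is the checkpoint you selected):

```shell
cd ${VEON_HOME}
cp work_dirs/veon-pretrain-zoedepth/epoch_48.pth ckpts/depth_pretrain/zoedepth_pretrain.pth
```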

Similarly, for the DepthAnythingV2 variant, the training script is:

```shell
# Branch 1.3.2: Depth-Anything-V2 variant
cd ${VEON_HOME}
bash tools/dist_train.sh configs/veon/veon-pretrain-depthanythingv2.py 8
```

Before starting training stage 2, you should also select one checkpoint (e.g., epoch_48.pth), place it in ${VEON_HOME}/ckpts/depth_pretrain/, and rename it depthanythingv2_pretrain_large.pth.

Training Stage 2: Occupancy Prediction

After obtaining the finetuned depth estimator, we can start training stage 2 with the following script. We recommend using the DepthAnythingV2 variants instead of the MiDaS variants, but we first take the ZoeDepth variant as an example.

```shell
cd ${VEON_HOME}
bash tools/dist_train.sh configs/veon/veon-temporal-base-512x1408-zoe-nodepthcache.py 8
```

After training stage 2, all resulting VEON checkpoints will be stored in ${VEON_HOME}/work_dirs/veon-temporal-base-512x1408-zoe-nodepthcache/.

Similarly, for the DepthAnythingV2 variant, run:

```shell
cd ${VEON_HOME}
bash tools/dist_train.sh configs/veon/veon-temporal-base-512x1408-dav2-nodepthcache.py 8
```

The resulting VEON checkpoints will be stored in ${VEON_HOME}/work_dirs/veon-temporal-base-512x1408-dav2-nodepthcache/.

Testing on Specific Tasks

After training stage 2, you can test the checkpoints stored in folder ${VEON_HOME}/work_dirs/ on specific tasks.

Testing Mode 1: Occ3D-nuScenes Dataset

To test and evaluate a single checkpoint (e.g., epoch_10.pth) on Occ3D-nuScenes, run the following script:

```shell
# Testing only epoch_10.pth on the Occ3D-nuScenes dataset
# Script format: ./tools/dist_test.sh $config $checkpoint $num_gpu --eval $metric
cd ${VEON_HOME}
bash ./tools/dist_test.sh configs/veon/veon-temporal-base-512x1408-zoe-nodepthcache.py work_dirs/veon-temporal-base-512x1408-zoe-nodepthcache/epoch_10.pth 8 --eval bbox
```

However, **we strongly recommend testing all the resulting checkpoints within a certain epoch interval**, e.g., epoch 5 to epoch 15. The corresponding script is:

```shell
# Testing all checkpoints from epoch 5 to epoch 15 on the Occ3D-nuScenes dataset
# Script format: ./tools/dist_test_all.sh $config $checkpoint_folder $num_gpu $start_epoch $end_epoch --eval $metric
cd ${VEON_HOME}
bash ./tools/dist_test_all.sh configs/veon/veon-temporal-base-512x1408-zoe-nodepthcache.py work_dirs/veon-temporal-base-512x1408-zoe-nodepthcache 8 5 15 --eval bbox
```

Note: The corresponding config files for the DepthAnythingV2 variants are also provided with the -dav2 suffix. Again, we recommend using the DepthAnythingV2 variants instead of the MiDaS variants.

Testing Mode 2: POP-3D Retrieval Benchmark

You can simply change the config file to evaluate a single checkpoint on the POP-3D retrieval benchmark. Here, the config file differs, but the checkpoint is kept the same.

```shell
# Testing only epoch_10.pth on the POP-3D retrieval benchmark
cd ${VEON_HOME}
bash ./tools/dist_test.sh configs/veon/veon-temporal-base-512x1408-zoe-retrieval.py work_dirs/veon-temporal-base-512x1408-zoe-nodepthcache/epoch_10.pth 8 --eval bbox
```

Note: The corresponding config files for the DepthAnythingV2 variants are also provided with the -dav2 suffix.

Very Useful Tricks

Depth Cache Mechanism

In training stage 2, since the depth estimator is frozen, we can cache the predicted depth for the whole training set and thereby accelerate training.

Take the MiDaS + ZoeDepth version as an example. After obtaining the finetuned depth checkpoint (e.g., ${VEON_HOME}/ckpts/depth_pretrain/zoedepth_pretrain.pth), run the following script for exactly one complete epoch to cache all predicted depth maps on disk. Around 120 GB of free disk space is required.

```shell
cd ${VEON_HOME}
bash tools/dist_train.sh configs/veon/veon-depthcache-zoedepth.py 8
```

After one epoch, all depth maps will be stored in the folder ${VEON_HOME}/data/nuscenes/depth_cache/depth/. Then, run training stage 2 with the following script. This saves not only training time but also GPU memory.

```shell
cd ${VEON_HOME}
bash tools/dist_train.sh configs/veon/veon-temporal-base-512x1408-zoe-withdepthcache.py 8
```

Note: The corresponding config files for the DepthAnythingV2 variants are also provided, and their depth cache folder is ${VEON_HOME}/data/nuscenes/depth_cache/depth_dav2/.
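For intuition, the caching pattern amounts to something like the following minimal sketch (hypothetical helper and file naming, not the repo's actual implementation, which lives in its dataset and config code):

```python
import os
import numpy as np
import torch

CACHE_DIR = "data/nuscenes/depth_cache/depth"  # cache folder named in this README

def predict_depth_cached(depth_model, image, sample_token):
    """Compute the depth map for `sample_token` once, then reuse it from disk."""
    path = os.path.join(CACHE_DIR, f"{sample_token}.npy")
    if os.path.exists(path):                   # cache hit: skip the frozen depth network
        return torch.from_numpy(np.load(path))
    with torch.no_grad():                      # the depth estimator is frozen in stage 2
        depth = depth_model(image)
    os.makedirs(CACHE_DIR, exist_ok=True)
    np.save(path, depth.cpu().numpy())         # cache miss: store for subsequent epochs
    return depth
```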

Temporal Occupancy Prediction

In the journal version of VEON, we integrate surrounding images from multiple frames to exploit rich temporal information. In fact, you only need to revise one line in the config files to run the VEON-T{X} variants.

Take the VEON-L-T{X} variants as an example. Find the config file, e.g., ./configs/veon/veon-temporal-base-512x1408-zoe-nodepthcache.py, and revise the following line:

```python
# Original code: num_temporal = 1
num_temporal = 2  # 1, 2, 3, 4 are all OK for V100
```

This supports training and testing VEON-T2 with 2-frame inputs. The training and testing scripts are kept the same.

Note: We strongly recommend using the depth cache mechanism when `num_temporal > 2`; otherwise, a "GPU out of memory" error will occur on NVIDIA V100 GPUs.

Acknowledgement

This repository draws on multiple great open-source codebases. Thanks for their great contributions to the community.

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

```bibtex
@inproceedings{eccv24-veon,
  title={VEON: Vocabulary-Enhanced Occupancy Prediction},
  author={Zheng, Jilai and Tang, Pin and Wang, Zhongdao and Wang, Guoqing and Ren, Xiangxuan and Feng, Bailan and Ma, Chao},
  booktitle={ECCV},
  year={2024},
}
```

Owner

  • Name: SJTU Vision and Learning Group
  • Login: VISION-SJTU
  • Kind: organization
  • Email: chaoma@sjtu.edu.cn

Vision and Learning Group (VLG) is led by Dr. Chao Ma and affiliated with the AI Institute, Shanghai Jiao Tong University.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection3D Contributors"
title: "OpenMMLab's Next-generation Platform for General 3D Object Detection"
date-released: 2020-07-23
url: "https://github.com/open-mmlab/mmdetection3d"
license: Apache-2.0

GitHub Events

Total
  • Issues event: 1
  • Watch event: 4
  • Issue comment event: 1
  • Member event: 1
  • Push event: 2
  • Create event: 2
Last Year
  • Issues event: 1
  • Watch event: 4
  • Issue comment event: 1
  • Member event: 1
  • Push event: 2
  • Create event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: 4 days
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: 4 days
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Rayn-Wu (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

docker/Dockerfile docker
  • nvcr.io/nvidia/tensorrt 22.07-py3 build
docker/serve/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
requirements/build.txt pypi
requirements/docs.txt pypi
  • docutils ==0.16.0
  • m2r *
  • mistune ==0.8.4
  • myst-parser *
  • sphinx ==4.0.2
  • sphinx-copybutton *
  • sphinx_markdown_tables *
requirements/mminstall.txt pypi
  • mmcv-full >=1.4.8,<=1.6.0
  • mmdet >=2.24.0,<=3.0.0
  • mmsegmentation >=0.20.0,<=1.0.0
requirements/optional.txt pypi
  • open3d *
  • spconv *
  • waymo-open-dataset-tf-2-1-0 ==1.2.0
requirements/readthedocs.txt pypi
  • mmcv >=1.4.8
  • mmdet >=2.24.0
  • mmsegmentation >=0.20.1
  • torch *
  • torchvision *
requirements/runtime.txt pypi
  • lyft_dataset_sdk *
  • networkx >=2.2,<2.3
  • numba ==0.53.0
  • numpy *
  • nuscenes-devkit *
  • plyfile *
  • scikit-image *
  • tensorboard *
  • trimesh >=2.35.39,<2.35.40
requirements/tests.txt pypi
  • asynctest * test
  • codecov * test
  • flake8 * test
  • interrogate * test
  • isort * test
  • kwarray * test
  • pytest * test
  • pytest-cov * test
  • pytest-runner * test
  • ubelt * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements.txt pypi
setup.py pypi