depth-from-motion

[ECCV 2022 oral] Monocular 3D Object Detection with Depth from Motion

https://github.com/tai-wang/depth-from-motion

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    4 of 63 committers (6.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.8%) to scientific vocabulary

Keywords

3d-detection autonomous-driving monocular pytorch robotics structure-from-motion

Keywords from Contributors

3d-object-detection object-detection-model point-cloud beit clip constrastive-learning convnext mae masked-image-modeling mobilenet
Last synced: 6 months ago

Repository

[ECCV 2022 oral] Monocular 3D Object Detection with Depth from Motion

Basic Info
  • Host: GitHub
  • Owner: Tai-Wang
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 17.1 MB
Statistics
  • Stars: 313
  • Watchers: 9
  • Forks: 29
  • Open Issues: 7
  • Releases: 0
Topics
3d-detection autonomous-driving monocular pytorch robotics structure-from-motion
Created over 3 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md

Depth from Motion (DfM)

This repository is the official implementation for DfM and MV-FCOS3D++.


Introduction

This is an official release of the papers Monocular 3D Object Detection with Depth from Motion and MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones.

The code is still undergoing a large refactoring. We plan to eventually re-organize this repo into the core code for this project plus mmdet3d as a dependency.

Please stay tuned for the clean release of all the configs and models.

Note: We will also release the refactored code in the official mmdet3d soon.

Monocular 3D Object Detection with Depth from Motion,
Tai Wang, Jiangmiao Pang, Dahua Lin
In: Proc. European Conference on Computer Vision (ECCV), 2022
[arXiv][Bibtex]

MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones,
Tai Wang, Qing Lian, Chenming Zhu, Xinge Zhu, Wenwei Zhang
In: arXiv preprint, 2022
[arXiv][Bibtex]

Results

DfM

The results of DfM and its corresponding config are shown below.

We have released the preliminary model for reproducing the results on the KITTI validation set.

The complete model checkpoints and logs will be released soon.

| Backbone | Lr schd | Mem (GB) | Inf time (fps) | Easy | Moderate | Hard | Download |
| :------: | :-----: | :------: | :------------: | :-----: | :------: | :--------: | :------: |
| ResNet34 | - | - | - | 29.1232 | 19.8970 | 17.3910[1] | model \| log |
| above @ BEV AP (IoU 0.7) | - | - | - | 38.9137 | 27.2843 | 24.8381 | |
| above @ 3D AP (IoU 0.5) | - | - | - | 67.4935 | 51.2602 | 47.4430 | |
| above @ BEV AP (IoU 0.5) | - | - | - | 72.5696 | 55.4583 | 52.4735 | |

[1] The reproduced performance may fluctuate to some degree due to the limited training samples and sensitive metrics. In our experience across multiple runs, the average performance can vary from 26/18/16 to 29/20/17, depending on corner cases (caused by matrix inverse computation or other reasons). Please stay tuned for a more stable version. (Models and logs will be updated soon.)

MV-FCOS3D++

The results of MV-FCOS3D++ (baseline version) and its corresponding config are shown below.

We have released the preliminary config for reproducing the results on the Waymo validation set.

(To comply with the Waymo dataset license agreement, the models pre-trained on Waymo are not released.)

The complete model configs and logs will be released soon.

Pretrained FCOS3D++ (without customized finetuning)

| Backbone | Lr schd | Mem (GB) | Inf time (fps) | mAPL | mAP | mAPH | Download |
| :----------------: | :-----: | :------: | :------------: | :---: | :---: | :---: | :------: |
| ResNet101+DCN | - | - | - | 20.41 | 28.6 | 27.01 | log |
| above @ Car | - | - | - | 41.05 | 55.74 | 54.83 | |
| above @ Pedestrian | - | - | - | 18.77 | 27.85 | 24.21 | |
| above @ Cyclist | - | - | - | 1.43 | 2.21 | 2.0 | |

MV-FCOS3D++ with Pretrained FCOS3D++

| Backbone | Lr schd | Mem (GB) | Inf time (fps) | mAPL | mAP | mAPH | Download |
| :---------------------------: | :-----: | :------: | :------------: | :---: | :---: | :---: | :------: |
| ResNet101+DCN | - | - | - | 33.8 | 46.65 | 44.25 | log |
| above @ Car | - | - | - | 52.69 | 68.36 | 67.47 | |
| above @ Pedestrian | - | - | - | 26.82 | 38.47 | 34.1 | |
| above @ Cyclist | - | - | - | 21.9 | 33.11 | 31.16 | |
| ResNet101+DCN +10 sweeps | - | - | - | 35.14 | 47.98 | 45.49 | log1 \| log2 |
| above @ Car | - | - | - | 55.44 | 70.72 | 69.79 | |
| above @ Pedestrian | - | - | - | 27.6 | 39.5 | 35.1 | |
| above @ Cyclist | - | - | - | 22.39 | 33.72 | 31.59 | |
| ResNet101+DCN (slow infer)[2] | - | - | - | 37.9 | 52.15 | 48.84 | |
| above @ Car | - | - | - | 56.24 | 73.15 | 72.07 | |
| above @ Pedestrian | - | - | - | 34.6 | 49.01 | 42.25 | |
| above @ Cyclist | - | - | - | 22.84 | 34.29 | 32.18 | |

[2] "slow infer" refers to changing the NMS settings to nms_pre=4096 and max_num=500 to increase the number of predictions so that inference achieves better recall. It slows down inference but significantly improves the final performance under the Waymo metric. The same trick can also be applied to the 10-sweep config and other models.
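Under common mmdet3d config conventions, the described change can be sketched as a test-time config override; only `nms_pre` and `max_num` come from the note above, and the key placement is an assumption rather than the exact layout of the actual DfM/MV-FCOS3D++ configs.

```python
# Hypothetical "slow infer" override in typical mmdet3d config style;
# the values nms_pre=4096 and max_num=500 are the ones described above.
model = dict(
    test_cfg=dict(
        nms_pre=4096,  # keep more candidates before NMS (defaults are usually smaller)
        max_num=500,   # keep more final predictions, improving recall
    ))
```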

Installation

It requires the following OpenMMLab packages:

  • MMCV-full >= v1.6.0 (recommended for the latest iou3d computation)
  • MMDetection >= v2.24.0
  • MMSegmentation >= v0.20.0

All of the above versions are recommended except mmcv-full, which is required as listed. Lower versions of mmdet and mmseg may also work but have not been tested yet.
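As a quick sanity check, the version ranges from requirements/mminstall.txt can be verified with a small stdlib-only sketch; the helper names here are our own, not part of the repo.

```python
# Check a version string against the ranges in requirements/mminstall.txt.
# This is an illustrative sketch, not a script shipped with the repo.

def parse(ver: str) -> tuple:
    """Turn '1.6.0' into (1, 6, 0) for tuple comparison."""
    return tuple(int(part) for part in ver.split("."))

def in_range(installed: str, low: str, high: str) -> bool:
    """True if low <= installed <= high (inclusive on both ends)."""
    return parse(low) <= parse(installed) <= parse(high)

# Ranges stated in requirements/mminstall.txt
ranges = {
    "mmcv-full": ("1.4.8", "1.6.0"),
    "mmdet": ("2.24.0", "3.0.0"),
    "mmsegmentation": ("0.20.0", "1.0.0"),
}

print(in_range("1.6.0", *ranges["mmcv-full"]))  # True
print(in_range("2.20.0", *ranges["mmdet"]))     # False (below 2.24.0)
```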

Example commands are shown as follows.

```bash
conda create --name dfm python=3.7 -y
conda activate dfm
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install mmcv-full==1.6.0
pip install mmdet==2.24.0
pip install mmsegmentation==0.20.0
git clone https://github.com/Tai-Wang/Depth-from-Motion.git
cd Depth-from-Motion
pip install -v -e .
```

License

This project is released under the Apache 2.0 license.

Usage

Data preparation

First, prepare the raw KITTI and Waymo data following MMDetection3D.

Then we prepare the data related to temporally consecutive frames.

For KITTI, we need to additionally download the pose and label files of the raw data here and the official mapping (between the raw data and the 3D detection benchmark split) here. Then we can run the data converter script:

```shell
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```

For Waymo, we need to additionally generate the ground-truth bin file for the camera-only setting (only boxes covered by the perception range of the cameras are considered). Besides, we recommend using the latest Waymo dataset release, which includes the camera-synced annotations tailored to this setting.

```shell
python tools/create_waymo_gt_bin.py
```
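The idea behind the camera-only ground truth (keeping only boxes within the cameras' perception range) can be illustrated with a simplified sketch; the half-FOV value and the center-based test are assumptions for illustration, not the actual logic of create_waymo_gt_bin.py, which considers all five cameras and the exact calibration.

```python
import numpy as np

def in_camera_fov(centers: np.ndarray, half_fov_deg: float = 25.2) -> np.ndarray:
    """Keep boxes whose center lies within a horizontal FOV around the +x axis.

    Simplified stand-in for the camera-coverage test: Waymo's front camera
    covers roughly +/-25.2 degrees horizontally; the real script is more
    thorough.
    """
    yaw = np.degrees(np.arctan2(centers[:, 1], centers[:, 0]))
    return np.abs(yaw) <= half_fov_deg

centers = np.array([[10.0, 0.0, 0.0],   # straight ahead -> kept
                    [0.0, 10.0, 0.0]])  # directly to the left -> dropped
mask = in_camera_fov(centers)
```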

Then please follow the mmdet3d tutorial for Waymo dataset for the pre-processing steps.

The final data structure looks like the following:

```text
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── kitti
│   │   ├── ImageSets
│   │   ├── testing
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── prev_2
│   │   │   ├── velodyne
│   │   ├── training
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── prev_2
│   │   │   ├── label_2
│   │   │   ├── velodyne
│   │   ├── raw
│   │   │   ├── 2011_09_26_drive_0001_sync
│   │   │   ├── xxxx (other raw data files)
│   │   ├── devkit
│   │   │   ├── mapping
│   │   │   │   ├── train_mapping.txt
│   │   │   │   ├── train_rand.txt
│   ├── waymo
│   │   ├── waymo_format
│   │   │   ├── training
│   │   │   ├── validation
│   │   │   ├── testing
│   │   │   ├── gt.bin
│   │   │   ├── cam_gt.bin
│   │   ├── kitti_format
│   │   │   ├── ImageSets
│   │   │   ├── training
│   │   │   │   ├── calib
│   │   │   │   ├── image_0
│   │   │   │   ├── image_1
│   │   │   │   ├── image_2
│   │   │   │   ├── image_3
│   │   │   │   ├── image_4
│   │   │   │   ├── label_0
│   │   │   │   ├── label_1
│   │   │   │   ├── label_2
│   │   │   │   ├── label_3
│   │   │   │   ├── label_4
│   │   │   │   ├── label_all
│   │   │   │   ├── pose
│   │   │   │   ├── velodyne
│   │   │   ├── testing
│   │   │   │   ├── (the same as training)
│   │   │   ├── waymo_gt_database
│   │   │   ├── waymo_infos_trainval.pkl
│   │   │   ├── waymo_infos_train.pkl
│   │   │   ├── waymo_infos_val.pkl
│   │   │   ├── waymo_infos_test.pkl
│   │   │   ├── waymo_dbinfos_train.pkl
```

Pretrained models

For the KITTI implementation of DfM, we keep the LIGA-Stereo setting, which uses a LiDAR-based teacher for better supervision during training. Please download the teacher checkpoint (already converted to mmdet3d style) here. It makes the network converge faster and brings ~1 AP of performance gain. We will consider replacing it with more direct supervision for simpler usage in the near future.

Demo

To test DfM on image data, simply run:

```shell
python demo/mono_det_demo.py ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--out-dir ${OUT_DIR}] [--show]
```

where the ANNOTATION_FILE should provide the 3D-to-2D projection matrix (camera intrinsic matrix). The visualization results, including the image and its predicted 3D bounding boxes projected onto the image, will be saved to ${OUT_DIR}/IMAGE_NAME.

Example on KITTI data using DfM model:

```shell
python demo/mono_det_demo.py demo/data/kitti/000008.png demo/data/kitti/kitti_000008_infos.pkl configs/dfm/dfm_r34_1x8_kitti-3d-3class.py checkpoints/dfm.pth
```
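To illustrate what the annotation file must supply, here is a minimal sketch of projecting a camera-frame 3D point with a 3x4 projection matrix; the matrix values below are made up for the example, not a real KITTI calibration.

```python
import numpy as np

# Hypothetical 3x4 projection matrix P = K [R|t]; real values come from the
# KITTI calibration files referenced by the infos .pkl.
P = np.array([[700.0,   0.0, 600.0, 0.0],
              [  0.0, 700.0, 180.0, 0.0],
              [  0.0,   0.0,   1.0, 0.0]])

def project(points_3d: np.ndarray) -> np.ndarray:
    """Project Nx3 camera-frame points to Nx2 pixel coordinates."""
    homo = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # Nx4 homogeneous
    uvw = homo @ P.T                                             # Nx3
    return uvw[:, :2] / uvw[:, 2:3]                              # divide by depth

# A point 10 m ahead on the optical axis lands at the principal point (600, 180).
uv = project(np.array([[0.0, 0.0, 10.0]]))
```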

Training and testing

For training and testing, you can follow the standard commands in MMDetection to train and test the model.

```bash
# train DfM on KITTI
./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
```

For simple inference and evaluation, you can use the command below:

```bash
# evaluate DfM on KITTI and MV-FCOS3D++ on Waymo
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${CKPT_PATH} --eval mAP
```

FAQ

  • How to use the Waymo LET-AP metric to evaluate the performance of MV-FCOS3D++?

You can follow the instructions for compiling the original Waymo detection metrics to compile this file and obtain the compute_detection_let_metrics_main binary for LET-AP evaluation. You can also refer to the official tutorial on camera-only 3D detection for more details about its Python example code.

Acknowledgement

This codebase is based on MMDet3D and it benefits a lot from LIGA-Stereo.

Citation

```bibtex
@inproceedings{wang2022dfm,
  title={Monocular 3D Object Detection with Depth from Motion},
  author={Wang, Tai and Pang, Jiangmiao and Lin, Dahua},
  year={2022},
  booktitle={European Conference on Computer Vision (ECCV)},
}
@article{wang2022mvfcos3d++,
  title={{MV-FCOS3D++: Multi-View} Camera-Only 4D Object Detection with Pretrained Monocular Backbones},
  author={Wang, Tai and Lian, Qing and Zhu, Chenming and Zhu, Xinge and Zhang, Wenwei},
  journal={arXiv preprint},
  year={2022}
}
```

Owner

  • Name: Tai-Wang
  • Login: Tai-Wang
  • Kind: user
  • Location: Hong Kong SAR
  • Company: MMLab, CUHK

PhD student@MMLab, CUHK

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection3D Contributors"
title: "OpenMMLab's Next-generation Platform for General 3D Object Detection"
date-released: 2020-07-23
url: "https://github.com/open-mmlab/mmdetection3d"
license: Apache-2.0

GitHub Events

Total
  • Watch event: 13
  • Fork event: 1
Last Year
  • Watch event: 13
  • Fork event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 916
  • Total Committers: 63
  • Avg Commits per committer: 14.54
  • Development Distribution Score (DDS): 0.842
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Tai-Wang t****g@o****m 145
zhangwenwei w****w@o****m 143
liyinhao l****o@s****m 71
ChaimZhu z****g@p****n 71
Ziyi Wu d****6@g****m 64
Wenhao Wu 7****u 61
liyinhao S****o@c****m 57
Yezhen Cong 5****z 50
yinchimaoliang l****3@q****m 41
wuyuefeng w****g@s****m 41
xiliu8006 7****6 30
Danila Rukhovich d****h@g****m 22
encore-zhou z****g@s****m 19
dingchang h****r@s****m 10
Xiangxu-0103 x****3@g****m 10
hjin2902 6****2 8
VVsssssk 8****k 8
Shilong Zhang 6****g 6
zhanggefan 3****n 6
Zongbao Feng m****u@g****m 4
Enze Xie J****z@1****m 2
meng-zha z****0@m****n 2
junhaozhang98 3****8 2
congee 3****4 2
WRH 1****i 2
Gopi Krishna Erabati g****1@g****m 2
Vitalii Kudinov v****x@g****m 1
Tianwei Yin y****i@u****u 1
THU17cyz c****1@h****m 1
Subjectivist x****c@1****m 1
and 33 more...

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 28
  • Total pull requests: 0
  • Average time to close issues: 9 days
  • Average time to close pull requests: N/A
  • Total issue authors: 20
  • Total pull request authors: 0
  • Average comments per issue: 2.07
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • wangluyabupt (4)
  • jichaofeng (3)
  • shuxiusuxiu (2)
  • shuowang666 (2)
  • lhiceu (2)
  • zhzhzhzhzhz (1)
  • le-cheng (1)
  • synsin0 (1)
  • Daniel-xsy (1)
  • Merealtea (1)
  • zyrant (1)
  • JunjieLiuSWU (1)
  • vansin (1)
  • PhanThanhTrung (1)
  • YuJiXYZ (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements/docs.txt pypi
  • docutils ==0.16.0
  • m2r *
  • mistune ==0.8.4
  • myst-parser *
  • sphinx ==4.0.2
  • sphinx-copybutton *
  • sphinx_markdown_tables *
requirements/mminstall.txt pypi
  • mmcv-full >=1.4.8,<=1.6.0
  • mmdet >=2.24.0,<=3.0.0
  • mmsegmentation >=0.20.0,<=1.0.0
requirements/optional.txt pypi
  • open3d *
  • spconv *
  • waymo-open-dataset-tf-2-1-0 ==1.2.0
requirements/readthedocs.txt pypi
  • mmcv >=1.4.8
  • mmdet >=2.24.0
  • mmsegmentation >=0.20.1
  • torch *
  • torchvision *
requirements/runtime.txt pypi
  • lyft_dataset_sdk *
  • networkx >=2.2,<2.3
  • numba ==0.53.0
  • numpy *
  • nuscenes-devkit *
  • plyfile *
  • scikit-image *
  • tensorboard *
  • trimesh >=2.35.39,<2.35.40
requirements/tests.txt pypi
  • asynctest * test
  • codecov * test
  • flake8 * test
  • interrogate * test
  • isort * test
  • kwarray * test
  • pytest * test
  • pytest-cov * test
  • pytest-runner * test
  • ubelt * test
  • xdoctest >=0.10.0 test
  • yapf * test