https://github.com/aim-uofa/depth3d
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: aim-uofa
- License: bsd-2-clause
- Language: Python
- Default Branch: master
- Size: 72.1 MB
Statistics
- Stars: 10
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Depth3D: A Model Zoo for Robust Monocular Metric Depth Estimation
Depth3D aims at robust monocular metric depth estimation on zero-shot test images while preserving the geometric accuracy of the unprojected 3D point cloud. We release BEiT-L, ConvNext-L, and Swin2-L models trained on 11.8 million RGB-D samples. The technical report is available at Depth3D/technical_report_v1.1.pdf.
Dataset and Weight Download
Download all the folders (datasets, weights, pretrained_weights, and weights_ablation) and place them under Depth3D/. The download link is as follows:
Baidu Netdisk download link: https://pan.baidu.com/s/1ISB0kOooYz5QMttmHAd5cA?pwd=qr8q (passcode: qr8q)
The Components of Each Folder
```bash
- datasets # The test datasets. We provide kitti, nyu, scannet, 7scenes, diode, eth3d, ibims, and nuscenes.
- pretrained_weights # Pre-trained weights, used to initialize training of the depth models.
  - convnext_large_22k_1k_384.pth # Weight of ConvNext-L pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k at resolution 384x384.
  - dpt_beit_large_512.pt # The highest-quality affine-invariant depth model of BEiT-L trained by MiDaS 3.1 (https://github.com/isl-org/MiDaS/tree/master).
  - dpt_swin2_large_384.pt # The speed-performance trade-off affine-invariant depth model of Swin2-L trained by MiDaS 3.1.
- weights # Our released models, trained on around 12 million RGB-D samples.
  - metricdepth_beit_large_512x512.pth # BEiT-L depth model trained at a resolution of 512x512. Best performance.
  - metricdepth_convnext_large_544x1216.pth # ConvNext-L depth model trained at a resolution of 544x1216.
  - metricdepth_swin2_large_384x384.pth # Swin2-L depth model trained at a resolution of 384x384. Balance between speed and quality.
- weights_ablation # Checkpoints of the ablation study.
  - ablation_full.pth # Full baseline of the ablation study. Trained with 21 datasets, 7 losses, and supervised in the camera canonical space.
  - ablation_data_6_datasets.pth # Compared to ablation_full.pth, trained on only 6 datasets.
  - ablation_data_13_datasets.pth # Compared to ablation_full.pth, trained on only 13 datasets.
  - ablation_loss_l1_sky.pth # Compared to ablation_full.pth, supervised only with "L1Loss" and "SkyRegularizationLoss".
  - ablation_loss_l1_sky_normal.pth # Compared to ablation_full.pth, supervised only with "L1Loss", "SkyRegularizationLoss", "VNLoss", "EdgeguidedNormalLoss", and "PWNPlanesLoss".
  - ablation_wo_lablescalecanonical.pth # Compared to ablation_full.pth, does not transform the camera to the canonical space, which results in unsatisfactory metric depth performance.
```
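After downloading, it can help to confirm that every folder and checkpoint landed where the scripts expect it. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the file names are simply copied from the listing above.

```python
from pathlib import Path

# File names copied from the folder listing above.
# The check itself is an illustrative helper, not a script shipped with Depth3D.
EXPECTED_FILES = {
    "datasets": [],
    "pretrained_weights": [
        "convnext_large_22k_1k_384.pth",
        "dpt_beit_large_512.pt",
        "dpt_swin2_large_384.pt",
    ],
    "weights": [
        "metricdepth_beit_large_512x512.pth",
        "metricdepth_convnext_large_544x1216.pth",
        "metricdepth_swin2_large_384x384.pth",
    ],
    "weights_ablation": [
        "ablation_full.pth",
        "ablation_data_6_datasets.pth",
        "ablation_data_13_datasets.pth",
        "ablation_loss_l1_sky.pth",
        "ablation_loss_l1_sky_normal.pth",
        "ablation_wo_lablescalecanonical.pth",
    ],
}


def find_missing(root):
    """Return the expected folders/files that are absent under `root`."""
    root = Path(root)
    missing = []
    for folder, names in EXPECTED_FILES.items():
        if not (root / folder).is_dir():
            missing.append(folder)  # whole folder is absent
            continue
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains Depth3D/.
    for item in find_missing("Depth3D"):
        print("missing:", item)
```

An empty result means all listed folders and checkpoints were found; the contents of `datasets` are not inspected.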
Reproducing the Results of the Technical Report
To reproduce the results, run the scripts in Depth3D/scripts/technical_report.
For example, to reproduce Table 3, run this command: python scripts/technical_report/run_table3.py. It takes several hours to output the final results.
See scripts/technical_report/README.md for details.
Structure of Code
```python
- Depth3D
  - data_info
    - check_datasets.py
    - pretrained_weight.py # pretrained weight path of backbone.
    - public_datasets.py # path of annotations of diverse datasets.
  - datasets
    - ibims
      - ibims
      - test_annotation.json
    - diode
      - diode
      - test_annotation.json
      - test_annotation_indoor.json
      - test_annotation_outdoor.json
    - ETH3D
      - ETH3D
      - test_annotations.json
    ...
  - demo_data # demo data of technical report.
  - mono
    - configs # configs of training and evaluation.
    - datasets # torch.utils.data.Dataset.
    - model # depth models.
    - tools
    - utils
  - other_tools
  - pretrained_weights # pretrained weights, used for training depth models.
  - scripts # scripts of training and testing
    - ablation
      - test
      - train
    - technical_report # scripts to reproduce the results of technical report.
    - test
    - train
  - show_dirs # output folder of inference.
  - weights # place our trained depth models here.
  - weights_ablation # place our released depth models of ablation study here.
  - work_dirs # output folder of training.
```
Installation
```bash
conda create -n Depth3D python=3.7
conda activate Depth3D
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
pip install -U openmim
mim install mmengine
mim install "mmcv-full==1.3.17"
pip install yapf==0.40.1
```
For 40 Series GPUs
```bash
conda create -n Depth3D python=3.8
conda activate Depth3D
pip install torch==2.0.0 torchvision==0.15.1
pip install -r requirements.txt
pip install -U openmim
mim install mmengine
mim install "mmcv-full==1.7.1"
pip install yapf==0.40.1
```
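Because both environments pin exact versions, a quick check after installation can catch a mismatched package early. This is a rough sketch rather than a repository script: the helper names are hypothetical, the pins are copied from the 40-series commands above (swap in the other set for the default environment), and on Python 3.7 it assumes the `importlib_metadata` backport is installed.

```python
try:
    from importlib import metadata  # Python >= 3.8
except ImportError:
    import importlib_metadata as metadata  # assumed backport for Python 3.7

# Pins copied from the "For 40 Series GPUs" commands above.
PINS_40_SERIES = {
    "torch": "2.0.0",
    "torchvision": "0.15.1",
    "mmcv-full": "1.7.1",
    "yapf": "0.40.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pins(PINS_40_SERIES).items():
        print(f"{package}: expected {expected}, found {found}")
```

The comparison is an exact string match, so a local build tag such as `+cu111` must be included in the pin if the installed wheel reports one.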
Evaluation of a Specific Dataset
Case 1: Metric Depth Estimation of Our Provided Test Datasets
```bash
source scripts/test/beit/test_beit_nyu.sh # change the shell file path if necessary.
```
Case 2: Metric Depth Estimation of Customized Data
To evaluate on a customized dataset, take "demo_data" as an example:
- Generate test_annotation.json of the customized dataset:
```python
dict(
    'files': [
        dict('cam_in': [fx, fy, cx, cy], 'rgb': 'demo_data/rgb/xxx.png', (optional) 'depth': 'demo_data/gt_depth/xxx.npy', (optional) 'depth_scale': 1.0, (optional) 'depth_mask': 'demo_data/gt_depth_mask/xxx.npy'),
        dict('cam_in': [fx, fy, cx, cy], 'rgb': 'demo_data/rgb/xxx.png', (optional) 'depth': 'demo_data/gt_depth/xxx.png', (optional) 'depth_scale': 256.0, (optional) 'depth_mask': null),
        dict('cam_in': [fx, fy, cx, cy], 'rgb': 'demo_data/rgb/xxx.png', (optional) 'depth': 'demo_data/gt_depth/xxx.png', (optional) 'depth_scale': 1000.0, (optional) 'depth_mask': null),
        ...
    ]
)
```
See Depth3D/demo_data/test_annotations.json for details. The depth and depth_scale fields are required if evaluation is expected. The depth_scale is the scale ratio of the stored depth image. The depth_mask is used to filter out invalid depth regions, with 0 marking invalid regions and any other value marking valid regions.
We store relative file paths in the annotation. For example, if the RGB file path is /mnt/nas/share/home/xugk/Depth3D/demo_data/rgb/0016028039294718_b.jpg, we save the relative path demo_data/rgb/0016028039294718_b.jpg and set "$DATA_ROOT" to "/mnt/nas/share/home/xugk/Depth3D/" (see the inference step below).
- Run inference with this script:
```bash
TEST_ANNO_PATH='demo_data/test_annotations.json'
DATA_ROOT='/mnt/nas/share/home/xugk/Depth3D/'
source scripts/inference/inference_metric_depth.sh $TEST_ANNO_PATH $DATA_ROOT
```
If absolute file paths are saved in test_annotation.json, you can pass "$TEST_ANNO_PATH" alone:
```bash
TEST_ANNO_PATH='demo_data/test_annotations_absolute_path.json'
source scripts/inference/inference_metric_depth.sh $TEST_ANNO_PATH
```
- The output depth maps and point clouds are saved in Depth3D/outputs_beit_metric_depth.
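The annotation file from the first step can be generated programmatically. Below is a minimal sketch under stated assumptions: the function `build_annotation` is hypothetical (it is not a repository tool), every image is assumed to share one set of intrinsics, and only the required `cam_in` and `rgb` fields are written, so the result is suitable for inference but not for evaluation.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_annotation(data_root, rgb_dir, cam_in, out_path):
    """Write an annotation JSON listing every image in data_root/rgb_dir.

    Paths are stored relative to `data_root`, matching the convention
    described above. `cam_in` is [fx, fy, cx, cy] and is assumed to be
    identical for all images (an assumption of this sketch).
    """
    data_root = Path(data_root)
    files = []
    for image in sorted((data_root / rgb_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        files.append({
            "cam_in": list(cam_in),
            "rgb": image.relative_to(data_root).as_posix(),
        })
    annotation = {"files": files}
    with open(out_path, "w") as f:
        json.dump(annotation, f, indent=2)
    return annotation
```

For example, `build_annotation('/mnt/nas/share/home/xugk/Depth3D/', 'demo_data/rgb', [fx, fy, cx, cy], 'my_annotation.json')` would produce entries such as `demo_data/rgb/0016028039294718_b.jpg`, to be used together with the matching `$DATA_ROOT`. The output file name here is a placeholder.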
Case 3: Scale-invariant Depth Estimation of Customized Data (In the Wild Images, Unknown Focal Length)
Assuming the path of RGB folder is "demo_data/rgb/":
```bash
RGB_FOLDER='demo_data/rgb/'
source scripts/inference/inference_in_the_wild.sh $RGB_FOLDER
```
The output depth maps and point clouds are saved in Depth3D/outputs_beit_in_the_wild.
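For readers who want to see how a depth map and the intrinsics `[fx, fy, cx, cy]` relate to a point cloud, the sketch below applies the standard pinhole back-projection. It is illustrative only and is not claimed to match the repository's own point cloud export; the function name `unproject_depth` is hypothetical, and pixels with non-positive depth are assumed to be invalid.

```python
import numpy as np


def unproject_depth(depth, cam_in):
    """Back-project an (H, W) depth map into an (N, 3) array of XYZ points.

    Uses the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy,
    where (u, v) are the column and row indices of each pixel.
    """
    fx, fy, cx, cy = cam_in
    height, width = depth.shape
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    valid = depth > 0  # assumption: non-positive depth marks invalid pixels
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

With metric depth the resulting points are in the same metric unit as the depth map. For in-the-wild images with unknown focal length, the intrinsics passed here would themselves be a guess, so the point cloud is only meaningful up to that assumption.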
Dataset Structure
We use the *_annotation.json files to store the camera intrinsic information and the paths of rgb, depth, etc. The data structure is as follows:
```bash
- Taskonomy
  - Taskonomy
    - (optional) meta # stores the pickle files, see 'Format 2' for details
    - rgb
    - depth
    - (optional) sem
    - (optional) normal
  - test_annotation.json # test annotation file
  - train_annotation.json # train annotation file
```
Format 1
The format of *_annotation.json files:
```python
dict(
    'files': [
        dict('cam_in': [fx, fy, cx, cy], 'rgb': 'Taskonomy/rgb/xxx.png', 'depth': 'Taskonomy/depth/xxx.png', (optional) 'sem': 'Taskonomy/sem/xxx.png', (optional) 'normal': 'Taskonomy/norm/xxx.png'),
        dict('cam_in': [fx, fy, cx, cy], 'rgb': 'Taskonomy/rgb/xxx.png', 'depth': 'Taskonomy/depth/xxx.png', (optional) 'sem': 'Taskonomy/sem/xxx.png', (optional) 'normal': 'Taskonomy/norm/xxx.png'),
        dict('cam_in': [fx, fy, cx, cy], 'rgb': 'Taskonomy/rgb/xxx.png', 'depth': 'Taskonomy/depth/xxx.png', (optional) 'sem': 'Taskonomy/sem/xxx.png', (optional) 'normal': 'Taskonomy/norm/xxx.png'),
        ...
    ]
)
```
Format 2
The format of *_annotation.json files:
```python
dict(
    'files': [
        dict('metadata': 'Taskonomy/xxx/xxx.pkl'),
        dict('metadata': 'Taskonomy/xxx/xxx.pkl'),
        dict('metadata': 'Taskonomy/xxx/xxx.pkl'),
        ...
    ]
)
```
The format of 'xxx.pkl':
```python
dict(
    'cam_in': [fx, fy, cx, cy],
    'rgb': 'Taskonomy/rgb/xxx.png',
    'depth': 'Taskonomy/depth/xxx.png',
    (optional) 'sem': 'Taskonomy/sem/xxx.png',
    (optional) 'normal': 'Taskonomy/norm/xxx.png',
)
```
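The two formats differ only in where the per-sample fields live: inline in the JSON (Format 1) or in a pickle file referenced by `metadata` (Format 2). The following sketch shows one way to read both into the same list of dicts. It is not the repository's loader: the function name `load_entries` is hypothetical, and it assumes `metadata` paths are relative to the data root in the same way as the `rgb` and `depth` paths.

```python
import json
import pickle
from pathlib import Path


def load_entries(annotation_path, data_root):
    """Return one dict per sample, regardless of annotation format.

    Format 1 entries already hold 'cam_in', 'rgb', 'depth', ... inline.
    Format 2 entries hold only 'metadata', a path to a pickle file that
    contains those fields (assumed here to be relative to `data_root`).
    """
    with open(annotation_path) as f:
        annotation = json.load(f)

    entries = []
    for item in annotation["files"]:
        if "metadata" in item:  # Format 2: follow the pickle reference
            with open(Path(data_root) / item["metadata"], "rb") as f:
                item = pickle.load(f)
        entries.append(item)
    return entries
```

Only unpickle metadata files from sources you trust, since `pickle.load` can execute arbitrary code.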
Demos
Monocular Depth Estimation
Unprojected 3D Point Cloud
🎫 License
For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.
Owner
- Name: Advanced Intelligent Machines (AIM)
- Login: aim-uofa
- Kind: organization
- Location: China
- Repositories: 23
- Profile: https://github.com/aim-uofa
A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- DateTime *
- HTML4Vision *
- Pillow *
- h5py *
- imagecorruptions *
- imgaug *
- iopath *
- matplotlib *
- numpy *
- opencv-python *
- plyfile *
- tabulate *
- tensorboardX *
- termcolor *
- timm ==0.6.13