https://github.com/brianpugh/raft-stereo
Repository
Basic Info
- Host: GitHub
- Owner: BrianPugh
- License: mit
- Language: Python
- Default Branch: main
- Size: 663 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of princeton-vl/RAFT-Stereo
Created almost 5 years ago
· Last pushed almost 5 years ago
https://github.com/BrianPugh/RAFT-Stereo/blob/main/
# RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching
This repository contains the source code for our paper:
[RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching](https://arxiv.org/pdf/2109.07547.pdf)
Lahav Lipson, Zachary Teed and Jia Deng
## Requirements
The code has been tested with PyTorch 1.7 and CUDA 10.2.
```Shell
conda env create -f environment.yaml
conda activate raftstereo
```
## Required Data
To evaluate or train RAFT-Stereo, you will need to download the required datasets.
* [Sceneflow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html#:~:text=on%20Academic%20Torrents-,FlyingThings3D,-Driving) (includes FlyingThings3D, Driving & Monkaa)
* [Middlebury](https://vision.middlebury.edu/stereo/data/)
* [ETH3D](https://www.eth3d.net/datasets#low-res-two-view-test-data)
* [KITTI](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)
To download the ETH3D and Middlebury test datasets for the [demos](#demos), run
```Shell
chmod ug+x download_datasets.sh && ./download_datasets.sh
```
By default, `stereo_datasets.py` searches for the datasets in the locations shown below. If the datasets were downloaded elsewhere, create symbolic links to them inside the `datasets` folder:
```Shell
datasets
    FlyingThings3D
        frames_cleanpass
        frames_finalpass
        disparity
    Monkaa
        frames_cleanpass
        frames_finalpass
        disparity
    Driving
        frames_cleanpass
        frames_finalpass
        disparity
    KITTI
        testing
        training
        devkit
    Middlebury
        MiddEval3
    ETH3D
        lakeside_1l
        ...
        tunnel_3s
```
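The symbolic links can also be created from Python. The following is a minimal sketch, not part of this repository: the helper name `link_datasets` and the `download_root` argument are hypothetical, and it assumes the downloaded folders already use the names from the layout above.

```python
from pathlib import Path

# Top-level folder names taken from the layout above.
EXPECTED = ["FlyingThings3D", "Monkaa", "Driving", "KITTI", "Middlebury", "ETH3D"]


def link_datasets(download_root, datasets_dir="datasets"):
    """Symlink each expected dataset folder found under download_root
    into datasets_dir and return the names that were linked.

    Hypothetical helper: assumes the downloaded folders are already
    named as in the layout above.
    """
    download_root = Path(download_root).resolve()
    datasets_dir = Path(datasets_dir)
    datasets_dir.mkdir(parents=True, exist_ok=True)

    linked = []
    for name in EXPECTED:
        source = download_root / name
        target = datasets_dir / name
        if not source.is_dir():
            continue  # this dataset has not been downloaded
        if target.exists() or target.is_symlink():
            continue  # leave existing folders or links untouched
        target.symlink_to(source, target_is_directory=True)
        linked.append(name)
    return linked
```

For example, `link_datasets("/path/to/downloads")` run from the repository root would populate `datasets/` with links for whichever of the six folders exist under the given path.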
## Demos
Pretrained models can be downloaded by running
```Shell
chmod ug+x download_models.sh && ./download_models.sh
```
or downloaded from [Google Drive](https://drive.google.com/drive/folders/1booUFYEXmsdombVuglatP0nZXb5qI89J).
You can demo a trained model on pairs of images. To predict disparity for Middlebury, run
```Shell
python demo.py --restore_ckpt models/raftstereo-sceneflow.pth
```
Or for ETH3D:
```Shell
python demo.py --restore_ckpt models/raftstereo-eth3d.pth -l=datasets/ETH3D/*/im0.png -r=datasets/ETH3D/*/im1.png
```
Using our fastest model:
```Shell
python demo.py --restore_ckpt models/raftstereo-realtime.pth --shared_backbone --n_downsample 3 --n_gru_layers 2 --slow_fast_gru
```
To save the disparity values as `.npy` files, run any of the demos with the `--save_numpy` flag.
## Converting Disparity to Depth
If the camera focal length and camera baseline are known, disparity predictions can be converted to depth values using `depth = focal_length * baseline / disparity`.
Note that the focal length is in _pixels_, not millimeters. The resulting depth is in the same units as the baseline.
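As a minimal sketch of this conversion, the snippet below applies the standard pinhole stereo relation (depth = focal length × baseline / disparity) to a single value. Both function names are hypothetical and not part of this repository. The code assumes the disparity is a positive magnitude in pixels; if the saved predictions use a negative sign convention, take the magnitude first. The millimeter-to-pixel helper additionally assumes the sensor width is known.

```python
def focal_length_mm_to_px(focal_length_mm, sensor_width_mm, image_width_px):
    """Convert a focal length from millimeters to pixels.

    Assumes square pixels and that the image spans the full sensor width.
    """
    return focal_length_mm * image_width_px / sensor_width_mm


def disparity_to_depth(disparity_px, focal_length_px, baseline):
    """Return depth in the same units as `baseline`.

    `disparity_px` is assumed to be a non-negative magnitude in pixels.
    """
    if disparity_px < 0:
        raise ValueError("expected a non-negative disparity magnitude")
    if disparity_px == 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_length_px * baseline / disparity_px


# Example: 4 mm lens, 8 mm wide sensor, 2000 px wide image, 0.5 m baseline.
focal_px = focal_length_mm_to_px(4.0, 8.0, 2000)     # 1000.0 px
depth_m = disparity_to_depth(100.0, focal_px, 0.5)   # 5.0 m
```

For a full disparity map, the same expression can be applied elementwise to the array loaded from a `.npy` file.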
## Evaluation
To evaluate a trained model on a validation set (e.g. Middlebury), run
```Shell
python evaluate_stereo.py --restore_ckpt models/raftstereo-middlebury.pth --dataset middlebury_H
```
## Training
Our model is trained on two RTX-6000 GPUs using the following command. Training logs are written to `runs/` and can be visualized using TensorBoard.
```Shell
python train_stereo.py --batch_size 8 --train_iters 22 --valid_iters 32 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2 --num_steps 200000 --mixed_precision
```
To train using significantly less memory, change `--n_downsample 2` to `--n_downsample 3`. This will slightly reduce accuracy.
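To get a rough feel for why this helps, the sketch below counts the elements in a single correlation volume. It is an estimate under stated assumptions rather than a measurement: it assumes features are computed at 1/2^`n_downsample` of the input resolution and that the volume stores one value per pair of pixels on the same row (H × W × W at feature resolution). It ignores everything else that uses GPU memory, and the function name is hypothetical.

```python
def correlation_volume_elements(height, width, n_downsample):
    """Element count of an H x W x W volume at 1/2**n_downsample resolution.

    Rough estimate only; assumes one correlation value per pair of
    pixels that share a row in the downsampled feature maps.
    """
    scale = 2 ** n_downsample
    h, w = height // scale, width // scale
    return h * w * w


# Example for a 512 x 1024 input image.
at_2 = correlation_volume_elements(512, 1024, 2)  # 128 * 256 * 256
at_3 = correlation_volume_elements(512, 1024, 3)  # 64 * 128 * 128
print(at_2 // at_3)  # ratio between the two settings
```

Under these assumptions, each extra level of downsampling halves all three dimensions of the volume, so its size shrinks by a factor of eight.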
## (Optional) Faster Implementation
We provide a faster CUDA implementation of the correlation volume which works with mixed precision feature maps.
```Shell
cd sampler && python setup.py install && cd ..
```
Running `demo.py`, `train_stereo.py` or `evaluate_stereo.py` with `--corr_implementation reg_cuda` together with `--mixed_precision` will speed up the model without impacting performance.
To significantly decrease memory consumption on high-resolution images, use `--corr_implementation alt`. This implementation is slower than the default, however.