https://github.com/compvis/geometry-free-view-synthesis

Is a geometric model required to synthesize novel views from a single image?

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links: Links to: arxiv.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.5%) to scientific vocabulary

Keywords

novel-view-synthesis transformers

Last synced: 5 months ago · JSON representation

Repository

Is a geometric model required to synthesize novel views from a single image?

Basic Info

Host: GitHub
Owner: CompVis
License: mit
Language: Python
Default Branch: master
Homepage: https://arxiv.org/abs/2104.07652
Size: 162 MB

Statistics

Stars: 381
Watchers: 26
Forks: 35
Open Issues: 11
Releases: 0

Topics

novel-view-synthesis transformers

Created almost 5 years ago · Last pushed almost 3 years ago

Metadata Files

Readme License

Geometry-Free View Synthesis: Transformers and no 3D Priors

Geometry-Free View Synthesis: Transformers and no 3D Priors
Robin Rombach*, Patrick Esser*, Björn Ommer
* equal contribution

arXiv | BibTeX | Colab

Interactive Scene Exploration Results

RealEstate10K:

Videos: short (2min) / long (12min)

ACID:

Videos: short (2min) / long (9min)

Demo

For a quickstart, you can try the Colab demo, but for a smoother experience we recommend installing the local demo as described below.

Installation

The demo requires building a PyTorch extension. If you have a sane development environment with PyTorch, g++ and nvcc, you can simply run

pip install git+https://github.com/CompVis/geometry-free-view-synthesis#egg=geometry-free-view-synthesis

If you run into problems and have a GPU with compute capability below 8, you can also use the provided conda environment:

git clone https://github.com/CompVis/geometry-free-view-synthesis
conda env create -f geometry-free-view-synthesis/environment.yaml
conda activate geofree
pip install geometry-free-view-synthesis/

Running

After installation, running

braindance.py

will start the demo on a sample scene. Explore the scene interactively using the WASD keys to move and arrow keys to look around. Once positioned, hit the space bar to render the novel view with GeoGPT.

You can move again with the WASD keys. Mouse control can be activated with the m key. Run braindance.py <folder to select image from/path to image> to run the demo on your own images. By default, it uses the re-impl-nodepth model (trained on RealEstate without explicit transformation and without depth input), which can be changed with the --model flag. The corresponding checkpoints are downloaded the first time they are required. Specify an output path using --video path/to/vid.mp4 to record a video.

```
braindance.py -h
usage: braindance.py [-h] [--model {reimplnodepth,reimpldepth,acimplnodepth,acimpldepth}]
                     [--video [VIDEO]]
                     [path]

What's up, BD-maniacs?

key(s)       action
wasd         move around
arrows       look around
m            enable looking with mouse
space        render with transformer
q            quit

positional arguments:
  path                  path to image or directory from which to select image.
                        Default example is used if not specified.

optional arguments:
  -h, --help            show this help message and exit
  --model {reimplnodepth,reimpldepth,acimplnodepth,acimpldepth}
                        pretrained model to use.
  --video [VIDEO]       path to write video recording to. (no recording if
                        unspecified).
```
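The options described above can be combined into a single invocation. The following is a minimal sketch of a helper that assembles the argument list; `build_demo_command` is a hypothetical name and not part of the repository, and the model names are copied as they appear in the help text above, so check them against `braindance.py -h` on your installation.

```python
import shlex
from typing import List, Optional

# Model names copied verbatim from the help text above (verify with -h).
MODELS = ("reimplnodepth", "reimpldepth", "acimplnodepth", "acimpldepth")


def build_demo_command(path: Optional[str] = None,
                       model: Optional[str] = None,
                       video: Optional[str] = None) -> List[str]:
    """Assemble an argument list for braindance.py (hypothetical helper)."""
    cmd = ["braindance.py"]
    if model is not None:
        if model not in MODELS:
            raise ValueError(f"unknown model: {model!r}")
        cmd += ["--model", model]
    if video is not None:
        # Without --video, the demo does not record anything.
        cmd += ["--video", video]
    if path is not None:
        # Image file or directory; the default example is used if omitted.
        cmd.append(path)
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command line for inspection.
    print(shlex.join(build_demo_command("my_images/", "acimplnodepth", "out/vid.mp4")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the package is installed.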

Training

Data Preparation

We support training on RealEstate10K and ACID. Both come in the same format as described here and the preparation is the same for both of them. You will need to have colmap installed and available on your $PATH.

We assume that you have extracted the .txt files of the dataset you want to prepare into $TXT_ROOT, e.g. for RealEstate:

```
tree $TXT_ROOT
├── test
│   ├── 000c3ab189999a83.txt
│   ├── ...
│   └── fff9864727c42c80.txt
└── train
    ├── 0000cc6d8b108390.txt
    ├── ...
    └── ffffe622a4de5489.txt
```

and that you have downloaded the frames (we downloaded them in resolution 640 x 360) into $IMG_ROOT, e.g. for RealEstate:

```
tree $IMG_ROOT
├── test
│   ├── 000c3ab189999a83
│   │   ├── 45979267.png
│   │   ├── ...
│   │   └── 55255200.png
│   ├── ...
│   ├── 0017ce4c6a39d122
│   │   ├── 40874000.png
│   │   ├── ...
│   │   └── 48482000.png
├── train
│   ├── ...
```
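Before running the preparation it can help to confirm that the two directory trees line up. The sketch below assumes the layout shown above (one `.txt` file per sequence and one frame directory with the same name); `missing_frame_dirs` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path
from typing import List


def missing_frame_dirs(txt_root: str, img_root: str, split: str) -> List[str]:
    """Return sequence ids that have a .txt file but no frame directory."""
    txt_dir = Path(txt_root) / split
    img_dir = Path(img_root) / split
    missing = []
    for txt_file in sorted(txt_dir.glob("*.txt")):
        seq_id = txt_file.stem  # e.g. 000c3ab189999a83
        if not (img_dir / seq_id).is_dir():
            missing.append(seq_id)
    return missing
```

An empty result for each split suggests that every sequence listed in `$TXT_ROOT` has downloaded frames in `$IMG_ROOT`.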

To prepare the $SPLIT split of the dataset ($SPLIT being one of train, test for RealEstate and train, test, validation for ACID) in $SPA_ROOT, run the following within the scripts directory:

python sparse_from_realestate_format.py --txt_src ${TXT_ROOT}/${SPLIT} --img_src ${IMG_ROOT}/${SPLIT} --spa_dst ${SPA_ROOT}/${SPLIT}

You can also simply set TXT_ROOT, IMG_ROOT and SPA_ROOT as environment variables and run ./sparsify_realestate.sh or ./sparsify_acid.sh. Take a look into the sources to run with multiple workers in parallel.
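As an illustration of the per-split invocation, here is a small sketch that builds one `sparse_from_realestate_format.py` command per split, using only the flags and split names given above. The helper name `sparsify_commands` is an assumption for this example and does not replace the provided shell scripts.

```python
from typing import Dict, List, Tuple

# Splits per dataset, as stated in the text above.
SPLITS: Dict[str, Tuple[str, ...]] = {
    "realestate": ("train", "test"),
    "acid": ("train", "test", "validation"),
}


def sparsify_commands(dataset: str, txt_root: str, img_root: str,
                      spa_root: str) -> List[List[str]]:
    """Build one preparation command per split (hypothetical helper)."""
    commands = []
    for split in SPLITS[dataset]:
        commands.append([
            "python", "sparse_from_realestate_format.py",
            "--txt_src", f"{txt_root}/{split}",
            "--img_src", f"{img_root}/{split}",
            "--spa_dst", f"{spa_root}/{split}",
        ])
    return commands
```

Each command is meant to be run from within the scripts directory, for example with `subprocess.run(cmd, cwd="scripts", check=True)`.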

Finally, symlink $SPA_ROOT to data/realestate_sparse (for RealEstate10K) or data/acid_sparse (for ACID).
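The symlink step can be sketched in Python as follows. The helper `link_sparse_root` and its `repo_root` parameter are assumptions for illustration; the two link locations are the ones named above.

```python
from pathlib import Path

# Link locations named in the text above, relative to the repository root.
TARGETS = {
    "realestate": "data/realestate_sparse",
    "acid": "data/acid_sparse",
}


def link_sparse_root(spa_root: str, dataset: str, repo_root: str = ".") -> Path:
    """Create the data/<dataset>_sparse symlink pointing at spa_root."""
    link = Path(repo_root) / TARGETS[dataset]
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # replace a stale link from an earlier run
    link.symlink_to(Path(spa_root).resolve(), target_is_directory=True)
    return link
```

If a real directory already exists at the link location, `symlink_to` raises `FileExistsError` rather than overwriting it.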

First Stage Models

As described in our paper, we train the transformer models in a compressed, discrete latent space of pretrained VQGANs. These pretrained models can be conveniently downloaded by running python scripts/download_vqmodels.py which will also create symlinks ensuring that the paths specified in the training configs (see configs/*) exist. In case some of the models have already been downloaded, the script will only create the symlinks.
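To illustrate what a discrete latent space means here, the following toy sketch maps continuous vectors to the index of their nearest codebook entry, which is the basic idea behind vector quantization. It is not the repository's VQGAN code; the codebook, the latent vectors and the function name `quantize` are made up for the example.

```python
from typing import List, Sequence


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each vector to the index of its nearest codebook entry (squared L2)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in vectors
    ]


# Toy data: three codebook entries and three latent vectors (illustrative only).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
latents = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]
tokens = quantize(latents, codebook)
print(tokens)  # [0, 1, 2]
```

A transformer trained in such a space operates on the resulting sequence of integer tokens instead of on pixels.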

For training custom first stage models, we refer to the taming transformers repository.

Running the Training

Once the data and the first stage models are prepared, the experiments on ACID and RealEstate10K described in our paper can be reproduced by running python geofree/main.py --base configs/<dataset>/<dataset>_13x23_<experiment>.yaml -t --gpus 0, where <dataset> is one of realestate/acid and <experiment> is one of expl_img/expl_feat/expl_emb/impl_catdepth/impl_depth/impl_nodepth/hybrid. These abbreviations correspond to the experiments listed in the following table (see also Fig. 2 in the main paper):

variants

Note that each experiment was conducted on a GPU with 40 GB VRAM.
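The config naming scheme above can be expanded programmatically. This sketch builds the config path and the training command from the dataset and experiment names listed in this section; `config_path` and `training_command` are hypothetical helpers. The default `gpus="0,"` keeps the trailing comma on the assumption that the command follows the PyTorch Lightning convention in which `0,` selects GPU index 0.

```python
from typing import List

DATASETS = ("realestate", "acid")
EXPERIMENTS = ("expl_img", "expl_feat", "expl_emb", "impl_catdepth",
               "impl_depth", "impl_nodepth", "hybrid")


def config_path(dataset: str, experiment: str) -> str:
    """Expand configs/<dataset>/<dataset>_13x23_<experiment>.yaml."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    return f"configs/{dataset}/{dataset}_13x23_{experiment}.yaml"


def training_command(dataset: str, experiment: str, gpus: str = "0,") -> List[str]:
    """Build the training command (trailing comma in gpus is an assumption)."""
    return ["python", "geofree/main.py",
            "--base", config_path(dataset, experiment),
            "-t", "--gpus", gpus]
```

For example, `training_command("acid", "hybrid")` refers to `configs/acid/acid_13x23_hybrid.yaml`.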

BibTeX

@misc{rombach2021geometryfree,
      title={Geometry-Free View Synthesis: Transformers and no 3D Priors},
      author={Robin Rombach and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2104.07652},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

Name: CompVis - Computer Vision and Learning LMU Munich
Login: CompVis
Kind: organization
Email: assist.mvl@lrz.uni-muenchen.de
Location: Germany

Website: https://ommer-lab.com/
Repositories: 33
Profile: https://github.com/CompVis

Computer Vision and Learning research group at Ludwig Maximilian University of Munich (formerly Computer Vision Group at Heidelberg University)

GitHub Events

Total

Watch event: 4
Issue comment event: 1

Last Year

Watch event: 4
Issue comment event: 1

Committers

Last synced: 9 months ago

All Time

Total Commits: 9
Total Committers: 1
Avg Commits per committer: 9.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Patrick Esser	P**r@g**t	9

Committer Domains (Top 20 + Academic)

gmx.net: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 19
Total pull requests: 3
Average time to close issues: 1 day
Average time to close pull requests: N/A
Total issue authors: 14
Total pull request authors: 2
Average comments per issue: 1.11
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

zimingzhong (2)
chris-aeviator (2)
lalalune (1)
AK391 (1)
avihu111 (1)
DRealArun (1)
hytseng0509 (1)
RMalikM (1)
xiaofanustc (1)
volkancirik (1)
Devetec (1)
hameleon-ed (1)
alextrevithick (1)
123yuyu (1)

Pull Request Authors

amrzv (1)

Top Labels

Issue Labels

Pull Request Labels

Dependencies

setup.py pypi

einops *
imageio *
imageio-ffmpeg *
importlib-resources *
numpy *
omegaconf >=2.0.0
pygame *
pytorch-lightning >=1.0.8
splatting *
test-tube *
torch *
torchvision *
tqdm *

environment.yaml pypi
