vhap

A complete head tracking pipeline from videos to NeRF/3DGS-ready datasets.

https://github.com/shenhanqian/vhap

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.7%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: ShenhanQian
License: other
Language: Python
Default Branch: main
Homepage:
Size: 48.1 MB

Statistics

Stars: 270
Watchers: 5
Forks: 34
Open Issues: 13
Releases: 2

Created almost 2 years ago · Last pushed 12 months ago

Metadata Files

Readme License Citation

VHAP: Versatile Head Alignment with Adaptive Appearance Priors

Update

[2025-07-08] Achieved ~3x acceleration for monocular videos by batching frames (batch_size=16). Use a lower batch_size if tracking drifts, and set it to 1 for the original behavior.
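To illustrate what batching frames means in this context, the minimal sketch below groups frame indices into chunks of at most `batch_size`. It is not taken from the VHAP code base: the helper name `batch_frames` is hypothetical, and the tracker's actual batching logic may differ.

```python
def batch_frames(num_frames, batch_size=16):
    """Yield lists of consecutive frame indices, at most batch_size each.

    Hypothetical helper for illustration only; not part of VHAP.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, num_frames, batch_size):
        yield list(range(start, min(start + batch_size, num_frames)))


# Example: a 40-frame clip split with the default batch size.
batches = list(batch_frames(40, batch_size=16))
print([len(b) for b in batches])
```

With `batch_size=1` every batch holds a single frame, which corresponds to processing frames one at a time.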

TL;DR

A photometric optimization pipeline based on differentiable mesh rasterization, applied to human head alignment.
A perturbation mechanism that implicitly extracts and injects regional appearance priors adaptively during rendering, enabling alignment of regions purely based on their appearance consistency, such as the hair, ears, neck, and shoulders, where no pre-defined landmarks are available.
The exported tracking results can be directly used to create your own GaussianAvatars.

License

This work is made available under CC-BY-NC-SA-4.0. The repository is derived from the multi-view head tracker of GaussianAvatars, which is subject to the following statement:

Toyota Motor Europe NV/SA and its affiliated companies retain all intellectual property and proprietary rights in and to this software and related documentation. Any commercial use, reproduction, disclosure or distribution of this software and related documentation without an express license agreement from Toyota Motor Europe NV/SA is strictly prohibited.

On top of the original repository, we add support for monocular videos and provide a complete set of scripts from video preprocessing to result export for NeRF/3DGS-style applications.

Setup

```shell
git clone git@github.com:ShenhanQian/VHAP.git
cd VHAP

conda create --name VHAP -y python=3.10
conda activate VHAP

# Install CUDA and ninja for compilation
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit ninja cmake  # use the right CUDA version
ln -s "$CONDA_PREFIX/lib" "$CONDA_PREFIX/lib64"  # to avoid error "/usr/bin/ld: cannot find -lcudart"
conda env config vars set CUDA_HOME=$CONDA_PREFIX  # for compilation

# Install PyTorch (make sure the CUDA version matches the toolkit installed above)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# or
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
# make sure torch.cuda.is_available() returns True

pip install -e .
```
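The check that `torch.cuda.is_available()` returns True can be scripted. The sketch below is a small convenience wrapper, not part of VHAP; the function name `cuda_status` is hypothetical. It only relies on `torch.cuda.is_available()`, `torch.__version__`, and `torch.version.cuda`, and it degrades gracefully when PyTorch is not installed.

```python
import importlib.util


def cuda_status():
    """Describe whether PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version PyTorch was built against.
        return f"CUDA available: torch {torch.__version__}, built for CUDA {torch.version.cuda}"
    return f"torch {torch.__version__} is installed, but CUDA is not available"


print(cuda_status())
```

If the message reports that CUDA is not available, the PyTorch build most likely does not match the installed CUDA toolkit.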

[!NOTE]
- We use an adjusted version of nvdiffrast for backface-culling. If you have installed another version before, you can reinstall as follows:
```shell
pip install nvdiffrast@git+https://github.com/ShenhanQian/nvdiffrast@backface-culling --force-reinstall
rm -r ~/.cache/torch_extensions/*/nvdiffrast*
```
- We use STAR for landmark detection by default. Alternatively, face-alignment is faster but less accurate.

Download

FLAME

Our code relies on FLAME. Please download assets from the official website and store them in the paths below:

FLAME 2023 (versions w/ jaw rotation) -> asset/flame/flame2023.pkl
FLAME Vertex Masks -> asset/flame/FLAME_masks.pkl

[!NOTE] It is possible to use FLAME 2020 by downloading it to asset/flame/generic_model.pkl. The FLAME_MODEL_PATH in flame.py needs to be updated accordingly.
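Before running the tracker, it can help to confirm that the assets are where the code expects them. The following is a minimal sketch that checks the two paths listed above, relative to the repository root. The helper `missing_assets` is hypothetical and not part of VHAP; if you use FLAME 2020 instead, swap in asset/flame/generic_model.pkl.

```python
from pathlib import Path

# Paths taken from the list above, relative to the repository root.
REQUIRED_ASSETS = [
    Path("asset/flame/flame2023.pkl"),
    Path("asset/flame/FLAME_masks.pkl"),
]


def missing_assets(root="."):
    """Return the required asset paths that do not exist under root."""
    root = Path(root)
    return [p for p in REQUIRED_ASSETS if not (root / p).is_file()]


missing = missing_assets()
if missing:
    print("Missing FLAME assets:", ", ".join(str(p) for p in missing))
else:
    print("All FLAME assets found.")
```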

Video Data

Multiview

To get access to the NeRSemble dataset, please request it via the Google Form. The directory structure is expected to be like this.

[!NOTE] The NeRSemble dataset has been updated to Version 2. Its folder structure and color correction algorithm differ from those in Version 1, so please be careful not to confuse the two.

Monocular

We use monocular video sequences following INSTA. You can download raw videos from LRZ.

Usage

Monocular

For Monocular Videos

Multiview

For NeRSemble Dataset

For NeRSemble Dataset V2

Discussions

Photometric alignment is versatile but sometimes sensitive.

Texture map regularization: Our method relies on a total-variation regularization on the texture map. Its loss weight is by default 1e4 for a monocular video and 1e5 for the NeRSemble dataset (16 views). For your own multi-view dataset with fewer views, you should lower the regularization by passing --w.reg_tex_tv 1e4 or 3e4. Otherwise, you may encounter corrupted shapes and blurry textures similar to https://github.com/ShenhanQian/VHAP/issues/10#issue-2558743737 and https://github.com/ShenhanQian/VHAP/issues/6#issue-2524833245.
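For readers unfamiliar with total variation, the sketch below computes it for a small single-channel grid and scales it by a loss weight. It assumes the anisotropic L1 form (sum of absolute differences between neighboring texels); the exact formulation and normalization used by VHAP are not stated here and may differ. The names `total_variation` and `weight` are illustrative only.

```python
def total_variation(tex):
    """Anisotropic L1 total variation of a 2D grid given as a list of rows."""
    h, w = len(tex), len(tex[0])
    # Differences between vertically adjacent texels.
    dv = sum(abs(tex[y + 1][x] - tex[y][x]) for y in range(h - 1) for x in range(w))
    # Differences between horizontally adjacent texels.
    dh = sum(abs(tex[y][x + 1] - tex[y][x]) for y in range(h) for x in range(w - 1))
    return dv + dh


flat = [[0.5] * 4 for _ in range(4)]                          # constant texture
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]  # high-frequency texture

weight = 1e4  # illustrative value, matching the monocular default quoted above
print(weight * total_variation(flat), weight * total_variation(checker))
```

A constant texture has zero total variation, while high-frequency detail is penalized. This is why a weight that is too large for the number of views can push the texture toward blur.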

Color affinity: If the color of a point on the foreground contour is too close to the background, the static_offset can go wild. You may try a different background color with --data.background_color white or --data.background_color black. You can also disable static_offset with --model.no_use_static_offset.
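One way to reason about which background color to try is to compare it against colors sampled along the foreground contour. The sketch below is a heuristic for illustration, not something VHAP does: the helper `pick_background` and the sample colors are hypothetical. It picks the background (white or black) whose closest contour color is farthest away.

```python
def color_distance(a, b):
    """Euclidean distance between two RGB colors with channels in [0, 1]."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


BACKGROUNDS = {"white": (1.0, 1.0, 1.0), "black": (0.0, 0.0, 0.0)}


def pick_background(contour_colors):
    """Pick the background whose closest contour color is farthest away."""
    def worst_case(name):
        return min(color_distance(c, BACKGROUNDS[name]) for c in contour_colors)

    return max(BACKGROUNDS, key=worst_case)


# Hypothetical colors sampled along the foreground contour (light hair, skin).
samples = [(0.9, 0.85, 0.8), (0.8, 0.6, 0.5)]
print(pick_background(samples))
```

The returned name can then be passed to `--data.background_color`.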

Occlusion: When the neck is occluded by collars, the photometric gradients may squeeze and stretch the neck into unnatural shapes. This problem can usually be relieved by disabling photometric alignment in certain regions. We hard-coded the occlusion status for some subjects in the NeRSemble dataset in the occluded_table. You can extend the table or temporarily override it with, e.g., --model.occluded neck_lower boundary.

Limited degrees of freedom: Another limitation comes from the FLAME model. FLAME is great since it covers the whole head and neck. However, there is only one joint for the neck, between the neck and the head. This means the lower part of the neck cannot move relative to the torso, which limits the model's ability to capture large head movements. For example, it is very hard to achieve good alignment of the lower neck and the head at the same time for the EXP-1-head sequence in the NeRSemble dataset because of this lack of degrees of freedom.

You are welcome to report more failure cases and help us improve the tracker.

Interactive Viewers

Our method relies on vertex masks defined on FLAME. We add custom masks to enrich the original ones. You can play with regions in our FLAME Editor to see what each mask looks like.

```shell
python vhap/flame_editor.py
```

We also provide a FLAME viewer for you to interact with a tracked sequence.

```shell
python vhap/flame_viewer.py \
--param_path output/nersemble/074_EMO-1_v16_DS4_wBg_staticOffset/2024-09-09_15-49-02/tracked_flame_params_30.npz
```

Optionally, you can enable colored rendering by specifying a texture image with --tex_path.

For both viewers, you can switch to flat shading with --no-shade-smooth.
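If you want to see what a tracked `.npz` file contains before opening it in the viewer, the sketch below lists the stored array names using only the standard library. It relies on the fact that an `.npz` file is a zip archive with one `.npy` entry per array. The helper `list_npz_arrays` is hypothetical, the file name is a placeholder to adjust, and no assumption is made about which keys VHAP writes.

```python
import zipfile
from pathlib import Path


def list_npz_arrays(path):
    """Return the array names stored in an .npz file without loading the data."""
    with zipfile.ZipFile(path) as archive:
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )


# Placeholder path: point this at your own tracking output.
param_path = Path("tracked_flame_params_30.npz")
if param_path.is_file():
    print(list_npz_arrays(param_path))
```

With NumPy installed, `numpy.load(path).files` gives the same list and lets you access the arrays themselves.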

Cite

Please kindly cite our repository and preceding paper if you find our software or algorithm useful for your research.

```bibtex
@misc{qian2024vhap,
  title={VHAP: Versatile Head Alignment with Adaptive Appearance Priors},
  author={Qian, Shenhan},
  year={2024},
  month={sep},
  doi={10.5281/zenodo.14988309},
  url={https://github.com/ShenhanQian/VHAP}
}
```

```bibtex
@inproceedings{qian2024gaussianavatars,
  title={Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians},
  author={Qian, Shenhan and Kirschstein, Tobias and Schoneveld, Liam and Davoli, Davide and Giebenhain, Simon and Nie{\ss}ner, Matthias},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={20299--20309},
  year={2024}
}
```

Owner

Name: Shenhan Qian
Login: ShenhanQian
Kind: user
Company: Technical University of Munich

Website: https://shenhanqian.com
Repositories: 5
Profile: https://github.com/ShenhanQian

GitHub Events

Total

Create event: 2
Release event: 2
Issues event: 66
Watch event: 199
Issue comment event: 143
Push event: 23
Pull request event: 5
Fork event: 26

Last Year

Create event: 2
Release event: 2
Issues event: 66
Watch event: 199
Issue comment event: 143
Push event: 23
Pull request event: 5
Fork event: 26

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 24
Total pull requests: 3
Average time to close issues: 8 days
Average time to close pull requests: about 4 hours
Total issue authors: 20
Total pull request authors: 3
Average comments per issue: 2.13
Average comments per pull request: 0.33
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 24
Pull requests: 3
Average time to close issues: 8 days
Average time to close pull requests: about 4 hours
Issue authors: 20
Pull request authors: 3
Average comments per issue: 2.13
Average comments per pull request: 0.33
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

Jp-17 (5)
EvdoTheo (3)
zhoutianyang2002 (2)
GeneralWhite (2)
jiusigua (1)
ddxsg24 (1)
zjwfufu (1)
longyangqi (1)
rorrewang (1)
syncanimation (1)
trThanhnguyen (1)
ZhengyuDiao (1)
I-hsin-Chen (1)
zhang-zrq (1)
19010464 (1)

Pull Request Authors

chakri1804 (1)
seva100 (1)
jonathsch (1)
rensux (1)

Dependencies

pyproject.toml pypi

BackgroundMattingV2 @git+https://github.com/ShenhanQian/BackgroundMattingV2
STAR @git+https://github.com/ShenhanQian/STAR/
chumpy *
dlib *
face-alignment *
face-detection-tflite *
ffmpeg-python *
gdown *
matplotlib ==3.8.0
numpy ==1.22.3
nvdiffrast @git+https://github.com/ShenhanQian/nvdiffrast@backface-culling
opencv-python *
pandas *
pillow *
pytorch3d @git+https://github.com/facebookresearch/pytorch3d.git
pyyaml *
scipy *
tensorboard *
torch *
torchvision *
trimesh *
tyro *
