marlin
[CVPR] MARLIN: Masked Autoencoder for facial video Representation LearnINg
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.2%) to scientific vocabulary
Keywords
Repository
[CVPR] MARLIN: Masked Autoencoder for facial video Representation LearnINg
Basic Info
- Host: GitHub
- Owner: ControlNet
- License: other
- Language: Python
- Default Branch: master
- Homepage: https://openaccess.thecvf.com/content/CVPR2023/html/Cai_MARLIN_Masked_Autoencoder_for_Facial_Video_Representation_LearnINg_CVPR_2023_paper
- Size: 21.5 MB
Statistics
- Stars: 255
- Watchers: 9
- Forks: 26
- Open Issues: 10
- Releases: 2
Topics
Metadata Files
README.md
MARLIN: Masked Autoencoder for facial video Representation LearnINg
This repo is the official PyTorch implementation for the paper MARLIN: Masked Autoencoder for facial video Representation LearnINg (CVPR 2023).
Repository Structure
The repository contains three parts:
- marlin-pytorch: The PyPI package for MARLIN, used for inference.
- hf_src: The HuggingFace wrapper for MARLIN, used for inference.
- The implementation for the paper, including training and evaluation scripts.
```
.
├── assets                # Images for README.md
├── LICENSE
├── README.md
├── MODEL_ZOO.md
├── CITATION.cff
├── .gitignore
├── .github

# below is for the PyPI package marlin-pytorch
├── src                   # Source code for marlin-pytorch
├── tests                 # Unittest
├── requirements.lib.txt
├── setup.py
├── init.py
├── version.txt

# below is for the huggingface wrapper
├── hf_src

# below is for the paper implementation
├── configs               # Configs for experiments settings
├── model                 # Marlin models
├── preprocess            # Preprocessing scripts
├── dataset               # Dataloaders
├── utils                 # Utility functions
├── train.py              # Training script
├── evaluate.py           # Evaluation script
├── requirements.txt
```
Use marlin-pytorch for Feature Extraction
Requirements:
- Python >= 3.6, < 3.12
- PyTorch >= 1.8
- ffmpeg
Install from PyPI:
```bash
pip install marlin-pytorch
```
Load MARLIN model from online:

```python
from marlin_pytorch import Marlin

# Load MARLIN model from GitHub Release
model = Marlin.from_online("marlin_vit_base_ytf")
```
Load MARLIN model from file:

```python
from marlin_pytorch import Marlin

# Load MARLIN model from local file
model = Marlin.from_file("marlin_vit_base_ytf", "path/to/marlin.pt")

# Load MARLIN model from the ckpt file trained by the scripts in this repo
model = Marlin.from_file("marlin_vit_base_ytf", "path/to/marlin.ckpt")
```
Current model name list:
- marlin_vit_small_ytf: ViT-small encoder trained on YTF dataset. Embedding 384 dim.
- marlin_vit_base_ytf: ViT-base encoder trained on YTF dataset. Embedding 768 dim.
- marlin_vit_large_ytf: ViT-large encoder trained on YTF dataset. Embedding 1024 dim.
For more details, see MODEL_ZOO.md.
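If you want to confirm which variant you loaded and its embedding dimension, here is a minimal sketch using the `Marlin.from_online` and `extract_features` calls shown in this README; the dummy clip and expected shape are illustrative only:

```python
import torch
from marlin_pytorch import Marlin

# Load one of the published variants; embedding dims are 384 / 768 / 1024
# for small / base / large respectively.
model = Marlin.from_online("marlin_vit_small_ytf")

# Dummy clip in the documented input layout (B, C, T, H, W).
clip = torch.rand(1, 3, 16, 224, 224)
with torch.no_grad():
    features = model.extract_features(clip, keep_seq=False)

print(features.shape)  # expected torch.Size([1, 384]) for the small variant
```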
When a MARLIN model is retrieved from a GitHub Release, it is cached in `.marlin`. You can remove the cache with:
```python
from marlin_pytorch import Marlin
Marlin.clean_cache()
```
Extract features from a cropped video file:

```python
# Extract features from a facial cropped video with size (224x224)
features = model.extract_video("path/to/video.mp4")
print(features.shape)  # torch.Size([T, 768]) where T is the number of windows

# You can keep the output of all elements from the sequence by setting keep_seq=True
features = model.extract_video("path/to/video.mp4", keep_seq=True)
print(features.shape)  # torch.Size([T, k, 768]) where k = T/t * H/h * W/w = 8 * 14 * 14 = 1568
```
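For reference, the token count k in the shape comment above follows from the clip and crop geometry (16-frame, 224x224 clips), which corresponds to a temporal tubelet of 2 frames and 16x16 spatial patches as implied by 8 * 14 * 14; a quick sketch of that arithmetic (the variable names below are ours):

```python
# k = T/t * H/h * W/w, as in the shape comment above.
T, H, W = 16, 224, 224   # clip: frames x height x width
t, h, w = 2, 16, 16      # tubelet depth and spatial patch size (implied by 8 * 14 * 14)

k = (T // t) * (H // h) * (W // w)
print(k)  # 8 * 14 * 14 = 1568
```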
Extract features from an in-the-wild video file:

```python
# Extract features from an in-the-wild video with various sizes
features = model.extract_video("path/to/video.mp4", crop_face=True)
print(features.shape)  # torch.Size([T, 768])
```
Extract features from a video clip tensor:

```python
# Extract features from a clip tensor with size (B, 3, 16, 224, 224)
x = ...  # video clip
features = model.extract_features(x)                  # torch.Size([B, k, 768])
features = model.extract_features(x, keep_seq=False)  # torch.Size([B, 768])
```
Use transformers (HuggingFace) for Feature Extraction
Requirements:
- Python
- PyTorch
- transformers
- einops
Currently the HuggingFace model only performs direct feature extraction, without any video pre-processing (e.g. face detection, cropping, or strided windowing).
```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ControlNet/marlin_vit_base_ytf",  # or other variants
    trust_remote_code=True
)
tensor = torch.rand([1, 3, 16, 224, 224])  # (B, C, T, H, W)
output = model(tensor)  # torch.Size([1, 1568, 384])
```
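The wrapper returns the full token sequence. If you need a single clip-level vector instead, one simple option, which is our own choice and not part of the wrapper, is to mean-pool over the token dimension, assuming the forward call returns the token tensor as in the example above:

```python
import torch
from transformers import AutoModel

# Same wrapper as above; mean-pooling the tokens is a hypothetical
# post-processing step, not something the MARLIN wrapper provides.
model = AutoModel.from_pretrained("ControlNet/marlin_vit_base_ytf", trust_remote_code=True)

clip = torch.rand(1, 3, 16, 224, 224)    # (B, C, T, H, W), already face-cropped
tokens = model(clip)                     # (B, k, embed_dim)
clip_embedding = tokens.mean(dim=1)      # (B, embed_dim)
print(clip_embedding.shape)
```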
Paper Implementation
Requirements
- Python >= 3.7, < 3.12
- PyTorch ~= 1.11
- Torchvision ~= 0.12
Installation
Firstly, make sure you have installed PyTorch and Torchvision with or without CUDA.
Clone the repo and install the requirements:
```bash
git clone https://github.com/ControlNet/MARLIN.git
cd MARLIN
pip install -r requirements.txt
```
MARLIN Pretraining
Download the YoutubeFaces dataset (only frame_images_DB is required).
Download the face parsing model from face_parsing.farl.lapa
and put it in utils/face_sdk/models/face_parsing/face_parsing_1.0.
Download the VideoMAE pretrained checkpoint for initializing the weights. (Note: they updated their models in this commit, but we use the old models, which are no longer shared by the authors, so we uploaded this model ourselves.)
Then run scripts to process the dataset:
```bash
python preprocess/ytf_preprocess.py --data_dir /path/to/youtube_faces --max_workers 8
```
After processing, the directory structure should be like this:
```
├── YoutubeFaces
│   ├── frame_images_DB
│   │   ├── Aaron_Eckhart
│   │   │   ├── 0
│   │   │   │   ├── 0.555.jpg
│   │   │   │   ├── ...
│   │   │   ├── ...
│   │   ├── ...
│   ├── crop_images_DB
│   │   ├── Aaron_Eckhart
│   │   │   ├── 0
│   │   │   │   ├── 0.555.jpg
│   │   │   │   ├── ...
│   │   │   ├── ...
│   │   ├── ...
│   ├── face_parsing_images_DB
│   │   ├── Aaron_Eckhart
│   │   │   ├── 0
│   │   │   │   ├── 0.555.npy
│   │   │   │   ├── ...
│   │   │   ├── ...
│   │   ├── ...
│   ├── train_set.csv
│   ├── val_set.csv
```
Then, run the training script:
```bash
python train.py \
    --config config/pretrain/marlin_vit_base.yaml \
    --data_dir /path/to/youtube_faces \
    --n_gpus 4 \
    --num_workers 8 \
    --batch_size 16 \
    --epochs 2000 \
    --official_pretrained /path/to/videomae/checkpoint.pth
```
After training, you can load the checkpoint for inference with:
```python
from marlin_pytorch import Marlin
from marlin_pytorch.config import register_model_from_yaml

register_model_from_yaml("my_marlin_model", "path/to/config.yaml")
model = Marlin.from_file("my_marlin_model", "path/to/marlin.ckpt")
```
Evaluation
CelebV-HQ
1. Download the dataset

Download the dataset from [CelebV-HQ](https://github.com/CelebV-HQ/CelebV-HQ); the file structure should be like this:

```
├── CelebV-HQ
│   ├── downloaded
│   │   ├── ***.mp4
│   │   ├── ...
│   ├── celebvhq_info.json
│   ├── ...
```

2. Preprocess the dataset

Crop the face region from the raw videos and split the train, val, and test sets.

```bash
python preprocess/celebvhq_preprocess.py --data_dir /path/to/CelebV-HQ
```

3. Extract MARLIN features (optional, if linear probing)

Extract MARLIN features from the cropped video and save them to

License
This project is under the CC BY-NC 4.0 license. See LICENSE for details.
References
If you find this work useful for your research, please consider citing it.
```bibtex
@inproceedings{cai2022marlin,
  title     = {MARLIN: Masked Autoencoder for facial video Representation LearnINg},
  author    = {Cai, Zhixi and Ghosh, Shreya and Stefanov, Kalin and Dhall, Abhinav and Cai, Jianfei and Rezatofighi, Hamid and Haffari, Reza and Hayat, Munawar},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
  month     = {June},
  pages     = {1493-1504},
  doi       = {10.1109/CVPR52729.2023.00150},
  publisher = {IEEE},
}
```
Acknowledgements
Some of the model code is based on MCG-NJU/VideoMAE. The preprocessing code is borrowed from JDAI-CV/FaceX-Zoo.
Owner
- Name: ControlNet
- Login: ControlNet
- Kind: user
- Website: controlnet.space
- Repositories: 30
- Profile: https://github.com/ControlNet
Study on: Computer Vision | Artificial Intelligence
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you find this work useful in your research, please cite it."
preferred-citation:
  type: conference-paper
  title: "MARLIN: Masked Autoencoder for facial video Representation LearnINg"
  authors:
    - family-names: "Cai"
      given-names: "Zhixi"
    - family-names: "Ghosh"
      given-names: "Shreya"
    - family-names: "Stefanov"
      given-names: "Kalin"
    - family-names: "Dhall"
      given-names: "Abhinav"
    - family-names: "Cai"
      given-names: "Jianfei"
    - family-names: "Rezatofighi"
      given-names: "Hamid"
    - family-names: "Haffari"
      given-names: "Reza"
    - family-names: "Hayat"
      given-names: "Munawar"
  collection-title: "Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition"
  year: 2023
  location:
    name: "Vancouver, Canada"
  start: 1493
  end: 1504
  doi: 10.1109/CVPR52729.2023.00150
GitHub Events
Total
- Issues event: 6
- Watch event: 28
- Issue comment event: 10
- Push event: 3
- Pull request event: 4
- Fork event: 6
Last Year
- Issues event: 6
- Watch event: 28
- Issue comment event: 10
- Push event: 3
- Pull request event: 4
- Fork event: 6
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 42
- Total Committers: 1
- Avg Commits per committer: 42.0
- Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| ControlNet | s****x@h****m | 42 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 30
- Total pull requests: 4
- Average time to close issues: 3 months
- Average time to close pull requests: 6 months
- Total issue authors: 27
- Total pull request authors: 4
- Average comments per issue: 2.47
- Average comments per pull request: 2.25
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 1
- Average time to close issues: 13 days
- Average time to close pull requests: about 17 hours
- Issue authors: 4
- Pull request authors: 1
- Average comments per issue: 1.83
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- yanzichuan (3)
- imxtx (2)
- forkbabu (1)
- ZJ-CAI (1)
- vishalsantoshi (1)
- tyrink (1)
- Rookie-Kai (1)
- ByeongjunCho (1)
- wolverine28 (1)
- Octopus-Detective (1)
- yuanjunchai (1)
- dafang (1)
- tvaranka (1)
- ktrapeznikov (1)
- yossiyhartman (1)
Pull Request Authors
- aromanusc (2)
- secutron (2)
- GreyElaina (1)
- Nikki-Gu (1)