2D3MF
Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"
Science Score: 36.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ✓ Academic publication links (links to arxiv.org, zenodo.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 10.9%, to scientific vocabulary)
Repository
Basic Info
Statistics
- Stars: 32
- Watchers: 2
- Forks: 2
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
2D3MF: Deepfake Detection using Multi Modal Middle Fusion
> [!CAUTION]
> This repo is under development. No hyperparameter tuning is presented here yet; hence, the current architecture is not optimal for deepfake detection.
This repo is the implementation for the paper 2D3MF: Deepfake Detection using Multi Modal Middle Fusion.
Repository Structure
```
.
├── assets                  # Images for README.md
├── LICENSE
├── README.md
├── MODEL_ZOO.md
├── CITATION.cff
├── .gitignore
├── .github
│
│   # Below is for the PyPI package marlin-pytorch
├── src                     # Source code for marlin-pytorch and audio feature extractors
├── tests                   # Unit tests
├── requirements.lib.txt
├── setup.py
├── __init__.py
├── version.txt
│
│   # Below is for the paper implementation
├── configs                 # Configs for experiment settings
├── TD3MF                   # 2D3MF model code
├── preprocess              # Preprocessing scripts
├── dataset                 # Dataloaders
├── utils                   # Utility functions
├── train.py                # Training script
├── evaluate.py             # Evaluation script
└── requirements.txt
```
Installing and running our model
Feature Extraction - 2D3MF
Install 2D3MF from PyPI:

```bash
pip install 2D3MF
```
Sample code snippet for feature extraction:

```python
from TD3MF.classifier import TD3MF

# Load a fine-tuned 2D3MF checkpoint
ckpt = "ckpt/celebvhq_marlin_deepfake_ft/last-v72.ckpt"
model = TD3MF.load_from_checkpoint(ckpt)

# Extract features from a single video
features = model.feature_extraction("2D3MF_Datasets/test/SampleVideo_1280x720_1mb.mp4")
```
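To process many clips, the same API can be looped over a directory. A minimal sketch, assuming the checkpoint and `2D3MF_Datasets/test/` layout shown above (the directory of `.mp4` files is a hypothetical example):

```python
from pathlib import Path

from TD3MF.classifier import TD3MF

# Batch feature extraction over a directory of .mp4 clips (illustrative).
ckpt = "ckpt/celebvhq_marlin_deepfake_ft/last-v72.ckpt"
model = TD3MF.load_from_checkpoint(ckpt)

features_by_clip = {}
for video in sorted(Path("2D3MF_Datasets/test").glob("*.mp4")):
    # feature_extraction takes a single video path, as in the snippet above
    features_by_clip[video.name] = model.feature_extraction(str(video))

print(f"Extracted features for {len(features_by_clip)} clips")
```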
We provide some pretrained MARLIN checkpoints and configurations here.
Paper Implementation
Requirements:
- Python >= 3.7, < 3.12
- PyTorch ~= 1.11
- Torchvision ~= 0.12
- ffmpeg
Installation
Install PyTorch from the official website
Clone the repo and install the requirements:
```bash
git clone https://github.com/aiden200/2D3MF
cd 2D3MF
pip install -e .
```
Training
1. Download Datasets
Forensics++
We cannot offer the download script directly in our repository due to the dataset's terms of use. Please follow the instructions on the [Forensics++](https://github.com/ondyari/FaceForensics?tab=readme-ov-file) page to obtain the download script.

Storage:

```bash
- FaceForensics++
    - The original downloaded source videos from YouTube: 38.5GB
    - All h264 compressed videos with compression rate factor
        - raw/0: ~500GB
        - 23: ~10GB (which we use)
```

Downloading the data: please download the [Forensics++](https://github.com/ondyari/FaceForensics?tab=readme-ov-file) dataset. We used all of the lightly compressed original & altered videos of the three manipulation methods (see the Forensics++ repository for the exact download script).

DFDC
Kaggle provides a nice and easy way to download the [DFDC dataset](https://www.kaggle.com/c/deepfake-detection-challenge/data).

DeepFakeTIMIT
We recommend downloading the data from the [DeepfakeTIMIT Zenodo Record](https://zenodo.org/records/4068245).

FakeAVCeleb
We recommend requesting access to FakeAVCeleb via their [repo README](https://github.com/DASH-Lab/FakeAVCeleb).

RAVDESS
We recommend downloading the data from the [RAVDESS Zenodo Record](https://zenodo.org/records/1188976).

2. Preprocess the dataset
We recommend using the following unified dataset structure (a quick sanity check follows the listing):

```
2D3MF_Dataset/
├── DeepfakeTIMIT
│   ├── audio/*.wav
│   └── video/*.mp4
├── DFDC
│   ├── audio/*.wav
│   └── video/*.mp4
├── FakeAVCeleb
│   ├── audio/*.wav
│   └── video/*.mp4
├── Forensics++
│   ├── audio/*.wav
│   └── video/*.mp4
└── RAVDESS
    ├── audio/*.wav
    └── video/*.mp4
```
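Before preprocessing, it can help to confirm that every video has a matching audio track. Below is a minimal sketch under the layout above; the pairing-by-filename-stem rule is an assumption, not something the repo specifies:

```python
from pathlib import Path

# Sanity-check the unified dataset layout shown above (illustrative).
# Assumes audio and video files for the same clip share a filename stem.
ROOT = Path("2D3MF_Dataset")
DATASETS = ["DeepfakeTIMIT", "DFDC", "FakeAVCeleb", "Forensics++", "RAVDESS"]

for name in DATASETS:
    videos = {p.stem for p in (ROOT / name / "video").glob("*.mp4")}
    audios = {p.stem for p in (ROOT / name / "audio").glob("*.wav")}
    missing = videos - audios
    print(f"{name}: {len(videos)} videos, {len(audios)} audio files, "
          f"{len(missing)} videos without audio")
```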
Crop the face region from the raw videos. Run:

```bash
python3 preprocess/preprocess_clips.py --data_dir [Dataset_Dir]
```
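For intuition only, face cropping of this kind can be sketched with OpenCV's bundled Haar cascade. This is not the repo's preprocess_clips.py, and the input path is hypothetical:

```python
import cv2

# Illustrative face cropping with OpenCV's Haar cascade; the repo's
# preprocess_clips.py is the authoritative implementation.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("2D3MF_Dataset/RAVDESS/video/example.mp4")  # hypothetical path
crops = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest detection
        crops.append(frame[y:y + h, x:x + w])
cap.release()
print(f"Cropped {len(crops)} face frames")
```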
3. Extract features from pretrained models
EfficientFace
Download the pre-trained EfficientFace from [here](https://github.com/zengqunzhao/EfficientFace) under "Pre-trained models". In our experiments, we use the model pre-trained on AffectNet7, i.e., EfficientFace_Trained_on_AffectNet7.pth.tar. Please place it under the `pretrained` directory.

Then run:
```bash
python preprocess/extract_features.py --data_dir /path/to/data --video_backbone [VIDEO_BACKBONE] --audio_backbone [AUDIO_BACKBONE]
```
[VIDEO_BACKBONE] can be replaced with one of the following:
- marlin_vit_small_ytf
- marlin_vit_base_ytf
- marlin_vit_large_ytf
- efficientface

[AUDIO_BACKBONE] can be replaced with one of the following:
- MFCC
- xvectors
- resnet
- emotion2vec
- eat
Optionally, add the --Forensics flag at the end if Forensics++ is the dataset being processed.

In our paper, we found that eat works best as the audio backbone.
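To make the audio-backbone choice concrete, here is a minimal sketch of MFCC extraction with torchaudio. It illustrates the kind of features the MFCC backbone produces; the repo's extractor may differ, and the input path plus the n_mfcc/mel settings are assumptions:

```python
import torchaudio

# Illustrative MFCC extraction; parameters are assumptions, not the repo's.
waveform, sample_rate = torchaudio.load("2D3MF_Dataset/RAVDESS/audio/example.wav")
mfcc_transform = torchaudio.transforms.MFCC(
    sample_rate=sample_rate,
    n_mfcc=40,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 64},
)
mfcc = mfcc_transform(waveform)
print(mfcc.shape)  # (channels, n_mfcc, time_frames)
```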
Split the train, val, and test sets. Run:

```bash
python preprocess/gen_split.py --data_dir /path/to/data --test 0.1 --val 0.1 --feat_type [AUDIO_BACKBONE]
```
Note that the pre-trained video_backbone and audio_backbone checkpoints can be downloaded from MODEL_ZOO.md.
4. Train and evaluate
Train and evaluate the 2D3MF model. Use one of the configs in config/*.yaml as the config file.
```bash
python evaluate.py \
    --config /path/to/config \
    --data_path /path/to/CelebV-HQ \
    --num_workers 4 \
    --batch_size 16

python evaluate.py \
    --config /path/to/config \
    --data_path /path/to/dataset \
    --num_workers 4 \
    --batch_size 8 \
    --marlin_ckpt pretrained/marlin_vit_base_ytf.encoder.pt \
    --epochs 300

python evaluate.py \
    --config config/celebvhq_marlin_deepfake_ft.yaml \
    --data_path 2D3MF_Datasets \
    --num_workers 4 \
    --batch_size 1 \
    --marlin_ckpt pretrained/marlin_vit_small_ytf.encoder.pt \
    --epochs 300
```
Optionally, add

```bash
--skip_train --resume /path/to/checkpoint
```

to skip training.
5. Configuration File
Set up a configuration file based on your hyperparameters and backbones. You can find an example config file under config/; a minimal sketch also follows the explanation list below.
Explanation:
- training_datasets - list; can contain one or more of "DeepfakeTIMIT", "RAVDESS", "Forensics++", "DFDC", "FakeAVCeleb"
- eval_datasets - list; can contain one or more of "DeepfakeTIMIT", "RAVDESS", "Forensics++", "DFDC", "FakeAVCeleb"
- learning_rate - float, e.g. 1.00e-3
- num_heads - int, number of attention heads
- fusion - str, choice of fusion type: "mf" for middle fusion, "lf" for late fusion
- audio_positional_encoding - bool, add audio positional encoding
- hidden_layers - int, number of hidden layers
- lp_only - bool; if true, perform inference from the video features only
- audio_backbone - str, one of "MFCC", "eat", "xvectors", "resnet", "emotion2vec"
- middle_fusion_type - str, one of "default", "audio_refuse", "video_refuse", "self_attention", "self_cross_attention"
- modality_dropout - float, modality dropout rate
- video_backbone - str, one of "efficientface", "marlin"
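Putting the fields together, a minimal hypothetical config might look like the following (values are illustrative only, not tuned recommendations from the paper):

```yaml
# Hypothetical example config; field names follow the list above,
# values are illustrative only.
training_datasets: ["DeepfakeTIMIT", "RAVDESS"]
eval_datasets: ["DeepfakeTIMIT"]
learning_rate: 1.00e-3
num_heads: 4
fusion: "mf"
audio_positional_encoding: true
hidden_layers: 2
lp_only: false
audio_backbone: "eat"
middle_fusion_type: "default"
modality_dropout: 0.1
video_backbone: "marlin"
```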
6. Performing Grid Search
- Grid search settings live in config/grid_search_config.py
- Pass the --grid_search flag to the training/evaluation script
7. Monitoring Performance

Run:

```bash
tensorboard --logdir=lightning_logs/
```

TensorBoard should then be served at http://localhost:6006/.
License
This project is under the CC BY-NC 4.0 license. See LICENSE for details.
References
Please cite our work!
```bibtex
```
Acknowledgements
Some of the model code is based on ControlNet/MARLIN. The code related to middle fusion is from "Self-attention fusion for audiovisual emotion recognition with incomplete data".
Our Audio Feature Extraction Models: MFCC, xvectors, resnet, emotion2vec, eat
Our Video Feature Extraction Models: MARLIN, EfficientFace
Owner
- Name: Aiden Chang
- Login: aiden200
- Kind: user
- Company: University of Southern California
- Website: https://www.aidenwchang.com/
- Repositories: 21
- Profile: https://github.com/aiden200
Graduate Student at USC
GitHub Events
Total
- Issues event: 4
- Watch event: 4
- Delete event: 1
- Issue comment event: 7
- Push event: 1
- Fork event: 2
- Create event: 1
Last Year
- Issues event: 4
- Watch event: 4
- Delete event: 1
- Issue comment event: 7
- Push event: 1
- Fork event: 2
- Create event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 2
- Total pull requests: 0
- Average time to close issues: 3 days
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: 3 days
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- aiden200 (9)
- adrianSRoman (2)
- MachoMaheen (1)
- ywh-my (1)
Pull Request Authors
- aiden200 (24)
- aromanusc (13)
- hyunkeup (6)
- adrianSRoman (4)
- cy3021561 (2)
- kevdozer1 (1)
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- marvinpinto/action-automatic-releases latest composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v3 composite
- actions/checkout v2 composite
- actions/setup-python v4 composite
- actions/setup-python v2 composite
- pierotofy/set-swap-space master composite
- einops *
- ffmpeg-python >=0.2.0
- marlin_pytorch ==0.3.4
- matplotlib >=3.5.2
- numpy >=1.23
- opencv-python >=4.6
- pandas *
- pillow >=9.2.0
- pytorch_lightning ==1.7.*
- pyyaml >=6.0
- scikit-image >=0.19.3
- torch *
- torchmetrics ==0.11.*
- torchvision >=0.12.0
- tqdm >=4.64.0
- fairseq ==0.12.2
- h5py ==3.10.0
- numpy ==1.26.3
- omegaconf ==2.0.6
- pyarrow ==15.0.0
- scikit_learn ==1.3.2
- soundfile ==0.12.1
- tensorboardX ==2.6.2.2
- timm ==0.9.12
- torch ==2.1.2
- torchaudio ==2.1.2
- torchsummary ==1.5.1
- amfm_decompy *
- appdirs *
- decorator *
- einops *
- joblib *
- numba *
- packaging *
- requests *
- scipy *
- six *
- sklearn *
- pandas *
- wget *
- fairseq ==0.12.2
- pandas ==1.4.3
- sacrebleu ==2.2.0
- torch ==1.12.1
- torchaudio ==0.12.1
- tqdm ==4.64.0
- transformers ==4.21.1
- hydra-core >=1.0.4
- submitit >=1.0.0
- bitarray *
- cffi *
- cython *
- hydra-core >=1.0.7,<1.1
- numpy >=1.21.3
- omegaconf <2.1
- packaging *
- regex *
- sacrebleu >=1.4.12
- scikit-learn *
- torch >=1.13
- torchaudio >=0.8.0
- tqdm *