2d3mf

Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"

https://github.com/aiden200/2d3mf

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary

Keywords

audio deep-learning deepfake-detection machine-learning multimodal pytorch video
Last synced: 6 months ago

Repository

Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"

Basic Info
  • Host: GitHub
  • Owner: aiden200
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 121 MB
Statistics
  • Stars: 32
  • Watchers: 2
  • Forks: 2
  • Open Issues: 3
  • Releases: 0
Topics
audio deep-learning deepfake-detection machine-learning multimodal pytorch video
Created almost 2 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

2D3MF: Deepfake Detection using Multi Modal Middle Fusion

[!CAUTION] This repo is under development. No hyperparameter tuning is presented here yet; hence, the current architecture is not optimal for deepfake detection.

This repo is the implementation for the paper 2D3MF: Deepfake Detection using Multi Modal Middle Fusion.

Repository Structure

```
.
    assets                  # Images for README.md
    LICENSE
    README.md
    MODEL_ZOO.md
    CITATION.cff
    .gitignore
    .github

    # below is for the PyPI package marlin-pytorch
    src                     # Source code for marlin-pytorch and audio feature extractors
    tests                   # Unittest
    requirements.lib.txt
    setup.py
    __init__.py
    version.txt

    # below is for the paper implementation
    configs                 # Configs for experiment settings
    TD3MF                   # 2D3MF model code
    preprocess              # Preprocessing scripts
    dataset                 # Dataloaders
    utils                   # Utility functions
    train.py                # Training script
    evaluate.py             # Evaluation script
    requirements.txt
```

Installing and running our model

Feature Extraction - 2D3MF

Install 2D3MF from PyPI:

```bash
pip install 2D3MF
```

Sample code snippet for feature extraction

```python
from TD3MF.classifier import TD3MF

ckpt = "ckpt/celebvhq_marlin_deepfake_ft/last-v72.ckpt"
model = TD3MF.load_from_checkpoint(ckpt)
features = model.feature_extraction("2D3MF_Datasets/test/SampleVideo_1280x720_1mb.mp4")
```

We provide some pretrained MARLIN checkpoints and configurations here.

Paper Implementation

Requirements:

  • Python >= 3.7, < 3.12
  • PyTorch ~= 1.11
  • Torchvision ~= 0.12
  • ffmpeg

Installation

Install PyTorch from the official website

Clone the repo and install the requirements:

```bash
git clone https://github.com/aiden200/2D3MF
cd 2D3MF
pip install -e .
```

Training

1. Download Datasets

Forensics++: We cannot offer the download script directly in our repository due to their terms of use for the dataset. Please follow the instructions on the [Forensics++](https://github.com/ondyari/FaceForensics?tab=readme-ov-file) page to obtain the download script.

Storage:

```bash
- FaceForensics++
    - The original downloaded source videos from YouTube: 38.5GB
    - All h264 compressed videos with compression rate factor
        - raw/0: ~500GB
        - 23: ~10GB (which we use)
```

Downloading the data:

Please download the [Forensics++](https://github.com/ondyari/FaceForensics?tab=readme-ov-file) dataset. We used all of the lightly compressed original & altered videos of the three manipulation methods. Use the script in the Forensics++ repository that ends with: `-d all -c c23 -t videos`

The script offers two servers, which can be selected by adding `--server`. If the `EU` server is not working for you, you can also try `EU2`, which has been reported to work in some of those instances.

Audio download:

Once the first two steps are executed, you should have a structure of

```bash
-- Parent_dir
   |-- manipulated_sequences
   |-- original_sequences
```

Since the Forensics++ dataset doesn't provide audio data, we need to extract the audio ourselves. Please run the script in the Forensics++ repository that ends with: `-d original_youtube_videos_info`

Now you should have a directory with the following structure:

```bash
-- Parent_dir
   |-- manipulated_sequences
   |-- original_sequences
   |-- downloaded_videos_info
```

Please run the script from our repository: `python3 preprocess/faceforensics_scripts/extract_audio.py --dir [Parent_dir]` (a minimal sketch of this step is included after the dataset list below)

After this, you should have a directory with the following structure:

```bash
-- Parent_dir
   |-- manipulated_sequences
   |-- original_sequences
   |-- downloaded_videos_info
   |-- audio_clips
```

References:

- Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Nießner. "FaceForensics++: Learning to Detect Manipulated Facial Images." In *International Conference on Computer Vision (ICCV)*, 2019.
DFDC: Kaggle provides a nice and easy way to download the [DFDC dataset](https://www.kaggle.com/c/deepfake-detection-challenge/data).

DeepFakeTIMIT: We recommend downloading the data from the [DeepfakeTIMIT Zenodo Record](https://zenodo.org/records/4068245).

FakeAVCeleb: We recommend requesting access to FakeAVCeleb via their [repo README](https://github.com/DASH-Lab/FakeAVCeleb).

RAVDESS: We recommend downloading the data from the [RAVDESS Zenodo Record](https://zenodo.org/records/1188976).
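As referenced above, here is a minimal sketch of what the Forensics++ audio extraction step boils down to, assuming ffmpeg is installed and using the ffmpeg-python package from requirements.txt. The source sub-directory, output location, and audio settings (16 kHz mono) are assumptions for illustration; the repository's `preprocess/faceforensics_scripts/extract_audio.py` may differ.

```python
# Hypothetical sketch of the audio extraction step: pull a mono 16 kHz WAV
# track out of each source video with ffmpeg. Paths, the source sub-directory,
# and audio settings are assumptions; the repo's extract_audio.py may differ.
import glob
import os

import ffmpeg  # ffmpeg-python, already in requirements.txt


def extract_audio(parent_dir: str, video_subdir: str = "original_sequences") -> None:
    out_dir = os.path.join(parent_dir, "audio_clips")
    os.makedirs(out_dir, exist_ok=True)
    pattern = os.path.join(parent_dir, video_subdir, "**", "*.mp4")
    for video_path in glob.glob(pattern, recursive=True):
        name = os.path.splitext(os.path.basename(video_path))[0]
        wav_path = os.path.join(out_dir, f"{name}.wav")
        (
            ffmpeg.input(video_path)
            .output(wav_path, ac=1, ar=16000)  # mono, 16 kHz (assumed)
            .overwrite_output()
            .run(quiet=True)
        )


extract_audio("Parent_dir")
```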

2. Preprocess the dataset

We recommend using the following unified dataset structure

```
2D3MF_Dataset/
    DeepfakeTIMIT
        audio/*.wav
        video/*.mp4
    DFDC
        audio/*.wav
        video/*.mp4
    FakeAVCeleb
        audio/*.wav
        video/*.mp4
    Forensics++
        audio/*.wav
        video/*.mp4
    RAVDESS
        audio/*.wav
        video/*.mp4
```

Crop the face region from the raw video. Run:

```bash
python3 preprocess/preprocess_clips.py --data_dir [Dataset_Dir]
```
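For intuition only, face cropping of this kind can be sketched with OpenCV's bundled Haar cascade detector (opencv-python is in requirements.txt). The 224x224 crop size and output format are assumptions; the real `preprocess_clips.py` likely uses a stronger detector and a different output layout.

```python
# Hypothetical per-frame face cropping with OpenCV's Haar cascade.
# preprocess/preprocess_clips.py may use a different detector and output format.
import cv2


def crop_faces(video_path: str, out_path: str, size: int = 224) -> None:
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (size, size))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            continue  # skip frames with no detected face
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
        crop = cv2.resize(frame[y:y + h, x:x + w], (size, size))
        writer.write(crop)
    cap.release()
    writer.release()
```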

3. Extract features from pretrained models

EfficientFace: Download the pre-trained EfficientFace from [here](https://github.com/zengqunzhao/EfficientFace) under 'Pre-trained models'. In our experiments, we use the model pre-trained on AffectNet7, i.e., EfficientFace_Trained_on_AffectNet7.pth.tar. Please place it under the `pretrained` directory.

Run:

```bash
python preprocess/extract_features.py --data_dir /path/to/data --video_backbone [VIDEO_BACKBONE] --audio_backbone [AUDIO_BACKBONE]
```

[VIDEO_BACKBONE] can be replaced with one of the following:
- marlin_vit_small_ytf
- marlin_vit_base_ytf
- marlin_vit_large_ytf
- efficientface

[AUDIO_BACKBONE] can be replaced with one of the following:
- MFCC
- xvectors
- resnet
- emotion2vec
- eat

Optionally, add the `--Forensics` flag at the end if Forensics++ is the dataset being processed.

In our paper, we found that `eat` works best as the audio backbone.
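For a concrete sense of what one of the simpler audio backbones above produces, here is a hedged MFCC sketch using torchaudio. The parameter values and the example file path are illustrative only, not the repository's actual extractor settings.

```python
# Hypothetical MFCC feature extraction; parameter values are illustrative only.
import torch
import torchaudio


def mfcc_features(wav_path: str, n_mfcc: int = 40) -> torch.Tensor:
    waveform, sample_rate = torchaudio.load(wav_path)
    transform = torchaudio.transforms.MFCC(
        sample_rate=sample_rate,
        n_mfcc=n_mfcc,
        melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 64},
    )
    mfcc = transform(waveform)   # (channels, n_mfcc, time)
    return mfcc.mean(dim=0).T    # (time, n_mfcc), averaged over channels


features = mfcc_features("2D3MF_Dataset/RAVDESS/audio/example.wav")  # hypothetical path
print(features.shape)
```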

Split the train, validation, and test sets. Run:

```bash
python preprocess/gen_split.py --data_dir /path/to/data --test 0.1 --val 0.1 --feat_type [AUDIO_BACKBONE]
```
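Conceptually, a split step like this shuffles clip IDs and writes per-split index files. Below is a minimal sketch under that assumption; the real `gen_split.py`, its output file names, and the per-dataset path shown are not taken from the repository.

```python
# Hypothetical train/val/test split generation; the real gen_split.py's
# output format (file names, CSV vs. txt) may differ.
import os
import random


def gen_split(data_dir: str, test: float = 0.1, val: float = 0.1, seed: int = 42) -> None:
    clips = sorted(os.listdir(os.path.join(data_dir, "video")))
    random.Random(seed).shuffle(clips)
    n_test = int(len(clips) * test)
    n_val = int(len(clips) * val)
    splits = {
        "test": clips[:n_test],
        "val": clips[n_test:n_test + n_val],
        "train": clips[n_test + n_val:],
    }
    for name, items in splits.items():
        with open(os.path.join(data_dir, f"{name}.txt"), "w") as f:
            f.write("\n".join(items))


gen_split("2D3MF_Dataset/DeepfakeTIMIT", test=0.1, val=0.1)  # example path
```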

Note that the pre-trained video and audio backbones can be downloaded from MODEL_ZOO.md.

4. Train and evaluate

Train and evaluate the 2D3MF model.

Please use the configs in config/*.yaml as the config file.

```bash
python evaluate.py \
    --config /path/to/config \
    --data_path /path/to/CelebV-HQ \
    --num_workers 4 \
    --batch_size 16

python evaluate.py \
    --config /path/to/config \
    --data_path /path/to/dataset \
    --num_workers 4 \
    --batch_size 8 \
    --marlin_ckpt pretrained/marlin_vit_base_ytf.encoder.pt \
    --epochs 300

python evaluate.py \
    --config config/celebvhq_marlin_deepfake_ft.yaml \
    --data_path 2D3MF_Datasets \
    --num_workers 4 \
    --batch_size 1 \
    --marlin_ckpt pretrained/marlin_vit_small_ytf.encoder.pt \
    --epochs 300
```

Optionally, add `--skip_train --resume /path/to/checkpoint` to skip the training.

5. Configuration File

Set up a configuration file based on your hyperparameters and backbones. You can find an example config file under config/.

Explanation:
- `training_datasets`: list; can contain one or more of "DeepfakeTIMIT", "RAVDESS", "Forensics++", "DFDC", "FakeAVCeleb"
- `eval_datasets`: list; can contain one or more of "DeepfakeTIMIT", "RAVDESS", "Forensics++", "DFDC", "FakeAVCeleb"
- `learning_rate`: float, e.g. 1.00e-3
- `num_heads`: int, number of attention heads
- `fusion`: str, choice of fusion type: "mf" for middle fusion, "lf" for late fusion
- `audio_positional_encoding`: bool, add audio positional encoding
- `hidden_layers`: int, number of hidden layers
- `lp_only`: bool, setting this to true will perform inference from the video features only
- `audio_backbone`: str, one of "MFCC", "eat", "xvectors", "resnet", "emotion2vec"
- `middle_fusion_type`: str, one of "default", "audio_refuse", "video_refuse", "self_attention", "self_cross_attention"
- `modality_dropout`: float, modality dropout rate
- `video_backbone`: str, one of "efficientface", "marlin"
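Putting the fields above together, a config could look roughly like the sketch below. The values are illustrative placeholders rather than the paper's tuned settings; refer to the files shipped under config/ for real examples.

```yaml
# Illustrative config sketch; values are placeholders, see config/ for real examples.
training_datasets: ["DeepfakeTIMIT", "RAVDESS"]
eval_datasets: ["DFDC"]
learning_rate: 1.00e-3
num_heads: 4
fusion: "mf"                      # "mf" = middle fusion, "lf" = late fusion
audio_positional_encoding: true
hidden_layers: 2
lp_only: false                    # true = inference from video features only
audio_backbone: "eat"
middle_fusion_type: "default"
modality_dropout: 0.1
video_backbone: "marlin"
```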

6. Performing Grid Search

  • Define the search space in config/gridsearchconfig.py
  • Add the --grid_search flag to enable grid search

7. Monitoring Performance

Run

```bash
tensorboard --logdir=lightning_logs/
```

TensorBoard should then be hosted at http://localhost:6006/.

License

This project is under the CC BY-NC 4.0 license. See LICENSE for details.

References

Please cite our work!

```bibtex

```

Acknowledgements

Some of the model code is based on ControlNet/MARLIN. The code related to middle fusion is from "Self-attention fusion for audiovisual emotion recognition with incomplete data".
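For readers new to the idea, middle fusion of audio and video features via cross-attention can be sketched as follows. This is a generic PyTorch illustration of the concept, not the 2D3MF implementation; the dimensions, pooling, and classifier head are assumptions.

```python
# Generic middle-fusion sketch: video tokens attend to audio tokens (and vice
# versa) before classification. Dimensions and structure are illustrative only.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.video_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.audio_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 1)  # single real/fake logit

    def forward(self, video_feats: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        # video_feats: (B, Tv, dim), audio_feats: (B, Ta, dim)
        v, _ = self.video_to_audio(video_feats, audio_feats, audio_feats)
        a, _ = self.audio_to_video(audio_feats, video_feats, video_feats)
        fused = torch.cat([v.mean(dim=1), a.mean(dim=1)], dim=-1)  # (B, 2*dim)
        return self.classifier(fused)


model = CrossAttentionFusion()
logits = model(torch.randn(2, 8, 256), torch.randn(2, 16, 256))
print(logits.shape)  # torch.Size([2, 1])
```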

Our Audio Feature Extraction Models:

Our Video Feature Extraction Models:

Owner

  • Name: Aiden Chang
  • Login: aiden200
  • Kind: user
  • Company: University of Southern California

Graduate Student at USC

GitHub Events

Total
  • Issues event: 4
  • Watch event: 4
  • Delete event: 1
  • Issue comment event: 7
  • Push event: 1
  • Fork event: 2
  • Create event: 1
Last Year
  • Issues event: 4
  • Watch event: 4
  • Delete event: 1
  • Issue comment event: 7
  • Push event: 1
  • Fork event: 2
  • Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: 3 days
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 3 days
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • aiden200 (9)
  • adrianSRoman (2)
  • MachoMaheen (1)
  • ywh-my (1)
Pull Request Authors
  • aiden200 (24)
  • aromanusc (13)
  • hyunkeup (6)
  • adrianSRoman (4)
  • cy3021561 (2)
  • kevdozer1 (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/release.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • marvinpinto/action-automatic-releases latest composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/unittest.yaml actions
  • actions/checkout v3 composite
  • actions/checkout v2 composite
  • actions/setup-python v4 composite
  • actions/setup-python v2 composite
  • pierotofy/set-swap-space master composite
requirements.txt pypi
  • einops *
  • ffmpeg-python >=0.2.0
  • marlin_pytorch ==0.3.4
  • matplotlib >=3.5.2
  • numpy >=1.23
  • opencv-python >=4.6
  • pandas *
  • pillow >=9.2.0
  • pytorch_lightning ==1.7.
  • pyyaml >=6.0
  • scikit-image >=0.19.3
  • torch *
  • torchmetrics ==0.11.
  • torchvision >=0.12.0
  • tqdm >=4.64.0
setup.py pypi
src/fairseq/EAT/requirements.txt pypi
  • fairseq ==0.12.2
  • h5py ==3.10.0
  • numpy ==1.26.3
  • omegaconf ==2.0.6
  • pyarrow ==15.0.0
  • scikit_learn ==1.3.2
  • soundfile ==0.12.1
  • tensorboardX ==2.6.2.2
  • timm ==0.9.12
  • torch ==2.1.2
  • torchaudio ==2.1.2
  • torchsummary ==1.5.1
src/fairseq/examples/MMPT/setup.py pypi
src/fairseq/examples/emotion_conversion/requirements.txt pypi
  • amfm_decompy *
  • appdirs *
  • decorator *
  • einops *
  • joblib *
  • numba *
  • packaging *
  • requests *
  • scipy *
  • six *
  • sklearn *
src/fairseq/examples/multilingual/data_scripts/requirement.txt pypi
  • pandas *
  • wget *
src/fairseq/examples/speech_to_speech/asr_bleu/requirements.txt pypi
  • fairseq ==0.12.2
  • pandas ==1.4.3
  • sacrebleu ==2.2.0
  • torch ==1.12.1
  • torchaudio ==0.12.1
  • tqdm ==4.64.0
  • transformers ==4.21.1
src/fairseq/fairseq/modules/dynamicconv_layer/setup.py pypi
src/fairseq/fairseq/modules/lightconv_layer/setup.py pypi
src/fairseq/hydra_plugins/dependency_submitit_launcher/setup.py pypi
  • hydra-core >=1.0.4
  • submitit >=1.0.0
src/fairseq/pyproject.toml pypi
src/fairseq/setup.py pypi
  • bitarray *
  • cffi *
  • cython *
  • hydra-core >=1.0.7,<1.1
  • numpy >=1.21.3
  • omegaconf <2.1
  • packaging *
  • regex *
  • sacrebleu >=1.4.12
  • scikit-learn *
  • torch >=1.13
  • torchaudio >=0.8.0
  • tqdm *