2d3mf

Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"

https://github.com/aiden200/2d3mf

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary

Keywords

audio deep-learning deepfake-detection machine-learning multimodal pytorch video
Last synced: 6 months ago

Repository

Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"

Basic Info
  • Host: GitHub
  • Owner: aiden200
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 121 MB
Statistics
  • Stars: 32
  • Watchers: 2
  • Forks: 2
  • Open Issues: 3
  • Releases: 0
Topics
audio deep-learning deepfake-detection machine-learning multimodal pytorch video
Created almost 2 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

2D3MF: Deepfake Detection using Multi Modal Middle Fusion

[!CAUTION] This repo is under development. No hyperparameter tuning is presented here yet; hence, the current architecture is not optimal for deepfake detection.

This repo is the implementation for the paper 2D3MF: Deepfake Detection using Multi Modal Middle Fusion.

Repository Structure

```
.
    assets                  # Images for README.md
    LICENSE
    README.md
    MODEL_ZOO.md
    CITATION.cff
    .gitignore
    .github

    # below is for the PyPI package marlin-pytorch
    src                     # Source code for marlin-pytorch and audio feature extractors
    tests                   # Unittest
    requirements.lib.txt
    setup.py
    __init__.py
    version.txt

    # below is for the paper implementation
    configs                 # Configs for experiment settings
    TD3MF                   # 2D3MF model code
    preprocess              # Preprocessing scripts
    dataset                 # Dataloaders
    utils                   # Utility functions
    train.py                # Training script
    evaluate.py             # Evaluation script
    requirements.txt
```

Installing and running our model

Feature Extraction - 2D3MF

Install 2D3MF from PyPI:

```bash
pip install 2D3MF
```

Sample code snippet for feature extraction

```python
from TD3MF.classifier import TD3MF

ckpt = "ckpt/celebvhq_marlin_deepfake_ft/last-v72.ckpt"
model = TD3MF.load_from_checkpoint(ckpt)
features = model.feature_extraction("2D3MF_Datasets/test/SampleVideo_1280x720_1mb.mp4")
```

We provide some pretrained MARLIN checkpoints and configurations here.

Paper Implementation

Requirements:

  • Python >= 3.7, < 3.12
  • PyTorch ~= 1.11
  • Torchvision ~= 0.12
  • ffmpeg

Installation

Install PyTorch from the official website

Clone the repo and install the requirements:

```bash
git clone https://github.com/aiden200/2D3MF
cd 2D3MF
pip install -e .
```

Training

1. Download Datasets

Forensics++: We cannot offer the download script directly in our repository due to their terms of use for the dataset. Please follow the instructions on the [Forensics++](https://github.com/ondyari/FaceForensics?tab=readme-ov-file) page to obtain the download script.

Storage:

```bash
- FaceForensics++
    - The original downloaded source videos from YouTube: 38.5GB
    - All h264 compressed videos with compression rate factor
        - raw/0: ~500GB
        - 23: ~10GB (which we use)
```

Downloading the data:

Please download the [Forensics++](https://github.com/ondyari/FaceForensics?tab=readme-ov-file) dataset. We used all of the lightly compressed original & altered videos of the three manipulation methods. Use the script in the Forensics++ repository that ends with: `-d all -c c23 -t videos`

The script offers two servers, which can be selected by adding `--server`. If the `EU` server is not working for you, you can also try `EU2`, which has been reported to work in some of those instances.

Audio download:

Once the first two steps are executed, you should have a structure of

```bash
-- Parent_dir
   |-- manipulated_sequences
   |-- original_sequences
```

Since the Forensics++ dataset doesn't provide audio data, we need to extract the audio ourselves. Please run the script in the Forensics++ repository that ends with: `-d original_youtube_videos_info`

Now you should have a directory with the following structure:

```bash
-- Parent_dir
   |-- manipulated_sequences
   |-- original_sequences
   |-- downloaded_videos_info
```

Please run the script from our repository: `python3 preprocess/faceforensics_scripts/extract_audio.py --dir [Parent_dir]` (a minimal sketch of this step is included after the dataset list below)

After this, you should have a directory with the following structure:

```bash
-- Parent_dir
   |-- manipulated_sequences
   |-- original_sequences
   |-- downloaded_videos_info
   |-- audio_clips
```

References:

- Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Nießner. "FaceForensics++: Learning to Detect Manipulated Facial Images." In *International Conference on Computer Vision (ICCV)*, 2019.
DFDC: Kaggle provides a nice and easy way to download the [DFDC dataset](https://www.kaggle.com/c/deepfake-detection-challenge/data).

DeepFakeTIMIT: We recommend downloading the data from the [DeepfakeTIMIT Zenodo Record](https://zenodo.org/records/4068245).

FakeAVCeleb: We recommend requesting access to FakeAVCeleb via their [repo README](https://github.com/DASH-Lab/FakeAVCeleb).

RAVDESS: We recommend downloading the data from the [RAVDESS Zenodo Record](https://zenodo.org/records/1188976).
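As referenced above, here is a minimal sketch of what the Forensics++ audio extraction step boils down to, assuming ffmpeg is installed and using the ffmpeg-python package from requirements.txt. The source sub-directory, output location, and audio settings (16 kHz mono) are assumptions for illustration; the repository's `preprocess/faceforensics_scripts/extract_audio.py` may differ.

```python
# Hypothetical sketch of the audio extraction step: pull a mono 16 kHz WAV
# track out of each source video with ffmpeg. Paths, the source sub-directory,
# and audio settings are assumptions; the repo's extract_audio.py may differ.
import glob
import os

import ffmpeg  # ffmpeg-python, already in requirements.txt


def extract_audio(parent_dir: str, video_subdir: str = "original_sequences") -> None:
    out_dir = os.path.join(parent_dir, "audio_clips")
    os.makedirs(out_dir, exist_ok=True)
    pattern = os.path.join(parent_dir, video_subdir, "**", "*.mp4")
    for video_path in glob.glob(pattern, recursive=True):
        name = os.path.splitext(os.path.basename(video_path))[0]
        wav_path = os.path.join(out_dir, f"{name}.wav")
        (
            ffmpeg.input(video_path)
            .output(wav_path, ac=1, ar=16000)  # mono, 16 kHz (assumed)
            .overwrite_output()
            .run(quiet=True)
        )


extract_audio("Parent_dir")
```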

2. Preprocess the dataset

We recommend using the following unified dataset structure

```
2D3MF_Dataset/
    DeepfakeTIMIT
        audio/*.wav
        video/*.mp4
    DFDC
        audio/*.wav
        video/*.mp4
    FakeAVCeleb
        audio/*.wav
        video/*.mp4
    Forensics++
        audio/*.wav
        video/*.mp4
    RAVDESS
        audio/*.wav
        video/*.mp4
```

Crop the face region from the raw video. Run:

```bash
python3 preprocess/preprocess_clips.py --data_dir [Dataset_Dir]
```
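For intuition only, face cropping of this kind can be sketched with OpenCV's bundled Haar cascade detector (opencv-python is in requirements.txt). The 224x224 crop size and output format are assumptions; the real `preprocess_clips.py` likely uses a stronger detector and a different output layout.

```python
# Hypothetical per-frame face cropping with OpenCV's Haar cascade.
# preprocess/preprocess_clips.py may use a different detector and output format.
import cv2


def crop_faces(video_path: str, out_path: str, size: int = 224) -> None:
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (size, size))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            continue  # skip frames with no detected face
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
        crop = cv2.resize(frame[y:y + h, x:x + w], (size, size))
        writer.write(crop)
    cap.release()
    writer.release()
```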

3. Extract features from pretrained models

EfficientFace: Download the pre-trained EfficientFace from [here](https://github.com/zengqunzhao/EfficientFace) under 'Pre-trained models'. In our experiments, we use the model pre-trained on AffectNet7, i.e., EfficientFace_Trained_on_AffectNet7.pth.tar. Please place it under the `pretrained` directory.

Run:

```bash
python preprocess/extract_features.py --data_dir /path/to/data --video_backbone [VIDEO_BACKBONE] --audio_backbone [AUDIO_BACKBONE]
```

[VIDEO_BACKBONE] can be replaced with one of the following:
- marlin_vit_small_ytf
- marlin_vit_base_ytf
- marlin_vit_large_ytf
- efficientface

[AUDIO_BACKBONE] can be replaced with one of the following:
- MFCC
- xvectors
- resnet
- emotion2vec
- eat

Optionally, add the `--Forensics` flag at the end if Forensics++ is the dataset being processed.

In our paper, we found that `eat` works best as the audio backbone.
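For a concrete sense of what one of the simpler audio backbones above produces, here is a hedged MFCC sketch using torchaudio. The parameter values and the example file path are illustrative only, not the repository's actual extractor settings.

```python
# Hypothetical MFCC feature extraction; parameter values are illustrative only.
import torch
import torchaudio


def mfcc_features(wav_path: str, n_mfcc: int = 40) -> torch.Tensor:
    waveform, sample_rate = torchaudio.load(wav_path)
    transform = torchaudio.transforms.MFCC(
        sample_rate=sample_rate,
        n_mfcc=n_mfcc,
        melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 64},
    )
    mfcc = transform(waveform)   # (channels, n_mfcc, time)
    return mfcc.mean(dim=0).T    # (time, n_mfcc), averaged over channels


features = mfcc_features("2D3MF_Dataset/RAVDESS/audio/example.wav")  # hypothetical path
print(features.shape)
```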

Split the train, validation, and test sets. Run:

```bash
python preprocess/gen_split.py --data_dir /path/to/data --test 0.1 --val 0.1 --feat_type [AUDIO_BACKBONE]
```
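Conceptually, a split step like this shuffles clip IDs and writes per-split index files. Below is a minimal sketch under that assumption; the real `gen_split.py`, its output file names, and the per-dataset path shown are not taken from the repository.

```python
# Hypothetical train/val/test split generation; the real gen_split.py's
# output format (file names, CSV vs. txt) may differ.
import os
import random


def gen_split(data_dir: str, test: float = 0.1, val: float = 0.1, seed: int = 42) -> None:
    clips = sorted(os.listdir(os.path.join(data_dir, "video")))
    random.Random(seed).shuffle(clips)
    n_test = int(len(clips) * test)
    n_val = int(len(clips) * val)
    splits = {
        "test": clips[:n_test],
        "val": clips[n_test:n_test + n_val],
        "train": clips[n_test + n_val:],
    }
    for name, items in splits.items():
        with open(os.path.join(data_dir, f"{name}.txt"), "w") as f:
            f.write("\n".join(items))


gen_split("2D3MF_Dataset/DeepfakeTIMIT", test=0.1, val=0.1)  # example path
```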

Note that the pre-trained video and audio backbones can be downloaded from MODEL_ZOO.md.

4. Train and evaluate

Train and evaluate the 2D3MF model.

Please use the configs in config/*.yaml as the config file.

```bash
python evaluate.py \
    --config /path/to/config \
    --data_path /path/to/CelebV-HQ \
    --num_workers 4 \
    --batch_size 16

python evaluate.py \
    --config /path/to/config \
    --data_path /path/to/dataset \
    --num_workers 4 \
    --batch_size 8 \
    --marlin_ckpt pretrained/marlin_vit_base_ytf.encoder.pt \
    --epochs 300

python evaluate.py \
    --config config/celebvhq_marlin_deepfake_ft.yaml \
    --data_path 2D3MF_Datasets \
    --num_workers 4 \
    --batch_size 1 \
    --marlin_ckpt pretrained/marlin_vit_small_ytf.encoder.pt \
    --epochs 300
```

Optionally, add `--skip_train --resume /path/to/checkpoint` to skip the training.

5. Configuration File

Set up a configuration file based on your hyperparameters and backbones. You can find an example config file under config/.

Explanation:
- `training_datasets`: list; can contain one or more of "DeepfakeTIMIT", "RAVDESS", "Forensics++", "DFDC", "FakeAVCeleb"
- `eval_datasets`: list; can contain one or more of "DeepfakeTIMIT", "RAVDESS", "Forensics++", "DFDC", "FakeAVCeleb"
- `learning_rate`: float, e.g. 1.00e-3
- `num_heads`: int, number of attention heads
- `fusion`: str, choice of fusion type: "mf" for middle fusion, "lf" for late fusion
- `audio_positional_encoding`: bool, add audio positional encoding
- `hidden_layers`: int, number of hidden layers
- `lp_only`: bool, setting this to true will perform inference from the video features only
- `audio_backbone`: str, one of "MFCC", "eat", "xvectors", "resnet", "emotion2vec"
- `middle_fusion_type`: str, one of "default", "audio_refuse", "video_refuse", "self_attention", "self_cross_attention"
- `modality_dropout`: float, modality dropout rate
- `video_backbone`: str, one of "efficientface", "marlin"
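Putting the fields above together, a config could look roughly like the sketch below. The values are illustrative placeholders rather than the paper's tuned settings; refer to the files shipped under config/ for real examples.

```yaml
# Illustrative config sketch; values are placeholders, see config/ for real examples.
training_datasets: ["DeepfakeTIMIT", "RAVDESS"]
eval_datasets: ["DFDC"]
learning_rate: 1.00e-3
num_heads: 4
fusion: "mf"                      # "mf" = middle fusion, "lf" = late fusion
audio_positional_encoding: true
hidden_layers: 2
lp_only: false                    # true = inference from video features only
audio_backbone: "eat"
middle_fusion_type: "default"
modality_dropout: 0.1
video_backbone: "marlin"
```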

6. Performing Grid Search

  • Define the search space in config/gridsearchconfig.py
  • Add the --grid_search flag to enable grid search

7. Monitoring Performance

Run

```bash
tensorboard --logdir=lightning_logs/
```

TensorBoard should then be hosted at http://localhost:6006/.

License

This project is under the CC BY-NC 4.0 license. See LICENSE for details.

References

Please cite our work!

```bibtex

```

Acknowledgements

Some of the model code is based on ControlNet/MARLIN. The code related to middle fusion is from "Self-attention fusion for audiovisual emotion recognition with incomplete data".
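For readers new to the idea, middle fusion of audio and video features via cross-attention can be sketched as follows. This is a generic PyTorch illustration of the concept, not the 2D3MF implementation; the dimensions, pooling, and classifier head are assumptions.

```python
# Generic middle-fusion sketch: video tokens attend to audio tokens (and vice
# versa) before classification. Dimensions and structure are illustrative only.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.video_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.audio_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 1)  # single real/fake logit

    def forward(self, video_feats: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        # video_feats: (B, Tv, dim), audio_feats: (B, Ta, dim)
        v, _ = self.video_to_audio(video_feats, audio_feats, audio_feats)
        a, _ = self.audio_to_video(audio_feats, video_feats, video_feats)
        fused = torch.cat([v.mean(dim=1), a.mean(dim=1)], dim=-1)  # (B, 2*dim)
        return self.classifier(fused)


model = CrossAttentionFusion()
logits = model(torch.randn(2, 8, 256), torch.randn(2, 16, 256))
print(logits.shape)  # torch.Size([2, 1])
```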

Our Audio Feature Extraction Models:

Our Video Feature Extraction Models:

Owner

  • Name: Aiden Chang
  • Login: aiden200
  • Kind: user
  • Company: University of Southern California

Graduate Student at USC

GitHub Events

Total
  • Issues event: 4
  • Watch event: 4
  • Delete event: 1
  • Issue comment event: 7
  • Push event: 1
  • Fork event: 2
  • Create event: 1
Last Year
  • Issues event: 4
  • Watch event: 4
  • Delete event: 1
  • Issue comment event: 7
  • Push event: 1
  • Fork event: 2
  • Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: 3 days
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 3 days
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • aiden200 (9)
  • adrianSRoman (2)
  • MachoMaheen (1)
  • ywh-my (1)
Pull Request Authors
  • aiden200 (24)
  • aromanusc (13)
  • hyunkeup (6)
  • adrianSRoman (4)
  • cy3021561 (2)
  • kevdozer1 (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/release.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • marvinpinto/action-automatic-releases latest composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/unittest.yaml actions
  • actions/checkout v3 composite
  • actions/checkout v2 composite
  • actions/setup-python v4 composite
  • actions/setup-python v2 composite
  • pierotofy/set-swap-space master composite
requirements.txt pypi
  • einops *
  • ffmpeg-python >=0.2.0
  • marlin_pytorch ==0.3.4
  • matplotlib >=3.5.2
  • numpy >=1.23
  • opencv-python >=4.6
  • pandas *
  • pillow >=9.2.0
  • pytorch_lightning ==1.7.
  • pyyaml >=6.0
  • scikit-image >=0.19.3
  • torch *
  • torchmetrics ==0.11.
  • torchvision >=0.12.0
  • tqdm >=4.64.0
setup.py pypi
src/fairseq/EAT/requirements.txt pypi
  • fairseq ==0.12.2
  • h5py ==3.10.0
  • numpy ==1.26.3
  • omegaconf ==2.0.6
  • pyarrow ==15.0.0
  • scikit_learn ==1.3.2
  • soundfile ==0.12.1
  • tensorboardX ==2.6.2.2
  • timm ==0.9.12
  • torch ==2.1.2
  • torchaudio ==2.1.2
  • torchsummary ==1.5.1
src/fairseq/examples/MMPT/setup.py pypi
src/fairseq/examples/emotion_conversion/requirements.txt pypi
  • amfm_decompy *
  • appdirs *
  • decorator *
  • einops *
  • joblib *
  • numba *
  • packaging *
  • requests *
  • scipy *
  • six *
  • sklearn *
src/fairseq/examples/multilingual/data_scripts/requirement.txt pypi
  • pandas *
  • wget *
src/fairseq/examples/speech_to_speech/asr_bleu/requirements.txt pypi
  • fairseq ==0.12.2
  • pandas ==1.4.3
  • sacrebleu ==2.2.0
  • torch ==1.12.1
  • torchaudio ==0.12.1
  • tqdm ==4.64.0
  • transformers ==4.21.1
src/fairseq/fairseq/modules/dynamicconv_layer/setup.py pypi
src/fairseq/fairseq/modules/lightconv_layer/setup.py pypi
src/fairseq/hydra_plugins/dependency_submitit_launcher/setup.py pypi
  • hydra-core >=1.0.4
  • submitit >=1.0.0
src/fairseq/pyproject.toml pypi
src/fairseq/setup.py pypi
  • bitarray *
  • cffi *
  • cython *
  • hydra-core >=1.0.7,<1.1
  • numpy >=1.21.3
  • omegaconf <2.1
  • packaging *
  • regex *
  • sacrebleu >=1.4.12
  • scikit-learn *
  • torch >=1.13
  • torchaudio >=0.8.0
  • tqdm *