audiodeepfake-detection

Official Implementation of "Towards generalizing deep-audio fake detection networks".

https://github.com/gan-police/audiodeepfake-detection

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: gan-police
  • License: eupl-1.2
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 10.8 MB
Statistics
  • Stars: 6
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 5
Created over 3 years ago · Last pushed about 2 years ago
Metadata Files
Readme Contributing License Citation

README.md

Towards generalizing deep-audio-fake detection networks

This is the supplementary source code for our paper "Towards generalizing deep-audio fake detection networks".

[Figure: fingerprints]

The figure above shows our study of the stable frequency-domain patterns created by different GAN architectures. It plots mean absolute level-14 Haar wavelet-packet transform coefficients for LJSpeech and MelGAN audio files. The transform reveals that MelGAN produces a spike-shaped pattern in the frequency domain. We observe this for newer architectures such as BigVGAN and Avocodo as well.

Further, by leveraging the wavelet-packet and short-time Fourier transforms, we train lightweight detectors that generalize well to unseen generators, and we examine the results in our paper.
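To make the quantity in the figure concrete, the sketch below computes a Haar wavelet-packet decomposition and the per-band mean absolute coefficient in plain NumPy. It is a from-scratch illustration, not the project's code: the project uses ptwt, whose API is not reproduced here, and the function names `haar_packet_level` and `mean_abs_packet_spectrum` are hypothetical. Bands are returned in frequency order (lowest first), which for a Haar packet tree means reordering the natural tree order by the Gray code of the band index. A level-14 decomposition as in the figure needs frames whose length is a multiple of 2**14 samples.

```python
import numpy as np


def haar_packet_level(signal, level):
    """Level-`level` Haar wavelet-packet coefficients in frequency order.

    Returns an array of shape (2**level, len(signal) // 2**level); row k is
    the k-th frequency band, from lowest to highest.
    """
    x = np.asarray(signal, dtype=float)
    if x.ndim != 1 or len(x) % 2**level != 0:
        raise ValueError("expected a 1-D signal whose length is divisible by 2**level")
    nodes = [x]
    for _ in range(level):
        children = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            children.append((even + odd) / np.sqrt(2))  # low-pass (approximation)
            children.append((even - odd) / np.sqrt(2))  # high-pass (detail)
        nodes = children
    # The recursion yields nodes in "natural" order; the Gray code of the
    # frequency index k gives the natural position of band k.
    freq_order = [k ^ (k >> 1) for k in range(2**level)]
    return np.stack([nodes[i] for i in freq_order])


def mean_abs_packet_spectrum(signals, level):
    """Mean absolute coefficient per band, averaged over signals and time."""
    coeffs = np.stack([haar_packet_level(s, level) for s in signals])
    return np.abs(coeffs).mean(axis=(0, 2))
```

Comparing `mean_abs_packet_spectrum(real_frames, level)` with the same quantity for generated frames is one way to look for spike-shaped patterns like the ones described above; `real_frames` stands in for your own equally sized frames.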

Assets

For wavelet computations, we use:
- PyTorch-Wavelet-Toolbox: ptwt

We compare our approach to the DCT-LFCC/MFCC method from:
- WaveFake: A Data Set to Facilitate Audio DeepFake Detection

Datasets

We utilize two datasets that appeared in previous work and add our own extension:

GAN Architectures

We utilize pre-trained models from the following repositories:

We used the unofficial implementation of Avocodo from commit 2999557 to train the Avocodo vocoder.

Fingerprint audios

We created an audible version of the mean spectra of the GAN fingerprints by transforming them back into the time domain. The folder audio-samples/generator_artifacts/ contains sound files with amplified generator artifacts. The files are exciting but not aesthetically pleasing, so we recommend listening at low volume. Comparing A_ljspeech_real.wav with one of the GAN spectrum files reveals clearly audible differences.

Reproduction

The following section of the README serves as a guide to reproducing the experiments from our paper.

Installation

The latest code can be installed in development mode in an existing Python 3.10 or 3.11 installation with:

    git clone git@github.com:gan-police/audiodeepfake-detection.git

Move into the repository with

    cd audiodeepfake-detection

and install all requirements with

    pip install -r requirements.txt

Preparation WaveFake

As the WaveFake dataset contains GAN-generated counterparts of the LJSpeech recordings, no further preparation is needed to obtain all required audio. We work with mono audio files of varying length, so the raw audio needs to be cut into equally sized frames of the desired length. We mainly used 1 s frames. The sample rate can be varied as well.
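A minimal sketch of the framing step, assuming non-overlapping frames and that the incomplete tail of each file is discarded; the project's preparation script may handle the remainder differently, and `cut_into_frames` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def cut_into_frames(samples, sample_rate, seconds=1.0):
    """Cut a mono signal into non-overlapping frames of `seconds` length.

    Trailing samples that do not fill a complete frame are dropped.
    Returns an array of shape (num_frames, frame_length).
    """
    samples = np.asarray(samples)
    if samples.ndim != 1:
        raise ValueError("expected mono (1-D) audio")
    frame_length = int(round(sample_rate * seconds))
    num_frames = len(samples) // frame_length
    return samples[: num_frames * frame_length].reshape(num_frames, frame_length)
```

For example, a 2.5 s file at 22050 Hz yields two 1 s frames of 22050 samples each; the last half second is discarded.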

For the preparation, store all audio files (real and GAN-generated) in separate subdirectories. The directory structure should look like this:

    data/
        B_melgan/
            LJ001-0001_gen.wav
            ...
            LJ008-0217_gen.wav
        C_hifigan/
            LJ001-0001_gen.wav
            ...
            LJ008-0217_gen.wav
        ...
        K_lbigvgan/
            LJ001-0001_gen.wav
            ...
            LJ008-0217_gen.wav
        A_ljspeech/
            LJ001-0001.wav
            ...
            LJ008-0217.wav

The folder prefixes matter because labels are assigned in lexicographic order of the prefix: directory A_... gets label 0, B_... gets label 1, and so on. Skipping letters of the alphabet is fine; the labels are still assigned consecutively in ascending order, starting from 0.
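The labeling rule described above can be sketched in a few lines. This is an illustration of the stated behavior, not the project's implementation; `assign_labels` is a hypothetical name.

```python
def assign_labels(directory_names):
    """Map each class directory to an integer label by lexicographic order.

    Labels are consecutive from 0, even if some prefix letters are skipped.
    """
    return {name: label for label, name in enumerate(sorted(directory_names))}
```

In practice the names could come from `[p.name for p in pathlib.Path("data").iterdir() if p.is_dir()]`. With directories A_ljspeech, C_hifigan and E_fbmelgan, the labels are 0, 1 and 2, even though B and D are missing.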

To prepare the datasets, set the data_path and save_path variables accordingly in scripts/prepare_ljspeech.py and run python -m scripts.prepare_ljspeech. The script reads the dataset, cuts all audio files into frames of the given size, splits them into training, validation and test sets, and stores the corresponding audio paths together with the frame numbers of each file as NumPy arrays. Use the parameter use_only to specify which directories under the given data path should be used. For example, if there are directories A_ljspeech, B_melgan and C_hifigan but you only want to use the first two, set only_use=["ljspeech", "melgan"] in the corresponding dataset.

This can take some time because the script reads the length of every audio file. The results are saved in the directory specified by save_path, so this step only has to run once per dataset.
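The bookkeeping described above (audio paths plus frame numbers, split into training, validation and test sets) can be sketched as follows. This is a hypothetical illustration, not the project's script: the 80/10/10 split ratios, the seed, and splitting by file rather than by frame are assumptions, and the function names are made up for this example.

```python
import random


def build_frame_index(num_samples, sample_rate, seconds=1.0):
    """List (path, frame_number) pairs for every complete frame.

    `num_samples` maps each audio path to its length in samples.
    """
    frame_length = int(round(sample_rate * seconds))
    return [
        (path, frame)
        for path, length in sorted(num_samples.items())
        for frame in range(length // frame_length)
    ]


def split_by_file(num_samples, sample_rate, seconds=1.0,
                  fractions=(0.8, 0.1, 0.1), seed=0):
    """Split files into train/val/test, then expand each split into frames.

    Splitting by file keeps all frames of one recording in the same set.
    """
    paths = sorted(num_samples)
    random.Random(seed).shuffle(paths)
    n_train = int(fractions[0] * len(paths))
    n_val = int(fractions[1] * len(paths))
    groups = (paths[:n_train], paths[n_train:n_train + n_val], paths[n_train + n_val:])
    return tuple(
        build_frame_index({p: num_samples[p] for p in group}, sample_rate, seconds)
        for group in groups
    )
```

Splitting by file is a design choice that prevents frames of the same recording from appearing in both training and test sets; a frame-level split would be simpler but can leak speaker and content information across sets.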

For reproduction, use the following (symbolic) folder structure:

    data/
        A_ljspeech
        B_melgan
        C_hifigan
        D_mbmelgan
        E_fbmelgan
        F_waveglow
        G_pwg
        H_lmelgan
        I_avocodo
        J_bigvgan
        K_lbigvgan
        L_conformer
        M_jsutmbmelgan
        N_jsutpwg

Training the Classifier

Now you should be able to train a classifier using the config file scripts/gridsearch_config.py and the train scripts. The train scripts start the training process with configuration values that can be changed. These values are loaded into a dot-accessible dictionary named args (e.g. args.epochs). Running scripts/train.sh, for example, makes Python run audiofakedetect.train_classifier with the grid-search functionality. In this case, the training parameters given in the script are overridden by any matching keys in the config dict in scripts/gridsearch_config.py. There you can also define new training arguments if you want to use them later in a model or elsewhere in the code. Each parameter expects a list of values with len(list) >= 1. If you give only one value, the script runs only that experiment; if you give n values, it runs n experiments, one per value. If you give two values for each of two parameters, the script runs all 2 * 2 = 4 combinations.

Keep in mind that each experiment is run for several different seeds (three by default). If you provide init_seeds as a config key, you can give the list of seeds to use, e.g. --init-seeds 0 1 2 3 4 for seeds 0 to 4.
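The grid-search semantics described above (one run per combination of listed values, repeated for each seed, with values read as `args.epochs`) can be sketched as below. This is a hypothetical illustration, not the code in audiofakedetect.train_classifier: `DotDict`, `expand_grid` and the per-run key `seed` are made-up names, and the real implementation may differ in ordering and details.

```python
import itertools


class DotDict(dict):
    """A dict whose keys can also be read as attributes (args.epochs)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(name) from err


def expand_grid(defaults, grid, seeds=(0, 1, 2)):
    """Yield one DotDict per (parameter combination, seed).

    `grid` maps parameter names to lists of values; every combination
    (Cartesian product) overrides the matching keys in `defaults`.
    """
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        for seed in seeds:
            args = DotDict(defaults)
            args.update(zip(keys, values))
            args["seed"] = seed
            yield args
```

With two values for each of two parameters and the default three seeds, this yields 2 * 2 * 3 = 12 runs.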

Important: set data_path, save_path, limit_train, cross_limit and seconds to the same values as in scripts/prepare_ljspeech.py.

On a cluster with Slurm installed, an example run that trains the DCNN with a sym5 wavelet can be started with the given config by executing:

    sbatch scripts/train.sh packets fbmelgan 256 sym5 2.0 False 320 1

Evaluating the Classifier

Calculating accuracy and equal error rate (EER)

To test an already trained model, set the argument only_testing to True in the initial training script or in scripts/gridsearch_config.py.
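For reference, here is a minimal pure-Python sketch of the two reported metrics. The EER is estimated with a simple threshold sweep: it takes the threshold where the false-positive rate (real flagged as fake) and the false-negative rate (fake accepted as real) are closest and reports their mean. This is one common definition; the project's own implementation may compute the EER differently (for example by interpolating the ROC curve), and the function names are hypothetical. Labels use 1 for fake and 0 for real.

```python
def accuracy(labels, predictions):
    """Fraction of predictions equal to the labels."""
    if len(labels) == 0:
        raise ValueError("empty input")
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


def equal_error_rate(labels, scores):
    """Threshold-sweep estimate of the equal error rate.

    labels: 1 for fake (positive class), 0 for real.
    scores: higher means "more likely fake"; a sample is flagged as fake
    when its score is >= the threshold.
    Returns the mean of FPR and FNR at the threshold where they are closest.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both real and fake samples")
    best_gap, eer = float("inf"), None
    # The extra +inf threshold covers "flag nothing as fake".
    for threshold in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
        fpr, fnr = fp / negatives, fn / positives
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return eer
```

The sweep is quadratic in the number of samples, which is fine for illustration; for large test sets, sorting the scores once and updating the counts incrementally is the usual optimization.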

Attribution using integrated gradients

To calculate integrated-gradients attributions for a classifier, you first need to train the model using the above configuration; during training, the config arguments only_ig and only_testing must both be False. When training is finished, set the config argument only_ig to True. In gridsearch_config.py this looks like "only_ig": [True]. Further, set "target": [0, 1, None] to compute attributions for output neuron 0 (real), neuron 1 (fake) and None (both). This will create the following files in the log directory (set in train.sh):

    log_dir/plots/some_model_config_[used-sources]x2500_target-0_integrated_gradients.npy
    log_dir/plots/some_model_config_[used-sources]x2500_target-1_integrated_gradients.npy
    log_dir/plots/some_model_config_[used-sources]x2500_target-01_integrated_gradients.npy

To plot the results, configure the variables in plot_attribution of src/audiofakedetect/integrated_gradients.py and execute it using scripts/attribution.py.
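The project depends on captum, whose `captum.attr.IntegratedGradients` class computes these attributions for PyTorch models. To show what is being computed without requiring PyTorch, the NumPy sketch below approximates the integrated-gradients integral with a midpoint Riemann sum on a toy logistic model with a hand-written gradient. The model, weights and function names are hypothetical and unrelated to the project's DCNN.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Midpoint Riemann approximation of integrated gradients.

    IG_i = (x_i - x'_i) * integral over a in [0, 1] of
           dF/dx_i evaluated at x' + a * (x - x').
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


# Toy differentiable "classifier": F(x) = sigmoid(w . x)
w = np.array([1.0, -2.0, 0.5])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x):
    return sigmoid(w @ x)


def model_grad(x):
    s = sigmoid(w @ x)
    return s * (1.0 - s) * w  # analytic gradient of sigmoid(w . x)
```

A useful sanity check is completeness: the attributions sum (approximately) to F(x) - F(baseline), so `integrated_gradients(model_grad, x, np.zeros(3)).sum()` should be close to `model(x) - model(np.zeros(3))`.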

Example Misclassifications per Model

You can extract misclassified fake audio by comparing which samples different models classify correctly. To do so, use the "get_details": [True] config in your gridsearch_config.py, and also set "only_testing": [True] if your models are already trained. With this configuration, the test loop of each individual experiment writes a NumPy file to the log directory whose file name starts with true_ind.

To compare two experiment results (e.g. a model with dilation and one without), use the scripts/analyze_model_diffs.py script to extract 10 sample audios that the first given model classifies correctly and the second given model classifies incorrectly. You might need to adjust the file paths in the script to point to the results of the testing process, and you should also specify a save path for the audios.
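The core comparison can be sketched as a set difference. This assumes, based only on the file-name prefix, that each true_ind file holds the indices of correctly classified test samples; check scripts/analyze_model_diffs.py for the actual format. `correct_only_in_first` is a hypothetical helper.

```python
def correct_only_in_first(correct_a, correct_b, k=10):
    """Indices classified correctly by model A but not by model B.

    `correct_a` / `correct_b` are iterables of test-sample indices that each
    model classified correctly. Returns up to `k` indices in ascending order.
    """
    return sorted(set(correct_a) - set(correct_b))[:k]
```

In practice the two inputs would be loaded with `np.load(...)` from the respective log directories, and the returned indices mapped back to audio paths and frame numbers of the test set.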

Below are some example audio files that our DCNN correctly classified as deepfakes but that the same model without dilation misclassified as real:
- BmelganLJ016-0433gen4.wav
- DmbmelganLJ014-0293gen2.wav
- HlmelganLJ002-0228gen10.wav
- KlbigvganLJ021-0060generated6.wav

Building the documentation

To build the documentation, move into docs/ and install the requirements with

    pip install -r requirements.txt

Then run the Makefile with

    make html

You can find the built version in the build/html/ folder.

Issues

Because we use the Adam optimizer from PyTorch, we recommend torch 2.0.0, torchaudio 2.0.0 and CUDA 11.7.

Important: when training with multiple GPUs, keep the train, validation and test set sizes equal to our initial settings to get reproducible results. When training and testing on GPU hardware that differs from ours, we cannot guarantee identical results.
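To check an environment against the recommended versions before training, a small standard-library sketch could compare installed package versions with the pins. It ignores local version labels such as `+cu117`, which CUDA wheels of PyTorch commonly carry. The helper names are hypothetical, and the CUDA version is not checked here; with PyTorch installed, `torch.version.cuda` reports the CUDA version the build was compiled against.

```python
import sys
from importlib import metadata

RECOMMENDED = {"torch": "2.0.0", "torchaudio": "2.0.0"}


def version_mismatches(pins, lookup=metadata.version):
    """Return {package: installed_version_or_None} for packages not matching `pins`.

    Local version labels (the part after "+", e.g. "2.0.0+cu117") are ignored.
    """
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or installed.split("+", 1)[0] != wanted:
            mismatches[package] = installed
    return mismatches


def python_supported(version_info=sys.version_info):
    """True for Python 3.10 or 3.11, the versions named in the installation notes."""
    return tuple(version_info[:2]) in {(3, 10), (3, 11)}


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("Mismatched packages:", version_mismatches(RECOMMENDED))
```

An empty dictionary means all pinned packages match; a value of None means the package is not installed.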

Licensing

This project is licensed under the EUPL license.

Acknowledgments

The research leading to the development of this dataset was supported by the Bundesministerium für Bildung und Forschung (BMBF) through the WestAI and BnTrAInee projects. The authors express their gratitude to the Gauss Centre for Supercomputing e.V. for funding the project and providing computing resources through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS at Jülich Supercomputing Centre (JSC).

Citation

If you use this work in a scientific context, please cite the following:

    @article{gasenzer2024towards,
      title={Towards generalizing deep-audio fake detection networks},
      author={Konstantin Gasenzer and Moritz Wolter},
      journal={Transactions on Machine Learning Research},
      issn={2835-8856},
      year={2024},
      url={https://openreview.net/forum?id=RGewtLtvHz},
      note={}
    }

Owner

  • Name: GAN Police
  • Login: gan-police
  • Kind: organization

whoop whoop it's the sound of the GAN police. Keepin' yo world safe from deepfakes

GitHub Events

Total
  • Watch event: 1
  • Fork event: 2
Last Year
  • Watch event: 1
  • Fork event: 2

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 0
  • Total pull requests: 34
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 hours
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.03
  • Merged pull requests: 33
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 27
  • Average time to close issues: N/A
  • Average time to close pull requests: 7 minutes
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 27
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • kgasenzer (33)
  • v0lta (4)

Dependencies

.github/workflows/test.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
docs/requirements.txt pypi
  • myst_parser *
  • sphinx *
  • sphinx-rtd-theme *
  • sphinx_markdown_builder *
pyproject.toml pypi
  • captum *
  • librosa *
  • matplotlib ==3.7.4
  • numpy *
  • optuna *
  • pandas *
  • ptwt *
  • pywavelets *
  • ssqueezepy *
  • tensorboard *
  • tikzplotlib ==0.10.1
  • torch ==2.0.0
  • torch-summary *
  • torchaudio ==2.0.0
  • torchmetrics *
  • torchvision ==0.15.1
  • tox *
  • tqdm *
requirements.txt pypi
  • captum *
  • librosa *
  • matplotlib ==3.7.4
  • numpy *
  • optuna *
  • pandas *
  • ptwt *
  • pywavelets *
  • ssqueezepy *
  • tensorboard *
  • tikzplotlib ==0.10.1
  • torch ==2.0.0
  • torch-summary *
  • torchaudio ==2.0.0
  • torchmetrics *
  • torchvision ==0.15.1
  • tqdm *