avdeepfake1m

[ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

https://github.com/controlnet/av-deepfake1m

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, acm.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

[ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

Basic Info
Statistics
  • Stars: 145
  • Watchers: 7
  • Forks: 10
  • Open Issues: 1
  • Releases: 0
Created over 2 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

AV-Deepfake1M

This is the official repository for the paper AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset (ACM MM Award).

News

Abstract

The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most of the research efforts in this domain are focused on detecting high-quality deepfake images and videos, only a few works address the problem of the localization of small segments of audio-visual manipulations embedded in real videos. In this research, we emulate the process of such content generation and propose the AV-Deepfake1M dataset. The dataset contains content-driven (i) video manipulations, (ii) audio manipulations, and (iii) audio-visual manipulations for more than 2K subjects resulting in a total of more than 1M videos. The paper provides a thorough description of the proposed data generation pipeline accompanied by a rigorous analysis of the quality of the generated data. The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets. The proposed dataset will play a vital role in building the next-generation deepfake localization methods.

https://github.com/user-attachments/assets/d91aee8a-0fb5-4dff-ba20-86420332fed5

Dataset

Download

We're hosting the 1M-Deepfakes Detection Challenge at ACM MM 2024.

Baseline Benchmark

| Method | AP@0.5 | AP@0.75 | AP@0.9 | AP@0.95 | AR@50 | AR@20 | AR@10 | AR@5 |
|----------------------------|--------|---------|--------|---------|-------|-------|-------|-------|
| PyAnnote | 00.03 | 00.00 | 00.00 | 00.00 | 00.67 | 00.67 | 00.67 | 00.67 |
| Meso4 | 09.86 | 06.05 | 02.22 | 00.59 | 38.92 | 38.81 | 36.47 | 26.91 |
| MesoInception4 | 08.50 | 05.16 | 01.89 | 00.50 | 39.27 | 39.00 | 35.78 | 24.59 |
| EfficientViT | 14.71 | 02.42 | 00.13 | 00.01 | 27.04 | 26.43 | 23.90 | 20.31 |
| TriDet + VideoMAEv2 | 21.67 | 05.83 | 00.54 | 00.06 | 20.27 | 20.12 | 19.50 | 18.18 |
| TriDet + InternVideo | 29.66 | 09.02 | 00.79 | 00.09 | 24.08 | 23.96 | 23.50 | 22.55 |
| ActionFormer + VideoMAEv2 | 20.24 | 05.73 | 00.57 | 00.07 | 19.97 | 19.81 | 19.11 | 17.80 |
| ActionFormer + InternVideo | 36.08 | 12.01 | 01.23 | 00.16 | 27.11 | 27.00 | 26.60 | 25.80 |
| BA-TFD | 37.37 | 06.34 | 00.19 | 00.02 | 45.55 | 35.95 | 30.66 | 26.82 |
| BA-TFD+ | 44.42 | 13.64 | 00.48 | 00.03 | 48.86 | 40.37 | 34.67 | 29.88 |
| UMMAFormer | 51.64 | 28.07 | 07.65 | 01.58 | 44.07 | 43.45 | 42.09 | 40.27 |
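The AP@IoU and AR@N metrics above operate on 1D temporal segments. As a rough, self-contained illustration of how a temporal IoU and a recall-at-threshold can be computed (this is not the official evaluation code; the `temporal_iou` and `recall_at_iou` helpers are hypothetical):

```python
# Illustrative sketch of temporal IoU and recall at an IoU threshold
# for 1D fake segments. NOT the official avdeepfake1m evaluation code.

def temporal_iou(a, b):
    """IoU of two segments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_iou(ground_truth, predictions, threshold=0.5):
    """Fraction of ground-truth segments matched by some prediction
    with IoU >= threshold (greedy, one prediction per ground truth)."""
    used = set()
    hits = 0
    for gt in ground_truth:
        for i, pred in enumerate(predictions):
            if i not in used and temporal_iou(gt, pred) >= threshold:
                used.add(i)
                hits += 1
                break
    return hits / len(ground_truth) if ground_truth else 0.0

print(temporal_iou((1.0, 2.0), (1.5, 2.5)))  # 0.5 / 1.5 ≈ 0.333
```

The official `ap_ar_1d` in the SDK additionally sorts predictions by confidence and averages over the IoU thresholds and proposal counts listed in the Usage section.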

Metadata Structure

The metadata is a JSON file for each subset (train, val), containing a list of dictionaries with the following fields:

- `file`: the path to the video file.
- `original`: if the current video is fake, the path to the original video; otherwise, the original path in VoxCeleb2.
- `split`: the name of the current subset.
- `modify_type`: the type of modification per modality, one of `["real", "visual_modified", "audio_modified", "both_modified"]`. Deepfake detection performance is evaluated on this field.
- `audio_model`: the audio generation model used for generating this video.
- `fake_segments`: the timestamps of the fake segments. Temporal localization performance is evaluated on this field.
- `audio_fake_segments`: the timestamps of the fake segments in the audio modality.
- `visual_fake_segments`: the timestamps of the fake segments in the visual modality.
- `video_frames`: the number of frames in the video.
- `audio_frames`: the number of frames in the audio.

SDK

We provide a Python library, avdeepfake1m, to load the dataset and run the evaluation.

Installation

```bash
pip install avdeepfake1m
```

Usage

Prepare the dataset as follows.

```
|- train_metadata.json
|- train_metadata
|  |- ...
|- train
|  |- ...
|- val_metadata.json
|- val_metadata
|  |- ...
|- val
|  |- ...
|- test_files.txt
|- test
```

Load the dataset.

```python
from avdeepfake1m.loader import AVDeepfake1mDataModule

# access the Lightning DataModule
dm = AVDeepfake1mDataModule("/path/to/dataset")
```

Evaluate the predictions. First, prepare the predictions as described in the details, then run the following code.

```python
from avdeepfake1m.evaluation import ap_ar_1d, auc

print(ap_ar_1d("<PREDICTION_JSON>", "<METADATA_JSON>", "file", "fake_segments", 1,
    [0.5, 0.75, 0.9, 0.95], [50, 30, 20, 10, 5],
    [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))
print(auc("<PREDICTION_TXT>", "<METADATA_JSON>", "file", "fake_segments"))
```
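The `auc` call scores video-level detection. As a self-contained illustration of what a ROC-AUC over per-video fake scores measures (not the SDK implementation; `roc_auc` here is a hypothetical helper):

```python
def roc_auc(labels, scores):
    """ROC-AUC via the rank statistic: the probability that a random
    fake video (label 1) scores higher than a random real one (label 0),
    with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise form is O(n²) but makes the definition explicit; the SDK reads scores and labels from the prediction TXT and metadata JSON instead.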

License

The dataset is under the EULA. You need to agree and sign the EULA to access the dataset.

The baseline Xception code (`/examples/xception`) is under the MIT license. The BA-TFD/BA-TFD+ code (`/examples/batfd`) is from ControlNet/LAV-DF, under the CC BY-NC 4.0 license.

The other parts of this project are under the CC BY-NC 4.0 license. See LICENSE for details.

References

If you find this work useful in your research, please cite it.

The AV-Deepfake1M++ dataset paper:

```bibtex
@article{cai2025av,
  title={AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations},
  author={Cai, Zhixi and Kuckreja, Kartik and Ghosh, Shreya and Chuchra, Akanksha and Khan, Muhammad Haris and Tariq, Usman and Gedeon, Tom and Dhall, Abhinav},
  journal={arXiv preprint arXiv:2507.20579},
  year={2025}
}
```

The AV-Deepfake1M dataset paper:

```bibtex
@inproceedings{cai2024av,
  title={AV-Deepfake1M: A large-scale LLM-driven audio-visual deepfake dataset},
  author={Cai, Zhixi and Ghosh, Shreya and Adatia, Aman Pankaj and Hayat, Munawar and Dhall, Abhinav and Gedeon, Tom and Stefanov, Kalin},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={7414--7423},
  year={2024},
  doi={10.1145/3664647.3680795}
}
```

The challenge summary paper:

```bibtex
@inproceedings{cai20241m,
  title={1M-Deepfakes Detection Challenge},
  author={Cai, Zhixi and Dhall, Abhinav and Ghosh, Shreya and Hayat, Munawar and Kollias, Dimitrios and Stefanov, Kalin and Tariq, Usman},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={11355--11359},
  year={2024},
  doi={10.1145/3664647.3689145}
}
```

Owner

  • Name: ControlNet
  • Login: ControlNet
  • Kind: user

Study on: Computer Vision | Artificial Intelligence

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you find this work useful in your research, please cite it."
preferred-citation:
  type: conference-paper
  title: "AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset"
  authors:
  - family-names: "Cai"
    given-names: "Zhixi"
  - family-names: "Ghosh"
    given-names: "Shreya"
  - family-names: "Adatia"
    given-names: "Aman Pankaj"
  - family-names: "Hayat"
    given-names: "Munawar"
  - family-names: "Dhall"
    given-names: "Abhinav"
  - family-names: "Stefanov"
    given-names: "Kalin"
  collection-title: "Proceedings of the 32nd ACM International Conference on Multimedia"
  year: 2024
  location:
    name: "Melbourne, Australia"
  start: 7414
  end: 7423
  doi: "10.1145/3664647.3680795"

GitHub Events

Total
  • Issues event: 22
  • Watch event: 63
  • Delete event: 1
  • Issue comment event: 22
  • Push event: 19
  • Pull request event: 2
  • Fork event: 6
  • Create event: 4
Last Year
  • Issues event: 22
  • Watch event: 63
  • Delete event: 1
  • Issue comment event: 22
  • Push event: 19
  • Pull request event: 2
  • Fork event: 6
  • Create event: 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 9
  • Total pull requests: 1
  • Average time to close issues: 27 days
  • Average time to close pull requests: 7 months
  • Total issue authors: 9
  • Total pull request authors: 1
  • Average comments per issue: 1.44
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 9
  • Pull requests: 1
  • Average time to close issues: 27 days
  • Average time to close pull requests: 7 months
  • Issue authors: 9
  • Pull request authors: 1
  • Average comments per issue: 1.44
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • JoeLeelyf (2)
  • chuxiuhong (1)
  • JianjiaGuan (1)
  • isjwdu (1)
  • XuecWu (1)
  • Wsen-Jiang (1)
  • MKlmt (1)
  • 2log2n (1)
  • pplrabbit (1)
  • lds217 (1)
  • jorvredeveld (1)
  • zhangxin-xd (1)
  • 17Skye17 (1)
  • NHeLv1 (1)
  • wangzhiyuan120 (1)
Pull Request Authors
  • haotianll (1)
  • mst-rajatmishra (1)
  • garlic1234567 (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 2,253 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
pypi.org: avdeepfake1m
  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 2,253 Last month
Rankings
Dependent packages count: 10.2%
Average: 33.8%
Dependent repos count: 57.5%
Maintainers (1)
Last synced: 7 months ago