avdeepfake1m
[ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, acm.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.2%) to scientific vocabulary
Repository
[ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
Basic Info
- Host: GitHub
- Owner: ControlNet
- License: other
- Language: Python
- Default Branch: master
- Homepage: https://dl.acm.org/doi/10.1145/3664647.3680795
- Size: 674 KB
Statistics
- Stars: 145
- Watchers: 7
- Forks: 10
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
AV-Deepfake1M
This is the official repository for the paper AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset (Best Award).
News
- [2025/08/03] 🔥 The AV-Deepfake1M++ technical report is on arXiv.
- [2025/04/08] 🏆 The 2025 1M-Deepfakes Challenge starts. A new dataset (2M videos!), AV-Deepfake1M++, is released.
- [2024/10/13] 🚀 The PyPI package is released.
- [2024/07/15] 🔥 The AV-Deepfake1M paper is accepted at ACM MM 2024.
- [2024/03/09] 🏆 The 2024 1M-Deepfakes Challenge starts.
Abstract
The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most of the research efforts in this domain are focused on detecting high-quality deepfake images and videos, only a few works address the problem of the localization of small segments of audio-visual manipulations embedded in real videos. In this research, we emulate the process of such content generation and propose the AV-Deepfake1M dataset. The dataset contains content-driven (i) video manipulations, (ii) audio manipulations, and (iii) audio-visual manipulations for more than 2K subjects resulting in a total of more than 1M videos. The paper provides a thorough description of the proposed data generation pipeline accompanied by a rigorous analysis of the quality of the generated data. The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets. The proposed dataset will play a vital role in building the next-generation deepfake localization methods.
https://github.com/user-attachments/assets/d91aee8a-0fb5-4dff-ba20-86420332fed5
Dataset
Download
We're hosting the 1M-Deepfakes Detection Challenge at ACM MM 2024.
Baseline Benchmark
| Method | AP@0.5 | AP@0.75 | AP@0.9 | AP@0.95 | AR@50 | AR@20 | AR@10 | AR@5 |
|----------------------------|--------|---------|--------|---------|-------|-------|-------|-------|
| PyAnnote | 00.03 | 00.00 | 00.00 | 00.00 | 00.67 | 00.67 | 00.67 | 00.67 |
| Meso4 | 09.86 | 06.05 | 02.22 | 00.59 | 38.92 | 38.81 | 36.47 | 26.91 |
| MesoInception4 | 08.50 | 05.16 | 01.89 | 00.50 | 39.27 | 39.00 | 35.78 | 24.59 |
| EfficientViT | 14.71 | 02.42 | 00.13 | 00.01 | 27.04 | 26.43 | 23.90 | 20.31 |
| TriDet + VideoMAEv2 | 21.67 | 05.83 | 00.54 | 00.06 | 20.27 | 20.12 | 19.50 | 18.18 |
| TriDet + InternVideo | 29.66 | 09.02 | 00.79 | 00.09 | 24.08 | 23.96 | 23.50 | 22.55 |
| ActionFormer + VideoMAEv2 | 20.24 | 05.73 | 00.57 | 00.07 | 19.97 | 19.81 | 19.11 | 17.80 |
| ActionFormer + InternVideo | 36.08 | 12.01 | 01.23 | 00.16 | 27.11 | 27.00 | 26.60 | 25.80 |
| BA-TFD | 37.37 | 06.34 | 00.19 | 00.02 | 45.55 | 35.95 | 30.66 | 26.82 |
| BA-TFD+ | 44.42 | 13.64 | 00.48 | 00.03 | 48.86 | 40.37 | 34.67 | 29.88 |
| UMMAFormer | 51.64 | 28.07 | 07.65 | 01.58 | 44.07 | 43.45 | 42.09 | 40.27 |
Metadata Structure
The metadata for each subset (train, val) is a JSON file containing a list of dictionaries, with the following fields.
- `file`: the path to the video file.
- `original`: if the current video is fake, the path to the original video; otherwise, the original path in VoxCeleb2.
- `split`: the name of the current subset.
- `modify_type`: the type of modification in the different modalities, one of `real`, `visual_modified`, `audio_modified`, or `both_modified`. We evaluate deepfake detection performance based on this field.
- `audio_model`: the audio generation model used for generating this video.
- `fake_segments`: the timestamps of the fake segments. We evaluate temporal localization performance based on this field.
- `audio_fake_segments`: the timestamps of the fake segments in the audio modality.
- `visual_fake_segments`: the timestamps of the fake segments in the visual modality.
- `video_frames`: the number of frames in the video.
- `audio_frames`: the number of frames in the audio.
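A minimal sketch of what one metadata record looks like and how it can be used. All values below are made up for illustration; real records come from loading `train_metadata.json` or `val_metadata.json` with `json.load`.

```python
# A made-up record illustrating the metadata fields above (paths and
# timestamps are hypothetical, not taken from the actual dataset).
record = {
    "file": "train/id00001/00001_fake.mp4",
    "original": "train/id00001/00001.mp4",
    "split": "train",
    "modify_type": "both_modified",
    "audio_model": "vits",
    "fake_segments": [[1.2, 1.8], [3.4, 4.0]],
    "audio_fake_segments": [[1.2, 1.8]],
    "visual_fake_segments": [[3.4, 4.0]],
    "video_frames": 125,
    "audio_frames": 80000,
}

def total_fake_duration(rec):
    """Sum the duration (in seconds) of all fake segments in one record."""
    return sum(end - start for start, end in rec["fake_segments"])

print(round(total_fake_duration(record), 2))  # 1.2
```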
SDK
We provide a Python library, `avdeepfake1m`, for loading the dataset and running the evaluation.
Installation
```bash
pip install avdeepfake1m
```
Usage
Prepare the dataset as follows.

```
|- train_metadata.json
|- train_metadata
|  |- ...
|- train
|  |- ...
|- val_metadata.json
|- val_metadata
|  |- ...
|- val
|  |- ...
|- test_files.txt
|- test
```
Load the dataset.
```python
from avdeepfake1m.loader import AVDeepfake1mDataModule

# access to the Lightning DataModule
dm = AVDeepfake1mDataModule("/path/to/dataset")
```
Evaluate the predictions. First, prepare the predictions as described in the documentation, then run the following code.
```python
from avdeepfake1m.evaluation import ap_ar_1d, auc

print(ap_ar_1d("<PREDICTION_JSON>", "<METADATA_JSON>", "file", "fake_segments", 1, [0.5, 0.75, 0.9, 0.95], [50, 30, 20, 10, 5], [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))
print(auc("<PREDICTION_TXT>", "<METADATA_JSON>", "file", "fake_segments"))
```
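The AP@IoU numbers reported in the benchmark above are based on 1D temporal intersection-over-union between predicted and ground-truth fake segments. A minimal sketch of that matching criterion (not the library's actual implementation; the segments below are made up):

```python
def segment_iou(pred, gt):
    """1D intersection-over-union of two [start, end] segments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A predicted segment counts as a true positive at threshold t when its
# IoU with a ground-truth fake segment is at least t.
pred, gt = [1.0, 2.0], [1.2, 1.8]
iou = segment_iou(pred, gt)      # overlap 0.6 s over union 1.0 s -> 0.6
print(iou >= 0.5, iou >= 0.75)   # True False
```

This is why the AP columns fall off sharply from AP@0.5 to AP@0.95: higher thresholds demand progressively tighter temporal alignment with the ground-truth segments.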
License
The dataset is under the EULA. You need to agree and sign the EULA to access the dataset.
The baseline Xception code in /examples/xception is under the MIT License. The BA-TFD/BA-TFD+ code in /examples/batfd is from ControlNet/LAV-DF and is under the CC BY-NC 4.0 License.
The other parts of this project are under the CC BY-NC 4.0 License. See LICENSE for details.
References
If you find this work useful in your research, please cite it.
The AV-Deepfake1M++ dataset paper:
```bibtex
@article{cai2025av,
  title={AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations},
  author={Cai, Zhixi and Kuckreja, Kartik and Ghosh, Shreya and Chuchra, Akanksha and Khan, Muhammad Haris and Tariq, Usman and Gedeon, Tom and Dhall, Abhinav},
  journal={arXiv preprint arXiv:2507.20579},
  year={2025}
}
```
The AV-Deepfake1M dataset paper:
```bibtex
@inproceedings{cai2024av,
  title={AV-Deepfake1M: A large-scale LLM-driven audio-visual deepfake dataset},
  author={Cai, Zhixi and Ghosh, Shreya and Adatia, Aman Pankaj and Hayat, Munawar and Dhall, Abhinav and Gedeon, Tom and Stefanov, Kalin},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={7414--7423},
  year={2024},
  doi={10.1145/3664647.3680795}
}
```
The challenge summary paper:
```bibtex
@inproceedings{cai20241m,
  title={1M-Deepfakes Detection Challenge},
  author={Cai, Zhixi and Dhall, Abhinav and Ghosh, Shreya and Hayat, Munawar and Kollias, Dimitrios and Stefanov, Kalin and Tariq, Usman},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={11355--11359},
  year={2024},
  doi={10.1145/3664647.3689145}
}
```
Owner
- Name: ControlNet
- Login: ControlNet
- Kind: user
- Website: controlnet.space
- Repositories: 30
- Profile: https://github.com/ControlNet
Study on: Computer Vision | Artificial Intelligence
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you find this work useful in your research, please cite it."
preferred-citation:
  type: conference-paper
  title: "AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset"
  authors:
    - family-names: "Cai"
      given-names: "Zhixi"
    - family-names: "Ghosh"
      given-names: "Shreya"
    - family-names: "Adatia"
      given-names: "Aman Pankaj"
    - family-names: "Hayat"
      given-names: "Munawar"
    - family-names: "Dhall"
      given-names: "Abhinav"
    - family-names: "Stefanov"
      given-names: "Kalin"
  collection-title: "Proceedings of the 32nd ACM International Conference on Multimedia"
  year: 2024
  location:
    name: "Melbourne, Australia"
  start: 7414
  end: 7423
  doi: "10.1145/3664647.3680795"
```
GitHub Events
Total
- Issues event: 22
- Watch event: 63
- Delete event: 1
- Issue comment event: 22
- Push event: 19
- Pull request event: 2
- Fork event: 6
- Create event: 4
Last Year
- Issues event: 22
- Watch event: 63
- Delete event: 1
- Issue comment event: 22
- Push event: 19
- Pull request event: 2
- Fork event: 6
- Create event: 4
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 9
- Total pull requests: 1
- Average time to close issues: 27 days
- Average time to close pull requests: 7 months
- Total issue authors: 9
- Total pull request authors: 1
- Average comments per issue: 1.44
- Average comments per pull request: 1.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 9
- Pull requests: 1
- Average time to close issues: 27 days
- Average time to close pull requests: 7 months
- Issue authors: 9
- Pull request authors: 1
- Average comments per issue: 1.44
- Average comments per pull request: 1.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- JoeLeelyf (2)
- chuxiuhong (1)
- JianjiaGuan (1)
- isjwdu (1)
- XuecWu (1)
- Wsen-Jiang (1)
- MKlmt (1)
- 2log2n (1)
- pplrabbit (1)
- lds217 (1)
- jorvredeveld (1)
- zhangxin-xd (1)
- 17Skye17 (1)
- NHeLv1 (1)
- wangzhiyuan120 (1)
Pull Request Authors
- haotianll (1)
- mst-rajatmishra (1)
- garlic1234567 (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads (pypi): 2,253 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
pypi.org: avdeepfake1m
- Homepage: https://github.com/ControlNet/AV-Deepfake1M
- Documentation: https://avdeepfake1m.readthedocs.io/
- License: Other/Proprietary License
- Latest release: 0.0.4 (published 9 months ago)