avdeepfake1m
[ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, acm.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.2%) to scientific vocabulary
Repository
[ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
Basic Info
- Host: GitHub
- Owner: ControlNet
- License: other
- Language: Python
- Default Branch: master
- Homepage: https://dl.acm.org/doi/10.1145/3664647.3680795
- Size: 674 KB
Statistics
- Stars: 145
- Watchers: 7
- Forks: 10
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
AV-Deepfake1M
This is the official repository for the paper AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset (Best Award).
News
- [2025/08/03] 🔥 The AV-Deepfake1M++ technical report is on arXiv.
- [2025/04/08] 🏆 The 2025 1M-Deepfakes Challenge starts. A new dataset (2M videos!), AV-Deepfake1M++, is released.
- [2024/10/13] 🚀 The PyPI package is released.
- [2024/07/15] 🔥 The AV-Deepfake1M paper is accepted at ACM MM 2024.
- [2024/03/09] 🏆 The 2024 1M-Deepfakes Challenge starts.
Abstract
The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most of the research efforts in this domain are focused on detecting high-quality deepfake images and videos, only a few works address the problem of the localization of small segments of audio-visual manipulations embedded in real videos. In this research, we emulate the process of such content generation and propose the AV-Deepfake1M dataset. The dataset contains content-driven (i) video manipulations, (ii) audio manipulations, and (iii) audio-visual manipulations for more than 2K subjects resulting in a total of more than 1M videos. The paper provides a thorough description of the proposed data generation pipeline accompanied by a rigorous analysis of the quality of the generated data. The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets. The proposed dataset will play a vital role in building the next-generation deepfake localization methods.
https://github.com/user-attachments/assets/d91aee8a-0fb5-4dff-ba20-86420332fed5
Dataset
Download
We're hosting the 1M-Deepfakes Detection Challenge at ACM MM 2024.
Baseline Benchmark
| Method | AP@0.5 | AP@0.75 | AP@0.9 | AP@0.95 | AR@50 | AR@20 | AR@10 | AR@5 |
|----------------------------|--------|---------|--------|---------|-------|-------|-------|-------|
| PyAnnote | 00.03 | 00.00 | 00.00 | 00.00 | 00.67 | 00.67 | 00.67 | 00.67 |
| Meso4 | 09.86 | 06.05 | 02.22 | 00.59 | 38.92 | 38.81 | 36.47 | 26.91 |
| MesoInception4 | 08.50 | 05.16 | 01.89 | 00.50 | 39.27 | 39.00 | 35.78 | 24.59 |
| EfficientViT | 14.71 | 02.42 | 00.13 | 00.01 | 27.04 | 26.43 | 23.90 | 20.31 |
| TriDet + VideoMAEv2 | 21.67 | 05.83 | 00.54 | 00.06 | 20.27 | 20.12 | 19.50 | 18.18 |
| TriDet + InternVideo | 29.66 | 09.02 | 00.79 | 00.09 | 24.08 | 23.96 | 23.50 | 22.55 |
| ActionFormer + VideoMAEv2 | 20.24 | 05.73 | 00.57 | 00.07 | 19.97 | 19.81 | 19.11 | 17.80 |
| ActionFormer + InternVideo | 36.08 | 12.01 | 01.23 | 00.16 | 27.11 | 27.00 | 26.60 | 25.80 |
| BA-TFD | 37.37 | 06.34 | 00.19 | 00.02 | 45.55 | 35.95 | 30.66 | 26.82 |
| BA-TFD+ | 44.42 | 13.64 | 00.48 | 00.03 | 48.86 | 40.37 | 34.67 | 29.88 |
| UMMAFormer | 51.64 | 28.07 | 07.65 | 01.58 | 44.07 | 43.45 | 42.09 | 40.27 |
Metadata Structure
The metadata for each subset (train, val) is a JSON file containing a list of dictionaries, with the following fields.
- `file`: the path to the video file.
- `original`: if the current video is fake, the path to the original video; otherwise, the original path in VoxCeleb2.
- `split`: the name of the current subset.
- `modify_type`: the type of modification in the different modalities, one of `real`, `visual_modified`, `audio_modified`, or `both_modified`. We evaluate deepfake detection performance based on this field.
- `audio_model`: the audio generation model used for generating this video.
- `fake_segments`: the timestamps of the fake segments. We evaluate temporal localization performance based on this field.
- `audio_fake_segments`: the timestamps of the fake segments in the audio modality.
- `visual_fake_segments`: the timestamps of the fake segments in the visual modality.
- `video_frames`: the number of frames in the video.
- `audio_frames`: the number of frames in the audio.
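A minimal sketch of what one metadata record looks like and how it can be used. All values below are made up for illustration; real records come from loading `train_metadata.json` or `val_metadata.json` with `json.load`.

```python
# A made-up record illustrating the metadata fields above (paths and
# timestamps are hypothetical, not taken from the actual dataset).
record = {
    "file": "train/id00001/00001_fake.mp4",
    "original": "train/id00001/00001.mp4",
    "split": "train",
    "modify_type": "both_modified",
    "audio_model": "vits",
    "fake_segments": [[1.2, 1.8], [3.4, 4.0]],
    "audio_fake_segments": [[1.2, 1.8]],
    "visual_fake_segments": [[3.4, 4.0]],
    "video_frames": 125,
    "audio_frames": 80000,
}

def total_fake_duration(rec):
    """Sum the duration (in seconds) of all fake segments in one record."""
    return sum(end - start for start, end in rec["fake_segments"])

print(round(total_fake_duration(record), 2))  # 1.2
```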
SDK
We provide a Python library, `avdeepfake1m`, for loading the dataset and running the evaluation.
Installation
```bash
pip install avdeepfake1m
```
Usage
Prepare the dataset as follows.

```
|- train_metadata.json
|- train_metadata
|  |- ...
|- train
|  |- ...
|- val_metadata.json
|- val_metadata
|  |- ...
|- val
|  |- ...
|- test_files.txt
|- test
```
Load the dataset.
```python
from avdeepfake1m.loader import AVDeepfake1mDataModule

# access to the Lightning DataModule
dm = AVDeepfake1mDataModule("/path/to/dataset")
```
Evaluate the predictions. First, prepare the predictions as described in the documentation, then run the following code.
```python
from avdeepfake1m.evaluation import ap_ar_1d, auc

print(ap_ar_1d("<PREDICTION_JSON>", "<METADATA_JSON>", "file", "fake_segments", 1, [0.5, 0.75, 0.9, 0.95], [50, 30, 20, 10, 5], [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))
print(auc("<PREDICTION_TXT>", "<METADATA_JSON>", "file", "fake_segments"))
```
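The AP@IoU numbers reported in the benchmark above are based on 1D temporal intersection-over-union between predicted and ground-truth fake segments. A minimal sketch of that matching criterion (not the library's actual implementation; the segments below are made up):

```python
def segment_iou(pred, gt):
    """1D intersection-over-union of two [start, end] segments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A predicted segment counts as a true positive at threshold t when its
# IoU with a ground-truth fake segment is at least t.
pred, gt = [1.0, 2.0], [1.2, 1.8]
iou = segment_iou(pred, gt)      # overlap 0.6 s over union 1.0 s -> 0.6
print(iou >= 0.5, iou >= 0.75)   # True False
```

This is why the AP columns fall off sharply from AP@0.5 to AP@0.95: higher thresholds demand progressively tighter temporal alignment with the ground-truth segments.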
License
The dataset is under the EULA. You need to agree and sign the EULA to access the dataset.
The baseline Xception code in /examples/xception is under the MIT License. The BA-TFD/BA-TFD+ code in /examples/batfd is from ControlNet/LAV-DF and is under the CC BY-NC 4.0 License.
The other parts of this project are under the CC BY-NC 4.0 License. See LICENSE for details.
References
If you find this work useful in your research, please cite it.
The AV-Deepfake1M++ dataset paper:
```bibtex
@article{cai2025av,
  title={AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations},
  author={Cai, Zhixi and Kuckreja, Kartik and Ghosh, Shreya and Chuchra, Akanksha and Khan, Muhammad Haris and Tariq, Usman and Gedeon, Tom and Dhall, Abhinav},
  journal={arXiv preprint arXiv:2507.20579},
  year={2025}
}
```
The AV-Deepfake1M dataset paper:
```bibtex
@inproceedings{cai2024av,
  title={AV-Deepfake1M: A large-scale LLM-driven audio-visual deepfake dataset},
  author={Cai, Zhixi and Ghosh, Shreya and Adatia, Aman Pankaj and Hayat, Munawar and Dhall, Abhinav and Gedeon, Tom and Stefanov, Kalin},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={7414--7423},
  year={2024},
  doi={10.1145/3664647.3680795}
}
```
The challenge summary paper:
```bibtex
@inproceedings{cai20241m,
  title={1M-Deepfakes Detection Challenge},
  author={Cai, Zhixi and Dhall, Abhinav and Ghosh, Shreya and Hayat, Munawar and Kollias, Dimitrios and Stefanov, Kalin and Tariq, Usman},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={11355--11359},
  year={2024},
  doi={10.1145/3664647.3689145}
}
```
Owner
- Name: ControlNet
- Login: ControlNet
- Kind: user
- Website: controlnet.space
- Repositories: 30
- Profile: https://github.com/ControlNet
Study on: Computer Vision | Artificial Intelligence
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you find this work useful in your research, please cite it."
preferred-citation:
  type: conference-paper
  title: "AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset"
  authors:
    - family-names: "Cai"
      given-names: "Zhixi"
    - family-names: "Ghosh"
      given-names: "Shreya"
    - family-names: "Adatia"
      given-names: "Aman Pankaj"
    - family-names: "Hayat"
      given-names: "Munawar"
    - family-names: "Dhall"
      given-names: "Abhinav"
    - family-names: "Stefanov"
      given-names: "Kalin"
  collection-title: "Proceedings of the 32nd ACM International Conference on Multimedia"
  year: 2024
  location:
    name: "Melbourne, Australia"
  start: 7414
  end: 7423
  doi: "10.1145/3664647.3680795"
```
GitHub Events
Total
- Issues event: 22
- Watch event: 63
- Delete event: 1
- Issue comment event: 22
- Push event: 19
- Pull request event: 2
- Fork event: 6
- Create event: 4
Last Year
- Issues event: 22
- Watch event: 63
- Delete event: 1
- Issue comment event: 22
- Push event: 19
- Pull request event: 2
- Fork event: 6
- Create event: 4
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 9
- Total pull requests: 1
- Average time to close issues: 27 days
- Average time to close pull requests: 7 months
- Total issue authors: 9
- Total pull request authors: 1
- Average comments per issue: 1.44
- Average comments per pull request: 1.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 9
- Pull requests: 1
- Average time to close issues: 27 days
- Average time to close pull requests: 7 months
- Issue authors: 9
- Pull request authors: 1
- Average comments per issue: 1.44
- Average comments per pull request: 1.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- JoeLeelyf (2)
- chuxiuhong (1)
- JianjiaGuan (1)
- isjwdu (1)
- XuecWu (1)
- Wsen-Jiang (1)
- MKlmt (1)
- 2log2n (1)
- pplrabbit (1)
- lds217 (1)
- jorvredeveld (1)
- zhangxin-xd (1)
- 17Skye17 (1)
- NHeLv1 (1)
- wangzhiyuan120 (1)
Pull Request Authors
- haotianll (1)
- mst-rajatmishra (1)
- garlic1234567 (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads (pypi): 2,253 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
pypi.org: avdeepfake1m
- Homepage: https://github.com/ControlNet/AV-Deepfake1M
- Documentation: https://avdeepfake1m.readthedocs.io/
- License: Other/Proprietary License
- Latest release: 0.0.4 (published 9 months ago)