disinfomm
A multilingual and multimodal disinformation dataset with 5-level veracity labels and supportive evidence. Released with our ICMI 2025 paper.
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: SaiSyokan
- License: other
- Default Branch: main
- Size: 13.7 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
DisinfoMM
DisinfoMM is a multilingual and multimodal dataset for disinformation and out-of-context (OOC) detection, released with our ICMI 2025 paper. The dataset includes fact-checked claims with paired images and multilingual evidence, supporting five-level veracity classification and evidence-aware model training.
📦 Dataset Overview
- Languages: English, Italian, Portuguese
- Modalities: Text (claim), Image
- Veracity Labels: True, Mostly True, Incomplete, Mostly False, False
- Supportive Information:
  - Explanation texts
  - Debunking source metadata (verdict, timestamp, keywords, article links)
  - External links and multilingual evidence
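Since the dataset files are not yet published, the on-disk schema is unknown. The following is only a minimal sketch of how the five veracity labels and the supportive information listed above might be represented in Python. Every field name in `Record` is hypothetical, and treating the labels as an ordered scale with `Incomplete` in the middle is an assumption based on the order in which they are listed.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Veracity(IntEnum):
    """Five-level veracity scale (ordering is an assumption, not from the paper)."""
    FALSE = 0
    MOSTLY_FALSE = 1
    INCOMPLETE = 2
    MOSTLY_TRUE = 3
    TRUE = 4


# Map the label strings listed in the README to enum members.
LABEL_TO_VERACITY = {
    "False": Veracity.FALSE,
    "Mostly False": Veracity.MOSTLY_FALSE,
    "Incomplete": Veracity.INCOMPLETE,
    "Mostly True": Veracity.MOSTLY_TRUE,
    "True": Veracity.TRUE,
}


@dataclass
class Record:
    # All field names below are hypothetical placeholders.
    claim: str                      # claim text
    image_path: str                 # path to the paired image
    language: str                   # e.g. "en", "it", "pt"
    label: Veracity                 # five-level veracity label
    explanation: Optional[str] = None
    evidence_links: List[str] = field(default_factory=list)


def parse_label(raw: str) -> Veracity:
    """Convert a raw label string to a Veracity member, ignoring outer whitespace."""
    return LABEL_TO_VERACITY[raw.strip()]
```

Using an `IntEnum` keeps the labels usable both as class indices for a classifier and as an ordinal scale, should the released data support that reading.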
🧪 Tasks Supported
- Multimodal disinformation detection
- Out-of-context image–text analysis
- Evidence-aware veracity classification
- Multilingual generalization benchmarking
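For the multilingual benchmarking task, one simple way to report results is to break a metric down by language. The sketch below computes per-language accuracy from `(language, gold_label, predicted_label)` triples. The function name and the input layout are illustrative assumptions; this is not the evaluation protocol from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_language(
    rows: Iterable[Tuple[str, str, str]],
) -> Dict[str, float]:
    """Return accuracy per language.

    rows: hypothetical (language, gold_label, predicted_label) triples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, gold, predicted in rows:
        total[language] += 1
        if gold == predicted:
            correct[language] += 1
    # Every language in `total` has at least one row, so no division by zero.
    return {lang: correct[lang] / total[lang] for lang in total}
```

Reporting scores per language rather than pooled makes it easier to see whether a model trained on one language transfers to the others.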
🛠️ Repository Structure
DisinfoMM/
├── data/ # (Coming soon) Dataset files, organized by language or split
├── code/ # (Planned) Baseline models and preprocessing scripts
├── LICENSE
└── README.md
🚧 Status
This repository is under preparation. The dataset and code will be released soon.
Please ⭐️ star or watch the repo to stay updated.
📬 Contact
If you have questions or would like early access, feel free to contact:
Shuhan Cui
The University of Tokyo
📧 syokan [at] g.ecc.u-tokyo.ac.jp
Owner
- Login: SaiSyokan
- Kind: user
- Repositories: 1
- Profile: https://github.com/SaiSyokan
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this dataset, please cite the following paper."
title: "A Multilingual, Multimodal Dataset for Disinformation and Out-of-Context Analysis with Rich Supportive Information"
authors:
  - family-names: Cui
    given-names: Shuhan
    affiliation: The University of Tokyo, Tokyo, Japan
  - family-names: Wang
    given-names: Hanrui
    affiliation: National Institute of Informatics, Tokyo, Japan
  - family-names: Chang
    given-names: Ching-Chun
    affiliation: National Institute of Informatics, Tokyo, Japan
  - family-names: Nguyen
    given-names: Huy H.
    affiliation: National Institute of Informatics, Tokyo, Japan
  - family-names: Echizen
    given-names: Isao
    affiliation: National Institute of Informatics, Tokyo, Japan
date-released: 2025-10-13
version: "1.0"
doi: 10.1145/3716553.3750813
url: https://github.com/SaiSyokan/DisinfoMM
conference:
  name: "Proceedings of the 27th International Conference on Multimodal Interaction (ICMI '25)"
  location: "Canberra, ACT, Australia"
  dates: "October 13–17, 2025"
isbn: "979-8-4007-1499-3/2025/10"
license: CC-BY-4.0
type: dataset
GitHub Events
Total
- Push event: 3
Last Year
- Push event: 3