DisinfoMM

A multilingual and multimodal disinformation dataset with 5-level veracity labels and supportive evidence. Released with our ICMI 2025 paper.

https://github.com/saisyokan/disinfomm

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.0%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: SaiSyokan
  • License: other
  • Default Branch: main
  • Size: 13.7 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

DisinfoMM

License: CC BY 4.0

DisinfoMM is a multilingual and multimodal dataset for disinformation and out-of-context (OOC) detection, released with our ICMI 2025 paper. The dataset includes fact-checked claims with paired images and multilingual evidence, supporting five-level veracity classification and evidence-aware model training.

📄 Paper: A Multilingual, Multimodal Dataset for Disinformation and Out-of-Context Analysis with Rich Supportive Information (ICMI 2025)


📦 Dataset Overview

  • Languages: English, Italian, Portuguese
  • Modalities: Text (claim), Image
  • Veracity Labels: True, Mostly True, Incomplete, Mostly False, False
  • Supportive Information:
    • Explanation texts
    • Debunking source metadata (verdict, timestamp, keywords, article links)
    • External links and multilingual evidence
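The fields above suggest a simple per-claim record layout. The sketch below shows one way such records could be loaded and validated once the data is released; it assumes a hypothetical JSON Lines layout with illustrative field names (`claim`, `image`, `label`, `language`, `evidence`), which may differ from the actual released schema:

```python
import json

# Hypothetical field names and layout; the released schema may differ.
VERACITY_LABELS = {"True", "Mostly True", "Incomplete", "Mostly False", "False"}
LANGUAGES = {"en", "it", "pt"}  # English, Italian, Portuguese

def parse_records(lines):
    """Yield validated claim records from an iterable of JSON lines
    (e.g. an open .jsonl file)."""
    for line in lines:
        rec = json.loads(line)
        if rec["label"] not in VERACITY_LABELS:
            raise ValueError(f"unknown label: {rec['label']}")
        if rec["language"] not in LANGUAGES:
            raise ValueError(f"unexpected language: {rec['language']}")
        yield {
            "claim": rec["claim"],                # claim text
            "image": rec["image"],                # path/URL of the paired image
            "label": rec["label"],                # 5-level veracity label
            "evidence": rec.get("evidence", []),  # explanations, debunking metadata, links
        }
```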

🧪 Tasks Supported

  • Multimodal disinformation detection
  • Out-of-context image–text analysis
  • Evidence-aware veracity classification
  • Multilingual generalization benchmarking
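Because the five veracity labels form an ordered scale from False to True, one natural setup for the classification task is ordinal regression over cumulative thresholds. A minimal sketch of that label encoding (the ordering convention here is my own choice, not prescribed by the paper):

```python
# Ordinal convention chosen for illustration: 0 = False ... 4 = True.
LABELS = ["False", "Mostly False", "Incomplete", "Mostly True", "True"]
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}

def to_ordinal_targets(label):
    """Encode a 5-level label as cumulative binary targets over the
    4 thresholds between adjacent levels, as in standard ordinal
    regression: 'Incomplete' (id 2) -> [1, 1, 0, 0]."""
    k = LABEL_TO_ID[label]
    return [1 if k > t else 0 for t in range(len(LABELS) - 1)]
```

A model then predicts each threshold independently, and the final label is the number of thresholds crossed.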

🛠️ Repository Structure

```bash
DisinfoMM/
├── data/        # (Coming soon) Dataset files, organized by language or split
├── code/        # (Planned) Baseline models and preprocessing scripts
├── LICENSE
└── README.md
```

🚧 Status

This repository is under preparation. The dataset and code will be released soon.

Please ⭐️ star or watch the repo to stay updated.


📬 Contact

If you have questions or would like early access, feel free to contact:

Shuhan Cui
The University of Tokyo
📧 syokan [at] g.ecc.u-tokyo.ac.jp

Owner

  • Login: SaiSyokan
  • Kind: user

Citation (CITATION.cff)

```yaml
cff-version: 1.2.0
message: "If you use this dataset, please cite the following paper."
title: "A Multilingual, Multimodal Dataset for Disinformation and Out-of-Context Analysis with Rich Supportive Information"
authors:
  - family-names: Cui
    given-names: Shuhan
    affiliation: The University of Tokyo, Tokyo, Japan
  - family-names: Wang
    given-names: Hanrui
    affiliation: National Institute of Informatics, Tokyo, Japan
  - family-names: Chang
    given-names: Ching-Chun
    affiliation: National Institute of Informatics, Tokyo, Japan
  - family-names: Nguyen
    given-names: Huy H.
    affiliation: National Institute of Informatics, Tokyo, Japan
  - family-names: Echizen
    given-names: Isao
    affiliation: National Institute of Informatics, Tokyo, Japan
date-released: 2025-10-13
version: "1.0"
doi: 10.1145/3716553.3750813
url: https://github.com/SaiSyokan/DisinfoMM
conference:
  name: "Proceedings of the 27th International Conference on Multimodal Interaction (ICMI '25)"
  location: "Canberra, ACT, Australia"
  dates: "October 13–17, 2025"
isbn: "979-8-4007-1499-3/2025/10"
license: CC-BY-4.0
type: dataset
```
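For LaTeX users, the CITATION.cff fields above translate to roughly the following BibTeX entry (the citation key and entry type are my own choices, not provided by the repository):

```
@inproceedings{cui2025disinfomm,
  title     = {A Multilingual, Multimodal Dataset for Disinformation and
               Out-of-Context Analysis with Rich Supportive Information},
  author    = {Cui, Shuhan and Wang, Hanrui and Chang, Ching-Chun and
               Nguyen, Huy H. and Echizen, Isao},
  booktitle = {Proceedings of the 27th International Conference on
               Multimodal Interaction (ICMI '25)},
  year      = {2025},
  doi       = {10.1145/3716553.3750813}
}
```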

GitHub Events

Total
  • Push event: 3
Last Year
  • Push event: 3