DisinfoMM

A multilingual and multimodal disinformation dataset with 5-level veracity labels and supportive evidence. Released with our ICMI 2025 paper.

https://github.com/saisyokan/disinfomm

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.0%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: SaiSyokan
  • License: other
  • Default Branch: main
  • Size: 13.7 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

DisinfoMM

License: CC BY 4.0

DisinfoMM is a multilingual and multimodal dataset for disinformation and out-of-context (OOC) detection, released with our ICMI 2025 paper. The dataset includes fact-checked claims with paired images and multilingual evidence, supporting five-level veracity classification and evidence-aware model training.

📄 Paper: A Multilingual, Multimodal Dataset for Disinformation and Out-of-Context Analysis with Rich Supportive Information (ICMI 2025)


📦 Dataset Overview

  • Languages: English, Italian, Portuguese
  • Modalities: Text (claim), Image
  • Veracity Labels: True, Mostly True, Incomplete, Mostly False, False
  • Supportive Information:
    • Explanation texts
    • Debunking source metadata (verdict, timestamp, keywords, article links)
    • External links and multilingual evidence
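The fields above suggest a simple per-claim record layout. The sketch below shows one way such records could be loaded and validated once the data is released; it assumes a hypothetical JSON Lines layout with illustrative field names (`claim`, `image`, `label`, `language`, `evidence`), which may differ from the actual released schema:

```python
import json

# Hypothetical field names and layout; the released schema may differ.
VERACITY_LABELS = {"True", "Mostly True", "Incomplete", "Mostly False", "False"}
LANGUAGES = {"en", "it", "pt"}  # English, Italian, Portuguese

def parse_records(lines):
    """Yield validated claim records from an iterable of JSON lines
    (e.g. an open .jsonl file)."""
    for line in lines:
        rec = json.loads(line)
        if rec["label"] not in VERACITY_LABELS:
            raise ValueError(f"unknown label: {rec['label']}")
        if rec["language"] not in LANGUAGES:
            raise ValueError(f"unexpected language: {rec['language']}")
        yield {
            "claim": rec["claim"],                # claim text
            "image": rec["image"],                # path/URL of the paired image
            "label": rec["label"],                # 5-level veracity label
            "evidence": rec.get("evidence", []),  # explanations, debunking metadata, links
        }
```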

🧪 Tasks Supported

  • Multimodal disinformation detection
  • Out-of-context image–text analysis
  • Evidence-aware veracity classification
  • Multilingual generalization benchmarking
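Because the five veracity labels form an ordered scale from False to True, one natural setup for the classification task is ordinal regression over cumulative thresholds. A minimal sketch of that label encoding (the ordering convention here is my own choice, not prescribed by the paper):

```python
# Ordinal convention chosen for illustration: 0 = False ... 4 = True.
LABELS = ["False", "Mostly False", "Incomplete", "Mostly True", "True"]
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}

def to_ordinal_targets(label):
    """Encode a 5-level label as cumulative binary targets over the
    4 thresholds between adjacent levels, as in standard ordinal
    regression: 'Incomplete' (id 2) -> [1, 1, 0, 0]."""
    k = LABEL_TO_ID[label]
    return [1 if k > t else 0 for t in range(len(LABELS) - 1)]
```

A model then predicts each threshold independently, and the final label is the number of thresholds crossed.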

🛠️ Repository Structure

```bash
DisinfoMM/
├── data/        # (Coming soon) Dataset files, organized by language or split
├── code/        # (Planned) Baseline models and preprocessing scripts
├── LICENSE
└── README.md
```

🚧 Status

This repository is under preparation. The dataset and code will be released soon.

Please ⭐️ star or watch the repo to stay updated.


📬 Contact

If you have questions or would like early access, feel free to contact:

Shuhan Cui
The University of Tokyo
📧 syokan [at] g.ecc.u-tokyo.ac.jp

Owner

  • Login: SaiSyokan
  • Kind: user

Citation (CITATION.cff)

```yaml
cff-version: 1.2.0
message: "If you use this dataset, please cite the following paper."
title: "A Multilingual, Multimodal Dataset for Disinformation and Out-of-Context Analysis with Rich Supportive Information"
authors:
  - family-names: Cui
    given-names: Shuhan
    affiliation: The University of Tokyo, Tokyo, Japan
  - family-names: Wang
    given-names: Hanrui
    affiliation: National Institute of Informatics, Tokyo, Japan
  - family-names: Chang
    given-names: Ching-Chun
    affiliation: National Institute of Informatics, Tokyo, Japan
  - family-names: Nguyen
    given-names: Huy H.
    affiliation: National Institute of Informatics, Tokyo, Japan
  - family-names: Echizen
    given-names: Isao
    affiliation: National Institute of Informatics, Tokyo, Japan
date-released: 2025-10-13
version: "1.0"
doi: 10.1145/3716553.3750813
url: https://github.com/SaiSyokan/DisinfoMM
conference:
  name: "Proceedings of the 27th International Conference on Multimodal Interaction (ICMI '25)"
  location: "Canberra, ACT, Australia"
  dates: "October 13–17, 2025"
isbn: "979-8-4007-1499-3/2025/10"
license: CC-BY-4.0
type: dataset
```
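For LaTeX users, the CITATION.cff fields above translate to roughly the following BibTeX entry (the citation key and entry type are my own choices, not provided by the repository):

```
@inproceedings{cui2025disinfomm,
  title     = {A Multilingual, Multimodal Dataset for Disinformation and
               Out-of-Context Analysis with Rich Supportive Information},
  author    = {Cui, Shuhan and Wang, Hanrui and Chang, Ching-Chun and
               Nguyen, Huy H. and Echizen, Isao},
  booktitle = {Proceedings of the 27th International Conference on
               Multimodal Interaction (ICMI '25)},
  year      = {2025},
  doi       = {10.1145/3716553.3750813}
}
```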

GitHub Events

Total
  • Push event: 3
Last Year
  • Push event: 3