https://github.com/alimzade/synthetic_data_efficacy

Exploring the use of synthetic data (via DDPM) in MRI-based brain tumor classification. Includes data quality evaluation (FID, IS) and classification with a modified VGG-19 CNN across four tumor classes.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Alimzade
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 299 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

Exploring the Efficacy of Synthetic Data in MRI-Based Brain Tumor Classification

Abstract

Deep learning has advanced medical image classification but remains heavily reliant on large, diverse datasets, which poses ethical and practical challenges. This study investigates the role of synthetic data, generated using Denoising Diffusion Probabilistic Models (DDPM), in addressing data scarcity in brain tumor MRI classification. Synthetic images were evaluated for fidelity and diversity using the Fréchet Inception Distance (FID) and the Inception Score (IS), which indicated high quality for specific classes. A modified VGG-19 CNN classified MRIs into four classes: glioma, meningioma, pituitary, and no tumor. Experiments with varying real-to-synthetic data ratios revealed that synthetic data can enhance precision and recall for certain classes, though often at the cost of accuracy and generalization. Performance peaked at specific ratios, indicating an optimal balance between real and synthetic data. Fine-tuning with combined datasets improved metrics for underrepresented classes but yielded results comparable to those of models trained solely on real data. These findings underscore the potential of synthetic data to augment medical imaging datasets and address data scarcity, while emphasizing the importance of balanced integration. Future research should focus on validating synthetic data through expert review, refining its quality, and testing its applicability across diverse datasets.
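The abstract names the Inception Score as one of the two image-quality metrics. As a rough illustration of what that number measures, the sketch below computes the standard definition, IS = exp(mean over images of KL(p(y|x) || p(y))), from a list of per-image class-probability vectors. It is not taken from the repository's notebooks: the function name `inception_score` and the toy inputs are hypothetical, and in practice the probability vectors would come from a pretrained Inception network rather than being written by hand.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities.

    `probs` is a list of probability vectors p(y|x), one per image,
    each summing to 1. Hypothetical helper for illustration only.
    """
    n = len(probs)
    k = len(probs[0])

    # Marginal class distribution p(y): the average prediction over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]

    # Mean KL divergence KL(p(y|x) || p(y)); zero-probability terms contribute 0.
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(
            pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0
        )
    mean_kl /= n

    return math.exp(mean_kl)


# Toy inputs (assumed, not real model outputs).
uniform = [[0.25, 0.25, 0.25, 0.25]] * 3          # uninformative predictions
one_hot = [[1, 0, 0, 0], [0, 1, 0, 0],
           [0, 0, 1, 0], [0, 0, 0, 1]]            # confident and evenly spread

score_uniform = inception_score(uniform)
score_one_hot = inception_score(one_hot)
```

With these toy inputs the score is at its minimum of 1 when every prediction is uninformative, and reaches the number of classes when predictions are both confident and evenly spread across classes, which is why a higher IS is read as a sign of sharper and more diverse samples.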

Contributors

  • Meher Aisha - Carried out introductory research and background study.
  • Emily Vorderwlbeke - Conducted data analysis and preprocessing.
  • Anar Alimzade - Developed and implemented generative and classification models for all experiments.
  • Hadi Sulaiman - Performed evaluation and interpretation of results.

Details

This study was conducted as part of the "Data Science Lab" course at the University of Passau. It investigates the effectiveness of synthetic data augmentation in the classification of brain tumor types using the Brain Tumor MRI Dataset.
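The experiments vary the ratio of real to synthetic training data. The source does not state how those mixes were built, so the following is only a minimal sketch under stated assumptions: the training set has a fixed total size, the synthetic share is given as a fraction of that total, and samples are drawn without replacement. The function name `mix_real_and_synthetic`, its parameters, and the placeholder sample lists are all hypothetical.

```python
import random


def mix_real_and_synthetic(real, synthetic, synthetic_fraction, total, seed=0):
    """Build a shuffled training list of size `total` in which
    `synthetic_fraction` of the items are synthetic.

    Hypothetical helper; the fixed-total-size convention is an assumption.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be between 0 and 1")

    n_syn = round(total * synthetic_fraction)
    n_real = total - n_syn
    if n_real > len(real) or n_syn > len(synthetic):
        raise ValueError("not enough samples for the requested mix")

    # A seeded generator keeps the mix reproducible across runs.
    rng = random.Random(seed)
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_syn)
    rng.shuffle(mixed)
    return mixed


# Placeholder items standing in for (source, image index) pairs.
real_items = [("real", i) for i in range(100)]
synthetic_items = [("synthetic", i) for i in range(100)]

mixed = mix_real_and_synthetic(real_items, synthetic_items,
                               synthetic_fraction=0.25, total=80)
n_synthetic = sum(1 for source, _ in mixed if source == "synthetic")
```

Holding the total size fixed separates the effect of the mix from the effect of simply having more training data; keeping all real images and adding synthetic ones on top would be an equally plausible convention.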

project_ddpm

Owner

  • Login: Alimzade
  • Kind: user

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2