Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: UeenHuynh
  • License: bsd-2-clause
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 60.1 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

Mammogram Deep Learning Pipeline for Breast Cancer Detection

A comprehensive deep learning pipeline for mammogram classification supporting multiple datasets, model architectures, and advanced training techniques for breast cancer detection research.

🔬 Project Overview

This project implements a mammogram classification system for breast cancer detection, built on convolutional neural networks and transfer learning. The pipeline supports:

  • Binary Classification: Benign vs. Malignant detection
  • Multi-class Classification: Normal, Benign, Malignant classification
  • ROI Processing: Region of Interest extraction and analysis
  • Transfer Learning: Pre-trained models with two-phase training
  • Data Augmentation: Advanced augmentation techniques including elastic transforms, mixup, and cutmix
  • Mixed Precision Training: Optimized training with FP16 support

📊 Supported Datasets

| Dataset | Type | Classes | Image Size | Description |
|---------|------|---------|------------|-------------|
| mini-MIAS | Multi-class | Normal, Benign, Malignant | 1024×1024 | Mammographic Image Analysis Society database |
| mini-MIAS-binary | Binary | Benign, Malignant | 1024×1024 | Binary version of mini-MIAS |
| CBIS-DDSM | Binary | Benign, Malignant | 512×512 | Curated Breast Imaging Subset of DDSM |
| CMMD | Binary | Benign, Malignant | 224×224 | Chinese Mammography Database |
| INbreast | Multi-class | Normal, Benign, Malignant | 224×224 | INbreast database with BI-RADS classification |

Dataset-Specific Features

  • CBIS-DDSM: Supports mammogram type filtering (calc, mass, all)
  • INbreast: BI-RADS mapping with automatic pectoral muscle removal
  • CMMD: Optimized for binary classification with balanced augmentation

🏗️ Model Architectures

Available Models

| Model | Type | Input Size | Description |
|-------|------|------------|-------------|
| CNN | Custom | Variable | Custom CNN architecture |
| VGG | Pre-trained | 224×224 or 1024×1024 | VGG19 with transfer learning |
| VGG-common | Pre-trained | 224×224 | Standard VGG19 configuration |
| ResNet | Pre-trained | 224×224 | ResNet50 architecture |
| Inception | Pre-trained | 224×224 | InceptionV3 model |
| DenseNet | Pre-trained | 224×224 | DenseNet121 architecture |
| MobileNet | Pre-trained | 224×224 | MobileNetV2 for efficient inference |

Model-Specific Configurations

  • Custom CNN: Adapts to input data dimensions automatically
  • Pre-trained Models: Support two-phase training (frozen → unfrozen layers)
  • Input Channels: 3 channels for pre-trained models, 1 channel for custom CNN
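Mammograms are single-channel grayscale, while ImageNet pre-trained models expect three channels. A common way to bridge the gap is to replicate the grayscale channel three times. The NumPy sketch below shows that approach; whether the pipeline does exactly this is an assumption, and `to_three_channels` is a hypothetical helper name.

```python
import numpy as np

def to_three_channels(image: np.ndarray) -> np.ndarray:
    """Replicate a grayscale image of shape (H, W) or (H, W, 1) into (H, W, 3).

    Hypothetical helper; assumes channels-last layout, as Keras uses by default.
    """
    if image.ndim == 2:
        image = image[..., np.newaxis]      # (H, W) -> (H, W, 1)
    if image.shape[-1] != 1:
        raise ValueError(f"expected 1 channel, got {image.shape[-1]}")
    return np.repeat(image, 3, axis=-1)     # (H, W, 1) -> (H, W, 3)

gray = np.random.rand(224, 224).astype(np.float32)
rgb = to_three_channels(gray)
print(gray.shape, "->", rgb.shape)  # (224, 224) -> (224, 224, 3)
```

The custom CNN can keep the single-channel input, so the conversion only needs to run for the pre-trained architectures.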

🛠️ Installation

Requirements

```bash
pip install "tensorflow>=2.8.0"   # quoted so the shell does not treat > as a redirect
pip install scikit-learn
pip install opencv-python
pip install pandas
pip install numpy
pip install matplotlib
pip install pydicom
pip install tensorflow-io
```

Project Structure

```
mammogram-pipeline/
├── src/
│   ├── main.py            # Main training script
│   ├── main2.py           # INbreast-optimized script
│   ├── mainruncmmd.py     # CMMD-optimized script
│   ├── config.py          # Configuration parameters
│   ├── cnn_models/        # Model architectures
│   ├── data_operations/   # Data preprocessing
│   └── utils.py           # Utility functions
├── data/                  # Dataset directory
├── saved_models/          # Trained model storage
└── output/                # Results and visualizations
```

🚀 Usage Examples

Basic Training Commands

CMMD Dataset with MobileNet

```bash
python src/main.py -d CMMD -m MobileNet -r train -b 8 -lr 1e-3 -e1 50 -e2 50
```

mini-MIAS with VGG

```bash
python src/main.py -d mini-MIAS -m VGG -r train -b 4 -lr 1e-4 -e1 100 -e2 50
```

CBIS-DDSM with ResNet (Mass only)

```bash
python src/main.py -d CBIS-DDSM -mt mass -m ResNet -r train -b 2 -lr 1e-3
```

INbreast with Pectoral Muscle Removal

```bash
python src/main2.py -d INbreast -m DenseNet -r train --remove_pectoral -b 4
```

Testing Pre-trained Models

```bash
# Test CMMD model
python src/main.py -d CMMD -m MobileNet -r test

# Test with ROI processing
python src/main.py -d INbreast -m VGG -r test --roi
```

Advanced Training Options

```bash
# With data augmentation (mixed precision is enabled automatically in main2.py)
python src/main2.py -d INbreast -m MobileNet -r train \
    --applyelastic --applymixup --remove_pectoral -b 8 -lr 1e-3

# Custom CNN with specific parameters
python src/main.py -d mini-MIAS -m CNN -r train -b 16 -lr 1e-2 -e1 200
```
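The flags used in the commands above could be declared with `argparse` roughly as follows. This is a hypothetical reconstruction: only the flag spellings come from the examples, while the destination names, choices and defaults are assumptions (the batch size and learning rate defaults mirror the `config.py` table), so check `src/main.py` for the real definitions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI; dest names and defaults are assumptions.
    parser = argparse.ArgumentParser(description="Mammogram classification pipeline")
    parser.add_argument("-d", dest="dataset", required=True,
                        help="mini-MIAS, mini-MIAS-binary, CBIS-DDSM, CMMD or INbreast")
    parser.add_argument("-mt", dest="mammogram_type", default="all",
                        choices=["calc", "mass", "all"], help="CBIS-DDSM subset")
    parser.add_argument("-m", dest="model", required=True,
                        help="CNN, VGG, VGG-common, ResNet, Inception, DenseNet or MobileNet")
    parser.add_argument("-r", dest="runmode", default="train", choices=["train", "test"])
    parser.add_argument("-b", dest="batch_size", type=int, default=8)
    parser.add_argument("-lr", dest="learning_rate", type=float, default=1e-3)
    # None means "fall back to the value in config.py" (assumption).
    parser.add_argument("-e1", dest="max_epoch_frozen", type=int, default=None)
    parser.add_argument("-e2", dest="max_epoch_unfrozen", type=int, default=None)
    parser.add_argument("--roi", action="store_true", help="train/test on ROI crops")
    parser.add_argument("--remove_pectoral", action="store_true")
    parser.add_argument("--applyelastic", action="store_true")
    parser.add_argument("--applymixup", action="store_true")
    return parser

args = build_parser().parse_args(
    "-d CBIS-DDSM -mt mass -m ResNet -r train -b 2 -lr 1e-3".split()
)
print(args.dataset, args.mammogram_type, args.batch_size, args.learning_rate)
# CBIS-DDSM mass 2 0.001
```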

⚙️ Configuration Parameters

Key Parameters in config.py

| Parameter | Default | Description |
|-----------|---------|-------------|
| batch_size | 8 | Training batch size |
| learning_rate | 1e-3 | Initial learning rate |
| early_stopping_patience | 10 | Epochs before early stopping |
| reduce_lr_patience | 5 | Epochs before LR reduction |
| reduce_lr_factor | 0.5 | LR reduction factor |
| min_learning_rate | 1e-6 | Minimum learning rate |
| augment_data | True | Enable data augmentation |

Dataset-Specific Image Sizes

```python
MINI_MIAS_IMG_SIZE = {"HEIGHT": 1024, "WIDTH": 1024}
CMMD_IMG_SIZE = {"HEIGHT": 224, "WIDTH": 224}
INBREAST_IMG_SIZE = {"HEIGHT": 224, "WIDTH": 224}
VGG_IMG_SIZE = {"HEIGHT": 224, "WIDTH": 224}
MOBILE_NET_IMG_SIZE = {"HEIGHT": 224, "WIDTH": 224}
```

BI-RADS Mapping (INbreast)

```python
BI_RADS_MAPPING = {
    "Normal": ["BI-RADS 1"],
    "Benign": ["BI-RADS 2", "BI-RADS 3"],
    "Malignant": ["BI-RADS 4a", "BI-RADS 4b", "BI-RADS 4c", "BI-RADS 5", "BI-RADS 6"],
}
```
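To label an individual INbreast image, the mapping has to be read in reverse, from BI-RADS category to class. A minimal sketch, assuming the categories arrive as strings in exactly the form used in `BI_RADS_MAPPING`; `CATEGORY_TO_CLASS` and `birads_to_class` are hypothetical names.

```python
BI_RADS_MAPPING = {
    "Normal": ["BI-RADS 1"],
    "Benign": ["BI-RADS 2", "BI-RADS 3"],
    "Malignant": ["BI-RADS 4a", "BI-RADS 4b", "BI-RADS 4c", "BI-RADS 5", "BI-RADS 6"],
}

# Invert once: each BI-RADS category points to exactly one class label.
CATEGORY_TO_CLASS = {
    category: label
    for label, categories in BI_RADS_MAPPING.items()
    for category in categories
}

def birads_to_class(category: str) -> str:
    """Return Normal, Benign or Malignant for a BI-RADS string (hypothetical helper)."""
    try:
        return CATEGORY_TO_CLASS[category.strip()]
    except KeyError:
        raise ValueError(f"unmapped BI-RADS category: {category!r}") from None

print(birads_to_class("BI-RADS 4b"))  # Malignant
```

Raising on unknown categories, rather than silently defaulting, surfaces annotation strings that do not match the expected format.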

📁 Main Scripts Overview

main.py - General Purpose Script

  • Supports all datasets and models
  • Standard training pipeline
  • Suitable for most experiments

main2.py - INbreast Optimized

  • Specialized for INbreast dataset
  • Automatic pectoral muscle removal
  • Mixed precision training
  • Advanced augmentation options

mainruncmmd.py - CMMD Optimized

  • Optimized for CMMD dataset
  • Efficient binary classification
  • Streamlined preprocessing

🎯 Advanced Features

ROI Processing

```bash
# Enable ROI extraction
python src/main.py -d INbreast -m VGG --roi
```
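The README does not describe how ROIs are extracted. As one plausible approach, the sketch below crops a square patch centred on an annotated abnormality and shifts it to stay inside the image. The centre-plus-radius annotation format (mini-MIAS, for example, annotates abnormalities with a centre and an approximate radius), the minimum patch size and the `crop_roi` helper are assumptions; converting annotation coordinates to array row/column indices is left to the caller.

```python
import numpy as np

def crop_roi(image: np.ndarray, row: int, col: int, radius: int,
             min_size: int = 64) -> np.ndarray:
    """Crop a square patch around (row, col), kept inside the image (hypothetical helper).

    The side is max(2 * radius, min_size) so small annotations still give a usable
    patch; near a border the patch is shifted inward rather than shrunk.
    """
    h, w = image.shape[:2]
    side = min(max(2 * radius, min_size), h, w)
    top = int(np.clip(row - side // 2, 0, h - side))
    left = int(np.clip(col - side // 2, 0, w - side))
    return image[top:top + side, left:left + side]

mammogram = np.zeros((1024, 1024), dtype=np.uint8)
patch = crop_roi(mammogram, row=20, col=1000, radius=40)
print(patch.shape)  # (80, 80)
```

The patch would then be resized to the model's input size (for example 224×224) before training.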

Two-Phase Training

  1. Phase 1: Frozen pre-trained layers (max_epoch_frozen)
  2. Phase 2: Unfrozen fine-tuning (max_epoch_unfrozen)
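The two phases amount to changing which parameters the optimiser may update. In Keras this is typically done by setting `trainable = False` on the pre-trained base, training, then setting it back to `True` and recompiling, often with a lower learning rate. The sketch below illustrates the same idea without TensorFlow, using a toy NumPy model: a linear "backbone" whose random weights stand in for pre-trained ones, feeding a logistic head. The data, epoch counts and learning rates are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float)       # toy binary labels

W = rng.normal(scale=0.5, size=(8, 4))          # stands in for pre-trained backbone weights
v = np.zeros(4)                                 # freshly initialised classification head

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W, v, epochs, lr, train_backbone):
    """Full-batch gradient descent on binary cross-entropy."""
    for _ in range(epochs):
        h = X @ W                               # backbone features
        p = sigmoid(h @ v)                      # predicted probability
        err = (p - y) / len(y)                  # dLoss/dlogit, averaged over samples
        grad_v = h.T @ err
        if train_backbone:                      # phase 2 only: backpropagate into W
            grad_W = X.T @ np.outer(err, v)
            W = W - lr * grad_W
        v = v - lr * grad_v
    return W, v

def accuracy(W, v):
    return float(np.mean((sigmoid(X @ W @ v) > 0.5) == y))

W_pre = W.copy()
W, v = train(W, v, epochs=100, lr=0.5, train_backbone=False)   # phase 1: frozen backbone
assert np.array_equal(W, W_pre)                                # backbone left untouched
acc1 = accuracy(W, v)
W, v = train(W, v, epochs=100, lr=0.05, train_backbone=True)   # phase 2: fine-tune all
acc2 = accuracy(W, v)
print(f"after phase 1: {acc1:.2f}, after phase 2: {acc2:.2f}")
```

Training the head first means phase 2 starts from a sensible classifier, so the large, noisy gradients of an untrained head do not disturb the pre-trained weights.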

Data Augmentation

  • Rotation, scaling, shearing
  • Elastic deformation
  • Mixup and CutMix techniques
  • CLAHE enhancement
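Mixup and CutMix both blend pairs of training images together with their labels. The NumPy sketch below follows the standard formulations: mixup takes a convex combination with λ drawn from Beta(α, α), and CutMix pastes a random box from a second image and mixes the labels by the area actually kept. The function names, the α defaults and the one-hot label format are assumptions, not taken from the pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two images and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def cutmix(x1, y1, x2, y2, alpha=1.0):
    """Paste a random box from x2 into x1; mix labels by the area kept from x1."""
    h, w = x1.shape[:2]
    lam = rng.beta(alpha, alpha)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)                   # random box centre
    top, bottom = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    left, right = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    mixed = x1.copy()
    mixed[top:bottom, left:right] = x2[top:bottom, left:right]
    kept = 1 - (bottom - top) * (right - left) / (h * w)       # recomputed after clipping
    return mixed, kept * y1 + (1 - kept) * y2

x_benign, x_malignant = np.zeros((224, 224, 1)), np.ones((224, 224, 1))
y_benign, y_malignant = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_mix, y_mix = mixup(x_benign, y_benign, x_malignant, y_malignant)
x_cut, y_cut = cutmix(x_benign, y_benign, x_malignant, y_malignant)
```

Recomputing the label weight after clipping matters for CutMix: a box near the image edge is smaller than the sampled λ implies, and the label should reflect the pixels that were actually replaced.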

Mixed Precision Training

Automatically enabled in main2.py for faster training and reduced memory usage.

🏋️ Training Process

Standard Workflow

  1. Data Loading: Dataset-specific preprocessing
  2. Model Creation: Architecture selection and compilation
  3. Phase 1 Training: Frozen pre-trained layers
  4. Phase 2 Training: Fine-tuning all layers
  5. Evaluation: Performance metrics and visualization

Early Stopping & LR Scheduling

  • Monitor validation loss for early stopping
  • Reduce learning rate on plateau
  • Save best model weights automatically
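How the `config.py` defaults interact (early stopping patience 10, LR-reduction patience 5, factor 0.5, floor 1e-6) can be simulated in plain Python. In Keras these behaviours come from the `EarlyStopping` and `ReduceLROnPlateau` callbacks; the loop below is a simplified re-implementation for illustration only (it ignores `min_delta` and cooldown), and `run_schedule` is a hypothetical name.

```python
def run_schedule(val_losses, lr=1e-3, es_patience=10, lr_patience=5,
                 factor=0.5, min_lr=1e-6):
    """Replay validation losses; return (stop_epoch, best_epoch, lr after each epoch)."""
    best = float("inf")
    best_epoch = 0
    since_best = 0   # epochs without improvement (early-stopping counter)
    lr_wait = 0      # epochs without improvement since the last LR cut
    lrs = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch       # point where best weights would be saved
            since_best = lr_wait = 0
        else:
            since_best += 1
            lr_wait += 1
            if lr_wait >= lr_patience:           # plateau: cut the LR, but not below min_lr
                lr = max(lr * factor, min_lr)
                lr_wait = 0
        lrs.append(lr)
        if since_best >= es_patience:
            return epoch, best_epoch, lrs
    return len(val_losses) - 1, best_epoch, lrs

stop, best, lrs = run_schedule([1.0, 0.8, 0.7] + [0.75] * 12)
print(stop, best, lrs[-1])  # 12 2 0.00025
```

With these defaults the learning rate is halved twice during a 10-epoch plateau before training stops, so fine-tuning gets two chances at a smaller step size.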

Class Weight Handling

Automatic class weight calculation for imbalanced datasets to improve minority class performance.
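A common formula for such weights, and the one scikit-learn uses for `class_weight="balanced"`, is n_samples / (n_classes × count of the class). Whether the pipeline uses exactly this formula is an assumption. The sketch below returns a dict in the form Keras `Model.fit(class_weight=...)` accepts; `balanced_class_weights` is a hypothetical name.

```python
import numpy as np

def balanced_class_weights(labels):
    """Map each class index to n_samples / (n_classes * count)."""
    labels_arr = np.asarray(labels)
    classes, counts = np.unique(labels_arr, return_counts=True)
    weights = len(labels_arr) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# e.g. 300 benign (class 0) vs 100 malignant (class 1)
labels = [0] * 300 + [1] * 100
print(balanced_class_weights(labels))  # {0: 0.6666666666666666, 1: 2.0}
```

With these weights each class contributes the same total weight to the loss (here 200 per class), which is what pushes the model to pay more attention to the minority class.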

📊 Model Evaluation

The pipeline provides evaluation outputs including:

  • Accuracy and loss curves
  • Confusion matrices
  • Classification reports
  • ROC curves and AUC scores
  • Grad-CAM visualizations (where applicable)
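For reference, two of the metrics listed above can be computed from scratch as follows, which makes their definitions explicit; in practice scikit-learn's `confusion_matrix` and `roc_auc_score` do the same job. The AUC uses the rank (Mann-Whitney) formulation for the binary case, and the pairwise comparison is only meant for small evaluation sets.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def binary_auc(y_true, scores):
    """P(random positive scores higher than random negative); ties count as 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]
print(confusion_matrix(y_true, y_pred, 2))
print(binary_auc(y_true, scores))  # 0.75
```

AUC is threshold-free, whereas the confusion matrix depends on the 0.5 cut-off used to produce `y_pred`; reporting both is useful for imbalanced mammography datasets.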

🔬 Research Applications

This pipeline has been designed for:

  • Comparative studies of deep learning architectures
  • Transfer learning effectiveness analysis
  • Data augmentation impact assessment
  • ROI vs. full image classification comparison
  • Multi-dataset generalization studies

📚 Citation

If you use this pipeline in your research, please cite:

```bibtex
@software{mammogram_dl_pipeline,
  title    = {Mammogram Deep Learning Pipeline for Breast Cancer Detection},
  abstract = {Breast cancer claims 11,400 lives on average every year in the UK, making it one of the deadliest diseases. This pipeline explores various deep learning techniques for mammogram classification using CNNs with transfer learning approaches.},
  license  = {BSD-2-Clause},
  year     = {2024}
}
```

📄 License

This project is licensed under the BSD-2-Clause License.

🤝 Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests to improve the pipeline.

📞 Support

For questions or issues, please refer to the documentation or create an issue in the repository.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Adamouization/Breast-Cancer-Detection-Mammogram-Deep-Learning-Publication:
  PLOS ONE Submission
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Adam
    family-names: Jaamour
    email: a.jaamour@bath.edu
    orcid: 'https://orcid.org/0000-0002-8298-1302'
    affiliation: University of St Andrews
  - given-names: Craig
    family-names: Myles
    affiliation: University of St Andrews
    orcid: 'https://orcid.org/0000-0002-2701-3149'
identifiers:
  - type: doi
    value: 10.5281/zenodo.7980706
repository-code: >-
  https://github.com/Adamouization/Breast-Cancer-Detection-Mammogram-Deep-Learning-Publication
url: >-
  https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0280841
abstract: >-
  Breast cancer claims 11,400 lives on average every year in
  the UK, making it one of the deadliest diseases.
  Mammography is the gold standard for detecting early signs
  of breast cancer, which can help cure the disease during
  its early stages. However, incorrect mammography diagnoses
  are common and may harm patients through unnecessary
  treatments and operations (or a lack of treatment).
  Therefore, systems that can learn to detect breast cancer
  on their own could help reduce the number of incorrect
  interpretations and missed cases. Various deep learning
  techniques, which can be used to implement a system that
  learns how to detect instances of breast cancer in
  mammograms, are explored throughout this paper.
  Convolutional Neural Networks (CNNs) are used as part of a
  pipeline based on deep learning techniques. A divide and
  conquer approach is followed to analyse the effects on
  performance and efficiency when utilising diverse deep
  learning techniques such as varying network architectures
  (VGG19, ResNet50, InceptionV3, DenseNet121, MobileNetV2),
  class weights, input sizes, image ratios, pre-processing
  techniques, transfer learning, dropout rates, and types of
  mammogram projections. This approach serves as a starting
  point for model development of mammography classification
  tasks. Practitioners can benefit from this work by using
  the divide and conquer results to select the most suitable
  deep learning techniques for their case out-of-the-box,
  thus reducing the need for extensive exploratory
  experimentation. Multiple techniques are found to provide
  accuracy gains relative to a general baseline (VGG19 model
  using uncropped 512 × 512 pixels input images with a
  dropout rate of 0.2 and a learning rate of 1 × 10−3) on
  the Curated Breast Imaging Subset of DDSM (CBIS-DDSM)
  dataset. These techniques involve transfer learning
  pre-trained ImageNet weights to a MobileNetV2
  architecture, with pre-trained weights from a binarised
  version of the mini Mammography Image Analysis Society
  (mini-MIAS) dataset applied to the fully connected layers
  of the model, coupled with using weights to alleviate
  class imbalance, and splitting CBIS-DDSM samples between
  images of masses and calcifications. Using these
  techniques, a 5.6% gain in accuracy over the baseline
  model was accomplished. Other deep learning techniques
  from the divide and conquer approach, such as larger image
  sizes, do not yield increased accuracies without the use
  of image pre-processing techniques such as Gaussian
  filtering, histogram equalisation and input cropping.
keywords:
  - machine-learning
  - deep-learning
  - convolutional-neural-network
  - cnn
  - breast-cancer-detection
  - mammogram-classification
  - plos-one
license: BSD-2-Clause
commit: bc82a51cf1105d6bd24a9c35928d7f625eb456ef
version: '1.2'
date-released: '2023-05-29'

GitHub Events

Total
  • Push event: 304
  • Public event: 1
Last Year
  • Push event: 304
  • Public event: 1