breastmammo
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: UeenHuynh
- License: bsd-2-clause
- Language: Jupyter Notebook
- Default Branch: main
- Size: 60.1 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Mammogram Deep Learning Pipeline for Breast Cancer Detection
A comprehensive deep learning pipeline for mammogram classification supporting multiple datasets, model architectures, and advanced training techniques for breast cancer detection research.
🔬 Project Overview
This project implements a robust mammogram classification system designed for breast cancer detection using state-of-the-art deep learning techniques. The pipeline supports:
- Binary Classification: Benign vs. Malignant detection
- Multi-class Classification: Normal, Benign, Malignant classification
- ROI Processing: Region of Interest extraction and analysis
- Transfer Learning: Pre-trained models with two-phase training
- Data Augmentation: Advanced augmentation techniques including elastic transforms, mixup, and cutmix
- Mixed Precision Training: Optimized training with FP16 support
📊 Supported Datasets
| Dataset | Type | Classes | Image Size | Description |
|---------|------|---------|------------|-------------|
| mini-MIAS | Multi-class | Normal, Benign, Malignant | 1024×1024 | Mammographic Image Analysis Society database |
| mini-MIAS-binary | Binary | Benign, Malignant | 1024×1024 | Binary version of mini-MIAS |
| CBIS-DDSM | Binary | Benign, Malignant | 512×512 | Curated Breast Imaging Subset of DDSM |
| CMMD | Binary | Benign, Malignant | 224×224 | Chinese Mammography Database |
| INbreast | Multi-class | Normal, Benign, Malignant | 224×224 | INbreast database with BI-RADS classification |
Dataset-Specific Features
- CBIS-DDSM: Supports mammogram type filtering (`calc`, `mass`, `all`)
- INbreast: BI-RADS mapping with automatic pectoral muscle removal
- CMMD: Optimized for binary classification with balanced augmentation
🏗️ Model Architectures
Available Models
| Model | Type | Input Size | Description |
|-------|------|------------|-------------|
| CNN | Custom | Variable | Custom CNN architecture |
| VGG | Pre-trained | 224×224 or 1024×1024 | VGG19 with transfer learning |
| VGG-common | Pre-trained | 224×224 | Standard VGG19 configuration |
| ResNet | Pre-trained | 224×224 | ResNet50 architecture |
| Inception | Pre-trained | 224×224 | InceptionV3 model |
| DenseNet | Pre-trained | 224×224 | DenseNet121 architecture |
| MobileNet | Pre-trained | 224×224 | MobileNetV2 for efficient inference |
Model-Specific Configurations
- Custom CNN: Adapts to input data dimensions automatically
- Pre-trained Models: Support two-phase training (frozen → unfrozen layers)
- Input Channels: 3 channels for pre-trained models, 1 channel for custom CNN
🛠️ Installation
Requirements
```bash
pip install "tensorflow>=2.8.0"
pip install scikit-learn
pip install opencv-python
pip install pandas
pip install numpy
pip install matplotlib
pip install pydicom
pip install tensorflow-io
```
Project Structure
```
mammogram-pipeline/
├── src/
│   ├── main.py             # Main training script
│   ├── main2.py            # INbreast-optimized script
│   ├── mainruncmmd.py      # CMMD-optimized script
│   ├── config.py           # Configuration parameters
│   ├── cnn_models/         # Model architectures
│   ├── data_operations/    # Data preprocessing
│   └── utils.py            # Utility functions
├── data/                   # Dataset directory
├── saved_models/           # Trained model storage
└── output/                 # Results and visualizations
```
🚀 Usage Examples
Basic Training Commands
CMMD Dataset with MobileNet
```bash
python src/main.py -d CMMD -m MobileNet -r train -b 8 -lr 1e-3 -e1 50 -e2 50
```
mini-MIAS with VGG
```bash
python src/main.py -d mini-MIAS -m VGG -r train -b 4 -lr 1e-4 -e1 100 -e2 50
```
CBIS-DDSM with ResNet (Mass only)
```bash
python src/main.py -d CBIS-DDSM -mt mass -m ResNet -r train -b 2 -lr 1e-3
```
INbreast with Pectoral Muscle Removal
```bash
python src/main2.py -d INbreast -m DenseNet -r train --remove_pectoral -b 4
```
Testing Pre-trained Models
```bash
# Test CMMD model
python src/main.py -d CMMD -m MobileNet -r test

# Test with ROI processing
python src/main.py -d INbreast -m VGG -r test --roi
```
Advanced Training Options
```bash
# With data augmentation and mixed precision
python src/main2.py -d INbreast -m MobileNet -r train \
  --applyelastic --applymixup --remove_pectoral -b 8 -lr 1e-3

# Custom CNN with specific parameters
python src/main.py -d mini-MIAS -m CNN -r train -b 16 -lr 1e-2 -e1 200
```
⚙️ Configuration Parameters
Key Parameters in config.py
| Parameter | Default | Description |
|-----------|---------|-------------|
| batch_size | 8 | Training batch size |
| learning_rate | 1e-3 | Initial learning rate |
| early_stopping_patience | 10 | Epochs before early stopping |
| reduce_lr_patience | 5 | Epochs before LR reduction |
| reduce_lr_factor | 0.5 | LR reduction factor |
| min_learning_rate | 1e-6 | Minimum learning rate |
| augment_data | True | Enable data augmentation |
Dataset-Specific Image Sizes
```python
MINI_MIAS_IMG_SIZE = {"HEIGHT": 1024, "WIDTH": 1024}
CMMD_IMG_SIZE = {"HEIGHT": 224, "WIDTH": 224}
INBREAST_IMG_SIZE = {"HEIGHT": 224, "WIDTH": 224}
VGG_IMG_SIZE = {"HEIGHT": 224, "WIDTH": 224}
MOBILE_NET_IMG_SIZE = {"HEIGHT": 224, "WIDTH": 224}
```
BI-RADS Mapping (INbreast)
```python
BI_RADS_MAPPING = {
    "Normal": ["BI-RADS 1"],
    "Benign": ["BI-RADS 2", "BI-RADS 3"],
    "Malignant": ["BI-RADS 4a", "BI-RADS 4b", "BI-RADS 4c", "BI-RADS 5", "BI-RADS 6"],
}
```
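In practice this mapping is used in the inverse direction, resolving a sample's BI-RADS category to its class label. A minimal sketch of that lookup (the `birads_to_class` helper name is illustrative, not part of the pipeline's API):

```python
# Class-to-categories mapping as documented above.
BI_RADS_MAPPING = {
    "Normal": ["BI-RADS 1"],
    "Benign": ["BI-RADS 2", "BI-RADS 3"],
    "Malignant": ["BI-RADS 4a", "BI-RADS 4b", "BI-RADS 4c", "BI-RADS 5", "BI-RADS 6"],
}

# Inverted lookup: each BI-RADS category resolves to exactly one class label.
BIRADS_TO_CLASS = {
    birads: label
    for label, categories in BI_RADS_MAPPING.items()
    for birads in categories
}

def birads_to_class(birads: str) -> str:
    """Return the class label for a BI-RADS category, e.g. 'BI-RADS 4b' -> 'Malignant'."""
    try:
        return BIRADS_TO_CLASS[birads]
    except KeyError:
        raise ValueError(f"Unknown BI-RADS category: {birads!r}")
```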
📁 Main Scripts Overview
main.py - General Purpose Script
- Supports all datasets and models
- Standard training pipeline
- Suitable for most experiments
main2.py - INbreast Optimized
- Specialized for INbreast dataset
- Automatic pectoral muscle removal
- Mixed precision training
- Advanced augmentation options
mainruncmmd.py - CMMD Optimized
- Optimized for CMMD dataset
- Efficient binary classification
- Streamlined preprocessing
🎯 Advanced Features
ROI Processing
```bash
# Enable ROI extraction
python src/main.py -d INbreast -m VGG --roi
```
Two-Phase Training
- Phase 1: Frozen pre-trained layers (`max_epoch_frozen`)
- Phase 2: Unfrozen fine-tuning (`max_epoch_unfrozen`)
Data Augmentation
- Rotation, scaling, shearing
- Elastic deformation
- Mixup and CutMix techniques
- CLAHE enhancement
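Of these, mixup is simple enough to sketch in a few lines of NumPy: each image and its one-hot label are blended with a randomly chosen partner from the same batch, using a coefficient drawn from a Beta distribution. This is a generic illustration of the technique, not the pipeline's exact implementation:

```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2, rng=None):
    """Mixup: convexly blend each sample with a random partner from the batch.

    images: float array (batch, H, W, C); labels: one-hot array (batch, classes).
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient in (0, 1)
    perm = rng.permutation(len(images))     # partner index for each sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels
```

Because the labels are blended with the same coefficient as the images, the targets become soft probabilities rather than hard one-hot vectors.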
Mixed Precision Training
Automatically enabled in main2.py for faster training and reduced memory usage.
🏋️ Training Process
Standard Workflow
- Data Loading: Dataset-specific preprocessing
- Model Creation: Architecture selection and compilation
- Phase 1 Training: Frozen pre-trained layers
- Phase 2 Training: Fine-tuning all layers
- Evaluation: Performance metrics and visualization
Early Stopping & LR Scheduling
- Monitor validation loss for early stopping
- Reduce learning rate on plateau
- Save best model weights automatically
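The plateau rule behind the learning-rate schedule can be stated explicitly. The sketch below mirrors the behaviour of Keras's `ReduceLROnPlateau` using the defaults listed in `config.py`; the `PlateauScheduler` class is illustrative, not the pipeline's own code:

```python
class PlateauScheduler:
    """Halve the learning rate when validation loss stalls for `patience` epochs."""

    def __init__(self, lr=1e-3, factor=0.5, patience=5, min_lr=1e-6):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best = float("inf")  # best validation loss seen so far
        self.wait = 0             # epochs without improvement

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss  # improvement: reset the patience counter
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                # Plateau reached: shrink the LR, but never below min_lr.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr
```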
Class Weight Handling
Automatic class weight calculation for imbalanced datasets to improve minority class performance.
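Balanced class weights are conventionally the inverse of each class's frequency. A minimal NumPy sketch of that rule, matching scikit-learn's "balanced" heuristic (not necessarily the pipeline's exact code):

```python
import numpy as np

def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c]); rare classes get larger weights."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))
```

The resulting dict can be passed directly as the `class_weight` argument of a Keras `model.fit` call.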
📊 Model Evaluation
The pipeline provides comprehensive evaluation including:
- Accuracy and loss curves
- Confusion matrices
- Classification reports
- ROC curves and AUC scores
- Grad-CAM visualizations (where applicable)
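A confusion matrix reduces to counting (true, predicted) pairs; a generic NumPy sketch, independent of the pipeline's own evaluation code:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

For the binary benign/malignant case, the diagonal holds correct predictions and `cm[1, 0]` counts missed malignant cases, the clinically costly error.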
🔬 Research Applications
This pipeline has been designed for:
- Comparative studies of deep learning architectures
- Transfer learning effectiveness analysis
- Data augmentation impact assessment
- ROI vs. full image classification comparison
- Multi-dataset generalization studies
📚 Citation
If you use this pipeline in your research, please cite:
```bibtex
@software{mammogram_dl_pipeline,
  title    = {Mammogram Deep Learning Pipeline for Breast Cancer Detection},
  abstract = {Breast cancer claims 11,400 lives on average every year in the UK, making it one of the deadliest diseases. This pipeline explores various deep learning techniques for mammogram classification using CNNs with transfer learning approaches.},
  license  = {BSD-2-Clause},
  year     = {2024}
}
```
📄 License
This project is licensed under the BSD-2-Clause License.
🤝 Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests to improve the pipeline.
📞 Support
For questions or issues, please refer to the documentation or create an issue in the repository.
Citation (CITATION.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  Adamouization/Breast-Cancer-Detection-Mammogram-Deep-Learning-Publication:
  PLOS ONE Submission
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Adam
    family-names: Jaamour
    email: a.jaamour@bath.edu
    orcid: 'https://orcid.org/0000-0002-8298-1302'
    affiliation: University of St Andrews
  - given-names: Craig
    family-names: Myles
    affiliation: University of St Andrews
    orcid: 'https://orcid.org/0000-0002-2701-3149'
identifiers:
  - type: doi
    value: 10.5281/zenodo.7980706
repository-code: >-
  https://github.com/Adamouization/Breast-Cancer-Detection-Mammogram-Deep-Learning-Publication
url: >-
  https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0280841
abstract: >-
  Breast cancer claims 11,400 lives on average every year in
  the UK, making it one of the deadliest diseases. Mammography
  is the gold standard for detecting early signs of breast
  cancer, which can help cure the disease during its early
  stages. However, incorrect mammography diagnoses are common
  and may harm patients through unnecessary treatments and
  operations (or a lack of treatment). Therefore, systems that
  can learn to detect breast cancer on their own could help
  reduce the number of incorrect interpretations and missed
  cases. Various deep learning techniques, which can be used
  to implement a system that learns how to detect instances of
  breast cancer in mammograms, are explored throughout this
  paper. Convolution Neural Networks (CNNs) are used as part
  of a pipeline based on deep learning techniques. A divide
  and conquer approach is followed to analyse the effects on
  performance and efficiency when utilising diverse deep
  learning techniques such as varying network architectures
  (VGG19, ResNet50, InceptionV3, DenseNet121, MobileNetV2),
  class weights, input sizes, image ratios, pre-processing
  techniques, transfer learning, dropout rates, and types of
  mammogram projections. This approach serves as a starting
  point for model development of mammography classification
  tasks. Practitioners can benefit from this work by using the
  divide and conquer results to select the most suitable deep
  learning techniques for their case out-of-the-box, thus
  reducing the need for extensive exploratory experimentation.
  Multiple techniques are found to provide accuracy gains
  relative to a general baseline (VGG19 model using uncropped
  512 × 512 pixels input images with a dropout rate of 0.2 and
  a learning rate of 1 × 10−3) on the Curated Breast Imaging
  Subset of DDSM (CBIS-DDSM) dataset. These techniques involve
  transfer learning pre-trained ImageNet weights to a
  MobileNetV2 architecture, with pre-trained weights from a
  binarised version of the mini Mammography Image Analysis
  Society (mini-MIAS) dataset applied to the fully connected
  layers of the model, coupled with using weights to alleviate
  class imbalance, and splitting CBIS-DDSM samples between
  images of masses and calcifications. Using these techniques,
  a 5.6% gain in accuracy over the baseline model was
  accomplished. Other deep learning techniques from the divide
  and conquer approach, such as larger image sizes, do not
  yield increased accuracies without the use of image
  pre-processing techniques such as Gaussian filtering,
  histogram equalisation and input cropping.
keywords:
  - machine-learning
  - deep-learning
  - convolutional-neural-network
  - cnn
  - breast-cancer-detection
  - mammogram-classification
  - plos-one
license: BSD-2-Clause
commit: bc82a51cf1105d6bd24a9c35928d7f625eb456ef
version: '1.2'
date-released: '2023-05-29'
```
GitHub Events
Total
- Push event: 304
- Public event: 1
Last Year
- Push event: 304
- Public event: 1