breastdcedl

BreastDCEDL: pretreatment MRI scans of 2,070 breast cancer patients. A deep learning-ready DCE-MRI breast cancer dataset from the I-SPY1, I-SPY2, and Duke cohorts.

https://github.com/naomifridman/breastdcedl

Keywords

ai breast-cancer dataset dce-mri deep-learning dicom-files her2 mri mri-data pcr vit

Repository

Basic Info
Statistics
  • Stars: 6
  • Watchers: 1
  • Forks: 3
  • Open Issues: 1
  • Releases: 1
Created about 1 year ago · Last pushed 7 months ago
Metadata Files
Readme License Zenodo

README.md

BreastDCEDL

BreastDCEDL is a curated collection of pretreatment 3D dynamic contrast-enhanced MRI (DCE-MRI) scans from 2,070 breast cancer patients, assembled into a deep learning-ready dataset. It integrates data from three major clinical trials: I-SPY2 (n = 982), I-SPY1 (n = 173), and Duke (n = 916). The dataset, originally sourced from The Cancer Imaging Archive (TCIA), includes:

  • 3D raw MRI scans converted to NIfTI format
  • Corresponding 3D tumor binary segmentation masks
  • Clinical and demographic metadata, including pCR, HER2, HR, age, and race

Article

BreastDCEDL: Curating a Comprehensive DCE-MRI Dataset and Developing a Transformer Implementation
Read on arXiv

Description:
This work introduces BreastDCEDL, a carefully assembled dataset of pre-treatment 3D DCE-MRI scans, and presents a transformer-based deep learning approach for analyzing these images. The dataset brings together imaging data from multiple sources to support research in breast cancer detection and diagnosis, while the accompanying transformer implementation demonstrates state-of-the-art performance on this challenging medical imaging task.


BreastDCEDL Data Download

  • I-SPY1
    The complete I-SPY1 dataset is available for direct download from Zenodo:
    https://zenodo.org/records/15627233

  • Duke
    The full Duke cohort can be accessed via The Cancer Imaging Archive (TCIA).
    Conversion to NIfTI format can be performed using the provided code in the DUKE/ folder.
    A minimized version, containing three (nz, 256×256) tumor-centered scans per patient, is available on Zenodo:
    https://zenodo.org/records/15627233

  • I-SPY2
    The full I-SPY2 dataset is available on TCIA and can be converted to NIfTI using the code in this repository.
    A pre-converted NIfTI version will be made available on TCIA in the near future.

Benchmark Prediction Tasks

The dataset provides a standardized benchmark for three central classification tasks in breast cancer MRI:

  • Pathological Complete Response (pCR): A binary classification task predicting treatment response from pretreatment imaging. Approximately 32.2% of patients (n = 317) achieved pCR, offering a moderately balanced class distribution. pCR is the complete disappearance of all invasive cancer cells in the breast and lymph nodes following neoadjuvant therapy and is considered a strong surrogate for favorable long-term prognosis.

  • Hormone Receptor (HR) Status: Classification of HR positivity (present in 54.5% of cases, n = 537) directly from imaging, assessing the link between MRI features and receptor expression.

  • HER2 Status: Prediction of HER2 expression (positive in 24.8%, n = 244) from imaging data, enabling evaluation of MRI-based biomarker inference.

Ground truth labels include HR, HER2, and pCR status, as well as molecular subtypes (HR+/HER2-, HER2+, Triple Negative) and MammaPrint risk categories. The dataset is split into training, validation, and test cohorts with preserved class distributions to ensure consistent and reproducible evaluation:

| Split      | pCR N | pCR+ | pCR- | HR N | HR+  | HR-  | HER2 N | HER2+ | HER2- |
|------------|-------|------|------|------|------|------|--------|-------|-------|
| Training   | 1099  | 324  | 775  | 1543 | 997  | 546  | 1542   | 349   | 1193  |
| Validation | 174   | 53   | 121  | 269  | 168  | 101  | 269    | 58    | 211   |
| Test       | 175   | 53   | 122  | 271  | 173  | 98   | 269    | 56    | 213   |
| Total      | 1448  | 430  | 1018 | 2083 | 1338 | 745  | 2080   | 463   | 1617  |

Note: pCR N refers to the number of patients with non-missing pCR labels; similarly, HR N and HER2 N indicate the number of patients with available HR and HER2 status, respectively. Class distributions are shown for each split.
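As a quick sanity check on the "preserved class distributions" statement, the positive rate of each task can be recomputed per split from the counts in the table above. This is a minimal sketch: the counts are copied from the table, while the dictionary layout and the function names are illustrative and not part of the repository.

```python
# Counts copied from the split table: (positives, negatives) per task.
SPLITS = {
    "Training":   {"pCR": (324, 775), "HR": (997, 546), "HER2": (349, 1193)},
    "Validation": {"pCR": (53, 121),  "HR": (168, 101), "HER2": (58, 211)},
    "Test":       {"pCR": (53, 122),  "HR": (173, 98),  "HER2": (56, 213)},
}


def positive_rate(positives, negatives):
    """Fraction of labeled patients that are positive."""
    return positives / (positives + negatives)


def prevalence_table(splits):
    """Positive rate in percent (one decimal) for every split and task."""
    return {
        split: {
            task: round(100 * positive_rate(*counts), 1)
            for task, counts in tasks.items()
        }
        for split, tasks in splits.items()
    }


def spread(splits, task):
    """Largest difference in positive rate (percentage points) across splits."""
    rates = [100 * positive_rate(*tasks[task]) for tasks in splits.values()]
    return max(rates) - min(rates)


if __name__ == "__main__":
    for split, rates in prevalence_table(SPLITS).items():
        print(split, rates)
    for task in ("pCR", "HR", "HER2"):
        print(f"{task}: spread across splits = {spread(SPLITS, task):.1f} points")
```

With these counts the pCR-positive rate is close to 30% in every split, which is what a label-stratified split should produce.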


DCE-MRI Clinical Background

Dynamic Contrast-Enhanced MRI (DCE-MRI) is a 3D imaging technique that captures a sequence of scans before and after the injection of a contrast agent (typically gadolinium). The contrast enhances visibility of blood vessels and tissue perfusion, allowing observation of how the agent accumulates and clears from tissues over time.

Tumors exhibit characteristic enhancement patterns: malignant lesions often enhance quickly and wash out, while benign lesions typically enhance more slowly or steadily. Radiologists assess these patterns by reviewing two or three key time points, commonly the pre-contrast image and one or two post-contrast phases (e.g., the 2nd, 3rd, or 4th scan in the series). This helps them distinguish between benign and malignant lesions and informs treatment decisions.

These enhancement dynamics are critical both for clinical evaluation and for machine learning models that aim to predict malignancy, treatment response, or other tumor characteristics.
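The enhancement patterns described above can be summarized with a few simple quantities computed from the pre-contrast, early, and late signal intensities of a region. The sketch below uses two commonly used definitions, percent enhancement and the signal enhancement ratio (SER), plus a three-way curve label. It is illustrative only: the 10% tolerance band and the function names are assumptions, and nothing here is taken from the repository's code.

```python
def percent_enhancement(pre, post):
    """Signal increase over the pre-contrast baseline, in percent."""
    if pre <= 0:
        raise ValueError("pre-contrast intensity must be positive")
    return 100.0 * (post - pre) / pre


def signal_enhancement_ratio(pre, early, late):
    """SER = (early - pre) / (late - pre).

    Values above 1 mean the signal dropped between the early and late phase.
    """
    if late == pre:
        raise ValueError("late and pre-contrast intensities are equal")
    return (early - pre) / (late - pre)


def curve_type(pre, early, late, tolerance=0.10):
    """Label the delayed-phase behaviour of an enhancement curve.

    The relative change between the early and late phase is compared with an
    assumed tolerance band (10% by default):
      rising beyond the band   -> "persistent"
      falling beyond the band  -> "washout"
      staying inside the band  -> "plateau"
    """
    if early <= 0:
        raise ValueError("early post-contrast intensity must be positive")
    change = (late - early) / early
    if change > tolerance:
        return "persistent"
    if change < -tolerance:
        return "washout"
    return "plateau"


if __name__ == "__main__":
    pre, early, late = 100.0, 220.0, 180.0  # made-up mean intensities
    print(percent_enhancement(pre, early))
    print(signal_enhancement_ratio(pre, early, late))
    print(curve_type(pre, early, late))
```

A fast rise followed by a drop, as in the made-up values above, is the pattern the text associates with malignant lesions.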


Dataset Details

I-SPY2 Dataset

The I-SPY2 trial (Li et al., 2022; Newitt et al., 2021) provides DCE-MRI scans for 982 patients, acquired from 2010 to 2016 at more than 22 clinical centers using a standardized imaging protocol.

  • Target cohort: Women with high-risk, locally advanced breast cancer
  • Clinical data: pCR, HR, HER2, MammaPrint (MP) scores, type of neoadjuvant therapy, age, and race

Imaging Details:

  • Each MRI scan includes 3 to 12 time points (mostly 7)
  • Radiologists selected 3 time points for tumor segmentation: typically scans 0 (pre-contrast), 2 (early post-contrast), and 5 or 6 (late post-contrast). These selections are provided in the metadata under pre, post_early, and post_late.
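Picking the three radiologist-selected time points out of a full DCE series could look like the sketch below. The keys pre, post_early, and post_late come from the metadata description above; treating them as integer indices stored in a dictionary-like row, and the function name select_phases, are assumptions made for illustration.

```python
def select_phases(series, meta):
    """Return the (pre, early, late) acquisitions of one DCE series.

    series: per-time-point volumes, ordered by acquisition time
    meta:   mapping holding integer indices under the keys
            "pre", "post_early", and "post_late" (assumed storage format)
    """
    keys = ("pre", "post_early", "post_late")
    indices = [int(meta[key]) for key in keys]

    for key, index in zip(keys, indices):
        if not 0 <= index < len(series):
            raise IndexError(
                f"{key}={index} is outside a series of {len(series)} time points"
            )
    if not indices[0] < indices[1] < indices[2]:
        raise ValueError("expected pre < post_early < post_late")

    return tuple(series[index] for index in indices)


if __name__ == "__main__":
    # Stand-ins for seven 3D volumes; real code would hold image arrays here.
    series = [f"volume_t{i}" for i in range(7)]
    row = {"pre": 0, "post_early": 2, "post_late": 5}
    print(select_phases(series, row))
```

Validating the indices up front matters because series length varies between 3 and 12 time points, so a fixed choice such as (0, 2, 5) is not valid for every patient.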

I-SPY1 Dataset

The I-SPY1 dataset is a predecessor to I-SPY2 and contains similar imaging and clinical information, with slightly fewer patients and minor differences in acquisition protocols.

  • Patients: 173, each with 3 to 5 usable DCE scans
  • Clinical data: pCR, HR, HER2, and other core biomarkers

Example from I-SPY1


Duke Dataset

The Duke Breast Cancer Dataset consists of 920 patients with biopsy-confirmed invasive breast cancer, collected between 2000 and 2014.

  • Only 288 patients (31%) received neoadjuvant chemotherapy (NAC) and have annotated pCR values.
  • The rest underwent surgery first, followed by adjuvant therapy, and are not included in pCR analysis.
  • DCE-MRI scans include one pre-contrast and 2 to 4 post-contrast acquisitions, spaced 1 to 2 minutes apart.
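Because only part of the Duke cohort has an annotated pCR value, any pCR experiment first has to drop the patients without a label. A minimal sketch of that filtering step follows, assuming the clinical metadata has been read into a list of dictionaries; the field names "pid" and "pCR" are hypothetical and may not match the actual metadata columns.

```python
import math


def has_label(value):
    """True when a label is present, i.e. not None, NaN, or an empty string."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    if isinstance(value, str) and value.strip() == "":
        return False
    return True


def labeled_subset(records, label="pCR"):
    """Keep only the records that carry a usable value for `label`."""
    return [record for record in records if has_label(record.get(label))]


if __name__ == "__main__":
    # Made-up records; a pCR of 0 is a valid label and must be kept.
    records = [
        {"pid": "A", "pCR": 1},
        {"pid": "B", "pCR": None},
        {"pid": "C", "pCR": float("nan")},
        {"pid": "D", "pCR": 0},
        {"pid": "E"},
    ]
    kept = labeled_subset(records)
    print([record["pid"] for record in kept])
```

Testing for a missing value explicitly, rather than relying on truthiness, keeps patients whose label is 0 (no pCR) in the cohort.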

Example from I-SPY1

Data Processing Notes:

  • Bounding box annotations of the largest tumor are provided.
  • No full tumor segmentation masks are available for Duke.
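With only a bounding box available, a tumor-centered in-plane crop, such as the 256 by 256 slices of the minimized Duke version, can be derived by centering a fixed-size window on the box and shifting it back inside the image where needed. The sketch below shows one way to compute the window. The bounding-box layout (row_min, row_max, col_min, col_max) and the function names are assumptions, and the repository's own cropping code may differ.

```python
def centered_window(box_min, box_max, image_size, window=256):
    """Return (start, stop) of a fixed-length window along one axis.

    The window is centered on the bounding box and then shifted so that it
    stays inside [0, image_size).
    """
    if window > image_size:
        raise ValueError("window is larger than the image")
    center = (box_min + box_max) // 2
    start = center - window // 2
    start = max(0, min(start, image_size - window))
    return start, start + window


def crop_window_2d(bbox, shape, window=256):
    """Row and column windows for an in-plane, tumor-centered crop.

    bbox:  (row_min, row_max, col_min, col_max), an assumed layout
    shape: (rows, cols) of one slice
    """
    rows = centered_window(bbox[0], bbox[1], shape[0], window)
    cols = centered_window(bbox[2], bbox[3], shape[1], window)
    return rows, cols


if __name__ == "__main__":
    # Made-up box near the left edge of a 512 x 512 slice.
    (r0, r1), (c0, c1) = crop_window_2d((300, 340, 10, 30), (512, 512))
    print((r0, r1), (c0, c1))
    # For an array `volume` of shape (nz, rows, cols), the crop would then be
    # volume[:, r0:r1, c0:c1].
```

Shifting the window instead of padding keeps every crop the same size without introducing artificial zero-valued borders, at the cost of the tumor being off-center near image edges.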

Citation

If you use the BreastDCEDL dataset or code in your research, please cite:

Article

BreastDCEDL: Curating a Comprehensive DCE-MRI Dataset and Developing a Transformer Implementation
Read on arXiv

Citation:

```bibtex
@article{fridman2025breastdcedl,
  title={BreastDCEDL: Curating a Comprehensive DCE-MRI Dataset and Developing a Transformer Implementation},
  author={Fridman, Naomi and others},
  journal={arXiv preprint arXiv:2506.12190},
  year={2025},
  doi={10.48550/arXiv.2506.12190}
}
```

Dataset

BreastDCEDL Dataset
Available on Zenodo

Citation:

```bibtex
@dataset{fridman2025breastdcedl_dataset,
  author = {Fridman, Naomi and others},
  title = {BreastDCEDL: Curated DCE-MRI Dataset},
  year = {2025},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.15627233}
}
```


Source

All datasets were originally acquired from:

  • The Cancer Imaging Archive (TCIA)
  • Monticciolo et al., 2018, Journal of the American College of Radiology (JACR)
  • ClinicalTrials.gov - I-SPY2 (NCT01042379)

Owner

  • Name: naomi fridman
  • Login: naomifridman
  • Kind: user
  • Location: Tel Aviv Israel
  • Company: Data Scientist, self employed

MSc Applied Mathematics. Technion Haifa. BSc Mathematics, Philosophy, Computer science. TLV University. Machine Learning, Deep Learning.

GitHub Events

Total
  • Issues event: 2
  • Watch event: 3
  • Issue comment event: 3
  • Member event: 2
  • Push event: 61
  • Fork event: 4
  • Create event: 2
Last Year
  • Issues event: 2
  • Watch event: 3
  • Issue comment event: 3
  • Member event: 2
  • Push event: 61
  • Fork event: 4
  • Create event: 2