breastdcedl

BreastDCEDL: pretreatment MRI scans of 2,070 breast cancer patients. A deep learning-ready DCE-MRI breast cancer dataset from the I-SPY1, I-SPY2, and Duke cohorts.

https://github.com/naomifridman/breastdcedl

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 8 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.3%) to scientific vocabulary

Keywords

ai breast-cancer dataset dce-mri deep-learning dicom-files her2 mri mri-data pcr vit

Last synced: 11 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: naomifridman
License: other
Language: Jupyter Notebook
Default Branch: main
Homepage: https://zenodo.org/records/15627233
Size: 1.05 GB

Statistics

Stars: 6
Watchers: 1
Forks: 3
Open Issues: 1
Releases: 1

Created over 1 year ago · Last pushed 12 months ago

Metadata Files

Readme License Zenodo

BreastDCEDL

BreastDCEDL is a curated collection of pretreatment 3D dynamic contrast-enhanced MRI (DCE-MRI) scans from 2,070 breast cancer patients, assembled into a deep learning-ready dataset. It integrates data from three major cohorts: I-SPY2 (n = 982), I-SPY1 (n = 173), and Duke (n = 916). The dataset, originally sourced from The Cancer Imaging Archive (TCIA), includes:

3D raw MRI scans converted to NIfTI format
Corresponding 3D tumor binary segmentation masks
Clinical and demographic metadata, including pCR, HER2, HR, age, and race
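
One way to picture a single case is as a record that pairs image files with labels. The sketch below is illustrative only: the class name, field names, and file names are hypothetical and do not reflect the repository's actual layout. Labels are optional because not every patient has every label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatientRecord:
    """Hypothetical container for one case; not the repository's schema."""
    patient_id: str
    scan_paths: List[str]            # one NIfTI file per DCE time point
    mask_path: Optional[str] = None  # 3D binary tumor mask, if available
    pcr: Optional[int] = None        # 1 = pCR, 0 = no pCR, None = missing
    hr: Optional[int] = None         # hormone receptor status
    her2: Optional[int] = None       # HER2 status
    age: Optional[float] = None
    race: Optional[str] = None


def with_label(records, label):
    """Keep only the records whose given label is not missing."""
    return [r for r in records if getattr(r, label) is not None]


# Made-up example cases (file names are placeholders).
records = [
    PatientRecord("case_001", ["case_001_t0.nii.gz"], pcr=1, hr=0, her2=1),
    PatientRecord("case_002", ["case_002_t0.nii.gz"], hr=1),
]
pcr_cases = with_label(records, "pcr")
```

Filtering by label before training keeps each task's cohort limited to patients with a known ground truth for that task.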

Article

BreastDCEDL: Curating a Comprehensive DCE-MRI Dataset and Developing a Transformer Implementation
Read on arXiv

Description:
This work introduces BreastDCEDL, a carefully assembled dataset of pre-treatment 3D DCE-MRI scans, and presents a transformer-based deep learning approach for analyzing these images. The dataset brings together imaging data from multiple sources to support research in breast cancer detection and diagnosis, while the accompanying transformer implementation demonstrates state-of-the-art performance on this challenging medical imaging task.

BreastDCEDL Data Download

I-SPY1
The complete I-SPY1 dataset is available for direct download from Zenodo:
https://zenodo.org/records/15627233
Duke
The full Duke cohort can be accessed via The Cancer Imaging Archive (TCIA).
Conversion to NIfTI format can be performed using the provided code in the DUKE/ folder.
A minimized version, containing three (n_z, 256 × 256) tumor-centered scans per patient, is available on Zenodo:
https://zenodo.org/records/15627233
I-SPY2
The full I-SPY2 dataset is available on TCIA and can be converted to NIfTI using the code in this repository.
A pre-converted NIfTI version will be made available on TCIA in the near future.

Benchmark Prediction Tasks

The dataset provides a standardized benchmark for three central classification tasks in breast cancer MRI:

Pathological Complete Response (pCR): A binary classification task predicting treatment response from pretreatment imaging. Approximately 32.2% of patients (n = 317) achieved pCR, offering a moderately balanced class distribution. pCR refers to the disappearance of all invasive cancer cells in the breast and lymph nodes following neoadjuvant therapy and is considered a strong surrogate for favorable long-term prognosis.
Hormone Receptor (HR) Status: Classification of HR positivity (present in 54.5% of cases, n = 537) directly from imaging, assessing the link between MRI features and receptor expression.
HER2 Status: Prediction of HER2 expression (positive in 24.8%, n = 244) from imaging data, enabling evaluation of MRI-based biomarker inference.

Ground truth labels include HR, HER2, and pCR status, as well as molecular subtypes (HR+/HER2−, HER2+, Triple Negative) and MammaPrint risk categories. The dataset is split into training, validation, and test cohorts with preserved class distributions to keep results consistent and reproducible.

| Split      | pCR N | pCR+ | pCR− | HR N | HR+  | HR−  | HER2 N | HER2+ | HER2− |
|------------|-------|------|------|------|------|------|--------|-------|-------|
| Training   | 1099  | 324  | 775  | 1543 | 997  | 546  | 1542   | 349   | 1193  |
| Validation | 174   | 53   | 121  | 269  | 168  | 101  | 269    | 58    | 211   |
| Test       | 175   | 53   | 122  | 271  | 173  | 98   | 269    | 56    | 213   |
| Total      | 1448  | 430  | 1018 | 2083 | 1338 | 745  | 2080   | 463   | 1617  |

Note: pCR N refers to the number of patients with non-missing pCR labels; similarly, HR N and HER2 N indicate the number of patients with available HR and HER2 status, respectively. Class distributions are shown for each split.
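
The claim that class distributions are preserved across splits can be checked directly from the counts in the split table. The snippet below is a small worked calculation, not part of the repository; the variable and function names are chosen here for illustration.

```python
# (positive, negative) counts per task, copied from the split table.
splits = {
    "Training":   {"pCR": (324, 775), "HR": (997, 546), "HER2": (349, 1193)},
    "Validation": {"pCR": (53, 121),  "HR": (168, 101), "HER2": (58, 211)},
    "Test":       {"pCR": (53, 122),  "HR": (173, 98),  "HER2": (56, 213)},
}


def positive_rate(pos, neg):
    """Fraction of labeled patients that are positive."""
    return pos / (pos + neg)


rates = {
    split: {task: positive_rate(*counts) for task, counts in tasks.items()}
    for split, tasks in splits.items()
}

# Largest gap in positive rate between any two splits, per task.
spread = {
    task: max(r[task] for r in rates.values()) - min(r[task] for r in rates.values())
    for task in ("pCR", "HR", "HER2")
}

for split, tasks in rates.items():
    print(split, {task: f"{rate:.1%}" for task, rate in tasks.items()})
```

For all three tasks the positive rate stays within a few percentage points across the training, validation, and test cohorts.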

DCE-MRI Clinical Background

Dynamic Contrast-Enhanced MRI (DCE-MRI) is a 3D imaging technique that captures a sequence of scans before and after the injection of a contrast agent (typically gadolinium). The contrast enhances visibility of blood vessels and tissue perfusion, allowing observation of how the agent accumulates and clears from tissues over time.

Tumors exhibit characteristic enhancement patterns: malignant lesions often enhance quickly and wash out, while benign lesions typically enhance more slowly or steadily. Radiologists assess these patterns by reviewing two or three key time points, commonly the pre-contrast image and one or two post-contrast phases (e.g., the 2nd, 3rd, or 4th scan in the series). This helps them distinguish between benign and malignant lesions and informs treatment decisions.

These enhancement dynamics are critical both for clinical evaluation and for machine learning models that aim to predict malignancy, treatment response, or other tumor characteristics.
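
The wash-in and wash-out behavior described above can be put into numbers from three time points. The following is a minimal sketch under one simple convention for relative enhancement; the function names, the sample intensities, and the 10% plateau threshold are illustrative assumptions and are not taken from the BreastDCEDL code.

```python
def early_enhancement(pre, early):
    """Percent signal increase from pre-contrast to early post-contrast."""
    return 100.0 * (early - pre) / pre


def late_change(early, late):
    """Percent signal change from early to late post-contrast."""
    return 100.0 * (late - early) / early


def curve_type(pre, early, late, threshold=10.0):
    """Label the delayed phase; the threshold is an assumed cutoff.

    `pre` is accepted so all three time points are passed together,
    but only the early-to-late change decides the label.
    """
    change = late_change(early, late)
    if change < -threshold:
        return "washout"      # signal drops after the early peak
    if change > threshold:
        return "persistent"   # signal keeps rising
    return "plateau"          # signal stays roughly level


# Made-up mean intensities inside a lesion at three time points.
fast_then_drop = curve_type(pre=100.0, early=220.0, late=180.0)
slow_and_steady = curve_type(pre=100.0, early=150.0, late=190.0)
```

Here the first lesion enhances strongly and then loses signal, while the second keeps rising, mirroring the two patterns contrasted in the text.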

Dataset Details

I-SPY2 Dataset

The I-SPY2 trial (Li et al., 2022; Newitt et al., 2021) provides DCE-MRI scans for 982 patients acquired from 2010 to 2016 across over 22 clinical centers using a standardized imaging protocol.

Target cohort: Women with high-risk, locally advanced breast cancer
Clinical data: pCR, HR, HER2, MammaPrint (MP) scores, type of neoadjuvant therapy, age, and race

Imaging Details:

Each MRI scan includes 3 to 12 time points (mostly 7)
Radiologists selected 3 time points for tumor segmentation: typically scans 0 (pre-contrast), 2 (early post-contrast), and 5 or 6 (late post-contrast). These selections are provided in the metadata under pre, post_early, and post_late.
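
Picking the three reference acquisitions out of a longer series could look like the sketch below. It assumes that the pre, post_early, and post_late metadata values are zero-based indices into the ordered series, which the source does not state explicitly; the function name and the placeholder series are hypothetical.

```python
def select_phases(series, meta):
    """Return the three reference acquisitions from an ordered DCE series.

    Assumes meta["pre"], meta["post_early"], and meta["post_late"] are
    zero-based positions in `series` (an assumption, not confirmed).
    """
    picked = {}
    for key in ("pre", "post_early", "post_late"):
        index = meta[key]
        if not 0 <= index < len(series):
            raise IndexError(f"{key}={index} is outside a series of {len(series)}")
        picked[key] = series[index]
    return picked


# Placeholder series of 7 acquisitions; real entries would be 3D volumes.
series = [f"acq_{i}" for i in range(7)]
meta = {"pre": 0, "post_early": 2, "post_late": 5}
phases = select_phases(series, meta)
```

Validating the indices up front matters because the number of time points varies between scans, so a fixed choice such as index 5 is not always present.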

I-SPY1 Dataset

The I-SPY1 dataset is the predecessor to I-SPY2 and contains similar imaging and clinical information, with fewer patients (173 versus 982) and minor differences in acquisition protocols.

Patients: 173, each with 3 to 5 usable DCE scans
Clinical data: pCR, HR, HER2, and other core biomarkers

Example from I-SPY1

Duke Dataset

The Duke Breast Cancer Dataset consists of 920 patients with biopsy-confirmed invasive breast cancer, collected between 2000 and 2014.

Only 288 patients (31%) received neoadjuvant chemotherapy (NAC) and have annotated pCR values.
The rest underwent surgery first, followed by adjuvant therapy, and are not included in pCR analysis.
DCE-MRI scans include one pre-contrast and 2 to 4 post-contrast acquisitions, spaced 1 to 2 minutes apart.

Data Processing Notes:

Bounding box annotations of the largest tumor are provided.
No full tumor segmentation masks are available for Duke.
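
With only a bounding box available, a fixed-size tumor-centered crop can still be derived. The helper below is a sketch of that idea for one image axis; it is not confirmed to be how the minimized Duke version on Zenodo was produced, and the function name and example coordinates are hypothetical.

```python
def centered_window(box_min, box_max, image_size, window=256):
    """Return (start, stop) of a fixed-size window centered on a box.

    Works along a single axis and shifts the window back inside the
    image when the box lies close to a border.
    """
    if window > image_size:
        raise ValueError("window is larger than the image")
    center = (box_min + box_max) // 2
    start = center - window // 2
    # Clamp so that the window stays within [0, image_size].
    start = max(0, min(start, image_size - window))
    return start, start + window


# Made-up box coordinates on a 512-pixel axis.
rows = centered_window(300, 360, image_size=512)
cols = centered_window(10, 50, image_size=512)
```

Applying the helper to the row and column axes gives the slice bounds for a 256 × 256 in-plane crop, while the slice axis can be left at its original length.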

Citation

If you use the BreastDCEDL dataset or code in your research, please cite:

Article

BreastDCEDL: Curating a Comprehensive DCE-MRI Dataset and Developing a Transformer Implementation
Read on arXiv

Citation (BibTeX):

@article{fridman2025breastdcedl,
  title={BreastDCEDL: Curating a Comprehensive DCE-MRI Dataset and Developing a Transformer Implementation},
  author={Fridman, Naomi and others},
  journal={arXiv preprint arXiv:2506.12190},
  year={2025},
  doi={10.48550/arXiv.2506.12190}
}

Dataset

BreastDCEDL Dataset
Available on Zenodo

Citation (BibTeX):

@dataset{fridman2025breastdcedl_dataset,
  author = {Fridman, Naomi and others},
  title = {BreastDCEDL: Curated DCE-MRI Dataset},
  year = {2025},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.15627233}
}

Source

All datasets were originally acquired from:

The Cancer Imaging Archive (TCIA)
Monticciolo et al., 2018, Journal of the American College of Radiology (JACR)
ClinicalTrials.gov - I-SPY2 (NCT01042379)

Owner

Name: naomi fridman
Login: naomifridman
Kind: user
Location: Tel Aviv Israel
Company: Data Scientist, self employed

Website: https://www.linkedin.com/in/naomi-fridman/
Repositories: 2
Profile: https://github.com/naomifridman

MSc Applied Mathematics. Technion Haifa. BSc Mathematics, Philosophy, Computer science. TLV University. Machine Learning, Deep Learning.

GitHub Events

Total

Issues event: 2
Watch event: 3
Issue comment event: 3
Member event: 2
Push event: 61
Fork event: 4
Create event: 2

Last Year

Issues event: 2
Watch event: 3
Issue comment event: 3
Member event: 2
Push event: 61
Fork event: 4
Create event: 2
