https://github.com/alimzade/ovarian-cancer-classifiers

Various methods and experiments for classifying and clustering 750 GB of WSI (Whole Slide Image) and TMA (Tissue Microarray) images from a Kaggle competition.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (10.8%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: Alimzade
Language: Jupyter Notebook
Default Branch: main
Size: 112 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme

Ovarian Cancer Subtype Classification

Overview

This project focuses on classifying subtypes of ovarian cancer using Whole Slide Images (WSI) and Tissue Microarrays (TMA) data from the UBC-OCEAN Kaggle competition. The task is to classify ovarian cancer images into the following subtypes:

Clear Cell Carcinoma (CC)
Endometrioid Carcinoma (EC)
High-grade Serous Carcinoma (HGSC)
Low-grade Serous Carcinoma (LGSC)
Mucinous Carcinoma (MC)
Other (Note: This class is absent in the training set but present in the test set.)

The dataset consists of images from different hospitals. The test set contains images from institutions not represented in the training set, which introduces domain shift.

Data Description

WSI (Whole Slide Images): Large images (up to 100,000 x 50,000 pixels) at 20x magnification. The average file size is 1-2 GB.
TMA (Tissue Microarrays): Smaller images (~4,000 x 4,000 pixels) at 40x magnification. The dataset contains relatively few TMA samples.
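
To give a sense of scale, the arithmetic below counts how many non-overlapping tiles are needed to cover images of the sizes quoted above. The 512-pixel tile size and the `tile_grid` helper are assumptions for illustration; the README does not state the tile size used in the notebooks.

```python
import math

def tile_grid(width, height, tile_size=512):
    """Return (columns, rows, total) for a non-overlapping tile grid.

    Partial tiles at the right and bottom edges are counted as full tiles.
    The default tile size of 512 is an assumed value, not the project's.
    """
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)
    return cols, rows, cols * rows

# Largest WSI size quoted above: 100,000 x 50,000 pixels.
print(tile_grid(100_000, 50_000))  # (196, 98, 19208)
# Typical TMA size quoted above: about 4,000 x 4,000 pixels.
print(tile_grid(4_000, 4_000))     # (8, 8, 64)
```

At this assumed tile size, one large WSI yields on the order of twenty thousand tiles, while a TMA image yields a few dozen.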

Methods and Experiments

TMA Experiments

Due to the limited number of TMA images, several approaches were explored:

Data Augmentation: Various augmentation methods were applied to increase the amount of training data, including rotations, flips, and color transformations.
Cross-Validation: Cross-validation was used to obtain more reliable performance estimates and to guide model selection on the small dataset.
Feature Clustering: Features were extracted from the images and then clustered to explore patterns and improve classification accuracy.
Tiling: TMA images were divided into smaller tiles, and each image was assigned the most common prediction among its tiles.
Autoencoder: An autoencoder trained on external TMA images from different cancers (scraped from Stanford's Tissue Microarray database) was used for feature extraction.
Segmentation and Core Detection: Segmentation algorithms were used to detect the tissue cores within the TMA images.
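
The tiling approach above classifies an image by the most common label among its tiles. A minimal sketch of that aggregation follows, assuming tile predictions arrive as (image_id, label) pairs. The function name and the input format are hypothetical, not taken from the notebooks.

```python
from collections import Counter, defaultdict

def aggregate_by_majority(tile_predictions):
    """Map each image id to its most common tile label.

    tile_predictions: iterable of (image_id, predicted_label) pairs,
    one pair per tile. Ties go to the label that was seen first.
    """
    votes = defaultdict(Counter)
    for image_id, label in tile_predictions:
        votes[image_id][label] += 1
    return {image_id: counts.most_common(1)[0][0]
            for image_id, counts in votes.items()}

# Hypothetical tile-level outputs for two TMA images.
tiles = [
    ("tma_1", "HGSC"), ("tma_1", "EC"), ("tma_1", "HGSC"),
    ("tma_2", "MC"), ("tma_2", "MC"), ("tma_2", "CC"),
]
print(aggregate_by_majority(tiles))  # {'tma_1': 'HGSC', 'tma_2': 'MC'}
```

Majority voting ignores prediction confidence. Averaging per-tile class probabilities is a common alternative, though nothing here indicates it was tried in this project.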

Results: Classification results for TMA images were weaker than for WSI images, likely due to the small dataset size and lack of variation.

WSI Experiments

Given the enormous size of WSI images, the following techniques were applied:

Multiple Instance Learning (MIL): MIL classifiers were used, with each WSI treated as a bag of feature vectors. The model learns from slide-level labels alone, without requiring precise region annotations.
Tiling and Feature Extraction: As with the TMA images, each WSI was divided into smaller tiles, and features were extracted from the tiles using pre-trained models such as ResNet.
Clustering: Features extracted from WSI tiles were clustered, and the cluster information was used as additional input for model training, improving the model's ability to differentiate between cancer subtypes.
Hyperparameter Tuning: Various hyperparameters were tuned across the experiments to optimize model performance; the details are in the corresponding Jupyter notebooks.
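
The README does not say which MIL variant was used. As one illustration of treating a slide as a bag of feature vectors, the sketch below applies attention-based pooling: each tile feature gets a score, the scores are normalized with a softmax, and the weighted sum becomes a single slide-level vector. The scoring vector and the feature values are made-up placeholders; in a real model they would be learned from slide-level labels.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(bag, scoring_vector):
    """Pool a bag of instance features into one slide-level vector.

    bag: non-empty list of equal-length feature vectors, one per tile.
    scoring_vector: weights used to score each instance (learned in practice).
    Returns (attention_weights, pooled_vector).
    """
    scores = [sum(w * x for w, x in zip(scoring_vector, instance))
              for instance in bag]
    weights = softmax(scores)
    dim = len(bag[0])
    pooled = [sum(a * instance[d] for a, instance in zip(weights, bag))
              for d in range(dim)]
    return weights, pooled

# Three hypothetical 2-dimensional tile features from one slide.
bag = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
weights, pooled = attention_pool(bag, scoring_vector=[1.0, 0.0])
print([round(a, 3) for a in weights])
print([round(v, 3) for v in pooled])
```

The pooled vector would then feed a standard classifier head over the subtypes. The attention weights also indicate which tiles contributed most to the slide-level prediction.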

Best Approach: The MIL classifier, which treats each WSI as a bag of instances, yielded the best performance.

Results

TMA Experiments:

TMA results were weaker than the WSI results, likely because of the small dataset and lack of class variation. The best approach combined data augmentation and tiling, but performance was still limited.

WSI Experiments:

The best-performing model was the MIL classifier, which treated each WSI as a collection of feature vectors. Additional improvements were achieved through feature clustering and hyperparameter tuning.
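
How the cluster information entered the model is not spelled out here. One plausible form, shown as a sketch, assigns each tile feature to its nearest centroid and summarizes a slide as a normalized histogram of cluster assignments, which could be concatenated with other slide-level features. The centroids and feature values are hypothetical and assumed to come from a clustering step such as k-means run beforehand.

```python
def nearest_centroid(feature, centroids):
    """Return the index of the centroid closest to feature (squared Euclidean)."""
    distances = [sum((f - c) ** 2 for f, c in zip(feature, centroid))
                 for centroid in centroids]
    return distances.index(min(distances))

def cluster_histogram(tile_features, centroids):
    """Fraction of a slide's tiles assigned to each cluster.

    Assumes tile_features contains at least one tile.
    """
    counts = [0] * len(centroids)
    for feature in tile_features:
        counts[nearest_centroid(feature, centroids)] += 1
    total = len(tile_features)
    return [count / total for count in counts]

# Hypothetical centroids and tile features in a 2-dimensional feature space.
centroids = [[0.0, 0.0], [10.0, 10.0]]
tile_features = [[0.5, 0.2], [9.0, 11.0], [0.1, 0.3], [1.0, 0.0]]
print(cluster_histogram(tile_features, centroids))  # [0.75, 0.25]
```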

Owner

Login: Alimzade
Kind: user

Repositories: 1
Profile: https://github.com/Alimzade

GitHub Events

Total

Push event: 1
Create event: 2

Last Year

Push event: 1
Create event: 2
