aapi_code

A local application frontend and a backend server, based on U-Net and Detectron2, for the automatic annotation of pathology images (Columbia Data Science Institute Fall 2020 Capstone Project)

https://github.com/alexliyihao/aapi_code

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.9%) to scientific vocabulary

Keywords

asap data-augmentation detectron2 image-segmentation unet-pytorch
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: alexliyihao
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 482 MB
Statistics
  • Stars: 2
  • Watchers: 0
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Created about 5 years ago · Last pushed about 3 years ago
Metadata Files
Readme License Citation

README.md

Auto Annotation of Pathology Images

Columbia Data Science Institute Capstone Project, Fall 2020

Mentor: Dr. Adler Perotte

Instructor: Dr. Adam S. Kelleher

Team members:

Yihao Li, Chao Huang, Yufeng Ma, Xiaoyun Zhu, Shuo Yang

This project aims to create a machine-learning-driven user interface for annotating very large pathology images. Each image may measure tens of thousands by tens of thousands of pixels, so annotating an entire slide for object recognition or semantic/instance segmentation is time-consuming when the entities of interest are only a few pixels in diameter. The project therefore builds a framework that makes the most of expert annotators' (clinicians') time by interleaving annotation (label generation) with model inference, giving an intuitive sense of model fit and of the minimal amount of labeling required for acceptable model performance.
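
The idea of interleaving annotation with inference can be sketched as a model-agnostic loop: label a batch, retrain, evaluate, and stop once the score is acceptable. This is a hypothetical illustration, not code from the repository; `labels_needed`, `train`, `evaluate` and `target` are assumed names, and in practice `train` would fit a segmentation model while `evaluate` would score it on held-out annotations.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple, TypeVar

Example = TypeVar("Example")
Model = TypeVar("Model")


def labels_needed(
    batches: Iterable[Sequence[Example]],
    train: Callable[[List[Example]], Model],
    evaluate: Callable[[Model], float],
    target: float,
) -> Tuple[Optional[int], List[Tuple[int, float]]]:
    """Interleave labeling and training until the score reaches `target`.

    Returns the number of labeled examples at which `target` was first met
    (None if it never was) and the full (n_labels, score) learning curve.
    """
    labeled: List[Example] = []
    curve: List[Tuple[int, float]] = []
    for batch in batches:
        labeled.extend(batch)            # the annotator labels one more batch
        model = train(labeled)           # retrain on everything labeled so far
        score = evaluate(model)          # inference on held-out data shows model fit
        curve.append((len(labeled), score))
        if score >= target:
            return len(labeled), curve   # good enough: stop requesting labels
    return None, curve
```

With batches of five examples, `train=len` and `evaluate=lambda n: n / 20`, the loop stops after three batches (15 labels) once the score reaches 0.75; `curve` keeps every intermediate point, which is the kind of learning curve an annotator could watch while labeling.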

Project Final Report

The final report for this project is available here: Final Report

Video Demonstration

A video presentation with slides is available on YouTube: https://youtu.be/XTHRxxOoG-k.

Installation

  1. Required packages are listed in the requirements file. It is recommended to install them with pip inside a virtual environment.
  2. Note that although detectron2 is used in this repository, it is NOT explicitly listed in the requirements, because it depends closely on the installed versions of PyTorch and CUDA. It is therefore better to build it from source by following the official installation guide.
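
Because detectron2 has to match the installed PyTorch and CUDA versions, it can help to check what is already present before building it. The sketch below uses only the Python standard library, plus `torch.version.cuda` when PyTorch is installed; `installed_version` and `environment_report` are hypothetical helper names, not part of the repository.

```python
import importlib.metadata
import importlib.util
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it cannot be imported."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return "unknown"  # importable, but no distribution metadata found


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the versions that a detectron2 build needs to agree with."""
    report = {pkg: installed_version(pkg) for pkg in ("torch", "torchvision", "detectron2")}
    report["cuda"] = None
    if report["torch"] is not None:
        import torch
        report["cuda"] = torch.version.cuda  # None for CPU-only PyTorch builds
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version}")
```

Running this before and after building detectron2 shows at a glance whether PyTorch is a CUDA build and whether detectron2 ended up importable in the active virtual environment.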

Repository Structure

  1. Collage Generator: the module for generating synthetic whole slide images (a.k.a. collages) from vignettes using a complex algorithm, which is fully described and explained in the illustration sub-directory.

  2. Vignettes Data: contains vignettes used for generating synthetic whole slide images.

  3. COCO-Format Converter: the module for generating instance segmentation datasets from collages in a COCO-compatible format.

  4. Core ML Components: the module storing essential functions and tools for training and serving UNet models for segmentation.

    • preprocessing: contains functions for the preprocessing pipeline, namely cropping images into patches, saving patches as HDF5 files, and loading data as PyTorch Datasets with augmentations.
    • modeling: contains the UNet model architecture, wrapped as a PyTorch Lightning model, along with essential postprocessing functions.
    • utils: contains essential utility functions for manipulating slides and annotations.
    • api: high-level APIs exposed to the model serving component.
    • config: a configuration file denoting target classes and parameters for the segmentation task.
  5. Scripts: contains useful scripts for tuning (using Optuna) and testing models. They can also serve as a reference for calling low-level functions.

  6. Demo Notebooks: contains several demo notebooks showing the usage of the core components.
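
The preprocessing step of cropping a large image into patches can be sketched with NumPy. This is an assumed, simplified illustration: `iter_patches` and its border handling are hypothetical, not the repository's implementation, and a real whole slide image would normally be read region by region rather than held in memory as one array before the patches are written to HDF5.

```python
from typing import Iterator, Tuple

import numpy as np


def iter_patches(
    image: np.ndarray, patch_size: int, stride: int
) -> Iterator[Tuple[int, int, np.ndarray]]:
    """Yield (row, col, patch) tiles of size patch_size x patch_size.

    Tiles start every `stride` pixels; if the grid does not end exactly at the
    image border, one extra row/column of tiles is shifted inward so that the
    bottom and right edges are always covered.
    """
    height, width = image.shape[:2]
    if height < patch_size or width < patch_size:
        raise ValueError("image is smaller than one patch")
    rows = list(range(0, height - patch_size + 1, stride))
    cols = list(range(0, width - patch_size + 1, stride))
    if rows[-1] != height - patch_size:
        rows.append(height - patch_size)
    if cols[-1] != width - patch_size:
        cols.append(width - patch_size)
    for r in rows:
        for c in cols:
            yield r, c, image[r:r + patch_size, c:c + patch_size]
```

For a 1000 x 1500 image with 512-pixel patches and a 512-pixel stride, this yields six patches whose top-left corners sit at rows 0 and 488 and columns 0, 512 and 988. A stride smaller than `patch_size` produces overlapping patches, which is a common choice when predictions are later stitched back together.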

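Converting one instance mask from a collage into a COCO annotation entry can be sketched as follows. `mask_to_rle` and `mask_to_coco_annotation` are hypothetical helpers, not the repository's converter. The field names (`bbox` as `[x, y, width, height]`, `area`, `iscrowd`, and `segmentation` with `size`/`counts`) follow the COCO format; whether the actual converter writes RLE or polygon segmentations is an assumption here.

```python
from typing import Dict, List

import numpy as np


def mask_to_rle(mask: np.ndarray) -> Dict[str, object]:
    """Encode a binary mask as COCO uncompressed RLE.

    COCO RLE runs are taken in column-major (Fortran) order and always start
    with a run of zeros, which may have length 0.
    """
    mask = np.asarray(mask, dtype=bool)
    flat = mask.ravel(order="F")
    changes = np.flatnonzero(flat[1:] != flat[:-1]) + 1  # indices where a new run starts
    boundaries = np.concatenate([[0], changes, [flat.size]])
    counts: List[int] = np.diff(boundaries).tolist()
    if flat.size and flat[0]:
        counts = [0] + counts  # mask starts with foreground: empty leading zero run
    return {"size": [int(mask.shape[0]), int(mask.shape[1])], "counts": counts}


def mask_to_coco_annotation(
    mask: np.ndarray, annotation_id: int, image_id: int, category_id: int
) -> Dict[str, object]:
    """Build one COCO-style annotation dict for a single object instance."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no foreground pixels")
    x0, y0 = int(xs.min()), int(ys.min())
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": mask_to_rle(mask),
        "area": int(np.count_nonzero(mask)),
        "bbox": [x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1],  # [x, y, w, h]
        "iscrowd": 0,  # each collage instance is a single object
    }
```

For a 4 x 5 mask whose only foreground is the 2 x 2 block at rows 1-2 and columns 2-3, this gives `bbox == [2, 1, 2, 2]`, `area == 4` and RLE counts `[9, 2, 2, 2, 5]`. A full dataset file would collect such entries under `annotations`, next to `images` and `categories` lists.
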
Owner

  • Name: Yihao Li
  • Login: alexliyihao
  • Kind: user
  • Location: New York, NY
  • Company: Columbia University in the City of New York

Staff Associate @ Columbia Neurology Dept. | Member @ RILEM TC-DCS | M.S. Data Science @ Columbia University
