feature-inspect

tools for UMAP and linear probe inspection

https://github.com/uit-hdl/feature-inspect

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, nature.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.5%) to scientific vocabulary

Keywords

linearprobing python umap wsi

Repository

Statistics
  • Stars: 0
  • Watchers: 6
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

feature-inspect

This package is the result of the paper "Open-source framework for detecting bias and overfitting for large pathology images" (https://arxiv.org/abs/2503.01827).

This package is an open-source tool for exploring high-level image features with UMAPs and/or linear probing. This is becoming increasingly important as more large-scale models are released: how they perform on your task and dataset needs to be evaluated before use. The main purposes of the package are:
1. to make common guidelines for UMAP parameters (e.g. from Kobak and Berens) more accessible
2. to provide objective metrics (to be used cautiously) for evaluating feature spaces
3. to create a tool for exploring models that scales to large inputs (e.g. whole-slide images)

Installation

```bash
pip install feature_inspect

# optional if you want to use linear probing
pip install feature_inspect[lp_inspect]
```

GPU acceleration for UMAP

To install the libraries needed for cuML, follow https://docs.rapids.ai/install/ and install the "cuml" and pytorch packages using conda. To enable GPU acceleration, pass `use_cuml=True` to `make_umap`.
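One way to keep the GPU path optional is to check whether cuml is importable before requesting it. The sketch below assumes that `make_umap` can be imported from a module named `umap_inspect` and that it takes the features as its first argument. `cuml_available`, `umap_kwargs` and `run_umap` are hypothetical helper names, not part of the package.

```python
import importlib.util


def cuml_available() -> bool:
    """Return True if the RAPIDS cuml package is importable in this environment."""
    return importlib.util.find_spec("cuml") is not None


def umap_kwargs() -> dict:
    """Build keyword arguments for make_umap, enabling the GPU path only when cuml is installed."""
    return {"use_cuml": cuml_available()}


def run_umap(features):
    """Call make_umap with GPU acceleration when available, on CPU otherwise.

    Assumption: make_umap is importable from `umap_inspect` and accepts the
    feature array as its first positional argument.
    """
    # Imported lazily so the helpers above also work without the package installed
    from umap_inspect import make_umap

    return make_umap(features, **umap_kwargs())
```

With this in place, `run_umap(features)` falls back to the CPU implementation on machines without RAPIDS instead of failing at import time.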

Usage

Examples are given in the examples folder. A simple example:

```python
import numpy as np
images = np.random.rand(100, 32, 32, 3)

# .. use a model or clustering method to extract features from the images
# which should be an array of shape (100, N), where N is the number of features
features = [[...]]
from umap_inspect import make_umap

make_umap(features)

# if you install linear_probe
from lp_inspect import lp_eval

# labels should be a list of strings in the same order as the features
labels = [...]
data = [{"image": f, "label": l} for f, l in zip(features, labels)]
lp_eval(data=data)
```

Performance metrics and detailed results are written using [tensorboard](https://www.tensorflow.org/tensorboard). You can initialise a writer like this: `from torch.utils.tensorboard import SummaryWriter; writer = SummaryWriter(log_dir="path/to/logdir")` and pass it to the `make_umap` and `lp_eval` functions.
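The list-of-records format used for linear probing above is easy to get wrong when features and labels come from different sources. A minimal sketch of a helper that builds the records and rejects mismatched lengths follows. `build_probe_records` is a hypothetical name, and the feature vectors and label strings are toy values made up for illustration.

```python
from typing import Any, Dict, List, Sequence


def build_probe_records(features: Sequence[Any], labels: Sequence[str]) -> List[Dict[str, Any]]:
    """Pair each feature vector with its label in the {"image", "label"} record format."""
    if len(features) != len(labels):
        raise ValueError(f"got {len(features)} feature vectors but {len(labels)} labels")
    return [{"image": f, "label": l} for f, l in zip(features, labels)]


# Toy stand-in for extracted features: three vectors with N = 2 features each
features = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]
labels = ["tumor", "normal", "tumor"]

data = build_probe_records(features, labels)
print(data[0])  # {'image': [0.1, 0.9], 'label': 'tumor'}
```

The resulting `data` list has the same shape as the one constructed inline in the usage example, so it can be passed on in the same way.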

UMAPs can be rendered to HTML instead of the more common matplotlib plots. The UI looks similar to this: ./figures/umap.png

Usage with MONAI

MONAI has some interfaces similar to pytorch-ignite that allow you to create a model with only a few lines of code. I personally prefer this approach when training models. The following code snippet attaches handlers that evaluate the model using UMAPs and linear probing on the validation set.

```python
from ignite.metrics import Accuracy
from monai.engines import SupervisedEvaluator
from monai.handlers import from_engine
from monai.transforms import Compose, EnsureTyped
from monai.utils import CommonKeys

from monai_handlers.LinearProbeHandler import LinearProbeHandler
from monai_handlers.UmapHandler import UmapHandler

# device, dl_val, model, feature_layer_name, out_path and writer
# are assumed to be defined earlier in your training script
val_postprocessing = Compose([EnsureTyped(keys=CommonKeys.PRED)])
evaluator = SupervisedEvaluator(
    device=device,
    val_data_loader=dl_val,
    network=model,
    val_handlers=[
        UmapHandler(model=model, feature_layer_name=feature_layer_name, umap_dir=out_path, summary_writer=writer,
                    output_transform=from_engine([CommonKeys.PRED, CommonKeys.LABEL])),
        LinearProbeHandler(model=model, feature_layer_name=feature_layer_name, out_dir=out_path, summary_writer=writer,
                           output_transform=from_engine([CommonKeys.PRED, CommonKeys.LABEL])),
    ],
    key_val_metric={
        "val_acc": Accuracy(output_transform=from_engine([CommonKeys.PRED, CommonKeys.LABEL]))
    },
    postprocessing=val_postprocessing,
)
```

Recreating the results from the paper

First, follow the instructions at https://github.com/uit-hdl/code-overfit-detection-framework. This will produce embeddings in the out/ folder. Then you can run the following:

```bash
# Creating a fine-tuned phikon model to do disease-classification on TCGA-LUSC
ipython examples/usecaselinearprobe.py -- --embeddings-path out/phikonTCGALUSC-tilesembedding.zarr/ --label-file out/tcga-tile-annotations.csv --label-key disease --out-dir outphikonlp_disease --epochs 20 --batch-size 256

ipython examples/evaluatelp.py -- --embeddings-path out/phikonCPTAC-tilesembedding.zarr/ --label-file out/cptac-tile-annotations.csv --label-key disease --out-dir outphikonlpdisease --model-dir outphikonlp_disease --tensorboard-name cptac
```
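To make the flags in the first command easier to read, the sketch below mirrors them in a small `argparse` parser. This is a hypothetical reconstruction for illustration only, not the parser used by the example scripts. The help texts, the integer types and the choice to make every flag required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags of the linear-probe command above."""
    parser = argparse.ArgumentParser(description="Train a linear probe on precomputed embeddings")
    parser.add_argument("--embeddings-path", required=True, help="directory with the tile embeddings")
    parser.add_argument("--label-file", required=True, help="CSV file with per-tile annotations")
    parser.add_argument("--label-key", required=True, help="annotation column to predict")
    parser.add_argument("--out-dir", required=True, help="where results are written")
    parser.add_argument("--epochs", type=int, required=True, help="number of training epochs")
    parser.add_argument("--batch-size", type=int, required=True, help="samples per batch")
    return parser


# Parse the same arguments that the first command passes after the `--` separator
args = build_parser().parse_args([
    "--embeddings-path", "out/phikonTCGALUSC-tilesembedding.zarr/",
    "--label-file", "out/tcga-tile-annotations.csv",
    "--label-key", "disease",
    "--out-dir", "outphikonlp_disease",
    "--epochs", "20",
    "--batch-size", "256",
])
print(args.label_key, args.epochs, args.batch_size)  # disease 20 256
```

Note that `argparse` converts dashes in flag names to underscores, so `--batch-size` is read back as `args.batch_size`.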

Owner

  • Name: Health Data Lab
  • Login: uit-hdl
  • Kind: organization
  • Location: Norway

Open source projects by the Health Data Lab at UiT

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite both the article from preferred-citation and the software itself.
authors:
  - family-names: Sildnes
    given-names: Anders
  - family-names: Shvetsov
    given-names: Nikita
  - family-names: Tafavvoghi
    given-names: Masoud
  - family-names: Tran
    given-names: Vi Ngoc-Nha
  - family-names: Møllersen
    given-names: Kajsa
  - family-names: Busund
    given-names: Lill-Tove Rasmussen
  - family-names: Kilvær
    given-names: Thomas K.
  - family-names: Bongo
    given-names: Lars Ailo
title: Open-source framework for detecting bias and overfitting for large pathology images
version: 1.0.0
url: https://arxiv.org/abs/2503.01827
date-released: '2025-03-07'
preferred-citation:
  authors:
    - family-names: Sildnes
      given-names: Anders
    - family-names: Shvetsov
      given-names: Nikita
    - family-names: Tafavvoghi
      given-names: Masoud
    - family-names: Tran
      given-names: Vi Ngoc-Nha
    - family-names: Møllersen
      given-names: Kajsa
    - family-names: Busund
      given-names: Lill-Tove Rasmussen
    - family-names: Kilvær
      given-names: Thomas K.
    - family-names: Bongo
      given-names: Lars Ailo
  title: Open-source framework for detecting bias and overfitting for large pathology images
  url: https://arxiv.org/abs/2503.01827
  type: generic
  year: '2025'
  conference: {}
  publisher: {}

GitHub Events

Total
  • Push event: 8
  • Public event: 1
  • Create event: 1
Last Year
  • Push event: 8
  • Public event: 1
  • Create event: 1

Dependencies

pyproject.toml pypi