feature-inspect
tools for UMAP and linear probe inspection
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (arxiv.org, nature.com)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low: 16.5%)
Keywords
Repository
tools for UMAP and linear probe inspection
Basic Info
- Host: GitHub
- Owner: uit-hdl
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2503.01827
- Size: 82 KB
Statistics
- Stars: 0
- Watchers: 6
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
feature-inspect
This repository accompanies the paper: Open-source framework for detecting bias and overfitting for large pathology images
This package is an open-source tool to explore high-level image features with UMAPs and/or linear probing.
This is becoming increasingly important as more large-scale models are released: how a model performs on your task and dataset needs to be evaluated before use.
The main purposes of the package are:
1. to make common guidelines for UMAP parameters (e.g. from Kobak and Berens) more accessible;
2. to provide objective metrics (to be used cautiously) for evaluating feature spaces;
3. to create a tool for exploring models that scales to large inputs (e.g. whole-slide images).
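The second goal, objective metrics for a feature space, is essentially what a linear probe measures: a simple classifier is trained on frozen features, and its accuracy indicates how linearly separable the classes are in that space. A minimal self-contained sketch of the idea in plain NumPy (an illustration, not the package's own implementation):

```python
import numpy as np

def linear_probe_accuracy(features, labels, epochs=200, lr=0.1):
    """Train a tiny logistic-regression probe on frozen features and
    return its training accuracy as a rough separability score."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = features @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
        grad = p - labels                   # dL/dlogits for cross-entropy
        w -= lr * features.T @ grad / n
        b -= lr * grad.mean()
    preds = (features @ w + b) > 0
    return (preds == labels).mean()

# Synthetic example: two well-separated Gaussian blobs in an 8-D feature space.
rng = np.random.default_rng(42)
f0 = rng.normal(-2.0, 1.0, size=(50, 8))
f1 = rng.normal(+2.0, 1.0, size=(50, 8))
features = np.vstack([f0, f1])
labels = np.concatenate([np.zeros(50), np.ones(50)])
score = linear_probe_accuracy(features, labels)
print(f"probe accuracy: {score:.2f}")  # well-separated features score near 1.0
```

A probe like this says nothing about non-linear structure, which is one reason such metrics should be used cautiously, as noted above.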
Installation
```bash
pip install feature_inspect

# optional, if you want to use linear probing
pip install feature_inspect[lp_inspect]
```
GPU acceleration for UMAP
To install the libraries needed for cuML, follow https://docs.rapids.ai/install/ and install the "cuml" and PyTorch packages using conda. Then, to use GPU acceleration, pass `use_cuml=True` to `make_umap`.
Usage
Examples are given in the examples folder. A simple example is:
```python
import numpy as np

images = np.random.rand(100, 32, 32, 3)

# ... use a model or clustering method to extract features from the images,
# which should be an array of shape (100, N), where N is the number of features
features = [[...]]

from umap_inspect import make_umap
make_umap(features)

# if you installed linear probing
from lp_inspect import lp_eval

# labels should be a list of strings in the same order as the features
labels = [...]
data = [{"image": f, "label": l} for f, l in zip(features, labels)]
lp_eval(data=data)
```
Performance metrics and detailed results are written using [TensorBoard](https://www.tensorflow.org/tensorboard). You can initialise a writer like this: `from torch.utils.tensorboard import SummaryWriter; writer = SummaryWriter(log_dir="path/to/logdir")` and pass it to the `make_umap` and `lp_eval` functions.
UMAPs can be rendered to HTML instead of the more common matplotlib output. The UI looks similar to this: ![UMAP UI](./figures/umap.png)
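Rendering an embedding to a standalone HTML page needs nothing beyond string formatting. A toy sketch (not the package's actual renderer) that draws 2-D embedding coordinates as an inline SVG scatter:

```python
import numpy as np

def embedding_to_html(points, size=400, radius=3):
    """Render an (n, 2) array of embedding coordinates as circles
    in an inline SVG, returned as a complete HTML page string."""
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    span = np.where(hi - lo > 0, hi - lo, 1)            # avoid divide-by-zero
    scaled = (pts - lo) / span * (size - 2 * radius) + radius
    circles = "\n".join(
        f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{radius}" fill="steelblue"/>'
        for x, y in scaled
    )
    return (f'<html><body><svg width="{size}" height="{size}">'
            f"{circles}</svg></body></html>")

rng = np.random.default_rng(0)
html = embedding_to_html(rng.normal(size=(100, 2)))
with open("umap_toy.html", "w") as fh:   # open the file in a browser to view
    fh.write(html)
```

A real renderer would add hover tooltips and zooming; the point here is only that an HTML artifact is self-contained and easy to share, unlike an interactive matplotlib session.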
Usage with MONAI
MONAI has interfaces similar to pytorch-ignite that allow you to create a training loop with only a few lines of code. I personally prefer this approach when training models. The following code snippet attaches handlers that evaluate the model using UMAPs and linear probing on the validation set:
```python
# imports needed by this snippet (the handler imports are from this package)
from ignite.metrics import Accuracy
from monai.engines import SupervisedEvaluator
from monai.handlers import from_engine
from monai.transforms import Compose, EnsureTyped
from monai.utils.enums import CommonKeys

from monai_handlers.LinearProbeHandler import LinearProbeHandler
from monai_handlers.UmapHandler import UmapHandler

val_postprocessing = Compose([EnsureTyped(keys=CommonKeys.PRED)])
evaluator = SupervisedEvaluator(
    device=device,
    val_data_loader=dl_val,
    network=model,
    val_handlers=[
        UmapHandler(model=model, feature_layer_name=feature_layer_name, umap_dir=out_path, summary_writer=writer,
                    output_transform=from_engine([CommonKeys.PRED, CommonKeys.LABEL])),
        LinearProbeHandler(model=model, feature_layer_name=feature_layer_name, out_dir=out_path, summary_writer=writer,
                           output_transform=from_engine([CommonKeys.PRED, CommonKeys.LABEL])),
    ],
    key_val_metric={
        "val_acc": Accuracy(output_transform=from_engine([CommonKeys.PRED, CommonKeys.LABEL]))
    },
    postprocessing=val_postprocessing,
)
```
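The handlers above follow the ignite-style event pattern: each one registers a callback that the engine fires when the evaluation run completes. A minimal toy sketch of that mechanism (illustrative stub classes, not MONAI's actual API):

```python
class ToyEngine:
    """Stub engine: runs batches, then fires completion handlers."""
    def __init__(self):
        self._completed_handlers = []
        self.state_output = None

    def add_event_handler(self, handler):
        self._completed_handlers.append(handler)

    def run(self, batches):
        # pretend each "batch" yields a (prediction, label) pair
        self.state_output = [(b, b % 2) for b in batches]
        for handler in self._completed_handlers:
            handler(self)

class ToyUmapHandler:
    """Collects (pred, label) pairs when the run completes, standing in
    for a handler that would fit a UMAP on the collected features."""
    def __init__(self):
        self.collected = []

    def attach(self, engine):
        engine.add_event_handler(self)

    def __call__(self, engine):
        self.collected.extend(engine.state_output)

engine = ToyEngine()
handler = ToyUmapHandler()
handler.attach(engine)
engine.run([1, 2, 3])
print(len(handler.collected))  # → 3
```

This is why the real handlers need only an `output_transform`: the engine hands them its state, and they pull out the fields they need.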
Recreating the results from the paper
First, follow the instructions at https://github.com/uit-hdl/code-overfit-detection-framework.
This will produce embeddings in the out/ folder. Then you can run the following:
```bash
# create a fine-tuned phikon model to do disease classification on TCGA-LUSC
ipython examples/use_case_linear_probe.py -- --embeddings-path out/phikon_TCGA_LUSC-tiles_embedding.zarr/ --label-file out/tcga-tile-annotations.csv --label-key disease --out-dir out_phikon_lp_disease --epochs 20 --batch-size 256
ipython examples/evaluate_lp.py -- --embeddings-path out/phikon_CPTAC-tiles_embedding.zarr/ --label-file out/cptac-tile-annotations.csv --label-key disease --out-dir out_phikon_lp_disease --model-dir out_phikon_lp_disease --tensorboard-name cptac
```
Owner
- Name: Health Data Lab
- Login: uit-hdl
- Kind: organization
- Location: Norway
- Website: http://hdl.cs.uit.no/
- Repositories: 22
- Profile: https://github.com/uit-hdl
Open source projects by the Health Data Lab at UiT
Citation (CITATION.cff)
cff-version: 1.2.0
message: If you use this software, please cite both the article from preferred-citation and the software itself.
authors:
- family-names: Sildnes
  given-names: Anders
- family-names: Shvetsov
  given-names: Nikita
- family-names: Tafavvoghi
  given-names: Masoud
- family-names: Tran
  given-names: Vi Ngoc-Nha
- family-names: Møllersen
  given-names: Kajsa
- family-names: Busund
  given-names: Lill-Tove Rasmussen
- family-names: Kilvær
  given-names: Thomas K.
- family-names: Bongo
  given-names: Lars Ailo
title: Open-source framework for detecting bias and overfitting for large pathology images
version: 1.0.0
url: https://arxiv.org/abs/2503.01827
date-released: '2025-03-07'
preferred-citation:
authors:
- family-names: Sildnes
  given-names: Anders
- family-names: Shvetsov
  given-names: Nikita
- family-names: Tafavvoghi
  given-names: Masoud
- family-names: Tran
  given-names: Vi Ngoc-Nha
- family-names: Møllersen
  given-names: Kajsa
- family-names: Busund
  given-names: Lill-Tove Rasmussen
- family-names: Kilvær
  given-names: Thomas K.
- family-names: Bongo
  given-names: Lars Ailo
title: Open-source framework for detecting bias and overfitting for large pathology images
url: https://arxiv.org/abs/2503.01827
type: generic
year: '2025'
conference: {}
publisher: {}
GitHub Events
Total
- Push event: 8
- Public event: 1
- Create event: 1
Last Year
- Push event: 8
- Public event: 1
- Create event: 1