feature-inspect
tools for UMAP and linear probe inspection
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (arxiv.org, nature.com)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low: 16.5%)
Keywords
Repository
tools for UMAP and linear probe inspection
Basic Info
- Host: GitHub
- Owner: uit-hdl
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2503.01827
- Size: 82 KB
Statistics
- Stars: 0
- Watchers: 6
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
feature-inspect
This repository accompanies the paper: Open-source framework for detecting bias and overfitting for large pathology images
This package is an open-source tool to explore high-level image features with UMAPs and/or linear probing.
This is becoming increasingly important as more large-scale models are released: how a model performs on your task and dataset needs to be evaluated before use.
The main purposes of the package are:
1. to make common guidelines for UMAP parameters (e.g. from Kobak and Berens) more accessible;
2. to provide objective metrics (to be used cautiously) for evaluating feature spaces;
3. to create a tool for exploring models that scales to large inputs (e.g. whole-slide images).
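The second goal, objective metrics for a feature space, is essentially what a linear probe measures: a simple classifier is trained on frozen features, and its accuracy indicates how linearly separable the classes are in that space. A minimal self-contained sketch of the idea in plain NumPy (an illustration, not the package's own implementation):

```python
import numpy as np

def linear_probe_accuracy(features, labels, epochs=200, lr=0.1):
    """Train a tiny logistic-regression probe on frozen features and
    return its training accuracy as a rough separability score."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = features @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
        grad = p - labels                   # dL/dlogits for cross-entropy
        w -= lr * features.T @ grad / n
        b -= lr * grad.mean()
    preds = (features @ w + b) > 0
    return (preds == labels).mean()

# Synthetic example: two well-separated Gaussian blobs in an 8-D feature space.
rng = np.random.default_rng(42)
f0 = rng.normal(-2.0, 1.0, size=(50, 8))
f1 = rng.normal(+2.0, 1.0, size=(50, 8))
features = np.vstack([f0, f1])
labels = np.concatenate([np.zeros(50), np.ones(50)])
score = linear_probe_accuracy(features, labels)
print(f"probe accuracy: {score:.2f}")  # well-separated features score near 1.0
```

A probe like this says nothing about non-linear structure, which is one reason such metrics should be used cautiously, as noted above.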
Installation
```bash
pip install feature_inspect

# optional, if you want to use linear probing
pip install feature_inspect[lp_inspect]
```
GPU acceleration for UMAP
To install the libraries needed for cuML, follow https://docs.rapids.ai/install/ and install the "cuml" and PyTorch packages using conda. Then, to use GPU acceleration, pass `use_cuml=True` to `make_umap`.
Usage
Examples are given in the examples folder. A simple example is:
```python
import numpy as np

images = np.random.rand(100, 32, 32, 3)

# ... use a model or clustering method to extract features from the images,
# which should be an array of shape (100, N), where N is the number of features
features = [[...]]

from umap_inspect import make_umap
make_umap(features)

# if you installed linear probing
from lp_inspect import lp_eval

# labels should be a list of strings in the same order as the features
labels = [...]
data = [{"image": f, "label": l} for f, l in zip(features, labels)]
lp_eval(data=data)
```
Performance metrics and detailed results are written using [TensorBoard](https://www.tensorflow.org/tensorboard). You can initialise a writer like this: `from torch.utils.tensorboard import SummaryWriter; writer = SummaryWriter(log_dir="path/to/logdir")` and pass it to the `make_umap` and `lp_eval` functions.
UMAPs can be rendered to HTML instead of the more common matplotlib output. The UI looks similar to this: ![UMAP UI](./figures/umap.png)
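Rendering an embedding to a standalone HTML page needs nothing beyond string formatting. A toy sketch (not the package's actual renderer) that draws 2-D embedding coordinates as an inline SVG scatter:

```python
import numpy as np

def embedding_to_html(points, size=400, radius=3):
    """Render an (n, 2) array of embedding coordinates as circles
    in an inline SVG, returned as a complete HTML page string."""
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    span = np.where(hi - lo > 0, hi - lo, 1)            # avoid divide-by-zero
    scaled = (pts - lo) / span * (size - 2 * radius) + radius
    circles = "\n".join(
        f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{radius}" fill="steelblue"/>'
        for x, y in scaled
    )
    return (f'<html><body><svg width="{size}" height="{size}">'
            f"{circles}</svg></body></html>")

rng = np.random.default_rng(0)
html = embedding_to_html(rng.normal(size=(100, 2)))
with open("umap_toy.html", "w") as fh:   # open the file in a browser to view
    fh.write(html)
```

A real renderer would add hover tooltips and zooming; the point here is only that an HTML artifact is self-contained and easy to share, unlike an interactive matplotlib session.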
Usage with MONAI
MONAI has interfaces similar to pytorch-ignite that allow you to create a training loop with only a few lines of code. I personally prefer this approach when training models. The following code snippet attaches handlers that evaluate the model using UMAPs and linear probing on the validation set:
```python
# imports needed by this snippet (the handler imports are from this package)
from ignite.metrics import Accuracy
from monai.engines import SupervisedEvaluator
from monai.handlers import from_engine
from monai.transforms import Compose, EnsureTyped
from monai.utils.enums import CommonKeys

from monai_handlers.LinearProbeHandler import LinearProbeHandler
from monai_handlers.UmapHandler import UmapHandler

val_postprocessing = Compose([EnsureTyped(keys=CommonKeys.PRED)])
evaluator = SupervisedEvaluator(
    device=device,
    val_data_loader=dl_val,
    network=model,
    val_handlers=[
        UmapHandler(model=model, feature_layer_name=feature_layer_name, umap_dir=out_path, summary_writer=writer,
                    output_transform=from_engine([CommonKeys.PRED, CommonKeys.LABEL])),
        LinearProbeHandler(model=model, feature_layer_name=feature_layer_name, out_dir=out_path, summary_writer=writer,
                           output_transform=from_engine([CommonKeys.PRED, CommonKeys.LABEL])),
    ],
    key_val_metric={
        "val_acc": Accuracy(output_transform=from_engine([CommonKeys.PRED, CommonKeys.LABEL]))
    },
    postprocessing=val_postprocessing,
)
```
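The handlers above follow the ignite-style event pattern: each one registers a callback that the engine fires when the evaluation run completes. A minimal toy sketch of that mechanism (illustrative stub classes, not MONAI's actual API):

```python
class ToyEngine:
    """Stub engine: runs batches, then fires completion handlers."""
    def __init__(self):
        self._completed_handlers = []
        self.state_output = None

    def add_event_handler(self, handler):
        self._completed_handlers.append(handler)

    def run(self, batches):
        # pretend each "batch" yields a (prediction, label) pair
        self.state_output = [(b, b % 2) for b in batches]
        for handler in self._completed_handlers:
            handler(self)

class ToyUmapHandler:
    """Collects (pred, label) pairs when the run completes, standing in
    for a handler that would fit a UMAP on the collected features."""
    def __init__(self):
        self.collected = []

    def attach(self, engine):
        engine.add_event_handler(self)

    def __call__(self, engine):
        self.collected.extend(engine.state_output)

engine = ToyEngine()
handler = ToyUmapHandler()
handler.attach(engine)
engine.run([1, 2, 3])
print(len(handler.collected))  # → 3
```

This is why the real handlers need only an `output_transform`: the engine hands them its state, and they pull out the fields they need.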
Recreating the results from the paper
First, follow the instructions at https://github.com/uit-hdl/code-overfit-detection-framework.
This will produce embeddings in the out/ folder. Then you can run the following:
```bash
# create a fine-tuned phikon model to do disease classification on TCGA-LUSC
ipython examples/use_case_linear_probe.py -- --embeddings-path out/phikon_TCGA_LUSC-tiles_embedding.zarr/ --label-file out/tcga-tile-annotations.csv --label-key disease --out-dir out_phikon_lp_disease --epochs 20 --batch-size 256
ipython examples/evaluate_lp.py -- --embeddings-path out/phikon_CPTAC-tiles_embedding.zarr/ --label-file out/cptac-tile-annotations.csv --label-key disease --out-dir out_phikon_lp_disease --model-dir out_phikon_lp_disease --tensorboard-name cptac
```
Owner
- Name: Health Data Lab
- Login: uit-hdl
- Kind: organization
- Location: Norway
- Website: http://hdl.cs.uit.no/
- Repositories: 22
- Profile: https://github.com/uit-hdl
Open source projects by the Health Data Lab at UiT
Citation (CITATION.cff)
cff-version: 1.2.0
message: If you use this software, please cite both the article from preferred-citation and the software itself.
authors:
- family-names: Sildnes
  given-names: Anders
- family-names: Shvetsov
  given-names: Nikita
- family-names: Tafavvoghi
  given-names: Masoud
- family-names: Tran
  given-names: Vi Ngoc-Nha
- family-names: Møllersen
  given-names: Kajsa
- family-names: Busund
  given-names: Lill-Tove Rasmussen
- family-names: Kilvær
  given-names: Thomas K.
- family-names: Bongo
  given-names: Lars Ailo
title: Open-source framework for detecting bias and overfitting for large pathology images
version: 1.0.0
url: https://arxiv.org/abs/2503.01827
date-released: '2025-03-07'
preferred-citation:
authors:
- family-names: Sildnes
  given-names: Anders
- family-names: Shvetsov
  given-names: Nikita
- family-names: Tafavvoghi
  given-names: Masoud
- family-names: Tran
  given-names: Vi Ngoc-Nha
- family-names: Møllersen
  given-names: Kajsa
- family-names: Busund
  given-names: Lill-Tove Rasmussen
- family-names: Kilvær
  given-names: Thomas K.
- family-names: Bongo
  given-names: Lars Ailo
title: Open-source framework for detecting bias and overfitting for large pathology images
url: https://arxiv.org/abs/2503.01827
type: generic
year: '2025'
conference: {}
publisher: {}
GitHub Events
Total
- Push event: 8
- Public event: 1
- Create event: 1
Last Year
- Push event: 8
- Public event: 1
- Create event: 1