Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, sciencedirect.com, nature.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.1%) to scientific vocabulary
Repository
Proscia AI tools
Basic Info
- Host: GitHub
- Owner: Proscia
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 14.7 MB
Statistics
- Stars: 31
- Watchers: 7
- Forks: 2
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Foundation Models for digital pathology at your fingertips
In most computer vision fields, the challenge lies in algorithm development, and accessing images is straightforward. But in computational pathology, data scientists face unique hurdles. Simple tasks like storing, manipulating and loading images can be a time sink and source of frustration.
The challenges in computational pathology are many. Scanner vendors use proprietary file formats. Loading whole slide image (WSI) files from multiple vendors requires various code packages that are not well maintained. WSI files are storage intensive, holding gigabytes of data per file. Processing high magnification WSI files for downstream deep learning workflows requires cropping WSI files into many smaller images, turning a single WSI file into sometimes thousands of individual data products to track and maintain. Only after overcoming these hurdles and more can a data scientist start to build a model for the important task at hand, whether that's detecting a specific biomarker, identifying mitoses, classifying a tumor type, or any of countless other uses for AI in pathology.
However, with Proscia's innovative Concentriq® Embeddings, these challenges are becoming relics of the past. AI development has recently pivoted from building one-off models tailored to specific tasks towards developing universal feature extractors, known as "foundation models", that learn representations from vast amounts of unlabeled data. Following the success of language foundation models that power applications like ChatGPT, computer vision has followed suit with models like DINO and ConvNext, recognizing the immense potential of foundation models for accelerating downstream task-specific development. Now, the state-of-the-art approach to many computer vision tasks in medical image analysis is to start by computing embeddings from images using a foundation model.
Concentriq® Embeddings makes this the first step in any computational pathology endeavor, leveraging vision foundation models to extract vital visual features from histopathology scans and turning cumbersome image files into standardized, lightweight feature vectors known as embeddings. Concentriq® Embeddings not only simplifies the process but also accelerates the development workflow in digital histopathology.
With Concentriq® Embeddings, Proscia is putting the power of foundation model embedding at developers’ fingertips to accelerate the digital histopathology image analysis workflow.
Transforming Pathology with Concentriq® Embeddings
Proscia is proud to announce Concentriq® Embeddings, a seamless extension of our Concentriq® LS platform. This tool is designed specifically for pharmaceutical companies, biotech companies, CROs, and academic research organizations to foster image-based research without the traditional barriers. Concentriq® Embeddings is a backend service that provides foundation model embeddings from any slide in Concentriq® LS. The service extracts rich visual features at any magnification, promptly providing access to the visual information in the slide, at a greatly compressed memory footprint.
Through Concentriq® Embeddings, developers can access some of the most widely used foundation models, and Proscia plans to continue adding to the list of supported foundation models. Instead of wading through the ever-growing and dense literature attempting to crown a “best” foundation model for pathology, Concentriq® Embeddings allows researchers to easily try out many feature extractors on a downstream task, and future-proofs for the inevitably better foundation models of tomorrow.
Currently supported foundation models include:
- DinoV2
  - Model Tag: facebook/dinov2-base
  - Patch Size: 224
  - Embedding Dimension: 768
  - License: Apache-2.0
  - 🤗 HuggingFace page
  - Paper
- PLIP
  - Model Tag: vinid/plip
  - Patch Size: 224
  - Embedding Dimension: 512
  - License: MIT
  - 🤗 HuggingFace page
  - Paper
- ConvNext
  - Model Tag: facebook/convnext-base-384-22k-1k
  - Patch Size: 384
  - Embedding Dimension: 1024
  - License: Apache-2.0
  - 🤗 HuggingFace page
  - Paper
- CTransPath
  - Model Tag: 1aurent/swin_tiny_patch4_window7_224.CTransPath
  - Patch Size: 224
  - Embedding Dimension: 768
  - License: GNU GPLv3
  - 🤗 HuggingFace page
  - Paper
- H-optimus-0
  - Model Tag: bioptimus/H-optimus-0
  - Patch Size: 224
  - Embedding Dimension: 1536
  - License: Apache-2.0
  - 🤗 HuggingFace page
  - Paper
- Virchow
  - Model Tag: paige-ai/Virchow
  - Patch Size: 224
  - Embedding Dimension: 2560
  - License: Apache-2.0
  - 🤗 HuggingFace page
  - Paper
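For trying several feature extractors on a downstream task, the catalogue above lends itself to a small lookup table. A minimal sketch (the `SUPPORTED_MODELS` registry and `embedding_dim` helper are illustrative, not part of the Proscia API; model tags and dimensions are taken from the list above):

```python
# Illustrative registry of the supported foundation models listed above.
SUPPORTED_MODELS = {
    "facebook/dinov2-base": {"patch_size": 224, "embedding_dim": 768},
    "vinid/plip": {"patch_size": 224, "embedding_dim": 512},
    "facebook/convnext-base-384-22k-1k": {"patch_size": 384, "embedding_dim": 1024},
    "1aurent/swin_tiny_patch4_window7_224.CTransPath": {"patch_size": 224, "embedding_dim": 768},
    "bioptimus/H-optimus-0": {"patch_size": 224, "embedding_dim": 1536},
    "paige-ai/Virchow": {"patch_size": 224, "embedding_dim": 2560},
}

def embedding_dim(model_tag: str) -> int:
    """Look up the output dimension a downstream model should expect."""
    return SUPPORTED_MODELS[model_tag]["embedding_dim"]

print(embedding_dim("bioptimus/H-optimus-0"))  # 1536
```

A registry like this makes it easy to loop over every supported model tag, request embeddings for each, and compare downstream performance head to head.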
Computational pathology development finally has a straightforward workflow
Concentriq® Embeddings revolutionizes the AI development process for everything from selecting WSIs to extracting features from them, making algorithm development more straightforward than ever.
Before:
Data scientists juggled countless non-standardized WSI file formats and struggled with often poorly maintained code packages for accessing them.
Now with Concentriq® Embeddings:
Forget about OpenSlide, proprietary SDKs from scanner vendors, and OpenPhi. Concentriq® LS and Concentriq® Embeddings are all you need for your AI development.
Before:
Training models for pathology images required downloading large WSI files and extensive storage capacity, since each file often exceeds several gigabytes. This often produced more data than a standard laptop hard drive could accommodate. Furthermore, downloading such substantial amounts of data on a typical internet connection could take hours or even days, significantly slowing the research workflow and delaying critical advances unless costly specialty infrastructure was available.
Now with Concentriq® Embeddings:
Rather than managing slides that consume gigabytes of storage, data scientists and researchers now interact with lightweight feature representations that occupy just a few megabytes. For a concrete example: an RGB WSI crop of 512 pixels on each side contains 512 × 512 × 3 = 786,432 unsigned 8-bit integers, or 786,432 bytes. In contrast, a Vision Transformer (ViT) feature vector (embedding) of this crop contains 768 floats at 4 bytes apiece, for 3,072 bytes. The feature vector is a compressed representation of the image, with a compression ratio of 256:1. This means a 1 GB WSI becomes less than 4 MB.
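The arithmetic above can be checked directly:

```python
# Raw RGB crop: 512 x 512 pixels x 3 channels, one uint8 byte per value.
crop_bytes = 512 * 512 * 3          # 786,432 bytes

# ViT embedding of the same crop: 768 float32 values, 4 bytes each.
embedding_bytes = 768 * 4           # 3,072 bytes

ratio = crop_bytes // embedding_bytes
print(ratio)                        # 256

# At that ratio, a 1 GB slide shrinks to about 4 MB of embeddings.
embedded_mb = 1 * 1024 / ratio
print(embedded_mb)                  # 4.0
```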
Before:
Preparing high magnification WSI files for downstream deep learning workflows involved cropping WSI files into many smaller images, turning a single WSI file into sometimes thousands of individual data products to track and maintain.
Now with Concentriq® Embeddings:
Data bookkeeping is greatly simplified. Concentriq® Embeddings tiles each slide and returns a single safetensors file per slide containing the embeddings. Even though the slide's visual information is contained in a single convenient file, Concentriq® Embeddings provides an interface for loading feature vectors from individual crops into memory.
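One way to picture that per-slide file: a tile grid mapped to rows of an embedding matrix, so a single crop's vector can be pulled out without touching the rest. A minimal numpy sketch (the coordinate scheme, array layout, and function names here are illustrative assumptions, not the actual Concentriq® Embeddings interface):

```python
import numpy as np

# Illustrative per-slide payload: one embedding row per tile, plus the
# (col, row) grid coordinate of each tile on the slide.
n_tiles, dim = 6, 768
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((n_tiles, dim), dtype=np.float32)
coords = [(col, row) for row in range(2) for col in range(3)]

# Map grid coordinates to matrix rows for constant-time lookup.
index = {coord: i for i, coord in enumerate(coords)}

def embedding_for_tile(col: int, row: int) -> np.ndarray:
    """Load the feature vector for one crop of the slide."""
    return embeddings[index[(col, row)]]

vec = embedding_for_tile(2, 1)
print(vec.shape)  # (768,)
```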
This is how simple model development can be.
Discover the efficiency of the Concentriq® Embeddings workflow.
```python
from proscia_ai_tools.client import ClientWrapper as Client

ce_api_client = Client(url=endpoint, email=email, password=pwd)
ticket_id = ce_api_client.embed_repos(ids=[1234], model="bioptimus/H-optimus-0", mpp=1)
embeddings = ce_api_client.get_embeddings(ticket_id)
```
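Once embeddings are in hand, downstream modeling reduces to ordinary work on small vectors. A hedged sketch of one such task, a nearest-centroid tile classifier; synthetic arrays stand in for the embeddings returned above, whose exact format this sketch does not assume:

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 1536  # H-optimus-0 embedding dimension

# Synthetic stand-in for embeddings of labeled tiles (e.g. tumor vs. normal),
# with the two classes offset so they are separable.
tumor = rng.standard_normal((50, dim)) + 0.5
normal = rng.standard_normal((50, dim)) - 0.5
X = np.vstack([tumor, normal])
y = np.array([1] * 50 + [0] * 50)

# Nearest-centroid classifier: one mean embedding per class.
centroids = {label: X[y == label].mean(axis=0) for label in (0, 1)}

def predict(vec: np.ndarray) -> int:
    """Assign a tile embedding to the class with the nearest centroid."""
    return min(centroids, key=lambda label: np.linalg.norm(vec - centroids[label]))

# A held-out tumor-like tile lands near the tumor centroid.
query = rng.standard_normal(dim) + 0.5
print(predict(query))  # 1
```

In practice any lightweight model (logistic regression, k-NN, a small MLP) can be trained on the embedding vectors the same way, with no WSI handling in the loop.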
Setup
Quickstart
```bash
pip install git+https://github.com/Proscia/proscia-ai-tools.git
```
Owner
- Name: Proscia
- Login: Proscia
- Kind: organization
- Repositories: 5
- Profile: https://github.com/Proscia
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Baatz
given-names: Jeff
email: jeff.baatz@proscia.com
- family-names: Chivers
given-names: Corey
orcid: https://orcid.org/0000-0001-7290-2183
email: corey.chivers@proscia.com
- family-names: Ianni
given-names: Julianna
email: julianna@proscia.com
- family-names: Spurrier
given-names: Vaughn
email: vaughn.spurrier@proscia.com
- family-names: Toulgaridis
given-names: Kyriakos
email: kyriakos.toulgaridis@proscia.com
- family-names: Wahl
given-names: Casey
email: casey.wahl@proscia.com
title: "Proscia AI Tools"
version: 0.0.1
date-released: 2024-09-30
GitHub Events
Total
- Release event: 1
- Watch event: 6
- Delete event: 16
- Issue comment event: 2
- Push event: 24
- Pull request review event: 17
- Pull request event: 27
- Fork event: 3
- Create event: 18
Last Year
- Release event: 1
- Watch event: 6
- Delete event: 16
- Issue comment event: 2
- Push event: 24
- Pull request review event: 17
- Pull request event: 27
- Fork event: 3
- Create event: 18
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: about 20 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: about 20 hours
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- corey-chivers-proscia (15)
- vaughnproscia (1)