Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, sciencedirect.com, nature.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.1%) to scientific vocabulary
Repository
Proscia AI tools
Basic Info
- Host: GitHub
- Owner: Proscia
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 14.7 MB
Statistics
- Stars: 31
- Watchers: 7
- Forks: 2
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Foundation Models for digital pathology at your fingertips
In most computer vision fields, the challenge lies in algorithm development, and accessing images is straightforward. But in computational pathology, data scientists face unique hurdles. Simple tasks like storing, manipulating and loading images can be a time sink and source of frustration.
The challenges in computational pathology are many. Scanner vendors use proprietary file formats. Loading whole slide image (WSI) files from multiple vendors requires various code packages that are not well maintained. WSI files are storage intensive, holding gigabytes of data per file. Processing high magnification WSI files for downstream deep learning workflows requires cropping WSI files into many smaller images, turning a single WSI file into sometimes thousands of individual data products to track and maintain. Only after overcoming these hurdles and more can a data scientist start to build a model for the important task at hand, whether that's detecting a specific biomarker, identifying mitoses, classifying a tumor type, or any of countless other uses for AI in pathology.
However, with Proscia's innovative Concentriq® Embeddings, these challenges are becoming relics of the past. AI development has recently pivoted from building one-off models tailored to specific tasks towards developing universal feature extractors, known as "foundation models", that learn representations from vast amounts of unlabeled data. Following the success of language foundation models that power applications like ChatGPT, computer vision has followed suit with models like DINO and ConvNext, recognizing the immense potential of foundation models for accelerating downstream task-specific development. Now, the state-of-the-art approach to many computer vision tasks in medical image analysis is to start by computing embeddings from images using a foundation model.
Concentriq® Embeddings makes this the first step in any computational pathology endeavor, leveraging vision foundation models to extract vital visual features from histopathology scans and turning cumbersome image files into standardized, lightweight feature vectors known as embeddings. Concentriq® Embeddings not only simplifies the process but also accelerates the development workflow in digital histopathology.
With Concentriq® Embeddings, Proscia is putting the power of foundation model embedding at developers’ fingertips to accelerate the digital histopathology image analysis workflow.
Transforming Pathology with Concentriq® Embeddings
Proscia is proud to announce Concentriq® Embeddings, a seamless extension of our Concentriq® LS platform. This tool is designed specifically for pharmaceutical companies, biotech companies, CROs, and academic research organizations to foster image-based research without the traditional barriers. Concentriq® Embeddings is a backend service that provides foundation model embeddings from any slide in Concentriq® LS. The service extracts rich visual features at any magnification, promptly providing access to the visual information in the slide, at a greatly compressed memory footprint.
Through Concentriq® Embeddings, developers can access some of the most widely used foundation models, and Proscia plans to continue adding to the list of supported foundation models. Instead of wading through the ever-growing and dense literature attempting to crown a “best” foundation model for pathology, Concentriq® Embeddings allows researchers to easily try out many feature extractors on a downstream task, and future-proofs for the inevitably better foundation models of tomorrow.
Currently supported foundation models include:
- DinoV2
  - Model Tag: facebook/dinov2-base
  - Patch Size: 224
  - Embedding Dimension: 768
  - License: Apache-2.0
  - 🤗 HuggingFace page
  - Paper
- PLIP
  - Model Tag: vinid/plip
  - Patch Size: 224
  - Embedding Dimension: 512
  - License: MIT
  - 🤗 HuggingFace page
  - Paper
- ConvNext
  - Model Tag: facebook/convnext-base-384-22k-1k
  - Patch Size: 384
  - Embedding Dimension: 1024
  - License: Apache-2.0
  - 🤗 HuggingFace page
  - Paper
- CTransPath
  - Model Tag: 1aurent/swin_tiny_patch4_window7_224.CTransPath
  - Patch Size: 224
  - Embedding Dimension: 768
  - License: GNU GPLv3
  - 🤗 HuggingFace page
  - Paper
- H-optimus-0
  - Model Tag: bioptimus/H-optimus-0
  - Patch Size: 224
  - Embedding Dimension: 1536
  - License: Apache-2.0
  - 🤗 HuggingFace page
  - Paper
- Virchow
  - Model Tag: paige-ai/Virchow
  - Patch Size: 224
  - Embedding Dimension: 2560
  - License: Apache-2.0
  - 🤗 HuggingFace page
  - Paper
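For trying several feature extractors on a downstream task, the catalogue above lends itself to a small lookup table. A minimal sketch (the `SUPPORTED_MODELS` registry and `embedding_dim` helper are illustrative, not part of the Proscia API; model tags and dimensions are taken from the list above):

```python
# Illustrative registry of the supported foundation models listed above.
SUPPORTED_MODELS = {
    "facebook/dinov2-base": {"patch_size": 224, "embedding_dim": 768},
    "vinid/plip": {"patch_size": 224, "embedding_dim": 512},
    "facebook/convnext-base-384-22k-1k": {"patch_size": 384, "embedding_dim": 1024},
    "1aurent/swin_tiny_patch4_window7_224.CTransPath": {"patch_size": 224, "embedding_dim": 768},
    "bioptimus/H-optimus-0": {"patch_size": 224, "embedding_dim": 1536},
    "paige-ai/Virchow": {"patch_size": 224, "embedding_dim": 2560},
}

def embedding_dim(model_tag: str) -> int:
    """Look up the output dimension a downstream model should expect."""
    return SUPPORTED_MODELS[model_tag]["embedding_dim"]

print(embedding_dim("bioptimus/H-optimus-0"))  # 1536
```

A registry like this makes it easy to loop over every supported model tag, request embeddings for each, and compare downstream performance head to head.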
Computational pathology development finally has a straightforward workflow
Concentriq® Embeddings revolutionizes the AI development process for everything from selecting WSIs to extracting features from them, making algorithm development more straightforward than ever.
Before:
Data scientists juggled countless non-standardized WSI file formats and struggled with often poorly maintained code packages for accessing them.
Now with Concentriq® Embeddings:
Forget about OpenSlide, proprietary SDKs from scanner vendors, and OpenPhi. Concentriq® LS and Concentriq® Embeddings are all you need for your AI development.
Before:
Training models for pathology images required downloading large WSI files and extensive storage capacity, since each file often exceeds several gigabytes. This often produced more data than a standard laptop hard drive could accommodate. Furthermore, downloading such substantial amounts of data on a typical internet connection could take hours or even days, significantly slowing the research workflow and delaying critical advances unless costly specialty infrastructure was available.
Now with Concentriq® Embeddings:
Rather than managing slides that consume gigabytes of storage, data scientists and researchers now interact with lightweight feature representations that occupy just a few megabytes. For a concrete example: an RGB WSI crop of 512 pixels on each side contains 512 × 512 × 3 = 786,432 unsigned 8-bit integers, or 786,432 bytes. In contrast, a Vision Transformer (ViT) feature vector (embedding) of this crop contains 768 floats at 4 bytes apiece, for 3,072 bytes. The feature vector is a compressed representation of the image, with a compression ratio of 256:1. This means a 1 GB WSI becomes less than 4 MB.
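The arithmetic above can be checked directly:

```python
# Raw RGB crop: 512 x 512 pixels x 3 channels, one uint8 byte per value.
crop_bytes = 512 * 512 * 3          # 786,432 bytes

# ViT embedding of the same crop: 768 float32 values, 4 bytes each.
embedding_bytes = 768 * 4           # 3,072 bytes

ratio = crop_bytes // embedding_bytes
print(ratio)                        # 256

# At that ratio, a 1 GB slide shrinks to about 4 MB of embeddings.
embedded_mb = 1 * 1024 / ratio
print(embedded_mb)                  # 4.0
```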
Before:
Preparing high magnification WSI files for downstream deep learning workflows involved cropping WSI files into many smaller images, turning a single WSI file into sometimes thousands of individual data products to track and maintain.
Now with Concentriq® Embeddings:
Data bookkeeping is greatly simplified. Concentriq® Embeddings tiles each slide and returns a single safetensors file per slide containing the embeddings. Even though the slide's visual information is contained in a single convenient file, Concentriq® Embeddings provides an interface for loading feature vectors from individual crops into memory.
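One way to picture that per-slide file: a tile grid mapped to rows of an embedding matrix, so a single crop's vector can be pulled out without touching the rest. A minimal numpy sketch (the coordinate scheme, array layout, and function names here are illustrative assumptions, not the actual Concentriq® Embeddings interface):

```python
import numpy as np

# Illustrative per-slide payload: one embedding row per tile, plus the
# (col, row) grid coordinate of each tile on the slide.
n_tiles, dim = 6, 768
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((n_tiles, dim), dtype=np.float32)
coords = [(col, row) for row in range(2) for col in range(3)]

# Map grid coordinates to matrix rows for constant-time lookup.
index = {coord: i for i, coord in enumerate(coords)}

def embedding_for_tile(col: int, row: int) -> np.ndarray:
    """Load the feature vector for one crop of the slide."""
    return embeddings[index[(col, row)]]

vec = embedding_for_tile(2, 1)
print(vec.shape)  # (768,)
```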
This is how simple model development can be.
Discover the efficiency of the Concentriq® Embeddings workflow.
```python
from proscia_ai_tools.client import ClientWrapper as Client

ce_api_client = Client(url=endpoint, email=email, password=pwd)
ticket_id = ce_api_client.embed_repos(ids=[1234], model="bioptimus/H-optimus-0", mpp=1)
embeddings = ce_api_client.get_embeddings(ticket_id)
```
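Once embeddings are in hand, downstream modeling reduces to ordinary work on small vectors. A hedged sketch of one such task, a nearest-centroid tile classifier; synthetic arrays stand in for the embeddings returned above, whose exact format this sketch does not assume:

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 1536  # H-optimus-0 embedding dimension

# Synthetic stand-in for embeddings of labeled tiles (e.g. tumor vs. normal),
# with the two classes offset so they are separable.
tumor = rng.standard_normal((50, dim)) + 0.5
normal = rng.standard_normal((50, dim)) - 0.5
X = np.vstack([tumor, normal])
y = np.array([1] * 50 + [0] * 50)

# Nearest-centroid classifier: one mean embedding per class.
centroids = {label: X[y == label].mean(axis=0) for label in (0, 1)}

def predict(vec: np.ndarray) -> int:
    """Assign a tile embedding to the class with the nearest centroid."""
    return min(centroids, key=lambda label: np.linalg.norm(vec - centroids[label]))

# A held-out tumor-like tile lands near the tumor centroid.
query = rng.standard_normal(dim) + 0.5
print(predict(query))  # 1
```

In practice any lightweight model (logistic regression, k-NN, a small MLP) can be trained on the embedding vectors the same way, with no WSI handling in the loop.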
Setup
Quickstart
```bash
pip install git+https://github.com/Proscia/proscia-ai-tools.git
```
Owner
- Name: Proscia
- Login: Proscia
- Kind: organization
- Repositories: 5
- Profile: https://github.com/Proscia
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Baatz
given-names: Jeff
email: jeff.baatz@proscia.com
- family-names: Chivers
given-names: Corey
orcid: https://orcid.org/0000-0001-7290-2183
email: corey.chivers@proscia.com
- family-names: Ianni
given-names: Julianna
email: julianna@proscia.com
- family-names: Spurrier
given-names: Vaughn
email: vaughn.spurrier@proscia.com
- family-names: Toulgaridis
given-names: Kyriakos
email: kyriakos.toulgaridis@proscia.com
- family-names: Wahl
given-names: Casey
email: casey.wahl@proscia.com
title: "Proscia AI Tools"
version: 0.0.1
date-released: 2024-09-30
GitHub Events
Total
- Release event: 1
- Watch event: 6
- Delete event: 16
- Issue comment event: 2
- Push event: 24
- Pull request review event: 17
- Pull request event: 27
- Fork event: 3
- Create event: 18
Last Year
- Release event: 1
- Watch event: 6
- Delete event: 16
- Issue comment event: 2
- Push event: 24
- Pull request review event: 17
- Pull request event: 27
- Fork event: 3
- Create event: 18
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: about 20 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: about 20 hours
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- corey-chivers-proscia (15)
- vaughnproscia (1)