mosaic
A framework for training WSI-level CoCa models for computational pathology
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 13.1%)
Repository
A framework for training WSI-level CoCa models for computational pathology
Basic Info
- Host: GitHub
- Owner: SanderMoon
- License: other
- Language: Jupyter Notebook
- Default Branch: main
- Size: 1.01 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
MOSAIC
MOSAIC (Multimodal Optical Slide Analysis Including Comparisons) is a framework for training and running inference with vision-language models for computational pathology. The code was released as part of the paper "Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions" by Lucassen et al. (2025). The exact version of the code used in the paper is available under the 0.1.0 tag. Please note that this repository contains more content than described in the associated paper: in particular, additional model definitions based on HIPT, including 'attention' model configurations that can be used with features extracted using HIPT.
Table of Contents
- Installation
- Quick Start
- Pre-trained Models
- Training
- Documentation
- Citation
- License
- Contributing
- Contact
Installation
Prerequisites
- Python >= 3.10
- CUDA-compatible GPU (recommended for training, but CPU is supported)
Install from Source
- Clone the repository:
```bash
git clone https://github.com/SanderMoon/MOSAIC.git
cd MOSAIC
```
- Install the package:
```bash
pip install -e .
```
- Install additional dependencies:
```bash
# Install pycocoevalcap manually (not available on PyPI)
pip install git+https://github.com/salaniz/pycocoevalcap.git
```
Development Installation
For development, install with additional development dependencies:
```bash
pip install -e ".[dev]"
```
Quick Start
Slide-Level Inference with Text Generation
For pathology slide analysis with text generation, you can process multiple slide feature files:
```python
from mosaic.model_factory import create_model, load_pretrained
import torch
import os

# Model configuration
model_name = "coca_stage_2_perceiver_frozen_uni"
pretrained_path = "checkpoints/mosaic-perceiver-biogpt-lora.pt"  # Update filename for other models
device = "cpu"  # or "cuda" if available

# Create model and tokenizer
model, tokenizer, amp, input_dtype = create_model(
    model_name=model_name,
    pretrained=None,
    precision="bf16",
    device=device,
    init_tokenizer=True,
)

# Load pretrained weights
load_pretrained(model, pretrained=pretrained_path, device=device)


def load_features_from_pth(file_path: str) -> torch.Tensor:
    """
    Load features from a .pth file with nested dictionary structure.

    Returns:
        torch.Tensor: Features of shape [1, N, D] where N is number of patches
    """
    data = torch.load(file_path, map_location=device)
    features_list = []
    # Extract features from nested structure: {level: {patch_id: {'feature': tensor}}}
    for level_key in data.keys():
        level_data = data[level_key]
        for patch_id in sorted(level_data.keys()):
            if "feature" in level_data[patch_id]:
                feature = level_data[patch_id]["feature"]
                if not isinstance(feature, torch.Tensor):
                    feature = torch.tensor(feature)
                features_list.append(feature.to(device))
    if features_list:
        stacked_features = torch.stack(features_list, dim=0)
        return stacked_features.unsqueeze(0)
    else:
        raise ValueError(f"No features found in {file_path}")


# Generation parameters
generation_params = {
    "seq_len": 128,
    "max_seq_len": 128,
    "temperature": 1.0,
    "generation_type": "top_k",
    "top_k": 1,
    "min_seq_len": 5,
    "repetition_penalty": 1.1,
}

# Process slide features
slide_path = "data/nevus_case_147100.pth"  # Example slide
visual_features = load_features_from_pth(slide_path)

model.eval()
with torch.no_grad():
    # Generate pathology report
    generated_ids = model.generate(
        image=visual_features,
        sot_token_id=tokenizer.all_special_ids[0],  # Start of text token
        eos_token_id=tokenizer.all_special_ids[1],  # End of text token
        pad_token_id=tokenizer.all_special_ids[3],  # Padding token
        **generation_params,
    )

# Decode generated text
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated Report: {generated_text.strip()}")
```
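The nested layout that the loader above expects can be illustrated without PyTorch. The sketch below uses plain lists in place of tensors and hypothetical level/patch keys; it only demonstrates the traversal order (levels in dict order, patches in sorted patch-id order), not the real feature files.

```python
# Hypothetical example of the nested feature-file structure:
# {level: {patch_id: {'feature': tensor}}}, with lists standing in for tensors.
data = {
    "level_0": {
        "patch_002": {"feature": [0.2, 0.2, 0.2, 0.2]},
        "patch_001": {"feature": [0.1, 0.1, 0.1, 0.1]},
    }
}


def flatten_features(data):
    """Collect per-patch features, visiting patch ids in sorted order."""
    feats = []
    for level_key in data:
        level_data = data[level_key]
        for patch_id in sorted(level_data):
            entry = level_data[patch_id]
            if "feature" in entry:
                feats.append(entry["feature"])
    if not feats:
        raise ValueError("No features found")
    return feats


stacked = flatten_features(data)
# patch_001 comes first because patch ids are sorted before stacking
```

Sorting the patch ids gives a deterministic patch order regardless of dictionary insertion order, which matters if the downstream model is sensitive to sequence order.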
Pre-trained Models
Pre-trained MOSAIC models are available on Hugging Face at SaltySander/MOSAIC. Three model variants are available:
- mosaic-perceiver-biogpt-lora.pt - LoRA fine-tuned model
- mosaic-perceiver-biogpt-frozen.pt - Frozen backbone model
- mosaic-perceiver-biogpt-unfrozen.pt - Fully fine-tuned model
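The three variants differ in how the text backbone is adapted. The arithmetic behind the LoRA variant can be sketched in pure Python (hypothetical shapes, not the repository's actual PEFT configuration): instead of updating a full `d_out x d_in` weight `W`, LoRA trains two small matrices `A` (`r x d_in`) and `B` (`d_out x r`) and applies `W + (alpha / r) * B @ A`, so only `r * (d_in + d_out)` extra parameters are trained.

```python
def matmul(X, Y):
    """Naive matrix product for small list-of-lists matrices."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def lora_update(W, A, B, alpha, r):
    """Return the adapted weight W + (alpha / r) * B @ A."""
    delta = matmul(B, A)  # low-rank update, shape d_out x d_in
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]


# Toy example: rank-1 update of a 2x2 zero weight matrix.
W = [[0.0, 0.0], [0.0, 0.0]]
A = [[1.0, 0.0]]          # r=1, d_in=2
B = [[1.0], [2.0]]        # d_out=2, r=1
adapted = lora_update(W, A, B, alpha=1.0, r=1)
```

Here the frozen model would train neither `W` nor the adapters, and the unfrozen model would train `W` directly; LoRA sits in between, training only `A` and `B`.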
Downloading Model Checkpoints
The models require access permission. Please request access to the repository first, then set your Hugging Face token:
```bash
# Set your Hugging Face token
export HF_TOKEN=your_huggingface_token_here

# Install the huggingface_hub CLI
pip install "huggingface_hub[cli]"

# Download the LoRA model (change the filename for other models)
huggingface-cli download SaltySander/MOSAIC checkpoints/mosaic-perceiver-biogpt-lora.pt --local-dir . --local-dir-use-symlinks False
```
Or download manually by visiting the Hugging Face repository and downloading the desired checkpoint file to your local checkpoints/ directory.
Training
Basic Training Command
Here's an example training command with the main parameters:
```bash
python main.py \
--model=coca_stage_2_perceiver_lora_uni \
--pretrained=path/to/pretrained/model.pt \
--train-split=path/to/train_ids.txt \
--test-split=path/to/test_ids.txt \
--val-split=path/to/val_ids.txt \
--logs=path/to/logs \
--text-data-file=path/to/reports.json \
--root-dir=path/to/features \
--log-local \
--workers=8 \
--batch-size=4 \
--accum-freq=1 \
--epochs=30 \
--lr=1e-4 \
--beta1=0.9 \
--beta2=0.999 \
--eps=1.0e-8 \
--wd=1e-6 \
--lr-scheduler=cosine \
--warmup=600 \
--precision=pure_bf16 \
--image-features-cutoff=100000 \
--report-to=tensorboard \
--log-every-n-steps=1 \
--seed=42 \
--coca-caption-loss-weight=2.0 \
--coca-contrastive-loss-weight=1.0 \
--device=cuda \
--dist-backend=nccl \
--local-loss \
--gather-with-grad \
--save-frequency=1 \
--val-frequency=1 \
--caption-val-freq=1 \
--eval-grace-period=5 \
--caption-val-max-seq-len=256 \
--val-gen-top-k=1 \
--zsc-specimen-class-mapping path/to/class_mappings.json \
--zsc-class-prompt-mapping path/to/prompt_mappings.json \
--eval-metric-ci=0.95 \
--eval-metric-bootstraps=1 \
--test
```
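The command combines `--lr=1e-4`, `--warmup=600`, and `--lr-scheduler=cosine`. A minimal sketch of such a warmup-plus-cosine-decay schedule is below; the total step count is a placeholder, and the exact schedule implemented by `main.py` may differ in detail.

```python
import math


def lr_at_step(step, base_lr=1e-4, warmup=600, total_steps=30_000):
    """Linear warmup for `warmup` steps, then cosine decay toward zero."""
    if step < warmup:
        # Ramp linearly from base_lr / warmup up to base_lr.
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

For example, the learning rate reaches its peak of 1e-4 at step 599 (the end of warmup) and then decays smoothly to zero over the remaining steps.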
For detailed training parameters and configuration options, see the training documentation.
Documentation
This repository contains additional documentation to help you get started:
- Model Configurations: Learn how model configurations are structured and defined
- Dataset Structure: Understand the required data structure for training
- Training Guide: Detailed training parameters and options
Citation
If you use this software in your research, please cite our paper:
BibTeX
```bibtex
@misc{lucassen2025pathologyreportgenerationmultimodal,
  title={Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions},
  author={Ruben T. Lucassen and Sander P. J. Moonemans and Tijn van de Luijtgaarden and Gerben E. Breimer and Willeke A. M. Blokx and Mitko Veta},
  year={2025},
  eprint={2502.19293},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.19293},
}
```
APA Style
Lucassen, R. T., Moonemans, S. P. J., van de Luijtgaarden, T., Breimer, G. E., Blokx, W. A. M., & Veta, M. (2025). Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions. arXiv preprint arXiv:2502.19293.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contributing
We welcome contributions! Please feel free to submit issues and pull requests.
Contact
For questions or support, please contact:
- Sander Moonemans: sander.moonemans@gmail.com
This work was developed as part of research into computational pathology and vision-language models for medical image analysis.
Owner
- Name: Sander Moonemans
- Login: SanderMoon
- Kind: user
- Location: Nijmegen
- Repositories: 11
- Profile: https://github.com/SanderMoon
Hi, I'm Sander, a passionate engineer interested in data and AI. Currently pursuing an MSc in Data Science & AI at TU/e.
Citation (CITATION)
If you use this software in your research, please cite:
```bibtex
@misc{lucassen2025pathologyreportgenerationmultimodal,
  title={Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions},
  author={Ruben T. Lucassen and Sander P. J. Moonemans and Tijn van de Luijtgaarden and Gerben E. Breimer and Willeke A. M. Blokx and Mitko Veta},
  year={2025},
  eprint={2502.19293},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.19293},
}
```
GitHub Events
Total
- Watch event: 1
- Push event: 4
- Pull request event: 2
- Fork event: 1
- Create event: 2
Last Year
- Watch event: 1
- Push event: 4
- Pull request event: 2
- Fork event: 1
- Create event: 2
Dependencies
- actions/cache v3 composite
- actions/checkout v4 composite
- actions/setup-python v4 composite
- evaluate ==0.4.3
- expecttest ==0.2.1
- h5py ==3.12.1
- language-tool-python ==2.8.1
- lorem-text ==2.1
- matplotlib ==3.9.2
- nltk ==3.9.1
- numpy ==2.0.1
- pandas ==2.2.2
- peft ==0.12.0
- rouge-score ==0.1.2
- sacremoses ==0.1.1
- scikit-learn ==1.5.1
- tensorboard ==2.17.1
- torch ==2.4.1
- transformers ==4.44.1
- black ==24.8.0
- debugpy ==1.8.2
- evaluate ==0.4.3
- expecttest ==0.2.1
- h5py ==3.12.1
- language-tool-python ==2.8.1
- lorem-text ==2.1
- matplotlib ==3.9.2
- nltk ==3.9.1
- numpy ==2.0.1
- pandas ==2.2.2
- peft ==0.12.0
- pytest ==8.3.2
- pytest-cov ==5.0.0
- rouge-score ==0.1.2
- sacremoses ==0.1.1
- scikit-learn ==1.5.1
- tensorboard ==2.17.1
- torch ==2.4.1
- transformers ==4.44.1