FuseMedML

FuseMedML: a framework for accelerated discovery in machine learning based biomedicine - Published in JOSS (2023)

https://github.com/biomedsciai/fuse-med-ml

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 8 DOI reference(s) in README and JOSS metadata
✓
Academic publication links
Links to: joss.theoj.org
○
Committers with academic emails
○
Institutional organization owner
✓
JOSS paper metadata
Published in Journal of Open Source Software

Keywords

ai cmmd collaboration ct deep-learning fuse fuse-med-ml fusemedml hacktoberfest healthcare isic knight-challenge machine-learning medical medical-imaging multimodality python pytorch stoic vision

Scientific Fields

Economics Social Sciences - 85% confidence

Last synced: 6 months ago · JSON representation

Repository

A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)

Basic Info

Host: GitHub
Owner: BiomedSciAI
License: apache-2.0
Language: Python
Default Branch: master
Homepage:
Size: 104 MB

Statistics

Stars: 149
Watchers: 11
Forks: 37
Open Issues: 36
Releases: 21

Created over 4 years ago · Last pushed 7 months ago

Metadata Files

Readme Contributing License Code of conduct

Effective Code Reuse across ML projects!

A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)

FuseMedML is part of the PyTorch Ecosystem.

Motivation - "Oh, the pain!"

Analyzing many ML research projects, we discovered that:
* Project bring-up takes far too long, even when very similar projects were already done in the past by the same lab!
* Porting individual components across projects was painful, resulting in "reinventing the wheel" time after time.

How the magic happens

1. A simple yet super effective design concept

Data is kept in a nested (hierarchical) dictionary

This is a key aspect of FuseMedML ("fuse" for short). It is a key driver of flexibility and makes it easy to deal with multi-modality information.
```python
from fuse.utils import NDict

sample_ndict = NDict()
# `...` (Ellipsis) stands in for the actual data in each entry
sample_ndict['input.mri'] = ...
sample_ndict['input.ct_view_a'] = ...
sample_ndict['input.ct_view_b'] = ...
sample_ndict['groundtruth.disease_level_label'] = ...
```

This data can be a single sample, a minibatch, an entire epoch, or anything else that is desired. The "nested key" ("a.b.c.d.etc") is called a "path key", as it can be seen as a path inside the nested dictionary.
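To make the "path key" idea concrete, here is a minimal sketch using plain Python dictionaries. The helpers `set_path` and `get_path` are hypothetical names written for this illustration; they are not part of the fuse API, and `NDict` itself may behave differently in its details.

```python
from typing import Any, Dict


def set_path(nested: Dict[str, Any], path_key: str, value: Any) -> None:
    """Store `value` at a dot-separated path key, creating levels as needed."""
    *parents, leaf = path_key.split(".")
    node = nested
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def get_path(nested: Dict[str, Any], path_key: str) -> Any:
    """Follow a dot-separated path key down the nested dictionary."""
    node: Any = nested
    for part in path_key.split("."):
        node = node[part]
    return node


# Hypothetical sample: the values are placeholders, not real data.
sample: Dict[str, Any] = {}
set_path(sample, "input.mri", "mri-volume-placeholder")
set_path(sample, "groundtruth.disease_level_label", 2)

print(sample)
print(get_path(sample, "input.mri"))
```

Each dot in the path key descends one level, so the two calls above produce one `input` branch and one `groundtruth` branch in the same dictionary.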

Components are written so that their input and output keys can be configured; inputs are read from, and outputs written to, the nested dict. See a short introduction video (3 minutes) on how FuseMedML components work:

https://user-images.githubusercontent.com/7043815/177197158-d3ea0736-629e-4dcb-bd5e-666993fbcfa2.mp4

Examples - using FuseMedML-style components

A multi head model FuseMedML style component, allows easy reuse across projects:

```python
ModelMultiHead(
    conv_inputs=(('data.input.img', 1),),      # input to the backbone model
    backbone=BackboneResnet3D(in_channels=1),  # PyTorch nn Module
    heads=[  # list of heads - gives the option to support multi task / multi head approach
        Head3D(
            head_name='classification',
            mode="classification",
            conv_inputs=[("model.backbone_features", 512)],  # input to the classification head
        ),
    ],
)
```

Our default loss implementation creates an easy wrapper around a callable function, while being FuseMedML style:
```python
LossDefault(
    pred='model.logits.classification',  # input - model prediction scores
    target='data.label',                 # input - ground truth labels
    # callable - function that will get the prediction scores and labels
    # extracted from batch_dict and compute the loss
    callable=torch.nn.functional.cross_entropy,
)
```

An example metric that can be used:
```python
MetricAUCROC(
    pred='model.output',  # input - model prediction scores
    target='data.label',  # input - ground truth labels
)
```

Note that several components return their results directly rather than writing them into the nested dictionary. This is perfectly fine; to allow maximum flexibility, we do not require any usage of output path keys.
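As a rough illustration of a component that reads its inputs by key and returns its result directly, consider the sketch below. `KeyedLoss` and `mean_abs_error` are hypothetical names invented for this example, and a flat dictionary with dotted keys stands in for the nested dict; this is not the `LossDefault` implementation.

```python
from typing import Any, Callable, Dict, Sequence


class KeyedLoss:
    """Hypothetical stand-in for a FuseMedML-style loss component."""

    def __init__(self, pred: str, target: str,
                 callable: Callable[[Any, Any], float]) -> None:
        # `callable` mirrors the argument name used in the README examples.
        self.pred = pred
        self.target = target
        self.callable = callable

    def __call__(self, batch_dict: Dict[str, Any]) -> float:
        # Read both inputs from the configured keys and return the result
        # directly; nothing is written back into batch_dict.
        return self.callable(batch_dict[self.pred], batch_dict[self.target])


def mean_abs_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Toy loss function used only for this illustration."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


batch = {"model.output": [0.5, 1.0], "data.label": [1.0, 1.0]}
loss = KeyedLoss(pred="model.output", target="data.label", callable=mean_abs_error)
value = loss(batch)
print(value)
```

Because the keys are constructor arguments, the same wrapper can be pointed at a different prediction or label location without changing the loss function itself.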

Creating a custom FuseMedML component

Creating custom FuseMedML components is easy - in the following example we add a new data processing operator:

A data pipeline operator:
```python
from typing import List, Optional

import numpy as np
# module path as laid out in the fuse-med-ml repository; adjust if your version differs
from fuse.data.ops.op_base import OpBase
from fuse.utils import NDict


class OpPad(OpBase):
    def __call__(self, sample_dict: NDict, key_in: str, padding: List[int],
                 fill: int = 0, mode: str = 'constant', key_out: Optional[str] = None) -> NDict:
        # extract the element in the defined key location (for example 'input.xray_img')
        img = sample_dict[key_in]
        assert isinstance(img, np.ndarray), f'Expected np.ndarray but got {type(img)}'
        processed_img = np.pad(img, pad_width=padding, mode=mode, constant_values=fill)

        # store the result in the requested output key (or in key_in if no key_out is provided)
        if key_out is None:
            key_out = key_in
        sample_dict[key_out] = processed_img

        # return the modified nested dict
        return sample_dict
```

Since the key location isn't hardcoded, this module can be easily reused across different research projects with very different data sample structures. More code reuse - Hooray!
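The reuse claim can be illustrated with a small sketch in which one operator serves two differently structured samples. Everything here is hypothetical: `pad_op`, the two project dictionaries and their key names are made up for the example, and plain Python lists and a flat dictionary stand in for NumPy arrays and the nested dict.

```python
from typing import Any, Dict, List, Optional


def pad_op(sample: Dict[str, Any], key_in: str, padding: int, fill: int = 0,
           key_out: Optional[str] = None) -> Dict[str, Any]:
    """Pad a 1D list on both sides; the key locations are supplied by the caller."""
    values: List[int] = sample[key_in]
    padded = [fill] * padding + list(values) + [fill] * padding
    # Write to key_out when given, otherwise overwrite the input location.
    sample[key_out if key_out is not None else key_in] = padded
    return sample


# Two hypothetical projects with unrelated sample layouts.
xray_project = {"input.xray_img": [7, 8]}
ecg_project = {"signals.ecg.lead_1": [1, 2, 3]}

pad_op(xray_project, key_in="input.xray_img", padding=1)
pad_op(ecg_project, key_in="signals.ecg.lead_1", padding=2, fill=-1,
       key_out="signals.ecg.lead_1_padded")

print(xray_project)
print(ecg_project)
```

The operator body never mentions a concrete key, so moving it to a new project only requires different `key_in` and `key_out` arguments.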

FuseMedML-style components in general are any classes or functions that define which key paths will be written and which will be read. Arguments can be freely named, and you don't even have to write anything to the nested dict. Some FuseMedML components return a value directly - for example, loss functions.

2. "Batteries included" key components, built using the same design concept

fuse.data - A declarative super flexible data processing pipeline

Easy handling of complex multi-modality scenarios
Advanced caching, including periodic audits to automatically detect stale caches
Default ready-to-use Dataset and Sampler classes
See detailed introduction here
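One way to picture a declarative pipeline is as a list of (operator, keyword arguments) pairs that a small runner applies in order. The sketch below assumes that shape for illustration only; `op_scale`, `op_sum` and `run_pipeline` are hypothetical names, not the fuse.data API, and caching is left out entirely.

```python
from typing import Any, Callable, Dict, List, Tuple

Sample = Dict[str, Any]
Step = Tuple[Callable[..., Sample], Dict[str, Any]]


def op_scale(sample: Sample, key_in: str, key_out: str, factor: float) -> Sample:
    """Multiply every value found at key_in and store the result at key_out."""
    sample[key_out] = [v * factor for v in sample[key_in]]
    return sample


def op_sum(sample: Sample, key_in: str, key_out: str) -> Sample:
    """Sum the values found at key_in and store the total at key_out."""
    sample[key_out] = sum(sample[key_in])
    return sample


def run_pipeline(sample: Sample, steps: List[Step]) -> Sample:
    """Apply each (operator, kwargs) pair in order to the same sample."""
    for op, kwargs in steps:
        sample = op(sample, **kwargs)
    return sample


# The pipeline is data: reordering or re-keying steps needs no code changes.
pipeline: List[Step] = [
    (op_scale, dict(key_in="input.values", key_out="input.scaled", factor=2)),
    (op_sum, dict(key_in="input.scaled", key_out="features.total")),
]

result = run_pipeline({"input.values": [1, 2, 3]}, pipeline)
print(result)
```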

fuse.eval - a standalone library for evaluating ML models (not necessarily trained with FuseMedML)

The package includes a collection of off-the-shelf metrics and utilities such as statistical significance tests, calibration, thresholding, model comparison and more. See detailed introduction here
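To show what evaluating stored predictions independently of the training framework can look like, here is a minimal pure-Python sketch of one such metric, the area under the ROC curve, computed from its pairwise-ranking definition. The function `auc_roc`, the key names and the toy scores are assumptions made for this example; this is not the fuse.eval implementation.

```python
from typing import Any, Dict, Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical prediction results, addressed by the same style of keys
# that the metric examples in this README use.
results: Dict[str, Any] = {
    "model.output": [0.1, 0.4, 0.35, 0.8],
    "data.label": [0, 0, 1, 1],
}
auc = auc_roc(results["model.output"], results["data.label"])
print(auc)  # 3 of the 4 positive/negative pairs are ranked correctly -> 0.75
```

The quadratic pair loop is chosen for readability; a rank-based formulation gives the same value more efficiently on large inputs.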

fuse.dl - reusable dl (deep learning) model architecture components, loss functions, etc.

Supported DL libraries

Some components depend on pytorch. For example, fuse.data is oriented towards pytorch DataSet, DataLoader, DataSampler etc. fuse.dl makes heavy usage of pytorch models. Some components do not depend on any specific DL library - for example fuse.eval.

Broadly speaking, the supported DL libraries are:
* "Pure" pytorch
* pytorch-lightning

Before you ask - pytorch-lightning and FuseMedML play along very nicely and have in practice orthogonal and additive benefits :) See Simple FuseMedML + PytorchLightning Example for simple supervised learning cases, and this example for completely custom usage of pytorch-lightning and FuseMedML - useful for advanced scenarios such as Reinforcement Learning and generative models.

Domain Extensions

fuse-med-ml, the core library, is completely domain agnostic! Domain extensions are optionally installable packages that deal with specific (sub) domains. For example:

fuseimg which was battle-tested in many medical imaging related projects (different organs, imaging modalities, tasks, etc.)
fusedrug (to be released soon) which focuses on molecular biology and chemistry - prediction, generation and more

Domain extensions contain concrete implementations of components and component parts within the relevant domain, for example:
* Data pipeline operations - for example, a 3d affine transformation of a 3d image
* Evaluation metrics - for example, a custom metric evaluating docking of a potential drug with a protein target
* Loss functions - for example, a custom segmentation evaluation loss

The recommended directory structure mimics the fuse-med-ml core structure:

    your_package
        data   # everything related to datasets, samplers, data processing pipeline Ops, etc.
        dl     # everything related to deep learning architectures, optimizers, loss functions etc.
        eval   # evaluation metrics
        utils  # any utilities

You are highly encouraged to create additional domain extensions and/or contribute to the existing ones! There's no need to wait for any approval; you can create domain extensions in your own repos right away.

Note - in general, we find it helpful to follow the same directory structure shown above even in small and specific research projects that use FuseMedML for consistency and easy landing for newcomers into your project :)

Installation

FuseMedML is tested on Python >= 3.10 and PyTorch >= 2.0

We recommend using a Conda environment

Create a conda environment using the following commands (you can replace FUSEMEDML with your preferred environment name):
```bash
conda create -n FUSEMEDML python=3.10
conda activate FUSEMEDML
```

Next, install PyTorch and its corresponding cudatoolkit. See here for the exact command that will suit your local environment. For example:
```bash
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
```

Then follow Option 1 or Option 2 below inside the activated conda environment.

Option 1: Install from source (recommended)

The best way to install FuseMedML is to clone the repository and install it in editable mode using pip:
```bash
$ pip install -e .[all]
```
This mode installs all the currently publicly available domain extensions: fuseimg as of now; fusedrug will be added soon.

To install FuseMedML together with the included collection of examples, use:
```bash
$ pip install -e .[all,examples]
```

Option 2: Install from PyPI

```bash
$ pip install fuse-med-ml[all]
```
or with examples:
```bash
$ pip install fuse-med-ml[all,examples]
```
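After either option, a quick sanity check of the environment can be useful. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `fuse-med-ml` is taken from the pip commands above.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# The README states that FuseMedML is tested on Python >= 3.10.
python_ok = sys.version_info >= (3, 10)
fuse_version = installed_version("fuse-med-ml")

print(f"Python >= 3.10: {python_ok}")
print(f"fuse-med-ml version: {fuse_version or 'not installed'}")
```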

Examples

Easy access "Hello World" colab notebook
Classification
- MNIST - a simple example, including training, inference and evaluation over MNIST dataset
- STOIC - severe COVID-19 classifier baseline given a Computed-Tomography (CT), age group and gender. Challenge description
- KNIGHT Challenge - preoperative prediction of risk class for patients with renal masses identified in clinical Computed Tomography (CT) imaging of the kidneys. Including data pre-processing, baseline implementation and evaluation pipeline for the challenge.
- Multimodality tutorial - demonstration of two popular simple methods integrating imaging and clinical data (tabular) using FuseMedML
- Skin Lesion - skin lesion classification, including training, inference and evaluation over the public dataset introduced in the ISIC challenge
- Breast Cancer Lesion Classification - classification of tumor lesions (benign, malignant) in breast mammography over the public dataset introduced in The Chinese Mammography Database (CMMD)
- Mortality prediction for ICU patients - Example of EHR transformer applied to the data of Intensive Care Units patients for in-hospital mortality prediction. The dataset is from PhysioNet Computing in Cardiology Challenge (2012)
Pre-training
- Medical Imaging Pre-training and Downstream Task Validation - pre-training a model on 3D MRI medical imaging and then using it for classification and segmentation downstream tasks.

Walkthrough template

Walkthrough Template - includes several TODO notes, marking the minimal scope of code required to get your pipeline up and running. The template also includes useful explanations and tips.

Community support - join the discussion!

Slack workspace at fusemedml.slack.com for informal communication - click here to join
Github Discussions

Citation

If you use FuseMedML in a scientific context, please consider citing our JOSS paper:
```bibtex
@article{Golts2023,
  doi = {10.21105/joss.04943},
  url = {https://doi.org/10.21105/joss.04943},
  year = {2023},
  publisher = {The Open Journal},
  volume = {8},
  number = {81},
  pages = {4943},
  author = {Alex Golts and Moshe Raboh and Yoel Shoshan and Sagi Polaczek and Simona Rabinovici-Cohen and Efrat Hexter},
  title = {FuseMedML: a framework for accelerated discovery in machine learning based biomedicine},
  journal = {Journal of Open Source Software}
}
```

Owner

Name: BiomedSciAI
Login: BiomedSciAI
Kind: organization

Repositories: 6
Profile: https://github.com/BiomedSciAI

JOSS Publication

FuseMedML: a framework for accelerated discovery in machine learning based biomedicine

Published

January 22, 2023

DOI

10.21105/joss.04943

Volume 8, Issue 81, Page 4943

Authors

Alex Golts
IBM Research - Haifa, Israel

Moshe Raboh
IBM Research - Haifa, Israel

Yoel Shoshan
IBM Research - Haifa, Israel

Sagi Polaczek
IBM Research - Haifa, Israel

Simona Rabinovici-Cohen
IBM Research - Haifa, Israel

Efrat Hexter
IBM Research - Haifa, Israel

Editor

Jacob Schreiber

GitHub Events

Total

Create event: 29
Issues event: 2
Release event: 1
Watch event: 9
Delete event: 10
Member event: 5
Issue comment event: 9
Push event: 88
Pull request review event: 71
Pull request review comment event: 37
Pull request event: 50
Fork event: 4

Last Year

Create event: 29
Issues event: 2
Release event: 1
Watch event: 9
Delete event: 10
Member event: 5
Issue comment event: 9
Push event: 88
Pull request review event: 71
Pull request review comment event: 37
Pull request event: 50
Fork event: 4

Committers

Last synced: 7 months ago

All Time

Total Commits: 479
Total Committers: 38
Avg Commits per committer: 12.605
Development Distribution Score (DDS): 0.745
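
The all-time figures above can be reproduced with a few lines of arithmetic. This sketch assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; the page itself does not state the formula, so treat that definition as an assumption. The top committer's count (122) is taken from the first row of the Top Committers table.

```python
total_commits = 479
total_committers = 38
top_committer_commits = 122  # first row of the Top Committers table

# Average commits per committer
avg_commits = total_commits / total_committers

# Assumed definition: DDS = 1 - (top committer's commits / total commits)
dds = 1 - top_committer_commits / total_commits

print(round(avg_commits, 3))  # 12.605
print(round(dds, 3))          # 0.745
```

Under this definition, a score near 0 means a single committer wrote almost everything, while a score near 1 means the work is spread across many committers.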

Past Year

Commits: 47
Committers: 15
Avg Commits per committer: 3.133
Development Distribution Score (DDS): 0.702

Top Committers

Name	Email	Commits
Moshiko Raboh	8****h	122
Alex Golts	a**s@i**m	84
moshiko	m**h@i**m	46
Sagi Polaczek	5****k	40
itaijj	i**2@g**m	24
YoelShoshan	y**n@g**m	23
Liam Hazan	l**n@i**m	18
itaiguez	i**z@i**m	12
Michal Ozery-Flato	1****o	12
#Sagi Polaczek#Sagi.Polaczek@ibm.com	s**p@l**m	11
sagi	s**k@i**m	11
Daniel Shats	d**1@g**m	11
Avihu Dekel	a**2@g**m	8
Ben Shapira	6****7	5
Sivan Ravid	1****s	5
ttlusty	t**y@i**m	4
IdoAmosIBM	1****M	4
simona	s**a@i**m	3
Antonio Foncubierta Rodríguez	a****a	3
Ella Barkan	e**a@i**m	3
Vadim Ratner	9****c	3
shakedpe	4****e	3
vadimra	v**a@i**m	2
Amir Egozi	e**5@g**m	2
Panos Vagenas	3****s	2
Yishai Levine	l**0@g**m	2
agiova	3****a	2
imgbot[bot]	3****]	2
mayhamri	9****i	2
simona-rc	8****c	2
and 8 more...

Committer Domains (Top 20 + Academic)

ibm.com: 6
il.ibm.com: 3
github.ninio.org: 1
lsf-gpu3.haifa.ibm.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 31
Total pull requests: 184
Average time to close issues: 3 months
Average time to close pull requests: 13 days
Total issue authors: 12
Total pull request authors: 25
Average comments per issue: 1.32
Average comments per pull request: 0.46
Merged pull requests: 160
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 59
Average time to close issues: N/A
Average time to close pull requests: 2 days
Issue authors: 1
Pull request authors: 14
Average comments per issue: 0.0
Average comments per pull request: 0.25
Merged pull requests: 53
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

SagiPolaczek (11)
mosheraboh (6)
anupamajha1 (3)
avihu111 (2)
egozi (2)
alex-golts (2)
simona-rc (1)
ellabarkan (1)
sivanravidos (1)
itaijj (1)
Betty-J (1)
smartdanny (1)

Pull Request Authors

mosheraboh (49)
SagiPolaczek (40)
YoelShoshan (23)
alex-golts (15)
michalozeryflato (13)
smartdanny (9)
bensha6757 (9)
sivanravidos (9)
IdoAmosIBM (7)
liamhazan (7)
ellabarkan (6)
itaijj (6)
floccinauc (5)
mayhamri (4)
avihu111 (3)

Top Labels

Issue Labels

bug (13) enhancement (11) good first issue (6) question (2) up-for-grabs (1) wontfix (1) DEV (1) help wanted (1) documentation (1)

Pull Request Labels

enhancement (3) example (1)

Packages

Total packages: 1
Total downloads:
- pypi 396 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 23
Total maintainers: 1

pypi.org: fuse-med-ml

A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)

Homepage: https://github.com/BiomedSciAI/fuse-med-ml/
Documentation: https://fuse-med-ml.readthedocs.io/
License: Apache License 2.0
Latest release: 0.4.0
published over 1 year ago

Versions: 23
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 396 Last month

Rankings

Stargazers count: 6.6%

Forks count: 6.8%

Dependent packages count: 10.1%

Average: 11.7%

Downloads: 13.4%

Dependent repos count: 21.6%

Maintainers (1)

mosheraboh

Last synced: 6 months ago

Dependencies

fuse/requirements.txt pypi

deepdiff *
h5py *
hdf5plugin *
hydra-core *
ipykernel *
ipython *
matplotlib >=3.3.3
nibabel *
numpy >=1.18.5
omegaconf *
pandas >=1.2
paramiko *
psutil *
pycocotools >=2.0.1
pytorch_lightning *
scikit-learn >=0.23.2
scipy >=1.5.4
statsmodels *
tables *
tensorboard *
termcolor >=1.1.0
torch >=1.5.0
torchvision >=0.8.1
tqdm >=4.52.0
wget *
xmlrunner *

fuse/requirements_dev.txt pypi

black ==22.3.0
flake8 *
mypy ==0.950
testbook *

fuseimg/requirements.txt pypi

SimpleITK >=1.2.0
medpy *
opencv-python <=4.3.0.36
pydicom *
scikit-image >=0.17.2
torchvision >=0.8.1

.github/workflows/lint.yaml actions

actions/checkout v2 composite
actions/setup-python v2 composite
psf/black stable composite
py-actions/flake8 v2 composite

.github/workflows/python-publish.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite
pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite

fuse_examples/requirements.txt pypi

medpy *
monai *
pydicom *
scikit-image *
transformers *

setup.py pypi
