https://github.com/bowang-lab/ecg-fm

An electrocardiogram analysis foundation model.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.8%) to scientific vocabulary

Keywords

electrocardiogram foundation-models healthcare machine-learning mimic-iv-ecg physionet2021 transformer
Last synced: 6 months ago

Repository

An electrocardiogram analysis foundation model.

Basic Info
  • Host: GitHub
  • Owner: bowang-lab
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 20.8 MB
Statistics
  • Stars: 147
  • Watchers: 5
  • Forks: 10
  • Open Issues: 2
  • Releases: 0
Topics
electrocardiogram foundation-models healthcare machine-learning mimic-iv-ecg physionet2021 transformer
Created almost 2 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md




ECG-FM is a foundation model for electrocardiogram (ECG) analysis. Committed to open-source practices, ECG-FM was developed in collaboration with the fairseq_signals framework, which implements a collection of deep learning methods for ECG analysis. This repository serves as a landing page and will host project-specific scripts as this work progresses.

Getting Started

🛠️ Installation

Clone fairseq_signals and refer to the requirements and installation section in the top-level README.
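A typical setup, sketched under the assumption of a standard editable pip install (the fairseq_signals README remains the authoritative source for exact requirements):

```
# Clone the training framework that ECG-FM builds on.
git clone https://github.com/Jwoo5/fairseq-signals.git
cd fairseq-signals

# Install in editable mode; check the repository README for the
# required PyTorch version and CUDA toolchain before running this.
pip install --editable ./
```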

🚀 Quick Start

Please refer to our inference quickstart tutorial, which outlines inference and visualization pipelines.
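For orientation before diving into the tutorial: ECG-FM operates on standard 12-lead ECG input, and per the paper it was pretrained on 5-second segments sampled at 500 Hz (2,500 samples per lead). A minimal sketch of shaping a raw recording into that form, assuming a NumPy array of shape (leads, samples) — the function name here is illustrative, not the fairseq_signals API:

```python
import numpy as np

SAMPLE_RATE = 500      # Hz (assumed model sample rate)
SEGMENT_SECONDS = 5    # segment length used during pretraining
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS  # 2500

def segment_ecg(recording: np.ndarray) -> np.ndarray:
    """Split a (12, n_samples) recording into non-overlapping
    (n_segments, 12, 2500) segments, dropping any trailing remainder."""
    n_leads, n_samples = recording.shape
    n_segments = n_samples // SEGMENT_SAMPLES
    trimmed = recording[:, :n_segments * SEGMENT_SAMPLES]
    return trimmed.reshape(n_leads, n_segments, SEGMENT_SAMPLES).transpose(1, 0, 2)

# Example: a 12-lead, 10-second synthetic recording yields 2 segments.
dummy = np.random.randn(12, 10 * SAMPLE_RATE)
segments = segment_ecg(dummy)
print(segments.shape)  # (2, 12, 2500)
```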

📦 Model

Model checkpoints have been made publicly available for download on HuggingFace. Specifically, there are two:

mimic_iv_ecg_physionet_pretrained.pt - Pretrained on MIMIC-IV-ECG v1.0 and PhysioNet 2021 v1.0.3.

mimic_iv_ecg_finetuned.pt - Finetuned from mimic_iv_ecg_physionet_pretrained.pt on MIMIC-IV-ECG v1.0 dataset.

ECG-FM has 90.9 million parameters, adopts the wav2vec 2.0 architecture, and was pretrained using the W2V+CMSC+RLM (WCR) method. Further details are available in our paper.

🫀 Data Preparation

We implemented a flexible, end-to-end, multi-source data preprocessing pipeline. Please refer to it here.
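The pipeline ultimately produces fairseq-style manifest files that the training commands below consume via `task.data`. As a rough illustration only — the exact format is defined by the linked pipeline, and the TSV layout here (data root on the first line, then tab-separated relative path and sample count) is assumed from fairseq conventions:

```python
import csv
import tempfile
from pathlib import Path

def write_manifest(root: str, records, out_path: str) -> None:
    """Write a fairseq-style TSV manifest: first line is the data root,
    then one '<relative_path>\t<num_samples>' row per record."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow([root])
        for rel_path, n_samples in records:
            writer.writerow([rel_path, n_samples])

# Hypothetical records for illustration.
tmp = Path(tempfile.mkdtemp())
manifest = tmp / "train.tsv"
write_manifest("/data/ecg", [("rec001.mat", 2500), ("rec002.mat", 2500)], str(manifest))
lines = manifest.read_text().splitlines()
```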

⚙️ Command-line Usage

The command-line inference tutorial describes result extraction and post-processing. There is also a script for performing linear probing experiments.

All training is performed through the fairseq_signals framework. To maximize reproducibility, we have provided configuration files.

Pretraining

Our pretraining uses the mimic_iv_ecg_physionet_pretrained.yaml config (modify w2v_cmsc_rlm.yaml as desired).

After modifying the relevant configuration file as desired, pretraining is performed using Hydra's command-line interface. This command highlights some popular config overrides:

```
FAIRSEQ_SIGNALS_ROOT=""
MANIFEST_DIR="/cmsc"
OUTPUT_DIR=""

fairseq-hydra-train \
    task.data=$MANIFEST_DIR \
    dataset.valid_subset=valid \
    dataset.batch_size=64 \
    dataset.num_workers=10 \
    dataset.disable_validation=false \
    distributed_training.distributed_world_size=4 \
    optimization.update_freq=[2] \
    checkpoint.save_dir=$OUTPUT_DIR \
    checkpoint.save_interval=10 \
    checkpoint.keep_last_epochs=0 \
    common.log_format=csv \
    --config-dir $FAIRSEQ_SIGNALS_ROOT/examples/w2v_cmsc/config/pretraining \
    --config-name w2v_cmsc_rlm
```

Notes:

- With CMSC pretraining, the batch size refers to pairs of adjacent segments. Therefore, the effective pretraining batch size is 64 pairs * 2 segments per pair * 4 GPUs * 2 gradient accumulations (update_freq) = 1024 segments.
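The arithmetic in that note can be sketched directly:

```python
def effective_batch_size(pairs_per_gpu: int, segments_per_pair: int,
                         world_size: int, update_freq: int) -> int:
    """Effective number of segments contributing to one optimizer step."""
    return pairs_per_gpu * segments_per_pair * world_size * update_freq

# Values from the pretraining command: batch size of 64 CMSC pairs,
# 2 segments per pair, 4 GPUs, update_freq=[2].
print(effective_batch_size(64, 2, 4, 2))  # 1024
```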

Finetuning

Our finetuning uses the mimic_iv_ecg_finetuned.yaml config (modify diagnosis.yaml as desired).

This command highlights some popular config overrides:

```
FAIRSEQ_SIGNALS_ROOT=""
PRETRAINED_MODEL=""
MANIFEST_DIR=""
LABEL_DIR=""
OUTPUT_DIR=""
NUM_LABELS=$(($(wc -l < "$LABEL_DIR/label_def.csv") - 1))
POS_WEIGHT=$(cat $LABEL_DIR/pos_weight.txt)

fairseq-hydra-train \
    task.data=$MANIFEST_DIR \
    model.model_path=$PRETRAINED_MODEL \
    model.num_labels=$NUM_LABELS \
    optimization.lr=[1e-06] \
    optimization.max_epoch=140 \
    dataset.batch_size=256 \
    dataset.num_workers=5 \
    dataset.disable_validation=true \
    distributed_training.distributed_world_size=1 \
    distributed_training.find_unused_parameters=True \
    checkpoint.save_dir=$OUTPUT_DIR \
    checkpoint.save_interval=1 \
    checkpoint.keep_last_epochs=0 \
    common.log_format=csv \
    +task.label_file=$LABEL_DIR/y.npy \
    +criterion.pos_weight=$POS_WEIGHT \
    --config-dir $FAIRSEQ_SIGNALS_ROOT/examples/w2v_cmsc/config/finetuning/ecg_transformer \
    --config-name diagnosis
```
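The shell arithmetic above derives the label count by subtracting the header row of the label definition CSV. An equivalent Python check, using a hypothetical three-label file purely for illustration:

```python
import csv
import os
import tempfile

def count_labels(label_def_path: str) -> int:
    """Number of labels = rows in the label definition CSV minus the
    header, mirroring the shell: $(($(wc -l < file) - 1))."""
    with open(label_def_path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1

# Hypothetical label definition file with a header and three labels.
tmp = os.path.join(tempfile.mkdtemp(), "labels.csv")
with open(tmp, "w", newline="") as f:
    csv.writer(f).writerows([["name"], ["AFIB"], ["SB"], ["LBBB"]])
print(count_labels(tmp))  # 3
```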

🏷️ Labeler

Functionality for our comprehensive free-text pattern matching and knowledge graph-based label manipulation is showcased in the labeler.ipynb notebook.

💬 Questions

Inquiries may be directed to kaden.mckeen@mail.utoronto.ca.

Owner

  • Name: WangLab @ U of T
  • Login: bowang-lab
  • Kind: organization
  • Location: 190 Elizabeth St, Toronto, ON M5G 2C4 Canada

BoWang's Lab at University of Toronto

GitHub Events

Total
  • Issues event: 16
  • Watch event: 97
  • Issue comment event: 15
  • Push event: 4
  • Fork event: 11
Last Year
  • Issues event: 16
  • Watch event: 97
  • Issue comment event: 15
  • Push event: 4
  • Fork event: 11

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 15
  • Total Committers: 1
  • Avg Commits per committer: 15.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 15
  • Committers: 1
  • Avg Commits per committer: 15.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Kaden McKeen m****l@h****m 15

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 11
  • Total pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Total issue authors: 10
  • Total pull request authors: 0
  • Average comments per issue: 1.18
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 10
  • Pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Issue authors: 9
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • FUminlee (2)
  • jhargun (1)
  • jihyeheo (1)
  • arnovonkietzell (1)
  • ndhuynh02 (1)
  • Anmisdwx (1)
  • DeweiHu (1)
  • Nidhogg-Mzy (1)
  • turkalpmd (1)
  • mystic-technolab (1)