https://github.com/bowang-lab/ecg-fm
An electrocardiogram analysis foundation model.
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 15.8%, to scientific vocabulary)
Repository
An electrocardiogram analysis foundation model.
Basic Info
Statistics
- Stars: 147
- Watchers: 5
- Forks: 10
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
ECG-FM is a foundation model for electrocardiogram (ECG) analysis. Committed to open-source practices, ECG-FM was developed alongside the fairseq_signals framework, which implements a collection of deep learning methods for ECG analysis. This repository serves as a landing page and will host project-specific scripts as this work progresses.
Getting Started
🛠️ Installation
Clone fairseq_signals and refer to the requirements and installation section in the top-level README.
🚀 Quick Start
Please refer to our inference quickstart tutorial, which outlines inference and visualization pipelines.
📦 Model
Model checkpoints have been made publicly available for download on HuggingFace. Specifically, there are:
- mimic_iv_ecg_physionet_pretrained.pt: pretrained on MIMIC-IV-ECG v1.0 and PhysioNet 2021 v1.0.3.
- mimic_iv_ecg_finetuned.pt: finetuned from mimic_iv_ecg_physionet_pretrained.pt on the MIMIC-IV-ECG v1.0 dataset.
ECG-FM has 90.9 million parameters, adopts the wav2vec 2.0 architecture, and was pretrained using the W2V+CMSC+RLM (WCR) method. Further details are available in our paper.
🫀 Data Preparation
We implemented a flexible, end-to-end, multi-source data preprocessing pipeline. Please refer to it here.
⚙️ Command-line Usage
The command-line inference tutorial describes result extraction and post-processing. There is also a script for performing linear probing experiments.
All training is performed through the fairseq_signals framework. To maximize reproducibility, we have provided configuration files.
Pretraining
Our pretraining uses the mimic_iv_ecg_physionet_pretrained.yaml config (the w2v_cmsc_rlm.yaml base config can be modified as desired).
After modifying the relevant configuration file, pretraining is performed through Hydra's command-line interface. The following command highlights some popular config overrides:
```
FAIRSEQ_SIGNALS_ROOT=""  # path to the cloned fairseq_signals repository

fairseq-hydra-train \
    task.data=$MANIFEST_DIR \
    dataset.valid_subset=valid \
    dataset.batch_size=64 \
    dataset.num_workers=10 \
    dataset.disable_validation=false \
    distributed_training.distributed_world_size=4 \
    optimization.update_freq=[2] \
    checkpoint.save_dir=$OUTPUT_DIR \
    checkpoint.save_interval=10 \
    checkpoint.keep_last_epochs=0 \
    common.log_format=csv \
    --config-dir $FAIRSEQ_SIGNALS_ROOT/examples/w2v_cmsc/config/pretraining \
    --config-name w2v_cmsc_rlm
```
Notes:
- With CMSC pretraining, the batch size refers to pairs of adjacent segments. Therefore, the effective pretraining batch size is 64 pairs * 2 segments per pair * 4 GPUs * 2 gradient accumulations (update_freq) = 1024 segments.
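The effective-batch-size arithmetic above can be sketched as follows (a minimal illustration; the variable names are ours, not from the repository):

```python
# Effective pretraining batch size under CMSC, where each batch element is a
# pair of adjacent ECG segments. Values mirror the pretraining command above.
batch_size_pairs = 64    # dataset.batch_size (pairs per GPU)
segments_per_pair = 2    # CMSC draws pairs of adjacent segments
num_gpus = 4             # distributed_training.distributed_world_size
update_freq = 2          # optimization.update_freq (gradient accumulation)

effective_batch = batch_size_pairs * segments_per_pair * num_gpus * update_freq
print(effective_batch)  # 1024
```

Scaling any one of these factors (e.g. halving the per-GPU batch while doubling update_freq) keeps the effective batch size constant, which is useful when fitting the model on smaller GPUs.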
Finetuning
Our finetuning uses the mimic_iv_ecg_finetuned.yaml config (the diagnosis.yaml base config can be modified as desired).
This command highlights some popular config overrides:
```
FAIRSEQ_SIGNALS_ROOT=""  # path to the cloned fairseq_signals repository

fairseq-hydra-train \
    task.data=$MANIFEST_DIR \
    model.model_path=$PRETRAINED_MODEL \
    model.num_labels=$NUM_LABELS \
    optimization.lr=[1e-06] \
    optimization.max_epoch=140 \
    dataset.batch_size=256 \
    dataset.num_workers=5 \
    dataset.disable_validation=true \
    distributed_training.distributed_world_size=1 \
    distributed_training.find_unused_parameters=True \
    checkpoint.save_dir=$OUTPUT_DIR \
    checkpoint.save_interval=1 \
    checkpoint.keep_last_epochs=0 \
    common.log_format=csv \
    +task.label_file=$LABEL_DIR/y.npy \
    +criterion.pos_weight=$POS_WEIGHT \
    --config-dir $FAIRSEQ_SIGNALS_ROOT/examples/w2v_cmsc/config/finetuning/ecg_transformer \
    --config-name diagnosis
```
🏷️ Labeler
Functionality for our comprehensive free-text pattern matching and knowledge graph-based label manipulation is showcased in the labeler.ipynb notebook.
💬 Questions
Inquiries may be directed to kaden.mckeen@mail.utoronto.ca.
Owner
- Name: WangLab @ U of T
- Login: bowang-lab
- Kind: organization
- Location: 190 Elizabeth St, Toronto, ON M5G 2C4 Canada
- Website: https://wanglab.ml
- Repositories: 11
- Profile: https://github.com/bowang-lab
BoWang's Lab at University of Toronto
GitHub Events
Total
- Issues event: 16
- Watch event: 97
- Issue comment event: 15
- Push event: 4
- Fork event: 11
Last Year
- Issues event: 16
- Watch event: 97
- Issue comment event: 15
- Push event: 4
- Fork event: 11
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Kaden McKeen | m****l@h****m | 15 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 11
- Total pull requests: 0
- Average time to close issues: about 1 month
- Average time to close pull requests: N/A
- Total issue authors: 10
- Total pull request authors: 0
- Average comments per issue: 1.18
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 10
- Pull requests: 0
- Average time to close issues: about 1 month
- Average time to close pull requests: N/A
- Issue authors: 9
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- FUminlee (2)
- jhargun (1)
- jihyeheo (1)
- arnovonkietzell (1)
- ndhuynh02 (1)
- Anmisdwx (1)
- DeweiHu (1)
- Nidhogg-Mzy (1)
- turkalpmd (1)
- mystic-technolab (1)