https://github.com/aiot-mlsys-lab/meit

[ACL 2025 Findings🔥] Official implementation of "Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation"

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ○
    DOI references
  • ○
    Academic publication links
  • ○
    Academic email domains
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (7.3%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: AIoT-MLSys-Lab
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.65 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed 12 months ago
Metadata Files
Readme

README.md

MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation (ACL 2025 Findings🔥)

Easy steps for efficient implementations

Step 1: download data and preprocess data

  • 1. Download the data from Google Drive to your Linux machine. Google Drive link: #############.

  • 2. Preprocess the data: open the 'config.yaml' file and set the paths to the downloaded data:

```
# if mimic:
dataset:
  datasetname: 'mimic'  ## this is for mimic dataset 21k
  ecgpath: 'xxxx'  # add your image file path here
  textpath: 'xxxtrain.csv'

# if ptbxl:
dataset:
  datasetname: 'ptbxl'  ## this is for PTB-XL dataset 21k
  ecgpath: '/fs/scratch/PAS2473/ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.3/'  # add your image file path here
  textpath: '/users/PAS2473/brucewan666/ECG/ECG/instructdata/RGenptbxl_train.csv'
```
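Before running the preprocessing, it can help to sanity-check the `dataset` block. The sketch below is a hypothetical helper, not part of the repository: it assumes the config has already been loaded into a plain dictionary (for example with a YAML parser), that the key names are exactly those shown in the excerpt above, and that `ecgpath` is a directory while `textpath` is a file.

```python
import os

# Hypothetical helper: key names follow the config excerpt above; the
# validation rules are assumptions, not the repository's own code.
REQUIRED_KEYS = ("datasetname", "ecgpath", "textpath")
KNOWN_DATASETS = ("mimic", "ptbxl")


def check_dataset_config(config, check_paths=True):
    """Return a list of human-readable problems found in the `dataset` block."""
    dataset = config.get("dataset")
    if not isinstance(dataset, dict):
        return ["missing 'dataset' section"]

    problems = []
    for key in REQUIRED_KEYS:
        if not dataset.get(key):
            problems.append(f"missing or empty key: {key}")

    name = dataset.get("datasetname")
    if name and name not in KNOWN_DATASETS:
        problems.append(f"unknown datasetname: {name}")

    if check_paths:
        # Assumption: ecgpath points at a directory, textpath at a CSV file.
        ecg_path = dataset.get("ecgpath")
        if ecg_path and not os.path.isdir(ecg_path):
            problems.append(f"ecgpath is not a directory: {ecg_path}")
        text_path = dataset.get("textpath")
        if text_path and not os.path.isfile(text_path):
            problems.append(f"textpath is not a file: {text_path}")
    return problems
```

An empty list means nothing suspicious was found. Pass `check_paths=False` to validate only the keys, for instance on a machine where the data is not mounted.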

  • 3. Run the preprocessing:

```
# Open preprocess_ecg.py and set your own paths (an example for ptbxl):

buildinstructdataset(ecgname='ptbxl', savepath='/users/PAS2473/brucewan666/ECG/ECG/instructdata/ptbxlecg_train.jsonl')  # mimic
```
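The preprocessing step writes a `.jsonl` file, which by convention holds one JSON object per line. The record schema is not documented here, so the following is only a minimal inspection sketch that makes no assumption about field names. `summarize_jsonl` is a hypothetical helper that counts records and reports which fields appear.

```python
import json
from collections import Counter


def summarize_jsonl(lines, preview=1):
    """Count records and field names in JSON Lines input and keep a few samples.

    `lines` is any iterable of strings, for example an open file handle.
    """
    field_counts = Counter()
    samples = []
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        total += 1
        if isinstance(record, dict):
            field_counts.update(record.keys())
        if len(samples) < preview:
            samples.append(record)
    return {"records": total, "fields": dict(field_counts), "samples": samples}
```

A typical use would be `with open(savepath, encoding="utf-8") as handle: print(summarize_jsonl(handle))`, where `savepath` is the path passed to the preprocessing call.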

  • 4. Set up the environment:

pip install -r requirements.txt

  • 5. Run ECG instruction tuning and inference with a single ECG instruction. An example using the MIMIC-ECG data:

```
export CUDA_VISIBLE_DEVICES=0

MODELSIZE=7B
NUMGPUS=1
BATCHSIZEPERGPU=16
TOTALBATCHSIZE=64 # 144 50277
GRADIENTACCSTEPS=$(($TOTALBATCHSIZE/$NUMGPUS/$BATCHSIZEPERGPU))
echo "Training llama model ${MODELSIZE} using $NUMGPUS GPUs, $BATCHSIZEPERGPU batch size per GPU, $GRADIENTACCSTEPS gradient accumulation steps"
# --use_deepspeed \
# --deepspeed_config_file /home/wan.512/ECGLLMs/open-instruct/dsconfigs/stage3nooffloadingaccelerate.conf \

# Lora training
accelerate launch --main_process_port 31225 \
    --mixed_precision bf16 \
    --num_machines 1 \
    --num_processes $NUMGPUS \
    /users/PAS2473/brucewan666/ECG/ECG/finetuneecgllmwithloramimic.py \
    --modelnameorpath meta-llama/Llama-2-7b-hf \
    --uselora \
    --lorarank 64 \
    --loraalpha 128 \
    --loradropout 0.1 \
    --tokenizername meta-llama/Llama-2-7b-hf \
    --useslowtokenizer \
    --trainfile /users/PAS2473/brucewan666/ECG/ECG/instructdata/mimicecg.jsonl \
    --maxseqlength 128 \
    --preprocessingnumworkers 16 \
    --checkpointingsteps epoch \
    --perdevicetrainbatchsize $BATCHSIZEPERGPU \
    --gradientaccumulationsteps $GRADIENTACCSTEPS \
    --learningrate 2e-5 \
    --lrschedulertype linear \
    --warmupratio 0.03 \
    --weightdecay 0. \
    --numtrainepochs 5 \
    --outputdir /fs/scratch/PAS2473/zhongweisaveckpt/gpt2largelorackpt \
    --withtracking \
    --reportto tensorboard \
    --useecgllm \
    --devratio 0.1 \
    --valtestratio 0.1 \
    --loggingsteps 100 \
    --evalstep 3200 \
    --teststep 4000 \
    --llmtype llama2 \
    --cachedir /fs/scratch/PAS2473/zhongweimodels
```
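The batch-size arithmetic in the script above can be double-checked in Python before launching a long run. This is a rough sketch with hypothetical helper names. Unlike the shell integer division, which silently truncates, it raises an error when the total batch size is not evenly divisible. The LoRA scaling factor assumes the common `alpha / rank` convention (the default in PEFT); whether the training script follows it is an assumption.

```python
def gradient_accumulation_steps(total_batch_size, num_gpus, batch_size_per_gpu):
    """Steps to accumulate so that num_gpus * batch_size_per_gpu * steps == total."""
    per_step = num_gpus * batch_size_per_gpu
    if per_step <= 0:
        raise ValueError("num_gpus and batch_size_per_gpu must be positive")
    if total_batch_size % per_step != 0:
        raise ValueError(
            f"total batch size {total_batch_size} is not a multiple of {per_step}"
        )
    return total_batch_size // per_step


def lora_scaling(alpha, rank):
    """Scaling applied to the low-rank update under the alpha / rank convention."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return alpha / rank


# Values taken from the example script: 64 total, 1 GPU, 16 per GPU.
steps = gradient_accumulation_steps(64, 1, 16)  # 4
scale = lora_scaling(128, 64)                   # 2.0
```

With the example values, each optimizer update accumulates 4 micro-batches of 16 sequences, and the adapter update would be scaled by 2.0 under the stated convention.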

Owner

  • Name: OSU AIoT-MLSys Lab
  • Login: AIoT-MLSys-Lab
  • Kind: organization
  • Location: United States of America

GitHub Events

Total
  • Issues event: 2
  • Watch event: 2
  • Push event: 1
  • Public event: 1
Last Year
  • Issues event: 2
  • Watch event: 2
  • Push event: 1
  • Public event: 1

Dependencies

requirements.txt pypi
  • accelerate ==0.31.0
  • alpaca-eval ==0.6.2
  • antlr4-python3-runtime ==4.11.0
  • autoflake *
  • beaker-py *
  • bitsandbytes >=0.41.1
  • black *
  • datasets *
  • deepspeed ==0.15.0
  • einops *
  • evaluate >=0.4.0
  • fire *
  • flake8 *
  • flash-attn ==2.6.3
  • flask *
  • gradio >=3.50.2
  • hf_transfer *
  • immutabledict *
  • isort *
  • jsonlines *
  • langdetect *
  • mpmath ==1.3.0
  • nltk ==3.8.1
  • openai >=1.0.0
  • openpyxl *
  • packaging *
  • peft >=0.11.1
  • protobuf *
  • pytest *
  • rouge_score *
  • scipy *
  • sentencepiece *
  • sympy ==1.12.0
  • tensorboard *
  • termcolor *
  • tiktoken *
  • tokenizers ==0.19.1
  • torch ==2.4.0
  • transformers ==4.43.4
  • unidic-lite *
  • vllm >=0.5.4
  • wandb *
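
Several of the dependencies above are pinned to exact versions, and a mismatch is a common reason for a training script to fail at import time. The following sketch checks an environment against a few of those pins using only the standard library. The helper names are hypothetical, and stripping a local version tag such as `+cu121` before comparing is a design choice of this example, not something the repository specifies.

```python
from importlib import metadata

# A few of the exact pins copied from the dependency list above.
PINNED = {
    "accelerate": "0.31.0",
    "deepspeed": "0.15.0",
    "flash-attn": "2.6.3",
    "tokenizers": "0.19.1",
    "torch": "2.4.0",
    "transformers": "4.43.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Drop a local build tag such as "+cu121" before comparing.
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` after `pip install -r requirements.txt` returns an empty dictionary when every listed pin is satisfied. A value of `None` in the "found" position means the package is not installed.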