https://github.com/aiot-mlsys-lab/meit
[ACL 2025 Findings🔥] Official implementation of "Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation"
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.3%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation (ACL 2025 Findings🔥)
Easy steps for efficient implementations
Step 1: download data and preprocess data
1. Download the data from Google Drive to your Linux machine. Google Drive link: #############.
2. Preprocess the data: open the 'config.yaml' file and set the paths to the downloaded data:
```
if mimic:
dataset:
  datasetname: 'mimic' ## this is for mimic dataset 21k
  ecgpath: 'xxxx' # add your image file path here
  textpath: 'xxxtrain.csv'

if ptbxl:
dataset:
  datasetname: 'ptbxl' ## this is for PTB-XL dataset 21k
  ecgpath: '/fs/scratch/PAS2473/ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.3/' # add your image file path here
  textpath: '/users/PAS2473/brucewan666/ECG/ECG/instructdata/RGenptbxl_train.csv'
```
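Before running the preprocessing, it can help to confirm that the configured paths actually exist. The following is a minimal sketch, not part of the repository: it assumes the `dataset` section has already been loaded into a plain dict, that the key names are exactly those shown in the snippet above, and that `ecgpath` points to a directory while `textpath` points to a file. The function name `check_dataset_config` is hypothetical.

```python
from pathlib import Path


def check_dataset_config(dataset: dict) -> list:
    """Return a list of problems found in one `dataset` config section."""
    problems = []
    # Key names are copied from the config snippet above (assumption).
    for key in ("datasetname", "ecgpath", "textpath"):
        if not dataset.get(key):
            problems.append(f"missing key: {key}")
    ecg = dataset.get("ecgpath")
    if ecg and not Path(ecg).is_dir():
        problems.append(f"ecgpath is not a directory: {ecg}")
    text = dataset.get("textpath")
    if text and not Path(text).is_file():
        problems.append(f"textpath is not a file: {text}")
    return problems


if __name__ == "__main__":
    # Placeholder values from the README; replace them with your own paths.
    example = {"datasetname": "mimic", "ecgpath": "xxxx", "textpath": "xxxtrain.csv"}
    for problem in check_dataset_config(example):
        print(problem)
```

An empty list means every key is set and both paths resolve on the current machine.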
3. Run the preprocessing: open preprocess_ecg.py and set your own paths (an example for ptbxl):
```
buildinstructdataset(ecgname='ptbxl',savepath='/users/PAS2473/brucewan666/ECG/ECG/instructdata/ptbxlecg_train.jsonl') # mimic
```
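The preprocessing step writes a `.jsonl` file, so a quick way to sanity-check its output is to read it back and see which fields each record carries. The sketch below assumes only that the file holds one JSON object per line; the field names in the demo records (`ecg`, `instruction`, `report`) are hypothetical placeholders, and the real ones are whatever preprocess_ecg.py emits.

```python
import io
import json
from collections import Counter


def read_jsonl(fp):
    """Parse one JSON object per non-empty line from a text stream."""
    return [json.loads(line) for line in fp if line.strip()]


def key_counts(records):
    """Count how often each top-level key appears across the records."""
    counts = Counter()
    for rec in records:
        counts.update(rec.keys())
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    sample = io.StringIO(
        '{"ecg": "rec_0001", "instruction": "...", "report": "..."}\n'
        '{"ecg": "rec_0002", "instruction": "...", "report": "..."}\n'
    )
    records = read_jsonl(sample)
    print(len(records), key_counts(records))
```

For a real file, pass an open handle instead of the `StringIO`, for example `with open(path, encoding="utf-8") as f: records = read_jsonl(f)`. A key that appears in fewer records than the total usually points to a preprocessing problem.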
4. Set up the environment:
pip install -r requirements.txt
5. Run ECG instruction tuning and inference with a single ECG_instruction; the example below uses the mimic-ecg data:
```
export CUDA_VISIBLE_DEVICES=0

MODELSIZE=7B
NUMGPUS=1
BATCHSIZEPERGPU=16
TOTALBATCHSIZE=64 # 144 50277
GRADIENTACCSTEPS=$(($TOTALBATCHSIZE/$NUMGPUS/$BATCHSIZEPERGPU))
echo "Training llama model ${MODELSIZE} using $NUMGPUS GPUs, $BATCHSIZEPERGPU batch size per GPU, $GRADIENTACCSTEPS gradient accumulation steps"
# --usedeepspeed \
# --deepspeedconfigfile /home/wan.512/ECGLLMs/open-instruct/dsconfigs/stage3nooffloadingaccelerate.conf \

# Lora training
accelerate launch --main_process_port 31225 \
    --mixed_precision bf16 \
    --num_machines 1 \
    --num_processes $NUMGPUS \
    /users/PAS2473/brucewan666/ECG/ECG/finetuneecgllmwithloramimic.py \
    --modelnameorpath meta-llama/Llama-2-7b-hf \
    --uselora \
    --lorarank 64 \
    --loraalpha 128 \
    --loradropout 0.1 \
    --tokenizername meta-llama/Llama-2-7b-hf \
    --useslowtokenizer \
    --trainfile /users/PAS2473/brucewan666/ECG/ECG/instructdata/mimicecg.jsonl \
    --maxseqlength 128 \
    --preprocessingnumworkers 16 \
    --checkpointingsteps epoch \
    --perdevicetrainbatchsize $BATCHSIZEPERGPU \
    --gradientaccumulationsteps $GRADIENTACCSTEPS \
    --learningrate 2e-5 \
    --lrschedulertype linear \
    --warmupratio 0.03 \
    --weightdecay 0. \
    --numtrainepochs 5 \
    --outputdir /fs/scratch/PAS2473/zhongweisaveckpt/gpt2largelorackpt \
    --withtracking \
    --reportto tensorboard \
    --useecgllm \
    --devratio 0.1 \
    --valtestratio 0.1 \
    --loggingsteps 100 \
    --evalstep 3200 \
    --teststep 4000 \
    --llmtype llama2 \
    --cachedir /fs/scratch/PAS2473/zhongweimodels
```
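Two quantities in the launch script are derived rather than set directly: the number of gradient accumulation steps and, under the standard LoRA formulation, the scaling applied to the low-rank update. The sketch below reproduces that arithmetic for the values used above. The function names are hypothetical, and the scaling `alpha / rank` assumes the training script follows the usual LoRA convention, which this page does not confirm.

```python
def gradient_accumulation_steps(total_batch_size, num_gpus, batch_size_per_gpu):
    """Mirror the shell expression TOTAL / NUM_GPUS / PER_GPU, but fail loudly
    when the division is not exact (shell integer division silently truncates)."""
    per_step = num_gpus * batch_size_per_gpu
    if total_batch_size % per_step != 0:
        raise ValueError(
            f"total batch size {total_batch_size} is not a multiple of "
            f"{num_gpus} GPUs x {batch_size_per_gpu} per GPU"
        )
    return total_batch_size // per_step


def lora_scaling(lora_alpha, lora_rank):
    """Standard LoRA scaling factor for the low-rank update: alpha / rank."""
    return lora_alpha / lora_rank


if __name__ == "__main__":
    # Values from the example command: 64 total, 1 GPU, 16 per GPU.
    print(gradient_accumulation_steps(64, 1, 16))  # 4
    # LoRA rank 64 with alpha 128.
    print(lora_scaling(128, 64))  # 2.0
```

When changing the GPU count or the per-GPU batch size, keeping the total batch size a multiple of their product preserves the intended effective batch size of 64.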
Owner
- Name: OSU AIoT-MLSys Lab
- Login: AIoT-MLSys-Lab
- Kind: organization
- Location: United States of America
- Website: https://aiot-mlsys-lab.github.io/
- Repositories: 15
- Profile: https://github.com/AIoT-MLSys-Lab
GitHub Events
Total
- Issues event: 2
- Watch event: 2
- Push event: 1
- Public event: 1
Last Year
- Issues event: 2
- Watch event: 2
- Push event: 1
- Public event: 1
Dependencies
- accelerate ==0.31.0
- alpaca-eval ==0.6.2
- antlr4-python3-runtime ==4.11.0
- autoflake *
- beaker-py *
- bitsandbytes >=0.41.1
- black *
- datasets *
- deepspeed ==0.15.0
- einops *
- evaluate >=0.4.0
- fire *
- flake8 *
- flash-attn ==2.6.3
- flask *
- gradio >=3.50.2
- hf_transfer *
- immutabledict *
- isort *
- jsonlines *
- langdetect *
- mpmath ==1.3.0
- nltk ==3.8.1
- openai >=1.0.0
- openpyxl *
- packaging *
- peft >=0.11.1
- protobuf *
- pytest *
- rouge_score *
- scipy *
- sentencepiece *
- sympy ==1.12.0
- tensorboard *
- termcolor *
- tiktoken *
- tokenizers ==0.19.1
- torch ==2.4.0
- transformers ==4.43.4
- unidic-lite *
- vllm >=0.5.4
- wandb *
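Several of the dependencies above are pinned to exact versions, and a mismatch in packages such as torch or transformers is a common source of hard-to-read errors. As a rough sketch, the installed versions can be compared against the exact pins with the standard library's `importlib.metadata`. The `PINS` dict below copies only a few `==` entries from the list above, and the helper name `check_pins` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the exact pins from the dependency list above.
PINS = {
    "accelerate": "0.31.0",
    "deepspeed": "0.15.0",
    "flash-attn": "2.6.3",
    "tokenizers": "0.19.1",
    "torch": "2.4.0",
    "transformers": "4.43.4",
}


def check_pins(pins):
    """Return {package: message} for every pin that is missing or mismatched."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = f"not installed (want {wanted})"
            continue
        # Ignore local build suffixes such as "+cu121" when comparing.
        if found.split("+")[0] != wanted:
            report[name] = f"found {found}, want {wanted}"
    return report


if __name__ == "__main__":
    for name, message in check_pins(PINS).items():
        print(f"{name}: {message}")
```

An empty report means every listed pin matches the current environment; range constraints such as `>=0.41.1` are not covered by this sketch.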