heart-disease-predictive-diagnosis-r
https://github.com/alexksh2/heart-disease-predictive-diagnosis-r
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 4 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (5.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: alexksh2
- Language: R
- Default Branch: main
- Size: 87.9 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
HeartDiseasePredictive_Diagnosis
According to the World Health Organization, cardiovascular diseases are the leading cause of death worldwide, claiming approximately 17.9 million lives every year (World Health Organization, 2019). While the healthcare industry generates substantial data on patients, diseases and diagnoses, much of this data is not properly analysed (Hassan et al., 2022).
The key risk factors for heart disease include sex, smoking, age, family history, poor diet, cholesterol, physical inactivity, high blood pressure, excess weight, and alcohol use.
Therefore, the objective of this study is to effectively predict the presence of coronary heart disease using machine learning classifiers: Logistic Regression, K-Nearest Neighbours, Classification and Regression Tree (CART) and Random Forest.
The Dataset
Content
This data set dates from 1988 and consists of four databases: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 76 attributes, including the predicted attribute, but all published experiments use a subset of 14 of them. The "target" field indicates the presence of heart disease in the patient. It is integer-valued: 0 = no disease and 1 = disease.
https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset
Context
1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0,1,2)
8. maximum heart rate achieved
9. exercise induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) colored by fluoroscopy
13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
14. target: 0 = no disease; 1 = disease
The names and social security numbers of the patients were removed from the database and replaced with dummy values.
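The coded attributes above (sex, chest pain type, thal, target) are categories rather than quantities, so in R they are usually converted to factors before modelling. The following is a minimal sketch of that step, not the repository's own code. The column names (`cp`, `thal`, `target`, and so on) are assumed to follow the Kaggle CSV, and the three toy rows are made up for illustration.

```r
# Toy rows only; column names are an assumption based on the Kaggle CSV.
heart <- data.frame(
  age    = c(52, 53, 70),
  sex    = c(1, 1, 0),
  cp     = c(0, 0, 1),
  thal   = c(2, 1, 0),
  target = c(0, 0, 1)
)

# Treat coded columns as categories, not numbers.
categorical <- c("sex", "cp", "target")
heart[categorical] <- lapply(heart[categorical], factor)

# Label thal using the coding given in the attribute list.
heart$thal <- factor(
  heart$thal,
  levels = c(0, 1, 2),
  labels = c("normal", "fixed defect", "reversible defect")
)

str(heart)
```

Continuous attributes such as age are left numeric, so models like logistic regression estimate a single slope for them and one coefficient per non-reference level for each factor.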
Data Treatment
Undersampling was applied to reduce class imbalance so that the model outputs are not biased towards the majority class (Yenigün, 2023).
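The README does not say how the undersampling was carried out. One common variant, random undersampling, can be sketched in base R as follows. The helper `undersample`, its arguments, and the synthetic data are all hypothetical and only illustrate the idea of shrinking every class to the size of the smallest one.

```r
# Random undersampling: keep as many rows per class as the smallest class has.
# `undersample` is a hypothetical helper, not a function from the repository.
undersample <- function(df, label_col, seed = 42) {
  set.seed(seed)
  counts <- table(df[[label_col]])
  n_min  <- min(counts)
  keep <- unlist(lapply(names(counts), function(cls) {
    idx <- which(df[[label_col]] == cls)
    # Index through sample.int to avoid sample()'s surprise on length-1 input.
    idx[sample.int(length(idx), n_min)]
  }))
  df[sort(keep), , drop = FALSE]
}

# Synthetic example: 70 rows of class 0 and 30 rows of class 1.
toy <- data.frame(x = seq_len(100), target = rep(c(0, 1), times = c(70, 30)))
balanced <- undersample(toy, "target")
table(balanced$target)
```

Undersampling discards majority-class rows, so it is normally applied to the training split only, leaving the evaluation data untouched.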
Model Results
1. Logistic Regression
Overall Accuracy Rate: 208/250 = 83.2%
F1 score: 0.8384615
F2 score: 0.8582677
2. K-Nearest Neighbours
Overall Accuracy Rate: 211/250 = 84.4%
F1 score: 0.8368201
F2 score: 0.8143322
3. Classification and Regression Tree (Classification)
Overall Accuracy Rate: 234/250 = 93.6%
F1 score: 0.936
F2 score: 0.936
4. Random Forest
Overall Accuracy Rate: 244/250 = 97.6%
F1 score: 0.9758065
F2 score: 0.9711075
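For reference, the F-beta score combines precision and recall, with beta = 1 weighting them equally and beta = 2 weighting recall more heavily. The sketch below computes accuracy, F1 and F2 from confusion-matrix counts in base R. The counts are not taken from the repository: they were back-calculated as one set of integers that reproduces the logistic regression figures above, and which class was treated as positive is an assumption.

```r
# F-beta from confusion-matrix counts:
# (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)
f_beta <- function(tp, fp, fn, beta = 1) {
  b2 <- beta^2
  (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)
}

accuracy <- function(tp, tn, fp, fn) {
  (tp + tn) / (tp + tn + fp + fn)
}

# Assumed counts (back-calculated, not from the repository), 250 test cases.
tp <- 109; fn <- 16; fp <- 26; tn <- 99

accuracy(tp, tn, fp, fn)      # 0.832
f_beta(tp, fp, fn, beta = 1)  # 0.8384615
f_beta(tp, fp, fn, beta = 2)  # 0.8582677
```

In a screening setting a missed case (false negative) is usually costlier than a false alarm, which is why F2 is a useful companion to F1 here.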
Interpretation of Model Results:
The results show that the Random Forest model achieves the best predictive performance for heart disease diagnosis: it has the highest overall accuracy (97.6%), F1 score (0.976) and F2 score (0.971) of the four models, ahead of the logistic regression, K-Nearest Neighbours and Classification and Regression Tree models.
Variable Importance Bar Chart of CART Model:
Variable Importance Plot of Random Forest:
Conclusion: The results of both tree-based models (CART and Random Forest) indicate that thal is the most important variable for heart disease predictive diagnosis.
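As an illustration of how such a ranking can be obtained, the sketch below fits a classification tree with the `rpart` package and reads its `variable.importance` component. It assumes `rpart` is installed (it ships with standard R distributions), and it uses synthetic data in which the outcome depends mainly on `thal` by construction, so it shows the mechanics only and says nothing about the real dataset.

```r
library(rpart)  # assumed available; a recommended package in standard R installs

set.seed(1)
n <- 400

# Synthetic stand-in data, not the heart disease dataset.
toy <- data.frame(
  thal = factor(sample(0:2, n, replace = TRUE)),
  age  = round(runif(n, 30, 75)),
  chol = round(rnorm(n, 240, 40))
)

# By construction, thal == 2 strongly raises the probability of target == 1.
p <- ifelse(toy$thal == "2", 0.85, 0.15)
toy$target <- factor(rbinom(n, 1, p))

fit <- rpart(target ~ ., data = toy, method = "class")

# Named numeric vector; larger values mean more important variables.
imp <- sort(fit$variable.importance, decreasing = TRUE)
print(imp)
```

Calling `barplot(imp)` on this vector draws a simple bar chart. If the `randomForest` package is used for the forest model, its `importance()` and `varImpPlot()` functions play the corresponding role.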
Analysis on Results
The thal variable refers to the result of a thallium stress test, a nuclear medicine procedure used to evaluate blood flow to the heart muscle and diagnose coronary artery disease (Mayo Clinic, 2017). Thallium is a radioactive tracer that is injected into the bloodstream for the scan.
Normal (Thal 0): The result indicates no significant problems with blood flow to the heart during rest or stress. The coronary arteries are therefore supplying blood consistently to the cardiac muscle.
Fixed Defect (Thal 1): This result suggests that an area of cardiac muscle is not receiving adequate blood flow during both rest and stress. This may indicate a past heart condition that has led to permanent damage to the cardiac tissue.
Reversible Defect (Thal 2): This result indicates a reduction in blood flow to an area of the heart during stress that improves once the stress is relieved. It suggests a temporary blood flow problem, which may be caused by myocardial ischemia even though no myocardial infarction has occurred.
This finding is also consistent with the reported reliability of stress thallium-201 scanning, which was found to be a highly sensitive and specific screening procedure in a study of 52 consecutive myocardial ischemia patients (Stolzenberg & London, 1979).
Citation
World Health Organization: WHO. (2019, June 11). Cardiovascular diseases. Who.int; World Health Organization: WHO. https://www.who.int/health-topics/cardiovascular-diseases/#tab=tab_1
Hassan, Ch. A. ul, Iqbal, J., Irfan, R., Hussain, S., Algarni, A. D., Bukhari, S. S. H., Alturki, N., & Ullah, S. S. (2022). Effectively Predicting the Presence of Coronary Heart Disease Using Machine Learning Classifiers. Sensors, 22(19), 7227. https://doi.org/10.3390/s22197227
Yenigün, O. (2023, March 29). Handling Class Imbalance in Machine Learning. MLearning.ai. https://medium.com/mlearning-ai/handling-class-imbalance-in-machine-learning-cb1473e825ce
Mayo Clinic. (2017). Nuclear stress test - Mayo Clinic. Mayoclinic.org. https://www.mayoclinic.org/tests-procedures/nuclear-stress-test/about/pac-20385231
Stolzenberg, J., & London, R. (1979). Reliability of stress thallium-201 scanning in the clinical evaluation of coronary artery disease. Clinical Nuclear Medicine, 4(6), 225–228. https://doi.org/10.1097/00003072-197906000-00001
Owner
- Login: alexksh2
- Kind: user
- Repositories: 1
- Profile: https://github.com/alexksh2
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Alex"
    given-names: "Khoo Shien How"
    orcid: "https://orcid.org/0000-0000-0000-0000"
title: "Heart Disease Predictive Diagnosis"
version: 2.0.4
date-released: 2023-11-3
url: "https://github.com/alexksh2/Heart_Disease_Predictive_Diagnosis"