https://github.com/ajaparicio36/heart-disease-dataset-py

Prelims for SE3132 - Linear Regression (Single and Multiple) for Heart Disease Risk

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ajaparicio36
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 1.06 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme

README.md

Linear Regression of Heart Disease Risk and Factors

What is this chart about?

Questions to Answer

  1. Why did you choose this dataset?

Disease can strike suddenly sometimes, but that is just because of negligence. I chose this dataset to learn more about the risk of heart disease and the factors that can affect it. I strive to be healthy, and it doesn't hurt to learn about this. So working through this dataset isn't just for the exam submission but also a lesson: our bodies are fragile, yet there are many things we can do to keep them safe. This dataset was generated based on science, so it doesn't just contain some nutty values some guy ripped off from AI chat bots anyway.

  2. What is your conclusion upon conducting linear regression on this dataset?

In my dataset, heart risk is directly influenced by both symptoms and risk factors. As both regression graphs show, heart risk corresponds to how many symptoms and risk factors you have.

The single linear regression, after cleaning, grouping, and normalizing the data, seems to show that the more symptoms/risk factors you have, the higher your heart risk.
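The clean, group, normalize, and fit steps described above could look roughly like the following. This is only a minimal sketch: the data is synthetic, and the column names `factor_count` and `heart_risk` are hypothetical stand-ins, not the columns or code used in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption): NOT the repository's dataset.
rng = np.random.default_rng(0)
n_rows = 500
factor_count = rng.integers(0, 11, size=n_rows)           # hypothetical count of symptoms/risk factors (0-10)
risk = 0.08 * factor_count + rng.normal(0, 0.05, n_rows)  # hypothetical risk score with noise

df = pd.DataFrame({"factor_count": factor_count, "heart_risk": risk})

# Clean: drop rows with missing values (none in this synthetic frame).
df = df.dropna()

# Group: average risk for each factor count.
grouped = df.groupby("factor_count", as_index=False)["heart_risk"].mean()

# Normalize: min-max scale a column to the range [0, 1].
def min_max(series):
    return (series - series.min()) / (series.max() - series.min())

grouped["x_norm"] = min_max(grouped["factor_count"])
grouped["y_norm"] = min_max(grouped["heart_risk"])

# Fit a single (one-feature) linear regression on the grouped, normalized data.
model = LinearRegression()
model.fit(grouped[["x_norm"]], grouped["y_norm"])
slope = float(model.coef_[0])
intercept = float(model.intercept_)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A positive slope here corresponds to the reading above: more symptoms/risk factors go with a higher risk score.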

In the multiple linear regression, the data points are centered around the line, which shows a strong correlation and is supported by both a low mean squared error and a high R² score.
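As an illustration of how a mean squared error and R² score can be obtained for a multiple linear regression with scikit-learn, here is a minimal sketch. The two features (`symptoms`, `risk_factors`) and their coefficients are assumptions applied to synthetic data, so the printed numbers say nothing about the actual dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): NOT the repository's dataset.
rng = np.random.default_rng(42)
n_rows = 400
symptoms = rng.integers(0, 9, size=n_rows).astype(float)      # hypothetical symptom count
risk_factors = rng.integers(0, 9, size=n_rows).astype(float)  # hypothetical risk-factor count
heart_risk = 0.05 * symptoms + 0.04 * risk_factors + rng.normal(0, 0.03, n_rows)

# Two predictors make this a multiple linear regression.
X = np.column_stack([symptoms, risk_factors])
X_train, X_test, y_train, y_test = train_test_split(
    X, heart_risk, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Lower MSE and R² closer to 1 both indicate predictions near the true values.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"MSE={mse:.5f}, R2={r2:.3f}")
```

Evaluating on a held-out test split, as done here, gives a fairer picture of fit than scoring on the training rows alone.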

In conclusion, this dataset shows that with enough symptoms and risk factors, you will have a high risk of developing heart disease. Although this dataset was generated rather than collected, it is based on scientific sources, and thorough research with real-life cases might provide more conclusive evidence.

  3. How relevant is linear regression today?

Very relevant, as it can show how one or more factors correlate with a dependent variable, which can help support theories, solve problems, and explain other real-world cases. Linear regression can draw conclusions from test data across a wide range of scientific fields.

Scientific research is the main application of linear regression, but other fields may also benefit from this type of analysis.

SE 3132 - Prelim Exam Submission

This GitHub repository is the prelim exam submission of Anthony John B. Aparicio for SE 3132.

Owner

  • Name: Anthony John
  • Login: ajaparicio36
  • Kind: user
  • Location: Philippines

Student at Central Philippine University studying BS in Software Engineering. Acads / Pro Gaming

GitHub Events

Total
  • Member event: 1
  • Push event: 3
  • Create event: 2
Last Year
  • Member event: 1
  • Push event: 3
  • Create event: 2

Dependencies

requirements.txt pypi
  • matplotlib *
  • numpy *
  • pandas *
  • scikit-learn *
  • seaborn *