lung-cancer-risk-prediction-with-machine-learning-models

https://github.com/meetjariwala10/lung-cancer-risk-prediction-with-machine-learning-models

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: MeetJariwala10
Language: Jupyter Notebook
Default Branch: main
Size: 446 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme Citation

Lung-Cancer-Risk-Prediction-with-Machine-Learning-Models

This repository implements machine learning models for lung cancer risk prediction inspired by the paper:

Dritsas, E.; Trigka, M. (2022). Lung Cancer Risk Prediction with Machine Learning Models. Big Data and Cognitive Computing, 6(4), 139.
DOI: 10.3390/bdcc6040139

The paper compares several classifiers (e.g., Naive Bayes, SVM, Random Forest, and Rotation Forest) on a publicly available dataset and reports that Rotation Forest performed best, with an AUC of 99.3% and an F-Measure, precision, recall, and accuracy of 97.1%.
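For readers unfamiliar with the metrics named above, the following is a minimal sketch of how they can be computed from confusion-matrix counts and classifier scores. The function names `classification_metrics` and `auc_from_scores` are hypothetical helpers written for illustration; they are not taken from the paper or from the notebook, and the numbers in the usage lines are toy values.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up counts and scores (not results from the paper).
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise AUC above is quadratic in the number of samples, which is fine for illustration; a library routine would normally be used on real data.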

Overview

The key features of this repository are:

- Data Preprocessing: Balancing the dataset using SMOTE.
- Feature Analysis: Evaluating feature importance using methods like gain ratio and random forest.
- Modeling: Training a variety of classification models such as Naive Bayes, Bayesian network, logistic regression, SVM, Random Forest, and Rotation Forest.
- Evaluation: Assessing models with metrics including accuracy, precision, recall, F-Measure, and AUC via 10-fold cross-validation in the Weka environment.
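To make the SMOTE balancing step listed above more concrete, here is a simplified sketch of the core idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. This is an illustration only, not the imbalanced-learn implementation and not necessarily what the notebook does; the function name `smote_sketch`, its parameters, and the toy points are assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class samples.

    Simplified illustration: pick a random minority sample, pick one of its
    k nearest minority neighbours, and interpolate between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours by Euclidean distance, excluding base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy usage: four hypothetical minority samples with two features each.
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_sketch(minority_samples, n_new=6, k=2)
```

When combining oversampling with cross-validation, it is generally safer to oversample only the training portion of each fold, so that synthetic points derived from test samples do not leak into training.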

The project is implemented in a Jupyter Notebook (MLMINIPROJECT.ipynb) that contains the code and experiments.
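The gain ratio mentioned under Feature Analysis can be sketched in a few lines. It is commonly defined as the information gain of the class given a feature, divided by the entropy (split information) of the feature itself. The helpers `entropy` and `gain_ratio` below are hypothetical names for illustration and assume categorical feature values; the toy data is made up and is not the repository's dataset.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature, target):
    """Information gain of target given feature, divided by the feature's entropy."""
    total = len(target)
    # Group the target values by feature value to get the conditional entropy.
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(target) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy usage with hypothetical binary features and a binary class label.
label = [1, 1, 0, 0]
informative = [1, 1, 0, 0]    # perfectly aligned with the label
uninformative = [1, 0, 1, 0]  # independent of the label
scores = {
    "informative": gain_ratio(informative, label),
    "uninformative": gain_ratio(uninformative, label),
}
```

Ranking features by this score gives one view of importance; tree-based importances such as those from a random forest can rank features differently, so comparing the two is often informative.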

Prerequisites and Installation

To run the code in this repository, please ensure you have the following:

Python 3.x installed.
Required Python libraries such as:
- numpy
- pandas
- scikit-learn
- imbalanced-learn (imported as imblearn, for SMOTE)
- matplotlib or seaborn (for plotting)

You can install these dependencies using pip:

```bash
pip install numpy pandas scikit-learn imbalanced-learn matplotlib seaborn
```

Owner

Name: Meet Jariwala
Login: MeetJariwala10
Kind: user

Repositories: 1
Profile: https://github.com/MeetJariwala10

Coding Enthusiast

Citation (CITATIONS.bib)

@Article{bdcc6040139,
AUTHOR = {Dritsas, Elias and Trigka, Maria},
TITLE = {Lung Cancer Risk Prediction with Machine Learning Models},
JOURNAL = {Big Data and Cognitive Computing},
VOLUME = {6},
YEAR = {2022},
NUMBER = {4},
ARTICLE-NUMBER = {139},
URL = {https://www.mdpi.com/2504-2289/6/4/139},
ISSN = {2504-2289},
ABSTRACT = {The lungs are the center of breath control and ensure that every cell in the body receives oxygen. At the same time, they filter the air to prevent the entry of useless substances and germs into the body. The human body has specially designed defence mechanisms that protect the lungs. However, they are not enough to completely eliminate the risk of various diseases that affect the lungs. Infections, inflammation or even more serious complications, such as the growth of a cancerous tumor, can affect the lungs. In this work, we used machine learning (ML) methods to build efficient models for identifying high-risk individuals for incurring lung cancer and, thus, making earlier interventions to avoid long-term complications. The suggestion of this article is the Rotation Forest that achieves high performance and is evaluated by well-known metrics, such as precision, recall, F-Measure, accuracy and area under the curve (AUC). More specifically, the evaluation of the experiments showed that the proposed model prevailed with an AUC of 99.3%, F-Measure, precision, recall and accuracy of 97.1%.},
DOI = {10.3390/bdcc6040139}
}
