thesis_jasperbraakman

https://github.com/jasperbraakman/thesis_jasperbraakman

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (9.9%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: JasperBraakman
Language: Jupyter Notebook
Default Branch: main
Size: 43.1 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed about 2 years ago

Metadata Files

Readme, Citation

Soil Pollution Prediction using Machine Learning

Overview

This project aims to improve soil pollution prediction by integrating expert-defined risk zones into machine learning models. The study evaluates how expert knowledge affects predictive performance through systematic data preparation, model structuring, evaluation, and interpretation. Using the Netherlands as a case study, it investigates at which stages, and by which methods, expert-defined risk zones can be incorporated to improve soil pollution assessments.

Notebooks

The project comprises four main Jupyter notebooks, each focusing on a different part of the study:

EDA.ipynb:

Purpose: Conduct exploratory data analysis (EDA) to understand the datasets and their distributions.
Contents: Data loading, visualization of soil pollution data, and statistical summaries.
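A minimal sketch of the loading-and-summary step in pandas. It assumes one row per soil sample; the column names (`x`, `y`, `risk_zone`, `lead_mg_kg`), the helper `summarize_samples`, and the randomly generated values are hypothetical stand-ins, not the project's data or schema.

```python
import numpy as np
import pandas as pd


def summarize_samples(df: pd.DataFrame, target: str, group_col: str) -> pd.DataFrame:
    """Per-group statistical summary of one pollution measurement.

    `target` and `group_col` are hypothetical column names; adapt them to the real data.
    """
    return df.groupby(group_col)[target].agg(["count", "mean", "median", "std"])


# Hypothetical toy data: NOT real measurements, only used to illustrate the calls.
rng = np.random.default_rng(0)
samples = pd.DataFrame({
    "x": rng.uniform(0, 100, 60),
    "y": rng.uniform(0, 100, 60),
    "risk_zone": np.repeat(["low", "medium", "high"], 20),
    "lead_mg_kg": np.concatenate([
        rng.lognormal(3.0, 0.5, 20),
        rng.lognormal(3.5, 0.5, 20),
        rng.lognormal(4.0, 0.5, 20),
    ]),
})

print(samples.describe())  # overall distribution of every numeric column
print(summarize_samples(samples, "lead_mg_kg", "risk_zone"))  # summary per zone
```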

Hyper parameter Selection.ipynb:

Purpose: Perform hyperparameter selection for the machine learning models using grid search.
Contents: Setup of machine learning models, execution of grid search for hyperparameter tuning, and selection of the best models.
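A sketch of grid search with scikit-learn's `GridSearchCV` around a random forest. It assumes a regression target (a pollutant concentration) and uses synthetic data from `make_regression`; the grid values are illustrative, since the ranges searched in the notebook are not listed here. For a classification target, `RandomForestClassifier` with a classification metric would replace the regressor.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the notebooks use the soil pollution datasets instead.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=42)

# Hypothetical grid: 2 x 3 x 2 = 12 parameter combinations.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    cv=5,           # 5-fold cross-validation for every combination
    scoring="r2",   # assumed metric
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV R^2:", round(search.best_score_, 3))
best_model = search.best_estimator_  # refit on all data (refit=True by default)
```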

RandomForest.ipynb:

Purpose: Run the main models for both datasets.
Contents: Setup of the machine learning models and execution of the random forest models.
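One way to set up a with/without comparison is to train the same random forest twice on an identical train/test split: once on baseline covariates only, and once with the expert-defined risk zone added as one-hot features. This is a sketch under stated assumptions; the README does not say how the thesis encodes risk zones. The covariate names (`elevation`, `dist_to_road`), the three zone labels, the helper `fit_and_score`, and the synthetic target (deliberately built so that the zones matter) are all hypothetical, and the printed scores say nothing about the thesis results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data; real covariates and risk zones come from the thesis datasets.
rng = np.random.default_rng(1)
n = 500
data = pd.DataFrame({
    "elevation": rng.normal(0, 1, n),
    "dist_to_road": rng.normal(0, 1, n),
    "risk_zone": rng.choice(["low", "medium", "high"], n),
})
zone_effect = data["risk_zone"].map({"low": 0.0, "medium": 1.0, "high": 2.0})
target = 0.5 * data["elevation"] - 0.3 * data["dist_to_road"] + zone_effect + rng.normal(0, 0.3, n)

X_base = data[["elevation", "dist_to_road"]]
# One-hot encode the zone label so the forest can split on each zone separately.
X_expert = pd.get_dummies(data, columns=["risk_zone"], dtype=float)

# Identical split indices for both variants keep the comparison fair.
idx_train, idx_test = train_test_split(np.arange(n), test_size=0.25, random_state=0)


def fit_and_score(X: pd.DataFrame) -> float:
    """Fit a forest on the training rows and return R^2 on the held-out rows."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X.iloc[idx_train], target.iloc[idx_train])
    return r2_score(target.iloc[idx_test], model.predict(X.iloc[idx_test]))


scores = {
    "without_expert_zones": fit_and_score(X_base),
    "with_expert_zones": fit_and_score(X_expert),
}
print(scores)
```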

Analysis model runs.ipynb:

Purpose: Analyze the performance of different model runs, comparing models with and without expert knowledge integration.
Contents: Model evaluation metrics, feature importance analysis, generalizability assessment, and learning curve analysis.
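The learning-curve and feature-importance parts of the analysis can be sketched with scikit-learn's `learning_curve` and a fitted forest's `feature_importances_` attribute. The data is synthetic (`make_regression` with two informative features out of five) and R² is an assumed metric; the notebook may use different metrics.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

# Synthetic stand-in data; replace with the project's feature matrix and target.
X, y = make_regression(n_samples=300, n_features=5, n_informative=2, noise=5.0, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)

# Cross-validated R^2 at five increasing training-set sizes.
train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, train_sizes=np.linspace(0.2, 1.0, 5), cv=5, scoring="r2"
)
for size, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={int(size):4d}  train R^2={tr:.3f}  validation R^2={va:.3f}")

# Impurity-based importances from a model fitted on all data, highest first.
model.fit(X, y)
for i in np.argsort(model.feature_importances_)[::-1]:
    print(f"feature {i}: {model.feature_importances_[i]:.3f}")
```

Impurity-based importances can favor features with many possible split points; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.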

Requirements

To run the notebooks, install the required Python libraries. The main libraries used are:

- pandas==1.3.3
- numpy==1.21.2
- scikit-learn==0.24.2
- matplotlib==3.4.3
- seaborn==0.11.2
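A small standard-library check (`importlib.metadata`, Python 3.8+) that compares the installed versions with the pinned versions listed for this project before running the notebooks. The helper `check_pins` is hypothetical and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions as listed in this README.
PINNED = {
    "pandas": "1.3.3",
    "numpy": "1.21.2",
    "scikit-learn": "0.24.2",
    "matplotlib": "3.4.3",
    "seaborn": "0.11.2",
}


def check_pins(pins: dict) -> dict:
    """Map each package to (installed version or None, whether it matches the pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (installed, installed == wanted)
    return report


for pkg, (installed, ok) in check_pins(PINNED).items():
    status = "OK" if ok else f"expected {PINNED[pkg]}"
    print(f"{pkg:<13} installed={installed}  {status}")
```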

Installation

Clone the repository:

git clone https://github.com/JasperBraakman/Thesis_JasperBraakman.git
cd Thesis_JasperBraakman

Install dependencies:

pip install -r requirements.txt

Owner

Login: JasperBraakman
Kind: user

Repositories: 1
Profile: https://github.com/JasperBraakman

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Braakman"
  given-names: "Jasper"
title: "Thesis_JasperBraakman"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2024-04-19
url: "https://github.com/JasperBraakman/Thesis_JasperBraakman"
