Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: JasperBraakman
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 43.1 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

Soil Pollution Prediction using Machine Learning

Overview

This project aims to enhance soil pollution prediction by integrating expert-defined risk zones with machine learning models. The study evaluates the impact of expert knowledge on predictive performance through systematic data preparation, model structuring, evaluation, and interpretation. Using the Netherlands as a case study, this research investigates various stages and methods of incorporating expert-defined risk zones to improve soil pollution assessments.

Notebooks

The project comprises four main Jupyter notebooks, each covering a different stage of the study:

EDA.ipynb:

  • Purpose: Conduct exploratory data analysis (EDA) to understand the datasets and their distributions.
  • Contents: Data loading, visualization of soil pollution data, and statistical summaries.
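As a rough illustration of the statistical summaries mentioned above, the sketch below uses pandas on a tiny made-up table. The column names `lead_mg_per_kg` and `in_risk_zone` and all values are hypothetical placeholders, not taken from the thesis datasets.

```python
import pandas as pd

# Made-up illustrative measurements; the real notebook loads the soil datasets instead.
df = pd.DataFrame({
    "lead_mg_per_kg": [12.0, 85.5, 40.2, 310.0, 22.1, 150.3],  # hypothetical column
    "in_risk_zone": [False, True, False, True, False, True],   # hypothetical expert flag
})

# Overall distribution of the measured concentration.
summary = df["lead_mg_per_kg"].describe()

# Median concentration inside and outside the (hypothetical) risk zones.
by_zone = df.groupby("in_risk_zone")["lead_mg_per_kg"].median()

print(summary)
print(by_zone)
```

Comparing the grouped medians is one simple way to check whether an expert-defined zone separates high and low measurements before any model is trained.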

Hyper parameter Selection.ipynb:

  • Purpose: Perform hyperparameter selection for the machine learning models using grid search.
  • Contents: Setup of machine learning models, execution of grid search for hyperparameter tuning, and selection of the best models.
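A minimal sketch of grid search with scikit-learn follows. It assumes a classification task, a random forest estimator, synthetic data, and a small hypothetical parameter grid; the README does not list the grid or scoring metric actually used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the real notebook uses the soil pollution datasets.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical search space, chosen only to keep the example fast.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

# Exhaustively evaluate every parameter combination with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="f1",  # assumed metric
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

After fitting, `search.best_estimator_` holds a model refit on the full data with the selected parameters, which can then be reused for the main runs.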

RandomForest.ipynb:

  • Purpose: Train and run the main random forest models on both datasets.
  • Contents: Model setup and execution of the random forest runs.
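One way to picture a run with and without expert knowledge is to train the same random forest on two feature sets, one of which includes a risk-zone indicator. The sketch below does this on synthetic data; the `risk_zone` flag, the target construction, and the choice of a classifier are assumptions made for illustration and do not reproduce the thesis setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300

# Synthetic covariates and a hypothetical binary expert-defined risk-zone flag.
base_features = rng.normal(size=(n, 4))
risk_zone = rng.integers(0, 2, size=n)

# Synthetic target that depends partly on the risk-zone flag.
signal = base_features[:, 0] + 1.5 * risk_zone + rng.normal(scale=0.5, size=n)
y = (signal > 0.75).astype(int)

X_without = base_features
X_with = np.column_stack([base_features, risk_zone])


def mean_cv_accuracy(X, y):
    """Mean 5-fold cross-validated accuracy of a fixed random forest."""
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(model, X, y, cv=5).mean()


score_without = mean_cv_accuracy(X_without, y)
score_with = mean_cv_accuracy(X_with, y)
print(f"without risk zone: {score_without:.3f}, with risk zone: {score_with:.3f}")
```

Keeping the estimator, seed, and folds identical across both runs means any difference in score can be attributed to the added feature rather than to the model configuration.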

Analysis model runs.ipynb:

  • Purpose: Analyze the performance of different model runs, comparing models with and without expert knowledge integration.
  • Contents: Model evaluation metrics, feature importance analysis, generalizability assessment, and learning curve analysis.
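The feature importance and learning curve analyses listed above could look roughly like the following sketch. It assumes a random forest classifier and synthetic data with placeholder feature names; the metrics and plots in the actual notebook may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Synthetic stand-in data with placeholder feature names.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Impurity-based feature importances, sorted from most to least important.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# Learning curve: validation score as a function of training set size.
train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X,
    y,
    cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5),
)
mean_val = val_scores.mean(axis=1)

for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
print(dict(zip(train_sizes.tolist(), mean_val.round(3).tolist())))
```

A validation curve that is still rising at the largest training size suggests the model would benefit from more samples, which is relevant when judging generalizability.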

Requirements

To run the notebooks, you need to install the required Python libraries. The main libraries used are:

  • pandas==1.3.3
  • numpy==1.21.2
  • scikit-learn==0.24.2
  • matplotlib==3.4.3
  • seaborn==0.11.2

Installation

Clone the repository:

git clone https://github.com/JasperBraakman/Thesis_JasperBraakman.git

Install dependencies:

pip install -r requirements.txt

Owner

  • Login: JasperBraakman
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Braakman"
  given-names: "Jasper"
title: "Thesis_JasperBraakman"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2024-04-19
url: "https://github.com/JasperBraakman/Thesis_JasperBraakman"
