https://github.com/alraunez/psd_analysis_machinelearning
Routines to predict hydraulic conductivity from particle size distributions with six AI algorithms and to assess how successful these predictions are.
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.1%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
PSD Analysis using Machine Learning to identify hydraulic conductivity
This repository accompanies the manuscript "Predicting saturated hydraulic conductivity from particle size distributions using machine learning" (de Rijk et al. 2024; https://doi.org/10.1007/s00477-024-02861-6).
The repository provides routines to perform particle size distribution (PSD) analysis, in particular workflows to estimate hydraulic conductivity with six machine learning (ML) algorithms:
- Decision Tree (DT)
- Random Forest (RF)
- XGBoost (XG)
- Linear Regression (LR)
- Support Vector Regression (SVR)
- Artificial Neural Network (ANN)
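The following is a minimal sketch of how the six algorithm types could be instantiated and compared with scikit-learn and XGBoost, both of which are listed in the dependencies. It makes several assumptions for illustration only: the hyperparameters, the scaling pipelines, the 80/20 train/test split and the synthetic data. The tuned settings used in the study are kept in src/data_dictionaries.py. Here the ANN is represented by scikit-learn's MLPRegressor, which may differ from the network used in the repository.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_models(seed=42):
    """Untuned stand-ins for the six algorithm types (all settings are hypothetical)."""
    models = {
        "DT": DecisionTreeRegressor(max_depth=8, random_state=seed),
        "RF": RandomForestRegressor(n_estimators=100, random_state=seed),
        "LR": LinearRegression(),
        # SVR and the neural network are sensitive to feature scaling, hence the scaler.
        "SVR": make_pipeline(StandardScaler(), SVR(C=1.0)),
        "ANN": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=seed),
        ),
    }
    try:
        from xgboost import XGBRegressor  # optional, needs the separate xgboost package

        models["XG"] = XGBRegressor(n_estimators=200, max_depth=4, random_state=seed)
    except ImportError:
        pass  # XGBoost is simply skipped when xgboost is not installed
    return models


def evaluate(models, X, y, test_size=0.2, seed=42):
    """Fit each model on a training split; report R^2 and MSE on the held-out test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        scores[name] = {
            "r2": r2_score(y_test, y_pred),
            "mse": mean_squared_error(y_test, y_pred),
        }
    return scores


if __name__ == "__main__":
    # Synthetic stand-in data, NOT the TopIntegraal samples: 300 rows of 10 "fractions" (%).
    rng = np.random.default_rng(0)
    X = rng.dirichlet(np.ones(10), size=300) * 100
    y = -2 + 0.05 * X[:, -3:].sum(axis=1) + rng.normal(0, 0.2, size=300)  # pseudo logK
    for name, s in evaluate(build_models(), X, y).items():
        print(f"{name}: R2 = {s['r2']:.2f}, MSE = {s['mse']:.3f}")
```

The Nash-Sutcliffe efficiency (NSE) has the same form as the coefficient of determination returned by r2_score, namely 1 − Σ(y − ŷ)² / Σ(y − ȳ)².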
The package also includes methods to identify PSD properties, such as grain diameter percentiles (d10, d50, d60, etc.), and to calculate hydraulic conductivity with empirical formulas.
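Both steps can be sketched briefly. The sketch assumes that the PSD is given as cumulative percent passing at each sieve size in micrometers, and it interpolates linearly in log-diameter.

The Hazen formula is used only as one well-known empirical example. Its classic form is K [cm/s] = C·d10² with d10 in cm and C ≈ 100. The commonly cited applicability limits are 0.1 mm ≤ d10 ≤ 3 mm and d60/d10 < 5.

The function names are hypothetical. This is not the implementation in PSD_Analysis.py or PSD_K_empirical.py.

```python
import numpy as np


def grain_diameter(sieve_um, cum_passing_pct, percent):
    """Diameter d_X (in the unit of sieve_um) below which `percent` % of the mass passes.

    Assumes cumulative percent passing per sieve and interpolates linearly
    in log10(diameter); np.interp needs the passing values to be increasing.
    """
    sieve = np.asarray(sieve_um, dtype=float)
    passing = np.asarray(cum_passing_pct, dtype=float)
    order = np.argsort(sieve)  # cumulative passing grows with sieve size
    log_d = np.interp(percent, passing[order], np.log10(sieve[order]))
    return float(10.0 ** log_d)


def hazen_k_m_per_day(d10_um, c=100.0):
    """Classic Hazen estimate K = C * d10^2 (K in cm/s, d10 in cm, C ~ 100), returned in m/d."""
    d10_cm = d10_um * 1e-4  # 1 um = 1e-4 cm
    k_cm_per_s = c * d10_cm**2
    return k_cm_per_s * 1e-2 * 86400.0  # cm/s -> m/s -> m/d


def hazen_applicable(d10_um, d60_um):
    """Commonly cited Hazen limits: 0.1 mm <= d10 <= 3 mm and uniformity d60/d10 < 5."""
    return 100.0 <= d10_um <= 3000.0 and d60_um / d10_um < 5.0


# Hypothetical sample: sieve sizes in micrometers and cumulative percent passing.
sieves = [2, 63, 125, 250, 500, 1000, 2000]
passing = [1, 5, 20, 60, 90, 98, 100]
d10, d50, d60 = (grain_diameter(sieves, passing, p) for p in (10, 50, 60))
print(f"d10={d10:.0f} um, d50={d50:.0f} um, d60={d60:.0f} um")
print(f"Hazen K={hazen_k_m_per_day(d10):.1f} m/d, applicable: {hazen_applicable(d10, d60)}")
```

For this made-up sample, d10 falls below 0.1 mm, so the applicability check returns False. Flagging applicability per sample in this way mirrors the idea behind the applicability column in Kemp_all.csv.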
The algorithms are tested on soil sample data from the "TopIntegraal" project, provided by TNO. The data are not yet available due to license issues; we plan to provide them soon.
Structure
- README.md - description of the project
- LICENSE - the default license is MIT
- requirements.txt - requirements for pip to install all needed packages (see below)
- data/ - contains the data used in this project, extracted from the TopIntegraal data set, including PSD data and hydraulic conductivity of 4593 samples and porosity measurements where available:
  - data_PSD_Kf_por.csv - measured quantities for the 4593 samples. Each row contains one sample. The columns are:
    - sieve size fractions (= PSD data) in micrometers (column headers starting with F)
    - measured hydraulic conductivity from permeameter tests in m/d (column Kf)
    - log-transformed hydraulic conductivity (column logK)
    - porosity measurements, for those samples where available (column porosity)
    - the lithoclass specification from TopIntegraal (column litho_measured)
  - data_PSD_Kf_por_props.csv - same as data_PSD_Kf_por.csv plus two columns on soil classes and main lithology (re-)determined from the PSD
  - data_PSD_Kf_por_props_Kemp.csv - same as data_PSD_Kf_por_props.csv plus five columns with estimates of hydraulic conductivity from empirical methods (the column headers specify the method)
- results/ - results of processed data (algorithm performance) and plots used in the publication:
  - Kemp_all.csv - estimated Kf values of all samples for 15 empirical methods, including specification of applicability
  - Data_analysis/ - results of the PSD analysis of the samples:
    - data_PSD_props.csv - results of the PSD analysis for all samples (grain diameters, percentage sand/silt/lutum, lithoclass)
    - data_full_stats.csv - statistical results (mean, std, percentiles, ...) of properties (Kf, percentage sand/silt/lutum, ...) for all samples
    - data_sand_stats.csv - statistical results (mean, std, percentiles, ...) of properties (Kf, percentage sand/silt/lutum, ...) for the subset of sand samples
    - data_silt_stats.csv - statistical results (mean, std, percentiles, ...) of properties (Kf, percentage sand/silt/lutum, ...) for the subset of silt samples
    - data_clay_stats.csv - statistical results (mean, std, percentiles, ...) of properties (Kf, percentage sand/silt/lutum, ...) for the subset of clay samples
    - data_por_stats.csv - statistical results (mean, std, percentiles, ...) of properties (Kf, percentage sand/silt/lutum, porosity, ...) for the subset of samples with porosity measurements
  - ML_Performance/ - performance measures (R^2 and MSE) of all six ML algorithms on the training data, the test data and all samples, for the following combinations:
    - Performance_PSD_Kf_topall_r2.csv - feature variable PSD, target variable Kf, data set "Top-All" (R^2)
    - Performance_PSD_Kf_topall_mse.csv - feature variable PSD, target variable Kf, data set "Top-All" (MSE)
    - Performance_PSD_Kf_sand_r2.csv - feature variable PSD, target variable Kf, data set "Top-Sand" (R^2)
    - Performance_PSD_Kf_sand_mse.csv - feature variable PSD, target variable Kf, data set "Top-Sand" (MSE)
    - Performance_PSD_Kf_silt_r2.csv - feature variable PSD, target variable Kf, data set "Top-Silt" (R^2)
    - Performance_PSD_Kf_silt_mse.csv - feature variable PSD, target variable Kf, data set "Top-Silt" (MSE)
    - Performance_PSD_Kf_clay_r2.csv - feature variable PSD, target variable Kf, data set "Top-Clay" (R^2)
    - Performance_PSD_Kf_clay_mse.csv - feature variable PSD, target variable Kf, data set "Top-Clay" (MSE)
    - Performance_PSD_Kf_por_r2.csv - feature variable PSD, target variable Kf, data set "Top-Por" (R^2)
    - Performance_PSD_Kf_por_mse.csv - feature variable PSD, target variable Kf, data set "Top-Por" (MSE)
    - Performance_dX_Kf_topall_r2.csv - feature variables grain diameters (d_X), target variable Kf, data set "Top-All" (R^2)
    - Performance_dX_Kf_topall_mse.csv - feature variables grain diameters (d_X), target variable Kf, data set "Top-All" (MSE)
    - Performance_dX_Kf_por_r2.csv - feature variables grain diameters (d_X), target variable Kf, data set "Top-Por" (R^2)
    - Performance_dX_Kf_por_mse.csv - feature variables grain diameters (d_X), target variable Kf, data set "Top-Por" (MSE)
    - Performance_dX_por_Kf_por_r2.csv - feature variables grain diameters (d_X) and porosity, target variable Kf, data set "Top-Por" (R^2)
    - Performance_dX_por_Kf_por_mse.csv - feature variables grain diameters (d_X) and porosity, target variable Kf, data set "Top-Por" (MSE)
    - Performance_PSD_por_por_r2.csv - feature variable PSD, target variable porosity, data set "Top-Por" (R^2)
    - Performance_PSD_por_por_mse.csv - feature variable PSD, target variable porosity, data set "Top-Por" (MSE)
  - Figures_paper/ - figures of results as displayed in the main manuscript of the accompanying publication:
    - Fig01_Bar_NSE_PSD_Kf_topall.pdf
    - Fig02_Bar_NSE_PSD_Kf_soiltypes.pdf
    - Fig03_Scatter_Measured_topall.pdf
    - Fig04_Scatter_RF_Barr.pdf
    - Fig05_Feature_importance_RF_topall.pdf
    - Fig06_Scatter_Measured_dX.pdf
    - Fig07_Bar_NSE_features.pdf
  - Figures_SI/ - figures of results as displayed in the supporting information (SI) of the accompanying publication:
    - SI_Fig_Bar_NSE_dX_Kf_por.pdf
    - SI_Fig_Bar_NSE_dX_Kf_topall.pdf
    - SI_Fig_Bar_NSE_dX_por_Kf_por.pdf
    - SI_Fig_Bar_NSE_PSD_Kf_clay.pdf
    - SI_Fig_Bar_NSE_PSD_Kf_por.pdf
    - SI_Fig_Bar_NSE_PSD_Kf_sand.pdf
    - SI_Fig_Bar_NSE_PSD_Kf_silt.pdf
    - SI_Fig_Bar_NSE_PSD_por_por.pdf
    - SI_Fig_FeatureImportance_RF_soils.pdf
    - SI_Fig_FeatureImportance_topall.pdf
    - SI_Fig_Histogram_Kf.pdf
    - SI_Fig_Scatter_Kemp.pdf
    - SI_Fig_Scatter_Measured_clay.pdf
    - SI_Fig_Scatter_Measured_dX_por_Kf.pdf
    - SI_Fig_Scatter_Measured_PSD_por.pdf
    - SI_Fig_Scatter_Measured_sand.pdf
    - SI_Fig_Scatter_Measured_silt.pdf
- src/ - contains all scripts used for data analyses and plotting of results:
  - PSD_Analysis.py - script containing class "PSD_Analysis" for analysis of PSD (e.g. calculation of dX values, lithoclass)
  - PSD_K_empirical.py - script containing class "PSDtoK_Empirical" to calculate Kf from 15 different empirical formulas based on PSD information
  - PSD_2K_ML.py - script containing class "PSD2KML" to perform machine learning on a data set
  - data_dictionaries.py - script containing dictionaries with hyperparameters for the 6 ML algorithms, all feature/target combinations and all data (sub)sets
  - 00_data_processing.py - preprocessing of the raw data into a dataframe stored as a CSV file in a standard format
  - 01_sample_data_statistics.py - script performing data analysis of PSD and derived quantities (e.g. d10, d50, d60, etc.) for all sub-datasets; results are saved to "./results/Data_analysis/"
  - 02_K_empiricial.py - script calculating Kf from PSD information for the TopIntegraal data set, using the empirical formulas implemented in class "PSDKempirical"
  - 03_ML_Hyperparam.py - script performing hyperparameter testing for a list of algorithms and a selected data set (based on soil type)
  - 03_ML_Hyperparam_GridSearch.py - script performing hyperparameter testing using grid search for a selected algorithm and data set type
  - 03_ML_Hyperparam_skopt.py - script performing hyperparameter testing using scikit-optimize (skopt) for a selected algorithm and data set type
  - 04_ML_TrainingPerformance.py - script evaluating the performance of all six ML algorithms after training
  - 04_ML_TrainingPerformance_all.py - script evaluating the performance of a selected ML algorithm
  - F01_Bar_NSE_AllAlgorithms_TopAll.py - reproducing Figure 1 of the manuscript
  - F01_Bar_NSE_AllAlgorithms_single.py - reproducing each subplot of Figure 1 of the manuscript
  - F02_Bar_NSE_AllAlgorithms_soils.py - reproducing Figure 2 of the manuscript
  - F03_Scatter_vs_Measured.py - reproducing Figure 3 of the manuscript
  - F03_Scatter_vs_Measured_single.py - reproducing the subplots of Figure 3 of the manuscript
  - F04_Scatter_vs_Empiricial.py - reproducing Figure 4 of the manuscript
  - F05_FeatureImportance.py - reproducing Figure 5 of the manuscript
  - F06_Scatter_vs_Measured_dX.py - reproducing Figure 6 of the manuscript
  - F07_Bar_NSE_AllAlgorithms_features.py - reproducing Figure 7 of the manuscript
  - SI_Bar_NSE_AllAlgorithms.py - reproducing the bar plot figures in the SI
  - SI_Fig_FeatureImportance_RF_soils.py - reproducing figures on feature importance in the SI
  - SI_Fig_FeatureImportance_topall.py - reproducing figures on feature importance in the SI
  - SI_Histogram_Kf_ML.py - producing the figure with histograms of estimated Kf in the SI
  - SI_Histogram_Measured_soils.py - reproducing the figure with histograms of measured Kf in the SI
  - SI_plot_PSD.py - reproducing the figure with PSD curves in the SI
  - SI_Scatter_Kemp.py - reproducing the figure with scatter plots of the empirical formulas in the SI
  - SI_Scatter_vs_Measured.py - reproducing the figures with scatter plots of Kf in the SI
  - SI_Scatter_vs_Measured_por.py - reproducing the figures with scatter plots of porosity in the SI
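The data layout described for data_PSD_Kf_por.csv (PSD columns whose headers start with F, plus a logK target column) fits naturally with a scikit-learn grid search, the kind of tuning named in 03_ML_Hyperparam_GridSearch.py. The sketch below shows that combination.

Several parts of it are assumptions: the grid values, the 80/20 split, the five-fold cross-validation, the example headers such as "F63", and the assumption that logK is a base-10 logarithm. It is not the repository's PSD2KML class. Because the data are not yet published, the demo builds a synthetic frame instead of reading the CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split


def features_and_target(df, target="logK"):
    """Use all columns whose header starts with 'F' (PSD fractions) as features."""
    feature_cols = [c for c in df.columns if str(c).startswith("F")]
    data = df.dropna(subset=feature_cols + [target])  # drop incomplete samples
    return data[feature_cols].to_numpy(), data[target].to_numpy()


def tune_random_forest(X, y, seed=42):
    """Grid search over a small, hypothetical RF grid; returns best params and test R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    grid = {"n_estimators": [50, 100], "max_depth": [5, 10, None]}
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed), grid, cv=5, scoring="r2"
    )
    search.fit(X_train, y_train)  # 5-fold CV on the training split only
    return search.best_params_, search.score(X_test, y_test)


if __name__ == "__main__":
    # Synthetic frame mimicking the documented layout; headers like "F63" are hypothetical.
    rng = np.random.default_rng(0)
    cols = [f"F{size}" for size in (2, 63, 125, 250, 500, 2000)]
    df = pd.DataFrame(rng.dirichlet(np.ones(6), size=200) * 100, columns=cols)
    df["Kf"] = 10 ** (-1 + 0.03 * df["F500"] + rng.normal(0, 0.1, size=200))  # m/d
    df["logK"] = np.log10(df["Kf"])  # base-10 log assumed
    best_params, test_r2 = tune_random_forest(*features_and_target(df))
    print(best_params, round(test_r2, 2))
```

With a real file, pd.read_csv("data/data_PSD_Kf_por.csv") would take the place of the synthetic frame, assuming the column headers match the description above.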
Python environment
To run the scripts locally, clone the repository and (optionally) create a virtual environment. You can do this by running the following commands in a terminal:
```sh
cd path/to/project_folder/PSD_Analysis_MachineLearning
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt
```
To activate the environment on Windows, use the command venv\Scripts\activate instead.
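After installation, you can check that the required packages are present by querying their installed versions with the standard library's importlib.metadata. This is a convenience sketch and is not part of the repository. The package list is copied from the Dependencies section of this page and may not match requirements.txt exactly.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed under Dependencies for this repository (assumed complete).
REQUIRED = (
    "matplotlib", "numpy", "openpyxl", "pandas",
    "scikit-learn", "scikit-optimize", "scipy", "xgboost",
)


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, ver in installed_versions().items():
        print(f"{name:16s} {ver or 'NOT INSTALLED'}")
```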
Contact
You can contact us via a.zech@uu.nl.
License
MIT © 2025
Owner
- Login: AlrauneZ
- Kind: user
- Website: https://www.uu.nl/staff/AZech?t=0
- Repositories: 2
- Profile: https://github.com/AlrauneZ
GitHub Events
Total
- Delete event: 2
- Push event: 9
- Pull request event: 6
- Create event: 1
Last Year
- Delete event: 2
- Push event: 9
- Pull request event: 6
- Create event: 1
Dependencies
- matplotlib *
- numpy *
- openpyxl *
- pandas *
- python =3.10
- scikit-learn =1.0
- scikit-optimize =0.9
- scipy *
- xgboost *