https://github.com/co822ee/pcr-globwb_error-correction
This includes all scripts and analysis results for the research where random forests are used to correct errors in PCR-GLOBWB discharge predictions.
Basic Info
- Host: GitHub
- Owner: co822ee
- License: gpl-3.0
- Language: R
- Default Branch: master
- Size: 227 MB
Error-correction of streamflow predictions from a global hydrological model using random forests
This is the repository of the research article "Error-correction of streamflow predictions from a global hydrological model using random forests" by Youchen Shen, Jessica Ruijsch, Meng Lu, Edwin Sutanudjaja, Derek Karssenberg from Utrecht University, the Netherlands.
This repository includes all observed data and simulated PCR-GLOBWB data, together with the modelling and analysis scripts, for the research in which random forests serve as an error-correction model for PCR-GLOBWB streamflow predictions.
The folder case_study/R/RdiffStation contains all the R and Python scripts for this research project.
Raw data
Raw data is available in the directory case_study/R/data/rawData:
- observed discharge (m^3/s) obtained from the Global Runoff Data Centre (GRDC) in csv format
- simulated discharge (m^3/s) from PCR-GLOBWB (calibrated & uncalibrated) in netCDF format
- meteorological driving variables averaged over the upstream grid cells of the gauging station in netCDF format
- key hydrological state variables from PCR-GLOBWB (calibrated & uncalibrated) averaged over the upstream grid cells of the gauging station in netCDF format
Before starting
case_study/R/data/rawData/stationLatLon.csv gives the location and basic geographic information of each station. This data was obtained from the GRDC station catalogues (accessed: 28/04/2020). The scripts described below are run for all stations that are listed in this csv file and that have calibrated PCR-GLOBWB predictions in the folder case_study/R/data/rawData/PCR-Discharge/calibrated/.
Data preprocessing
- All Python scripts extract variable values from the netCDF data at the desired locations, given their latitude (y) and longitude (x).
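The extraction step can be illustrated with a minimal sketch that looks up the grid cell nearest to a station. It is not the repository's code: it assumes a nearest-cell lookup on a regular latitude/longitude grid, uses plain lists instead of a netCDF file to stay dependency-free, and the function names, coordinates and values are all hypothetical.

```python
def nearest_index(coords, target):
    """Return the index of the coordinate value closest to target."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


def extract_at_station(grid, lats, lons, station_lat, station_lon):
    """Return the value of the grid cell whose centre is nearest to the station.

    grid is indexed as grid[lat_index][lon_index], i.e. one time step of a
    variable stored with (lat, lon) dimensions. This layout is an assumption.
    """
    i = nearest_index(lats, station_lat)
    j = nearest_index(lons, station_lon)
    return grid[i][j]


# Hypothetical 3 x 4 grid of discharge values (m^3/s) with 0.5-degree spacing.
lats = [52.25, 51.75, 51.25]
lons = [4.25, 4.75, 5.25, 5.75]
discharge = [
    [10.0, 11.0, 12.0, 13.0],
    [20.0, 21.0, 22.0, 23.0],
    [30.0, 31.0, 32.0, 33.0],
]

# Hypothetical station location; the nearest cell centre is (51.75, 5.25).
value = extract_at_station(discharge, lats, lons, station_lat=51.8, station_lon=5.2)
print(value)
```

With real data the same lookup would be applied to the latitude and longitude coordinate arrays read from the netCDF file, once per station listed in stationLatLon.csv.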
0preprocessDataallProcesses.R preprocesses the csv files created by the Python scripts in the previous step and creates new csv files that are used later to fit the random forests (RFs). In the script you can choose to preprocess data from either the calibrated or the uncalibrated PCR-GLOBWB model.
The csv files generated by this script are written to the data/preprocess folder, with the results for each PCR-GLOBWB calibration configuration in a separate subfolder.
Implementing RF for all model configurations
2RFallRFConfigurations.R implements RFs to correct streamflow prediction errors in PCR-GLOBWB.
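The arithmetic behind such an error correction can be sketched as follows. This is an illustration under assumptions, not the repository's R implementation: it assumes the model is trained on residuals (observed minus simulated discharge) and that the corrected prediction is the simulated value plus the predicted residual. To stay dependency-free, a mean-residual predictor stands in for the random forest; all names and numbers are hypothetical.

```python
def residuals(observed, simulated):
    """Residual = observed minus simulated discharge, per time step."""
    return [o - s for o, s in zip(observed, simulated)]


def fit_mean_residual(train_residuals):
    """Placeholder for the random forest (NOT a random forest).

    It ignores the predictors and always returns the mean training residual.
    """
    mean = sum(train_residuals) / len(train_residuals)
    return lambda predictors: mean


def correct(simulated, predictors, model):
    """Corrected discharge = simulated discharge + predicted residual."""
    return [s + model(p) for s, p in zip(simulated, predictors)]


# Hypothetical training period (m^3/s).
observed_train = [12.0, 15.0, 9.0]
simulated_train = [10.0, 12.0, 8.0]
model = fit_mean_residual(residuals(observed_train, simulated_train))

# Hypothetical test period; predictor names are assumptions for illustration.
simulated_test = [11.0, 14.0]
predictors_test = [{"precipitation": 1.2}, {"precipitation": 0.4}]
corrected = correct(simulated_test, predictors_test, model)
print(corrected)
```

In the actual workflow the placeholder would be replaced by a random forest fitted on the meteorological variables and, optionally, the PCR-GLOBWB state variables.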
At the top of the script you can choose:
- whether to run it for the calibrated or the uncalibrated PCR-GLOBWB model
- whether to include state variables as predictors in the random forests
This script runs the random forests for all stations provided and performs the following steps:
- tuning the random forest hyperparameters and creating csv files of the tuning results (hypergrid[station].csv)
- determining the optimal hyperparameters based on out-of-bag (OOB) RMSEs
- creating csv files of goodness-of-fit values for both the PCR-GLOBWB predictions and the predictions updated by the random forests (rfeval.csv gives absolute goodness-of-fit values and rfeval_r.csv gives relative ones)
- creating csv files of variable importance results (variable_importance.csv)
- creating csv files of streamflow predictions updated by the random forests (rfresult[station].csv)
The csv files generated by the script will be put in the data/analysis folder, and the results from different model configurations will be put in different subfolders.
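The selection of optimal hyperparameters from a tuning grid can be sketched as below: read the grid and keep the row with the lowest OOB RMSE. The column names (mtry, min_node_size, oob_rmse) and the values are assumptions made for illustration; the actual layout of the hypergrid csv files is not documented here.

```python
import csv
import io

# Hypothetical tuning results; the real files may use other columns.
HYPER_GRID_CSV = """mtry,min_node_size,oob_rmse
2,5,41.7
4,5,38.2
4,10,39.0
6,5,38.9
"""


def best_row(csv_text, score_column="oob_rmse"):
    """Return the grid row (as a dict of strings) with the lowest score."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return min(rows, key=lambda row: float(row[score_column]))


best = best_row(HYPER_GRID_CSV)
print(best)
```

Selecting on the OOB error avoids setting aside a separate validation set, because each tree is evaluated on the samples left out of its bootstrap draw.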
Visualizing results
3visualizationRF_allConfiguration.R generates a bar plot showing the performance of all model configurations (either pure PCR-GLOBWB or hybrid models in which PCR-GLOBWB is followed by RF error-correction).
3visualizehydrograph_all.R generates time series plots of observed and simulated discharge and of residuals for all configurations of the hybrid models, as well as scatterplots of residuals and predictions against observations.
All graphs generated by the above visualization scripts are exported to the graph/RFresult_all folder.
Function scripts
R scripts with names starting with 'function_' facilitate the model implementation.
Contact
For any questions regarding the source code and data, please contact the corresponding author, Ms. Youchen Shen, via email: co822ee@gmail.com or y.shen@uu.nl