https://github.com/andreartelt/a-benchmark-for-physics-informed-machine-learning-of-chlorine-states-in-water-distribution-networks

This repository contains the benchmark and the implementation of the experiments from the paper "A Benchmark for Physics-informed Machine Learning of Chlorine States in Water Distribution Networks" by Luca Hermes, André Artelt, Stelios Vrachimis, Marios Polycarpou, and Barbara Hammer

Keywords

benchmarks epanet water-distribution-networks water-quality


Repository


Basic Info

Host: GitHub
Owner: andreArtelt
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 215 KB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0


Created over 1 year ago · Last pushed 12 months ago


README.md

A Benchmark for Physics-informed Machine Learning of Chlorine States in Water Distribution Networks

This repository contains the benchmark and the implementation of the experiments from the paper "A Benchmark for Physics-informed Machine Learning of Chlorine States in Water Distribution Networks" by Luca Hermes, André Artelt, Stelios Vrachimis, Marios Polycarpou, and Barbara Hammer.

Abstract

Ensuring high-quality drinking water is a critical responsibility of water utilities, with chlorine being the main disinfectant typically used. Accurate estimation of chlorine concentrations in the dynamic environment of water distribution networks (WDNs) is essential to ensure safe water supply. This work introduces a comprehensive and carefully created benchmark for training and evaluation of chlorine concentration estimation methodologies in WDNs. The benchmark includes a diverse dataset of 18,000 scenarios of the widely studied 'Hanoi', 'Net1', and the more recent and complex 'CY-DBP' water networks, featuring various chlorine injection patterns to capture diverse physical dynamics. To provide baseline evaluations, we propose and evaluate two neural surrogate models for chlorine state estimation: a physics-informed Graph Neural Network (GNN) and a physics-guided Recurrent Neural Network (RNN).

Benchmark for Chlorine State Estimation

Data Set

This benchmark data set concerns the transport, decay, and mixing of chlorine in water distribution networks (WDNs). It contains three different WDNs (Net1, Hanoi, and CY-DBP), each with 1,000 scenarios (differing in demand patterns and network parameters) and several chlorine injection patterns. The data set was created and simulated using EPyT-Flow.

For the 'Net1' and 'Hanoi' networks, we build on the LeakDB data set and generate 1,000 scenarios based on Net1 and another 1,000 scenarios based on the Hanoi network -- the demand patterns of those scenarios are the same as in LeakDB, and each scenario is 30 days long (30-minute time steps). For the CY-DBP network, we mimic the uncertainties from LeakDB to generate 1,000 similar but slightly different scenarios.

In each scenario, a chlorine pump (specifying the chlorine concentration of the external inflow) is installed at the reservoir -- note that the CY-DBP network has two reservoirs. The chlorine concentration over time is monitored at each node and link (pipe) in the network. The flow rate at each link (pipe) is monitored as well. All bulk and wall reaction coefficients are set to zero, i.e., only the transport, decay, and mixing of chlorine are simulated. For each scenario, three different chlorine injection patterns were applied: a spike pattern, a periodic wave-like pattern, and a random pattern -- see Figure below.

Furthermore, for each case there also exists a scenario in which the demand pattern was randomized in order to break any correlation between the demand pattern and the chlorine concentration -- see the paper for details.

In total, the data set contains 18,000 scenarios. The data generation process is implemented in create_data.py.
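The total of 18,000 scenarios stated in the abstract follows from the description above: three networks, 1,000 base scenarios each, three injection patterns, and a randomized-demand variant of each case. The following is a minimal sketch of that count. The descriptor strings are illustrative assumptions; only "Hanoi" and "spike" appear in the loading example of this README.

```python
from itertools import product

# Illustrative labels (assumptions); only "Hanoi" and "spike" are taken
# from the README's loading example.
networks = ["Net1", "Hanoi", "CY-DBP"]
injection_patterns = ["spike", "wave", "random"]
random_demands = [False, True]
scenarios_per_network = 1000

# Every combination of network, injection pattern, and demand variant.
configurations = list(product(networks, injection_patterns, random_demands))
total_scenarios = len(configurations) * scenarios_per_network

print(len(configurations), "configurations,", total_scenarios, "scenarios")
```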

How to load the data set in Python

The data set can be loaded by utilizing the DataLoader class. Besides loading the data itself, the DataLoader class also provides a method (load_network_topology()) for loading the topology of the WDN as an epyt_flow.topology.NetworkTopology instance.

```python
# DataLoader is provided by this repository

# Load all 1000 Hanoi scenarios with the spike injection pattern
# Goal: Predict Cl concentration at node "2"
d = DataLoader()
X, y = d.load_data_from_scenarios(net_desc="Hanoi", random_demands=False,
                                  cl_injection_pattern_desc="spike",
                                  target_node_id="2")

# Load the topology of the Hanoi network
hanoi_topology = d.load_network_topology(net_desc="Hanoi")
```

Evaluation

We propose assessing performance using several carefully chosen metrics. We aim for metrics that are easy to interpret as well as metrics that are well suited to the specific characteristics of the proposed benchmark. The metrics are stated for a single node prediction, denoted as $\hat{y}_i$; they can be extended to multiple nodes by, for example, averaging over all nodes $i$:

- **Non-negativity of the predicted chlorine concentrations.** This evaluates a trivial physical plausibility of the predicted concentrations:
  $\sum_{t=1}^{T} \Bbb{1}(\hat{y}_i(t) \geq 0)$
- **Upper bound of a physically plausible chlorine concentration.** This depends on the maximum concentration at the injection points during past time steps, which may still influence the system. This concept is referred to as the memory of the system and relates to the maximum time required for water from the injection location to reach any node in the network. Given that flow rates are assumed to be known in this study, the maximum transport times for all nodes can be explicitly calculated. This also relates to the accuracy of predictions for each node, as a longer transport time implies a greater range of past inputs that can affect the output, introducing more uncertainty into the model.
  $\frac{1}{T-K}\sum_{t=K}^T \Bbb{1}\left(\hat{y}_i(t) \leq \underset{k\in[t-K, t]}{\max}(y_r(k))\right)$
  where $K$ refers to the maximum transport time in the water network, and $y_r$ refers to the chlorine concentration at the injection location over time -- note that this generalizes to multiple injection sources.
- **Mean absolute error (MAE).** A standardized and easy-to-interpret error metric:
  $\frac{1}{T}\sum_{t=1}^{T} \left|\hat{y}_i(t) - y_i(t)\right|$
  where $T$ refers to the length of the time horizon -- i.e. the length of the simulated scenario.
- **Running MAE.** This evaluates the performance over time by accumulating the performance up to a given time point. With it, we evaluate whether the performance remains stable and robust throughout the entire duration. The running MAE is a function that maps a time horizon $k\leq T$ to the accumulated performance (i.e. MAE):
  $f(k) = \frac{1}{k}\sum_{t=1}^{k} \left|\hat{y}_i(t) - y_i(t)\right|$
- **Over-/underestimation bias.** The amount of chlorine concentration by which the model over- or underestimates -- i.e. a bias towards over- or undershooting the true concentration. For this purpose, we sum up all positive and all negative errors and take their difference. A result close to zero indicates no bias, whereas a positive/negative result indicates a bias towards over/underestimating the true concentration:
  $\left(\sum_{t=1}^T \max(0, e_i(t))\right) - \left(\sum_{t=1}^T \max(0, -1 \cdot e_i(t))\right)$
  where $e_i(t) = \hat{y}_i(t) - y_i(t)$
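The metrics above can be sketched in plain NumPy for a single node. This is an illustrative sketch, not the repository's Evaluator class: the function names, the toy values, and the 0-indexed reading of the upper-bound sum (time steps K..T-1, i.e. T-K terms) are assumptions made here.

```python
import numpy as np


def non_negativity(y_pred):
    """Number of time steps with a non-negative prediction."""
    return int(np.sum(y_pred >= 0))


def upper_bound(y_pred, y_injection, K):
    """Fraction of time steps t >= K (0-indexed) whose prediction does not
    exceed the maximum injection concentration in the window [t-K, t]."""
    T = len(y_pred)
    hits = [y_pred[t] <= np.max(y_injection[t - K:t + 1]) for t in range(K, T)]
    return float(np.mean(hits))


def mae(y_pred, y_true):
    """Mean absolute error over the whole time horizon."""
    return float(np.mean(np.abs(y_pred - y_true)))


def running_mae(y_pred, y_true):
    """f(k) for k = 1..T: MAE accumulated over the first k time steps."""
    abs_err = np.abs(y_pred - y_true)
    return np.cumsum(abs_err) / np.arange(1, len(abs_err) + 1)


def bias(y_pred, y_true):
    """Sum of positive errors minus sum of negative error magnitudes."""
    e = y_pred - y_true
    return float(np.sum(np.maximum(0, e)) - np.sum(np.maximum(0, -e)))


# Toy values (made up for illustration only)
y_true = np.array([0.5, 0.5, 0.25, 0.25])
y_pred = np.array([0.5, 0.75, 0.0, -0.25])
y_injection = np.array([0.5, 0.5, 0.5, 0.5])

print("non-negativity:", non_negativity(y_pred))
print("upper bound:", upper_bound(y_pred, y_injection, K=1))
print("MAE:", mae(y_pred, y_true))
print("running MAE:", running_mae(y_pred, y_true))
print("bias:", bias(y_pred, y_true))
```

In this toy example the last value of the running MAE equals the overall MAE, and the bias is negative because the predictions mostly undershoot the true concentration.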

In addition to averaging scores across all nodes, we recommend comparing scores for each node individually. This approach allows us to assess whether errors and performance are uniformly distributed throughout the WDN or if certain nodes pose greater challenges than others. For example, a method's performance might be influenced by the node's distance from the reservoir (chlorine injection point) -- i.e. predicting concentrations at nodes located farther away could be more challenging than at nodes closer to the injection site. One could also group nodes according to their transport delay -- i.e. the time (e.g. time steps) it takes for a substance to travel from the injection point to the node of interest. Note that the transport delay might differ from the spatial distance (e.g. shortest path).
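A per-node comparison and a grouping by transport delay could look as follows. This is a minimal sketch under assumptions: predictions are taken to be arrays of shape (time steps, nodes), and the transport delays are hypothetical values that would have to be computed from the known flow rates.

```python
import numpy as np


def per_node_mae(Y_pred, Y_true):
    """MAE per node for arrays of shape (T, n_nodes)."""
    return np.mean(np.abs(Y_pred - Y_true), axis=0)


def mean_score_by_group(scores, group_ids):
    """Average per-node scores within each group (e.g. transport delay)."""
    return {int(g): float(np.mean(scores[group_ids == g]))
            for g in np.unique(group_ids)}


# Toy data (made up): 2 time steps, 3 nodes
Y_true = np.zeros((2, 3))
Y_pred = np.array([[0.5, 1.0, 2.0],
                   [0.5, 1.0, 2.0]])

# Hypothetical transport delay (in time steps) of each node
transport_delay_steps = np.array([1, 1, 5])

node_scores = per_node_mae(Y_pred, Y_true)
grouped_scores = mean_score_by_group(node_scores, transport_delay_steps)

print("per-node MAE:", node_scores)
print("MAE by transport delay:", grouped_scores)
```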

How to do the evaluation in Python

The evaluation metrics are implemented in the Evaluator class. Besides methods for computing the individual evaluation metrics, there is also a method evaluate_predictions() for computing all evaluation metrics at once.

```python
# Load test data
X_test, y_test = ....

# Get chlorine concentration at injection node
cl_injection = ....

# Predict Cl states
y_test_pred = ....

# Evaluate predictions using all proposed metrics
print(Evaluator.evaluate_predictions(y_test_pred, y_test, cl_injection))
```

How to Run the Experiments from the Paper

Make sure to first download the data set, unpack everything, and put it into a folder named "data" in the root directory of this repository.

The experiments regarding the RNN are implemented in run_exp_rnn.py (and experiments_rnn.py) -- all configurations can be run by executing the slurm scripts run_exp_rnn_net1.job.sbatch, run_exp_rnn_hanoi.job.sbatch, and run_exp_rnn_cydbp.job.sbatch.

The experiments regarding the GNN are implemented in graph_pde.py -- all configurations can be run by executing the bash script run_gnn_experiments.sh.

License

MIT license - See LICENSE.

How to Cite?

@article{machinelearningchlorinestateestimationbenchmark2025,
    author = {Luca Hermes and André Artelt and Stelios G. Vrachimis and Marios M. Polycarpou and Barbara Hammer},
    title = {{A Benchmark for Physics-informed Machine Learning of Chlorine States in Water Distribution Networks}},
    year = {2025},
    journal = {SN Computer Science},
    volume = {6},
    doi = {10.1007/s42979-025-04008-y},
    url = {https://doi.org/10.1007/s42979-025-04008-y}
}

Owner

Name: André Artelt
Login: andreArtelt
Kind: user
Location: Germany
Company: Bielefeld University

Repositories: 3
Profile: https://github.com/andreArtelt

PhD student

GitHub Events

Total

Watch event: 1
Push event: 3

Last Year

Watch event: 1
Push event: 3


Science Score: 26.0%
