udiff

Library for different numerical differentiation methods.

https://github.com/forstertim/udiff

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: sciencedirect.com, science.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.7%) to scientific vocabulary

Repository

Library for different numerical differentiation methods.

Basic Info
  • Host: GitHub
  • Owner: forstertim
  • Language: Python
  • Default Branch: master
  • Size: 250 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Citation

README.md

udiff

This package specifically targets applications in kinetic model development for chemical and biochemical reactions. Examples are given in Forster et al. (2024).

The situation considered is the following: The modeler has access to noisy concentration data (e.g., from a batch reactor). A model structure for the dynamics is chosen by the modeler (e.g., $\frac{dy}{dt}=k\cdot y$). The modeler then needs to estimate the parameters (e.g., the rate constant $k$ in this example).

One possible way to estimate the parameters in such a situation is to calculate the derivatives $\frac{dy}{dt}$ from the noisy data. With these derivative estimates, the model equation can be solved for the parameters. However, this is not trivial when the data is scarce and noisy.
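For the toy model $\frac{dy}{dt}=k\cdot y$ above, this derivative-based estimation can be sketched with synthetic, noise-free data (a minimal illustration of the concept, not udiff code; all names are made up):

```python
import numpy as np

k_true = 0.5
t = np.linspace(0.0, 2.0, 50)
y = np.exp(k_true * t)        # solution of dy/dt = k*y with y(0) = 1

# Estimate the derivative from the data (central differences)
dy_dt = np.gradient(y, t)

# The model dy/dt = k*y is linear in k, so a least-squares
# estimate of k follows directly:
k_hat = float(np.dot(y, dy_dt) / np.dot(y, y))
```

With noisy data, the quality of `dy_dt` degrades quickly, which is exactly the problem the smoothing step below addresses.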

The performance of the model heavily depends on the accuracy of the derivative calculation in the first step. This package includes a collection of tools for simplifying this process. It includes a smoothing and a differentiation module. Both of them are discussed below.

Installation

If you are a git user, install the package with:

```shell
pip install git+https://github.com/forstertim/udiff.git
```

Alternatively, you can clone the repository and install it. To easily add your own case studies, install in developer mode (the -e flag in the pip install command):

```shell
git clone https://github.com/forstertim/udiff.git
cd udiff
pip install -e .
```

If you do not use git, click the Code button and select Download ZIP. Unzip the files and store the folder in a place of your choice. Open a terminal inside that folder (you should see the setup.py file) and run:

```shell
pip install -e .
```

By using pip list, you should see the package installed in your environment (together with the path to the corresponding folder if you installed in developer mode).

Included tools

Smoothing

As mentioned above, the presence of noise can make the calculation of derivatives very hard. The smoothing module includes functions that first create a smooth version of the noisy data. Currently, the concept is to fit an analytical function through the data, since such a function can then be differentiated analytically.

The following methods are available:

  • Polyfit: A polynomial is fit through the data points. The user can indicate the maximum degree of the polynomial. The algorithm searches for the lowest Bayesian Information Criterion (BIC) and returns the coefficients of the most appropriate polynomial, which can then be differentiated analytically. This package includes an improved version of the approach applied to kinetic model development for biochemical reactions in Forster et al. (2023) and Forster et al. (2024).

  • Symbolic regression fit: Symbolic regression searches for an algebraic function that fits the data appropriately. Currently, the Bayesian Machine Scientist (BMS) Guimerà et al. (2020) is implemented in this package. The user can define the allowed unary/binary operators (e.g., +, -, div, mul) and the number of iteration steps (Markov chain Monte Carlo steps, MCMC). The function returns a sympy expression that can subsequently be differentiated analytically.

  • Fit of function library: This module is currently under development. The user provides a library of candidate functions. The algorithm fits all of them and returns the most appropriate one according to a fitting metric (currently the root mean squared error, RMSE). The function returns a sympy expression that can subsequently be differentiated analytically.
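The idea behind the function-library fit can be sketched as follows (a conceptual illustration using scipy, not the udiff implementation, which is still under development; the candidate library and data are made up):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical candidate library: name -> model function
library = {
    "linear":      lambda x, a, b: a * x + b,
    "exponential": lambda x, a, b: a * np.exp(b * x),
    "power":       lambda x, a, b: a * x**b,
}

# Synthetic noisy data generated from an exponential
rng = np.random.default_rng(0)
x = np.linspace(0.1, 2.0, 30)
y = 2.0 * np.exp(0.8 * x) * (1 + 0.01 * rng.normal(size=x.size))

# Fit every candidate and keep the one with the lowest RMSE
best_name, best_rmse = None, np.inf
for name, f in library.items():
    try:
        popt, _ = curve_fit(f, x, y, p0=[1.0, 1.0], maxfev=5000)
    except RuntimeError:  # fit did not converge for this candidate
        continue
    rmse = np.sqrt(np.mean((f(x, *popt) - y) ** 2))
    if rmse < best_rmse:
        best_name, best_rmse = name, rmse
```

The winning candidate can then be expressed symbolically and differentiated analytically, as the module description suggests.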

Differentiation

Smoothed data can be differentiated analytically via the sympy package (after a function-library or symbolic regression fit) or via polynomial differentiation (after a polyfit).
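The sympy route can be sketched as follows (the expression here is a stand-in for whatever a symbolic fit returns; `f_prime` is an illustrative name, not part of the udiff API):

```python
import numpy as np
import sympy as sp

x = sp.Symbol("x")
expr = x**2 + 3*sp.exp(x)   # stand-in for an expression returned by a symbolic fit
dexpr = sp.diff(expr, x)    # analytical derivative: 2*x + 3*exp(x)

# Turn the symbolic derivative into a fast numerical function
f_prime = sp.lambdify(x, dexpr, "numpy")
values = f_prime(np.linspace(0.0, 1.0, 5))
```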

In addition, numerical differentiation techniques can be used. If discrete data is handed to the differentiator object, no smoothing is performed and the data is directly numerically differentiated. Currently, the following options are implemented:

  • Forward finite differences (FD): The classical forward finite difference method is applied.
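A minimal sketch of the forward scheme (the helper below is illustrative, not part of the udiff API; derivative estimates are assigned to the interval midpoints):

```python
import numpy as np

def forward_differences(t, y):
    """Classical forward finite differences:
    dy_i = (y[i+1] - y[i]) / (t[i+1] - t[i]).

    Returns the interval midpoints of t and the derivative
    estimates (one fewer point than the input).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    dy = np.diff(y) / np.diff(t)
    t_mid = t[:-1] + np.diff(t) / 2.0
    return t_mid, dy

t = np.linspace(0.0, 1.0, 11)
y = t**2                     # true derivative: 2*t
t_mid, dy = forward_differences(t, y)
```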

Examples

Polynomial fit to exponential function

The example code is stored in examples/main_polynomial_fit.py.

Import some required packages and get the smoothing/differentiation modules of udiff:

```python
import numpy as np
import random
random.seed(0)
import matplotlib.pyplot as plt
from udiff.smooth import smooth_polynomial
from udiff.differentiate import differentiator
```

Now, we can generate some data and add some noise to it:

```python
nsamples = 20  # number of samples (not defined in the original snippet)
X = np.linspace(1e-3, np.pi, nsamples)
Y_real = X**2 + 3*np.exp(X)
np.random.seed(0)
noise = np.random.normal(0, 0.1, len(Y_real))
Y = Y_real * (1 + noise)
dY = 2*X + 3*np.exp(X)  # analytical ground-truth derivative
```

Then, we can create an instance of the polynomial smoothing object. The x- and y-data have to be handed over.

```python
obj = smooth_polynomial(x=X, y=Y)
```

Subsequently, the polynomials are fit. We choose a maximum degree of d=8. This means polynomials with degrees 1, 2, ..., 8 are fit and the one with the lowest BIC is returned. The best fit can be plotted.

```python
obj.fit_polynomial(maxdeg=8)
obj.plot_fit()
```

Fig 1. Example of a polynomial fit through noisy data.

Finally, we differentiate the fitted polynomial analytically and assess the accuracy (the R2 value is also computed inside compare_calculated_and_real_derivatives) by plotting the real and calculated derivatives as profiles and as observed-vs-predicted plots:

```python
diffobj = differentiator(obj)
diffobj.differentiate()
diffobj.compare_calculated_and_real_derivatives(calc_diff=diffobj.y_diff,
                                                real_diff=dY,
                                                type_of_comp=['OVP', 'profile'])
```

Fig 2. Calculated derivatives with underlying ground truth.

Fig 3. Observed-vs-predicted (OVP) plot of the derivatives.

The plots are visualized with plt.show().

BMS fit to exponential

The example code is stored in examples/main_bms_fit.py.

Similar to the example above, one can create an instance of the BMS smoothing object. The x- and y-data have to be handed over.

```python
obj = smooth_bms(x=X, y=Y)
```

Subsequently, a BMS model is fit. Several stopping criteria are available:

  • nsteps: The number of MCMC steps. As soon as the indicated value (integer) is reached, the fit stops and returns the best fit identified up to that point.
  • maxtime: The maximum time in seconds. If the indicated value (integer) is reached, the fit stops and returns the best fit identified up to that point.
  • minr2: The minimum value of the coefficient of determination (R2). If the indicated value (float) is reached, the fit stops and returns the best fit identified up to that point.

Additionally, the name of the pickle file can be given if the model should be saved.

```python
obj.fit_bms(nsteps=150, maxtime=100, minr2=0.98, savename='bms_series1')
obj.plot_fit()
```

Fig 4. Example of a BMS fit through noisy data.

Finally, we differentiate the identified BMS expression analytically and assess the accuracy (the R2 value is also computed inside compare_calculated_and_real_derivatives) by plotting the real and calculated derivatives as profiles and as observed-vs-predicted plots:

```python
diffobj = differentiator(obj)
diffobj.differentiate()
diffobj.compare_calculated_and_real_derivatives(calc_diff=diffobj.y_diff,
                                                real_diff=dY,
                                                type_of_comp=['OVP', 'profile'])
```

Fig 5. Calculated derivatives with underlying ground truth.

Fig 6. Observed-vs-predicted (OVP) plot of the derivatives.

The plots are again visualized with plt.show().

Bioprocess

The example code is stored in examples/main_bms_fit_bioprocess.py.

After creating some data by solving an ODE model and adding some noise, each species of the bioprocess can be analyzed individually. Here, the x- and y-data are the time vector and the concentration profile of each species. The ground-truth derivatives are also calculated:

```python
data = datagen()  # <- Check class in examples/main_bms_fit_bioprocess.py
data.solve_ODE_model()
data.addnoise_per_species()
data.evaluate_true_derivatives()
```

Subsequently, a BMS model is fit to each concentration profile individually, using the same stopping criteria as above:

```python
# Create empty arrays for fitted profiles
data.y_smooth = np.zeros(data.y.shape)
data.y_diff = np.zeros(data.y.shape)
data.x_diff_fd = np.zeros((data.y.shape[0]-1, data.y.shape[1]))
data.y_diff_fd = np.zeros((data.y.shape[0]-1, data.y.shape[1]))

# Fit profiles
for specid in range(data.y.shape[1]):  # Each species individually
    X = data.tspan
    Y = data.y_noisy[:, specid]
    # Create new object
    obj = smooth_bms(x=X, y=Y, scaling=False)
    # Fit profiles
    obj.fit_bms(nsteps=1e4, maxtime=3.6e3, minr2=0.999,
                show_update=True, update_every_n_seconds=200)
    # Store data
    data.y_smooth[:, specid] = obj.y_smooth
    # Differentiate algebraic equation
    diffobj = differentiator(obj)
    diffobj.differentiate()
    # Store data
    data.y_diff[:, specid] = diffobj.y_diff
    # Calculate finite differences for comparison
    yder_FD = []
    xder_FD = []
    dt = (data.tspan[1] - data.tspan[0]) / 2
    for t in range(len(data.tspan) - 1):
        xder_FD.append(data.tspan[t] + dt)
        yder_FD.append((Y[t+1] - Y[t]) / (data.tspan[t+1] - data.tspan[t]))
    data.x_diff_fd[:, specid] = np.array(xder_FD)
    data.y_diff_fd[:, specid] = np.array(yder_FD)
```

As a comparison to the derivatives calculated by the symbolic regression approach, the forward finite difference method is applied. The results are compared in the following plots.

Fig 7. Example of a BMS fit through noisy bioprocess data. The top row shows concentration profiles (ground truth (dashed black line), noisy observations (black circles), BMS fit (red solid line)). The bottom row shows the derivatives (ground truth (dashed black line), forward finite differences (blue line with diamonds), BMS calculations (red line with crosses)).

Citation

If you use this package in your research, please cite the following publication:

Forster et al., Machine learning uncovers analytical kinetic models of bioprocesses. 2024. Chemical Engineering Science. URL.

Additional References

Forster et al., Modeling of bioprocesses via MINLP-based symbolic regression of S-system formalisms. 2023. Computers & Chemical Engineering. URL

Guimerà et al., A Bayesian machine scientist to aid in the solution of challenging scientific problems. 2020. Science Advances. URL

Owner

  • Name: Tim Forster
  • Login: forstertim
  • Kind: user
  • Location: Zurich
  • Company: ETH Zurich

PhD student in the sustainable process systems engineering laboratory at ETH Zurich.

Citation (CITATION.cff)

cff-version: 1.2.0
authors:
- family-names: "Forster"
  given-names: "Tim"
title: "In-Silico Data Generation in Python (InSiDa-Py)"
url: "https://github.com/forstertim/insidapy"
preferred-citation:
  type: article
  authors:
  - family-names: "Forster"
    given-names: "Tim"
  - family-names: "Vázquez"
    given-names: "Daniel"
  - family-names: "Mueller"
    given-names: "Claudio"
  - family-names: "Guillén-Gosálbez"
    given-names: "Gonzalo"
  doi: "10.1016/j.ces.2024.120606"
  journal: "Chemical Engineering Science"
  title: "Machine learning uncovers analytical kinetic models of bioprocesses"
  issue: 120606
  year: 2024


Dependencies

setup.py pypi
  • matplotlib *
  • numpy *
  • prettytable *
  • scikit-learn *
  • scipy *
  • seaborn *