pydaddy

Python package to discover stochastic differential equations from time series data

https://github.com/tee-lab/pydaddy

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 8 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org, nature.com, zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (5.8%) to scientific vocabulary

Keywords

python sde stochastic-differential-equations time-series time-series-analysis

Last synced: 11 months ago

Repository

Python package to discover stochastic differential equations from time series data

Basic Info

Host: GitHub
Owner: tee-lab
License: gpl-3.0
Language: Python
Default Branch: master
Homepage:
Size: 131 MB

Statistics

Stars: 96
Watchers: 7
Forks: 12
Open Issues: 0
Releases: 5

Topics

python sde stochastic-differential-equations time-series time-series-analysis

Created over 6 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

PyDaddy
  
Discovering stochastic dynamical equations from ecological time
series data, together with an easy-to-use Python package.

Citation to the manuscript
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Brückner,
D. B., Danny Raj, M., & Guttal, V., “Discovering stochastic dynamical
equations from ecological time series data”, arXiv preprint arXiv:2205.02645, to appear
in The American Naturalist.


Citation to the package
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Brückner,
D. B., Danny Raj, M., & Guttal, V. (2024). PyDaddy: A Python
Package for Discovering SDEs from Time Series Data (Version 1.1.1)
[Computer software]. https://github.com/tee-lab/PyDaddy


Authors and Contact Details

Corresponding Authors
Arshed Nabeel, IISc Mathematics Initiative and Centre for Ecological
Sciences, Indian Institute of Science, Bengaluru, Karnataka, 560012,
India. Email: arshed@iisc.ac.in and arshed.nabeel@gmail.com
Danny Raj M, Dept of Applied Mechanics and Biomedical Engineering,
IIT Madras, Chennai, 600036, India. Email: danny@iitm.ac.in
Vishwesha Guttal, Centre for Ecological Sciences, Indian Institute of
Science, Bengaluru, Karnataka, 560012, India. Email: guttal@iisc.ac.in


Code and Data Contributors
The code and package were written by Arshed Nabeel and Ashwin
Karichannavar.
The package uses two datasets from previously published papers, which
are also included with the package.

Fish dataset: Jhawar, Jitesh, et al. “Noise-induced schooling of
fish.” Nature Physics 16.4 (2020): 488-493. https://www.nature.com/articles/s41567-020-0787-y
Cell migration dataset: Brückner, David B., et al. “Stochastic
nonlinear dynamics of confined cell migration in two-state systems.”
Nature Physics 15.6 (2019): 595-601. https://www.nature.com/articles/s41567-019-0445-4




Overview of the study and package
PyDaddy is an open-source package and a key
contribution of the manuscript Nabeel et al., arXiv:2205.02645. The
scientific premise of this package is to discover the nature of
stochasticity in ecological time series datasets. It is well known that
stochasticity can affect the dynamics of ecological systems in
counter-intuitive ways. Without knowing the equations that govern the
dynamics of populations or ecosystems (typically stochastic
differential equations, or SDEs), it is challenging to
determine the impact of randomness on real datasets. In the manuscript
and this accompanying package, we introduce a methodology for equation
discovery that transforms time series data of state variables
into SDEs. This approach merges traditional
stochastic calculus with modern equation-discovery techniques. We
showcase the generality of our method through various applications and
discuss its limitations and potential pitfalls, offering diagnostic
measures to address these challenges.
PyDaddy is a comprehensive and easy-to-use Python
package to discover data-derived stochastic differential equations from
time series data. PyDaddy takes the time series of a state variable \(x\) (a scalar or a 2-dimensional vector) as
input and discovers an SDE of the form:
\[ \frac{dx}{dt} = f(x) + g(x) \cdot \eta(t) \]
where \(\eta(t)\) is Gaussian white
noise. The function \(f\) is called the
drift and governs the deterministic part of the dynamics.
\(g^2\) is called the
diffusion and governs the stochastic part of the dynamics.
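
To make the roles of the drift and diffusion concrete, here is a minimal sketch (not part of PyDaddy) that simulates a scalar SDE of the above form with the Euler-Maruyama scheme, using only NumPy. The function name simulate_sde and the example choices f(x) = -x and g(x) = 0.5 are illustrative assumptions, and the noise term is treated in the Itô sense.

```python
import numpy as np


def simulate_sde(f, g, x0, dt, n_steps, seed=0):
    """Euler-Maruyama integration of dx/dt = f(x) + g(x) * eta(t)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = x0
    # Over one step dt, white noise integrates to a Gaussian increment
    # with mean 0 and standard deviation sqrt(dt).
    noise = rng.normal(0.0, np.sqrt(dt), size=n_steps - 1)
    for i in range(n_steps - 1):
        x[i + 1] = x[i] + f(x[i]) * dt + g(x[i]) * noise[i]
    return x


# Hypothetical example: linear drift f(x) = -x and constant g(x) = 0.5.
x = simulate_sde(lambda x: -x, lambda x: 0.5, x0=1.0, dt=0.01, n_steps=100_000)

# For this linear example the stationary variance is g^2 / 2 = 0.125,
# so the sample variance should land close to that value.
print(x.mean(), x.var())
```

A time series produced this way has a known drift and diffusion, which makes it a convenient test input for any equation-discovery procedure.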











An example summary plot generated by PyDaddy, for a
vector time series dataset.



PyDaddy also provides a range of functionality such as
equation-learning for the drift and diffusion functions using sparse
regression and a suite of diagnostic functions. For more details on
how to use the package, check out the example
notebooks and documentation.
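
The idea of learning drift and diffusion by sparse regression can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not PyDaddy's own implementation: it uses sequentially thresholded least squares on a cubic polynomial library, with hand-picked thresholds, and the helper name stlsq and the synthetic test SDE (f(x) = x - x^3, g = 0.5) are hypothetical choices.

```python
import numpy as np


def stlsq(library, target, threshold, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(library, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        coef[keep] = np.linalg.lstsq(library[:, keep], target, rcond=None)[0]
    return coef


# Synthetic time series with known drift f(x) = x - x^3 and g(x) = 0.5.
rng = np.random.default_rng(1)
dt, n = 0.01, 200_000
x = np.empty(n)
x[0] = 0.0
noise = rng.normal(0.0, np.sqrt(dt), n - 1)
for i in range(n - 1):
    x[i + 1] = x[i] + (x[i] - x[i] ** 3) * dt + 0.5 * noise[i]

dx = np.diff(x)
library = np.vander(x[:-1], 4, increasing=True)  # columns: 1, x, x^2, x^3

# Drift: regress the increments per unit time on the polynomial library.
drift_coef = stlsq(library, dx / dt, threshold=0.3)

# Diffusion (g^2): regress the squared residual increments per unit time.
residual = dx - (library @ drift_coef) * dt
diff_coef = stlsq(library, residual ** 2 / dt, threshold=0.1)

print("drift coefficients (1, x, x^2, x^3):", drift_coef)
print("diffusion coefficients (1, x, x^2, x^3):", diff_coef)
```

With enough data, the drift fit should keep only the x and x^3 terms (close to 1 and -1) and the diffusion fit only the constant term (close to g^2 = 0.25). The thresholds are fixed by hand here; in practice they need to be chosen with care.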


Workflow and Documentation
The workflow of the package is summarised by the schematic given
below, which is also Fig. 1 of the manuscript (https://arxiv.org/abs/2205.02645).
A detailed workflow of the package, along with detailed instructions
on its various features, is included as Supplementary Information Section S2
of the manuscript.
Detailed documentation of the PyDaddy package can be found at https://pydaddy.readthedocs.io/.











Schematic illustration of PyDaddy functionality.





Getting Started Without Package Installation
We provide a number of easy-to-use scripts to make the package easy to
learn and our manuscript easy to reproduce.
First, we emphasise that PyDaddy can be executed online on Google Colab, without
having to install it on your local machine. To run PyDaddy on Colab,
open a notebook on Colab, paste the following code into a notebook cell,
and run it:
%pip install git+https://github.com/tee-lab/PyDaddy.git
This sets up PyDaddy in the notebook environment.


Example Scripts/Jupyter Notebooks
There are several example scripts / Jupyter
notebooks provided, which can be used to familiarize yourself
with various features and functionalities of PyDaddy. These can be
executed on Colab. In the list below, we give the path to
each notebook as well as a link to the Google Colab notebook; the latter
does not require installing either Python or the package on your system.

Notebook 1
(notebooks/1_getting_started.ipynb): Getting
started with scalar data: Introduction to the basic functionalities
of PyDaddy, using a 1-dimensional dataset.


Notebook 2
(notebooks/2_getting_started_vector.ipynb): Getting
started with vector data: Introduction to the basic functionalities
of PyDaddy on 2-dimensional datasets.
Notebook 3
(notebooks/3_advanced_function_fitting.ipynb): Advanced
function fitting: PyDaddy can discover analytical expressions for
the drift and diffusion functions. This notebook describes how to
customize the fitting procedure to obtain best results.
Notebook 4
(notebooks/4_sdes_from_simulated_timeseries.ipynb): Recovering
SDEs from synthetic time series: This notebook generates a simulated
time series from a user-specified SDE, and uses PyDaddy to recover the
drift and diffusion functions from the simulated time series.
Notebook 5
(notebooks/5_exporting_data.ipynb): Exporting
data: Demonstrates how to export the recovered drift and diffusion
data as CSV files or Pandas data-frames.
Notebook 6
(notebooks/6_non_poly_function_fitting.ipynb): Fitting
non-polynomial functions: PyDaddy fits polynomial functions to drift
and diffusion by default. This notebook illustrates how to customize
this behaviour.
Notebook 9
(notebooks/9_higher_dimensions.ipynb): [Demonstration
with a 3-dimensional system] An example to demonstrate that, in
principle, the method of stochastic equation discovery can be extended
to higher dimensions.

(See below for Notebooks 7 and 8).


Real datasets and Scripts/Jupyter Notebooks
There are also two notebooks that use PyDaddy to discover SDEs from
real-world datasets.

Fish Schooling Dataset
(pydaddy/data/fish_data/ectroplus.csv): The fish
dataset contains the 2D polarisation vector time series of a fish school
(15 fish). The two columns in the CSV file represent the x- and y-components
of the polarisation vector, respectively, and each row corresponds to a
time stamp, with consecutive rows separated by a time step of 0.04
seconds. The full dataset is available at a previously published
repository: https://zenodo.org/records/3632470. For more details
about the dataset, see the manuscript Jhawar et al.: https://doi.org/10.1038/s41567-020-0787-y

Notebook 7
(notebooks/7_example_fish_school.ipynb): Example
analysis - fish schooling: An example analysis of a fish schooling
dataset (Jhawar et al., Nature Physics, 2020) using PyDaddy.

Cell Migration Dataset 
(pydaddy/data/cell_data/trajectories_x_pattern5.txt):
The confined cell migration dataset contains tracked trajectories of 149
cells, tracked for up to 300 time steps each, with one data point every
15 minutes. The data is provided as a plain text file. Each row
corresponds to the time series of one cell. For more details about the
dataset, see https://doi.org/10.1038/s41567-019-0445-4.

Notebook 8
(notebooks/8_example_cell_migration.ipynb): Example
analysis - cell migration: An example analysis of a confined cell
migration dataset (Brückner et al., Nature Physics, 2019) using
PyDaddy.
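
The two data layouts described above can be read with plain NumPy, as in the following sketch. It runs on small made-up stand-in strings rather than the real files, and it assumes details the description does not state: that the fish CSV is comma-delimited with no header row, and that the cell file is whitespace-separated with rows that may differ in length. Check the actual files before relying on these assumptions.

```python
import io

import numpy as np

# Stand-in for the two-column fish polarisation CSV (values are made up).
fish_text = "0.10,0.20\n0.12,0.18\n0.15,0.21\n"
m = np.loadtxt(io.StringIO(fish_text), delimiter=",")  # shape (n_rows, 2)
mx, my = m[:, 0], m[:, 1]          # x- and y-components of polarisation
t = 0.04 * np.arange(len(m))       # time stamps in seconds (0.04 s per row)

# Stand-in for the cell migration text file: one row per cell.
# Rows are parsed one by one so that unequal lengths are handled.
cell_text = "1.0 1.5 2.0\n-0.5 0.0\n"
trajectories = [
    np.array(line.split(), dtype=float)
    for line in cell_text.splitlines()
    if line.strip()
]
dt_minutes = 15                     # one data point every 15 minutes
times = [dt_minutes * np.arange(len(tr)) for tr in trajectories]

print(m.shape, t[-1])
print(len(trajectories), [len(tr) for tr in trajectories])
```

To read the real files, replace the io.StringIO stand-in and the cell_text string with the file paths given above.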




Folder structure
The zipped folder of code and data is structured as follows:

The parent folder contains the licence, citation, and readme.md files, among others.
doc and its subfolders contain Python code and
style files relevant to the package. Edit these only if you are a developer
and are proficient with Python.
notebooks contains nine well-commented and documented
Jupyter notebooks/scripts that help readers familiarise themselves with the
usage of the package.
pydaddy/data contains three subfolders
with key real and model datasets:

cell_data
fish_data
model_data

pydaddy and its subfolders contain the code
of the Python package. Edit these only if you are a developer and are
proficient with Python.



Package Installation
PyDaddy is available both on PyPI and Anaconda Cloud, and can be
installed on any system with a Python 3 environment. If you don’t have
Python 3 installed on your system, we recommend using Anaconda or Miniconda. See
the PyDaddy package
documentation for detailed installation instructions.

Using pip
  
To install the latest stable release version of PyDaddy, use:
pip install pydaddy
To install the latest development version of PyDaddy, use:
pip install git+https://github.com/tee-lab/PyDaddy.git


Using anaconda



To install using conda, Anaconda or Miniconda needs
to be installed first. Once this is done, use the following command:
conda install -c tee-lab pydaddy




Detailed Package Documentation
For more information about PyDaddy, check out the package documentation.


Citation
If you are using this package in your research, please cite the
repository and the associated paper as follows:
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Brückner,
D. B., Danny Raj, M., & Guttal, V. (2024). PyDaddy: A Python
Package for Discovering SDEs from Time Series Data (Version 1.1.1)
[Computer software]. https://github.com/tee-lab/PyDaddy, DOI: To Do.
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Brückner,
D. B., Danny Raj, M., & Guttal, V., “Discovering stochastic dynamical
equations from ecological time series data”, arXiv preprint arXiv:2205.02645, to appear
in The American Naturalist.


Funding
This study was partially funded by Science and Engineering Research
Board, Department of Science and Technology, Government of India to
Vishwesha Guttal.


Licence
PyDaddy is distributed under the GNU
General Public License v3.0.


Owner

Name: TEE Lab
Login: tee-lab
Kind: organization
Location: India

Website: http://teelabiisc.wordpress.com
Twitter: vishuguttal
Repositories: 6
Profile: https://github.com/tee-lab

Repositories of code for projects/publications by TEE-LAB, CES, IISc

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Nabeel"
  given-names: "Arshed"
  orcid: "https://orcid.org/0000-0001-9750-9070"
  
- family-names: "Karichannavar"
  given-names: "Ashwin"

- family-names: "Palathingal"
  given-names: "Shuaib"

- family-names: "Jhawar"
  given-names: "Jitesh"
  orcid: "https://orcid.org/0000-0002-8774-2351"
 
- family-names: "Brückner"
  given-names: "David B."
  orcid: "https://orcid.org/0000-0001-7205-2975"

- family-names: "Danny Raj"
  given-names: "Masila"
  orcid: "https://orcid.org/0000-0002-6983-0390"

- family-names: "Guttal"
  given-names: "Vishwesha"
  orcid: "https://orcid.org/0000-0002-2677-857X"
  
title: "PyDaddy: A Python Package for Discovering SDEs from Time Series Data"
version: 1.1.1

date-released: 2022-05-05
url: "https://github.com/tee-lab/PyDaddy"

GitHub Events

Total

Watch event: 5
Push event: 2
Pull request event: 2
Fork event: 2
Create event: 1

Last Year

Watch event: 5
Push event: 2
Pull request event: 2
Fork event: 2
Create event: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 3
Total pull requests: 63
Average time to close issues: 1 day
Average time to close pull requests: about 9 hours
Total issue authors: 3
Total pull request authors: 3
Average comments per issue: 1.0
Average comments per pull request: 0.02
Merged pull requests: 63
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 0
Pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Beliavsky (1)
blipblipgo (1)
LRydin (1)

Pull Request Authors

arshednabeel (54)
ashwinkk23 (15)
dependabot[bot] (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (1)

Packages

Total packages: 1
Total downloads:
- pypi 20 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 2
Total maintainers: 1

pypi.org: pydaddy

Package to analyse stochastic time series data

Homepage: https://github.com/tee-lab/pydaddy
Documentation: https://pydaddy.readthedocs.io/
License: GNU General Public License v3 (GPLv3)
Latest release: 1.0.0
published almost 4 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 20 Last month

Rankings

Stargazers count: 7.7%

Dependent packages count: 10.0%

Forks count: 11.4%

Average: 18.5%

Dependent repos count: 21.7%

Downloads: 41.9%

Maintainers (1)

tee-lab

Last synced: 11 months ago

Dependencies

environment.yml conda

jupyterlab
notebook
pip
python 3.7.*

docs/requirements.txt pypi

ipykernel *
matplotlib *
myst-nb *
nbconvert *
nbsphinx *
notebook *
numpy *
pandas *
plotly *
scikit-learn *
scipy *
sdeint *
seaborn *
setuptools *
sphinx-rtd-theme >=1.0.0
sphinxcontrib-contentui *
tqdm *

requirements.txt pypi

matplotlib *
nbconvert *
notebook *
numpy *
pandas *
plotly *
scikit-learn *
scipy *
sdeint *
seaborn *
setuptools *
tqdm *