pydaddy
Python package to discover stochastic differential equations from time series data
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 8 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org, nature.com, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (5.8%) to scientific vocabulary
Statistics
- Stars: 96
- Watchers: 7
- Forks: 12
- Open Issues: 0
- Releases: 5
README PyDaddy
Discovering stochastic dynamical equations from ecological time series data, together with an easy to use Python package.
Citation to the manuscript
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Brückner, D. B., Danny Raj, M., & Guttal, V., “Discovering stochastic dynamical equations from ecological time series data”, arXiv preprint arXiv:2205.02645, to appear in The American Naturalist.
Citation to the package
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Brückner, D. B., Danny Raj, M., & Guttal, V. (2024). PyDaddy: A Python Package for Discovering SDEs from Time Series Data (Version 1.1.1) [Computer software]. https://github.com/tee-lab/PyDaddy
Overview of the study and package
PyDaddy is an open-source package and a key contribution of the manuscript Nabeel et al., arXiv:2205.02645. The scientific premise of the package is to discover the nature of stochasticity in ecological time series datasets. It is well known that stochasticity can affect the dynamics of ecological systems in counter-intuitive ways. Without knowing the equations that govern the dynamics of populations or ecosystems (typically stochastic differential equations, or SDEs), it is challenging to determine the impact of randomness on real datasets. In the manuscript and the accompanying package, we introduce a methodology for equation discovery that takes time series data of state variables and returns SDEs. This approach merges traditional stochastic calculus with modern equation-discovery techniques. We showcase the generality of our method through various applications and discuss its limitations and potential pitfalls, offering diagnostic measures to address these challenges.
PyDaddy is a comprehensive and easy-to-use Python package for discovering stochastic differential equations from time series data. PyDaddy takes the time series of a state variable \(x\), which may be a scalar or a 2-dimensional vector, as input and discovers an SDE of the form:
\[ \frac{dx}{dt} = f(x) + g(x) \cdot \eta(t) \]
where \(\eta(t)\) is Gaussian white noise. The function \(f\) is called the drift, and governs the deterministic part of the dynamics. \(g^2\) is called the diffusion and governs the stochastic part of the dynamics.
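To make the roles of the drift and the noise term concrete, an equation of this form can be integrated numerically. The following is a minimal sketch using the Euler-Maruyama scheme in plain Python. It is not part of PyDaddy: the function name `simulate_sde`, the linear drift, the constant noise amplitude, the step size and the seed are all illustrative assumptions.

```python
import math
import random


def simulate_sde(f, g, x0, dt, n_steps, seed=0):
    """Euler-Maruyama integration of dx/dt = f(x) + g(x) * eta(t)."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x = [x0]
    for _ in range(n_steps):
        xi = x[-1]
        # Over a step dt, white noise integrates to a Gaussian increment
        # with mean 0 and variance dt.
        dw = rng.gauss(0.0, sqrt_dt)
        x.append(xi + f(xi) * dt + g(xi) * dw)
    return x


# Illustrative choices (assumptions, not taken from the manuscript):
# a linear restoring drift and a constant noise amplitude.
drift = lambda x: -x
noise = lambda x: 0.5

x = simulate_sde(drift, noise, x0=1.0, dt=0.01, n_steps=50_000)
```

With these choices the drift pulls the state back towards zero while the noise term keeps perturbing it, so the trajectory fluctuates around zero with a stationary variance of roughly \(g^2/2 = 0.125\).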
An example summary plot generated by PyDaddy, for a vector time series dataset.
PyDaddy also provides a range of functionality, such as equation learning for the drift and diffusion functions using sparse regression and a suite of diagnostic functions. For more details on how to use the package, check out the example notebooks and documentation.
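The idea of learning drift and diffusion functions by sparse regression can be sketched with NumPy alone. The example below is a rough illustration under stated assumptions, not PyDaddy's implementation: it uses per-step increments as noisy estimates of the drift and diffusion, and sequentially thresholded least squares on a polynomial basis as the sparse-regression step. The helper `sparse_poly_fit`, the test SDE, the polynomial degree and the threshold are all hypothetical choices made for this sketch.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def sparse_poly_fit(x, y, degree=3, threshold=0.1, max_iter=10):
    """Fit y ~ sum_k c_k x^k, repeatedly dropping terms with |c_k| < threshold."""
    library = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    active = np.ones(degree + 1, dtype=bool)
    coeffs = np.zeros(degree + 1)
    for _ in range(max_iter):
        if not active.any():
            break
        coeffs = np.zeros(degree + 1)
        coeffs[active] = np.linalg.lstsq(library[:, active], y, rcond=None)[0]
        small = active & (np.abs(coeffs) < threshold)
        if not small.any():
            break
        active &= ~small  # drop small terms, then refit the remaining ones
    coeffs[~active] = 0.0
    return coeffs


# Synthetic data (illustrative): dx/dt = x - x^3 + 0.5 * eta(t),
# generated with the Euler-Maruyama scheme.
rng = np.random.default_rng(0)
dt, n_steps = 0.01, 400_000
dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)
x = np.empty(n_steps + 1)
x[0] = 1.0
for i in range(n_steps):
    x[i + 1] = x[i] + (x[i] - x[i] ** 3) * dt + 0.5 * dw[i]

# Drift: regress the per-step rate of change dx/dt on x.
dx = np.diff(x)
drift_coeffs = sparse_poly_fit(x[:-1], dx / dt)

# Diffusion (g^2): regress the squared residual increments, scaled by dt, on x.
residual = dx - P.polyval(x[:-1], drift_coeffs) * dt
diffusion_coeffs = sparse_poly_fit(x[:-1], residual ** 2 / dt)

print("drift coefficients (1, x, x^2, x^3):", np.round(drift_coeffs, 2))
print("diffusion coefficients (1, x, x^2, x^3):", np.round(diffusion_coeffs, 2))
```

For this synthetic series the true drift coefficients are (0, 1, 0, -1) and the true diffusion is the constant \(g^2 = 0.25\), so the fitted values should land close to these. The threshold controls how aggressively small terms are pruned and usually needs tuning for a given dataset.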
Workflow and Documentation
The workflow of the package is summarised by the schematic below, which is also Fig. 1 of the manuscript (https://arxiv.org/abs/2205.02645).
A detailed workflow of the package, along with detailed instructions on its various features, is included as Supplementary Information Section S2 of the manuscript.
Detailed documentation of the PyDaddy package can be found at https://pydaddy.readthedocs.io/.
Schematic illustration of PyDaddy functionality.
Getting Started Without Package Installation
We provide a number of easy-to-use scripts to make the package easy to learn and use, and to make our manuscript easily reproducible.
PyDaddy can be run online on Google Colab, without installing it on your local machine. To run PyDaddy on Colab, open a notebook on Colab, paste the following code into a notebook cell and run it:
    %pip install git+https://github.com/tee-lab/PyDaddy.git
This sets up PyDaddy in the notebook environment.
Example Scripts/Jupyter Notebooks
Several example scripts / Jupyter notebooks are provided to help you familiarize yourself with the features and functionality of PyDaddy. These can be executed on Colab. In the list below, we give the path to each notebook as well as a link to the Google Colab notebook; the latter does not require installing either Python or the package on your system.
- Notebook 1 (notebooks/1_getting_started.ipynb): Getting started with scalar data: Introduction to the basic functionalities of PyDaddy, using a 1-dimensional dataset.
- Notebook 2 (notebooks/2_getting_started_vector.ipynb): Getting started with vector data: Introduction to the basic functionalities of PyDaddy on 2-dimensional datasets.
- Notebook 3 (notebooks/3_advanced_function_fitting.ipynb): Advanced function fitting: PyDaddy can discover analytical expressions for the drift and diffusion functions. This notebook describes how to customize the fitting procedure to obtain best results.
- Notebook 4 (notebooks/4_sdes_from_simulated_timeseries.ipynb): Recovering SDEs from synthetic time series: This notebook generates a simulated time series from a user-specified SDE, and uses PyDaddy to recover the drift and diffusion functions from the simulated time series.
- Notebook 5 (notebooks/5_exporting_data.ipynb): Exporting data: Demonstrates how to export the recovered drift and diffusion data as CSV files or Pandas data-frames.
- Notebook 6 (notebooks/6_non_poly_function_fitting.ipynb): Fitting non-polynomial functions: PyDaddy fits polynomial functions to drift and diffusion by default. This notebook illustrates how to customize that behaviour.
- Notebook 9 (notebooks/9_higher_dimensions.ipynb): Demonstration with a 3-dimensional system: An example demonstrating that, in principle, the method of stochastic equation discovery can be extended to higher dimensions.
(See below for Notebooks 7 and 8).
Real datasets and Scripts/Jupyter Notebooks
There are also two notebooks that use PyDaddy to discover SDEs from real-world datasets.
- Fish Schooling Dataset (pydaddy/data/fish_data/ectroplus.csv): The fish dataset contains the 2D polarisation vector time series of a fish school (15 fish). The two columns in the CSV file represent the x- and y-components of the polarisation vector, respectively. Each row corresponds to a time stamp, with consecutive rows separated by a time step of 0.04 seconds. The full dataset is available at a previously published repository: https://zenodo.org/records/3632470. For more details about the dataset, see the manuscript Jhawar et al.: https://doi.org/10.1038/s41567-020-0787-y
- Notebook 7 (notebooks/7_example_fish_school.ipynb): Example analysis - fish schooling: An example analysis of a fish schooling dataset (Jhawar et al., Nature Physics, 2020) using PyDaddy.
- Cell Migration Dataset (pydaddy/data/cell_data/trajectories_x_pattern5.txt): The confined cell migration dataset contains tracked trajectories of 149 cells, tracked for up to 300 time steps each, with one data point every 15 minutes. The data is provided as a plain text file. Each row corresponds to the time series of one cell. For more details about the dataset, see https://doi.org/10.1038/s41567-019-0445-4.
- Notebook 8 (notebooks/8_example_cell_migration.ipynb): Example analysis - cell migration: An example analysis of a confined cell migration dataset (Brückner et al., Nature Physics, 2019) using PyDaddy.
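The two file layouts described above (two columns with one row per time stamp, and one trajectory per row) can be read with a few lines of NumPy. The sketch below runs on small made-up in-memory samples rather than the real files. The helper names are hypothetical, and the comma delimiter, the whitespace-separated rows and the absence of header lines are assumptions that should be checked against the actual files.

```python
import io

import numpy as np


def load_two_column_series(source, dt):
    """Load a two-column CSV (x- and y-components) and build the time axis."""
    data = np.loadtxt(source, delimiter=",", ndmin=2)
    t = np.arange(data.shape[0]) * dt  # consecutive rows are dt apart
    return data[:, 0], data[:, 1], t


def load_row_trajectories(lines):
    """Load one trajectory per row; rows may have different lengths."""
    return [np.array(line.split(), dtype=float) for line in lines if line.strip()]


# Made-up sample in the fish-data layout: two columns, rows 0.04 s apart.
fish_like = io.StringIO("0.10,0.90\n0.12,0.88\n0.15,0.85\n")
mx, my, t = load_two_column_series(fish_like, dt=0.04)

# Made-up sample in the cell-data layout: one row per cell, unequal lengths.
cell_like = io.StringIO("1.0 1.5 2.0\n0.5 0.7\n")
trajectories = load_row_trajectories(cell_like)

print(t, [len(tr) for tr in trajectories])
```

For the real datasets, `source` would be the file path (or an open file) and `lines` an open file handle. The rows of the cell data are 15 minutes apart, so a time axis in hours would use a step of 0.25.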
Folder structure
The zipped folder of code and data is structured as follows:
- The parent folder contains the licence, citation and README.md files, among others.
- doc and its subfolders contain Python code and style files relevant to the package. Edit these only if you are a developer and are proficient with Python.
- notebooks contains nine well-commented Jupyter notebooks/scripts that help readers familiarise themselves with the usage of the package.
- pydaddy/data contains three subfolders with key real and model datasets:
  - cell_data
  - fish_data
  - model_data
- pydaddy and its subfolders contain the code of the Python package. Edit these only if you are a developer and are proficient with Python.
Package Installation
PyDaddy is available both on PyPI and Anaconda Cloud, and can be installed on any system with a Python 3 environment. If you don’t have Python 3 installed on your system, we recommend using Anaconda or Miniconda. See the PyDaddy package documentation for detailed installation instructions.
Using pip
To install the latest stable release version of PyDaddy, use:
    pip install pydaddy
To install the latest development version of PyDaddy, use:
    pip install git+https://github.com/tee-lab/PyDaddy.git
Detailed Package Documentation
For more information about PyDaddy, check out the package documentation.
Citation
If you are using this package in your research, please cite the repository and the associated paper as follows:
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Brückner, D. B., Danny Raj, M., & Guttal, V. (2024). PyDaddy: A Python Package for Discovering SDEs from Time Series Data (Version 1.1.1) [Computer software]. https://github.com/tee-lab/PyDaddy, DOI: To Do.
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Brückner, D. B., Danny Raj, M., & Guttal, V., “Discovering stochastic dynamical equations from ecological time series data”, arXiv preprint arXiv:2205.02645, to appear in The American Naturalist.
Funding
This study was partially funded by the Science and Engineering Research Board, Department of Science and Technology, Government of India (awarded to Vishwesha Guttal).
Licence
PyDaddy is distributed under the GNU General Public License v3.0.
Owner
- Name: TEE Lab
- Login: tee-lab
- Kind: organization
- Location: India
- Website: http://teelabiisc.wordpress.com
- Twitter: vishuguttal
- Repositories: 6
- Profile: https://github.com/tee-lab
Repositories of code for projects/publications by TEE-LAB, CES, IISc
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Nabeel"
    given-names: "Arshed"
    orcid: "https://orcid.org/0000-0001-9750-9070"
  - family-names: "Karichannavar"
    given-names: "Ashwin"
  - family-names: "Palathingal"
    given-names: "Shuaib"
  - family-names: "Jhawar"
    given-names: "Jitesh"
    orcid: "https://orcid.org/0000-0002-8774-2351"
  - family-names: "Brückner"
    given-names: "David B."
    orcid: "https://orcid.org/0000-0001-7205-2975"
  - family-names: "Danny Raj"
    given-names: "Masila"
    orcid: "https://orcid.org/0000-0002-6983-0390"
  - family-names: "Guttal"
    given-names: "Vishwesha"
    orcid: "https://orcid.org/0000-0002-2677-857X"
title: "PyDaddy: A Python Package for Discovering SDEs from Time Series Data"
version: 1.1.1
date-released: 2022-05-05
url: "https://github.com/tee-lab/PyDaddy"
GitHub Events
Total
- Watch event: 5
- Push event: 2
- Pull request event: 2
- Fork event: 2
- Create event: 1
Last Year
- Watch event: 5
- Push event: 2
- Pull request event: 2
- Fork event: 2
- Create event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 3
- Total pull requests: 63
- Average time to close issues: 1 day
- Average time to close pull requests: about 9 hours
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.02
- Merged pull requests: 63
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 0
- Pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Beliavsky (1)
- blipblipgo (1)
- LRydin (1)
Pull Request Authors
- arshednabeel (54)
- ashwinkk23 (15)
- dependabot[bot] (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 20 last month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 2
- Total maintainers: 1
pypi.org: pydaddy
Package to analyse stochastic time series data
- Homepage: https://github.com/tee-lab/pydaddy
- Documentation: https://pydaddy.readthedocs.io/
- License: GNU General Public License v3 (GPLv3)
- Latest release: 1.0.0 (published over 3 years ago)
Dependencies
- jupyterlab
- notebook
- pip
- python 3.7.*
- ipykernel *
- matplotlib *
- myst-nb *
- nbconvert *
- nbsphinx *
- notebook *
- numpy *
- pandas *
- plotly *
- scikit-learn *
- scipy *
- sdeint *
- seaborn *
- setuptools *
- sphinx-rtd-theme >=1.0.0
- sphinxcontrib-contentui *
- tqdm *
- matplotlib *
- nbconvert *
- notebook *
- numpy *
- pandas *
- plotly *
- scikit-learn *
- scipy *
- sdeint *
- seaborn *
- setuptools *
- tqdm *