pySRURGS - a python package for symbolic regression by uniform random global search

pySRURGS - a python package for symbolic regression by uniform random global search - Published in JOSS (2019)

https://github.com/pysrurgs/pysrurgs

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

binary-tree enumeration python python3 random symbolic-regression

Keywords from Contributors

genetic-programming global-optimization random-search tree-structure

Scientific Fields

Engineering Computer Science - 60% confidence
Mathematics Computer Science - 43% confidence
Last synced: 4 months ago

Repository

Symbolic regression by uniform random global search

Basic Info
  • Host: GitHub
  • Owner: pySRURGS
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 8.57 MB
Statistics
  • Stars: 13
  • Watchers: 3
  • Forks: 3
  • Open Issues: 4
  • Releases: 3
Created over 6 years ago · Last pushed over 2 years ago
Metadata Files
Readme License

README.md

Symbolic Regression by Uniform Random Global Search

Symbolic regression is a type of data analysis problem where you search for the equation of best fit for a numerical dataset. This package performs that search by guessing candidate equations at random, with uniform probability of selection, and evaluating each one. The No Free Lunch Theorem argues that, when performance is averaged over all possible problems, random search should be equivalent to other approaches such as Genetic Programming. This software should be useful for data analysts and researchers working on symbolic regression problems.
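
To make the idea of uniform random search concrete, here is a minimal toy sketch. It is not pySRURGS's implementation: the candidate space (`FORMS`, `COEFFS`), the helper names, and the use of plain mean squared error are all assumptions made for illustration. The only point it demonstrates is that every candidate equation has the same chance of being drawn on every iteration.

```python
import random

# Toy candidate space (hypothetical): every (form, a, b) triple is one "equation".
FORMS = {
    "a*x + b": lambda x, a, b: a * x + b,
    "a*x**2 + b": lambda x, a, b: a * x ** 2 + b,
    "a*x**3 + b": lambda x, a, b: a * x ** 3 + b,
}
COEFFS = [-2, -1, 0, 1, 2]
CANDIDATES = [(name, a, b) for name in FORMS for a in COEFFS for b in COEFFS]


def mean_squared_error(candidate, xs, ys):
    """Score one candidate equation against the data."""
    name, a, b = candidate
    f = FORMS[name]
    return sum((f(x, a, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def uniform_random_search(xs, ys, iters, seed=None):
    """Draw `iters` candidates uniformly at random and keep the best one."""
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    best, best_err = None, float("inf")
    for _ in range(iters):
        # Every index is equally likely, so every candidate is equally likely.
        candidate = CANDIDATES[rng.randrange(len(CANDIDATES))]
        err = mean_squared_error(candidate, xs, ys)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2 * x ** 2 + 1 for x in xs]  # data generated from 2*x**2 + 1
best, best_err = uniform_random_search(xs, ys, iters=500, seed=0)
print(best, best_err)
```

Passing the same `seed` twice gives the same result, which is the same idea as a deterministic mode in a random search.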

Features

  1. Robust parameter fitting
  2. Memoization for speed
  3. Avoids many arithmetically equivalent equations
  4. Loads data from CSV files
  5. Results saved to SQLite file.
  6. Results of new runs are added to results of old runs.
  7. User specified number of fitting parameters.
  8. User specified number of permitted unique binary trees, which determine the possible equation forms
  9. User specified permitted functions of arity 1 or 2
  10. Can also run an exhaustive/brute-force search
  11. Can be run in deterministic mode for reproducibility
  12. Developed and tested on Python 3.6

Getting Started

It's a python3 script. Download it and run it via a terminal.

Installing

Clone the repo then install the prerequisites.

git clone https://github.com/pySRURGS/pySRURGS.git
cd pySRURGS
pip install -r requirements.txt --user

Command line help

python3 pySRURGS.py -h

The above command should render the following:

```
usage: pySRURGS.py [-h] [-memoize_funcs] [-count] [-benchmarks]
                   [-deterministic] [-plotting] [-exhaustive]
                   [-funcs_arity_two FUNCS_ARITY_TWO]
                   [-funcs_arity_one FUNCS_ARITY_ONE]
                   [-max_num_fit_params MAX_NUM_FIT_PARAMS]
                   [-max_permitted_trees MAX_PERMITTED_TREES]
                   [-path_to_db PATH_TO_DB]
                   [-path_to_weights PATH_TO_WEIGHTS]
                   train iters

positional arguments:
  train                 absolute or relative file path to the csv file
                        housing the training data. The rightmost column of
                        the CSV file should be the dependent variable.
  iters                 the number of equations to be attempted in this run

optional arguments:
  -h, --help            show this help message and exit
  -memoize_funcs        memoize the computations. If you are running large
                        iters and you do not have massive ram, do not use
                        this option. (default: False)
  -count                Instead of doing symbolic regression, just count out
                        how many possible equations for this configuration.
                        No other processing performed. (default: False)
  -benchmarks           Instead of doing symbolic regression, generate the
                        100 benchmark problems. No other processing
                        performed. (default: False)
  -deterministic        If set, the pseudorandom number generator will act
                        in a predictable manner and pySRURGS will produce
                        reproducible results. (default: False)
  -plotting             plot the best model against the data to
                        ./image/plot.png and ./image/plot.svg - note only
                        works for univariate datasets (default: False)
  -exhaustive           instead of running pure random search, do an
                        exhaustive search. Be careful about running this as
                        it may run forever. iters gets ignored. (default:
                        False)
  -funcs_arity_two FUNCS_ARITY_TWO
                        a comma separated string listing the functions of
                        arity two you want to be considered.
                        Permitted:add,sub,mul,div,pow (default:
                        add,sub,mul,div,pow)
  -funcs_arity_one FUNCS_ARITY_ONE
                        a comma separated string listing the functions of
                        arity one you want to be considered.
                        Permitted:sin,cos,tan,exp,log,sinh,cosh,tanh
                        (default: None)
  -max_num_fit_params MAX_NUM_FIT_PARAMS
                        the maximum number of fitting parameters permitted
                        in the generated models (default: 3)
  -max_permitted_trees MAX_PERMITTED_TREES
                        the number of unique binary trees that are permitted
                        in the generated models - binary trees define the
                        form of the equation, increasing this number tends
                        to increase the complexity of generated equations
                        (default: 1000)
  -path_to_db PATH_TO_DB
                        the absolute or relative path to the database file
                        where we will save results. If not set, will save
                        database file to ./db directory with same name as
                        the csv file. (default: None)
  -path_to_weights PATH_TO_WEIGHTS
                        the absolute or relative path to the CSV file where
                        we store the weights for each point in the dataset.
                        The CSV file should be a single column of
                        non-negative numerical data without a header. If not
                        set, weights are equal to one for all data points.
                        (default: None)
```

Important details

All your data needs to be numeric. Your CSV file should have a header. Inside the csv, the dependent variable should be the rightmost column. Do not use special characters or spaces in variable names and start variable names with a letter (but not the letter 'p'). The fitting parameters are displayed as the letter 'p' followed by an integer.
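
The rules above can be checked before starting a long run. The following is a minimal sketch using only the standard library; `check_training_csv` and `NAME_PATTERN` are hypothetical names, not part of pySRURGS. It assumes the strictest reading of the rules: names may contain only letters and digits, and only a lowercase 'p' as the first letter is rejected. It cannot check that the rightmost column really is your dependent variable.

```python
import csv
import io
import re

# Assumption: "no special characters or spaces" is read as letters and digits only.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")


def check_training_csv(handle):
    """Return a list of problems found; an empty list means the checks passed."""
    rows = list(csv.reader(handle))
    if len(rows) < 2:
        return ["expected a header row plus at least one data row"]
    header, data = rows[0], rows[1:]
    problems = []
    for name in header:
        if not NAME_PATTERN.match(name):
            problems.append(f"bad variable name: {name!r}")
        elif name[0] == "p":
            # 'p' followed by an integer is reserved for fitting parameters.
            problems.append(f"variable name starts with 'p': {name!r}")
    for line_number, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_number}: wrong number of columns")
            continue
        for cell in row:
            try:
                float(cell)
            except ValueError:
                problems.append(f"line {line_number}: non-numeric value {cell!r}")
    return problems


good = io.StringIO("x,y\n1.0,2.0\n2.0,4.5\n")
bad = io.StringIO("p1,y value\n1.0,abc\n")
print(check_training_csv(good))
print(check_training_csv(bad))
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` objects used here.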

An example

A sample problem is provided. The filename denotes the true equation.

```
$ winpty python pySRURGS.py -max_num_fit_params 3 -max_permitted_trees 1000 -plotting ./csv/quartic_polynomial.csv 2000
Making sure we meet the iters value
  Normalized Mean Squared Error       R^2  Equation, simplified                                                    Parameters
                    4.24058e-05  0.999999  ((p0 + p1)/(p0*x))**(-p0 + p1 + x) + (p0*x + p0 + p2)**(p2*x/p1)        4.47E+00,3.47E-01,2.25E-01
                    0.000141492  0.999996  (-2*p2*x*(p0*(p1 - x) + x) + (p0*p1*(p1 + x))**x*(p1 - x))/(p1 - x)     1.80E+00,1.34E+00,7.04E-02
                    0.000154517  0.999996  x*(p2/p0)**x*(-p0 + p2)**p2 - (x**p1)**p1 + 1                           4.72E-01,1.17E+00,1.69E+00
                    0.0001829    0.999995  -(p0 + x)*(p1*p2*(p2 - x) - (p2 - x)**x)/(p0**2*p1*p2)                  -2.11E+01,-9.23E-03,1.24E+01
                    0.00021193   0.999995  ((p1**x)**p1 + (x**(-p0 + p2))**p2)*((p1 + x**p2)**x)**(-p0*(p0 - p1))  4.09E-01,1.61E+00,1.75E+00
```

Example performance

Another example - using relative weights for data points

Suppose you value some data points more than others and want them to carry greater weight in the fit. You can specify a path to a weights CSV file, which tells the code the relative weight of each data point. An example in which some of the data points have zero weight and the remainder have equal weight is shown below and is included in the repository.

winpty python pySRURGS.py -plotting -path_to_weights ./csv/weights.csv ./csv/weights_data.csv 300
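
A weights file in the format described in the command line help (a single column of non-negative numbers, no header) can be written with a few lines of Python. This is only a sketch: `write_weights`, the file name `my_weights.csv`, and the choice of ten points with the first three zeroed out are hypothetical. It assumes one weight per data row, in the same order as the rows of the training CSV.

```python
import csv
import os
import tempfile


def write_weights(path, weights):
    """Write one non-negative weight per row, with no header row."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for w in weights:
            writer.writerow([w])


# Hypothetical example: 10 data points, the first 3 are ignored (weight 0).
n_points = 10
weights = [0.0] * 3 + [1.0] * (n_points - 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_weights.csv")
    write_weights(path, weights)
    # Read the file back to confirm it is a single headerless column.
    with open(path, newline="") as handle:
        rows = [float(row[0]) for row in csv.reader(handle)]

print(rows)
```

In real use you would write the file next to your training data rather than into a temporary directory, then pass its path to `-path_to_weights`.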

plot for data point weights feature

Database file

The database file is in SQLite3 format, and we access it using the SqliteDict package. For example, if we have already run some computations against the quartic_polynomial example, then we can run the following to inspect the results.

```
import pySRURGS
from result_class import Result  # Result needs to be in the namespace.
from sqlitedict import SqliteDict

path_to_csv = './csv/quartic_polynomial.csv'
path_to_db = './db/quartic_polynomial.db'
SR_config = pySRURGS.SymbolicRegressionConfig(path_to_csv, path_to_db)
result_list = pySRURGS.get_result_list(SR_config)
result_list.sort()

# after running sort, zero^th element is the best result
best_result = result_list._results[0]
print("R^2:", best_result._R2,
      "Equation:", best_result._simple_equation,
      "Unsimplified Equation:", best_result._equation)

# NOTE: `dataset` is not defined in this snippet; it must be the dataset
# object whose y-values correspond to path_to_csv.
result_list.print(dataset._y_data)
```
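
Because the results file is an ordinary SQLite3 file, the standard library's `sqlite3` module can open it without importing pySRURGS. The sketch below only lists table names. `list_tables` is a hypothetical helper, and the `results` table in the demonstration is made up: the table layout of a real pySRURGS database is not documented here. SqliteDict pickles the values it stores by default, so reading the actual results is better done through SqliteDict or pySRURGS as shown above.

```python
import os
import sqlite3
import tempfile


def list_tables(path_to_db):
    """Return the names of the tables stored in a SQLite file."""
    connection = sqlite3.connect(path_to_db)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Demonstration on a throwaway database with a made-up table name.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.db")
    connection = sqlite3.connect(demo_path)
    connection.execute("CREATE TABLE results (key TEXT PRIMARY KEY, value BLOB)")
    connection.commit()
    connection.close()
    tables = list_tables(demo_path)

print(tables)
```

Pointing `list_tables` at a database file under `./db` is a quick way to confirm that a run actually wrote something.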

API

Documentation

Author

Sohrab Towfighi

License

This project is licensed under the GPL 3.0 License - see the LICENSE file for details

How to Cite

If you use this software in your research, then please cite our papers.

Towfighi (2019). pySRURGS - a python package for symbolic regression by uniform random global search. Journal of Open Source Software, 4(41), 1675, https://doi.org/10.21105/joss.01675

Towfighi (2020). Symbolic Regression by Uniform Random Global Search. SN Applied Sciences 2: 34. https://doi.org/10.1007/s42452-019-1734-3

Community

If you would like to contribute to the project or you need help, then please create an issue.

For community-suggested changes, I will comment on whether they fall within the scope of the project. If both parties agree, whoever is interested in developing the changes can make a pull request, or I will implement them myself.

Acknowledgments

  • Luther Tychonievich created the algorithm mapping integers to full binary trees: link, web archived link.
  • The icon is from the GNOME desktop icons project and the respective artists. Taken from link, web archived link. License: GPL version 2.0.

Owner

  • Login: pySRURGS
  • Kind: user
  • Location: Vancouver, Canada

This is a projects page by Sohrab Towfighi.

JOSS Publication

pySRURGS - a python package for symbolic regression by uniform random global search
Published
September 20, 2019
Volume 4, Issue 41, Page 1675
Authors
Sohrab Towfighi ORCID
University of Toronto, Faculty of Medicine
Editor
Monica Bobra ORCID
Tags
symbolic regression regression analysis random search genetic programming

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 1,724
  • Total Committers: 5
  • Avg Commits per committer: 344.8
  • Development Distribution Score (DDS): 0.292
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
sohrabtowfighi s****i@h****m 1,220
pySRURGS 5****S 388
sohrabtowfighi s****i@g****m 104
Ubuntu t****d@t****t 8
Sohrab Towfighi T****S@S****T 4

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 21
  • Total pull requests: 3
  • Average time to close issues: about 2 months
  • Average time to close pull requests: N/A
  • Total issue authors: 9
  • Total pull request authors: 1
  • Average comments per issue: 3.24
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • pySRURGS (8)
  • hbarthels (4)
  • BisUndefined (3)
  • sohrabtowfighi (1)
  • Lolojr (1)
  • Fabssbtr (1)
  • qinchunxiong (1)
  • anthonyrollett (1)
  • AoifeHughes (1)
Pull Request Authors
  • dependabot[bot] (3)
Top Labels
Issue Labels
  • enhancement (4)
  • refactor (1)
  • help wanted (1)
  • good first issue (1)
Pull Request Labels
  • dependencies (3)

Dependencies

experiments/requirements.txt pypi
  • deap *
  • dropbox *
  • pymysql *
  • scoop *
  • sh *
  • sshtunnel *
requirements.txt pypi
  • lmfit *
  • matplotlib *
  • mpmath *
  • numpy *
  • pandas *
  • parmap *
  • pbs *
  • pytest *
  • pytest-cov *
  • python-coveralls *
  • scipy *
  • sh *
  • sqlitedict *
  • sympy *
  • tabulate *
  • tqdm *
requirements_versions.txt pypi
  • lmfit ==0.9.13
  • matplotlib ==3.1.1
  • mpmath ==0.19
  • numpy ==1.17.0
  • pandas ==0.25.0
  • parmap ==1.5.2
  • pbs ==0.110
  • pytest ==5.1.1
  • pytest-cov ==2.7.1
  • python-coveralls ==2.9.3
  • scipy ==1.3.1
  • sh ==1.12.14
  • sqlitedict ==1.6.0
  • sympy ==1.0
  • tabulate ==0.7.7
  • tqdm ==4.19.8