pySRURGS - a python package for symbolic regression by uniform random global search - Published in JOSS (2019)

https://github.com/pysrurgs/pysrurgs

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 5 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: Links to joss.theoj.org, zenodo.org
○ Committers with academic emails
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords

binary-tree enumeration python python3 random symbolic-regression

Keywords from Contributors

genetic-programming global-optimization random-search tree-structure

Scientific Fields

Engineering Computer Science - 60% confidence

Mathematics Computer Science - 43% confidence

Last synced: 6 months ago

Repository

Symbolic regression by uniform random global search

Basic Info

Host: GitHub
Owner: pySRURGS
License: gpl-3.0
Language: Python
Default Branch: master
Homepage:
Size: 8.57 MB

Statistics

Stars: 13
Watchers: 3
Forks: 3
Open Issues: 4
Releases: 3

Created over 6 years ago · Last pushed over 2 years ago

Symbolic Regression by Uniform Random Global Search

Symbolic regression is a type of data analysis problem in which you search for the equation of best fit for a numerical dataset. This package performs that search by guessing candidate equations at random, each with uniform probability of selection, and evaluating them. The No Free Lunch Theorem argues that random search should be equivalent to other approaches, such as Genetic Programming, when algorithm performance is assessed over all possible problems. This software should be useful for data analysts and researchers working on symbolic regression problems.
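To make the idea of uniform random search concrete, here is a minimal sketch. It is not pySRURGS's implementation: it assumes a hypothetical toy search space of 27 candidate equations of the form op(a*x, b), and the names `BINARY_OPS`, `COEFFS`, and `uniform_random_search` are invented for illustration.

```python
import math
import random

# Hypothetical toy search space: f(x) = op(a * x, b), with a and b taken
# from a small grid. The real package searches a far richer space.
BINARY_OPS = {
    "add": lambda u, v: u + v,
    "sub": lambda u, v: u - v,
    "mul": lambda u, v: u * v,
}
COEFFS = [1.0, 2.0, 3.0]


def enumerate_candidates():
    """List every (operator, a, b) combination in the toy search space."""
    return [(name, a, b) for name in BINARY_OPS for a in COEFFS for b in COEFFS]


def evaluate(candidate, x):
    name, a, b = candidate
    return BINARY_OPS[name](a * x, b)


def mean_squared_error(candidate, xs, ys):
    return sum((evaluate(candidate, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def uniform_random_search(xs, ys, iters, seed=0):
    """Draw candidates uniformly at random and keep the best one seen."""
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    candidates = enumerate_candidates()
    best, best_err = None, math.inf
    for _ in range(iters):
        candidate = candidates[rng.randrange(len(candidates))]  # uniform pick
        err = mean_squared_error(candidate, xs, ys)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 3.0 for x in xs]  # data generated by add(2 * x, 3)
print(uniform_random_search(xs, ys, iters=500))
```

Every candidate has the same chance of being drawn on every iteration, so the search keeps no state beyond the best result seen so far.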

Features

Robust parameter fitting
Memoization for speed
Avoids many arithmetically equivalent equations
Loads data from CSV files
Results saved to a SQLite file
Results of new runs are added to results of old runs
User-specified number of fitting parameters
User-specified number of permitted unique binary trees, which determine the possible equation forms
User-specified permitted functions of arity 1 or 2
Can also run an exhaustive/brute-force search
Can be run in deterministic mode for reproducibility
Developed and tested on Python 3.6
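As a rough illustration of two items in the list above (memoization, and binary trees as equation forms), the sketch below counts full binary trees by number of leaves using a memoized recursion. It is an assumption-laden toy: the function name `count_full_binary_trees` is hypothetical, and the package's own tree enumeration, which also has to handle functions of arity one, is not reproduced here.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # memoize: each leaf count is computed only once
def count_full_binary_trees(leaves):
    """Count full binary trees (every node has 0 or 2 children) with `leaves` leaves."""
    if leaves < 1:
        raise ValueError("a tree needs at least one leaf")
    if leaves == 1:
        return 1
    # Split the leaves between the left and right subtrees in every possible way.
    return sum(
        count_full_binary_trees(k) * count_full_binary_trees(leaves - k)
        for k in range(1, leaves)
    )


print([count_full_binary_trees(n) for n in range(1, 6)])  # [1, 1, 2, 5, 14]
```

The counts grow quickly with tree size, which suggests why capping the number of permitted trees limits the complexity of the generated equations.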

Getting Started

pySRURGS is a Python 3 script. Download it and run it from a terminal.

Installing

Clone the repo then install the prerequisites.

git clone https://github.com/pySRURGS/pySRURGS.git
cd pySRURGS
pip install -r requirements.txt --user

Command line help

python3 pySRURGS.py -h

The above command should render the following:

```
usage: pySRURGS.py [-h] [-memoize_funcs] [-count] [-benchmarks]
                   [-deterministic] [-plotting] [-exhaustive]
                   [-funcs_arity_two FUNCS_ARITY_TWO]
                   [-funcs_arity_one FUNCS_ARITY_ONE]
                   [-max_num_fit_params MAX_NUM_FIT_PARAMS]
                   [-max_permitted_trees MAX_PERMITTED_TREES]
                   [-path_to_db PATH_TO_DB]
                   [-path_to_weights PATH_TO_WEIGHTS]
                   train iters

positional arguments:
  train                 absolute or relative file path to the csv file housing
                        the training data. The rightmost column of the CSV
                        file should be the dependent variable.
  iters                 the number of equations to be attempted in this run

optional arguments:
  -h, --help            show this help message and exit
  -memoize_funcs        memoize the computations. If you are running large
                        iters and you do not have massive ram, do not use this
                        option. (default: False)
  -count                Instead of doing symbolic regression, just count out
                        how many possible equations for this configuration. No
                        other processing performed. (default: False)
  -benchmarks           Instead of doing symbolic regression, generate the 100
                        benchmark problems. No other processing performed.
                        (default: False)
  -deterministic        If set, the pseudorandom number generator will act in
                        a predictable manner and pySRURGS will produce
                        reproducible results. (default: False)
  -plotting             plot the best model against the data to
                        ./image/plot.png and ./image/plot.svg - note only
                        works for univariate datasets (default: False)
  -exhaustive           instead of running pure random search, do an
                        exhaustive search. Be careful about running this as it
                        may run forever. iters gets ignored. (default: False)
  -funcs_arity_two FUNCS_ARITY_TWO
                        a comma separated string listing the functions of
                        arity two you want to be considered.
                        Permitted:add,sub,mul,div,pow (default:
                        add,sub,mul,div,pow)
  -funcs_arity_one FUNCS_ARITY_ONE
                        a comma separated string listing the functions of
                        arity one you want to be considered.
                        Permitted:sin,cos,tan,exp,log,sinh,cosh,tanh (default:
                        None)
  -max_num_fit_params MAX_NUM_FIT_PARAMS
                        the maximum number of fitting parameters permitted in
                        the generated models (default: 3)
  -max_permitted_trees MAX_PERMITTED_TREES
                        the number of unique binary trees that are permitted
                        in the generated models - binary trees define the form
                        of the equation, increasing this number tends to
                        increase the complexity of generated equations
                        (default: 1000)
  -path_to_db PATH_TO_DB
                        the absolute or relative path to the database file
                        where we will save results. If not set, will save
                        database file to ./db directory with same name as the
                        csv file. (default: None)
  -path_to_weights PATH_TO_WEIGHTS
                        the absolute or relative path to the CSV file where we
                        store the weights for each point in the dataset. The
                        CSV file should be a single column of non-negative
                        numerical data without a header. If not set, weights
                        are equal to one for all data points. (default: None)
```

Important details

All your data needs to be numeric. Your CSV file should have a header. Inside the csv, the dependent variable should be the rightmost column. Do not use special characters or spaces in variable names and start variable names with a letter (but not the letter 'p'). The fitting parameters are displayed as the letter 'p' followed by an integer.
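The rules above can be checked before starting a run. The following is a small sketch of such a check using only the standard library; `check_training_csv` and `VALID_NAME` are hypothetical names, not part of pySRURGS. It reads the naming rule strictly (letters and digits only, first character a letter other than lowercase 'p'), which is an assumption about what counts as a special character.

```python
import csv
import io
import re

# Assumed reading of the naming rule: a letter other than lowercase 'p',
# followed by letters or digits only.
VALID_NAME = re.compile(r"^[A-Za-oq-z][A-Za-z0-9]*$")


def check_training_csv(text):
    """Return a list of problems found in CSV text; an empty list means none."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        return ["expected a header row followed by at least one data row"]
    header, data = rows[0], rows[1:]
    problems = []
    for name in header:
        if not VALID_NAME.match(name):
            problems.append(f"invalid variable name: {name!r}")
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
            continue
        for value in row:
            try:
                float(value)  # every cell must parse as a number
            except ValueError:
                problems.append(f"line {line_no}: non-numeric value {value!r}")
    return problems


# The rightmost column (here 'y') is the dependent variable.
print(check_training_csv("x,y\n0.0,3.0\n1.0,5.0\n"))
print(check_training_csv("p1,y\n0.0,abc\n"))
```

The first call reports no problems; the second flags both the variable name starting with 'p' and the non-numeric cell.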

An example

A sample problem is provided. The filename denotes the true equation.

```
$ winpty python pySRURGS.py -max_num_fit_params 3 -max_permitted_trees 1000 -plotting ./csv/quartic_polynomial.csv 2000
Making sure we meet the iters value
Normalized Mean Squared Error       R^2  Equation, simplified                                                    Parameters
                  4.24058e-05  0.999999  ((p0 + p1)/(p0*x))**(-p0 + p1 + x) + (p0*x + p0 + p2)**(p2*x/p1)        4.47E+00,3.47E-01,2.25E-01
                  0.000141492  0.999996  (-2*p2*x*(p0*(p1 - x) + x) + (p0*p1*(p1 + x))**x*(p1 - x))/(p1 - x)     1.80E+00,1.34E+00,7.04E-02
                  0.000154517  0.999996  x*(p2/p0)**x*(-p0 + p2)**p2 - (x**p1)**p1 + 1                           4.72E-01,1.17E+00,1.69E+00
                  0.0001829    0.999995  -(p0 + x)*(p1*p2*(p2 - x) - (p2 - x)**x)/(p0**2*p1*p2)                  -2.11E+01,-9.23E-03,1.24E+01
                  0.00021193   0.999995  ((p1**x)**p1 + (x**(-p0 + p2))**p2)*((p1 + x**p2)**x)**(-p0*(p0 - p1))  4.09E-01,1.61E+00,1.75E+00
```
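For readers unfamiliar with the R^2 column, the sketch below computes the textbook coefficient of determination in plain Python. The exact formulas pySRURGS uses for its error columns are not shown in this README, so treat this as a general illustration rather than a reproduction of the package's numbers; `r_squared` is a hypothetical helper name.

```python
def r_squared(y_true, y_pred):
    """Textbook coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
print(r_squared(y_true, [1.0, 2.0, 3.0]))  # a perfect fit gives 1.0
print(r_squared(y_true, [2.0, 2.0, 2.0]))  # always predicting the mean gives 0.0
```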

[Figure: example performance]

Another example - using relative weights for data points

Suppose you value some data points more than others and want them to carry greater weight in the fit. You can specify a path to a weights CSV file, which tells the code the relative weight of each data point. An example in which some of the data points have zero weight and the remainder have equal weight is shown below and is included in the repository.

winpty python pySRURGS.py -plotting -path_to_weights ./csv/weights.csv ./csv/weights_data.csv 300

[Figure: plot for data point weights feature]
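To illustrate what relative weights mean, here is a minimal sketch of a weighted mean squared error, plus a helper that serializes weights in the single-column, header-free layout described in the command line help. How pySRURGS applies the weights internally is not described here, so the formula is an assumption (a standard weighted mean), and `weighted_mse` and `weights_column_text` are hypothetical names.

```python
def weighted_mse(y_true, y_pred, weights):
    """Weighted mean of squared errors; a zero weight ignores that data point."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred)) / total


def weights_column_text(weights):
    """Serialize weights as one value per line, with no header row."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return "".join(f"{w}\n" for w in weights)


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]  # only the last point is mispredicted
print(weighted_mse(y_true, y_pred, [1.0, 1.0, 1.0]))  # all points count equally
print(weighted_mse(y_true, y_pred, [1.0, 1.0, 0.0]))  # last point ignored
print(weights_column_text([1.0, 1.0, 0.0]), end="")
```

With equal weights the mispredicted point contributes to the error; giving it zero weight removes its influence entirely.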

Database file

The database file is in SQLite3 format, and we access it using the SqliteDict package. For example, if we have already run some computations against the quartic_polynomial example, then we can run the following to inspect the results.

```
import pySRURGS
from result_class import Result  # Result needs to be in the namespace.
from sqlitedict import SqliteDict

path_to_csv = './csv/quartic_polynomial.csv'
path_to_db = './db/quartic_polynomial.db'
SR_config = pySRURGS.SymbolicRegressionConfig(path_to_csv, path_to_db)
result_list = pySRURGS.get_resultlist(SR_config)
result_list.sort()
# after running sort, zero^th element is the best result
best_result = result_list._results[0]
print("R^2:", best_result._R2,
      "Equation:", best_result._simple_equation,
      "Unsimplified Equation:", best_result._equation)
# `dataset` is not defined in this excerpt; it must be the dataset object for path_to_csv
result_list.print(dataset._y_data)
```
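Because the results file is an ordinary SQLite database, it can also be given a quick structural check with the standard library alone. The sketch below lists each table and its row count; `summarize_sqlite_file` is a hypothetical helper, and the table names that pySRURGS writes are not assumed. Values stored through SqliteDict are serialized by default, so decoding them still requires SqliteDict and the `Result` class as shown above.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def summarize_sqlite_file(path_to_db):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Note: sqlite3.connect creates an empty file if the path does not exist.
    with closing(sqlite3.connect(path_to_db)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        summary = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier safely
            summary[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return summary


# Demonstration on a throwaway database rather than a real results file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.db")
    with closing(sqlite3.connect(demo_path)) as conn:
        conn.execute("CREATE TABLE demo (key TEXT PRIMARY KEY, value BLOB)")
        conn.executemany("INSERT INTO demo VALUES (?, ?)", [("a", b"1"), ("b", b"2")])
        conn.commit()
    print(summarize_sqlite_file(demo_path))  # {'demo': 2}
```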

API

Documentation

Author

Sohrab Towfighi

License

This project is licensed under the GPL 3.0 License. See the LICENSE file for details.

How to Cite

If you use this software in your research, then please cite our papers.

Towfighi (2019). pySRURGS - a python package for symbolic regression by uniform random global search. Journal of Open Source Software, 4(41), 1675, https://doi.org/10.21105/joss.01675

Towfighi (2020). Symbolic Regression by Uniform Random Global Search. SN Applied Sciences 2: 34. https://doi.org/10.1007/s42452-019-1734-3

Community

If you would like to contribute to the project or you need help, then please create an issue.

For community-suggested changes, I will comment on whether the suggestion falls within the scope of the project. If both parties agree, whoever is interested in developing the changes can make a pull request, or I will implement the suggested changes.

Acknowledgments

Luther Tychonievich created the algorithm mapping integers to full binary trees: link, web archived link.
The icon is from the GNOME desktop icons project and the respective artists. Taken from link, web archived link. License: GPL version 2.0.

Owner

Login: pySRURGS
Kind: user
Location: Vancouver, Canada

Repositories: 5
Profile: https://github.com/pySRURGS

This is a projects page by Sohrab Towfighi.

JOSS Publication

pySRURGS - a python package for symbolic regression by uniform random global search

Published

September 20, 2019

DOI

10.21105/joss.01675

Volume 4, Issue 41, Page 1675

Authors

Sohrab Towfighi

University of Toronto, Faculty of Medicine

Editor

Monica Bobra

Committers

Last synced: 7 months ago

All Time

Total Commits: 1,724
Total Committers: 5
Avg Commits per committer: 344.8
Development Distribution Score (DDS): 0.292

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
sohrabtowfighi	s**i@h**m	1,220
pySRURGS	5****S	388
sohrabtowfighi	s**i@g**m	104
Ubuntu	t**d@t**t	8
Sohrab Towfighi	T**S@S**T	4

Committer Domains (Top 20 + Academic)

smh.smhroot.net: 1
test.ejfpszihzt5ujgj52edlizyp1b.bx.internal.cloudapp.net: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 21
Total pull requests: 3
Average time to close issues: about 2 months
Average time to close pull requests: N/A
Total issue authors: 9
Total pull request authors: 1
Average comments per issue: 3.24
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 3

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

pySRURGS (8)
hbarthels (4)
BisUndefined (3)
sohrabtowfighi (1)
Lolojr (1)
Fabssbtr (1)
qinchunxiong (1)
anthonyrollett (1)
AoifeHughes (1)

Pull Request Authors

dependabot[bot] (3)

Top Labels

Issue Labels

enhancement (4)
refactor (1)
help wanted (1)
good first issue (1)

Pull Request Labels

dependencies (3)

Dependencies

experiments/requirements.txt pypi

deap *
dropbox *
pymysql *
scoop *
sh *
sshtunnel *

requirements.txt pypi

lmfit *
matplotlib *
mpmath *
numpy *
pandas *
parmap *
pbs *
pytest *
pytest-cov *
python-coveralls *
scipy *
sh *
sqlitedict *
sympy *
tabulate *
tqdm *

requirements_versions.txt pypi

lmfit ==0.9.13
matplotlib ==3.1.1
mpmath ==0.19
numpy ==1.17.0
pandas ==0.25.0
parmap ==1.5.2
pbs ==0.110
pytest ==5.1.1
pytest-cov ==2.7.1
python-coveralls ==2.9.3
scipy ==1.3.1
sh ==1.12.14
sqlitedict ==1.6.0
sympy ==1.0
tabulate ==0.7.7
tqdm ==4.19.8
