pySRURGS - a python package for symbolic regression by uniform random global search
pySRURGS - a python package for symbolic regression by uniform random global search - Published in JOSS (2019)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 5 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
Symbolic regression by uniform random global search
Statistics
- Stars: 13
- Watchers: 3
- Forks: 3
- Open Issues: 4
- Releases: 3
README.md

Symbolic Regression by Uniform Random Global Search
Symbolic regression is a data analysis problem in which you search for the equation that best fits a numerical dataset. This package tackles it by guessing candidate solutions at random, with every candidate equally likely to be selected, and evaluating each one. The No Free Lunch Theorem argues that, when performance is averaged over all possible problems, random search should be equivalent to other approaches such as Genetic Programming. This software should be useful for data analysts and researchers working on symbolic regression problems.
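To make the idea concrete, here is a minimal sketch of uniform random search. It is not pySRURGS's implementation: it assumes a tiny hypothetical candidate list (`CANDIDATES`) with no fitting parameters, whereas pySRURGS builds its search space from binary trees, user-selected functions and fitted parameters. Each iteration draws a candidate index with equal probability, scores it by mean squared error, caches the score (a simple stand-in for memoization), and keeps the best; the seeded generator makes the run reproducible.

```python
import math
import random

# Hypothetical candidate space: (readable equation, function of x).
# pySRURGS generates its space from binary trees; this list only stands in for it.
CANDIDATES = [
    ("x", lambda x: x),
    ("x**2", lambda x: x**2),
    ("x**3", lambda x: x**3),
    ("x**4 + x", lambda x: x**4 + x),
    ("sin(x)", math.sin),
    ("exp(x)", math.exp),
]


def mse(f, xs, ys):
    """Mean squared error of candidate f on the data."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def uniform_random_search(xs, ys, iters, seed=0):
    """Draw candidates uniformly at random and keep the best one by MSE."""
    rng = random.Random(seed)  # fixed seed -> reproducible run
    cache = {}  # memoize scores so repeated draws are not re-evaluated
    best = None
    for _ in range(iters):
        i = rng.randrange(len(CANDIDATES))  # every index equally likely
        if i not in cache:
            _, f = CANDIDATES[i]
            cache[i] = mse(f, xs, ys)
        if best is None or cache[i] < cache[best]:
            best = i
    return CANDIDATES[best][0], cache[best]
```

With `xs = [0.1 * k for k in range(1, 21)]` and `ys = [x**4 + x for x in xs]`, `uniform_random_search(xs, ys, 500)` returns `("x**4 + x", 0.0)`: with 500 uniform draws from six candidates, the exact match is all but certain to be sampled.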
Features
- Robust parameter fitting
- Memoization for speed
- Avoids many arithmetically equivalent equations
- Loads data from CSV files
- Results saved to SQLite file
- Results of new runs are added to results of old runs
- User specified number of fitting parameters
- User specified number of permitted unique binary trees, which determine the possible equation forms
- User specified permitted functions of arity 1 or 2
- Can also run an exhaustive/brute-force search
- Can be run in deterministic mode for reproducibility
- Developed and tested on Python 3.6
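The equation forms referred to in the feature list come from full binary trees: internal nodes hold arity-two functions and leaves hold variables or fitting parameters. The sketch below is illustrative only and is not the integer-to-tree mapping pySRURGS uses. It enumerates every full binary tree shape with a given number of internal nodes (the counts follow the Catalan numbers 1, 1, 2, 5, 14, ...), renders a shape as an equation, and counts labelled equations in a way loosely analogous to the `-count` option. The helper names are hypothetical, and the count ignores arity-one functions and the removal of arithmetically equivalent equations.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def tree_shapes(n_internal):
    """All full binary tree shapes with n_internal internal nodes.

    A leaf is the string "L"; an internal node is a (left, right) tuple.
    """
    if n_internal == 0:
        return ("L",)
    shapes = []
    # Split the remaining n_internal - 1 internal nodes between the subtrees.
    for k in range(n_internal):
        for left in tree_shapes(k):
            for right in tree_shapes(n_internal - 1 - k):
                shapes.append((left, right))
    return tuple(shapes)


def count_equations(n_internal, n_binary_ops, n_leaf_symbols):
    """Count labelled equations: every shape, operator choice and leaf choice.

    A full binary tree with n internal nodes has n + 1 leaves.
    """
    n_shapes = len(tree_shapes(n_internal))
    return n_shapes * n_binary_ops ** n_internal * n_leaf_symbols ** (n_internal + 1)


def to_infix(shape, ops, leaves):
    """Render a shape as an equation, filling operators and leaves left to right."""
    ops_iter, leaf_iter = iter(ops), iter(leaves)

    def walk(node):
        if node == "L":
            return next(leaf_iter)
        left = walk(node[0])
        op = next(ops_iter)
        right = walk(node[1])
        return f"({left} {op} {right})"

    return walk(shape)
```

For example, `count_equations(2, 5, 2)` gives 2 × 25 × 8 = 400 equations for two internal nodes, five binary operators and two leaf symbols (say `x` and `p0`), which shows how quickly the space grows and why the number of permitted trees is capped. `to_infix(tree_shapes(2)[1], ["*", "+"], ["x", "p0", "p1"])` renders `((x * p0) + p1)`.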
Getting Started
pySRURGS is a Python 3 script: download it and run it from a terminal.
Installing
Clone the repository, then install the prerequisites.
```
git clone https://github.com/pySRURGS/pySRURGS.git
cd pySRURGS
pip install -r requirements.txt --user
```
Command line help
```
python3 pySRURGS.py -h
```
The above command should print the following:
```
usage: pySRURGS.py [-h] [-memoize_funcs] [-count] [-benchmarks]
                   [-deterministic] [-plotting] [-exhaustive]
                   [-funcs_arity_two FUNCS_ARITY_TWO]
                   [-funcs_arity_one FUNCS_ARITY_ONE]
                   [-max_num_fit_params MAX_NUM_FIT_PARAMS]
                   [-max_permitted_trees MAX_PERMITTED_TREES]
                   [-path_to_db PATH_TO_DB]
                   [-path_to_weights PATH_TO_WEIGHTS]
                   train iters

positional arguments:
  train                 absolute or relative file path to the csv file housing
                        the training data. The rightmost column of the CSV
                        file should be the dependent variable.
  iters                 the number of equations to be attempted in this run

optional arguments:
  -h, --help            show this help message and exit
  -memoize_funcs        memoize the computations. If you are running large
                        iters and you do not have massive ram, do not use
                        this option. (default: False)
  -count                Instead of doing symbolic regression, just count out
                        how many possible equations for this configuration. No
                        other processing performed. (default: False)
  -benchmarks           Instead of doing symbolic regression, generate the 100
                        benchmark problems. No other processing performed.
                        (default: False)
  -deterministic        If set, the pseudorandom number generator will act in
                        a predictable manner and pySRURGS will produce
                        reproducible results. (default: False)
  -plotting             plot the best model against the data to
                        ./image/plot.png and ./image/plot.svg - note only
                        works for univariate datasets (default: False)
  -exhaustive           instead of running pure random search, do an
                        exhaustive search. Be careful about running this as it
                        may run forever. iters gets ignored. (default:
                        False)
  -funcs_arity_two FUNCS_ARITY_TWO
                        a comma separated string listing the functions of
                        arity two you want to be considered.
                        Permitted:add,sub,mul,div,pow (default:
                        add,sub,mul,div,pow)
  -funcs_arity_one FUNCS_ARITY_ONE
                        a comma separated string listing the functions of
                        arity one you want to be considered.
                        Permitted:sin,cos,tan,exp,log,sinh,cosh,tanh (default:
                        None)
  -max_num_fit_params MAX_NUM_FIT_PARAMS
                        the maximum number of fitting parameters permitted in
                        the generated models (default: 3)
  -max_permitted_trees MAX_PERMITTED_TREES
                        the number of unique binary trees that are permitted
                        in the generated models - binary trees define the form
                        of the equation, increasing this number tends to
                        increase the complexity of generated equations
                        (default: 1000)
  -path_to_db PATH_TO_DB
                        the absolute or relative path to the database file
                        where we will save results. If not set, will save
                        database file to ./db directory with same name as the
                        csv file. (default: None)
  -path_to_weights PATH_TO_WEIGHTS
                        the absolute or relative path to the CSV file where we
                        store the weights for each point in the dataset. The
                        CSV file should be a single column of non-negative
                        numerical data without a header. If not set, weights
                        are equal to one for all data points. (default: None)
```
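The two function-list options each take a comma-separated string. As a hedged sketch (the helper `parse_function_list` is hypothetical and not part of pySRURGS), the following checks such a string against the permitted names printed in the help text above before you launch a long run. Whether pySRURGS itself tolerates spaces around the commas is not stated, so it is safest to pass the names without spaces.

```python
# Permitted names, copied from the help text above.
ARITY_TWO = {"add", "sub", "mul", "div", "pow"}
ARITY_ONE = {"sin", "cos", "tan", "exp", "log", "sinh", "cosh", "tanh"}


def parse_function_list(text, permitted):
    """Split a comma-separated function list and reject unknown names."""
    names = [name.strip() for name in text.split(",") if name.strip()]
    unknown = [name for name in names if name not in permitted]
    if unknown:
        raise ValueError(f"not permitted: {', '.join(unknown)}")
    return names
```

For instance, `parse_function_list("add,sub,pow", ARITY_TWO)` returns `["add", "sub", "pow"]`, while `parse_function_list("add,sqrt", ARITY_TWO)` raises `ValueError` because `sqrt` is not on the permitted list.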
Important details
All your data must be numeric. Your CSV file should have a header row, and the dependent variable should be the rightmost column. Do not use special characters or spaces in variable names, and start each variable name with a letter other than 'p': the fitting parameters are displayed as the letter 'p' followed by an integer, so a variable starting with 'p' could be confused with them.
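A quick pre-flight check can catch these problems before a run. This is a minimal standard-library sketch: the function name `check_training_csv` is hypothetical, and the naming rule it enforces (a letter other than 'p' first, then letters or digits only) is our reading of the guidance above, not a rule taken from pySRURGS's source. A missing header shows up as "bad variable name" errors, because numbers fail the naming rule.

```python
import csv
import io
import re

# Assumed naming rule: a letter first, then letters or digits only.
NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")


def check_training_csv(text):
    """Return a list of problems found in CSV text; an empty list means OK."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    for name in header:
        if not NAME_RE.match(name):
            problems.append(f"bad variable name: {name!r}")
        elif name.startswith("p"):  # 'p' is reserved for fitting parameters
            problems.append(f"variable name starts with 'p': {name!r}")
    for line_no, row in enumerate(data, start=2):  # line 1 is the header
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} columns")
            continue
        for value in row:
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: non-numeric value {value!r}")
    return problems
```

Usage: `check_training_csv(open("my_data.csv").read())` (hypothetical filename) returns `[]` for a well-formed file; for a header such as `pressure,y` it reports that `pressure` starts with 'p'.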
An example
A sample problem is included in the repository; its filename indicates the true equation, a quartic polynomial.
```
$ winpty python pySRURGS.py -max_num_fit_params 3 -max_permitted_trees 1000 -plotting ./csv/quartic_polynomial.csv 2000
Making sure we meet the iters value
Normalized Mean Squared Error  R^2  Equation, simplified  Parameters
4.24058e-05 0.999999 ((p0 + p1)/(p0*x))**(-p0 + p1 + x) + (p0*x + p0 + p2)**(p2*x/p1) 4.47E+00,3.47E-01,2.25E-01
0.000141492 0.999996 (-2*p2*x*(p0*(p1 - x) + x) + (p0*p1*(p1 + x))**x*(p1 - x))/(p1 - x) 1.80E+00,1.34E+00,7.04E-02
0.000154517 0.999996 x*(p2/p0)**x*(-p0 + p2)**p2 - (x**p1)**p1 + 1 4.72E-01,1.17E+00,1.69E+00
0.0001829 0.999995 -(p0 + x)*(p1*p2*(p2 - x) - (p2 - x)**x)/(p0**2*p1*p2) -2.11E+01,-9.23E-03,1.24E+01
0.00021193 0.999995 ((p1**x)**p1 + (x**(-p0 + p2))**p2)*((p1 + x**p2)**x)**(-p0*(p0 - p1)) 4.09E-01,1.61E+00,1.75E+00
```
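The R^2 column is the coefficient of determination. Assuming pySRURGS uses the standard definition, it can be computed from observed values and model predictions as 1 - SS_res / SS_tot, as in the small illustrative sketch below (not pySRURGS's code). The normalized mean squared error column uses pySRURGS's own normalization, which this sketch does not attempt to reproduce.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y is constant")
    return 1.0 - ss_res / ss_tot
```

A value of 1 means the model reproduces the data exactly, and a model that always predicts the mean of y scores 0: `r_squared([1, 2, 3], [2, 2, 2])` returns `0.0`. That is why the near-1 values in the table indicate close fits.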
Another example - using relative weights for data points
Suppose you value some data points more than others and want those points to carry more weight in the fit. You can pass the path to a weights CSV file, which tells the code the relative weight of each data point. The repository includes an example in which some data points have zero weight and the rest have equal weight; run it as follows:
```
winpty python pySRURGS.py -plotting -path_to_weights ./csv/weights.csv ./csv/weights_data.csv 300
```
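The weights file is one non-negative number per line with no header, presumably in the same row order as the training data. The sketch below (helper names are hypothetical) writes such a file and shows one common way weights enter a fit: a weighted mean squared error in which zero-weight points drop out entirely. How pySRURGS applies the weights internally is not described here, so treat `weighted_mse` as an illustration of the idea rather than its exact objective.

```python
import csv


def write_weights(weights, stream):
    """Write weights as a single header-less column, one value per row."""
    writer = csv.writer(stream)
    for w in weights:
        if w < 0:
            raise ValueError(f"weights must be non-negative, got {w}")
        writer.writerow([w])


def weighted_mse(y_true, y_pred, weights):
    """Weighted mean squared error: sum(w * r**2) / sum(w)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * (y - f) ** 2 for y, f, w in zip(y_true, y_pred, weights)) / total
```

Usage: `with open("my_weights.csv", "w", newline="") as fh: write_weights([0, 0, 1, 1], fh)` (hypothetical filename). With those weights, `weighted_mse([1, 2, 3, 4], [9, 9, 3, 5], [0, 0, 1, 1])` returns `0.5`: the two badly fitted zero-weight points do not count at all.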

Database file
The database file is in SQLite 3 format, and we access it using the SqliteDict package. For example, if we have already run some computations against the quartic_polynomial example, we can run the following to inspect the results.
```
import pySRURGS
from resultclass import Result  # Result needs to be in the namespace.
from sqlitedict import SqliteDict

path_to_csv = './csv/quartic_polynomial.csv'
path_to_db = './db/quartic_polynomial.db'
SRconfig = pySRURGS.SymbolicRegressionConfig(path_to_csv, path_to_db)
resultlist = pySRURGS.getresultlist(SRconfig)
resultlist.sort()
# after running sort, the zeroth element is the best result
bestresult = resultlist.results[0]
print("R^2:", bestresult.R2, "Equation:", bestresult.simpleequation,
      "Unsimplified Equation:", bestresult.equation)
# `dataset` (the loaded training data) is not defined in this snippet;
# it must be created before this call.
resultlist.print(dataset.y_data)
```
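If you only want a quick look at what a results file contains without importing pySRURGS, the standard-library `sqlite3` module can list its tables. This is a sketch with a hypothetical function name: it reports only table names and row counts, because SqliteDict stores values as pickled Python objects by default, and decoding them needs SqliteDict (and the `Result` class). The file is opened read-only, so the check cannot alter your accumulated results.

```python
import sqlite3
from pathlib import Path


def describe_sqlite(path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Open read-only via a URI so a typo in the path raises an error
    # instead of silently creating a new, empty database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # safe identifier quoting
            counts[name] = con.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        con.close()
```

Usage: `describe_sqlite('./db/quartic_polynomial.db')` returns a dictionary mapping each table in the file to its number of rows.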
Author
Sohrab Towfighi
License
This project is licensed under the GPL 3.0 License - see the LICENSE file for details
How to Cite
If you use this software in your research, then please cite our papers.
Towfighi (2019). pySRURGS - a python package for symbolic regression by uniform random global search. Journal of Open Source Software, 4(41), 1675, https://doi.org/10.21105/joss.01675
Towfighi (2020). Symbolic Regression by Uniform Random Global Search. SN Applied Sciences, 2, 34, https://doi.org/10.1007/s42452-019-1734-3
Community
If you would like to contribute to the project or you need help, then please create an issue.
For community-suggested changes, I will comment on whether they fall within the scope of the project. If both parties agree, whoever is interested can develop the changes and open a pull request, or I will implement them myself.
Acknowledgments
- Luther Tychonievich created the algorithm mapping integers to full binary trees: link, web archived link.
- The icon is from the GNOME desktop icons project and the respective artists. Taken from link, web archived link. License: GPL version 2.0.
Owner
- Login: pySRURGS
- Kind: user
- Location: Vancouver, Canada
- Repositories: 5
- Profile: https://github.com/pySRURGS
This is a projects page by Sohrab Towfighi.
JOSS Publication
pySRURGS - a python package for symbolic regression by uniform random global search
Tags
symbolic regression, regression analysis, random search, genetic programming
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| sohrabtowfighi | s****i@h****m | 1,220 |
| pySRURGS | 5****S | 388 |
| sohrabtowfighi | s****i@g****m | 104 |
| Ubuntu | t****d@t****t | 8 |
| Sohrab Towfighi | T****S@S****T | 4 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 21
- Total pull requests: 3
- Average time to close issues: about 2 months
- Average time to close pull requests: N/A
- Total issue authors: 9
- Total pull request authors: 1
- Average comments per issue: 3.24
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- pySRURGS (8)
- hbarthels (4)
- BisUndefined (3)
- sohrabtowfighi (1)
- Lolojr (1)
- Fabssbtr (1)
- qinchunxiong (1)
- anthonyrollett (1)
- AoifeHughes (1)
Pull Request Authors
- dependabot[bot] (3)
Dependencies
- deap *
- dropbox *
- pymysql *
- scoop *
- sh *
- sshtunnel *
- lmfit *
- matplotlib *
- mpmath *
- numpy *
- pandas *
- parmap *
- pbs *
- pytest *
- pytest-cov *
- python-coveralls *
- scipy *
- sh *
- sqlitedict *
- sympy *
- tabulate *
- tqdm *
- lmfit ==0.9.13
- matplotlib ==3.1.1
- mpmath ==0.19
- numpy ==1.17.0
- pandas ==0.25.0
- parmap ==1.5.2
- pbs ==0.110
- pytest ==5.1.1
- pytest-cov ==2.7.1
- python-coveralls ==2.9.3
- scipy ==1.3.1
- sh ==1.12.14
- sqlitedict ==1.6.0
- sympy ==1.0
- tabulate ==0.7.7
- tqdm ==4.19.8
