SymbolicRegression
Distributed High-Performance Symbolic Regression in Julia
Science Score: 64.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ✓ Committers with academic emails: 4 of 23 committers (17.4%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.9%) to scientific vocabulary
Repository
Distributed High-Performance Symbolic Regression in Julia
Basic Info
- Host: GitHub
- Owner: MilesCranmer
- License: apache-2.0
- Language: Julia
- Default Branch: master
- Homepage: https://ai.damtp.cam.ac.uk/symbolicregression/dev
- Size: 29.6 MB
Statistics
- Stars: 723
- Watchers: 16
- Forks: 112
- Open Issues: 39
- Releases: 175
Metadata Files
README.md
Contents:
- Quickstart
- Constructing expressions
- Exporting to SymbolicUtils.jl
- Contributors ✨
- Code structure
- Search options
Quickstart
Install in Julia with:
```julia
using Pkg
Pkg.add("SymbolicRegression")
```
MLJ Interface
The easiest way to use SymbolicRegression.jl is with MLJ. Let's see an example:
```julia
import SymbolicRegression: SRRegressor
import MLJ: machine, fit!, predict, report

# Dataset with two named features:
X = (a = rand(500), b = rand(500))

# and one target:
y = @. 2 * cos(X.a * 23.5) - X.b ^ 2

# with some noise:
y = y .+ randn(500) .* 1e-3

model = SRRegressor(
    niterations=50,
    binary_operators=[+, -, *],
    unary_operators=[cos],
)
```
Now, let's create and train this model on our data:
```julia
mach = machine(model, X, y)

fit!(mach)
```
You will notice that expressions are printed
using the column names of our table. If,
instead of a table-like object,
a simple array is passed
(e.g., X=randn(100, 2)),
x1, ..., xn will be used for variable names.
Let's look at the expressions discovered:
```julia
report(mach)
```
Finally, we can make predictions with the expressions on new data:
```julia
predict(mach, X)
```
This will make predictions using the expression
selected by model.selection_method,
which by default is a mix of accuracy and complexity.
You can override this selection and select an equation from the Pareto front manually with:
```julia
predict(mach, (data=X, idx=2))
```
where here we choose to evaluate the second equation.
For fitting multiple outputs, one can use MultitargetSRRegressor
(and pass an array of indices to idx in predict for selecting specific equations).
For a full list of options available to each regressor, see the API page.
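To give a feel for what "a mix of accuracy and complexity" can mean, here is a minimal sketch in plain Julia. The scoring rule is an assumption, modelled on the heuristic PySR documents for its "best" model selection (highest score among candidates whose loss is within 1.5x of the lowest loss); it is not taken from this README, and the numbers are made up for illustration.

```julia
# Hypothetical Pareto front, ordered by increasing complexity.
complexities = [1, 3, 5, 8]
losses       = [2.0, 0.5, 0.45, 0.44]

# Assumed score: drop in log-loss per unit of added complexity (first entry scores 0).
scores = zeros(length(losses))
for i in 2:length(losses)
    scores[i] = -(log(losses[i]) - log(losses[i - 1])) / (complexities[i] - complexities[i - 1])
end

# Among candidates whose loss is within 1.5x of the best loss, take the highest score.
threshold = 1.5 * minimum(losses)
eligible = findall(<=(threshold), losses)
best = eligible[argmax(scores[eligible])]   # index of the selected candidate
```

With these numbers the second candidate is selected: it gives most of the accuracy gain for little added complexity, while the later candidates improve the loss only marginally.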
Low-Level Interface
The heart of SymbolicRegression.jl is the
equation_search function.
This takes a 2D array and attempts
to model a 1D array using analytic functional forms.
Note: unlike the MLJ interface,
this assumes column-major input of shape [features, rows].
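If your data is already in the table form used by the MLJ interface, it has to be rearranged before being passed to the low-level interface. A minimal sketch using only Base Julia (the variable names are illustrative):

```julia
# Column table as used by the MLJ interface: each field holds one feature.
X_table = (a = rand(500), b = rand(500))

# Stack the feature columns side by side: a 500×2 matrix (rows × features).
X_rows = hcat(X_table.a, X_table.b)

# Swap the axes to get the [features, rows] layout: a 2×500 matrix.
X_low_level = permutedims(X_rows)

size(X_rows)       # (500, 2)
size(X_low_level)  # (2, 500)
```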
```julia
import SymbolicRegression: Options, equation_search

X = randn(2, 100)
y = 2 * cos.(X[2, :]) + X[1, :] .^ 2 .- 2

options = Options(
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
    populations=20
)

hall_of_fame = equation_search(
    X, y, niterations=40, options=options,
    parallelism=:multithreading
)
```
You can view the resultant equations in the dominating Pareto front (best expression seen at each complexity) with:
```julia
import SymbolicRegression: calculate_pareto_frontier

dominating = calculate_pareto_frontier(hall_of_fame)
```
This is a vector of PopMember objects, each of which contains the expression along with its cost.
We can get the expressions with:
```julia
trees = [member.tree for member in dominating]
```
Each of these equations is an Expression{T} type for some constant type T (like Float32).
These expression objects are callable – you can simply pass in data:
```julia
tree = trees[end]
output = tree(X)
```
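To make the idea of a dominating Pareto front concrete, the sketch below filters a list of hypothetical (complexity, loss) pairs in plain Julia. It illustrates the concept only and is not the package's implementation; `dominating_front` and the candidate values are invented for this example, and it assumes at most one candidate per complexity.

```julia
# Hypothetical stand-in for a search result: one (complexity, loss) pair per candidate.
candidates = [
    (complexity = 1, loss = 4.0),
    (complexity = 3, loss = 1.5),
    (complexity = 5, loss = 2.0),   # worse than the simpler complexity-3 candidate
    (complexity = 7, loss = 0.1),
]

# Keep a candidate only if its loss is lower than that of every simpler candidate kept so far.
function dominating_front(candidates)
    front = eltype(candidates)[]
    for c in sort(candidates; by = c -> c.complexity)
        if isempty(front) || c.loss < front[end].loss
            push!(front, c)
        end
    end
    return front
end

front = dominating_front(candidates)
[c.complexity for c in front]  # [1, 3, 7]
```

The complexity-5 candidate is dropped because a simpler expression already achieves a lower loss.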
Constructing expressions
Expressions are represented under-the-hood as the Node type which is developed
in the DynamicExpressions.jl package.
The Expression type wraps this and includes metadata about operators and variable names.
You can manipulate and construct expressions directly. For example:
```julia
using SymbolicRegression: Options, Expression, Node

options = Options(;
    binary_operators=[+, -, *, /],
    unary_operators=[cos, exp, sin]
)
operators = options.operators
variable_names = ["x1", "x2", "x3"]
x1, x2, x3 = [Expression(Node(Float64; feature=i); operators, variable_names) for i=1:3]

tree = cos(x1 - 3.2 * x2) - x1 * x1
```
This tree has Float64 constants, so the type of the entire tree
will be promoted to Node{Float64}.
We can convert all constants (recursively) to Float32:
```julia
float32_tree = convert(Expression{Float32}, tree)
```
We can then evaluate this tree on a dataset:
```julia
X = rand(Float32, 3, 100)

tree(X)
```
This callable format is the easy-to-use version which will
automatically set all values to NaN if there were any
Inf or NaN during evaluation. You can call the raw evaluation
method with eval_tree_array:
```julia
using SymbolicRegression: eval_tree_array

output, did_succeed = eval_tree_array(tree, X)
```
where did_succeed explicitly declares whether the evaluation was successful.
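The relationship between the two calling styles can be sketched in plain Julia. The helper `nan_on_failure` below is hypothetical and only mirrors the behaviour described above (all values become NaN when evaluation did not succeed); it is not a function of the package.

```julia
# Hypothetical helper: return the raw output on success, all-NaN output on failure.
function nan_on_failure(output::AbstractVector{T}, did_succeed::Bool) where {T<:AbstractFloat}
    return did_succeed ? output : fill(T(NaN), length(output))
end

# With the package loaded, the inputs would come from the raw evaluation:
#     output, did_succeed = eval_tree_array(tree, X)
nan_on_failure([1.0, 2.0], true)    # [1.0, 2.0]
nan_on_failure([1.0, Inf], false)   # [NaN, NaN]
```

Working with the raw `(output, did_succeed)` pair is useful when you want to branch on failure explicitly instead of scanning the output for NaN.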
Exporting to SymbolicUtils.jl
We can view the equations in the dominating Pareto frontier with:
```julia
dominating = calculate_pareto_frontier(hall_of_fame)
```
We can convert the best equation to SymbolicUtils.jl with the following function:
```julia
import SymbolicRegression: node_to_symbolic
using SymbolicUtils: simplify

eqn = node_to_symbolic(dominating[end].tree)
println(simplify(eqn*5 + 3))
```
We can also print out the full pareto frontier like so:
```julia
import SymbolicRegression: compute_complexity, string_tree

println("Complexity\tMSE\tEquation")

for member in dominating
    complexity = compute_complexity(member, options)
    loss = member.loss
    string = string_tree(member.tree, options)

    println("$(complexity)\t$(loss)\t$(string)")
end
```
Contributors ✨
We are eager to welcome new contributors! If you have an idea for a new feature, don't hesitate to share it on the issues page or forums.
- Mark Kittisopikul 💻 💡 🚇 📦 📣 👀 🔧 ⚠️
- T Coxon 🐛 💻 🔌 💡 🚇 🚧 👀 🔧 ⚠️ 📓
- Dhananjay Ashok 💻 🌍 💡 🚧 ⚠️
- Johan Blåbäck 🐛 💻 💡 🚧 📣 👀 ⚠️ 📓
- JuliusMartensen 🐛 💻 📖 🔌 💡 🚇 🚧 📦 📣 👀 🔧 📓
- ngam 💻 🚇 📦 👀 🔧 ⚠️
- Kaze Wong 🐛 💻 💡 🚇 🚧 📣 👀 🔬 📓
- Christopher Rackauckas 🐛 💻 🔌 💡 🚇 📣 👀 🔬 🔧 ⚠️ 📓
- Patrick Kidger 🐛 💻 📖 🔌 💡 🚧 📣 👀 🔬 🔧 ⚠️ 📓
- Okon Samuel 🐛 💻 📖 🚧 💡 🚇 👀 ⚠️ 📓
- William Booth-Clibborn 💻 🌍 📖 📓 🚧 👀 🔧 ⚠️
- Pablo Lemos 🐛 💡 📣 👀 🔬 📓
- Jerry Ling 🐛 💻 📖 🌍 💡 📣 👀 📓
- Charles Fox 🐛 💻 💡 🚧 📣 👀 🔬 📓
- Johann Brehmer 💻 📖 💡 📣 👀 🔬 ⚠️ 📓
- Marius Millea 💻 💡 📣 👀 📓
- Coba 🐛 💻 💡 👀 📓
- Pietro Monticone 🐛 📖 💡
- Mateusz Kubica 📖 💡
- Jay Wadekar 🐛 💡 📣 🔬
- Anthony Blaom, PhD 🚇 💡 👀
- Jgmedina95 🐛 💡 👀
- Michael Abbott 💻 💡 👀 🔧
- Oscar Smith 💻 💡
- Eric Hanson 💡 📣 📓
- Henrique Becker 💻 💡 👀
- qwertyjl 🐛 📖 💡 📓
- Rik Huijzer 💡 🚇
- Hongyu Wang 💡 📣 🔬
- Saurav Maheshkar 🔧
Code structure
SymbolicRegression.jl is organized roughly as follows. Rounded rectangles indicate objects, and rectangles indicate functions.
(if you can't see this diagram being rendered, try pasting it into mermaid-js.github.io/mermaid-live-editor)
```mermaid
flowchart TB
    op([Options])
    d([Dataset])
    op --> ES
    d --> ES
    subgraph ES[equation_search]
        direction TB
        IP[sr_spawner]
        IP --> p1
        IP --> p2
        subgraph p1[Thread 1]
            direction LR
            pop1([Population])
            pop1 --> src[s_r_cycle]
            src --> opt[optimize_and_simplify_population]
            opt --> pop1
        end
        subgraph p2[Thread 2]
            direction LR
            pop2([Population])
            pop2 --> src2[s_r_cycle]
            src2 --> opt2[optimize_and_simplify_population]
            opt2 --> pop2
        end
        pop1 --> hof
        pop2 --> hof
        hof([HallOfFame])
        hof --> migration
        pop1 <-.-> migration
        pop2 <-.-> migration
        migration[migrate!]
    end
    ES --> output([HallOfFame])
```
The HallOfFame objects store the expressions with the lowest loss seen at each complexity.
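As a rough illustration of that bookkeeping, the following plain-Julia sketch keeps the lowest-loss entry per complexity in a dictionary. The `update!` function and the `Dict` layout are hypothetical and chosen for brevity; the package's own HallOfFame type is structured differently.

```julia
# Hypothetical minimal hall of fame: maps complexity => (loss, expression string).
hall = Dict{Int,Tuple{Float64,String}}()

# Record a candidate only if it beats the stored loss at its complexity.
function update!(hall, complexity, loss, expr)
    if !haskey(hall, complexity) || loss < hall[complexity][1]
        hall[complexity] = (loss, expr)
    end
    return hall
end

update!(hall, 3, 0.9, "x1 * x2")
update!(hall, 3, 0.4, "cos(x1)")       # replaces the previous complexity-3 entry
update!(hall, 3, 0.7, "x1 - x2")       # ignored: higher loss than the stored entry
update!(hall, 5, 0.2, "cos(x1) - x2")

hall[3]  # (0.4, "cos(x1)")
```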
The dependency structure of the code itself is as follows:
```mermaid
stateDiagram-v2
    AdaptiveParsimony --> Mutate
    AdaptiveParsimony --> Population
    AdaptiveParsimony --> RegularizedEvolution
    AdaptiveParsimony --> SearchUtils
    AdaptiveParsimony --> SingleIteration
    AdaptiveParsimony --> SymbolicRegression
    CheckConstraints --> Mutate
    CheckConstraints --> SymbolicRegression
    Complexity --> CheckConstraints
    Complexity --> HallOfFame
    Complexity --> LossFunctions
    Complexity --> MLJInterface
    Complexity --> Mutate
    Complexity --> PopMember
    Complexity --> Population
    Complexity --> SearchUtils
    Complexity --> SingleIteration
    Complexity --> SymbolicRegression
    ConstantOptimization --> ExpressionBuilder
    ConstantOptimization --> Mutate
    ConstantOptimization --> SingleIteration
    Core --> AdaptiveParsimony
    Core --> CheckConstraints
    Core --> Complexity
    Core --> ConstantOptimization
    Core --> DimensionalAnalysis
    Core --> ExpressionBuilder
    Core --> ExpressionBuilder
    Core --> HallOfFame
    Core --> InterfaceDynamicExpressions
    Core --> LossFunctions
    Core --> MLJInterface
    Core --> Migration
    Core --> Mutate
    Core --> MutationFunctions
    Core --> PopMember
    Core --> Population
    Core --> Recorder
    Core --> RegularizedEvolution
    Core --> SearchUtils
    Core --> SingleIteration
    Core --> SymbolicRegression
    Dataset --> Core
    DimensionalAnalysis --> LossFunctions
    ExpressionBuilder --> SymbolicRegression
    HallOfFame --> ExpressionBuilder
    HallOfFame --> MLJInterface
    HallOfFame --> SearchUtils
    HallOfFame --> SingleIteration
    HallOfFame --> SymbolicRegression
    HallOfFame --> deprecates
    InterfaceDynamicExpressions --> ExpressionBuilder
    InterfaceDynamicExpressions --> HallOfFame
    InterfaceDynamicExpressions --> LossFunctions
    InterfaceDynamicExpressions --> SymbolicRegression
    InterfaceDynamicQuantities --> Dataset
    InterfaceDynamicQuantities --> MLJInterface
    LossFunctions --> ConstantOptimization
    LossFunctions --> ExpressionBuilder
    LossFunctions --> ExpressionBuilder
    LossFunctions --> Mutate
    LossFunctions --> PopMember
    LossFunctions --> Population
    LossFunctions --> SingleIteration
    LossFunctions --> SymbolicRegression
    MLJInterface --> SymbolicRegression
    Migration --> SymbolicRegression
    Mutate --> RegularizedEvolution
    MutationFunctions --> ExpressionBuilder
    MutationFunctions --> Mutate
    MutationFunctions --> Population
    MutationFunctions --> SymbolicRegression
    MutationFunctions --> deprecates
    MutationWeights --> Core
    MutationWeights --> Options
    MutationWeights --> OptionsStruct
    Operators --> Core
    Operators --> Options
    Options --> Core
    OptionsStruct --> Core
    OptionsStruct --> Options
    OptionsStruct --> Options
    PopMember --> ConstantOptimization
    PopMember --> ExpressionBuilder
    PopMember --> HallOfFame
    PopMember --> Migration
    PopMember --> Mutate
    PopMember --> Population
    PopMember --> SearchUtils
    PopMember --> SingleIteration
    PopMember --> SymbolicRegression
    Population --> ExpressionBuilder
    Population --> Migration
    Population --> RegularizedEvolution
    Population --> SearchUtils
    Population --> SingleIteration
    Population --> SymbolicRegression
    ProgramConstants --> Core
    ProgramConstants --> Dataset
    ProgramConstants --> Operators
    ProgressBars --> SearchUtils
    ProgressBars --> SymbolicRegression
    Recorder --> Mutate
    Recorder --> RegularizedEvolution
    Recorder --> SingleIteration
    Recorder --> SymbolicRegression
    RegularizedEvolution --> SingleIteration
    SearchUtils --> SymbolicRegression
    SingleIteration --> SymbolicRegression
    Utils --> ConstantOptimization
    Utils --> Dataset
    Utils --> DimensionalAnalysis
    Utils --> HallOfFame
    Utils --> InterfaceDynamicExpressions
    Utils --> MLJInterface
    Utils --> Migration
    Utils --> Operators
    Utils --> Options
    Utils --> PopMember
    Utils --> Population
    Utils --> RegularizedEvolution
    Utils --> SearchUtils
    Utils --> SingleIteration
    Utils --> SymbolicRegression
```
Bash command to generate dependency structure from src directory (requires vim-stream):
```bash
echo 'stateDiagram-v2'
IFS=$'\n'
for f in *.jl; do
    for line in $(cat $f | grep -e 'import \.\.' -e 'import \.' -e 'using \.' -e 'using \.\.'); do
        echo $(echo $line | vims -s 'dwf:d$' -t '%s/^\.*//g' '%s/Module//g') $(basename "$f" .jl);
    done;
done | vims -l 'f a--> ' | sort
```
Search options
See https://ai.damtp.cam.ac.uk/symbolicregression/stable/api/#Options
Owner
- Name: Miles Cranmer
- Login: MilesCranmer
- Kind: user
- Location: Cambridge, UK
- Company: University of Cambridge
- Website: astroautomata.com
- Twitter: MilesCranmer
- Repositories: 219
- Profile: https://github.com/MilesCranmer
Assistant Professor at University of Cambridge. Works on AI for the physical sciences.
Citation (CITATION.md)
# Citing
To cite SymbolicRegression.jl or PySR, please use the following BibTeX entry:
```bibtex
@misc{cranmerInterpretableMachineLearning2023,
title = {Interpretable {Machine} {Learning} for {Science} with {PySR} and {SymbolicRegression}.jl},
url = {http://arxiv.org/abs/2305.01582},
doi = {10.48550/arXiv.2305.01582},
urldate = {2023-07-17},
publisher = {arXiv},
author = {Cranmer, Miles},
month = may,
year = {2023},
note = {arXiv:2305.01582 [astro-ph, physics:physics]},
keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Symbolic Computation, Physics - Data Analysis, Statistics and Probability},
}
```
To cite symbolic distillation of neural networks, the following BibTeX entry can be used:
```bibtex
@article{cranmerDiscovering2020,
title={Discovering Symbolic Models from Deep Learning with Inductive Biases},
author={Miles Cranmer and Alvaro Sanchez-Gonzalez and Peter Battaglia and Rui Xu and Kyle Cranmer and David Spergel and Shirley Ho},
journal={NeurIPS 2020},
year={2020},
eprint={2006.11287},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| MilesCranmer | m****r@g****m | 3,024 |
| CompatHelper Julia | c****y@j****g | 25 |
| Johan Blåbäck | j****k@r****t | 19 |
| Kaze Wong | k****s@g****m | 9 |
| dependabot[bot] | 4****] | 9 |
| Kaze Wong | k****g@j****u | 8 |
| AlCap23 | j****n@g****m | 8 |
| Atharva Sehgal | a****l@g****m | 7 |
| Johann Brehmer | m****l@j****e | 7 |
| pre-commit-ci[bot] | 6****] | 5 |
| github-actions[bot] | 4****] | 4 |
| sheevy | m****a@g****m | 3 |
| foxtran | 3****n | 3 |
| Jerry Ling | p****n@j****v | 2 |
| Coba | c****a@c****u | 2 |
| spaette | 1****e | 2 |
| Chris Rackauckas | a****s@c****m | 2 |
| Charles Fox | c****1@u****u | 1 |
| Rik Huijzer | t****r@r****l | 1 |
| Tim Holy | t****y@g****m | 1 |
| William Moses | gh@w****m | 1 |
| Yi-Xin Liu | l****x@f****n | 1 |
| Pietro Monticone | 3****e | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 107
- Total pull requests: 414
- Average time to close issues: 10 months
- Average time to close pull requests: 23 days
- Total issue authors: 42
- Total pull request authors: 15
- Average comments per issue: 3.54
- Average comments per pull request: 1.88
- Merged pull requests: 283
- Bot issues: 0
- Bot pull requests: 148
Past Year
- Issues: 41
- Pull requests: 195
- Average time to close issues: 22 days
- Average time to close pull requests: 7 days
- Issue authors: 15
- Pull request authors: 9
- Average comments per issue: 1.17
- Average comments per pull request: 2.23
- Merged pull requests: 143
- Bot issues: 0
- Bot pull requests: 45
Top Authors
Issue Authors
- MilesCranmer (47)
- Moelf (10)
- Shota123-pixel (3)
- gm89uk (3)
- charishma13 (3)
- Jgmedina95 (3)
- zsz00 (2)
- ablaom (2)
- rafaelcuperman (1)
- wenpw (1)
- WhiteGL (1)
- Jiyann (1)
- StevenWhitaker (1)
- maleadt (1)
- ArnoStrouwen (1)
Pull Request Authors
- MilesCranmer (236)
- github-actions[bot] (116)
- dependabot[bot] (20)
- pre-commit-ci[bot] (11)
- atharvas (10)
- Moelf (5)
- Jgmedina95 (2)
- foxtran (2)
- wsmoses (2)
- zwy-Giser (2)
- CreatixChu (2)
- spaette (2)
- liuyxpp (2)
- sweep-ai[bot] (1)
- timholy (1)
Packages
- Total packages: 1
- Total downloads:
  - julia: 3,435 total
- Total dependent packages: 1
- Total dependent repositories: 0
- Total versions: 175
juliahub.com: SymbolicRegression
Distributed High-Performance Symbolic Regression in Julia
- Homepage: https://ai.damtp.cam.ac.uk/symbolicregression/dev
- Documentation: https://docs.juliahub.com/General/SymbolicRegression/stable/
- License: Apache-2.0
- Latest release: 1.12.0 (published 8 months ago)
Dependencies
- actions/checkout v3 composite
- coverallsapp/github-action master composite
- julia-actions/cache v1 composite
- julia-actions/julia-buildpkg v1 composite
- julia-actions/setup-julia v1 composite
- actions/checkout v3 composite
- julia-actions/cache v1 composite
- julia-actions/setup-julia latest composite
- JuliaRegistries/TagBot v1 composite
- actions/checkout v3 composite
- julia-actions/cache v1 composite
- julia-actions/setup-julia v1 composite
- actions/checkout v3 composite
- julia-actions/cache v1 composite
- julia-actions/setup-julia v1 composite
- peter-evans/create-pull-request v3 composite