SymbolicRegression

Distributed High-Performance Symbolic Regression in Julia

https://github.com/MilesCranmer/SymbolicRegression.jl

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    4 of 23 committers (17.4%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.9%) to scientific vocabulary

Keywords

automl data-science distributed-systems equation-discovery evolutionary-algorithms explainable-ai genetic-algorithm interpretable-ml julia machine-learning sciml symbolic symbolic-computation symbolic-regression

Keywords from Contributors

ode numerics neural-sde programming-language pde julialang parallelism differential-equations matrix-exponential sde
Last synced: 4 months ago

Repository

Distributed High-Performance Symbolic Regression in Julia

Basic Info
Statistics
  • Stars: 723
  • Watchers: 16
  • Forks: 112
  • Open Issues: 39
  • Releases: 175
Topics
automl data-science distributed-systems equation-discovery evolutionary-algorithms explainable-ai genetic-algorithm interpretable-ml julia machine-learning sciml symbolic symbolic-computation symbolic-regression
Created almost 5 years ago · Last pushed 4 months ago
Metadata Files
Readme Changelog License Citation

README.md

SymbolicRegression.jl searches for symbolic expressions which optimize a particular objective.

Demo video: https://github.com/MilesCranmer/SymbolicRegression.jl/assets/7593028/f5b68f1f-9830-497f-a197-6ae332c94ee0

Check out [PySR](https://github.com/MilesCranmer/PySR) for a Python frontend. To cite this software, see the [arXiv paper](https://arxiv.org/abs/2305.01582).

Quickstart

Install in Julia with:

```julia
using Pkg
Pkg.add("SymbolicRegression")
```

MLJ Interface

The easiest way to use SymbolicRegression.jl is with MLJ. Let's see an example:

```julia
import SymbolicRegression: SRRegressor
import MLJ: machine, fit!, predict, report

# Dataset with two named features:
X = (a = rand(500), b = rand(500))

# and one target:
y = @. 2 * cos(X.a * 23.5) - X.b ^ 2

# with some noise:
y = y .+ randn(500) .* 1e-3

model = SRRegressor(
    niterations=50,
    binary_operators=[+, -, *],
    unary_operators=[cos],
)
```

Now, let's create and train this model on our data:

```julia
mach = machine(model, X, y)

fit!(mach)
```

You will notice that expressions are printed using the column names of our table. If, instead of a table-like object, a simple array is passed (e.g., X=randn(100, 2)), x1, ..., xn will be used for variable names.
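
For instance, a minimal sketch of fitting on a plain matrix rather than a named table (the variable names below are illustrative, reusing the `model` defined above):

```julia
# A plain 100×2 matrix has no column names, so discovered expressions
# will refer to the features as x1 and x2:
X_matrix = randn(100, 2)
y_matrix = 2 .* cos.(X_matrix[:, 1] .* 23.5) .- X_matrix[:, 2] .^ 2

mach_matrix = machine(model, X_matrix, y_matrix)
fit!(mach_matrix)
```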

Let's look at the expressions discovered:

```julia
report(mach)
```

Finally, we can make predictions with the expressions on new data:

```julia
predict(mach, X)
```

This will make predictions using the expression selected by model.selection_method, which by default selects an expression based on a mix of accuracy and complexity.

You can override this selection and select an equation from the Pareto front manually with:

```julia
predict(mach, (data=X, idx=2))
```

where here we choose to evaluate the second equation.

For fitting multiple outputs, one can use MultitargetSRRegressor (and pass an array of indices to idx in predict for selecting specific equations). For a full list of options available to each regressor, see the API page.
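
As a rough sketch of the multi-target case (the data and settings here are illustrative, and assume the regressor accepts a matrix of targets):

```julia
import SymbolicRegression: MultitargetSRRegressor
import MLJ: machine, fit!, predict

# Two illustrative targets, stacked as columns of a 500×2 matrix:
X2 = (a = rand(500), b = rand(500))
Y2 = hcat(cos.(X2.a) .+ X2.b, X2.a .^ 2 .- X2.b)

multi_model = MultitargetSRRegressor(
    niterations=30,
    binary_operators=[+, -, *],
    unary_operators=[cos],
)
multi_mach = machine(multi_model, X2, Y2)
fit!(multi_mach)

# Pick the 2nd Pareto-front equation for output 1 and the 1st for output 2:
predict(multi_mach, (data=X2, idx=[2, 1]))
```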

Low-Level Interface

The heart of SymbolicRegression.jl is the equation_search function. This takes a 2D array and attempts to model a 1D array using analytic functional forms. Note: unlike the MLJ interface, this assumes column-major input of shape [features, rows].

```julia
import SymbolicRegression: Options, equation_search

X = randn(2, 100)
y = 2 * cos.(X[2, :]) + X[1, :] .^ 2 .- 2

options = Options(
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
    populations=20
)

hall_of_fame = equation_search(
    X, y, niterations=40, options=options,
    parallelism=:multithreading
)
```
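
If your data is instead stored row-major (one row per sample), a quick illustrative fix is to transpose it before calling equation_search:

```julia
# Illustrative: 100 samples × 2 features, as the MLJ interface would take it
X_rows = randn(100, 2)

# equation_search expects [features, rows], so pass the transpose:
X_cols = permutedims(X_rows)  # now 2 × 100
```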

You can view the resultant equations in the dominating Pareto front (best expression seen at each complexity) with:

```julia
import SymbolicRegression: calculate_pareto_frontier

dominating = calculate_pareto_frontier(hall_of_fame)
```

This is a vector of PopMember, each of which contains an expression along with its cost. We can get the expressions with:

```julia
trees = [member.tree for member in dominating]
```

Each of these equations is an Expression{T} type for some constant type T (like Float32).

These expression objects are callable – you can simply pass in data:

```julia
tree = trees[end]
output = tree(X)
```

Constructing expressions

Expressions are represented under-the-hood as the Node type which is developed in the DynamicExpressions.jl package. The Expression type wraps this and includes metadata about operators and variable names.

You can manipulate and construct expressions directly. For example:

```julia
using SymbolicRegression: Options, Expression, Node

options = Options(;
    binary_operators=[+, -, *, /], unary_operators=[cos, exp, sin]
)
operators = options.operators
variable_names = ["x1", "x2", "x3"]
x1, x2, x3 = [
    Expression(Node(Float64; feature=i); operators, variable_names) for i=1:3
]

tree = cos(x1 - 3.2 * x2) - x1 * x1
```

This tree has Float64 constants, so the type of the entire tree will be promoted to Node{Float64}.

We can convert all constants (recursively) to Float32:

```julia
float32_tree = convert(Expression{Float32}, tree)
```

We can then evaluate this tree on a dataset:

```julia
X = rand(Float32, 3, 100)

tree(X)
```

This callable format is the easy-to-use version which will automatically set all values to NaN if there were any Inf or NaN during evaluation. You can call the raw evaluation method with eval_tree_array:

```julia
import SymbolicRegression: eval_tree_array

output, did_succeed = eval_tree_array(tree, X)
```

where did_succeed explicitly declares whether the evaluation was successful.
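
For example, a minimal sketch of guarding on that flag (the handling below is illustrative):

```julia
output, did_succeed = eval_tree_array(tree, X)

if did_succeed
    println("first few predictions: ", output[1:5])
else
    # Evaluation produced an Inf/NaN somewhere; treat the result as invalid
    println("evaluation failed; skipping this expression")
end
```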

Exporting to SymbolicUtils.jl

We can view the equations in the dominating Pareto frontier with:

```julia
dominating = calculate_pareto_frontier(hall_of_fame)
```

We can convert the best equation to SymbolicUtils.jl with the following function:

```julia
import SymbolicRegression: node_to_symbolic
using SymbolicUtils: simplify

eqn = node_to_symbolic(dominating[end].tree)
println(simplify(eqn*5 + 3))
```

We can also print out the full Pareto frontier like so:

```julia
import SymbolicRegression: compute_complexity, string_tree

println("Complexity\tMSE\tEquation")

for member in dominating
    complexity = compute_complexity(member, options)
    loss = member.loss
    string = string_tree(member.tree, options)

    println("$(complexity)\t$(loss)\t$(string)")
end
```

Contributors ✨

We are eager to welcome new contributors! If you have an idea for a new feature, don't hesitate to share it on the issues page or forums.

  • Mark Kittisopikul: 💻 💡 🚇 📦 📣 👀 🔧 ⚠️
  • T Coxon: 🐛 💻 🔌 💡 🚇 🚧 👀 🔧 ⚠️ 📓
  • Dhananjay Ashok: 💻 🌍 💡 🚧 ⚠️
  • Johan Blåbäck: 🐛 💻 💡 🚧 📣 👀 ⚠️ 📓
  • JuliusMartensen: 🐛 💻 📖 🔌 💡 🚇 🚧 📦 📣 👀 🔧 📓
  • ngam: 💻 🚇 📦 👀 🔧 ⚠️
  • Kaze Wong: 🐛 💻 💡 🚇 🚧 📣 👀 🔬 📓
  • Christopher Rackauckas: 🐛 💻 🔌 💡 🚇 📣 👀 🔬 🔧 ⚠️ 📓
  • Patrick Kidger: 🐛 💻 📖 🔌 💡 🚧 📣 👀 🔬 🔧 ⚠️ 📓
  • Okon Samuel: 🐛 💻 📖 🚧 💡 🚇 👀 ⚠️ 📓
  • William Booth-Clibborn: 💻 🌍 📖 📓 🚧 👀 🔧 ⚠️
  • Pablo Lemos: 🐛 💡 📣 👀 🔬 📓
  • Jerry Ling: 🐛 💻 📖 🌍 💡 📣 👀 📓
  • Charles Fox: 🐛 💻 💡 🚧 📣 👀 🔬 📓
  • Johann Brehmer: 💻 📖 💡 📣 👀 🔬 ⚠️ 📓
  • Marius Millea: 💻 💡 📣 👀 📓
  • Coba: 🐛 💻 💡 👀 📓
  • Pietro Monticone: 🐛 📖 💡
  • Mateusz Kubica: 📖 💡
  • Jay Wadekar: 🐛 💡 📣 🔬
  • Anthony Blaom, PhD: 🚇 💡 👀
  • Jgmedina95: 🐛 💡 👀
  • Michael Abbott: 💻 💡 👀 🔧
  • Oscar Smith: 💻 💡
  • Eric Hanson: 💡 📣 📓
  • Henrique Becker: 💻 💡 👀
  • qwertyjl: 🐛 📖 💡 📓
  • Rik Huijzer: 💡 🚇
  • Hongyu Wang: 💡 📣 🔬
  • Saurav Maheshkar: 🔧

Code structure

SymbolicRegression.jl is organized roughly as follows. Rounded rectangles indicate objects, and rectangles indicate functions.

(if you can't see this diagram being rendered, try pasting it into mermaid-js.github.io/mermaid-live-editor)

```mermaid
flowchart TB
    op([Options])
    d([Dataset])
    op --> ES
    d --> ES
    subgraph ES[equation_search]
        direction TB
        IP[sr_spawner]
        IP --> p1
        IP --> p2
        subgraph p1[Thread 1]
            direction LR
            pop1([Population])
            pop1 --> src[s_r_cycle]
            src --> opt[optimize_and_simplify_population]
            opt --> pop1
        end
        subgraph p2[Thread 2]
            direction LR
            pop2([Population])
            pop2 --> src2[s_r_cycle]
            src2 --> opt2[optimize_and_simplify_population]
            opt2 --> pop2
        end
        pop1 --> hof
        pop2 --> hof
        hof([HallOfFame])
        hof --> migration
        pop1 <-.-> migration
        pop2 <-.-> migration
        migration[migrate!]
    end
    ES --> output([HallOfFame])
```

The HallOfFame objects store the expressions with the lowest loss seen at each complexity.
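
For example, a small sketch (reusing names from the low-level example above) of pulling the single lowest-loss expression off that front:

```julia
# Illustrative: take the lowest-loss member of the dominating Pareto front
dominating = calculate_pareto_frontier(hall_of_fame)
best_member = argmin(member -> member.loss, dominating)
println(string_tree(best_member.tree, options))
```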

The dependency structure of the code itself is as follows:

```mermaid
stateDiagram-v2
    AdaptiveParsimony --> Mutate
    AdaptiveParsimony --> Population
    AdaptiveParsimony --> RegularizedEvolution
    AdaptiveParsimony --> SearchUtils
    AdaptiveParsimony --> SingleIteration
    AdaptiveParsimony --> SymbolicRegression
    CheckConstraints --> Mutate
    CheckConstraints --> SymbolicRegression
    Complexity --> CheckConstraints
    Complexity --> HallOfFame
    Complexity --> LossFunctions
    Complexity --> MLJInterface
    Complexity --> Mutate
    Complexity --> PopMember
    Complexity --> Population
    Complexity --> SearchUtils
    Complexity --> SingleIteration
    Complexity --> SymbolicRegression
    ConstantOptimization --> ExpressionBuilder
    ConstantOptimization --> Mutate
    ConstantOptimization --> SingleIteration
    Core --> AdaptiveParsimony
    Core --> CheckConstraints
    Core --> Complexity
    Core --> ConstantOptimization
    Core --> DimensionalAnalysis
    Core --> ExpressionBuilder
    Core --> HallOfFame
    Core --> InterfaceDynamicExpressions
    Core --> LossFunctions
    Core --> MLJInterface
    Core --> Migration
    Core --> Mutate
    Core --> MutationFunctions
    Core --> PopMember
    Core --> Population
    Core --> Recorder
    Core --> RegularizedEvolution
    Core --> SearchUtils
    Core --> SingleIteration
    Core --> SymbolicRegression
    Dataset --> Core
    DimensionalAnalysis --> LossFunctions
    ExpressionBuilder --> SymbolicRegression
    HallOfFame --> ExpressionBuilder
    HallOfFame --> MLJInterface
    HallOfFame --> SearchUtils
    HallOfFame --> SingleIteration
    HallOfFame --> SymbolicRegression
    HallOfFame --> deprecates
    InterfaceDynamicExpressions --> ExpressionBuilder
    InterfaceDynamicExpressions --> HallOfFame
    InterfaceDynamicExpressions --> LossFunctions
    InterfaceDynamicExpressions --> SymbolicRegression
    InterfaceDynamicQuantities --> Dataset
    InterfaceDynamicQuantities --> MLJInterface
    LossFunctions --> ConstantOptimization
    LossFunctions --> ExpressionBuilder
    LossFunctions --> Mutate
    LossFunctions --> PopMember
    LossFunctions --> Population
    LossFunctions --> SingleIteration
    LossFunctions --> SymbolicRegression
    MLJInterface --> SymbolicRegression
    Migration --> SymbolicRegression
    Mutate --> RegularizedEvolution
    MutationFunctions --> ExpressionBuilder
    MutationFunctions --> Mutate
    MutationFunctions --> Population
    MutationFunctions --> SymbolicRegression
    MutationFunctions --> deprecates
    MutationWeights --> Core
    MutationWeights --> Options
    MutationWeights --> OptionsStruct
    Operators --> Core
    Operators --> Options
    Options --> Core
    OptionsStruct --> Core
    OptionsStruct --> Options
    PopMember --> ConstantOptimization
    PopMember --> ExpressionBuilder
    PopMember --> HallOfFame
    PopMember --> Migration
    PopMember --> Mutate
    PopMember --> Population
    PopMember --> SearchUtils
    PopMember --> SingleIteration
    PopMember --> SymbolicRegression
    Population --> ExpressionBuilder
    Population --> Migration
    Population --> RegularizedEvolution
    Population --> SearchUtils
    Population --> SingleIteration
    Population --> SymbolicRegression
    ProgramConstants --> Core
    ProgramConstants --> Dataset
    ProgramConstants --> Operators
    ProgressBars --> SearchUtils
    ProgressBars --> SymbolicRegression
    Recorder --> Mutate
    Recorder --> RegularizedEvolution
    Recorder --> SingleIteration
    Recorder --> SymbolicRegression
    RegularizedEvolution --> SingleIteration
    SearchUtils --> SymbolicRegression
    SingleIteration --> SymbolicRegression
    Utils --> ConstantOptimization
    Utils --> Dataset
    Utils --> DimensionalAnalysis
    Utils --> HallOfFame
    Utils --> InterfaceDynamicExpressions
    Utils --> MLJInterface
    Utils --> Migration
    Utils --> Operators
    Utils --> Options
    Utils --> PopMember
    Utils --> Population
    Utils --> RegularizedEvolution
    Utils --> SearchUtils
    Utils --> SingleIteration
    Utils --> SymbolicRegression
```

Bash command to generate dependency structure from src directory (requires vim-stream):

```bash
echo 'stateDiagram-v2'
IFS=$'\n'
for f in *.jl; do
    for line in $(cat $f | grep -e 'import \.\.' -e 'import \.' -e 'using \.' -e 'using \.\.'); do
        echo $(echo $line | vims -s 'dwf:d$' -t '%s/^\.*//g' '%s/Module//g') $(basename "$f" .jl)
    done
done | vims -l 'f a--> ' | sort
```

Search options

See https://ai.damtp.cam.ac.uk/symbolicregression/stable/api/#Options

Owner

  • Name: Miles Cranmer
  • Login: MilesCranmer
  • Kind: user
  • Location: Cambridge, UK
  • Company: University of Cambridge

Assistant Professor at University of Cambridge. Works on AI for the physical sciences.

Citation (CITATION.md)

# Citing

To cite SymbolicRegression.jl or PySR, please use the following BibTeX entry:

```bibtex
@misc{cranmerInterpretableMachineLearning2023,
    title = {Interpretable {Machine} {Learning} for {Science} with {PySR} and {SymbolicRegression}.jl},
    url = {http://arxiv.org/abs/2305.01582},
    doi = {10.48550/arXiv.2305.01582},
    urldate = {2023-07-17},
    publisher = {arXiv},
    author = {Cranmer, Miles},
    month = may,
    year = {2023},
    note = {arXiv:2305.01582 [astro-ph, physics:physics]},
    keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Symbolic Computation, Physics - Data Analysis, Statistics and Probability},
}
```

To cite symbolic distillation of neural networks, the following BibTeX entry can be used:

```bibtex
@article{cranmerDiscovering2020,
    title={Discovering Symbolic Models from Deep Learning with Inductive Biases},
    author={Miles Cranmer and Alvaro Sanchez-Gonzalez and Peter Battaglia and Rui Xu and Kyle Cranmer and David Spergel and Shirley Ho},
    journal={NeurIPS 2020},
    year={2020},
    eprint={2006.11287},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 3,145
  • Total Committers: 23
  • Avg Commits per committer: 136.739
  • Development Distribution Score (DDS): 0.038
Past Year
  • Commits: 780
  • Committers: 9
  • Avg Commits per committer: 86.667
  • Development Distribution Score (DDS): 0.032
Top Committers
Name Email Commits
MilesCranmer m****r@g****m 3,024
CompatHelper Julia c****y@j****g 25
Johan Blåbäck j****k@r****t 19
Kaze Wong k****s@g****m 9
dependabot[bot] 4****] 9
Kaze Wong k****g@j****u 8
AlCap23 j****n@g****m 8
Atharva Sehgal a****l@g****m 7
Johann Brehmer m****l@j****e 7
pre-commit-ci[bot] 6****] 5
github-actions[bot] 4****] 4
sheevy m****a@g****m 3
foxtran 3****n 3
Jerry Ling p****n@j****v 2
Coba c****a@c****u 2
spaette 1****e 2
Chris Rackauckas a****s@c****m 2
Charles Fox c****1@u****u 1
Rik Huijzer t****r@r****l 1
Tim Holy t****y@g****m 1
William Moses gh@w****m 1
Yi-Xin Liu l****x@f****n 1
Pietro Monticone 3****e 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 107
  • Total pull requests: 414
  • Average time to close issues: 10 months
  • Average time to close pull requests: 23 days
  • Total issue authors: 42
  • Total pull request authors: 15
  • Average comments per issue: 3.54
  • Average comments per pull request: 1.88
  • Merged pull requests: 283
  • Bot issues: 0
  • Bot pull requests: 148
Past Year
  • Issues: 41
  • Pull requests: 195
  • Average time to close issues: 22 days
  • Average time to close pull requests: 7 days
  • Issue authors: 15
  • Pull request authors: 9
  • Average comments per issue: 1.17
  • Average comments per pull request: 2.23
  • Merged pull requests: 143
  • Bot issues: 0
  • Bot pull requests: 45
Top Authors
Issue Authors
  • MilesCranmer (47)
  • Moelf (10)
  • Shota123-pixel (3)
  • gm89uk (3)
  • charishma13 (3)
  • Jgmedina95 (3)
  • zsz00 (2)
  • ablaom (2)
  • rafaelcuperman (1)
  • wenpw (1)
  • WhiteGL (1)
  • Jiyann (1)
  • StevenWhitaker (1)
  • maleadt (1)
  • ArnoStrouwen (1)
Pull Request Authors
  • MilesCranmer (236)
  • github-actions[bot] (116)
  • dependabot[bot] (20)
  • pre-commit-ci[bot] (11)
  • atharvas (10)
  • Moelf (5)
  • Jgmedina95 (2)
  • foxtran (2)
  • wsmoses (2)
  • zwy-Giser (2)
  • CreatixChu (2)
  • spaette (2)
  • liuyxpp (2)
  • sweep-ai[bot] (1)
  • timholy (1)
Top Labels
Issue Labels
bug (24) feature (importance: high) (9) feature (importance: mid) (9) feature (importance: low) (6) code cleanup (5) documentation (1) wontfix (1) question (1)
Pull Request Labels
dependencies (20) formatting (5) automated pr (5) no changelog (5) github_actions (3) sweep (1)

Packages

  • Total packages: 1
  • Total downloads:
    • julia 3,435 total
  • Total dependent packages: 1
  • Total dependent repositories: 0
  • Total versions: 175
juliahub.com: SymbolicRegression

Distributed High-Performance Symbolic Regression in Julia

  • Versions: 175
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 3,435 Total
Rankings
Stargazers count: 2.0%
Forks count: 3.6%
Average: 9.6%
Dependent repos count: 9.9%
Dependent packages count: 23.0%
Last synced: 4 months ago

Dependencies

.github/workflows/CI.yml actions
  • actions/checkout v3 composite
  • coverallsapp/github-action master composite
  • julia-actions/cache v1 composite
  • julia-actions/julia-buildpkg v1 composite
  • julia-actions/setup-julia v1 composite
.github/workflows/Documentation.yml actions
  • actions/checkout v3 composite
  • julia-actions/cache v1 composite
  • julia-actions/setup-julia latest composite
.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite
.github/workflows/check-format.yml actions
  • actions/checkout v3 composite
  • julia-actions/cache v1 composite
  • julia-actions/setup-julia v1 composite
.github/workflows/fix-format.yml actions
  • actions/checkout v3 composite
  • julia-actions/cache v1 composite
  • julia-actions/setup-julia v1 composite
  • peter-evans/create-pull-request v3 composite