genepro

A baseline implementation of genetic programming (using trees to encode programs) with some examples of usage.

https://github.com/marcovirgolin/genepro

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 5 committers (40.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary

Keywords

evolutionary-algorithms genetic-algorithm genetic-programming program-synthetis reinforcement-learning symbolic-regression
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: marcovirgolin
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 9.13 MB
Statistics
  • Stars: 33
  • Watchers: 1
  • Forks: 7
  • Open Issues: 0
  • Releases: 9
Topics
evolutionary-algorithms genetic-algorithm genetic-programming program-synthetis reinforcement-learning symbolic-regression
Created about 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

genepro

Art of a juniper, "ginepro" in Italian, made with the genetic drawing repo by @anopara.

In brief

genepro is a Python library providing a baseline implementation of genetic programming, an evolutionary algorithm specialized in evolving programs. The library includes a classifier and a regressor that are compatible with scikit-learn (see examples of usage below).

Evolving programs are represented as trees. The leaf nodes (also called terminals) of such trees represent some form of input, e.g., a feature for classification or regression, or a type of environmental observation for reinforcement learning. The internal nodes represent possible atomic instructions, e.g., summation, subtraction, multiplication, division, but also if-then-else or similar programming constructs.
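To make the tree encoding concrete, here is a minimal sketch in plain Python. The class names `Feature`, `Constant`, and `Op` are hypothetical and chosen for illustration; they are not genepro's own classes. Leaves read an input feature or hold a constant, and internal nodes apply an atomic instruction to the values of their children.

```python
import operator


class Feature:
    """Leaf node (terminal) that reads the i-th input feature."""

    def __init__(self, index):
        self.index = index

    def evaluate(self, x):
        return x[self.index]

    def __str__(self):
        return f"x_{self.index}"


class Constant:
    """Leaf node (terminal) that holds a fixed value."""

    def __init__(self, value):
        self.value = value

    def evaluate(self, x):
        return self.value

    def __str__(self):
        return f"{self.value}"


class Op:
    """Internal node that applies a binary instruction to its two children."""

    def __init__(self, symbol, func, left, right):
        self.symbol = symbol
        self.func = func
        self.left = left
        self.right = right

    def evaluate(self, x):
        return self.func(self.left.evaluate(x), self.right.evaluate(x))

    def __str__(self):
        return f"({self.left} {self.symbol} {self.right})"


# The program (x_0 + 2.0) * x_1 encoded as a tree.
tree = Op("*", operator.mul,
          Op("+", operator.add, Feature(0), Constant(2.0)),
          Feature(1))

print(tree)                        # ((x_0 + 2.0) * x_1)
print(tree.evaluate([1.0, 3.0]))   # 9.0
```

Evaluation is a recursive walk from the root down to the leaves, which is why a compact tree reads directly as a symbolic expression.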

Genetic programming operates on a population of trees, typically initialized at random. Every iteration (called generation), promising trees undergo random modifications (e.g., forms of crossover, mutation, and tuning) that result in a population of offspring trees. This new population is then used for the next generation.
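The generational loop can be sketched in a few dozen lines of plain Python. This is an illustration under stated assumptions, not genepro's implementation: trees are nested tuples, variation is limited to subtree mutation (no crossover or tuning), parents are picked by tournament selection, and the best tree is copied unchanged into the next generation. All function names and parameter values below are hypothetical.

```python
import math
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(rng, depth):
    """Grow a random tree: leaves are 'x' or a constant, internal nodes are binary ops."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.7 else float(rng.randint(-2, 2))
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Recursively evaluate a tree on a single input value."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def fitness(tree, xs, ys):
    """Negative mean squared error, so that higher is better."""
    total = 0.0
    for x, y in zip(xs, ys):
        diff = evaluate(tree, x) - y
        total += diff * diff
    # Overflowing trees get the worst possible fitness instead of inf/nan.
    return -total / len(xs) if math.isfinite(total) else float("-inf")


def mutate(tree, rng, depth=2):
    """Subtree mutation: replace one randomly chosen subtree with a fresh random one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, depth), right)
    return (op, left, mutate(right, rng, depth))


def evolve(xs, ys, pop_size=50, generations=20, seed=0):
    """Run the generational loop; return the best tree and the best fitness per generation."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=lambda t: fitness(t, xs, ys))
        history.append(fitness(best, xs, ys))
        offspring = [best]  # elitism: the best tree survives unchanged
        while len(offspring) < pop_size:
            # Tournament selection: the fittest of 4 random trees becomes the parent.
            contestants = rng.sample(population, 4)
            parent = max(contestants, key=lambda t: fitness(t, xs, ys))
            offspring.append(mutate(parent, rng))
        population = offspring  # the offspring form the next generation
    return max(population, key=lambda t: fitness(t, xs, ys)), history


# Toy target (hypothetical): y = x^2 + x sampled on [-2, 2].
xs = [i / 4 for i in range(-8, 9)]
ys = [x * x + x for x in xs]
best, history = evolve(xs, ys)
print("best tree:", best)
print("best fitness per generation:", history)
```

Because of elitism, the best fitness recorded per generation never gets worse, which makes the sketch easy to sanity-check even though the search itself is random.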

animation of genepro finding a symbolic regression solution
Example of 1D symbolic regression (made with this gist)

Installation

For classification or regression, genepro relies only on a few libraries (numpy, joblib, and scikit-learn). However, additional libraries (e.g., gym) are required to run the reinforcement learning example. Thus, you can choose to perform a minimal or full installation.

Minimal installation

To perform a minimal installation, run:

    pip install genepro

Full installation

For a full installation, clone this repo locally, and make use of the file requirements.txt, as follows:

    git clone https://github.com/marcovirgolin/genepro
    cd genepro
    pip install -r requirements.txt .

Wish to use conda?

A conda virtual environment can easily be set up with:

    git clone https://github.com/marcovirgolin/genepro
    cd genepro
    conda env create
    conda activate genepro
    pip install .

Examples of usage

Classification and regression

The notebook classification and regression.ipynb shows how to use genepro for classification and regression, via scikit-learn estimators.

These estimators are intended for data sets with a small number of (relevant) features, as the evolved program can be written as a compact (and potentially interpretable) symbolic expression.

    ...
    gen: 39, best of gen fitness: -2952.999, best of gen size: 46
    gen: 40, best of gen fitness: -2950.453, best of gen size: 44
    The mean squared error on the test set is 2964.646 (respective R^2 score is 0.512)
    Obtained by the (simplified) model: 146.527 + -5.797*(-x_2**2 - 4*x_2 - 3*x_3 + 2*x_4 - x_5 - x_6*(x_4 - x_5) + x_6 - 5*x_8)

Example of output of a symbolic regression model discovered for the Diabetes data set.
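The two metrics in this output are related: R^2 equals one minus the ratio between the sum of squared errors and the total variance of the targets. The negative fitness values in the log also suggest that fitness is the negative mean squared error on the training data, though the README does not state this. The sketch below computes both metrics in plain Python on made-up toy values; the function names `mse` and `r2` are hypothetical, and scikit-learn provides equivalents in `sklearn.metrics`.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not taken from the Diabetes data set).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 3.0, 3.0]

print("MSE:", mse(y_true, y_pred))   # 0.5
print("R^2:", r2(y_true, y_pred))    # approximately 0.6
```

An R^2 of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the targets.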

Reinforcement learning

The notebook gym.ipynb shows how genepro can be used to evolve a controller for the CartPole-v1 environment of the OpenAI gym library.

animation displaying a random cart pole controller
Left: random cart pole controller; Right: evolved symbolic cart pole controller: (x2 + x3) * (x2*x3 + x3 + x4 + 1) * log(abs(x2))^2 * log(abs(x3))^2 < 0.5? 'left' else 'right'
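Written out as code, the evolved controller is a single expression compared against a threshold. The sketch below is one reading of the caption, with assumptions labeled: `log(abs(x))^2` is taken to mean the square of the logarithm, a small guard avoids `log(0)`, and the function names are hypothetical. How `x2`, `x3`, and `x4` map onto the CartPole observation vector is not stated in the caption, so the function takes them as explicit arguments.

```python
import math


def _log_abs(value, eps=1e-9):
    """log(|value|); the eps guard against log(0) is an assumption of this sketch."""
    return math.log(max(abs(value), eps))


def evolved_controller(x2, x3, x4):
    """Return 'left' or 'right' following the expression shown in the caption."""
    score = (
        (x2 + x3)
        * (x2 * x3 + x3 + x4 + 1)
        * _log_abs(x2) ** 2
        * _log_abs(x3) ** 2
    )
    return "left" if score < 0.5 else "right"


print(evolved_controller(1.0, 1.0, 0.0))        # left  (log(1) = 0, so the score is 0)
print(evolved_controller(math.e, math.e, 0.0))  # right (the score is well above 0.5)
```

In the Gym CartPole environments, action 0 pushes the cart to the left and action 1 pushes it to the right, so the returned label maps directly onto a discrete action.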

Citation

If you use this software, please cite it with:

    @software{Virgolin_genepro_2022,
      author = {Virgolin, Marco},
      month = {9},
      title = {{genepro}},
      url = {https://github.com/marcovirgolin/genepro},
      version = {0.1.3},
      year = {2024}
    }

Owner

  • Name: Marco
  • Login: marcovirgolin
  • Kind: user
  • Location: Amsterdam
  • Company: Centrum Wiskunde & Informatica (CWI)

Researcher on Evolutionary and Explainable Machine Learning @ Dutch National Math & CS center (CWI). Pic: stable diffusion + dreambooth.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Virgolin"
  given-names: "Marco"
  orcid: "https://orcid.org/0000-0001-8905-9313"
title: "genepro"
version: 0.1.0
date-released: 2022-09-01
url: "https://github.com/marcovirgolin/genepro"

GitHub Events

Total
  • Watch event: 5
  • Issue comment event: 1
  • Fork event: 1
Last Year
  • Watch event: 5
  • Issue comment event: 1
  • Fork event: 1

Committers

Last synced: about 3 years ago

All Time
  • Total Commits: 40
  • Total Committers: 5
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.625
Top Committers
Name             Email            Commits
Marco            m****n@u****m    15
Marco            m****o@M****l    13
Marco            m****o@d****l    7
Marco            m****o@d****l    4
Giorgia Nadizar  g****r@g****m    1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 3
  • Total pull requests: 2
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 17 hours
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 1.33
  • Average comments per pull request: 2.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hengzhe-zhang (1)
  • giorgia-nadizar (1)
  • chenyuxin1999 (1)
Pull Request Authors
  • giorgia-nadizar (1)
  • gandreadis (1)
Top Labels
Issue Labels
Pull Request Labels
enhancement (1)

Dependencies

requirements.txt pypi
  • gym ==0.22.0
  • joblib >=1.1.0
  • matplotlib >=3.5.1
  • numpy >=1.21.0
  • pygame ==2.1.0
  • pyglet ==1.5.21
  • scikit-learn >=1.0.2
  • sympy >=1.9
setup.py pypi
  • joblib >=1.1.0
  • numpy >=1.22.2
  • scikit-learn >=1.0.2
environment.yml conda
  • imagemagick
  • ipykernel
  • pip 21.2.4.*
  • python 3.10.*