evodag

Evolving Directed Acyclic Graph

https://github.com/mgraffg/evodag

Keywords

genetic-programming supervised-learning

Last synced: 6 months ago

Repository

Evolving Directed Acyclic Graph

Basic Info

Host: GitHub
Owner: mgraffg
License: apache-2.0
Language: Python
Default Branch: master
Homepage:
Size: 686 KB

Statistics

Stars: 29
Watchers: 5
Forks: 7
Open Issues: 10
Releases: 0

Topics

genetic-programming supervised-learning

Created about 10 years ago · Last pushed about 3 years ago

Metadata Files

Readme License

EvoDAG

Evolving Directed Acyclic Graph (EvoDAG) is a steady-state Genetic Programming system with tournament selection. The main characteristic of EvoDAG is that the genetic operation is performed at the root. EvoDAG was inspired by the geometric semantic crossover proposed by Alberto Moraglio et al. and the implementation performed by Leonardo Vanneschi et al.
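The paragraph above describes EvoDAG in terms of steady-state evolution, tournament selection and a genetic operation performed at the root. The following is a simplified sketch of those ideas, not EvoDAG's implementation. It assumes that each individual is represented only by its semantics (its output vector on the training set), that fitness is mean squared error, and that the offspring is built at the root as the least-squares combination a*s1 + b*s2 of its parents' semantics, i.e. the orthogonal projection of the target onto the span of the parents. All function names are hypothetical.

```python
import random


def mse(s, y):
    """Mean squared error between semantics s and target y (assumed fitness)."""
    return sum((u - t) ** 2 for u, t in zip(s, y)) / len(y)


def tournament(fitness, k, rng):
    """Index of the best of k distinct, randomly chosen individuals (lower is better)."""
    candidates = rng.sample(range(len(fitness)), k)
    return min(candidates, key=lambda i: fitness[i])


def combine_at_root(s1, s2, y):
    """Weights (a, b) minimising ||y - (a*s1 + b*s2)||^2.

    a*s1 + b*s2 is then the orthogonal projection of y onto span{s1, s2};
    the 2x2 normal equations are solved with Cramer's rule.
    """
    d11 = sum(u * u for u in s1)
    d22 = sum(v * v for v in s2)
    d12 = sum(u * v for u, v in zip(s1, s2))
    r1 = sum(u * t for u, t in zip(s1, y))
    r2 = sum(v * t for v, t in zip(s2, y))
    det = d11 * d22 - d12 * d12
    if abs(det) < 1e-12:
        # Parents are (nearly) collinear: fit y with s1 alone
        return (r1 / d11 if d11 else 0.0), 0.0
    return (r1 * d22 - r2 * d12) / det, (r2 * d11 - r1 * d12) / det


def steady_state_step(population, y, k=2, rng=random):
    """One steady-state step on a population of semantics vectors.

    Two parents are chosen by tournament, an offspring is built at the root,
    and it replaces the worst individual only if it has lower error.
    """
    fitness = [mse(s, y) for s in population]
    i = tournament(fitness, k, rng)
    j = tournament(fitness, k, rng)
    a, b = combine_at_root(population[i], population[j], y)
    child = [a * u + b * v for u, v in zip(population[i], population[j])]
    worst = max(range(len(population)), key=lambda idx: fitness[idx])
    if mse(child, y) < fitness[worst]:
        population[worst] = child
    return population
```

Representing individuals by their semantics alone hides the graph structure that EvoDAG actually evolves, but it makes the projection performed at the root explicit.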

EvoDAG is described in the following conference paper: EvoDAG: A semantic Genetic Programming Python library. Mario Graff, Eric S. Tellez, Sabino Miranda-Jiménez, Hugo Jair Escalante. 2016 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), pp. 1-6. A pre-print version can be downloaded from here.

Quick Start

EvoDAG can be used in two ways: as a Python library or through its command-line interface.

Using EvoDAG as library

Let us assume that X contains the inputs and y contains the classes. To train an ensemble of 30 EvoDAG models and predict X, use the following instructions:

```python
# Importing EvoDAG ensemble
from EvoDAG.model import EvoDAGE

# Importing iris dataset from sklearn
from sklearn.datasets import load_iris

# Reading data
data = load_iris()
X = data.data
y = data.target

# Train an ensemble of 30 EvoDAG models using 4 parallel jobs
m = EvoDAGE(n_estimators=30, n_jobs=4).fit(X, y)

# Predict X using the model
hy = m.predict(X)
```

Using EvoDAG from the command line

Suppose one wants to create a classifier for the iris dataset. The first step is to download the dataset from the UCI Machine Learning Repository:

```bash
curl -O https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
```

Training EvoDAG

If you do not want to optimise EvoDAG's parameters, simply omit the -P flag and the default parameters are used:

```bash
EvoDAG-train -C -m model.evodag -n 100 -u 4 iris.data
```

The -C flag indicates a classification problem (use -R for regression problems); each type of problem has its own default parameters. Here -m sets the file used to store the model, -n the size of the ensemble, and -u the number of CPU cores.

The performance of EvoDAG with the default parameters is reported in the EvoDAG row of the performance table.

Predict

Once the model has been trained it can be used for prediction. Since iris.data was not split into training and test sets, let us pretend that iris.data contains unseen data. To predict iris.data one would run:

```bash
EvoDAG-predict -m model.evodag -o iris.predicted iris.data
```

where -o specifies the file used to store the predictions, -m the file containing the model, and iris.data is the test set.

Performance

The following table presents the average performance, in terms of the balanced error rate (BER), of EvoDAG and of different scikit-learn classifiers on nine classification problems (the benchmarks are available in MATLAB and text formats). The best performance on each dataset is shown in bold to ease reading. EvoDAG is trained using the commands described in the Quick Start section.

| Classifier | Average rank | banana | thyroid | diabetis | heart | ringnorm | twonorm | german | image | waveform |
|------------|-------------:|-------:|--------:|---------:|------:|---------:|--------:|-------:|------:|---------:|
| EvoDAG | 2.4 | 12.0 | 7.7 | **25.0** | **16.1** | 1.5 | **2.3** | 28.3 | 3.6 | 10.8 |
| SVC | 5.3 | **11.6** | 8.1 | 29.8 | 17.7 | 1.8 | 2.7 | 33.3 | 8.9 | **10.7** |
| GaussianNB | 5.9 | 41.6 | 11.5 | 28.6 | 16.3 | **1.4** | 2.4 | 30.6 | 36.7 | 12.2 |
| GradientBoosting | 6.1 | 13.8 | 8.2 | 28.5 | 21.1 | 6.6 | 5.7 | 31.1 | **2.1** | 13.6 |
| MLP | 6.4 | 18.8 | 9.3 | 28.6 | 18.3 | 11.0 | 2.8 | 32.0 | 3.4 | 11.4 |
| NearestCentroid | 7.4 | 46.3 | 22.5 | 28.2 | 16.7 | 24.1 | **2.3** | **27.7** | 37.1 | 13.0 |
| AdaBoost | 8.3 | 28.2 | 8.7 | 29.3 | 22.5 | 7.0 | 5.4 | 31.6 | 3.2 | 14.6 |
| ExtraTrees | 8.7 | 13.5 | **6.6** | 33.0 | 21.1 | 8.4 | 7.8 | 36.9 | 2.4 | 16.5 |
| LinearSVC | 8.8 | 50.0 | 16.9 | 28.5 | 16.8 | 25.2 | 3.6 | 32.4 | 18.6 | 14.2 |
| LogisticRegression | 9.1 | 50.0 | 20.2 | 28.3 | 17.0 | 25.3 | 2.9 | 32.1 | 18.7 | 13.7 |
| RandomForest | 9.1 | 13.6 | 7.9 | 31.9 | 21.4 | 9.6 | 8.9 | 36.7 | **2.1** | 16.9 |
| KNeighbors | 9.4 | 11.9 | 12.1 | 32.1 | 18.6 | 43.7 | 3.8 | 36.2 | 5.3 | 13.8 |
| BernoulliNB | 11.6 | 45.5 | 32.5 | 31.9 | 16.6 | 28.1 | 5.8 | 32.8 | 39.0 | 14.3 |
| DecisionTree | 11.8 | 15.2 | 9.3 | 33.9 | 27.1 | 19.2 | 20.9 | 36.5 | 3.4 | 20.8 |
| SGD | 14.0 | 50.3 | 17.7 | 34.0 | 22.2 | 32.6 | 3.9 | 38.5 | 25.1 | 17.6 |
| PassiveAggressive | 14.1 | 49.0 | 19.5 | 34.7 | 23.3 | 31.5 | 3.9 | 38.7 | 26.2 | 17.1 |
| Perceptron | 14.4 | 50.2 | 18.2 | 34.6 | 21.6 | 33.5 | 3.9 | 37.5 | 26.7 | 19.0 |
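The balanced error rate averages the error rate of each class, so a classifier cannot score well simply by favouring the majority class. A minimal sketch of this usual definition follows; the function name is illustrative and is not part of EvoDAG or scikit-learn.

```python
from collections import Counter


def balanced_error_rate(y_true, y_pred):
    """Per-class error rate averaged over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    # Count, for each true class, how many of its examples were misclassified
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


# Class 0: 1 error out of 4 (0.25); class 1: 1 error out of 2 (0.5)
ber = balanced_error_rate([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
```

In this example the plain error rate would be 2/6 ≈ 0.33, while the BER is (0.25 + 0.5) / 2 = 0.375 because both classes weigh the same. Multiplying by 100 gives values on the scale used in the table, which appear to be percentages.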

The predictions of EvoDAG were obtained using the following script:

```bash
# Output directory named after the installed EvoDAG version
dirname=evodag-$(EvoDAG-params --version | awk '{print $2}')
[ ! -d "$dirname" ] && mkdir "$dirname"

echo Running $(EvoDAG-params --version)

for train in csv/*train_data*.csv; do
    # Test file name: replace every "train" with "test" in the training file name
    test=${train//train/test}
    output=$(basename "${test}" .csv)
    predict=${dirname}/${output}.predict
    model=${dirname}/${output}.model
    # Train only if the model does not exist yet
    if [ ! -f "$model" ]; then
        EvoDAG-train -C -u 32 -m "$model" -t "$test" "$train"
    fi
    # Predict only if the predictions do not exist yet
    if [ ! -f "$predict" ]; then
        EvoDAG-predict -u 32 -m "$model" -o "$predict" "$test"
    fi
done
```

The predictions of the scikit-learn classifiers were obtained using the following code:

```python
from sklearn.linear_model import LogisticRegression, SGDClassifier, Perceptron
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.svm import SVC, LinearSVC
from sklearn.naive_bayes import GaussianNB, BernoulliNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from glob import glob
import numpy as np


def predict(train, test, alg):
    # Training file: features in every column except the last, label in the last one
    X = np.loadtxt(train, delimiter=',')
    Xtrain = X[:, :-1]
    ytrain = X[:, -1]
    # Test file: assumed to contain only the features
    Xtest = np.loadtxt(test, delimiter=',')
    # Fit the classifier with its default parameters and predict the test set
    m = alg().fit(Xtrain, ytrain)
    return m.predict(Xtest)


ALG = [LogisticRegression, SGDClassifier, Perceptron, PassiveAggressiveClassifier,
       SVC, LinearSVC, KNeighborsClassifier, NearestCentroid, GaussianNB,
       BernoulliNB, DecisionTreeClassifier, RandomForestClassifier,
       ExtraTreesClassifier, AdaBoostClassifier, GradientBoostingClassifier,
       MLPClassifier]

for dataset in ['banana', 'thyroid', 'diabetis', 'heart', 'ringnorm',
                'twonorm', 'german', 'image', 'waveform']:
    for train in glob('csv/%s_train_data_*.csv' % dataset):
        test = train.replace('train', 'test')
        # One prediction vector per classifier in ALG
        hy_alg = [predict(train, test, alg) for alg in ALG]
```
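The code above produces the predictions but does not show how the Average rank column is computed. One common procedure, assumed here, ranks the classifiers on each dataset by BER (1 = lowest, tied values share the mean of their ranks) and averages each classifier's ranks over the datasets. The helper names are hypothetical.

```python
def rank_row(values):
    """Rank values in ascending order (1 = lowest BER); ties get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def average_rank(ber):
    """ber maps classifier name -> list of BER values, one per dataset (same order)."""
    names = list(ber)
    n_datasets = len(next(iter(ber.values())))
    totals = dict.fromkeys(names, 0.0)
    for d in range(n_datasets):
        ranks = rank_row([ber[name][d] for name in names])
        for name, r in zip(names, ranks):
            totals[name] += r
    return {name: totals[name] / n_datasets for name in names}
```

Fed with the BER values of the performance table, this procedure gives about 2.39 for EvoDAG and 5.33 for SVC, consistent with the Average rank column.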

Citing EvoDAG

If you like EvoDAG and use it in a scientific publication, I would appreciate citations to either the conference paper or the book chapter:

EvoDAG: A semantic Genetic Programming Python library. Mario Graff, Eric S. Tellez, Sabino Miranda-Jiménez, Hugo Jair Escalante. 2016 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), pp. 1-6.

```bibtex
@inproceedings{graff_evodag:_2016,
  title = {{EvoDAG}: {A} semantic {Genetic} {Programming} {Python} library},
  shorttitle = {{EvoDAG}},
  doi = {10.1109/ROPEC.2016.7830633},
  abstract = {Genetic Programming (GP) is an evolutionary algorithm that has received a lot of attention lately due to its success in solving hard real-world problems. Lately, there has been considerable interest in GP's community to develop semantic genetic operators, i.e., operators that work on the phenotype. In this contribution, we describe EvoDAG (Evolving Directed Acyclic Graph) which is a Python library that implements a steady-state semantic Genetic Programming with tournament selection using an extension of our previous crossover operators based on orthogonal projections in the phenotype space. To show the effectiveness of EvoDAG, it is compared against state-of-the-art classifiers on different benchmark problems, experimental results indicate that EvoDAG is very competitive.},
  booktitle = {2016 {IEEE} {International} {Autumn} {Meeting} on {Power}, {Electronics} and {Computing} ({ROPEC})},
  author = {Graff, M. and Tellez, E. S. and Miranda-Jiménez, S. and Escalante, H. J.},
  month = nov,
  year = {2016},
  keywords = {directed graphs, Electronic mail, EvoDAG, evolving directed acyclic graph, Genetic algorithms, GP community, Libraries, semantic genetic operators, semantic genetic programming Python library, Semantics, Sociology, Statistics, Steady-state, steady-state semantic genetic programming, Training},
  pages = {1--6}
}
```

Semantic Genetic Programming for Sentiment Analysis. Mario Graff, Eric S. Tellez, Hugo Jair Escalante, Sabino Miranda-Jiménez. NEO 2015, volume 663 of the series Studies in Computational Intelligence, pp. 43-65.

```bibtex
@incollection{graff_semantic_2017,
  series = {Studies in {Computational} {Intelligence}},
  title = {Semantic {Genetic} {Programming} for {Sentiment} {Analysis}},
  copyright = {©2017 Springer International Publishing Switzerland},
  isbn = {9783319440026 9783319440033},
  url = {http://link.springer.com/chapter/10.1007/978-3-319-44003-3_2},
  abstract = {Sentiment analysis is one of the most important tasks in text mining. This field has a high impact for government and private companies to support major decision-making policies. Even though Genetic Programming (GP) has been widely used to solve real world problems, GP is seldom used to tackle this trendy problem. This contribution starts rectifying this research gap by proposing a novel GP system, namely, Root Genetic Programming, and extending our previous genetic operators based on projections on the phenotype space. The results show that these systems are able to tackle this problem being competitive with other state-of-the-art classifiers, and, also, give insight to approach large scale problems represented on high dimensional spaces.},
  language = {en},
  number = {663},
  urldate = {2016-09-20},
  booktitle = {{NEO} 2015},
  publisher = {Springer International Publishing},
  author = {Graff, Mario and Tellez, Eric S. and Escalante, Hugo Jair and Miranda-Jiménez, Sabino},
  editor = {Schütze, Oliver and Trujillo, Leonardo and Legrand, Pierrick and Maldonado, Yazmin},
  year = {2017},
  note = {DOI: 10.1007/978-3-319-44003-3\_2},
  keywords = {Artificial Intelligence (incl. Robotics), Big Data/Analytics, Computational intelligence, Computer Imaging, Vision, Pattern Recognition and Graphics, Genetic programming, optimization, Semantic Crossover, sentiment analysis, Text mining},
  pages = {43--65}
}
```

EvoDAG from command line

Suppose one wants to create a classifier for the iris dataset. The first step is to download the dataset from the UCI Machine Learning Repository:

```bash
curl -O https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
```

Random search on the EvoDAG's parameters space

To boost EvoDAG's performance it is recommended to optimise its parameters. To free the user from this task, EvoDAG can perform a random search on the parameter space and select the best configuration found. This is done as follows:

```bash
EvoDAG-params -C -P params.evodag -r 734 -u 4 iris.data
```

where -C indicates that the task is classification, -P specifies the file where the sampled parameters are stored, -r specifies the number of samples (all the experiments presented here sampled 734 points, which in early versions corresponded to 0.1% of the search space), -u specifies the number of CPU cores, and iris.data is the dataset.
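The random search described above can be sketched as follows. This is not the code behind EvoDAG-params: the search space, its values and the evaluate callback are hypothetical (the keys are borrowed from the params.evodag excerpt in this README). The sketch only shows the general technique of sampling configurations uniformly at random, scoring each one, and keeping the configuration with the lowest error.

```python
import random

# Hypothetical discrete search space; the real EvoDAG-params space is not reproduced here
SEARCH_SPACE = {
    "popsize": [500, 1000, 2000, 3000],
    "early_stopping_rounds": [125, 250, 500, 1000, 2000],
    "Sigmoid": [True, False],
    "Cos": [True, False],
}


def sample_config(space, rng):
    """Draw one configuration uniformly at random from a discrete search space."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(space, evaluate, n_samples, seed=0):
    """Sample n_samples configurations; return (best_score, best_config, history).

    evaluate(config) must return an error to minimise, e.g. a BER on a validation set.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(n_samples):
        config = sample_config(space, rng)
        history.append((evaluate(config), config))
    best_score, best_config = min(history, key=lambda item: item[0])
    return best_score, best_config, history
```

In practice evaluate would train a model with the given configuration on part of the data and return its error on the held-out part; fixing the seed makes the sampled configurations reproducible.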

The file params.evodag looks like:

```json
[
  {
    "Add": 30,
    "Cos": false,
    "Div": true,
    "Exp": true,
    "Fabs": true,
    "If": true,
    "Ln": true,
    "Max": 5,
    "Min": 30,
    "Mul": 0,
    "Sigmoid": true,
    "Sin": false,
    "Sq": true,
    "Sqrt": true,
    "classifier": true,
    "early_stopping_rounds": 2000,
    "fitness": [0.0, 0.0, 0.0],
    "popsize": 500,
    "seed": 0,
    "unique_individuals": true
  },
  ...
```

where fitness is the balanced error rate on a validation set randomly drawn from the training set (in this case, 20% of iris.data).
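Two details of this file can be illustrated with a short sketch: how a 20% validation set might be drawn at random, and how the entries could be compared by their fitness. Both helpers are hypothetical. In particular, picking the entry with the lowest mean fitness is an assumed selection rule: the README only says that fitness is a balanced error rate (so lower is better) and does not say how its values are combined.

```python
import json
import random


def holdout_indices(n, validation_fraction=0.2, seed=0):
    """Randomly split range(n) into (training indices, validation indices)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n * validation_fraction))
    return indices[n_val:], indices[:n_val]


def best_configuration(text):
    """Parse a params.evodag-style JSON list and return the entry whose
    fitness values have the lowest mean (assumed selection rule)."""
    configs = json.loads(text)
    return min(configs, key=lambda c: sum(c["fitness"]) / len(c["fitness"]))
```

For the 150 rows of iris.data, holdout_indices keeps 30 rows for validation and 120 for training.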

Training EvoDAG

At this point we are in a position to train a model. Suppose one would like to create an ensemble of 100 classifiers on iris.data. The following command performs this action:

```bash
EvoDAG-train -P params.evodag -m model.evodag -n 100 -u 4 iris.data
```

where -m specifies the file used to store the model, -n is the size of the ensemble, -P receives EvoDAG's parameters, -u is the number of CPU cores, and iris.data is the dataset.
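The text does not say how the members of the ensemble are combined into a single prediction. Purely as an illustration, the sketch below combines per-model label predictions by majority vote; this is an assumed rule and EvoDAG's own combination may differ. The function name is hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several models by majority vote.

    predictions holds one list of labels per model, all of the same length.
    Ties are resolved by Counter.most_common, which favours the label
    encountered first, i.e. the one predicted by the earliest model.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


votes = majority_vote([["a", "b", "c"], ["a", "c", "c"], ["b", "b", "a"]])
```

Each position of the result is the label chosen by most models at that position, so votes is ["a", "b", "c"] here.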

Predict using EvoDAG model

At this point EvoDAG has been trained and the model is stored in model.evodag; the next step is to use it to predict unseen data. Since iris.data was not split into training and test sets, let us pretend that iris.data contains unseen data. To predict iris.data one would run:

```bash
EvoDAG-predict -m model.evodag -o iris.predicted iris.data
```

where -o specifies the file used to store the predictions, -m the file containing the model, and iris.data is the test set.

iris.predicted looks like:

```
Iris-setosa
Iris-setosa
Iris-setosa
Iris-setosa
Iris-setosa
...
```
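To compare the predictions with the labels stored in the last column of iris.data, something like the following sketch can be used. It assumes that iris.predicted holds one label per line, in the same order as the rows of iris.data, as the excerpt above suggests; the helper names are hypothetical. Since the model was trained on the same file, the result is only a training accuracy.

```python
import csv


def read_labels(lines):
    """Return the last field of each non-empty CSV row (the class column of iris.data)."""
    return [row[-1] for row in csv.reader(lines) if row]


def read_predictions(lines):
    """Return one predicted label per non-empty line (assumed iris.predicted format)."""
    return [line.strip() for line in lines if line.strip()]


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction counts differ")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Both readers accept any iterable of lines, so open file objects for iris.data and iris.predicted can be passed to them directly; blank lines, such as those at the end of iris.data, are skipped.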

Install EvoDAG

Installing evodag from the conda-forge channel can be achieved by adding conda-forge to your channels with:

```bash
conda config --add channels conda-forge
conda config --set channel_priority strict
```

and then installing it with conda:

```bash
conda install evodag
```

or with mamba:

```bash
mamba install evodag
```

Install using pip

```bash
pip install EvoDAG
```

Using source code

Clone the repository:

```bash
git clone https://github.com/mgraffg/EvoDAG.git
cd EvoDAG
```

Install the package as usual:

```bash
python setup.py install
```

To install it only for the current user:

```bash
python setup.py install --user
```

Owner

Name: Mario Graff
Login: mgraffg
Kind: user
Location: Aguascalientes, México
Company: CONACYT, INFOTEC Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación

Website: https://mgraffg.github.io
Twitter: mgraffg
Repositories: 2
Profile: https://github.com/mgraffg

Mario Graff is a CONACYT (México Council of Science and Technology) Researcher working at INFOTEC.


Committers

Last synced: over 2 years ago

All Time

Total Commits: 579
Total Committers: 6
Avg Commits per committer: 96.5
Development Distribution Score (DDS): 0.116

Past Year

Commits: 9
Committers: 1
Avg Commits per committer: 9.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Mario Graff	m**g@i**g	512
Mario Graff	m**g@u**m	43
ClaudiaSanchez	c**9@h**m	10
Mario Graff	m**f@i**x	10
Mario Graff	1**g@u**m	3
Mario Graff	m**g@g**m	1

Committer Domains (Top 20 + Academic)

infotec.mx: 1 ieee.org: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 34
Total pull requests: 24
Average time to close issues: about 1 month
Average time to close pull requests: about 6 hours
Total issue authors: 3
Total pull request authors: 2
Average comments per issue: 0.74
Average comments per pull request: 1.0
Merged pull requests: 23
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

mgraffg (32)
manzar96 (1)
alemol (1)

Pull Request Authors

mgraffg (22)
ClaudiaSanchez (2)

Top Labels

Issue Labels

enhancement (23) bug (4)


Packages

Total packages: 2
Total downloads:
- pypi 139 last-month

Total dependent packages: 2
(may contain duplicates)
Total dependent repositories: 2
(may contain duplicates)
Total versions: 90
Total maintainers: 1

pypi.org: evodag

Evolving Directed Acyclic Graph

Homepage: https://github.com/mgraffg/EvoDAG
Documentation: https://evodag.readthedocs.io/
License: Apache Software License
Latest release: 0.17.2
published about 3 years ago

Versions: 89
Dependent Packages: 1
Dependent Repositories: 1
Downloads: 139 Last month

Rankings

Dependent packages count: 4.7%

Forks count: 11.9%

Stargazers count: 12.3%

Downloads: 12.5%

Average: 12.6%

Dependent repos count: 21.7%

Maintainers (1)

mgraffg

Last synced: 6 months ago

conda-forge.org: evodag

Homepage: https://github.com/mgraffg/EvoDAG
License: Apache-2.0
Latest release: 0.17.1
published over 3 years ago

Versions: 1
Dependent Packages: 1
Dependent Repositories: 1

Rankings

Dependent repos count: 24.3%

Dependent packages count: 29.0%

Average: 37.4%

Stargazers count: 47.5%

Forks count: 48.9%

Last synced: 6 months ago

evodag

Science Score: 33.0%
