GATree

GATree: Evolutionary decision tree classifier in Python - Published in JOSS (2024)

https://github.com/lahovniktadej/gatree

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

decision-tree evolutionary-algorithm genetic-algorithm machine-learning
Last synced: 4 months ago

Repository

Evolutionary decision trees

Statistics
  • Stars: 11
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 7
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

GATree


📋 About

GATree is a Python library designed for implementing evolutionary decision trees using a standard genetic algorithm approach. The library provides functionalities for selection, mutation, and crossover operations within the decision tree structure, allowing users to evolve and optimise decision trees for various classification and clustering tasks. 🌲🧬

The library's core objective is to let users create and fine-tune decision trees through an evolutionary process. GATree supports the dynamic growth and adaptation of decision trees, giving machine learning enthusiasts and practitioners a flexible tool for classification and clustering problems. 🚀🌿

GATree is currently limited to classification and clustering tasks, with support for regression tasks planned for future releases. 💡

📦 Installation

pip

To install GATree using pip, run `pip install gatree`.

🚀 Usage

The following example demonstrates how to perform classification of the iris dataset using GATree. More examples can be found in the examples directory.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from gatree.methods.gatreeclassifier import GATreeClassifier

# Load the iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='target')

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10)

# Create and fit the GATree classifier
gatree = GATreeClassifier(n_jobs=16, random_state=32)
gatree.fit(X=X_train, y=y_train, population_size=100, max_iter=100)

# Make predictions on the testing set
y_pred = gatree.predict(X_test)

# Evaluate the accuracy of the classifier
print(accuracy_score(y_test, y_pred))
```

🧬 Genetic Operators in GATree

The genetic algorithm for decision trees in GATree involves several key operators: selection, elitism, crossover, and mutation. Each of these operators plays a crucial role in the evolution and optimisation of the decision trees. Below is a detailed description of each operator within the context of the GATree class.

Selection

Selection is the process of choosing parent trees from the current population to produce offspring for the next generation. By default, the GATree class uses tournament selection: a subset of the population is chosen at random, and the best individual from that subset is selected as a parent.
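Tournament selection can be sketched in a few lines. This is a generic illustration, not GATree's internal code: the function name `tournament_selection`, the `tournament_size` parameter, and the higher-is-better fitness convention are all assumptions made for the example.

```python
import random


def tournament_selection(population, fitness, tournament_size=3, rng=None):
    """Return the fittest of `tournament_size` randomly drawn individuals."""
    rng = rng or random.Random()
    # Draw the contenders without replacement, then keep the best one.
    contenders = rng.sample(population, tournament_size)
    return max(contenders, key=fitness)


# Toy population: integers whose fitness is simply their own value
# (hypothetical stand-ins for decision trees).
population = [3, 9, 1, 7, 5]

# With a tournament as large as the population, the overall best always wins.
parent = tournament_selection(
    population, fitness=lambda x: x,
    tournament_size=len(population), rng=random.Random(0))
```

A smaller tournament lowers the selection pressure, because weaker individuals win more often when the strongest ones are not drawn.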

Elitism

Elitism ensures that the best-performing individuals (trees) from the current generation are carried over to the next generation without any modification. This guarantees that the fitness of the best tree in the population does not decrease from one generation to the next.
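A minimal sketch of elitism follows. It assumes a higher-is-better fitness function; `next_generation`, `elite_size`, and `make_offspring` are hypothetical names chosen for the example and are not taken from GATree.

```python
def next_generation(population, fitness, make_offspring, elite_size=1):
    """Copy the `elite_size` best individuals and fill the rest with offspring."""
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:elite_size]
    # The remaining slots are filled by whatever the variation step produces.
    offspring = [make_offspring(population)
                 for _ in range(len(population) - elite_size)]
    return elites + offspring


# Toy run: integers as individuals, and a deliberately poor variation step
# that always returns 0, to show that the best individual still survives.
population = [4, 8, 2, 6]
new_population = next_generation(
    population, fitness=lambda x: x,
    make_offspring=lambda pop: 0, elite_size=1)
```

Even though every offspring here is worse than its parents, the best fitness in `new_population` equals the best fitness in `population`.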

Crossover

Crossover is a genetic operator used to combine the genetic information of two parent trees to generate new offspring. This enables exploration, which helps in creating diversity in the population and combining good traits from both parents.
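One common way to recombine two trees is subtree crossover, sketched below. Whether GATree uses exactly this scheme is not stated here, so treat it as an assumption; the `Node` class and the `subtree_crossover` function are hypothetical and exist only for the illustration.

```python
import copy
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                       # split rule for inner nodes, class for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def nodes(tree):
    """List every node of the tree in pre-order."""
    if tree is None:
        return []
    return [tree] + nodes(tree.left) + nodes(tree.right)


def subtree_crossover(parent_a, parent_b, rng):
    """Copy parent_a and replace one random subtree with one from parent_b."""
    child = copy.deepcopy(parent_a)
    target = rng.choice(nodes(child))
    donor = copy.deepcopy(rng.choice(nodes(parent_b)))
    # Overwrite the target node in place so its parent's link stays valid.
    target.label, target.left, target.right = donor.label, donor.left, donor.right
    return child


parent_a = Node("petal length < 2.5", Node("setosa"), Node("versicolor"))
parent_b = Node("petal width < 1.7", Node("versicolor"), Node("virginica"))
child = subtree_crossover(parent_a, parent_b, random.Random(0))
```

Both parents are deep-copied before anything is changed, so they stay intact and can be selected again in the same generation.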

Mutation

Mutation introduces random changes to a tree to maintain genetic diversity and explore new solutions. This helps in avoiding local optima by introducing new genetic structures.
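The sketch below shows one possible mutation for a decision tree: pick a random node, then either nudge its split threshold or switch its predicted class. The tree representation, the `mutate` function, and the Gaussian perturbation with `sigma` are assumptions made for illustration; GATree's own mutation operator may work differently.

```python
import copy
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[int] = None       # inner node: index of the tested feature
    threshold: Optional[float] = None   # inner node: split value
    label: Optional[int] = None         # leaf: predicted class
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def walk(node):
    """Yield every node of the tree in pre-order."""
    if node is not None:
        yield node
        yield from walk(node.left)
        yield from walk(node.right)


def mutate(tree, n_classes, rng, sigma=0.5):
    """Return a mutated copy in which one random node has been changed."""
    mutant = copy.deepcopy(tree)
    node = rng.choice(list(walk(mutant)))
    if node.label is not None:
        # Leaf: switch to a different class (requires n_classes >= 2).
        node.label = rng.choice([c for c in range(n_classes) if c != node.label])
    else:
        # Inner node: nudge the split threshold with Gaussian noise.
        node.threshold += rng.gauss(0.0, sigma)
    return mutant


tree = Node(feature=2, threshold=2.5, left=Node(label=0), right=Node(label=1))
mutant = mutate(tree, n_classes=3, rng=random.Random(42))
```

The original tree is left untouched, which keeps mutation compatible with elitism: an elite tree can be copied forward while a mutated copy competes alongside it.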

🫂 Community Guidelines

Contributing

To contribute to the software, please read the contributing guidelines.

Reporting Issues

If you encounter any issues with the library, please report them using the issue tracker. Include a detailed description of the problem, including the steps to reproduce the problem, the stack trace, and details about your operating system and software version.

Seeking Support

If you need support, please first refer to the documentation. If you still require assistance, please open an issue on the issue tracker with the question tag. For private inquiries, you can contact us via e-mail at tadej.lahovnik1@um.si or saso.karakatic@um.si.

📜 License

This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.

Disclaimer

This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!

Owner

  • Name: Tadej Lahovnik
  • Login: lahovniktadej
  • Kind: user
  • Location: Maribor, Slovenia
  • Company: UM FERI

JOSS Publication

GATree: Evolutionary decision tree classifier in Python
Published
August 12, 2024
Volume 9, Issue 100, Page 6748
Authors
Tadej Lahovnik ORCID
University of Maribor, Maribor, Slovenia
Sašo Karakatič ORCID
University of Maribor, Maribor, Slovenia
Editor
Kelly Rowland ORCID
Tags
genetic algorithm evolutionary algorithm classifier machine learning

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Lahovnik
  given-names: Tadej
  orcid: "https://orcid.org/0009-0005-9689-2991"
- family-names: Karakatič
  given-names: Sašo
  orcid: "https://orcid.org/0000-0003-4441-9690"
doi: 10.5281/zenodo.13307404
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Lahovnik
    given-names: Tadej
    orcid: "https://orcid.org/0009-0005-9689-2991"
  - family-names: Karakatič
    given-names: Sašo
    orcid: "https://orcid.org/0000-0003-4441-9690"
  date-published: 2024-08-12
  doi: 10.21105/joss.06748
  issn: 2475-9066
  issue: 100
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6748
  title: "GATree: Evolutionary decision tree classifier in Python"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06748"
  volume: 9
title: "GATree: Evolutionary decision tree classifier in Python"

GitHub Events

Total
  • Create event: 1
  • Release event: 1
  • Issues event: 3
  • Watch event: 3
  • Push event: 3
  • Pull request event: 5
  • Fork event: 2
Last Year
  • Create event: 1
  • Release event: 1
  • Issues event: 3
  • Watch event: 3
  • Push event: 3
  • Pull request event: 5
  • Fork event: 2

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 157
  • Total Committers: 3
  • Avg Commits per committer: 52.333
  • Development Distribution Score (DDS): 0.083
Past Year
  • Commits: 28
  • Committers: 2
  • Avg Commits per committer: 14.0
  • Development Distribution Score (DDS): 0.429
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Tadej Lahovnik | t****k@s****i | 144 |
| zala-lahovnik | z****k@g****m | 12 |
| karakatic | k****c@g****m | 1 |
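The all-time figures above are consistent with a common definition of the Development Distribution Score: one minus the top committer's share of all commits. The page does not state the formula, so that definition is an assumption; the short check below reproduces the reported values from the per-committer counts.

```python
def development_distribution_score(commits_per_committer):
    """1 - (commits by the most active committer / total commits)."""
    total = sum(commits_per_committer)
    return 1 - max(commits_per_committer) / total


# All-time commit counts from the Top Committers table.
all_time = [144, 12, 1]

dds = development_distribution_score(all_time)
average = sum(all_time) / len(all_time)

print(round(dds, 3), round(average, 3))  # prints: 0.083 52.333
```

A score near 0 means that a single committer wrote almost all of the commits; a score near 1 means that work is spread evenly across many committers.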
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 21
  • Total pull requests: 12
  • Average time to close issues: 29 days
  • Average time to close pull requests: about 6 hours
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.86
  • Average comments per pull request: 0.17
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 4
  • Average time to close issues: 16 days
  • Average time to close pull requests: about 18 hours
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.5
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • lahovniktadej (17)
Pull Request Authors
  • lahovniktadej (14)
  • zala-lahovnik (7)
Top Labels
Issue Labels
  • enhancement (13)
  • documentation (3)
  • bug (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 20 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 7
  • Total maintainers: 1
pypi.org: gatree
  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 20 Last month
Rankings
Dependent packages count: 10.1%
Average: 38.3%
Dependent repos count: 66.5%
Maintainers (1)
Last synced: 4 months ago