pyracer

Unofficial Python implementation of the RACER classification algorithm

https://github.com/adversarian/racer

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org, springer.com, zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.3%) to scientific vocabulary

Keywords

artificial-intelligence classification-algorithm machine-learning racer

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: Adversarian
License: mit
Language: Python
Default Branch: main
Homepage: https://pyracer.readthedocs.io/en/latest/
Size: 90.8 KB

Statistics

Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 11

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

PyRACER

PyRACER is an unofficial Python implementation of the RACER classification algorithm described by Basiri et al., 2019. RACER is designed specifically for discrete datasets, so PyRACER discretizes continuous features first: it uses the entropy-based MDLP discretization algorithm by Fayyad and Irani, 1993 for binary tasks and an optimal binning strategy for the multiclass case. The code is also heavily documented for ease of use.

Please consider citing this work if you intend to use it in an academic setting.

Installation

A new release is made available on PyPI every time new features are added or bugs are fixed, so you can simply use pip to install the package:

```bash
$ pip install pyracer
```

If you would rather develop the package for your own use case, you may clone this repository and install the development requirements. Reading the documentation first is strongly advised, as you may find native support for your specific task in the private methods already available.

```bash
$ git clone https://github.com/Adversarian/RACER/
$ cd RACER
$ pip install -r requirements-dev.txt
```

Otherwise, you may also try to monkey-patch the class in your own code if you find that solution more appealing.

Usage

PyRACER is designed to be consistent with the Scikit-learn estimator API, which makes it very easy to use.

The following example demonstrates the use of RACER on the Zoo dataset. Take a look at examples for more use cases.

Data Obtention and Cleaning

```python
from RACER import RACER, RACERPreprocessor
from sklearn.model_selection import train_test_split
import pandas as pd

# dataset from https://archive.ics.uci.edu/ml/machine-learning-databases/zoo/
df = pd.read_csv(
    "datasets/zoo.data",
    names=[
        "animal_name",
        "hair",
        "feathers",
        "eggs",
        "milk",
        "airborne",
        "aquatic",
        "predator",
        "toothed",
        "backbone",
        "breathes",
        "venomous",
        "fins",
        "legs",
        "tail",
        "domestic",
        "catsize",
        "type",
    ],
)

# Every remaining column is treated as a categorical feature.
X = df.drop(columns=['animal_name', 'type']).astype('category')
Y = df[['type']].astype('category')
```

RACER Preprocessing Step

RACER requires a preprocessing step to be performed on the data prior to splitting it into test and train portions. This step discretizes continuous features and then converts each feature into a dummy-encoded variable. Since different discretization methods are used for multiclass and binary classification tasks, you need to either specify the task using the target keyword argument or leave it at its default of "auto", which attempts to infer your task from the number of unique values in y when you call fit_transform(X, y).
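The "auto" behaviour described above can be pictured with a minimal sketch. The helper name `infer_target` and its error handling are hypothetical and are not taken from PyRACER; the sketch only assumes that two distinct labels mean a binary task and more than two mean a multiclass task.

```python
def infer_target(y):
    """Guess the task type from the number of distinct labels in y.

    Hypothetical helper for illustration; not part of the PyRACER API.
    """
    n_classes = len(set(y))
    if n_classes < 2:
        raise ValueError("need at least two distinct labels to classify")
    return "binary" if n_classes == 2 else "multiclass"


zoo_types = [1, 4, 2, 7, 1, 6]        # several distinct classes
spam_flags = ["spam", "ham", "spam"]  # exactly two classes

print(infer_target(zoo_types))
print(infer_target(spam_flags))
```

Passing target explicitly avoids relying on this kind of inference, which can guess wrong if a class happens to be missing from y.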

RACERPreprocessor now also supports separate fit and transform functions, but it is still recommended to use fit_transform or to perform fit on the entire dataset prior to splitting. This ensures that values which do not appear in the training portion are not left out of the transformation at test time.

```python
X, Y = RACERPreprocessor(target="multiclass").fit_transform(X, Y)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=1, test_size=0.3)
```
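To see why fitting before the split matters, here is a toy sketch of dummy encoding in plain Python. The helpers `fit_categories` and `dummy_encode` are hypothetical and do not reproduce RACERPreprocessor, whose handling of unseen values may differ; the point is only that a category missing at fit time has no bit to represent it.

```python
def fit_categories(column):
    """Record the distinct values of one feature, in sorted order."""
    return sorted(set(column))


def dummy_encode(value, categories):
    """One bit per known category; an unknown value yields all zeros."""
    return [1 if value == c else 0 for c in categories]


legs_train = [0, 2, 4, 4, 2]
legs_test = [4, 8]  # 8 never appears in the training portion

cats_train_only = fit_categories(legs_train)
cats_full = fit_categories(legs_train + legs_test)

lost = dummy_encode(8, cats_train_only)  # no bit available for 8
kept = dummy_encode(8, cats_full)        # 8 gets its own bit

print(lost)  # [0, 0, 0]
print(kept)  # [0, 0, 0, 1]
```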

Fitting RACER on the Dataset

RACER provides a benchmark keyword argument that can be used to time the fit method. Moreover, the hyperparameter alpha can be set using its respective keyword argument. (Note that beta is uniquely determined as 1.0 - alpha and is therefore not exposed through a keyword argument.)

```python
racer = RACER(alpha=0.95, benchmark=True)
racer.fit(X_train, Y_train)
```
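The relationship between alpha and beta can be sketched as a weighted sum. This README only states that beta equals 1.0 - alpha; that the two weights apply to a rule's accuracy and coverage is an assumption made here for illustration, and `rule_fitness` is a hypothetical function, not a PyRACER method.

```python
def rule_fitness(accuracy, coverage, alpha=0.95):
    """Weighted sum of two scores; beta is derived from alpha, never passed in.

    Assumption: the two scores are a rule's accuracy and coverage.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * accuracy + beta * coverage


# With alpha close to 1, a perfectly accurate rule with modest coverage
# still scores highly.
high_alpha = rule_fitness(accuracy=1.0, coverage=0.4, alpha=0.95)
# With alpha = 0.5, both terms count equally.
balanced = rule_fitness(accuracy=1.0, coverage=0.4, alpha=0.5)
print(high_alpha > balanced)
```

Under this assumption, raising alpha favours rules that are rarely wrong, while lowering it favours rules that apply to more samples.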

Now you may access the public methods available within the racer object such as score and display_rules. For example:

```python
>>> racer.score(X_test, Y_test)
... 0.8709677419354839

>>> racer.display_rules()
... Algorithm Parameters:
...     - Alpha: 0.95
...     - Time to fit: 0.008133015999987947s
...
... Final Rules (8 total): (if --> then (label) | fitness)
...     [111011011111111101011011111000111111] --> 1000000 | 0.9685714285714285
...     [100101101111111001011010010000011111] --> 0100000 | 0.9607142857142856
...     [101001101001110101101101100000011111] --> 0001000 | 0.9571428571428571
...     [101011101011011010111110101011111011] --> 0000001 | 0.9542857142857143
...     [111001101110101010011110000010101010] --> 0000010 | 0.9535714285714285
...     [101011101011111101111110101000011011] --> 0010000 | 0.9528571428571428
...     [101001101001110101011110001000101010] --> 0000100 | 0.9521428571428571
...     [101001101010101010011010100000101010] --> 0000001 | 0.9507142857142856
```
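One way to read the rules printed above is as bit masks over the dummy-encoded features. The sketch below assumes that a rule covers a sample when every bit set in the sample is also set in the rule, and that the label is a one-hot string. Both helpers (`covers`, `label_index`) and the five-bit toy encoding are hypothetical; none of this is taken from the PyRACER source.

```python
def covers(rule_bits, sample_bits):
    """True if every bit set in the sample is also set in the rule."""
    if len(rule_bits) != len(sample_bits):
        raise ValueError("rule and sample must have the same length")
    rule = int(rule_bits, 2)
    sample = int(sample_bits, 2)
    return (rule & sample) == sample


def label_index(label_bits):
    """Position of the single 1 in a one-hot label string."""
    if label_bits.count("1") != 1:
        raise ValueError("label must contain exactly one 1")
    return label_bits.index("1")


# Toy encoding: feature A has 3 values (first 3 bits), feature B has 2.
rule = "11010"      # A may be value 0 or 1, B must be value 0
sample_a = "10010"  # A = value 0, B = value 0
sample_b = "00101"  # A = value 2, B = value 1

print(covers(rule, sample_a))  # True
print(covers(rule, sample_b))  # False
print(label_index("0100000"))  # 1
```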

To Do

~Add another example notebook featuring Scikit-learn's built-in datasets.~

~Replace pandas.get_dummies() with Scikit-learn's OneHotEncoder for better consistency.~

Unify discretization algorithms for all tasks.

Better docs!

Issues and Feature Requests

Found a problem within the implementation or an inconsistency with the original algorithm? Or maybe you would like to request a feature? Please feel free to submit a PR or create a new issue.

Official Paper

```bibtex
@Article{Basiri2019,
  author="Basiri, Javad and Taghiyareh, Fattaneh and Faili, Heshaam",
  title="RACER: accurate and efficient classification based on rule aggregation approach",
  journal="Neural Computing and Applications",
  year="2019",
  month="Mar",
  day="01",
  volume="31",
  number="3",
  pages="895--908",
  issn="1433-3058",
  doi="10.1007/s00521-017-3117-2",
  url="https://doi.org/10.1007/s00521-017-3117-2"
}
```

Owner

Login: Adversarian
Kind: user

Repositories: 1
Profile: https://github.com/Adversarian

Citation (CITATION.cff)

# YAML 1.2
---
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Tashakkor"
  given-names: "Arian"
  orcid: "https://orcid.org/0009-0000-6806-3217"
title: "PyRACER - Unofficial Python implementation of the RACER classification algorithm."
url: "https://github.com/Adversarian/RACER"
doi: 10.5281/zenodo.8174037
date-released: 2023
version: 1.2.1

Packages

Total packages: 1
Total downloads:
- pypi 21 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 10
Total maintainers: 1

pypi.org: pyracer

Unofficial Python implementation of the RACER classification algorithm.

Homepage: https://github.com/Adversarian/RACER
Documentation: https://pyracer.readthedocs.io/
License: MIT
Latest release: 1.2.1
published over 2 years ago

Versions: 10
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 21 Last month

Rankings

Dependent packages count: 7.5%

Downloads: 7.7%

Average: 29.5%

Forks count: 30.2%

Stargazers count: 32.2%

Dependent repos count: 69.8%

Maintainers (1)

Minuano

Last synced: 6 months ago

Dependencies

.github/workflows/python-publish.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

setup.py pypi

numpy *
optbinning *
pandas *
scikit-learn *

.github/workflows/pr-tests.yml actions

actions/checkout v3 composite
actions/setup-python v3 composite

docs/requirements.txt pypi

numpy ==1.23.5
optbinning ==0.17.3
pandas ==2.0.3
scikit-learn ==1.2.2
sphinxnotes-strike *

requirements-dev.txt pypi

numpy ==1.23.5 development
optbinning ==0.17.3 development
pandas ==2.0.3 development
pytest * development
scikit-learn ==1.2.2 development
