pymurtree

Python wrapper for the MurTree project

https://github.com/jurra/pymurtree

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (19.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Python wrapper for the MurTree project

Basic Info
  • Host: GitHub
  • Owner: jurra
  • License: mit
  • Language: C++
  • Default Branch: main
  • Size: 4.15 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 2
  • Open Issues: 1
  • Releases: 4
Created almost 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog License Citation

README.md

PyMurTree

DISCLAIMER: This codebase is currently in alpha: the version on the main branch is made available mainly for testing by project members. It is still under development and may contain bugs or errors. Users are advised to exercise caution when using it and to report any issues or feedback to the developers so they can be addressed in future releases.

PyMurTree is a Python wrapper for the MurTree project. The MurTree algorithm constructs optimal classification trees that minimize the misclassification score of a given dataset while respecting constraints on depth and number of feature nodes. The sparse objective, which penalizes each node added in the tree, is also supported.
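The objective described above can be made concrete with a small brute-force sketch. It is not MurTree's algorithm (which relies on dynamic programming and search); it simply checks every tree of depth at most 1 on binary features, using the training data from the usage example below. It assumes the sparse objective adds a constant penalty `alpha` per feature node, and the function names are hypothetical, not part of pymurtree.

```python
from collections import Counter


def leaf_errors(labels):
    """Misclassifications of a single leaf that predicts the majority label."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def best_tree_depth_at_most_1(x, y, alpha=0.0):
    """Exhaustive search over trees of depth <= 1 on binary (0/1) features.

    Objective (assumed): misclassifications + alpha * (number of feature nodes).
    Returns (objective_value, split_feature); split_feature is None when a
    single leaf (zero feature nodes) is best.
    """
    best_cost, best_feature = leaf_errors(list(y)), None
    for f in range(len(x[0])):
        left = [label for row, label in zip(x, y) if row[f] == 0]
        right = [label for row, label in zip(x, y) if row[f] == 1]
        cost = leaf_errors(left) + leaf_errors(right) + alpha  # one feature node
        if cost < best_cost:
            best_cost, best_feature = cost, f
    return best_cost, best_feature


x = [[0, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0]]  # features
y = [5, 5, 4]  # labels
print(best_tree_depth_at_most_1(x, y))             # (0.0, 3)
print(best_tree_depth_at_most_1(x, y, alpha=2.0))  # (1, None)
```

With `alpha=0` the split on feature 3 classifies all three samples correctly; with a large enough penalty (`alpha=2.0`) a single leaf becomes cheaper, even though it misclassifies one sample.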


Citation

This package is based on the methods and algorithms described in:

"MurTree: Optimal Decision Trees via Dynamic Programming and Search"
by Emir Demirović, Anna Lukina, Emmanuel Hebrard, Jeffrey Chan, James Bailey, Christopher Leckie, Kotagiri Ramamohanarao, and Peter J. Stuckey
Journal of Machine Learning Research (JMLR), 2022.
Available online


Installation

Before attempting to install pymurtree, make sure your system has the following software available: Python version 3.7 or higher and pip.
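A quick way to check both prerequisites with the interpreter you plan to use is a short script like the one below. It is a convenience sketch, not part of pymurtree; `check_prerequisites` is a hypothetical helper name.

```python
import importlib.util
import sys


def check_prerequisites(min_version=(3, 7)):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if sys.version_info < min_version:
        problems.append(
            "Python %d.%d or higher is required, found %d.%d"
            % (min_version + tuple(sys.version_info[:2]))
        )
    # pip is importable as a module when it is installed for this interpreter
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not available for this interpreter")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "Prerequisites look OK")
```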

Install from source using pip

```bash
git clone https://github.com/MurTree/pymurtree.git
cd pymurtree

# Optional: build pymurtree within a virtual environment
python3 --version  # check your Python version
sudo apt install python3-venv  # Debian/Ubuntu package providing the venv module
python3 -m venv env
. env/bin/activate

# Install
pip install .

# To install the dev version:
pip install ".[dev]"
```

Building and running the tests

For building and running the tests, you will need the following software: pytest, a C++ compiler, and CMake version 3.14 or higher.

```bash
# Run the Python tests
pytest

# Run the C++ tests
cd tests/cpptests
mkdir build
cd build
cmake ..
make
ctest
```

Usage

API

The full API specification is available in the repo's Wiki.

pymurtree is implemented as a thin Python wrapper around the main C++ MurTree application. The main functionality of MurTree is exposed in pymurtree via the OptimalDecisionTreeClassifier class. Utility functions to load training datasets and to export the tree in text and DOT formats are also included in the Python package.

OptimalDecisionTreeClassifier class
- constructor: initialize the parameters of the model
- fit: fit a decision tree classifier to the given training dataset
- predict: predict the labels for a set of features
- score: return the accuracy on the given test data and labels
- depth: return the depth of the tree
- num_nodes: return the number of nodes of the tree
- export_text: export the decision tree in text format
- export_dot: export the decision tree in DOT format
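To make the semantics of `predict`, `score`, `depth` and `num_nodes` concrete, here is a minimal pure-Python sketch over a hypothetical nested-tuple tree. This is not pymurtree's internal representation. The tuple layout and helper names are illustrative, and the sketch assumes that `score` is plain accuracy, that depth counts feature nodes on the longest root-to-leaf path, and that the node count refers to feature nodes (consistent with the `maxnum_nodes` limit); check the Wiki for the exact definitions.

```python
# Hypothetical tree representation (NOT pymurtree's internal format):
#   a leaf is ("leaf", label)
#   a feature node is ("node", feature_index, left_subtree, right_subtree);
#   samples with row[feature_index] == 0 go left, all others go right.


def predict_one(tree, row):
    """Follow the splits from the root until a leaf is reached."""
    while tree[0] == "node":
        _, f, left, right = tree
        tree = left if row[f] == 0 else right
    return tree[1]


def predict(tree, rows):
    return [predict_one(tree, row) for row in rows]


def score(tree, rows, labels):
    """Accuracy: fraction of rows whose predicted label matches the true one."""
    correct = sum(p == t for p, t in zip(predict(tree, rows), labels))
    return correct / len(labels)


def depth(tree):
    """Feature nodes on the longest root-to-leaf path (a lone leaf has depth 0)."""
    if tree[0] == "leaf":
        return 0
    return 1 + max(depth(tree[2]), depth(tree[3]))


def num_feature_nodes(tree):
    if tree[0] == "leaf":
        return 0
    return 1 + num_feature_nodes(tree[2]) + num_feature_nodes(tree[3])


# A depth-1 tree splitting on feature 3, which fits the example training data.
tree = ("node", 3, ("leaf", 4), ("leaf", 5))
x = [[0, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0]]
y = [5, 5, 4]
print(score(tree, x, y), depth(tree), num_feature_nodes(tree))  # 1.0 1 1
```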

Utility functions
- read_from_file: read features and labels from file into a pandas dataframe
- load_data: read features and labels from file into a numpy array

Example

After installing pymurtree you can use it in your Python code by importing the package. Here's an example of how to build a decision tree classifier from a training dataset, make predictions and export the tree for visualization with graphviz:

```python
import pymurtree
import numpy

# Create training data
x = numpy.array([[0, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0]])  # features
y = numpy.array([5, 5, 4])  # labels

# Build tree classifier
model = pymurtree.OptimalDecisionTreeClassifier()
model.fit(x, y, maxdepth=4, maxnum_nodes=5, time=400)

# Predict labels for a new set of features
ft = numpy.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 1, 0]])
labels = model.predict(ft)

# Visualize tree
model.export_text()

# Export tree in DOT format for visualization with graphviz
model.export_dot()
```
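DOT is a plain-text graph description that Graphviz can render. As an illustration of the format only, here is a sketch that converts a hypothetical nested-tuple tree into DOT text. The tree layout and `to_dot` are hypothetical, and the output is not necessarily the exact text that `export_dot` produces.

```python
# Hypothetical tree representation (NOT pymurtree's internal format):
#   ("leaf", label) or ("node", feature_index, left_subtree, right_subtree),
#   where the left branch is taken when the feature value is 0.


def to_dot(tree):
    """Render the nested-tuple tree as Graphviz DOT text."""
    lines = ["digraph tree {"]
    counter = [0]  # next free node id, shared by the recursive helper

    def visit(node):
        node_id = counter[0]
        counter[0] += 1
        if node[0] == "leaf":
            lines.append(f'  n{node_id} [label="class {node[1]}", shape=box];')
        else:
            _, f, left, right = node
            lines.append(f'  n{node_id} [label="feature {f}"];')
            left_id = visit(left)
            lines.append(f'  n{node_id} -> n{left_id} [label="0"];')
            right_id = visit(right)
            lines.append(f'  n{node_id} -> n{right_id} [label="1"];')
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


print(to_dot(("node", 3, ("leaf", 4), ("leaf", 5))))
```

Saved to a file such as `tree.dot`, text in this format can be rendered with the Graphviz command-line tool, for example `dot -Tpng tree.dot -o tree.png`.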

Datasets

A collection of datasets compatible with pymurtree is available at https://github.com/MurTree/murtree-data
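The files in that repository are, as far as we know, plain text with one sample per line: the class label in the first column, followed by space-separated binary feature values. Assuming that layout, a minimal NumPy loader that returns features and labels separately (similar in spirit to the `load_data` utility, whose exact signature is not documented here) could look like this. `load_label_first` is a hypothetical name, and the demo writes its own tiny file in the assumed format.

```python
import os
import tempfile

import numpy


def load_label_first(path):
    """Assumed format: label in column 0, binary features in the remaining columns."""
    data = numpy.loadtxt(path, dtype=int, ndmin=2)  # ndmin=2 keeps 1-row files 2-D
    y = data[:, 0]   # labels
    x = data[:, 1:]  # features
    return x, y


# Demo: the training data from the usage example, written in the assumed format
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("5 0 1 0 1\n5 1 0 0 1\n4 1 1 0 0\n")
    path = f.name
x, y = load_label_first(path)
os.remove(path)
print(x.shape, y)  # (3, 4) [5 5 4]
```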

Contributing

There are different ways in which you can contribute to pymurtree:
- Try the package and let us know if it's useful for your work.
- If it's useful, please star the repo.
- Report a bug or request a feature by opening an issue.
- Contribute to the codebase by opening a pull request.
- Currently pymurtree works only on Linux; any help picking up open issues and fixing them for other platforms is welcome.

License

MIT LICENSE

Owner

  • Name: Jose Urra
  • Login: jurra
  • Kind: user

Code apprentice

Citation (CITATION.cff)

---
message: If you use this software, please cite it as below.
cff-version: 1.2.0
title: pymurtree
abstract: Python bindings for MurTree C++ library
version: 0.0.1
date-released: '2022-09-18'
authors:
- name: Yasel Quintero
- name: Jose Urra
keywords:
- MurTree
- Optimal decision tree
- Python bindings
type: software
license: MIT
url: https://github.com/MurTree/pymurtree.git
...

Dependencies

pyproject.toml pypi
  • numpy >=1.18.0
  • pandas >=1.0.0