https://github.com/a11to1n3/pheatpruner

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.7%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: a11to1n3
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 10.7 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created almost 2 years ago · Last pushed 11 months ago
Metadata Files
Readme License

README.md

PHeatPruner

PHeatPruner is a Python function designed to prune variables in multivariate time-series datasets using persistent homology analysis. It also offers an optional sheafification process to enhance the feature set, making it useful for dimensionality reduction while maintaining the essential structure for machine learning tasks.

Installation

To use PHeatPruner, you need to install the following dependencies:

  • numpy
  • pandas
  • tqdm
  • gudhi
  • matplotlib
  • scikit-learn
  • shap
  • aeon

You can install these using pip:

```bash
pip install numpy pandas tqdm gudhi matplotlib scikit-learn shap aeon
```
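After installing, it can be useful to confirm that every dependency is importable before running the example. The snippet below is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of PHeatPruner. Note that `scikit-learn` is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the dependencies listed above.
# The scikit-learn package is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "pandas", "tqdm", "gudhi",
    "matplotlib", "sklearn", "shap", "aeon",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```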

Usage

Here is an example of how to use PHeatPruner with a dataset from the UEA Archive (the original README also links to a copy of it):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from aeon.datasets import load_classification
from src.PHeatPruner import PHeatPruner
import shap

# Load the dataset
dataset = "NATOPS"  # or any other dataset in the UEA Archive
X_train, y_train = load_classification(dataset, split="train")
X_test, y_test = load_classification(dataset, split="test")

# Encode labels as integer indices
y_train = [np.where(np.unique(y_train) == label)[0][0] for label in y_train]
y_train_df = pd.DataFrame(y_train)
y_test = [np.where(np.unique(y_test) == label)[0][0] for label in y_test]
y_test_df = pd.DataFrame(y_test)

# Prune the dataset using PHeatPruner
pruned_X_train, pruned_X_test = PHeatPruner(X_train, X_test)

# Train a RandomForestClassifier on the pruned data
# (ravel() passes the labels as a 1-D array, which scikit-learn expects)
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(pruned_X_train, y_train_df.values.ravel())
predictions = rf_clf.predict(pruned_X_test)

# Display the confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test_df, predictions))
ConfusionMatrixDisplay.from_predictions(y_test_df.values.ravel(), predictions)
plt.title('Confusion Matrix')
plt.show()

# Print the classification report
print("Classification Report:")
print(classification_report(y_test_df, predictions))

# Explain the model using SHAP
explainer = shap.TreeExplainer(rf_clf)
explanation = explainer(pruned_X_test)
shap.plots.beeswarm(explanation[:, :, 0], max_display=40)
plt.show()

# Re-prune the data with sheafification
pruned_X_train_sheaf, pruned_X_test_sheaf = PHeatPruner(X_train, X_test, sheafification=True)

# Retrain the model on the sheafified data
rf_clf.fit(pruned_X_train_sheaf, y_train_df.values.ravel())
predictions_sheaf = rf_clf.predict(pruned_X_test_sheaf)

# Explain the sheafified data model using SHAP
# (rebuild the explainer so that it reflects the retrained model)
explainer_sheaf = shap.TreeExplainer(rf_clf)
explanation_sheaf = explainer_sheaf(pruned_X_test_sheaf)
shap.plots.beeswarm(explanation_sheaf[:, :, 0], max_display=40)
plt.show()

# Display the confusion matrix for the sheafified data
print("Confusion Matrix (Sheafified Data):")
print(confusion_matrix(y_test_df, predictions_sheaf))
ConfusionMatrixDisplay.from_predictions(y_test_df.values.ravel(), predictions_sheaf)
plt.title('Confusion Matrix (Sheafified Data)')
plt.show()

# Print the classification report for the sheafified data
print("Classification Report (Sheafified Data):")
print(classification_report(y_test_df, predictions_sheaf))
```
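The label-encoding step in the example above looks up each label's position among the unique class names. As an optional alternative, the sketch below does the same mapping with `np.unique(..., return_inverse=True)` and reuses the training classes for the test split, so both splits share one index scheme. The helper name `encode_labels` is hypothetical and not part of PHeatPruner.

```python
import numpy as np


def encode_labels(y_train, y_test):
    """Map class labels to integer indices using the classes seen in training."""
    y_train = np.asarray(y_train)
    y_test = np.asarray(y_test)

    # np.unique returns the sorted classes and, for each training label,
    # the index of its class in that sorted array.
    classes, y_train_idx = np.unique(y_train, return_inverse=True)

    # Fail loudly if the test split contains a label unseen in training.
    if not np.all(np.isin(y_test, classes)):
        raise ValueError("y_test contains labels that are not present in y_train")

    # Because `classes` is sorted, searchsorted gives each test label's index.
    y_test_idx = np.searchsorted(classes, y_test)
    return classes, y_train_idx, y_test_idx
```

The returned index arrays are one-dimensional, so they can be passed directly to `fit` and to the scikit-learn metric functions.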

Note

  • Persistent Homology: This technique determines the optimal epsilon threshold for pruning variables based on the topological features of the dataset.
  • Sheafification: An optional process that enhances the feature set by considering higher-order interactions among the variables.
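To give a rough intuition for the persistent homology step, here is a minimal NumPy-only sketch. It is not PHeatPruner's implementation and makes several assumptions: variables are compared with the distance 1 − |Pearson correlation|, only 0-dimensional persistence (connected components) is used, epsilon is taken as the midpoint of the largest gap between consecutive death times, and one variable is kept per connected component at that scale. All function names are hypothetical. The 0-dimensional death times of a Vietoris-Rips filtration coincide with the edge weights of a minimum spanning tree, so a union-find pass over the sorted edges is enough.

```python
import numpy as np


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def h0_death_times(dist):
    """Scales at which connected components merge (ascending order)."""
    n = dist.shape[0]
    parent = list(range(n))
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for weight, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # this edge merges two components, so one class dies here
            parent[ri] = rj
            deaths.append(float(weight))
    return deaths


def choose_epsilon(deaths):
    """Assumed heuristic: midpoint of the largest gap between death times."""
    deaths = np.asarray(deaths, dtype=float)
    if deaths.size < 2:
        return 0.0  # too few variables to see a gap; prune nothing
    k = int(np.argmax(np.diff(deaths)))
    return float((deaths[k] + deaths[k + 1]) / 2.0)


def prune_variables(X, epsilon=None):
    """Keep one column of X (n_samples, n_variables) per component at scale epsilon."""
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    if epsilon is None:
        epsilon = choose_epsilon(h0_death_times(dist))
    n = dist.shape[0]
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] <= epsilon:
                parent[_find(parent, i)] = _find(parent, j)
    seen, keep = set(), []
    for i in range(n):
        root = _find(parent, i)
        if root not in seen:  # first variable encountered in each component
            seen.add(root)
            keep.append(i)
    return keep, epsilon


# Toy data: columns 0 and 1 are near-duplicates, as are columns 2 and 3.
t = np.linspace(0.0, 1.0, 200)
a, b = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
X_demo = np.column_stack([a, a + 0.01 * b, b, b + 0.01 * a])
keep_demo, eps_demo = prune_variables(X_demo)
print(keep_demo, round(eps_demo, 3))
```

On the toy data, the two near-duplicate pairs merge at a very small scale while the two groups only merge near distance 1, so the largest gap separates them and one column per pair is retained.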

LICENSE

This project is licensed under the Apache License 2.0; see the LICENSE file for details.

Owner

  • Name: Anh-Duy Pham
  • Login: a11to1n3
  • Kind: user

GitHub Events

Total
  • Push event: 3
Last Year
  • Push event: 3