https://github.com/a11to1n3/pheatpruner

Science Score: 13.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

CITATION.cff file: not found
codemeta.json file: found
.zenodo.json file: not found
DOI references: not found
Academic publication links: not found
Academic email domains: not found
Institutional organization owner: no
JOSS paper metadata: not found
Scientific vocabulary similarity: low (9.7%)

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: a11to1n3
License: apache-2.0
Language: Python
Default Branch: main
Size: 10.7 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Releases: 0

Created almost 2 years ago · Last pushed 11 months ago

Metadata Files

Readme, License

PHeatPruner

PHeatPruner is a Python function that prunes variables in multivariate time-series datasets using persistent homology. This makes it useful for dimensionality reduction while preserving the structure that matters for downstream machine learning tasks. It also offers an optional sheafification step that enhances the feature set.

Installation

To use PHeatPruner, you need to install the following dependencies:

numpy
pandas
tqdm
gudhi
matplotlib
scikit-learn
shap
aeon

You can install these using pip:

```bash
pip install numpy pandas tqdm gudhi matplotlib scikit-learn shap aeon
```
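To check that everything installed correctly, a quick import test can help. The sketch below is not part of PHeatPruner; `REQUIRED` and `missing_packages` are hypothetical names, and it assumes each package is importable under the module name shown (scikit-learn, for example, is imported as `sklearn`).

```python
import importlib.util

# Hypothetical helper, not part of PHeatPruner.
# Maps each pip package name to the module name it is imported under.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "tqdm": "tqdm",
    "gudhi": "gudhi",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "shap": "shap",
    "aeon": "aeon",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", " ".join(missing))
    else:
        print("All PHeatPruner dependencies are installed.")
```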

Usage

Here's an example of how to use PHeatPruner with a dataset from the UEA Archive (the README also links to a standalone copy):

```python
import numpy as np
import matplotlib.pyplot as plt
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from aeon.datasets import load_classification
from src.PHeatPruner import PHeatPruner

# Load the dataset
dataset = "NATOPS"  # or any other dataset in the UEA Archive
X_train, y_train = load_classification(dataset, split="train")
X_test, y_test = load_classification(dataset, split="test")

# Encode labels as integer indices, using the training classes for both splits
classes = np.unique(y_train)
y_train = np.array([np.where(classes == label)[0][0] for label in y_train])
y_test = np.array([np.where(classes == label)[0][0] for label in y_test])

# Prune the dataset using PHeatPruner
pruned_X_train, pruned_X_test = PHeatPruner(X_train, X_test)

# Train a RandomForestClassifier on the pruned data
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(pruned_X_train, y_train)
predictions = rf_clf.predict(pruned_X_test)

# Display the confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, predictions))
ConfusionMatrixDisplay.from_predictions(y_test, predictions)
plt.title("Confusion Matrix")
plt.show()

# Print the classification report
print("Classification Report:")
print(classification_report(y_test, predictions))

# Explain the model using SHAP (the beeswarm plot shows class 0)
explainer = shap.TreeExplainer(rf_clf)
explanation = explainer(pruned_X_test)
shap.plots.beeswarm(explanation[:, :, 0], max_display=40)
plt.show()

# Re-prune the data with sheafification
pruned_X_train_sheaf, pruned_X_test_sheaf = PHeatPruner(X_train, X_test, sheafification=True)

# Retrain the model on the sheafified data
rf_clf.fit(pruned_X_train_sheaf, y_train)
predictions_sheaf = rf_clf.predict(pruned_X_test_sheaf)

# Explain the sheafified data model using SHAP. A new explainer is needed:
# the previous one captured the trees of the model before it was refitted.
explainer_sheaf = shap.TreeExplainer(rf_clf)
explanation_sheaf = explainer_sheaf(pruned_X_test_sheaf)
shap.plots.beeswarm(explanation_sheaf[:, :, 0], max_display=40)
plt.show()

# Display the confusion matrix for the sheafified data
print("Confusion Matrix (Sheafified Data):")
print(confusion_matrix(y_test, predictions_sheaf))
ConfusionMatrixDisplay.from_predictions(y_test, predictions_sheaf)
plt.title("Confusion Matrix (Sheafified Data)")
plt.show()

# Print the classification report for the sheafified data
print("Classification Report (Sheafified Data):")
print(classification_report(y_test, predictions_sheaf))
```
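Two steps in the example rely on details the README leaves implicit: how labels are turned into integers, and what array shape scikit-learn receives. The hypothetical helpers `encode_labels` and `to_tabular` below sketch both with plain NumPy. Encoding the test labels against the training classes keeps the indices consistent across splits. `to_tabular` only matters if the arrays passed to the classifier are three-dimensional, as aeon's `load_classification` returns them; the README does not state what shape `PHeatPruner` returns, so treat this as an assumption.

```python
import numpy as np


def encode_labels(y_train, y_test):
    """Hypothetical helper: map labels to integer indices using the training classes.

    Raises ValueError if the test split contains a label absent from training.
    """
    classes, y_train_idx = np.unique(np.asarray(y_train), return_inverse=True)
    y_test = np.asarray(y_test)
    unseen = np.setdiff1d(y_test, classes)
    if unseen.size:
        raise ValueError(f"test labels not seen in training: {unseen}")
    # `classes` is sorted, so searchsorted returns each label's exact index
    y_test_idx = np.searchsorted(classes, y_test)
    return classes, y_train_idx, y_test_idx


def to_tabular(X):
    """Hypothetical helper: return a 2D (n_cases, n_features) array for scikit-learn.

    A 3D (n_cases, n_channels, n_timepoints) array is flattened per case;
    2D input is returned unchanged.
    """
    X = np.asarray(X)
    if X.ndim == 3:
        return X.reshape(X.shape[0], -1)
    if X.ndim == 2:
        return X
    raise ValueError(f"expected a 2D or 3D array, got {X.ndim}D")


classes, y_tr, y_te = encode_labels(["b", "a", "b", "c"], ["c", "a"])
print(classes, y_tr, y_te)  # ['a' 'b' 'c'] [1 0 1 2] [2 0]
print(to_tabular(np.zeros((4, 3, 5))).shape)  # (4, 15)
```

If `PHeatPruner` already returns 2D arrays, `to_tabular` leaves them unchanged.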

Note

Persistent Homology: This technique determines the optimal epsilon threshold for pruning variables based on the topological features of the dataset.
Sheafification: An optional process that enhances the feature set by considering higher-order interactions among the variables.
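The README does not describe how the epsilon threshold is computed. The following is only a generic illustration of the idea, not PHeatPruner's algorithm: it treats each variable as a point, uses 1 - |Pearson correlation| as the distance (an assumption), computes 0-dimensional persistent homology of the resulting Vietoris-Rips filtration, places epsilon in the widest gap between death times (a hypothetical rule), and keeps one variable per cluster connected at that scale. All function names are hypothetical, and the sketch uses NumPy with a union-find instead of gudhi to stay self-contained.

```python
import numpy as np


def variable_distances(X):
    """Assumed distance 1 - |Pearson r| between the channels of a 3D array.

    X has shape (n_cases, n_channels, n_timepoints); each channel's series
    are concatenated across cases before correlating.
    """
    n_channels = X.shape[1]
    series = X.transpose(1, 0, 2).reshape(n_channels, -1)
    corr = np.nan_to_num(np.corrcoef(series))  # constant channels -> r = 0
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    return dist


def kruskal_merge(dist, eps=np.inf):
    """Union-find merge of points along edges of length <= eps, shortest first.

    Returns each point's component root and the merge heights. With eps=inf
    the merge heights are the finite H0 death times of the Vietoris-Rips
    filtration on `dist`: every point is born at 0 and dies when it first
    joins another component (one component never dies).
    """
    n = dist.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for d, i, j in edges:
        if d > eps:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(float(d))
    return [find(i) for i in range(n)], np.array(deaths)


def pick_epsilon(deaths):
    """Hypothetical rule: place epsilon in the widest gap between death times."""
    d = np.sort(deaths)
    if d.size < 2:
        # Too few deaths to find a gap: no edge qualifies, so nothing is pruned
        return float("-inf")
    k = int(np.argmax(np.diff(d)))
    return float((d[k] + d[k + 1]) / 2)


def prune_variables(X):
    """Keep the first channel of each cluster connected at scale epsilon."""
    dist = variable_distances(X)
    _, deaths = kruskal_merge(dist)
    eps = pick_epsilon(deaths)
    roots, _ = kruskal_merge(dist, eps)
    keep, seen = [], set()
    for channel, root in enumerate(roots):
        if root not in seen:
            seen.add(root)
            keep.append(channel)
    return X[:, keep, :], keep, eps


rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 10, 50))
X = np.stack([a, 2 * a + 1, b, -b], axis=1)  # channels 1 and 3 are redundant
X_pruned, keep, eps = prune_variables(X)
print(keep, X_pruned.shape)  # [0, 2] (10, 2, 50)
```

Here channels 1 and 3 are exact linear copies of channels 0 and 2, so they die at (almost) zero distance and fall inside the same clusters as their originals.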

LICENSE

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Owner

Name: Anh-Duy Pham
Login: a11to1n3
Kind: user

Website: https://a11to1n3.github.io/blog/
Repositories: 1
Profile: https://github.com/a11to1n3

GitHub Events

Total

Push event: 3

Last Year

Push event: 3
