https://github.com/a11to1n3/pheatpruner
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: a11to1n3
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 10.7 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
PHeatPruner
PHeatPruner is a Python function designed to prune variables in multivariate time-series datasets using persistent homology analysis. It also offers an optional sheafification process to enhance the feature set, making it useful for dimensionality reduction while maintaining the essential structure for machine learning tasks.
Installation
To use PHeatPruner, you need to install the following dependencies:
- numpy
- pandas
- tqdm
- gudhi
- matplotlib
- scikit-learn
- shap
- aeon
You can install these using pip:
```bash
pip install numpy pandas tqdm gudhi matplotlib scikit-learn shap aeon
```
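To confirm that the dependencies are importable after installation, a small check along these lines may help. It is a minimal sketch using only the standard library; the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative and not part of PHeatPruner. Note that the scikit-learn package is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the dependencies listed above. The scikit-learn
# distribution is imported as `sklearn`; the others share their pip name.
REQUIRED_MODULES = [
    "numpy", "pandas", "tqdm", "gudhi",
    "matplotlib", "sklearn", "shap", "aeon",
]


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All PHeatPruner dependencies are importable.")
```

The check only looks up each module without importing it, so it runs quickly even for heavy packages such as shap.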
Usage
Here’s an example (also available in the repository) of how to use PHeatPruner with a dataset from the UEA Archive:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from aeon.datasets import load_classification
from src.PHeatPruner import PHeatPruner
import shap

# Load the dataset
dataset = "NATOPS"  # or any other dataset in the UEA Archive
X_train, y_train = load_classification(dataset, split="train")
X_test, y_test = load_classification(dataset, split="test")

# Encode labels as integer indices, using the training classes for both
# splits so that the same label always maps to the same index
classes = np.unique(y_train)
y_train = [np.where(classes == label)[0][0] for label in y_train]
y_train_df = pd.DataFrame(y_train)
y_test = [np.where(classes == label)[0][0] for label in y_test]
y_test_df = pd.DataFrame(y_test)

# Prune the dataset using PHeatPruner
pruned_X_train, pruned_X_test = PHeatPruner(X_train, X_test)

# Train a RandomForestClassifier on the pruned data
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(pruned_X_train, y_train_df)
predictions = rf_clf.predict(pruned_X_test)

# Display the confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test_df, predictions))
ConfusionMatrixDisplay.from_predictions(y_test_df, predictions)
plt.title('Confusion Matrix')
plt.show()

# Print the classification report
print("Classification Report:")
print(classification_report(y_test_df, predictions))

# Explain the model using SHAP
explainer = shap.TreeExplainer(rf_clf)
explanation = explainer(pruned_X_test)
shap.plots.beeswarm(explanation[:, :, 0], max_display=40)
plt.show()

# Re-prune the data with sheafification
pruned_X_train_sheaf, pruned_X_test_sheaf = PHeatPruner(X_train, X_test, sheafification=True)

# Retrain the model on the sheafified data
rf_clf.fit(pruned_X_train_sheaf, y_train_df)
predictions_sheaf = rf_clf.predict(pruned_X_test_sheaf)

# Explain the sheafified data model using SHAP
# (rebuild the explainer so that it reflects the retrained model)
explainer_sheaf = shap.TreeExplainer(rf_clf)
explanation_sheaf = explainer_sheaf(pruned_X_test_sheaf)
shap.plots.beeswarm(explanation_sheaf[:, :, 0], max_display=40)
plt.show()

# Display the confusion matrix for the sheafified data
print("Confusion Matrix (Sheafified Data):")
print(confusion_matrix(y_test_df, predictions_sheaf))
ConfusionMatrixDisplay.from_predictions(y_test_df, predictions_sheaf)
plt.title('Confusion Matrix (Sheafified Data)')
plt.show()

# Print the classification report for the sheafified data
print("Classification Report (Sheafified Data):")
print(classification_report(y_test_df, predictions_sheaf))
```
Note
- Persistent Homology: This technique determines the optimal epsilon threshold for pruning variables based on the topological features of the dataset.
- Sheafification: An optional process that enhances the feature set by considering higher-order interactions among the variables.
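The README does not spell out how the epsilon threshold is chosen, so the following is only an illustrative sketch of one common heuristic, not PHeatPruner's actual implementation. It assumes a precomputed symmetric distance matrix between variables (for example `1 - |correlation|`, which is itself an assumption). It relies on the fact that the death times of connected components (0-dimensional persistent homology) in a Vietoris-Rips filtration are the edge weights of a minimum spanning tree. All function names here are hypothetical.

```python
def _find(parent, i):
    """Return the root of i's component, shortening the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def _union(parent, i, j):
    """Merge two components, keeping the smaller index as root. True if merged."""
    root_i, root_j = _find(parent, i), _find(parent, j)
    if root_i == root_j:
        return False
    parent[max(root_i, root_j)] = min(root_i, root_j)
    return True


def h0_death_times(dist):
    """Scales at which components merge: the MST edge weights (Kruskal)."""
    n = len(dist)
    parent = list(range(n))
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    return [w for w, i, j in edges if _union(parent, i, j)]


def choose_epsilon(deaths):
    """Assumed heuristic: midpoint of the widest gap between death times."""
    if len(deaths) < 2:
        raise ValueError("need at least two death times to find a gap")
    _, low, high = max((b - a, a, b) for a, b in zip(deaths, deaths[1:]))
    return (low + high) / 2


def representatives(dist, epsilon):
    """Keep the lowest-index variable of each cluster linked at scale epsilon."""
    n = len(dist)
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= epsilon:
                _union(parent, i, j)
    return sorted({_find(parent, i) for i in range(n)})


if __name__ == "__main__":
    # Toy distances: variables 0 and 1 are close, as are variables 2 and 3.
    toy = [
        [0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.2],
        [0.9, 0.9, 0.2, 0.0],
    ]
    eps = choose_epsilon(h0_death_times(toy))
    print("epsilon:", eps, "kept variables:", representatives(toy, eps))
```

In this toy case the widest gap separates the two tight pairs from the distant merge, so one variable per pair is kept. A full persistent homology library such as gudhi would also expose higher-dimensional features, which this sketch ignores.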
LICENSE
This project is licensed under the Apache License 2.0; see the LICENSE file for details.
Owner
- Name: Anh-Duy Pham
- Login: a11to1n3
- Kind: user
- Website: https://a11to1n3.github.io/blog/
- Repositories: 1
- Profile: https://github.com/a11to1n3
GitHub Events
Total
- Push event: 3
Last Year
- Push event: 3