Recent Releases of fgclustering
fgclustering - Preprint Release, Fixes & Enhancements
This release brings several improvements and additions to Forest-Guided Clustering, including bug fixes, feature enhancements, and support for the new preprint publication.
🔧 Bug Fixes
- Consistent Cluster Ordering: Fixed an issue where the cluster ordering differed between the feature importance plots and the heatmap/boxplot visualizations.
✨ Enhancements
- CLARA Subsampling Improvements: Added support for subsampling in CLARA clustering that preserves the original target label distribution, improving robustness and interpretability.
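As a rough illustration of what subsampling with a preserved label distribution can look like, the following minimal sketch draws a proportional share of indices from each target label. It is not the package's implementation: the function name `stratified_subsample`, its parameters, and the toy labels are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, sample_size, seed=0):
    """Return sorted indices of a subsample whose label proportions
    approximate those of the full label list (hypothetical helper)."""
    rng = random.Random(seed)

    # Group sample indices by their target label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    n_total = len(labels)
    chosen = []
    for indices in by_label.values():
        # Proportional allocation with at least one sample per label.
        # Because of rounding, the total can differ slightly from sample_size.
        k = max(1, round(sample_size * len(indices) / n_total))
        chosen.extend(rng.sample(indices, min(k, len(indices))))
    return sorted(chosen)


# Toy target with an 80/20 label split.
labels = ["a"] * 80 + ["b"] * 20
subsample = stratified_subsample(labels, sample_size=10)
share_b = sum(labels[i] == "b" for i in subsample) / len(subsample)
print(len(subsample), share_b)
```

In this toy case the subsample keeps the 80/20 split of the full target, which is the property the enhancement describes.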
📄 New Additions
- Preprint Publication Scripts: Added the scripts and Jupyter notebooks used to prepare the FGC preprint, enabling full reproducibility.
- README Update: The repository overview now includes a citation badge and a link to the preprint on arXiv.
Published by lisa-sousa 7 months ago
fgclustering - v2.0.1 Bug Fixes
This patch release includes two minor but important bug fixes:
Bug Fixes
- Fixed an inconsistency in plot_heatmap_classification related to the renaming of variables
- Removed an unnecessary else branch in DistanceJensenShannon.calculate_distance_cluster_vs_background
Published by lisa-sousa 8 months ago
fgclustering - v2.0.0 Refactored, Scalable, and Faster
Major Release
This release delivers major performance improvements, scalable computation, and a full API refactor to align with the scikit-learn ecosystem. It introduces a memory-efficient distance matrix implementation, enhanced stability analysis, and a modular architecture for flexible usage and extension.
Performance & Storage Improvements
- Faster distance matrix computation using optimized logic.
- Memory-efficient distance matrix via on-disk memmap storage format.
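To make the on-disk storage idea concrete, here is a minimal sketch using numpy.memmap. It only illustrates the general technique; the file name, the toy leaf-index array, and the row-by-row fill are assumptions for this example and do not reflect the package's internal code.

```python
import os
import tempfile

import numpy as np

n_samples = 5
path = os.path.join(tempfile.mkdtemp(), "distance_matrix.dat")

# Allocate an n x n float32 matrix backed by a file instead of RAM.
dist = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_samples, n_samples))

# Toy stand-in for terminal-node ids (rows = samples, columns = trees).
# This is fabricated example data, not output of a real Random Forest.
rng = np.random.default_rng(0)
leaves = rng.integers(0, 3, size=(n_samples, 10))

# Fill the matrix one row at a time so only a single row is held in memory.
for i in range(n_samples):
    proximity = (leaves == leaves[i]).mean(axis=1)  # share of trees with same leaf
    dist[i, :] = 1.0 - proximity
dist.flush()  # write pending changes to disk

# Reopen read-only; the data is paged in from disk on access.
reopened = np.memmap(path, dtype=np.float32, mode="r", shape=(n_samples, n_samples))
print(reopened.shape)
```

The trade-off is slower access than an in-memory array in exchange for a matrix that can exceed available RAM.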
Stability Estimation with Jaccard Index
- Switched from minimum to mean Jaccard Index as the stability metric.
- Each cluster's stability is now reported individually, and the reporting format has changed.
- Bootstrapping now supports sampling only a fraction (x%) of the dataset.
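The stability changes above can be sketched roughly as follows. This is an illustration under assumptions, not the package's code: the best-match pairing between original and bootstrap clusters, the function names, and the hand-written toy clusterings are all hypothetical.

```python
import random


def draw_bootstrap_indices(n_samples, fraction, rng):
    """Draw int(fraction * n_samples) sample indices with replacement."""
    size = max(1, int(fraction * n_samples))
    return rng.choices(range(n_samples), k=size)


def jaccard(a, b):
    """Jaccard index of two index sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_jaccard_per_cluster(original, bootstrap_clusterings):
    """original: dict cluster id -> set of sample indices.
    bootstrap_clusterings: list of (sampled index set, dict of clusters)."""
    scores = {cid: [] for cid in original}
    for sampled, clusters in bootstrap_clusterings:
        for cid, members in original.items():
            # Compare only on samples that were drawn in this bootstrap.
            restricted = members & sampled
            best = max(jaccard(restricted, c) for c in clusters.values())
            scores[cid].append(best)
    # Mean (rather than minimum) over bootstraps, one value per cluster.
    return {cid: sum(vals) / len(vals) for cid, vals in scores.items()}


original = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}
# Hand-written toy bootstrap results, each on 75% of the 8 samples.
bootstraps = [
    ({0, 1, 2, 4, 5, 6}, {0: {0, 1, 2}, 1: {4, 5, 6}}),
    ({0, 1, 3, 4, 6, 7}, {0: {0, 1}, 1: {3, 4, 6, 7}}),
]
stability = mean_jaccard_per_cluster(original, bootstraps)
print(stability)
```

Reporting one mean value per cluster, as here, makes it visible when a single cluster is unstable while the others are not.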
Refactored API (Scikit-learn Compatible)
- Main logic now exposed as a functional API returning sklearn.utils.Bunch.
- Distance metrics (e.g., Random Forest proximity) refactored into reusable classes.
- Clustering algorithms (KMedoids, CLARA) encapsulated as configurable classes.
- Feature importance computation and cluster optimization structured as modular classes.
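For readers unfamiliar with sklearn.utils.Bunch, the sketch below shows the general pattern of a function that returns its results in such a container. The function `run_clustering_sketch`, its arguments, and the returned field names are hypothetical and are not taken from the fgclustering API.

```python
from sklearn.utils import Bunch


def run_clustering_sketch(distance_matrix, k):
    """Hypothetical function that bundles its results in a Bunch."""
    n_samples = len(distance_matrix)
    # Placeholder assignment for illustration only, not a real clustering.
    labels = [i % k for i in range(n_samples)]
    return Bunch(k=k, cluster_labels=labels)


toy_distances = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.8], [0.9, 0.8, 0.0]]
result = run_clustering_sketch(toy_distances, k=2)

# A Bunch supports both attribute access and dictionary-style access.
print(result.k, result["cluster_labels"])
```

Returning a Bunch keeps results self-describing and lets callers pick fields by name, which is the same convention scikit-learn uses for its dataset loaders.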
New: CLARA Clustering Algorithm
- Integrated ClusteringClara for handling large datasets.
- Supports bootstrapped inputs with missing sample indices.
- Verified inertia and label assignment logic.
- Includes unit tests for stability and correctness.
Documentation Updates
- Updated README, example notebooks, docstrings, and ReadTheDocs.
- Standardized function and class documentation for clarity and completeness.
Published by lisa-sousa 8 months ago
fgclustering - v1.2.0 - New Importance Calculation
This is a major release where we:
- replace p-values as the measure of feature importance with the Wasserstein and Jensen-Shannon distances
- enable custom colors for heatmap plotting and saving of interactive heatmaps as HTML files
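To give a feel for distance-based importance, the sketch below compares a feature's values inside one cluster with its values in the background. It is a minimal pure-Python illustration, not the package's implementation: pairing the Wasserstein distance with a numeric feature and the Jensen-Shannon distance with a categorical one is an assumption here, and the function names and toy data are hypothetical. SciPy offers ready-made equivalents in scipy.stats.wasserstein_distance and scipy.spatial.distance.jensenshannon.

```python
import math
from bisect import bisect_right
from collections import Counter


def wasserstein_1d(a, b):
    """1-D Wasserstein distance: area between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def jensen_shannon_distance(a, b):
    """Square root of the base-2 Jensen-Shannon divergence, in [0, 1]."""
    categories = sorted(set(a) | set(b))
    count_a, count_b = Counter(a), Counter(b)
    p = [count_a[c] / len(a) for c in categories]
    q = [count_b[c] / len(b) for c in categories]
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, v):
        # Kullback-Leibler divergence; terms with u_i = 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(u, v) if x > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


# Toy numeric feature: cluster values vs. background values.
numeric_distance = wasserstein_1d([0, 1, 3], [5, 6, 8])
# Toy categorical feature with completely disjoint categories.
categorical_distance = jensen_shannon_distance(["x", "x"], ["y", "y"])
print(numeric_distance, categorical_distance)
```

A larger distance means the feature is distributed more differently in the cluster than in the background, which is the intuition behind using these distances as importance scores.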
Published by lisa-sousa 9 months ago
fgclustering - v1.1.1 - decision path plotting heatmap
This is a minor release where we:
- refactored the code for the decision path heatmap plot
- added the possibility to plot interactive heatmaps using plotly
Published by lisa-sousa over 1 year ago
fgclustering - v1.1.0 - importance computation and plotting
This is a major release where we changed:
- Importance computation: changed the importance value from 1 - p-value to a normalized negative log transformation. The p-values are clipped at 1e-50 to avoid log10(0), and the log-transformed value is normalized by log10(1e-50): log10(p-value) / log10(1e-50), which equals -log10(p-value) / 50. The new importance values still range between 0 and 1, but differences among small p-values (close to 0) are stretched while differences among large p-values (close to 1) are compressed.
- Plotting: merged the two feature importance plotting functions (plot_global_feature_importance and plot_local_feature_importance) into one function, plot_feature_importance, which plots global and local feature importance in one grid plot
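The importance transformation described above can be sketched as a small function. This is an illustration, not the package's code: the names `P_FLOOR` and `importance_from_p_value` are hypothetical, and the sign is chosen so that the result lies in the stated range of 0 to 1 (note that log10(1e-50) is -50).

```python
import math

P_FLOOR = 1e-50  # clipping threshold stated in the release notes


def importance_from_p_value(p_value):
    """Map a p-value in [0, 1] to an importance in [0, 1]."""
    clipped = max(p_value, P_FLOOR)  # avoids log10(0)
    # Both logs are <= 0, so the ratio is >= 0; it reaches 1 at the floor.
    return math.log10(clipped) / math.log10(P_FLOOR)


for p in (1.0, 0.05, 1e-10, 1e-25, 0.0):
    print(p, round(importance_from_p_value(p), 4))
```

Compared with 1 - p-value, where 1e-10 and 1e-25 both map to almost exactly 1, this mapping separates them clearly, while p-values near 1 all end up close to 0.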
Published by lisa-sousa over 1 year ago
fgclustering - v1.0.4 - additional functionalities and bug fix
Major changes in this release:
- fixed a miniconda installation error in GitHub Actions
- fixed a pandas groupby error that occurred with newer pandas versions
- enabled selecting only the top n features in plotting functions
- added multiple testing correction to the p-value calculation across clusters
- fixed the chi-square test calculation for multiple categories over multiple clusters
Published by lisa-sousa over 1 year ago
fgclustering - v1.0.3 - bug fix
This is a minor release with a small bug fix: fix pytest.
Published by lisa-sousa almost 3 years ago
fgclustering - v1.0.2 - bug fix
This is a minor release with a small bug fix: adding a missing import statement.
Published by lisa-sousa almost 3 years ago
fgclustering - v1.0.1 - speed-up
This is a major release where we changed:
- added new functions that speed up the code
- switched to a new package for the k-medoids calculation, which enables faster computation
- parallelization of cluster optimization step
- introduction of new parameters such as n_jobs for number of parallel jobs to execute or verbose for printing the output
- update of the documentation
- updated and extended tutorial
- minor bug fixes: fgc now runs on categorical target inputs; metric = precomputed is used for the k-medoids calculation
- added extra function calculate_statistics which enables visualisations with features not seen by the random forest or fgc
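As a rough picture of how a parallel cluster optimization step with n_jobs and verbose parameters can be organized, the sketch below evaluates candidate cluster numbers concurrently. Everything here is an assumption for illustration: the placeholder score, the function names, and the use of a thread pool from the standard library do not reflect the parallel backend the package actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def score_for_k(k):
    """Hypothetical placeholder score (lower is better); a real score
    would cluster the data into k groups and evaluate the result."""
    return (k - 4) ** 2


def optimize_k(candidates, n_jobs=2, verbose=0):
    """Score every candidate k in parallel and return the best one."""
    candidates = list(candidates)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        scores = list(pool.map(score_for_k, candidates))
    if verbose:
        for k, score in zip(candidates, scores):
            print(f"k={k}: score={score}")
    # min over (score, k) pairs; ties are resolved in favor of the smaller k.
    return min(zip(scores, candidates))[1]


best_k = optimize_k(range(2, 8), n_jobs=2, verbose=1)
print("best k:", best_k)
```

Because each candidate k is scored independently, the step parallelizes without coordination between workers.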
Published by hpelin over 3 years ago
fgclustering - v0.3 - New Plotting Functionalities
This is a minor release, where we changed:
- change plotboxplots() into plotdistribution() and display categorical features as barplots
- merge plotheatmap() and plotboxplots() into plotdecisionpaths(), as both plots are used to derive decision rules from Random Forests
- minor bug fix: when computing the global feature importance, copy X so that the original X is not modified
Published by lisa-sousa over 3 years ago
fgclustering - v0.2.0 - first release of FGC package
The first version of the package was developed by @lisa-sousa and @DoTha.
Full Changelog: https://github.com/HelmholtzAI-Consultants-Munich/fg-clustering/commits/v0.2.0
Published by lisa-sousa almost 4 years ago