HiPart

HiPart: Hierarchical Divisive Clustering Toolbox - Published in JOSS (2023)

https://github.com/panagiotisanagnostou/hipart

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

bisecting-k-means bisecting-kmeans bisecting-kmeans-clustering cluster clustering contributions-welcome data-analysis data-mining data-science data-visualization divisive-clustering hierarchical-clustering machine-learning package pddp pyhton python python-package python3 visualization

Scientific Fields

Economics Social Sciences - 85% confidence
Last synced: 6 months ago · JSON representation

Repository

Execution, visualization, and interactive visualization of hierarchical divisive clustering algorithms.

Basic Info
Statistics
  • Stars: 52
  • Watchers: 8
  • Forks: 8
  • Open Issues: 1
  • Releases: 19
Created over 4 years ago · Last pushed 8 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md

HiPart: Hierarchical divisive clustering toolbox

This repository presents the HiPart package, an open-source native Python library that provides efficient and interpretable implementations of divisive hierarchical clustering algorithms. HiPart supports interactive visualizations that let the user manipulate the execution steps and intervene directly in the clustering outcome. The package is well suited to Big Data applications, because the implemented clustering methods were designed with computational efficiency in mind. Its dependencies are either Python built-in packages or well-maintained, stable external packages. The software is provided under the MIT license.

Installation

Installing the package requires only Python 3.8 or later and the following command.

```bash
pip install HiPart
```
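To confirm that both installation requirements are met, a short check like the one below can help. It is a minimal sketch that uses only the standard library; `environment_ready` is a hypothetical helper written for this illustration and is not part of HiPart.

```python
import sys
from importlib import util

MIN_PYTHON = (3, 8)  # minimum version stated in the installation note


def environment_ready(package="HiPart"):
    """Return True if Python is new enough and the package is importable."""
    python_ok = sys.version_info >= MIN_PYTHON
    # find_spec returns None when a top-level package cannot be located
    package_ok = util.find_spec(package) is not None
    return python_ok and package_ok


print("Python version OK:", sys.version_info >= MIN_PYTHON)
print("HiPart importable:", util.find_spec("HiPart") is not None)
```

The check only looks the package up; it does not import it, so it runs quickly even in an environment where the install failed.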

Simple Example Execution

The example below is the simplest form of the package's execution. In short, it creates a synthetic clustering dataset containing 6 clusters, clusters it with the DePDDP algorithm, and returns only the cluster labels.

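```python
from HiPart.clustering import DePDDP
from sklearn.datasets import make_blobs

# Synthetic dataset: 1500 points drawn around 6 centers
X, y = make_blobs(n_samples=1500, centers=6, random_state=0)

# Cluster with DePDDP and keep only the predicted cluster labels
clustered_class = DePDDP(max_clusters_number=6).fit_predict(X)
```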
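Because the example returns only cluster labels, a natural next step is to compare them with the ground-truth labels `y` that the generator also returns. The sketch below computes cluster purity with the standard library. `cluster_purity` is a hypothetical helper, not a HiPart function, and the toy label lists merely stand in for `y` and the labels returned by `fit_predict`.

```python
from collections import Counter, defaultdict


def cluster_purity(true_labels, predicted_labels):
    """Fraction of points that carry the majority true label of their cluster."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("at least one label is required")
    # For every predicted cluster, count how often each true label occurs
    members = defaultdict(Counter)
    for truth, cluster in zip(true_labels, predicted_labels):
        members[cluster][truth] += 1
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(true_labels)


# Toy labels: cluster ids need not match the true label ids
truth = [0, 0, 0, 1, 1, 1]
predicted = [2, 2, 2, 5, 5, 2]
print(cluster_purity(truth, predicted))
```

Purity ignores how cluster ids are numbered, which matters here because a clustering algorithm has no reason to reuse the generator's label values. It does reward over-splitting, so it is best read alongside the number of clusters found.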

The HiPart package offers a comprehensive suite of examples to guide users in utilizing its various algorithms. These examples are conveniently located in the repository's examples directory.

For a general understanding of the package's capabilities, users can refer to the clustering_example file. This file serves as a foundational guide, providing complete examples of the package's algorithms in action.

Additionally, for those interested in incorporating KernelPCA methods, the clustering_with_kpca_example file is an invaluable resource. It offers a detailed example of how to apply KernelPCA within the context of the HiPart package.

Recognizing the importance of clustering via similarity or dissimilarity matrices, such as distance matrices, the HiPart package includes the clustering_with_distance_matrix_example file. This specific example demonstrates the use of the DePDDP algorithm with a distance matrix, offering a practical application scenario.
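For readers unfamiliar with the input such an example expects, the sketch below builds a symmetric Euclidean distance matrix from a handful of points using only the standard library. `pairwise_distances` is a hypothetical helper for illustration. How the matrix is then handed to DePDDP is shown in the repository's example file and is not reproduced here.

```python
import math


def pairwise_distances(points):
    """Return the symmetric Euclidean distance matrix as a list of lists."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # available since Python 3.8
            matrix[i][j] = d
            matrix[j][i] = d  # distances are symmetric
    return matrix


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
distances = pairwise_distances(points)
for row in distances:
    print(row)
```

For real datasets, a vectorized routine such as SciPy's `pdist` combined with `squareform` is usually a better fit than this quadratic pure-Python loop.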

Lastly, the package features an interactive visualization component, which is exemplified in the interactive_visualization_example file. This example not only showcases the execution of the interactive visualization but also provides comprehensive instructions for navigating the visualization GUI.

These resources collectively ensure that users of the HiPart package have a well-rounded and practical understanding of its functionalities and applications.

Documentation

The full documentation of the package is linked from the repository README (https://github.com/panagiotisanagnostou/hipart).

Citation

```bibtex
@article{Anagnostou2023HiPart,
  title     = {HiPart: Hierarchical Divisive Clustering Toolbox},
  author    = {Panagiotis Anagnostou and Sotiris Tasoulis and Vassilis P. Plagianakos and Dimitris Tasoulis},
  year      = {2023},
  journal   = {Journal of Open Source Software},
  publisher = {The Open Journal},
  volume    = {8},
  number    = {84},
  pages     = {5024},
  doi       = {10.21105/joss.05024},
  url       = {https://doi.org/10.21105/joss.05024}
}
```

Acknowledgments

This project has received funding from the Hellenic Foundation for Research and Innovation (HFRI), under grant agreement No 1901.

Collaborators

  • Dimitris Tasoulis
  • Panagiotis Anagnostou
  • Sotiris Tasoulis
  • Vassilis Plagianakos

Owner

  • Name: Panagiotis Anagnostou
  • Login: panagiotisanagnostou
  • Kind: user

JOSS Publication

HiPart: Hierarchical Divisive Clustering Toolbox
Published
April 18, 2023
Volume 8, Issue 84, Page 5024
Authors
Panagiotis Anagnostou ORCID
Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece
Sotiris Tasoulis ORCID
Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece
Vassilis P. Plagianakos ORCID
Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece
Dimitris Tasoulis
Signal Ocean SMPC, Greece
Editor
Mehmet Hakan Satman ORCID
Tags
Clustering High dimensionality Machine Learning

GitHub Events

Total
  • Release event: 2
  • Watch event: 10
  • Push event: 9
  • Pull request event: 3
  • Fork event: 1
  • Create event: 2
Last Year
  • Release event: 2
  • Watch event: 10
  • Push event: 9
  • Pull request event: 3
  • Fork event: 1
  • Create event: 2

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 154
  • Total Committers: 6
  • Avg Commits per committer: 25.667
  • Development Distribution Score (DDS): 0.078
Past Year
  • Commits: 12
  • Committers: 1
  • Avg Commits per committer: 12.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Panagiotis Anagnostou p****o@u****r 142
panagiotis40 p****0 4
Steve Stavropoulos s****e@m****r 3
Julien Jerphanion g****t@j****z 3
nicospavlidis n****s@g****m 1
Steve Stavropoulos s****e@n****r 1
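The summary figures above can be reproduced from the commit counts in the table. The sketch below assumes that the Development Distribution Score is one minus the top committer's share of all commits. That definition is not stated on this page, but it matches both listed values. `development_distribution_score` is a hypothetical helper name.

```python
def development_distribution_score(commit_counts):
    """One minus the share of commits made by the most active committer."""
    total = sum(commit_counts)
    if total == 0:
        raise ValueError("at least one commit is required")
    return 1 - max(commit_counts) / total


all_time = [142, 4, 3, 3, 1, 1]  # commit counts from the Top Committers table
past_year = [12]                 # one committer with 12 commits

print(round(development_distribution_score(all_time), 3))  # 0.078
print(round(sum(all_time) / len(all_time), 3))             # 25.667
print(development_distribution_score(past_year))           # 0.0
```

Under this definition, a score near zero means that one person wrote almost all commits, which is consistent with 142 of the 154 commits coming from a single committer.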

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 3
  • Total pull requests: 32
  • Average time to close issues: about 14 hours
  • Average time to close pull requests: about 19 hours
  • Total issue authors: 3
  • Total pull request authors: 5
  • Average comments per issue: 0.33
  • Average comments per pull request: 0.59
  • Merged pull requests: 30
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 32 minutes
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.5
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • etsakanika (1)
  • JohnNellas (1)
  • panagiotisanagnostou (1)
  • Petros-Barmpas (1)
Pull Request Authors
  • panagiotisanagnostou (29)
  • stevestavropoulos (4)
  • jjerphan (2)
  • jbytecode (1)
  • nicospavlidis (1)
Top Labels
Issue Labels
  • enhancement (1)
  • good first issue (1)
  • bug (1)
  • documentation (1)
Pull Request Labels
enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 149 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 18
  • Total maintainers: 1
pypi.org: hipart

A hierarchical divisive clustering toolbox

  • Versions: 18
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 149 Last month
Rankings
Dependent packages count: 7.3%
Stargazers count: 10.4%
Forks count: 14.3%
Average: 17.2%
Dependent repos count: 22.1%
Downloads: 32.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/python-app.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • codecov/codecov-action v2 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
.github/workflows/branch_checker.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
docs/requirements.txt pypi
  • HiPart *
  • sphinx_rtd_theme *
pyproject.toml pypi
setup.py pypi
  • dash >=2.0
  • kdepy *
  • matplotlib *
  • numpy *
  • plotly *
  • scikit-learn *
  • scipy <=1.15.3
  • statsmodels >=0.13
  • treelib >=1.6