Persistable

Persistable: persistent and stable clustering - Published in JOSS (2023)

https://github.com/luisscoccola/persistable

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

cluster-analysis clustering clustering-algorithm machine-learning machine-learning-algorithms topological-data-analysis unsupervised-learning

Scientific Fields

Mathematics Computer Science - 84% confidence
Last synced: 4 months ago

Repository

density-based clustering for exploratory data analysis based on multi-parameter persistence

Basic Info
  • Host: GitHub
  • Owner: LuisScoccola
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 11.3 MB
Statistics
  • Stars: 41
  • Watchers: 6
  • Forks: 2
  • Open Issues: 1
  • Releases: 15
Created over 4 years ago · Last pushed 5 months ago
Metadata Files
Readme License Code of conduct

README.md


Persistent and stable clustering (Persistable) is a density-based clustering algorithm intended for exploratory data analysis. What distinguishes Persistable from other clustering algorithms is its visualization capabilities. Persistable's interactive mode lets you visualize multi-scale and multi-density cluster structure present in the data. This is used to guide the choice of parameters that lead to the final clustering.

Usage

Here is a brief outline of the main functionality; see the documentation for details, including the API reference.

In order to run Persistable's interactive mode from a Jupyter notebook, run the following in a Jupyter cell:

```python
import persistable
from sklearn.datasets import make_blobs

# Generate 2000 sample points around 4 centers; [0] keeps only the coordinates
X = make_blobs(2000, centers=4, random_state=1)[0]

p = persistable.Persistable(X)
pi = persistable.PersistableInteractive(p)
pi.start_ui()
```

The last command returns the localhost port serving the UI, which is 8050 by default. Open localhost:8050 in your web browser to access the graphical user interface.
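
If the page does not load, it can help to confirm that something is actually listening on the reported port. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical and not part of Persistable, and 8050 is simply the default port mentioned above.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP server accepts connections on host:port.

    Hypothetical helper for illustration; not part of Persistable.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

For example, `port_is_open(8050)` is expected to return `True` once the UI server is running with its default port.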

After choosing your parameters using the user interface, you can get your clustering in another Jupyter cell by running:

```python
clustering_labels = pi.cluster()
```
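
Once you have the labels, a quick sanity check is to count how many points landed in each cluster. The sketch below uses a small hand-written list as a stand-in for `clustering_labels`, and it assumes the common convention that the label -1 marks noise points; check the API reference to confirm how Persistable encodes noise. The helper name `cluster_sizes` is hypothetical.

```python
from collections import Counter


def cluster_sizes(labels, noise_label=-1):
    """Count points per cluster and, separately, the number of noise points.

    Assumes `noise_label` marks points assigned to no cluster (an assumption,
    not confirmed by this README).
    """
    counts = Counter(int(label) for label in labels)
    n_noise = counts.pop(noise_label, 0)
    return dict(sorted(counts.items())), n_noise


# Stand-in for the labels returned by pi.cluster()
example_labels = [0, 0, 1, -1, 2, 1, 0, -1]
sizes, n_noise = cluster_sizes(example_labels)
print(sizes)    # {0: 3, 1: 2, 2: 1}
print(n_noise)  # 2
```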

Note: You may use pi.start_ui(jupyter_mode="inline") to have the graphical user interface display directly in the Jupyter notebook!

Installing

Make sure you are using Python 3. Persistable depends on the following Python packages, which are installed automatically when you install with pip: numpy, scipy, scikit-learn, cython, plotly, dash, diskcache, multiprocess, psutil. To install from PyPI, run:

```bash
pip install persistable-clustering
```
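
To confirm that the package and its dependencies were installed, you can check that each one is importable. This is a minimal sketch using only the standard library; the helper `missing_modules` is hypothetical, and the module names are the usual import names for the packages listed above (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the dependencies listed above (scikit-learn imports as sklearn)
DEPENDENCY_MODULES = [
    "numpy", "scipy", "sklearn", "cython", "plotly",
    "dash", "diskcache", "multiprocess", "psutil",
]


def missing_modules(module_names):
    """Return the names in `module_names` that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# An empty list means every module, including persistable itself, was found
print(missing_modules(DEPENDENCY_MODULES + ["persistable"]))
```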

Documentation and support

You can find the documentation at persistable.readthedocs.io. If you have further questions, please open an issue and we will do our best to help you. Please include as much information as possible, including your system's information, warnings, logs, screenshots, and anything else you think may be of use. If you do not wish to open an issue, you are also welcome to contact Luis Scoccola directly. Please be patient if it takes us a bit to get back to you.

Running the tests

You can run the tests with the following commands from the root directory of a clone of this repository. If a test fails, please report a bug and include as much information as possible, such as your system's information, warnings, logs, screenshots, and anything else you think may be of use.

```bash
pip install pytest playwright pytest-playwright
python -m playwright install --with-deps
pip install -r requirements.txt
python -m setup build_ext --inplace
pytest .
```

Details about theory and implementation

Persistable is based on multi-parameter persistence [4], a method from topological data analysis. The theory behind Persistable is developed in [1], while this implementation uses the high performance algorithms for density-based clustering developed in [2] and implemented in [3]. Persistable's interactive mode is inspired by RIVET [5] and is implemented in Dash.

Contributing

To contribute, you can fork the project, make your changes, and submit a pull request. You may want to contact Luis Scoccola first, to make sure your work does not overlap with ongoing work.

Authors

Luis Scoccola and Alexander Rolle.

Citing

If you use this package in your work, you may cite the corresponding paper using the following bibtex entry:

@article{Scoccola2023,
  doi = {10.21105/joss.05022},
  url = {https://doi.org/10.21105/joss.05022},
  year = {2023},
  publisher = {The Open Journal},
  volume = {8},
  number = {83},
  pages = {5022},
  author = {Luis Scoccola and Alexander Rolle},
  title = {Persistable: persistent and stable clustering},
  journal = {Journal of Open Source Software}
}

References

[1] Stable and consistent density-based clustering via multiparameter persistence. A. Rolle and L. Scoccola. Journal of Machine Learning Research, 25(258):1-74, 2024

[2] Accelerated Hierarchical Density Based Clustering. L. McInnes, J. Healy. 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, pp 33-42. 2017

[3] hdbscan: Hierarchical density based clustering. L. McInnes, J. Healy, S. Astels. Journal of Open Source Software, The Open Journal, volume 2, number 11. 2017

[4] An Introduction to Multiparameter Persistence. M. B. Botnan, M. Lesnick. Proceedings of the 2020 International Conference on Representations of Algebras. 2022

[5] RIVET. The RIVET Developers.

License

This software is published under the 3-clause BSD license.

Owner

  • Login: LuisScoccola
  • Kind: user

JOSS Publication

Persistable: persistent and stable clustering
Published
March 08, 2023
Volume 8, Issue 83, Page 5022
Authors
Luis Scoccola
Northeastern University
Alexander Rolle
Technical University of Munich
Editor
Rachel Kurchin
Tags
clustering unsupervised learning machine learning topological data analysis

GitHub Events

Total
  • Watch event: 7
  • Push event: 7
Last Year
  • Watch event: 7
  • Push event: 7

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 228
  • Total Committers: 3
  • Avg Commits per committer: 76.0
  • Development Distribution Score (DDS): 0.048
Past Year
  • Commits: 14
  • Committers: 1
  • Avg Commits per committer: 14.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Luis Scoccola l****a@g****m 217
alexanderrolle a****e@g****m 10
Manuel Ferreria m****F 1
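
The averages and Development Distribution Scores (DDS) above can be reproduced from the per-committer commit counts. The sketch below assumes the usual definition of DDS, one minus the share of commits made by the most active committer; that definition is not stated on this page, so treat it as an assumption. The function name is hypothetical.

```python
def development_distribution_score(commit_counts):
    """Assumed definition: 1 - (commits by top committer / total commits)."""
    total = sum(commit_counts)
    if total == 0:
        raise ValueError("need at least one commit")
    return 1 - max(commit_counts) / total


all_time = [217, 10, 1]  # commits per committer, from the list above
print(sum(all_time) / len(all_time))                       # 76.0
print(round(development_distribution_score(all_time), 3))  # 0.048
print(development_distribution_score([14]))                # 0.0
```

Under this assumed definition, a single dominant committer pushes the score toward 0, which matches both the all-time value (0.048) and the past-year value (0.0, one committer).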

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 8
  • Total pull requests: 42
  • Average time to close issues: 4 days
  • Average time to close pull requests: 2 minutes
  • Total issue authors: 4
  • Total pull request authors: 2
  • Average comments per issue: 1.75
  • Average comments per pull request: 0.14
  • Merged pull requests: 40
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 10 minutes
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • AP6YC (3)
  • lmcinnes (3)
  • peekxc (1)
  • jonesparg (1)
Pull Request Authors
  • LuisScoccola (42)
  • manuelF (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 106 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 17
  • Total maintainers: 2
pypi.org: persistable-clustering

Density-based clustering for exploratory data analysis based on multi-parameter persistence

  • Versions: 17
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 106 Last month
Rankings
Dependent packages count: 6.6%
Downloads: 8.2%
Stargazers count: 11.1%
Average: 16.0%
Forks count: 23.2%
Dependent repos count: 30.6%
Maintainers (2)
Last synced: 4 months ago

Dependencies

.github/workflows/run_tests.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
requirements.txt pypi
  • cython >=0.27
  • ipympl *
  • matplotlib *
  • numpy >=1.20
  • scikit-learn >=0.20
  • scipy >=1.0
.github/workflows/publish.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
pyproject.toml pypi
setup.py pypi
docs/requirements.txt pypi
  • sphinx-rtd-theme *