mst_clustering

mst_clustering: Clustering via Euclidean Minimum Spanning Trees - Published in JOSS (2016)

https://github.com/jakevdp/mst_clustering

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.8%) to scientific vocabulary

Scientific Fields

Engineering Computer Science - 40% confidence
Last synced: 6 months ago

Repository

Scikit-learn style estimator for Minimum Spanning Tree Clustering in Python

Basic Info
  • Host: GitHub
  • Owner: jakevdp
  • License: bsd-2-clause
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 439 KB
Statistics
  • Stars: 85
  • Watchers: 6
  • Forks: 19
  • Open Issues: 1
  • Releases: 2
Created over 10 years ago · Last pushed almost 10 years ago
Metadata Files
Readme License

README.md

Minimum Spanning Tree Clustering


This package implements a simple scikit-learn style estimator for clustering with a minimum spanning tree.

Motivation

Automated clustering can be an important means of identifying structure in data, but many of the more popular clustering algorithms do not perform well in the presence of background noise. The clustering algorithm implemented here, based on a trimmed Euclidean Minimum Spanning Tree, can be useful in this case.

Example

The API of the mst_clustering code is designed for compatibility with the scikit-learn project.

```python
from mst_clustering import MSTClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# create some data with four clusters
X, y = make_blobs(n_samples=200, centers=4, random_state=42)

# predict the labels with the MST algorithm
model = MSTClustering(cutoff_scale=2)
labels = model.fit_predict(X)

# plot the results
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='rainbow')
plt.show()
```

Simple Clustering Plot

For a detailed explanation of the algorithm and a more interesting example of it in action, see the MST Clustering Notebook.
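The idea of clustering with a trimmed Euclidean minimum spanning tree can be illustrated with SciPy alone. The following is a minimal sketch under stated assumptions, not the package's implementation: the helper name `trimmed_mst_labels` and its `cutoff` argument are hypothetical, and the sketch uses a dense distance matrix, which is only practical for small inputs.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def trimmed_mst_labels(X, cutoff):
    """Hypothetical helper: cluster points by trimming long MST edges."""
    # Dense matrix of pairwise Euclidean distances (small inputs only).
    # Note: SciPy treats zero entries as "no edge", so exact duplicate
    # points are not handled by this sketch.
    distances = squareform(pdist(X))
    # Minimum spanning tree as a sparse matrix with one entry per edge.
    mst = minimum_spanning_tree(distances).tocsr()
    # Trim the tree: drop every edge longer than the cutoff.
    mst.data[mst.data > cutoff] = 0
    mst.eliminate_zeros()
    # Each remaining connected piece of the tree is treated as one cluster.
    n_clusters, labels = connected_components(mst, directed=False)
    return n_clusters, labels


# Two tight groups of points separated by a wide gap.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
n_clusters, labels = trimmed_mst_labels(X, cutoff=2.0)
print(n_clusters, labels)
```

Here `cutoff` plays a role similar in spirit to the `cutoff_scale` parameter used in the example above, though the package's actual behavior and options may differ; the MST Clustering Notebook remains the authoritative description.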

Installation & Requirements

The mst_clustering package itself is fairly lightweight. It is tested on Python 2.7 and 3.4-3.5, and depends on the following packages:

  • numpy
  • scipy
  • scikit-learn

Using the cross-platform conda package manager, these requirements can be installed as follows:

$ conda install numpy scipy scikit-learn

Finally, the current release of mst_clustering can be installed using pip:

$ conda install pip  # if using conda
$ pip install mst_clustering

To install mst_clustering from source, first download the source repository and then run:

$ python setup.py install

Contributing & Reporting Issues

Bug reports, questions, suggestions, and contributions are welcome. For these, please make use of the Issues or Pull Requests associated with this repository.

Citing

If you use this code in an academic publication, please consider citing this JOSS Paper.

Owner

  • Name: Jake Vanderplas
  • Login: jakevdp
  • Kind: user
  • Location: Oakland CA
  • Company: Google

Python ML & Data Science

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 36
  • Total Committers: 1
  • Avg Commits per committer: 36.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Jake VanderPlas j****p@g****m 36

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ktsakos (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 901 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 5
  • Total versions: 3
  • Total maintainers: 1
pypi.org: mst_clustering

Clustering with Minimum Spanning Trees

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 5
  • Downloads: 901 Last month
Rankings
Dependent repos count: 6.7%
Stargazers count: 7.6%
Average: 8.4%
Forks count: 8.6%
Downloads: 9.2%
Dependent packages count: 10.0%
Maintainers (1)
Last synced: 6 months ago