mosaic-clustering

Correlation-based feature selection of Molecular Dynamics simulations

https://github.com/moldyn/mosaic

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, acs.org
  • Committers with academic emails
  • Institutional organization owner
    Organization moldyn has institutional domain (www.moldyn.uni-freiburg.de)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.2%) to scientific vocabulary


Repository

Basic Info
Statistics
  • Stars: 29
  • Watchers: 2
  • Forks: 2
  • Open Issues: 4
  • Releases: 2
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Molecular Systems Automated Identification of Cooperativity

MoSAIC is an unsupervised method for correlation analysis that automatically detects collective motion in MD simulation data while simultaneously identifying uncorrelated coordinates as noise. Hence, it can be used as a feature selection scheme for Markov state modeling or simply to obtain a detailed picture of the key coordinates driving a biomolecular process. It is based on the Leiden community detection algorithm, which is used to bring a correlation matrix into block-diagonal form.
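To make the idea of a block-diagonal correlation matrix concrete, here is a minimal NumPy sketch. It is not the MoSAIC implementation: the community detection step is replaced by a hand-written, hypothetical cluster assignment, and the helper names `abs_correlation` and `block_diagonal_order` are made up for illustration. It only shows how reordering features by cluster turns the absolute correlation matrix into blocks.

```python
import numpy as np


def abs_correlation(X):
    """Absolute Pearson correlation between the columns (features) of X."""
    return np.abs(np.corrcoef(X, rowvar=False))


def block_diagonal_order(clusters):
    """Concatenate the cluster index lists into a single feature permutation."""
    return np.concatenate([np.asarray(c, dtype=int) for c in clusters])


rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)
b = rng.normal(size=n)
# Toy data: features 0 and 2 follow signal a, features 1 and 3 follow
# signal b, and feature 4 is independent noise.
X = np.column_stack([
    a + 0.1 * rng.normal(size=n),
    b + 0.1 * rng.normal(size=n),
    a + 0.1 * rng.normal(size=n),
    b + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])

corr = abs_correlation(X)

# Hypothetical clustering result, written by hand for this illustration;
# in MoSAIC this assignment would come from the community detection.
clusters = [[0, 2], [1, 3], [4]]
order = block_diagonal_order(clusters)

# Permute rows and columns so that members of a cluster are adjacent.
reordered = corr[np.ix_(order, order)]
```

After the permutation, the strongly correlated pairs sit in 2x2 blocks along the diagonal, while the noise feature forms its own 1x1 block with small off-diagonal entries.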

The method was published in:

Correlation-Based Feature Selection to Identify Functional Dynamics in Proteins
G. Diez, D. Nagel, and G. Stock,
J. Chem. Theory Comput. 2022 18 (8), 5079-5088,
doi: 10.1021/acs.jctc.2c00337

If you use this software package, please cite the above-mentioned paper.

Features

  • Intuitive usage via module and via CLI
  • Sklearn-style API for fast integration into your Python workflow
  • No magic, only a single parameter which can be optimized via cross-validation
  • Extensive documentation and detailed discussion in publication
  • Step-by-step tutorial to follow

Installation

The package is called `mosaic-clustering` and is available via PyPI or conda. To install it, simply call:
```bash
python3 -m pip install --upgrade mosaic-clustering
```
or
```bash
conda install -c conda-forge mosaic-clustering
```

or for the latest dev version
```bash
# via ssh key
python3 -m pip install git+ssh://git@github.com/moldyn/MoSAIC.git

# or via password-based login
python3 -m pip install git+https://github.com/moldyn/MoSAIC.git
```

To use the deprecated `UMAPSimilarity` or the module `mosaic umap`, one needs to specify `extras_require='umap'`, that is:
```bash
python3 -m pip install --upgrade mosaic-clustering[umap]
```

Shell Completion

For the `bash`, `zsh`, or `fish` shell, click provides an easy way to enable shell completion; check out the docs. In the case of bash, you need to add the following line to your `~/.bashrc`:
```bash
eval "$(_MOSAIC_COMPLETE=bash_source mosaic)"
```

Usage

In general, one can call the module directly via its entry point `$ MoSAIC` or by calling the module with `$ python -m mosaic`. The latter is preferred because it ensures that the desired Python environment is used. To enable shell completion, the entry point must be used.
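When driving the tool from another Python script, the same preference for `python -m mosaic` can be kept by reusing the running interpreter. The following is a small sketch under that assumption; the helper name `mosaic_command` is hypothetical and not part of the package.

```python
import sys


def mosaic_command(subcommand, *args):
    """Build an argv list that runs `python -m mosaic` with the current interpreter."""
    # sys.executable is the interpreter running this script, so the call
    # resolves mosaic from the same environment.
    return [sys.executable, "-m", "mosaic", subcommand, *args]


# Ask the similarity submodule for its help text.
cmd = mosaic_command("similarity", "--help")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, provided the package is installed in that environment.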

CLI - Usage Directly from the Command Line

The module brings a rich CLI using click. Each module and submodule contains a detailed help, which can be accessed by
```bash
$ python -m mosaic
Usage: python -m mosaic [OPTIONS] COMMAND [ARGS]...

  MoSAIC motion v0.4.1

  Molecular systems automated identification of collective motion, is
  a correlation based feature selection framework for MD data.
  Copyright (c) 2021-2023, Georg Diez and Daniel Nagel

Options:
  --help  Show this message and exit.

Commands:
  clustering  Clustering similarity matrix of coordinates.
  similarity  Creating similarity matrix of coordinates.
  tui         Open Textual TUI for interactive usage.
```
For more details on a submodule, specify one of the two commands or open the terminal user interface (`tui`).

A simple workflow example for clustering the input file `input_file` using correlation and Leiden with CPM and the default resolution parameter:
```bash
# creating correlation matrix
$ python -m mosaic similarity -i input_file -o output_similarity --metric correlation -v

MoSAIC SIMILARITY
~~~ Initialize similarity class
~~~ Load file input_file
~~~ Fit input
~~~ Store similarity matrix in output_similarity

# clustering with CPM and default resolution parameter
# the latter needs to be fine-tuned to each matrix
$ python -m mosaic clustering -i output_similarity -o output_clustering --plot -v

MoSAIC CLUSTERING
~~~ Initialize clustering class
~~~ Load file output_similarity
~~~ Fit input
~~~ Store output
~~~ Plot matrix
```
This will generate the similarity matrix stored in `output_similarity`, the plotted result in `output_clustering.matrix.pdf`, the raw data of the matrix in `output_clustering.matrix`, and a file containing in each row the indices of a cluster.
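The cluster file can then be read back for further analysis. The sketch below assumes a plain-text layout that the README does not spell out: one cluster per row, indices separated by whitespace, and optional comment lines starting with `#`. The function name `read_clusters` and the example content are hypothetical.

```python
import io


def read_clusters(handle):
    """Parse one cluster per row from a text stream of whitespace-separated indices."""
    clusters = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and (assumed) comment lines.
        if not line or line.startswith("#"):
            continue
        clusters.append([int(token) for token in line.split()])
    return clusters


# Stand-in for a real cluster file, made up for this illustration.
example = io.StringIO("# hypothetical cluster file\n0 2 5\n1 3\n4\n")
clusters = read_clusters(example)
```

For a file on disk, the same function can be called with an open file handle, for example inside `with open(path) as handle:`.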

Module - Inside a Python Script

```python
import mosaic

# Load file
# X is np.ndarray of shape (n_samples, n_features)

sim = mosaic.Similarity(
    metric='correlation',  # or 'NMI', 'GY', 'JSD'
)
sim.fit(X)

# Cluster matrix
clust = mosaic.Clustering(
    mode='CPM',  # or 'modularity'
)
clust.fit(sim.matrix_)

clusters = clust.clusters_
clustered_X = clust.matrix_
...
```
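Once the clusters are available, they can be used for feature selection. The following sketch assumes that `clusters` is a sequence of index lists (one per cluster) and treats clusters below a size threshold as noise; both the helper `select_features` and the `min_size` cutoff are illustrative assumptions, not part of the MoSAIC API.

```python
import numpy as np


def select_features(X, clusters, min_size=2):
    """Keep the columns of X that belong to clusters with at least min_size members."""
    kept = [np.asarray(c, dtype=int) for c in clusters if len(c) >= min_size]
    if not kept:
        # Nothing passes the threshold: return an empty selection.
        return X[:, :0], np.array([], dtype=int)
    idx = np.concatenate(kept)
    return X[:, idx], idx


# Toy input of shape (n_samples, n_features) = (2, 6).
X = np.arange(12.0).reshape(2, 6)
# Hypothetical cluster assignment; feature 4 forms a single-member cluster.
clusters = [[0, 2, 5], [1, 3], [4]]

X_sel, idx = select_features(X, clusters)
```

Here the single-member cluster is dropped, and the remaining columns are ordered cluster by cluster.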

Owner

  • Name: Biomolecular Dynamics
  • Login: moldyn
  • Kind: organization
  • Location: Freiburg

Group of Prof. Dr. G. Stock, University of Freiburg

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - given-names: "Daniel"
    family-names: "Nagel"
    orcid: "https://orcid.org/0000-0002-2863-2646"
title: "MoSAIC: Molecular Systems Automated Identification of Cooperativity"
version: 0.2.1
url: "https://moldyn.github.io/MoSAIC"
preferred-citation:
  type: article
  title: "Correlation-Based Feature Selection to Identify Functional Dynamics in Proteins"
  authors:
  - given-names: "Georg"
    family-names: "Diez"
    orcid: "https://orcid.org/0000-0002-4114-1577"
  - given-names: "Daniel"
    family-names: "Nagel"
    orcid: "https://orcid.org/0000-0002-2863-2646"
  - given-names: "Gerhard"
    family-names: "Stock"
    orcid: "https://orcid.org/0000-0002-3302-3044"
  doi: "10.1021/acs.jctc.2c00337"
  journal: "J. Chem. Theory Comput."
  month: 8
  start: 5079
  end: 5088
  issue: 8
  volume: 18
  year: 2022

GitHub Events

Total
  • Watch event: 5
  • Delete event: 1
  • Push event: 2
  • Pull request event: 3
  • Create event: 1
Last Year
  • Watch event: 5
  • Delete event: 1
  • Push event: 2
  • Pull request event: 3
  • Create event: 1

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 392
  • Total Committers: 5
  • Avg Commits per committer: 78.4
  • Development Distribution Score (DDS): 0.253
Past Year
  • Commits: 19
  • Committers: 3
  • Avg Commits per committer: 6.333
  • Development Distribution Score (DDS): 0.526
Top Committers
Name Email Commits
braniii d****l@p****m 293
gegabo g****z@g****m 53
gd1022 y****u@e****m 42
dependabot[bot] 4****] 3
moldyn-nagel 4****l 1
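The all-time figures above are consistent with the usual definition of the Development Distribution Score, namely one minus the top committer's share of all commits. The page itself does not state the formula, so the sketch below assumes that definition; the function name is hypothetical.

```python
def development_distribution_score(commit_counts):
    """Return 1 - (commits of the top committer / total commits)."""
    total = sum(commit_counts)
    if total == 0:
        return 0.0
    return 1.0 - max(commit_counts) / total


# All-time commit counts from the Top Committers list.
all_time = [293, 53, 42, 3, 1]

dds = development_distribution_score(all_time)
avg = sum(all_time) / len(all_time)
```

With these counts, the total is 392 commits, the average is 78.4 commits per committer, and the score rounds to 0.253, matching the all-time values reported above.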

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 10
  • Total pull requests: 22
  • Average time to close issues: 2 months
  • Average time to close pull requests: 8 days
  • Total issue authors: 5
  • Total pull request authors: 3
  • Average comments per issue: 2.1
  • Average comments per pull request: 0.68
  • Merged pull requests: 21
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 2
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 hours
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 3.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 3
Top Authors
Issue Authors
  • braniii (5)
  • gegabo (2)
  • mmfarrugia (1)
  • austinweigle (1)
  • EleSpine (1)
Pull Request Authors
  • braniii (10)
  • gegabo (9)
  • dependabot[bot] (7)
Top Labels
Issue Labels
bug (1)
Pull Request Labels
dependencies (7) github_actions (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 20 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 12
  • Total maintainers: 1
pypi.org: mosaic-clustering

Correlation based feature selection for MD data

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 20 Last month
Rankings
Dependent packages count: 10.1%
Stargazers count: 13.7%
Forks count: 19.2%
Average: 19.4%
Dependent repos count: 21.6%
Downloads: 32.3%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: mosaic-clustering

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Average: 48.4%
Stargazers count: 50.9%
Dependent packages count: 51.2%
Forks count: 57.4%
Last synced: 6 months ago

Dependencies

extra-requirements.txt pypi
  • flake8 *
  • pdoc3 *
  • pytest *
  • pytest-cov *
  • pytest-rerunfailures *
  • umap-learn *
requirements.txt pypi
  • beartype >=0.8.1
  • click >=7.0.0
  • leidenalg >=0.8.0
  • numpy >=1.21.0
  • pandas *
  • prettypyplot *
  • scikit-learn *
  • scipy *
  • typing_extensions >=3.9.0
setup.py pypi
  • beartype >=0.10.4
  • click >=7.0.0
  • igraph *
  • leidenalg >=0.8.0
  • numpy >=1.21.0
  • pandas *
  • prettypyplot *
  • scikit-learn *
  • scikit-learn-extra *
  • scipy *
  • typing_extensions >=3.9.0
  • umap-learn *
.github/workflows/pages.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/pytest.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
  • codecov/codecov-action v2 composite
.github/workflows/codeql.yml actions
  • actions/checkout v3 composite
  • github/codeql-action/analyze v2 composite
  • github/codeql-action/autobuild v2 composite
  • github/codeql-action/init v2 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish release/v1 composite