mosaic-clustering
Correlation-based feature selection of Molecular Dynamics simulations
Science Score: 75.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to: arxiv.org, acs.org
- ○ Committers with academic emails
- ✓ Institutional organization owner: Organization moldyn has institutional domain (www.moldyn.uni-freiburg.de)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.2%) to scientific vocabulary
Repository
Correlation-based feature selection of Molecular Dynamics simulations
Basic Info
- Host: GitHub
- Owner: moldyn
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://moldyn.github.io/MoSAIC/
- Size: 43.1 MB
Statistics
- Stars: 29
- Watchers: 2
- Forks: 2
- Open Issues: 4
- Releases: 2
Metadata Files
README.md
Molecular Systems Automated Identification of Cooperativity
MoSAIC is an unsupervised method for correlation analysis that automatically detects the collective motion in MD simulation data while simultaneously identifying uncorrelated coordinates as noise. Hence, it can be used as a feature selection scheme for Markov state modeling, or simply to obtain a detailed picture of the key coordinates driving a biomolecular process. It is based on the Leiden community detection algorithm, which is used to bring a correlation matrix into block-diagonal form.
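To illustrate what "block-diagonal form" means, here is a small sketch that builds an absolute Pearson correlation matrix for a few made-up feature time series and then permutes its rows and columns so that correlated coordinates sit next to each other. For brevity it groups coordinates with a plain threshold plus connected components. This is a deliberately simple stand-in, not the Leiden algorithm that MoSAIC uses, and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def abs_corr_matrix(features):
    """|r| between every pair of feature time series (one list per feature)."""
    k = len(features)
    return [[abs(pearson(features[i], features[j])) for j in range(k)] for i in range(k)]


def threshold_groups(matrix, threshold):
    """Connected components of the graph with an edge wherever |r| > threshold.

    Stand-in for Leiden community detection; NOT the MoSAIC algorithm.
    """
    k = len(matrix)
    seen, groups = set(), []
    for start in range(k):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first search over strongly correlated neighbours
            i = stack.pop()
            group.append(i)
            for j in range(k):
                if j not in seen and matrix[i][j] > threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


def block_diagonal(matrix, groups):
    """Reorder rows and columns so that members of each group are adjacent."""
    order = [i for group in groups for i in group]
    return order, [[matrix[i][j] for j in order] for i in order]


# Made-up data: features 0 and 2 follow signal a, features 1 and 3 follow signal b.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, -1, -1, 1, 1, -1, -1, 1]
features = [a, b, [2 * x + 1 for x in a], [-x for x in b]]

corr = abs_corr_matrix(features)
groups = threshold_groups(corr, threshold=0.5)
order, permuted = block_diagonal(corr, groups)
```

With this toy data the two signals are uncorrelated, so the permuted matrix consists of two 2x2 blocks of ones on the diagonal and zeros elsewhere.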
The method was published in:
Correlation-Based Feature Selection to Identify Functional Dynamics in Proteins
G. Diez, D. Nagel, and G. Stock,
J. Chem. Theory Comput. 2022 18 (8), 5079-5088,
doi: 10.1021/acs.jctc.2c00337
If you use this software package, please cite the paper above.
Features
- Intuitive usage as a Python module and via a command-line interface (CLI)
- Sklearn-style API for fast integration into your Python workflow
- No magic: only a single parameter, which can be optimized via cross-validation
- Extensive documentation and a detailed discussion in the publication
- Step-by-step tutorial to follow
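The single parameter is the resolution parameter γ used with the Constant Potts Model (CPM) in Leiden clustering. As documented for leidenalg's `CPMVertexPartition`, the CPM quality of a partition is Q = Σ_c [e_c - γ·C(n_c, 2)], where e_c is the summed internal edge weight of cluster c and n_c its size. The sketch below only evaluates this objective for a few hand-picked partitions of a made-up similarity matrix; it does not run Leiden and does not reproduce MoSAIC's cross-validation. It shows that a cluster contributes positively exactly when its mean pairwise similarity exceeds γ, so a larger γ favours smaller, tighter clusters.

```python
from itertools import combinations


def cpm_quality(weights, clusters, gamma):
    """CPM quality: sum over clusters of (internal weight - gamma * number of pairs).

    weights  -- symmetric similarity matrix as a list of lists
    clusters -- partition of the node indices, e.g. [[0, 1], [2]]
    gamma    -- resolution parameter
    """
    quality = 0.0
    for cluster in clusters:
        internal = sum(weights[i][j] for i, j in combinations(cluster, 2))
        n_pairs = len(cluster) * (len(cluster) - 1) // 2
        quality += internal - gamma * n_pairs
    return quality


# Made-up similarity matrix: coordinates 0 and 1 are strongly correlated,
# coordinate 2 is only weakly related to both.
W = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
PARTITIONS = {
    "all together": [[0, 1, 2]],
    "pair + single": [[0, 1], [2]],
    "all single": [[0], [1], [2]],
}


def best_partition(gamma):
    """Name of the hand-picked partition with the highest CPM quality."""
    return max(PARTITIONS, key=lambda name: cpm_quality(W, PARTITIONS[name], gamma))
```

For example, `best_partition(0.1)` prefers merging everything, `best_partition(0.5)` keeps only the strongly correlated pair together, and `best_partition(0.9)` leaves every coordinate on its own.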
Installation
The package is called mosaic-clustering and is available via PyPI or conda. To install it, simply call:
```bash
python3 -m pip install --upgrade mosaic-clustering
```
or
```bash
conda install -c conda-forge mosaic-clustering
```
or, for the latest development version,
```bash
# via ssh key
python3 -m pip install git+ssh://git@github.com/moldyn/MoSAIC.git

# or via password-based login
python3 -m pip install git+https://github.com/moldyn/MoSAIC.git
```
To use the deprecated `UMAPSimilarity` or the `mosaic umap` module, install the package with the `umap` extra:
```bash
python3 -m pip install --upgrade "mosaic-clustering[umap]"
```
Shell Completion
For the bash, zsh, and fish shells, click provides an easy way to enable shell completion; see the click documentation for details.
For bash, add the following line to your ~/.bashrc:
```bash
eval "$(_MOSAIC_COMPLETE=bash_source mosaic)"
```
Usage
In general, the module can be called either directly via its entry point `$ MoSAIC` or via `$ python -m mosaic`. The latter is preferred because it ensures that the desired Python environment is used. Shell completion, however, only works with the entry point.
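Because `python -m mosaic` runs under whichever interpreter is invoked, a Python script can tie the CLI to its own environment by using `sys.executable`. A minimal sketch; `mosaic_command` and `run_mosaic` are hypothetical helper names, and only the `python -m mosaic` invocation from this README is assumed.

```python
import subprocess
import sys


def mosaic_command(*args):
    """Argument list that runs `python -m mosaic` with the current interpreter."""
    return [sys.executable, "-m", "mosaic", *args]


def run_mosaic(*args):
    """Run the MoSAIC CLI; check=True raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(mosaic_command(*args), check=True)
```

For example, `run_mosaic("similarity", "-i", "input_file", "-o", "output_similarity", "--metric", "correlation", "-v")` would issue the same call as the command-line workflow, using the interpreter of the running script.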
CLI - Usage Directly from the Command Line
The module provides a rich CLI built with click. Each command and subcommand comes with detailed help, which can be accessed via
```bash
$ python -m mosaic
Usage: python -m mosaic [OPTIONS] COMMAND [ARGS]...

  MoSAIC motion v0.4.1

  Molecular systems automated identification of collective motion, is a
  correlation based feature selection framework for MD data.
  Copyright (c) 2021-2023, Georg Diez and Daniel Nagel

Options:
  --help  Show this message and exit.

Commands:
  clustering  Clustering similarity matrix of coordinates.
  similarity  Creating similarity matrix of coordinates.
  tui         Open Textual TUI for interactive usage.
```
For details on the two main commands, call them with `--help` (e.g. `python -m mosaic similarity --help`), or open the terminal user interface via `python -m mosaic tui`.
A simple workflow example for clustering the input file `input_file` using
correlation and Leiden with CPM and the default resolution parameter:
```bash
# creating correlation matrix
$ python -m mosaic similarity -i input_file -o output_similarity --metric correlation -v

MoSAIC SIMILARITY
~~~ Initialize similarity class
~~~ Load file input_file
~~~ Fit input
~~~ Store similarity matrix in output_similarity

# clustering with CPM and default resolution parameter
# the latter needs to be fine-tuned to each matrix
$ python -m mosaic clustering -i output_similarity -o output_clustering --plot -v

MoSAIC CLUSTERING
~~~ Initialize clustering class
~~~ Load file output_similarity
~~~ Fit input
~~~ Store output
~~~ Plot matrix
```
This generates the similarity matrix stored in `output_similarity`,
the plotted result in `output_clustering.matrix.pdf`, the raw data of
the matrix in `output_clustering.matrix`, and a file in which each
row contains the indices of one cluster.
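The cluster file can then be post-processed in Python. A minimal sketch, assuming (not confirmed by this README) that each row holds whitespace-separated integer indices and that lines starting with `#` are comments; `read_clusters` is a hypothetical helper, not part of MoSAIC.

```python
def read_clusters(lines):
    """Parse an iterable of text lines into a list of clusters (lists of ints).

    Assumed format: one cluster per row, whitespace-separated indices,
    optional '#' comments; blank lines are skipped.
    """
    clusters = []
    for line in lines:
        content = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if content:
            clusters.append([int(token) for token in content.split()])
    return clusters
```

Since a file object iterates over its lines, `read_clusters(open(path))` works directly, where `path` is the cluster file written by the clustering step.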
Module - Inside a Python Script
```python
import numpy as np
import mosaic

# Load file: X is an np.ndarray of shape (n_samples, n_features).
# np.loadtxt on a whitespace-separated text file is only an example.
X = np.loadtxt('input_file')

# Compute the similarity matrix
sim = mosaic.Similarity(
    metric='correlation',  # or 'NMI', 'GY', 'JSD'
)
sim.fit(X)

# Cluster matrix
clust = mosaic.Clustering(
    mode='CPM',  # or 'modularity'
)
clust.fit(sim.matrix_)

clusters = clust.clusters_
clustered_X = clust.matrix_
```
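Since MoSAIC is meant as a feature selection scheme, a natural next step is to keep only the coordinates that fall into sufficiently large clusters. A minimal sketch, assuming that small clusters (here fewer than `min_size` members, a hypothetical threshold) correspond to coordinates one wants to discard as noise; `select_features` is not part of the MoSAIC API. Plain lists keep the sketch dependency-free; with NumPy the same selection is `X[:, keep]`.

```python
def select_features(X, clusters, min_size=2):
    """Keep the columns of X whose index lies in a cluster with >= min_size members.

    X        -- data as a list of rows (n_samples x n_features)
    clusters -- list of clusters, each a list of feature indices
    Returns the sorted kept indices and the reduced data.
    """
    keep = sorted(i for cluster in clusters if len(cluster) >= min_size for i in cluster)
    reduced = [[row[i] for i in keep] for row in X]
    return keep, reduced
```

With `min_size=1` every coordinate is kept; raising it drops singletons and other small groups.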
Owner
- Name: Biomolecular Dynamics
- Login: moldyn
- Kind: organization
- Location: Freiburg
- Website: http://www.moldyn.uni-freiburg.de/index.html
- Twitter: MolDynFR
- Repositories: 7
- Profile: https://github.com/moldyn
Group of Prof. Dr. G. Stock, University of Freiburg
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - given-names: "Daniel"
    family-names: "Nagel"
    orcid: "https://orcid.org/0000-0002-2863-2646"
title: "MoSAIC: Molecular Systems Automated Identification of Cooperativity"
version: 0.2.1
url: "https://moldyn.github.io/MoSAIC"
preferred-citation:
  type: article
  title: "Correlation-Based Feature Selection to Identify Functional Dynamics in Proteins"
  authors:
    - given-names: "Georg"
      family-names: "Diez"
      orcid: "https://orcid.org/0000-0002-4114-1577"
    - given-names: "Daniel"
      family-names: "Nagel"
      orcid: "https://orcid.org/0000-0002-2863-2646"
    - given-names: "Gerhard"
      family-names: "Stock"
      orcid: "https://orcid.org/0000-0002-3302-3044"
  doi: "10.1021/acs.jctc.2c00337"
  journal: "J. Chem. Theory Comput."
  month: 8
  start: 5079
  end: 5088
  issue: 8
  volume: 18
  year: 2022
GitHub Events
Total
- Watch event: 5
- Delete event: 1
- Push event: 2
- Pull request event: 3
- Create event: 1
Last Year
- Watch event: 5
- Delete event: 1
- Push event: 2
- Pull request event: 3
- Create event: 1
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| braniii | d****l@p****m | 293 |
| gegabo | g****z@g****m | 53 |
| gd1022 | y****u@e****m | 42 |
| dependabot[bot] | 4****] | 3 |
| moldyn-nagel | 4****l | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 10
- Total pull requests: 22
- Average time to close issues: 2 months
- Average time to close pull requests: 8 days
- Total issue authors: 5
- Total pull request authors: 3
- Average comments per issue: 2.1
- Average comments per pull request: 0.68
- Merged pull requests: 21
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 2
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 3.5
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 3
Top Authors
Issue Authors
- braniii (5)
- gegabo (2)
- mmfarrugia (1)
- austinweigle (1)
- EleSpine (1)
Pull Request Authors
- braniii (10)
- gegabo (9)
- dependabot[bot] (7)
Packages
- Total packages: 2
- Total downloads:
  - pypi 20 last-month
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 1 (may contain duplicates)
- Total versions: 12
- Total maintainers: 1
pypi.org: mosaic-clustering
Correlation based feature selection for MD data
- Homepage: https://github.com/moldyn/feature_selection
- Documentation: https://moldyn.github.io/MoSAIC
- License: BSD 3-Clause License
- Latest release: 0.4.1 (published almost 3 years ago)
Maintainers (1)
conda-forge.org: mosaic-clustering
- Homepage: https://github.com/moldyn/MoSAIC
- License: MIT
- Latest release: 0.3.2 (published over 3 years ago)
Dependencies
- flake8 *
- pdoc3 *
- pytest *
- pytest-cov *
- pytest-rerunfailures *
- umap-learn *
- beartype >=0.8.1
- click >=7.0.0
- leidenalg >=0.8.0
- numpy >=1.21.0
- pandas *
- prettypyplot *
- scikit-learn *
- scipy *
- typing_extensions >=3.9.0
- beartype >=0.10.4
- click >=7.0.0
- igraph *
- leidenalg >=0.8.0
- numpy >=1.21.0
- pandas *
- prettypyplot *
- scikit-learn *
- scikit-learn-extra *
- scipy *
- typing_extensions >=3.9.0
- umap-learn *
- actions/checkout v2 composite
- actions/setup-python v2 composite
- peaceiris/actions-gh-pages v3 composite
- actions/checkout v1 composite
- actions/setup-python v1 composite
- codecov/codecov-action v2 composite
- actions/checkout v3 composite
- github/codeql-action/analyze v2 composite
- github/codeql-action/autobuild v2 composite
- github/codeql-action/init v2 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish release/v1 composite