mica

Mutual Information-based Non-linear Clustering Analysis

https://github.com/jyyulab/mica

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: nature.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

clustering-analysis dimensionality-reduction graph-embedding mutual-information single-cell-rna-seq
Last synced: 6 months ago

Repository

Mutual Information-based Non-linear Clustering Analysis

Basic Info
  • Host: GitHub
  • Owner: jyyulab
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 497 MB
Statistics
  • Stars: 2
  • Watchers: 6
  • Forks: 3
  • Open Issues: 4
  • Releases: 4
Topics
clustering-analysis dimensionality-reduction graph-embedding mutual-information single-cell-rna-seq
Created about 7 years ago · Last pushed 8 months ago
Metadata Files
Readme License

README.md

MICA

MICA is a clustering tool for single-cell RNA-seq data. MICA takes a preprocessed gene expression matrix as input and efficiently clusters the cells. MICA consists of the following main components:

  1. Mutual information estimation for cell-cell distance quantification
  2. Dimension reduction on the non-linear mutual information-based distance space
  3. Consensus clustering on dimension-reduced spaces
  4. Clustering visualization and cell type annotation

MICA workflow:

Prerequisites

Installation

Using conda to create a virtual environment

(Not available until this line is removed)

The recommended method of setting up the required Python environment and dependencies is to use the conda dependency manager:

```bash
conda create -n mica100 python=3.9.2   # Create a python virtual environment
source activate mica100                # Activate the virtual environment
pip install MICA                       # Install MICA and its dependencies
```

Install from source

```bash
conda create -n mica100 python=3.9.2   # Create a python virtual environment
source activate mica100                # Activate the virtual environment
git clone https://github.com/jyyulab/MICA   # Clone the repo
cd MICA                                # Switch to the MICA root directory
pip install .                          # Install MICA from source
mica -h                                # Check if mica works correctly
```

Usage

The MICA workflow has two built-in dimension reduction methods. The auto mode (mica or mica auto) selects a dimension reduction method automatically based on the cell count of the preprocessed matrix. Users can select the graph embedding (mica ge), MDS (mica mds), or Louvain (mica louvain) method manually using the subcommand ge, mds, or louvain, respectively.

```bash
$ mica -h
usage: mica [-h] {auto,ge,mds} ...

MICA - Mutual Information-based Clustering Analysis tool.

optional arguments:
  -h, --help     show this help message and exit

subcommands:
  {auto,ge,mds}  versions
    auto         automatic version
    ge           graph embedding version
    mds          MDS version
    louvain      simple louvain version
```

Use `mica ge -h`, `mica mds -h`, and `mica louvain -h` to check the help for each subcommand.

Inputs

**Recently changed:** The main input for MICA is a tab-separated genes/proteins-by-cells/samples expression matrix (rows are genes/proteins) or an AnnData (.h5ad) file after preprocessing.
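A minimal sketch of the tab-separated layout, with hypothetical gene and cell names (rows are genes/proteins, columns are cells/samples):

```bash
# Hypothetical file name and values; columns are tab-separated,
# the first row holds cell/sample names, each later row one gene/protein.
$ head -n 3 expression_matrix.tsv
        Cell_1  Cell_2  Cell_3
GeneA   0.0     1.2     0.0
GeneB   2.3     0.0     0.5
```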

Outputs

After the completion of the pipeline, mica will generate the following outputs:

  • Clustering results plot with the clustering label mapped to each cluster
  • Clustering results txt file with visualization coordinates and clustering labels

Examples

Running MICA auto mode

MICA auto mode reduces the dimensionality using either the multidimensional scaling method (<= 5,000 cells) or the graph embedding method (> 5,000 cells); the cell-count cutoff was chosen based on performance evaluations on datasets of various sizes.

```bash
mica auto -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -pn pbmc3k -nc 10
```

Running MICA GE mode

MICA GE mode reduces the dimensionality using the graph embedding method. It sweeps a range of resolutions for the Louvain clustering algorithm; the -ar parameter sets the upper bound of the range.

```bash
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -maxr 4.0 -ss 1
```

By default, the MI distance-based graph is built with the K-nearest-neighbors (KNN) algorithm; the number of neighbors can be set with -nnm. Alternatively, the graph can be built with approximate nearest neighbors (ANN) based on the Hierarchical Navigable Small World (HNSW) algorithm. Set -nnt (knn or ann) to select the neighbor-search type.
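A sketch of the default KNN graph build with an explicit neighbor count; the -nnt and -nnm flags come from the text above, and the values shown are illustrative, not recommendations:

```bash
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad \
        -o ./test_data/outputs -nnt knn -nnm 80 -maxr 4.0 -ss 1
```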

ANN has two main hyperparameters: ef (-annef) and m (-annm). Suggested settings:

  • default
  • really fast: m = 4, ef = 200
  • accurate: m = 16, ef = 800

Tune these two parameters for your dataset to make the GE mode both fast and robust. Increase ef when -nnm is increased.

```bash
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -nnt ann -annm 8 -annef 400 -maxr 4.0 -ss 1
```

To set the number of neighbors in the graph for Louvain clustering, set -nne.
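For example, a sketch using -nne (the value is illustrative):

```bash
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad \
        -o ./test_data/outputs -nne 30 -maxr 4.0 -ss 1
```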

Running MICA MDS mode

MICA MDS mode reduces the dimensionality using the multidimensional scaling method. It includes both K-means clustering and Louvain clustering. To run the K-means mode, set -nck; to run Louvain graph clustering, set -nn or use the default. The -pn parameter sets the project name; -nck specifies the number of clusters (k in the k-means clustering algorithm); -dd is the number of dimensions used when performing k-means clustering on the dimension-reduced matrix.

```bash
mica mds -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -pn PBMC3k -nck 8
```
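For the Louvain graph-clustering variant of MDS mode, a sketch assuming the -nn flag described above (the value is illustrative):

```bash
mica mds -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad \
         -o ./test_data/outputs -pn PBMC3k -nn 30
```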

Running MICA Louvain mode

MICA Louvain mode reduces the dimensionality without the MI distance estimate, via PCA or MDS directly, and then runs Louvain clustering. To set the dimension-reduction method, set -dm (PCA or MDS).

```bash
mica louvain -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -dm PCA
```

Some shared parameters

-dd: number of dimensions to reduce to

-cldis: distance metric for Louvain clustering (euclidean/cosine)

-bpr: power index of the bin size for MI, e.g. 3 -> bins = (features)**(1/3)

-bsz: bin size for MI

-sil: run silhouette analysis for Louvain clustering

-nw: number of workers
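A combined sketch that uses several of these shared flags with the GE mode; the values are illustrative, not recommendations:

```bash
# Illustrative values only:
#   -dd 20         reduce to 20 dimensions
#   -cldis cosine  use cosine distance in Louvain clustering
#   -bpr 3         bins = (features)**(1/3) for MI estimation
#   -nw 4          use 4 workers
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad \
        -o ./test_data/outputs -dd 20 -cldis cosine -bpr 3 -nw 4
```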

Common issues for St. Jude HPC users

  1. You try to install MICA 1.0.0, but when you check, it is still the old version.

Solution: make sure to create a new folder and change into it before you start Step 1 if a folder such as 'MICA' already exists in the current directory.

```bash
mkdir MICA100
cd MICA100
# start installation
```

  2. Illegal instruction (core dumped) or another problem when importing mihnsw.

This is often caused by the conda version or the gcc version. Try:

```bash
module load conda3/202402
module load gcc/10.2.0
# other versions may also work, e.g. gcc 9.5.0
```

About scMINER

https://jyyulab.github.io/scMINER/site/

Reference

hnswlib: https://github.com/nmslib/hnswlib. The authors of MICA add a 'mutual-info-distance' space to hnswlib.

scMINER paper: https://www.nature.com/articles/s41467-025-59620-6

Owner

  • Name: Yu Laboratory @ St. Jude
  • Login: jyyulab
  • Kind: organization
  • Location: Memphis, TN

Yu Lab in the Department of Computational Biology at St. Jude Children's Research Hospital

GitHub Events

Total
  • Watch event: 1
  • Push event: 15
  • Pull request review event: 1
  • Pull request event: 6
  • Fork event: 1
  • Create event: 5
Last Year
  • Watch event: 1
  • Push event: 15
  • Pull request review event: 1
  • Pull request event: 6
  • Fork event: 1
  • Create event: 5

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 195
  • Total Committers: 7
  • Avg Commits per committer: 27.857
  • Development Distribution Score (DDS): 0.159
Past Year
  • Commits: 6
  • Committers: 3
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
Liang Ding a****g@g****m 164
Jimmy Veloso j****0@y****r 12
Tracy Qian c****n@s****g 12
Arihant Jain j****r@g****m 4
Ding l****g@l****l 1
Arihant Jain 4****4 1
Koon-Kiu Yan q****n 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 30
  • Average time to close issues: about 1 year
  • Average time to close pull requests: 2 months
  • Total issue authors: 2
  • Total pull request authors: 6
  • Average comments per issue: 0.8
  • Average comments per pull request: 0.47
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 6
Past Year
  • Issues: 0
  • Pull requests: 9
  • Average time to close issues: N/A
  • Average time to close pull requests: 8 days
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.11
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • adamdingliang (4)
  • jiangzh-coder (1)
Pull Request Authors
  • ZebYulon (27)
  • dependabot[bot] (10)
  • arihantjain4 (3)
  • QingfeiPan (2)
  • jimmyv9 (1)
  • qianchenxi1109 (1)
Top Labels
Issue Labels
enhancement (2) bug (1)
Pull Request Labels
dependencies (10)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 8 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 2
  • Total maintainers: 1
pypi.org: mica

Mutual Information-based Clustering Analysis tool designed for scRNA-seq data

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 8 Last month
Rankings
Dependent packages count: 10.0%
Dependent repos count: 21.7%
Forks count: 22.6%
Average: 27.5%
Stargazers count: 31.9%
Downloads: 51.4%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • anndata ==0.7.5
  • cwltool ==3.0.20210124104916
  • fast_histogram ==0.9
  • gensim ==4.1.2
  • h5py ==3.2.1
  • llvmlite ==0.36.0
  • matplotlib ==3.4.0rc3
  • numba ==0.53.1
  • numpy ==1.20.1
  • pandas ==1.2.3
  • pecanpy ==1.0.1
  • pynndescent ==0.5.2
  • python-louvain ==0.15
  • rdflib_jsonld ==0.5.0
  • scanpy ==1.7.1
  • scikit-learn ==0.24.1
  • scipy ==1.6.1
  • setuptools ==57.5.0
  • tables ==3.6.1
  • torch ==1.12.1
  • torch-geometric ==2.2.0
  • umap-learn ==0.5.1
setup.py pypi
  • anndata ==0.7.5
  • cwltool ==3.0.20210124104916
  • fast_histogram ==0.9
  • gensim ==4.1.2
  • h5py ==3.2.1
  • llvmlite ==0.36.0
  • matplotlib ==3.4.0rc3
  • numba ==0.53.1
  • numpy ==1.20.1
  • pandas ==1.2.3
  • pecanpy ==1.0.1
  • pynndescent ==0.5.2
  • python-louvain ==0.15
  • rdflib_jsonld ==0.5.0
  • scanpy ==1.7.1
  • scikit-learn ==0.24.1
  • scipy ==1.6.1
  • setuptools ==57.5.0
  • tables ==3.6.1
  • umap-learn ==0.5.1
MICA/setup.py pypi
  • anndata >=0.7.5,
  • fast_histogram ==0.9
  • h5py >=3.2.1,
  • mihnsw ==0.8.6
  • networkx >=2.5.0,
  • numba ==0.53.1
  • numpy >=1.20.1,<=1.21.6
  • pecanpy ==1.0.1
  • python-louvain ==0.15
  • scanpy ==1.7.1
  • scipy >=1.6.1,<=1.7.3
mihnsw/python_bindings/setup.py pypi
mihnsw/setup.py pypi
  • numpy *