mica

Mutual Information-based Non-linear Clustering Analysis

https://github.com/jyyulab/mica

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: nature.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

clustering-analysis dimensionality-reduction graph-embedding mutual-information single-cell-rna-seq
Last synced: 6 months ago

Repository

Mutual Information-based Non-linear Clustering Analysis

Basic Info
  • Host: GitHub
  • Owner: jyyulab
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 497 MB
Statistics
  • Stars: 2
  • Watchers: 6
  • Forks: 3
  • Open Issues: 4
  • Releases: 4
Topics
clustering-analysis dimensionality-reduction graph-embedding mutual-information single-cell-rna-seq
Created about 7 years ago · Last pushed 8 months ago
Metadata Files
Readme License

README.md

MICA

MICA is a clustering tool for single-cell RNA-seq data. MICA takes a preprocessed gene expression matrix as input and efficiently clusters the cells. MICA consists of the following main components:

  1. Mutual information estimation for cell-cell distance quantification
  2. Dimension reduction on the non-linear mutual information-based distance space
  3. Consensus clustering on dimension-reduced spaces
  4. Clustering visualization and cell type annotation

MICA workflow:

Prerequisites

Installation

Using conda to create a virtual environment

(Not available until this line is removed)

The recommended method of setting up the required Python environment and dependencies is to use the conda dependency manager:

```bash
conda create -n mica100 python=3.9.2   # Create a python virtual environment
source activate mica100                # Activate the virtual environment
pip install MICA                       # Install MICA and its dependencies
```

Install from source

```bash
conda create -n mica100 python=3.9.2   # Create a python virtual environment
source activate mica100                # Activate the virtual environment
git clone https://github.com/jyyulab/MICA   # Clone the repo
cd MICA                                # Switch to the MICA root directory
pip install .                          # Install MICA from source
mica -h                                # Check if mica works correctly
```

Usage

The MICA workflow has two built-in dimension reduction methods. The auto mode (mica or mica auto) selects a dimension reduction method automatically based on the cell count of the preprocessed matrix. Users can select the graph embedding (mica ge), MDS (mica mds), or Louvain (mica louvain) method manually using the subcommand ge, mds, or louvain, respectively.

```bash
$ mica -h
usage: mica [-h] {auto,ge,mds} ...

MICA - Mutual Information-based Clustering Analysis tool.

optional arguments:
  -h, --help     show this help message and exit

subcommands:
  {auto,ge,mds}  versions
    auto         automatic version
    ge           graph embedding version
    mds          MDS version
    louvain      simple louvain version
```

Use `mica ge -h`, `mica mds -h`, and `mica louvain -h` to check the help for each subcommand.

Inputs

**Recently changed:** The main input for MICA is a tab-separated genes/proteins-by-cells/samples expression matrix (rows are genes/proteins) or an AnnData (.h5ad) file after preprocessing.
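A minimal sketch of the tab-separated layout, with hypothetical gene and cell names (rows are genes/proteins, columns are cells/samples):

```bash
# Hypothetical file name and values; columns are tab-separated,
# the first row holds cell/sample names, each later row one gene/protein.
$ head -n 3 expression_matrix.tsv
        Cell_1  Cell_2  Cell_3
GeneA   0.0     1.2     0.0
GeneB   2.3     0.0     0.5
```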

Outputs

After the completion of the pipeline, mica will generate the following outputs:

  • Clustering results plot with the clustering label mapped to each cluster
  • Clustering results txt file with visualization coordinates and clustering labels

Examples

Running MICA auto mode

MICA auto mode reduces the dimensionality using either the multidimensional scaling method (<= 5,000 cells) or the graph embedding method (> 5,000 cells); the cell-count cutoff was chosen based on performance evaluations on datasets of various sizes.

```bash
mica auto -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -pn pbmc3k -nc 10
```

Running MICA GE mode

MICA GE mode reduces the dimensionality using the graph embedding method. It sweeps a range of resolutions for the Louvain clustering algorithm; the -ar parameter sets the upper bound of the range.

```bash
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -maxr 4.0 -ss 1
```

By default, the MI distance-based graph is built with the K-nearest-neighbors (KNN) algorithm; the number of neighbors can be set with -nnm. Alternatively, the graph can be built with approximate nearest neighbors (ANN) based on the Hierarchical Navigable Small World (HNSW) algorithm. Set -nnt (knn or ann) to select the neighbor-search type.
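A sketch of the default KNN graph build with an explicit neighbor count; the -nnt and -nnm flags come from the text above, and the values shown are illustrative, not recommendations:

```bash
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad \
        -o ./test_data/outputs -nnt knn -nnm 80 -maxr 4.0 -ss 1
```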

ANN has two main hyperparameters: ef (-annef) and m (-annm). Suggested settings:

  • default
  • really fast: m = 4, ef = 200
  • accurate: m = 16, ef = 800

Tune these two parameters for your dataset to make the GE mode both fast and robust. Increase ef when -nnm is increased.

```bash
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -nnt ann -annm 8 -annef 400 -maxr 4.0 -ss 1
```

To set the number of neighbors in the graph for Louvain clustering, set -nne.
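For example, a sketch using -nne (the value is illustrative):

```bash
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad \
        -o ./test_data/outputs -nne 30 -maxr 4.0 -ss 1
```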

Running MICA MDS mode

MICA MDS mode reduces the dimensionality using the multidimensional scaling method. It includes both K-means clustering and Louvain clustering. To run the K-means mode, set -nck; to run Louvain graph clustering, set -nn or use the default. The -pn parameter sets the project name; -nck specifies the number of clusters (k in the k-means clustering algorithm); -dd is the number of dimensions used when performing k-means clustering on the dimension-reduced matrix.

```bash
mica mds -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -pn PBMC3k -nck 8
```
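For the Louvain graph-clustering variant of MDS mode, a sketch assuming the -nn flag described above (the value is illustrative):

```bash
mica mds -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad \
         -o ./test_data/outputs -pn PBMC3k -nn 30
```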

Running MICA Louvain mode

MICA Louvain mode reduces the dimensionality without the MI distance estimate, via PCA or MDS directly, and then runs Louvain clustering. To set the dimension-reduction method, set -dm (PCA or MDS).

```bash
mica louvain -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -dm PCA
```

Some shared parameters

-dd: number of dimensions to reduce to

-cldis: distance metric for Louvain clustering (euclidean/cosine)

-bpr: power index of the bin size for MI, e.g. 3 -> bins = (features)**(1/3)

-bsz: bin size for MI

-sil: run silhouette analysis for Louvain clustering

-nw: number of workers
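A combined sketch that uses several of these shared flags with the GE mode; the values are illustrative, not recommendations:

```bash
# Illustrative values only:
#   -dd 20         reduce to 20 dimensions
#   -cldis cosine  use cosine distance in Louvain clustering
#   -bpr 3         bins = (features)**(1/3) for MI estimation
#   -nw 4          use 4 workers
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad \
        -o ./test_data/outputs -dd 20 -cldis cosine -bpr 3 -nw 4
```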

Common issues for St. Jude HPC users

  1. You try to install MICA 1.0.0, but when you check, it is still the old version.

Solution: make sure to create a new folder and change into it before you start Step 1 if a folder such as 'MICA' already exists in the current directory.

```bash
mkdir MICA100
cd MICA100
# start installation
```

  2. Illegal instruction (core dumped) or another problem when importing mihnsw.

This is often caused by the conda version or the gcc version. Try:

```bash
module load conda3/202402
module load gcc/10.2.0
# other versions may also work, e.g. gcc 9.5.0
```

About scMINER

https://jyyulab.github.io/scMINER/site/

Reference

hnswlib: https://github.com/nmslib/hnswlib. The authors of MICA add a 'mutual-info-distance' space to hnswlib.

scMINER paper: https://www.nature.com/articles/s41467-025-59620-6

Owner

  • Name: Yu Laboratory @ St. Jude
  • Login: jyyulab
  • Kind: organization
  • Location: Memphis, TN

Yu Lab in the Department of Computational Biology at St. Jude Children's Research Hospital

GitHub Events

Total
  • Watch event: 1
  • Push event: 15
  • Pull request review event: 1
  • Pull request event: 6
  • Fork event: 1
  • Create event: 5
Last Year
  • Watch event: 1
  • Push event: 15
  • Pull request review event: 1
  • Pull request event: 6
  • Fork event: 1
  • Create event: 5

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 195
  • Total Committers: 7
  • Avg Commits per committer: 27.857
  • Development Distribution Score (DDS): 0.159
Past Year
  • Commits: 6
  • Committers: 3
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
Liang Ding a****g@g****m 164
Jimmy Veloso j****0@y****r 12
Tracy Qian c****n@s****g 12
Arihant Jain j****r@g****m 4
Ding l****g@l****l 1
Arihant Jain 4****4 1
Koon-Kiu Yan q****n 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 30
  • Average time to close issues: about 1 year
  • Average time to close pull requests: 2 months
  • Total issue authors: 2
  • Total pull request authors: 6
  • Average comments per issue: 0.8
  • Average comments per pull request: 0.47
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 6
Past Year
  • Issues: 0
  • Pull requests: 9
  • Average time to close issues: N/A
  • Average time to close pull requests: 8 days
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.11
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • adamdingliang (4)
  • jiangzh-coder (1)
Pull Request Authors
  • ZebYulon (27)
  • dependabot[bot] (10)
  • arihantjain4 (3)
  • QingfeiPan (2)
  • jimmyv9 (1)
  • qianchenxi1109 (1)
Top Labels
Issue Labels
enhancement (2) bug (1)
Pull Request Labels
dependencies (10)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 8 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 2
  • Total maintainers: 1
pypi.org: mica

Mutual Information-based Clustering Analysis tool designed for scRNA-seq data

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 8 Last month
Rankings
Dependent packages count: 10.0%
Dependent repos count: 21.7%
Forks count: 22.6%
Average: 27.5%
Stargazers count: 31.9%
Downloads: 51.4%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • anndata ==0.7.5
  • cwltool ==3.0.20210124104916
  • fast_histogram ==0.9
  • gensim ==4.1.2
  • h5py ==3.2.1
  • llvmlite ==0.36.0
  • matplotlib ==3.4.0rc3
  • numba ==0.53.1
  • numpy ==1.20.1
  • pandas ==1.2.3
  • pecanpy ==1.0.1
  • pynndescent ==0.5.2
  • python-louvain ==0.15
  • rdflib_jsonld ==0.5.0
  • scanpy ==1.7.1
  • scikit-learn ==0.24.1
  • scipy ==1.6.1
  • setuptools ==57.5.0
  • tables ==3.6.1
  • torch ==1.12.1
  • torch-geometric ==2.2.0
  • umap-learn ==0.5.1
setup.py pypi
  • anndata ==0.7.5
  • cwltool ==3.0.20210124104916
  • fast_histogram ==0.9
  • gensim ==4.1.2
  • h5py ==3.2.1
  • llvmlite ==0.36.0
  • matplotlib ==3.4.0rc3
  • numba ==0.53.1
  • numpy ==1.20.1
  • pandas ==1.2.3
  • pecanpy ==1.0.1
  • pynndescent ==0.5.2
  • python-louvain ==0.15
  • rdflib_jsonld ==0.5.0
  • scanpy ==1.7.1
  • scikit-learn ==0.24.1
  • scipy ==1.6.1
  • setuptools ==57.5.0
  • tables ==3.6.1
  • umap-learn ==0.5.1
MICA/setup.py pypi
  • anndata >=0.7.5,
  • fast_histogram ==0.9
  • h5py >=3.2.1,
  • mihnsw ==0.8.6
  • networkx >=2.5.0,
  • numba ==0.53.1
  • numpy >=1.20.1,<=1.21.6
  • pecanpy ==1.0.1
  • python-louvain ==0.15
  • scanpy ==1.7.1
  • scipy >=1.6.1,<=1.7.3
mihnsw/python_bindings/setup.py pypi
mihnsw/setup.py pypi
  • numpy *