Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to nature.com
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.5%) to scientific vocabulary
Repository
Mutual Information-based Non-linear Clustering Analysis
Basic Info
Statistics
- Stars: 2
- Watchers: 6
- Forks: 3
- Open Issues: 4
- Releases: 4
Metadata Files
README.md
MICA
MICA is a clustering tool for single-cell RNA-seq data. MICA takes a preprocessed gene expression matrix as input and efficiently clusters the cells. MICA consists of the following main components:
1. Mutual information estimation for cell-cell distance quantification
2. Dimension reduction on the non-linear mutual information-based distance space
3. Consensus clustering on dimension-reduced spaces
4. Clustering visualization and cell type annotation
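Component 1 rests on estimating the mutual information (MI) between the expression profiles of two cells. The sketch below is a minimal plug-in (histogram) MI estimator in NumPy, not MICA's own implementation. The bin count follows the rule quoted for the -bpr option under the shared parameters (bins = features**(1/bpr)); rounding to the nearest integer is an assumption, the function names are hypothetical, and the step that turns MI into MICA's cell-cell distance is not reproduced.

```python
import numpy as np


def bins_from_power(n_features, bpr=3):
    """Number of histogram bins: n_features ** (1 / bpr), as quoted for -bpr.
    Rounding to the nearest integer is an assumption."""
    return max(1, int(round(n_features ** (1.0 / bpr))))


def mutual_information(x, y, n_bins):
    """Plug-in (histogram) estimate of I(X;Y), in nats, between two cells'
    expression vectors measured over the same genes."""
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    pxy = joint / joint.sum()                    # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x, shape (n_bins, 1)
    py = pxy.sum(axis=0, keepdims=True)          # marginal of y, shape (1, n_bins)
    nz = pxy > 0                                 # treat 0 * log(0) as 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


rng = np.random.default_rng(0)
cell_a = rng.poisson(2.0, size=1000).astype(float)
cell_b = cell_a + rng.normal(0, 0.5, size=1000)  # correlated with cell_a
n_bins = bins_from_power(len(cell_a))            # 1000 ** (1/3) -> 10 bins
print(mutual_information(cell_a, cell_b, n_bins))
```

Because the plug-in estimate is a KL divergence between the empirical joint distribution and the product of its marginals, it is never negative; identical vectors give the entropy of the binned vector.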
MICA workflow:

Prerequisites
- Python >=3.7.6, <=3.9.2 (developed and tested on Python 3.7.6; also tested on Python 3.9.2)
- See the requirements.txt file for other dependencies
Installation
Using conda to create a virtual environment
(Not available until this line is removed)
The recommended method of setting up the required Python environment and dependencies
is to use the conda dependency manager:
conda create -n mica100 python=3.9.2 # Create a python virtual environment
source activate mica100 # Activate the virtual environment
pip install MICA # Install MICA and its dependencies
Install from source
conda create -n mica100 python=3.9.2 # Create a python virtual environment
source activate mica100 # Activate the virtual environment
git clone https://github.com/jyyulab/MICA # Clone the repo
cd MICA # Switch to the MICA root directory
pip install . # Install MICA from source
mica -h # Check if mica works correctly
Usage
The MICA workflow has two built-in dimension reduction methods, graph embedding and MDS. The auto mode (mica or mica auto)
selects a dimension reduction method automatically based on the number of cells in the preprocessed matrix.
Users can instead select the graph embedding method (mica ge), MDS (mica mds), or Louvain (mica louvain) manually
with the corresponding subcommand ge, mds, or louvain.
```
$ mica -h
usage: mica [-h] {auto,ge,mds} ...

MICA - Mutual Information-based Clustering Analysis tool.

optional arguments:
  -h, --help     show this help message and exit

subcommands:
  {auto,ge,mds}  versions
    auto         automatic version
    ge           graph embedding version
    mds          MDS version
    louvain      simple louvain version
```
Use `mica ge -h`, `mica mds -h`, and `mica louvain -h` to check the help for each subcommand.
Inputs
**Recently changed:** The main input for MICA is a preprocessed expression matrix: either a tab-separated genes/proteins-by-cells/samples matrix (rows are genes/proteins) or an AnnData (.h5ad) file.
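The orientation matters because the tab-separated input puts genes in rows, whereas AnnData files conventionally store cells as observations (rows). As a minimal sketch using only NumPy and the standard library, the hypothetical helper below parses a small genes-by-cells TSV and transposes it to cells by genes. It assumes the first row holds cell IDs and the first column holds gene names; check the exact layout your MICA version expects.

```python
import csv
import io

import numpy as np


def read_genes_by_cells_tsv(text):
    """Parse a genes-by-cells TSV (assumed layout: header row of cell IDs,
    first column of gene names) and return (cells_by_genes, cell_ids, gene_names)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    cell_ids = header[1:]                       # skip the corner label
    gene_names = [r[0] for r in body]
    values = np.array([[float(v) for v in r[1:]] for r in body])  # genes x cells
    return values.T, cell_ids, gene_names      # transpose to cells x genes


example = "gene\tcell1\tcell2\tcell3\nGeneA\t1\t0\t2\nGeneB\t0\t3\t1\n"
matrix, cells, genes = read_genes_by_cells_tsv(example)
print(matrix.shape)  # (3, 2): 3 cells x 2 genes
```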
Outputs
After the pipeline completes, MICA generates the following outputs:
* A plot of the clustering results, with the cluster label mapped to each cluster
* A text file of the clustering results with visualization coordinates and cluster labels
Examples
Running MICA auto mode
MICA auto mode reduces the dimensionality using either the multidimensional scaling method (<= 5,000 cells) or the graph embedding method (> 5,000 cells). The cell-count cutoff was chosen based on a performance evaluation on datasets of various sizes.
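The selection rule can be restated in a few lines of Python. This only mirrors the documented cutoff; it is not MICA's internal code, and the function name is hypothetical. The PBMC 3k example below (roughly 3,000 cells) falls under the cutoff and would therefore use MDS.

```python
def choose_reduction_method(n_cells, cutoff=5000):
    """Restate the documented auto-mode rule: MDS for <= 5,000 cells,
    graph embedding (GE) for more than 5,000 cells."""
    return "mds" if n_cells <= cutoff else "ge"


for n_cells in (2700, 5000, 5001, 50000):
    print(n_cells, choose_reduction_method(n_cells))
```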
mica auto -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -pn pbmc3k -nc 10
Running MICA GE mode
MICA GE mode reduces the dimensionality using the graph embedding method. It sweeps a range of resolutions
of the Louvain clustering algorithm; the -maxr parameter sets the upper bound of the range.
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -maxr 4.0 -ss 1
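A resolution sweep runs Louvain once per value on an evenly spaced grid. The sketch below builds such a grid; it assumes that -ss is the step size between resolutions (the text above does not define it) and uses a hypothetical lower bound min_res, so treat both as illustrations rather than MICA's actual defaults.

```python
def resolution_grid(min_res, max_res, step):
    """Evenly spaced Louvain resolutions from min_res to max_res, inclusive.
    min_res and the reading of -ss as a step size are assumptions."""
    n_steps = int(round((max_res - min_res) / step))
    # round() removes floating-point noise such as 1.2000000000000002
    return [round(min_res + i * step, 10) for i in range(n_steps + 1)]


print(resolution_grid(1.0, 4.0, 1.0))  # [1.0, 2.0, 3.0, 4.0]
```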
By default, the MI distance-based graph is built with the k-nearest-neighbors (KNN) algorithm, and the number of neighbors can be set with -nnm. Alternatively, the graph can be built with approximate nearest neighbors (ANN) based on the Hierarchical Navigable Small World (HNSW) algorithm. Use -nnt (knn or ann) to select the neighbor-search type.
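The default KNN option can be sketched in NumPy on a precomputed cell-cell distance matrix (in MICA this would be the MI-based distance). Exact search compares all pairs of cells, which grows quadratically with the cell count and motivates the approximate option for large datasets. The function name is hypothetical, and symmetrizing by keeping an edge when either cell lists the other is one common choice, not necessarily MICA's.

```python
import numpy as np


def knn_graph(dist, k):
    """Return a symmetric boolean adjacency matrix linking each cell to its
    k nearest neighbors under a precomputed distance matrix."""
    d = np.asarray(dist, dtype=float).copy()
    np.fill_diagonal(d, np.inf)                 # a cell is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]         # indices of the k smallest distances per row
    adj = np.zeros(d.shape, dtype=bool)
    rows = np.repeat(np.arange(d.shape[0]), k)
    adj[rows, nbrs.ravel()] = True
    return adj | adj.T                          # keep an edge if either cell chose the other


positions = np.array([0.0, 1.0, 3.0, 10.0, 12.0])
dist = np.abs(positions[:, None] - positions[None, :])
adj = knn_graph(dist, 1)
print(int(adj.sum()) // 2)  # 3 undirected edges: 0-1, 1-2, 3-4
```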
ANN has two main hyperparameters, ef (-annef) and m (-annm). Suggested settings:
* Default
* Really fast: m = 4, ef = 200
* Accurate: m = 16, ef = 800
Tune these two parameters for your dataset so that GE mode is both fast and robust. Increase ef when -nnm is increased, since in HNSW the search parameter ef should be at least the number of neighbors requested.
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -nnt ann -annm 8 -annef 400 -maxr 4.0 -ss 1
To set the number of neighbors in the graph used for Louvain clustering, set -nne.
Running MICA MDS mode
MICA MDS mode reduces the dimensionality using the multidimensional scaling method. It supports both k-means clustering and Louvain clustering.
To run k-means clustering, set -nck; to run Louvain graph clustering, set -nn or keep the default.
The -pn parameter sets the
project name; -nck specifies the number(s) of clusters (k in the k-means algorithm); -dd is the
number of dimensions of the dimension-reduced matrix used for k-means clustering.
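The MDS-then-k-means path can be sketched with classical (Torgerson) MDS followed by plain Lloyd's k-means in NumPy. Whether MICA uses exactly these variants is an assumption, the function names are hypothetical, n_dims plays the role of -dd and k the role of -nck, and the deterministic farthest-point initialization is chosen only to keep the example reproducible.

```python
import numpy as np


def classical_mds(dist, n_dims):
    """Classical MDS: embed points so Euclidean distances approximate dist."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ d2 @ j                        # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]   # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale


def kmeans(x, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])          # farthest point from chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
emb = classical_mds(dist, 2)                     # 2 plays the role of -dd
labels = kmeans(emb, 2)                          # 2 plays the role of -nck
print(labels)
```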
mica mds -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -pn PBMC3k -nck 8
Running MICA Louvain mode
MICA Louvain mode skips MI distance estimation: it reduces the dimensionality directly via PCA or MDS and then runs Louvain clustering.
Set the dimension-reduction method with -dm (PCA or MDS).
mica louvain -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -dm PCA
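The PCA option of Louvain mode can be sketched as standard PCA via NumPy's SVD on a cells-by-genes matrix. The function name is hypothetical, n_components standing in for -dd is an assumption, and the Louvain step that follows is not shown.

```python
import numpy as np


def pca(x, n_components):
    """Project rows of x (cells x genes) onto the top principal components via SVD."""
    xc = x - x.mean(axis=0)                      # center each gene
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T              # scores on the leading components


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 10))                    # 50 cells x 10 genes
z = pca(x, 3)
print(z.shape)  # (50, 3)
```

Since the singular values come back in descending order, the first column of the result always carries the most variance.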
Some shared parameters
-dd: number of dimensions to reduce to
-cldis: distance metric used in Louvain clustering (euclidean or cosine)
-bpr: power index for the number of MI bins; e.g. 3 -> bins=(features)**(1/3)
-bsz: bin size for MI
-sil: run silhouette analysis for Louvain clustering
-nw: number of workers
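The -sil option runs silhouette analysis. As a sketch of the standard silhouette coefficient, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from cell i to its own cluster and b(i) the mean distance to the nearest other cluster, the function below averages s(i) over all cells. It uses Euclidean distance; which distance MICA applies (for example the -cldis choice) is an assumption, and the function name is hypothetical.

```python
import numpy as np


def silhouette_score(x, labels):
    """Mean silhouette coefficient with Euclidean distance.
    Every cluster must contain at least two points."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False                          # exclude the cell itself
        a = d[i, same].mean()                    # mean distance within own cluster
        b = min(d[i, labels == c].mean()         # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


pts = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
good = silhouette_score(pts, [0, 0, 1, 1])       # well-separated clusters
bad = silhouette_score(pts, [0, 1, 0, 1])        # labels mixed across clusters
print(good, bad)
```

Scores close to 1 indicate compact, well-separated clusters; negative scores indicate that cells sit closer to another cluster than to their own.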
Some common issues for St. Jude HPC users
- After installing MICA 1.0.0, the version check still reports the old version:
Solution: if the current directory already contains a folder such as ‘MICA’, create a new folder and change into it before you start Step 1.
```bash
mkdir MICA100
cd MICA100
# start installation
```
- "Illegal instruction (core dumped)" or other problems when importing mihnsw
This is often caused by the conda or gcc version.
Try:
```bash
module load conda3/202402
module load gcc/10.2.0
# other versions, such as gcc 9.5.0, may also work
```
About scMINER
https://jyyulab.github.io/scMINER/site/
Reference
hnswlib: https://github.com/nmslib/hnswlib. The MICA author added a 'mutual-info-distance' space to hnswlib.
scMINER paper: https://www.nature.com/articles/s41467-025-59620-6
Owner
- Name: Yu Laboratory @ St. Jude
- Login: jyyulab
- Kind: organization
- Location: Memphis, TN
- Website: https://www.stjude.org/research/labs/yu-lab.html
- Twitter: jiyang_yu
- Repositories: 11
- Profile: https://github.com/jyyulab
Yu Lab in the Department of Computational Biology at St. Jude Children's Research Hospital
GitHub Events
Total
- Watch event: 1
- Push event: 15
- Pull request review event: 1
- Pull request event: 6
- Fork event: 1
- Create event: 5
Last Year
- Watch event: 1
- Push event: 15
- Pull request review event: 1
- Pull request event: 6
- Fork event: 1
- Create event: 5
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Liang Ding | a****g@g****m | 164 |
| Jimmy Veloso | j****0@y****r | 12 |
| Tracy Qian | c****n@s****g | 12 |
| Arihant Jain | j****r@g****m | 4 |
| Ding | l****g@l****l | 1 |
| Arihant Jain | 4****4 | 1 |
| Koon-Kiu Yan | q****n | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 5
- Total pull requests: 30
- Average time to close issues: about 1 year
- Average time to close pull requests: 2 months
- Total issue authors: 2
- Total pull request authors: 6
- Average comments per issue: 0.8
- Average comments per pull request: 0.47
- Merged pull requests: 15
- Bot issues: 0
- Bot pull requests: 6
Past Year
- Issues: 0
- Pull requests: 9
- Average time to close issues: N/A
- Average time to close pull requests: 8 days
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.11
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- adamdingliang (4)
- jiangzh-coder (1)
Pull Request Authors
- ZebYulon (27)
- dependabot[bot] (10)
- arihantjain4 (3)
- QingfeiPan (2)
- jimmyv9 (1)
- qianchenxi1109 (1)
Packages
- Total packages: 1
Total downloads:
- pypi 8 last-month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 2
- Total maintainers: 1
pypi.org: mica
Mutual Information-based Clustering Analysis tool designed for scRNA-seq data
- Homepage: https://github.com/jyyulab/MICA
- Documentation: https://mica.readthedocs.io/
- License: See LICENSE.md
Latest release: 0.2.2
published over 3 years ago
Maintainers (1)
Dependencies
- anndata ==0.7.5
- cwltool ==3.0.20210124104916
- fast_histogram ==0.9
- gensim ==4.1.2
- h5py ==3.2.1
- llvmlite ==0.36.0
- matplotlib ==3.4.0rc3
- numba ==0.53.1
- numpy ==1.20.1
- pandas ==1.2.3
- pecanpy ==1.0.1
- pynndescent ==0.5.2
- python-louvain ==0.15
- rdflib_jsonld ==0.5.0
- scanpy ==1.7.1
- scikit-learn ==0.24.1
- scipy ==1.6.1
- setuptools ==57.5.0
- tables ==3.6.1
- torch ==1.12.1
- torch-geometric ==2.2.0
- umap-learn ==0.5.1
- anndata ==0.7.5
- cwltool ==3.0.20210124104916
- fast_histogram ==0.9
- gensim ==4.1.2
- h5py ==3.2.1
- llvmlite ==0.36.0
- matplotlib ==3.4.0rc3
- numba ==0.53.1
- numpy ==1.20.1
- pandas ==1.2.3
- pecanpy ==1.0.1
- pynndescent ==0.5.2
- python-louvain ==0.15
- rdflib_jsonld ==0.5.0
- scanpy ==1.7.1
- scikit-learn ==0.24.1
- scipy ==1.6.1
- setuptools ==57.5.0
- tables ==3.6.1
- umap-learn ==0.5.1
- anndata >=0.7.5,
- fast_histogram ==0.9
- h5py >=3.2.1,
- mihnsw ==0.8.6
- networkx >=2.5.0,
- numba ==0.53.1
- numpy >=1.20.1,<=1.21.6
- pecanpy ==1.0.1
- python-louvain ==0.15
- scanpy ==1.7.1
- scipy >=1.6.1,<=1.7.3
- numpy *