mnnpy

An implementation of MNN (Mutual Nearest Neighbors) correct in python.

https://github.com/chriscainx/mnnpy

Science Score: 20.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links: links to nature.com
✓ Committers with academic emails: 1 of 6 committers (16.7%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (15.7%) to scientific vocabulary

Keywords

batch-effects bioinformatics-tool biological-data-analysis mnn-correct mutual-nearest-neighbor single-cell-analysis

Keywords from Contributors

anndata bioinformatics scanpy scverse transcriptomics visualize-data

Last synced: 6 months ago

Repository


Basic Info

Host: GitHub
Owner: chriscainx
License: bsd-3-clause
Language: Python
Default Branch: master
Homepage:
Size: 1.05 MB

Statistics

Stars: 74
Watchers: 5
Forks: 33
Open Issues: 45
Releases: 0

Topics

batch-effects bioinformatics-tool biological-data-analysis mnn-correct mutual-nearest-neighbor single-cell-analysis

Created almost 8 years ago · Last pushed about 3 years ago

Metadata Files

Readme License

mnnpy - MNN-correct in python!

An implementation of MNN correct in Python, featuring low memory usage, full multicore support, and compatibility with the scanpy framework.

Batch effect correction by matching mutual nearest neighbors (Haghverdi et al., 2018) is implemented as the function 'mnnCorrect' in the R package scran. Unfortunately, it is extremely slow on big datasets and doesn't make full use of the parallel architecture of modern CPUs.

This project is a Python implementation of the MNN correct algorithm that takes advantage of Python's extensibility and hackability. It integrates seamlessly with the scanpy framework and has multicore support built in.
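The core idea can be illustrated with a small sketch: two cells from different batches form a mutual nearest neighbor (MNN) pair when each is among the other's k nearest neighbors in the other batch. The function find_mutual_nn below is a hypothetical brute-force NumPy illustration that assumes cells are rows and uses Euclidean distance; it is not mnnpy's own code, which is built for much larger data.

```python
import numpy as np


def find_mutual_nn(ref, new, k=20):
    """Hypothetical helper: return (i, j) pairs where ref[i] and new[j] are mutual k-NNs.

    Brute-force Euclidean distances; only suitable for small illustrative inputs.
    """
    # Pairwise squared Euclidean distances, shape (n_ref, n_new).
    d2 = ((ref[:, None, :] - new[None, :, :]) ** 2).sum(axis=-1)
    k_ref = min(k, new.shape[0])
    k_new = min(k, ref.shape[0])
    # For each ref cell: indices of its k nearest cells in `new`.
    nn_ref_to_new = np.argsort(d2, axis=1)[:, :k_ref]
    # For each new cell: indices of its k nearest cells in `ref`.
    nn_new_to_ref = np.argsort(d2, axis=0)[:k_new, :].T
    pairs = []
    for i in range(ref.shape[0]):
        for j in nn_ref_to_new[i]:
            # Keep the pair only if the relation holds in both directions.
            if i in nn_new_to_ref[j]:
                pairs.append((i, int(j)))
    return pairs
```

With k=1, a batch-2 cell that is nearest to a reference cell but has a different reference cell as its own nearest neighbor produces no pair; that asymmetry filter is what makes MNN pairs robust to cell types present in only one batch.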

Status

ver 0.1.9.5: fixed a bug in cos_norm. Thanks to @LisaSikkema.
Note: in the latest version of scran, the default value of sigma is now 0.1. mnnpy still keeps 1.0 as its default because of some reports that 1.0 works better. You are encouraged to try the new setting and find what works best for your data. Thanks to @yueqiw.
ver 0.1.9.4: added a fix for sparse matrix normalization. Thanks to @zacharylau10.
ver 0.1.9.3: fixed a potential multiprocessing bug. Set mnnpy.settings.normalization = 'seq' to perform normalization sequentially and avoid strange multiprocessing problems. Thanks to @LucasESBS and @julien-kann.
ver 0.1.9.2: fixed an error in subtractbiospan. Thanks to @dylkot.
ver 0.1.9: replaced Python's notorious multiprocessing with OpenMP-based multithreading, removing the huge memory overhead and saving (a lot of) time and (a lot of) memory. PLEASE UPDATE TO THIS VERSION.
ver 0.1.8: changed the multiprocessing chunk size for a significant speedup.
ver 0.1.7: rewrote adjust_shift_variance in Cython for a performance gain.
TODO: optimize compute_correction.
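For orientation, here is a simplified sketch of what the correction step controlled by sigma does, following the description in Haghverdi et al. (2018): each MNN pair yields a difference vector, and every cell in the batch being corrected receives a Gaussian-weighted average of those vectors, with sigma as the kernel bandwidth. The helper smooth_correction_vectors and its exact weighting exp(-d²/sigma) are assumptions for illustration; mnnpy's compute_correction may differ in details.

```python
import numpy as np


def smooth_correction_vectors(ref, new, pairs, sigma=1.0):
    """Hypothetical helper: one smoothed correction vector per cell (row) of `new`.

    Each MNN pair (i, j) contributes the vector ref[i] - new[j]; a cell x in `new`
    averages these vectors with Gaussian weights exp(-||x - new[j]||^2 / sigma).
    """
    ref_idx = np.array([i for i, _ in pairs])
    new_idx = np.array([j for _, j in pairs])
    pair_vecs = ref[ref_idx] - new[new_idx]                  # (n_pairs, n_genes)
    paired_new = new[new_idx][None, :, :]                    # (1, n_pairs, n_genes)
    # Squared distances from every cell in `new` to each pair's `new`-side cell.
    d2 = ((new[:, None, :] - paired_new) ** 2).sum(axis=-1)  # (n_new, n_pairs)
    # Subtracting the per-row minimum leaves the normalized weights unchanged
    # but keeps exp() from underflowing to zero when sigma is small.
    d2 -= d2.min(axis=1, keepdims=True)
    weights = np.exp(-d2 / sigma)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ pair_vecs                               # (n_new, n_genes)
```

A small sigma makes each cell follow only its closest MNN pairs, while a large sigma pushes every cell toward the same average correction; the corrected batch would then be new + smooth_correction_vectors(ref, new, pairs, sigma).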

Please help me test it and file any issues.

mnnpy currently uses numba JIT compilation for speed-up.

Consider using an Intel Math Kernel Library (MKL)-enabled Python environment or the Intel Distribution for Python for better performance on Intel CPUs.

Further speed-ups with C++/Cython/TensorFlow/CUDA are on our bucket list and under development!

Speed

Corrects a dataset of ~50,000 cells (19 batches) × ~30,000 genes in ~12 h on a 16-core server with 32 GB of memory.

Highlights

High CPU utilization
Low mem consumption
Extendable/hackable
Compatible with scanpy
Full verbosity

Install

Mnnpy is available on PyPI. You can install with pip install mnnpy.

If you want the development version, run:
git clone https://github.com/chriscainx/mnnpy.git
cd mnnpy
pip install .

Note: depending on which C compiler is used on your system, simply running pip install . or pip install mnnpy may fail to install mnnpy. If this happens on a Linux/macOS system with Homebrew, install GCC 8 and point CC at it during installation (adjust the path if your installed gcc@8 version is not 8.4.0):
brew install gcc@8
export CC=/usr/local/Cellar/gcc@8/8.4.0/bin/gcc-8
pip install mnnpy
unset CC
Or, for the development version:
brew install gcc@8
export CC=/usr/local/Cellar/gcc@8/8.4.0/bin/gcc-8
git clone https://github.com/chriscainx/mnnpy.git
cd mnnpy
pip install .
unset CC

Usage

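Mnnpy takes matrices or AnnData objects. For example:

```python
import pandas as pd
import scanpy as sc  # older scanpy releases used `import scanpy.api as sc`
import mnnpy

sample1 = sc.read("Sample1.h5ad")
sample2 = sc.read("Sample2.h5ad")
sample3 = sc.read("Sample3.h5ad")
# Load your list of highly variable gene names; this assumes a one-column CSV without a header.
hvgs = pd.read_csv("Some HVGs.csv", header=None)[0].tolist()
corrected = mnnpy.mnn_correct(sample1, sample2, sample3, var_subset=hvgs,
                              batch_categories=["N0123X", "N0124X", "T0124X"])
adata = corrected[0]  # the first element of the returned tuple is the corrected AnnData
```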

Running Scanpy 1.0.4 on 2018-04-25 13:39.
Dependencies: anndata==0.5.10 numpy==1.14.0 scipy==1.0.0 pandas==0.22.0+0.ga00154d.dirty scikit-learn==0.19.1 statsmodels==0.8.0
Performing cosine normalization...
Starting MNN correct iteration. Reference batch: 0
Step 1 of 2: processing batch 1
--Looking for MNNs...
--Computing correction vectors...
--Adjusting variance...
--Applying correction...
Step 2 of 2: processing batch 2
--Looking for MNNs...
--Computing correction vectors...
--Adjusting variance...
--Applying correction...
MNN correction complete. Gathering output...
Packing AnnData object...
Done.
Evaluating adata then shows:
AnnData object with n_obs × n_vars = 20270 × 33694
    obs: 'n_genes', 'percent_mito', 'n_counts', 'Sample', 'Donor', 'Tissue', 'batch'
    var: 'gene_ids-0', 'n_cells-0', 'gene_ids-1', 'n_cells-1', 'gene_ids-2', 'n_cells-2'
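To visualize the result, subset to the HVGs (as a copy, so scaling does not act on a view), scale, and compute a UMAP embedding:
bdata = adata[:, hvgs].copy()
sc.pp.scale(bdata)
sc.pp.neighbors(bdata)
sc.tl.umap(bdata)
sc.pl.umap(bdata, color='Sample')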
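The first step in the log above, cosine normalization, can be sketched as dividing each cell's expression vector by its L2 norm. This is a minimal NumPy sketch assuming cells are rows (as in AnnData); cosine_normalize is a hypothetical name, not part of mnnpy's API.

```python
import numpy as np


def cosine_normalize(X):
    """Hypothetical helper: scale each row (cell) of X to unit L2 norm; all-zero rows stay zero."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing empty cells by zero
    return X / norms


# After normalization, squared Euclidean distance equals 2 - 2 * cosine similarity,
# so Euclidean nearest neighbors coincide with cosine-similarity nearest neighbors.
a, b = cosine_normalize([[3.0, 4.0], [1.0, 0.0]])
print(np.sum((a - b) ** 2), 2 - 2 * a @ b)
```

Removing per-cell scale differences this way keeps differences in sequencing depth between batches from dominating the neighbor search.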

For further information, run help(mnnpy.mnn_correct) in Python to read the function's docstring, or wait for me to write the docs...

Best practice

It is recommended to pass log-transformed matrices/AnnData objects to mnn_correct and to use highly variable genes (HVGs) instead of all genes.
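A rough sketch of this preparation on a raw count matrix, assuming cells are rows and genes are columns. The helper name is hypothetical, the 10,000-count scaling target is an assumed choice, and ranking genes by plain variance of the log values is a simplified stand-in for a proper HVG selection method.

```python
import numpy as np


def log_normalize_and_select_hvgs(counts, gene_names, n_top=2000):
    """Hypothetical helper: library-size normalize, log1p-transform, keep the n_top most variable genes."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0                # leave empty cells at zero
    logged = np.log1p(counts / lib_size * 1e4)   # counts per 10k, then log(1 + x)
    variances = logged.var(axis=0)
    # Indices of the n_top highest-variance genes, restored to original gene order.
    keep = np.sort(np.argsort(variances)[::-1][:n_top])
    hvgs = [gene_names[i] for i in keep]
    return logged[:, keep], hvgs
```

In a scanpy workflow, the analogous steps are sc.pp.normalize_total, sc.pp.log1p and sc.pp.highly_variable_genes.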


Credits

The algorithm is from Laleh Haghverdi and Aaron T. L. Lun.

irlb is copied from airysen/irlbpy; I couldn't figure out how to make it a symbolic link from the submodule, though 😂

Owner

Name: Chris Kang
Login: chriscainx
Kind: user
Location: Beijing

Repositories: 3
Profile: https://github.com/chriscainx

GitHub Events

Total

Issues event: 1
Watch event: 2
Issue comment event: 2

Last Year

Issues event: 1
Watch event: 2
Issue comment event: 2

Committers

Last synced: over 2 years ago

All Time

Total Commits: 93
Total Committers: 6
Avg Commits per committer: 15.5
Development Distribution Score (DDS): 0.054

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Chris Kang	k**s@g**m	88
Jonathan Manning	j**g@e**k	1
Chris Kang	c**s@C**l	1
Scott Gigante	8****i	1
L Sikkema	3****a	1
Tamjeed Azad	2****d	1

Committer Domains (Top 20 + Academic)

ebi.ac.uk: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 47
Total pull requests: 10
Average time to close issues: 6 months
Average time to close pull requests: 9 months
Total issue authors: 41
Total pull request authors: 9
Average comments per issue: 2.06
Average comments per pull request: 0.3
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

scottgigante (2)
wangjiawen2013 (2)
jayypaul (2)
dylkot (2)
stashkov (2)
brainfo (2)
Zifeng-L (1)
ivonyao (1)
veghp (1)
aheravi (1)
yupingz (1)
julien-kann (1)
alexlenail (1)
dawe (1)
ktpolanski (1)

Pull Request Authors

chriscainx (2)
pinin4fjords (1)
AlexandreHutton (1)
LisaSikkema (1)
scottgigante-immunai (1)
tamjazad (1)
jkobject (1)
scottgigante (1)
HelloWorldLTY (1)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads: 238 last month (pypi)

Total dependent packages: 4
Total dependent repositories: 7
Total versions: 13
Total maintainers: 1

pypi.org: mnnpy

Mutual nearest neighbors correction in python.

Homepage: http://github.com/chriscainx/mnnpy
Documentation: https://mnnpy.readthedocs.io/
License: BSD 3
Latest release: 0.1.9
published almost 8 years ago

Versions: 13
Dependent Packages: 4
Dependent Repositories: 7
Downloads: 238 Last month
Docker Downloads: 0

Rankings

Dependent packages count: 2.3%

Docker downloads count: 4.2%

Dependent repos count: 5.6%

Average: 7.1%

Forks count: 7.1%

Stargazers count: 8.3%

Downloads: 15.3%

Maintainers (1)

chriscainx

Last synced: 6 months ago

Dependencies

requirements.txt pypi

anndata *
numba *
numpy *
pandas *
scipy *
