harmonypy
🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.
Science Score: 33.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to zenodo.org
- ✓ Committers with academic emails: 2 of 4 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: slowkow
- License: gpl-3.0
- Language: Python
- Default Branch: master
- Homepage: https://portals.broadinstitute.org/harmony/
- Size: 2.77 MB
Statistics
- Stars: 224
- Watchers: 5
- Forks: 24
- Open Issues: 3
- Releases: 6
Metadata Files
README.md
harmonypy
Harmony is an algorithm for integrating multiple high-dimensional datasets.
harmonypy is a port of the harmony R package by Ilya Korsunsky.
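The tagline describes the method as fuzzy k-means plus locally linear adjustments. The sketch below is a simplified toy illustration of that idea in plain NumPy, not harmonypy's implementation: cells are cosine-normalized and softly assigned to clusters, then within each cluster a ridge regression on one-hot batch indicators estimates batch offsets, which are subtracted in proportion to each cell's cluster membership. It deliberately omits Harmony's diversity penalty (theta), its convergence checks, and its repeated alternation between clustering and correction. The function name `toy_harmony_step`, the farthest-first initialization, and all defaults are assumptions made for illustration.

```python
import numpy as np


def toy_harmony_step(Z, batch, n_clusters=2, sigma=0.1, ridge=1.0, n_iter=10, seed=0):
    """Toy, single-round Harmony-style correction (illustration only, hypothetical).

    Z      : (n_cells, n_dims) embedding, e.g. principal components
    batch  : (n_cells,) integer batch labels 0..B-1
    Returns the corrected embedding and the soft cluster assignments R.
    """
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    # Cosine-normalize each cell so clustering uses direction, not magnitude.
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)

    # Farthest-first centroid initialization (an assumed, simple choice).
    idx = [int(rng.integers(n))]
    for _ in range(1, n_clusters):
        d2 = ((Zn[:, None, :] - Zn[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    Y = Zn[idx]

    # Fuzzy k-means: soft assignments R from a softmax over -distance / sigma.
    for _ in range(n_iter):
        dist = ((Zn[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # (n, K)
        logits = -dist / sigma
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        R = np.exp(logits)
        R /= R.sum(axis=1, keepdims=True)
        # Centroids become membership-weighted means, re-normalized to unit length.
        Y = (R.T @ Zn) / R.sum(axis=0)[:, None]
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)

    # Locally linear adjustment: per cluster, ridge-regress Z on [intercept, batch one-hot].
    n_batches = int(batch.max()) + 1
    Phi = np.hstack([np.ones((n, 1)), np.eye(n_batches)[batch]])  # (n, 1 + B)
    penalty = ridge * np.diag([0.0] + [1.0] * n_batches)  # intercept is not penalized
    Z_corr = Z.astype(float)  # astype returns a copy
    for k in range(n_clusters):
        w = R[:, k]
        A = Phi.T @ (Phi * w[:, None]) + penalty
        W = np.linalg.solve(A, Phi.T @ (Z * w[:, None]))  # (1 + B, n_dims)
        W[0] = 0.0  # keep the shared intercept; remove only the batch terms
        Z_corr -= w[:, None] * (Phi @ W)
    return Z_corr, R


def batch_gap(M, batch, dim):
    """Absolute difference between the two batch means along one dimension."""
    return abs(M[batch == 1, dim].mean() - M[batch == 0, dim].mean())


if __name__ == "__main__":
    # Synthetic data: two "cell types", each present in two batches.
    rng = np.random.default_rng(1)
    centers = np.array([[5.0, 0, 0, 0, 0], [0, 5.0, 0, 0, 0]])
    cell_type = np.repeat([0, 1, 0, 1], 100)
    batch = np.repeat([0, 0, 1, 1], 100)
    Z = centers[cell_type] + rng.normal(scale=0.3, size=(400, 5))
    Z[batch == 1, 2] += 1.0  # synthetic batch shift along one dimension
    Z_corr, _ = toy_harmony_step(Z, batch)
    print(f"gap before: {batch_gap(Z, batch, 2):.3f}, after: {batch_gap(Z_corr, batch, 2):.3f}")
```

Because clustering groups cells by type rather than by batch here, each cluster contains both batches, and the per-cluster regression can separate the batch offset from the shared cell-type signal.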
Example
This animation shows the Harmony alignment of three single-cell RNA-seq datasets from different donors.
→ How to make this animation.
Installation
This package has been tested with Python 3.7.
Use pip to install:
```bash
pip install harmonypy
```
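To confirm the install without importing the package, the standard library's `importlib.metadata` can report the installed version. It requires Python 3.8+; on Python 3.7 the `importlib_metadata` backport from PyPI provides the same functions. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "harmonypy") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    if v is None:
        print("harmonypy is not installed; run: pip install harmonypy")
    else:
        print(f"harmonypy {v} is installed")
```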
Usage
Here is a brief example using the data that comes with the R package:
```python
# Load data
import numpy as np
import pandas as pd

meta_data = pd.read_csv("data/meta.tsv.gz", sep="\t")
vars_use = ['dataset']

# meta_data
#                  cell_id dataset  nGene  percent_mito cell_type
# 0    half_TGAAATTGGTCTAG    half   3664      0.017722    jurkat
# 1    half_GCGATATGCTGATG    half   3858      0.029228      t293
# 2    half_ATTTCTCTCACTAG    half   4049      0.015966    jurkat
# 3    half_CGTAACGACGAGAG    half   3443      0.020379    jurkat
# 4    half_ACGCCTTGTTTACC    half   2813      0.024774      t293
# ..                   ...     ...    ...           ...       ...
# 295  t293_TTACGTACGACACT    t293   4152      0.033997      t293
# 296  t293_TAGAATTGTTGGTG    t293   3097      0.021769      t293
# 297  t293_CGGATAACACCACA    t293   3157      0.020411      t293
# 298  t293_GGTACTGAGTCGAT    t293   2685      0.027846      t293
# 299  t293_ACGCTGCTTCTTAC    t293   3513      0.021240      t293

data_mat = pd.read_csv("data/pcs.tsv.gz", sep="\t")
data_mat = np.array(data_mat)

# data_mat[:5, :5]
# array([[ 0.0071695 , -0.00552724, -0.0036281 , -0.00798025,  0.00028931],
#        [-0.011333  ,  0.00022233, -0.00073589, -0.00192452,  0.0032624 ],
#        [ 0.0091214 , -0.00940727, -0.00106816, -0.0042749 , -0.00029096],
#        [ 0.00866286, -0.00514987, -0.0008989 , -0.00821785, -0.00126997],
#        [-0.00953977,  0.00222714, -0.00374373, -0.00028554,  0.00063737]])

# meta_data.shape  # 300 cells, 5 variables
# (300, 5)
# data_mat.shape   # 300 cells, 20 PCs
# (300, 20)

# Run Harmony
import harmonypy as hm
ho = hm.run_harmony(data_mat, meta_data, vars_use)

# Write the adjusted PCs to a new file.
# ho.Z_corr is stored as PCs x cells, so transpose to get one row per cell.
res = pd.DataFrame(ho.Z_corr).T
res.columns = ['X{}'.format(i + 1) for i in range(res.shape[1])]
res.to_csv("data/adj.tsv.gz", sep="\t", index=False)
```
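`run_harmony` expects one metadata row per cell (in the example, both `meta_data` and `data_mat` have 300 rows), and every name in `vars_use` to be a column of `meta_data`. A small pre-flight check like the one below can catch mismatches before a long run. The helper `check_harmony_inputs` is hypothetical and not part of harmonypy; it assumes the cells × PCs layout used above, and the data in the usage lines are synthetic.

```python
import numpy as np
import pandas as pd


def check_harmony_inputs(data_mat, meta_data, vars_use):
    """Hypothetical pre-flight check for a cells x PCs matrix and its metadata."""
    X = np.asarray(data_mat, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"data_mat must be 2-D, got shape {X.shape}")
    if X.shape[0] != len(meta_data):
        raise ValueError(
            f"data_mat has {X.shape[0]} rows but meta_data has {len(meta_data)}"
        )
    missing = [v for v in vars_use if v not in meta_data.columns]
    if missing:
        raise KeyError(f"columns missing from meta_data: {missing}")
    if not np.isfinite(X).all():
        raise ValueError("data_mat contains NaN or infinite values")
    return X


if __name__ == "__main__":
    # Synthetic stand-ins shaped like the example: 300 cells, 20 PCs.
    rng = np.random.default_rng(0)
    meta = pd.DataFrame({"dataset": np.repeat(["batch1", "batch2", "batch3"], 100)})
    pcs = rng.normal(size=(300, 20))
    X = check_harmony_inputs(pcs, meta, ["dataset"])
    print(X.shape)
    # With harmonypy installed, X and meta could then be passed to
    # hm.run_harmony(X, meta, ["dataset"]).
```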
Owner
- Name: Kamil Slowikowski
- Login: slowkow
- Kind: user
- Company: Mass General Brigham
- Website: https://slowkow.com
- Twitter: slowkow
- Repositories: 22
- Profile: https://github.com/slowkow
Computational biologist. Using transcriptomics to learn about inflammation and cancer.
GitHub Events
Total
- Issues event: 12
- Watch event: 23
- Issue comment event: 9
- Pull request event: 1
- Fork event: 5
Last Year
- Issues event: 12
- Watch event: 23
- Issue comment event: 9
- Pull request event: 1
- Fork event: 5
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Kamil Slowikowski | k****i@g****m | 57 |
| John Arevalo | j****o@g****m | 6 |
| Bo Li | b****8@m****u | 3 |
| Jonathan Manning | j****g@e****k | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 32
- Total pull requests: 11
- Average time to close issues: about 1 month
- Average time to close pull requests: 21 days
- Total issue authors: 30
- Total pull request authors: 7
- Average comments per issue: 2.84
- Average comments per pull request: 1.91
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 1
- Average time to close issues: 21 days
- Average time to close pull requests: N/A
- Issue authors: 5
- Pull request authors: 1
- Average comments per issue: 0.67
- Average comments per pull request: 5.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Zhangruiqi111 (2)
- liboxun (2)
- shihsama (1)
- deevdevil88 (1)
- Arkkienkeli (1)
- bholmes-datasight (1)
- FFB-Lab (1)
- LetiTode (1)
- Sayyam-Shah (1)
- dangeles (1)
- aichander (1)
- hurleyLi (1)
- gmoore5 (1)
- swemeshy (1)
- HelloWorldLTY (1)
Pull Request Authors
- johnarevalo (6)
- rilango (2)
- pinin4fjords (1)
- bli25 (1)
- tariqdaouda (1)
- hurleyLi (1)
- milescsmith (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 13,679 last month
- Total docker downloads: 1,694
- Total dependent packages: 20
- Total dependent repositories: 16
- Total versions: 9
- Total maintainers: 1
pypi.org: harmonypy
A data integration algorithm.
- Homepage: https://github.com/slowkow/harmonypy
- Documentation: https://harmonypy.readthedocs.io/
- License: GNU General Public License v3 or later (GPLv3+)
- Latest release: 0.0.10 (published over 1 year ago)
Dependencies
- numpy *
- pandas *
- scipy *
- numpy >=1.13
- pandas >=0.25
- scikit-learn >=0.21
- scipy >=1.2