harmonypy

🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.

https://github.com/slowkow/harmonypy

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ○
    .zenodo.json file
  • ○
    DOI references
  • ✓
    Academic publication links
    Links to: zenodo.org
  • ✓
    Committers with academic emails
    2 of 4 committers (50.0%) from academic institutions
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

bioinformatics data-integration data-science single-cell-analysis
Last synced: 6 months ago

Repository

Statistics
  • Stars: 224
  • Watchers: 5
  • Forks: 24
  • Open Issues: 3
  • Releases: 6
Topics
bioinformatics data-integration data-science single-cell-analysis
Created about 6 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Contributing License

README.md

harmonypy

Harmony is an algorithm for integrating multiple high-dimensional datasets.

harmonypy is a port of the harmony R package by Ilya Korsunsky.
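
The project description mentions fuzzy k-means. As a rough illustration of what "fuzzy" (soft) cluster assignment means, here is a minimal sketch in plain Python. It is a generic soft k-means step and not harmonypy's implementation; the function name `soft_assignments` and the `sigma` parameter are assumptions made for this example.

```python
import math

def soft_assignments(point, centroids, sigma=0.1):
    """Return soft cluster memberships for one point (they sum to 1)."""
    # Squared Euclidean distance from the point to each centroid.
    d2 = [sum((p - c) ** 2 for p, c in zip(point, centroid))
          for centroid in centroids]
    # Subtract the smallest distance before exponentiating,
    # which keeps the exponentials numerically stable.
    d2_min = min(d2)
    weights = [math.exp(-(d - d2_min) / sigma) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]

# A point close to the first centroid gets most of its weight there,
# but the second centroid still receives a small, non-zero share.
memberships = soft_assignments([0.0, 0.0], [[0.0, 0.1], [1.0, 1.0]], sigma=0.5)
print(memberships)
```

A larger `sigma` spreads the membership more evenly across clusters, while a smaller one approaches hard k-means assignment.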

Example

This animation shows the Harmony alignment of three single-cell RNA-seq datasets from different donors.

→ How to make this animation.

Installation

This package has been tested with Python 3.7.

Use pip to install:

```bash
pip install harmonypy
```

Usage

Here is a brief example using the data that comes with the R package:

```python
# Load data
import numpy as np
import pandas as pd

meta_data = pd.read_csv("data/meta.tsv.gz", sep = "\t")
vars_use = ['dataset']

# meta_data
#
#                  cell_id dataset  nGene  percent_mito cell_type
# 0    half_TGAAATTGGTCTAG    half   3664      0.017722    jurkat
# 1    half_GCGATATGCTGATG    half   3858      0.029228      t293
# 2    half_ATTTCTCTCACTAG    half   4049      0.015966    jurkat
# 3    half_CGTAACGACGAGAG    half   3443      0.020379    jurkat
# 4    half_ACGCCTTGTTTACC    half   2813      0.024774      t293
# ..                   ...     ...    ...           ...       ...
# 295  t293_TTACGTACGACACT    t293   4152      0.033997      t293
# 296  t293_TAGAATTGTTGGTG    t293   3097      0.021769      t293
# 297  t293_CGGATAACACCACA    t293   3157      0.020411      t293
# 298  t293_GGTACTGAGTCGAT    t293   2685      0.027846      t293
# 299  t293_ACGCTGCTTCTTAC    t293   3513      0.021240      t293

data_mat = pd.read_csv("data/pcs.tsv.gz", sep = "\t")
data_mat = np.array(data_mat)

# data_mat[:5,:5]
#
# array([[ 0.0071695 , -0.00552724, -0.0036281 , -0.00798025,  0.00028931],
#        [-0.011333  ,  0.00022233, -0.00073589, -0.00192452,  0.0032624 ],
#        [ 0.0091214 , -0.00940727, -0.00106816, -0.0042749 , -0.00029096],
#        [ 0.00866286, -0.00514987, -0.0008989 , -0.00821785, -0.00126997],
#        [-0.00953977,  0.00222714, -0.00374373, -0.00028554,  0.00063737]])

# meta_data.shape # 300 cells, 5 variables
# (300, 5)
#
# data_mat.shape  # 300 cells, 20 PCs
# (300, 20)

# Run Harmony
import harmonypy as hm
ho = hm.run_harmony(data_mat, meta_data, vars_use)

# Write the adjusted PCs to a new file.
res = pd.DataFrame(ho.Z_corr)
res.columns = ['X{}'.format(i + 1) for i in range(res.shape[1])]
res.to_csv("data/adj.tsv.gz", sep = "\t", index = False)
```
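
If the example data files are not at hand, inputs of the same shape can be generated synthetically. The sketch below assumes only NumPy and pandas. The helper names `make_toy_inputs` and `check_inputs`, the batch labels, and the size of the simulated batch offset are all hypothetical choices for illustration; none of them come from harmonypy.

```python
import numpy as np
import pandas as pd

def make_toy_inputs(n_per_batch=100, n_pcs=20, seed=0):
    """Build synthetic inputs: a (cells x PCs) array and matching metadata."""
    rng = np.random.default_rng(seed)
    batches = ["batch_a", "batch_b", "batch_c"]  # hypothetical labels
    blocks, labels = [], []
    for k, name in enumerate(batches):
        # Each batch gets its own constant offset to mimic a batch effect.
        blocks.append(rng.normal(loc=0.5 * k, scale=1.0,
                                 size=(n_per_batch, n_pcs)))
        labels.extend([name] * n_per_batch)
    data_mat = np.vstack(blocks)
    meta_data = pd.DataFrame({
        "cell_id": ["cell_{}".format(i) for i in range(len(labels))],
        "dataset": labels,
    })
    return data_mat, meta_data

def check_inputs(data_mat, meta_data, vars_use):
    """Raise ValueError if rows are misaligned or a variable is missing."""
    if data_mat.shape[0] != meta_data.shape[0]:
        raise ValueError("data_mat and meta_data must have one row per cell")
    missing = [v for v in vars_use if v not in meta_data.columns]
    if missing:
        raise ValueError("missing metadata columns: {}".format(missing))

data_mat, meta_data = make_toy_inputs()
vars_use = ["dataset"]
check_inputs(data_mat, meta_data, vars_use)
```

With harmonypy installed, these three objects can be passed to `hm.run_harmony(data_mat, meta_data, vars_use)` in the same way as the file-based inputs in the README example.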

Owner

  • Name: Kamil Slowikowski
  • Login: slowkow
  • Kind: user
  • Company: Mass General Brigham

Computational biologist. Using transcriptomics to learn about inflammation and cancer.

GitHub Events

Total
  • Issues event: 12
  • Watch event: 23
  • Issue comment event: 9
  • Pull request event: 1
  • Fork event: 5
Last Year
  • Issues event: 12
  • Watch event: 23
  • Issue comment event: 9
  • Pull request event: 1
  • Fork event: 5

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 67
  • Total Committers: 4
  • Avg Commits per committer: 16.75
  • Development Distribution Score (DDS): 0.149
Past Year
  • Commits: 3
  • Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
Kamil Slowikowski k****i@g****m 57
John Arevalo j****o@g****m 6
Bo Li b****8@m****u 3
Jonathan Manning j****g@e****k 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 32
  • Total pull requests: 11
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 21 days
  • Total issue authors: 30
  • Total pull request authors: 7
  • Average comments per issue: 2.84
  • Average comments per pull request: 1.91
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 1
  • Average time to close issues: 21 days
  • Average time to close pull requests: N/A
  • Issue authors: 5
  • Pull request authors: 1
  • Average comments per issue: 0.67
  • Average comments per pull request: 5.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Zhangruiqi111 (2)
  • liboxun (2)
  • shihsama (1)
  • deevdevil88 (1)
  • Arkkienkeli (1)
  • bholmes-datasight (1)
  • FFB-Lab (1)
  • LetiTode (1)
  • Sayyam-Shah (1)
  • dangeles (1)
  • aichander (1)
  • hurleyLi (1)
  • gmoore5 (1)
  • swemeshy (1)
  • HelloWorldLTY (1)
Pull Request Authors
  • johnarevalo (6)
  • rilango (2)
  • pinin4fjords (1)
  • bli25 (1)
  • tariqdaouda (1)
  • hurleyLi (1)
  • milescsmith (1)
Top Labels
Issue Labels
question (3) enhancement (2)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 13,679 last-month
  • Total docker downloads: 1,694
  • Total dependent packages: 20
  • Total dependent repositories: 16
  • Total versions: 9
  • Total maintainers: 1
pypi.org: harmonypy

A data integration algorithm.

  • Versions: 9
  • Dependent Packages: 20
  • Dependent Repositories: 16
  • Downloads: 13,679 Last month
  • Docker Downloads: 1,694
Rankings
Dependent packages count: 0.7%
Docker downloads count: 2.5%
Dependent repos count: 3.6%
Downloads: 4.1%
Average: 4.1%
Stargazers count: 5.7%
Forks count: 8.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

setup.py pypi
  • numpy *
  • pandas *
  • scipy *
requirements.txt pypi
  • numpy >=1.13
  • pandas >=0.25
  • scikit-learn >=0.21
  • scipy >=1.2
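
The minimum versions listed under requirements.txt can be checked against the current environment with the standard library alone. This is a rough sketch that assumes Python 3.8 or newer (for `importlib.metadata`). The helper names `version_tuple` and `check_requirements` are hypothetical, and the comparison looks only at the first three numeric components, so pre-release suffixes are not handled precisely.

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements.txt listing above.
MINIMUMS = {
    "numpy": "1.13",
    "pandas": "0.25",
    "scikit-learn": "0.21",
    "scipy": "1.2",
}

def version_tuple(text, width=3):
    """Turn '1.13' into (1, 13, 0), keeping at most `width` numeric parts."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:width]]
    return tuple(numbers + [0] * (width - len(numbers)))

def check_requirements(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        is_ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if is_ok else "too old"
    return report

for name, status in check_requirements().items():
    print(name, status)
```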