https://github.com/canzarlab/sphetcher

Sketching single-cell transcriptomic heterogeneity

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: nature.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Sketching single-cell transcriptomic heterogeneity

Basic Info
  • Host: GitHub
  • Owner: canzarlab
  • Language: C++
  • Default Branch: master
  • Homepage:
  • Size: 7.62 MB
Statistics
  • Stars: 6
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed over 3 years ago
Metadata Files
Readme

README.md

Sphetcher

A software package for sampling massive single-cell RNA sequencing datasets based on the spherical thresholding algorithm. It selects a small subset of cells, referred to as a sketch, that evenly covers the transcriptomic space occupied by the original dataset. Such a sketch can accelerate downstream analyses and highlight rare cell types.

Installation

A compiler that supports C++11 is needed to build Sphetcher. You can download and compile the latest code from GitHub as follows:

    git clone https://github.com/canzarlab/Sphetcher
    cd Sphetcher/src
    make

Running Sphetcher

To begin

First, provide a matrix in which rows are samples (cells) and columns are features such as genes, transcripts, or principal components (PCs).

Optionally, you can provide prior information (e.g., cell labels or collection time points) if you want to preserve a certain number of samples from each category.

An example of inputs is provided in the directory /data.

Usage

Once you have compiled Sphetcher it can be run easily with one of the following two options:

    sphetcher expression_matrix.csv sketch_size sketch_indicator_output.csv

or

    sphetcher expression_matrix.csv sketch_size class_labels.csv l_min sketch_indicator_output.csv

For an example provided in /data:

    sphetcher zeisel_pca.csv 1000 sketch_indicator_output.csv

or

    sphetcher zeisel_pca.csv 1000 zeisel_pca_labels.csv 3 sketch_indicator_output.csv
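When Sphetcher is called from a larger analysis script, the two invocation forms above can be assembled programmatically. The following is a minimal Python sketch, not part of the package: the helper names `build_sphetcher_command` and `run_sphetcher` are hypothetical, and the default binary location `./sphetcher` is an assumption about where `make` leaves the executable.

```python
import subprocess


def build_sphetcher_command(matrix_csv, sketch_size, output_csv,
                            labels_csv=None, l_min=None,
                            binary="./sphetcher"):
    """Assemble the argument list for one of the two documented forms.

    `binary` is an assumed path to the compiled executable.
    """
    # The class labels and l_min only make sense together.
    if (labels_csv is None) != (l_min is None):
        raise ValueError("labels_csv and l_min must be given together")

    cmd = [binary, matrix_csv, str(sketch_size)]
    if labels_csv is not None:
        cmd += [labels_csv, str(l_min)]
    cmd.append(output_csv)
    return cmd


def run_sphetcher(*args, **kwargs):
    """Run Sphetcher; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_sphetcher_command(*args, **kwargs), check=True)
```

For the example in /data, `build_sphetcher_command("zeisel_pca.csv", 1000, "sketch_indicator_output.csv", "zeisel_pca_labels.csv", 3)` yields the same arguments as the last command above.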

Input/Output formats

Input:

expression_matrix.csv : expression matrix in comma-separated values (CSV) format: rows are cells, columns are features.
sketch_size : number of samples to obtain from the data set.
class_labels.csv : prior information stored in a column vector; each class is represented by an integer between 1 and K, where K is the number of classes.
l_min : minimum number of representatives to sample from each class.
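Cell annotations are often stored as strings rather than integers, so they need to be recoded before they can serve as class_labels.csv. The following is a minimal Python sketch of that recoding. The function names are hypothetical, and the file layout (one integer per row, no header) is an assumption based on the "column vector" description above; compare it with the example labels file in /data before relying on it.

```python
import csv


def encode_labels(labels):
    """Map arbitrary labels to integers 1..K in order of first appearance."""
    mapping = {}
    codes = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping) + 1
        codes.append(mapping[label])
    return codes, mapping


def write_label_column(codes, path):
    """Write one integer per row (assumed layout: no header, one column)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for code in codes:
            writer.writerow([code])
```

Keeping the returned `mapping` makes it possible to translate the integer classes back to the original annotations after sketching.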

Output:

sketch_indicator_output.csv : an indicator vector of n samples where 1 indicates the sample is in the sketch, 0 otherwise (there are sketch_size 1's in the vector).
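The indicator vector can be turned back into a subset of the input matrix for downstream analysis. The following is a minimal Python sketch under two assumptions that are not confirmed by this README: the indicator file holds plain 0/1 values separated by newlines or commas, and the expression matrix has no header row, so that row i of the matrix corresponds to entry i of the indicator. All function names are hypothetical.

```python
import csv


def read_indicator(path):
    """Read the 0/1 flags from the indicator file, one flag per cell."""
    flags = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            for field in row:
                field = field.strip()
                if field:
                    # Accept both "1" and "1.0" style values.
                    flags.append(int(float(field)))
    return flags


def sketch_indices(flags):
    """Return the 0-based row indices of the cells in the sketch."""
    return [i for i, flag in enumerate(flags) if flag == 1]


def select_rows(matrix_path, indices):
    """Return the rows of the expression matrix that belong to the sketch."""
    wanted = set(indices)
    with open(matrix_path, newline="") as handle:
        return [row for i, row in enumerate(csv.reader(handle)) if i in wanted]
```

A quick sanity check after loading is that `len(sketch_indices(flags))` equals the requested sketch_size, as stated in the output description.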

Sketches of large single cell datasets

Adult mouse brain cells from Saunders et al. (2018) (665,858 cells)

6646 cells (1%) 26573 cells (4%) 66649 cells (10%)

Mouse organogenesis cell atlas (MOCA) from Cao et al. (2019) (2,026,641 cells)

20296 cells (1%) 81093 cells (4%)

Mouse nervous system from Zeisel et al. (2018) (465,281 cells)

4643 cells (1%) 18577 cells (4%) 46450 cells (10%)

Owner

  • Login: canzarlab
  • Kind: user

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1