https://github.com/canzarlab/sphetcher
Sketching single-cell transcriptomic heterogeneity
Sphetcher
A software package for sampling massive single-cell RNA sequencing datasets based on the spherical thresholding algorithm. It selects a small subset of cells, referred to as a sketch, that evenly covers the transcriptomic space occupied by the original dataset. Such a sketch can accelerate downstream analyses and highlight rare cell types.

Installation
A compiler that supports C++11 is needed to build Sphetcher. You can download and compile the latest code from GitHub as follows:
git clone https://github.com/canzarlab/Sphetcher
cd Sphetcher/src
make
Running Sphetcher
To begin
First, you need to provide a matrix where rows are samples (cells) and columns are features (genes, transcripts, or principal components (PCs)).
Additionally, you can provide prior information (e.g., cell label, collection time point) if you want to preserve a certain number of samples from each category.
An example of inputs is provided in the directory /data.
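As a rough sketch of preparing such inputs in Python (NumPy assumed available), the snippet below writes a cells-by-features matrix and an integer label vector to CSV. The function name write_inputs, the file paths, the absence of a header row, and the one-label-per-line layout are assumptions for illustration only; compare the result against the example files in /data before relying on it.

```python
import numpy as np

def write_inputs(matrix, labels, matrix_path, labels_path):
    """Write a cells-by-features matrix and integer class labels as CSV.

    Assumed layout (not confirmed by the README): no header row, no index
    column, one cell per row, one label per line.
    """
    matrix = np.asarray(matrix, dtype=float)
    labels = np.asarray(labels)
    if matrix.ndim != 2:
        raise ValueError("matrix must be 2-D: rows are cells, columns are features")
    if labels.shape != (matrix.shape[0],):
        raise ValueError("need exactly one label per cell")
    # Map arbitrary category values to integers 1..K (sorted category order).
    _, codes = np.unique(labels, return_inverse=True)
    codes = codes.reshape(-1) + 1
    np.savetxt(matrix_path, matrix, delimiter=",", fmt="%.6g")
    np.savetxt(labels_path, codes, fmt="%d")
    return codes
```

The returned codes make it possible to map the integer classes back to the original category names later.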
Usage
Once you have compiled Sphetcher, it can be run in one of the following two ways:
sphetcher expression_matrix.csv sketch_size sketch_indicator_output.csv
or
sphetcher expression_matrix.csv sketch_size class_labels.csv l_min sketch_indicator_output.csv
For the example provided in /data:
sphetcher zeisel_pca.csv 1000 sketch_indicator_output.csv
or
sphetcher zeisel_pca.csv 1000 zeisel_pca_labels.csv 3 sketch_indicator_output.csv
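If you prefer to drive Sphetcher from a script, the two call forms above could be wrapped roughly as follows. This is a minimal sketch: the helper names sphetcher_command and run_sphetcher are hypothetical, and the default binary path ./sphetcher is an assumption about where the build places the executable.

```python
import subprocess

def sphetcher_command(matrix_csv, sketch_size, output_csv,
                      labels_csv=None, l_min=None, binary="./sphetcher"):
    """Build the argument list for one of the two documented call forms."""
    # Class labels and l_min only make sense together (second call form).
    if (labels_csv is None) != (l_min is None):
        raise ValueError("labels_csv and l_min must be given together")
    cmd = [binary, str(matrix_csv), str(int(sketch_size))]
    if labels_csv is not None:
        cmd += [str(labels_csv), str(int(l_min))]
    cmd.append(str(output_csv))
    return cmd

def run_sphetcher(*args, **kwargs):
    """Run the binary; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(sphetcher_command(*args, **kwargs), check=True)
```

For instance, run_sphetcher("zeisel_pca.csv", 1000, "sketch_indicator_output.csv", labels_csv="zeisel_pca_labels.csv", l_min=3) would issue the second example command, provided the binary is in the current directory.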
Input/Output formats
Input:
expression_matrix.csv : expression matrix in comma-separated values (CSV) format; rows are cells, columns are features.
sketch_size : number of samples to obtain from the data set.
class_labels.csv : prior information stored as a column vector; each class is represented by an integer between 1 and K, where K is the number of classes.
l_min : minimum number of representatives to sample from each class.
Output:
sketch_indicator_output.csv : an indicator vector of n samples where 1 indicates the sample is in the sketch, 0 otherwise (there are sketch_size 1's in the vector).
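To use the sketch downstream, the indicator vector can be turned into row indices of the original matrix. The following is a minimal sketch in Python with NumPy; the helper names are hypothetical, and it assumes the output file holds plain 0/1 values, either one per line or comma-separated, with no header.

```python
import numpy as np

def load_sketch_indices(indicator_path):
    """Return the 0-based row indices of cells marked 1 in the indicator file."""
    indicator = np.loadtxt(indicator_path, delimiter=",", ndmin=1).ravel()
    if not np.isin(indicator, (0, 1)).all():
        raise ValueError("indicator file should contain only 0s and 1s")
    return np.flatnonzero(indicator == 1)

def class_counts(indices, labels):
    """Count sketch members per class label, e.g. to compare against l_min."""
    values, counts = np.unique(np.asarray(labels)[indices], return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))
```

The returned indices can then subset the expression matrix (for example, matrix[indices] on a NumPy array), and class_counts gives a quick check of how many representatives each class received.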
Sketches of large single cell datasets
Adult mouse brain cells from Saunders et al. (2018) (665,858 cells)
6646 cells (1%)
26573 cells (4%)
66649 cells (10%)
Mouse organogenesis cell atlas (MOCA) from Cao et al. (2019) (2,026,641 cells)
20296 cells (1%)
81093 cells (4%)
Mouse nervous system from Zeisel et al. (2018) (465,281 cells)