Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: lijc0804
  • License: agpl-3.0
  • Language: Python
  • Default Branch: master
  • Size: 13.7 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation Security

README.md

Pyro-Velocity

Pyro-Velocity logo

|         |     |
| ------- | --- |
| CI/CD   | [![CI - Test](https://github.com/pinellolab/pyrovelocity/actions/workflows/tests.yml/badge.svg)](https://github.com/pinellolab/pyrovelocity/actions/workflows/tests.yml) [![CML](https://github.com/pinellolab/pyrovelocity/actions/workflows/cml.yml/badge.svg)](https://github.com/pinellolab/pyrovelocity/actions/workflows/cml.yml) [![pre-commit.ci status](https://results.pre-commit.ci/badge/github/pinellolab/pyrovelocity/master.svg)](https://results.pre-commit.ci/latest/github/pinellolab/pyrovelocity/master) |
| Docs    | [![Documentation Status](https://readthedocs.org/projects/pyrovelocity/badge/?version=latest)](https://pyrovelocity.readthedocs.io/en/latest/?badge=latest) [![Preprint](https://img.shields.io/badge/doi-10.1101/2022.09.12.507691v2-B31B1B)](https://doi.org/10.1101/2022.09.12.507691) |
| Package | [![PyPI - Version](https://img.shields.io/pypi/v/pyrovelocity.svg?logo=pypi&label=PyPI&logoColor=gold)](https://pypi.org/project/pyrovelocity/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pyrovelocity.svg?logo=python&label=Python&logoColor=gold)](https://pypi.org/project/pyrovelocity/) [![Docker image](https://img.shields.io/badge/docker-image-blue?logo=docker)](https://github.com/pinellolab/pyrovelocity/pkgs/container/pyrovelocity) |
| Meta    | [![codecov](https://codecov.io/gh/pinellolab/pyrovelocity/branch/master/graph/badge.svg)](https://codecov.io/gh/pinellolab/pyrovelocity) [![code style - Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![License - AGPL 3](https://img.shields.io/badge/license-AGPL%203-purple)](https://spdx.org/licenses/) [![Tuple](https://img.shields.io/badge/Tuple%20❤️%20OSS-5A67D8?logo=tuple)](https://tuple.app/github-badge) |

Pyro-Velocity is a Bayesian, generative, and multivariate RNA velocity model to estimate uncertainty in predictions of future cell states from minimal models approximating splicing dynamics. This approach models raw sequencing counts with synchronized cell time across all expressed genes to provide quantifiable information on cell fate choice and developmental trajectory dynamics.

Features

  • Probabilistic modeling of RNA velocity
  • Direct modeling of raw spliced and unspliced read counts
  • Multiple uncertainty diagnostics, analyses, and visualizations
  • Synchronized cell time estimation across genes
  • Multivariate denoised gene expression and velocity prediction

Velocity workflow comparison

Installation

Development

We currently support installation and usage in a 64-bit Linux development environment with access to a GPU. An infrastructure-as-code (IaC) setup that works with GCP is documented in reproducibility/environment/README.md. Before setting up a minimal development environment, please fork this repository and clone your fork to your development machine. Unless otherwise mentioned, all commands assume that your current working directory is the root of your local copy of your fork.

Please install mambaforge according to the instructions provided in conda-forge/miniforge.

You can then create a development environment with

    mamba env create [--prefix /path_to_conda_environment] -f conda/environment-gpu.yml

This step takes about 10 minutes depending on network speed.

Make sure you are able to successfully activate the installed environment

    conda activate pyrovelocity-gpu
    which python

by checking that the output of which python following activation refers to the python binary in the correct conda environment.
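The same check can also be made from inside Python. The snippet below is a minimal sketch using only the standard library; the helper name `describe_interpreter` is hypothetical and not part of pyrovelocity. It assumes that `conda activate` exports the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` environment variables.

```python
import os
import sys


def describe_interpreter():
    """Return where the running interpreter lives and which conda env is active."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Both variables are set by `conda activate`; None if no env is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")

# In a correctly activated environment, sys.prefix should match CONDA_PREFIX.
conda_prefix = info["conda_prefix"]
in_active_env = conda_prefix is not None and os.path.realpath(
    info["prefix"]
) == os.path.realpath(conda_prefix)
print("interpreter belongs to the active conda env:", in_active_env)
```

If the last line prints `False`, the shell is probably resolving `python` to an interpreter outside the environment.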

You can then install a development copy with

pip install --no-deps -e .

If this is successful, you will be able to

    import pyrovelocity

from a python interpreter.

User

Please see the documentation.

Quick start

After installation, let's look at how Pyro-Velocity can help you understand cell dynamics in your dataset.

Starting from raw sequencing FASTQ files, obtained for example with SMART-seq, 10X Genomics, inDrop, or other similar single-cell assays, you can preprocess the data to generate spliced and unspliced gene count tables in an h5ad file (or loom file) using cellranger+velocyto or the kallisto pipeline.

Starting from these count tables we show below a minimal step-by-step workflow to illustrate the main features of Pyro-Velocity in a Jupyter Notebook:

Step 0. Although Pyro-Velocity is a cluster-free method for evaluating uncertainty in cell fate, it depends on a two-dimensional embedding to evaluate uncertainty and to generate visualizations. We suggest first processing new datasets by following the scanpy tutorial.
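Before moving on, it can help to confirm which embeddings are already stored in your object. The sketch below assumes the `X_<basis>` naming convention for `adata.obsm` keys that the plotting code in this README relies on; `find_embeddings` is a hypothetical helper, not a pyrovelocity function, and it works on any collection of key names so that it can be tried without loading data.

```python
def find_embeddings(obsm_keys, candidates=("umap", "tsne")):
    """Return the candidate bases that have an `X_<basis>` entry in `obsm_keys`."""
    available = set(obsm_keys)
    return [basis for basis in candidates if f"X_{basis}" in available]


# With a real AnnData object this would be: find_embeddings(adata.obsm.keys())
example_keys = ["X_pca", "X_umap"]
print(find_embeddings(example_keys))  # ['umap']
```

An empty result suggests that the embedding step of the scanpy tutorial has not been run yet.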

Step 1. Load your data (e.g. local_file.h5ad) with scvelo:

    import scvelo as scv
    adata = scv.read("local_file.h5ad")

Step 2. Minimally preprocess your adata object:

    adata.layers['raw_spliced'] = adata.layers['spliced']
    adata.layers['raw_unspliced'] = adata.layers['unspliced']
    adata.obs['u_lib_size_raw'] = adata.layers['raw_unspliced'].toarray().sum(-1)
    adata.obs['s_lib_size_raw'] = adata.layers['raw_spliced'].toarray().sum(-1)
    scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000)
    scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
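The two `*_lib_size_raw` columns hold, for every cell, the total raw count summed over genes, which is what `.sum(-1)` computes on a cells-by-genes matrix. The toy example below illustrates that reduction in plain Python; the numbers are made up and `library_sizes` is a hypothetical helper used only for illustration.

```python
def library_sizes(counts):
    """Sum raw counts over genes (columns) for every cell (row)."""
    return [sum(row) for row in counts]


# Toy matrix: 3 cells x 4 genes of raw unspliced counts (made-up numbers).
raw_unspliced = [
    [0, 2, 1, 0],
    [5, 0, 0, 3],
    [1, 1, 1, 1],
]
u_lib_size_raw = library_sizes(raw_unspliced)
print(u_lib_size_raw)  # [3, 8, 4]
```

Because Pyro-Velocity models raw counts, these values are recorded before the filtering and normalization calls at the end of Step 2.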

Step 3. Train the Pyro-Velocity model:

```python
from pyrovelocity.api import train_model
from pyrovelocity.plot import vector_field_uncertainty

num_epochs = 1000  # large data
# num_epochs = 4000  # small data

# Model 1
adata_model_pos = train_model(adata,
                              max_epochs=num_epochs, svi_train=True, log_every=100,
                              patient_init=45,
                              batch_size=4000, use_gpu=0, cell_state='state_info',
                              include_prior=True,
                              offset=False,
                              library_size=True,
                              patient_improve=1e-3,
                              model_type='auto',
                              guide_type='auto_t0_constraint',
                              train_size=1.0)

# Or Model 2
adata_model_pos = train_model(adata,
                              max_epochs=num_epochs, svi_train=True, log_every=100,
                              patient_init=45,
                              batch_size=4000, use_gpu=0, cell_state='state_info',
                              include_prior=True,
                              offset=True,
                              library_size=True,
                              patient_improve=1e-3,
                              model_type='auto',
                              guide_type='auto',
                              train_size=1.0)

# adata_model_pos is a returned list in which the 0th element is the trained model
# and the 1st element is the posterior samples of all random variables
trained_model = adata_model_pos[0]
posterior_samples = adata_model_pos[1]
v_map_all, embeds_radian, fdri = vector_field_uncertainty(
    adata, posterior_samples=posterior_samples, basis="umap"
)

save_res = True
if save_res:
    trained_model.save('saved_model', overwrite=True)
    result_dict = {"adata_model_pos": posterior_samples,
                   "v_map_all": v_map_all,
                   "embeds_radian": embeds_radian,
                   "fdri": fdri}  # , "embed_mean": embed_mean}
    import pickle
    with open("posterior_samples.pkl", "wb") as f:
        pickle.dump(result_dict, f)
```
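Saving the results as a pickle means a later session can reload them without retraining. The round trip below is a minimal sketch with stand-in values; `save_results` and `load_results` are hypothetical helpers, and the file path is a temporary one rather than the file written in Step 3.

```python
import os
import pickle
import tempfile


def save_results(result_dict, path):
    """Write a results dictionary to `path` with pickle."""
    with open(path, "wb") as f:
        pickle.dump(result_dict, f)


def load_results(path):
    """Read a results dictionary back from `path`."""
    # Only unpickle files you created yourself: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in values; in the workflow these would be the arrays produced in Step 3.
toy_results = {"fdri": [0.01, 0.2, 0.04], "embeds_radian": [[0.1, 0.2, 0.3]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.pkl")
    save_results(toy_results, path)
    restored = load_results(path)

print(sorted(restored))  # ['embeds_radian', 'fdri']
```

In a real session you would call `load_results` on the pickle file produced by Step 3 and pass the restored entries to the plotting functions in Step 4.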

Step 4. Generate Pyro-Velocity's vector field and shared time plots with uncertainty estimation.

```python
from pyrovelocity.plot import plot_state_uncertainty
from pyrovelocity.plot import (
    plot_posterior_time, plot_gene_ranking,
    vector_field_uncertainty, plot_vector_field_uncertain,
    plot_mean_vector_field, project_grid_points, rainbowplot, denoised_umap,
    us_rainbowplot, plot_arrow_examples, set_colorbar,
)

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

embedding = 'umap'  # change to umap or tsne based on your embedding method

# This generates the posterior samples of all vector fields
# and statistical testing results from the Rayleigh test
v_map_all, embeds_radian, fdri = vector_field_uncertainty(adata, posterior_samples,
                                                          basis=embedding, denoised=False, n_jobs=30)
fig, ax = plt.subplots()

# This returns the posterior mean of the vector field
embed_mean = plot_mean_vector_field(posterior_samples, adata, ax=ax, n_jobs=30, basis=embedding)

# This plots single-cell level vector field uncertainty
# and averaged cell vector field uncertainty on the grid points
# based on angular standard deviation
fig, ax = plt.subplots(1, 2)
fig.set_size_inches(11.5, 5)
plot_vector_field_uncertain(adata, embed_mean, embeds_radian,
                            ax=ax,
                            fig=fig, cbar=False, basis=embedding, scale=None)

# This generates the shared time uncertainty plot with contour lines
fig, ax = plt.subplots(1, 3)
fig.set_size_inches(12, 2.8)
adata.obs['shared_time_uncertain'] = posterior_samples['cell_time'].std(0).flatten()
ax_cb = scv.pl.scatter(adata, c='shared_time_uncertain', ax=ax[0], show=False,
                       cmap='inferno', fontsize=7, s=20, colorbar=True, basis=embedding)
select = adata.obs['shared_time_uncertain'] > np.quantile(adata.obs['shared_time_uncertain'], 0.9)
# x and y are passed by keyword, which recent seaborn releases require
sns.kdeplot(x=adata.obsm[f'X_{embedding}'][:, 0][select],
            y=adata.obsm[f'X_{embedding}'][:, 1][select],
            ax=ax[0], levels=3, fill=False)

# This generates vector field uncertainty based on the Rayleigh test.
adata.obs.loc[:, 'vector_field_rayleigh_test'] = fdri
im = ax[1].scatter(adata.obsm[f'X_{embedding}'][:, 0],
                   adata.obsm[f'X_{embedding}'][:, 1], s=3, alpha=0.9,
                   c=adata.obs['vector_field_rayleigh_test'], cmap='inferno_r',
                   linewidth=0)
set_colorbar(im, ax[1], labelsize=5, fig=fig, position='right')
select = adata.obs['vector_field_rayleigh_test'] > np.quantile(adata.obs['vector_field_rayleigh_test'], 0.95)
sns.kdeplot(x=adata.obsm[f'X_{embedding}'][:, 0][select],
            y=adata.obsm[f'X_{embedding}'][:, 1][select],
            ax=ax[1], levels=3, fill=False)
ax[1].axis('off')
ax[1].set_title("vector field\nrayleigh test\nfdr<0.05: %s%%" % (round((fdri < 0.05).sum()/fdri.shape[0], 2)*100), fontsize=7)
```
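To make the two uncertainty measures used in Step 4 concrete, the sketch below computes a circular (angular) standard deviation and a Rayleigh test p-value for a set of vector directions in radians. It uses the textbook definitions, with the first-order approximation p ≈ exp(-n R²) for the Rayleigh test; this is an illustration under those assumptions and is not necessarily how pyrovelocity implements either quantity. All function names are hypothetical.

```python
import math


def mean_resultant_length(angles):
    """Length R of the mean unit vector of `angles` (radians), in [0, 1]."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    # Clamp to 1.0 to guard against floating-point overshoot.
    return min(1.0, math.hypot(c, s))


def circular_std(angles):
    """Angular standard deviation sqrt(-2 ln R); near 0 when all angles agree."""
    r = mean_resultant_length(angles)
    if r == 0.0:
        return math.inf
    return math.sqrt(-2.0 * math.log(r))


def rayleigh_pvalue(angles):
    """First-order approximation p ~ exp(-n * R^2) for the Rayleigh test."""
    n = len(angles)
    r = mean_resultant_length(angles)
    return math.exp(-n * r * r)


# 30 directions per cell, mirroring 30 posterior samples (toy values).
consistent = [0.50 + 0.01 * i for i in range(30)]       # tightly clustered
scattered = [2 * math.pi * i / 30 for i in range(30)]   # evenly spread

for label, angles in (("consistent", consistent), ("scattered", scattered)):
    print(f"{label}: std={circular_std(angles):.3f}, p={rayleigh_pvalue(angles):.3g}")
```

A cell whose posterior vectors point in a consistent direction has a small angular standard deviation and a small Rayleigh p-value, while a cell with no preferred direction has a large standard deviation and a p-value close to 1.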

Step 5. Prioritize putative cell fate marker genes based on negative mean absolute error and Pearson correlation between denoised spliced expression and posterior mean shared time, and then visualize the top ones with rainbow plots:

```python
fig = plt.figure(figsize=(7.07, 4.5))
subfig = fig.subfigures(1, 2, wspace=0.0, hspace=0, width_ratios=[1.6, 4])
ax = fig.subplots(1)

# This generates the selected cell fate markers and outputs them in a DataFrame
volcano_data, _ = plot_gene_ranking([posterior_samples], [adata], ax=ax,
                                    time_correlation_with='st', assemble=True)

# This generates the rainbow plots for the selected markers.
_ = rainbowplot(volcano_data, adata, posterior_samples,
                subfig[1], data=['st', 'ut'], num_genes=4)
```

Illustrative examples of Pyro-Velocity analyses on different single-cell datasets

Pyro-Velocity applied to a PBMC dataset

See the data referred to here.

This is a scRNA-seq dataset of fully mature peripheral blood mononuclear cells (PBMC) generated using the 10X Genomics kit and containing 65,877 cells with 11 fully differentiated immune cell types. This dataset doesn't contain stem and progenitor cells or other signatures of an ongoing dynamical differentiation, so no consistent velocity flow should be detected.

Below we show the main output generated by Pyro-Velocity Model 1 analysis. Pyro-Velocity failed to detect high-confidence trajectories in the mature blood cell states, consistent with what is known about the biology underlying these cells.

Vector field with uncertainty

PBMC vector field uncertainty

These 6 plots show, from left to right: (1) cell types; (2) a stream plot of the Pyro-Velocity vector field based on the posterior mean of 30 posterior samples; (3) single-cell vector field examples showing all 30 posterior samples as vectors for 3 arbitrarily selected cells; (4) the single-cell vector field with uncertainty based on the angular standard deviation across 30 posterior samples; (5) the averaged vector field uncertainty from (4); and (6) the Rayleigh test of the posterior sample vector fields, where the title shows the expected false discovery rate using a 5% threshold.
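The percentage reported in the title of the Rayleigh test panel is a fraction of cells falling below the 5% threshold. As a hedged illustration, the sketch below adjusts per-cell p-values with the Benjamini-Hochberg procedure and then computes that fraction. The use of Benjamini-Hochberg here is an assumption made for the example, since the text above does not say which correction Pyro-Velocity applies; the p-values are made up and both function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def fraction_significant(fdr, threshold=0.05):
    """Fraction of entries strictly below `threshold`."""
    return sum(q < threshold for q in fdr) / len(fdr)


# Made-up per-cell p-values for five cells.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(pvalues)
share = fraction_significant(fdr)
print(f"fdr<0.05: {100 * share:.0f}%")  # fdr<0.05: 40%
```

In a dataset without real differentiation dynamics, such as the PBMC example, this percentage is expected to be low.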

The full example can be reproduced using the PBMC Jupyter notebook.

Pyro-Velocity applied to a pancreas development dataset

See the data referred to here.

Here we apply Pyro-Velocity to a single cell RNA-seq dataset of mouse pancreas in the E15.5 embryo developmental stage. This dataset was generated using the 10X genomics kit and contains 3,696 cells with 8 cell types including progenitor cells, intermediate and terminal cell states.

Below we show the main output generated by Pyro-Velocity Model 1 analysis. Pyro-Velocity was able to recover well-known developmental cell hierarchies, identifying cell trajectories that originate from ductal progenitor cells and culminate in the production of mature Alpha, Beta, Delta, and Epsilon cells.

Vector field with uncertainty

Pancreas vector field uncertainty

These 6 plots show, from left to right, the same analyses as in the example above.

Shared time with uncertainty

Pancreas vector field uncertainty

The left figure shows the average of 30 posterior samples for the cell shared time. The title of the figure shows the Spearman's correlation with the Cytotrace score, an orthogonal state-of-the-art method used to predict cell differentiation based on the number of expressed genes per cell (Gulati et al., Science 2020). The right figure shows the standard deviation across posterior samples of shared time.
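The two panels correspond to a per-cell mean and a per-cell standard deviation taken across posterior samples. The sketch below shows that reduction on a toy input with 3 posterior samples and 3 cells; the values are made up and `summarize_shared_time` is a hypothetical helper. It uses the population standard deviation, which matches NumPy's default for `.std(0)`.

```python
import statistics


def summarize_shared_time(samples):
    """Per-cell mean and standard deviation across posterior samples.

    `samples` is a list of posterior draws, each a list with one time per cell.
    """
    per_cell = list(zip(*samples))  # transpose: one tuple of draws per cell
    means = [statistics.fmean(values) for values in per_cell]
    stds = [statistics.pstdev(values) for values in per_cell]
    return means, stds


# Toy input: 3 posterior samples (rows) for 3 cells (columns).
toy_samples = [
    [0.1, 0.5, 0.9],
    [0.2, 0.5, 0.7],
    [0.3, 0.5, 0.8],
]
means, stds = summarize_shared_time(toy_samples)
for cell, (m, s) in enumerate(zip(means, stds)):
    print(f"cell {cell}: mean={m:.3f}, std={s:.3f}")
```

Cells whose shared time barely changes between posterior samples, like the middle cell here, get a standard deviation near zero.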

Gene selection and visualization

To uncover potential cell fate determinant marker genes of the mouse pancreas, we first select the top 300 genes with the best velocity model fit (measured by negative mean absolute error), then we rank the filtered genes using Pearson's correlation between denoised spliced expression and the posterior mean of the recovered shared time across cells.
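The two-stage selection can be sketched in a few lines of plain Python. The example below is an illustration under simplifying assumptions: gene names, scores, and expression values are made up, genes are sorted by signed correlation in descending order, and `rank_genes` is a hypothetical helper rather than the pyrovelocity implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_genes(neg_mae, expression, shared_time, top_n=300):
    """Keep the `top_n` best-fit genes, then sort them by correlation with time."""
    best_fit = sorted(neg_mae, key=neg_mae.get, reverse=True)[:top_n]
    scored = [(gene, pearson(expression[gene], shared_time)) for gene in best_fit]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: 4 genes measured in 5 cells ordered by shared time.
shared_time = [0.0, 0.25, 0.5, 0.75, 1.0]
expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [5, 4, 3, 2, 1],
    "geneC": [2, 2, 3, 2, 2],
    "geneD": [1, 3, 2, 5, 4],
}
neg_mae = {"geneA": -0.1, "geneB": -0.2, "geneC": -0.9, "geneD": -0.3}

ranked = rank_genes(neg_mae, expression, shared_time, top_n=3)
for gene, corr in ranked:
    print(f"{gene}: {corr:+.2f}")
```

Here `geneC` is dropped at the first stage because it has the worst model fit, and the remaining genes are ordered by how strongly their expression increases along shared time.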

Pancreas Volcano plot for gene selection

For the selected genes, their dynamics can be explored in depth using phase portraits, rainbow plots, and UMAP renderings of denoised spliced gene expression across cells.

Pancreas vector field uncertainty

The full example can be reproduced using the Pancreas Jupyter notebook.

Pyro-Velocity applied to the LARRY dataset

See the data referred to here.

This last example presents the analysis of a recent scRNA-seq dataset profiling mouse hematopoiesis at high resolution, thanks to lineage relationship information captured by the Lineage And RNA RecoverY (LARRY) system. LARRY leverages unique lentiviral barcodes that enable clonal tracing of cell fates over time (Weinreb et al. Cell 20).

Below we show the main output generated by Pyro-Velocity analysis.

Vector field with uncertainty

LARRY vector field uncertainty

These 5 plots show, from left to right: (1) cell types; (2) the clone progression vector field, built by using the centroid of cells that share a barcode to draw directed connections between consecutive physical time points; (3) the single-cell vector field with uncertainty based on the angular standard deviation across 30 posterior samples; (4) the averaged vector field uncertainty from (3); and (5) the Rayleigh test of the posterior sample vector fields, where the title shows the false discovery rate using a 5% threshold.
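The clone progression vector field in panel 2 can be illustrated with a small example: group cells by barcode and time point, take the centroid of each group in the embedding, and connect consecutive time points of the same clone. The sketch below follows that description under simplifying assumptions; the coordinates, barcodes, and time points are made up, and `clone_progression` is a hypothetical helper.

```python
from collections import defaultdict


def clone_progression(cells):
    """Build centroid-to-centroid vectors for each clone across time points.

    `cells` is an iterable of (barcode, time, x, y) tuples.
    Returns a list of (barcode, start_xy, displacement_xy) tuples.
    """
    groups = defaultdict(list)
    for barcode, time, x, y in cells:
        groups[(barcode, time)].append((x, y))

    # Centroid of every (clone, time point) group in the 2D embedding.
    centroids = defaultdict(dict)
    for (barcode, time), points in groups.items():
        cx = sum(p[0] for p in points) / len(points)
        cy = sum(p[1] for p in points) / len(points)
        centroids[barcode][time] = (cx, cy)

    # Connect consecutive time points within each clone.
    arrows = []
    for barcode, by_time in centroids.items():
        times = sorted(by_time)
        for t0, t1 in zip(times, times[1:]):
            (x0, y0), (x1, y1) = by_time[t0], by_time[t1]
            arrows.append((barcode, (x0, y0), (x1 - x0, y1 - y0)))
    return arrows


# Toy cells: (barcode, time point, embedding x, embedding y).
toy_cells = [
    ("c1", 2, 0.0, 0.0),
    ("c1", 2, 2.0, 0.0),
    ("c1", 4, 3.0, 2.0),
    ("c1", 6, 4.0, 4.0),
    ("c2", 2, 5.0, 5.0),
]
arrows = clone_progression(toy_cells)
for barcode, start, delta in arrows:
    print(barcode, start, delta)
```

Clones observed at a single time point, such as `c2` here, contribute no arrow because there is no later centroid to connect to.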

Shared time with uncertainty

To quantitatively assess the quality of the recovered shared time, we also considered the agreement of our method with Cospar, a state-of-the-art method specifically designed for predicting fate potency based on LARRY data.

Pancreas shared time uncertainty

The leftmost figure shows the Cospar fate potency score. The middle figure shows the average of 30 posterior samples of the Pyro-Velocity shared time per cell, and its title shows the Spearman's correlation between the cell latent shared time and the fate potency scores derived from Cospar. The right figure shows the standard deviation across posterior samples of shared time.

The full example can be reproduced using the LARRY Jupyter notebook.

Troubleshooting

If you are having an issue using pyrovelocity, please feel free to start a discussion or file an issue containing a minimal reproducible example (MRE).

Also see contributing.

Owner

  • Login: lijc0804
  • Kind: user
  • Company: Shanghai Jiao Tong University

Citation (CITATION.bib)

@ARTICLE{Qin2022-ls,
  title    = "{Pyro-Velocity}: Probabilistic {RNA} Velocity inference from
              single-cell data",
  author   = "Qin, Qian and Bingham, Eli and La Manno, Gioele and Langenau,
              David M and Pinello, Luca",
  journal  = "bioRxiv",
  pages    = "2022.09.12.507691",
  month    =  Sep,
  year     =  2022,
  language = "en",
  doi      = "10.1101/2022.09.12.507691"
}

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0