cube

Intuitive Nonparametric Gene Network Search Algorithm

https://github.com/connerlambden/cube

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary

Keywords

gene-network gene-relationships network-analysis scanpy single-cell-rna-seq
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: connerlambden
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 9.72 MB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Created over 4 years ago · Last pushed about 4 years ago
Metadata Files
Readme Citation

README.md

Cubé: Intuitive Gene Network Search Algorithm



How It Works

Given a single-cell dataset and one or two input genes, Cubé searches for simple, nonlinear gene-gene relationships and builds a regulation network informed by prior gene signatures. For example, Cubé might report that GeneA * GeneB ~= GeneC, potentially meaning that genes A & B coregulate to produce C, or that there is some other nonlinear relationship. Cubé then recursively feeds its outputs back into itself to create a gene network.
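The recursive step can be pictured with a small sketch. This is not Cubé's actual implementation: `expand_network` and `find_children` are hypothetical names, the toy search results are made up, and the loop grows the network level by level (an iterative equivalent of the recursion) using parameters named after `num_search_children` and `search_depth`.

```python
def expand_network(seed, find_children, num_search_children=2, search_depth=2):
    """Grow a network outward from `seed`, one search level at a time.

    `find_children(gene)` is a hypothetical callable returning a list of
    (child_gene, edge_weight) pairs, best candidates first.
    Returns a dict mapping (parent, child) -> edge_weight.
    """
    edges = {}
    seen = {seed}
    frontier = [seed]
    for _ in range(search_depth):
        next_frontier = []
        for gene in frontier:
            # Keep only the top candidates for this node.
            for child, weight in find_children(gene)[:num_search_children]:
                edges[(gene, child)] = weight
                if child not in seen:
                    # Newly found genes become search inputs at the next level.
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return edges


# Made-up search results standing in for a real gene search.
toy_results = {
    "ifng": [("tbx21", 0.1), ("gzmb", 0.2), ("il2", 0.9)],
    "tbx21": [("eomes", 0.3)],
    "gzmb": [("prf1", 0.2)],
}


def toy_find_children(gene):
    return toy_results.get(gene, [])


network = expand_network("ifng", toy_find_children,
                         num_search_children=2, search_depth=2)
print(sorted(network))
```

With two children per node and a depth of two, only the two best candidates of the seed are kept, and each of those is searched once more. The number of searches grows roughly as children to the power of depth, which is one plausible reason deeper searches get slow.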



Install

$ python3 -m pip install git+https://github.com/connerlambden/Cube.git


Running Cubé

```
from sccube import cube
import scanpy as sc

adata = sc.read_h5ad('myexpressiondata.h5ad')  # Load AnnData Object containing logged expression matrix
go_files = ['BioPlanet2019.tsv', 'GeneSigDB.tsv']  # Load Gene Signatures to Search In

cube.run_cube(adata=adata, seed_gene_1='ifng', seed_gene_2='tbx21', go_files=go_files,
              out_directory='CubéResults', num_search_children=4, search_depth=2)
```

Example Outputs


Inputs

adata: AnnData Object with logged expression matrix

seed_gene_1: Starting search gene of interest

seed_gene_2: Optional additional seed gene of interest (to search for seed_gene_1 * seed_gene_2)

go_files: List of pathway files to search in. Each edge in Cubé requires all connected genes to be present in at least 2 pathways. Example files are available to download, and more can be downloaded from Enrichr.

out_directory: Folder to put results in

num_search_children: How many search children to add to the network on each iteration. For example, a value of 2 will add two children to each node.

search_depth: Recursive search depth. Values above 2 may take a long time to run
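The pathway requirement on `go_files` can be illustrated with a short sketch. It assumes one reading of the rule (all genes on an edge must appear together in at least two gene sets) and assumes case-insensitive gene matching. The function names and the toy gene sets are hypothetical and are not taken from Cubé's code or from any real pathway file.

```python
def shared_pathways(genes, pathways):
    """Return the names of pathways that contain every gene in `genes`."""
    wanted = {g.lower() for g in genes}
    return sorted(
        name for name, members in pathways.items()
        if wanted <= {m.lower() for m in members}
    )


def edge_is_supported(genes, pathways, min_pathways=2):
    """True if the genes co-occur in at least `min_pathways` pathways."""
    return len(shared_pathways(genes, pathways)) >= min_pathways


# Made-up gene sets for illustration only.
toy_pathways = {
    "Pathway X": ["IFNG", "TBX21", "GZMB"],
    "Pathway Y": ["IFNG", "TBX21", "GZMB", "IL2"],
    "Pathway Z": ["IFNG", "IL2"],
}

print(shared_pathways(["ifng", "tbx21", "gzmb"], toy_pathways))
print(edge_is_supported(["ifng", "tbx21", "il2"], toy_pathways))
```

Supplying more pathway files gives candidate edges more chances to pass a filter like this, at the cost of a larger search space.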


Outputs

Cubé_data_table.csv: Table showing the genes, pathways, and weight for each edge in the network. Positive correlations will have small edge weights and negative correlations will have large edge weights.

*.graphml: Network file that can be visualized in programs like Cytoscape

Cubé_network.png: Network visualization where green edges are positive correlation & red edges are negative correlation. For better visualizations, we recommend loading the .graphml file into Cytoscape
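Besides Cytoscape, the .graphml file can also be inspected from Python with networkx, which is already a dependency. The sketch below is a minimal example under assumptions: `summarize_graphml` is a hypothetical helper, and the demo file and its `weight` attribute are stand-ins, since the attribute names Cubé actually writes are not documented here.

```python
import os
import tempfile

import networkx as nx


def summarize_graphml(path):
    """Load a GraphML file and return (node_count, edge_count, edge_attribute_names)."""
    graph = nx.read_graphml(path)
    attr_names = sorted({key for _, _, data in graph.edges(data=True) for key in data})
    return graph.number_of_nodes(), graph.number_of_edges(), attr_names


# Build a tiny stand-in file so the sketch runs without real Cubé output.
demo = nx.DiGraph()
demo.add_edge("ifng", "gzmb", weight=0.2)
demo.add_edge("tbx21", "gzmb", weight=0.2)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.graphml")
    nx.write_graphml(demo, demo_path)
    summary = summarize_graphml(demo_path)

print(summary)
```

To use it on real results, pass the path of the .graphml file found in your out_directory to `summarize_graphml` and check which edge attributes are present before filtering or plotting.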


Visualizing The Product of 2 Genes Using Scanpy

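```
import numpy as np
import scanpy as sc

# Visualizing Product of 2 Genes using Scanpy (assuming adata.X is logged and sparse)
gene1 = 'ifng'
gene2 = 'tbx21'
label = gene1 + ' * ' + gene2

# Keep only cells that express both genes; copy so .obs can be modified safely
expresses_both = (adata[:, gene1].X.toarray().flatten() > 0) & (adata[:, gene2].X.toarray().flatten() > 0)
adata_expressing_both = adata[expresses_both, :].copy()

# Summing logged values and exponentiating gives the product of the two genes
adata_expressing_both.obs[label] = np.exp(
    adata_expressing_both[:, gene1].X.toarray().flatten()
    + adata_expressing_both[:, gene2].X.toarray().flatten()
)
sc.pl.umap(adata_expressing_both, color=[label])
```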


Why Cubé?


Single-cell RNA sequencing has allowed for unprecedented resolution into the transcriptome of single cells. However, the sheer complexity of the data and high rates of dropout pose interpretive and computational challenges for extracting biological meaning and gene relationships. Many methods have been proposed for inferring gene regulatory networks, and their results can differ dramatically depending on the initial assumptions made 😬. Even in the case of unsupervised learning (UMAP) or clustering (Leiden), it’s not clear how to balance local/global structure or which data features are most important. Additionally, these “black-box” machine learning methods are closed to scrutiny of their inner workings, cannot lay out logical, understandable steps, and tend to be fragile to model parameters.

Cubé addresses the dropout issue by comparing a set of genes only in cells that have nonzero expression of every gene in the set. This removes the need for biased imputation methods and focuses each relationship on the relevant cells. Cubé addresses the interpretability problem by presenting solutions in the form expression(gene1) ~= expression(gene2) * expression(gene3), which succinctly expresses nonlinear relationships between specific genes in an understandable way without any pesky parameters. Since Cubé samples from the space of all possible nonlinear gene-gene pairs, results have high representational capacity and low ambiguity. Cubé is a descriptive search algorithm that optimizes for biologically & statistically informed gene patterns.
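The two ideas above (restricting to co-expressing cells, and testing a product relationship) can be sketched in a few lines of NumPy. This is an illustration under assumptions, not Cubé's scoring code: `product_relationship` is a hypothetical name, Pearson correlation is a stand-in because the statistic Cubé uses is not stated here, and the values are assumed to be log-transformed so that a product of genes becomes a sum.

```python
import numpy as np


def product_relationship(log_a, log_b, log_c):
    """Score log(A) + log(B) against log(C) using only co-expressing cells.

    Inputs are 1-D arrays of logged expression, one value per cell, where 0
    means no detected expression. Returns (correlation, number_of_cells_used).
    """
    log_a, log_b, log_c = (np.asarray(v, dtype=float) for v in (log_a, log_b, log_c))
    # Keep only cells in which all three genes are detected (sidesteps dropout).
    mask = (log_a > 0) & (log_b > 0) & (log_c > 0)
    n_cells = int(mask.sum())
    if n_cells < 3:
        return float("nan"), n_cells
    # A product in linear space is a sum in log space: log(A * B) = log A + log B.
    predicted = log_a[mask] + log_b[mask]
    corr = np.corrcoef(predicted, log_c[mask])[0, 1]
    return float(corr), n_cells


# Made-up data: gene C equals the product of A and B wherever all are detected.
log_a = np.array([1.0, 2.0, 0.0, 1.5, 3.0, 0.5])
log_b = np.array([0.5, 1.0, 2.0, 0.0, 1.0, 2.5])
log_c = log_a + log_b
log_c[2] = 0.0  # simulate dropout of gene C in one cell

corr, n_cells = product_relationship(log_a, log_b, log_c)
print(round(corr, 3), n_cells)
```

Returning the number of cells alongside the score is a deliberate choice: a relationship supported by only a handful of co-expressing cells deserves more caution than one seen across thousands.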


How It Works Under The Hood

[Figure not reproduced: Cubé]



Special Thanks to Vijay Kuchroo, Ana Anderson, Lloyd Bod, & Aviv Regev

Contact: conner@connerpro.com

Owner

  • Login: connerlambden
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Lambden
    given-names: Conner
    orcid:  https://orcid.org/0000-0003-0162-6622
title: "Cubé Intuitive Nonparametric Gene Network Search Algorithm"
version: 1.0.1
date-released: 2020-06-10


Dependencies

setup.py pypi
  • jit *
  • networkx *
  • numpy *
  • pandas *
  • scipy *