pyscenic

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.

https://github.com/aertslab/pyscenic

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.3%) to scientific vocabulary

Keywords

gene-regulatory-network single-cell transcription-factors transcriptomics

Keywords from Contributors

genomics interactive serializer cycles packaging network-simulation shellcodes hacking autograding observability
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: aertslab
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage: http://scenic.aertslab.org
  • Size: 34.8 MB
Statistics
  • Stars: 533
  • Watchers: 17
  • Forks: 195
  • Open Issues: 230
  • Releases: 47
Created almost 8 years ago · Last pushed 8 months ago
Metadata Files
Readme Contributing License Code of conduct

README.rst

pySCENIC
========

|buildstatus|_ |pypipackage|_ |docstatus|_


pySCENIC is a lightning-fast python implementation of the SCENIC_ pipeline (Single-Cell rEgulatory Network Inference and
Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from
single-cell RNA-seq data.

The pioneering work was done in R and results were published in Nature Methods [1]_.
A new and comprehensive description of this Python implementation of the SCENIC pipeline is available in Nature Protocols [4]_.

pySCENIC can be run on a single desktop machine but easily scales to multi-core clusters to analyze thousands of cells
in no time. The latter is achieved via the dask_ framework for distributed computing [2]_.

**Full documentation** for pySCENIC is available on `Read the Docs <http://pyscenic.readthedocs.io/en/latest/>`_.

----

pySCENIC is part of the SCENIC Suite of tools!
See the main `SCENIC website <http://scenic.aertslab.org>`_ for additional information and a full list of tools available.

----


News and releases
-----------------

0.12.1 | 2022-11-21
^^^^^^^^^^^^^^^^^^^

* Add support for running ``arboreto_with_multiprocessing.py`` with ``spawn`` instead of ``fork`` as the multiprocessing start method.
* Use ravel instead of flatten to avoid unnecessary memory copy in aucell
* Update Docker image file and add separated Docker file for pySCENIC with scanpy.
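
The ``ravel`` versus ``flatten`` change mentioned above can be illustrated with plain NumPy. This is a minimal sketch on a made-up array, not code taken from pySCENIC: ``flatten`` always allocates a new array, whereas ``ravel`` returns a view of the same memory whenever the data layout allows it.

```python
import numpy as np

# A small 2-D array standing in for a per-cell ranking matrix (hypothetical data).
rankings = np.arange(12, dtype=np.int32).reshape(3, 4)

flat_copy = rankings.flatten()  # always allocates a new array
flat_view = rankings.ravel()    # returns a view when the data is contiguous

# Check whether each flattened result still points at the original buffer.
copy_shares = np.shares_memory(rankings, flat_copy)
view_shares = np.shares_memory(rankings, flat_view)
print(copy_shares, view_shares)
```

For non-contiguous input ``ravel`` still has to copy, so the saving applies only when the array is already laid out contiguously.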

0.12.0 | 2022-08-16
^^^^^^^^^^^^^^^^^^^

* Only databases in Feather v2 format are supported now (`ctxcore <https://github.com/aertslab/ctxcore>`_ ``>= 0.2``),
  which allows the use of recent versions of pyarrow (``>=8.0.0``) instead of very old ones (``<0.17``).
  Databases in the new format can be downloaded from https://resources.aertslab.org/cistarget/databases/
  and end with ``*.genes_vs_motifs.rankings.feather`` or ``*.genes_vs_tracks.rankings.feather``.
* Support clustered motif databases.
* Use custom multiprocessing instead of dask, by default.
* Docker image uses python 3.10 and contains only needed pySCENIC dependencies for CLI usage.
* Remove unneeded scripts and notebooks for unused/deprecated database formats.
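
As a quick sanity check before starting a run, the file names of downloaded databases can be compared against the two suffixes listed above. The helper below is a hypothetical illustration (``looks_like_v2_ranking_db`` is not part of pySCENIC or ctxcore), and it inspects only the file name, not the on-disk Feather version.

```python
from pathlib import Path

# Suffixes named in the 0.12.0 release note for Feather v2 ranking databases.
SUPPORTED_SUFFIXES = (
    ".genes_vs_motifs.rankings.feather",
    ".genes_vs_tracks.rankings.feather",
)


def looks_like_v2_ranking_db(path):
    """Return True if the file name ends with one of the suffixes above.

    Hypothetical helper: a name check only, it does not open the file.
    """
    return Path(path).name.endswith(SUPPORTED_SUFFIXES)


# Example file names are made up for illustration.
print(looks_like_v2_ranking_db("dbs/example.genes_vs_motifs.rankings.feather"))
print(looks_like_v2_ranking_db("dbs/example.old_format.feather"))
```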

0.11.2 | 2021-05-07
^^^^^^^^^^^^^^^^^^^

* Split some core cisTarget functions out into a separate repository, `ctxcore <https://github.com/aertslab/ctxcore>`_. This is now a required package for pySCENIC.

0.11.1 | 2021-02-11
^^^^^^^^^^^^^^^^^^^

* Fix bug in motif url construction (#275)
* Fix for export2loom with sparse dataframe (#278)
* Fix sklearn t-SNE import (#285)
* Updates to Docker image (expose port 8787 for Dask dashboard)

0.11.0 | 2021-02-10
^^^^^^^^^^^^^^^^^^^

**Major features:**

* Updated arboreto_ release (GRN inference step) includes:

  * Support for sparse matrices (using the ``--sparse`` flag in ``pyscenic grn``, or passing a sparse matrix to ``grnboost2``/``genie3``).
  * Fixes to avoid dask metadata mismatch error

* Updated cisTarget:

  * Fix for metadata mismatch in ctx prune2df step
  * Support for databases in Apache Parquet format
  * Faster loading from feather databases
  * Bugfix: loading genes from a database (previously missing the last gene name in the database)

* Support for Anndata input and output

* Package updates:

  * Upgrade to newer pandas version
  * Upgrade to newer numba version
  * Upgrade to newer versions of dask, distributed

* Input checks and more descriptive error messages.

  * Check that regulons loaded are not empty.

* Bugfixes:

  * In the regulons output from the cisTarget step, the gene weights were incorrectly assigned to their respective target genes (PR #254).
  * Motif url construction fixed when running ctx without pruning
  * Compression of intermediate files in the CLI steps
  * Handle loom files with non-standard gene/cell attribute names
  * Reformat the genesig gmt input/output
  * Fix AUCell output to loom with non-standard loom attributes


0.10.4 | 2020-11-24
^^^^^^^^^^^^^^^^^^^

* Included new CLI option to add correlation information to the GRN adjacencies file. This can be called with ``pyscenic add_cor``.
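
To give an idea of what "correlation information" on an adjacency can look like, here is a toy sketch that attaches a Pearson correlation to each TF-target pair. It assumes a simple cells-by-genes matrix with made-up values and is not the implementation behind ``pyscenic add_cor``; the function name ``with_correlation`` is hypothetical.

```python
import numpy as np

# Toy expression matrix: rows are cells, columns are genes (hypothetical values).
genes = ["TF1", "G1", "G2"]
expr = np.array([
    [1.0, 2.0, 9.0],
    [2.0, 4.0, 7.0],
    [3.0, 6.0, 5.0],
    [4.0, 8.0, 3.0],
])
column = {gene: i for i, gene in enumerate(genes)}

# (TF, target, importance) triples, as an adjacency list might contain.
adjacencies = [("TF1", "G1", 0.9), ("TF1", "G2", 0.7)]


def with_correlation(adjacencies, expr, column):
    """Append the Pearson correlation between TF and target to each triple."""
    annotated = []
    for tf, target, importance in adjacencies:
        rho = np.corrcoef(expr[:, column[tf]], expr[:, column[target]])[0, 1]
        annotated.append((tf, target, importance, float(rho)))
    return annotated


annotated = with_correlation(adjacencies, expr, column)
for row in annotated:
    print(row)
```

In this toy data G1 rises with TF1 and G2 falls as TF1 rises, so the two links end up with correlations of opposite sign.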



See also the extended `Release Notes `_.

Overview
--------

The pipeline has three steps:

1. First, transcription factors (TFs) and their target genes, together defining a regulon, are derived using gene inference methods that rely solely on correlations between the expression of genes across cells. The arboreto_ package is used for this step.
2. These regulons are refined by pruning targets that do not have an enrichment for a corresponding motif of the TF, effectively separating direct from indirect targets based on the presence of cis-regulatory footprints.
3. Finally, the original cells are differentiated and clustered on the activity of these discovered regulons.
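
The idea behind step 3 can be sketched with a toy rank-based score: rank the genes of each cell by expression and measure how early the genes of a regulon are recovered in that ranking. The function below is a simplified illustration under assumed inputs (a single cell's expression vector and a list of gene indices); ``recovery_auc`` and ``top_n`` are hypothetical names, and this is not pySCENIC's AUCell implementation.

```python
import numpy as np


def recovery_auc(expression, gene_idx, top_n):
    """Normalized area under the recovery curve of a gene set for one cell.

    Genes are ranked from highest to lowest expression; the curve counts how
    many genes of the set have been seen at each of the first `top_n` ranks.
    """
    order = np.argsort(-expression, kind="stable")  # gene indices, best first
    hits = np.isin(order[:top_n], gene_idx)         # set members among top ranks
    recovered = np.cumsum(hits)                     # observed recovery curve
    # Best possible curve: every gene of the set sits at the very top.
    k = min(len(gene_idx), top_n)
    ideal = np.minimum(np.arange(1, top_n + 1), k)
    return float(recovered.sum() / ideal.sum())


regulon = [0, 1]  # indices of the regulon's target genes (made up)
cell_active = np.array([9.0, 8.0, 1.0, 0.0, 5.0])
cell_inactive = np.array([0.0, 1.0, 9.0, 8.0, 5.0])

score_active = recovery_auc(cell_active, regulon, top_n=3)
score_inactive = recovery_auc(cell_inactive, regulon, top_n=3)
print(score_active, score_inactive)
```

Cells in which the regulon's targets are among the most highly expressed genes score close to 1, and cells in which they are absent from the top ranks score 0, which gives a per-cell activity value that can be used for clustering.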

The largest speed improvement comes from the arboreto_ package used in step 1. It provides GRNBoost2, an alternative to GENIE3 [3]_, and can be controlled from within pySCENIC.


All the functionality of the original R implementation is available and in addition:

1. You can leverage multi-core and multi-node clusters using dask_ and its distributed_ scheduler.
2. We implemented a version of the recovery of input genes that takes into account weights associated with these genes.
3. Regulons (the regulatory networks that connect a TF with its target genes) whose targets are repressed are now also derived and used for cell enrichment analysis.
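
One plausible way to picture the third point is to split the targets of each TF by the sign of their correlation with that TF. The sketch below assumes that approach and uses made-up links; the function ``split_by_direction`` and its ``threshold`` value are hypothetical and are not pySCENIC defaults.

```python
from collections import defaultdict

# Hypothetical (TF, target, correlation) triples; values are illustrative only.
links = [
    ("TF1", "G1", 0.82),
    ("TF1", "G2", -0.64),
    ("TF1", "G3", 0.01),
    ("TF2", "G1", -0.40),
]


def split_by_direction(links, threshold=0.1):
    """Group targets per TF into activated and repressed sets.

    `threshold` is an arbitrary illustrative cutoff. Links whose absolute
    correlation falls below it are left out of both groups.
    """
    regulons = defaultdict(lambda: {"activating": [], "repressing": []})
    for tf, target, rho in links:
        if rho >= threshold:
            regulons[tf]["activating"].append(target)
        elif rho <= -threshold:
            regulons[tf]["repressing"].append(target)
    return dict(regulons)


regulons = split_by_direction(links)
print(regulons)
```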


Additional resources
--------------------

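For more information, please visit LCB_,
the main `SCENIC website <http://scenic.aertslab.org>`_,
or `SCENIC (R version) <https://github.com/aertslab/SCENIC>`_.
There is a tutorial to `create new cisTarget databases <https://github.com/aertslab/create_cisTarget_databases>`_.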
The CLI to pySCENIC has also been streamlined into a pipeline that can be run with a single command, using the Nextflow workflow manager.
There are two Nextflow implementations available:

* `SCENICprotocol`_: A Nextflow DSL1 implementation of pySCENIC alongside a basic "best practices" expression analysis. Includes details on pySCENIC installation, usage, and downstream analysis, along with detailed tutorials.
* `VSNPipelines`_: A Nextflow DSL2 implementation of pySCENIC with a comprehensive and customizable pipeline for expression analysis. Includes additional pySCENIC features (multi-runs, integrated motif- and track-based regulon pruning, loom file generation).


Acknowledgments
---------------

We are grateful to all providers of TF-annotated position weight matrices, in particular Martha Bulyk (UNIPROBE), Wyeth Wasserman and Albin Sandelin (JASPAR), BioBase (TRANSFAC), Scot Wolfe and Michael Brodsky (FlyFactorSurvey) and Timothy Hughes (cisBP).


References
----------

.. [1] Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods 14, 1083–1086 (2017). `doi:10.1038/nmeth.4463 <https://doi.org/10.1038/nmeth.4463>`_
.. [2] Rocklin, M. Dask: parallel computation with blocked algorithms and task scheduling. conference.scipy.org
.. [3] Huynh-Thu, V. A. et al. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, e12776 (2010). `doi:10.1371/journal.pone.0012776 <https://doi.org/10.1371/journal.pone.0012776>`_
.. [4] Van de Sande, B., Flerin, C. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc. June 2020:1-30. `doi:10.1038/s41596-020-0336-2 <https://doi.org/10.1038/s41596-020-0336-2>`_

.. |buildstatus| image:: https://travis-ci.org/aertslab/pySCENIC.svg?branch=master
.. _buildstatus: https://travis-ci.org/aertslab/pySCENIC

.. |pypipackage| image:: https://img.shields.io/pypi/v/pySCENIC?color=%23026aab
.. _pypipackage: https://pypi.org/project/pyscenic/

.. |docstatus| image:: https://readthedocs.org/projects/pyscenic/badge/?version=latest
.. _docstatus: http://pyscenic.readthedocs.io/en/latest/?badge=latest

.. _SCENIC: http://scenic.aertslab.org
.. _dask: https://dask.pydata.org/en/latest/
.. _distributed: https://distributed.readthedocs.io/en/latest/
.. _arboreto: https://arboreto.readthedocs.io
.. _LCB: https://aertslab.org
.. _`SCENICprotocol`: https://github.com/aertslab/SCENICprotocol
.. _`VSNPipelines`: https://github.com/vib-singlecell-nf/vsn-pipelines
.. _notebooks: https://github.com/aertslab/pySCENIC/tree/master/notebooks
.. _issue: https://github.com/aertslab/pySCENIC/issues/new
.. _PyPI: https://pypi.python.org/pypi/pyscenic

Owner

  • Name: aertslab
  • Login: aertslab
  • Kind: organization
  • Location: Leuven, Belgium

GitHub Events

Total
  • Commit comment event: 1
  • Issues event: 34
  • Watch event: 89
  • Issue comment event: 95
  • Push event: 1
  • Pull request event: 2
  • Fork event: 19
Last Year
  • Commit comment event: 1
  • Issues event: 34
  • Watch event: 89
  • Issue comment event: 95
  • Push event: 1
  • Pull request event: 2
  • Fork event: 19

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 613
  • Total Committers: 18
  • Avg Commits per committer: 34.056
  • Development Distribution Score (DDS): 0.385
Past Year
  • Commits: 27
  • Committers: 2
  • Avg Commits per committer: 13.5
  • Development Distribution Score (DDS): 0.037
Top Committers
Name                   Email          Commits
bramvandesande         v****m@g****m  377
Chris Flerin           c****n@g****m  119
Gert Hulselmans        g****s@k****e  45
Bram Van de Sande      b****e@u****m  15
Gert Hulselmans        g****s@k****e  14
Carmen Bravo           c****s@k****e  11
Bram Van de Sande      b****s         7
Chris Campbell Flerin  C****n@g****m  4
dweemx                 m****r@k****e  4
dependabot[bot]        4****]         4
Sara Aibar             2****r         3
cbravo                 c****b@g****m  2
simonvh                s****n@g****m  2
PascalP                p****t         2
u0125489               u****9@g****e  1
Lorena Mendez          p****8@g****m  1
Carlos Enriquez        c****6@g****m  1
dweemx                 m****r@k****e  1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 249
  • Total pull requests: 15
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 months
  • Total issue authors: 190
  • Total pull request authors: 5
  • Average comments per issue: 2.67
  • Average comments per pull request: 1.07
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 8
Past Year
  • Issues: 54
  • Pull requests: 3
  • Average time to close issues: 12 days
  • Average time to close pull requests: 1 minute
  • Issue authors: 36
  • Pull request authors: 1
  • Average comments per issue: 0.96
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • KOBE24DUNK (7)
  • wangjiawen2013 (7)
  • Sirin24 (5)
  • Erfan1369 (5)
  • yulchen810 (4)
  • JensFGG (4)
  • rayajallad (4)
  • GeneVector5 (3)
  • Flu09 (3)
  • Sayyam-Shah (3)
  • stefanerb89 (2)
  • ken-chen-18 (2)
  • robertzeibich (2)
  • baptisteavot-ukdri (2)
  • apal6 (2)
Pull Request Authors
  • dependabot[bot] (8)
  • sabeck123 (3)
  • carlos-a-enriquez (2)
  • MGMCN (1)
  • pcm32 (1)
Top Labels
Issue Labels
  • bug (149)
  • question (44)
  • results (12)
  • python (2)
  • duplicate (1)
Pull Request Labels
  • dependencies (8)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 4,173 last-month
  • Total docker downloads: 461
  • Total dependent packages: 6
  • Total dependent repositories: 14
  • Total versions: 63
  • Total maintainers: 3
pypi.org: pyscenic

Python implementation of the SCENIC pipeline for transcription factor inference from single-cell transcriptomics experiments.

  • Versions: 63
  • Dependent Packages: 6
  • Dependent Repositories: 14
  • Downloads: 4,173 Last month
  • Docker Downloads: 461
Rankings
Dependent packages count: 2.3%
Docker downloads count: 3.1%
Stargazers count: 3.4%
Forks count: 3.8%
Average: 3.9%
Dependent repos count: 3.9%
Downloads: 6.6%
Maintainers (3)
Last synced: 6 months ago

Dependencies

requirements.doc.txt pypi
  • restructuredtext_lint *
  • sphinx *
  • sphinx_rtd_theme *
requirements.txt pypi
  • aiohttp *
  • arboreto >=0.1.6
  • attrs *
  • boltons *
  • cloudpickle *
  • ctxcore >=0.2.0
  • cytoolz *
  • dask *
  • distributed *
  • frozendict *
  • fsspec *
  • interlap *
  • llvmlite *
  • loompy *
  • multiprocessing_on_dill *
  • networkx *
  • numba >=0.51.2
  • numpy *
  • pandas >=1.3.5
  • pyyaml *
  • requests *
  • scikit-learn >=0.22.2
  • scipy *
  • setuptools *
  • tqdm *
  • umap-learn *
Dockerfile docker
  • python 3.10.6-slim-bullseye build
requirements_docker_with_scanpy.txt pypi
  • MulticoreTSNE ==0.1
  • ipykernel ==6.15.1
  • louvain ==0.7.1
  • papermill ==2.3.4
  • scanpy ==1.9.1
setup.py pypi