arboreto

A scalable Python-based framework for gene regulatory network inference using tree-based ensemble regressors.

https://github.com/aertslab/arboreto

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.0%) to scientific vocabulary

Keywords

dask ensemble-learning gene-regulation gradient-boosting inference machine-learning network python random-forest scalable
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: aertslab
  • License: bsd-3-clause
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 63.9 MB
Statistics
  • Stars: 47
  • Watchers: 6
  • Forks: 24
  • Open Issues: 26
  • Releases: 2
Created over 8 years ago · Last pushed almost 2 years ago

README.rst

.. image:: img/arboreto.png
    :alt: arboreto
    :scale: 100%
    :align: left

.. image:: https://travis-ci.com/aertslab/arboreto.svg?branch=master
    :alt: Build Status
    :target: https://travis-ci.com/aertslab/arboreto

.. image:: https://readthedocs.org/projects/arboreto/badge/?version=latest
    :alt: Documentation Status
    :target: http://arboreto.readthedocs.io/en/latest/?badge=latest

.. image:: https://anaconda.org/bioconda/arboreto/badges/version.svg
    :alt: Bioconda package
    :target: https://anaconda.org/bioconda/arboreto

.. image:: https://img.shields.io/pypi/v/arboreto
    :alt: PyPI package
    :target: https://pypi.org/project/arboreto/

----

.. epigraph::

    *The most satisfactory definition of man from the scientific point of view is probably Man the Tool-maker.*

.. _arboreto: https://arboreto.readthedocs.io
.. _`arboreto documentation`: https://arboreto.readthedocs.io
.. _notebooks: https://github.com/tmoerman/arboreto/tree/master/notebooks
.. _issue: https://github.com/tmoerman/arboreto/issues/new

.. _dask: https://dask.pydata.org/en/latest/
.. _`dask distributed`: https://distributed.readthedocs.io/en/latest/

.. _GENIE3: http://www.montefiore.ulg.ac.be/~huynh-thu/GENIE3.html
.. _`Random Forest`: https://en.wikipedia.org/wiki/Random_forest
.. _ExtraTrees: https://en.wikipedia.org/wiki/Random_forest#ExtraTrees
.. _`Stochastic Gradient Boosting Machine`: https://en.wikipedia.org/wiki/Gradient_boosting#Stochastic_gradient_boosting
.. _`early-stopping`: https://en.wikipedia.org/wiki/Early_stopping

Inferring a gene regulatory network (GRN) from gene expression data is a computationally expensive task, exacerbated by increasing data sizes due to advances
in high-throughput gene profiling technology.

The arboreto_ software library addresses this issue with a computational strategy for running the class of GRN inference algorithms
exemplified by GENIE3_ [1] on hardware ranging from a single computer to a multi-node compute cluster. Algorithms in this class perform
one step per target gene in the dataset: a regression model is trained to predict the target gene's expression profile from a set of
candidate regulators, and the most important regulators are then determined from that model.
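
The per-target procedure described above can be illustrated with a minimal sketch. It is not arboreto's implementation: it assumes scikit-learn's ``RandomForestRegressor`` as the regression model, and the function name ``infer_links``, its parameters, and the toy data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def infer_links(expression, gene_names, regulator_names, n_trees=50, seed=0):
    """Return (regulator, target, importance) links, strongest first.

    Hypothetical helper for illustration only, not part of arboreto.
    `expression` is a (cells x genes) array whose columns follow `gene_names`.
    """
    col = {name: i for i, name in enumerate(gene_names)}
    links = []
    for target in gene_names:
        # Candidate regulators: every regulator except the target itself.
        candidates = [r for r in regulator_names if r != target]
        if not candidates:
            continue
        X = expression[:, [col[r] for r in candidates]]
        y = expression[:, col[target]]
        # One regression model per target gene.
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(X, y)
        # The model's feature importances rank the candidates for this target.
        for reg, imp in zip(candidates, model.feature_importances_):
            links.append((reg, target, float(imp)))
    return sorted(links, key=lambda link: link[2], reverse=True)


# Made-up toy data: gene "g3" is constructed to depend on regulator "tf0".
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 5))
expr[:, 3] = 2.0 * expr[:, 0] + 0.1 * rng.normal(size=200)
genes = ["tf0", "tf1", "tf2", "g3", "g4"]

links = infer_links(expr, genes, regulator_names=["tf0", "tf1", "tf2"])
print(links[0])
```

Each pass through the loop uses only its own target column and the regulator columns, so the passes do not depend on each other and can run in any order or in parallel.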

Members of the above class of GRN inference algorithms are attractive from a computational point of view because they are parallelizable by nature. In arboreto,
we specify the parallelizable computation as a dask_ graph [2], a data structure that represents the task schedule of a computation. A dask scheduler assigns the
tasks in a dask graph to the available computational resources. Arboreto uses the `dask distributed`_ scheduler to
spread out the computational tasks over multiple processes running on one or multiple machines.
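
As a rough illustration of expressing such a computation as a dask graph, the sketch below builds one lazy task per target gene plus a final merge task using ``dask.delayed``. It makes several assumptions: the per-target step is a trivial stand-in (absolute Pearson correlation, not tree-based regression), the helper names ``score_target`` and ``merge`` are hypothetical, and the local threaded scheduler is used in place of the distributed scheduler that arboreto relies on.

```python
import numpy as np
from dask import delayed


def score_target(expression, target_idx, regulator_idx):
    """Stand-in for the per-target regression step (NOT tree-based).

    Scores each candidate regulator by absolute Pearson correlation
    with the target column. Hypothetical helper for illustration only.
    """
    y = expression[:, target_idx]
    return [
        (r, target_idx, abs(float(np.corrcoef(expression[:, r], y)[0, 1])))
        for r in regulator_idx
        if r != target_idx
    ]


def merge(per_target):
    """Flatten the per-target results into one ranked list of links."""
    links = [link for part in per_target for link in part]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Made-up toy data: column 4 is constructed to follow column 0.
rng = np.random.default_rng(1)
expr = rng.normal(size=(100, 5))
expr[:, 4] = expr[:, 0] + 0.05 * rng.normal(size=100)

# Build the graph lazily: one independent task per target, then one merge task.
tasks = [delayed(score_target)(expr, t, [0, 1, 2]) for t in range(expr.shape[1])]
graph = delayed(merge)(tasks)

# Nothing has been computed yet. A scheduler executes the graph on request;
# the threaded scheduler keeps this sketch on a single machine.
ranked = graph.compute(scheduler="threads")
print(ranked[0])
```

The graph only records which tasks exist and how they depend on each other, so the same definition can be handed to a different scheduler without changing the task functions.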

Arboreto currently supports two GRN inference algorithms:

1. **GRNBoost2**: a novel and fast GRN inference algorithm using `Stochastic Gradient Boosting Machine`_ (SGBM) [3] regression with `early-stopping`_ regularization.
2. **GENIE3**: the classic GRN inference algorithm using `Random Forest`_ (RF) or ExtraTrees_ (ET) regression.

Get Started
***********

Arboreto was conceived with the working bioinformatician or data scientist in mind. We provide extensive documentation and examples to help you get up to speed with the library.

* Read the `arboreto documentation`_.
* Browse example notebooks_.
* Report an issue_.

License
*******

BSD 3-Clause License

pySCENIC
********

.. _pySCENIC: https://github.com/aertslab/pySCENIC
.. _SCENIC: https://aertslab.org/#scenic

Arboreto is a component of pySCENIC_: a lightning-fast Python implementation of
the SCENIC_ pipeline [5] (Single-Cell rEgulatory Network Inference and Clustering)
which enables biologists to infer transcription factors, gene regulatory networks
and cell types from single-cell RNA-seq data.

References
**********

1. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P (2010) Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PLoS ONE 5(9): e12776.
2. Rocklin, M. (2015). Dask: parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference (pp. 130-136).
3. Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378.
4. Marbach, D., Costello, J. C., Kuffner, R., Vega, N. M., Prill, R. J., Camacho, D. M., ... & Dream5 Consortium. (2012). Wisdom of crowds for robust gene network inference. Nature methods, 9(8), 796-804.
5. Aibar S, Bravo Gonzalez-Blas C, Moerman T, Wouters J, Huynh-Thu VA, Imrichova H, Kalender Atak Z, Hulselmans G, Dewaele M, Rambow F, Geurts P, Aerts J, Marine C, van den Oord J, Aerts S. SCENIC: Single-cell regulatory network inference and clustering. Nature Methods 14, 1083–1086 (2017). doi: 10.1038/nmeth.4463

Owner

  • Name: aertslab
  • Login: aertslab
  • Kind: organization
  • Location: Leuven, Belgium

GitHub Events

Total
  • Watch event: 8
  • Issue comment event: 3
  • Pull request review event: 1
  • Fork event: 8
Last Year
  • Watch event: 8
  • Issue comment event: 3
  • Pull request review event: 1
  • Fork event: 8

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 200
  • Total Committers: 4
  • Avg Commits per committer: 50.0
  • Development Distribution Score (DDS): 0.045
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Thomas Moerman t****n@g****m 191
Chris Flerin c****n@g****m 7
redst4r r****r@w****e 1
Ann-Holmes 4****s 1

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 32
  • Total pull requests: 4
  • Average time to close issues: 2 months
  • Average time to close pull requests: 7 months
  • Total issue authors: 28
  • Total pull request authors: 4
  • Average comments per issue: 1.19
  • Average comments per pull request: 0.5
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tmoerman (3)
  • mr-september (2)
  • divyanshusrivastava (2)
  • rjb67 (1)
  • boegel (1)
  • jfouyang (1)
  • ZhangDengwei (1)
  • binonteji (1)
  • DHelix (1)
  • YimengQiao (1)
  • cflerin (1)
  • scyrusm (1)
  • topherconley (1)
  • scastlara (1)
  • Seandelao (1)
Pull Request Authors
  • Ann-Holmes (1)
  • gennadyFauna (1)
  • cflerin (1)
  • opoirion (1)
  • redst4r (1)
Top Labels
Issue Labels
enhancement (3) bug (1) deprecation (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 4,562 last-month
  • Total docker downloads: 472
  • Total dependent packages: 6
  • Total dependent repositories: 10
  • Total versions: 4
  • Total maintainers: 3
pypi.org: arboreto

Scalable gene regulatory network inference using tree-based ensemble regressors

  • Versions: 4
  • Dependent Packages: 6
  • Dependent Repositories: 10
  • Downloads: 4,562 Last month
  • Docker Downloads: 472
Rankings
Docker downloads count: 2.8%
Dependent packages count: 3.2%
Dependent repos count: 4.6%
Average: 5.9%
Downloads: 6.8%
Forks count: 8.0%
Stargazers count: 9.9%
Maintainers (3)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • dask *
  • distributed *
  • numpy >=1.16.5
  • pandas *
  • scikit-learn *
  • scipy *
setup.py pypi