cnr-flow

CUT&RUN-Flow, a Nextflow pipeline for QC, tag trimming, normalization, and peak calling for data from CUT&RUN experiments.

https://github.com/rennelab/cnr-flow

Keywords

chip-seq-pipelines cutandrun cutandrun-seq cutrun genomics nextflow peak-calling

Repository

Basic Info
  • Host: GitHub
  • Owner: RenneLab
  • License: gpl-3.0
  • Language: Nextflow
  • Default Branch: master
  • Homepage:
  • Size: 23.7 MB
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 4
  • Open Issues: 0
  • Releases: 1
Created over 5 years ago · Last pushed almost 4 years ago
Metadata Files
Readme License Citation

README.rst

***********************
CUT&RUN-Flow (CnR-flow)
***********************
.. image:: https://img.shields.io/github/v/release/rennelab/cnr-flow?include_prereleases&logo=github
   :target: https://github.com/rennelab/cnr-flow/releases
   :alt: GitHub release (latest by date including pre-releases)
.. image:: https://circleci.com/gh/RenneLab/CnR-flow.svg?style=shield&circle-token=0c2e0d49a95709cbb3f0bb8b7d8d05ffa4547d14
   :target: https://app.circleci.com/pipelines/github/RenneLab/CnR-flow
   :alt: CircleCI Build Status
.. image:: https://img.shields.io/readthedocs/cnr-flow?logo=read-the-docs
   :target: https://CnR-flow.readthedocs.io/en/latest/?badge=latest
   :alt: ReadTheDocs Documentation Status
.. image:: https://img.shields.io/badge/nextflow-%3E%3D20.10.6-green
   :target: https://www.nextflow.io/
   :alt: Nextflow Version Required >= 20.10.6
.. image:: https://img.shields.io/badge/License-GPLv3+-blue?logo=GNU
   :target: https://www.gnu.org/licenses/gpl-3.0.en.html
   :alt: GNU GPLv3+ License
.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.4015698.svg
   :target: https://doi.org/10.5281/zenodo.4015698
   :alt: Zenodo DOI:10.5281/zenodo.4015698

| Welcome to *CUT&RUN-Flow* (*CnR-flow*), a Nextflow pipeline for QC, tag 
  trimming, normalization, and peak calling for paired-end sequencing 
  data from CUT&RUN experiments.
| This software is available via GitHub, at 
  http://www.github.com/RenneLab/CnR-flow .
| Full project documentation is available at |docs_link|_.

Pipeline Design:
    | CUT&RUN-Flow is built using `Nextflow`_, a powerful 
      domain-specific workflow language built to create flexible and 
      efficient bioinformatics pipelines. 
      Nextflow provides extensive flexibility in utilizing cluster 
      computing environments such as `PBS`_ and `SLURM`_, 
      and in automated and compartmentalized handling of dependencies using 
      `Conda`_ / `Bioconda`_, `Docker`_, `Singularity`_ or `Environment Modules`_.
    
Dependencies:
    | In addition to local configurations, Nextflow handles 
      dependencies in separated working environments within the same pipeline 
      using `Conda`_ or `Environment Modules`_ within your working environment,
      or using container-encapsulated execution with `Docker`_ or `Singularity`_. 
      **CnR-flow is pre-configured to auto-acquire dependencies with no additional setup,
      either using Conda recipes from the Bioconda project, 
      or by using Docker or Singularity to execute Docker images hosted by the
      BioContainers project** (`Bioconda`_; `BioContainers`_).

    | CUT&RUN-Flow utilizes 
      `UCSC Genome Browser Tools`_ and  `Samtools`_
      for reference library preparation,
      `FastQC`_ for tag quality control,
      `Trimmomatic`_ for tag trimming, `Bowtie2`_ for tag alignment,
      `Samtools`_, `bedtools`_ and `UCSC Genome Browser Tools`_
      for alignment manipulation, and `MACS2`_ and/or `SEACR`_
      for peak calling, as well as their associated language subdependencies of
      Java, Python2/3, R, and C++.

Pipeline Features:
    * One-step reference database preparation using a path (or URL)
      to a FASTA file.
    * Ability to specify groups
      of samples containing both treatment (Ex: H3K4me3) and 
      control (Ex: IgG) antibody
      groups, with automated association of each control sample with the 
      respective treatment samples during the peak calling step
    * Built-in normalization
      protocol to normalize to a sequence library of the user's choice
      when spike-in DNA is used in the CUT&RUN Protocol (Optional, includes an 
      *E. coli* reference genome for utilization of *E. coli* 
      as a spike-in control as described by |Meers2019| 
      [see the |References| section of |docs_link|_])
    * OR: CPM-normalization to normalize total read counts between samples (beta).
    * Support for SLURM, PBS, and many other job scheduling environments, 
      enabled natively by Nextflow
    * Output of memory-efficient CRAM (alignment), 
      bedgraph (genome coverage), 
      and bigWig (genome coverage) file formats

    |pipe_dotgraph|
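
    Both normalization options listed above come down to multiplying
    coverage values by a per-sample scale factor. The following is a minimal
    sketch of that arithmetic, not CnR-flow's actual implementation. The
    function names are hypothetical, the spike-in constant of 10000 is an
    arbitrary assumption, and the spike-in formula (a constant divided by the
    number of reads aligned to the spike-in genome) is one common convention
    rather than something confirmed here.

    .. code-block:: bash

        # CPM scale factor: one million divided by the sample's total read count.
        cpm_scale_factor() {
            awk -v total="$1" 'BEGIN { printf "%.6f\n", 1000000 / total }'
        }

        # Spike-in scale factor: a constant divided by the number of reads
        # aligned to the spike-in genome. The default constant is an assumption.
        spike_in_scale_factor() {
            awk -v spike="$1" -v constant="${2:-10000}" \
                'BEGIN { printf "%.6f\n", constant / spike }'
        }

        # Multiply the coverage column (column 4) of a bedGraph stream
        # read from stdin by the given scale factor.
        scale_bedgraph() {
            awk -v scale="$1" 'BEGIN { OFS = "\t" } { $4 = $4 * scale; print }'
        }

    With these helpers, ``scale_bedgraph "$(cpm_scale_factor 2000000)" < in.bedgraph``
    would halve every coverage value for a sample with two million reads.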

| For a full list of required dependencies and tested versions, see 
  the |Dependencies| section of |docs_link|_, and for dependency 
  configuration options see the |Dependency Config| section.

.. _Quickstart:

Quickstart
------------
    Here is a brief introduction on how to install and get started using the pipeline. 
    For full details, see |docs_link|_.
    
    Prepare Task Directory:
        | Create a task directory, and navigate to it.
    
        .. code-block:: bash   
    
                $ mkdir ./my_task  # (Example)
                $ cd ./my_task     # (Example)
    
    Install Nextflow (if necessary):
        | Download the nextflow executable to your current directory.
        | (You can move the nextflow executable to a directory on $PATH for 
          future use.)
    
        .. code-block:: bash
    
            $ curl -s https://get.nextflow.io | bash
    
            # For the following steps, use one of:
            #   nextflow    # If the nextflow executable is on $PATH (assumed)
            #   ./nextflow  # If running the nextflow executable from the local directory
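
        Moving the executable onto ``$PATH`` could look like the sketch below.
        The helper name ``install_nextflow`` and the default location
        ``~/bin`` are assumptions chosen for illustration; any directory
        already on ``$PATH`` works equally well.

        .. code-block:: bash

            # install_nextflow [SRC] [DEST_DIR]   (hypothetical helper)
            # Moves the downloaded executable into DEST_DIR and puts DEST_DIR
            # at the front of $PATH for the current shell session.
            install_nextflow() {
                local src="${1:-./nextflow}"
                local dest_dir="${2:-${HOME}/bin}"
                mkdir -p "${dest_dir}"
                chmod +x "${src}"
                mv "${src}" "${dest_dir}/nextflow"
                # Affects this shell only; add the same export line to a
                # shell startup file (e.g. ~/.bashrc) to make it permanent.
                export PATH="${dest_dir}:${PATH}"
            }

        After calling ``install_nextflow``, ``command -v nextflow`` should
        report the new location.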
    
    Download and Install CnR-flow:
        | Nextflow will download and store the pipeline in the 
          user's Nextflow info directory (Default: ``~/.nextflow/``)
    
        .. code-block:: bash
    
            $ nextflow run RenneLab/CnR-flow --mode initiate    
    
    Configure, Validate, and Test:
        Conda: 
          * Install Miniconda (if necessary); see the `Conda`_ 
            site for installation instructions.
          * The CnR-flow configuration with Conda should then work "out-of-the-box."

        Docker:
          * Add '-profile docker' to all nextflow commands

        Singularity:
          * Add '-profile singularity' to all nextflow commands

        | If using an alternative configuration, see the |Dependency Config|
          section of |docs_link|_ for dependency configuration options.
        |
        | Once dependencies have been configured, validate all dependencies:
    
        .. code-block:: bash

            # Conda or other configs:    
            $ nextflow run CnR-flow --mode validate_all

            # OR Docker Configuration:    
            $ nextflow run CnR-flow -profile docker --mode validate_all

            # OR Singularity Configuration:    
            $ nextflow run CnR-flow -profile singularity --mode validate_all
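
        Because the only difference between the three variants is the
        ``-profile`` option, the command can be assembled by a small wrapper.
        This is a sketch only: the function name ``cnr_flow_cmd`` and the use
        of ``conda`` as a label for "no profile" are assumptions, and the
        function prints the command rather than running it.

        .. code-block:: bash

            # cnr_flow_cmd MODE [DEPENDENCY_MANAGER]   (hypothetical helper)
            # Prints the nextflow command for the given pipeline mode.
            cnr_flow_cmd() {
                local mode="$1"
                local manager="${2:-conda}"
                case "${manager}" in
                    conda)
                        # Conda (or other configs): no -profile option needed.
                        echo "nextflow run CnR-flow --mode ${mode}"
                        ;;
                    docker|singularity)
                        echo "nextflow run CnR-flow -profile ${manager} --mode ${mode}"
                        ;;
                    *)
                        echo "unknown dependency manager: ${manager}" >&2
                        return 1
                        ;;
                esac
            }

        For example, ``cnr_flow_cmd validate_all docker`` prints the Docker
        validation command shown above.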
    
        | Fill in the required task input parameters in "nextflow.config".
          For detailed setup instructions, see the |Task Setup| 
          section of |docs_link|_.
          *Additionally, for usage on SLURM, PBS, or other cluster systems, 
          configure your system executor, time, and memory settings.*
    
        .. code-block:: bash
    
            # Configure:
            $ <editor> nextflow.config   # Task Input, Steps, etc. Configuration (e.g. vim or nano)
        
            #REQUIRED values to enter (all others *should* work as default):
            # ref_fasta               (or some other ref-mode/location)
            # treat_fastqs            (input paired-end fastq[.gz] file paths)
            #   [OR fastq_groups]     (multi-group input paired-end .fastq[.gz] file paths)
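
        A quick pre-flight check that the required values have been filled in
        might look like the sketch below. It is an illustration under stated
        assumptions: the function name is hypothetical, it assumes the
        parameters appear as uncommented ``name = value`` lines, and it only
        covers the default case where ``ref_fasta`` is used (not an
        alternative ref-mode/location).

        .. code-block:: bash

            # check_required_params [CONFIG_FILE]   (hypothetical helper)
            # Returns 0 if ref_fasta and one of treat_fastqs / fastq_groups
            # are assigned on uncommented lines, 1 otherwise.
            check_required_params() {
                local config="${1:-nextflow.config}"
                local missing=0
                if ! grep -Eq '^[[:space:]]*ref_fasta[[:space:]]*=' "${config}"; then
                    echo "missing: ref_fasta" >&2
                    missing=1
                fi
                if ! grep -Eq '^[[:space:]]*(treat_fastqs|fastq_groups)[[:space:]]*=' "${config}"; then
                    echo "missing: treat_fastqs or fastq_groups" >&2
                    missing=1
                fi
                return "${missing}"
            }

        Running ``check_required_params nextflow.config`` before the
        ``dry_run`` mode gives an early hint about unset inputs; the pipeline's
        own ``dry_run`` remains the authoritative check.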
    
    Prepare and Execute Pipeline:
        | Prepare your reference database (and normalization reference) from .fasta[.gz]
          file(s): 
    
        .. code-block:: bash
    
            $ nextflow run CnR-flow --mode prep_fasta
    
        | Perform a test run to check inputs, parameter setup, and process execution:
    
        .. code-block:: bash
    
            $ nextflow run CnR-flow --mode dry_run
    
        | If satisfied with the pipeline setup, execute the pipeline:
    
        .. code-block:: bash
    
            $ nextflow run CnR-flow --mode run
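
        The three modes above can also be chained so that a failure in one
        stops the sequence. The wrapper below is a sketch: the function name
        and the ``NEXTFLOW_BIN`` variable are assumptions introduced here (the
        variable lets you point at ``./nextflow`` when the executable is not
        on ``$PATH``). Reviewing the ``dry_run`` output by hand before
        launching ``run`` is still advisable.

        .. code-block:: bash

            # Command used to invoke Nextflow (hypothetical variable).
            NEXTFLOW_BIN="${NEXTFLOW_BIN:-nextflow}"

            # run_cnr_flow_modes MODE...   (hypothetical helper)
            # Runs each CnR-flow mode in order and stops at the first failure.
            run_cnr_flow_modes() {
                local mode
                for mode in "$@"; do
                    echo ">>> CnR-flow mode: ${mode}"
                    "${NEXTFLOW_BIN}" run CnR-flow --mode "${mode}" || {
                        echo "Mode '${mode}' failed; stopping." >&2
                        return 1
                    }
                done
            }

        For example: ``run_cnr_flow_modes prep_fasta dry_run run``.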
    
        | Further documentation on CUT&RUN-Flow components, setup, and usage can
          be found in |docs_link|_.
    
.. |References| replace:: *References*
.. |Meers2019| replace:: *Meers et al. (eLife 2019)*
.. |Dependency Config| replace:: *Dependency Configuration*
.. |Dependencies| replace:: *Dependencies*
.. |Task Setup| replace:: *Task Setup*
.. |pipe_dotgraph| image:: build_info/dotgraph_parsed.png
    :alt: CUT&RUN-Flow Pipe Flowchart
.. |docs_link| replace:: CUT&RUN-Flow's ReadTheDocs
.. _docs_link: https://cnr-flow.readthedocs.io#

.. _Nextflow: http://www.nextflow.io
.. _Bioconda: https://bioconda.github.io/
.. _CUTRUNTools: https://bitbucket.org/qzhudfci/cutruntools/src
.. _SEACR: https://github.com/FredHutch/SEACR
.. _R: https://www.r-project.org/
.. _Bowtie2: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
.. _faCount: https://hgdownload.cse.ucsc.edu/admin/exe/
.. _Samtools: http://www.htslib.org/
.. _FastQC: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
.. _Trimmomatic: http://www.usadellab.org/cms/?page=trimmomatic
.. _bedtools: https://bedtools.readthedocs.io/en/latest/
.. _bedGraphToBigWig: https://hgdownload.cse.ucsc.edu/admin/exe/
.. _MACS2: https://github.com/macs3-project/MACS
.. _PBS: https://www.openpbs.org/
.. _SLURM: https://slurm.schedmd.com/
.. _CONDA: https://anaconda.org/
.. _Environment Modules: http://modules.sourceforge.net/
.. _Docker: http://www.docker.com/
.. _Singularity: https://sylabs.io/
.. _BioContainers: https://biocontainers.pro/
.. _UCSC Genome Browser Tools: https://hgdownload.cse.ucsc.edu/admin/exe/
.. _kseq_test: https://bitbucket.org/qzhudfci/cutruntools/src
.. _CUT&RUN-Tools: https://bitbucket.org/qzhudfci/cutruntools/src

Owner

  • Name: Renne Lab
  • Login: RenneLab
  • Kind: organization
  • Location: Gainesville, FL

Citation (Citations.bib)

@article{di2017nextflow,
  title={Nextflow enables reproducible computational workflows},
  author={Di Tommaso, Paolo and Chatzou, Maria and Floden, Evan W and Barja, Pablo Prieto and Palumbo, Emilio and Notredame, Cedric},
  journal={Nature biotechnology},
  volume={35},
  number={4},
  pages={316--319},
  year={2017},
  publisher={Nature Publishing Group}
}

@article{gruning2018bioconda,
  title={Bioconda: sustainable and comprehensive software distribution for the life sciences},
  author={Gr{\"u}ning, Bj{\"o}rn and Dale, Ryan and Sj{\"o}din, Andreas and Chapman, Brad A and Rowe, Jillian and Tomkins-Tinch, Christopher H and Valieris, Renan and K{\"o}ster, Johannes},
  journal={Nature methods},
  volume={15},
  number={7},
  pages={475--476},
  year={2018},
  publisher={Nature Publishing Group}
}

@article{zhu2019cut,
  title={CUT\&RUNTools: a flexible pipeline for CUT\&RUN processing and footprint analysis},
  author={Zhu, Qian and Liu, Nan and Orkin, Stuart H and Yuan, Guo-Cheng},
  journal={Genome biology},
  volume={20},
  number={1},
  pages={192},
  year={2019},
  publisher={Springer}
}

@article{meers2019peak,
  title={Peak calling by Sparse Enrichment Analysis for CUT\&RUN chromatin profiling},
  author={Meers, Michael P and Tenenbaum, Dan and Henikoff, Steven},
  journal={Epigenetics \& chromatin},
  volume={12},
  number={1},
  pages={42},
  year={2019},
  publisher={Springer}
}

@Manual{,
  title = {R: A Language and Environment for Statistical Computing},
  author = {{R Core Team}},
  organization = {R Foundation for Statistical Computing},
  address = {Vienna, Austria},
  year = {2017},
  url = {https://www.R-project.org/},
}

@article{10.1093/bioinformatics/btx192,
    author = {da Veiga Leprevost, Felipe and Grüning, Björn A and Alves Aflitos, Saulo and Röst, Hannes L and Uszkoreit, Julian and Barsnes, Harald and Vaudel, Marc and Moreno, Pablo and Gatto, Laurent and Weber, Jonas and Bai, Mingze and Jimenez, Rafael C and Sachsenberg, Timo and Pfeuffer, Julianus and Vera Alvarez, Roberto and Griss, Johannes and Nesvizhskii, Alexey I and Perez-Riverol, Yasset},
    title = "{BioContainers: an open-source and community-driven framework for software standardization}",
    journal = {Bioinformatics},
    volume = {33},
    number = {16},
    pages = {2580-2582},
    year = {2017},
    month = {03},
    abstract = "{BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on popular open-source projects Docker and rkt frameworks, that allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters).The software is freely available at github.com/BioContainers/.}",
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btx192},
    url = {https://doi.org/10.1093/bioinformatics/btx192},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/33/16/2580/25163480/btx192.pdf},
}

@article{langmead2012fast,
  title={Fast gapped-read alignment with Bowtie 2},
  author={Langmead, Ben and Salzberg, Steven L},
  journal={Nature methods},
  volume={9},
  number={4},
  pages={357},
  year={2012},
  publisher={Nature Publishing Group}
}

@article{kent2002human,
  title={The human genome browser at UCSC},
  author={Kent, W James and Sugnet, Charles W and Furey, Terrence S and Roskin, Krishna M and Pringle, Tom H and Zahler, Alan M and Haussler, David},
  journal={Genome research},
  volume={12},
  number={6},
  pages={996--1006},
  year={2002},
  publisher={Cold Spring Harbor Lab}
}

@article{li2009sequence,
  title={The sequence alignment/map format and SAMtools},
  author={Li, Heng and Handsaker, Bob and Wysoker, Alec and Fennell, Tim and Ruan, Jue and Homer, Nils and Marth, Gabor and Abecasis, Goncalo and Durbin, Richard},
  journal={Bioinformatics},
  volume={25},
  number={16},
  pages={2078--2079},
  year={2009},
  publisher={Oxford University Press}
}

@misc{andrews2015quality,
  title={FastQC: A quality control tool for high throughput sequence data},
  author={Andrews, Simon},
  year={2010}
}

@article{bolger2014trimmomatic,
  title={Trimmomatic: a flexible trimmer for Illumina sequence data},
  author={Bolger, Anthony M and Lohse, Marc and Usadel, Bjoern},
  journal={Bioinformatics},
  volume={30},
  number={15},
  pages={2114--2120},
  year={2014},
  publisher={Oxford University Press}
}

@article{quinlan2010bedtools,
  title={BEDTools: a flexible suite of utilities for comparing genomic features},
  author={Quinlan, Aaron R and Hall, Ira M},
  journal={Bioinformatics},
  volume={26},
  number={6},
  pages={841--842},
  year={2010},
  publisher={Oxford University Press}
}

@article{kent2010bigwig,
  title={BigWig and BigBed: enabling browsing of large distributed datasets},
  author={Kent, W James and Zweig, Ann S and Barber, G and Hinrichs, Angie S and Karolchik, Donna},
  journal={Bioinformatics},
  volume={26},
  number={17},
  pages={2204--2207},
  year={2010},
  publisher={Oxford University Press}
}

@article{zhang2008model,
  title={Model-based analysis of ChIP-Seq (MACS)},
  author={Zhang, Yong and Liu, Tao and Meyer, Clifford A and Eeckhoute, J{\'e}r{\^o}me and Johnson, David S and Bernstein, Bradley E and Nusbaum, Chad and Myers, Richard M and Brown, Myles and Li, Wei and others},
  journal={Genome biology},
  volume={9},
  number={9},
  pages={1--9},
  year={2008},
  publisher={BioMed Central}
}

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 227
  • Total Committers: 1
  • Avg Commits per committer: 227.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Dan Stribling ds@u****u 227
Committer Domains (Top 20 + Academic)
ufl.edu: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 6
  • Total pull requests: 0
  • Average time to close issues: 3 months
  • Average time to close pull requests: N/A
  • Total issue authors: 4
  • Total pull request authors: 0
  • Average comments per issue: 5.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mhriris (2)
  • EllieDuan (2)
  • zakiF (1)
  • Xinming-W (1)
Pull Request Authors
Top Labels
Issue Labels
bug (1)
Pull Request Labels