rich_directrna

Nanopore directRNA (hopefully cDNA too ;)) workflow with many transcript reconstruction alternatives

https://github.com/number-25/rich_directrna

Keywords

genomics nextflow transcriptomics transcriptomics-data-integration

Repository

Basic Info
  • Host: GitHub
  • Owner: number-25
  • License: mit
  • Language: HTML
  • Default Branch: dev
  • Size: 7.78 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 5 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Introduction

rich_directRNA is a Nextflow pipeline, still under development, for processing Oxford Nanopore direct RNA sequencing data. Given a reference genome and a transcriptome annotation, it offers several options for transcript reconstruction and quantification. It also performs assessment and recovery after transcriptome reconstruction.

The pipeline currently accepts only sequencing data from Oxford Nanopore Technologies (ONT) direct RNA libraries. Raw FASTQ files are the recommended input, but reads that have already been mapped can also be supplied in BAM format. Either kind of file is listed in the samplesheet passed as input. The pipeline runs the following steps:

  1. QC of FASTQ input files ( NANOQ, SEQUALI )
  2. Mapping to reference genome ( minimap2 )
  3. Sort and index alignments ( samtools )
  4. Create bigWig coverage files ( bedtools, bedGraphToBigWig )
  5. Extensive QC of alignments
    1. samtools
    2. cramino
    3. alfred
    4. ngs-bits
  6. Multiple transcriptome reconstruction options, with read correction options
    1. FLAIR - allows read correction
    2. bambu - very minor read correction
    3. IsoQuant - allows read correction
    4. StringTie
  7. Fusion gene detection ( JAFFA )
  8. Transcriptome assessment ( gffutils )
  9. Transcript quantification ( TranSigner, oarfish )

Small test datasets for the pipeline are included in the assets directory.
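
For orientation, steps 2 to 4 of the list above can be approximated by hand. The sketch below is not taken from the pipeline's modules: the function names, argument order and flags are assumptions chosen for illustration. The `-ax splice -uf -k14` preset is the one the minimap2 documentation suggests for Nanopore direct RNA reads, and the pipeline's own settings may differ.

```bash
# Hand-rolled sketch of steps 2-4. Function names and flags are assumptions,
# not the pipeline's actual module settings.

# Steps 2-3: splice-aware mapping of direct RNA reads, then sort and index.
map_and_sort() {
    local ref="$1" reads="$2" bam="$3"
    minimap2 -ax splice -uf -k14 "$ref" "$reads" \
        | samtools sort -o "$bam" -
    samtools index "$bam"
}

# Step 4: per-base coverage as bedGraph, then conversion to bigWig.
# -split keeps spliced-out introns from being counted as covered.
make_bigwig() {
    local ref="$1" bam="$2" prefix="$3"
    samtools faidx "$ref"                              # writes ${ref}.fai
    cut -f1,2 "${ref}.fai" > "${prefix}.chrom.sizes"   # name and length per contig
    bedtools genomecov -ibam "$bam" -bg -split \
        | LC_ALL=C sort -k1,1 -k2,2n > "${prefix}.bedGraph"
    bedGraphToBigWig "${prefix}.bedGraph" "${prefix}.chrom.sizes" "${prefix}.bw"
}

# Example calls (file names are hypothetical):
#   map_and_sort genome.fa reads.fastq.gz sample.bam
#   make_bigwig  genome.fa sample.bam sample
```

Defining the steps as functions keeps the sketch free of side effects until it is called, and requires minimap2, samtools, bedtools and bedGraphToBigWig on the PATH only at call time.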

Usage

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

Now, you can run the pipeline using:

```bash
nextflow run . \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>
```

To run a minimal, quick test dataset, use:

```bash
mkdir testing_dir

nextflow run . \
   -profile test,singularity \
   --outdir testing_dir \
   -c conf/test.config
```

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
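
As a minimal sketch of the params-file route mentioned in the warning: the two parameters below are the ones shown on the command line earlier (`--input`, `--outdir`), and the values `samplesheet.csv` and `results` are placeholders. The `RUN` switch is a hypothetical convenience added here so that the command is only printed by default.

```bash
# Pipeline parameters go in a params file (values are placeholders).
cat > params.yaml <<'EOF'
input: samplesheet.csv
outdir: results
EOF

# Non-parameter settings such as the profile stay on the command line.
set -- nextflow run . -profile singularity -params-file params.yaml

# Print the command by default; set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
else
    echo "$*"
fi
```

Keeping parameters in a file makes a run easier to repeat, since the same `params.yaml` can be reused with a different `-profile`.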

Credits

number-25/rich_directRNA was originally written by Dean Bašić.

We thank the following people for their extensive assistance in the development of this pipeline:

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: Dean Basic
  • Login: number-25
  • Kind: user
  • Company: University of Queensland Genome Innovation Hub

Computation, Evolution, Communes

Citation (CITATIONS.md)

# number-25/rich_directRNA: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Public event: 1
  • Push event: 53
Last Year
  • Public event: 1
  • Push event: 53