Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 7 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: icgc-argo-workflows
- License: mit
- Language: Python
- Default Branch: main
- Size: 2.01 MB
Statistics
- Stars: 0
- Watchers: 10
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Introduction
icgc-argo-workflows/prealnqc is a reproducible, best-practice bioinformatics analysis pipeline implementing the ICGC ARGO Pre Alignment QC Workflow for DNA/RNA sequencing reads.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, processes have been installed from nf-core/modules. ICGC ARGO-specific modules have been submitted to and installed from icgc-argo-workflows/argo-modules, in order to make them available to all ICGC ARGO pipelines.
Quick Start
- Install Nextflow (>=22.10.1).
- Install Docker.
- Test the workflow running in Local mode on a minimal dataset with a single command:

```bash
nextflow run icgc-argo-workflows/prealnqc -profile test,standard
```

- Test the workflow running in RDPC mode with a single command if you have access to the RDPC-QA env and have a valid api_token available:

```bash
nextflow run icgc-argo-workflows/prealnqc -profile rdpc_qa,testrdpcqa,standard --api_token <YOUR_API_TOKEN>
```

- Start running your own analysis!

If you are getting the input data from and sending output data to the ICGC-ARGO data center, and you have a valid api_token, you can run the workflow with:

```bash
nextflow run icgc-argo-workflows/prealnqc -profile <rdpc,rdpc_qa,rdpc_dev>,standard --api_token <YOUR_API_TOKEN>
```

Otherwise, you can provide the path to the input data in `samplesheet.csv` and run the workflow with:

```bash
nextflow run icgc-argo-workflows/prealnqc -profile standard --input samplesheet.csv --outdir <OUTDIR>
```
Pipeline summary
Depending on where the input data come from and where the output data are sent, the workflow can run in one of two modes: Local and RDPC. The major tasks performed in the workflow are:
- (RDPC mode only) Download input sequencing metadata/data from data center using SONG/SCORE client tools
- (RDPC mode only) Preprocess input sequencing reads (in FASTQ or BAM) into FASTQ file(s) per read group
- Perform FastQC analysis for FASTQ file(s) per read group
- Perform Cutadapt analysis for FASTQ file(s) per read group
- Perform MultiQC analysis to generate aggregated results
- (RDPC mode only) Generate SONG metadata for all collected QC metrics files and upload them to SONG/SCORE
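The mode-dependent task list above can be summarised in a few lines of Python. This is only an illustrative sketch: `TASKS`, `planned_tasks` and the task labels are hypothetical names that paraphrase the list, not identifiers from the pipeline's code.

```python
# Each entry: (task description, True if the task runs in RDPC mode only).
# Labels paraphrase the task list; they are not identifiers from the pipeline.
TASKS = [
    ("download input data with SONG/SCORE", True),
    ("preprocess reads into FASTQ per read group", True),
    ("FastQC per read group", False),
    ("Cutadapt per read group", False),
    ("MultiQC aggregation", False),
    ("generate SONG metadata and upload QC metrics", True),
]


def planned_tasks(local_mode):
    """Return the task descriptions that apply to the chosen mode."""
    return [name for name, rdpc_only in TASKS if not (local_mode and rdpc_only)]


for mode, flag in (("Local", True), ("RDPC", False)):
    print(f"{mode} mode:")
    for task in planned_tasks(flag):
        print(f"  - {task}")
```

In Local mode only the three QC tasks remain, which is why that mode needs nothing more than a sample sheet and an output directory.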
Inputs
Local mode
First, prepare a sample sheet with your input data that looks like the following:

sample_sheet.csv:

```csv
sample,lane,fastq_1,fastq_2,single_end(optional)
TEST,C0HVY.2,C0HVY.2_r1.fq.gz,C0HVY.2_r2.fq.gz,false
TEST,D0RE2.1,D0RE2.1_r1.fq.gz,D0RE2.1_r2.fq.gz
TEST,D0RH0.2,D0RH0.2_r1.fq.gz,D0RH0.2_r2.fq.gz
```
Each row represents a read_group of sequencing reads from a sample.
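Before launching a run, it can help to check that a sample sheet has the expected shape. The following is a minimal sketch using only the Python standard library, not the pipeline's own validation. It assumes that the real header uses `single_end` without the "(optional)" annotation, that the first four columns must be present, and that a missing `single_end` value means paired-end data. The function name `read_sample_sheet` is hypothetical.

```python
import csv
import io

# Assumption: these four columns must always appear in the header.
REQUIRED_COLUMNS = ("sample", "lane", "fastq_1", "fastq_2")


def read_sample_sheet(handle):
    """Parse a sample sheet and return one dict per read group."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")

    read_groups = []
    for line_no, row in enumerate(reader, start=2):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        # Pad short rows so an omitted single_end cell becomes an empty string.
        row = row + [""] * (len(header) - len(row))
        record = dict(zip(header, (cell.strip() for cell in row)))
        for col in ("sample", "lane", "fastq_1"):
            if not record[col]:
                raise ValueError(f"line {line_no}: empty value for '{col}'")
        # Assumption: a missing single_end value means paired-end (False).
        record["single_end"] = record.get("single_end", "").lower() == "true"
        read_groups.append(record)
    return read_groups


example = """sample,lane,fastq_1,fastq_2,single_end
TEST,C0HVY.2,C0HVY.2_r1.fq.gz,C0HVY.2_r2.fq.gz,false
TEST,D0RE2.1,D0RE2.1_r1.fq.gz,D0RE2.1_r2.fq.gz
TEST,D0RH0.2,D0RH0.2_r1.fq.gz,D0RH0.2_r2.fq.gz
"""

read_groups = read_sample_sheet(io.StringIO(example))
for rg in read_groups:
    print(rg["sample"], rg["lane"], rg["fastq_1"], rg["fastq_2"], rg["single_end"])
```

For a file on disk, pass an open handle instead, for example `open("sample_sheet.csv", newline="")`.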
Now, you can run the workflow using:
```bash
nextflow run icgc-argo-workflows/prealnqc \
    -profile <standard/singularity> \
    --local_mode true \
    --input sample_sheet.csv \
    --outdir <OUTDIR>
```
RDPC mode
You can run the workflow in RDPC mode by using:
```bash
nextflow run icgc-argo-workflows/prealnqc \
    -profile <rdpc,rdpc_qa,rdpc_dev>,<standard/singularity> \
    --local_mode false \
    --study_id <STUDY_ID> \
    --analysis_ids <ANALYSIS_IDS> \
    --api_token <YOUR_API_TOKEN> \
    --outdir <OUTDIR>
```
NOTE: Please provide workflow parameters via the CLI or the Nextflow `-params-file` option.
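Nextflow's `-params-file` option accepts a JSON or YAML file, so the Local mode parameters can be kept in a file rather than typed on the command line. The sketch below writes such a file with Python's standard library. The helper name `write_params_file` and the `results` output directory are hypothetical; the keys simply mirror the command-line flags shown above.

```python
import json
import tempfile
from pathlib import Path


def write_params_file(path, params):
    """Serialise workflow parameters as JSON, a format -params-file accepts."""
    path = Path(path)
    path.write_text(json.dumps(params, indent=2) + "\n", encoding="utf-8")
    return path


# Keys mirror the Local mode command-line flags.
local_params = {
    "local_mode": True,
    "input": "sample_sheet.csv",
    "outdir": "results",  # hypothetical output directory
}

# A temporary directory keeps the sketch side-effect free; any path works.
params_path = write_params_file(Path(tempfile.mkdtemp()) / "params.json", local_params)
print(
    "nextflow run icgc-argo-workflows/prealnqc "
    f"-profile standard -params-file {params_path}"
)
```

Keeping secrets such as the API token out of a shared parameters file and passing them separately is a reasonable precaution.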
Outputs
Upon completion, you can find the aggregated QC metrics in the following file:
`/path/to/outdir/prep_metrics/<sample_id>.argo_metrics.json`
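To inspect the metrics programmatically, something like the following sketch may be enough. It assumes only the path layout given above (`prep_metrics/<sample_id>.argo_metrics.json`) and that each file is valid JSON; it makes no assumption about the fields inside. The function name `load_argo_metrics` is hypothetical.

```python
import json
from pathlib import Path

SUFFIX = ".argo_metrics.json"


def load_argo_metrics(outdir):
    """Return {sample_id: parsed JSON} for every metrics file under outdir."""
    metrics_dir = Path(outdir) / "prep_metrics"
    if not metrics_dir.is_dir():
        return {}
    metrics = {}
    for path in sorted(metrics_dir.glob(f"*{SUFFIX}")):
        sample_id = path.name[: -len(SUFFIX)]
        with path.open(encoding="utf-8") as handle:
            metrics[sample_id] = json.load(handle)
    return metrics


# Replace the placeholder with the --outdir value used for the run.
for sample_id, content in load_argo_metrics("/path/to/outdir").items():
    print(f"{sample_id}: {len(content)} top-level entries")
```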
Credits
icgc-argo-workflows/prealnqc was mostly written by Linda Xiang (@lindaxiang), with contributions from Andrej Benjak, Charlotte Ng, Desiree Schnidrig, Edmund Su, Miguel Vazquez, Morgan Taschuk, Raquel Manzano Garcia, Romina Royo and the ICGC-ARGO Quality Control Working Group.
Authors (alphabetical):
- Andrej Benjak
- Charlotte Ng
- Desiree Schnidrig
- Edmund Su
- Linda Xiang
- Miguel Vazquez
- Morgan Taschuk
- Raquel Manzano Garcia
- Romina Royo
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Name: ICGC ARGO Workflows
- Login: icgc-argo-workflows
- Kind: organization
- Location: Toronto, Ontario
- Website: https://www.icgc-argo.org
- Repositories: 26
- Profile: https://github.com/icgc-argo-workflows
Home of the ICGC ARGO (Accelerate Research in Genomic Oncology) Data Platform Scientific Workflows
Citation (CITATIONS.md)
# icgc-argo-workflows/prealnqc: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

- [Cutadapt](https://cutadapt.readthedocs.io/en/stable/)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)
  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)
  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Release event: 1
- Delete event: 4
- Issue comment event: 1
- Push event: 3
- Pull request event: 2
- Create event: 1
Last Year
- Release event: 1
- Delete event: 4
- Issue comment event: 1
- Push event: 3
- Pull request event: 2
- Create event: 1