https://github.com/ciberer/nf-cbra-snvs

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: CIBERER
  • License: MIT
  • Language: Nextflow
  • Default Branch: master
  • Size: 24.1 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 15
  • Releases: 0
Created almost 2 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Introduction

nf-CBRA-snvs (nf-core - CIBERER Bioinformatics for Rare diseases Analysis - Small Nucleotide Variant) is a workflow optimized for the analysis of rare diseases. It is designed to detect SNVs and INDELs in targeted sequencing data (CES/WES) as well as in whole genome sequencing (WGS) data.

The pipeline is built with Nextflow, a workflow management system that allows it to run across a variety of computing environments. It uses Docker or Singularity containers, which simplifies setup and makes results reproducible. Each process is assigned its own container, so software dependencies are easy to manage and update. When possible, processes are sourced from nf-core/modules, promoting reusability across all nf-core pipelines and contributing to the broader Nextflow community.

Pipeline summary

The pipeline can perform the following steps:

  • Mapping of the reads to the reference (BWA-MEM)
  • Processing of the BAM file (GATK MarkDuplicates, GATK BaseRecalibrator and GATK ApplyBQSR)
  • Variant calling with the following tools:

    • GATK4 HaplotypeCaller (run_gatk = true). This subworkflow includes:
      • GATK4 HaplotypeCaller.
      • Hard filters and VariantFiltration to mark PASS variants. More information here.
      • Bcftools filter to keep PASS variants on chr1-22, X, Y.
      • Split multiallelic variants.
    • DRAGEN (run_dragen = true). This subworkflow includes:
      • GATK4 CalibrateDragstrModel.
      • GATK4 HaplotypeCaller with --dragen-mode.
      • VariantFiltration with --filter-expression "QUAL < 10.4139" --filter-name "DRAGENHardQUAL" to mark PASS variants. More information here.
      • Bcftools filter to keep PASS variants on chr1-22, X, Y.
      • Split multiallelic variants.
    • DeepVariant (run_deepvariant = true). This subworkflow includes:
      • DeepVariant makeexamples: converts the input alignment file to a tfrecord format suitable for the deep learning model.
      • DeepVariant callvariants: calls variants based on the input tfrecords. The output is also in tfrecord format and needs postprocessing to convert it to VCF.
      • DeepVariant postprocessvariants: converts the variant calls from callvariants to VCF, and also creates GVCF files based on genomic information from makeexamples. More information here.
      • Bcftools filter to keep PASS variants on chr1-22, X, Y.
      • Split multiallelic variants.
  • Merging and integration of the VCFs obtained with the different tools.

  • Annotation of the variants:

    • Regions of homozygosity (ROHs) with AUTOMAP
    • Effect of the variants with Ensembl VEP using the flag --everything, which includes the following options: --sift b, --polyphen b, --ccds, --hgvs, --symbol, --numbers, --domains, --regulatory, --canonical, --protein, --biotype, --af, --af_1kg, --af_esp, --af_gnomade, --af_gnomadg, --max_af, --pubmed, --uniprot, --mane, --tsl, --appris, --variant_class, --gene_phenotype, --mirna
    • Postvep: formats the VEP tab-delimited output and filters variants by minor allele frequency (--maf).
    • You can enhance the annotation by incorporating gene rankings from GLOWgenes, a network-based algorithm developed to prioritize novel candidate genes associated with rare diseases. Precomputed rankings based on PanelApp gene panels are available here. To include a specific GLOWgenes ranking, use the option --glowgenes_panel (path to the panel.txt), for example: --glowgenes_panel https://raw.githubusercontent.com/TBLabFJD/GLOWgenes/refs/heads/master/precomputed_panelAPP/GLOWgenes_prioritization_Neurological_ciliopathies_GA.txt. Additionally, you can include the Gene-Disease Specificity Score (SGDS) using: --glowgenes_sgds https://raw.githubusercontent.com/TBLabFJD/GLOWgenes/refs/heads/master/SGDS.csv. This score ranges from 0 to 1, where 1 indicates a gene ranks highly for only a few specific diseases (high specificity), and 0 indicates the gene consistently ranks highly across many diseases (low specificity).
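The filtering steps listed under variant calling (marking records with the DRAGEN hard QUAL filter, then keeping only PASS variants on chr1-22, X, Y) can be illustrated with a small Python sketch. This is not how the pipeline implements them: the pipeline uses GATK VariantFiltration and bcftools. The function names and the toy records below are hypothetical, and the sketch assumes `chr`-prefixed contig names and a numeric QUAL field.

```python
# Illustrative sketch only: the pipeline itself relies on GATK VariantFiltration
# and bcftools. Function names and example records are hypothetical.

# Assumption: contigs are named with a "chr" prefix (chr1-22, chrX, chrY).
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}

# Threshold taken from --filter-expression "QUAL < 10.4139".
DRAGEN_QUAL_THRESHOLD = 10.4139


def mark_dragen_filter(fields):
    """Return a copy of the VCF fields with FILTER set from the QUAL threshold."""
    out = list(fields)
    qual = float(out[5])  # column 6 of a VCF record is QUAL (assumed numeric here)
    out[6] = "DRAGENHardQUAL" if qual < DRAGEN_QUAL_THRESHOLD else "PASS"
    return out


def keep_pass_canonical(records):
    """Keep records whose FILTER is PASS and whose CHROM is chr1-22, X or Y."""
    return [r for r in records if r[6] == "PASS" and r[0] in CANONICAL]


# Toy records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
raw = [
    "chr1\t1000\t.\tA\tG\t50.0\t.\t.",
    "chr2\t2000\t.\tC\tT\t5.2\t.\t.",
    "chrM\t300\t.\tG\tA\t99.0\t.\t.",
    "chrX\t4000\t.\tT\tC\t10.4139\t.\t.",
]
marked = [mark_dragen_filter(line.split("\t")) for line in raw]
kept = keep_pass_canonical(marked)
for rec in kept:
    print("\t".join(rec))
```

In this toy input the chr2 record is marked `DRAGENHardQUAL` because its QUAL is below the threshold, and the chrM record is dropped because it lies outside chr1-22, X, Y.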

Usage

First, prepare a samplesheet with your input data:

sample,fastq_1,fastq_2
SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz

Each row represents one pair of paired-end FASTQ files.
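For projects with many samples, the samplesheet can be generated rather than written by hand. The following Python sketch pairs R1 and R2 files and writes the three columns shown above. It assumes file names contain `_R1_` and `_R2_`, as in the example path. The helper names `pair_fastqs` and `to_csv`, and the way the sample ID is derived from the file name, are hypothetical choices, not part of the pipeline.

```python
import csv
import io
from pathlib import PurePosixPath


def pair_fastqs(paths):
    """Pair each *_R1_* file with its *_R2_* mate; raise if a mate is missing."""
    available = set(paths)
    rows = []
    for r1 in sorted(p for p in paths if "_R1_" in PurePosixPath(p).name):
        name = PurePosixPath(r1).name
        r2 = str(PurePosixPath(r1).with_name(name.replace("_R1_", "_R2_")))
        if r2 not in available:
            raise ValueError(f"no R2 mate found for {r1}")
        # Hypothetical convention: everything before "_R1_" is the sample ID.
        sample = name.split("_R1_")[0]
        rows.append({"sample": sample, "fastq_1": r1, "fastq_2": r2})
    return rows


def to_csv(rows):
    """Serialise the rows with the header expected in the samplesheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["sample", "fastq_1", "fastq_2"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


paths = [
    "/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz",
    "/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz",
]
rows = pair_fastqs(paths)
samplesheet = to_csv(rows)
print(samplesheet)
```

Raising an error on a missing mate is a deliberate choice here, since every samplesheet row is expected to carry both reads of a pair.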

You can run the pipeline using:

nextflow run nf-cbra-snvs/main.nf \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

For more details and further functionality, please refer to the usage documentation.
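When launching the pipeline from a script, the command above can be assembled programmatically. The Python sketch below builds the argument list and enables the variant callers named in the pipeline summary (run_gatk, run_dragen, run_deepvariant). It assumes these parameters can be passed on the command line as `--run_gatk true`, following the usual Nextflow convention for pipeline parameters; the usage documentation remains the authoritative reference. The helper `build_command` and the `results` output directory are hypothetical.

```python
import shlex

# Caller switches named in the pipeline summary.
ALLOWED_CALLERS = {"gatk", "dragen", "deepvariant"}


def build_command(profile, samplesheet, outdir, callers=()):
    """Assemble a `nextflow run` argument list with the chosen callers enabled."""
    unknown = set(callers) - ALLOWED_CALLERS
    if unknown:
        raise ValueError(f"unknown callers: {sorted(unknown)}")
    cmd = [
        "nextflow", "run", "nf-cbra-snvs/main.nf",
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
    ]
    for caller in callers:
        # Assumption: boolean parameters are set as "--run_<caller> true".
        cmd += [f"--run_{caller}", "true"]
    return cmd


cmd = build_command(
    "docker", "samplesheet.csv", "results", callers=("gatk", "deepvariant")
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting issues with paths that contain spaces.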

Pipeline output

For details about the output files and reports, please refer to the output documentation.

Credits

nf-CBRA-snvs was developed within the framework of a call for intramural cooperative and complementary actions (ACCI) funded by CIBERER (Biomedical Research Network Centre for Rare Diseases).

Main Developer - Yolanda Benítez Quesada

Coordinator - Carlos Ruiz Arenas

Other contributors

  • Graciela Uría Regojo
  • Pedro Garrido Rodríguez
  • Rafa Farias Varona
  • Pablo Minguez
  • Daniel Lopez

Owner

  • Name: CIBERER
  • Login: CIBERER
  • Kind: organization
  • Location: Spain

Citation (CITATIONS.md)

# CIBERER/nf-CBRA-snvs: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Issues event: 2
  • Delete event: 6
  • Issue comment event: 4
  • Push event: 20
  • Pull request review comment event: 12
  • Pull request event: 9
  • Pull request review event: 11
  • Create event: 1
Last Year
  • Issues event: 2
  • Delete event: 6
  • Issue comment event: 4
  • Push event: 20
  • Pull request review comment event: 12
  • Pull request event: 9
  • Pull request review event: 11
  • Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: about 15 hours
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: about 15 hours
  • Issue authors: 2
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • yocra3 (1)
  • yolandabq (1)
Pull Request Authors
  • yolandabq (3)
  • yocra3 (1)
Top Labels
Issue Labels
documentation (1)
Pull Request Labels