expansionhunter

Nextflow pipeline for Expansion Hunter

https://github.com/kubranarci/expansionhunter

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: kubranarci
  • License: mit
  • Language: Nextflow
  • Default Branch: main
  • Size: 83 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 8 months ago · Last pushed 8 months ago
Metadata Files
Readme Changelog License Citation

README.md

odcf/expansionhunter

Introduction

odcf/expansionhunter is a bioinformatics pipeline to analyse Short Tandem Repeats (STRs) using a combination of Expansion Hunter, samtools, and STRANGER. It is designed to be flexible, supporting various analysis types including single-sample, trio, and somatic cases, and includes an automated sex determination step.

Pipeline Description

The pipeline's core logic adapts to your input, following these key steps:

  1. Sex Determination: If the sex of a sample is not provided, the pipeline will automatically run the ngs-bits/samplegender tool. This script analyzes the ratio of reads on chromosomes 19, X, and Y to predict the sample's sex.

  2. Expansion Hunter: The pipeline runs the Expansion Hunter tool to analyze each sample's BAM file and identify STR expansions.

  3. Sample Merging (Conditional): When a case contains more than one sample, the per-sample VCF files are merged with the SVDB merge tool.

     • Trio Analysis: If you provide three samples (child, father, mother), the pipeline merges the Expansion Hunter VCF files into a single output.

     • Somatic Analysis: For tumor/control sample pairs, the pipeline merges the respective VCF files.

     • Single Sample: For a single sample, this merging step is skipped.

  4. STRANGER Analysis: The final analysis step runs STRANGER on the merged or single-sample VCF files.

  5. MultiQC: Produces the final QC report and lists the versions of the tools used in this pipeline.
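
The branching described in the steps above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the pipeline's actual Nextflow code: the function names are hypothetical, and it assumes that an empty or `unknown` sex value is what triggers the sex determination step.

```python
from collections import defaultdict

# Phenotype labels per analysis type, as listed in the samplesheet description.
TRIO = {"father", "mother", "child"}
SOMATIC = {"tumor", "control"}
SINGLE = {"single"}


def needs_sex_determination(sex):
    """Assumption: an empty or 'unknown' sex value triggers the prediction step."""
    return sex is None or sex.strip().lower() in ("", "unknown")


def plan_cases(samples):
    """Group samples by case_id and decide whether their VCFs would be merged.

    `samples` is a list of dicts with the keys sample, sex, case_id, phenotype.
    Returns {case_id: {"analysis": str, "merge": bool, "predict_sex": [names]}}.
    """
    by_case = defaultdict(list)
    for s in samples:
        by_case[s["case_id"]].append(s)

    plan = {}
    for case_id, members in by_case.items():
        phenotypes = {m["phenotype"] for m in members}
        if len(members) == 3 and phenotypes == TRIO:
            analysis = "trio"
        elif len(members) == 2 and phenotypes == SOMATIC:
            analysis = "somatic"
        elif len(members) == 1 and phenotypes == SINGLE:
            analysis = "single"
        else:
            raise ValueError(
                f"case {case_id!r}: unexpected phenotypes {sorted(phenotypes)}"
            )
        plan[case_id] = {
            "analysis": analysis,
            # Single-sample cases skip the merge step.
            "merge": analysis != "single",
            # Samples whose sex would have to be predicted first.
            "predict_sex": [
                m["sample"] for m in members if needs_sex_determination(m["sex"])
            ],
        }
    return plan
```

Calling `plan_cases` on the rows of a samplesheet returns one entry per case, which makes it easy to see in advance which cases will be merged and which samples will go through sex determination.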

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

  1. Download the pipeline by cloning the repository:

    git clone https://github.com/kubranarci/ExpansionHunter.git

  2. Set up samplesheet.csv

The samplesheet must contain the following columns:

sample: Sample name, used to create the final reports and to rename VCF files.

bam: Path to the sample's BAM file.

bai: Path to the index of the BAM file.

sex: Sex of the sample. Can be female, male, or unknown.

case_id: Case ID, used in trio or somatic analysis to merge and name the samples of a case.

phenotype: Role of the sample within the analysis:

  • Trio: father, mother, or child

  • Somatic: tumor or control

  • Single: single

samplesheet.csv:

    sample,bam,bai,sex,case_id,phenotype
    triofather,triofather.bam,triofather.bam.bai,male,triocase,father
    triomother,triomother.bam,triomother.bam.bai,female,triocase,mother
    triochild,triochild.bam,triochild.bam.bai,male,triocase,child
    somaticcontrol,somaticcontrol.bam,somaticcontrol.bam.bai,unknown,somaticcase,control
    somatictumor,somatictumor.bam,somatictumor.bam.bai,unknown,somaticcase,tumor
    test,test.bam,test.bam.bai,unknown,singlecase,single

Check out /assets/samplesheet.csv for an example sample sheet file.
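
Before launching a run, it can be handy to check a samplesheet against the column rules listed above. The following is a minimal sketch of such a check and is not part of the pipeline: `validate_samplesheet` is a hypothetical helper, and the allowed values are copied from the column descriptions in this README.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample", "bam", "bai", "sex", "case_id", "phenotype"]

# Allowed values, taken from the column descriptions in this README.
ALLOWED = {
    "sex": {"female", "male", "unknown"},
    "phenotype": {"father", "mother", "child", "tumor", "control", "single"},
}


def validate_samplesheet(text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Data rows start on line 2 because line 1 holds the header.
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            value = (row[column] or "").strip()
            if not value:
                problems.append(f"line {line_no}: empty value in column {column!r}")
            elif column in ALLOWED and value not in ALLOWED[column]:
                problems.append(
                    f"line {line_no}: {column} {value!r} is not one of "
                    f"{sorted(ALLOWED[column])}"
                )
    return problems
```

Reading the file with `open("samplesheet.csv").read()` and passing the text to `validate_samplesheet` returns an empty list when every row follows the rules, and one message per offending cell otherwise.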

  3. Prepare reference data

This pipeline needs a FASTA reference with its FAI index and a variant catalog file to run Expansion Hunter and STRANGER.

  • fasta: Path to the FASTA reference
  • fai: Path to the FAI index of the FASTA file
  • variant_catalog: Path to the variant catalog. GRCh37 and GRCh38 versions from STRANGER can be found in assets/. For more, see https://github.com/Clinical-Genomics/stranger/tree/master/stranger/resources
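
A common source of trouble with reference data is a catalog whose contig names do not match the reference, for example a catalog that uses `chr4` with a reference that names the contig `4`. The sketch below checks for this. It is not part of the pipeline, the function names are hypothetical, and it assumes the usual Expansion Hunter catalog layout: a JSON array of loci, each with a `ReferenceRegion` given as `contig:start-end` or as a list of such strings.

```python
import json


def catalog_contigs(catalog_text):
    """Collect the contig names used in the ReferenceRegion fields of a catalog."""
    contigs = set()
    for entry in json.loads(catalog_text):
        regions = entry["ReferenceRegion"]
        if isinstance(regions, str):  # a locus may list one region or several
            regions = [regions]
        for region in regions:
            # Split on the last colon so contig names containing ':' survive.
            contigs.add(region.rsplit(":", 1)[0])
    return contigs


def fai_contigs(fai_text):
    """Contig names are the first tab-separated column of a .fai index."""
    return {line.split("\t", 1)[0] for line in fai_text.splitlines() if line.strip()}


def missing_contigs(catalog_text, fai_text):
    """Return catalog contigs that are absent from the FASTA index, sorted."""
    return sorted(catalog_contigs(catalog_text) - fai_contigs(fai_text))
```

Passing the contents of the variant catalog and of the FAI file to `missing_contigs` gives an empty list when the two agree. Any name it returns points to a genome build or naming mismatch worth resolving before the run.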
  4. Now, you can run the pipeline using:

    nextflow run odcf/expansionhunter \
      -profile <docker/singularity/.../institute> \
      --input samplesheet.csv \
      --fasta reference.fa \
      --fai reference.fai \
      --variant_catalog variant_catalog.json \
      --outdir <OUTDIR>

Credits

odcf/expansionhunter was originally written by @kubranarci.

This pipeline was inspired by the repeat analysis subworkflow of the nf-core/raredisease pipeline.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

Owner

  • Name: Kübra Narcı
  • Login: kubranarci
  • Kind: user
  • Location: Heidelberg
  • Company: @ghga-de @DKFZ-ODCF

Citation (CITATIONS.md)

# odcf/expansionhunter: Citations


## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [BCFtools](https://academic.oup.com/gigascience/article/10/2/giab008/6137722) & [SAMtools](https://academic.oup.com/bioinformatics/article/25/16/2078/204688)

  > Danecek P, Bonfield JK, Liddle J, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2):giab008. doi:10.1093/gigascience/giab008

- [ExpansionHunter](https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btz431/5499079)

  > Dolzhenko E, Deshpande V, Schlesinger F, et al. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Birol I, ed. Bioinformatics. 2019;35(22):4754-4756. doi:10.1093/bioinformatics/btz431
  
- [ngs-bits-samplegender](https://github.com/imgag/ngs-bits/tree/master)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Picard](https://broadinstitute.github.io/picard/)

- [stranger](https://github.com/Clinical-Genomics/stranger)

  > Nilsson D, Magnusson M. moonso/stranger v0.7.1. Published online February 18, 2021. doi:10.5281/ZENODO.4548873

- [svdb](https://github.com/J35P312/SVDB)

  > Eisfeldt J, Vezzi F, Olason P, Nilsson D, Lindstrand A. TIDDIT, an efficient and comprehensive structural variant caller for massive parallel sequencing data. F1000Res. 2017;6:664. doi:10.12688/f1000research.11168.2

- [Tabix](https://academic.oup.com/bioinformatics/article/27/5/718/262743)

  > Li H. Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics. 2011;27(5):718-719. doi:10.1093/bioinformatics/btq671

## Software packaging/containerisation tools

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Issues event: 2
  • Push event: 5
Last Year
  • Issues event: 2
  • Push event: 5

Dependencies

modules/nf-core/multiqc/meta.yml cpan
subworkflows/nf-core/utils_nextflow_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfcore_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfschema_plugin/meta.yml cpan
modules/nf-core/multiqc/environment.yml pypi