expansionhunter
Nextflow pipeline for Expansion Hunter
Repository
Nextflow pipeline for Expansion Hunter
Basic Info
- Host: GitHub
- Owner: kubranarci
- License: mit
- Language: Nextflow
- Default Branch: main
- Size: 83 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
odcf/expansionhunter
Introduction
odcf/expansionhunter is a bioinformatics pipeline to analyse Short Tandem Repeats (STRs) using a combination of Expansion Hunter, samtools, and STRANGER. It is designed to be flexible, supporting various analysis types including single-sample, trio, and somatic cases, and includes an automated sex determination step.
Pipeline Description
The pipeline's core logic adapts to your input, following these key steps:
Sex Determination: If the sex of a sample is not provided, the pipeline will automatically run the ngs-bits/samplegender tool. This script analyzes the ratio of reads on chromosomes 19, X, and Y to predict the sample's sex.
Expansion Hunter: The pipeline runs the Expansion Hunter tool to analyze each sample's BAM file and identify STR expansions.
Sample Merging (Conditional): Uses the SVDB merge tool, depending on the case type:
- Trio Analysis: If you provide three samples (child, father, mother), the pipeline merges their Expansion Hunter VCF files into a single output.
- Somatic Analysis: For tumor/control sample pairs, the pipeline merges the respective VCF files.
- Single Sample: For a single sample, this merging step is skipped.
STRANGER Analysis: The final step runs STRANGER on the merged or single-sample VCF files.
MultiQC: Produces the final QC report and records the versions of the tools used in the pipeline.
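The conditional merging step can be read as a classification of each case by the phenotypes of its samples. The following is a minimal sketch of that decision in shell, not the pipeline's actual Nextflow code; the function name `classify_case` and the labels it prints are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a case from the phenotypes of its samples.
# Labels follow the README: trio (father, mother, child),
# somatic (tumor, control), single.
classify_case() {
    local key
    # Sort the phenotypes so the result does not depend on sample order.
    key=$(printf '%s\n' "$@" | sort | paste -sd, -)
    case "$key" in
        child,father,mother) echo "trio" ;;     # three VCFs would be merged
        control,tumor)       echo "somatic" ;;  # two VCFs would be merged
        single)              echo "single" ;;   # merging is skipped
        *)                   echo "unknown" ;;  # combination not described in the README
    esac
}

classify_case father mother child
classify_case tumor control
classify_case single
```

Sorting first means the samples of a case can appear in any order in the samplesheet.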
Usage
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.
- Download the pipeline and test it on a minimal dataset with a single command:
```bash
git clone https://github.com/kubranarci/ExpansionHunter.git
```
- Set up samplesheet.csv
The samplesheet must have the following columns:
- sample: The sample name, used to create the final reports and rename VCF files.
- bam: Path to the sample's BAM file.
- bai: Path to the index of the BAM file.
- sex: Sex of the sample. Can be female, male, or unknown.
- case_id: Case ID, used in trio or somatic analysis to merge and name the samples of a case.
- phenotype: The role of the sample in the analysis:
  - Trio: father, mother, or child
  - Somatic: tumor or control
  - Single: single
samplesheet.csv:
```csv
sample,bam,bai,sex,case_id,phenotype
triofather,triofather.bam,triofather.bam.bai,male,triocase,father
triomother,triomother.bam,triomother.bam.bai,female,triocase,mother
triochild,triochild.bam,triochild.bam.bai,male,triocase,child
somaticcontrol,somaticcontrol.bam,somaticcontrol.bam.bai,unknown,somaticcase,control
somatictumor,somatictumor.bam,somatictumor.bam.bai,unknown,somaticcase,tumor
test,test.bam,test.bam.bai,unknown,singlecase,single
```
Check out /assets/samplesheet.csv for an example sample sheet file.
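Before launching a run, it can help to sanity-check the samplesheet. The sketch below is not part of the pipeline, which performs its own validation; `check_samplesheet` is a hypothetical helper that assumes the six-column header and the sex and phenotype values described above, and checks nothing else (for example, it does not test whether the BAM files exist).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a samplesheet.
# Prints one message per problem and returns non-zero if any was found.
check_samplesheet() {
    awk -F, '
        NR == 1 {
            # The header must match the documented column order exactly.
            if ($0 != "sample,bam,bai,sex,case_id,phenotype") {
                print "unexpected header: " $0; bad = 1
            }
            next
        }
        NF != 6 { print "line " NR ": expected 6 columns, got " NF; bad = 1; next }
        $4 !~ /^(female|male|unknown)$/ {
            print "line " NR ": invalid sex: " $4; bad = 1
        }
        $6 !~ /^(father|mother|child|tumor|control|single)$/ {
            print "line " NR ": invalid phenotype: " $6; bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Usage on a throwaway file with one single-sample row.
tmp=$(mktemp)
printf '%s\n' \
    'sample,bam,bai,sex,case_id,phenotype' \
    'test,test.bam,test.bam.bai,unknown,singlecase,single' > "$tmp"
check_samplesheet "$tmp" && echo "samplesheet looks consistent"
rm -f "$tmp"
```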
- Prepare reference data
This pipeline needs a FASTA reference with its FAI index and a variant catalog file to run Expansion Hunter and STRANGER.
- fasta: Path to the FASTA reference
- fai: Path to FAI of FASTA file
- variant_catalog: Path to the variant catalog. GRCh37 and GRCh38 versions from STRANGER can be found in assets/. For more, see https://github.com/Clinical-Genomics/stranger/tree/master/stranger/resources
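If the FASTA index does not exist yet, it can be created with `samtools faidx`, which writes `<fasta>.fai` next to the FASTA file. The sketch below wraps that in a hypothetical helper, `ensure_fai`, which is not part of the pipeline and assumes samtools is available on the PATH whenever the index has to be built.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make sure a FASTA index exists and print its path.
ensure_fai() {
    local fasta=$1
    if [ ! -f "$fasta" ]; then
        echo "FASTA not found: $fasta" >&2
        return 1
    fi
    # samtools faidx writes the index to "<fasta>.fai".
    if [ ! -f "$fasta.fai" ]; then
        samtools faidx "$fasta" || return 1
    fi
    echo "$fasta.fai"
}

# Example (placeholder file name):
# fai=$(ensure_fai reference.fa) && echo "index at $fai"
```

The printed path is the value that would be passed to `--fai`.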
- Now, you can run the pipeline using:
```bash
nextflow run odcf/expansionhunter \
    -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
    --fasta reference.fa \
    --fai reference.fai \
    --variant_catalog variant_catalog.json \
    --outdir <OUTDIR>
```
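Nextflow can also read pipeline parameters from a YAML or JSON file passed with `-params-file`, which keeps a long command line reproducible. The sketch below writes such a file and prints the matching command instead of running it; `write_params` is a hypothetical helper, and the file names, the `results` output directory, and the `docker` profile are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the pipeline parameters to a YAML file.
# The keys mirror the command-line options shown above; values are placeholders.
write_params() {
    cat > "$1" <<'EOF'
input: samplesheet.csv
fasta: reference.fa
fai: reference.fai
variant_catalog: variant_catalog.json
outdir: results
EOF
}

params=$(mktemp)
write_params "$params"
# Print the command rather than executing it, so the sketch runs without Nextflow.
echo "nextflow run odcf/expansionhunter -profile docker -params-file $params"
```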
Credits
odcf/expansionhunter was originally written by @kubranarci.
This pipeline was inspired by the repeat analysis subworkflow of the nf-core/raredisease pipeline.
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
Owner
- Name: Kübra Narcı
- Login: kubranarci
- Kind: user
- Location: Heidelberg
- Company: @ghga-de @DKFZ-ODCF
- Twitter: kubranarci
- Repositories: 3
- Profile: https://github.com/kubranarci
Citation (CITATIONS.md)
# odcf/expansionhunter: Citations

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [BCFtools](https://academic.oup.com/gigascience/article/10/2/giab008/6137722) & [SAMtools](https://academic.oup.com/bioinformatics/article/25/16/2078/204688)

  > Danecek P, Bonfield JK, Liddle J, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2):giab008. doi:10.1093/gigascience/giab008

- [ExpansionHunter](https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btz431/5499079)

  > Dolzhenko E, Deshpande V, Schlesinger F, et al. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Birol I, ed. Bioinformatics. 2019;35(22):4754-4756. doi:10.1093/bioinformatics/btz431

- [ngs-bits-samplegender](https://github.com/imgag/ngs-bits/tree/master)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Picard](https://broadinstitute.github.io/picard/)

- [stranger](https://github.com/Clinical-Genomics/stranger)

  > Nilsson D, Magnusson M. moonso/stranger v0.7.1. Published online February 18, 2021. doi:10.5281/ZENODO.4548873

- [svdb](https://github.com/J35P312/SVDB)

  > Eisfeldt J, Vezzi F, Olason P, Nilsson D, Lindstrand A. TIDDIT, an efficient and comprehensive structural variant caller for massive parallel sequencing data. F1000Res. 2017;6:664. doi:10.12688/f1000research.11168.2

- [Tabix](https://academic.oup.com/bioinformatics/article/27/5/718/262743)

  > Li H. Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics. 2011;27(5):718-719. doi:10.1093/bioinformatics/btq671

## Software packaging/containerisation tools

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Issues event: 2
- Push event: 5
Last Year
- Issues event: 2
- Push event: 5