https://github.com/aa9gj/low-pass-wgs-pipeline
Step by step analysis for variant calling and imputation from low-pass sequencing data
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: aa9gj
- Language: Shell
- Default Branch: main
- Size: 148 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Low-pass Whole Genome Sequencing (LP-WGS)
Background
Traditional whole genome sequencing (WGS) relies on deep coverage—typically 30x to 50x—to capture nearly every base pair in an individual's genome. In contrast, shallow whole genome sequencing, also known as low-pass WGS (LP-WGS), sequences the genome at a much lower depth, usually between 0.1x and 5x coverage. Although this reduced coverage can miss some rare variants, it remains highly effective for detecting common genetic variations across the genome. Importantly, LP-WGS dramatically lowers both the cost and turnaround time for sequencing while still delivering valuable genomic insights.
In addition, when compared to genotyping arrays that only assess pre-selected genetic variants, LP-WGS offers increased statistical power and a broader view of the genome. LP-WGS can achieve up to 99% accuracy in variant detection and requires minimal DNA input.
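The depth figures above follow from simple arithmetic: expected mean coverage is the total number of sequenced bases divided by the genome size. The sketch below is only an illustration; the function name `expected_coverage` and the example numbers are assumptions, not part of this repository.

```shell
#!/bin/sh
# Expected mean coverage = (number of reads x read length) / genome size.
# Hypothetical helper: the name and the example inputs are illustrative only.
expected_coverage() {
  # $1 = total reads (count both mates for paired-end data)
  # $2 = read length in bp
  # $3 = genome size in bp
  awk -v n="$1" -v len="$2" -v g="$3" 'BEGIN { printf "%.2f\n", (n * len) / g }'
}

# Example: 10 million 150 bp reads on a 3.1 Gb genome (prints 0.48)
expected_coverage 10000000 150 3100000000
```

This is an expectation only; the observed depth after alignment is usually lower because of unmapped reads and duplicates.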
This repository offers a modular, SLURM-compatible pipeline for low-pass sequencing analysis using GATK and related tools. The workflow encompasses raw data quality control, reference indexing, alignment, per-sample BAM QC, coverage estimation, duplicate marking, base quality score recalibration (BQSR), variant calling (GVCF and joint genotyping), variant filtering (VQSR), VCF merging, compression and indexing, and genotype imputation using BEAGLE.
Core Steps & Modules
Raw FASTQ QC (modules/00fastqqc)
- trim_galore: Run adapter trimming and FastQC to catch adapter contamination and base quality issues.
- MultiQC: Collate all FastQC reports into a single summary.
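As a rough sketch of the QC step above, the commands for one paired-end sample might look like the following. The `run` wrapper, the function name `fastq_qc`, and all file names are assumptions made for illustration; the repository's own scripts may be organized differently. By default the wrapper only prints each command.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Trim adapters and run FastQC on one paired-end sample, then summarize with MultiQC.
# $1 = R1 FASTQ, $2 = R2 FASTQ, $3 = output directory (all hypothetical paths)
fastq_qc() {
  run trim_galore --paired --fastqc --output_dir "$3" "$1" "$2"
  run multiqc "$3" --outdir "$3/multiqc"
}

# Example call with placeholder file names
fastq_qc sample_R1.fastq.gz sample_R2.fastq.gz qc_out
```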
Reference Indexing (modules/01_reference)
- index_reference.sh: Build the BWA index, the samtools .fai index, and the GATK sequence dictionary for the reference.
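The three indexes named above could be built roughly as follows. This is a sketch under assumptions: the `run` wrapper, the function name, and `reference.fa` are hypothetical, and the actual index_reference.sh may differ. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Build the indexes that BWA, samtools, and GATK expect next to the FASTA.
# $1 = reference FASTA (hypothetical path)
index_reference() {
  run bwa index "$1"                          # BWA index files
  run samtools faidx "$1"                     # .fai index
  run gatk CreateSequenceDictionary -R "$1"   # .dict sequence dictionary
}

# Example call with a placeholder file name
index_reference reference.fa
```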
Alignment & Read Groups (modules/02_alignment)
- bwa_align.slurm: Align reads with BWA-MEM, output to BAM.
- Calculateavgdepth.slurm: Produce sorted BAM files and calculate average depth.
- Calculatealignmentstats.slurm: Check basic alignment stats, including the number of reads mapped and properly paired.
- Extractrelevantstats: Extract the relevant fields from the Calculatealignmentstats.slurm output and report them as a TSV file.
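To illustrate the stats-extraction step, the sketch below turns alignment statistics into a TSV row. It assumes the stats come from `samtools flagstat`, which the text above does not confirm; the function name `flagstat_to_tsv` and the column names are hypothetical.

```shell
#!/bin/sh
# Convert `samtools flagstat` text (read on stdin) into a header plus one TSV row.
# Assumption: the alignment stats are in flagstat's default text format.
# $1 = sample name for the first column
flagstat_to_tsv() {
  awk -v sample="$1" '
    $4 == "in" && $5 == "total"        { total = $1 }   # "N + 0 in total ..."
    $4 == "mapped"                     { mapped = $1 }  # skips "primary mapped"
    $4 == "properly" && $5 == "paired" { paired = $1 }
    END {
      printf "sample\ttotal\tmapped\tproperly_paired\n"
      printf "%s\t%d\t%d\t%d\n", sample, total, mapped, paired
    }'
}

# Usage (requires samtools and a real BAM file):
#   samtools flagstat sample.sorted.bam | flagstat_to_tsv sample > sample.stats.tsv
```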
Coverage Estimation (modules/04_coverage)
- calculate_depth.sh: Compute average and region-specific depth using samtools or mosdepth.
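A minimal sketch of the average-depth calculation, assuming per-position depth from `samtools depth -a` (three tab-separated columns: chromosome, position, depth). The function name `mean_depth` is hypothetical and calculate_depth.sh may compute this differently.

```shell
#!/bin/sh
# Average the third column (depth) of `samtools depth` output read on stdin.
# Prints "NA" when there is no input, so empty BAMs do not cause a division by zero.
mean_depth() {
  awk '{ sum += $3; n++ }
       END { if (n > 0) printf "%.2f\n", sum / n; else print "NA" }'
}

# Usage (requires samtools and a real BAM file); -a includes zero-depth positions:
#   samtools depth -a sample.sorted.bam | mean_depth
```

Including zero-depth positions with `-a` matters for low-pass data, where much of the genome is uncovered; omitting them would inflate the average.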
Post-Alignment Processing (modules/05_processing)
- dedup.sh: Mark duplicates with Picard/GATK MarkDuplicates.
- bqsr.sh: Run BaseRecalibrator (requires known-sites VCFs). Note: this is only practical for species with high-confidence known-variant resources (e.g. human, mouse).
- apply_bqsr.sh: Run ApplyBQSR to write the recalibrated BAM. The same restriction applies.
- collectinsertsize:
- collect_metrics:
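The duplicate-marking and BQSR steps above could be chained roughly as shown here. This is a sketch, not the repository's scripts: the `run` wrapper, the function name, and the file-naming scheme are assumptions. Commands are printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Mark duplicates, then recalibrate base qualities for one sample.
# $1 = sample prefix (expects $1.sorted.bam), $2 = reference FASTA,
# $3 = known-sites VCF (all hypothetical paths)
dedup_and_bqsr() {
  run gatk MarkDuplicates -I "$1.sorted.bam" -O "$1.dedup.bam" -M "$1.dup_metrics.txt"
  run gatk BaseRecalibrator -R "$2" -I "$1.dedup.bam" --known-sites "$3" -O "$1.recal.table"
  run gatk ApplyBQSR -R "$2" -I "$1.dedup.bam" --bqsr-recal-file "$1.recal.table" -O "$1.recal.bam"
}

# Example call with placeholder file names
dedup_and_bqsr sample reference.fa known_sites.vcf.gz
```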
Variant Calling & Genotyping (modules/06variantcalling)
- haplotypecaller_gvcf.sh: GATK HaplotypeCaller in -ERC GVCF mode.
- genomicDBImport.sh: Create a joint variant database per chromosome with GenomicsDBImport.
- genotype_gvcfs.sh: GenotypeGVCFs to produce cohort VCF.
- mergeindexjointvcf.slurm
- qc_plots.R: Custom functions for evaluating variant-calling QC.
- rawvariantqc.slurm
- snp_VQSR.slurm
- apply_VQSR.slurm
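The per-sample calling and per-chromosome joint genotyping steps above might be sketched as follows (VQSR is left out). The `run` wrapper, function names, workspace and output names, and the use of a sample map file are assumptions for illustration, not the repository's actual scripts. Commands are printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Call one sample in GVCF mode.
# $1 = reference FASTA, $2 = input BAM, $3 = output GVCF (hypothetical paths)
call_gvcf() {
  run gatk HaplotypeCaller -R "$1" -I "$2" -O "$3" -ERC GVCF
}

# Import all GVCFs for one chromosome, then joint-genotype that chromosome.
# $1 = reference FASTA, $2 = chromosome name,
# $3 = sample map (tab-separated: sample name, GVCF path)
joint_genotype_chr() {
  run gatk GenomicsDBImport --sample-name-map "$3" \
      --genomicsdb-workspace-path "db_$2" -L "$2"
  run gatk GenotypeGVCFs -R "$1" -V "gendb://db_$2" -O "joint.$2.vcf.gz"
}

# Example calls with placeholder names
call_gvcf reference.fa sample.recal.bam sample.g.vcf.gz
joint_genotype_chr reference.fa chr1 cohort.sample_map
```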
Genotype Imputation (modules/09_imputation)
- beagle_impute.sh: Run BEAGLE with reference panel, genetic maps, and compute dosage R².
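A minimal sketch of the imputation call, assuming a Beagle 5.x jar and called genotypes as input. The `run` wrapper, function name, and every file name are hypothetical; beagle_impute.sh may use different inputs or options. The command is printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Impute one chromosome against a phased reference panel.
# $1 = Beagle jar, $2 = target VCF, $3 = reference panel VCF,
# $4 = PLINK-format genetic map, $5 = output prefix (all hypothetical paths)
beagle_impute() {
  run java -jar "$1" "gt=$2" "ref=$3" "map=$4" "out=$5"
}

# Example call with placeholder file names
beagle_impute beagle.jar joint.chr1.vcf.gz panel.chr1.vcf.gz chr1.map imputed.chr1
```

With Beagle 5.x the imputed VCF is written to the output prefix with a `.vcf.gz` suffix, and the dosage R² estimate should appear as the `DR2` INFO annotation.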
Running the tests
The optional tests use bats. After installing bats (for example with apt-get install bats), run:
```bash
bats tests
```
This executes the test suite under the tests/ directory.
Owner
- Name: Arby Abood
- Login: aa9gj
- Kind: user
- Location: Charlottesville, VA
- Company: University of Virginia
- Repositories: 6
- Profile: https://github.com/aa9gj
Graduate student @cphg | big data enthusiast
GitHub Events
Total
- Push event: 8
- Pull request event: 4
- Create event: 2
Last Year
- Push event: 8
- Pull request event: 4
- Create event: 2