nwgs_pipeline
Science Score: 57.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 4 DOI reference(s) in README
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (9.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: VilhelmMagnusLab
- License: mit
- Language: Perl
- Default Branch: main
- Size: 70.4 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
nWGS_pipeline: Nanopore Whole Genome Sequencing Pipeline
Overview
nWGS_pipeline is a comprehensive bioinformatics pipeline for analyzing Central Nervous System (CNS) samples using Oxford Nanopore sequencing data. It integrates multiple analyses including CNV detection, methylation profiling, structural variant calling, and MGMT promoter status determination.
Pipeline Schematic
The nWGS pipeline follows a modular architecture with three main Nextflow modules (run_mode_mergebam, run_mode_epi2me, and run_mode_analysis) that can be run independently or sequentially:
Pipeline workflow showing the flow from Nanopore BAM files through Mergebam, Epi2me, and Analysis modules to final PDF reports.
Quick Start
Prerequisites
- Docker (Desktop/Local) or Singularity/Apptainer (HPC)
- Nextflow (auto-installed by setup scripts)
One-Command Setup & Run
For Docker (Desktop/Local):
```bash
git clone https://github.com/VilhelmMagnusLab/nWGS_pipeline.git
cd nWGS_pipeline
chmod +x setup_docker.sh
./setup_docker.sh
./run_pipeline_docker.sh --run_mode_order --sample_id YOUR_SAMPLE_ID
```
For Singularity/Apptainer (HPC):
```bash
git clone https://github.com/VilhelmMagnusLab/nWGS_pipeline.git
cd nWGS_pipeline
chmod +x setup_singularity.sh
./setup_singularity.sh
./run_pipeline_singularity.sh --run_mode_order --sample_id YOUR_SAMPLE_ID
```
Pipeline Modules
The pipeline consists of three main modules that can be run independently or sequentially:
1. Mergebam Pipeline
- Merges multiple BAM files per sample
- Extracts regions of interest using OCC.protein_coding.bed
2. Epi2me Pipeline
Three independent analysis types:
| Analysis | Tool | Purpose | Output |
|----------|------|---------|---------|
| Modified Base Calling | Modkit | DNA modifications (5mC, 5hmC) | *_wf_mods.bedmethyl.gz |
| Structural Variants | Sniffles2 | Structural variant detection | *.vcf.gz |
| Copy Number Variation | QDNAseq | CNV detection | *_segs.bed, *_bins.bed, *_segs.vcf |
3. Analysis Pipeline
- MGMT methylation analysis using EPIC array sites
- NanoDx neural network classification
- Structural variant annotation with Svanna
- CNV analysis with ACE tumor content determination
- Comprehensive reporting (HTML, IGV snapshots, Circos plots, Markdown)
Container Systems
| Feature | Docker | Singularity/Apptainer |
|---------|--------|----------------------|
| Best for | Desktop/Local | HPC/Shared systems |
| Setup Script | setup_docker.sh | setup_singularity.sh |
| Run Script | run_pipeline_docker.sh | run_pipeline_singularity.sh |
All containers are automatically downloaded from vilhelmmagnuslab Docker Hub.
Usage Examples
Complete Pipeline (Recommended)
```bash
# Docker
./run_pipeline_docker.sh --run_mode_order --sample_id T001

# Singularity/Apptainer
./run_pipeline_singularity.sh --run_mode_order --sample_id T001
```
Individual Modules
Docker Commands:

```bash
# Mergebam only
./run_pipeline_docker.sh --run_mode_mergebam

# Epi2me analyses
./run_pipeline_docker.sh --run_mode_epi2me all     # All Epi2me analyses
./run_pipeline_docker.sh --run_mode_epi2me modkit  # Modified base calling only
./run_pipeline_docker.sh --run_mode_epi2me cnv     # CNV analysis only
./run_pipeline_docker.sh --run_mode_epi2me sv      # Structural variants only

# Analysis modules
./run_pipeline_docker.sh --run_mode_analysis all       # All analyses
./run_pipeline_docker.sh --run_mode_analysis mgmt      # MGMT analysis only
./run_pipeline_docker.sh --run_mode_analysis cnv       # CNV analysis only
./run_pipeline_docker.sh --run_mode_analysis svanna_sv # Svanna SV annotation only
./run_pipeline_docker.sh --run_mode_analysis terp      # TERT promoter (TERTp) analysis only
./run_pipeline_docker.sh --run_mode_analysis occ       # Clair3 and ClairS-TO annotation using the OCC region-of-interest BAM file
./run_pipeline_docker.sh --run_mode_analysis rmd       # Markdown report only
```
Singularity/Apptainer Commands:

```bash
# Mergebam only
./run_pipeline_singularity.sh --run_mode_mergebam

# Epi2me analyses
./run_pipeline_singularity.sh --run_mode_epi2me all     # All Epi2me analyses
./run_pipeline_singularity.sh --run_mode_epi2me modkit  # Modified base calling only
./run_pipeline_singularity.sh --run_mode_epi2me cnv     # CNV analysis only
./run_pipeline_singularity.sh --run_mode_epi2me sv      # Structural variants only

# Analysis modules
./run_pipeline_singularity.sh --run_mode_analysis all       # All analyses
./run_pipeline_singularity.sh --run_mode_analysis mgmt      # MGMT analysis only
./run_pipeline_singularity.sh --run_mode_analysis cnv       # CNV analysis only
./run_pipeline_singularity.sh --run_mode_analysis svanna_sv # Svanna SV annotation only
./run_pipeline_singularity.sh --run_mode_analysis terp      # TERT promoter (TERTp) analysis only
./run_pipeline_singularity.sh --run_mode_analysis occ       # Clair3 and ClairS-TO annotation using the OCC region-of-interest BAM file
./run_pipeline_singularity.sh --run_mode_analysis rmd       # Markdown report only
```
Input Requirements
Sample ID File Format
```
# For the analysis pipeline (with tumor content)
sample_id1 0.75    # 75% tumor content
sample_id2         # Auto-calculate with ACE

# For the mergebam pipeline (with flow cell)
sample_id1 flowcell_id1
sample_id2 flowcell_id2
```
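As an illustration of how the optional tumor-content column could be consumed, the sketch below parses a file in the analysis-pipeline format shown above. This is a minimal, hypothetical example for clarity only; it is not the pipeline's own parser, and the file contents are made up to match the format.

```shell
set -eu

# Create an example sample ID file matching the format above
sample_file="$(mktemp)"
printf 'sample_id1 0.75\nsample_id2\n' > "$sample_file"

# First field: sample ID. Optional second field: tumor content.
# When the second field is absent, ACE estimates the tumor content.
summary=""
while read -r sample_id tumor_content _rest; do
  [ -n "$sample_id" ] || continue
  summary="$summary$sample_id=${tumor_content:-auto};"
done < "$sample_file"

echo "$summary"   # sample_id1=0.75;sample_id2=auto;
rm -f "$sample_file"
```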
Directory Structure
```
/path/to/data/
  reference/            # Reference files
  humandb/              # Annotation files
  testdata/             # Input data
    sample_ids.txt      # Sample ID file
    single_bam_folder/  # BAM files
  results/              # Output (auto-created)
```
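The layout above can be created up front so the pipeline finds every path it expects. A minimal sketch, assuming only the directory names listed above (the base path here is a throwaway example; substitute your own data directory):

```shell
set -eu

# Sketch: create the expected data-directory skeleton under a base path.
# "$BASE" is an example location created for demonstration.
BASE="$(mktemp -d)/data"

mkdir -p "$BASE/reference" \
         "$BASE/humandb" \
         "$BASE/testdata/single_bam_folder" \
         "$BASE/results"
touch "$BASE/testdata/sample_ids.txt"

ls "$BASE"
```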
Required Reference Data
Reference Files Required
The following reference files must be downloaded and placed in the data/reference/ directory:
Analysis-specific files:
- OCC.fusions.bed - Fusion genes
- EPIC_sites_NEW.bed - Methylation sites
- MGMT_CpG_Island.hg38.bed - MGMT CpG islands
- OCC.SNV.screening.bed - SNV screening regions (region of interest bed file)
- TERTp_variants.bed - TERT promoter variants
- human_GRCh38_trf.bed - Tandem repeat regions
- Other files downloaded from Zenodo should be placed in data/reference/
Annotation databases (place in data/humandb/):
- hg38_refGene.txt - RefGene annotation
- hg38_refGeneMrna.fa - RefGene mRNA sequences
- hg38_clinvar_20240611.txt - ClinVar annotations
- hg38_cosmic100coding2024.txt - Cosmic annotations
Svanna databases (place in data/reference/):
- svanna-data.zip - unzip after download and place the contents in the reference folder; alternatively, the database can be downloaded from https://github.com/monarch-initiative/SvAnna
nanoDx scripts and files (place in data/reference/):
Move the nanoDx folder from the pipeline root into data/reference/, then copy the following downloaded files into nanoDx/static/:
- Capper_et_al.h5 (model file)
- Capper_et_al.h5.md5 (checksum)
- Capper_et_al_NN.pkl (neural network)
Download files from Zenodo and place them in the appropriate directories.
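Before a run it can be useful to confirm the analysis-specific BED files listed above are actually in place. A hedged sketch, assuming only the file names from this README (the `check_refs` helper is illustrative, not part of the pipeline; the demonstration runs against a throwaway directory seeded with two of the six files):

```shell
set -eu

# Sketch: verify that the reference files listed above are present before
# launching the pipeline. Point check_refs at your data/reference/ directory.
check_refs() {
  dir="$1"; missing=0
  for f in OCC.fusions.bed EPIC_sites_NEW.bed MGMT_CpG_Island.hg38.bed \
           OCC.SNV.screening.bed TERTp_variants.bed human_GRCh38_trf.bed; do
    [ -f "$dir/$f" ] || { echo "missing: $dir/$f"; missing=$((missing + 1)); }
  done
  return 0
}

# Demonstration against a throwaway directory holding two of the six files
demo="$(mktemp -d)"
touch "$demo/OCC.fusions.bed" "$demo/EPIC_sites_NEW.bed"
check_refs "$demo"
echo "$missing missing"
```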
Directory Structure Setup
After downloading the reference files, your directory structure should look like this:
```
data/
  reference/                      # Reference files
    GRCh38.fa
    GRCh38.fa.fai
    gencode.v48.annotation.gff3
    OCC.fusions.bed
    EPIC_sites_NEW.bed
    MGMT_CpG_Island.hg38.bed
    OCC.SNV.screening.bed
    TERTp_variants.bed
    human_GRCh38_trf.bed
    etc.
  humandb/                        # Annotation databases
    hg38_refGene.txt
    hg38_refGeneMrna.fa
    hg38_clinvar_20240611.txt
    hg38_cosmic100coding2024.txt
  testdata/                       # Your input data
    sample_ids.txt
    bams/                         # BAM files
  results/                        # Output (auto-created)
```
External Downloads Required
Place this file in data/reference/:
- Gencode annotation: Gencode v48
- Download and place as: gencode.v48.annotation.gff3
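A sketch of fetching and renaming the Gencode annotation. The download URL follows GENCODE's usual FTP layout and is an assumption, not taken from this README; confirm it on the GENCODE v48 release page before use (the actual download is left commented out):

```shell
set -eu

# Sketch: fetch the GENCODE v48 annotation and place it under data/reference/.
# NOTE: the URL below is assumed from GENCODE's usual FTP layout; verify it
# against the GENCODE release page before downloading.
url="https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_48/gencode.v48.annotation.gff3.gz"
dest="data/reference/gencode.v48.annotation.gff3"

echo "would fetch: $url -> $dest"
# Uncomment to actually download and unpack:
# mkdir -p "$(dirname "$dest")"
# wget -O "$dest.gz" "$url"
# gunzip "$dest.gz"
```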
ACE Tumor Content Calculation
The pipeline intelligently handles tumor content:
- Provided value: Use directly if specified in sample ID file
- Auto-calculation: ACE analyzes copy number profiles to estimate tumor cellularity
- Multiple estimates: ACE provides several estimates and selects the best fit
- Results: Saved in ${sample_id}_ace_results/threshold_value.txt
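The auto-calculated value can be picked up afterwards from the results file named above. A minimal sketch, assuming threshold_value.txt holds a single numeric value (the file here is a stand-in written for demonstration, not real ACE output):

```shell
set -eu

# Sketch: read the tumor-content estimate that ACE writes to
# ${sample_id}_ace_results/threshold_value.txt (path per the README above;
# a single numeric value per file is an assumption).
sample_id="T001"
ace_dir="$(mktemp -d)/${sample_id}_ace_results"
mkdir -p "$ace_dir"
echo "0.62" > "$ace_dir/threshold_value.txt"   # stand-in for a real ACE run

tumor_content="$(cat "$ace_dir/threshold_value.txt")"
echo "estimated tumor content for $sample_id: $tumor_content"
```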
Output Structure
```
results/
  mergebam/
    merge_bam/     # Merged BAM files
    occ_bam/       # Regions-of-interest BAMs
  epi2me/
    episv/         # Structural variants
    modkit/        # Modified base calling
    epicnv/        # Copy number variations
  analysis/
    cnv/           # CNV analysis with ACE
    sv/            # Structural variant annotation
    methylation/   # MGMT methylation analysis
    reports/       # Comprehensive reports
```
Report Generation
Standard Report Generation
PDF reports are automatically generated when running the pipeline with the following modes:
- --run_mode_analysis rmd - Generate reports only
- --run_mode_analysis all - Run all analyses and generate reports
- --run_mode_order - Run complete pipeline sequentially and generate reports
The reports are automatically created in the results/report/ directory with the name {sample_id}_markdown_pipeline_report.pdf.
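The expected report path can be derived directly from that naming pattern, which is handy for scripting a post-run check. A small sketch (the existence check is illustrative; only the path pattern comes from the README):

```shell
set -eu

# Sketch: build the expected report path from the pattern stated above,
# {sample_id}_markdown_pipeline_report.pdf under results/report/.
sample_id="T001"
report="results/report/${sample_id}_markdown_pipeline_report.pdf"

if [ -f "$report" ]; then
  echo "report ready: $report"
else
  echo "report not found yet: $report"
fi
```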
Additional Report Generation
The generate_report.sh script is provided for additional report generation in cases where:
- You want to regenerate reports after re-running specific processes
- You need to create reports for samples that were processed separately
- You need to generate reports after the pipeline has already completed
Configuration
Path Configuration
Update the base path in all configuration files to point to your data directory:
```groovy
// conf/analysis.config, conf/epi2me.config, conf/mergebam.config
params {
    path = "/path/to/your/data/directory" // Update this to your data directory
}
```
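One way to update that parameter across the config files is an in-place substitution. A hedged sketch, demonstrated on a throwaway copy of the snippet above (point the same `sed` at the repo's conf/*.config files; the replacement path is an example):

```shell
set -eu

# Sketch: update the params.path line in a config file in place.
# Demonstrated on a temporary copy so it is safe to run anywhere.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
params {
    path = "/path/to/your/data/directory" // Update this to your data directory
}
EOF

new_path="/data/nwgs"   # example value; use your own data directory
sed -i.bak "s|path = \".*\"|path = \"$new_path\"|" "$cfg"
grep 'path =' "$cfg"
```

The `-i.bak` form keeps a backup of the original file, which matters when editing the real conf/*.config files rather than a temp copy.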
Container Configuration
Choose your preferred container engine:
For Docker:
- Uncomment Docker containers in configuration files
- Comment out Singularity/Apptainer containers
- Run: ./setup_docker.sh
For Singularity/Apptainer:
- Use default Singularity/Apptainer containers
- Run: ./setup_singularity.sh
Quick Setup Guide
1. Download reference files from Zenodo
2. Place files in the appropriate directories (data/reference/ and data/humandb/)
3. Update paths in the configuration files (conf/*.config)
4. Choose a container engine (Docker or Singularity/Apptainer)
5. Run the setup script:

```bash
# For Docker
./setup_docker.sh

# For Singularity/Apptainer
./setup_singularity.sh
```

6. Test the pipeline:

```bash
# For Docker
./test_pipeline_docker.sh

# For Singularity/Apptainer
./test_pipeline_singularity.sh
```
Work Directory Customization
You can specify a custom temporary work directory using the -w flag. This is useful for:
- Managing disk space on different storage locations
- Avoiding permission issues
- Organizing temporary files
Example:

```bash
# Docker
./run_pipeline_docker.sh --run_mode_analysis tertp -w /path/to/your/work/dir

# Singularity/Apptainer
./run_pipeline_singularity.sh --run_mode_analysis tertp -w /home/chbope/extension/trash/tmp
```
Note: The -w flag sets Nextflow's work directory, where temporary files and intermediate results are stored during pipeline execution. By default, Nextflow creates a work folder in the current working directory.
Log Output Customization
You can specify a custom log directory using the --log-dir flag.
Example:

```bash
# Docker
./run_pipeline_docker.sh --run_mode_analysis mgmt --log-dir /path/to/logs

# Singularity/Apptainer
./run_pipeline_singularity.sh --run_mode_analysis mgmt --log-dir /path/to/logs
```
Note: Logs include execution reports, timelines, traces, and Nextflow logs, automatically organized by sample ID.
Troubleshooting
Common Issues
- Container engine conflict: Ensure only one container system is enabled
- Missing reference files: Download required external files
- Permission issues: Check container and file permissions
Verification Commands
```bash
# Check containers
docker images | grep vilhelmmagnuslab  # Docker
ls -la containers/*.sif                # Singularity

# Test pipeline
./test_pipeline_docker.sh       # Docker
./test_pipeline_singularity.sh  # Singularity
```
Support
- Documentation: DOCKER_SETUP.md, SINGULARITY_SETUP.md
- Issues: GitHub Issues
- Contact:
- Christian Domilongo Bope (chbope@ous-hf.no / christianbope@gmail.com)
- Skarphedinn Halldorsson (skahal@ous-hf.no / skabbi@gmail.com)
- Richard Nagymihaly (ricnag@ous-hf.no)
Citation
If you use this pipeline in your research, please cite:
[Citation details to be added]
License
This project is licensed under the MIT License - see the LICENSE file for details.
Disclaimer
This nanopore whole genome sequencing (nWGS) pipeline is a research tool currently under development. It has not been clinically validated in sufficiently large cohorts. Interpretation and implementation of the results in a clinical setting is the sole responsibility of the treating physician.
Owner
- Login: VilhelmMagnusLab
- Kind: user
- Repositories: 1
- Profile: https://github.com/VilhelmMagnusLab
Citation (CITATIONS.md)
# wf-human-gbm: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

<!-- TODO nf-core: Add citation for all tools used in your pipeline -->

* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

## Data

<!-- TODO nf-core: Add citation for reference data used in your pipeline -->
GitHub Events
Total
- Push event: 11
- Public event: 1
Last Year
- Push event: 11
- Public event: 1