nwgs_pipeline
Science Score: 57.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 4 DOI reference(s) in README
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (9.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: VilhelmMagnusLab
- License: mit
- Language: Perl
- Default Branch: main
- Size: 70.4 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
nWGS_pipeline: Nanopore Whole Genome Sequencing Pipeline
Overview
nWGS_pipeline is a comprehensive bioinformatics pipeline for analyzing Central Nervous System (CNS) samples using Oxford Nanopore sequencing data. It integrates multiple analyses including CNV detection, methylation profiling, structural variant calling, and MGMT promoter status determination.
Pipeline Schematic
The nWGS pipeline follows a modular architecture with three main Nextflow modules (run_mode_mergebam, run_mode_epi2me, and run_mode_analysis) that can be run independently or sequentially:
Pipeline workflow showing the flow from Nanopore BAM files through Mergebam, Epi2me, and Analysis modules to final PDF reports.
Quick Start
Prerequisites
- Docker (Desktop/Local) or Singularity/Apptainer (HPC)
- Nextflow (auto-installed by setup scripts)
One-Command Setup & Run
For Docker (Desktop/Local):
```bash
git clone https://github.com/VilhelmMagnusLab/nWGS_pipeline.git
cd nWGS_pipeline
chmod +x setup_docker.sh
./setup_docker.sh
./run_pipeline_docker.sh --run_mode_order --sample_id YOUR_SAMPLE_ID
```
For Singularity/Apptainer (HPC):
```bash
git clone https://github.com/VilhelmMagnusLab/nWGS_pipeline.git
cd nWGS_pipeline
chmod +x setup_singularity.sh
./setup_singularity.sh
./run_pipeline_singularity.sh --run_mode_order --sample_id YOUR_SAMPLE_ID
```
Pipeline Modules
The pipeline consists of three main modules that can be run independently or sequentially:
1. Mergebam Pipeline
- Merges multiple BAM files per sample
- Extracts regions of interest using OCC.protein_coding.bed
2. Epi2me Pipeline
Three independent analysis types:
| Analysis | Tool | Purpose | Output |
|----------|------|---------|---------|
| Modified Base Calling | Modkit | DNA modifications (5mC, 5hmC) | *_wf_mods.bedmethyl.gz |
| Structural Variants | Sniffles2 | Structural variant detection | *.vcf.gz |
| Copy Number Variation | QDNAseq | CNV detection | *_segs.bed, *_bins.bed, *_segs.vcf |
3. Analysis Pipeline
- MGMT methylation analysis using EPIC array sites
- NanoDx neural network classification
- Structural variant annotation with Svanna
- CNV analysis with ACE tumor content determination
- Comprehensive reporting (HTML, IGV snapshots, Circos plots, Markdown)
Container Systems
| Feature | Docker | Singularity/Apptainer |
|---------|--------|----------------------|
| Best for | Desktop/Local | HPC/Shared systems |
| Setup Script | setup_docker.sh | setup_singularity.sh |
| Run Script | run_pipeline_docker.sh | run_pipeline_singularity.sh |
All containers are automatically downloaded from vilhelmmagnuslab Docker Hub.
Usage Examples
Complete Pipeline (Recommended)
```bash
# Docker
./run_pipeline_docker.sh --run_mode_order --sample_id T001

# Singularity/Apptainer
./run_pipeline_singularity.sh --run_mode_order --sample_id T001
```
Individual Modules
Docker Commands:

```bash
# Mergebam only
./run_pipeline_docker.sh --run_mode_mergebam

# Epi2me analyses
./run_pipeline_docker.sh --run_mode_epi2me all     # All Epi2me analyses
./run_pipeline_docker.sh --run_mode_epi2me modkit  # Modified base calling only
./run_pipeline_docker.sh --run_mode_epi2me cnv     # CNV analysis only
./run_pipeline_docker.sh --run_mode_epi2me sv      # Structural variants only

# Analysis modules
./run_pipeline_docker.sh --run_mode_analysis all       # All analyses
./run_pipeline_docker.sh --run_mode_analysis mgmt      # MGMT analysis only
./run_pipeline_docker.sh --run_mode_analysis cnv       # CNV analysis only
./run_pipeline_docker.sh --run_mode_analysis svanna_sv # Svanna SV annotation only
./run_pipeline_docker.sh --run_mode_analysis terp      # TERT promoter (TERTp) analysis only
./run_pipeline_docker.sh --run_mode_analysis occ       # Clair3 and ClairS-TO annotation using the OCC region-of-interest BAM file
./run_pipeline_docker.sh --run_mode_analysis rmd       # Markdown report only
```
Singularity/Apptainer Commands:

```bash
# Mergebam only
./run_pipeline_singularity.sh --run_mode_mergebam

# Epi2me analyses
./run_pipeline_singularity.sh --run_mode_epi2me all     # All Epi2me analyses
./run_pipeline_singularity.sh --run_mode_epi2me modkit  # Modified base calling only
./run_pipeline_singularity.sh --run_mode_epi2me cnv     # CNV analysis only
./run_pipeline_singularity.sh --run_mode_epi2me sv      # Structural variants only

# Analysis modules
./run_pipeline_singularity.sh --run_mode_analysis all       # All analyses
./run_pipeline_singularity.sh --run_mode_analysis mgmt      # MGMT analysis only
./run_pipeline_singularity.sh --run_mode_analysis cnv       # CNV analysis only
./run_pipeline_singularity.sh --run_mode_analysis svanna_sv # Svanna SV annotation only
./run_pipeline_singularity.sh --run_mode_analysis terp      # TERT promoter (TERTp) analysis only
./run_pipeline_singularity.sh --run_mode_analysis occ       # Clair3 and ClairS-TO annotation using the OCC region-of-interest BAM file
./run_pipeline_singularity.sh --run_mode_analysis rmd       # Markdown report only
```
Input Requirements
Sample ID File Format
```
# For the analysis pipeline (with tumor content)
sample_id1 0.75    # 75% tumor content
sample_id2         # Auto-calculate with ACE

# For the mergebam pipeline (with flow cell)
sample_id1 flowcell_id1
sample_id2 flowcell_id2
```
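As an illustration of how the optional tumor-content column could be consumed, the sketch below parses a file in the analysis-pipeline format shown above. This is a minimal, hypothetical example for clarity only; it is not the pipeline's own parser, and the file contents are made up to match the format.

```shell
set -eu

# Create an example sample ID file matching the format above
sample_file="$(mktemp)"
printf 'sample_id1 0.75\nsample_id2\n' > "$sample_file"

# First field: sample ID. Optional second field: tumor content.
# When the second field is absent, ACE estimates the tumor content.
summary=""
while read -r sample_id tumor_content _rest; do
  [ -n "$sample_id" ] || continue
  summary="$summary$sample_id=${tumor_content:-auto};"
done < "$sample_file"

echo "$summary"   # sample_id1=0.75;sample_id2=auto;
rm -f "$sample_file"
```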
Directory Structure
```
/path/to/data/
  reference/            # Reference files
  humandb/              # Annotation files
  testdata/             # Input data
    sample_ids.txt      # Sample ID file
    single_bam_folder/  # BAM files
  results/              # Output (auto-created)
```
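The layout above can be created up front so the pipeline finds every path it expects. A minimal sketch, assuming only the directory names listed above (the base path here is a throwaway example; substitute your own data directory):

```shell
set -eu

# Sketch: create the expected data-directory skeleton under a base path.
# "$BASE" is an example location created for demonstration.
BASE="$(mktemp -d)/data"

mkdir -p "$BASE/reference" \
         "$BASE/humandb" \
         "$BASE/testdata/single_bam_folder" \
         "$BASE/results"
touch "$BASE/testdata/sample_ids.txt"

ls "$BASE"
```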
Required Reference Data
Reference Files Required
The following reference files must be downloaded and placed in the data/reference/ directory:
Analysis-specific files:
- OCC.fusions.bed - Fusion genes
- EPIC_sites_NEW.bed - Methylation sites
- MGMT_CpG_Island.hg38.bed - MGMT CpG islands
- OCC.SNV.screening.bed - SNV screening regions (region of interest bed file)
- TERTp_variants.bed - TERT promoter variants
- human_GRCh38_trf.bed - Tandem repeat regions
- Other files downloaded from Zenodo should be placed in data/reference/
Annotation databases (place in data/humandb/):
- hg38_refGene.txt - RefGene annotation
- hg38_refGeneMrna.fa - RefGene mRNA sequences
- hg38_clinvar_20240611.txt - ClinVar annotations
- hg38_cosmic100coding2024.txt - Cosmic annotations
Svanna databases (place in data/reference/):
- svanna-data.zip - unzip after download and place the contents in the reference folder; alternatively, the database can be downloaded from https://github.com/monarch-initiative/SvAnna
nanoDx scripts and files (place in data/reference/):
Move the nanoDx folder from the pipeline root into data/reference/, then copy the following downloaded files into nanoDx/static/:
- Capper_et_al.h5 (model file)
- Capper_et_al.h5.md5 (checksum)
- Capper_et_al_NN.pkl (neural network)
Download files from Zenodo and place them in the appropriate directories.
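Before a run it can be useful to confirm the analysis-specific BED files listed above are actually in place. A hedged sketch, assuming only the file names from this README (the `check_refs` helper is illustrative, not part of the pipeline; the demonstration runs against a throwaway directory seeded with two of the six files):

```shell
set -eu

# Sketch: verify that the reference files listed above are present before
# launching the pipeline. Point check_refs at your data/reference/ directory.
check_refs() {
  dir="$1"; missing=0
  for f in OCC.fusions.bed EPIC_sites_NEW.bed MGMT_CpG_Island.hg38.bed \
           OCC.SNV.screening.bed TERTp_variants.bed human_GRCh38_trf.bed; do
    [ -f "$dir/$f" ] || { echo "missing: $dir/$f"; missing=$((missing + 1)); }
  done
  return 0
}

# Demonstration against a throwaway directory holding two of the six files
demo="$(mktemp -d)"
touch "$demo/OCC.fusions.bed" "$demo/EPIC_sites_NEW.bed"
check_refs "$demo"
echo "$missing missing"
```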
Directory Structure Setup
After downloading the reference files, your directory structure should look like this:
```
data/
  reference/                      # Reference files
    GRCh38.fa
    GRCh38.fa.fai
    gencode.v48.annotation.gff3
    OCC.fusions.bed
    EPIC_sites_NEW.bed
    MGMT_CpG_Island.hg38.bed
    OCC.SNV.screening.bed
    TERTp_variants.bed
    human_GRCh38_trf.bed
    etc.
  humandb/                        # Annotation databases
    hg38_refGene.txt
    hg38_refGeneMrna.fa
    hg38_clinvar_20240611.txt
    hg38_cosmic100coding2024.txt
  testdata/                       # Your input data
    sample_ids.txt
    bams/                         # BAM files
  results/                        # Output (auto-created)
```
External Downloads Required
Place this file in data/reference/:
- Gencode annotation: Gencode v48
- Download and place as: gencode.v48.annotation.gff3
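A sketch of fetching and renaming the Gencode annotation. The download URL follows GENCODE's usual FTP layout and is an assumption, not taken from this README; confirm it on the GENCODE v48 release page before use (the actual download is left commented out):

```shell
set -eu

# Sketch: fetch the GENCODE v48 annotation and place it under data/reference/.
# NOTE: the URL below is assumed from GENCODE's usual FTP layout; verify it
# against the GENCODE release page before downloading.
url="https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_48/gencode.v48.annotation.gff3.gz"
dest="data/reference/gencode.v48.annotation.gff3"

echo "would fetch: $url -> $dest"
# Uncomment to actually download and unpack:
# mkdir -p "$(dirname "$dest")"
# wget -O "$dest.gz" "$url"
# gunzip "$dest.gz"
```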
ACE Tumor Content Calculation
The pipeline intelligently handles tumor content:
- Provided value: Use directly if specified in sample ID file
- Auto-calculation: ACE analyzes copy number profiles to estimate tumor cellularity
- Multiple estimates: ACE provides several estimates and selects the best fit
- Results: Saved in ${sample_id}_ace_results/threshold_value.txt
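The auto-calculated value can be picked up afterwards from the results file named above. A minimal sketch, assuming threshold_value.txt holds a single numeric value (the file here is a stand-in written for demonstration, not real ACE output):

```shell
set -eu

# Sketch: read the tumor-content estimate that ACE writes to
# ${sample_id}_ace_results/threshold_value.txt (path per the README above;
# a single numeric value per file is an assumption).
sample_id="T001"
ace_dir="$(mktemp -d)/${sample_id}_ace_results"
mkdir -p "$ace_dir"
echo "0.62" > "$ace_dir/threshold_value.txt"   # stand-in for a real ACE run

tumor_content="$(cat "$ace_dir/threshold_value.txt")"
echo "estimated tumor content for $sample_id: $tumor_content"
```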
Output Structure
```
results/
  mergebam/
    merge_bam/     # Merged BAM files
    occ_bam/       # Regions-of-interest BAMs
  epi2me/
    episv/         # Structural variants
    modkit/        # Modified base calling
    epicnv/        # Copy number variations
  analysis/
    cnv/           # CNV analysis with ACE
    sv/            # Structural variant annotation
    methylation/   # MGMT methylation analysis
    reports/       # Comprehensive reports
```
Report Generation
Standard Report Generation
PDF reports are automatically generated when running the pipeline with the following modes:
- --run_mode_analysis rmd - Generate reports only
- --run_mode_analysis all - Run all analyses and generate reports
- --run_mode_order - Run complete pipeline sequentially and generate reports
The reports are automatically created in the results/report/ directory with the name {sample_id}_markdown_pipeline_report.pdf.
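The expected report path can be derived directly from that naming pattern, which is handy for scripting a post-run check. A small sketch (the existence check is illustrative; only the path pattern comes from the README):

```shell
set -eu

# Sketch: build the expected report path from the pattern stated above,
# {sample_id}_markdown_pipeline_report.pdf under results/report/.
sample_id="T001"
report="results/report/${sample_id}_markdown_pipeline_report.pdf"

if [ -f "$report" ]; then
  echo "report ready: $report"
else
  echo "report not found yet: $report"
fi
```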
Additional Report Generation
The generate_report.sh script is provided for additional report generation in cases where:
- You want to regenerate reports after re-running specific processes
- You need to create reports for samples that were processed separately
- You need to generate reports after the pipeline has already completed
Configuration
Path Configuration
Update the base path in all configuration files to point to your data directory:
```groovy
// conf/analysis.config, conf/epi2me.config, conf/mergebam.config
params {
    path = "/path/to/your/data/directory" // Update this to your data directory
}
```
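One way to update that parameter across the config files is an in-place substitution. A hedged sketch, demonstrated on a throwaway copy of the snippet above (point the same `sed` at the repo's conf/*.config files; the replacement path is an example):

```shell
set -eu

# Sketch: update the params.path line in a config file in place.
# Demonstrated on a temporary copy so it is safe to run anywhere.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
params {
    path = "/path/to/your/data/directory" // Update this to your data directory
}
EOF

new_path="/data/nwgs"   # example value; use your own data directory
sed -i.bak "s|path = \".*\"|path = \"$new_path\"|" "$cfg"
grep 'path =' "$cfg"
```

The `-i.bak` form keeps a backup of the original file, which matters when editing the real conf/*.config files rather than a temp copy.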
Container Configuration
Choose your preferred container engine:
For Docker:
- Uncomment Docker containers in configuration files
- Comment out Singularity/Apptainer containers
- Run: ./setup_docker.sh
For Singularity/Apptainer:
- Use default Singularity/Apptainer containers
- Run: ./setup_singularity.sh
Quick Setup Guide
1. Download reference files from Zenodo
2. Place files in the appropriate directories (data/reference/ and data/humandb/)
3. Update paths in the configuration files (conf/*.config)
4. Choose a container engine (Docker or Singularity/Apptainer)
5. Run the setup script:

```bash
# For Docker
./setup_docker.sh

# For Singularity/Apptainer
./setup_singularity.sh
```

6. Test the pipeline:

```bash
# For Docker
./test_pipeline_docker.sh

# For Singularity/Apptainer
./test_pipeline_singularity.sh
```
Work Directory Customization
You can specify a custom temporary work directory using the -w flag. This is useful for:
- Managing disk space on different storage locations
- Avoiding permission issues
- Organizing temporary files
Example:

```bash
# Docker
./run_pipeline_docker.sh --run_mode_analysis tertp -w /path/to/your/work/dir

# Singularity/Apptainer
./run_pipeline_singularity.sh --run_mode_analysis tertp -w /home/chbope/extension/trash/tmp
```
Note: The -w flag sets Nextflow's work directory, where temporary files and intermediate results are stored during pipeline execution. By default, Nextflow creates a work folder in the current working directory.
Log Output Customization
You can specify a custom log directory using the --log-dir flag.
Example:

```bash
# Docker
./run_pipeline_docker.sh --run_mode_analysis mgmt --log-dir /path/to/logs

# Singularity/Apptainer
./run_pipeline_singularity.sh --run_mode_analysis mgmt --log-dir /path/to/logs
```
Note: Logs include execution reports, timelines, traces, and Nextflow logs, automatically organized by sample ID.
Troubleshooting
Common Issues
- Container engine conflict: Ensure only one container system is enabled
- Missing reference files: Download required external files
- Permission issues: Check container and file permissions
Verification Commands
```bash
# Check containers
docker images | grep vilhelmmagnuslab  # Docker
ls -la containers/*.sif                # Singularity

# Test pipeline
./test_pipeline_docker.sh       # Docker
./test_pipeline_singularity.sh  # Singularity
```
Support
- Documentation: DOCKER_SETUP.md, SINGULARITY_SETUP.md
- Issues: GitHub Issues
- Contact:
- Christian Domilongo Bope (chbope@ous-hf.no / christianbope@gmail.com)
- Skarphedinn Halldorsson (skahal@ous-hf.no / skabbi@gmail.com)
- Richard Nagymihaly (ricnag@ous-hf.no)
Citation
If you use this pipeline in your research, please cite:
[Citation details to be added]
License
This project is licensed under the MIT License - see the LICENSE file for details.
Disclaimer
This nanopore whole genome sequencing (nWGS) pipeline is a research tool currently under development. It has not been clinically validated in sufficiently large cohorts. Interpretation and implementation of the results in a clinical setting is the sole responsibility of the treating physician.
Owner
- Login: VilhelmMagnusLab
- Kind: user
- Repositories: 1
- Profile: https://github.com/VilhelmMagnusLab
Citation (CITATIONS.md)
# wf-human-gbm: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

<!-- TODO nf-core: Add citation for all tools used in your pipeline -->

* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

## Data

<!-- TODO nf-core: Add citation for reference data used in your pipeline -->
GitHub Events
Total
- Push event: 11
- Public event: 1
Last Year
- Push event: 11
- Public event: 1