cnakepit

A Snakemake pipeline for copy number variant calling without normal tissue samples

https://github.com/pedricolino/cnakepit

Keywords

bioinformatics bwa cna cnv cnvkit conda copynumbervariation mutect2 panels purecn sequencing-data snakemake targeted-sequencing

Repository

Basic Info
  • Host: GitHub
  • Owner: pedricolino
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 3.33 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed 9 months ago

README.md

cnakepit 🐍

Definition

A snake pit is, in a literal sense, a hole filled with snakes. In idiomatic speech, "snake pits" are places of horror, torture and death in European legends and fairy tales. (Wikipedia)

In the field of bioinformatics, a cnakepit is now a Copy Number Alterations detection snaKEmake PIpeline for tumor-only Targeted sequencing data. In the truest sense of the word, this horrific pit is filled with snakes such as pythons, mambas and anacondas.

What is it?

This is a pipeline to (attempt to) call CNAs / CNVs by addressing the following challenges:

  • requires no matched normal sample
  • ideally, requires no reference sample or panel of normals
  • suited for short reads (Illumina)
  • detects (and filters for) somatic CNVs
  • suitable for panels (/targeted sequencing)
  • maintained
  • publicly used (>50 citations if not too new)
  • ideally infers tumor structure and estimates tumor characteristics such as tumor ploidy & purity

This pipeline was originally forked (and later unforked) from a private repo named FFPE-panel-pipeline, which pre-processes raw reads and then maps them.

Installation

  1. Install conda (usually already installed on clusters)
  2. Create a conda environment with Snakemake: conda create -c conda-forge -c bioconda -n snakemake-vanilla snakemake=7.32.3
    Note that this pins an older version of Snakemake because, as of January '24, the newest version is incompatible with the current Snakemake profile of the CUBI cluster and our pipeline_job.sh.
    All other required tools and dependencies will be installed automatically by Snakemake during the first run of the pipeline.
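
As a small illustration of the version pin in step 2, the following Python sketch compares the version string reported by snakemake --version with the pinned 7.32.3. The helper names matches_pin and installed_snakemake_version are hypothetical and not part of the pipeline; the check is an assumption about how one might verify the environment, not something the pipeline does itself.

```python
import subprocess

# Version pinned in the installation command above.
PINNED_VERSION = "7.32.3"


def matches_pin(reported: str, pinned: str = PINNED_VERSION) -> bool:
    """Return True if the reported version string equals the pinned one."""
    return reported.strip() == pinned


def installed_snakemake_version() -> str:
    """Ask the snakemake executable on PATH for its version (hypothetical helper)."""
    result = subprocess.run(
        ["snakemake", "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    version = installed_snakemake_version()
    if matches_pin(version):
        print("Snakemake version matches the pin.")
    else:
        print(f"Found Snakemake {version.strip()}, expected {PINNED_VERSION}.")
```

Run it inside the activated snakemake-vanilla environment so that the snakemake executable on PATH is the one from that environment.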

Usage

  1. Set your configuration in the config/config.yaml file. Most options are documented or self-explanatory.
  2. In the resources directory, add your panel design BED file and your samples (sheet) to the data subdirectory.
  3. Activate your conda environment with Snakemake installed: conda activate snakemake-vanilla.
  4. Before running the pipeline for real, test your configuration with a dry run by adding -n to the Snakemake command. To run the pipeline, either submit the job script to SLURM with sbatch pipeline_job.sh, or call Snakemake directly with a command like snakemake --use-conda or snakemake --use-conda -c 8 (preferably on a compute node).
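
The direct invocations from step 4 can be sketched in Python as follows. This is a minimal illustration, assuming you want to build the command line in a script; the function name snakemake_command is hypothetical, and only the flags already mentioned above (--use-conda, -c, -n) are used.

```python
import shlex


def snakemake_command(cores=None, dry_run=False):
    """Assemble a snakemake invocation as an argument list (hypothetical helper)."""
    cmd = ["snakemake", "--use-conda"]
    if cores is not None:
        # -c sets the number of cores, as in "snakemake --use-conda -c 8".
        cmd += ["-c", str(cores)]
    if dry_run:
        # -n requests a dry run: the plan is shown, nothing is executed.
        cmd.append("-n")
    return cmd


if __name__ == "__main__":
    # First check the configuration with a dry run, then print the real command.
    print(shlex.join(snakemake_command(cores=8, dry_run=True)))
    print(shlex.join(snakemake_command(cores=8)))
```

The returned list can be passed to subprocess.run unchanged, which avoids shell quoting issues.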

General workflow

  • Read preprocessing: QC of the raw data, trimming, QC of the trimmed data
  • Mapping and QC
  • Variant detection and filtering with Mutect2 / GATK
  • CNV calling by CNVkit with different segmentation methods
  • Tumor analysis and CNV correction by PureCN:
    • either hierarchical clustering of CNVs or
    • minimal re-segmentation with PSCBS
  • For testing purposes, a premade panel of normals file (RDS format) can optionally be supplied to PureCN. This skips the coverage computation and segmentation by CNVkit and, consequently, any additional round of CNV calling with another panel of normals.

mermaid
    flowchart TD
      subgraph Main part
        A[raw reads] -->|QC & trimming| B[trimmed reads];
        B -->|QC & mapping| C[mapped reads];
        C -->|variant calling| D[variants];
        D -->|filtering| E[germline variants];
        D -->|filtering| X[somatic variants];
        C -->|compute positional read depth| I[coverage data];
        I -->|CNV calling by CNVkit| F[provisional CNVs];
        E -->|B-allele analysis by CNVkit| F[provisional CNVs];
        F -->|CNV calls correction by PureCN| G[final CNVs];
        X -->|Tumor structure inference by PureCN| G[final CNVs];
      end
      subgraph Optional 2nd round of CNV calling w/ panel of normals
        H -->|bias correction by CNVkit| J[provisional CNVs];
        I -->|CNV calling by CNVkit| J[provisional CNVs];
        E -->|B-allele analysis by CNVkit| J[provisional CNVs];
        G -->|choose appropriate samples among cohort| H[panel of normals];
        J -->|CNV calls correction by PureCN| K[final CNVs];
        X -->|Tumor structure inference by PureCN| K[final CNVs];
      end
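
The dependencies in the main part of the flowchart can also be written down as a small dependency graph. The Python sketch below is only an illustration of how those steps order themselves; it uses the standard-library graphlib module (Python 3.9+), and the names MAIN_WORKFLOW and execution_order are hypothetical. It does not reflect how Snakemake resolves the actual rules of the pipeline.

```python
from graphlib import TopologicalSorter

# Each key is derived from the data products in its set,
# following the edges of the "Main part" subgraph.
MAIN_WORKFLOW = {
    "trimmed reads": {"raw reads"},
    "mapped reads": {"trimmed reads"},
    "variants": {"mapped reads"},
    "germline variants": {"variants"},
    "somatic variants": {"variants"},
    "coverage data": {"mapped reads"},
    "provisional CNVs": {"coverage data", "germline variants"},
    "final CNVs": {"provisional CNVs", "somatic variants"},
}


def execution_order(graph):
    """Return one valid order in which the data products can be produced."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, product in enumerate(execution_order(MAIN_WORKFLOW), start=1):
        print(f"{step}. {product}")
```

Several orders are valid for the middle steps (for example, coverage data and variants are independent of each other), but raw reads always come first and final CNVs last.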

Owner

  • Login: pedricolino
  • Kind: user

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Cnakepit
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Cédric
    family-names: Moris
    email: cedric.moris@bih-charite.de
    affiliation: Berlin Institute of Health at Charité
    orcid: 'https://orcid.org/0009-0007-1978-1600'
  - given-names: Nina
    family-names: Okrožnik
    email: nina.okroznik@charite.de
    affiliation: Charité Comprehensive Cancer Center (CCCC)
    orcid: 'https://orcid.org/0009-0005-8145-0090'
repository-code: 'https://github.com/pedricolino/cnakepit'
abstract: >-
  A Snakemake pipeline for copy number variant calling
  without normal tissue samples.
keywords:
  - copy number changes
  - panel sequencing
  - tumor
  - Snakemake
license: MIT
year: 2024
commit: 0343ea85f0cada08fc6b00a177e4ac6d03116886

GitHub Events

Total
  • Watch event: 1
  • Push event: 17
Last Year
  • Watch event: 1
  • Push event: 17