cnakepit
A Snakemake pipeline for copy number variant calling without normal tissue samples
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 6 DOI reference(s) in README
- ✓ Academic publication links: Links to: biorxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.9%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
cnakepit 🐍
Definition
A snake pit is, in a literal sense, a hole filled with snakes. In idiomatic speech, "snake pits" are places of horror, torture and death in European legends and fairy tales. (Wikipedia)
In the field of bioinformatics, a cnakepit is now a Copy Number Alterations detection snaKEmake PIpeline for tumor-only Targeted sequencing data. In the truest sense of the word, this horrific pit is filled with snakes such as pythons, mambas and anacondas.
What is it?
This is a pipeline to (attempt to) call CNAs / CNVs by addressing the following challenges:
- requires no matched normal sample
- ideally, requires no reference sample or panel of normals
- suited for short reads (Illumina)
- detects (and filters for) somatic CNVs
- suitable for panels (/targeted sequencing)
- maintained
- publicly used (>50 citations if not too new)
- ideally infers tumor structure and estimates tumor characteristics such as tumor ploidy & purity
This pipeline was originally forked (and later unforked) from a private repo named FFPE-panel-pipeline, which pre-processes raw reads and then maps them.
Installation
- Install conda (usually already installed on clusters)
- Create a conda environment with Snakemake:
conda create -c conda-forge -c bioconda -n snakemake-vanilla snakemake=7.32.3
Note that I specified an older version of Snakemake because, as of January 2024, the newest version is incompatible with the current Snakemake profile of the CUBI cluster and our pipeline_job.sh.
All other required tools and dependencies will be installed automatically by Snakemake during the first run of the pipeline.
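To confirm that the activated environment really contains the pinned release, a small check like the following could be used. This is only a sketch: the helper names `parse_version`, `matches_pin` and `installed_snakemake_version` are made up for illustration and are not part of the pipeline. The only value taken from the text above is the pinned version 7.32.3.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pinned in the conda command above.
PINNED = "7.32.3"


def parse_version(text):
    """Turn '7.32.3' into (7, 32, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches_pin(installed, pinned=PINNED):
    """Return True if the installed version equals the pinned one."""
    return parse_version(installed) == parse_version(pinned)


def installed_snakemake_version():
    """Return the installed Snakemake version string, or None if it is missing."""
    try:
        return version("snakemake")
    except PackageNotFoundError:
        return None
```

Calling `installed_snakemake_version()` inside the activated environment and passing the result to `matches_pin` shows whether the pin took effect; a `None` result means Snakemake is not installed in the current environment.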
Usage
- Set your configuration in the config/config.yaml file. Most options are documented or self-explanatory.
- In the resources directory, add your panel design BED file and your samples (sheet) to the data subdirectory.
- Activate your conda environment with Snakemake installed: `conda activate snakemake-vanilla`.
- Before actually running the pipeline, test your configuration with a dry run by adding `-n` to the Snakemake command. The pipeline can be run either by submitting the job script to SLURM with `sbatch pipeline_job.sh` or by calling Snakemake directly with a command like `snakemake --use-conda` or `snakemake --use-conda -c 8` (preferably on a compute node).
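The direct Snakemake calls described above differ only in the core count and the dry-run flag, so they can be assembled programmatically. The following is a minimal sketch; the function name `build_snakemake_command` is hypothetical and not part of the pipeline. It only uses the flags mentioned above (`--use-conda`, `-c`, `-n`).

```python
import shlex


def build_snakemake_command(cores=None, dry_run=False):
    """Assemble the Snakemake call as an argument list."""
    command = ["snakemake", "--use-conda"]
    if cores is not None:
        # -c sets the number of cores, as in `snakemake --use-conda -c 8`.
        command += ["-c", str(cores)]
    if dry_run:
        # -n requests a dry run: jobs are listed but not executed.
        command.append("-n")
    return command


# Print the dry-run variant as a shell-ready string.
print(shlex.join(build_snakemake_command(cores=8, dry_run=True)))
# prints: snakemake --use-conda -c 8 -n
```

The returned list can be passed to `subprocess.run(...)` from within the activated conda environment. Building it as a list rather than a single string avoids quoting problems.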
General workflow
- Read preprocessing: QC of the raw data, trimming, QC of the trimmed data
- Mapping and QC
- Variant detection and filtering with Mutect2 / GATK
- CNV calling by CNVkit with different segmentation methods
- Tumor analysis and CNV correction by PureCN:
  - either hierarchical clustering of CNVs or
  - minimal re-segmentation with PSCBS
- For testing purposes, there is now an option to supply a premade panel of normals file (RDS format) for PureCN. With this option, the coverage computation and segmentation by CNVkit are skipped, and consequently so is any additional round of CNV calling with any other panel of normals.
mermaid
flowchart TD
subgraph Main part
A[raw reads] -->|QC & trimming| B[trimmed reads];
B -->|QC & mapping| C[mapped reads];
C -->|variant calling| D[variants];
D -->|filtering| E[germline variants];
D -->|filtering| X[somatic variants];
C -->|compute positional read depth| I[coverage data];
I -->|CNV calling by CNVkit| F[provisional CNVs];
E -->|B-allele analysis by CNVkit| F[provisional CNVs];
F -->|CNV calls correction by PureCN| G[final CNVs];
X -->|Tumor structure inference by PureCN| G[final CNVs];
end
subgraph Optional 2nd round of CNV calling w/ panel of normals
H -->|bias correction by CNVkit| J[provisional CNVs];
I -->|CNV calling by CNVkit| J[provisional CNVs];
E -->|B-allele analysis by CNVkit| J[provisional CNVs];
G -->|choose appropriate samples among cohort| H[panel of normals];
J -->|CNV calls correction by PureCN| K[final CNVs];
X -->|Tumor structure inference by PureCN| K[final CNVs];
end
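The dependencies in the "Main part" subgraph of the flowchart can also be written down as a plain dependency map and sorted into a valid execution order. This is only an illustration of the data flow, not how the pipeline itself is implemented (Snakemake resolves its own job graph from the rules). The node names are copied from the flowchart; the function name `execution_order` is hypothetical.

```python
from graphlib import TopologicalSorter

# Each key depends on the data products in its value set
# (edges of the "Main part" subgraph of the flowchart).
MAIN_PART = {
    "trimmed reads": {"raw reads"},
    "mapped reads": {"trimmed reads"},
    "variants": {"mapped reads"},
    "germline variants": {"variants"},
    "somatic variants": {"variants"},
    "coverage data": {"mapped reads"},
    "provisional CNVs": {"coverage data", "germline variants"},
    "final CNVs": {"provisional CNVs", "somatic variants"},
}


def execution_order(dependencies):
    """Return one valid order in which the data products can be produced."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(MAIN_PART)
print(" -> ".join(order))
```

Several orders are valid because independent branches (for example coverage computation and variant calling) may be interleaved. Only the relative order along each arrow is guaranteed.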
Owner
- Login: pedricolino
- Kind: user
- Repositories: 1
- Profile: https://github.com/pedricolino
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Cnakepit
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Cédric
    family-names: Moris
    email: cedric.moris@bih-charite.de
    affiliation: Berlin Institute of Health at Charité
    orcid: 'https://orcid.org/0009-0007-1978-1600'
  - given-names: Nina
    family-names: Okrožnik
    email: nina.okroznik@charite.de
    affiliation: Charité Comprehensive Cancer Center (CCCC)
    orcid: 'https://orcid.org/0009-0005-8145-0090'
repository-code: 'https://github.com/pedricolino/cnakepit'
abstract: >-
  A Snakemake pipeline for copy number variant calling
  without normal tissue samples.
keywords:
  - copy number changes
  - panel sequencing
  - tumor
  - Snakemake
license: MIT
year: 2024
commit: 0343ea85f0cada08fc6b00a177e4ac6d03116886
GitHub Events
Total
- Watch event: 1
- Push event: 17
Last Year
- Watch event: 1
- Push event: 17