cnakepit

A Snakemake pipeline for copy number variant calling without normal tissue samples

Keywords

bioinformatics bwa cna cnv cnvkit conda copynumbervariation mutect2 panels purecn sequencing-data snakemake targeted-sequencing

Repository

Basic Info

Host: GitHub
Owner: pedricolino
License: MIT
Language: Python
Default Branch: main
Size: 3.33 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 2 years ago · Last pushed 9 months ago

cnakepit 🐍

Definition

A snake pit is, in a literal sense, a hole filled with snakes. In idiomatic speech, "snake pits" are places of horror, torture and death in European legends and fairy tales. (Wikipedia)

In the field of bioinformatics, a cnakepit is now a Copy Number Alterations detection snaKEmake PIpeline for tumor-only Targeted sequencing data. In the truest sense of the word, this horrific pit is filled with snakes such as pythons, mambas and anacondas.

What is it?

This is a pipeline to (attempt to) call CNAs / CNVs by addressing the following challenges:
- requires no matched normal sample
- ideally, requires no reference sample or panel of normals
- suited for short reads (Illumina)
- detects (and filters for) somatic CNVs
- suitable for panels (/targeted sequencing)
- maintained
- publicly used (>50 citations if not too new)
- ideally infers tumor structure and estimates tumor characteristics such as tumor ploidy & purity

This pipeline was originally forked (and later unforked) from a private repository named FFPE-panel-pipeline, which preprocesses raw reads and then maps them.

Installation

1. Install conda (usually already installed on clusters).
2. Create a conda environment with Snakemake: conda create -c conda-forge -c bioconda -n snakemake-vanilla snakemake=7.32.3
   Note that I specified an older version of Snakemake because, as of January 2024, the newest version is incompatible with the current Snakemake profile of the CUBI cluster and with our pipeline_job.sh.
3. All other required tools and dependencies are installed automatically by Snakemake during the first run of the pipeline.
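Because the installation pins a specific Snakemake release, it can be worth confirming that the activated environment really provides it. The following is a minimal sketch, not part of the pipeline: the helper name is_pinned_version and the constant PINNED_VERSION are hypothetical, and the only external call assumed is snakemake --version.

```python
import re
import shutil
import subprocess

# Version pinned in the installation step (taken from the conda command).
PINNED_VERSION = "7.32.3"


def is_pinned_version(version_output: str, pinned: str = PINNED_VERSION) -> bool:
    """Return True if the text printed by `snakemake --version` equals the pin.

    Hypothetical helper for illustration; it only compares version strings.
    """
    match = re.search(r"\d+(?:\.\d+)+", version_output)
    return match is not None and match.group(0) == pinned


if __name__ == "__main__":
    if shutil.which("snakemake") is None:
        print("snakemake not found; activate the conda environment first")
    else:
        result = subprocess.run(
            ["snakemake", "--version"], capture_output=True, text=True, check=True
        )
        status = "ok" if is_pinned_version(result.stdout) else "not the pinned version"
        print(f"snakemake {result.stdout.strip()}: {status}")
```

Run it inside the activated snakemake-vanilla environment; outside of it, the script only reports that snakemake is missing.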

Usage

1. Set your configuration in the config/config.yaml file. Most options are documented or self-explanatory.
2. In the resources directory, add your panel design BED file and your samples (sheet) to the data subdirectory.
3. Activate the conda environment with Snakemake installed: conda activate snakemake-vanilla
4. Before actually running the pipeline, test your configuration with a dry run by adding -n to the Snakemake command.
5. Run the pipeline either by submitting the job script to SLURM with sbatch pipeline_job.sh, or by calling Snakemake directly with a command like snakemake --use-conda or snakemake --use-conda -c 8 (preferably on a compute node).
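The dry-run-then-run sequence for calling Snakemake directly can be sketched as a small Python wrapper. This is only an illustration under stated assumptions: build_snakemake_command is a hypothetical helper, it uses only the flags named in this README (--use-conda, -c, -n), and it has to be started from the repository root with the conda environment active.

```python
import shutil
import subprocess
from typing import List, Optional


def build_snakemake_command(cores: Optional[int] = None, dry_run: bool = False) -> List[str]:
    """Assemble the Snakemake call from the flags mentioned in this README.

    Hypothetical helper: `cores` maps to -c and `dry_run` appends -n.
    """
    cmd = ["snakemake", "--use-conda"]
    if cores is not None:
        cmd += ["-c", str(cores)]
    if dry_run:
        cmd.append("-n")
    return cmd


if __name__ == "__main__":
    dry = build_snakemake_command(cores=8, dry_run=True)
    real = build_snakemake_command(cores=8)
    print("dry run: ", " ".join(dry))
    if shutil.which("snakemake") is None:
        print("snakemake not found; activate the conda environment first")
    elif subprocess.run(dry).returncode == 0:
        # The dry run passed; the real run is left to the user to start.
        print("dry run ok, now run:", " ".join(real))
    else:
        print("dry run failed; check config/config.yaml and the input files")
```

The wrapper deliberately stops after the dry run, so the real run is still launched by hand or through pipeline_job.sh.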

General workflow

1. Read preprocessing: QC of the raw data, trimming, QC of the trimmed data
2. Mapping and QC
3. Variant detection and filtering with Mutect2 / GATK
4. CNV calling by CNVkit with different segmentation methods
5. Tumor analysis and CNV correction by PureCN:
   - either hierarchical clustering of CNVs or
   - minimal re-segmentation with PSCBS
For testing purposes, a premade panel of normals file (RDS format) can now be supplied to PureCN. Doing so skips the coverage computation and segmentation by CNVkit and, consequently, any additional round of CNV calling with any other panel of normals.
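The effect of the premade panel of normals on the workflow can be sketched as a plain stage list. This is an illustration of the description above, not the pipeline's actual rule graph: planned_stages and its premade_pon_rds parameter are hypothetical names, the stage labels are paraphrased from the workflow steps, and the real option lives in config/config.yaml under a key not shown here.

```python
from typing import List, Optional

# Label for the CNVkit work that a premade panel of normals makes unnecessary.
CNVKIT_STAGE = "coverage computation and segmentation (CNVkit)"


def planned_stages(premade_pon_rds: Optional[str] = None) -> List[str]:
    """List the workflow stages, dropping the CNVkit stage if a premade
    panel of normals (RDS file) is supplied for PureCN.

    Hypothetical helper for illustration only.
    """
    stages = [
        "read preprocessing",
        "mapping and QC",
        "variant detection and filtering (Mutect2 / GATK)",
    ]
    if premade_pon_rds is None:
        stages.append(CNVKIT_STAGE)
    stages.append("tumor analysis and CNV correction (PureCN)")
    return stages


if __name__ == "__main__":
    print("default:    ", planned_stages())
    # "normal_db.rds" is a placeholder file name, not a file shipped with the pipeline.
    print("premade PoN:", planned_stages("normal_db.rds"))
```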

Owner

Login: pedricolino
Kind: user

Repositories: 1
Profile: https://github.com/pedricolino

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Cnakepit
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Cédric
    family-names: Moris
    email: cedric.moris@bih-charite.de
    affiliation: Berlin Institute of Health at Charité
    orcid: 'https://orcid.org/0009-0007-1978-1600'
  - given-names: Nina
    family-names: Okrožnik
    email: nina.okroznik@charite.de
    affiliation: Charité Comprehensive Cancer Center (CCCC)
    orcid: 'https://orcid.org/0009-0005-8145-0090'
repository-code: 'https://github.com/pedricolino/cnakepit'
abstract: >-
  A Snakemake pipeline for copy number variant calling
  without normal tissue samples.
keywords:
  - copy number changes
  - panel sequencing
  - tumor
  - Snakemake
license: MIT
year: 2024
commit: 0343ea85f0cada08fc6b00a177e4ac6d03116886
