sv-callers

Snakemake-based workflow for detecting structural variants in genomic data

https://github.com/googlingthecancergenome/sv-callers

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary

Keywords

bioinformatics cancer-genomics germline-variants hpc-applications snakemake somatic-variants structural-variants sv-calling wgs workflow
Last synced: 6 months ago

Repository

Statistics
  • Stars: 80
  • Watchers: 3
  • Forks: 35
  • Open Issues: 7
  • Releases: 8
Created almost 9 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License Citation Zenodo

README.md

sv-callers

Published in PeerJ

Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases. sv-callers is a Snakemake-based workflow that combines several state-of-the-art tools for detecting SVs in whole genome sequencing (WGS) data. The workflow is easy to use and deploy on any Linux-based machine. In particular, it supports automated software deployment, easy configuration, and the addition of new analysis tools, and it scales from a single computer to different HPC clusters with minimal effort.

Dependencies

  • Python
  • Conda - package/environment management system
  • Snakemake - workflow management system
  • Xenon CLI - command-line interface to compute and storage resources
  • jq - command-line JSON processor (optional)
  • YAtiML - library for YAML type inference and schema validation

The workflow includes several bioinformatics tools for SV calling and post-processing. Their software dependencies are listed in the conda environment files.

1. Clone this repo.

```bash
git clone https://github.com/GooglingTheCancerGenome/sv-callers.git
cd sv-callers
```

2. Install dependencies.

```bash
# download Miniconda3 installer
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

# install Conda (respond by 'yes')
bash miniconda.sh

# update Conda
conda update -y conda

# install Mamba
conda install -n base -c conda-forge -y mamba

# create a new environment with dependencies & activate it
mamba env create -n wf -f environment.yaml
conda activate wf
```

3. Configure the workflow.

  • config files:

    • analysis.yaml - analysis-specific settings (e.g., workflow mode, I/O files, SV callers, post-processing, and resources used)
    • samples.csv - list of (paired) samples
  • input files:

    • example data in workflow/data directory
    • reference genome in .fasta (incl. index files)
    • excluded regions in .bed (optional)
    • WGS samples in .bam (incl. index files)
  • output files:

    • (filtered) SVs per caller and merged calls in .vcf (incl. index files)
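
Before launching a run, it can help to confirm that every input listed above is actually in place. The following is a minimal sketch and not part of the workflow itself: the helper name `missing_inputs` is hypothetical, and the index suffixes `.fai` (FASTA) and `.bai` (BAM) are assumed from common samtools conventions rather than taken from this README.

```python
from pathlib import Path


def missing_inputs(fasta, bams, bed=None):
    """Return the expected input paths that do not exist on disk.

    Hypothetical helper: the '.fai' and '.bai' index suffixes are assumptions
    based on common samtools conventions, not on the workflow's own checks.
    """
    expected = [Path(fasta), Path(str(fasta) + ".fai")]
    for bam in bams:
        expected += [Path(bam), Path(str(bam) + ".bai")]
    if bed is not None:  # excluded regions are optional
        expected.append(Path(bed))
    return [str(p) for p in expected if not p.exists()]
```

An empty list means all inputs were found; any returned path points to a file that should be added or indexed before running the workflow.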

4. Execute the workflow.

```bash
cd workflow
```

Locally

```bash
# 'dry' run only checks I/O files
snakemake -np

# 'vanilla' run if echo_run set to 1 (default) in analysis.yaml,
# it merely mimics the execution of SV callers by writing (dummy) VCF files;
# SV calling if echo_run set to 0
snakemake --use-conda --jobs
```

Submit jobs to Slurm or GridEngine cluster

```bash
SCH=slurm   # or gridengine
snakemake --use-conda --latency-wait 30 --jobs \
--cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --cores-per-task {threads} --max-run-time 1 --max-memory {resources.mem_mb} --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" &>smk.log&
```

Note: One sample or a tumor/normal pair generates 18 SV calling and post-processing jobs in total. See the workflow instances for single-sample (germline) and paired-sample (somatic) analysis.
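
The `--cluster` string used for job submission can also be assembled programmatically, for instance to vary the scheduler or the run-time limit between runs. This is a minimal sketch: `xenon_cluster_cmd` is a hypothetical helper, and it uses only the flags that appear in the submission command in this README. The `{rule}`, `{threads}` and `{resources.mem_mb}` placeholders are left untouched for Snakemake to fill in.

```python
def xenon_cluster_cmd(scheduler="slurm", max_run_time=1):
    """Build the string passed to `snakemake --cluster` (hypothetical helper).

    Only flags that appear in the README command are used. The placeholders
    {rule}, {threads} and {resources.mem_mb} are kept literal so that
    Snakemake can substitute them per job.
    """
    if scheduler not in ("slurm", "gridengine"):
        raise ValueError(f"unsupported scheduler: {scheduler!r}")
    return (
        f"xenon scheduler {scheduler} --location local:// submit "
        "--name smk.{rule} --inherit-env --cores-per-task {threads} "
        f"--max-run-time {int(max_run_time)} "  # in minutes
        "--max-memory {resources.mem_mb} --working-directory . "
        "--stderr stderr-%j.log --stdout stdout-%j.log"
    )


print(xenon_cluster_cmd("gridengine", max_run_time=60))
```

Keeping the Snakemake placeholders in plain (non-f) string literals avoids Python trying to interpolate them.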

To perform SV calling:

  • edit (default) parameters in analysis.yaml:

    • set echo_run to 0
    • choose between two workflow modes: single-sample (s) or paired-sample (p, default)
    • select one or more callers using enable_callers (default: all)
  • use xenon CLI to set:

    • --max-run-time of workflow jobs (in minutes)
    • --temp-space (optional, in MB)
  • adjust compute requirements per SV caller according to the system used:

    • the number of threads,
    • the amount of memory (in MB),
    • the amount of temporary disk space or tmpspace (path in TMPDIR env variable), which can be used for intermediate files by LUMPY and GRIDSS only.
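
The settings described above can be sanity-checked before a run. The sketch below assumes the configuration has already been loaded into a plain dictionary. `echo_run` and `enable_callers` are named in this README, whereas the key name `mode`, the helper `check_settings`, the lowercase caller names, and the per-caller layout (`callers` mapping to `threads`, `memory`, `tmpspace`) are assumptions for illustration and may not match the actual analysis.yaml schema.

```python
def check_settings(cfg):
    """Return a list of problems found in a settings dict (hypothetical helper).

    Assumed layout (not confirmed by the README): top-level keys 'echo_run',
    'mode', 'enable_callers', plus 'callers' mapping each caller name to its
    'threads', 'memory' (MB) and 'tmpspace' (MB) requirements.
    """
    problems = []
    if cfg.get("echo_run") not in (0, 1):
        problems.append("echo_run must be 0 (SV calling) or 1 (dummy run)")
    if cfg.get("mode") not in ("s", "p"):
        problems.append("mode must be 's' (single) or 'p' (paired)")
    callers = cfg.get("callers", {})
    for name in cfg.get("enable_callers", []):
        req = callers.get(name)
        if req is None:
            problems.append(f"{name}: no compute requirements given")
            continue
        for key in ("threads", "memory"):
            value = req.get(key)
            if not isinstance(value, int) or value <= 0:
                problems.append(f"{name}: {key} must be a positive integer")
        if req.get("tmpspace", 0) < 0:
            problems.append(f"{name}: tmpspace must not be negative")
    return problems


# Illustrative values only; they are not recommendations for any system.
example = {
    "echo_run": 0,
    "mode": "p",
    "enable_callers": ["lumpy", "gridss"],
    "callers": {
        "lumpy": {"threads": 1, "memory": 32768, "tmpspace": 0},
        "gridss": {"threads": 8, "memory": 0, "tmpspace": 0},
    },
}
print(check_settings(example))
```

With the illustrative values shown, the only reported problem is the zero memory setting for the second caller.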

Query job accounting information

```bash
SCH=slurm   # or gridengine
xenon --json scheduler $SCH --location local:// list --identifier [jobID] | jq ...
```

Owner

  • Name: Googling the cancer genome
  • Login: GooglingTheCancerGenome
  • Kind: organization
  • Location: Netherlands

Software repositories of the Netherlands eScience Center project: Googling the cancer genome

GitHub Events

Total
  • Watch event: 5
  • Push event: 1
  • Pull request event: 1
  • Fork event: 1
  • Create event: 1
Last Year
  • Watch event: 5
  • Push event: 1
  • Pull request event: 1
  • Fork event: 1
  • Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • llbbl (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/ci.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
  • codacy/codacy-coverage-reporter-action master composite
  • docker/login-action v1 composite
test-requirements.txt pypi
  • codacy-coverage * test
  • pytest >=4.6 test
  • pytest-cov * test
  • snakemake ==6.15.3 test
  • tabulate ==0.8.10 test
  • yatiml ==0.7 test
environment.yaml conda
  • jq 1.6.*
  • pip
  • snakemake 6.15.3.*
  • tabulate 0.8.10.*
  • xenon-cli 3.0.5.*