sv-callers
Snakemake-based workflow for detecting structural variants in genomic data
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: GooglingTheCancerGenome
- License: apache-2.0
- Language: Python
- Default Branch: master
- Homepage: https://research-software.nl/software/sv-callers
- Size: 210 MB
Statistics
- Stars: 80
- Watchers: 3
- Forks: 35
- Open Issues: 7
- Releases: 8
Metadata Files
README.md
sv-callers
Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases. sv-callers is a Snakemake-based workflow that combines several state-of-the-art tools for detecting SVs in whole genome sequencing (WGS) data. The workflow is easy to use and deploy on any Linux-based machine. In particular, it supports automated software deployment, easy configuration, and the addition of new analysis tools, and it scales from a single computer to different HPC clusters with minimal effort.
Dependencies
- Python
- Conda - package/environment management system
- Snakemake - workflow management system
- Xenon CLI - command-line interface to compute and storage resources
- jq - command-line JSON processor (optional)
- YAtiML - library for YAML type inference and schema validation
The workflow includes the following bioinformatics tools:
The software dependencies can be found in the conda environment files: [1],[2],[3].
1. Clone this repo.
```bash
git clone https://github.com/GooglingTheCancerGenome/sv-callers.git
cd sv-callers
```
2. Install dependencies.
```bash
# download Miniconda3 installer
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# install Conda (respond with 'yes')
bash miniconda.sh
# update Conda
conda update -y conda
# install Mamba
conda install -n base -c conda-forge -y mamba
# create a new environment with dependencies & activate it
mamba env create -n wf -f environment.yaml
conda activate wf
```
3. Configure the workflow.
- config files:
  - analysis.yaml - analysis-specific settings (e.g., workflow mode, I/O files, SV callers, post-processing or resources used etc.)
  - samples.csv - list of (paired) samples
- input files:
  - example data in workflow/data directory
  - reference genome in .fasta (incl. index files)
  - excluded regions in .bed (optional)
  - WGS samples in .bam (incl. index files)
- output files:
  - (filtered) SVs per caller and merged calls in .vcf (incl. index files)
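As a rough illustration of the samples.csv input, the sketch below reads a sample sheet and checks that each row names the samples required by the chosen workflow mode. The column names `PATH`, `SAMPLE1` and `SAMPLE2` and the helper `read_samples` are assumptions made for this example only; consult the repository's own samples.csv for the actual header.

```python
import csv
import io

# Hypothetical column names; the real header of samples.csv may differ.
REQUIRED = {"s": ["PATH", "SAMPLE1"], "p": ["PATH", "SAMPLE1", "SAMPLE2"]}


def read_samples(handle, mode="p"):
    """Return the rows of a sample sheet; raise ValueError on empty required fields."""
    if mode not in REQUIRED:
        raise ValueError(f"unknown mode: {mode!r}")
    rows = []
    # Line 1 is the header, so data rows start at line 2.
    for lineno, row in enumerate(csv.DictReader(handle), start=2):
        missing = [col for col in REQUIRED[mode] if not (row.get(col) or "").strip()]
        if missing:
            raise ValueError(f"line {lineno}: missing {', '.join(missing)}")
        rows.append(row)
    return rows


# Made-up example content, not data from the repository.
example = io.StringIO("PATH,SAMPLE1,SAMPLE2\ndata/bam,T1,N1\n")
print(read_samples(example, mode="p"))
```

Running such a check before `snakemake -np` can catch an incomplete tumor/normal pair early, though the dry run remains the authoritative check of I/O files.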
4. Execute the workflow.
```bash
cd workflow
```
Locally
```bash
# 'dry' run only checks I/O files
snakemake -np
# 'vanilla' run: if echo_run is set to 1 (default) in analysis.yaml,
# it merely mimics the execution of SV callers by writing (dummy) VCF files;
# SV calling is performed if echo_run is set to 0
snakemake --use-conda --jobs
```
Submit jobs to Slurm or GridEngine cluster
```bash
SCH=slurm   # or gridengine
snakemake --use-conda --latency-wait 30 --jobs \
--cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --cores-per-task {threads} --max-run-time 1 --max-memory {resources.mem_mb} --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" &>smk.log&
```
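The cluster submission command can also be assembled programmatically, which makes it easier to switch schedulers or change the run-time limit. The following is a minimal sketch that uses only the flags shown in the command above; the function names `xenon_submit_template` and `snakemake_command` are hypothetical helpers, not part of the workflow.

```python
import shlex


def xenon_submit_template(scheduler, max_run_time=1):
    """Build the string passed to `snakemake --cluster`."""
    if scheduler not in ("slurm", "gridengine"):
        raise ValueError(f"unsupported scheduler: {scheduler!r}")
    # Placeholders such as {rule} and {threads} are kept literal on purpose:
    # Snakemake fills them in per job, so those pieces are not f-strings.
    return (
        f"xenon scheduler {scheduler} --location local:// submit "
        "--name smk.{rule} --inherit-env --cores-per-task {threads} "
        f"--max-run-time {max_run_time} "
        "--max-memory {resources.mem_mb} --working-directory . "
        "--stderr stderr-%j.log --stdout stdout-%j.log"
    )


def snakemake_command(scheduler, latency_wait=30, max_run_time=1):
    """Return the argument list for a cluster run of the workflow."""
    return [
        "snakemake", "--use-conda",
        "--latency-wait", str(latency_wait),
        "--jobs",
        "--cluster", xenon_submit_template(scheduler, max_run_time),
    ]


# Print a shell-quoted version of the command for inspection.
print(shlex.join(snakemake_command("slurm")))
```

The resulting list can be handed to `subprocess.run` from within the workflow directory; redirecting output to smk.log, as in the shell command, is left to the caller.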
Note: One sample or a tumor/normal pair generates in total 18 SV calling and post-processing jobs. See the workflow instance of single-sample (germline) or paired-sample (somatic) analysis.
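For planning a cluster run, the figure in the note gives a rough estimate of the total job count. The sketch below assumes that the figure applies to the default configuration and that jobs scale linearly with the number of samples or tumor/normal pairs; both are assumptions, and `expected_jobs` is a hypothetical helper.

```python
JOBS_PER_UNIT = 18  # jobs per sample or tumor/normal pair, as quoted in the note


def expected_jobs(n_units):
    """Estimate the total number of jobs for n_units samples or tumor/normal pairs."""
    if n_units < 0:
        raise ValueError("n_units must be non-negative")
    # Assumes linear scaling, which the note does not state explicitly.
    return JOBS_PER_UNIT * n_units


print(expected_jobs(3))
```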
To perform SV calling:
- edit (default) parameters in analysis.yaml
  - set echo_run to 0
  - choose between two workflow modes: single- (s) or paired-sample (p - default)
  - select one or more callers using enable_callers (default all)
- use xenon CLI to set:
  - --max-run-time of workflow jobs (in minutes)
  - --temp-space (optional, in MB)
- adjust compute requirements per SV caller according to the system used:
  - the number of threads,
  - the amount of memory (in MB),
  - the amount of temporary disk space or tmpspace (path in TMPDIR env variable) can be used for intermediate files by LUMPY and GRIDSS only.
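The analysis.yaml settings listed above can be sanity-checked before launching a real run. The sketch below operates on a plain dictionary, as any YAML parser would return. It assumes that `echo_run`, `mode` and `enable_callers` are top-level keys, and the key name `mode` is itself inferred from the text; the actual layout of analysis.yaml may differ. `check_settings` is a hypothetical helper.

```python
def check_settings(settings):
    """Return a list of problems that would prevent a real SV-calling run."""
    problems = []
    if settings.get("echo_run") != 0:
        problems.append("echo_run must be 0 for SV calling (1 only writes dummy VCF files)")
    if settings.get("mode") not in ("s", "p"):
        problems.append("mode must be 's' (single-sample) or 'p' (paired-sample)")
    if not settings.get("enable_callers"):
        problems.append("enable_callers must list at least one caller")
    return problems


# Illustrative values only; the caller identifiers are made up for this example.
settings = {"echo_run": 1, "mode": "p", "enable_callers": ["caller_a", "caller_b"]}
for problem in check_settings(settings):
    print(problem)
```

An empty list means the three settings are consistent with a real SV-calling run; resource settings such as threads, memory and tmpspace are not checked here.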
Query job accounting information
```bash
SCH=slurm   # or gridengine
xenon --json scheduler $SCH --location local:// list --identifier [jobID] | jq ...
```
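Since jq is optional, the same query can be run from Python and parsed with the standard `json` module. This is a sketch under the assumption that `xenon` is on the `PATH`; the structure of the JSON output is not described here, so the parsed object is returned as-is. The helpers `list_job_command` and `query_job` are hypothetical.

```python
import json
import subprocess


def list_job_command(scheduler, job_id):
    """Build the xenon command shown above for one job identifier."""
    return [
        "xenon", "--json", "scheduler", scheduler,
        "--location", "local://",
        "list", "--identifier", str(job_id),
    ]


def query_job(scheduler, job_id):
    """Run the command and return its parsed JSON output (requires xenon to be installed)."""
    result = subprocess.run(
        list_job_command(scheduler, job_id),
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `query_job` with the scheduler name and a job identifier returns whatever accounting record the scheduler reports, which can then be filtered in Python in place of a jq expression.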
Owner
- Name: Googling the cancer genome
- Login: GooglingTheCancerGenome
- Kind: organization
- Location: Netherlands
- Website: https://www.esciencecenter.nl/projects/googling-the-cancer-genome/
- Repositories: 3
- Profile: https://github.com/GooglingTheCancerGenome
Software repositories of the Netherlands eScience Center project: Googling the cancer genome
GitHub Events
Total
- Watch event: 5
- Push event: 1
- Pull request event: 1
- Fork event: 1
- Create event: 1
Last Year
- Watch event: 5
- Push event: 1
- Pull request event: 1
- Fork event: 1
- Create event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- llbbl (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v2 composite
- actions/setup-python v1 composite
- codacy/codacy-coverage-reporter-action master composite
- docker/login-action v1 composite
- codacy-coverage * test
- pytest >=4.6 test
- pytest-cov * test
- snakemake ==6.15.3 test
- tabulate ==0.8.10 test
- yatiml ==0.7 test
- jq 1.6.*
- pip
- snakemake 6.15.3.*
- tabulate 0.8.10.*
- xenon-cli 3.0.5.*