sv-gen

Snakemake-based workflow for generating artificial genomes with structural variants

https://github.com/googlingthecancergenome/sv-gen

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary

Keywords

bioinformatics cancer-genomics hpc-applications simulator snakemake structural-variants wgs workflow

Repository

Basic Info
Statistics
  • Stars: 7
  • Watchers: 3
  • Forks: 1
  • Open Issues: 1
  • Releases: 2
Created almost 8 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation Zenodo

README.md

sv-gen

Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases. sv-gen is a Snakemake-based workflow that generates artificial short-read alignments from a reference genome with or without SVs. The workflow is easy to use and deploy on any Linux-based machine. In particular, it supports automated software deployment, easy configuration, and the addition of new analysis tools, and it scales from a single computer to different HPC clusters with minimal effort.

Dependencies

  • Python 3
  • Conda - package/environment management system
  • Snakemake - workflow management system
  • Xenon CLI - command-line interface to compute and storage resources
  • jq - command-line JSON processor (optional)
  • YAtiML - library for YAML type inference and schema validation

The workflow (DAG) includes the following tools:

The software dependencies and their versions are listed in the conda `environment.yaml` files.

1. Clone this repo.

```bash
git clone https://github.com/GooglingTheCancerGenome/sv-gen.git
cd sv-gen
```

2. Install dependencies.

```bash
# download Miniconda3 installer
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# install Conda (respond by 'yes')
bash miniconda.sh
# update Conda
conda update -y conda
# install Mamba
conda install -n base -c conda-forge -y mamba
# create a new environment with dependencies & activate it
mamba env create -n wf -f environment.yaml
conda activate wf
```
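After installation it can help to confirm that the command-line tools from the dependency list are actually on `PATH`. The following is a minimal sketch, not part of sv-gen: `missing_tools` is a hypothetical helper name, and the executable names passed to it (`python3`, `conda`, `snakemake`, `xenon`, `jq`) are assumed from the commands used in this README. YAtiML is a Python library rather than a command, so it is not covered by this check.

```bash
# Hypothetical helper (not part of sv-gen):
# print the name of every given command that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        # 'command -v' succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools python3 conda snakemake xenon jq` prints nothing when every tool is available and lists the missing ones otherwise.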

3. Configure the workflow.

4. Execute the workflow.

```bash
cd workflow
# 'dry' run only checks I/O files
snakemake -np
# run the workflow locally
snakemake --use-conda --cores
```
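The dry run and the local run can be chained so that the real run starts only when the dry run succeeds. This is a sketch under assumptions: `run_workflow` is a hypothetical wrapper name, the default of one core is an arbitrary choice, and the function is meant to be called from the `workflow` directory. It uses only the `snakemake` options shown above.

```bash
# Hypothetical wrapper (not part of sv-gen).
# Usage: run_workflow [CORES]   (assumed default: 1 core)
run_workflow() {
    local cores="${1:-1}"
    # Dry run: checks I/O files without executing any job.
    # Its job listing is discarded here; only the exit status is used.
    if ! snakemake -np >/dev/null; then
        echo "dry run failed; not starting the workflow" >&2
        return 1
    fi
    # Real run on the requested number of cores.
    snakemake --use-conda --cores "$cores"
}
```

Calling `run_workflow 4` would then run the workflow locally on four cores, provided the dry run reports no problems.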

Submit jobs to Slurm/GridEngine-based cluster

```bash
SCH=slurm   # or gridengine
snakemake --use-conda --latency-wait 30 --jobs \
--cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --max-run-time 5 --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" &>smk.log&
```
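Because the submit string is long and the scheduler name is easy to mistype, it may be convenient to build it in one place. The sketch below is an assumption-laden illustration: `xenon_cluster_cmd` is a hypothetical helper, it accepts only the two schedulers named in this README (`slurm` and `gridengine`), and it reuses exactly the Xenon flags from the command above, with the maximum run time as an optional second argument (default 5, as above).

```bash
# Hypothetical helper (not part of sv-gen).
# Usage: xenon_cluster_cmd SCHEDULER [MAX_RUN_TIME]
# Prints the string to pass to snakemake's --cluster option.
xenon_cluster_cmd() {
    local sch="$1" max_run_time="${2:-5}"
    case "$sch" in
        slurm|gridengine) ;;   # the two schedulers named in this README
        *) echo "unsupported scheduler: $sch" >&2; return 1 ;;
    esac
    # {rule} is expanded later by Snakemake; %j is left for the scheduler.
    printf '%s' "xenon scheduler $sch --location local:// submit --name smk.{rule} --inherit-env --max-run-time $max_run_time --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log"
}
```

The result could then be passed as `--cluster "$(xenon_cluster_cmd slurm)"` in the `snakemake` call shown above.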

Query job accounting information

```bash
SCH=slurm   # or gridengine
xenon --json scheduler $SCH --location local:// list --identifier [jobID] | jq ...
```

Owner

  • Name: Googling the cancer genome
  • Login: GooglingTheCancerGenome
  • Kind: organization
  • Location: Netherlands

Software repositories of the Netherlands eScience Center project: Googling the cancer genome

Citation (CITATION.cff)

# YAML 1.2
# Metadata for citation of this software according to the CFF format
# (https://citation-file-format.github.io/)
---
authors:
  -
    affiliation: "Netherlands eScience Center"
    family-names: Kuzniar
    given-names: Arnold
    orcid: "https://orcid.org/0000-0003-1711-7961"
  -
    affiliation: "University Medical Center Utrecht"
    family-names: Santuari
    given-names: Luca
    orcid: "https://orcid.org/0000-0001-8784-2507"

cff-version: "1.0.3"
date-released: 2023-01-18
doi: 10.5281/zenodo.3725663
keywords:
  - "bioinformatics"
  - "structural variants"
  - "cancer genomics"
  - "whole genome sequencing"
  - "workflow"
  - "simulation"
  - "high-performance computing"
  - "HPC"
  - "WGS"
  - "FASTA"
  - "BAM"
  - "VCF"
  - "BED"
license: Apache-2.0
message: "If you use this software, please cite it using these metadata"
repository-code: "https://github.com/GooglingTheCancerGenome/sv-gen"
title: sv-gen
version: "1.1.0"

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
Last Year
  • Watch event: 1
  • Push event: 1