sv-gen
Snakemake-based workflow for generating artificial genomes with structural variants
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.1%) to scientific vocabulary
Repository
Snakemake-based workflow for generating artificial genomes with structural variants
Basic Info
- Host: GitHub
- Owner: GooglingTheCancerGenome
- License: apache-2.0
- Language: Python
- Default Branch: master
- Homepage: https://research-software.nl/software/sv-gen
- Size: 22.5 MB
Statistics
- Stars: 7
- Watchers: 3
- Forks: 1
- Open Issues: 1
- Releases: 2
Metadata Files
README.md
sv-gen
Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases. sv-gen is a Snakemake-based workflow that generates artificial short-read alignments from a reference genome with or without SVs. The workflow is easy to use and deploy on any Linux-based machine. In particular, it supports automated software deployment, easy configuration, and the addition of new analysis tools, and it scales from a single computer to different HPC clusters with minimal effort.
Dependencies
- Python 3
- Conda - package/environment management system
- Snakemake - workflow management system
- Xenon CLI - command-line interface to compute and storage resources
- jq - command-line JSON processor (optional)
- YAtiML - library for YAML type inference and schema validation
The workflow (DAG) includes the following tools:
The software dependencies and versions can be found in the conda environment.yaml files (1, 2).
1. Clone this repo.
```bash
git clone https://github.com/GooglingTheCancerGenome/sv-gen.git
cd sv-gen
```
2. Install dependencies.
```bash
# download Miniconda3 installer
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# install Conda (respond by 'yes')
bash miniconda.sh
# update Conda
conda update -y conda
# install Mamba
conda install -n base -c conda-forge -y mamba
# create a new environment with dependencies & activate it
mamba env create -n wf -f environment.yaml
conda activate wf
```
3. Configure the workflow.
- config files:
  - analysis.yaml - analysis-specific settings
  - environment.yaml - software dependencies and versions
4. Execute the workflow.
```bash
cd workflow
# 'dry' run only checks I/O files
snakemake -np
# run the workflow locally
snakemake --use-conda --cores
```
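When the workflow is driven from a script rather than typed by hand, the two local invocations above can be assembled programmatically. The following is a minimal sketch, not part of sv-gen: the function name `snakemake_cmd` and its parameters are hypothetical, and only the flags shown in this README (`-np`, `--use-conda`, `--cores`) are used.

```python
import shlex
from typing import List, Optional


def snakemake_cmd(dry_run: bool = False, cores: Optional[int] = None) -> List[str]:
    """Build the argument list for a local run (hypothetical helper, not part of sv-gen)."""
    if dry_run:
        # -n: dry run, -p: print the shell commands of each rule
        return ["snakemake", "-np"]
    # Mirrors the README command; --cores is passed bare unless a count is given.
    cmd = ["snakemake", "--use-conda", "--cores"]
    if cores is not None:
        if cores < 1:
            raise ValueError("cores must be a positive integer")
        cmd.append(str(cores))
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in (snakemake_cmd(dry_run=True), snakemake_cmd(cores=4)):
        print(shlex.join(cmd))
```

To execute a command rather than print it, the list could be passed to `subprocess.run(cmd, cwd="workflow", check=True)`, assuming the script is started from the repository root.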
Submit jobs to Slurm/GridEngine-based cluster
```bash
SCH=slurm   # or gridengine
snakemake --use-conda --latency-wait 30 --jobs \
--cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --max-run-time 5 --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" &>smk.log&
```
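The submission command above packs a Xenon CLI call into Snakemake's `--cluster` option. As an illustration of how its parts fit together, here is a minimal sketch that rebuilds the same command for either scheduler. The helper names `xenon_submit_template` and `cluster_cmd` are hypothetical; the flags and values are copied from the command above.

```python
from typing import List

# Scheduler names taken from the README; anything else is rejected.
SCHEDULERS = ("slurm", "gridengine")


def xenon_submit_template(scheduler: str, max_run_time: int = 5) -> str:
    """Return the string passed to --cluster (hypothetical helper)."""
    if scheduler not in SCHEDULERS:
        raise ValueError(f"unsupported scheduler: {scheduler!r}")
    # "{rule}" is left as a literal placeholder for Snakemake to fill in,
    # and "%j" is passed through unchanged to the Xenon CLI.
    return (
        f"xenon scheduler {scheduler} --location local:// submit "
        "--name smk.{rule} --inherit-env "
        f"--max-run-time {max_run_time} --working-directory . "
        "--stderr stderr-%j.log --stdout stdout-%j.log"
    )


def cluster_cmd(scheduler: str, latency_wait: int = 30) -> List[str]:
    """Assemble the Snakemake argument list used for cluster submission."""
    return [
        "snakemake", "--use-conda",
        "--latency-wait", str(latency_wait),
        "--jobs",
        "--cluster", xenon_submit_template(scheduler),
    ]


if __name__ == "__main__":
    print(cluster_cmd("slurm"))
```

Keeping the template in one function makes it harder to mistype the scheduler name, since an unknown value fails before anything is submitted.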
Query job accounting information
```bash
SCH=slurm   # or gridengine
xenon --json scheduler $SCH --location local:// list --identifier [jobID] | jq ...
```
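If jq is not installed, the JSON printed by the query above could instead be decoded in Python. The sketch below is an assumption-laden illustration: the helper names `xenon_list_cmd` and `query_job` are hypothetical, the flags are copied from the command above, and no particular structure of the JSON output is assumed.

```python
import json
import subprocess
from typing import Any, List


def xenon_list_cmd(scheduler: str, job_id: str) -> List[str]:
    """Build the Xenon CLI query shown above (hypothetical helper)."""
    return [
        "xenon", "--json", "scheduler", scheduler,
        "--location", "local://",
        "list", "--identifier", job_id,
    ]


def query_job(scheduler: str, job_id: str) -> Any:
    """Run the query and decode its JSON output; requires the Xenon CLI on PATH."""
    result = subprocess.run(
        xenon_list_cmd(scheduler, job_id),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits with an error
    )
    # The decoded value is returned as-is because its schema is not assumed here.
    return json.loads(result.stdout)
```

The returned object can then be inspected interactively to find the fields of interest before writing a more specific filter.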
Owner
- Name: Googling the cancer genome
- Login: GooglingTheCancerGenome
- Kind: organization
- Location: Netherlands
- Website: https://www.esciencecenter.nl/projects/googling-the-cancer-genome/
- Repositories: 3
- Profile: https://github.com/GooglingTheCancerGenome
Software repositories of the Netherlands eScience Center project: Googling the cancer genome
Citation (CITATION.cff)
# YAML 1.2
# Metadata for citation of this software according to the CFF format
# (https://citation-file-format.github.io/)
---
authors:
  -
    affiliation: "Netherlands eScience Center"
    family-names: Kuzniar
    given-names: Arnold
    orcid: "https://orcid.org/0000-0003-1711-7961"
  -
    affiliation: "University Medical Center Utrecht"
    family-names: Santuari
    given-names: Luca
    orcid: "https://orcid.org/0000-0001-8784-2507"
cff-version: "1.0.3"
date-released: 2023-01-18
doi: 10.5281/zenodo.3725663
keywords:
- "bioinformatics"
- "structural variants"
- "cancer genomics"
- "whole genome sequencing"
- "workflow"
- "simulation"
- "high-performance computing"
- "HPC"
- "WGS"
- "FASTA"
- "BAM"
- "VCF"
- "BED"
license: Apache-2.0
message: "If you use this software, please cite it using these metadata"
repository-code: "https://github.com/GooglingTheCancerGenome/sv-gen"
title: sv-gen
version: "1.1.0"
GitHub Events
Total
- Watch event: 1
- Push event: 1
Last Year
- Watch event: 1
- Push event: 1