rdscan

A snakemake workflow for regions of difference discovery in Mycobacterium tuberculosis complex (MTBC) samples

https://github.com/dbespiatykh/rdscan

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.0%) to scientific vocabulary

Keywords

bacterial-genome-analysis bioinformatics mycobacterium mycobacterium-tuberculosis mycobacterium-tuberculosis-complex snakemake structural-variants

Repository

Basic Info
  • Host: GitHub
  • Owner: dbespiatykh
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 79.4 MB
Statistics
  • Stars: 6
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 4
Created almost 5 years ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

pipeline for MTBC putative regions of difference discovery

Description

RDscan is a Snakemake workflow that finds deletions and putative regions of difference (RDs) in Mycobacterium tuberculosis complex (MTBC) genomes. It can also detect already known or user-defined RDs.

Installation

The usage of this workflow is described in the Snakemake Workflow Catalog. Alternatively, it can be installed as described below.

Use the Conda package manager and the Bioconda channel to install RDscan.

If you do not have Conda installed, do the following:

```bash
# Download Conda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Set permissions
chmod +x Miniconda3-latest-Linux-x86_64.sh

# Install
bash Miniconda3-latest-Linux-x86_64.sh
```

Set up channels:

```bash
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```

Get the RDscan Snakemake workflow:

```bash
git clone https://github.com/dbespiatykh/RDscan.git
```

Install all required dependencies:

```bash
cd RDscan
conda install -c conda-forge mamba
mamba env create --file environment.yml
```

Usage

Rulegraph of the pipeline ![Rulegraph](img/dag.svg)


Activate the RDscan environment:

```bash
conda activate RDscan
```

Run the pipeline:

```bash
snakemake --conda-frontend mamba --use-conda -j {Number of cores}
```

If you are running the pipeline for the first time, a dry run is recommended to check that everything is in working order. Use the -n flag for this:

```bash
snakemake -n
```
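When the pipeline is launched from a script, the two invocations above can be assembled programmatically. The following is a minimal sketch, not part of RDscan: the helper name `build_snakemake_command` is hypothetical, and it only uses the flags already shown in this README (`--conda-frontend mamba`, `--use-conda`, `-j`, `-n`).

```python
import shlex


def build_snakemake_command(cores, dry_run=False):
    """Build the snakemake argument list used in this README.

    `build_snakemake_command` is a hypothetical helper for illustration;
    it is not provided by RDscan or Snakemake.
    """
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    command = [
        "snakemake",
        "--conda-frontend", "mamba",
        "--use-conda",
        "-j", str(cores),
    ]
    if dry_run:
        # -n asks Snakemake for a dry run: jobs are listed, not executed.
        command.append("-n")
    return command


# Show the dry-run command first, then the real run.
print(shlex.join(build_snakemake_command(4, dry_run=True)))
print(shlex.join(build_snakemake_command(4)))
```

The returned list could be passed to `subprocess.run(command, check=True)` from inside the activated RDscan environment, so that a failed dry run stops the script before the real run starts.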

Output

The results directory will contain four tables: RD_putative.tsv, RD_known.tsv, RD_known.xlsx, and RD_known.bin.tsv.

Example of the RD_putative.tsv: Table containing all discovered putative RDs.

RD - Known RDs that intersect with deletion breakpoints; SIZE - Estimated size of the predicted deletion.

Values in cells represent deletion length in the sample.

| CHROM    | START  | END    | SIZE | RD  | TYPE | ERR015582 | ERR017778 | ERR017782 | ERR019852 |
| -------- | ------ | ------ | ---- | --- | ---- | --------- | --------- | --------- | --------- |
| NC000962 | 333828 | 338580 | 5800 |     | DEL  | 7113      | 7084      | 7050      |           |
| NC000962 | 340400 | 340645 | 245  |     | DEL  |           |           |           |           |
| NC000962 | 350935 | 351175 | 238  |     | DEL  |           | 300       | 204       | 240       |
| NC000962 | 361769 | 362988 | 1391 |     | DEL  | 1833      | 1392      | 1833      | 1390      |
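Because an empty cell means that no deletion was called in that sample, the table can be reduced to a list of affected samples per region. The sketch below is one way to do this with the Python standard library. It assumes that the first six columns are always CHROM, START, END, SIZE, RD, and TYPE, and that every other column is a sample, as in the example above. The function name `samples_with_deletion` is hypothetical.

```python
import csv
import io

# Assumed fixed columns, taken from the example table in this README.
FIXED_COLUMNS = ("CHROM", "START", "END", "SIZE", "RD", "TYPE")


def samples_with_deletion(tsv_text):
    """Map each (CHROM, START, END) region to the samples with a non-empty cell."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    sample_columns = [c for c in reader.fieldnames if c not in FIXED_COLUMNS]
    regions = {}
    for row in reader:
        key = (row["CHROM"], int(row["START"]), int(row["END"]))
        # A missing or blank cell means no deletion was reported for the sample.
        regions[key] = [s for s in sample_columns if (row[s] or "").strip()]
    return regions


# Two rows copied from the example table above.
example = (
    "CHROM\tSTART\tEND\tSIZE\tRD\tTYPE\tERR015582\tERR017778\tERR017782\tERR019852\n"
    "NC000962\t340400\t340645\t245\t\tDEL\t\t\t\t\n"
    "NC000962\t350935\t351175\t238\t\tDEL\t\t300\t204\t240\n"
)
for region, samples in samples_with_deletion(example).items():
    print(region, samples)
```

To apply this to real output, read the contents of RD_putative.tsv from the results directory and pass the text to the function.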

Example of the RD_known.tsv:

Table containing proportion of coverage in particular RDs.

| Sample    | N-RD25tbA | N-RD25tbB | N-RD25bov/cap | N-RD25das |
| --------- | --------- | --------- | ------------- | --------- |
| ERR015582 | 0.883562  | 0.856164  | 0.856164      | 0.808219  |
| ERR017778 | 0         | 0         | 0             | 0.41791   |
| ERR017782 | 1.021277  | 1.042553  | 1.106383      | 0.978723  |
| ERR019852 | 0         | 0         | 0             | 0.386364  |

Example of the RD_known.xlsx:

Same as RD_known.tsv, but in XLSX format with conditional formatting applied. The conditional formatting corresponds to the threshold value in the config.yml file.

Example of the RD_known.bin.tsv, a binary version of RD_known.tsv:

| Sample    | N-RD25tbA | N-RD25tbB | N-RD25bov/cap | N-RD25das |
| --------- | --------- | --------- | ------------- | --------- |
| ERR015582 | 0         | 0         | 0             | 0         |
| ERR017778 | 1         | 1         | 1             | 0         |
| ERR017782 | 0         | 0         | 0             | 0         |
| ERR019852 | 1         | 1         | 1             | 0         |
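In the two example tables, cells with a coverage proportion of 0 become 1 and all other cells become 0, which is consistent with 1 marking an RD as deleted. The sketch below illustrates that kind of thresholding. It is not RDscan's own code: the cutoff `THRESHOLD = 0.05`, the "less than or equal" rule, and the function name `binarize` are assumptions for illustration, and the real threshold is whatever is set in config.yml.

```python
import csv
import io

# Hypothetical cutoff for illustration; RDscan reads its threshold from config.yml.
THRESHOLD = 0.05


def binarize(tsv_text, threshold=THRESHOLD):
    """Turn coverage proportions into 1 (at or below threshold) or 0 (above it)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    rows = [header]
    for sample, *values in reader:
        # Assumed rule: low coverage over the region is read as a deletion.
        rows.append([sample] + [1 if float(v) <= threshold else 0 for v in values])
    return rows


# Two samples and two columns copied from the RD_known.tsv example.
example = (
    "Sample\tN-RD25tbA\tN-RD25das\n"
    "ERR015582\t0.883562\t0.808219\n"
    "ERR017778\t0\t0.41791\n"
)
for row in binarize(example):
    print(*row, sep="\t")
```

With this cutoff the sketch reproduces the corresponding cells of the binary example table: ERR015582 is 0 in both columns, while ERR017778 is 1 for N-RD25tbA and 0 for N-RD25das.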

Citation

If you use RDscan for your research, please cite the pipeline:

D. Bespiatykh, J. Bespyatykh, I. Mokrousov, and E. Shitikov, A Comprehensive Map of Mycobacterium tuberculosis Complex Regions of Difference, mSphere, Volume 6, Issue 4, 21 July 2021, Page e00535-21, https://doi.org/10.1128/mSphere.00535-21

All references for the tools used by RDscan can be found in the CITATIONS.md file.

License

MIT

Owner

  • Name: Dmitry Bespiatykh
  • Login: dbespiatykh
  • Kind: user

Bioinformatician

GitHub Events

Total
  • Issues event: 3
  • Watch event: 1
  • Issue comment event: 3
  • Push event: 4
Last Year
  • Issues event: 3
  • Watch event: 1
  • Issue comment event: 3
  • Push event: 4

Dependencies

.github/workflows/main.yml actions
  • actions/checkout v1 composite
  • actions/checkout v2 composite
  • github/super-linter v4 composite
  • snakemake/snakemake-github-action v1.23.0 composite
environment.yml pypi