consensusv-core

A tool for getting consensus of SVs.

https://github.com/sfglab/consensusv-core

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: SFGLab
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 9.18 MB
Statistics
  • Stars: 7
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 11
Created over 5 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

ConsensuSV-pipeline

What is ConsensuSV?

ConsensuSV is a tool for deriving a consensus from the results of multiple SV callers.

Important: for the fully automated fastq-to-vcf pipeline (8 SV callers plus SNP / indel calling included), see: https://github.com/SFGLab/ConsensuSV-pipeline

Citation

If you use ConsensuSV in your research, we kindly ask you to cite the following publication:

    @article{Chilinski_ConsensuSVfrom_the_whole-genome_2022,
      author = {Chiliński, Mateusz and Plewczynski, Dariusz},
      doi = {10.1093/bioinformatics/btac709},
      journal = {Bioinformatics},
      title = {{ConsensuSV—from the whole-genome sequencing data to the complete variant list}},
      year = {2022}
    }

Requirements

Requirements:

  • bcftools (https://samtools.github.io/bcftools/) in PATH
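Before running the tool, it can help to confirm that bcftools is actually reachable. The following is a minimal sketch using only the Python standard library; the helper names `find_tool` and `require_tool` are hypothetical and are not part of ConsensuSV.

```python
import shutil


def find_tool(name="bcftools"):
    """Return the absolute path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def require_tool(name="bcftools"):
    """Return the path of `name`, or raise RuntimeError if it is not on PATH."""
    path = find_tool(name)
    if path is None:
        raise RuntimeError(f"{name} not found in PATH")
    return path
```

Calling `require_tool()` at the start of a wrapper script fails early with a clear message instead of partway through a run.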

Parameters

Options:

Short option | Long option | Description
-------------- | --------------- | ---------------
-f | --svfolder | Folder containing folders of samples with raw outputs from SV callers (comma-separated). More information on the structure of the samples folder is shown below.
-mod | --model | Model used for SV discovery (default: pretrained.model).
-o | --output | Output file prefix (default: consensuSV).
-m | --minoverlap | File with minimum numbers of SVs in the neighbourhood for the SV to be reported (default: minoverlaps).
-of | --outputfolder | Output folder (default: "output/").
-s | --samples | Samples to include. By default all in the svfolder. Comma-separated.
-c | --callers | Callers to include. By default all in the folders. Comma-separated.
-t | --train | Creates a new model. Requires truth.vcf to be present in all the SV folders. The truth.vcf file is preprocessed even if the --nopreprocess flag is set. After a model is trained, the program must be rerun to get the consensus.
-np | --nopreprocess | Skips preprocessing. All the preprocessed files should already be in the temp/ folder.
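To make the options and their defaults easier to read, here is an illustrative `argparse` sketch that mirrors the table. It is not the actual `main.py` of ConsensuSV: the function names `build_parser` and `split_csv` are hypothetical, and treating `--svfolder` as required is an assumption based on the examples, which always pass it.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    p = argparse.ArgumentParser(prog="main.py")
    # Assumption: --svfolder is mandatory; the README does not state this.
    p.add_argument("-f", "--svfolder", required=True,
                   help="folder containing per-sample folders of SV caller outputs")
    p.add_argument("-mod", "--model", default="pretrained.model")
    p.add_argument("-o", "--output", default="consensuSV")
    p.add_argument("-m", "--minoverlap", default="minoverlaps")
    p.add_argument("-of", "--outputfolder", default="output/")
    p.add_argument("-s", "--samples", default=None,
                   help="comma-separated samples (default: all)")
    p.add_argument("-c", "--callers", default=None,
                   help="comma-separated callers (default: all)")
    p.add_argument("-t", "--train", action="store_true")
    p.add_argument("-np", "--nopreprocess", action="store_true")
    return p


def split_csv(value):
    """Turn 'a,b,c' into ['a', 'b', 'c']; None means 'use everything'."""
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]
```

For instance, `build_parser().parse_args(["-f", "/home/ConsensuSV/data/", "-t"])` yields a namespace with `train` set and every other option at its documented default.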

Structure of the data folder

The samples should follow the rule seen in the following figure:
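As a rough illustration of how such a folder could be scanned, the sketch below lists the callers available for each sample. The layout is an assumption, not something this page confirms: one subfolder per sample, each holding one `<caller>.vcf` file per SV caller, with `truth.vcf` reserved for training. The function name `discover_callers` is hypothetical.

```python
from pathlib import Path


def discover_callers(svfolder, samples=None, callers=None):
    """Map each sample folder name to the sorted caller names found in it.

    Assumed layout: <svfolder>/<sample>/<caller>.vcf
    `samples` and `callers` optionally restrict the result, in the spirit of
    the -s and -c options; None means "include everything".
    """
    result = {}
    for sample_dir in sorted(Path(svfolder).iterdir()):
        if not sample_dir.is_dir():
            continue
        if samples is not None and sample_dir.name not in samples:
            continue
        # truth.vcf is the training reference, not a caller output.
        found = sorted(p.stem for p in sample_dir.glob("*.vcf") if p.stem != "truth")
        if callers is not None:
            found = [c for c in found if c in callers]
        result[sample_dir.name] = found
    return result
```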

Implementation details

The workflow of the algorithm is presented in the following figure:

Comparison to gold-standard set

Examples

The example command used for the training of the neural network model:

    python main.py -f /home/ConsensuSV/data/ -t

The example command used for getting the consensus SVs (the model included in the package is trained on the 11 SV callers shown in the example sample folder structure):

    python main.py -f /home/ConsensuSV/data/ -o consensuSV
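Because training and consensus calling are separate runs, a wrapper script might assemble both commands up front. The sketch below only builds the argument lists from the flags shown in the examples; the helper names `consensusv_command` and `train_then_consensus` are hypothetical, and nothing is executed.

```python
import shlex


def consensusv_command(svfolder, train=False, output=None, script="main.py"):
    """Build the argument list for one ConsensuSV run."""
    cmd = ["python", script, "-f", svfolder]
    if train:
        cmd.append("-t")
    if output is not None:
        cmd += ["-o", output]
    return cmd


def train_then_consensus(svfolder, output="consensuSV"):
    """Return the two commands in order: train a model, then get the consensus."""
    return [
        consensusv_command(svfolder, train=True),
        consensusv_command(svfolder, output=output),
    ]


if __name__ == "__main__":
    for cmd in train_then_consensus("/home/ConsensuSV/data/"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failed training run stops the script before the consensus step.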

Owner

  • Name: SFGLab
  • Login: SFGLab
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "ConsensuSV-core"
authors:
  - family-names: "Chiliński"
    given-names: "Mateusz"
    orcid: "https://orcid.org/0000-0001-6641-8504"
  - family-names: "Plewczynski"
    given-names: "Dariusz"
    orcid: "https://orcid.org/0000-0002-3840-7610"
preferred-citation:
  type: article
  authors:
    - family-names: "Chiliński"
      given-names: "Mateusz"
      orcid: "https://orcid.org/0000-0001-6641-8504"
    - family-names: "Plewczynski"
      given-names: "Dariusz"
      orcid: "https://orcid.org/0000-0002-3840-7610"
  doi: "10.1093/bioinformatics/btac709"
  journal: "Bioinformatics"
#  month: 12
#  start: 1 # First page number
#  end: 10 # Last page number
  title: "ConsensuSV—from the whole-genome sequencing data to the complete variant list"
#  issue: 1
#  volume: 1
  year: 2022

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Dependencies

docs/requirements.txt pypi
  • luigi ==3.0.3
  • readthedocs-sphinx-search ==0.1.1
  • sphinx ==4.2.0
  • sphinx-autoapi ==1.8.4
  • sphinx_rtd_theme ==1.0.0