Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 1 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (14.1%) to scientific vocabulary
Repository
A tool for getting consensus of SVs.
Basic Info
- Host: GitHub
- Owner: SFGLab
- License: mit
- Language: Python
- Default Branch: main
- Size: 9.18 MB
Statistics
- Stars: 7
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 11
Metadata Files
README.md
ConsensuSV-pipeline
Table of Contents
- What is ConsensuSV?
- Citation
- Requirements
- Parameters
- Structure of the data folder
- Implementation details
- Comparison to gold-standard set
- Examples
What is ConsensuSV?
ConsensuSV is a tool for deriving a consensus from the results of multiple structural variant (SV) callers.
Important: for the completely automated fastq-to-vcf pipeline (8 SV callers plus SNP / Indel calling included), see: https://github.com/SFGLab/ConsensuSV-pipeline
Citation
If you use ConsensuSV in your research, we kindly ask you to cite the following publication:
@article{Chilinski_ConsensuSVfrom_the_whole-genome_2022,
author = {Chiliński, Mateusz and Plewczynski, Dariusz},
doi = {10.1093/bioinformatics/btac709},
journal = {Bioinformatics},
title = {{ConsensuSV—from the whole-genome sequencing data to the complete variant list}},
year = {2022}
}
Requirements
- bcftools (https://samtools.github.io/bcftools/) available in PATH
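A quick way to confirm this requirement before a run is to look the executable up on PATH. The snippet below is a minimal sketch and not part of ConsensuSV; the helper name `tool_in_path` is hypothetical, and it only checks that an executable with that name can be found, not which version is installed.

```python
import shutil


def tool_in_path(name):
    """Return True if an executable called `name` can be found on PATH."""
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which(name) is not None


if __name__ == "__main__":
    if tool_in_path("bcftools"):
        print("bcftools found in PATH")
    else:
        print("bcftools not found in PATH; install it before running ConsensuSV")
```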
Parameters
Options:
| Short option | Long option | Description |
| --- | --- | --- |
| -f | --svfolder | Folder containing folders of samples with raw outputs from SV callers (comma-separated). More information on the structure of the samples folder is shown below. |
| -mod | --model | Model used for SV discovery (default: pretrained.model). |
| -o | --output | Output file prefix (default: consensuSV). |
| -m | --minoverlap | File with the minimum number of SVs in the neighbourhood required for an SV to be reported (default: minoverlaps). |
| -of | --outputfolder | Output folder (default: "output/"). |
| -s | --samples | Samples to include, comma-separated. By default, all samples in the svfolder. |
| -c | --callers | Callers to include, comma-separated. By default, all callers in the folders. |
| -t | --train | Creates a new model. Requires truth.vcf to be present in all the SV folders. The truth.vcf file is preprocessed even if the --nopreprocess flag is set. After a model is trained, the program must be rerun to get the consensus. |
| -np | --nopreprocess | Skips preprocessing; all the preprocessed files should already be in the temp/ folder. |
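To make the option semantics concrete, here is a sketch of an `argparse` parser that mirrors the documented options and defaults. It is an illustration only: the real `main.py` may parse its arguments differently, the function name `build_parser` is hypothetical, and treating `--train` and `--nopreprocess` as boolean switches is an assumption based on their descriptions as flags.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # No default is documented for the SV folder, so none is set here.
    parser.add_argument("-f", "--svfolder",
                        help="folder(s) with per-sample SV caller outputs, comma-separated")
    parser.add_argument("-mod", "--model", default="pretrained.model",
                        help="model used for SV discovery")
    parser.add_argument("-o", "--output", default="consensuSV",
                        help="output file prefix")
    parser.add_argument("-m", "--minoverlap", default="minoverlaps",
                        help="file with minimum numbers of neighbouring SVs")
    parser.add_argument("-of", "--outputfolder", default="output/",
                        help="output folder")
    parser.add_argument("-s", "--samples",
                        help="comma-separated samples to include (default: all)")
    parser.add_argument("-c", "--callers",
                        help="comma-separated callers to include (default: all)")
    # Assumed to be on/off switches that take no value.
    parser.add_argument("-t", "--train", action="store_true",
                        help="train a new model instead of using an existing one")
    parser.add_argument("-np", "--nopreprocess", action="store_true",
                        help="skip preprocessing and reuse files in temp/")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Parsing `-f /home/ConsensuSV/data/ -t` with this sketch sets `train` to true and leaves the remaining options at their documented defaults.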
Structure of the data folder
The samples should follow the rule seen in the following figure:
Implementation details
The workflow of the algorithm is presented in the following figure:
Comparison to gold-standard set
Examples
The example command used for the training of the neural network model:
```shell
python main.py -f /home/ConsensuSV/data/ -t
```
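Because training requires truth.vcf in every SV folder, it can help to check for it before starting. The sketch below is not part of ConsensuSV. It assumes that each sample is an immediate subfolder of the folder passed with `-f` and that truth.vcf sits directly inside it; the helper name `samples_missing_truth` is hypothetical.

```python
from pathlib import Path


def samples_missing_truth(svfolder):
    """Return the names of sample subfolders that do not contain truth.vcf."""
    root = Path(svfolder)
    missing = []
    # Assumption: every immediate subdirectory is one sample.
    for sample_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if not (sample_dir / "truth.vcf").is_file():
            missing.append(sample_dir.name)
    return missing
```

An empty list means every sample folder has a truth.vcf under these assumptions; otherwise the list names the samples to fix before running with `-t`.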
The example command used for getting the consensus SVs (the model included in the package is trained on the 11 SV callers shown in the example sample folder structure):
```shell
python main.py -f /home/ConsensuSV/data/ -o consensuSV
```
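When the tool is driven from another script, the same command lines can be assembled programmatically. The following is a minimal sketch using only the flags documented in the Parameters section; the helper name `build_consensusv_command` is hypothetical, and it assumes `main.py` is in the current working directory.

```python
def build_consensusv_command(svfolder, output=None, samples=None,
                             callers=None, train=False):
    """Assemble the argument list for main.py from the documented options."""
    cmd = ["python", "main.py", "-f", svfolder]
    if output is not None:
        cmd += ["-o", output]
    if samples:
        # The documentation states that samples are passed comma-separated.
        cmd += ["-s", ",".join(samples)]
    if callers:
        # Callers are likewise passed comma-separated.
        cmd += ["-c", ",".join(callers)]
    if train:
        cmd.append("-t")
    return cmd


if __name__ == "__main__":
    # Print the consensus command from the example above.
    print(" ".join(build_consensusv_command("/home/ConsensuSV/data/",
                                            output="consensuSV")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.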
Owner
- Name: SFGLab
- Login: SFGLab
- Kind: organization
- Repositories: 5
- Profile: https://github.com/SFGLab
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "ConsensuSV-core"
authors:
  - family-names: "Chiliński"
    given-names: "Mateusz"
    orcid: "https://orcid.org/0000-0001-6641-8504"
  - family-names: "Plewczynski"
    given-names: "Dariusz"
    orcid: "https://orcid.org/0000-0002-3840-7610"
preferred-citation:
  type: article
  authors:
    - family-names: "Chiliński"
      given-names: "Mateusz"
      orcid: "https://orcid.org/0000-0001-6641-8504"
    - family-names: "Plewczynski"
      given-names: "Dariusz"
      orcid: "https://orcid.org/0000-0002-3840-7610"
  doi: "10.1093/bioinformatics/btac709"
  journal: "Bioinformatics"
  # month: 12
  # start: 1 # First page number
  # end: 10 # Last page number
  title: "ConsensuSV—from the whole-genome sequencing data to the complete variant list"
  # issue: 1
  # volume: 1
  year: 2022
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Dependencies
- luigi ==3.0.3
- readthedocs-sphinx-search ==0.1.1
- sphinx ==4.2.0
- sphinx-autoapi ==1.8.4
- sphinx_rtd_theme ==1.0.0