https://github.com/biodataanalysisgroup/synth4bench
A framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms.
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 12 DOI reference(s) in README
- ✓ Academic publication links: links to biorxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.2%) to scientific vocabulary
Repository
Statistics
- Stars: 4
- Watchers: 0
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Abstract
Table of Contents
- Abstract
- Motivation
- Description of Framework
- Installation
- Data Download
- Execution
- Documentation
- Contribute
- Citation
Motivation
Description of Framework
Data Download
All data are openly available on Zenodo. For specific instructions, refer to our User Guide.
Installation
Create the Conda environment:
    conda env create -f environment.yml
    conda activate synth4bench
Install NEAT v3.3:
Download version v3.3.
To call the main script:
    python gen_reads.py --help
For further details, see the NEAT README included in the download.
Install bam-readcount:
Follow their installation instructions.
After building, verify installation:
    build/bin/bam-readcount --help
If you encounter issues during the make process, you can alternatively use the executable available here and place it in the bam-readcount/build/bin folder.
Download VarScan Extra Script:
The extra script vscan_pileup2cns2vcf.py for VarScan is available here.
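Before moving on, it can help to confirm that the external pieces from the steps above are where you expect them. The following is a minimal sketch, not part of synth4bench: the `TOOLS` mapping and the `find_missing` helper are hypothetical, and the paths assume each tool was unpacked or built relative to the current directory.

```python
import shutil
from pathlib import Path

# Hypothetical locations (assumption); adjust to where you unpacked or built each tool.
TOOLS = {
    "NEAT gen_reads.py": "gen_reads.py",
    "bam-readcount": "bam-readcount/build/bin/bam-readcount",
    "VarScan extra script": "vscan_pileup2cns2vcf.py",
}


def find_missing(tools):
    """Return the labels of tools that are neither an existing file nor on PATH."""
    missing = []
    for label, location in tools.items():
        if Path(location).is_file() or shutil.which(location):
            continue
        missing.append(label)
    return missing


if __name__ == "__main__":
    for label in find_missing(TOOLS):
        print(f"Missing: {label}")
```

An empty result only means the files were found; running each tool with `--help`, as described above, remains the real check.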
Execution
Simply configure your parameters in the parameters.yaml file, then execute:
    bash s4b_run.sh
This single command generates synthetic data, runs variant calling for all selected tools, and performs downstream analysis and plotting.
For full execution instructions, see our User Guide.
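If you want to launch the run from Python, for example inside a larger workflow, the documented command can be wrapped as below. This is a sketch under assumptions: `build_command` and `run_pipeline` are hypothetical helper names, and it assumes parameters.yaml and s4b_run.sh sit in the same working directory.

```python
import subprocess
from pathlib import Path


def build_command(workdir="."):
    """Return the documented launch command after checking the inputs exist."""
    workdir = Path(workdir)
    # Assumption: both files live in the same directory.
    for name in ("parameters.yaml", "s4b_run.sh"):
        if not (workdir / name).is_file():
            raise FileNotFoundError(f"{name} not found in {workdir}")
    return ["bash", "s4b_run.sh"]


def run_pipeline(workdir="."):
    """Run the pipeline from workdir and raise if it exits with an error."""
    subprocess.run(build_command(workdir), cwd=workdir, check=True)
```

Checking for the two files first gives a clear error before any long-running step starts; `check=True` makes a failed run raise `subprocess.CalledProcessError` instead of passing silently.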
Documentation
For further documentation, visit the documentation page.
Contribute
We welcome and greatly appreciate any feedback or contributions!
If you have questions, please open an issue here or email sfragkoul@certh.gr.
Citation
Our work has been submitted to the bioRxiv preprint repository. If you use synth4bench, please cite:
S.-C. Fragkouli, N. Pechlivanis, A. Anastasiadou, G. Karakatsoulis, A. Orfanou, P. Kollia, A. Agathangelidis, and F. E. Psomopoulos, “Synth4bench: a framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms.” 2024, doi:10.1101/2024.03.07.582313.
Related Publications
- S.-C. Fragkouli, N. Pechlivanis, A. Anastasiadou, G. Karakatsoulis, A. Orfanou, P. Kollia, A. Agathangelidis, and F. Psomopoulos, “synth4bench: Benchmarking Somatic Variant Callers – A Tale Unfolding In The Synthetic Genomics Feature Space”, 23rd European Conference On Computational Biology (ECCB24), Sep 2024, Turku, Finland, doi:10.5281/zenodo.14186509
- S.-C. Fragkouli, N. Pechlivanis, A. Anastasiadou, G. Karakatsoulis, A. Orfanou, P. Kollia, A. Agathangelidis, and F. Psomopoulos, “Exploring Somatic Variant Callers' Behavior: A Synthetic Genomics Feature Space Approach”, ELIXIR AHM24, Jun 2024, Uppsala, Sweden, doi:10.7490/f1000research.1119793.1
- S.-C. Fragkouli, N. Pechlivanis, A. Orfanou, A. Anastasiadou, A. Agathangelidis, and F. Psomopoulos, “Synth4bench: a framework for generating synthetic genomics data for the evaluation of somatic variant calling algorithms”, 17th Conference of Hellenic Society for Computational Biology and Bioinformatics (HSCBB), Oct 2023, Thessaloniki, Greece, doi:10.5281/zenodo.8432060
- S.-C. Fragkouli, N. Pechlivanis, A. Agathangelidis, and F. Psomopoulos, “Synthetic Genomics Data Generation and Evaluation for the Use Case of Benchmarking Somatic Variant Calling Algorithms”, 31st Conference in Intelligent Systems For Molecular Biology and the 22nd European Conference On Computational Biology (ISMB-ECCB23), Jul 2023, Lyon, France, doi:10.7490/f1000research.1119575.1
Owner
- Name: Biodata Analysis Group
- Login: BiodataAnalysisGroup
- Kind: organization
- Email: fpsom@certh.gr
- Website: https://biodataanalysisgroup.github.io/
- Repositories: 17
- Profile: https://github.com/BiodataAnalysisGroup
GitHub Events
Total
- Issues event: 6
- Delete event: 2
- Issue comment event: 1
- Push event: 78
- Pull request event: 27
- Fork event: 1
- Create event: 1
Last Year
- Issues event: 6
- Delete event: 2
- Issue comment event: 1
- Push event: 78
- Pull request event: 27
- Fork event: 1
- Create event: 1
Dependencies
- biopython ==1.80
- contourpy ==1.0.6
- cycler ==0.11.0
- fonttools ==4.38.0
- kiwisolver ==1.4.4
- matplotlib ==3.6.2
- matplotlib-venn ==0.11.7
- numpy ==1.24.0
- packaging ==22.0
- pandas ==1.5.2
- pillow ==9.3.0
- pyparsing ==3.0.9
- pysam ==0.19.1
- python-dateutil ==2.8.2
- pytz ==2022.7
- scipy ==1.9.3
- six ==1.16.0
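
Because every dependency above is pinned to an exact version, a quick way to spot drift in an existing environment is to compare the pins with what is installed. The sketch below uses only the standard library; `PINS`, `parse_pin`, and `check_pins` are hypothetical names, and the list shows just three of the pins for brevity.

```python
from importlib.metadata import PackageNotFoundError, version

# A few of the pins listed above, in the same "name ==version" form.
PINS = ["numpy ==1.24.0", "pandas ==1.5.2", "pysam ==0.19.1"]


def parse_pin(line):
    """Split a 'name ==version' line into (name, version)."""
    name, _, pinned = line.partition("==")
    return name.strip(), pinned.strip()


def check_pins(pins):
    """Map each package whose installed version differs from its pin to (pinned, installed)."""
    mismatches = {}
    for line in pins:
        name, pinned = parse_pin(line)
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in check_pins(PINS).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```

Run it inside the activated `synth4bench` environment; no output means the checked packages match their pins.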