eukavarizer
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 7 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Lupphes
- License: mit
- Language: Nextflow
- Default Branch: master
- Size: 747 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Introduction
nf-core/eukavarizer is a modular and reproducible bioinformatics pipeline designed for the detection and analysis of structural variants (SVs) in eukaryotic genomes. The pipeline supports both short- and long-read data, integrates a variety of SV callers (e.g., GRIDSS, DELLY, Sniffles, CuteSV), and unifies the results for further analysis. Key features include:
- Reference genome retrieval from RefSeq or use of a local reference
- Automated FASTQ preprocessing and alignment (BWA or Minimap2)
- Modular SV calling with customizable parameters
- Merging and filtering of SVs using SURVIVOR and BCFtools
- Visual and statistical reporting
The pipeline is suitable for diverse eukaryotic organisms, from yeast to mammals.
Workflow Overview
- Reference genome retrieval or usage of user-provided genome
- Read QC and preprocessing
- Mapping using BWA or Minimap2 (long-read aware)
- SV calling via one or more tools:
  - Short-read: DELLY, GRIDSS, TIDDIT, Manta
  - Long-read: Sniffles, CuteSV
- SV merging & filtering with SURVIVOR and BCFtools
- Report generation (including MultiQC and HTML summaries)
Usage
> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.
Quick Start
Minimal example:
```bash
nextflow run nf-core/eukavarizer \
  -profile docker,short_quick \
  --taxonomy_id 4932 \
  --outdir results/
```
Local input example:
```bash
nextflow run nf-core/eukavarizer \
  -profile docker,short_full \
  --taxonomy_id 4932 \
  --reference_genome ./data/4932/ref/genome.fa.gz \
  --sequence_dir ./data/4932/ena/ \
  --outdir results/
```
> [!WARNING]
> Please provide pipeline parameters via the CLI or the Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration except for parameters; see docs.
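
As a sketch of the `-params-file` route mentioned in the warning above, the parameters from the minimal example can be kept in a YAML file. The file name `params.yaml` is an arbitrary choice for this illustration, and the launch command is only printed here rather than executed.

```bash
# Write the pipeline parameters to a YAML file.
# The file name params.yaml is arbitrary (an assumption for this sketch).
cat > params.yaml <<'EOF'
taxonomy_id: 4932
outdir: "results/"
EOF

# Print the launch command; drop the leading `echo` to actually run it.
echo nextflow run nf-core/eukavarizer -profile docker,short_quick -params-file params.yaml
```

Keeping parameters in a file makes a run easier to repeat and to share than a long command line.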
Profiles and Parameters
The pipeline provides several pre-defined profiles to optimise analysis based on input read types and analysis depth. Combine these with a compute environment profile such as docker, conda, mamba, or kube.
Read Type and Analysis Depth Profiles
- `short_quick`, `short_medium`, `short_full`: For short-read data
- `long_quick`, `long_medium`, `long_full`: For long-read data
- `mix_quick`, `mix_medium`, `mix_full`: For hybrid/mixed sequencing
Each level adjusts:
- Enabled SV callers
- Tool-specific arguments
- Filtering thresholds for SURVIVOR and BCFtools
Compute Environment Profiles
- `docker`: Uses Docker containers
- `conda` / `mamba`: Uses Conda or Mamba for software installation
- `kube`: For Kubernetes environments

Use `-profile docker,short_full` for a full analysis on short-read data with Docker.
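
The profile names listed above follow a regular pattern: one compute environment profile, then a read type and an analysis depth joined by an underscore. The following is a minimal sketch of composing that string; `build_profile` is a hypothetical helper written for this illustration and is not part of the pipeline.

```bash
# Hypothetical helper (not provided by the pipeline): combine one compute
# environment profile with one read-type/depth profile and reject any
# value that is not listed in this README.
build_profile() {
    compute=$1
    read_type=$2
    depth=$3

    case "$compute" in
        docker|conda|mamba|kube) ;;
        *) echo "unknown compute profile: $compute" >&2; return 1 ;;
    esac
    case "$read_type" in
        short|long|mix) ;;
        *) echo "unknown read type: $read_type" >&2; return 1 ;;
    esac
    case "$depth" in
        quick|medium|full) ;;
        *) echo "unknown depth: $depth" >&2; return 1 ;;
    esac

    printf '%s,%s_%s\n' "$compute" "$read_type" "$depth"
}

# Example: a full long-read analysis using Conda.
profile=$(build_profile conda long full) || exit 1
printf '%s\n' "-profile $profile"
```

Validating the three parts up front catches a mistyped profile name before Nextflow starts.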
Important Parameters
| Parameter | Description |
| ------------------ | --------------------------------------------- |
| taxonomy_id | NCBI Taxonomy ID for reference retrieval |
| sequence_dir | Path to directory with raw sequence files |
| reference_genome | Path to a FASTA file for the reference genome |
| outdir | Output directory for results |
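
Before launching a run with local inputs, it can help to confirm that the paths intended for `reference_genome` and `sequence_dir` exist. The following is a minimal sketch; `check_inputs` is a hypothetical helper, not something the pipeline provides, and the demo uses throwaway files in a temporary directory rather than real data.

```bash
# Hypothetical pre-flight check (not part of the pipeline): the reference
# must be an existing file and the sequence directory an existing directory.
check_inputs() {
    ref=$1
    seq_dir=$2
    if [ ! -f "$ref" ]; then
        echo "reference_genome not found: $ref" >&2
        return 1
    fi
    if [ ! -d "$seq_dir" ]; then
        echo "sequence_dir is not a directory: $seq_dir" >&2
        return 1
    fi
    echo "inputs look OK"
}

# Demo with placeholder files in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/ref" "$demo/ena"
: > "$demo/ref/genome.fa.gz"
check_inputs "$demo/ref/genome.fa.gz" "$demo/ena"
```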
Pipeline output
The pipeline produces the following main outputs:
- `multiqc_report.html`: quality control summary
- `report.html`: merged SVs and summary statistics
- `plots/`: additional per-caller plots and images
- VCF files (per caller and merged)
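
After a run, the named outputs listed above can be checked with a short script. This is a sketch under one assumption: the README does not say where inside the output directory each item is written, so the hypothetical `check_outputs` helper searches the whole directory tree by name instead of relying on fixed paths.

```bash
# Hypothetical helper (not part of the pipeline): look for each named
# output anywhere under the given output directory and report on it.
check_outputs() {
    outdir=$1
    missing=0
    for name in multiqc_report.html report.html plots; do
        hit=$(find "$outdir" -name "$name" 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            echo "found:   $hit"
        else
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Demo on a mock output directory that lacks report.html.
out=$(mktemp -d)
mkdir -p "$out/multiqc" "$out/plots"
: > "$out/multiqc/multiqc_report.html"
check_outputs "$out" || echo "some expected outputs are missing"
```

For a real run, call `check_outputs` with the directory that was passed as `--outdir`.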
Credits
nf-core/eukavarizer was originally written by Ondrej Lupphes Sloup.
We thank the following people for their extensive assistance in the development of this pipeline:
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #eukavarizer channel (you can join with this invite).
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Name: Ondřej Sloup
- Login: Lupphes
- Kind: user
- Location: Prague
- Company: Red Hat
- Website: https://www.lupp.es
- Repositories: 21
- Profile: https://github.com/Lupphes
Citation (CITATIONS.md)
# nf-core/eukavarizer: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [fastp](https://doi.org/10.1093/bioinformatics/bty560)

  > Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018 Sep 1;34(17):i884-i890. doi: 10.1093/bioinformatics/bty560.

  > Chen S. (2023). Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta, 2:e107. https://doi.org/10.1002/imt2.107

- [fastplong]

  > Shifu Chen. 2023. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2: e107. https://doi.org/10.1002/imt2.107

  > Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560

- [SVYNC](https://github.com/nvnieuwk/svync)

  > Vannieuwkerke N. (2024). SVYNC: Structural Variant Integrator. https://github.com/nvnieuwk/svync

- [BWA](https://doi.org/10.1093/bioinformatics/btp324)

  > Li H, Durbin R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754–1760.

- [BWA-MEM2](https://arxiv.org/abs/1907.12931)

  > Vasimuddin M, Misra S, Li H, Aluru S. (2019). Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. arXiv:1907.12931

- [Minimap2](https://doi.org/10.1093/bioinformatics/bty191)

  > Li H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094-3100.

- [SAMtools](https://doi.org/10.1093/bioinformatics/btp352)

  > Li H, Handsaker B, Wysoker A, et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079.

- [BCFtools](https://doi.org/10.1093/gigascience/giab008)

  > Danecek P, Bonfield JK, Liddle J, et al. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2), giab008.

- [BBMap](https://sourceforge.net/projects/bbmap/)

  > Bushnell B. (2014). BBMap: A fast, accurate, splice-aware aligner. https://sourceforge.net/projects/bbmap/

- [seqtk](https://github.com/lh3/seqtk)

  > Li H. (2012). seqtk: a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. https://github.com/lh3/seqtk

- [DELLY](https://doi.org/10.1093/bioinformatics/bts378)

  > Rausch T, Zichner T, Schlattl A, Stüzer A, Benes V, Korbel JO. (2012). DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics, 28(18), i333-i339.

- [GRIDSS](https://doi.org/10.1101/gr.222109.117)

  > Cameron DL, Di Stefano L, Papenfuss AT. (2017). GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Genome Res., 27(12), 2050-2060.

- [TIDDIT](https://doi.org/10.12688/f1000research.11501.1)

  > Eisfeldt J, Vezzi F, Olason P, Nilsson D, Lindstrand A. (2017). TIDDIT, an efficient and comprehensive structural variant caller for massive parallel sequencing data. F1000Research, 6:664.

- [Dysgu](https://doi.org/10.1093/bioinformatics/btac144)

  > Cawte N, Stadler PF, Simmonds MJ. (2022). Dysgu: a tool for structural variant detection from short-read sequencing data. Bioinformatics, 38(9), 2465-2471.

- [Sniffles](https://doi.org/10.1038/s41592-018-0001-7)

  > Sedlazeck FJ, Rescheneder P, Smolka M, et al. (2018). Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods, 15(6), 461-468.

- [CuteSV](https://doi.org/10.1186/s13059-020-02107-y)

  > Jiang T, Liu Y, Jiang Y, et al. (2020). Long-read-based human genomic structural variation detection with cuteSV. Genome Biol., 21:189.

- [SvABA](https://doi.org/10.1101/gr.221028.117)

  > Wala JA, Bandopadhayay P, Greenwald NF, et al. (2018). SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res., 28(4), 581–591.

- [Manta](https://doi.org/10.1093/bioinformatics/btv710)

  > Chen X, Schulz-Trieglaff O, Shaw R, et al. (2016). Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics, 32(8), 1220–1222.

- [SURVIVOR](https://doi.org/10.1038/ncomms14061)

  > Jeffares DC, Jolly C, Hoti M, et al. (2017). Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat Commun., 8, 14061.

- [Varify]

  > Sloup O. et al. (2024). Varify: structural variant analysis summary and visualization. https://github.com/Lupphes/Varify

- [SRA](https://doi.org/10.1093/nar/gkq1019)

  > Leinonen R, Sugawara H, Shumway M; International Nucleotide Sequence Database Collaboration. (2011). The Sequence Read Archive. Nucleic Acids Res., 39(Database issue): D19–D21.

- [BioDbCore](https://github.com/luppo/biodbcore)

  > Sloup O. (2024). BioDbCore: ENA/RefSeq retrieval module. https://github.com/luppo/biodbcore

- [coreutils](https://www.gnu.org/software/coreutils/)

  > GNU Project. coreutils (including gunzip). https://www.gnu.org/software/coreutils/
GitHub Events
Total
- Push event: 123
- Public event: 1
Last Year
- Push event: 123
- Public event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/upload-artifact v4 composite
- octokit/request-action v2.x composite
- seqeralabs/action-tower-launch v2 composite
- actions/upload-artifact v4 composite
- seqeralabs/action-tower-launch v2 composite
- mshick/add-pr-comment b8f338c590a895d50bcbfa6c5859251edc8952fc composite
- actions/checkout 11bd71901bbe5b1630ceea73d27597364c9af683 composite
- conda-incubator/setup-miniconda a4260408e20b96e80095f42ff7f1a15b27dd94ca composite
- eWaterCycle/setup-apptainer main composite
- jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
- nf-core/setup-nextflow v2 composite
- actions/stale 28ca1036281a5e5922ead5184a1bbf96e5fc984e composite
- actions/setup-python 0b93645e9fea7318ecaed2b359559ac225c90a2b composite
- eWaterCycle/setup-apptainer 4bb22c52d4f63406c49e94c804632975787312b3 composite
- jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
- nf-core/setup-nextflow v2 composite
- actions/checkout 11bd71901bbe5b1630ceea73d27597364c9af683 composite
- actions/setup-python 0b93645e9fea7318ecaed2b359559ac225c90a2b composite
- peter-evans/create-or-update-comment 71345be0265236311c031f5c7866368bd1eff043 composite
- actions/checkout 11bd71901bbe5b1630ceea73d27597364c9af683 composite
- actions/setup-python 0b93645e9fea7318ecaed2b359559ac225c90a2b composite
- actions/upload-artifact b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 composite
- nf-core/setup-nextflow v2 composite
- pietrobolcato/action-read-yaml 1.1.0 composite
- dawidd6/action-download-artifact 20319c5641d495c8a52e688b7dc5fada6c3a9fbc composite
- marocchino/sticky-pull-request-comment 331f8f5b4215f0445d3c07b4967662a32a2d3e31 composite
- rzr/fediverse-action master composite
- zentered/bluesky-post-action 80dbe0a7697de18c15ad22f4619919ceb5ccf597 composite
- actions/checkout 11bd71901bbe5b1630ceea73d27597364c9af683 composite
- mshick/add-pr-comment b8f338c590a895d50bcbfa6c5859251edc8952fc composite
- nichmor/minimal-read-yaml v0.0.2 composite
- biodbcore 0.1.3.*
- python 3.10.*
- biodbcore 0.1.3.*
- python 3.10.*
- fastplong 0.2.2.*
- coreutils 9.5.*
- grep 3.11.*
- gzip 1.13.*
- lbzip2 2.5.*
- sed 4.8.*
- tar 1.34.*
- seqkit 2.9.0.*
- python 3.10.*
- varify 0.2.6.*
- bbmap 39.18.*
- pigz 2.8.*
- bcftools 1.21.*
- htslib 1.21.*
- bwa-mem2 2.2.1.*
- htslib 1.21.*
- samtools 1.21.*
- bwa-mem2 2.2.1.*
- htslib 1.21.*
- samtools 1.21.*
- fastp 0.24.0.*
- minimap2 2.29.*
- samtools 1.21.*
- minimap2 2.29.*
- seqtk 1.4.*
- pigz 2.6.*
- sra-tools 3.0.8.*
- survivor 1.0.7.*