https://github.com/cbg-ethz/v-pipe
V-pipe is a pipeline designed for analysing NGS data of short viral genomes
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 11 of 19 committers (57.9%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cbg-ethz
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: master
- Homepage: https://cbg-ethz.github.io/V-pipe/
- Size: 17.4 MB
Statistics
- Stars: 138
- Watchers: 9
- Forks: 46
- Open Issues: 36
- Releases: 10
Metadata Files
README.md
V-pipe is a workflow designed for the analysis of next-generation sequencing (NGS) data from viral pathogens. It produces a number of results in a curated format (e.g., consensus sequences, SNV calls, local/global haplotypes). V-pipe is written using the Snakemake workflow management system.
Usage
Different ways of initializing V-pipe are presented below. We strongly encourage you to deploy it using the quick install script, as this is our preferred method.
To configure V-pipe, refer to the documentation in config/README.md.
V-pipe expects the input samples to be organized in a two-level directory hierarchy, and the sequencing reads must be provided in a sub-folder named raw_data. Further details can be found on the website.
Check the utils subdirectory for mass-importer tools that can help you generate this hierarchy.
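The layout described above can be prepared by hand with a small shell function. This is a minimal sketch, not one of the V-pipe mass importers: `stage_sample` is a hypothetical helper, and it assumes the two levels are a sample name followed by a batch (for instance a sequencing date), under a samples/ directory in the current working directory.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not shipped with V-pipe): link one sample's FASTQ
# files into samples/<sample>/<batch>/raw_data/, i.e. the two-level
# hierarchy described above, relative to the current directory.
stage_sample() {
    if [ "$#" -lt 3 ]; then
        echo "usage: stage_sample SAMPLE BATCH FASTQ..." >&2
        return 1
    fi
    local sample="$1" batch="$2"
    shift 2
    local dest="samples/${sample}/${batch}/raw_data"
    mkdir -p "$dest"
    local reads
    for reads in "$@"; do
        # Symlink instead of copying to avoid duplicating large read files;
        # realpath makes the link target absolute.
        ln -sf "$(realpath "$reads")" "${dest}/$(basename "$reads")"
    done
}
```

For example, `stage_sample patient1 20100113 /data/run1/patient1_R1.fastq /data/run1/patient1_R2.fastq` (hypothetical names and paths) creates samples/patient1/20100113/raw_data/ with links to both read files. Symlinks keep disk usage low, but when running through Docker, links that point outside the mounted directory will not resolve inside the container, so copying the files is the safer choice there.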
We provide virus-specific base configuration files which contain handy defaults for, e.g., HIV and SARS-CoV-2. Set the virus in the general section of the configuration file:
```yaml
general:
  virus_base_config: hiv
```
Also see Snakemake's documentation to learn more about the command-line options available when executing the workflow.
Tutorials
Tutorials for your first steps with V-pipe for different scenarios are available in the docs/ subdirectory.
Using quick install script
To deploy V-pipe, use the installation script with the following parameters:
```bash
curl -O 'https://raw.githubusercontent.com/cbg-ethz/V-pipe/master/utils/quick_install.sh'
bash quick_install.sh -w work
```
This script will download and install Miniconda, check out the V-pipe git repository (use -b to specify which branch/tag), and set up a work directory (specified with -w) containing an executable script that runs the workflow:
```bash
cd work
# edit config.yaml and provide samples/ directory
./vpipe --jobs 4 --printshellcmds --dry-run
```
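The "edit config.yaml" step can also be scripted. The sketch below is a hypothetical helper, `write_minimal_config`, that writes a small config.yaml into the current directory using only keys that appear elsewhere in this README (`virus_base_config` and the `output` switches from the Docker example); it assumes those keys are sufficient for a first dry run, and config/README.md remains the authoritative reference for all options.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a minimal config.yaml in the current directory
# (e.g. the work/ directory created by quick_install.sh). Only keys shown
# elsewhere in this README are used; see config/README.md for the rest.
write_minimal_config() {
    local virus="${1:-hiv}"
    # Back up any existing file, since the install script may have created one.
    if [ -e config.yaml ]; then
        cp config.yaml config.yaml.bak
    fi
    cat > config.yaml <<EOF
general:
  virus_base_config: ${virus}

output:
  snv: true
  local: true
  global: false
  visualization: true
  QA: true
EOF
}
```

From inside work/, calling `write_minimal_config hiv` and then running the `./vpipe ... --dry-run` command lets Snakemake report the planned jobs without executing them. The backup step is a deliberate choice so that any settings the install script wrote are not silently lost.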
Test data for checking your installation is available with the tutorials in the docs/ subdirectory.
Using Docker
Note: the Docker image only includes the components needed to run the workflow with the HIV and SARS-CoV-2 virus base configurations. Using V-pipe with other viruses or configurations might require internet connectivity to fetch additional software components.
Create config.yaml or vpipe.config and then populate the samples/ directory.
For example, the following config file could be used:
```yaml
general:
  virus_base_config: hiv

output:
  snv: true
  local: true
  global: false
  visualization: true
  QA: true
```
Then execute:
```bash
docker run --rm -it -v "$PWD":/work ghcr.io/cbg-ethz/v-pipe:master --jobs 4 --printshellcmds --dry-run
```
Using Snakedeploy
First install mamba, then create and activate an environment with Snakemake and Snakedeploy:
```bash
mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
conda activate snakemake
```
Snakemake's official workflow installer Snakedeploy can now be used:
```bash
snakedeploy deploy-workflow https://github.com/cbg-ethz/V-pipe --tag master .
# edit config/config.yaml and provide samples/ directory
snakemake --use-conda --jobs 4 --printshellcmds --dry-run
```
Dependencies
Conda is a cross-platform package management system and an environment manager application. Snakemake uses mamba as a package manager.
Snakemake is the central workflow and dependency manager of V-pipe. It determines the order in which individual tools are invoked and checks that programs do not exit unexpectedly.
VICUNA is de novo assembly software designed for populations with high mutation rates. It is used to build an initial reference for mapping reads with the ngshmmalign aligner when a references/cohort_consensus.fasta file is not provided. Further details can be found in the wiki pages.
Computational tools
Other dependencies are managed using an isolated conda environment per rule. Below we list some of the computational tools integrated in V-pipe:
FastQC gives an overview of the raw sequencing data. Flowcells that have been overloaded or that otherwise failed during sequencing can easily be identified with FastQC.
Trimming and clipping of reads is performed by PRINSEQ. It is currently the most versatile raw read processor with many customization options.
We perform the alignment of the curated NGS data using our custom aligner, ngshmmalign, which takes structural variants into account. It produces multiple consensus sequences that include either majority bases or ambiguous bases.
To detect specific cross-contaminations with other probes, the Burrows-Wheeler Aligner (BWA) is used. It quickly yields estimates of foreign genomic material in an experiment. Additionally, it can be used as an alternative aligner to ngshmmalign.
To standardise multiple samples to the same reference genome (e.g., HXB2 for HIV-1), the multiple sequence aligner MAFFT is employed. The multiple sequence alignment helps in determining regions of low conservation and thus makes standardisation of alignments more robust.
samtools and bcftools are the Swiss Army knife of alignment postprocessing and diagnostics. bcftools is also used to generate consensus sequences with indels.
We perform genomic liftovers to standardised reference genomes using our in-house Python library of utilities for rewriting alignments.
ShoRAH performs SNV calling and local haplotype reconstruction using Bayesian clustering.
LoFreq (version 2) is an SNV and indel caller for next-generation sequencing data, and can be used as an alternative engine for SNV calling.
We use HaploClique or SAVAGE to perform global haplotype reconstruction for heterogeneous viral populations using an overlap graph.
Citation
If you use this software in your research, please cite:
Fuhrmann, L., Jablonski, K. P., Topolsky, I., Batavia, A. A., Borgsmueller, N., Icer Baykal, P., Carrara, M. ... & Beerenwinkel, N. (2023). "V-Pipe 3.0: A Sustainable Pipeline for Within-Sample Viral Genetic Diversity Estimation." bioRxiv, doi:10.1101/2023.10.16.562462.
Contributions
- Ivan Topolsky*
- Pelin Icer Baykal
- Auguste Rimaite
- Lara Fuhrmann
- Uwe Schmitt
- Michal Okoniewski
- Monica Dragan
- Kim Philipp Jablonski***
- Susana Posada Céspedes***
- David Seifert***
- Tobias Marschall***
- Niko Beerenwinkel**

* software maintainer; ** group leader; *** group alumni and former contributors.
Contact
We encourage users to use the issue tracker. For further enquiries, you can also contact the V-pipe Dev Team at v-pipe@bsse.ethz.ch.
Owner
- Name: Computational Biology Group (CBG)
- Login: cbg-ethz
- Kind: organization
- Location: Basel, Switzerland
- Website: https://www.bsse.ethz.ch/cbg
- Twitter: cbg_ethz
- Repositories: 91
- Profile: https://github.com/cbg-ethz
Beerenwinkel Lab at ETH Zurich
GitHub Events
Total
- Issues event: 9
- Watch event: 9
- Issue comment event: 60
- Push event: 41
- Pull request review comment event: 1
- Pull request review event: 7
- Pull request event: 15
- Fork event: 3
- Create event: 13
Last Year
- Issues event: 9
- Watch event: 9
- Issue comment event: 60
- Push event: 41
- Pull request review comment event: 1
- Pull request review event: 7
- Pull request event: 15
- Fork event: 3
- Create event: 13
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| kpj | k****i@g****m | 349 |
| LaraFuhrmann | 5****n | 236 |
| Ivan Blagoev Topolsky | i****y@b****h | 224 |
| Susana Posada-Cespedes | s****a@b****h | 160 |
| Lara Fuhrmann | l****n@e****h | 86 |
| Lara Fuhrmann | l****n@b****h | 54 |
| Monica Dragan | m****n@b****h | 31 |
| Uwe Schmitt | u****t@i****h | 21 |
| Monica Dragan | m****n@g****m | 5 |
| mcarrara | c****a@n****h | 3 |
| David Seifert | S****A | 3 |
| Pelin Icer Baykal | i****n@g****m | 2 |
| Monica Dragan | m****n@g****m | 2 |
| Alex Kanitz | a****z@a****h | 1 |
| Gordon J. Köhn | g****n@k****t | 1 |
| Susana Posada Cespedes | s****p@l****h | 1 |
| Susana Posada Cespedes | s****p@l****h | 1 |
| Michal Okoniewski | m****o@s****h | 1 |
| Prajwal Kulkarni | p****6@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 40
- Total pull requests: 88
- Average time to close issues: 7 months
- Average time to close pull requests: 3 months
- Total issue authors: 25
- Total pull request authors: 11
- Average comments per issue: 2.65
- Average comments per pull request: 1.22
- Merged pull requests: 50
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 17
- Average time to close issues: 4 days
- Average time to close pull requests: 3 months
- Issue authors: 5
- Pull request authors: 3
- Average comments per issue: 4.33
- Average comments per pull request: 1.18
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tobiasmarschall (5)
- sposadac (4)
- gordonkoehn (2)
- stefanches7 (2)
- vicfabienne (2)
- snaketron (2)
- Jamesped (2)
- Masterxilo (2)
- DrYak (2)
- ibseq (2)
- cmfield (1)
- poursalavati (1)
- zuber-bioinfo (1)
- uniqueg (1)
- chadisaad (1)
Pull Request Authors
- DrYak (31)
- kpj (16)
- gordonkoehn (16)
- sposadac (9)
- LaraFuhrmann (8)
- uweschmitt (4)
- prajwalkulkarni (3)
- monicadragan (3)
- corneliusroemer (2)
- hrishikeshh (1)
- dridk (1)
- uniqueg (1)
Dependencies
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- docker/metadata-action v3 composite
- docker/setup-buildx-action v1 composite
- docker/setup-qemu-action v1 composite
- styfle/cancel-workflow-action 0.9.1 composite
- actions/checkout v2 composite
- actions/upload-artifact v2 composite
- megalinter/megalinter v5 composite
- peter-evans/create-pull-request v3 composite
- stefanzweifel/git-auto-commit-action v4 composite
- actions/checkout v2 composite
- actions/upload-artifact v2 composite
- conda-incubator/setup-miniconda v2 composite
- snakemake/snakemake-github-action v1.19.0 composite
- styfle/cancel-workflow-action 0.9.1 composite
- snakemake/snakemake ${snaketag} build
- vpipe-tests-base latest build
- actions/checkout v3 composite
- actions/upload-artifact v3 composite
- conda-incubator/setup-miniconda v2 composite
- actions/upload-artifact v3 composite
- docker/build-push-action v5 composite
- docker/metadata-action v5 composite
- docker/setup-buildx-action v3 composite
- docker/setup-qemu-action v3 composite
- actions/checkout v3 composite
- actions/download-artifact v2 composite
- actions/upload-artifact v3 composite
- conda-incubator/setup-miniconda v2 composite