pipelines-nextflow
A set of workflows written in Nextflow for Genome Annotation.
NBIS Annotation service pipelines
Overview
This Nextflow workflow is a compilation of several subworkflows for different stages of genome annotation. The overall genome annotation process is:
```mermaid
graph TD
  preprocessing[Annotation Preprocessing] --> evidenceAlignment[Evidence alignment]
  transcriptAssembly[Transcript Assembly] --> evidenceAlignment
  evidenceAlignment --> evidenceMaker[Evidence-based Maker]
  denovoRepeatLibrary[De novo Repeat Library] ---> evidenceMaker
  transcriptAssembly --> pasa[PASA]
  preprocessing --> pasa
  pasa --> evidenceMaker
  evidenceMaker --> abinitioTraining[Abinitio Training]
  abinitioTraining --> abinitioMaker[Abinitio-based Maker]
  evidenceMaker --> abinitioMaker
  pasa --> functionalAnnotation[Functional Annotation]
  abinitioMaker --> functionalAnnotation
  functionalAnnotation --> EMBLmyGFF3
```
The subworkflow to run is selected using the `subworkflow` parameter.
Citation
If you use these pipelines in your work, please acknowledge NBIS within your communication according to this example: "Support by NBIS (National Bioinformatics Infrastructure Sweden) is gratefully acknowledged."
Acknowledgments
These workflows were based on the Bpipe workflows written by Marc Höppner (@marchoeppner) and Jacques Dainat (@Juke34).
Thank you to everyone who contributes to this project.
Maintainers
- Mahesh Binzer-Panchal (@mahesh-panchal)
  - Expertise: Nextflow workflow development
- Jacques Dainat (@Juke34)
  - Expertise: Genome annotation, Nextflow workflow development
- Lucile Soler (@LucileSol)
  - Expertise: Genome annotation
Installation and Usage
Requirements:
- Nextflow
- A container platform (recommended) such as Singularity or Docker, or the conda/mamba package manager if a container platform is not available. If containers or conda/mamba are unavailable, then tool dependencies must be accessible from your `PATH`.
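As a quick sanity check before running a workflow, the requirements above can be probed from the shell. This is a minimal sketch, not part of the pipelines: the helper name `check_tool` is hypothetical, and the executable names (`nextflow`, `singularity`, `docker`, `conda`, `mamba`) are assumed to be the ones installed on your system.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Nextflow is required; only one of the remaining tools needs to be present.
for tool in nextflow singularity docker conda mamba; do
    check_tool "$tool"
done
```

If neither a container platform nor conda/mamba is reported as found, the individual tool dependencies would need to be on your `PATH` instead.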
Nextflow
Install Nextflow directly:
```bash
curl -s https://get.nextflow.io | bash
mv ./nextflow ~/bin
```
Alternatively, installation can be managed with conda (or mamba) in its own conda environment:
```bash
conda create -c conda-forge -c bioconda -n nextflow-env nextflow
conda activate nextflow-env
```
See Nextflow: Get started - installation for further details.
General Usage
A workflow is run in the following way:
```bash
nextflow run NBISweden/pipelines-nextflow \
    [-profile <profile_name1>[,<profile_name2>,...] ] \
    [-c workflow.config ] \
    [-resume] \
    -params-file workflow_parameters.yml
```
where `-profile` selects one or more predefined profiles (see Profiles for those available), and `-c workflow.config` loads a custom configuration that overrides existing process settings, such as the number of cpus, time allocation, memory, output prefixes, and tool command-line options. The existing settings are defined in `nextflow.config`, which is loaded by default. The `-params-file` option takes a YAML-formatted file listing workflow parameters, e.g.
```yaml
subworkflow: 'annotation_preprocessing'
genome: '/path/to/genome'
busco_lineage:
  - 'eukaryota_odb10'
  - 'bacteria_odb10'
outdir: '/path/to/save/results'
```
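The parameter file and the launch command can be prepared together in a small script. The sketch below only writes the file and prints the command rather than running it. The parameter values are the placeholders from the example above, and the variable names `params_file` and `run_cmd` are illustrative assumptions.

```shell
#!/bin/sh
# Write a parameter file for the annotation_preprocessing subworkflow.
# The paths are placeholders and must be replaced with real locations.
params_file="workflow_parameters.yml"
cat > "$params_file" <<'EOF'
subworkflow: 'annotation_preprocessing'
genome: '/path/to/genome'
busco_lineage:
  - 'eukaryota_odb10'
outdir: '/path/to/save/results'
EOF

# Build the launch command and print it for review instead of executing it.
run_cmd="nextflow run NBISweden/pipelines-nextflow -params-file $params_file"
echo "$run_cmd"
```

Printing the command first makes it easy to check the file name and add options such as `-profile` or `-resume` before launching.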
Note: If running on a compute cluster infrastructure, `nextflow` must be able to communicate with the workload manager at all times, otherwise tasks will be cancelled. The best way to do this is to run `nextflow` using a `screen` or `tmux` terminal.

E.g. Screen
```bash
# Open a named screen terminal session
screen -S mynextflowrun
# Load nextflow with conda
conda activate nextflow-env
# Run nextflow (replace the placeholders with your own values)
nextflow run -c <config> -profile <profile> <nextflow_script>
# "Detach" screen terminal: press Ctrl-a, then d
# List screen sessions
screen -ls
# "Attach" screen session
screen -r mynextflowrun
```
Profiles
- `uppmax`: A profile for the Uppmax clusters. Tasks are submitted to the SLURM workload manager, executed within Singularity (unless otherwise noted), and use the `$SNIC_TMP` scratch space.
  Note: The workflow parameter `project` is mandatory when using Uppmax clusters.
- `conda`: A general purpose profile that uses conda to manage software dependencies.
- `mamba`: A general purpose profile that uses mamba to manage software dependencies.
- `docker`: A general purpose profile that uses docker to manage software dependencies.
- `singularity`: A general purpose profile that uses singularity to manage software dependencies.
- `nbis`: A profile for the NBIS annotation cluster. Tasks are submitted to the SLURM workload manager, and use the disk space `/scratch` for task execution. Software should be managed using one of the general purpose profiles above.
- `gitpod`: A profile to set local executor settings in the Gitpod environment.
- `test`: A profile supplying test data to check if the workflows run on your system.
- `pipeline_report`: Adds a folder in the `outdir` which includes workflow execution reports.
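Several profiles can be combined by passing them to `-profile` as a comma-separated list, as in the General Usage synopsis. The sketch below joins a list of profile names and prints the resulting command without running it. The choice of `singularity` with `pipeline_report` is only an illustrative assumption, as are the variable names.

```shell
#!/bin/sh
# Profiles to combine, separated by spaces (illustrative selection).
profiles="singularity pipeline_report"

# Join the names with commas, the form expected by -profile.
profile_arg=$(echo "$profiles" | tr ' ' ',')

# Print the command for review instead of executing it.
echo "nextflow run NBISweden/pipelines-nextflow -profile $profile_arg -params-file workflow_parameters.yml"
```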
Uppmax profile good practices
Note: Nextflow is enabled using the module system on Uppmax.

```bash
module load bioinfo-tools Nextflow
```

The following configuration in your `workflow.config` is recommended when running workflows on Uppmax.

```nextflow
// Set your work directory to a folder in your project directory under nobackup
workDir = '/proj/<snic_storage_project>/nobackup/work'
// Restart workflows from last successful execution (i.e. use cached results where possible).
resume = true
// Add any overriding process directives here, e.g.,
process {
    withName: 'BLAST_BLASTN' {
        cpus = 12
        time = 2.d
    }
}
```
NBIS profile good practices
Note: Both singularity and conda are installed, however singularity is preferred for speed and reproducibility.

```bash
module load Singularity
```

The following configuration in your `workflow.config` is recommended when running workflows on the annotation cluster.

```nextflow
// Set your work directory to a folder on the /active partition
workDir = '/active/<project_id>/nobackup/work'
// Restart workflows from last successful execution (i.e. use cached results where possible).
resume = true
// Add any overriding process directives here, e.g.,
process {
    withName: 'BLAST_BLASTN' {
        cpus = 12
        time = 2.d
    }
}
// Use a shared cache folder for singularity images
singularity.cacheDir = '/active/nxf_singularity_cachedir'
// If using conda, use a shared cache for conda environments
conda.cacheDir = '/active/nxf_conda_cachedir'
// Use mamba for speed over conda
conda.useMamba = true
```

Project results should be published to `/projects`, work directories should be on `/active`, while computations are performed on the local `/scratch` partitions.
Citation (CITATION.cff)
# YAML 1.2
---
abstract: "A set of workflows written in Nextflow for Genome Annotation. "
authors:
  -
    affiliation: "National Bioinformatics Infrastructure Sweden (NBIS) / Uppsala University"
    family-names: "Binzer-Panchal"
    given-names: Mahesh
    orcid: "https://orcid.org/0000-0003-1675-0677"
  -
    family-names: Dainat
    given-names: Jacques
    orcid: "https://orcid.org/0000-0002-6629-0173"
  -
    family-names: Soler
    given-names: Lucile
    orcid: "https://orcid.org/0000-0002-0121-2393"
cff-version: "1.1.0"
date-released: 2021-08-05
keywords:
  - Nextflow
  - "Genome Annotation"
  - "Functional Annotation"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/NBISweden/pipelines-nextflow"
title: "NBIS Genome Annotation Workflows"
version: "v1.0.0"
...