pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.

https://github.com/nbisweden/pipelines-nextflow

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Keywords

genome-annotation nextflow workflow
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: NBISweden
  • License: gpl-3.0
  • Language: Nextflow
  • Default Branch: master
  • Homepage:
  • Size: 338 KB
Statistics
  • Stars: 45
  • Watchers: 35
  • Forks: 18
  • Open Issues: 20
  • Releases: 0
Created about 6 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Citation

README.md

NBIS Annotation service pipelines

Overview

This Nextflow workflow is a compilation of several subworkflows for different stages of genome annotation. The overall genome annotation process is:

```mermaid
graph TD
    preprocessing[Annotation Preprocessing] --> evidenceAlignment[Evidence alignment]
    transcriptAssembly[Transcript Assembly] --> evidenceAlignment
    evidenceAlignment --> evidenceMaker[Evidence-based Maker]
    denovoRepeatLibrary[De novo Repeat Library] ---> evidenceMaker
    transcriptAssembly --> pasa[PASA]
    preprocessing --> pasa
    pasa --> evidenceMaker
    evidenceMaker --> abinitioTraining[Abinitio Training]
    abinitioTraining --> abinitioMaker[Abinitio-based Maker]
    evidenceMaker --> abinitioMaker
    pasa --> functionalAnnotation[Functional Annotation]
    abinitioMaker --> functionalAnnotation
    functionalAnnotation --> EMBLmyGFF3
```

The subworkflow is selected using the subworkflow parameter.
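As a rough illustration, the subworkflow parameter could also be given on the command line using Nextflow's standard double-dash parameter syntax, rather than in a parameters file. This is a sketch, not the maintainers' documented invocation: the subworkflow name annotation_preprocessing is taken from the General Usage example, and the genome and outdir paths are placeholders.

```bash
# Sketch only: pass workflow parameters on the command line with Nextflow's
# double-dash syntax instead of a params file.
# 'annotation_preprocessing' is the subworkflow named in the General Usage
# example; the paths are placeholders.
subworkflow='annotation_preprocessing'
cmd="nextflow run NBISweden/pipelines-nextflow --subworkflow $subworkflow --genome /path/to/genome --outdir /path/to/save/results"

# Dry run: print the command for review before executing it yourself.
echo "$cmd"
```

Printing the command first makes it easy to check the selected subworkflow before any tasks are submitted.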

Citation

If you use these pipelines in your work, please acknowledge NBIS within your communication according to this example: "Support by NBIS (National Bioinformatics Infrastructure Sweden) is gratefully acknowledged."

Acknowledgments

These workflows were based on the Bpipe workflows written by Marc Höppner (@marchoeppner) and Jacques Dainat (@Juke34).

Thank you to everyone who contributes to this project.

Maintainers

  • Mahesh Binzer-Panchal (@mahesh-panchal)
    • Expertise: Nextflow workflow development
  • Jacques Dainat (@Juke34)
    • Expertise: Genome annotation, Nextflow workflow development
  • Lucile Soler (@LucileSol)
    • Expertise: Genome Annotation

Installation and Usage

Requirements:

  • Nextflow
  • A container platform (recommended) such as Singularity or Docker, or the conda/mamba package manager if a container platform is not available. If containers or conda/mamba are unavailable, then tool dependencies must be accessible from your PATH.

Nextflow

Install Nextflow directly:

```bash
curl -s https://get.nextflow.io | bash
mv ./nextflow ~/bin
```

Alternatively, installation can be managed with conda (or mamba) in its own conda environment:

```bash
conda create -c conda-forge -c bioconda -n nextflow-env nextflow
conda activate nextflow-env
```

See Nextflow: Get started - installation for further details.

General Usage

A workflow is run in the following way:

```bash
nextflow run NBISweden/pipelines-nextflow \
    [-profile <profile_name1>[,<profile_name2>,...] ] \
    [-c workflow.config ] \
    [-resume] \
    -params-file workflow_parameters.yml
```

where -profile selects one or more predefined profiles (see the Profiles section for those available), and -c workflow.config loads a custom configuration that alters existing process settings, such as the number of cpus, time allocation, memory, output prefixes and tool command-line options. The default settings are defined in nextflow.config, which is loaded by default. The -params-file is a YAML-formatted file listing workflow parameters, e.g.

```yaml
subworkflow: 'annotation_preprocessing'
genome: '/path/to/genome'
busco_lineage:
  - 'eukaryota_odb10'
  - 'bacteria_odb10'
outdir: '/path/to/save/results'
```

Note: If running on a compute cluster infrastructure, Nextflow must be able to communicate with the workload manager at all times, otherwise tasks will be cancelled. The best way to do this is to run Nextflow in a screen or tmux terminal session.

E.g. Screen

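```bash
# Open a named screen terminal session
screen -S mynextflowrun

# Load nextflow with conda
conda activate nextflow-env

# Run nextflow (arguments as described under General Usage)
nextflow run NBISweden/pipelines-nextflow -c workflow.config -profile <profile_name> -params-file workflow_parameters.yml

# "Detach" the screen terminal by pressing Ctrl+a, then d

# List screen sessions
screen -ls

# "Attach" the screen session
screen -r mynextflowrun
```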

Profiles

  • uppmax: A profile for the Uppmax clusters. Tasks are submitted to the SLURM workload manager, executed within Singularity (unless otherwise noted), and use the $SNIC_TMP scratch space. Note: The workflow parameter project is mandatory when using Uppmax clusters.
  • conda: A general purpose profile that uses conda to manage software dependencies.
  • mamba: A general purpose profile that uses mamba to manage software dependencies.
  • docker: A general purpose profile that uses docker to manage software dependencies.
  • singularity: A general purpose profile that uses singularity to manage software dependencies.
  • nbis: A profile for the NBIS annotation cluster. Tasks are submitted to the SLURM workload manager, and use the disk space /scratch for task execution. Software should be managed using one of the general purpose profiles above.
  • gitpod: A profile to set local executor settings in the Gitpod environment.
  • test: A profile supplying test data to check if the workflows run on your system.
  • pipeline_report: Adds a folder in the outdir that includes workflow execution reports.
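
Since -profile accepts a comma-separated list, a software profile can be combined with add-on profiles. The following is a minimal sketch that builds such a list and prints the resulting command; whether these particular profiles combine cleanly on a given system is an assumption, and the variable names are illustrative only.

```bash
# Sketch only: combine one software profile with optional add-on profiles.
software_profile='singularity'   # or: conda, mamba, docker
extras='test pipeline_report'    # space-separated add-on profiles; may be empty

# Join the profile names with commas, as expected by -profile.
profiles="$software_profile"
for p in $extras; do
    profiles="$profiles,$p"
done

# Dry run: print the command for review before executing it yourself.
echo "nextflow run NBISweden/pipelines-nextflow -profile $profiles -params-file workflow_parameters.yml"
```
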
Uppmax profile good practices

Note

Nextflow is enabled using the module system on Uppmax.

```bash
module load bioinfo-tools Nextflow
```

The following configuration in your workflow.config is recommended when running workflows on Uppmax.

```nextflow
// Set your work directory to a folder in your project directory under nobackup
workDir = '/proj/<snic_storage_project>/nobackup/work'

// Restart workflows from last successful execution (i.e. use cached results where possible).
resume = true

// Add any overriding process directives here, e.g.,
process {
    withName: 'BLAST_BLASTN' {
        cpus = 12
        time = 2.d
    }
}
```
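
Because the project parameter is required on Uppmax, a small pre-flight check of the parameters file can catch its absence before any job is submitted. The sketch below is illustrative only: the file name uppmax_parameters.yml and the value <snic_compute_project> are hypothetical placeholders, and the check simply looks for a top-level project key.

```bash
# Sketch only: check that the params file defines `project` before launching
# with the uppmax profile. The file name and all values are placeholders.
params_file='uppmax_parameters.yml'

cat > "$params_file" <<'EOF'
subworkflow: 'annotation_preprocessing'
genome: '/path/to/genome'
outdir: '/path/to/save/results'
project: '<snic_compute_project>'
EOF

# Look for a top-level `project:` key in the YAML file.
if grep -q '^project:' "$params_file"; then
    status='ok'
    echo "project is set in $params_file"
else
    status='missing'
    echo "project is missing from $params_file" >&2
fi
```
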

NBIS profile good practices

Note

Both singularity and conda are installed; however, singularity is preferred for speed and reproducibility.

```bash
module load Singularity
```

The following configuration in your workflow.config is recommended when running workflows on the annotation cluster.

```nextflow
// Set your work directory to a folder on the /active partition
workDir = '/active/<project_id>/nobackup/work'

// Restart workflows from last successful execution (i.e. use cached results where possible).
resume = true

// Add any overriding process directives here, e.g.,
process {
    withName: 'BLAST_BLASTN' {
        cpus = 12
        time = 2.d
    }
}

// Use a shared cache folder for singularity images
singularity.cacheDir = '/active/nxf_singularity_cachedir'

// If using conda, use a shared cache for conda environments
conda.cacheDir = '/active/nxf_conda_cachedir'

// Use mamba for speed over conda
conda.useMamba = true
```

Project results should be published to /projects, work directories should be on /active, while computations are performed on the local /scratch partitions.

Owner

  • Name: NBIS - National Bioinformatics Infrastructure Sweden
  • Login: NBISweden
  • Kind: organization
  • Location: Sweden

NBIS is a distributed national bioinformatics infrastructure, supporting life sciences in Sweden.

Citation (CITATION.cff)

# YAML 1.2
---
abstract: "A set of workflows written in Nextflow for Genome Annotation. "
authors:
  -
    affiliation: "National Bioinformatics Infrastructure Sweden (NBIS) / Uppsala University"
    family-names: "Binzer-Panchal"
    given-names: Mahesh
    orcid: "https://orcid.org/0000-0003-1675-0677"
  -
    family-names: Dainat
    given-names: Jacques
    orcid: "https://orcid.org/0000-0002-6629-0173"
  -
    family-names: Soler
    given-names: Lucile
    orcid: "https://orcid.org/0000-0002-0121-2393"
cff-version: "1.1.0"
date-released: 2021-08-05
keywords:
  - Nextflow
  - "Genome Annotation"
  - "Functional Annotation"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/NBISweden/pipelines-nextflow"
title: "NBIS Genome Annotation Workflows"
version: "v1.0.0"
...

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 347
  • Total Committers: 12
  • Avg Commits per committer: 28.917
  • Development Distribution Score (DDS): 0.392
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Mahesh Binzer-Panchal m****l@n****e 211
Mahesh Binzer-Panchal m****l@n****e 82
MartinPippel m****l@n****e 23
Jacques Dainat j****t@n****e 17
LucileSol l****r@b****e 4
Andre Soares a****1 2
verku v****a@s****e 2
Lucile Soler l****5@i****e 2
Roy Francis r****s@g****m 1
Pontus Frehult p****b@s****t 1
LucileSol l****r@n****e 1
Lucile Soler l****5@n****s 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 99
  • Total pull requests: 77
  • Average time to close issues: 6 months
  • Average time to close pull requests: 19 days
  • Total issue authors: 13
  • Total pull request authors: 9
  • Average comments per issue: 3.01
  • Average comments per pull request: 1.44
  • Merged pull requests: 70
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Juke34 (16)
  • mahesh-panchal (15)
  • LucileSol (10)
  • ViriatoII (5)
  • royfrancis (2)
  • Shicheng-Guo (1)
  • fmsangdehi (1)
  • aersoares81 (1)
  • EmilieSmeets22 (1)
  • Brent-Saylor-Canopy (1)
  • unavailable-2374 (1)
  • MartinPippel (1)
  • apfuentes (1)
  • verku (1)
  • fraca (1)
Pull Request Authors
  • mahesh-panchal (30)
  • MartinPippel (5)
  • LucileSol (4)
  • Juke34 (3)
  • aersoares81 (2)
  • mkierczak (1)
  • ViktorSade (1)
  • verku (1)
  • pontus (1)
  • royfrancis (1)
Top Labels
Issue Labels
New pipeline (7) enhancement (7) bug (3) good first issue (2) Provide test data (2) wontfix (1) Provide example code (1) help wanted (1)

Dependencies

.github/workflows/continuous_integration.yml actions
  • actions/checkout v2 composite
modules/nf-core/blast/makeblastdb/meta.yml cpan
modules/nf-core/busco/meta.yml cpan
modules/nf-core/fastp/meta.yml cpan
modules/nf-core/fastqc/meta.yml cpan
modules/nf-core/multiqc/meta.yml cpan
modules/nf-core/interproscan/meta.yml cpan
modules/nf-core/blast/makeblastdb/environment.yml pypi
modules/nf-core/busco/environment.yml pypi
modules/nf-core/fastp/environment.yml pypi
modules/nf-core/fastqc/environment.yml pypi
modules/nf-core/interproscan/environment.yml pypi
modules/nf-core/multiqc/environment.yml pypi