genepal

A Nextflow pipeline for genome and pan-genome annotation

https://github.com/plant-food-research-open/genepal

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary

Keywords

annotation gene genome pangenome phased
Last synced: 6 months ago

Repository

Statistics
  • Stars: 13
  • Watchers: 13
  • Forks: 7
  • Open Issues: 38
  • Releases: 10
Topics
annotation gene genome pangenome phased
Created over 2 years ago · Last pushed 8 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

plant-food-research-open/genepal

Runs on Nextflow with Docker or Singularity. Conda is not supported.

Introduction

plant-food-research-open/genepal is a bioinformatics pipeline for annotating single genomes, phased genomes and pan-genomes. An overview is shown in the Pipeline Flowchart, and the references for the tools are listed in CITATIONS.md. Protein-coding gene structures are predicted with BRAKER, which uses GeneMark-ES/ET/EP+/ETP. These tools require a license for commercial use.

Pipeline Flowchart

  • fasta_validator: Validate genome FASTA
  • RepeatModeler or EDTA: Create TE library
  • RepeatMasker: Soft-mask the genome FASTA
  • sra-tools: RNASeq data download from SRA
  • FastQC, fastp, SortMeRNA: QC, trim and filter RNASeq evidence
  • STAR: RNASeq alignment
  • cat: Concatenate protein FASTA files
  • BRAKER: Predict protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS
    • Directly provided BAM files should be --outSAMstrandField intronMotif compliant
    • With protein evidence alone, BRAKER workflow C is executed
    • With protein plus RNASeq evidence, BRAKER workflow D is executed
  • Liftoff: Optionally, lift over annotations from reference genome FASTA/GFF files
  • TSEBRA: Optionally, require full intron support for BRAKER models alone or for both BRAKER and Liftoff models
  • AGAT
    • Merge multi-reference liftoffs
    • Remove liftoff transcripts marked by valid_ORF=False
    • Remove liftoff genes with any intron shorter than 10 bp
    • Remove rRNA, tRNA and other non-protein coding models from liftoff
    • Optionally, allow or remove isoforms
    • Remove BRAKER models from Liftoff loci
    • Merge Liftoff and BRAKER models
    • Optionally, remove models without any EggNOG-mapper hits
    • Optionally, remove models with ORFs shorter than N amino acids
  • EggNOG-mapper: Add functional annotation to the GFF
  • GenomeTools: GFF format validation
  • GffRead: Extraction of protein sequences and filtering of BRAKER models with invalid ORF(s)
  • OrthoFinder: Perform phylogenetic orthology inference across genomes
  • GffCompare: Compare and benchmark against an existing annotation
  • BUSCO: Completeness statistics for genome and annotation through proteins
  • R Markdown: Specialized pangene analysis
  • MultiQC: Exhaustive QC statistics
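
As an illustration of the short-intron rule listed under AGAT above, the sketch below flags transcripts that have an intron shorter than 10 bp. It is not the pipeline's AGAT-based implementation. The function name `short_intron_transcripts` is hypothetical, and the sketch assumes a GFF3 file whose exon rows carry a single `Parent=` attribute and whose exons do not overlap.

```bash
# Hypothetical helper, not part of genepal: print the Parent ID of every
# transcript in a GFF3 file that has an intron shorter than a minimum
# length (default 10 bp).
# Usage: short_intron_transcripts annotation.gff3 [min_intron_bp]
short_intron_transcripts() {
    awk -F'\t' '$3 == "exon" {
            parent = $9
            sub(/.*Parent=/, "", parent)   # keep the text after Parent=
            sub(/;.*/, "", parent)         # drop any trailing attributes
            print parent "\t" $4 "\t" $5   # parent, exon start, exon end
        }' "$1" |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n |
        awk -F'\t' -v min="${2:-10}" '
            # Intron length = next exon start - previous exon end - 1
            $1 == prev && ($2 - prev_end - 1) < min { flagged[$1] = 1 }
            { prev = $1; prev_end = $3 }
            END { for (p in flagged) print p }
        ' |
        LC_ALL=C sort
}
```

Exons are grouped by parent and ordered by start coordinate before adjacent pairs are compared, so the input file does not need to be sorted.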

Usage

Refer to the usage, parameters and output documents for details.

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare an assemblysheet with your input genomes that looks as follows:

assemblysheet.csv:

```csv
tag        ,fasta              ,is_masked
a_thaliana ,/path/to/genome.fa ,yes
```

Each row represents an input genome and the fields are:

  • tag: A unique tag which represents the genome throughout the pipeline
  • fasta: Path to the FASTA file for the genome
  • is_masked: yes or no, indicating whether the FASTA file is already masked
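
Before launching a run, it can help to check the sheet against the field rules above. The following is a minimal sketch, not part of genepal: the function name `check_assemblysheet` is hypothetical, and it assumes a header row, comma-separated fields that may be padded with spaces, and no commas inside paths.

```bash
# Hypothetical helper, not part of genepal: sanity-check an assemblysheet.
# Returns non-zero and prints a message for each problem found.
# Usage: check_assemblysheet assemblysheet.csv
check_assemblysheet() {
    awk -F',' '
        function trim(s) { gsub(/^[ \t]+|[ \t\r]+$/, "", s); return s }
        NR == 1      { next }   # skip the header row
        /^[ \t\r]*$/ { next }   # ignore blank lines
        {
            tag = trim($1); fasta = trim($2); masked = trim($3)
            if (tag == "" || fasta == "") {
                printf "line %d: empty tag or fasta\n", NR; bad = 1
            }
            if (seen[tag]++) {
                printf "line %d: duplicate tag %s\n", NR, tag; bad = 1
            }
            if (masked != "yes" && masked != "no") {
                printf "line %d: is_masked must be yes or no\n", NR; bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

The check covers only the three columns described here. It does not verify that the FASTA paths exist or are readable.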

At a minimum, a protein FASTA file is also required as evidence. Now, you can run the pipeline using:

```bash
nextflow run plant-food-research-open/genepal \
  -revision <version> \
  -profile <docker/singularity/.../institute> \
  --input assemblysheet.csv \
  --protein_evidence proteins.faa \
  --outdir <OUTDIR>
```

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
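
As a sketch of the -params-file route mentioned in the warning, the same three parameters used in the command above can be written to a JSON file. The file name params.json and the output directory name results are arbitrary choices for this example, not pipeline defaults.

```bash
# Write the parameters from the command above into a JSON file.
# "results" is a placeholder output directory chosen for this sketch.
cat > params.json <<'EOF'
{
    "input": "assemblysheet.csv",
    "protein_evidence": "proteins.faa",
    "outdir": "results"
}
EOF

# Then launch with the same placeholders as in the command above:
# nextflow run plant-food-research-open/genepal \
#   -revision <version> \
#   -profile <docker/singularity/.../institute> \
#   -params-file params.json
```

Keeping parameters in a file makes a run easier to repeat and to record alongside its outputs.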

Plant&Food Users

Download the pipeline to your /workspace/$USER folder. Change the parameters defined in the pfr/params.json file. Submit the pipeline to SLURM for execution.

```bash
sbatch ./pfr_genepal
```

Credits

plant-food-research-open/genepal workflows were originally scripted by Jason Shiller (@jasonshiller). Usman Rashid (@gallvp) wrote the Nextflow pipeline.

We thank the following people for extensive assistance in the development of the pipeline,

and for contributions to the codebase,

The pipeline uses nf-core modules contributed by the following authors:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

If you use plant-food-research-open/genepal for your analysis, please cite it as:

genepal: A Nextflow pipeline for genome and pan-genome annotation.

Usman Rashid, Jason Shiller, Ross Crowhurst, Chen Wu, Ting-Hsuan Chen, Leonardo Salgado, Charles David, Sarah Bailey, Ignacio Carvajal, Anand Rampadarath, Ken Smith, Liam Le Lievre, Cecilia Deng, Susan Thomson

Zenodo. 2024. doi: 10.5281/zenodo.14195006.

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: Plant-Food-Research-Open
  • Login: Plant-Food-Research-Open
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this pipeline, please cite it as below."
authors:
  - family-names: "Rashid"
    given-names: "Usman"
    orcid: "https://orcid.org/0000-0002-1109-5493"
  - family-names: "Shiller"
    given-names: "Jason"
  - family-names: "Crowhurst"
    given-names: "Ross"
  - family-names: "Wu"
    given-names: "Chen"
  - family-names: "Chen"
    given-names: "Ting-Hsuan"
  - family-names: "Salgado"
    given-names: "Leonardo"
  - family-names: "David"
    given-names: "Charles"
  - family-names: "Bailey"
    given-names: "Sarah"
  - family-names: "Carvajal"
    given-names: "Ignacio"
  - family-names: "Rampadarath"
    given-names: "Anand"
  - family-names: "Smith"
    given-names: "Ken"
  - family-names: "Le Lievre"
    given-names: "Liam"
  - family-names: "Deng"
    given-names: "Cecilia"
  - family-names: "Thomson"
    given-names: "Susan"
title: "genepal: A Nextflow pipeline for genome and pan-genome annotation"
version: 0.7.2
date-released: 2024-11-21
url: "https://github.com/Plant-Food-Research-Open/genepal"
doi: 10.5281/zenodo.14195006

GitHub Events

Total
  • Create event: 41
  • Release event: 5
  • Issues event: 75
  • Watch event: 12
  • Delete event: 33
  • Issue comment event: 116
  • Push event: 83
  • Public event: 1
  • Pull request review comment event: 19
  • Pull request review event: 27
  • Pull request event: 90
  • Fork event: 7
Last Year
  • Create event: 41
  • Release event: 5
  • Issues event: 75
  • Watch event: 12
  • Delete event: 33
  • Issue comment event: 116
  • Push event: 83
  • Public event: 1
  • Pull request review comment event: 19
  • Pull request review event: 27
  • Pull request event: 90
  • Fork event: 7

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 17
  • Total pull requests: 23
  • Average time to close issues: 24 days
  • Average time to close pull requests: 5 days
  • Total issue authors: 5
  • Total pull request authors: 4
  • Average comments per issue: 1.47
  • Average comments per pull request: 1.04
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 17
  • Pull requests: 23
  • Average time to close issues: 24 days
  • Average time to close pull requests: 5 days
  • Issue authors: 5
  • Pull request authors: 4
  • Average comments per issue: 1.47
  • Average comments per pull request: 1.04
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • GallVp (30)
  • CeciliaDeng (7)
  • jasonshiller (4)
  • rosscrowhurst (1)
  • annabel-NZ (1)
  • stephenrdoyle (1)
Pull Request Authors
  • GallVp (42)
  • yykaya (5)
  • liamlelievre (1)
  • CeciliaDeng (1)
  • jasonshiller (1)
Top Labels
Issue Labels
enhancement (24) bug (16) discussion needed (1) awaiting-feedback (1) Stale (1) top-priority (1)
Pull Request Labels
enhancement (4) awaiting-feedback (1) bug (1)

Dependencies

.github/workflows/branch.yml actions
  • mshick/add-pr-comment b8f338c590a895d50bcbfa6c5859251edc8952fc composite
.github/workflows/ci.yml actions
  • actions/checkout 0ad4b8fadaa221de15dcec353f45205ec38ea70b composite
  • jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
  • nf-core/setup-nextflow v2 composite
.github/workflows/clean-up.yml actions
  • actions/stale 28ca1036281a5e5922ead5184a1bbf96e5fc984e composite
.github/workflows/download_pipeline.yml actions
  • actions/setup-python 82c7e631bb3cdc910f68e0081d67478d79c6982d composite
  • eWaterCycle/setup-singularity 931d4e31109e875b13309ae1d07c70ca8fbc8537 composite
  • jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
  • nf-core/setup-nextflow v2 composite
.github/workflows/fix-linting.yml actions
  • actions/checkout 0ad4b8fadaa221de15dcec353f45205ec38ea70b composite
  • actions/setup-python 82c7e631bb3cdc910f68e0081d67478d79c6982d composite
  • peter-evans/create-or-update-comment 71345be0265236311c031f5c7866368bd1eff043 composite
.github/workflows/linting.yml actions
  • actions/checkout 0ad4b8fadaa221de15dcec353f45205ec38ea70b composite
  • actions/setup-python 82c7e631bb3cdc910f68e0081d67478d79c6982d composite
  • actions/upload-artifact 65462800fd760344b1a7b4382951275a0abb4808 composite
  • nf-core/setup-nextflow v2 composite
.github/workflows/linting_comment.yml actions
  • dawidd6/action-download-artifact 09f2f74827fd3a8607589e5ad7f9398816f540fe composite
  • marocchino/sticky-pull-request-comment 331f8f5b4215f0445d3c07b4967662a32a2d3e31 composite
modules/gallvp/agat/spaddintrons/meta.yml cpan
modules/gallvp/agat/spextractsequences/meta.yml cpan
modules/gallvp/braker3/meta.yml cpan
modules/gallvp/busco/busco/meta.yml cpan
modules/gallvp/busco/generateplot/meta.yml cpan
modules/gallvp/custom/restoregffids/meta.yml cpan
modules/gallvp/custom/rmouttogff3/meta.yml cpan
modules/gallvp/custom/shortenfastaids/meta.yml cpan
modules/gallvp/edta/edta/meta.yml cpan
modules/gallvp/gffread/meta.yml cpan
modules/gallvp/ltrretriever/lai/meta.yml cpan
modules/gallvp/repeatmasker/repeatmasker/meta.yml cpan
modules/nf-core/agat/convertspgff2gtf/meta.yml cpan
modules/nf-core/agat/convertspgxf2gxf/meta.yml cpan
modules/nf-core/agat/spfilterfeaturefromkilllist/meta.yml cpan
modules/nf-core/agat/spmergeannotations/meta.yml cpan
modules/nf-core/cat/cat/meta.yml cpan
modules/nf-core/cat/fastq/meta.yml cpan
modules/nf-core/eggnogmapper/meta.yml cpan
modules/nf-core/fastavalidator/meta.yml cpan
modules/nf-core/fastp/meta.yml cpan
modules/nf-core/fastqc/meta.yml cpan
modules/nf-core/gffcompare/meta.yml cpan
modules/nf-core/gffread/meta.yml cpan
modules/nf-core/gt/gff3/meta.yml cpan
modules/nf-core/gunzip/meta.yml cpan
modules/nf-core/liftoff/meta.yml cpan
modules/nf-core/orthofinder/meta.yml cpan
modules/nf-core/repeatmodeler/builddatabase/meta.yml cpan
modules/nf-core/repeatmodeler/repeatmodeler/meta.yml cpan
modules/nf-core/samtools/cat/meta.yml cpan
modules/nf-core/seqkit/rmdup/meta.yml cpan
modules/nf-core/sortmerna/meta.yml cpan
modules/nf-core/star/align/meta.yml cpan
modules/nf-core/star/genomegenerate/meta.yml cpan
modules/nf-core/tsebra/meta.yml cpan
modules/nf-core/umitools/extract/meta.yml cpan
subworkflows/gallvp/fasta_edta_lai/meta.yml cpan
subworkflows/gallvp/fasta_gxf_busco_plot/meta.yml cpan
subworkflows/gallvp/gxf_fasta_agat_spaddintrons_spextractsequences/meta.yml cpan
subworkflows/nf-core/fastq_fastqc_umitools_fastp/meta.yml cpan
subworkflows/nf-core/utils_nextflow_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfcore_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml cpan
modules/gallvp/agat/spaddintrons/environment.yml conda
  • agat 1.4.0.*
modules/gallvp/agat/spextractsequences/environment.yml conda
  • agat 1.4.0.*
modules/gallvp/busco/busco/environment.yml conda
  • busco 5.7.1.*
modules/gallvp/busco/generateplot/environment.yml conda
  • busco 5.7.1.*
modules/gallvp/custom/restoregffids/environment.yml conda
  • python 3.10.2.*
modules/gallvp/custom/rmouttogff3/environment.yml conda
  • perl-bioperl 1.7.8.*
modules/gallvp/custom/shortenfastaids/environment.yml conda
  • biopython 1.75
  • python 3.8.13
modules/gallvp/gffread/environment.yml conda
  • gffread 0.12.7.*
modules/gallvp/ltrretriever/lai/environment.yml conda
  • ltr_retriever 2.9.9.*
modules/gallvp/repeatmasker/repeatmasker/environment.yml conda
  • repeatmasker 4.1.5.*
modules/nf-core/agat/convertspgff2gtf/environment.yml conda
  • agat 1.4.0.*
modules/nf-core/agat/convertspgxf2gxf/environment.yml conda
  • agat 1.4.0.*
modules/nf-core/agat/spfilterfeaturefromkilllist/environment.yml conda
  • agat 1.4.0.*
modules/nf-core/agat/spmergeannotations/environment.yml conda
  • agat 1.4.0.*
modules/nf-core/cat/cat/environment.yml conda
  • pigz 2.3.4.*
modules/nf-core/cat/fastq/environment.yml conda
  • coreutils 8.30.*
modules/nf-core/eggnogmapper/environment.yml conda
  • eggnog-mapper 2.1.12.*
modules/nf-core/fastavalidator/environment.yml conda
  • py_fasta_validator 0.6.*
modules/nf-core/fastp/environment.yml conda
  • fastp 0.23.4.*
modules/nf-core/fastqc/environment.yml conda
  • fastqc 0.12.1.*
modules/nf-core/gffcompare/environment.yml conda
  • gffcompare 0.12.6.*
modules/nf-core/gffread/environment.yml conda
  • gffread 0.12.7.*
modules/nf-core/gt/gff3/environment.yml conda
  • genometools-genometools 1.6.5.*
modules/nf-core/gunzip/environment.yml conda
  • grep 3.11.*
  • sed 4.8.*
  • tar 1.34.*
modules/nf-core/liftoff/environment.yml conda
  • liftoff 1.6.3.*
modules/nf-core/orthofinder/environment.yml conda
  • diamond 2.1.9.*
  • orthofinder 2.5.5.*
modules/nf-core/repeatmodeler/builddatabase/environment.yml conda
  • repeatmodeler 2.0.5.*
modules/nf-core/repeatmodeler/repeatmodeler/environment.yml conda
  • repeatmodeler 2.0.5.*
modules/nf-core/samtools/cat/environment.yml conda
  • htslib 1.21.*
  • samtools 1.21.*
modules/nf-core/seqkit/rmdup/environment.yml conda
  • seqkit 2.8.1.*
modules/nf-core/sortmerna/environment.yml conda
  • sortmerna 4.3.6.*
modules/nf-core/star/align/environment.yml conda
  • gawk 5.1.0.*
  • htslib 1.18.*
  • samtools 1.18.*
  • star 2.7.10a.*
modules/nf-core/star/genomegenerate/environment.yml conda
  • gawk 5.1.0.*
  • htslib 1.18.*
  • samtools 1.18.*
  • star 2.7.10a.*
modules/nf-core/tsebra/environment.yml conda
  • tsebra 1.1.2.5.*
modules/nf-core/umitools/extract/environment.yml conda
  • umi_tools 1.1.5.*