genepal
A Nextflow pipeline for genome and pan-genome annotation
Repository
Basic Info
- Host: GitHub
- Owner: Plant-Food-Research-Open
- License: mit
- Language: Nextflow
- Default Branch: main
- Homepage: https://github.com/Plant-Food-Research-Open/genepal/blob/main/docs/README.md
- Size: 7.39 MB
Statistics
- Stars: 13
- Watchers: 13
- Forks: 7
- Open Issues: 38
- Releases: 10
Metadata Files
README.md
plant-food-research-open/genepal
Introduction
plant-food-research-open/genepal is a bioinformatics pipeline for single genome, phased genomes and pan-genome annotation. An overview is shown in the Pipeline Flowchart and the references for the tools are listed in CITATIONS.md. Protein coding gene structures are predicted with BRAKER which uses GeneMark-ES/ET/EP+/ETP. These tools require a license for commercial works.
Pipeline Flowchart

- fasta_validator: Validate genome FASTA
- RepeatModeler or EDTA: Create TE library
- RepeatMasker: Soft mask the genome fasta
- sra-tools: RNASeq data download from SRA
- FastQC, fastp, SortMeRNA: QC, trim and filter RNASeq evidence
- STAR: RNASeq alignment
- cat: Concatenate protein FASTA files
- BRAKER: Predict protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS
  - Directly provided BAM files should be `--outSAMstrandField intronMotif` compliant
  - With protein evidence alone, BRAKER workflow C is executed
  - With protein plus RNASeq evidence, BRAKER workflow D is executed
- Liftoff: Optionally, liftoff annotations from reference genome FASTA/GFF
- TSEBRA: Optionally, ensure that each BRAKER or both BRAKER and Liftoff models have full intron support
- AGAT
  - Merge multi-reference liftoffs
  - Remove liftoff transcripts marked by `valid_ORF=False`
  - Remove liftoff genes with any intron shorter than 10 bp
  - Remove rRNA, tRNA and other non-protein coding models from liftoff
  - Optionally, allow or remove isoforms
  - Remove BRAKER models from Liftoff loci
  - Merge Liftoff and BRAKER models
  - Optionally, remove models without any EggNOG-mapper hits
  - Optionally, remove models with ORFs shorter than `N` amino acids
- EggNOG-mapper: Add functional annotation to gff
- GenomeTools: GFF format validation
- GffRead: Extraction of protein sequences and filtering of BRAKER models with invalid ORF(s)
- OrthoFinder: Perform phylogenetic orthology inference across genomes
- GffCompare: Compare and benchmark against an existing annotation
- BUSCO: Completeness statistics for genome and annotation through proteins
- R Markdown: Specialized pangene analysis
- MultiQC: Exhaustive QC statistics
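To make the optional ORF-length filter from the list above concrete, here is a minimal sketch of the underlying idea: list the protein records shorter than `N` amino acids. This is not how the pipeline implements the step. The function name `short_orf_ids` and its arguments are hypothetical, and it assumes a plain protein FASTA in which a trailing `*` marks the stop codon.

```bash
# Hypothetical helper (not part of genepal): print the IDs of protein records
# shorter than a minimum length.
# $1 = protein FASTA file, $2 = minimum length N in amino acids
short_orf_ids() {
  awk -v n="$2" '
    # Print the current record ID if its sequence is shorter than n
    function report() { if (id != "" && len < n) print id }
    # Header line: report the previous record, then start a new one
    /^>/ { report(); id = substr($1, 2); len = 0; next }
    # Sequence line: ignore whitespace and stop-codon symbols when counting
    { gsub(/[ \t\r*]/, ""); len += length($0) }
    END { report() }
  ' "$1"
}
```

For example, `short_orf_ids proteins.faa 50` would list the records with fewer than 50 residues, which could then be reviewed before deciding on a value for `N`.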
Usage
Refer to usage, parameters and output documents for details.
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.
First, prepare an assemblysheet with your input genomes that looks as follows:
assemblysheet.csv:

```csv
tag,fasta,is_masked
a_thaliana,/path/to/genome.fa,yes
```
Each row represents an input genome and the fields are:
- `tag`: A unique tag which represents the genome throughout the pipeline
- `fasta`: FASTA file for the genome
- `is_masked`: yes or no, to denote whether the FASTA file is already masked
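Before launching a run, it can help to sanity-check the assemblysheet against the field rules above. The following is a minimal sketch, not part of the pipeline: the function name `check_assemblysheet` is hypothetical, and it only inspects the three columns described here (header names, unique tags, a non-empty FASTA path and a yes/no `is_masked` value).

```bash
# Hypothetical pre-flight check (not part of genepal). It only looks at the
# three columns described above: tag, fasta and is_masked.
# $1 = path to the assemblysheet CSV; returns non-zero if a problem is found.
check_assemblysheet() {
  awk -F',' '
    # Strip a trailing carriage return and surrounding blanks from each field
    { sub(/\r$/, ""); for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i) }
    NR == 1 {
      if ($1 != "tag" || $2 != "fasta" || $3 != "is_masked") {
        print "line 1: unexpected header"; bad = 1
      }
      next
    }
    NF == 0 { next }   # skip blank lines
    seen[$1]++ { print "line " NR ": duplicate tag " $1; bad = 1 }
    $2 == ""   { print "line " NR ": missing fasta path"; bad = 1 }
    $3 != "yes" && $3 != "no" {
      print "line " NR ": is_masked must be yes or no"; bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

A typical use would be `check_assemblysheet assemblysheet.csv && echo "assemblysheet looks OK"`. The check does not verify that the FASTA files exist, since that depends on where the pipeline is executed.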
At minimum, a file with proteins as evidence is also required. Now, you can run the pipeline using:
```bash
nextflow run plant-food-research-open/genepal \
  -revision <version> \
  -profile <docker/singularity/.../institute> \
  --input assemblysheet.csv \
  --protein_evidence proteins.faa \
  --outdir <OUTDIR>
```
[!WARNING] Please provide pipeline parameters via the CLI or the Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration except for parameters; see docs.
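As an illustration of the `-params-file` route, the sketch below writes the parameters from the run command above into a JSON file. The file name `params.json` and the values, including `results` for the output directory, are placeholders rather than recommended settings. The keys are the CLI parameter names without the leading dashes.

```bash
# Write the parameters from the run command to a JSON file.
# File name and values are placeholders chosen for illustration.
cat > params.json <<'EOF'
{
  "input": "assemblysheet.csv",
  "protein_evidence": "proteins.faa",
  "outdir": "results"
}
EOF

# Then launch with (version and profile left as placeholders):
# nextflow run plant-food-research-open/genepal \
#   -revision <version> \
#   -profile <docker/singularity/.../institute> \
#   -params-file params.json
```

Keeping parameters in a file like this makes a run easier to repeat and to record alongside its results.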
Plant&Food Users
Download the pipeline to your /workspace/$USER folder. Change the parameters defined in the pfr/params.json file. Submit the pipeline to SLURM for execution.
```bash
sbatch ./pfr_genepal
```
Credits
plant-food-research-open/genepal workflows were originally scripted by Jason Shiller (@jasonshiller). Usman Rashid (@gallvp) wrote the Nextflow pipeline.
We thank the following people for extensive assistance in the development of the pipeline,
- Cecilia Deng @CeciliaDeng
- Charles David @charlesdavid
- Chen Wu @christinawu2008
- Leonardo Salgado @leorippel
- Ross Crowhurst @rosscrowhurst
- Susan Thomson @cflsjt
- Ting-Hsuan Chen @ting-hsuan-chen
and for contributions to the codebase,
- Liam Le Lievre @liamlelievre
The pipeline uses nf-core modules contributed by following authors:
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
If you use plant-food-research-open/genepal for your analysis, please cite it as:
genepal: A Nextflow pipeline for genome and pan-genome annotation.
Usman Rashid, Jason Shiller, Ross Crowhurst, Chen Wu, Ting-Hsuan Chen, Leonardo Salgado, Charles David, Sarah Bailey, Ignacio Carvajal, Anand Rampadarath, Ken Smith, Liam Le Lievre, Cecilia Deng, Susan Thomson
zenodo. 2024. doi: 10.5281/zenodo.14195006.
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Name: Plant-Food-Research-Open
- Login: Plant-Food-Research-Open
- Kind: organization
- Repositories: 1
- Profile: https://github.com/Plant-Food-Research-Open
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this pipeline, please cite it as below."
authors:
  - family-names: "Rashid"
    given-names: "Usman"
    orcid: "https://orcid.org/0000-0002-1109-5493"
  - family-names: "Shiller"
    given-names: "Jason"
  - family-names: "Crowhurst"
    given-names: "Ross"
  - family-names: "Wu"
    given-names: "Chen"
  - family-names: "Chen"
    given-names: "Ting-Hsuan"
  - family-names: "Salgado"
    given-names: "Leonardo"
  - family-names: "David"
    given-names: "Charles"
  - family-names: "Bailey"
    given-names: "Sarah"
  - family-names: "Carvajal"
    given-names: "Ignacio"
  - family-names: "Rampadarath"
    given-names: "Anand"
  - family-names: "Smith"
    given-names: "Ken"
  - family-names: "Le Lievre"
    given-names: "Liam"
  - family-names: "Deng"
    given-names: "Cecilia"
  - family-names: "Thomson"
    given-names: "Susan"
title: "genepal: A Nextflow pipeline for genome and pan-genome annotation"
version: 0.7.2
date-released: 2024-11-21
url: "https://github.com/Plant-Food-Research-Open/genepal"
doi: 10.5281/zenodo.14195006
GitHub Events
Total
- Create event: 41
- Release event: 5
- Issues event: 75
- Watch event: 12
- Delete event: 33
- Issue comment event: 116
- Push event: 83
- Public event: 1
- Pull request review comment event: 19
- Pull request review event: 27
- Pull request event: 90
- Fork event: 7
Last Year
- Create event: 41
- Release event: 5
- Issues event: 75
- Watch event: 12
- Delete event: 33
- Issue comment event: 116
- Push event: 83
- Public event: 1
- Pull request review comment event: 19
- Pull request review event: 27
- Pull request event: 90
- Fork event: 7
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 17
- Total pull requests: 23
- Average time to close issues: 24 days
- Average time to close pull requests: 5 days
- Total issue authors: 5
- Total pull request authors: 4
- Average comments per issue: 1.47
- Average comments per pull request: 1.04
- Merged pull requests: 13
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 17
- Pull requests: 23
- Average time to close issues: 24 days
- Average time to close pull requests: 5 days
- Issue authors: 5
- Pull request authors: 4
- Average comments per issue: 1.47
- Average comments per pull request: 1.04
- Merged pull requests: 13
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- GallVp (30)
- CeciliaDeng (7)
- jasonshiller (4)
- rosscrowhurst (1)
- annabel-NZ (1)
- stephenrdoyle (1)
Pull Request Authors
- GallVp (42)
- yykaya (5)
- liamlelievre (1)
- CeciliaDeng (1)
- jasonshiller (1)
Dependencies
- mshick/add-pr-comment b8f338c590a895d50bcbfa6c5859251edc8952fc composite
- actions/checkout 0ad4b8fadaa221de15dcec353f45205ec38ea70b composite
- jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
- nf-core/setup-nextflow v2 composite
- actions/stale 28ca1036281a5e5922ead5184a1bbf96e5fc984e composite
- actions/setup-python 82c7e631bb3cdc910f68e0081d67478d79c6982d composite
- eWaterCycle/setup-singularity 931d4e31109e875b13309ae1d07c70ca8fbc8537 composite
- jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
- nf-core/setup-nextflow v2 composite
- actions/checkout 0ad4b8fadaa221de15dcec353f45205ec38ea70b composite
- actions/setup-python 82c7e631bb3cdc910f68e0081d67478d79c6982d composite
- peter-evans/create-or-update-comment 71345be0265236311c031f5c7866368bd1eff043 composite
- actions/checkout 0ad4b8fadaa221de15dcec353f45205ec38ea70b composite
- actions/setup-python 82c7e631bb3cdc910f68e0081d67478d79c6982d composite
- actions/upload-artifact 65462800fd760344b1a7b4382951275a0abb4808 composite
- nf-core/setup-nextflow v2 composite
- dawidd6/action-download-artifact 09f2f74827fd3a8607589e5ad7f9398816f540fe composite
- marocchino/sticky-pull-request-comment 331f8f5b4215f0445d3c07b4967662a32a2d3e31 composite
- agat 1.4.0.*
- agat 1.4.0.*
- busco 5.7.1.*
- busco 5.7.1.*
- python 3.10.2.*
- perl-bioperl 1.7.8.*
- biopython 1.75
- python 3.8.13
- gffread 0.12.7.*
- ltr_retriever 2.9.9.*
- repeatmasker 4.1.5.*
- agat 1.4.0.*
- agat 1.4.0.*
- agat 1.4.0.*
- agat 1.4.0.*
- pigz 2.3.4.*
- coreutils 8.30.*
- eggnog-mapper 2.1.12.*
- py_fasta_validator 0.6.*
- fastp 0.23.4.*
- fastqc 0.12.1.*
- gffcompare 0.12.6.*
- gffread 0.12.7.*
- genometools-genometools 1.6.5.*
- grep 3.11.*
- sed 4.8.*
- tar 1.34.*
- liftoff 1.6.3.*
- diamond 2.1.9.*
- orthofinder 2.5.5.*
- repeatmodeler 2.0.5.*
- repeatmodeler 2.0.5.*
- htslib 1.21.*
- samtools 1.21.*
- seqkit 2.8.1.*
- sortmerna 4.3.6.*
- gawk 5.1.0.*
- htslib 1.18.*
- samtools 1.18.*
- star 2.7.10a.*
- gawk 5.1.0.*
- htslib 1.18.*
- samtools 1.18.*
- star 2.7.10a.*
- tsebra 1.1.2.5.*
- umi_tools 1.1.5.*































