GFF3toEMBL

GFF3toEMBL: Preparing annotated assemblies for submission to EMBL - Published in JOSS (2016)

https://github.com/sanger-pathogens/gff3toembl

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    3 of 10 committers (30.0%) from academic institutions
  • Institutional organization owner
    Organization sanger-pathogens has institutional domain (www.sanger.ac.uk)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Keywords

bioinformatics bioinformatics-pipeline genomics global-health infectious-diseases next-generation-sequencing pathogen research sequencing

Repository

Converts Prokka GFF3 files to EMBL files for uploading annotated assemblies to EBI

Basic Info
Statistics
  • Stars: 29
  • Watchers: 14
  • Forks: 12
  • Open Issues: 5
  • Releases: 5
Created over 11 years ago · Last pushed about 7 years ago
Metadata Files
Readme Changelog License Authors

README.md

GFF3toEMBL

Converts GFF3 files from Prokka into a format suitable for submission to EMBL.


Introduction

Submitting annotated genomes to EMBL is a difficult and time-consuming process. This software converts GFF3 files from Prokka, the most commonly used prokaryote annotation tool, into a format that is suitable for submission to EMBL. It has been used to prepare more than 30% of all annotated genomes in EMBL/GenBank.

N.B. This implements some EMBL specific conventions and is not a generic conversion tool. It is also not a validator, so you need to pass in parameters which are acceptable to EMBL.
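Because the tool does not validate its inputs, a small pre-flight check can catch obvious mistakes before conversion. The sketch below is hypothetical and not part of GFF3toEMBL: the function name `check_parameters` is invented here, and it only checks two things the usage text states or implies (a numeric taxon id, and a genome type of `linear` or `circular`). It does not reproduce EMBL's own validation rules.

```python
def check_parameters(taxonid, genome_type):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    # Assumption: taxon ids are plain integers, as in the README examples (123, 1234).
    if not str(taxonid).isdigit():
        problems.append("taxonid should be numeric, got %r" % (taxonid,))
    # The usage text lists linear/circular as the genome types.
    if genome_type not in ("linear", "circular"):
        problems.append(
            "genome_type should be 'linear' or 'circular', got %r" % (genome_type,)
        )
    return problems


# Example: both values are wrong, so two problems are reported.
for problem in check_parameters("12x4", "round"):
    print(problem)
```

Passing this check does not mean EMBL will accept the submission; it only rules out two simple input errors.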

Installation

GFF3toEMBL has the following dependencies:

Required dependencies

There are a number of ways to install GFF3toEMBL; details are provided below. If you encounter an issue when installing GFF3toEMBL, please contact your local system administrator. If you encounter a bug, please log it on the issues page or email us at path-help@sanger.ac.uk.

Docker

A Docker container is provided with all of the dependencies set up and installed. To install the container:

docker pull sangerpathogens/gff3toembl

To run the script from within the container on test data (substituting /home/ubuntu/data for your own directory):

docker run --rm -it -v /home/ubuntu/data:/data sangerpathogens/gff3toembl gff3_to_embl --output_filename /data/output_file.embl ABC 123 PRJ1234 ABC /opt/gff3toembl-1.1.0/gff3toembl/tests/data/single_feature.gff
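The `-v /home/ubuntu/data:/data` option mounts a host directory at `/data` inside the container, so an output file written to `/data/output_file.embl` appears in the host directory. As a minimal sketch, the same command can be assembled in Python with the host directory as a parameter. The helper name `docker_command` is hypothetical; the image name, tool name and arguments are copied from the command above. The sketch only builds and prints the command; it does not run Docker.

```python
import shlex

IMAGE = "sangerpathogens/gff3toembl"  # image name from the README


def docker_command(host_dir, tool_args):
    """Build a `docker run` argument list that mounts host_dir at /data."""
    return [
        "docker", "run", "--rm", "-it",
        "-v", "%s:/data" % host_dir,
        IMAGE,
        "gff3_to_embl",
    ] + list(tool_args)


cmd = docker_command(
    "/home/ubuntu/data",
    [
        "--output_filename", "/data/output_file.embl",
        "ABC", "123", "PRJ1234", "ABC",
        "/opt/gff3toembl-1.1.0/gff3toembl/tests/data/single_feature.gff",
    ],
)
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in cmd))
```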

From source

This is for advanced users. The Homebrew recipe, the Dockerfile and the TravisCI dependency installation script all contain steps to set up the dependencies and install the software, so they may be worth consulting for hints.

  • Install genometools including python bindings
  • git clone git@github.com:sanger-pathogens/gff3toembl.git
  • python setup.py install
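Before running `python setup.py install`, it can help to confirm that the genometools Python bindings are importable. The sketch below assumes the bindings are exposed as a top-level module named `gt` (an assumption; adjust the name if your installation differs) and that the check is run under Python 3. The helper `has_module` is hypothetical.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Assumption: the genometools Python bindings install a module named "gt".
if has_module("gt"):
    print("genometools Python bindings found")
else:
    print("genometools Python bindings not found; install them first")
```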

Running the tests

Run python setup.py test

Usage

```
usage: gff3_to_embl [-h] [--authors AUTHORS] [--title TITLE]
                    [--publication PUBLICATION] [--genome_type GENOME_TYPE]
                    [--classification CLASSIFICATION]
                    [--output_filename OUTPUT_FILENAME]
                    [--locus_tag LOCUS_TAG]
                    [--translation_table TRANSLATION_TABLE]
                    [--chromosome_list CHROMOSOME_LIST] [--version]
                    organism taxonid project_accession description file

Converts prokaryote GFF3 annotations to EMBL for ENA submission. Cite
http://dx.doi.org/10.21105/joss.00080

positional arguments:
  organism              Organism
  taxonid               Taxon id
  project_accession     Accession number for the project
  description           Genus species subspecies strain of organism
  file                  GFF3 filename

optional arguments:
  -h, --help            show this help message and exit
  --authors AUTHORS, -i AUTHORS
                        Authors (in the EMBL RA line style)
  --title TITLE, -m TITLE
                        Title of paper (in the EMBL RT line style)
  --publication PUBLICATION, -p PUBLICATION
                        Publication or journal name (in the EMBL RL line
                        style)
  --genome_type GENOME_TYPE, -g GENOME_TYPE
                        Genome type (linear/circular)
  --classification CLASSIFICATION, -c CLASSIFICATION
                        Classification (PROK/UNC/..)
  --output_filename OUTPUT_FILENAME, -f OUTPUT_FILENAME
                        Output filename
  --locus_tag LOCUS_TAG, -l LOCUS_TAG
                        Overwrite the locus tag in the annotation file
  --translation_table TRANSLATION_TABLE, -n TRANSLATION_TABLE
                        Translation table
  --chromosome_list CHROMOSOME_LIST, -d CHROMOSOME_LIST
                        Create a chromosome list file, and use the supplied
                        name
  --version             show program's version number and exit
```

An example:

gff3_to_embl --authors 'John' --title 'Some title' --publication 'Some journal' \
    --genome_type 'circular' --classification 'PROK' \
    --output_filename /tmp/single_feature.embl --translation_table 11 \
    Organism 1234 'My project' 'My description' gff3toembl/tests/data/single_feature.gff
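When converting many genomes, the example command can be assembled programmatically instead of typed by hand. The following is a minimal sketch, not part of GFF3toEMBL: the helper `build_command` is hypothetical, and it uses only the option names and values that appear in the example above. It builds and prints the command without running it.

```python
import shlex


def build_command(organism, taxonid, project_accession, description,
                  gff_file, **options):
    """Assemble a gff3_to_embl argument list; options become --name value pairs."""
    cmd = ["gff3_to_embl"]
    for name, value in sorted(options.items()):
        cmd += ["--" + name, str(value)]
    # Positional arguments go last, in the order given by the usage text.
    cmd += [organism, str(taxonid), project_accession, description, gff_file]
    return cmd


cmd = build_command(
    "Organism", 1234, "My project", "My description",
    "gff3toembl/tests/data/single_feature.gff",
    authors="John", title="Some title", publication="Some journal",
    genome_type="circular", classification="PROK",
    output_filename="/tmp/single_feature.embl", translation_table=11,
)
# Print a shell-safe version of the command.
print(" ".join(shlex.quote(part) for part in cmd))
```

With `gff3_to_embl` installed and on the `PATH`, the list could be passed to `subprocess.check_call(cmd)` to run the conversion.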

Example data

The directory 'example_data' contains an input GFF file and the output file along with the command.

License

GFF3toEMBL is free software, licensed under GPLv3.

Feedback/Issues

Please report any issues to the issues page or email path-help@sanger.ac.uk.

Citation

If you use this software please cite:

GFF3toEMBL: Preparing annotated assemblies for submission to EMBL
Andrew J. Page, Sascha Steinbiss, Ben Taylor, Torsten Seemann, Jacqueline A. Keane
The Journal of Open Source Software, 1 (6) 2016. doi: 10.21105/joss.00080

Known Issues

GFF3toEMBL does not work with some versions of Genometools on Mac OS X; it appears to work with Genometools 1.5.4.

Owner

  • Name: Pathogen Informatics, Wellcome Sanger Institute
  • Login: sanger-pathogens
  • Kind: organization
  • Location: Hinxton, Cambs., UK

GitHub Events

Total
  • Member event: 1
Last Year
  • Member event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 254
  • Total Committers: 10
  • Avg Commits per committer: 25.4
  • Development Distribution Score (DDS): 0.417
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
andrewjpage a****e@g****m 148
Ben Taylor b****5@g****m 75
peterjc p****k@g****m 16
Sascha Steinbiss s****a@s****e 5
Sara Sjunnebo s****4@s****k 3
Martin Aslett m****a@s****k 2
Cashalow s****u@g****m 2
Shaun Jackman s****n@g****m 1
Sara Sjunnebo s****o@g****m 1
Ben Taylor b****5@m****k 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 22
  • Total pull requests: 56
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 5 hours
  • Total issue authors: 9
  • Total pull request authors: 8
  • Average comments per issue: 3.68
  • Average comments per pull request: 0.29
  • Merged pull requests: 55
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • peterjc (10)
  • tseemann (4)
  • andersgs (2)
  • apredeus (1)
  • ireneortega (1)
  • dutchscientist (1)
  • tobsecret (1)
  • sjackman (1)
  • mcnelsonphd (1)
Pull Request Authors
  • andrewjpage (33)
  • peterjc (8)
  • bewt85 (7)
  • ssjunnebo (3)
  • satta (2)
  • aslett1 (1)
  • sachalau (1)
  • sjackman (1)
Top Labels
Issue Labels
bug (5) enhancement (2) question (1)

Dependencies

setup.py pypi
  • six *