GFF3toEMBL
GFF3toEMBL: Preparing annotated assemblies for submission to EMBL - Published in JOSS (2016)
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 7 DOI reference(s) in README
- ✓ Academic publication links: links to joss.theoj.org
- ✓ Committers with academic emails: 3 of 10 committers (30.0%) from academic institutions
- ✓ Institutional organization owner: organization sanger-pathogens has institutional domain (www.sanger.ac.uk)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.9%) to scientific vocabulary
Repository
Converts Prokka GFF3 files to EMBL files for uploading annotated assemblies to EBI
Basic Info
- Host: GitHub
- Owner: sanger-pathogens
- License: other
- Language: Python
- Default Branch: master
- Homepage: http://sanger-pathogens.github.io/gff3toembl/
- Size: 2.92 MB
Statistics
- Stars: 29
- Watchers: 14
- Forks: 12
- Open Issues: 5
- Releases: 5
Metadata Files
README.md
GFF3toEMBL
Converts GFF3 files from Prokka into a format suitable for submission to EMBL.
Introduction
Submitting annotated genomes to EMBL is a difficult and time-consuming process. This software converts GFF3 files from Prokka, the most commonly used prokaryote annotation tool, into a format that is suitable for submission to EMBL. It has been used to prepare more than 30% of all annotated genomes in EMBL/GenBank.
N.B. This implements some EMBL specific conventions and is not a generic conversion tool. It is also not a validator, so you need to pass in parameters which are acceptable to EMBL.
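Because the tool does not validate its inputs, it can help to sanity-check parameters before running it. The following is a minimal sketch, not part of GFF3toEMBL: the function name `check_parameters` and the rules it applies are assumptions for illustration (the `linear`/`circular` values come from the tool's help text), and passing these checks does not guarantee that EMBL will accept the submission.

```python
# Hypothetical pre-flight checks; these rules are assumptions for illustration,
# not EMBL's own validation and not part of GFF3toEMBL.
ALLOWED_GENOME_TYPES = {"linear", "circular"}  # values named in the tool's help text


def check_parameters(taxon_id, genome_type, translation_table):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not str(taxon_id).isdigit():
        problems.append(f"taxon id should be numeric, got {taxon_id!r}")
    if genome_type not in ALLOWED_GENOME_TYPES:
        problems.append(
            f"genome type should be linear or circular, got {genome_type!r}"
        )
    if not str(translation_table).isdigit():
        problems.append(
            f"translation table should be a number, got {translation_table!r}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_parameters("abc", "round", "eleven"):
        print(problem)
```

An empty list means only that these basic checks passed; EMBL applies its own rules on submission.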
Installation
GFF3toEMBL has the following dependencies:
Required dependencies
There are a number of ways to install GFF3toEMBL and details are provided below. If you encounter an issue when installing GFF3toEMBL, please contact your local system administrator. If you encounter a bug, please log it on the issues page or email us at path-help@sanger.ac.uk.
Docker
A Docker container is provided with all of the dependencies set up and installed. To install the container:
docker pull sangerpathogens/gff3toembl
To run the script from within the container on test data (substituting /home/ubuntu/data for your own directory):
docker run --rm -it -v /home/ubuntu/data:/data sangerpathogens/gff3toembl gff3_to_embl --output_filename /data/output_file.embl ABC 123 PRJ1234 ABC /opt/gff3toembl-1.1.0/gff3toembl/tests/data/single_feature.gff
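When the Docker command is driven from a script, the main thing to get right is the mapping between the host directory and `/data` inside the container. The sketch below is one way to assemble the argument list under that assumption; the helper name `docker_argv` is hypothetical, and only the image name, the `/data` mount point and the `--output_filename` flag are taken from the command above.

```python
from pathlib import PurePosixPath

IMAGE = "sangerpathogens/gff3toembl"      # image name from the docker pull step
CONTAINER_DIR = PurePosixPath("/data")    # mount point used in the command above


def docker_argv(host_dir, output_name, tool_args):
    """Build a 'docker run' argument list (hypothetical helper).

    host_dir    -- directory on the host that will hold the output
    output_name -- file name to create inside that directory
    tool_args   -- remaining gff3_to_embl arguments, as a list of strings
    """
    return [
        "docker", "run", "--rm",          # "-it" omitted: no interactive terminal needed
        "-v", f"{host_dir}:{CONTAINER_DIR}",
        IMAGE,
        "gff3_to_embl",
        "--output_filename", str(CONTAINER_DIR / output_name),
        *tool_args,
    ]


if __name__ == "__main__":
    argv = docker_argv(
        "/home/ubuntu/data",
        "output_file.embl",
        ["ABC", "123", "PRJ1234", "ABC", "/data/my_annotation.gff"],
    )
    print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. Note that any input GFF file must also be referred to by its path inside the container, for example under `/data`; the file name `my_annotation.gff` here is a placeholder.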
From source
This is for advanced users. The Homebrew recipe, Dockerfile and TravisCI install-dependencies script all contain steps to set up the dependencies and install the software, so they might be worth looking at for hints.
- Install genometools including python bindings
- git clone git@github.com:sanger-pathogens/gff3toembl.git
- python setup.py install
Running the tests
Run python setup.py test
Usage
```
usage: gff3_to_embl [-h] [--authors AUTHORS] [--title TITLE]
                    [--publication PUBLICATION] [--genome_type GENOME_TYPE]
                    [--classification CLASSIFICATION]
                    [--output_filename OUTPUT_FILENAME]
                    [--locus_tag LOCUS_TAG]
                    [--translation_table TRANSLATION_TABLE]
                    [--chromosome_list CHROMOSOME_LIST] [--version]
                    organism taxonid project_accession description file

Converts prokaryote GFF3 annotations to EMBL for ENA submission. Cite
http://dx.doi.org/10.21105/joss.00080

positional arguments:
  organism              Organism
  taxonid               Taxon id
  project_accession     Accession number for the project
  description           Genus species subspecies strain of organism
  file                  GFF3 filename

optional arguments:
  -h, --help            show this help message and exit
  --authors AUTHORS, -i AUTHORS
                        Authors (in the EMBL RA line style)
  --title TITLE, -m TITLE
                        Title of paper (in the EMBL RT line style)
  --publication PUBLICATION, -p PUBLICATION
                        Publication or journal name (in the EMBL RL line
                        style)
  --genome_type GENOME_TYPE, -g GENOME_TYPE
                        Genome type (linear/circular)
  --classification CLASSIFICATION, -c CLASSIFICATION
                        Classification (PROK/UNC/..)
  --output_filename OUTPUT_FILENAME, -f OUTPUT_FILENAME
                        Output filename
  --locus_tag LOCUS_TAG, -l LOCUS_TAG
                        Overwrite the locus tag in the annotation file
  --translation_table TRANSLATION_TABLE, -n TRANSLATION_TABLE
                        Translation table
  --chromosome_list CHROMOSOME_LIST, -d CHROMOSOME_LIST
                        Create a chromosome list file, and use the supplied
                        name
  --version             show program's version number and exit
```
An example:
gff3_to_embl --authors 'John' --title 'Some title' --publication 'Some journal' \
--genome_type 'circular' --classification 'PROK' \
--output_filename /tmp/single_feature.embl --translation_table 11 \
Organism 1234 'My project' 'My description' gff3toembl/tests/data/single_feature.gff
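For scripted use, the same command can be assembled as an argument list, which avoids shell quoting problems with values such as 'My project'. This is a sketch only: `build_command` is a hypothetical helper rather than part of the package, and it assumes the `gff3_to_embl` executable is on the `PATH` and that option names match those in the example above.

```python
import shlex


def build_command(organism, taxon_id, project, description, gff_file, **options):
    """Assemble a gff3_to_embl argument list (hypothetical helper).

    Keyword options become '--name value' pairs, e.g. genome_type="circular"
    becomes ['--genome_type', 'circular']. Positional arguments follow in the
    order shown in the usage text.
    """
    argv = ["gff3_to_embl"]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    argv += [organism, str(taxon_id), project, description, gff_file]
    return argv


if __name__ == "__main__":
    argv = build_command(
        "Organism", 1234, "My project", "My description",
        "gff3toembl/tests/data/single_feature.gff",
        genome_type="circular",
        classification="PROK",
        output_filename="/tmp/single_feature.embl",
        translation_table=11,
    )
    # Print a shell-quoted form for inspection (shlex.join needs Python 3.8+).
    print(shlex.join(argv))
```

To actually run the conversion, the list can be handed to `subprocess.run(argv, check=True)`.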
Example data
The directory 'example_data' contains an input GFF file and the output file along with the command.
License
GFF3toEMBL is free software, licensed under GPLv3.
Feedback/Issues
Please report any issues to the issues page or email path-help@sanger.ac.uk.
Citation
If you use this software please cite:
GFF3toEMBL: Preparing annotated assemblies for submission to EMBL
Andrew J. Page, Sascha Steinbiss, Ben Taylor, Torsten Seemann, Jacqueline A. Keane
The Journal of Open Source Software, 1 (6) 2016. doi: 10.21105/joss.00080
Known Issues
This doesn't work with some versions of Genometools on Mac OS X; it appears to work with Genometools 1.5.4
Owner
- Name: Pathogen Informatics, Wellcome Sanger Institute
- Login: sanger-pathogens
- Kind: organization
- Location: Hinxton, Cambs., UK
- Website: http://www.sanger.ac.uk/science/groups/pathogen-informatics
- Repositories: 54
- Profile: https://github.com/sanger-pathogens
GitHub Events
Total
- Member event: 1
Last Year
- Member event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| andrewjpage | a****e@g****m | 148 |
| Ben Taylor | b****5@g****m | 75 |
| peterjc | p****k@g****m | 16 |
| Sascha Steinbiss | s****a@s****e | 5 |
| Sara Sjunnebo | s****4@s****k | 3 |
| Martin Aslett | m****a@s****k | 2 |
| Cashalow | s****u@g****m | 2 |
| Shaun Jackman | s****n@g****m | 1 |
| Sara Sjunnebo | s****o@g****m | 1 |
| Ben Taylor | b****5@m****k | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 22
- Total pull requests: 56
- Average time to close issues: about 2 months
- Average time to close pull requests: about 5 hours
- Total issue authors: 9
- Total pull request authors: 8
- Average comments per issue: 3.68
- Average comments per pull request: 0.29
- Merged pull requests: 55
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- peterjc (10)
- tseemann (4)
- andersgs (2)
- apredeus (1)
- ireneortega (1)
- dutchscientist (1)
- tobsecret (1)
- sjackman (1)
- mcnelsonphd (1)
Pull Request Authors
- andrewjpage (33)
- peterjc (8)
- bewt85 (7)
- ssjunnebo (3)
- satta (2)
- aslett1 (1)
- sachalau (1)
- sjackman (1)
Dependencies
- six *