genbank-to
Convert genbank files to a swath of other formats
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: linsalrob
- License: mit
- Language: Python
- Default Branch: main
- Size: 46.9 KB
Statistics
- Stars: 18
- Watchers: 3
- Forks: 1
- Open Issues: 2
- Releases: 3
Metadata Files
README.md
genbank_to
A straightforward application to convert NCBI GenBank format files to a swath of other formats. Hopefully we have the format you need. If not, either post an issue using our template or, if you have already got it working, post a PR so we can add it and add you to the project.
You might also be interested in deprekate's package called genbank, which includes
several of the features here and which you can import into your Python projects.
What it does
Read an NCBI GenBank format file (like our test data) and convert it to one of many different formats.
Input formats
At the moment we only support NCBI GenBank format. If you want us to read other common formats, let us know and we'll add them.
Output formats
Here are the output formats you can request. You can request as many of these at once as you like!
These outputs assume that you provide, for example, a genome file that contains ORFs, proteins, and genomes.
Nucleotide output
- -n or --nucleotide outputs the whole DNA sequence (e.g. the genome)
- -o or --orfs outputs the DNA sequence of the open reading frames
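To make the two nucleotide outputs concrete, here is a minimal sketch in plain Python of what writing a FASTA record and cutting out an ORF involves. It is not the code genbank_to uses; the function names, the 0-based end-exclusive coordinates, and the 60-column line width are all assumptions made for illustration.

```python
# Hypothetical helpers, not part of genbank_to.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def fasta_record(seq_id: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record: a '>' header line, then the wrapped sequence."""
    if width < 1:
        raise ValueError("width must be at least 1")
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{seq_id}"] + chunks) + "\n"


def orf_sequence(genome: str, start: int, end: int, strand: int) -> str:
    """Cut an ORF out of the genome.

    start/end are assumed to be 0-based and end-exclusive (a Python slice).
    A strand of -1 returns the reverse complement of the slice.
    """
    sub = genome[start:end]
    if strand == -1:
        return sub.translate(_COMPLEMENT)[::-1]
    return sub
```

For example, `fasta_record("seq1", "ACGTACGTAC", width=4)` yields a header line followed by `ACGT`, `ACGT`, and `AC`.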
Protein output
- -a or --aminoacids outputs the protein sequence for each of the open reading frames
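Going from an ORF to a protein sequence is a codon-by-codon lookup. The sketch below shows the idea in plain Python, assuming the standard genetic code and an ORF that starts in frame; genbank_to itself may take the translation from the GenBank record or use a different table, so treat the names here as hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf_dna: str) -> str:
    """Translate an in-frame ORF, stopping at the first stop codon.

    Codons containing ambiguous bases become 'X'; a trailing partial
    codon is ignored.
    """
    dna = orf_dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```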
Complex formats
- -p or --ptt outputs the NCBI ptt protein table. This is a somewhat deprecated NCBI format from their genomes downloads
- -f or --functions outputs tab-separated data of protein ID and protein function (also called the product)
- --gff3 outputs GFF3 format
- --amr outputs a GFF file, an amino acid fasta file, and a nucleotide fasta file as required by AMR Finder Plus. Note that this format includes validity checks for problems that often crash AMRFinderPlus
- --phage_finder outputs a unique format required by phage_finder
Output options
- --pseudo: normally we skip pseudogenes (e.g. in creating amino acid fasta files). This will try and include pseudogenes, but often biopython complains and ignores them!
- -i or --seqid: only output this sequence, or these sequences if you specify more than one -i/--seqid
- -z or --zip: compress some of the outputs
- --log: write logs to a different file
Separate multi-GenBank files
If your GenBank file contains multiple sequence records (separated with //), you can provide the --separate flag.
This will write each entry into its own file. This is compatible with -n/--nucleotide, -o/--orfs, and
-a/--aminoacids. However, if you provide the --separate flag on its own, it will write each entry in your
multi-GenBank file to its own GenBank file.
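The record separator makes the splitting step easy to picture. This is a minimal plain-Python sketch, not the tool's own implementation: it cuts the text after every line that consists only of `//`, and reads a name from each LOCUS line. How genbank_to names the per-record files is not stated here, so `locus_name` is only a hypothetical helper.

```python
def split_genbank_records(text: str) -> list[str]:
    """Split multi-GenBank text into records, each ending with its '//' line."""
    records = []
    current = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip() == "//":
            records.append("".join(current))
            current = []
    # Keep a trailing record that lacks the '//' terminator, if any.
    leftover = "".join(current)
    if leftover.strip():
        records.append(leftover)
    return records


def locus_name(record: str) -> str:
    """Return the second field of the LOCUS line (hypothetical naming helper)."""
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            return line.split()[1]
    raise ValueError("record has no LOCUS line")


# Tiny made-up input with two records.
demo = "LOCUS A 10 bp\nORIGIN\n//\nLOCUS B 12 bp\nORIGIN\n//\n"
```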
Examples
All of these examples use our test data
- Extract a fasta of the genome:
```bash
genbank_to -g test/NC_001417.gbk -n test/NC_001417.fna
```
- Extract the DNA sequences of the ORFs to a single file
```bash
genbank_to -g test/NC_001417.gbk -o test/NC_001417.orfs
```
- Extract the protein (amino acid) sequences of the ORFs to a file
```bash
genbank_to -g test/NC_001417.gbk -a test/NC_001417.faa
```
- Do all of these at once
```bash
genbank_to -g test/NC_001417.gbk -n test/NC_001417.fna -o test/NC_001417.orfs -a test/NC_001417.faa
```
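If you want to drive these commands from a Python script, one option is to assemble the argument list and hand it to `subprocess`. The sketch below uses only the flags shown in the examples above (-g, -n, -o, -a); the function names are hypothetical, and it assumes genbank_to is installed and on your PATH.

```python
import subprocess


def build_command(genbank, nucleotide=None, orfs=None, aminoacids=None):
    """Assemble a genbank_to argument list from the requested outputs."""
    command = ["genbank_to", "-g", genbank]
    for flag, path in (("-n", nucleotide), ("-o", orfs), ("-a", aminoacids)):
        if path:
            command += [flag, path]
    return command


def run_genbank_to(genbank, **outputs):
    """Run genbank_to and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(build_command(genbank, **outputs), check=True)
```

Calling `run_genbank_to("test/NC_001417.gbk", nucleotide="test/NC_001417.fna")` would then be equivalent to the first example.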
Installation
You can install genbank_to in three different ways:
- Using conda
This is the easiest and recommended method.
```bash
mamba create -n genbank_to genbank_to
conda activate genbank_to
genbank_to --help
```
- Using pip
I recommend putting this into a virtual environment:
```bash
virtualenv venv
source venv/bin/activate
pip install genbank_to
genbank_to --help
```
- Directly from this repository
(Not really recommended as things might break)
```bash
git clone https://github.com/linsalrob/genbank_to.git
cd genbank_to
virtualenv venv
source venv/bin/activate
python setup.py install
genbank_to --help
```
Owner
- Name: Rob Edwards
- Login: linsalrob
- Kind: user
- Location: Adelaide, Australia
- Company: Flinders University
- Website: http://edwards.flinders.edu.au/
- Twitter: linsalrob
- Repositories: 31
- Profile: https://github.com/linsalrob
Professor of CS and Biology Writing bioinformatics code to study viruses, phages, and metagenomes.
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Edwards
    given-names: Robert
    orcid: https://orcid.org/0000-0001-8383-8949
title: linsalrob/genbank_to: AMRFinder Goodness
version: v0.4
date-released: 2022-04-19
GitHub Events
Total
- Watch event: 3
Last Year
- Watch event: 3
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 3
- Total pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Total issue authors: 3
- Total pull request authors: 0
- Average comments per issue: 0.67
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tseemann (1)
- courtherms (1)
- kevinmyers (1)
Packages
- Total packages: 1
- Total downloads: 39 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 7
- Total maintainers: 1
pypi.org: genbank-to
Convert GenBank format files to a swath of other formats
- Homepage: https://github.com/linsalrob/genbank_to
- Documentation: https://genbank-to.readthedocs.io/
- License: The MIT License (MIT)
- Latest release: 0.42 (published almost 4 years ago)
Rankings
Maintainers (1)
Dependencies
- bcbio-gff *
- biopython *
- pandas *
- bcbio-gff *
- biopython *
- numpy *
- pandas *