genbank-to

Convert genbank files to a swath of other formats

https://github.com/linsalrob/genbank_to

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary
Last synced: 7 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: linsalrob
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 46.9 KB
Statistics
  • Stars: 18
  • Watchers: 3
  • Forks: 1
  • Open Issues: 2
  • Releases: 3
Created almost 4 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

genbank_to


A straightforward application to convert NCBI GenBank format files to a swath of other formats. Hopefully we have the format you need. If not, either post an issue using our template or, if you have already got it working, submit a PR so we can add it and add you to the project.

You might also be interested in deprekate's package called genbank, which includes several of the features here and which you can import into your own Python projects.

What it does

Read an NCBI GenBank format file (like our test data) and convert it to one of many different formats.

Input formats

At the moment we only support NCBI GenBank format. If you want us to read other common formats, let us know and we'll add them.

Output formats

Here are the output formats you can request. You can request as many of these at once as you like!

These outputs assume that you provide a file (for example, a genome) that contains ORFs, proteins, and the genome sequence.

Nucleotide output

  • -n or --nucleotide outputs the whole DNA sequence (e.g. the genome)
  • -o or --orfs outputs the DNA sequence of the open reading frames
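The whole-sequence output (`-n`) amounts to copying each record's sequence into FASTA. Below is a minimal sketch of that step. It uses Biopython's `SeqIO.convert`, but it is not genbank_to's own code. The function name `genbank_to_fasta` and the in-memory demo record are hypothetical.

```python
import io

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def genbank_to_fasta(genbank_in, fasta_out):
    """Write each GenBank record's complete sequence as FASTA.

    Hypothetical helper. Both arguments may be file paths or open text
    handles. Returns the number of records converted.
    """
    return SeqIO.convert(genbank_in, "genbank", fasta_out, "fasta")


if __name__ == "__main__":
    # Build a one-record GenBank file in memory rather than reading from disk.
    record = SeqRecord(Seq("ATGAAATAACCC"), id="demo", description="demo genome")
    record.annotations["molecule_type"] = "DNA"  # required by the GenBank writer
    gb = io.StringIO()
    SeqIO.write(record, gb, "genbank")
    gb.seek(0)

    fasta = io.StringIO()
    genbank_to_fasta(gb, fasta)
    print(fasta.getvalue())
```

With real files you would pass paths instead, e.g. `genbank_to_fasta("test/NC_001417.gbk", "test/NC_001417.fna")`.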

Protein output

  • -a or --aminoacids outputs the protein sequence for each of the open reading frames
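Both the ORF output (`-o`) and the protein output (`-a`) are derived from the CDS features in the GenBank file. Here is a minimal Biopython sketch that rests on several assumptions, not on how genbank_to actually does it:

- IDs come from `protein_id`, falling back to `locus_tag`.
- An existing `/translation` qualifier is trusted.
- Otherwise the DNA is translated with genetic code 11 unless `/transl_table` says otherwise.

`cds_sequences` is a hypothetical helper.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature
from Bio.SeqRecord import SeqRecord

try:  # name used by Biopython >= 1.80
    from Bio.SeqFeature import SimpleLocation
except ImportError:  # older releases call it FeatureLocation
    from Bio.SeqFeature import FeatureLocation as SimpleLocation


def cds_sequences(record, default_table=11):
    """Return (id, dna, protein) for each CDS feature in a SeqRecord.

    Hypothetical helper; default_table=11 (bacterial, archaeal and plant
    plastid code) is an assumption, overridden by /transl_table if present.
    """
    results = []
    for feature in record.features:
        if feature.type != "CDS":
            continue
        quals = feature.qualifiers
        fid = quals.get("protein_id", quals.get("locus_tag", ["unknown"]))[0]
        # extract() handles strand and join() locations for us.
        dna = feature.extract(record.seq)
        if "translation" in quals:
            protein = quals["translation"][0]  # trust the annotated protein
        else:
            table = int(quals.get("transl_table", [default_table])[0])
            protein = str(dna.translate(table=table, to_stop=True))
        results.append((fid, str(dna), protein))
    return results


if __name__ == "__main__":
    # Tiny in-memory record: one forward-strand and one reverse-strand CDS.
    rec = SeqRecord(Seq("ATGAAATAATTATTTCAT"), id="demo")
    rec.features = [
        SeqFeature(SimpleLocation(0, 9, strand=1), type="CDS", qualifiers={"locus_tag": ["g1"]}),
        SeqFeature(SimpleLocation(9, 18, strand=-1), type="CDS", qualifiers={"locus_tag": ["g2"]}),
    ]
    for fid, dna, protein in cds_sequences(rec):
        # Both CDSs yield ATGAAATAA, which translates to MK.
        print(f"{fid}\t{dna}\t{protein}")
```

Using `feature.extract` rather than slicing `record.seq` means reverse-strand and `join(...)` locations come out in coding orientation. The sketch uses `to_stop=True` instead of `cds=True` so that partial genes do not raise an error. The cost is that alternative start codons are translated literally rather than as M.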

Complex formats

  • -p or --ptt outputs an NCBI ptt protein table. This is a somewhat deprecated NCBI format from their genomes downloads
  • -f or --functions outputs tab separated data of protein ID and protein function (also called the product)
  • --gff3 outputs GFF3 format
  • --amr outputs a GFF file, an amino acid fasta file, and a nucleotide fasta file as required by AMRFinderPlus. Note that this output includes validity checks, because invalid input often crashes AMRFinderPlus
  • --phage_finder outputs the specific format required by phage_finder
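The `-f`/`--functions` table pairs each protein ID with its product. A minimal Biopython sketch follows. It assumes that the ID is taken from `protein_id` (falling back to `locus_tag`) and that a CDS without a `/product` gets an empty string. `functions_table` and `write_functions` are hypothetical names, and genbank_to's real column choices may differ.

```python
import sys

from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature
from Bio.SeqRecord import SeqRecord


def functions_table(records):
    """Return (protein ID, product) rows for every CDS feature.

    Hypothetical helper: IDs fall back from protein_id to locus_tag, and a
    missing /product becomes an empty string.
    """
    rows = []
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue  # tRNA, rRNA, gene, etc. are not proteins
            quals = feature.qualifiers
            pid = quals.get("protein_id", quals.get("locus_tag", ["unknown"]))[0]
            product = quals.get("product", [""])[0]
            rows.append((pid, product))
    return rows


def write_functions(rows, handle):
    """Write rows as tab-separated text, one CDS per line."""
    for pid, product in rows:
        handle.write(f"{pid}\t{product}\n")


if __name__ == "__main__":
    # Demo record with a made-up ID and product.
    rec = SeqRecord(Seq("ATG"), id="demo")
    rec.features = [
        SeqFeature(type="CDS", qualifiers={"protein_id": ["demo_0001"], "product": ["capsid protein"]}),
    ]
    write_functions(functions_table([rec]), sys.stdout)
```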

Output options

  • --pseudo include pseudogenes. Normally we skip pseudogenes (e.g. when creating amino acid fasta files); this option tries to include them, but Biopython often complains and ignores them!
  • -i or --seqid output only this sequence, or these sequences if you specify -i/--seqid more than once
  • -z or --zip compress some of the outputs
  • --log write logs to a different file
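The `--seqid` and `--pseudo` options both act as filters on which records and CDS features get processed. Here is one way to express that with Biopython. It assumes a pseudogene is any CDS carrying a `/pseudo` or `/pseudogene` qualifier. The `select_features` generator is illustrative, not the tool's actual code.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature
from Bio.SeqRecord import SeqRecord


def select_features(records, seqids=None, include_pseudo=False):
    """Yield (record, feature) for each CDS that passes the filters.

    seqids: optional collection of record IDs to keep (like repeating -i).
    include_pseudo: keep CDSs flagged as pseudogenes (like --pseudo).
    """
    wanted = set(seqids) if seqids else None
    for record in records:
        if wanted is not None and record.id not in wanted:
            continue  # record was not requested
        for feature in record.features:
            if feature.type != "CDS":
                continue
            quals = feature.qualifiers
            # Assumption: /pseudo or /pseudogene marks a pseudogene.
            if not include_pseudo and ("pseudo" in quals or "pseudogene" in quals):
                continue
            yield record, feature


if __name__ == "__main__":
    rec = SeqRecord(Seq("ATG"), id="contig1")
    rec.features = [
        SeqFeature(type="CDS", qualifiers={"locus_tag": ["g1"]}),
        SeqFeature(type="CDS", qualifiers={"locus_tag": ["g2"], "pseudo": [""]}),
    ]
    for _, feat in select_features([rec], seqids=["contig1"]):
        print(feat.qualifiers["locus_tag"][0])  # prints only g1
```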

Separate multi-GenBank files

If your GenBank file contains multiple sequence records (separated with //), you can provide the --separate flag to write each entry to its own file. This is compatible with -n/--nucleotide, -o/--orfs, and -a/--aminoacids. If you provide the --separate flag on its own, it writes each entry in your multi-GenBank file to its own GenBank file.
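Each record in a multi-GenBank file ends with a `//` line, so the plain `--separate` behaviour can be sketched with the standard library alone. This version names each output file after the record's LOCUS name. That naming scheme, and the helper names `split_genbank`, `locus_name`, and `write_separate`, are assumptions rather than what genbank_to actually does.

```python
from pathlib import Path


def split_genbank(text):
    """Split multi-record GenBank text into one string per record.

    Each record ends with a line containing only '//'.
    """
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip() == "//":
            records.append("".join(current))
            current = []
    if "".join(current).strip():
        # Tolerate a final record that lacks its terminating '//'.
        records.append("".join(current))
    return records


def locus_name(record_text):
    """Return the name from the LOCUS line (second whitespace-separated token)."""
    parts = record_text.lstrip().splitlines()[0].split()
    return parts[1] if len(parts) > 1 and parts[0] == "LOCUS" else "record"


def write_separate(text, outdir):
    """Write each record to <outdir>/<LOCUS name>.gbk; return the paths written."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for rec in split_genbank(text):
        path = out / f"{locus_name(rec)}.gbk"
        path.write_text(rec)
        paths.append(path)
    return paths
```

In this sketch, records that share a LOCUS name would overwrite each other. A fuller implementation would need to disambiguate them.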

Examples

All of these examples use our test data:

  1. Extract a fasta of the genome:

```bash
genbank_to -g test/NC_001417.gbk -n test/NC_001417.fna
```

  2. Extract the DNA sequences of the ORFs to a single file:

```bash
genbank_to -g test/NC_001417.gbk -o test/NC_001417.orfs
```

  3. Extract the protein (amino acid) sequences of the ORFs to a file:

```bash
genbank_to -g test/NC_001417.gbk -a test/NC_001417.faa
```

  4. Do all of these at once:

```bash
genbank_to -g test/NC_001417.gbk -n test/NC_001417.fna -o test/NC_001417.orfs -a test/NC_001417.faa
```

Installation

You can install genbank_to in three different ways:

  1. Using conda

This is the easiest and recommended method.

```bash
mamba create -n genbank_to genbank_to
conda activate genbank_to
genbank_to --help
```

  2. Using pip

I recommend putting this into a virtual environment:

```bash
virtualenv venv
source venv/bin/activate
pip install genbank_to
genbank_to --help
```

  3. Directly from this repository

(Not really recommended, as things might break.)

```bash
git clone https://github.com/linsalrob/genbank_to.git
cd genbank_to
virtualenv venv
source venv/bin/activate
python setup.py install
genbank_to --help
```

Owner

  • Name: Rob Edwards
  • Login: linsalrob
  • Kind: user
  • Location: Adelaide, Australia
  • Company: Flinders University

Professor of CS and Biology. Writing bioinformatics code to study viruses, phages, and metagenomes.

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Edwards
    given-names: Robert
    orcid: https://orcid.org/0000-0001-8383-8949
title: linsalrob/genbank_to: AMRFinder Goodness
version: v0.4
date-released: 2022-04-19

GitHub Events

Total
  • Watch event: 3
Last Year
  • Watch event: 3

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 24
  • Total Committers: 1
  • Avg Commits per committer: 24.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • linsalrob (r****s@g****m): 24 commits

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: 1 day
  • Average time to close pull requests: N/A
  • Total issue authors: 3
  • Total pull request authors: 0
  • Average comments per issue: 0.67
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tseemann (1)
  • courtherms (1)
  • kevinmyers (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 39 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 7
  • Total maintainers: 1
pypi.org: genbank-to

Convert GenBank format files to a swath of other formats

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 39 Last month
Rankings
Dependent packages count: 10.0%
Stargazers count: 21.5%
Dependent repos count: 21.7%
Average: 21.8%
Forks count: 22.6%
Downloads: 33.4%
Maintainers (1)
Last synced: 8 months ago

Dependencies

requirements.txt pypi
  • bcbio-gff *
  • biopython *
  • pandas *
setup.py pypi
  • bcbio-gff *
  • biopython *
  • numpy *
  • pandas *