orfipy

Fast and flexible ORF finder

https://github.com/urmi-21/orfipy

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.1%) to scientific vocabulary

Keywords

bioinformatics codon-tables dna extract-orfs orf-detection orf-finder orf-search protein python
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: urmi-21
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 5.3 MB
Statistics
  • Stars: 73
  • Watchers: 4
  • Forks: 9
  • Open Issues: 4
  • Releases: 0
Created over 5 years ago · Last pushed about 4 years ago
Metadata Files
Readme License

README.md

Introduction

orfipy is a tool written in python/cython to extract ORFs in an extremely fast and flexible manner. Other popular ORF searching tools are OrfM and getorf. Compared to OrfM and getorf, orfipy provides the most options to fine-tune ORF searches. orfipy uses multiple CPU cores and is particularly fast for data containing many smaller fasta sequences, such as de-novo transcriptome assemblies. For details, please read the paper cited below.

Please cite as: Urminder Singh, Eve Syrkin Wurtele, orfipy: a fast and flexible tool for extracting ORFs, Bioinformatics, 2021, btab090, https://doi.org/10.1093/bioinformatics/btab090

Installation

Install latest stable version

pip install orfipy

Or install via conda

```
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

conda create -n orfipy -c bioconda orfipy
```

Install the development version from source

git clone https://github.com/urmi-21/orfipy.git
cd orfipy
pip install .

or use pip

pip install git+https://github.com/urmi-21/orfipy.git

Examples

Details of the orfipy algorithm are in the paper. See the Supplementary Information (SI) for the differences between orfipy and other ORF finders, and for how to set orfipy parameters to match the output of other tools.

Below are some usage examples for orfipy

To see the full list of options, use the command:

orfipy -h

Input

orfipy version 0.0.3 and above supports sequences in Fasta/Fastq format (orfipy uses pyfastx). Input files can be in .gz format.

Extract ORF sequences and write them to the file orfs.fa

orfipy input.fasta --dna orfs.fa --min 10 --max 10000 --procs 4 --table 1 --outdir orfs_out

Use standard codon table but use only ATG as start codon

orfipy input.fa.gz --dna orfs.fa --start ATG

Note: Users can also provide their own translation table to orfipy, as a .json file, using the --table option. An example of a json file containing a valid translation table is here

See available codon tables

```
orfipy --show-table
```

Extract ORFs to a BED file

orfipy input.fasta --bed orfs.bed --min 50 --procs 4

or

orfipy input.fasta --min 50 --procs 4 > orfs.bed
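As a rough sketch of how the resulting BED file could be consumed downstream, the snippet below parses tab-separated BED lines and computes ORF lengths. It assumes the six standard BED columns (chrom, start, end, name, score, strand); the helper name `parse_bed6` and the sample line are hypothetical and are not taken from orfipy's output.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    name: str
    score: str
    strand: str


def parse_bed6(lines):
    """Yield BedRecord objects from an iterable of BED6 text lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        yield BedRecord(fields[0], int(fields[1]), int(fields[2]),
                        fields[3], fields[4], fields[5])


# Hypothetical record, for illustration only.
sample = ["seq1\t0\t9\tORF.1\t0\t+\n"]
for rec in parse_bed6(sample):
    print(rec.chrom, rec.end - rec.start, rec.strand)
```

To read a real file, pass an open file handle instead of the sample list, for example `parse_bed6(open("orfs.bed"))`.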

Extract ORFs to a BED12 file

Note: Add --include-stop for orfipy output to be consistent with Transdecoder.Predict output .bed file.

orfipy testseq.fa --min 100 --bed12 of.bed --partial-5 --partial-3 --include-stop

Extract ORF peptide sequences using the default translation table

orfipy input.fasta --pep orfs_peptides.fa --min 50 --procs 4
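To illustrate what translating an ORF with the standard genetic code (NCBI translation table 1) involves, here is a minimal pure-Python sketch. It is not orfipy's implementation; the names `CODON_TABLE` and `translate` are chosen for this example only.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of NCBI translation table 1 (standard code), listed in TCAG
# codon order: TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf_dna):
    """Translate a DNA ORF codon by codon; unknown codons become 'X'."""
    orf_dna = orf_dna.upper()
    return "".join(
        CODON_TABLE.get(orf_dna[i:i + 3], "X")
        for i in range(0, len(orf_dna) - 2, 3)
    )


print(translate("ATGCATGAC"))  # MHD
```

Trailing bases that do not form a complete codon are ignored in this sketch.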

API

Users can directly import the ORF search algorithm, written in cython, in their python ecosystem.

```
>>> import orfipy_core
>>> seq='ATGCATGACTAGCATCAGCATCAGCAT'
>>> for start,stop,strand,description in orfipy_core.orfs(seq,minlen=3,maxlen=1000):
...     print(start,stop,strand,description)
...
0 9 + ID=Seq_ORF.1;ORF_type=complete;ORF_len=9;ORF_frame=1;Start:ATG;Stop:TAG
```

The `orfipy_core.orfs` function can take the following arguments

  • seq: Required input sequence (str)
  • name ['Seq'] Name (str)
  • minlen [0] min length (int)
  • maxlen [1000000] max length (int)
  • strand ['b'] Strand to use, (b)oth, (f)wd or (r)ev (char)
  • starts ['TTG','CTG','ATG'] Start codons to use (list)
  • stops ['TAA','TAG','TGA'] Stop codons to use (list)
  • include_stop [False] Include stop codon in ORF (bool)
  • partial3 [False] Report ORFs without a stop (bool)
  • partial5 [False] Report ORFs without a start (bool)
  • between_stops [False] Report ORFs defined as between stops (bool)
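To make the meaning of `minlen`, `maxlen`, `starts` and `stops` concrete, here is a small pure-Python sketch of a naive ORF scan. It is not orfipy's algorithm: the function name `find_complete_orfs` is hypothetical, it scans only the forward strand, and it reports only complete ORFs (first start codon after the previous stop, up to the next in-frame stop, stop codon excluded).

```python
def find_complete_orfs(seq, minlen=0, maxlen=1000000,
                       starts=("TTG", "CTG", "ATG"),
                       stops=("TAA", "TAG", "TGA")):
    """Return sorted 0-based (start, stop) pairs of complete forward-strand ORFs.

    `stop` is the index of the first base of the stop codon, so the stop
    codon itself is not part of the reported ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None  # first start codon seen since the last stop
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if orf_start is None and codon in starts:
                orf_start = pos
            elif codon in stops:
                # Close the ORF (if one is open) and apply the length filter.
                if orf_start is not None and minlen <= pos - orf_start <= maxlen:
                    orfs.append((orf_start, pos))
                orf_start = None
    return sorted(orfs)


print(find_complete_orfs("ATGCATGACTAGCATCAGCATCAGCAT", minlen=3, maxlen=1000))
# [(0, 9)]
```

For this sequence the sketch finds the same forward-strand coordinates (0, 9) as the `orfipy_core.orfs` example; reverse-strand search, partial ORFs and `between_stops` are left out to keep it short.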

Comparison with getorf and OrfM

Comparison of orfipy features and performance with getorf and OrfM. Tools were run on different data, and ORFs were output to both nucleotide and peptide Fasta files (fasta), to peptide Fasta only (peptide), and to BED (bed). For details, see the publication and SI.

  • orfipy is the most flexible and is particularly fast for data containing many smaller fasta sequences, such as de-novo transcriptome assemblies or collections of microbial genomes.
  • OrfM is fast (faster for Fastq) and uses less memory, but its ORF search options are limited.
  • getorf is memory efficient but slower and has no Fastq support. It provides some flexibility in ORF searches.

Funding

This work is funded in part by the National Science Foundation award IOS 1546858, "Orphan Genes: An Untapped Genetic Reservoir of Novel Traits". This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562 (Bridges HPC environment through allocations TG-MCB190098 and TG-MCB200123 awarded from XSEDE and HPC Consortium).

Owner

  • Name: Urminder Singh
  • Login: urmi-21
  • Kind: user

Bioinformatics Scientist

GitHub Events

Total
  • Watch event: 15
  • Issue comment event: 5
  • Fork event: 2
Last Year
  • Watch event: 15
  • Issue comment event: 5
  • Fork event: 2

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 215
  • Total Committers: 2
  • Avg Commits per committer: 107.5
  • Development Distribution Score (DDS): 0.005
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
urmi-21 m****1@g****m 214
eve-syrkin-wurtele m****h@i****u 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 17
  • Total pull requests: 1
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 16
  • Total pull request authors: 1
  • Average comments per issue: 2.29
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • EnzoAndree (2)
  • krabapple (1)
  • Prakash2403 (1)
  • urmi-21 (1)
  • wwood (1)
  • BenjaminGuinet (1)
  • IgorFesenko (1)
  • freiburgermsu (1)
  • VJ-Ulaganathan (1)
  • lijing28101 (1)
  • AndyLy2Zy (1)
  • AlexanderBartholomaeus (1)
  • xiekunwhy (1)
  • apoosakkannu (1)
  • brettyout (1)
Pull Request Authors
  • eve-syrkin-wurtele (1)
Top Labels
Issue Labels
enhancement (3)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 213 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 4
  • Total maintainers: 1
pypi.org: orfipy

orfipy

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 213 Last month
Rankings
Stargazers count: 10.1%
Dependent packages count: 10.1%
Downloads: 11.9%
Forks count: 13.3%
Average: 13.4%
Dependent repos count: 21.5%
Maintainers (1)
Last synced: 7 months ago

Dependencies

requirements.txt pypi
  • colorama *
  • cython *
  • psutil *
  • pyahocorasick *
  • pyfastx *
setup.py pypi
  • line.rstrip *