https://github.com/a-slide/blastpy3

Simple and lightweight Python 3 wrapper module for NCBI BLAST+


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: a-slide
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 93.8 KB
Statistics
  • Stars: 3
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Archived
Created about 11 years ago · Last pushed almost 6 years ago
Metadata Files
Readme License

README.md

Blastpy3



Lightweight, high-level Python 3 API for NCBI BLAST+ blastn


Blastn

This class wraps blastn and requires NCBI BLAST+ 2.2.28+ to be installed.

Setup Blastn object: Create subject database

Upon instantiation, a BLAST database is built from the user-provided subject FASTA file. Database files are created in a temporary directory. The following parameters can be customized when instantiating a Blastn object:

  • ref_path: Path to the reference FASTA file (not gzipped). Mandatory
  • makeblastdb_exec: Path of the makeblastdb executable. Default = "makeblastdb"
  • makeblastdb_opt: makeblastdb command line options as a string. Default = ""

To ensure that the database files are deleted at the end of execution, instantiate the object in a with statement. Alternatively, call the rm_db method once you have finished using the Blastn object.

Code
```python
with Blastn(ref_path="./subject.fa") as blastn:
    print(blastn)
```
Output
```
CREATE DATABASE: makeblastdb -dbtype nucl -input_type fasta -in subject.fa -out tempdir

MAKEBLASTDB CLASS
Parameters list
db_dir              /tmp/tmplbkdwzm2
db_path             /tmp/tmplbkdwzm2/Yeast
makeblastdb_exec    makeblastdb
makeblastdb_opt
ref_path            ./data/Yeast.fa
verbose             False

Cleaning up blast DB files for "subject"
```
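The database lifecycle described above (temporary directory, makeblastdb call, cleanup when the with block exits or when rm_db is called) can be sketched with the standard library. This is not blastpy3's implementation: the class name TempBlastDb, the run flag, and the position of the extra options in the command line are assumptions made for illustration.

```python
import shlex
import shutil
import subprocess
import tempfile
from pathlib import Path


class TempBlastDb:
    """Hypothetical sketch (not blastpy3's class): a nucleotide BLAST DB in a temp dir."""

    def __init__(self, ref_path, makeblastdb_exec="makeblastdb",
                 makeblastdb_opt="", run=True):
        self.ref_path = Path(ref_path)
        self.makeblastdb_exec = makeblastdb_exec
        self.makeblastdb_opt = makeblastdb_opt
        # Database files go into a fresh temporary directory
        self.db_dir = tempfile.mkdtemp()
        # Name the database after the FASTA file: subject.fa -> <db_dir>/subject
        self.db_path = str(Path(self.db_dir) / self.ref_path.stem)
        self.cmd = self.build_cmd()
        if run:  # run=False only builds the command (assumption, handy for testing)
            try:
                subprocess.run(self.cmd, check=True, capture_output=True)
            except Exception:
                self.rm_db()  # do not leak the temp dir if makeblastdb fails
                raise

    def build_cmd(self):
        cmd = [self.makeblastdb_exec, "-dbtype", "nucl", "-input_type", "fasta",
               "-in", str(self.ref_path), "-out", self.db_path]
        # Extra options are split the way a POSIX shell would split them
        return cmd + shlex.split(self.makeblastdb_opt)

    def rm_db(self):
        shutil.rmtree(self.db_dir, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.rm_db()  # always clean up, even if the with-body raised
        return False  # do not swallow exceptions
```

With BLAST+ on the PATH, `with TempBlastDb("./subject.fa") as db:` would build the database, and db.db_path could then be passed to the -db option of blastn. Cleaning up in __exit__ is what makes the with statement a reliable way to remove the files.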

Calling Blastn object: Perform Blastn and return a list of hits

The "align" method of a Blastn object can then be called with either a query FASTA file (query_path) or a sequence string (query_seq). The following parameters can be customized when calling a Blastn object:

  • query_path: Path to a FASTA file containing the query sequences (not gzipped). Required unless query_seq is given
  • query_seq: Query sequence as a string, used instead of query_path
  • blast_exec: Path of the BLAST executable. Default = "blastn"
  • blastn_opt: blastn command line options as a string. Default = ""
  • task: Type of BLAST search to perform ('blastn', 'blastn-short', 'dc-megablast', 'megablast', 'rmblastn'). Default = "dc-megablast"
  • evalue: E-value cutoff to retain alignments. Default = 1
  • best_query_hit: Find and return only the best hit per query. Default = False
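To show how these parameters could map onto a command line, here is a minimal sketch that assembles a blastn argument list. The helper build_blastn_cmd is hypothetical; the fixed -outfmt "6 std qseq" value is taken from the MAKE BLAST line in the example output of this section, while passing -num_threads and -dust through blastn_opt is an assumption, since the README does not say where those options come from.

```python
import shlex

VALID_TASKS = ("blastn", "blastn-short", "dc-megablast", "megablast", "rmblastn")


def build_blastn_cmd(query_path, db_path, blast_exec="blastn", blastn_opt="",
                     task="dc-megablast", evalue=1):
    """Hypothetical helper: assemble a blastn command line as an argument list."""
    if task not in VALID_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    cmd = [blast_exec, "-task", task, "-evalue", str(evalue)]
    # Tabular output: the 12 standard columns followed by the aligned query sequence
    cmd += ["-outfmt", "6 std qseq"]
    # Free-form user options, e.g. "-num_threads 4 -dust no"
    cmd += shlex.split(blastn_opt)
    cmd += ["-query", query_path, "-db", db_path]
    return cmd
```

Returning a list rather than a string means that subprocess.run keeps "6 std qseq" as a single argument without any shell quoting.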

The call returns a list containing one BlastHit object per hit found in the subject, or None if no hits were found. If best_query_hit is set to True, only the best hit for each query sequence is returned.
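The README does not define "best" hit. A minimal sketch, assuming the best hit is the one with the highest bit score (ties broken by the lower E-value), and representing hits as plain dicts keyed by the BlastHit field names, could look like this. The function name best_hit_per_query is hypothetical.

```python
def best_hit_per_query(hits):
    """Hypothetical sketch: keep one hit per query (highest bit score, then lowest e-value)."""
    if not hits:
        return None  # mirror the documented behaviour when nothing was found
    best = {}
    for hit in hits:
        current = best.get(hit["q_id"])
        # Higher bit score wins; for equal scores the smaller e-value wins
        key = (hit["bscore"], -hit["evalue"])
        if current is None or key > (current["bscore"], -current["evalue"]):
            best[hit["q_id"]] = hit
    # Dicts keep insertion order, so queries come out in first-seen order
    return list(best.values())
```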

Code
```python
with Blastn(ref_path="./subject.fa") as blastn:
    hit_list = blastn(query_path="./query.fa")
    # hit_list is None when no hit is found
    if hit_list:
        for hit in hit_list:
            print(hit)
```
Output
```
CREATE DATABASE: makeblastdb -dbtype nucl -input_type fasta -in ./subject.fa -out /tmp/tmp1ZBlfT/subject

MAKE BLAST: blastn -num_threads 4 -task dc-megablast -evalue 1 -outfmt "6 std qseq" -dust no -query ./query.fa -db /tmp/tmp1ZBlfT/subject

2 hits found

HIT 0
Query query1:0-48(+)
Subject subject:19-67(+)
Lenght : 48
Identity : 100.0%
Evalue : 2e-23
Bit score : 87.8
Aligned query seq : GCATGCTCGATCAGTAGCTCTCAGTACGCATACGCTAGCATCACGACT

HIT 1
Query query2:0-48(+)
Subject subject:89-137(+)
Lenght : 48
Identity : 100.0%
Evalue : 2e-23
Bit score : 87.8
Aligned query seq : CGCATCGACTCGATCTGATCAGCTCACAGTCAGCATCAGCTACGATCA

Cleaning up blast DB files for "subject"
```
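With -outfmt "6 std qseq", blastn writes one tab-separated line per hit: the 12 standard columns (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score) followed by the aligned query sequence. The sketch below parses such a line into a dict using the BlastHit field names. The function parse_blast_line is hypothetical; converting BLAST's 1-based inclusive coordinates to 0-based half-open ones is inferred from the sample output (a 48 bp hit shown as query1:0-48), and mapping gap openings to the gap field is an assumption.

```python
def parse_blast_line(line):
    """Hypothetical sketch: parse one '6 std qseq' line into BlastHit-style fields."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 13:
        raise ValueError(f"expected 13 columns, got {len(f)}")
    q_start, q_end, s_start, s_end = (int(x) for x in f[6:10])
    hit = {
        "q_id": f[0], "s_id": f[1], "identity": float(f[2]), "length": int(f[3]),
        "mis": int(f[4]), "gap": int(f[5]),  # column 6 counts gap openings
        "evalue": float(f[10]), "bscore": float(f[11]), "q_seq": f[12],
    }
    # BLAST coordinates are 1-based and inclusive; start > end means minus strand
    for prefix, start, end in (("q", q_start, q_end), ("s", s_start, s_end)):
        lo, hi = min(start, end), max(start, end)
        hit[prefix + "_start"] = lo - 1  # 0-based start
        hit[prefix + "_end"] = hi        # exclusive end (half-open interval)
        hit[prefix + "_orient"] = "+" if start <= end else "-"
    return hit
```

Fed the line behind HIT 0 above (query 1-48, subject 20-67 in BLAST's numbering), this returns q_start=0, q_end=48, s_start=19 and s_end=67, matching the printed coordinates.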

BlastHit

Python object representing a hit found by blastn. The object contains the following public fields:

  • id: Auto-incremented unique identifier [INT]
  • q_id: Query sequence name [STR]
  • s_id: Subject sequence name [STR]
  • identity: Percentage identity of the hit [FLOAT 0:100]
  • length: Length of the hit [INT >=0]
  • mis: Number of mismatches in the hit [INT >=0]
  • gap: Number of gaps in the hit [INT >=0]
  • q_start: Hit start position of the query sequence [INT >=0]
  • q_end: Hit end position of the query sequence [INT >=0]
  • s_start: Hit start position of the subject sequence [INT >=0]
  • s_end: Hit end position of the subject sequence [INT >=0]
  • evalue: E-value of the alignment [FLOAT >=0]
  • bscore: Bit score of the alignment [FLOAT >=0]
  • q_seq: Sequence of the query aligned to the subject sequence [STR]
  • q_orient: Orientation of the query sequence [+ or -]
  • s_orient: Orientation of the subject sequence [+ or -]

Numeric values are validated upon instantiation; invalid values raise an AssertionError.
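The field list and the validation rule can be mirrored with a dataclass that checks the documented ranges in __post_init__. This is a hypothetical sketch (the class name Hit and the module-level counter are illustrative, not blastpy3's BlastHit); its defaults are chosen to match the default report shown further down.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count()  # shared counter giving each instance an auto-incremented id


@dataclass
class Hit:
    """Hypothetical sketch of a BlastHit-like record with range checks."""
    q_id: str = "query"
    s_id: str = "subject"
    identity: float = 100.0
    length: int = 10
    mis: int = 0
    gap: int = 0
    q_start: int = 0
    q_end: int = 10
    s_start: int = 0
    s_end: int = 10
    evalue: float = 0.0
    bscore: float = 0.0
    q_seq: str = ""
    q_orient: str = "+"
    s_orient: str = "+"
    id: int = field(default_factory=lambda: next(_ids))

    def __post_init__(self):
        # Same kind of checks as documented: assertion errors on invalid values
        assert 0 <= self.identity <= 100, "identity must be in [0, 100]"
        for name in ("length", "mis", "gap", "q_start", "q_end", "s_start", "s_end"):
            assert getattr(self, name) >= 0, f"{name} must be >= 0"
        assert self.evalue >= 0 and self.bscore >= 0, "evalue and bscore must be >= 0"
        assert self.q_orient in ("+", "-") and self.s_orient in ("+", "-")
```

One design caveat: assert statements are skipped when Python runs with -O, so raising ValueError would be a sturdier choice if validation must always happen.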

BlastHit objects can return a comprehensive report of themselves as an ordered dictionary:

Code
```python
# Interactive import
from BlastHit import BlastHit

# Create a default BlastHit object
h = BlastHit()

# Call the report method
h.get_report(full=True)
```
Output
```
OrderedDict([('Query', 'query:0-10(+)'), ('Subject', 'subject:0-10(+)'), ('Identity', 100.0), ('Evalue', 0.0), ('Bit Score', 0.0), ('Hit length', 10), ('Number of gap', 0), ('Number of mismatch', 0)])
```
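A report with the same keys and order as the output above can be built from the hit fields. The sketch below is hypothetical (hit_report is not part of blastpy3, and hits are plain dicts keyed by the BlastHit field names); the "name:start-end(orient)" formatting is inferred from the printed report, and the behaviour of full=False is not documented, so it is not modelled.

```python
from collections import OrderedDict


def hit_report(hit):
    """Hypothetical sketch: ordered report built from BlastHit-style fields."""
    return OrderedDict([
        ("Query", f"{hit['q_id']}:{hit['q_start']}-{hit['q_end']}({hit['q_orient']})"),
        ("Subject", f"{hit['s_id']}:{hit['s_start']}-{hit['s_end']}({hit['s_orient']})"),
        ("Identity", hit["identity"]),
        ("Evalue", hit["evalue"]),
        ("Bit Score", hit["bscore"]),
        ("Hit length", hit["length"]),
        ("Number of gap", hit["gap"]),
        ("Number of mismatch", hit["mis"]),
    ])
```

An OrderedDict keeps the report fields in a fixed, human-friendly order when it is printed or iterated.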

Testing pyBlast module

The module can be tested with pytest:

  • Install pytest with pip: pip install pytest
  • Run the tests with: py.test-2.7 -v

Example output of a successful run. Note that some tests may fail because the DNA sequences are sampled at random and because blastn is a heuristic search that is not guaranteed to report every alignment.
```
========================================== test session starts ===========================================
platform linux2 -- Python 2.7.5 -- py-1.4.27 -- pytest-2.7.0 -- /usr/bin/python
rootdir: /home/adrien/Programming/Python/pyBlast, inifile:
collected 21 items

testpyBlast.py::testBlastHit[4.16866907958-57-98-69-88-12-100-43-1.40452897105-47.3666242716] PASSED
testpyBlast.py::testBlastHit[-1-7-10-20-73-54-25-45-98.7921480151-45.2397166228] xfail
testpyBlast.py::testBlastHit[8.92741377413--1-100-36-34-33-14-71-18.8547135761-97.6604693294] xfail
testpyBlast.py::testBlastHit[10.5987790458-46--1-45-78-81-86-86-73.8740266727-56.887410005] xfail
testpyBlast.py::testBlastHit[66.8213911219-62-48--1-91-10-60-20-88.7850139735-81.7901609219] xfail
testpyBlast.py::testBlastHit[86.6626174287-29-83-34--1-53-57-68-17.9799756069-7.83036609495] xfail
testpyBlast.py::testBlastHit[5.23985331666-43-85-33-7--1-14-3-74.2130782704-88.9289495285] xfail
testpyBlast.py::testBlastHit[75.6935977321-8-78-68-10-39--1-74-44.1447867052-22.5203082483] xfail
testpyBlast.py::testBlastHit[39.8692596061-60-5-49-77-9-31--1-2.59963139531-46.3133849683] xfail
testpyBlast.py::testBlastHit[15.7192632366-24-92-1-64-82-83-90--1-75.5540618409] xfail
testpyBlast.py::testBlastHit[18.6627439886-34-57-60-5-45-26-40-77.7840842678--1] xfail
testpyBlast.py::testBlastn[blastn-Queries from Subject] PASSED
testpyBlast.py::testBlastn[blastn-Random queries] xfail
testpyBlast.py::testBlastn[blastn-short-Queries from Subject] PASSED
testpyBlast.py::testBlastn[blastn-short-Random queries] xfail
testpyBlast.py::testBlastn[dc-megablast-Queries from Subject] PASSED
testpyBlast.py::testBlastn[dc-megablast-Random queries] xfail
testpyBlast.py::testBlastn[megablast-Queries from Subject] PASSED
testpyBlast.py::testBlastn[megablast-Random queries] xfail
testpyBlast.py::testBlastn[rmblastn-Queries from Subject] PASSED
testpyBlast.py::testBlastn[rmblastn-Random queries] xfail

================================== 6 passed, 15 xfailed in 5.91 seconds ==================================
```
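The test names above distinguish "Queries from Subject" (expected to be found, PASSED) from "Random queries" (expected to find nothing, xfail). The sketch below shows one way such inputs could be generated. It is an assumption about the testing approach, not the contents of the real test file; the helper names are hypothetical, and passing a seeded random.Random instance is one way to make runs reproducible.

```python
import random


def random_dna(length, rng):
    """Uniform random DNA sequence of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def sample_queries(subject, n, length, rng):
    """Cut n (start, sequence) pairs out of the subject: blastn should find these."""
    out = []
    for _ in range(n):
        start = rng.randrange(0, len(subject) - length + 1)
        out.append((start, subject[start:start + length]))
    return out


def random_queries(n, length, rng):
    """Unrelated random sequences: blastn will usually report no hit for these."""
    return [random_dna(length, rng) for _ in range(n)]
```

For example, with rng = random.Random(42), subject = random_dna(1000, rng) produces a fixed 1 kb subject, and sample_queries(subject, 5, 50, rng) yields five 50 bp slices whose expected subject coordinates are known in advance.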

Dependencies

Authors and Contact

Adrien Leger - 2015

Owner

  • Name: Adrien Leger
  • Login: a-slide
  • Kind: user
  • Location: Oxford, UK
  • Company: @nanoporetech

Research scientist at Oxford Nanopore Technologies


Dependencies

setup.py pypi
  • pyfaidx >=0.5.8