cblaster

Find clustered hits from a BLAST search

https://github.com/gamcil/cblaster

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: acs.org, zenodo.org
  • Committers with academic emails
    1 of 10 committers (10.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: gamcil
  • License: mit
  • Language: HTML
  • Default Branch: master
  • Size: 48 MB
Statistics
  • Stars: 120
  • Watchers: 5
  • Forks: 25
  • Open Issues: 48
  • Releases: 37
Created over 6 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

cblaster

Both cblaster and clinker can now be used without installation on the CAGECAT webserver.

Outline

cblaster is a tool for finding clusters of co-located homologous sequences in BLAST searches.

Figure: cblaster search workflow

Given a collection of protein sequences, cblaster can search sequence databases remotely (via NCBI BLAST API) or locally (via DIAMOND). Search results are parsed and filtered based on user thresholds for identity, coverage and e-value. The genomic coordinates of remaining hits are obtained from the NCBI's Identical Protein Group (IPG) database (or a local database in local searches). Finally, cblaster scans for instances of collocation and generates visualisations:

Figure: cblaster search results
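The filtering and collocation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not cblaster's actual implementation: the `Hit` class, the `filter_hits` and `find_clusters` functions, and every threshold value (`min_identity`, `min_coverage`, `max_evalue`, `max_gap`, `min_queries`) are assumptions chosen for the example. The first two hits reuse values from the search output shown under Usage; the last two are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Hit:
    query: str
    subject: str
    scaffold: str
    identity: float  # percent identity
    coverage: float  # percent query coverage
    evalue: float
    start: int
    end: int


def filter_hits(hits, min_identity=30.0, min_coverage=50.0, max_evalue=0.01):
    """Keep hits that pass all three thresholds (illustrative values)."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


def find_clusters(hits, max_gap=20_000, min_queries=2):
    """Group hits on the same scaffold separated by at most max_gap bp."""
    clusters = []
    ordered = sorted(hits, key=lambda h: (h.scaffold, h.start))
    for _, group in groupby(ordered, key=lambda h: h.scaffold):
        current, last_end = [], None
        for hit in group:
            # Start a new group when the gap to the previous hits is too large.
            if current and hit.start - last_end > max_gap:
                clusters.append(current)
                current, last_end = [], None
            current.append(hit)
            last_end = hit.end if last_end is None else max(last_end, hit.end)
        clusters.append(current)
    # Report only groups containing hits to enough distinct query sequences.
    return [c for c in clusters if len({h.query for h in c}) >= min_queries]


hits = [
    Hit("QBE85641.1", "XP_026607259.1", "NW_020797889.1", 75.56, 99.5918, 0.0, 1717881, 1719409),
    Hit("QBE85642.1", "XP_026607260.1", "NW_020797889.1", 89.916, 100.0, 0.0, 1719650, 1720797),
    # Hypothetical hit far away on the same scaffold: passes filters, not clustered.
    Hit("QBE85643.1", "hypothetical_far", "NW_020797889.1", 90.0, 95.0, 0.0, 5000000, 5001000),
    # Hypothetical weak hit: removed by the thresholds.
    Hit("QBE85644.1", "hypothetical_weak", "NW_020797889.1", 20.0, 40.0, 1.0, 1721000, 1722000),
]

kept = filter_hits(hits)
clusters = find_clusters(kept)
for cluster in clusters:
    print(cluster[0].scaffold, [h.query for h in cluster])
```

Sorting by scaffold and start position first means a single linear pass is enough to split hits into groups, which is why the sketch uses `groupby` rather than pairwise comparisons.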

Installation

cblaster can be installed via pip:

```bash
$ pip3 install cblaster --user
```

or by cloning the repository and installing:

```bash
$ git clone https://github.com/gamcil/cblaster.git
...
$ cd cblaster/
$ pip3 install .
```

Additionally, we provide executables for Windows and Mac which can be downloaded from here.

Once installed, make sure you configure cblaster with your email address:

```bash
$ cblaster config --email name@domain.com
```

You can find example search files, along with generated output, in the examples folder of the repository.

Dependencies

cblaster is tested on Python 3.6, and its only external Python dependency is the requests module (used for interaction with NCBI APIs). If you want to perform local searches, you should have diamond installed and available on your system $PATH. cblaster will throw an error if a local search is started but it cannot find diamond or diamond-aligner (alias when installed via apt) on the system.
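Before starting a local search, it can be useful to confirm that one of the two executable names mentioned above is actually reachable. The following is a small sketch using only the standard library; the helper name `find_diamond` is hypothetical and this is not how cblaster itself performs the check.

```python
import shutil


def find_diamond(candidates=("diamond", "diamond-aligner")):
    """Return the path of the first candidate found on $PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


diamond_path = find_diamond()
if diamond_path is None:
    print("DIAMOND not found on $PATH; local searches will not work")
else:
    print(f"Using DIAMOND at {diamond_path}")
```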

Usage

cblaster accepts FASTA files and collections of valid NCBI sequence identifiers (GIs, accession numbers) as input. A remote search can be performed as simply as:

```bash
$ cblaster search --query_file query.fasta
```

For example, to remotely search the burnettramic acids gene cluster, bua, against the NCBI's nr database:

```bash
$ cblaster search -qf bua.fasta

[12:14:17] INFO - Starting cblaster in remote mode
[12:14:17] INFO - Launching new search
[12:14:19] INFO - Request Identifier (RID): WHS0UGYJ015
[12:14:19] INFO - Request Time Of Execution (RTOE): 25s
[12:14:44] INFO - Polling NCBI for completion status
[12:14:44] INFO - Checking search status...
[12:15:44] INFO - Checking search status...
[12:16:44] INFO - Checking search status...
[12:16:46] INFO - Search has completed successfully!
[12:16:46] INFO - Retrieving results for search WHS0UGYJ015
[12:16:51] INFO - Parsing results...
[12:16:51] INFO - Found 3944 hits meeting score thresholds
[12:16:51] INFO - Fetching genomic context of hits
[12:17:14] INFO - Searching for clustered hits across 705 organisms
[12:17:14] INFO - Writing summary to

Aspergillus mulundensis DSM 5745

NW_020797889.1

Query       Subject         Identity  Coverage  E-value    Bitscore  Start    End      Strand
QBE85641.1  XP_026607259.1  75.56     99.5918   0          742       1717881  1719409  -
QBE85642.1  XP_026607260.1  89.916    100       0          667       1719650  1720797  +
QBE85643.1  XP_026607261.1  89.532    83.1169   0          832       1721494  1722934  +
QBE85644.1  XP_026607262.1  64.829    98.9218   6.51e-157  455       1723252  1724467  -
QBE85645.1  XP_026607263.1  69.97     100       6.93e-157  449       1725113  1726277  -
QBE85646.1  XP_026607264.1  82.759    96.8447   0          670       1726892  1728302  +
QBE85647.1  XP_026607265.1  72.674    99.2048   0          764       1729735  1731338  +
QBE85648.1  XP_026607266.1  56.098    98.324    4.24e-64   205       1731701  1732402  -
QBE85649.1  XP_026607267.1  79.623    99.8746   0          6573      1732820  1745289  +

...
```
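If you want to post-process the per-scaffold hit table from a summary like the one above, each row can be split on whitespace and converted to typed values. The sketch below assumes the nine-column, whitespace-separated layout shown in the example output; the `FIELDS` tuple and `parse_hit_row` helper are hypothetical names, not part of the cblaster API. Subject accessions are written in standard RefSeq form (`XP_...`).

```python
# Column names taken from the header row of the example summary table.
FIELDS = ("query", "subject", "identity", "coverage",
          "evalue", "bitscore", "start", "end", "strand")


def parse_hit_row(line):
    """Convert one whitespace-separated table row into a typed dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    row = dict(zip(FIELDS, values))
    for key in ("identity", "coverage", "evalue", "bitscore"):
        row[key] = float(row[key])
    for key in ("start", "end"):
        row[key] = int(row[key])
    return row


sample = """
QBE85644.1 XP_026607262.1 64.829 98.9218 6.51e-157 455 1723252 1724467 -
QBE85649.1 XP_026607267.1 79.623 99.8746 0 6573 1732820 1745289 +
"""

rows = [parse_hit_row(line) for line in sample.strip().splitlines()]
for row in rows:
    # Length of the hit on the scaffold, in base pairs.
    print(row["query"], row["strand"], row["end"] - row["start"])
```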

A query sequence absence/presence matrix can be generated using the --binary argument:

| Organism | Scaffold | Start | End | QBE85641.1 | QBE85642.1 | QBE85643.1 | QBE85644.1 | QBE85645.1 | QBE85646.1 | QBE85647.1 | QBE85648.1 | QBE85649.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aspergillus mulundensis DSM 5745 | NW_020797889.1 | 1717881 | 1745289 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Aspergillus versicolor CBS 583.65 | KV878126.1 | 3162095 | 3187090 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| Pseudomassariella vexata CBS 129021 | MCFJ01000004.1 | 1606356 | 1628483 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
| Hypoxylon sp. CO27-5 | KZ112517.1 | 92119 | 112957 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| Hypoxylon sp. EC38 | KZ111255.1 | 514739 | 535366 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| Epicoccum nigrum ICMP 19927 | KZ107839.1 | 2116719 | 2142558 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| Aureobasidium subglaciale EXF-2481 | NW_013566983.1 | 700476 | 718693 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| Aureobasidium pullulans EXF-6514 | QZBF01000009.1 | 18721 | 34295 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| Aureobasidium pullulans EXF-5628 | QZBI01000512.1 | 329 | 13401 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
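A binary matrix like this is easy to summarise once loaded. As a rough sketch, the snippet below hard-codes three rows from the table above (rather than reading cblaster output, whose exact file layout is not assumed here) and reports how many query sequences each organism has a hit for and which are missing. The names `QUERIES`, `MATRIX`, and `missing_queries` are hypothetical.

```python
QUERIES = [
    "QBE85641.1", "QBE85642.1", "QBE85643.1", "QBE85644.1", "QBE85645.1",
    "QBE85646.1", "QBE85647.1", "QBE85648.1", "QBE85649.1",
]

# Three rows copied from the absence/presence table above.
MATRIX = {
    "Aspergillus mulundensis DSM 5745": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Aspergillus versicolor CBS 583.65": [1, 1, 1, 0, 1, 1, 1, 1, 1],
    "Aureobasidium pullulans EXF-5628": [1, 0, 0, 0, 0, 1, 1, 0, 0],
}


def missing_queries(row):
    """Return the query identifiers with no hit (value 0) in this row."""
    return [query for query, present in zip(QUERIES, row) if present == 0]


# Rank organisms by the number of query sequences with a hit, highest first.
ranked = sorted(MATRIX, key=lambda organism: sum(MATRIX[organism]), reverse=True)
for organism in ranked:
    row = MATRIX[organism]
    print(f"{organism}: {sum(row)}/{len(QUERIES)} present, missing {missing_queries(row)}")
```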

cblaster can also generate fully interactive visualisations of the binary table. To view an example, click here.

For further usage examples and API documentation, please refer to the documentation.

Citation

If you found this tool useful, please cite:

```text
Cameron L M Gilchrist, Thomas J Booth, Bram van Wersch, Liana van Grieken, Marnix H Medema, Yit-Heng Chooi, cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters, Bioinformatics Advances, 2021;, vbab016, https://doi.org/10.1093/bioadv/vbab016
```

cblaster makes use of the following tools:

```
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).

Acland, A. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 42, 7–17 (2014).
```

Owner

  • Name: Cameron Gilchrist
  • Login: gamcil
  • Kind: user
  • Location: Perth, Western Australia

Postdoc @ Steinegger Lab, Seoul National University Ex. Chooi Lab, The University of Western Australia

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Gilchrist
    given-names: Cameron
    orcid: https://orcid.org/0000-0001-7798-427X
  - family-names: Booth
    given-names: Thomas J
    orcid: https://orcid.org/0000-0002-6134-1488
  - family-names: van Wersch
    given-names: Bram
  - family-names: van Grieken
    given-names: Liana
  - family-names: Medema
    given-names: Marnix H
    orcid: https://orcid.org/0000-0002-2191-2821
  - family-names: Chooi
    given-names: Yit-Heng
    orcid: https://orcid.org/0000-0001-7719-7524
title: "cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters"
version: 1.3.9
doi: 10.1093/bioadv/vbab016
date-released: 2021-08-05

GitHub Events

Total
  • Create event: 1
  • Release event: 1
  • Issues event: 6
  • Watch event: 22
  • Issue comment event: 30
  • Push event: 2
  • Pull request event: 4
  • Fork event: 6
Last Year
  • Create event: 1
  • Release event: 1
  • Issues event: 6
  • Watch event: 22
  • Issue comment event: 30
  • Push event: 2
  • Pull request event: 4
  • Fork event: 6

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 890
  • Total Committers: 10
  • Avg Commits per committer: 89.0
  • Development Distribution Score (DDS): 0.406
Past Year
  • Commits: 10
  • Committers: 2
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.1
Top Committers
Name Email Commits
Cameron Gilchrist c****t@g****m 529
Bram e****0@g****m 274
LianaGrieken l****n@g****m 63
Matthias van den Belt m****t@b****l 12
brymerr921 b****1@g****m 6
Mohammad Alanjary m****y@w****l 2
Martin Larralde m****e@e****e 1
DrBoothTJ 6****J 1
Chase Clark 1****c 1
Friederike Biermann f****e@b****e 1
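
The all-time figures above can be checked against the per-committer counts. The page does not state how the Development Distribution Score is computed; the sketch below assumes the commonly used definition, one minus the top committer's share of all commits, which reproduces the reported values from the commit counts in the table.

```python
# Commit counts copied from the Top Committers table.
commits = [529, 274, 63, 12, 6, 2, 1, 1, 1, 1]

total = sum(commits)
average = total / len(commits)

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits) / total

print(f"Total commits: {total}")
print(f"Avg commits per committer: {average:.1f}")
print(f"DDS: {dds:.3f}")
```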

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 68
  • Total pull requests: 58
  • Average time to close issues: 2 months
  • Average time to close pull requests: 2 days
  • Total issue authors: 53
  • Total pull request authors: 12
  • Average comments per issue: 2.25
  • Average comments per pull request: 0.33
  • Merged pull requests: 50
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 7
  • Pull requests: 7
  • Average time to close issues: N/A
  • Average time to close pull requests: 13 days
  • Issue authors: 7
  • Pull request authors: 3
  • Average comments per issue: 3.14
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • yye88 (5)
  • ghost (3)
  • zuoyang97 (3)
  • Tim-Kirkwood (2)
  • pksun1 (2)
  • galacmr (2)
  • JordanVV (2)
  • corkdagga (2)
  • StefaanVerwimp (2)
  • mafeeney (2)
  • jjsanchezgil (1)
  • lydia1201 (1)
  • jeep3 (1)
  • Dfvandenberg (1)
  • aberaslop (1)
Pull Request Authors
  • gamcil (27)
  • bramvanwersch (13)
  • LianaGrieken (4)
  • biobeni (4)
  • malanjary-wur (2)
  • althonos (2)
  • kaileyhh (2)
  • LucoDevro (2)
  • FriederikeBiermann (2)
  • MatthiasvdBelt (1)
  • brymerr921 (1)
  • DrBoothTJ (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 141 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 38
  • Total maintainers: 1
pypi.org: cblaster
  • Versions: 38
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 141 Last month
Rankings
Stargazers count: 7.8%
Forks count: 8.5%
Dependent packages count: 10.0%
Average: 12.3%
Downloads: 13.4%
Dependent repos count: 21.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
  • actions/upload-release-asset v1 composite
  • tubone24/update_release v1.0 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/pythonapp.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v1 composite
setup.py pypi
  • Biopython *
  • PySimpleGUI *
  • appdirs *
  • biopython *
  • clinker >=0.0.15
  • defusedxml *
  • genomicsqlite *
  • gffutils *
  • numpy *
  • requests *
  • scipy *