https://github.com/branniganlab/blobulator

Blob-based analysis of protein sequences - available online at blobulator.branniganlab.org and as an open-source tool.

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.4%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: BranniganLab
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 20.1 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 94
  • Releases: 2
Created over 5 years ago · Last pushed 8 months ago
Metadata Files
Readme Changelog License

README.md

Protein Blobulator

Looking for the web interface? Find it here: https://www.blobulator.branniganlab.org/

This tool identifies contiguous stretches of hydrophobic residues within a protein sequence. Any stretch of contiguous hydrophobic residues that is at least as long as the minimum blob length is considered a hydrophobic, or "h", blob. Any remaining segments that are at least as long as the minimum blob length are considered polar, or "p", blobs, while those shorter than the minimum blob length are considered separator, or "s", residues. Separator residues are very short stretches of non-hydrophobic residues that may be found between two h blobs.
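The h/p/s rule above can be sketched in plain Python. This is a minimal illustration, not the blobulator package's implementation. It makes three assumptions:

  • the Kyte-Doolittle scale, normalized to [0, 1];
  • a 3-residue centered rolling average, matching the Window column described under CSV Outputs;
  • a residue counts as hydrophobic when its smoothed value is strictly greater than the cutoff.

The helper names (`blobulate_sketch`, `smooth`, `runs`) are hypothetical.

```python
# Minimal sketch of h/p/s blob assignment (hypothetical; not the blobulator package code).

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalized_hydropathy(seq):
    """Rescale K-D hydropathy from [-4.5, 4.5] to [0, 1] for each residue."""
    return [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq.upper()]


def smooth(values, window=3):
    """Centered rolling mean; the window is truncated at both sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def runs(labels):
    """Yield (label, start, end) for each maximal run of equal labels; end is exclusive."""
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            yield labels[start], start, i
            start = i


def blobulate_sketch(seq, cutoff=0.4, min_blob=4, window=3):
    """Return one blob-type letter ('h', 'p' or 's') per residue."""
    hydrophobic = [v > cutoff for v in smooth(normalized_hydropathy(seq), window)]

    # Step 1: hydrophobic runs at least min_blob long become h blobs.
    is_h = [False] * len(seq)
    for flag, start, end in runs(hydrophobic):
        if flag and end - start >= min_blob:
            is_h[start:end] = [True] * (end - start)

    # Step 2: each remaining segment is a p blob if long enough, otherwise s.
    pieces = []
    for flag, start, end in runs(is_h):
        length = end - start
        if flag:
            pieces.append("h" * length)
        else:
            pieces.append(("p" if length >= min_blob else "s") * length)
    return "".join(pieces)


print(blobulate_sketch("RRRRRRRRRIIIIIIIII"))  # ppppppppphhhhhhhhh
print(blobulate_sketch("IIIIIRRIIIII"))        # hhhhhsshhhhh
```

In the second call, the two arginines sit between two hydrophobic runs of five residues each. That gap is shorter than the minimum blob length of 4, so it becomes separator (s) residues rather than a p blob.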

Running locally:

Installation guide:

Software requirements:

Python 3.9+

Quick Install:

  1. [Optional] Create a conda environment:
       conda create --name blobulator_env python=3.9
       conda activate blobulator_env
  2. [For website and sample scripts] Download the repository:
       git clone https://github.com/BranniganLab/blobulator
  3. Install with pip:
       pip install git+https://github.com/BranniganLab/blobulator

Known issue: If you get an error installing pycairo, try conda install pycairo and then rerun the pip install.

Running through an internet browser:

Note: this option is identical to the website version, but is hosted on your local machine:
     cd [path_to_repository]/website
     python3 blobulation.py
If a browser doesn't open automatically, copy the URL from the terminal into a browser.

Scripting - Hello, World:

```
import blobulator

# A very simple oligopeptide and standard settings
sequence = "RRRRRRRRRIIIIIIIII"
cutoff = 0.4
min_blob = 4
hscale = "kyte_doolittle"

# Do the blobulation
blobDF = blobulator.compute(sequence, cutoff, min_blob, hscale)

# Clean up the dataframe (make it more human-readable)
blobDF = blobulator.clean_df(blobDF)

# Save it as a csv for later use
oname = "hello_blob.csv"
blobDF.to_csv(oname, index=False)

```

Additional sample scripts can be found in the examples directory of the repository.

Using the command-line utility blobulate.py:

Minimal Install:

The backend can be installed on its own with: pip install blobulator

Basic usage:

Open a terminal in the blobulator directory and run:
     python3 -m blobulator --sequence AFRPGAGQPPRRKECTPEVEEGV --oname ./my_blobulation.csv
This blobulates the sequence "AFRPGAGQPPRRKECTPEVEEGV" and writes the result to my_blobulation.csv.

Options:

You may specify additional parameters using the following options:
```
-h, --help            Show help information and exit
--sequence SEQUENCE   Takes a single string of EITHER DNA or protein one-letter codes (no spaces)
--cutoff CUTOFF       Sets the cutoff hydrophobicity (floating point number between 0.00 and 1.00 inclusive). Defaults to 0.4
--minBlob MINBLOB     Minimum blob length (integer greater than 1). Defaults to 4
--oname ONAME         Name of output file or path to output directory. Defaults to blobulated_.csv
--fasta FASTA         FASTA file with 1 or more sequences
--DNA DNA             Flag that says whether the inputs are DNA or protein. Defaults to false (protein)
```
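Every option above is a plain flag, so parameter sweeps can be scripted by building the command line in Python. This is a minimal sketch that uses only the flags listed above. It assumes blobulator is installed in the interpreter given by `sys.executable`. The helper names and the `sweep_<cutoff>.csv` file naming are hypothetical.

```python
import subprocess
import sys


def blobulate_cli_args(sequence, cutoff=0.4, min_blob=4, oname=None):
    """Build an argument list using only the documented blobulator CLI flags."""
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be between 0.00 and 1.00 inclusive")
    args = [sys.executable, "-m", "blobulator",
            "--sequence", sequence,
            "--cutoff", f"{cutoff:.2f}",
            "--minBlob", str(min_blob)]
    if oname is not None:
        args += ["--oname", oname]
    return args


def sweep_cutoffs(sequence, cutoffs, min_blob=4, run=False):
    """Build (and optionally run) one blobulation per cutoff, each with its own CSV."""
    commands = [blobulate_cli_args(sequence, c, min_blob, f"sweep_{c:.2f}.csv")
                for c in cutoffs]
    if run:
        for cmd in commands:
            # Requires the blobulator package; raises CalledProcessError on failure.
            subprocess.run(cmd, check=True)
    return commands


# Preview the commands without executing them:
for cmd in sweep_cutoffs("AFRPGAGQPPRRKECTPEVEEGV", [0.3, 0.4, 0.5]):
    print(" ".join(cmd[1:]))
```

Building the argument list separately from running it makes each command easy to inspect before any process is started. Pass `run=True` to actually execute the sweep.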

Advanced Usage (FASTA files):

  • Place a FASTA file with one or more sequences in any directory (Note: the sequences must be either all DNA or all protein).
  • Open a terminal in the blobulator directory and run: python3 -m blobulator --fasta ./relative/path/to/my_sequences.fasta --oname ./relative/path/to/outputs/
  • This blobulates all sequences in my_sequences.fasta (assuming they are protein sequences) and writes the results to the outputs folder, with file names prefixed by each sequence's ID.
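The FASTA requirement above (all DNA or all protein) can be checked before running the command-line utility. This is a minimal sketch built on a simple letter-based heuristic. It is not part of blobulator, and the helper names are hypothetical. A short protein made only of A, C, G and T residues would be misclassified as DNA.

```python
DNA_LETTERS = set("ACGTN")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def read_fasta(text):
    """Parse FASTA text into a list of (sequence_id, sequence) tuples."""
    records, seq_id, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            # The ID is the first word of the header line.
            seq_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def classify(seq):
    """Heuristic: 'dna' if only A/C/G/T/N, 'protein' if only standard residues."""
    letters = set(seq)
    if letters <= DNA_LETTERS:
        return "dna"
    if letters <= PROTEIN_LETTERS:
        return "protein"
    return "unknown"


def check_fasta(text):
    """Return the one sequence type shared by all records, or raise ValueError."""
    kinds = {seq_id: classify(seq) for seq_id, seq in read_fasta(text)}
    found = set(kinds.values())
    if len(found) != 1 or "unknown" in found:
        raise ValueError(f"mixed or unrecognized sequence types: {kinds}")
    return found.pop()


example = ">seq1 demo\nMKTAYIAKQR\n>seq2\nAFRPGAGQPPRRKECTPEVEEGV\n"
print(check_fasta(example))  # protein
```

If Biopython (listed among the dependencies) is available, `Bio.SeqIO.parse(path, "fasta")` can replace the hand-written `read_fasta`.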

Example:

There is a FASTA file in blobulation/example called bsubtilis.fasta that contains the sequences of several proteins from Bacillus subtilis. To blobulate all of those proteins with a cutoff of 0.4 and a minimum blob length of 4, run:
```
mkdir outputs
python3 -m blobulator --fasta ../example/bsubtilis.fasta --cutoff 0.4 --minBlob 4 --oname outputs/
```

CSV Outputs:

Whether you blobulated your proteins of interest with the web utility or the command-line option, you can obtain the blobulation data as a CSV. The CSV is the only output of the command-line option; on the website, click "Download Data". Each residue has its own row, and the columns are as follows:
  • ResidueNumber: The position of the residue in the sequence, starting at 1
  • ResidueName: The one-letter amino acid code
  • Window: The size of the rolling average window (currently 3 by default; we have not yet added the ability to change this)
  • HydropathyCutoff: The normalized cutoff used during blobulation (float between 0 and 1)
  • MinimumBlobLength: The minimum blob length used during blobulation (integer greater than 0)
  • bloblength: The length of the residue's blob
  • NormalizedMeanBlobHydropathy: The normalized mean hydropathy of the residue's blob
  • BlobType: The one-letter blob code (h=hydrophobic, p=polar/hydrophilic, s=short hydrophilic)
  • BlobIndexNumber: Indices that distinguish blobs, e.g. h1 is the first hydrophobic blob. h1a and h1b refer to two halves of a blob separated by a short (s) segment.
  • BlobDas-PappuClass: The blob's Das-Pappu globularity class: 1=globular, 2=Janus/boundary, 3=polar, 4=polycation, 5=polyanion
  • BlobNCPR: Net charge per residue of the blob
  • FractionofPositivelyChargedResidues: FPC = N(positively charged residues)/N(residues)
  • FractionofNegativelyChargedResidues: FNC = N(negatively charged residues)/N(residues)
  • FractionofChargedResidues: FCR = FPC + FNC
  • UverskyDiagramScore: Distance from the Uversky-Gillespie-Fink globular/disordered cutoff. See https://pubmed.ncbi.nlm.nih.gov/11025552/
  • dSNPenrichment: Predicted enrichment of disease-causing SNPs (mutations). See Lohia, Hansen, and Brannigan, 2022, PNAS, In Press.
  • BlobDisorder: Mean expected disorder score as provided by D2P2. See https://doi.org/10.1093/nar/gks1226
  • NormalizedKyte-Doolittlehydropathy: K-D hydropathy normalized to lie between 0 and 1. See Kyte-Doolittle_hydropathy.
  • Kyte-Doolittle_hydropathy: Traditional K-D hydropathy (on a scale from -4.5 to 4.5). This is a very common hydrophobicity scale dating to 1982: https://doi.org/10.1016%2F0022-2836%2882%2990515-0
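The charge columns above are simple ratios over a blob's residues. This sketch makes three assumptions, and blobulator's own charge assignment may differ:

  • K and R count as positive;
  • D and E count as negative;
  • histidine is neutral, and NCPR = FPC - FNC.

The K-D normalization mirrors the NormalizedKyte-Doolittlehydropathy column, (KD + 4.5) / 9. Under it, a normalized cutoff of 0.4 corresponds to a raw K-D value of 0.4 × 9 - 4.5 = -0.9. All function names are hypothetical.

```python
POSITIVE = set("KR")  # assumed positively charged residues (His treated as neutral)
NEGATIVE = set("DE")  # assumed negatively charged residues


def charge_fractions(blob):
    """Return FPC, FNC, FCR and NCPR for the residues of one blob."""
    n = len(blob)
    if n == 0:
        raise ValueError("blob must contain at least one residue")
    fpc = sum(aa in POSITIVE for aa in blob.upper()) / n
    fnc = sum(aa in NEGATIVE for aa in blob.upper()) / n
    return {"FPC": fpc, "FNC": fnc, "FCR": fpc + fnc, "NCPR": fpc - fnc}


def normalize_kd(kd):
    """Rescale a Kyte-Doolittle value from [-4.5, 4.5] to [0, 1]."""
    return (kd + 4.5) / 9.0


def denormalize_kd(x):
    """Inverse of normalize_kd: map a normalized value back to the raw K-D scale."""
    return x * 9.0 - 4.5


# Two positive (K, R) and three negative (D, E, E) residues out of six, so NCPR < 0.
print(charge_fractions("KRDEEG"))
print(denormalize_kd(0.4))  # raw K-D value of a 0.4 normalized cutoff (about -0.9)
```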

Blobulating proteins in VMD

A plugin to blobulate protein structures in VMD

This plugin allows users to blobulate and view blobs on a protein structure in Visual Molecular Dynamics (VMD). It provides an interface for tuning blobulation parameters and changing how blobs are represented on a given protein structure.

Software requirements:

VMD

Installation guide:

To obtain this plugin, download the following files from the VMDscripts folder into a single directory:
```
blobulation.tcl
BlobGUI.tcl
normalized_hydropathyscales.tcl
```

Quickstart:

  1. Load a protein into VMD.
  2. Open the Tk console from the Extensions menu: Extensions > Tk Console.
  3. In the Tk console, change to the directory where you downloaded the above scripts: cd /path/to/blobulator/scripts
  4. Source the plugin: source BlobGUI.tcl
  5. Click the blobulate button to generate the corresponding graphical representation in VMD.

Optional Settings:

  • Select the residues you wish to blobulate (defaults to "all").
  • Select your desired scale (defaults to "Kyte-Doolittle").
  • Adjust the 'Length' and 'Hydrophobicity' thresholds to your chosen parameters (if applicable).
  • Choose how to color your blobs; blob representations apply to every frame of a loaded trajectory.
    • Blob Color - Colors by blob type: h-blobs are blue, p-blobs are orange, and s-blobs are green.
    • Blob ID - Colors h-blobs by blob ID, with each h-blob assigned a color from green to blue; p-blobs are orange and s-blobs are green.
  • To remove all representations, click the 'Clear representations' button.
  • Clicking the 'Default' buttons returns the thresholds to their default values.
    • For 'Length', the default is always 4.
    • For 'Hydrophobicity', the default depends on the selected hydropathy scale.
    • To automatically assign the default value when switching scales, click the 'Auto-Update Threshold' checkbox.

How to access blob representations:

The blobulation algorithm stores the blob assignments in each atom's VMD user and user2 values.

The 'user' value will store the type of blob: user 1 -> h-blobs, user 2 -> s-blobs, and user 3 -> p-blobs.

The 'user2' value will store the blob group: user2 1 -> h-blob group 1, user2 2 -> s-blob group 1, user2 3 -> h-blob group 2, etc.

When coloring by Blob ID, h-blobs will have different colors depending on the user2 value.
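The user/user2 encoding above can be illustrated by mapping a per-residue blob-type string to the two values:

  • The user codes (h=1, s=2, p=3) come from the text.
  • Treating user2 as a running blob counter along the chain is an assumption, read off the example above (h-blob, s-blob, h-blob numbered 1, 2, 3). The plugin's actual numbering may differ.
  • The function name is hypothetical.

Inside VMD, the stored values can then be used in ordinary atom selections, for example `user 1` for h-blob atoms.

```python
USER_CODE = {"h": 1, "s": 2, "p": 3}  # blob type -> VMD 'user' value, as described above


def vmd_user_values(blob_types):
    """Return per-residue (user, user2) pairs for a string such as 'hhhhsshhhh'.

    Assumption: user2 is a running index that increases by one wherever the
    blob-type letter changes, i.e. at the start of each new blob.
    """
    values = []
    group = 0
    previous = None
    for letter in blob_types:
        if letter != previous:
            group += 1
            previous = letter
        values.append((USER_CODE[letter], group))
    return values


print(vmd_user_values("hhhhsshhhh"))
```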

Known Limitations:

VMD blobulator cannot run its blobulation algorithm on proteins that contain non-standard amino acids.

Owner

  • Name: Brannigan Lab
  • Login: BranniganLab
  • Kind: organization

GitHub Events

Total
  • Create event: 41
  • Issues event: 81
  • Watch event: 2
  • Delete event: 35
  • Issue comment event: 23
  • Push event: 220
  • Gollum event: 7
  • Pull request review event: 56
  • Pull request review comment event: 17
  • Pull request event: 90
Last Year
  • Create event: 41
  • Issues event: 81
  • Watch event: 2
  • Delete event: 35
  • Issue comment event: 23
  • Push event: 220
  • Gollum event: 7
  • Pull request review event: 56
  • Pull request review comment event: 17
  • Pull request event: 90

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 31
  • Total pull requests: 36
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 9 days
  • Total issue authors: 5
  • Total pull request authors: 4
  • Average comments per issue: 0.06
  • Average comments per pull request: 0.06
  • Merged pull requests: 18
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 31
  • Pull requests: 36
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 9 days
  • Issue authors: 5
  • Pull request authors: 4
  • Average comments per issue: 0.06
  • Average comments per pull request: 0.06
  • Merged pull requests: 18
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • cpitman-11 (33)
  • RL0109 (31)
  • EzryStIago (19)
  • gbrannigan (6)
  • lindseymriggs (3)
  • mebh (2)
  • ttjoseph (1)
  • JahmalEnnis (1)
Pull Request Authors
  • cpitman-11 (43)
  • RL0109 (33)
  • EzryStIago (12)
  • lindseymriggs (2)
  • ttjoseph (1)
Top Labels
Issue Labels
bug (12) VMD Plugin (9) Website (8) enhancement (5) Mac Plugin (3) documentation (2) top priority (2) Windows Plugin (2) Command Line Interface (1) Housekeeping (1) invalid (1) Low Priority/Not Needed for Paper (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 113 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
pypi.org: blobulator

Edge Detection in Protein Sequences

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 113 Last month
Rankings
Dependent packages count: 10.1%
Average: 38.6%
Dependent repos count: 67.0%
Maintainers (1)
Last synced: 8 months ago

Dependencies

blobulator/requirements.txt pypi
  • argparse *
  • biopython *
  • flask *
  • flask_cors *
  • flask_restful *
  • flask_session *
  • matplotlib *
  • pandas *
  • requests *
  • svglib *
  • wtforms *
setup.py pypi
  • Bio *
  • flask *
  • flask_cors *
  • flask_restful *
  • flask_session *
  • importlib_resources >=1.4
  • matplotlib >=3.5.0
  • numpy >=1.22.0
  • pandas >=1.4.0
  • reportlab *
  • svglib *
  • wtforms *