d-script

A structure-aware interpretable deep learning model for sequence-based prediction of protein-protein interactions

https://github.com/samsledje/d-script

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Statistics
  • Stars: 100
  • Watchers: 3
  • Forks: 21
  • Open Issues: 2
  • Releases: 20
Created over 5 years ago · Last pushed 11 months ago
Metadata Files
  • Readme
  • Changelog
  • License
  • Citation

README.md

D-SCRIPT

D-SCRIPT Architecture

D-SCRIPT is a deep learning method for predicting a physical interaction between two proteins given just their sequences. It generalizes well to new species and is robust to limitations in training data size. Its design reflects the intuition that for two proteins to physically interact, a subset of amino acids from each protein should be in contact with the other. The intermediate stages of D-SCRIPT directly implement this intuition, with the penultimate stage in D-SCRIPT being a rough estimate of the inter-protein contact map of the protein dimer. This structurally-motivated design enhances the interpretability of the results and, since structure is more conserved evolutionarily than sequence, improves generalizability across species.

You can now make predictions with D-SCRIPT via the interface on HuggingFace!

Installation

    pip install dscript

Usage

Protein sequences must first be embedded using the Bepler+Berger protein language model; this step takes a .fasta file as input. For each record, everything before the first space in the header is used as the key.

    dscript embed --seqs [sequences] --outfile [embedding file]
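To see which keys a given .fasta file would produce, the rule above (text before the first space) can be mirrored in a few lines of Python. This is a minimal sketch, not the parser that dscript itself uses; the function name `fasta_keys` and the two example records are hypothetical and purely illustrative.

```python
from typing import Dict, Iterable, List


def fasta_keys(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record key (header text before the first space) to its sequence."""
    records: Dict[str, List[str]] = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Drop the leading '>' and keep only the text before the first space.
            key = line[1:].split(" ", 1)[0]
            records[key] = []
        elif key is not None:
            # Sequence lines may wrap, so collect them and join at the end.
            records[key].append(line)
    return {k: "".join(parts) for k, parts in records.items()}


# Hypothetical records, for illustration only.
example = """>P12345 illustrative description text
MKTAYIAKQR
QISFVKSHFS
>Q67890 another illustrative record
MADEEKLPPG
""".splitlines()

keys = fasta_keys(example)
print(list(keys))
```

Note that in this sketch two headers sharing the same first token would collapse into one key, so it is worth checking that the keys are unique before embedding.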

Candidate pairs should be in tab-separated (.tsv) format with no header, with columns for [protein key 1] and [protein key 2]. A third column with [label] may optionally be included, so that training or test data files can be used directly for prediction. The label does not affect the predictions; only the first two columns are read.
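A candidate-pairs file in the layout described above can be produced with Python's standard `csv` module. The following is a minimal sketch under stated assumptions: the helper name `write_pairs` and the protein keys are hypothetical, and the keys are assumed to match those in the .fasta file that was embedded.

```python
import csv
import io


def write_pairs(handle, pairs, labels=None):
    """Write one tab-separated row per pair, with no header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for i, (key_a, key_b) in enumerate(pairs):
        row = [key_a, key_b]
        if labels is not None:
            # Optional third column; per the text above it is ignored at prediction time.
            row.append(labels[i])
        writer.writerow(row)


# Hypothetical keys, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_pairs(buffer, [("P12345", "Q67890"), ("P12345", "P12345")], labels=[1, 0])
pairs_tsv = buffer.getvalue()
print(pairs_tsv, end="")
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the buffer.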

While pre-trained model files can be downloaded directly, we recommend instead passing the name of a pre-trained model that will be automatically downloaded from HuggingFace. Available models include:

  • samsl/dscript_human_v1
  • samsl/topsy_turvy_human_v1 (recommended)
  • samsl/tt3d_human_v1

    dscript predict --pairs [input data] --embeddings [embedding file] --model [model file] --outfile [predictions file]
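When the two commands are driven from a script, it can help to assemble the argument lists programmatically. The sketch below only builds and prints the command lines, using the flags shown above; the helper names and all file names (`seqs.fasta`, `embeddings.h5`, `pairs.tsv`, `predictions`) are hypothetical, and the exact spelling of the model name is an assumption that should be checked against the model list.

```python
import shlex


def embed_command(seqs, outfile):
    """Argument list for the embedding step, using the flags shown above."""
    return ["dscript", "embed", "--seqs", seqs, "--outfile", outfile]


def predict_command(pairs, embeddings, model, outfile):
    """Argument list for the prediction step, using the flags shown above."""
    return [
        "dscript", "predict",
        "--pairs", pairs,
        "--embeddings", embeddings,
        "--model", model,
        "--outfile", outfile,
    ]


# All file names are hypothetical; the model name spelling is an assumption.
commands = [
    embed_command("seqs.fasta", "embeddings.h5"),
    predict_command("pairs.tsv", "embeddings.h5",
                    "samsl/topsy_turvy_human_v1", "predictions"),
]

for cmd in commands:
    # Print a shell-quoted form for inspection. To actually execute, one could
    # call subprocess.run(cmd, check=True) with dscript installed.
    print(shlex.join(cmd))
```

Passing a list rather than a single string avoids shell-quoting problems if any path contains spaces.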

Owner

  • Name: Samuel Sledzieski
  • Login: samsledje
  • Kind: user
  • Location: Cambridge, MA
  • Company: Massachusetts Institute of Technology

PhD student @ MIT. Studying computational biology and bioinformatics.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Sledzieski"
  given-names: "Samuel"
  orcid: "https://orcid.org/0000-0002-0170-3029"
- family-names: "Singh"
  given-names: "Rohit"
  orcid: "https://orcid.org/0000-0002-4084-7340"
- family-names: "Devkota"
  given-names: "Kapil"
  orcid: "https://orcid.org/0000-0002-6093-6260"
title: "D-SCRIPT"
version: 0.2.0
doi: 10.1016/j.cels.2021.08.010
date-released: 2022-06-26
url: "https://github.com/samsledje/D-SCRIPT"
preferred-citation:
  type: article
  authors:
  - family-names: "Sledzieski"
    given-names: "Samuel"
    orcid: "https://orcid.org/0000-0002-0170-3029"
  - family-names: "Singh"
    given-names: "Rohit"
    orcid: "https://orcid.org/0000-0002-4084-7340"
  - family-names: "Cowen"
    given-names: "Lenore"
    orcid: "https://orcid.org/0000-0001-6698-6413"
  - family-names: "Berger"
    given-names: "Bonnie"
    orcid: "https://orcid.org/0000-0002-2724-7228"
  doi: "10.1016/j.cels.2021.08.010"
  journal: "Cell Systems"
  publisher: "Elsevier"
  volume: 12
  issue: 10
  start: 969
  end: 982
  title: "D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions"
  year: 2021

GitHub Events

Total
  • Create event: 7
  • Release event: 2
  • Issues event: 11
  • Watch event: 16
  • Delete event: 1
  • Issue comment event: 17
  • Push event: 16
  • Pull request review event: 5
  • Pull request review comment event: 5
  • Pull request event: 5
  • Fork event: 3
Last Year
  • Create event: 7
  • Release event: 2
  • Issues event: 11
  • Watch event: 16
  • Delete event: 1
  • Issue comment event: 17
  • Push event: 16
  • Pull request review event: 5
  • Pull request review comment event: 5
  • Pull request event: 5
  • Fork event: 3

Dependencies

.github/workflows/autorun-tests.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/pypi_publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
environment.yml conda
  • biopython
  • cudatoolkit 10.2.*
  • h5py
  • matplotlib
  • numpy
  • pandas
  • pip 20.0.*
  • python 3.7.*
  • pytorch 1.11.*
  • scikit-learn
  • scipy
  • seaborn
  • setuptools
  • tqdm
docs/requirements.txt pypi
  • Sphinx ==3.3.1
  • biopython *
  • h5py *
  • jinja2 <3.1
  • matplotlib *
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *
  • seaborn *
  • sphinx-rtd-theme ==1.0.0
  • sphinxcontrib-applehelp ==1.0.2
  • sphinxcontrib-devhelp ==1.0.2
  • sphinxcontrib-htmlhelp ==1.0.3
  • sphinxcontrib-jsmath ==1.0.1
  • sphinxcontrib-qthelp ==1.0.3
  • sphinxcontrib-serializinghtml ==1.1.4
  • torch ==1.5.0
  • tqdm *
requirements.txt pypi
  • biopython *
  • h5py *
  • matplotlib *
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *
  • seaborn *
  • setuptools *
  • torch ==1.11
  • tqdm *
setup.py pypi
  • biopython *
  • h5py *
  • matplotlib *
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *
  • seaborn *
  • torch >=1.11
  • tqdm *