contrastive-finetuning-plms

Optimizing protein language models with Sentence Transformers - ADOPT2

https://github.com/peptoneltd/contrastive-finetuning-plms

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: PeptoneLtd
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.71 MB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • Contributing
  • License
  • Code of conduct
  • Citation

README.md

Contrastive Finetuning protein Language Models

This repo contains data and scripts that demonstrate how Sentence-Transformers can be used with protein language models, in particular ESM models, as described in the paper Optimizing protein language models with Sentence Transformers, presented at the NeurIPS 2023 Workshop on Machine Learning in Structural Biology.

Setup

Please note that this implementation requires GPUs.

    git clone https://github.com/PeptoneLtd/contrastive-finetuning-plms.git
    cd contrastive-finetuning-plms
    pip install -r full_env.txt

Usage

Two minimal examples are provided, showing how to train a solubility predictor and a disorder predictor:

  • scripts/solubility_search_seeds.py
  • scripts/disorder_st_avg.py
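To give a feel for the kind of input a contrastive fine-tuning run consumes, here is a minimal sketch of building labelled sequence pairs for a binary task such as solubility. It assumes a simple pair-sampling scheme (same label gives target 1.0, different labels give target 0.0); the function name `make_contrastive_pairs`, the toy sequences, and the labels are hypothetical and are not taken from the repository scripts, which may sample pairs differently.

```python
import itertools
import random


def make_contrastive_pairs(sequences, labels, num_pairs, seed=0):
    """Sample (seq_a, seq_b, target) triples from labelled sequences.

    target is 1.0 when both sequences share a label and 0.0 otherwise.
    Hypothetical helper for illustration; not part of the repository.
    """
    rng = random.Random(seed)  # local RNG so the sampling is reproducible
    indexed = list(zip(sequences, labels))
    all_pairs = list(itertools.combinations(indexed, 2))

    positives = [(a, b, 1.0) for (a, la), (b, lb) in all_pairs if la == lb]
    negatives = [(a, b, 0.0) for (a, la), (b, lb) in all_pairs if la != lb]

    # Aim for a balanced set: half positive pairs, half negative pairs.
    half = num_pairs // 2
    pairs = rng.sample(positives, min(half, len(positives)))
    pairs += rng.sample(negatives, min(half, len(negatives)))
    rng.shuffle(pairs)
    return pairs


# Made-up toy data: label 1 = soluble, label 0 = insoluble (assumption).
toy_sequences = ["MKTAYIAKQR", "MKLVINGKTL", "GSSGSSGSSG", "PEPEPEPEPE"]
toy_labels = [1, 1, 0, 0]

pairs = make_contrastive_pairs(toy_sequences, toy_labels, num_pairs=4, seed=42)
for seq_a, seq_b, target in pairs:
    print(seq_a, seq_b, target)
```

Triples of this shape could then be wrapped in whatever example type the training library expects and passed to a similarity-based loss. Varying `seed` changes which pairs are drawn, which is one reason a search over seeds can be informative.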

Note that the scripts read their input from the data folder, so the paths may need adjusting depending on your environment. For the disorder task, if you plan a large-scale search, consider caching the frozen residue-level representations from ESM, since the script currently downloads them from Hugging Face on the fly.
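The caching idea mentioned above can be sketched as a small on-disk cache keyed by a hash of the sequence. This is only an illustration under stated assumptions: `cached_embedding` and `fake_embed` are hypothetical names, `fake_embed` stands in for the real ESM forward pass, and JSON is used purely to keep the example dependency-free (a real setup would more likely store tensors or arrays in a binary format).

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_embedding(sequence, compute_fn, cache_dir):
    """Return the per-residue embedding for `sequence`, computing it only once.

    `compute_fn` is any callable mapping a sequence string to a nested list
    of floats (one vector per residue). Hypothetical helper for illustration.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Hash the sequence so the file name is short and filesystem-safe.
    key = hashlib.sha256(sequence.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.json"

    if cache_file.exists():
        return json.loads(cache_file.read_text())

    embedding = compute_fn(sequence)
    cache_file.write_text(json.dumps(embedding))
    return embedding


calls = []  # records how often the (expensive) embedding function runs


def fake_embed(sequence):
    """Stand-in for a frozen ESM model: one 2-dimensional vector per residue."""
    calls.append(sequence)
    return [[float(ord(residue)), 0.0] for residue in sequence]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_embedding("MKT", fake_embed, tmp)   # computed and stored
    second = cached_embedding("MKT", fake_embed, tmp)  # read back from disk

print(len(calls))
```

Because the representations are frozen, the cache never needs invalidating as long as the underlying model stays the same; if the model changes, a fresh cache directory is the simplest option.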

Citations

If you use this work in your research, please cite the relevant software:

BibTeX

    @inproceedings{adopt2,
      title = {Optimizing protein language models with Sentence Transformers},
      author = {Istvan Redl and Fabio Airoldi and Sandro Bottaro and Albert Chung and Oliver Dutton and Carlo Fisicaro and Patrik Foerch and Louie Henderson and Falk Hoffmann and Michele Invernizzi and Benjamin M J Owens and Stefano Ruschetta and Kamil Tamiola},
      booktitle = {Proceedings of the NeurIPS Workshop on Machine Learning in Structural Biology},
      year = {2023},
      note = {Workshop Paper},
      url = {https://www.mlsb.io/papers_2023/Optimizing_protein_language_models_with_Sentence_Transformers.pdf}
    }

Licence

This source code is licensed under the Apache 2.0 license found in the LICENSE file in the root directory of this source tree.

Owner

  • Name: Peptone
  • Login: PeptoneLtd
  • Kind: organization
  • Email: hello@peptone.io
  • Location: London, United Kingdom

World's first, end-to-end Protein Engineering Operating System (PeOS).

Citation (CITATION.cff)

cff-version: 0.2.0
message: "If you use this software, please cite it as below."
authors:
  - given-names: "Istvan Redl"
    family-names: "Redl"
    email: "istvan@peptone.io"
    affiliation: "Peptone Ltd."
  - given-names: "Fabio Airoldi"
    family-names: "Airoldi"
    affiliation: "Peptone Ltd."
  - given-names: "Sandro Bottaro"
    family-names: "Bottaro"
    affiliation: "Peptone Ltd."
  - given-names: "Albert Chung"
    family-names: "Chung"
    affiliation: "Peptone Ltd."
  - given-names: "Oliver Dutton"
    family-names: "Dutton"
    affiliation: "Peptone Ltd."
  - given-names: "Carlo Fisicaro"
    family-names: "Fisicaro"
    affiliation: "Peptone Ltd."
    orcid: "0000-0002-2029-7230"
  - given-names: "Louie Henderson"
    family-names: "Henderson"
    affiliation: "Peptone Ltd."
  - given-names: "Falk Hoffmann"
    family-names: "Hoffmann"
    affiliation: "Peptone Ltd."
  - given-names: "Michele Invernizzi"
    family-names: "Invernizzi"
    affiliation: "Peptone Ltd."
  - given-names: "Stefano Ruschetta"
    family-names: "Ruschetta"
    affiliation: "Peptone Ltd."
title: "Contrastive Finetuning protein Language Models"
version: 0.2.0
doi:
date-released:
url: "https://github.com/PeptoneLtd/contrastive-finetuning-plms"

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 minute
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.5
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • albert-peptone (1)
  • CFisicaro (1)
Top Labels
Issue Labels
Pull Request Labels