https://github.com/anton-bushuiev/proteinttt
Training on test proteins improves fitness, structure, and function prediction
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: anton-bushuiev
- License: mit
- Language: Python
- Default Branch: main
- Size: 1.13 MB
Statistics
- Stars: 11
- Watchers: 5
- Forks: 2
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
ProteinTTT
ProteinTTT is a package that allows you to use test-time training (TTT) to improve the performance of protein language models via on-the-fly per-protein customization.
🚨 The repository is under active development.
Installation
Please first install the model you are planning to use with TTT and then install ProteinTTT:
```bash
git clone https://github.com/anton-bushuiev/ProteinTTT && pip install -e ProteinTTT
```
Usage
In the following example, we use ESMFold + TTT to predict the structure of a protein. Here, customizing ESMFold with TTT roughly doubles the mean pLDDT of the predicted structure (from about 38.4 to about 78.7).
```python
import torch
import esm
import biotite.structure.io as bsio
from proteinttt.models.esmfold import ESMFoldTTT, DEFAULT_ESMFOLD_TTT_CFG

# Set your sequence
sequence = "GIHLGELGLLPSTVLAIGYFENLVNIICESLNMLPKLEVSGKEYKKFKFTIVIPKDLDANIKKRAKIYFKQKSLIEIEIPTSSRNYPIHIQFDENSTDDILHLYDMPTTIGGIDKAIEMFMRKGHIGKTDQQKLLEERELRNFKTTLENLIATDAFAKEMVEVIIEE"

# Load model
model = esm.pretrained.esmfold_v1()
model = model.eval().cuda()

def predict_structure(model, sequence):
    with torch.no_grad():
        output = model.infer_pdb(sequence)

    with open("result.pdb", "w") as f:
        f.write(output)

    struct = bsio.load_structure("result.pdb", extra_fields=["b_factor"])
    print('pLDDT:', struct.b_factor.mean())

predict_structure(model, sequence)
# pLDDT: 38.43025

# ================ TTT ================
ttt_cfg = DEFAULT_ESMFOLD_TTT_CFG
ttt_cfg.steps = 10  # This is how you can modify config
model = ESMFoldTTT.ttt_from_pretrained(model, ttt_cfg=ttt_cfg, esmfold_config=model.cfg)
model.ttt(sequence)
# =====================================

predict_structure(model, sequence)
# pLDDT: 78.69619

# Reset model to original state (after this model.ttt can be called again on another protein)
# ================ TTT ================
model.ttt_reset()
# =====================================
```
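The example above reads pLDDT back from the B-factor column of the PDB file it writes. As a minimal sketch, assuming the PDB string uses the standard fixed-column layout (B-factor in columns 61-66 of each atom record), the same per-atom mean can be computed directly from the string without going through `result.pdb`. The helper name `mean_plddt` is hypothetical and is not part of ProteinTTT, ESM, or Biotite.

```python
def mean_plddt(pdb_text: str) -> float:
    """Average the B-factor field over all ATOM/HETATM records.

    Assumes standard PDB fixed-width columns, where the B-factor
    occupies columns 61-66 (Python slice 60:66).
    """
    values = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66
    ]
    if not values:
        raise ValueError("no atom records with a B-factor field found")
    return sum(values) / len(values)


# Hypothetical usage with the model from the example above:
# print("pLDDT:", mean_plddt(model.infer_pdb(sequence)))
```

This keeps the prediction in memory, which can be convenient when scoring many sequences; writing the file and loading it with Biotite remains the better choice when the structure itself is needed afterwards.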
See notebooks/demo.ipynb for more usage examples.
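Because `model.ttt_reset()` restores the original state, the customize, predict, reset cycle can be wrapped in a loop over several proteins. The following is a sketch and not part of the ProteinTTT API: `predict_with_ttt` and its `predict` callback are hypothetical names, and the code assumes that calling `ttt_reset()` after a failed `ttt()` call is safe.

```python
from typing import Any, Callable, Dict, Iterable


def predict_with_ttt(
    model: Any,
    sequences: Iterable[str],
    predict: Callable[[Any, str], Any],
) -> Dict[str, Any]:
    """Customize `model` to each sequence, predict, then reset it.

    `model` only needs the `ttt(sequence)` and `ttt_reset()` methods
    shown in the README example; `predict` is any user-supplied
    function taking (model, sequence).
    """
    results: Dict[str, Any] = {}
    for sequence in sequences:
        try:
            model.ttt(sequence)  # per-protein customization
            results[sequence] = predict(model, sequence)
        finally:
            # Assumption: resetting is safe even if ttt() raised.
            model.ttt_reset()
    return results
```

With the README example, `predict` could be a function that calls `model.infer_pdb(sequence)` under `torch.no_grad()` and returns the PDB string. Resetting in a `finally` clause means one failed protein does not leave customized weights in place for the next one.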
References
If you use ProteinTTT in your research, please cite the following paper:
```bibtex
@article{bushuiev2024training,
  title={Training on test proteins improves fitness, structure, and function prediction},
  author={Bushuiev, Anton and Bushuiev, Roman and Zadorozhny, Nikola and Samusevich, Raman and St{\"a}rk, Hannes and Sedlar, Jiri and Pluskal, Tom{\'a}{\v{s}} and Sivic, Josef},
  journal={arXiv preprint arXiv:2411.02109},
  url={https://arxiv.org/abs/2411.02109},
  doi={10.48550/arXiv.2411.02109},
  year={2024}
}
```
Owner
- Name: Anton Bushuiev
- Login: anton-bushuiev
- Kind: user
- Location: Prague
- Company: Czech Technical University in Prague
- Twitter: AntonBushuiev
- Repositories: 23
- Profile: https://github.com/anton-bushuiev
PhD student. Machine learning / computational biology 🤖🌱
GitHub Events
Total
- Release event: 1
- Watch event: 11
- Member event: 1
- Push event: 28
- Pull request event: 8
- Fork event: 2
- Create event: 3
Last Year
- Release event: 1
- Watch event: 11
- Member event: 1
- Push event: 28
- Pull request event: 8
- Fork event: 2
- Create event: 3
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: about 10 hours
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: about 10 hours
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- roman-bushuiev (3)