https://github.com/bioinfomachinelearning/transfew
Transformer for protein function prediction (version 2)
Repository
Basic Info
- Host: GitHub
- Owner: BioinfoMachineLearning
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 555 KB
Statistics
- Stars: 10
- Watchers: 2
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
TransFew
Improving protein function prediction by learning and integrating representations of protein sequences and function labels
TransFew leverages representations of both protein sequences and function labels (Gene Ontology (GO) terms) to predict protein function. It improves prediction accuracy for both common and rare GO terms.
Installation
```
# clone project
git clone https://github.com/BioinfoMachineLearning/TransFew.git
cd TransFew/

# download trained models and test sample
# https://calla.rnet.missouri.edu/rnaminer/tfew/TFewDataset

# Unzip Dataset
unzip TFewDataset

# create conda environment
conda env create -f transfew.yaml
conda activate transfew
```
Prediction
```
Predict protein functions with TransFew

options:
  -h, --help                 show this help message and exit
  --data-path DATA_PATH      Path to data files (models)
  --working-dir WORKING_DIR  Path to generate temporary files
  --ontology ONTOLOGY        Path to data files
  --no-cuda NO_CUDA          Disables CUDA training.
  --batch-size BATCH_SIZE    Batch size.
  --fasta-path FASTA_PATH    Path to Fasta
  --output OUTPUT            File to save output
```
- An example of predicting cellular component (cc) terms for a set of proteins:
```
# Change ROOT_DIR in CONSTANTS.py to path of data directory
python predict.py --data-path /TFewData/ --fasta-path outputdir/testfasta.fasta --ontology cc --working-dir output_dir --output result.tsv
```
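For scripted runs, the invocation above can also be assembled from Python. The following is a minimal sketch and not part of TransFew: `build_predict_command` and `run_prediction` are hypothetical helpers, they use only the flags listed in the help text, and they assume `predict.py` is launched from the repository root inside the `transfew` environment.

```python
import subprocess
import sys


def build_predict_command(data_path, fasta_path, ontology, working_dir, output):
    """Assemble the predict.py argument list from the documented flags.

    Hypothetical helper for illustration; it is not part of TransFew.
    """
    return [
        sys.executable, "predict.py",
        "--data-path", str(data_path),
        "--fasta-path", str(fasta_path),
        "--ontology", str(ontology),
        "--working-dir", str(working_dir),
        "--output", str(output),
    ]


def run_prediction(cmd):
    """Run the assembled command and raise if predict.py exits with an error."""
    subprocess.run(cmd, check=True)


# Same values as the example invocation above.
cmd = build_predict_command(
    data_path="/TFewData/",
    fasta_path="outputdir/testfasta.fasta",
    ontology="cc",
    working_dir="output_dir",
    output="result.tsv",
)
print(" ".join(cmd[1:]))
```

Calling `run_prediction(cmd)` would then start the prediction. Passing the arguments as a list avoids shell quoting problems when paths contain spaces.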
Output format
| protein | GO term | score |
|---|---|---|
| A0A7I2V2M2 | GO:0043227 | 0.996 |
| A0A7I2V2M2 | GO:0043226 | 0.996 |
| A0A7I2V2M2 | GO:0005737 | 0.926 |
| A0A7I2V2M2 | GO:0043233 | 0.924 |
| A0A7I2V2M2 | GO:0031974 | 0.913 |
| A0A7I2V2M2 | GO:0070013 | 0.912 |
| A0A7I2V2M2 | GO:0031981 | 0.831 |
| A0A7I2V2M2 | GO:0005654 | 0.767 |
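To post-process predictions, the output can be grouped by protein and filtered by score. The sketch below is not part of TransFew. It assumes each data row holds three whitespace- or tab-separated fields (protein, GO term, score), as in the sample above, and it skips any row that does not fit that shape, such as a header. The helper name `parse_predictions` and the 0.9 cutoff are illustrative choices.

```python
from collections import defaultdict

# A few rows copied from the sample output above.
SAMPLE = """\
protein GO term score
A0A7I2V2M2 GO:0043227 0.996
A0A7I2V2M2 GO:0043226 0.996
A0A7I2V2M2 GO:0005737 0.926
A0A7I2V2M2 GO:0005654 0.767
"""


def parse_predictions(lines, min_score=0.0):
    """Group (GO term, score) pairs by protein, keeping scores >= min_score."""
    predictions = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines and header rows: a data row has exactly three
        # fields and its second field is a GO identifier.
        if len(fields) != 3 or not fields[1].startswith("GO:"):
            continue
        protein, go_term, score = fields[0], fields[1], float(fields[2])
        if score >= min_score:
            predictions[protein].append((go_term, score))
    return dict(predictions)


confident = parse_predictions(SAMPLE.splitlines(), min_score=0.9)
for protein, terms in confident.items():
    for go_term, score in terms:
        print(protein, go_term, score)
```

For a real run, an open file handle can be passed in place of the sample lines, for example `parse_predictions(open("result.tsv"))`.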
Dataset
See DATASET.md (https://github.com/BioinfoMachineLearning/TransFew/blob/main/DATASET.md) for description of data
Training
The training program is in training.py. To train the model:
1. Change ROOT_DIR in CONSTANTS.py to path of data directory
2. Run: python training.py
Reference
```
Boadu, F., & Cheng, J. (2024). Improving protein function prediction by learning and integrating representations of protein sequences and function labels. Bioinformatics Advances, Volume 4, Issue 1, vbae120.
```
Owner
- Name: BioinfoMachineLearning
- Login: BioinfoMachineLearning
- Kind: organization
- Repositories: 29
- Profile: https://github.com/BioinfoMachineLearning
GitHub Events
Total
- Watch event: 3
- Push event: 1
Last Year
- Watch event: 3
- Push event: 1
Issues and Pull Requests
Last synced: about 2 years ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Hamid-R-Moradi (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- biopandas ==0.4.1
- biopython ==1.81
- click ==8.1.3
- contourpy ==1.0.7
- cycler ==0.11.0
- docker-pycreds ==0.4.0
- fair-esm ==2.0.0
- fairscale ==0.4.13
- fonttools ==4.39.3
- gitdb ==4.0.10
- gitpython ==3.1.31
- importlib-resources ==5.12.0
- joblib ==1.2.0
- kiwisolver ==1.4.4
- matplotlib ==3.7.1
- obonet ==1.0.0
- opencv-python ==4.7.0.72
- packaging ==23.1
- pandas ==1.5.3
- pathtools ==0.1.2
- protobuf ==4.23.0
- psutil ==5.9.5
- pyg-lib ==0.2.0
- pyparsing ==3.0.9
- python-dateutil ==2.8.2
- pytz ==2023.2
- pyyaml ==6.0
- scipy ==1.10.1
- seaborn ==0.12.2
- sentry-sdk ==1.19.1
- setproctitle ==1.3.2
- smmap ==5.0.0
- thop ==0.1.1
- threadpoolctl ==3.1.0
- torch-cluster ==1.6.1
- torch-geometric ==2.3.1
- torch-scatter ==2.1.1
- torch-sparse ==0.6.17
- torch-spline-conv ==1.2.2
- torch-summary ==1.4.5
- tqdm ==4.65.0
- ultralytics ==8.0.82
- wandb ==0.15.2
- zipp ==3.15.0