https://github.com/bioinfomachinelearning/transfew

Transformer for protein function prediction (version 2)


Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.3%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: BioinfoMachineLearning
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 555 KB
Statistics
  • Stars: 10
  • Watchers: 2
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License

README.md

TransFew

Improving protein function prediction by learning and integrating representations of protein sequences and function labels

TransFew leverages representations of both protein sequences and function labels (Gene Ontology (GO) terms) to predict protein function. It improves the accuracy of predicting both common and rare GO terms.

Installation

```
# clone project
git clone https://github.com/BioinfoMachineLearning/TransFew.git
cd TransFew/

# download trained models and test sample from
# https://calla.rnet.missouri.edu/rnaminer/tfew/TFewDataset

# unzip dataset
unzip TFewDataset

# create conda environment
conda env create -f transfew.yaml
conda activate transfew
```
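After activating the environment, a quick sanity check can confirm that a few key packages resolved to the expected versions. The sketch below uses only the Python standard library. The pins are copied from this repository's dependency list, and the helper name `check_pins` is hypothetical, not part of TransFew.

```python
from importlib import metadata

# Pins copied from the repository's dependency list; extend as needed.
EXPECTED = {
    "biopython": "1.81",
    "fair-esm": "2.0.0",
    "obonet": "1.0.0",
    "torch-geometric": "2.3.1",
}


def check_pins(expected):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(EXPECTED)
    for name, (wanted, found) in problems.items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
    if not problems:
        print("All checked packages match their pins.")
```

Run it inside the activated `transfew` environment; an empty mismatch report suggests the environment was created from the intended specification.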

Prediction

```
Predict protein functions with TransFew

options:
  -h, --help                 show this help message and exit
  --data-path DATA_PATH      Path to data files (models)
  --working-dir WORKING_DIR  Path to generate temporary files
  --ontology ONTOLOGY        Path to data files
  --no-cuda NO_CUDA          Disables CUDA training.
  --batch-size BATCH_SIZE    Batch size.
  --fasta-path FASTA_PATH    Path to Fasta
  --output OUTPUT            File to save output
```
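For readers who want to script around this interface, the help text above corresponds to an `argparse` declaration along the following lines. This is a minimal sketch reconstructed from the help output, not the actual contents of `predict.py`. The function name `build_parser`, the argument types, and the absence of defaults are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help text."""
    parser = argparse.ArgumentParser(
        description="Predict protein functions with TransFew"
    )
    parser.add_argument("--data-path", type=str,
                        help="Path to data files (models)")
    parser.add_argument("--working-dir", type=str,
                        help="Path to generate temporary files")
    parser.add_argument("--ontology", type=str,
                        help="Path to data files")
    # The help output shows a NO_CUDA value, so this option takes an argument;
    # how that value is interpreted is not documented, so it is kept as a string.
    parser.add_argument("--no-cuda", type=str,
                        help="Disables CUDA training.")
    # Integer type is an assumption based on the option's meaning.
    parser.add_argument("--batch-size", type=int,
                        help="Batch size.")
    parser.add_argument("--fasta-path", type=str,
                        help="Path to Fasta")
    parser.add_argument("--output", type=str,
                        help="File to save output")
    return parser


args = build_parser().parse_args([
    "--data-path", "/TFewData/",
    "--fasta-path", "input.fasta",  # hypothetical input file name
    "--ontology", "cc",
    "--working-dir", "output_dir",
    "--output", "result.tsv",
])
print(args.ontology, args.output)
```

Note that `argparse` converts dashes to underscores, so `--data-path` is read back as `args.data_path`.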

An example of predicting the cellular component of some proteins:

  1. Change ROOT_DIR in CONSTANTS.py to the path of the data directory.
  2. Run the prediction script:

```
python predict.py --data-path /TFewData/ --fasta-path outputdir/testfasta.fasta --ontology cc --working-dir output_dir --output result.tsv
```

Output format

| protein | GO term | score |
|---|---|---|
| A0A7I2V2M2 | GO:0043227 | 0.996 |
| A0A7I2V2M2 | GO:0043226 | 0.996 |
| A0A7I2V2M2 | GO:0005737 | 0.926 |
| A0A7I2V2M2 | GO:0043233 | 0.924 |
| A0A7I2V2M2 | GO:0031974 | 0.913 |
| A0A7I2V2M2 | GO:0070013 | 0.912 |
| A0A7I2V2M2 | GO:0031981 | 0.831 |
| A0A7I2V2M2 | GO:0005654 | 0.767 |
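The three-column output above is straightforward to post-process, for example to keep only confident predictions. The sketch below assumes the result file is tab-separated with a single header row, which the `.tsv` extension suggests but the README does not state. The function name `load_predictions` and the 0.9 threshold are illustrative choices, not part of TransFew.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the output shown above; tab separation is assumed.
SAMPLE = (
    "protein\tGO term\tscore\n"
    "A0A7I2V2M2\tGO:0043227\t0.996\n"
    "A0A7I2V2M2\tGO:0043226\t0.996\n"
    "A0A7I2V2M2\tGO:0005737\t0.926\n"
    "A0A7I2V2M2\tGO:0043233\t0.924\n"
    "A0A7I2V2M2\tGO:0031974\t0.913\n"
    "A0A7I2V2M2\tGO:0070013\t0.912\n"
    "A0A7I2V2M2\tGO:0031981\t0.831\n"
    "A0A7I2V2M2\tGO:0005654\t0.767\n"
)


def load_predictions(handle, min_score=0.0):
    """Group (GO term, score) pairs by protein, keeping scores >= min_score."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    by_protein = defaultdict(list)
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        protein, go_term, score = row
        score = float(score)
        if score >= min_score:
            by_protein[protein].append((go_term, score))
    return dict(by_protein)


predictions = load_predictions(io.StringIO(SAMPLE), min_score=0.9)
for protein, terms in predictions.items():
    for go_term, score in terms:
        print(protein, go_term, score)
```

To read a real result file, pass `open("result.tsv", newline="")` in place of the `io.StringIO` sample.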

Dataset

See DATASET.md (https://github.com/BioinfoMachineLearning/TransFew/blob/main/DATASET.md) for description of data

Training

The training program is available in training.py. To train the model:

  1. Change ROOT_DIR in CONSTANTS.py to the path of the data directory.
  2. Run: python training.py

Reference

```
Boadu, F., & Cheng, J. (2024). Improving protein function prediction by learning and integrating representations of protein sequences and function labels. Bioinformatics Advances, Volume 4, Issue 1, vbae120.
```

Owner

  • Name: BioinfoMachineLearning
  • Login: BioinfoMachineLearning
  • Kind: organization

GitHub Events

Total
  • Watch event: 3
  • Push event: 1
Last Year
  • Watch event: 3
  • Push event: 1

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Hamid-R-Moradi (1)

Dependencies

environment.yml pypi
  • biopandas ==0.4.1
  • biopython ==1.81
  • click ==8.1.3
  • contourpy ==1.0.7
  • cycler ==0.11.0
  • docker-pycreds ==0.4.0
  • fair-esm ==2.0.0
  • fairscale ==0.4.13
  • fonttools ==4.39.3
  • gitdb ==4.0.10
  • gitpython ==3.1.31
  • importlib-resources ==5.12.0
  • joblib ==1.2.0
  • kiwisolver ==1.4.4
  • matplotlib ==3.7.1
  • obonet ==1.0.0
  • opencv-python ==4.7.0.72
  • packaging ==23.1
  • pandas ==1.5.3
  • pathtools ==0.1.2
  • protobuf ==4.23.0
  • psutil ==5.9.5
  • pyg-lib ==0.2.0
  • pyparsing ==3.0.9
  • python-dateutil ==2.8.2
  • pytz ==2023.2
  • pyyaml ==6.0
  • scipy ==1.10.1
  • seaborn ==0.12.2
  • sentry-sdk ==1.19.1
  • setproctitle ==1.3.2
  • smmap ==5.0.0
  • thop ==0.1.1
  • threadpoolctl ==3.1.0
  • torch-cluster ==1.6.1
  • torch-geometric ==2.3.1
  • torch-scatter ==2.1.1
  • torch-sparse ==0.6.17
  • torch-spline-conv ==1.2.2
  • torch-summary ==1.4.5
  • tqdm ==4.65.0
  • ultralytics ==8.0.82
  • wandb ==0.15.2
  • zipp ==3.15.0