phynteny

Predict the function of phage hypothetical proteins using an LSTM model trained with Phage Synteny

https://github.com/susiegriggo/phynteny

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary
Last synced: 6 months ago · JSON representation

Repository

Predict the function of phage hypothetical proteins using an LSTM model trained with Phage Synteny

Basic Info
  • Host: GitHub
  • Owner: susiegriggo
  • License: mit
  • Language: PureBasic
  • Default Branch: main
  • Homepage:
  • Size: 1.12 GB
Statistics
  • Stars: 50
  • Watchers: 5
  • Forks: 3
  • Open Issues: 5
  • Releases: 0
Created over 3 years ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

Phynteny: Synteny-based annotation of bacteriophage genes

READ THIS: A new version of Phynteny, Phynteny Transformer, is now available. It should provide more accurate results.

Approximately 65% of all bacteriophage (phage) genes cannot be assigned a known biological function. Phynteny uses a long short-term memory (LSTM) model trained on phage synteny (the conserved gene order across phages) to assign hypothetical phage proteins to a PHROG category.

Phynteny is still a work in progress and the LSTM model has not yet been optimised. Use with caution!

NOTE: This version of Phynteny will only annotate phages with 120 genes or fewer due to the architecture of the LSTM. We aim to adjust this in future versions.
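Because of the 120-gene limit, it can be useful to check a genome before running Phynteny. The following is a minimal sketch using only the Python standard library. It assumes that "genes" corresponds to CDS features in a single-record GenBank file, which the README does not state explicitly; the function names and the tiny example record are hypothetical.

```python
import re

MAX_GENES = 120  # limit stated in the note above for this LSTM version

# In the GenBank flat-file format, feature keys start in column 6,
# so a CDS feature line begins with five spaces followed by "CDS".
CDS_LINE = re.compile(r"^ {5}CDS\s", re.MULTILINE)


def count_cds(genbank_text: str) -> int:
    """Count CDS features in the text of a single GenBank record."""
    return len(CDS_LINE.findall(genbank_text))


def within_gene_limit(genbank_text: str, limit: int = MAX_GENES) -> bool:
    """Return True if the record has no more than `limit` CDS features."""
    return count_cds(genbank_text) <= limit


# Hypothetical, heavily truncated record used only to exercise the functions.
example = """LOCUS       test_phage   300 bp    DNA     linear   PHG
FEATURES             Location/Qualifiers
     CDS             1..90
                     /function="unknown function"
     CDS             complement(100..250)
                     /function="tail"
ORIGIN
//
"""
print(count_cds(example), within_gene_limit(example))
```

For a file with several records, the text would need to be split on the `//` record terminator first, since the limit applies per phage.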

Dependencies

Phynteny installation requires Python 3.8 or above. You will need the following Python dependencies to run Phynteny and its related support scripts. The latest tested versions of the dependencies are:

* python - version 3.10.0
* sklearn - version 1.2.2
* biopython - version 1.81
* numpy - version 1.21.0 (Windows, Linux, Apple Intel), version 1.24.0 (Apple M1/M2)
* tensorflow - version 2.9.0 (Windows, Linux, Apple Intel), tensorflow-macos version 2.11 (Apple M1/M2)
* pandas - version 2.0.2
* loguru - version 0.7.0
* click - version 8.1.3

We recommend GPU support if you are training Phynteny. This requires CUDA and cuDNN:

* CUDA toolkit - version 11.2
* cuDNN - version 8.1.1
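To compare an existing environment against the tested versions, something like the following sketch may help. It uses only `importlib.metadata` from the standard library. The distribution names are assumptions (for example, sklearn is published on PyPI as `scikit-learn`), and numpy and tensorflow are left out because their tested versions depend on the platform.

```python
from importlib import metadata

# Versions copied from the dependency list in this section; the
# distribution names used as keys are assumptions, not taken from the README.
TESTED_VERSIONS = {
    "scikit-learn": "1.2.2",
    "biopython": "1.81",
    "pandas": "2.0.2",
    "loguru": "0.7.0",
    "click": "8.1.3",
}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(tested=TESTED_VERSIONS):
    """Return (name, tested_version, installed_version) for each package."""
    return [(name, wanted, installed_version(name)) for name, wanted in tested.items()]


for name, wanted, found in report():
    print(f"{name}: tested {wanted}, installed {found or 'not installed'}")
```

A mismatch is not necessarily a problem; these are only the latest versions the authors report having tested.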

Installation

Option 1: Installing Phynteny using conda (recommended)

You can install Phynteny from bioconda at https://anaconda.org/bioconda/phynteny. Make sure you have conda installed.

```bash
# create conda environment and install phynteny
conda create -n phynteny -c bioconda phynteny

# activate environment
conda activate phynteny

# install phynteny
conda install -c bioconda phynteny
```

NOTE: bioconda installations of Phynteny do not have GPU support. This is fine for most uses but does not enable training of Phynteny models.

Now you can go to Install Models to install pre-trained phynteny models.

Option 2: Installing Phynteny using pip

You can install Phynteny from PyPI at https://pypi.org/project/phynteny/. Make sure you have pip and mamba installed.

pip install phynteny

NOTE: pip installation is recommended for training Phynteny models

Now you can go to Install Models to install pre-trained phynteny models.

Option 3: Installing Phynteny from source

If all else fails you can install Phynteny from this repo.

git clone https://github.com/susiegriggo/Phynteny.git --branch main --depth 1
cd Phynteny
pip install .

Now you can go to Install Models to install pre-trained phynteny models.

Install Models

Once you've installed Phynteny you'll need to download the pre-trained models:

install_models

If you would like to specify a particular location to download the models, run:

install_models -o <path/to/database_dir>

If for some reason this does not work, you can download the pre-trained models from Zenodo and untar them in a location of your choice.
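For the manual route, the untar step can also be done from Python with the standard `tarfile` module. This is only a sketch: the archive name in the usage comment is hypothetical, and the README does not say how the Zenodo archive is named or laid out.

```python
import tarfile
from pathlib import Path


def untar_models(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # The default mode "r" detects gzip/bz2/xz compression automatically.
    with tarfile.open(archive_path) as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Hypothetical usage; substitute the file actually downloaded from Zenodo:
# untar_models("phynteny_models.tar.gz", "path/to/database_dir")
```

Only extract archives from a source you trust, since tar files can contain unexpected paths.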

Usage

Phynteny takes a GenBank file containing PHROG annotations as input. If your phage is not yet in this format, pharokka can convert your phage (in FASTA format) to a GenBank file with PHROG annotations. Phynteny will then return a GenBank file and a table containing the details of the predictions made. Each prediction is accompanied by a 'phynteny score', which ranges from 1 to 10, and a recalibrated confidence score.

Recommended

phynteny tests/data/test_phage.gbk -o test_phynteny

Custom

If you wish to specify your own LSTM model, run:

phynteny test_phage.gbk -o test_phage_phynteny -m your_models -t confidence_dict.pkl

Details of how to train the Phynteny models and generate confidence estimates are given below.
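When annotating many genomes, it may be convenient to drive the command line from Python. The sketch below builds the same two invocations shown in this section using only the `-o`, `-m` and `-t` options that appear there. The helper names `build_phynteny_command` and `run_phynteny` are hypothetical and not part of Phynteny.

```python
import shlex
import subprocess


def build_phynteny_command(genbank, outdir, models=None, confidence_dict=None):
    """Build the argument list for a phynteny run (options as shown above)."""
    cmd = ["phynteny", str(genbank), "-o", str(outdir)]
    if models is not None:
        cmd += ["-m", str(models)]
    if confidence_dict is not None:
        cmd += ["-t", str(confidence_dict)]
    return cmd


def run_phynteny(genbank, outdir, **kwargs):
    """Run phynteny; requires phynteny to be installed and on the PATH."""
    cmd = build_phynteny_command(genbank, outdir, **kwargs)
    return subprocess.run(cmd, check=True)


# Print the recommended command without executing it.
print(shlex.join(build_phynteny_command("tests/data/test_phage.gbk", "test_phynteny")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.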

Train Phynteny

Phynteny has already been trained for you on a dataset containing over 1 million prophages! If you feel inclined to generate your own Phynteny model using your own dataset, instructions and training scripts are provided here.

Performance

Coming soon: Notebooks demonstrating the performance of the model

Bugs and Suggestions

If you break Phynteny or would like to make any suggestions please open an issue or email me at susie.grigson@flinders.edu.au

Wow! How can I cite this incredible piece of work?

The Phynteny manuscript is currently in preparation. In the meantime, please cite Phynteny as: Grigson, S. R., Mallawaarachchi, V., Roach, M. R., Papudeshi, B., Bouras, G., Decewicz, P., Dinsdale, E. A. & Edwards, R. A. (2023). Phynteny: Synteny-based annotation of phage genomes. DOI: 10.5281/zenodo.8128917

If you use pharokka to annotate your phage before using Phynteny please cite it as well: Bouras, G., Nepal, R., Houtak, G., Psaltis, A. J., Wormald, P. J., & Vreugde, S. (2023). Pharokka: a fast scalable bacteriophage annotation tool. Bioinformatics, 39(1), btac776.

If you found Phynteny useful and would like to get even better annotations for your phages check out phold!

Owner

  • Name: Susie Grigson
  • Login: susiegriggo
  • Kind: user
  • Location: Adelaide
  • Company: Flinders University

Bioinformatics PhD student

GitHub Events

Total
  • Issues event: 3
  • Watch event: 12
  • Issue comment event: 4
  • Push event: 2
  • Pull request event: 3
  • Create event: 1
Last Year
  • Issues event: 3
  • Watch event: 12
  • Issue comment event: 4
  • Push event: 2
  • Pull request event: 3
  • Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Fazel-AVB (1)
  • igortru (1)
  • samuelmontgomery (1)
  • GeoMicroSoares (1)
  • bhagavadgitadu22 (1)
Pull Request Authors
  • gbouras13 (2)
  • susiegriggo (2)
  • linsalrob (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 19 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 4
  • Total maintainers: 1
pypi.org: phynteny

Phynteny: Synteny-based prediction of bacteriophage genes

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 19 Last month
Rankings
Dependent packages count: 10.0%
Stargazers count: 13.3%
Forks count: 19.1%
Dependent repos count: 21.8%
Average: 24.2%
Downloads: 56.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • Babel2.9.1 *
  • Cython0.29.28 *
  • Jinja23.0.3 *
  • Keras-Preprocessing1.1.2 *
  • Markdown3.3.7 *
  • MarkupSafe2.1.1 *
  • PhiSpy4.2.21 *
  • Pillow9.1.1 *
  • PyJWT2.4.0 *
  • PyQt55.12.3 *
  • PyQt5_sip4.19.18 *
  • PyQtChart5.12 *
  • PyQtWebEngine5.12.1 *
  • PySocks1.7.1 *
  • PyYAML6.0 *
  • Pygments2.11.2 *
  • Rtree0.9.7 *
  • Send2Trash1.8.0 *
  • Shapely1.8.1.post1 *
  • Werkzeug2.1.2 *
  • absl-py =0.15.0
  • aiohttp =3.8.1
  • aiosignal =1.2.0
  • alphashape1.3.1 *
  • anyio3.5.0 *
  • argon2-cffi-bindings21.2.0 *
  • argon2-cffi21.3.0 *
  • astor0.8.1 *
  • asttokens2.0.5 *
  • astunparse1.6.3 *
  • async-timeout4.0.2 *
  • attrs21.4.0 *
  • backcall0.2.0 *
  • backports.functools-lru-cache1.6.4 *
  • bcbio-gff0.6.9 *
  • beautifulsoup44.10.0 *
  • biom-format2.1.10 *
  • biopython1.79 *
  • bleach4.1.0 *
  • blinker1.4 *
  • bokeh2.4.2 *
  • brotlipy0.7.0 *
  • bx-python0.8.13 *
  • cachetools5.0.0 *
  • cairocffi1.2.0 *
  • certifi2022.6.15 *
  • cffi1.15.0 *
  • chardet4.0.0 *
  • charset-normalizer2.0.12 *
  • click-log0.4.0 *
  • click8.0.4 *
  • cloudpickle2.0.0 *
  • colorcet3.0.0 *
  • crc64iso0.0.2 *
  • cryptography36.0.1 *
  • cycler0.11.0 *
  • debugpy1.5.1 *
  • decorator5.1.1 *
  • defusedxml0.7.1 *
  • deprecation2.1.0 *
  • distlib0.3.6 *
  • dna-features-viewer3.1.1 *
  • entrypoints0.4 *
  • executing0.8.3 *
  • fastdist1.1.2 *
  • filelock3.8.0 *
  • flatbuffers1.12 *
  • flit_core3.7.1 *
  • fonttools4.33.3 *
  • frozenlist1.3.0 *
  • future0.18.2 *
  • gast0.3.3 *
  • gensim4.2.0 *
  • google-auth-oauthlib0.4.6 *
  • google-auth2.6.6 *
  • google-pasta0.2.0 *
  • graphbin1.4 *
  • grpcio1.32.0 *
  • h5py2.10.0 *
  • hmmlearn0.2.7 *
  • holoviews1.14.8 *
  • huggingface-hub0.9.1 *
  • hyperopt0.2.7 *
  • idna3.3 *
  • igraph0.9.9 *
  • importlib-metadata4.11.3 *
  • importlib-resources5.4.0 *
  • iniconfig1.1.1 *
  • ipykernel6.9.2 *
  • ipython-genutils0.2.0 *
  • ipython8.1.1 *
  • jedi0.18.1 *
  • joblib1.0.1 *
  • json50.9.5 *
  • jsonschema4.4.0 *
  • jupyter-client7.1.2 *
  • jupyter-core4.9.2 *
  • jupyter-server1.15.6 *
  • jupyterlab-pygments0.1.2 *
  • jupyterlab-server2.11.0 *
  • jupyterlab3.3.2 *
  • keras2.10.0 *
  • kiwisolver1.4.2 *
  • kraken-biom1.2.0 *
  • labelprop0.1.3 *
  • libclang14.0.6 *
  • llvmlite0.38.0 *
  • matplotlib-inline0.1.3 *
  • matplotlib3.5.2 *
  • mistune0.8.4 *
  • multidict6.0.2 *
  • munkres1.1.4 *
  • nbclassic0.3.7 *
  • nbclient0.5.13 *
  • nbconvert6.4.4 *
  • nbformat5.2.0 *
  • ncafs0.2 *
  • nest-asyncio1.5.4 *
  • networkx2.8.2 *
  • notebook-shim0.1.0 *
  • notebook6.4.10 *
  • numba0.55.1 *
  • numpy1.19.5 *
  • oauthlib3.2.0 *
  • opt-einsum3.3.0 *
  • packaging21.3 *
  • pandas1.2.5 *
  • pandocfilters1.5.0 *
  • panel0.12.6 *
  • param1.12.0 *
  • parso0.8.3 *
  • pexpect4.8.0 *
  • pickle50.0.11 *
  • pickleshare0.7.5 *
  • pip22.3.1 *
  • plotting0.0.7 *
  • pluggy1.0.0 *
  • prettytable3.4.1 *
  • prometheus-client0.13.1 *
  • prompt-toolkit3.0.27 *
  • protobuf3.19.6 *
  • psutil5.9.0 *
  • ptyprocess0.7.0 *
  • pure-eval0.2.2 *
  • py1.11.0 *
  • py4j0.10.9.5 *
  • pyOpenSSL22.0.0 *
  • pyasn1-modules0.2.7 *
  • pyasn10.4.8 *
  • pycparser2.21 *
  • pyct0.4.8 *
  • pyhard1.9.3 *
  • pyhhmm2.0.1 *
  • pyispace0.3.2 *
  • pynndescent0.5.7 *
  • pyparsing3.0.7 *
  • pyrsistent0.18.1 *
  • pysam0.16.0.1 *
  • pysan0.2.4 *
  • pytest7.1.1 *
  • python-Levenshtein0.12.2 *
  • python-dateutil2.8.2 *
  • python-lzo1.14 *
  • pytz2021.3 *
  • pyu2f0.1.5 *
  • pyviz-comms2.1.0 *
  • pyzmq22.3.0 *
  • regex2022.9.11 *
  • requests-oauthlib1.3.1 *
  • requests2.27.1 *
  • rsa4.8 *
  • scikit-learn1.0.2 *
  • scipy1.8.1 *
  • seaborn0.11.2 *
  • seqtk0.5.0 *
  • setuptools65.4.1 *
  • six1.15.0 *
  • sklearn0.0 *
  • smart-open6.0.0 *
  • sniffio1.2.0 *
  • soupsieve2.3.1 *
  • stack-data0.2.0 *
  • stellargraph1.2.1 *
  • superfocus0.35 *
  • tensorboard-data-server0.6.0 *
  • tensorboard-plugin-wit1.8.1 *
  • tensorboard2.9.0 *
  • tensorflow-estimator2.4.0 *
  • tensorflow-io-gcs-filesystem0.27.0 *
  • tensorflow2.4.1 *
  • termcolor1.1.0 *
  • terminado0.13.3 *
  • testpath0.6.0 *
  • texttable1.6.4 *
  • threadpoolctl2.2.0 *
  • tokenizer3.4.1 *
  • tokenizers0.12.1 *
  • torch1.12.1 *
  • tornado6.1 *
  • tqdm4.64.0 *
  • traitlets5.1.1 *
  • transformers4.21.3 *
  • trimesh3.10.5 *
  • typer0.4.0 *
  • typing-extensions3.7.4.3 *
  • umap-learn0.5.3 *
  • umap0.1.1 *
  • unicodedata214.0.0 *
  • urllib31.26.9 *
  • wcwidth0.2.5 *
  • webencodings0.5.1 *
  • websocket-client1.3.1 *
  • wheel0.37.1 *
  • wrapt1.12.1 *
  • yarl1.7.2 *
  • zipp3.7.0 *
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
.github/workflows/testing.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
environment.yml pypi
setup.py pypi
  • alive-progress >=3.0.1
  • biopython >=1.79
  • click *
  • joblib *
  • loguru *
  • numpy <=1.26.4
  • pandas *
  • requests >=2.25.1
  • scikit-learn <=1.2.2
  • tensorflow-macos ==2.9.1