phables

🫧🧬 From fragmented assemblies to high-quality bacteriophage genomes

https://github.com/vini2/phables

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ✓
    CITATION.cff file
    Found CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ✓
    DOI references
    Found 37 DOI reference(s) in README
  • ○
    Academic publication links
  • ○
    Committers with academic emails
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (14.4%) to scientific vocabulary

Keywords

assembly-graphs bacteriophages bioinformatics genomics metagenomics
Last synced: 4 months ago

Repository

Basic Info
Statistics
  • Stars: 81
  • Watchers: 2
  • Forks: 5
  • Open Issues: 7
  • Releases: 18
Topics
assembly-graphs bacteriophages bioinformatics genomics metagenomics
Created over 3 years ago · Last pushed 5 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

Phables: from fragmented assemblies to high-quality bacteriophage genomes

Phables resolves bacteriophage genomes from assembly graphs of viral metagenomic data. It models phage-like components in the viral metagenomic assembly as flow networks, formulates path recovery as a minimum flow decomposition (MFD) problem, and resolves the genomic paths that correspond to the resulting flow paths. Phables uses the Minimum Flow Decomposition via Integer Linear Programming (MFD-ILP) implementation to obtain the flow paths.
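To make the idea of decomposing a flow network into paths concrete, here is a minimal sketch in Python. It is not the Phables implementation and it is not MFD-ILP: it swaps in a simple greedy heuristic that repeatedly peels off the source-to-sink path with the largest bottleneck, so it does not guarantee a minimum number of paths. The function names, the toy graph, and the use of edge weights as stand-ins for read coverage are all assumptions made for illustration.

```python
import heapq


def widest_path(flow, source, sink):
    """Find the source-to-sink path with the largest bottleneck flow.

    `flow` maps node -> {neighbour: remaining flow}. Returns (path, width)
    or None when no path with positive flow remains.
    """
    best = {source: float("inf")}
    parent = {}
    heap = [(-best[source], source)]
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0):
            continue  # stale heap entry
        if node == sink:
            break
        for nxt, remaining in flow.get(node, {}).items():
            w = min(width, remaining)
            if w > 0 and w > best.get(nxt, 0):
                best[nxt] = w
                parent[nxt] = node
                heapq.heappush(heap, (-w, nxt))
    if sink not in best:
        return None
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return path, best[sink]


def greedy_decompose(flow, source, sink):
    """Greedy heuristic: peel off widest paths until no flow is left."""
    residual = {u: dict(vs) for u, vs in flow.items()}
    paths = []
    while True:
        found = widest_path(residual, source, sink)
        if found is None:
            break
        path, width = found
        for u, v in zip(path, path[1:]):
            residual[u][v] -= width
        paths.append((path, width))
    return paths


# Hypothetical component: one shared segment "a" that branches into two variants.
toy_flow = {
    "s": {"a": 8},
    "a": {"b": 5, "c": 3},
    "b": {"t": 5},
    "c": {"t": 3},
}

for path, weight in greedy_decompose(toy_flow, "s", "t"):
    print(" -> ".join(path), "with flow", weight)
```

Each recovered path plays the role of one candidate genome passing through the component, and its flow value plays the role of that genome's coverage. An exact ILP formulation, as used by MFD-ILP, instead searches for the smallest set of weighted paths that explains every edge.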

For detailed instructions on installation and usage, please refer to the documentation hosted at Read the Docs.

Phables is available on Bioconda at https://anaconda.org/bioconda/phables and on PyPI at https://pypi.org/project/phables/. Feel free to pick your package manager, but we recommend that you use conda.

NEW: Phables is now available as a Docker container from Docker Hub.

Setting up Phables

Option 1: Installing Phables using conda (recommended)

You can install Phables from Bioconda at https://anaconda.org/bioconda/phables. Make sure you have conda installed.

```bash
# create conda environment and install phables
conda create -n phables -c conda-forge -c anaconda -c bioconda phables

# activate environment
conda activate phables
```

Now you can go to Setting up Gurobi to configure Gurobi.

Option 2: Installing Phables using pip

You can install Phables from PyPI at https://pypi.org/project/phables/. Make sure you have pip and mamba installed.

```bash
pip install phables
```

Now you can go to Setting up Gurobi to configure Gurobi.

Setting up Gurobi

The MFD implementation uses the linear programming solver Gurobi. The Phables conda environment and pip setup do not include Gurobi, so you have to install it with one of the following commands, depending on your package manager.

```bash
# conda
conda install -c gurobi gurobi

# pip
pip install gurobipy
```

To handle large models without any model size limitations, activate your (academic) license after installing Gurobi by adding the key with the following command. You only have to do this once.

```bash
grbgetkey <KEY>
```

You can refer to further instructions at https://www.gurobi.com/academia/academic-program-and-licenses/.

Test the installation

After setting up, run the following command to print out the Phables help message.

```bash
phables --help
```
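If you prefer to check the setup from Python, the following minimal sketch reports whether the `phables` executable is on your `PATH` and whether the `gurobipy` module can be imported. The function name `check_setup` is hypothetical and not part of Phables. The check only looks for presence; it does not validate the Gurobi license.

```python
import importlib.util
import shutil


def check_setup(executable="phables", module="gurobipy"):
    """Report whether an executable is on PATH and a module is importable."""
    return {
        executable: shutil.which(executable) is not None,
        module: importlib.util.find_spec(module) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_setup().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```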

Quick Start Guide

Phables is powered by Snaketool, which packs all the setup, testing, preprocessing and running steps into an easy-to-use pipeline.

Set up the databases

```bash
# Download and set up the databases - you only have to do this once
phables install
```

Run on test data

```bash
phables test
```

Run on your own data

```bash
# Run Phables using short read data
phables run --input assembly_graph.gfa --reads fastq/ --threads 8

# Run Phables using long read data
phables run --input assembly_graph.gfa --reads fastq/ --threads 8 --longreads
```

Please refer to the documentation hosted at Read the Docs for further information on how to run Phables.
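When scripting runs over several samples, it can help to assemble the command line programmatically. The sketch below only builds the argument list from the flags shown above (`--input`, `--reads`, `--threads`, `--longreads`); the helper name `build_run_command` is hypothetical and not part of Phables, and the paths are placeholders.

```python
import shlex


def build_run_command(graph, reads_dir, threads=8, long_reads=False):
    """Build the argument list for a `phables run` invocation."""
    cmd = [
        "phables", "run",
        "--input", str(graph),
        "--reads", str(reads_dir),
        "--threads", str(threads),
    ]
    if long_reads:
        cmd.append("--longreads")
    return cmd


# Print the command so it can be inspected before it is executed.
print(shlex.join(build_run_command("assembly_graph.gfa", "fastq/")))
```

To execute the command, pass the returned list to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with unusual file names.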

Issues and Questions

If you want to test (or break) Phables, give it a try and report any issues and suggestions under Phables Issues.

If you have any questions, please have a look at the Phables FAQ page. If your question is not answered there, feel free to post it under Phables Issues.

Contributing to Phables

Are you interested in contributing to the Phables project? If so, you can check out the contributing guidelines in CONTRIBUTING.md.

Acknowledgement

Phables uses the Gurobi implementation of MFD-ILP and code snippets from STRONG, METAMVGL, GraphBin, MetaCoAG and Hecatomb. Special thanks are owed to Ryan Wick for developing Bandage to visualise assembly graphs, which I heavily rely upon to investigate, develop and optimise my methods. The Phables logo was designed by Amber Skye.

Citation

Phables is published in Bioinformatics at DOI: 10.1093/bioinformatics/btad586.

If you use Phables in your work, please cite it as:

Vijini Mallawaarachchi, Michael J Roach, Przemyslaw Decewicz, Bhavya Papudeshi, Sarah K Giles, Susanna R Grigson, George Bouras, Ryan D Hesse, Laura K Inglis, Abbey L K Hutton, Elizabeth A Dinsdale, Robert A Edwards, Phables: from fragmented assemblies to high-quality bacteriophage genomes, Bioinformatics, Volume 39, Issue 10, October 2023, btad586, https://doi.org/10.1093/bioinformatics/btad586

```bibtex
@article{10.1093/bioinformatics/btad586,
    author = {Mallawaarachchi, Vijini and Roach, Michael J and Decewicz, Przemyslaw and Papudeshi, Bhavya and Giles, Sarah K and Grigson, Susanna R and Bouras, George and Hesse, Ryan D and Inglis, Laura K and Hutton, Abbey L K and Dinsdale, Elizabeth A and Edwards, Robert A},
    title = "{Phables: from fragmented assemblies to high-quality bacteriophage genomes}",
    journal = {Bioinformatics},
    volume = {39},
    number = {10},
    pages = {btad586},
    year = {2023},
    month = {09},
    abstract = "{Microbial communities have a profound impact on both human health and various environments. Viruses infecting bacteria, known as bacteriophages or phages, play a key role in modulating bacterial communities within environments. High-quality phage genome sequences are essential for advancing our understanding of phage biology, enabling comparative genomics studies and developing phage-based diagnostic tools. Most available viral identification tools consider individual sequences to determine whether they are of viral origin. As a result of challenges in viral assembly, fragmentation of genomes can occur, and existing tools may recover incomplete genome fragments. Therefore, the identification and characterization of novel phage genomes remain a challenge, leading to the need of improved approaches for phage genome recovery. We introduce Phables, a new computational method to resolve phage genomes from fragmented viral metagenome assemblies. Phables identifies phage-like components in the assembly graph, models each component as a flow network, and uses graph algorithms and flow decomposition techniques to identify genomic paths. Experimental results of viral metagenomic samples obtained from different environments show that Phables recovers on average over 49\% more high-quality phage genomes compared to existing viral identification tools. Furthermore, Phables can resolve variant phage genomes with over 99\% average nucleotide identity, a distinction that existing tools are unable to make. Phables is available on GitHub at https://github.com/Vini2/phables.}",
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btad586},
    url = {https://doi.org/10.1093/bioinformatics/btad586},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/doi/10.1093/bioinformatics/btad586/51972145/btad586.pdf},
}
```

Also, please cite the following tools/databases used by Phables.

  • Roach MJ, Pierce-Ward NT, Suchecki R, Mallawaarachchi V, Papudeshi B, et al. Ten simple rules and a template for creating workflows-as-applications. PLOS Computational Biology 18(12) (2022): e1010705. https://doi.org/10.1371/journal.pcbi.1010705
  • Terzian P, Olo Ndela E, Galiez C, Lossouarn J, Pérez Bucio RE, Mom R, Toussaint A, Petit MA, Enault F. PHROG: families of prokaryotic virus proteins clustered using remote homology. NAR Genomics and Bioinformatics, Volume 3, Issue 3, lqab067 (2021). https://doi.org/10.1093/nargab/lqab067
  • Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 35, 1026–1028 (2017). https://doi.org/10.1038/nbt.3988
  • Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100 (2018). https://doi.org/10.1093/bioinformatics/bty191
  • Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools, Bioinformatics, Volume 25, Issue 16, Pages 2078–2079 (2009). https://doi.org/10.1093/bioinformatics/btp352
  • Woodcroft BJ, Newell R, CoverM: Read coverage calculator for metagenomics (2017). https://github.com/wwood/CoverM
  • Roach, M. J., Hart, B. J., Beecroft, S. J., Papudeshi, B., Inglis, L. K., Grigson, S. R., Mallawaarachchi, V., Bouras, G., & Edwards, R. A. Koverage: Read-coverage analysis for massive (meta)genomics datasets. Journal of Open Source Software, 9(94), 6235, (2024). https://doi.org/10.21105/joss.06235
  • Hagberg AA, Schult DA, and Swart PJ. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008), Gäel Varoquaux, Travis Vaught, and Jarrod Millman (Eds), (Pasadena, CA USA), pp. 11–15 (2008).
  • Gurobi Optimization. https://www.gurobi.com/.

Owner

  • Name: Vijini Mallawaarachchi
  • Login: Vini2
  • Kind: user
  • Location: Adelaide, Australia
  • Company: Flinders University

Research Associate at Flinders University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite our article in Bioinformatics as below."
authors:
- family-names: "Mallawaarachchi"
  given-names: "Vijini"
  orcid: "https://orcid.org/0000-0002-2651-8719"
- family-names: "Roach"
  given-names: "Michael J."
  orcid: "https://orcid.org/0000-0003-1488-5148"
- family-names: "Decewicz"
  given-names: "Przemyslaw"
  orcid: "https://orcid.org/0000-0002-5621-7124"
- family-names: "Papudeshi"
  given-names: "Bhavya"
  orcid: "https://orcid.org/0000-0001-5359-3100"
- family-names: "Giles"
  given-names: "Sarah K."
  orcid: "https://orcid.org/0000-0002-4395-060X"
- family-names: "Grigson"
  given-names: "Susanna R."
  orcid: "https://orcid.org/0000-0003-4738-3451"
- family-names: "Bouras"
  given-names: "George"
  orcid: "https://orcid.org/0000-0002-5885-4186"
- family-names: "Hesse"
  given-names: "Ryan D."
  orcid: "https://orcid.org/0000-0001-9366-5631"
- family-names: "Inglis"
  given-names: "Laura K."
  orcid: "https://orcid.org/0000-0001-7919-8563"
- family-names: "Hutton"
  given-names: "Abbey LK."
  orcid: "https://orcid.org/0000-0002-2474-1327"
- family-names: "Dinsdale"
  given-names: "Elizabeth A."
  orcid: "https://orcid.org/0000-0002-2177-203X"
- family-names: "Edwards"
  given-names: "Robert A."
  orcid: "https://orcid.org/0000-0001-8383-8949"
title: "Phables: from fragmented assemblies to high-quality bacteriophage genomes"
doi: 10.1093/bioinformatics/btad586
date-released: 2017-12-18
url: "https://github.com/Vini2/phables"
preferred-citation:
  type: article
  authors:
  - family-names: "Mallawaarachchi"
    given-names: "Vijini"
    orcid: "https://orcid.org/0000-0002-2651-8719"
  - family-names: "Roach"
    given-names: "Michael J."
    orcid: "https://orcid.org/0000-0003-1488-5148"
  - family-names: "Decewicz"
    given-names: "Przemyslaw"
    orcid: "https://orcid.org/0000-0002-5621-7124"
  - family-names: "Papudeshi"
    given-names: "Bhavya"
    orcid: "https://orcid.org/0000-0001-5359-3100"
  - family-names: "Giles"
    given-names: "Sarah K."
    orcid: "https://orcid.org/0000-0002-4395-060X"
  - family-names: "Grigson"
    given-names: "Susanna R."
    orcid: "https://orcid.org/0000-0003-4738-3451"
  - family-names: "Bouras"
    given-names: "George"
    orcid: "https://orcid.org/0000-0002-5885-4186"
  - family-names: "Hesse"
    given-names: "Ryan D."
    orcid: "https://orcid.org/0000-0001-9366-5631"
  - family-names: "Inglis"
    given-names: "Laura K."
    orcid: "https://orcid.org/0000-0001-7919-8563"
  - family-names: "Hutton"
    given-names: "Abbey LK."
    orcid: "https://orcid.org/0000-0002-2474-1327"
  - family-names: "Dinsdale"
    given-names: "Elizabeth A."
    orcid: "https://orcid.org/0000-0002-2177-203X"
  - family-names: "Edwards"
    given-names: "Robert A."
    orcid: "https://orcid.org/0000-0001-8383-8949"
  doi: "10.1093/bioinformatics/btad586"
  journal: "Bioinformatics"
  month: 9
  title: "Phables: from fragmented assemblies to high-quality bacteriophage genomes"
  start: "btad586"
  issue: 10
  volume: 39
  year: 2023

GitHub Events

Total
  • Issues event: 7
  • Watch event: 4
  • Issue comment event: 5
Last Year
  • Issues event: 7
  • Watch event: 4
  • Issue comment event: 5

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 586
  • Total Committers: 3
  • Avg Commits per committer: 195.333
  • Development Distribution Score (DDS): 0.067
Past Year
  • Commits: 4
  • Committers: 1
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Vijini Mallawaarachchi v****i@g****m 547
Michael Roach b****e@g****m 36
linsalrob r****s@g****m 3

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 48
  • Total pull requests: 15
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 22 hours
  • Total issue authors: 23
  • Total pull request authors: 3
  • Average comments per issue: 2.4
  • Average comments per pull request: 0.13
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Issue authors: 5
  • Pull request authors: 0
  • Average comments per issue: 0.4
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Vini2 (19)
  • pdec (3)
  • ZarulHanifah (2)
  • minhtrung1997 (2)
  • ilnamkang (2)
  • adijatj (1)
  • bhagavadgitadu22 (1)
  • shenwei356 (1)
  • ellasiera (1)
  • kalonji08 (1)
  • JayWichanan (1)
  • miryamcarrillo98 (1)
  • xtc2002 (1)
  • Evapatrick (1)
  • rdenise (1)
Pull Request Authors
  • beardymcjohnface (11)
  • Vini2 (5)
  • linsalrob (1)
Top Labels
Issue Labels
enhancement (18) documentation (2) bug (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 126 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 24
  • Total maintainers: 1
pypi.org: phables

Phables: from fragmented assemblies to high-quality bacteriophage genomes

  • Versions: 24
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 126 Last month
Rankings
Stargazers count: 10.1%
Dependent packages count: 10.1%
Average: 16.6%
Forks count: 16.9%
Dependent repos count: 21.5%
Downloads: 24.5%
Maintainers (1)
Last synced: 4 months ago

Dependencies

.github/workflows/testing.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • codecov/codecov-action v3 composite
docs/requirements.txt pypi
  • Jinja2 >=2.10.2
  • Markdown >=3.2.1,<3.4
  • PyYAML >=5.2
  • babel >=2.9.0
  • click >=7.0
  • ghp-import >=1.0
  • importlib_metadata >=4.3
  • jinja2 ==3.0.3
  • mdx_gh_links >=0.2
  • mergedeep >=1.3.4
  • mkdocs >=1.3.1
  • mkdocs-material *
  • mkdocs-redirects >=1.0.1
  • packaging >=20.5
  • pygments >=2.12
  • pymdown-extensions *
  • pyyaml_env_tag >=0.1
  • watchdog >=2.0.0
setup.py pypi
  • biopython *
  • click *
  • gurobipy *
  • more-itertools *
  • networkx *
  • numpy *
  • pandas *
  • pysam *
  • python-igraph *
  • scipy *
  • tqdm *
.github/workflows/codeql.yml actions
  • actions/checkout v3 composite
  • github/codeql-action/analyze v2 composite
  • github/codeql-action/autobuild v2 composite
  • github/codeql-action/init v2 composite
.github/workflows/pypi-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
build/environment.yml pypi