phylo2vec

Phylo2Vec: a vector representation for binary trees

https://github.com/sbhattlab/phylo2vec

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 10
  • Watchers: 1
  • Forks: 7
  • Open Issues: 2
  • Releases: 8
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

Phylo2Vec


LGPL-3.0 License


Phylo2Vec (or phylo2vec) is a high-performance software package for encoding, manipulating, and analysing binary phylogenetic trees. At its core, the package contains a representation of binary trees that maps any tree topology with 𝑛 leaves bijectively to an integer vector of size 𝑛 − 1. Compared to the traditional Newick format, phylo2vec was designed with fast sampling, fast conversion/compression from Newick-format trees to the Phylo2Vec format, and rapid tree comparison in mind.

The current version features a core implementation in Rust, which provides significant performance improvements and memory efficiency. It remains available in Python (superseding the version described in the original paper) and R via dedicated wrappers, making it accessible to a broad audience in the bioinformatics community.

Link to the paper: https://doi.org/10.1093/sysbio/syae030
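To make the vector representation more concrete, here is a minimal pure-Python sketch that does not use the phylo2vec package. It assumes the validity constraint described in the paper, namely that entry `i` (0-indexed) of a vector lies in `{0, ..., 2i}`. The helper names `is_valid`, `sample_vector`, and `count_vectors` are hypothetical and are not part of the library's API.

```python
import random
from math import prod


def is_valid(v):
    """Check the assumed constraint: 0 <= v[i] <= 2 * i for every entry."""
    return all(0 <= x <= 2 * i for i, x in enumerate(v))


def sample_vector(n_leaves, rng=random):
    """Draw each entry independently and uniformly from its allowed range."""
    return [rng.randint(0, 2 * i) for i in range(n_leaves - 1)]


def count_vectors(n_leaves):
    """Number of valid vectors: 1 * 3 * 5 * ... * (2 * n_leaves - 3)."""
    return prod(2 * i + 1 for i in range(n_leaves - 1))


v = sample_vector(6, random.Random(42))
print(len(v), is_valid(v))  # 5 True
print(count_vectors(4))     # 15
```

Under this assumption, the number of valid vectors for 𝑛 leaves is (2𝑛 − 3)!!, which matches the number of rooted binary tree topologies on 𝑛 labelled leaves. For 4 leaves, both counts are 15.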

Installation

Pip

The easiest way to install the standard Python package is using pip:

```bash
pip install phylo2vec
```

Several optimization schemes based on Phylo2Vec are also available, but require extra dependencies. (See this notebook for a demo). To avoid bloating the standard package, these dependencies must be installed separately. To do so, run:

```bash
pip install phylo2vec[opt]
```
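After installing, one way to confirm that the package is visible to the current Python environment is to query its metadata. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the name `phylo2vec`, as in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="phylo2vec"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```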

Manual installation

  • We recommend setting up the pixi package management tool.
  • Clone the repository and install using pixi:

```bash
git clone https://github.com/sbhattlab/phylo2vec.git
cd phylo2vec
pixi run -e py-phylo2vec install-python
```

Because the core functionality is written in Rust, this command compiles the package before installing it.

Installing R package

Option 1: from a release (Windows, Mac, Ubuntu >= 22.04)

Retrieve one of the compiled binaries from the releases that fits your OS. Once the file is downloaded, simply run install.packages in your R command line.

```R
install.packages("/path/to/package_file", repos = NULL, type = 'source')
```

Option 2: using devtools

⚠️ This requires installing Rust to build the core package.

```R
devtools::install_github("sbhattlab/phylo2vec", subdir="./r-phylo2vec", build = FALSE)
```

Note: to download a specific version, use:

```R
devtools::install_github("sbhattlab/phylo2vec@vX.Y.Z", subdir="./r-phylo2vec", build = FALSE)
```

Option 3: manual installation

⚠️ This requires installing Rust to build the core package.

Clone the repository, then run the install.packages call below in your R command line.

Note: to download a specific version, you can use git checkout to a desired tag.

```bash
git clone https://github.com/sbhattlab/phylo2vec
cd phylo2vec
```

```R
install.packages("./r-phylo2vec", repos = NULL, type = 'source')
```

Basic Usage

Python

Conversion between Newick and vector representations

```python
import numpy as np
from phylo2vec import from_newick, to_newick

# Convert a vector to Newick string
v = np.array([0, 1, 2, 3, 4])
newick = to_newick(v)  # '(0,(1,(2,(3,(4,5)6)7)8)9)10;'

# Convert Newick string back to vector
v_converted = from_newick(newick)  # array([0, 1, 2, 3, 4], dtype=int16)
```

Tree Manipulation

```python
from phylo2vec.utils.vector import add_leaf, remove_leaf, reroot_at_random

# Add a leaf to an existing tree
v_new = add_leaf(v, 2)  # Add a leaf to the third position

# Remove a leaf
v_reduced = remove_leaf(v, 1)  # Remove the second leaf

# Random rerooting
v_rerooted = reroot_at_random(v)
```

Optimization

To run the hill climbing-based optimisation scheme presented in the original Phylo2Vec paper, run:

```python
# A hill-climbing scheme to optimize Phylo2Vec vectors
from phylo2vec.opt import HillClimbing

hc = HillClimbing(verbose=True)
hc_result = hc.fit("/path/to/your_fasta_file.fa")
```
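For readers unfamiliar with hill climbing, the general idea can be sketched in pure Python on a toy problem. This is not the objective or move set used by `phylo2vec.opt.HillClimbing`, which fits sequence data. The function `hill_climb`, the `toy_loss` objective, and the `target` vector are all hypothetical, and the sketch assumes that entry `i` of a valid vector lies in `{0, ..., 2i}`.

```python
import random


def hill_climb(v, loss, n_steps=1000, seed=0):
    """Greedy single-entry hill climbing over vectors with 0 <= v[i] <= 2 * i."""
    rng = random.Random(seed)
    best = list(v)
    best_loss = loss(best)
    for _ in range(n_steps):
        i = rng.randrange(len(best))
        candidate = list(best)
        candidate[i] = rng.randint(0, 2 * i)  # propose a new valid entry
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:  # keep only strict improvements
            best, best_loss = candidate, candidate_loss
    return best, best_loss


# Toy objective (hypothetical): number of entries that differ from a target.
target = [0, 2, 1, 5, 3]


def toy_loss(v):
    return sum(a != b for a, b in zip(v, target))


start = [0, 1, 2, 3, 4]
best, best_loss = hill_climb(start, toy_loss)
print(best, best_loss)
```

Each step changes a single entry and keeps the change only if the loss strictly decreases, so the loss never gets worse and every intermediate vector stays within the assumed valid range.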

Command-line interface (CLI)

We also provide a command-line interface for quick experimentation on phylo2vec-derived objects.

To see the available functions, run:

```bash
phylo2vec --help
```

Examples:

```bash
phylo2vec samplev 5 # Sample a vector with 5 leaves
phylo2vec samplem 5 # Sample a matrix with 5 leaves
phylo2vec from_newick '((0,1),2);' # Convert a Newick to a vector
phylo2vec from_newick '((0:0.3,1:0.1):0.5,2:0.4);' # Convert a Newick to a matrix
phylo2vec to_newick 0,1,2 # Convert a vector to Newick
phylo2vec to_newick $'0.0,1.0,2.0\n0.0,3.0,4.0' # Convert a matrix to Newick
```
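The `to_newick` examples pass a vector as comma-separated integers and a matrix as comma-separated rows split by newlines. If you need to read the same text formats in a script, a small parser along these lines may help. The function `parse_cli_input` is a hypothetical helper written for illustration, not part of phylo2vec, and it only assumes the two formats shown in the examples above.

```python
def parse_cli_input(text):
    """Parse '0,1,2' into a list of ints, or newline-separated rows into floats."""
    rows = [line.split(",") for line in text.strip().splitlines()]
    # A single row of plain non-negative integers is treated as a vector.
    if len(rows) == 1 and all(tok.strip().isdigit() for tok in rows[0]):
        return [int(tok) for tok in rows[0]]
    # Anything else is treated as a matrix of floats, one list per row.
    return [[float(tok) for tok in row] for row in rows]


print(parse_cli_input("0,1,2"))                     # [0, 1, 2]
print(parse_cli_input("0.0,1.0,2.0\n0.0,3.0,4.0"))  # [[0.0, 1.0, 2.0], [0.0, 3.0, 4.0]]
```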

Documentation

For comprehensive documentation, tutorials, and API reference, visit: https://phylo2vec.readthedocs.io

How to Contribute

We welcome contributions to Phylo2Vec! Here's how you can help:

  1. Fork the repository and create your branch from main
  2. Make your changes and add tests if applicable
  3. Run the tests to ensure they pass
  4. Submit a pull request with a detailed description of your changes

Please make sure to follow our coding standards and write appropriate tests for new features.

Thanks to our contributors so far!


License

This project is distributed under the GNU Lesser General Public License v3.0 (LGPL).

Citation

If you use Phylo2Vec in your research, please cite:

```bibtex
@article{10.1093/sysbio/syae030,
  author = {Penn, Matthew J and Scheidwasser, Neil and Khurana, Mark P and Duchêne, David A and Donnelly, Christl A and Bhatt, Samir},
  title = {Phylo2Vec: a vector representation for binary trees},
  journal = {Systematic Biology},
  year = {2024},
  month = {03},
  doi = {10.1093/sysbio/syae030},
  url = {https://doi.org/10.1093/sysbio/syae030},
}
```

Related Work

Owner

  • Name: sbhattlab
  • Login: sbhattlab
  • Kind: organization

JOSS Publication

phylo2vec: a library for vector-based phylogenetic tree manipulation
Published
October 28, 2025
Volume 10, Issue 114, Page 9040
Authors
Neil Scheidwasser ORCID
Section of Health Data Science and AI, University of Copenhagen, Copenhagen, Denmark
Ayush Nag ORCID
eScience Institute, University of Washington, Seattle, United States
Matthew J. Penn ORCID
Section of Health Data Science and AI, University of Copenhagen, Copenhagen, Denmark
Anthony Jakob ORCID
Independent researcher
Frederik Mølkjær Andersen ORCID
Section of Health Data Science and AI, University of Copenhagen, Copenhagen, Denmark
Mark Poulsen Khurana ORCID
Section of Health Data Science and AI, University of Copenhagen, Copenhagen, Denmark
Landung Setiawan ORCID
eScience Institute, University of Washington, Seattle, United States
David A. Duchêne ORCID
Section of Health Data Science and AI, University of Copenhagen, Copenhagen, Denmark
Samir Bhatt ORCID
Section of Health Data Science and AI, University of Copenhagen, Copenhagen, Denmark; MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, United Kingdom
Editor
Abhishek Tiwari ORCID
Tags
bioinformatics phylogenetics binary tree

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Scheidwasser"
    given-names: "Neil"
    orcid: "https://orcid.org/0000-0001-9922-0289"
  - family-names: "Nag"
    given-names: "Ayush"
    orcid: "https://orcid.org/0009-0008-1790-597X"
  - family-names: "Setiawan"
    given-names: "Landung"
    orcid: "https://orcid.org/0000-0002-1624-2667"
  - family-names: "Gordon"
    given-names: "Madeline"
    orcid: "https://orcid.org/0009-0003-6220-7218"
title: "Phylo2Vec: a vector representation for binary trees"
url: "https://phylo2vec.readthedocs.io"
repository-code: "https://github.com/sbhattlab/phylo2vec"
# -------------------------------
# Zenodo DOI will be added after the first release
# doi: 10.xxxx/zenodo.xxxxxxx
# -------------------------------
# Note for `version` and `date-released` meta below
# Pointing to a specific release/version will
# require having to update this file with each release
# version: x.x.x
# date-released: xxxx-xx-xx
# -------------------------------
preferred-citation:
  type: article
  authors:
  - family-names: "Penn"
    given-names: "Matthew J"
  - family-names: "Scheidwasser"
    given-names: "Neil"
  - family-names: "Khurana"
    given-names: "Mark P"
  - family-names: "Duchêne"
    given-names: "David A"
  - family-names: "Donnelly"
    given-names: "Christl A"
  - family-names: "Bhatt"
    given-names: "Samir"
  journal: "Systematic Biology"
  doi: https://doi.org/10.1093/sysbio/syae030
  title: "Phylo2Vec: a vector representation for binary trees"
  month: 03
  year: 2025

GitHub Events

Total
  • Create event: 16
  • Issues event: 17
  • Release event: 6
  • Watch event: 4
  • Delete event: 12
  • Issue comment event: 10
  • Push event: 70
  • Pull request review event: 61
  • Pull request review comment event: 43
  • Pull request event: 122
  • Fork event: 2
Last Year
  • Create event: 16
  • Issues event: 17
  • Release event: 6
  • Watch event: 4
  • Delete event: 12
  • Issue comment event: 10
  • Push event: 70
  • Pull request review event: 61
  • Pull request review comment event: 43
  • Pull request event: 122
  • Fork event: 2

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 380 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 20
  • Total maintainers: 1
pypi.org: phylo2vec

Phylo2Vec: integer vector representation of binary (phylogenetic) trees

  • Versions: 20
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 380 Last month
Rankings
Dependent packages count: 10.0%
Average: 38.8%
Dependent repos count: 67.6%
Maintainers (1)
Last synced: 6 months ago