phylo2vec
Phylo2Vec: a vector representation for binary trees
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 12 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.2%) to scientific vocabulary
Repository
Phylo2Vec: a vector representation for binary trees
Basic Info
- Host: GitHub
- Owner: sbhattlab
- License: lgpl-3.0
- Language: Rust
- Default Branch: main
- Homepage: https://phylo2vec.readthedocs.io
- Size: 1.65 MB
Statistics
- Stars: 10
- Watchers: 1
- Forks: 7
- Open Issues: 2
- Releases: 8
Metadata Files
README.md
Phylo2Vec
Phylo2Vec (or phylo2vec) is a high-performance software package for encoding, manipulating, and analysing binary phylogenetic trees. At its core, the package contains a representation of binary trees that defines a bijection from any tree topology with 𝑛 leaves into an integer vector of size 𝑛 − 1. Compared to the traditional Newick format, phylo2vec was designed with fast sampling, fast conversion/compression from Newick-format trees to the Phylo2Vec format, and rapid tree comparison in mind.
The current version features a core implementation in Rust, which provides significant performance improvements and memory efficiency. It remains available in Python (superseding the version described in the original paper) and in R via dedicated wrappers, making it accessible to a broad audience in the bioinformatics community.
Link to the paper: https://doi.org/10.1093/sysbio/syae030
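To make the size and range of the vector concrete, here is a minimal pure-Python sketch. It assumes the constraint described in the paper, namely that entry i of a vector (0-indexed) lies in {0, ..., 2i}. The helper names `is_valid_vector`, `sample_valid_vector`, and `count_vectors` are hypothetical and are not part of the phylo2vec API. Under that assumption, the number of valid vectors for n leaves is 1 × 3 × ... × (2n − 3), which equals the number of rooted binary tree topologies on n labelled leaves.

```python
import random
from math import prod


def is_valid_vector(v):
    """Check the assumed constraint: entry i must lie in [0, 2*i]."""
    return all(0 <= x <= 2 * i for i, x in enumerate(v))


def sample_valid_vector(n_leaves, rng=random):
    """Draw each of the n_leaves - 1 entries uniformly from its allowed range."""
    return [rng.randint(0, 2 * i) for i in range(n_leaves - 1)]


def count_vectors(n_leaves):
    """Multiply the number of choices per position: 1 * 3 * ... * (2n - 3)."""
    return prod(2 * i + 1 for i in range(n_leaves - 1))


v = sample_valid_vector(6)
print(len(v), is_valid_vector(v))  # a tree with 6 leaves has a vector of length 5
print([count_vectors(n) for n in range(2, 6)])
```

Because every position is sampled independently, drawing a random tree reduces to drawing n − 1 bounded integers, which is one way to read the "fast sampling" claim above.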
Installation
Pip
The easiest way to install the standard Python package is using pip:
```bash
pip install phylo2vec
```
Several optimization schemes based on Phylo2Vec are also available, but require extra dependencies. (See this notebook for a demo). To avoid bloating the standard package, these dependencies must be installed separately. To do so, run:
```bash
pip install phylo2vec[opt]
```
Manual installation
- We recommend setting up the pixi package management tool.
- Clone the repository and install using pixi:

```bash
git clone https://github.com/sbhattlab/phylo2vec.git
cd phylo2vec
pixi run -e py-phylo2vec install-python
```
This will compile and install the package as the core functionality is written in Rust.
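After either installation route, a quick check that the package is visible to the current Python environment can save some debugging. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and it assumes the import name and the distribution name are both `phylo2vec`, as the pip command and import statements in this README suggest.

```python
from importlib import metadata, util


def installed_version(package):
    """Return the installed version string of `package`, or None if it cannot be found."""
    if util.find_spec(package) is None:
        return None  # the package is not importable from this environment
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None  # importable, but no distribution metadata is available


print(installed_version("phylo2vec"))
```

A result of `None` usually means the install ran in a different environment than the one running this snippet.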
Installing R package
Option 1: from a release (Windows, Mac, Ubuntu >= 22.04)
Retrieve one of the compiled binaries from the releases that fits your OS.
Once the file is downloaded, simply run install.packages in your R command line.

```R
install.packages("/path/to/package_file", repos = NULL, type = 'source')
```
Option 2: using devtools
⚠️ This requires installing Rust to build the core package.
```R
devtools::install_github("sbhattlab/phylo2vec", subdir="./r-phylo2vec", build = FALSE)
```
Note: to download a specific version, use:
```R
devtools::install_github("sbhattlab/phylo2vec@vX.Y.Z", subdir="./r-phylo2vec", build = FALSE)
```
Option 3: manual installation
⚠️ This requires installing Rust to build the core package.
Clone the repository, then run the install.packages command below in your R command line.
Note: to install a specific version, use git checkout to switch to the desired tag before installing.

```bash
git clone https://github.com/sbhattlab/phylo2vec
cd phylo2vec
```

```R
install.packages("./r-phylo2vec", repos = NULL, type = 'source')
```
Basic Usage
Python
Conversion between Newick and vector representations
```python
import numpy as np
from phylo2vec import from_newick, to_newick

# Convert a vector to Newick string
v = np.array([0, 1, 2, 3, 4])
newick = to_newick(v)  # '(0,(1,(2,(3,(4,5)6)7)8)9)10;'

# Convert Newick string back to vector
v_converted = from_newick(newick)  # array([0, 1, 2, 3, 4], dtype=int16)
```
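To give some intuition for how a vector maps to a topology, the following pure-Python sketch rebuilds the tree for a restricted class of vectors only: those where every entry satisfies v[i] ≤ i. It assumes, following the paper's description, that in this case leaf i + 1 is attached by splitting the branch that leads to leaf v[i]. Entries larger than i refer to internal branches and are deliberately not handled here. The functions `attach`, `ordered_vector_to_topology`, and `render` are hypothetical names for illustration, not the library's implementation, and the output omits the internal node labels that `to_newick` prints.

```python
def attach(tree, target, new_leaf):
    """Return a copy of `tree` in which leaf `target` is replaced by the pair (target, new_leaf)."""
    if isinstance(tree, int):
        return (tree, new_leaf) if tree == target else tree
    left, right = tree
    return (attach(left, target, new_leaf), attach(right, target, new_leaf))


def ordered_vector_to_topology(v):
    """Build nested tuples from a vector whose entries all satisfy v[i] <= i."""
    if any(not 0 <= x <= i for i, x in enumerate(v)):
        raise ValueError("this sketch only handles entries with 0 <= v[i] <= i")
    tree = 0  # start from the single leaf 0
    for i, x in enumerate(v):
        tree = attach(tree, x, i + 1)  # leaf i + 1 becomes the sister of leaf x
    return tree


def render(tree):
    """Render nested tuples as a Newick-like string without internal labels."""
    if isinstance(tree, int):
        return str(tree)
    return "(" + ",".join(render(child) for child in tree) + ")"


print(render(ordered_vector_to_topology([0, 1, 2, 3, 4])) + ";")
```

For the vector used above, this prints the same ladder-shaped topology as the `to_newick` result, minus the internal node labels 6 to 10.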
Tree Manipulation
```python
from phylo2vec.utils.vector import add_leaf, remove_leaf, reroot_at_random

# Add a leaf to an existing tree
v_new = add_leaf(v, 2)  # Add a leaf to the third position

# Remove a leaf
v_reduced = remove_leaf(v, 1)  # Remove the second leaf

# Random rerooting
v_rerooted = reroot_at_random(v)
```
Optimization
To run the hill climbing-based optimisation scheme presented in the original Phylo2Vec paper, run:
```python
# A hill-climbing scheme to optimize Phylo2Vec vectors
from phylo2vec.opt import HillClimbing

hc = HillClimbing(verbose=True)
hc_result = hc.fit("/path/to/your_fasta_file.fa")
```
Command-line interface (CLI)
We also provide a command-line interface for quick experimentation on phylo2vec-derived objects.
To see the available functions, run:
```bash
phylo2vec --help
```
Examples:
```bash
phylo2vec samplev 5 # Sample a vector with 5 leaves
phylo2vec samplem 5 # Sample a matrix with 5 leaves
phylo2vec from_newick '((0,1),2);' # Convert a Newick to a vector
phylo2vec from_newick '((0:0.3,1:0.1):0.5,2:0.4);' # Convert a Newick to a matrix
phylo2vec to_newick 0,1,2 # Convert a vector to Newick
phylo2vec to_newick $'0.0,1.0,2.0\n0.0,3.0,4.0' # Convert a matrix to Newick
```
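The two `to_newick` examples pass a vector as one comma-separated line and a matrix as several newline-separated lines. The sketch below shows one way such strings could be turned into Python lists, for instance when preparing or checking inputs in a script. It is a guess at the input format based on the examples above, not the CLI's actual parser, and `parse_cli_input` is a hypothetical name.

```python
def parse_cli_input(text):
    """Parse '0,1,2' into a list of ints, or multi-line text into a list of float rows."""
    rows = [line.split(",") for line in text.strip().splitlines()]
    if len(rows) == 1:
        # A single line is treated as a vector of integers
        return [int(x) for x in rows[0]]
    # Several lines are treated as a matrix, one row per line
    return [[float(x) for x in row] for row in rows]


print(parse_cli_input("0,1,2"))
print(parse_cli_input("0.0,1.0,2.0\n0.0,3.0,4.0"))
```

Note that this sketch cannot distinguish a one-row matrix from a vector; it always treats a single line as a vector.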
Documentation
For comprehensive documentation, tutorials, and API reference, visit: https://phylo2vec.readthedocs.io
How to Contribute
We welcome contributions to Phylo2Vec! Here's how you can help:
- Fork the repository and create your branch from main
- Make your changes and add tests if applicable
- Run the tests to ensure they pass
- Submit a pull request with a detailed description of your changes
Please make sure to follow our coding standards and write appropriate tests for new features.
Thanks to our contributors so far!
License
This project is distributed under the GNU Lesser General Public License v3.0 (LGPL).
Citation
If you use Phylo2Vec in your research, please cite:
```bibtex
@article{10.1093/sysbio/syae030,
  author = {Penn, Matthew J and Scheidwasser, Neil and Khurana, Mark P and Duchêne, David A and Donnelly, Christl A and Bhatt, Samir},
  title = {Phylo2Vec: a vector representation for binary trees},
  journal = {Systematic Biology},
  year = {2024},
  month = {03},
  doi = {10.1093/sysbio/syae030},
  url = {https://doi.org/10.1093/sysbio/syae030},
}
```
Related Work
- Preprint repository (core functions are deprecated): https://github.com/Neclow/phylo2vec_preprint
- C++ version (deprecated): https://github.com/Neclow/phylo2vec_cpp
- GradME: https://github.com/Neclow/GradME = phylo2vec + minimum evolution + gradient descent
Owner
- Name: sbhattlab
- Login: sbhattlab
- Kind: organization
- Repositories: 1
- Profile: https://github.com/sbhattlab
JOSS Publication
phylo2vec: a library for vector-based phylogenetic tree manipulation
Authors
Section of Health Data Science and AI, University of Copenhagen, Copenhagen, Denmark
Tags
bioinformatics phylogenetics binary tree
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Scheidwasser"
    given-names: "Neil"
    orcid: "https://orcid.org/0000-0001-9922-0289"
  - family-names: "Nag"
    given-names: "Ayush"
    orcid: "https://orcid.org/0009-0008-1790-597X"
  - family-names: "Setiawan"
    given-names: "Landung"
    orcid: "https://orcid.org/0000-0002-1624-2667"
  - family-names: "Gordon"
    given-names: "Madeline"
    orcid: "https://orcid.org/0009-0003-6220-7218"
title: "Phylo2Vec: a vector representation for binary trees"
url: "https://phylo2vec.readthedocs.io"
repository-code: "https://github.com/sbhattlab/phylo2vec"
# -------------------------------
# Zenodo DOI will be added after the first release
# doi: 10.xxxx/zenodo.xxxxxxx
# -------------------------------
# Note for `version` and `date-released` meta below
# Pointing to a specific release/version will
# require having to update this file with each release
# version: x.x.x
# date-released: xxxx-xx-xx
# -------------------------------
preferred-citation:
  type: article
  authors:
    - family-names: "Penn"
      given-names: "Matthew J"
    - family-names: "Scheidwasser"
      given-names: "Neil"
    - family-names: "Khurana"
      given-names: "Mark P"
    - family-names: "Duchêne"
      given-names: "David A"
    - family-names: "Donnelly"
      given-names: "Christl A"
    - family-names: "Bhatt"
      given-names: "Samir"
  journal: "Systematic Biology"
  doi: https://doi.org/10.1093/sysbio/syae030
  title: "Phylo2Vec: a vector representation for binary trees"
  month: 03
  year: 2025
GitHub Events
Total
- Create event: 16
- Issues event: 17
- Release event: 6
- Watch event: 4
- Delete event: 12
- Issue comment event: 10
- Push event: 70
- Pull request review event: 61
- Pull request review comment event: 43
- Pull request event: 122
- Fork event: 2
Last Year
- Create event: 16
- Issues event: 17
- Release event: 6
- Watch event: 4
- Delete event: 12
- Issue comment event: 10
- Push event: 70
- Pull request review event: 61
- Pull request review comment event: 43
- Pull request event: 122
- Fork event: 2
Packages
- Total packages: 1
- Total downloads:
  - pypi: 380 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 20
- Total maintainers: 1
pypi.org: phylo2vec
Phylo2Vec: integer vector representation of binary (phylogenetic) trees
- Documentation: https://phylo2vec.readthedocs.io/
- License: GNU Lesser General Public License v3 (LGPLv3)
- Latest release: 1.4.0 (published 7 months ago)