https://github.com/althonos/pysylph

PyO3 bindings and Python interface to sylph, an ultrafast method for containment ANI querying and taxonomic profiling.

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: althonos
  • License: mit
  • Language: Rust
  • Default Branch: main
  • Size: 1.11 MB
Statistics
  • Stars: 18
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Contributing License

README.md

🕊️ Pysylph

PyO3 bindings and Python interface to sylph, an ultrafast method for containment ANI querying and taxonomic profiling.

🗺️ Overview

sylph[1] is a method developed by Jim Shaw and Yun William Yu for fast and robust ANI querying and profiling of metagenomic shotgun samples. It uses a statistical model based on Poisson coverage to compute coverage-adjusted ANI instead of naive ANI.
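As a rough illustration of why the coverage adjustment matters (a simplified sketch, not sylph's actual estimator): if k-mer coverage follows a Poisson distribution with mean λ, a genome k-mer is observed at least once with probability 1 - exp(-λ), so a shallow sample underestimates containment. The function names, the default `k=31` and the plain division below are assumptions made for this example only.

```python
import math


def naive_ani(containment, k=31):
    """ANI estimated directly from the fraction of genome k-mers found in the sample."""
    return containment ** (1.0 / k)


def coverage_adjusted_ani(containment, coverage, k=31):
    """ANI after correcting containment for k-mers missed because of low coverage.

    Illustrative only: assumes Poisson-distributed k-mer coverage with mean
    `coverage`, so a k-mer is detected with probability 1 - exp(-coverage).
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    detection = 1.0 - math.exp(-coverage)
    adjusted = min(1.0, containment / detection)
    return adjusted ** (1.0 / k)
```

With deep coverage the detection probability approaches 1 and both functions agree; with shallow coverage the naive value is biased downwards, which is the effect the adjusted estimate is meant to compensate for.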

pysylph is a Python module, implemented using the PyO3 framework, that provides bindings to sylph. It directly links to the sylph code, which has the following advantages over CLI wrappers:

  • pre-built wheels: pysylph is distributed on PyPI and features pre-built wheels for common platforms, including x86-64 and Arm64.
  • single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pysylph as a dependency to your project, and stop worrying about the sylph binary being present on the end-user machine.
  • sans I/O: Everything happens in memory, in Python objects you control, making it easier to pass your sequences to pysylph without having to write them to a temporary file.

This library is still an experimental work in progress, and API breaks are very likely between minor versions.

🔧 Installing

Pysylph can be installed directly from PyPI, which hosts some pre-built CPython wheels for x86-64 platforms, as well as the code required to compile from source with Rust and maturin:

```console
$ pip install pysylph
```

🔖 Citation

Pysylph is scientific software, and builds on top of sylph. Please cite sylph if you use it in academic work, for instance as:

pysylph, a Python library binding to sylph (Shaw & Yu, 2024).

💡 Examples

🔨 Creating a database

A database is a collection of genomes sketched for fast querying.

Here is how to create a database in memory, using Biopython to load genomes:

```python
import pathlib

import Bio.SeqIO
import pysylph

sketcher = pysylph.Sketcher()
sketches = []

# sketch every FASTA file of the current directory as one genome
for path in pathlib.Path(".").glob("*.fasta"):
    contigs = [str(record.seq) for record in Bio.SeqIO.parse(path, "fasta")]
    sketch = sketcher.sketch_genome(name=path.stem, contigs=contigs)
    sketches.append(sketch)

database = pysylph.Database(sketches)
```

Sketcher methods are re-entrant and can be used to sketch multiple genomes in parallel using for instance a ThreadPool.
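The parallel pattern mentioned above could look like the following minimal sketch. The helper `sketch_all` and its parameters are hypothetical and not part of pysylph; it takes the sketching function as an argument, so any callable accepting `name` and `contigs` keywords (such as `sketcher.sketch_genome` from the example above) can be passed in.

```python
from multiprocessing.pool import ThreadPool


def sketch_all(sketch_genome, genomes, threads=4):
    """Sketch several genomes in a thread pool, keeping the input order.

    `sketch_genome` is any callable taking `name` and `contigs` keywords;
    `genomes` is an iterable of (name, contigs) pairs.
    """
    def task(item):
        name, contigs = item
        return sketch_genome(name=name, contigs=contigs)

    with ThreadPool(threads) as pool:
        return pool.map(task, genomes)


# Possible usage, assuming `sketcher` and a list of (name, contigs) pairs exist:
#   sketches = sketch_all(sketcher.sketch_genome, genomes)
#   database = pysylph.Database(sketches)
```

Passing the bound method keeps a single `Sketcher` shared between threads, which relies on the re-entrancy stated above.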

📝 Saving a database

The database can be saved to the binary format used by the sylph binary as well:

```python
database.dump("genomes.syldb")
```

🗒️ Loading a database

A database previously created with sylph can be loaded transparently in pysylph:

```python
database = pysylph.Database.load("genomes.syldb")
```

📊 Sketching a query

Samples must also be sketched before they can be used to query a database. Here is how to sketch a sample made of single-ended reads stored in FASTQ format:

```python
reads = [str(record.seq) for record in Bio.SeqIO.parse("sample.fastq", "fastq")]
sample = sketcher.sketch_single(name="sample", reads=reads)
```

🔬 Querying a database

Once a sample has been sketched, it can be used to query a database for ANI containment or taxonomic profiling:

```python
profiler = pysylph.Profiler()
results = profiler.query(sample, database)    # ANI containment
results = profiler.profile(sample, database)  # taxonomic profiling
```

Profiler methods are re-entrant and can be used to query a database with multiple samples in parallel using for instance a ThreadPool.
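Querying one database with several samples at once could be sketched as follows. The helper `run_on_samples` is hypothetical and not part of pysylph; it accepts the method to call (for instance `profiler.query` or `profiler.profile`) and a mapping from labels to already sketched samples, and returns the results under the same labels.

```python
from multiprocessing.pool import ThreadPool


def run_on_samples(method, samples, database, threads=4):
    """Call `method(sample, database)` for every sample in a thread pool.

    `samples` maps a label of your choice to a sketched sample; the returned
    dictionary maps each label to the corresponding result.
    """
    labels = list(samples)
    with ThreadPool(threads) as pool:
        results = pool.map(
            lambda sample: method(sample, database),
            [samples[label] for label in labels],
        )
    return dict(zip(labels, results))


# Possible usage, assuming `profiler`, `database` and sketched samples exist:
#   profiles = run_on_samples(profiler.profile, {"s1": sample1, "s2": sample2}, database)
```

Threads are used rather than processes so that the database is shared in memory instead of being copied to each worker.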

🔎 See Also

Computing ANI for closed genomes? You may also be interested in pyskani, a Python package for computing ANI binding to skani, which was developed by the same authors.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This library is provided under the MIT License. It contains some code included verbatim from the sylph source code, which was written by Jim Shaw and is distributed under the terms of the MIT License as well. Source distributions of pysylph vendor additional sources under their own terms using the cargo vendor command.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original sylph authors. It was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller team.

📚 References

Owner

  • Name: Martin Larralde
  • Login: althonos
  • Kind: user
  • Location: Heidelberg, Germany
  • Company: EMBL / LUMC, @zellerlab

PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.

GitHub Events

Total
  • Release event: 2
  • Watch event: 18
  • Push event: 22
  • Create event: 6
Last Year
  • Release event: 2
  • Watch event: 18
  • Push event: 22
  • Create event: 6

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 82
  • Total Committers: 1
  • Avg Commits per committer: 82.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 82
  • Committers: 1
  • Avg Commits per committer: 82.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Martin Larralde m****e@e****e 82
Committer Domains (Top 20 + Academic)
embl.de: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 294 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
pypi.org: pysylph

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 294 Last month
Rankings
Dependent packages count: 10.1%
Stargazers count: 23.8%
Average: 30.8%
Forks count: 32.0%
Dependent repos count: 57.1%
Maintainers (1)
Last synced: 8 months ago

Dependencies

.github/workflows/test.yml actions
  • actions/checkout v4 composite
  • actions/checkout v3 composite
  • actions/setup-python v5 composite
  • actions/setup-python v2 composite
  • dtolnay/rust-toolchain stable composite
Cargo.lock cargo
  • 138 dependencies
Cargo.toml cargo
.github/workflows/requirements.txt pypi
  • maturin *
pyproject.toml pypi
.github/workflows/publish.yml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • dtolnay/rust-toolchain stable composite
  • pypa/cibuildwheel v2.21.3 composite
  • pypa/gh-action-pypi-publish release/v1 composite
  • rasmus-saks/release-a-changelog-action v1.2.0 composite
docs/requirements.txt pypi
  • ipython *
  • nbsphinx *
  • pydata-sphinx-theme *
  • pygments *
  • pygments-style-monokailight *
  • recommonmark *
  • semantic_version *
  • sphinx >=5.0
  • sphinx-design *
  • sphinxcontrib-jquery *