pyskani

PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.

https://github.com/althonos/pyskani

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
✓ DOI references: 5 DOI reference(s) found in README
✓ Academic publication links: pubmed.ncbi, ncbi.nlm.nih.gov, nature.com
✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary

Keywords

ani average-nucleotide-identity bioinformatics metagenomes python-bindings python-library taxonomy

Last synced: 6 months ago

Repository


Basic Info

Host: GitHub
Owner: althonos
License: mit
Language: Rust
Default Branch: main
Homepage:
Size: 2.87 MB

Statistics

Stars: 27
Watchers: 2
Forks: 2
Open Issues: 0
Releases: 3

Topics

ani average-nucleotide-identity bioinformatics metagenomes python-bindings python-library taxonomy

Created about 3 years ago · Last pushed 6 months ago

Metadata Files

Readme Changelog Contributing License Citation

🐍⛓️🧬 Pyskani

PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.

🗺️ Overview

skani[1] is a method developed by Jim Shaw and Yun William Yu for fast and robust metagenomic sequence comparison through sparse chaining. It improves on FastANI by being more accurate and much faster, while requiring less memory.

pyskani is a Python module, implemented using the PyO3 framework, that provides bindings to skani. It directly links to the skani code, which has the following advantages over CLI wrappers:

pre-built wheels: pyskani is distributed on PyPI and features pre-built wheels for common platforms, including x86-64 and Arm64 UNIX.
single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pyskani as a dependency to your project, and stop worrying about the skani binary being present on the end-user machine.
sans I/O: Everything happens in memory, in Python objects you control, making it easier to pass your sequences to skani without having to write them to a temporary file.
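As an illustration of the sans I/O point, the sketch below turns FASTA text that is already in memory (for instance, received over the network or produced by another tool) into the bytes sequences that the examples pass to the Database.sketch method, without writing a temporary file. The helper iter_fasta_bytes is hypothetical and not part of pyskani; in practice Biopython, as used in the examples, is the more robust choice.

```python
from typing import Iterator, List, Optional, Tuple


def iter_fasta_bytes(text: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (identifier, sequence) pairs from FASTA text already held in memory.

    Hypothetical helper for illustration only; Bio.SeqIO handles many more
    edge cases and formats.
    """
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                # Emit the previous record before starting a new one.
                yield name, "".join(chunks).encode("ascii")
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).encode("ascii")


fasta_text = ">contig_1 draft assembly\nACGTACGT\nACGT\n>contig_2\nTTGCA\n"
records = list(iter_fasta_bytes(fasta_text))
# records == [("contig_1", b"ACGTACGTACGT"), ("contig_2", b"TTGCA")]
```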

This library is still a work in progress and at an experimental stage, but it should already have enough features to be used in a standard pipeline.

🔧 Installing

Pyskani can be installed directly from PyPI, which hosts some pre-built CPython wheels for x86-64 Unix platforms, as well as the code required to compile from source with Rust:

```console
$ pip install pyskani
```

In the event you have to compile the package from source, all the required Rust libraries are vendored in the source distribution, and a Rust compiler will be set up automatically if there is none on the host machine.
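After installing, one quick way to confirm which version of pyskani is available in the current environment is to query the installed package metadata with the standard library. The helper installed_version is a hypothetical name used only for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "pyskani") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"pyskani {found}" if found else "pyskani is not installed")
```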

🔖 Citation

If you found Pyskani useful, please cite our paper, as well as the original skani paper.

To cite Pyskani:

Martin Larralde, Georg Zeller, Laura M. Carroll. 2025. PyOrthoANI, PyFastANI, and Pyskani: a suite of Python libraries for computation of average nucleotide identity. NAR Genomics and Bioinformatics 7(3):lqaf095. doi:10.1093/nargab/lqaf095.

To cite skani:

Jim Shaw, Yun William Yu. 2023. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods 20(11):1661-1665. doi:10.1038/s41592-023-02018-3.

💡 Examples

📝 Creating a database

A database can be created either in memory or using a folder on the machine filesystem to store the sketches. Regardless of the storage, a database can be used immediately for querying, or saved to a different location.
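Conceptually, a sketch is a compact summary built from a sparse sample of a genome's k-mers rather than the full sequence. The toy function below illustrates that idea with FracMinHash-style subsampling, keeping only k-mers whose hash falls in the lowest 1/scale of the hash space. It is a hedged illustration for intuition: toy_sketch, k and scale are hypothetical names, reverse complements are ignored, and this is not skani's sketch format.

```python
import hashlib
import random
from typing import Set


def _hash64(kmer: bytes) -> int:
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.blake2b(kmer, digest_size=8).digest(), "big")


def toy_sketch(sequence: bytes, k: int = 15, scale: int = 125) -> Set[int]:
    """Keep hashes of k-mers that fall in the lowest 1/scale of the 64-bit hash space."""
    threshold = (2**64) // scale
    sequence = sequence.upper()
    sampled: Set[int] = set()
    for i in range(len(sequence) - k + 1):
        h = _hash64(sequence[i : i + k])
        if h < threshold:
            sampled.add(h)
    return sampled


# Synthetic random genome (not real data) to compare sketch sizes.
rng = random.Random(1)
genome = bytes(rng.choice(b"ACGT") for _ in range(100_000))
full = toy_sketch(genome, scale=1)      # every distinct k-mer
sparse = toy_sketch(genome, scale=125)  # roughly 1/125 of them
print(len(full), len(sparse))
```

A larger scale gives a smaller sketch, trading resolution for memory and speed.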

Here is how to create a database in memory, using Biopython to load the record:

```python
import Bio.SeqIO
import pyskani

database = pyskani.Database()
record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-EC590.fasta", "fasta")
database.sketch("E. coli EC590", bytes(record.seq))
```

For draft genomes, pass each contig as an additional argument to the sketch method; the splat operator (*) unpacks an iterable of sequences into separate arguments:

```python
import Bio.SeqIO
import pyskani

database = pyskani.Database()
records = Bio.SeqIO.parse("vendor/skani/test_files/e.coli-o157.fasta", "fasta")
sequences = (bytes(record.seq) for record in records)  # one bytes object per contig
database.sketch("E. coli O157", *sequences)
```
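To make the splat operator concrete, the hypothetical function describe_genome below has the same call shape as database.sketch(name, *sequences) in the example, but only reports what it receives. It shows that unpacking a generator of contigs passes each contig as its own positional argument.

```python
from typing import Dict


def describe_genome(name: str, *sequences: bytes) -> Dict[str, object]:
    """Hypothetical stand-in that only summarizes the contigs it was given."""
    return {
        "name": name,
        "contigs": len(sequences),  # one positional argument per contig
        "total_length": sum(len(s) for s in sequences),
    }


contigs = (seq for seq in [b"ACGTACGT", b"TTGCA", b"GGC"])
summary = describe_genome("draft genome", *contigs)
# summary == {"name": "draft genome", "contigs": 3, "total_length": 16}
```

Note that a generator can only be consumed once, so it has to be recreated before being unpacked again.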

🗒️ Loading a database

To load a database created with either skani or pyskani, you can load all sketches into memory for fast querying:

```python
import pyskani

database = pyskani.Database.load("path/to/sketches")
```

Or load the files lazily to save memory, at the cost of slower querying:

```python
import pyskani

database = pyskani.Database.open("path/to/sketches")
```
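The choice between Database.load and Database.open is the classic eager-versus-lazy loading trade-off. The generic sketch below illustrates that pattern with plain files; EagerStore, LazyStore and the .sketch file names are hypothetical, and this is not how pyskani implements either method or how skani lays out its sketch folder.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


class EagerStore:
    """Read every file when constructed: more memory, no disk access per lookup."""

    def __init__(self, folder: Path) -> None:
        self._data: Dict[str, bytes] = {
            p.name: p.read_bytes() for p in sorted(folder.iterdir()) if p.is_file()
        }

    def get(self, name: str) -> bytes:
        return self._data[name]


class LazyStore:
    """Only index the file paths; read a file from disk on every lookup."""

    def __init__(self, folder: Path) -> None:
        self._paths: Dict[str, Path] = {
            p.name: p for p in sorted(folder.iterdir()) if p.is_file()
        }

    def get(self, name: str) -> bytes:
        return self._paths[name].read_bytes()


def demo() -> Tuple[bytes, bytes]:
    """Build both stores over the same temporary folder and read one entry from each."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "genome_a.sketch").write_bytes(b"sketch A")
        (folder / "genome_b.sketch").write_bytes(b"sketch B")
        eager, lazy = EagerStore(folder), LazyStore(folder)
        return eager.get("genome_a.sketch"), lazy.get("genome_b.sketch")
```

The lazy store keeps memory usage low but pays a disk read on every lookup, which mirrors the slower querying mentioned above.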

🔎 Querying a database

Once a database has been created or loaded, use the Database.query method to compute ANI for some query genomes:

```python
import Bio.SeqIO

# `database` is a pyskani.Database created or loaded as shown above
record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-K12.fasta", "fasta")
hits = database.query("E. coli K12", bytes(record.seq))
```
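As a rough, self-contained illustration of what ranking reference genomes against a query involves, the toy function below scores each reference by k-mer containment and converts it to an approximate identity with identity ≈ containment^(1/k), a simple approximation that assumes independent point mutations. This is a swapped-in technique for intuition only: skani estimates ANI with sparse chaining, and toy_query, kmers and K are hypothetical names that return plain tuples rather than the objects returned by Database.query.

```python
import random
from typing import Dict, List, Set, Tuple

K = 15  # toy k-mer size (an assumption for this illustration)


def kmers(seq: bytes, k: int = K) -> Set[bytes]:
    """Return the set of distinct k-mers of a sequence (forward strand only)."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def toy_query(query: bytes, references: Dict[str, bytes], k: int = K) -> List[Tuple[str, float]]:
    """Rank references by a containment-based identity estimate (NOT skani's method)."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return []
    hits = []
    for name, reference in references.items():
        containment = len(query_kmers & kmers(reference, k)) / len(query_kmers)
        if containment > 0:
            hits.append((name, containment ** (1.0 / k)))
    hits.sort(key=lambda hit: hit[1], reverse=True)  # best match first
    return hits


# Synthetic data: a random genome, an unrelated genome, and a copy of the
# first genome carrying 20 point substitutions (about 1% divergence).
rng = random.Random(0)
genome_a = bytes(rng.choice(b"ACGT") for _ in range(2000))
genome_b = bytes(rng.choice(b"ACGT") for _ in range(2000))
variant = bytearray(genome_a)
for pos in rng.sample(range(len(variant)), 20):
    variant[pos] = rng.choice([base for base in b"ACGT" if base != variant[pos]])

hits = toy_query(bytes(variant), {"A": genome_a, "B": genome_b})
print(hits)
```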

🔎 See Also

Computing ANI for closed genomes? You may also be interested in pyfastani, a Python package for computing ANI using the FastANI method developed by Chirag Jain et al.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This library is provided under the MIT License.

The skani code was written by Jim Shaw and is distributed under the terms of the MIT License as well. See vendor/skani/LICENSE for more information. Source distributions of pyskani vendor additional sources under their own terms using the cargo vendor command.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original skani authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

📚 References

[1] Jim Shaw and Yun William Yu. 'Fast and robust metagenomic sequence comparison through sparse chaining with skani' (2023). Nature Methods. doi:10.1038/s41592-023-02018-3. PMID:37735570.

Owner

Name: Martin Larralde
Login: althonos
Kind: user
Location: Heidelberg, Germany
Company: EMBL / LUMC, @zellerlab

Twitter: althonos
Repositories: 91
Profile: https://github.com/althonos

PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Pyskani
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Martin
    family-names: Larralde
    email: martin.larralde@embl.de
    affiliation: Leiden University Medical Center
    orcid: 'https://orcid.org/0000-0002-3947-4444'
  - given-names: Georg
    family-names: Zeller
    affiliation: Leiden University Medical Center
    orcid: 'https://orcid.org/0000-0003-1429-7485'
  - given-names: Laura
    name-particle: M.
    family-names: Carroll
    affiliation: Umeå University
    orcid: 'https://orcid.org/0000-0002-3677-0192'
identifiers:
  - type: doi
    value: 10.1101/2025.02.13.638148
    description: bioRxiv preprint
  - type: doi
    value: 10.1093/nargab/lqaf095
    description: NAR Genomics & Bioinformatics paper
repository-code: 'https://github.com/althonos/pyskani'
abstract: >-
  The average nucleotide identity (ANI) metric has become
  the gold standard for prokaryotic species delineation in
  the genomics era. The most popular ANI algorithms are
  available as command-line tools and/or web applications,
  making it inconvenient or impossible to incorporate them
  into bioinformatic workflows, which utilize the popular
  Python programming language. Here, we present PyOrthoANI,
  PyFastANI, and Pyskani, Python libraries for three popular
  ANI computation methods. ANI values produced by
  PyOrthoANI, PyFastANI, and Pyskani are virtually identical
  to those produced by OrthoANI, FastANI, and skani,
  respectively. All three libraries integrate seamlessly
  with BioPython, making it easy and convenient to use,
  compare, and benchmark popular ANI algorithms within
  Python-based workflows.
keywords:
  - python
  - library
  - average nucleotide identity
  - ANI
license: MIT
preferred-citation:
  type: article
  authors:
  - given-names: Martin
    family-names: Larralde
    email: martin.larralde@embl.de
    affiliation: Leiden University Medical Center
    orcid: 'https://orcid.org/0000-0002-3947-4444'
  - given-names: Georg
    family-names: Zeller
    affiliation: Leiden University Medical Center
    orcid: 'https://orcid.org/0000-0003-1429-7485'
  - given-names: Laura
    name-particle: M.
    family-names: Carroll
    affiliation: Umeå University
    orcid: 'https://orcid.org/0000-0002-3677-0192'
  doi: "10.1093/nargab/lqaf095"
  journal: "NAR Genomics and Bioinformatics"
  volume: 7
  issue: 3
  title: "PyOrthoANI, PyFastANI, and Pyskani: a suite of Python libraries for computation of average nucleotide identity"
  year: 2025
  month: 9

GitHub Events

Total

Release event: 2
Watch event: 7
Push event: 22
Pull request event: 1
Fork event: 1
Create event: 3

Last Year

Release event: 2
Watch event: 7
Push event: 22
Pull request event: 1
Fork event: 1
Create event: 3

Committers

Last synced: over 1 year ago

All Time

Total Commits: 74
Total Committers: 1
Avg Commits per committer: 74.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 12
Committers: 1
Avg Commits per committer: 12.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Martin Larralde	m**e@e**e	74

Committer Domains (Top 20 + Academic)

embl.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

Pull Request Authors

lmc297 (1)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 1,714 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 5
Total maintainers: 1

pypi.org: pyskani

PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.

Homepage: https://github.com/althonos/pyskani/
Documentation: https://pyskani.readthedocs.io
License: MIT
Latest release: 0.2.0
published 6 months ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 1,714 Last month

Rankings

Dependent packages count: 6.6%

Downloads: 9.0%

Stargazers count: 16.1%

Average: 18.6%

Forks count: 30.5%

Dependent repos count: 30.6%

Maintainers (1)

althonos

Last synced: 6 months ago

Dependencies

.github/workflows/publish.yml actions

KSXGitHub/github-actions-deploy-aur v2.2.5 composite
actions-rs/toolchain v1 composite
actions/checkout v3 composite
actions/checkout v1 composite
actions/download-artifact v2 composite
actions/setup-python v2 composite
actions/upload-artifact v3 composite
actions/upload-artifact v2 composite
docker/setup-qemu-action v2 composite
pypa/cibuildwheel v2.11.3 composite
pypa/gh-action-pypi-publish master composite
rasmus-saks/release-a-changelog-action v1.0.1 composite

.github/workflows/test.yml actions

actions-rs/tarpaulin v0.1 composite
actions-rs/toolchain v1 composite
actions/checkout v3 composite
actions/checkout v1 composite
actions/setup-python v2 composite
codecov/codecov-action v1 composite

Cargo.lock cargo

207 dependencies

Cargo.toml cargo

.github/workflows/requirements.txt pypi

auditwheel *
build *
requests *
setuptools >=41.0
setuptools-rust *
wheel *

docs/requirements.txt pypi

ipykernel *
ipython *
nbsphinx *
pygments *
pygments-style-monokailight *
recommonmark *
semantic_version *
setuptools >=46.4
setuptools-rust *
sphinx *

setup.py pypi

pyproject.toml pypi
