pyskani
PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.
Science Score: 77.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to pubmed.ncbi, ncbi.nlm.nih.gov, nature.com
- ✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.0%) to scientific vocabulary
Repository
Statistics
- Stars: 27
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
🐍⛓️🧬 Pyskani 
PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.
🗺️ Overview
skani[1] is a method developed by Jim Shaw
and Yun William Yu for fast and robust
metagenomic sequence comparison through sparse chaining. It improves on
FastANI by being more accurate and much faster, while requiring less memory.
pyskani is a Python module, implemented using the PyO3
framework, that provides bindings to skani. It directly links to the
skani code, which has the following advantages over CLI wrappers:
- pre-built wheels: pyskani is distributed on PyPI and features pre-built wheels for common platforms, including x86-64 and Arm64 UNIX.
- single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pyskani as a dependency to your project, and stop worrying about the skani binary being present on the end-user machine.
- sans I/O: Everything happens in memory, in Python objects you control, making it easier to pass your sequences to skani without having to write them to a temporary file.
This library is still a work-in-progress, and in an experimental stage, but it should already pack enough features to be used in a standard pipeline.
🔧 Installing
Pyskani can be installed directly from PyPI,
which hosts some pre-built CPython wheels for x86-64 Unix platforms, as well
as the code required to compile from source with Rust:
```console
$ pip install pyskani
```
<!-- Otherwise, pyskani is also available as a Bioconda
package:
console
$ conda install -c bioconda pyskani
-->
If you have to compile the package from source, all the required Rust libraries are vendored in the source distribution, and a Rust compiler will be set up automatically if there is none on the host machine.
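To confirm that the installation succeeded, a small check along these lines can help. It is only a sketch that relies on the standard library (`importlib.util` and `importlib.metadata`) rather than on anything specific to pyskani, and the helper name `pyskani_version` is made up for illustration.

```python
import importlib.metadata
import importlib.util


def pyskani_version():
    """Return the installed pyskani version string, or None if it is missing."""
    # find_spec returns None when the top-level module cannot be located.
    if importlib.util.find_spec("pyskani") is None:
        return None
    try:
        # Reads the version from the installed distribution metadata.
        return importlib.metadata.version("pyskani")
    except importlib.metadata.PackageNotFoundError:
        return None


version = pyskani_version()
print("pyskani is not installed" if version is None else f"pyskani {version}")
```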
🔖 Citation
If you found Pyskani useful, please cite our paper, as well as the original skani paper.
To cite Pyskani:
Martin Larralde, Georg Zeller, Laura M. Carroll. 2025. PyOrthoANI, PyFastANI, and Pyskani: a suite of Python libraries for computation of average nucleotide identity. NAR Genomics and Bioinformatics 7(3):lqaf095. doi:10.1093/nargab/lqaf095.
To cite skani:
Jim Shaw, Yun William Yu. 2023. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods 20(11):1661-1665. doi:10.1038/s41592-023-02018-3.
💡 Examples
📝 Creating a database
A database can be created either in memory or using a folder on the machine filesystem to store the sketches. Independently of the storage, a database can be used immediately for querying, or saved to a different location.
Here is how to create a database in memory,
using Biopython
to load the record:
```python
import Bio.SeqIO
import pyskani

database = pyskani.Database()
record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-EC590.fasta", "fasta")
database.sketch("E. coli EC590", bytes(record.seq))
```
For draft genomes, simply pass more arguments to the sketch method, for
which you can use the splat operator:
```python
import Bio.SeqIO
import pyskani

database = pyskani.Database()
records = Bio.SeqIO.parse("vendor/skani/test_files/e.coli-o157.fasta", "fasta")
sequences = (bytes(record.seq) for record in records)
database.sketch("E. coli O157", *sequences)
```
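Since `sketch` only needs a name and one `bytes` object per contig, Biopython is a convenience rather than a requirement. The following is a minimal sketch, not part of pyskani, of a pure-Python FASTA reader feeding the same call. The helper names `read_fasta_contigs` and `sketch_fasta` are hypothetical, and `database` can be any object with a `sketch(name, *contigs)` method like the one shown above.

```python
import io


def read_fasta_contigs(handle):
    """Yield the sequence of each FASTA record in a text stream as bytes."""
    chunks = []
    seen_header = False
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seen_header:
                yield "".join(chunks).encode("ascii")
            chunks = []
            seen_header = True
        elif seen_header:
            chunks.append(line)
    if seen_header:
        yield "".join(chunks).encode("ascii")


def sketch_fasta(database, name, handle):
    """Sketch every contig of one genome under a single name."""
    contigs = list(read_fasta_contigs(handle))
    # Same splat pattern as above: one positional argument per contig.
    database.sketch(name, *contigs)
    return len(contigs)


demo = io.StringIO(">contig1\nACGT\nTTGA\n>contig2\nGGCC\n")
print(list(read_fasta_contigs(demo)))  # [b'ACGTTTGA', b'GGCC']
```

With pyskani installed, this would be used as `sketch_fasta(pyskani.Database(), "E. coli O157", open(path))`, where `path` points to a FASTA file.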
🗒️ Loading a database
To load a database created with either skani or pyskani, you can
load all sketches into memory, for fast querying:
```python
import pyskani

database = pyskani.Database.load("path/to/sketches")
```
Or load the files lazily to save memory, at the cost of slower querying:
```python
import pyskani

database = pyskani.Database.open("path/to/sketches")
```
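One way to pick between the two modes is to compare the on-disk size of the sketch folder against a memory budget. The sketch below is a rough heuristic under stated assumptions, not a recommendation from the pyskani authors: on-disk size is only a proxy for memory use, the 2 GiB default is arbitrary, and the helper names `sketches_size` and `load_or_open` are hypothetical. The database class is passed in as an argument, so the only pyskani calls relied on are the `load` and `open` methods shown above.

```python
import os


def sketches_size(path):
    """Return the total size in bytes of all files under a folder."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def load_or_open(database_cls, path, memory_budget=2 * 1024**3):
    """Load eagerly when the sketches fit the budget, lazily otherwise."""
    if sketches_size(path) <= memory_budget:
        return database_cls.load(path)  # everything in memory, fast queries
    return database_cls.open(path)      # lazy loading, slower queries


# With pyskani installed:
#   database = load_or_open(pyskani.Database, "path/to/sketches")
```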
🔎 Querying a database
Once a database has been created or loaded, use the Database.query method
to compute ANI for some query genomes:
```python
import Bio.SeqIO

record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-K12.fasta", "fasta")
hits = database.query("E. coli K12", bytes(record.seq))
```
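The README does not show what a hit looks like, so the following sketch rests on an assumption: that each hit exposes a `reference_name` and an `identity` attribute. Check the pyskani API reference before relying on these names, and note that the scale of `identity` (fraction or percentage) is not specified here. The helper `rank_hits` is hypothetical, and the stand-in objects exist only so that the example runs without pyskani.

```python
from types import SimpleNamespace


def rank_hits(hits, min_identity=None):
    """Return (reference_name, identity) pairs sorted from best to worst."""
    # Assumed attribute names: reference_name and identity.
    rows = [(hit.reference_name, hit.identity) for hit in hits]
    if min_identity is not None:
        rows = [row for row in rows if row[1] >= min_identity]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Stand-in objects with made-up identity values, for illustration only.
fake_hits = [
    SimpleNamespace(reference_name="genome A", identity=0.91),
    SimpleNamespace(reference_name="genome B", identity=0.97),
]
for name, identity in rank_hits(fake_hits):
    print(name, identity)
```

If the assumption holds, `rank_hits(hits)` can be called directly on the result of `database.query`.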
🔎 See Also
Computing ANI for closed genomes? You may also be interested in
pyfastani, a Python package for computing ANI
using the FastANI method
developed by Chirag Jain et al.
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
🏗️ Contributing
Contributions are more than welcome! See
CONTRIBUTING.md
for more details.
⚖️ License
This library is provided under the MIT License.
The skani code was written by Jim Shaw
and is distributed under the terms of the MIT License
as well. See vendor/skani/LICENSE for more information. Source distributions
of pyskani vendor additional sources under their own terms using
the cargo vendor
command.
This project is in no way affiliated with, sponsored by, or otherwise endorsed
by the original skani authors.
It was developed by Martin Larralde during his
PhD project at the European Molecular Biology Laboratory
in the Zeller team.
📚 References
- [1] Jim Shaw and Yun William Yu. 'Fast and robust metagenomic sequence comparison through sparse chaining with skani' (2023). Nature Methods. doi:10.1038/s41592-023-02018-3. PMID:37735570.
Owner
- Name: Martin Larralde
- Login: althonos
- Kind: user
- Location: Heidelberg, Germany
- Company: EMBL / LUMC, @zellerlab
- Twitter: althonos
- Repositories: 91
- Profile: https://github.com/althonos
PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Pyskani
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Martin
    family-names: Larralde
    email: martin.larralde@embl.de
    affiliation: Leiden University Medical Center
    orcid: 'https://orcid.org/0000-0002-3947-4444'
  - given-names: Georg
    family-names: Zeller
    affiliation: Leiden University Medical Center
    orcid: 'https://orcid.org/0000-0003-1429-7485'
  - given-names: Laura
    name-particle: M.
    family-names: Carroll
    affiliation: Umeå University
    orcid: 'https://orcid.org/0000-0002-3677-0192'
identifiers:
  - type: doi
    value: 10.1101/2025.02.13.638148
    description: bioRxiv preprint
  - type: doi
    value: 10.1093/nargab/lqaf095
    description: NAR Genomics & Bioinformatics paper
repository-code: 'https://github.com/althonos/pyskani'
abstract: >-
  The average nucleotide identity (ANI) metric has become
  the gold standard for prokaryotic species delineation in
  the genomics era. The most popular ANI algorithms are
  available as command-line tools and/or web applications,
  making it inconvenient or impossible to incorporate them
  into bioinformatic workflows, which utilize the popular
  Python programming language. Here, we present PyOrthoANI,
  PyFastANI, and Pyskani, Python libraries for three popular
  ANI computation methods. ANI values produced by
  PyOrthoANI, PyFastANI, and Pyskani are virtually identical
  to those produced by OrthoANI, FastANI, and skani,
  respectively. All three libraries integrate seamlessly
  with BioPython, making it easy and convenient to use,
  compare, and benchmark popular ANI algorithms within
  Python-based workflows.
keywords:
  - python
  - library
  - average nucleotide identity
  - ANI
license: MIT
preferred-citation:
  type: article
  authors:
    - given-names: Martin
      family-names: Larralde
      email: martin.larralde@embl.de
      affiliation: Leiden University Medical Center
      orcid: 'https://orcid.org/0000-0002-3947-4444'
    - given-names: Georg
      family-names: Zeller
      affiliation: Leiden University Medical Center
      orcid: 'https://orcid.org/0000-0003-1429-7485'
    - given-names: Laura
      name-particle: M.
      family-names: Carroll
      affiliation: Umeå University
      orcid: 'https://orcid.org/0000-0002-3677-0192'
  doi: "10.1093/nargab/lqaf095"
  journal: "NAR Genomics and Bioinformatics"
  volume: 7
  issue: 3
  title: "PyOrthoANI, PyFastANI, and Pyskani: a suite of Python libraries for computation of average nucleotide identity"
  year: 2025
  month: 9
GitHub Events
Total
- Release event: 2
- Watch event: 7
- Push event: 22
- Pull request event: 1
- Fork event: 1
- Create event: 3
Last Year
- Release event: 2
- Watch event: 7
- Push event: 22
- Pull request event: 1
- Fork event: 1
- Create event: 3
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Martin Larralde | m****e@e****e | 74 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- lmc297 (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: pypi 1,714 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
pypi.org: pyskani
PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.
- Homepage: https://github.com/althonos/pyskani/
- Documentation: https://pyskani.readthedocs.io
- License: MIT
- Latest release: 0.2.0 (published 4 months ago)
Rankings
Maintainers (1)
Dependencies
- KSXGitHub/github-actions-deploy-aur v2.2.5 composite
- actions-rs/toolchain v1 composite
- actions/checkout v3 composite
- actions/checkout v1 composite
- actions/download-artifact v2 composite
- actions/setup-python v2 composite
- actions/upload-artifact v3 composite
- actions/upload-artifact v2 composite
- docker/setup-qemu-action v2 composite
- pypa/cibuildwheel v2.11.3 composite
- pypa/gh-action-pypi-publish master composite
- rasmus-saks/release-a-changelog-action v1.0.1 composite
- actions-rs/tarpaulin v0.1 composite
- actions-rs/toolchain v1 composite
- actions/checkout v3 composite
- actions/checkout v1 composite
- actions/setup-python v2 composite
- codecov/codecov-action v1 composite
- 207 dependencies
- auditwheel *
- build *
- requests *
- setuptools >=41.0
- setuptools-rust *
- wheel *
- ipykernel *
- ipython *
- nbsphinx *
- pygments *
- pygments-style-monokailight *
- recommonmark *
- semantic_version *
- setuptools >=46.4
- setuptools-rust *
- sphinx *