pyhmmer

Cython bindings and Python interface to HMMER3.

https://github.com/althonos/pyhmmer

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 8 committers (12.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary

Keywords

bioinformatics cython-library hidden-markov-model hmmer hmmer3 python-bindings python-library sequence-analysis

Keywords from Contributors

genomics metagenomics
Last synced: 4 months ago

Repository

Cython bindings and Python interface to HMMER3.

Basic Info
Statistics
  • Stars: 147
  • Watchers: 9
  • Forks: 13
  • Open Issues: 19
  • Releases: 59
Topics
bioinformatics cython-library hidden-markov-model hmmer hmmer3 python-bindings python-library sequence-analysis
Created about 5 years ago · Last pushed 4 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

🐍🟡♦️🟦 PyHMMER

Cython bindings and Python interface to HMMER3.


🗺️ Overview

HMMER is a biological sequence analysis tool that uses profile hidden Markov models to search for sequence homologs. HMMER3 is developed and maintained by the Eddy/Rivas Laboratory at Harvard University.

pyhmmer is a Python package, implemented using the Cython language, that provides bindings to HMMER3. It directly interacts with the HMMER internals, which has the following advantages over CLI wrappers (like hmmer-py):

  • single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pyhmmer as a dependency to your project and stop worrying about whether the HMMER binaries are properly set up on the end-user machine.
  • no intermediate files: Everything happens in memory, in Python objects you have control over, so you can pass your inputs to HMMER without first writing them to a temporary file. Output retrieval is also done in memory, via instances of the pyhmmer.plan7.TopHits class.
  • no input formatting: The Easel object model is exposed in the pyhmmer.easel module, so you can build a DigitalSequence object yourself and pass it to the HMMER pipeline. This is useful if your sequences are already loaded in memory, for instance because you obtained them from another Python library (such as Pyrodigal or Biopython).
  • no output parsing: HMMER3 is notorious for its numerous output files and its fixed-width tabular output, which is hard to parse (even Bio.SearchIO.HmmerIO struggles with some sequences).
  • efficient: Using pyhmmer to launch hmmsearch on sequences and HMMs stored on disk is typically as fast as using the hmmsearch binary directly (see the Benchmarks section). pyhmmer.hmmer.hmmsearch uses a different parallelisation strategy from the hmmsearch binary, which can help you get the most out of multiple CPUs when annotating smaller sequence databases.
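The "no output parsing" point is easier to appreciate with an example of what parsing the CLI output involves. Below is a minimal sketch that splits one line of the per-target `--tblout` table written by the hmmsearch binary. It assumes the standard layout, in which 18 whitespace-separated columns are followed by a free-text target description that may itself contain spaces. The `TblHit` record and the `parse_tblout_line` helper are hypothetical and exist only for illustration. With pyhmmer, this step disappears because hits are returned as `TopHits` objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TblHit:
    target: str
    query: str
    evalue: float
    score: float
    description: str


def parse_tblout_line(line: str) -> Optional[TblHit]:
    """Parse one line of an hmmsearch --tblout table; return None for comments."""
    if line.startswith("#") or not line.strip():
        return None
    # 18 whitespace-separated columns, then a description that may contain spaces,
    # so stop splitting after the 18th separator and keep the rest as one field
    fields = line.rstrip().split(maxsplit=18)
    description = fields[18] if len(fields) > 18 else ""
    return TblHit(
        target=fields[0],         # column 1: target name
        query=fields[2],          # column 3: query name
        evalue=float(fields[4]),  # column 5: full-sequence E-value
        score=float(fields[5]),   # column 6: full-sequence bit score
        description=description,  # column 19: description of target
    )
```

For example, the made-up line `seq1 - t2pks - 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 1.0 1 0 0 1 1 1 1 putative chain length factor` keeps its multi-word description intact. A naive `line.split()` would break that description into separate fields.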

This library is still a work in progress. It follows semantic versioning, so API changes will be documented, but past v0.10 the API has been more or less stable. It should already offer enough features to run biological analyses or workflows involving hmmsearch, hmmscan, nhmmer, phmmer, hmmbuild and hmmalign.
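Because the API is described as more or less stable past v0.10, a pipeline may want to check the installed release before relying on it. Below is a minimal sketch using the standard library's `importlib.metadata`. The helper names `parse_release` and `has_stable_api` are hypothetical, and the `(0, 10)` threshold simply mirrors the sentence above. For version strings beyond plain numeric releases, the `packaging` library's `Version` class is the more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_release(text: str) -> Tuple[int, ...]:
    """Turn '0.11.1' into (0, 11, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_stable_api(minimum: Tuple[int, ...] = (0, 10)) -> bool:
    """Return True if pyhmmer is installed and its release is at least `minimum`."""
    try:
        installed = version("pyhmmer")
    except PackageNotFoundError:
        # pyhmmer is not installed in this environment
        return False
    return parse_release(installed) >= minimum
```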

🔧 Installing

pyhmmer can be installed from PyPI, which hosts some pre-built CPython wheels for Linux and macOS on x86-64 and Arm64, as well as the code required to compile from source with Cython:

```console
$ pip install pyhmmer
```

Compilation for UNIX PowerPC is not tested in CI, but should work out of the box. Note that non-UNIX operating systems (such as Windows) are not supported by HMMER.

A Bioconda package is also available:

```console
$ conda install -c bioconda pyhmmer
```

🔖 Citation

PyHMMER is scientific software, with a paper published in Bioinformatics. Please cite both PyHMMER and HMMER if you use it in academic work, for instance as:

PyHMMER (Larralde et al., 2023), a Python library binding to HMMER (Eddy, 2011).

Detailed references are available on the Publications page of the online documentation.

📖 Documentation

A complete API reference can be found in the online documentation, or directly from the command line using pydoc:

```console
$ pydoc pyhmmer.easel
$ pydoc pyhmmer.plan7
```

💡 Example

Use pyhmmer to run hmmsearch to search for Type 2 PKS domains (t2pks.hmm) inside proteins extracted from the genome of Anaerococcus provencensis (938293.PRJEB85.HG003687.faa). This will produce an iterable over TopHits that can be used for further sorting/querying in Python. Processing happens in parallel using Python threads, and a TopHits object is yielded for every HMM passed in the input iterable.

```python
import pyhmmer

# Load all target protein sequences into memory, already digitized
with pyhmmer.easel.SequenceFile("pyhmmer/tests/data/seqs/938293.PRJEB85.HG003687.faa", digital=True) as seqfile:
    sequences = seqfile.read_block()

# Read the HMMs from the file; hmmsearch yields one TopHits object per HMM
with pyhmmer.plan7.HMMFile("pyhmmer/tests/data/hmms/txt/t2pks.hmm") as hmmfile:
    for hits in pyhmmer.hmmsearch(hmmfile, sequences, cpus=4):
        print(f"HMM {hits.query.name.decode()} found {len(hits)} hits in the target sequences")
```
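Each `TopHits` object from the example above can then be filtered or sorted with ordinary Python. Below is a minimal sketch of a hypothetical helper, `significant_hits`, that keeps hits under an E-value cutoff and ranks them by score. It assumes that the individual hits expose `name` (as bytes), `score` and `evalue` attributes, as pyhmmer's `Hit` objects do in recent releases. Check the API reference for the version you use. The demo uses `SimpleNamespace` stand-ins so that it runs without any HMM files.

```python
from types import SimpleNamespace
from typing import Iterable, List


def significant_hits(hits: Iterable, max_evalue: float = 1e-5) -> List[str]:
    """Return the names of hits at or below an E-value cutoff, best score first."""
    kept = [hit for hit in hits if hit.evalue <= max_evalue]
    kept.sort(key=lambda hit: hit.score, reverse=True)
    # pyhmmer stores names as bytes; decode them for display
    return [hit.name.decode() if isinstance(hit.name, bytes) else hit.name for hit in kept]


# Stand-in objects for demonstration only; real ones would come from pyhmmer.hmmsearch
demo = [
    SimpleNamespace(name=b"seqA", score=120.5, evalue=1e-35),
    SimpleNamespace(name=b"seqB", score=15.2, evalue=0.3),
    SimpleNamespace(name=b"seqC", score=80.0, evalue=2e-20),
]
print(significant_hits(demo))  # ['seqA', 'seqC']
```

In the example above, this could be called as `significant_hits(hits)` inside the `for hits in pyhmmer.hmmsearch(...)` loop.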

Have a look at more in-depth examples, such as building an HMM from an alignment, analysing the active site of a hit, or fetching marker genes from a genome, in the Examples page of the online documentation.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing an issue about a bug, please include as much information as you can about it, and try to recreate the bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⏱️ Benchmarks

Benchmarks were run on an Intel i7-10710U CPU running at 1.10 GHz with 6 physical / 12 logical cores, using a FASTA file containing 4,489 protein sequences extracted from the genome of Escherichia coli (562.PRJEB4685) and version 33.1 of the Pfam HMM library containing 18,259 domains. Commands were run 3 times on a warm SSD. Solid lines show the times for pressed HMMs, and dashed lines the times for HMMs in text format.

[Figure: Benchmarks]

Raw numbers can be found in the benches folder. They suggest that phmmer should be run with the number of logical cores, while hmmsearch should be run with the number of physical cores (or fewer). A possible explanation is that HMMER's platform-specific code requires too many SIMD registers per thread to benefit from simultaneous multi-threading.
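The benchmark observation above can be turned into a small rule for choosing the `cpus` argument. Below is a minimal sketch of a hypothetical helper, `recommended_cpus`. The standard library only reports logical cores (`os.cpu_count()`), so the caller passes in the physical core count, for example from `psutil.cpu_count(logical=False)`. When that count is unknown, the helper falls back to half the logical cores. This assumes 2-way simultaneous multi-threading and is not something the benchmarks establish.

```python
import os
from typing import Optional


def recommended_cpus(program: str, physical_cores: Optional[int] = None) -> int:
    """Pick a thread count following the benchmark observations.

    phmmer: use all logical cores.
    hmmsearch: use the physical cores (or fewer).
    """
    logical = os.cpu_count() or 1
    if program == "phmmer":
        return logical
    if program == "hmmsearch":
        if physical_cores:
            return physical_cores
        # Assumption: 2-way SMT, so roughly half the logical cores are physical
        return max(1, logical // 2)
    raise ValueError(f"no recommendation for {program!r}")
```

On the benchmark machine (6 physical / 12 logical cores), this gives 12 threads for phmmer and 6 for hmmsearch. For example: `pyhmmer.hmmsearch(hmmfile, sequences, cpus=recommended_cpus("hmmsearch", physical_cores=6))`.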

To read more about how PyHMMER achieves better parallelism than HMMER for many-to-many searches, have a look at the Performance page of the documentation.
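As a rough, simplified illustration of what a different parallelisation strategy can look like (not pyhmmer's actual implementation), the sketch below assigns each whole query to a worker thread. That worker then scans the entire in-memory target set for the query, whereas the target database could instead be split among threads for one query at a time. The sketch assumes this query-level split is the kind of strategy the Performance page describes. The `score` callable and the `shared_prefix` scorer are hypothetical placeholders for a real HMM comparison. With a pure-Python scorer, the GIL prevents any real speedup, so the sketch only shows how the work is partitioned.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def search_per_query(
    queries: Sequence[str],
    targets: Sequence[str],
    score: Callable[[str, str], float],
    cpus: int = 4,
) -> List[Dict[str, float]]:
    """Run each query against all targets, one query per worker task."""

    def run_one(query: str) -> Dict[str, float]:
        # A single worker scans the whole target set for this query
        return {target: score(query, target) for target in targets}

    with ThreadPoolExecutor(max_workers=cpus) as pool:
        # Executor.map returns results in input order, even though tasks run concurrently
        return list(pool.map(run_one, queries))


def shared_prefix(a: str, b: str) -> float:
    """Toy scoring function: length of the common prefix (illustration only)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return float(n)


results = search_per_query(["MKV", "MAL"], ["MKVLA", "MALQ", "GGG"], shared_prefix, cpus=2)
print(results)
```

When the target set is small, splitting it across threads leaves little work per thread. Handing out whole queries keeps every thread busy for longer, which is consistent with the earlier remark about smaller sequence databases.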

🔍 See Also

Building an HMM from scratch? Then you may be interested in the pyfamsa package, which provides bindings to FAMSA, a very fast multiple sequence aligner. You may also want to trim alignments: in that case, consider pytrimal, which wraps trimAl 2.0.

If, despite all the advantages listed earlier, you would rather use HMMER through its CLI, this package will not be of much help. You can instead check out the hmmer-py package developed by Danilo Horta at the EMBL-EBI.

⚖️ License

This library is provided under the MIT License. The HMMER3 and Easel code is available under the BSD 3-clause license. See vendor/hmmer/LICENSE and vendor/easel/LICENSE for more information.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original HMMER authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

Owner

  • Name: Martin Larralde
  • Login: althonos
  • Kind: user
  • Location: Heidelberg, Germany
  • Company: EMBL / LUMC, @zellerlab

PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  PyHMMER: A Python library binding to HMMER for efficient
  sequence analysis
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Martin
    family-names: Larralde
    email: martin.larralde@embl.de
    affiliation: European Molecular Biology Laboratory
    orcid: 'https://orcid.org/0000-0002-3947-4444'
  - given-names: Georg
    family-names: Zeller
    email: zeller@embl.de
    affiliation: European Molecular Biology Laboratory
    orcid: 'https://orcid.org/0000-0003-1429-7485'
identifiers:
  - type: doi
    value: 10.1093/bioinformatics/btad214
    description: Bioinformatics Application Note
repository-code: 'https://github.com/althonos/pyhmmer'
url: 'https://pyhmmer.readthedocs.io'
repository: 'https://git.embl.de/larralde/pyhmmer'
abstract: >-
  PyHMMER provides Python integration of the popular profile
  Hidden Markov Model software HMMER via Cython bindings.
  This allows annotation of protein sequences with profile
  HMMs and building new ones directly with Python. PyHMMER
  increases flexibility of use, allowing creating queries
  directly from Python code, launching searches and
  obtaining results without I/O, or accessing previously
  unavailable statistics like uncorrected p-values. A new
  parallelization model greatly improves performance when
  running multithreaded searches, while producing the exact
  same results as HMMER. 


  PyHMMER supports all modern Python versions (Python 3.6+)
  and similar platforms as HMMER (x86 or PowerPC UNIX
  systems). Pre-compiled packages are released via PyPI
  (https://pypi.org/project/pyhmmer/) and Bioconda
  (https://anaconda.org/bioconda/pyhmmer). The PyHMMER
  source code is available under the terms of the
  open-source MIT licence and hosted on GitHub
  (https://github.com/althonos/pyhmmer); its documentation
  is available on ReadTheDocs
  (https://pyhmmer.readthedocs.io). Supplementary data are
  available at Bioinformatics online.
keywords:
  - bioinformatics
  - hmm
license: MIT
preferred-citation:
  type: article
  authors:
  - given-names: Martin
    family-names: Larralde
    email: martin.larralde@embl.de
    affiliation: European Molecular Biology Laboratory
    orcid: 'https://orcid.org/0000-0002-3947-4444'
  - given-names: Georg
    family-names: Zeller
    email: zeller@embl.de
    affiliation: European Molecular Biology Laboratory
    orcid: 'https://orcid.org/0000-0003-1429-7485'
  doi: 10.1093/bioinformatics/btad214
  journal: "Bioinformatics"
  month: 5
  title: "PyHMMER: a Python library binding to HMMER for efficient sequence analysis"
  issue: 5
  volume: 39
  year: 2023

GitHub Events

Total
  • Create event: 4
  • Release event: 3
  • Issues event: 21
  • Watch event: 20
  • Delete event: 2
  • Issue comment event: 34
  • Push event: 57
  • Pull request event: 2
  • Fork event: 1
Last Year
  • Create event: 4
  • Release event: 3
  • Issues event: 21
  • Watch event: 20
  • Delete event: 2
  • Issue comment event: 34
  • Push event: 57
  • Pull request event: 2
  • Fork event: 1

Committers

Last synced: almost 2 years ago

All Time
  • Total Commits: 1,121
  • Total Committers: 8
  • Avg Commits per committer: 140.125
  • Development Distribution Score (DDS): 0.277
Past Year
  • Commits: 172
  • Committers: 6
  • Avg Commits per committer: 28.667
  • Development Distribution Score (DDS): 0.087
Top Committers
Name                  Email            Commits
Martin Larralde       m****e@e****e    810
Martin Larralde       m****e@e****r    295
Zachary Kurtz         z****z@g****m    6
Zachary Kurtz         z****z@g****m    6
tmsincomb             t****b@g****m    1
Artem                 3****i           1
Humood Alanzi         7****i           1
Valentyn Bezshapkin   6****z           1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 78
  • Total pull requests: 18
  • Average time to close issues: 2 months
  • Average time to close pull requests: 26 days
  • Total issue authors: 52
  • Total pull request authors: 9
  • Average comments per issue: 2.96
  • Average comments per pull request: 2.56
  • Merged pull requests: 16
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 18
  • Pull requests: 4
  • Average time to close issues: 2 months
  • Average time to close pull requests: 11 days
  • Issue authors: 14
  • Pull request authors: 2
  • Average comments per issue: 1.89
  • Average comments per pull request: 0.75
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jpjarnoux (7)
  • apcamargo (4)
  • zdk123 (4)
  • chtsai0105 (4)
  • valentynbez (3)
  • willhuynh11 (2)
  • seanrjohnson (2)
  • arajkovic (2)
  • EvanKomp (2)
  • Sann5 (2)
  • erfanshekarriz (2)
  • vagkaratzas (2)
  • jolespin (2)
  • oschwengers (2)
  • alex-hh (1)
Pull Request Authors
  • althonos (6)
  • zdk123 (3)
  • arajkovic (2)
  • jolespin (2)
  • imgbot[bot] (1)
  • halanzi (1)
  • rtviii (1)
  • valentynbez (1)
  • tmsincomb (1)
Top Labels
Issue Labels
question (26) bug (21) enhancement (11) external (6) building (5) documentation (4) invalid (1) duplicate (1)
Pull Request Labels
bug (3) enhancement (3) documentation (2) building (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 3,695,876 last-month
  • Total docker downloads: 32
  • Total dependent packages: 19
    (may contain duplicates)
  • Total dependent repositories: 6
    (may contain duplicates)
  • Total versions: 69
  • Total maintainers: 2
pypi.org: pyhmmer

Cython bindings and Python interface to HMMER3.

  • Documentation: https://pyhmmer.readthedocs.io/en/stable/
  • License: MIT License Copyright (c) 2020-2025 Martin Larralde <martin.larralde@embl.de> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  • Latest release: 0.11.1
    published 8 months ago
  • Versions: 65
  • Dependent Packages: 18
  • Dependent Repositories: 6
  • Downloads: 3,695,860 Last month
  • Docker Downloads: 32
Rankings
Dependent packages count: 0.8%
Downloads: 2.5%
Docker downloads count: 4.3%
Average: 5.2%
Dependent repos count: 6.0%
Stargazers count: 7.3%
Forks count: 10.5%
Maintainers (1)
Last synced: 4 months ago
pypi.org: pyhmmer-arm

Cython bindings and Python interface to HMMER3.

  • Versions: 1
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 16 Last month
Rankings
Dependent packages count: 7.1%
Stargazers count: 8.3%
Forks count: 12.1%
Average: 14.9%
Dependent repos count: 32.0%
Maintainers (1)
Last synced: 4 months ago
spack.io: py-pyhmmer

HMMER is a biological sequence analysis tool that uses profile hidden Markov models to search for sequence homologs. HMMER3 is developed and maintained by the Eddy/Rivas Laboratory at Harvard University. pyhmmer is a Python package, implemented using the Cython language, that provides bindings to HMMER3.

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
Last synced: 4 months ago

Dependencies

.github/workflows/requirements.txt pypi
  • auditwheel *
  • codecov *
  • coverage *
  • cython *
  • psutil *
  • setuptools >=46.4
  • wheel >=0.35.0
docs/requirements.txt pypi
  • cython *
  • dna_features_viewer *
  • ipykernel *
  • ipython *
  • nbsphinx *
  • pygments *
  • pygments-style-monokailight *
  • recommonmark *
  • semantic_version *
  • setuptools >=46.4
  • sphinx *
  • taxopy *
.github/workflows/package.yml actions
  • KSXGitHub/github-actions-deploy-aur v2.2.5 composite
  • actions/cache v2 composite
  • actions/checkout v1 composite
  • actions/checkout v2 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v2 composite
  • addnab/docker-run-action v2 composite
  • pypa/gh-action-pypi-publish master composite
  • rasmus-saks/release-a-changelog-action v1.0.1 composite
.github/workflows/test.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v1 composite
pyhmmer/tests/requirements.txt pypi
  • numpy * test
setup.py pypi