Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 6 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 8 committers (12.5%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.1%) to scientific vocabulary
Repository
Cython bindings and Python interface to HMMER3.
Basic Info
- Host: GitHub
- Owner: althonos
- License: mit
- Language: Cython
- Default Branch: master
- Homepage: https://pyhmmer.readthedocs.io
- Size: 10.6 MB
Statistics
- Stars: 147
- Watchers: 9
- Forks: 13
- Open Issues: 19
- Releases: 59
Metadata Files
README.md
🐍🟡♦️🟦 PyHMMER 
Cython bindings and Python interface to HMMER3.
🗺️ Overview
HMMER is a biological sequence analysis tool that uses profile hidden Markov models to search for sequence homologs. HMMER3 is developed and maintained by the Eddy/Rivas Laboratory at Harvard University.
pyhmmer is a Python package, implemented using the Cython
language, that provides bindings to HMMER3. It directly interacts with the
HMMER internals, which has the following advantages over CLI wrappers
(like hmmer-py):
- single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pyhmmer as a dependency to your project, and stop worrying about the HMMER binaries being properly set up on the end-user machine.
- no intermediate files: Everything happens in memory, in Python objects you have control over, making it easier to pass your inputs to HMMER without needing to write them to a temporary file. Output retrieval is also done in memory, via instances of the pyhmmer.plan7.TopHits class.
- no input formatting: The Easel object model is exposed in the pyhmmer.easel module, and you have the possibility to build a DigitalSequence object yourself to pass to the HMMER pipeline. This is useful if your sequences are already loaded in memory, for instance because you obtained them from another Python library (such as Pyrodigal or Biopython).
- no output parsing: HMMER3 is notorious for its numerous output files and its fixed-width tabular output, which is hard to parse (even Bio.SearchIO.HmmerIO struggles on some sequences).
- efficient: Using pyhmmer to launch hmmsearch on sequences and HMMs in disk storage is typically as fast as directly using the hmmsearch binary (see the Benchmarks section). pyhmmer.hmmer.hmmsearch uses a different parallelisation strategy compared to the hmmsearch binary from HMMER, which can help get the most out of multiple CPUs when annotating smaller sequence databases.
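HMMER works on digitized sequences, in which each residue is stored as a small integer index into an alphabet rather than as a letter; this is the kind of data a DigitalSequence holds. The toy, pure-Python sketch below only illustrates that conversion. The AMINO_SYMBOLS ordering is an assumption meant to mirror Easel's amino-acid alphabet (20 canonical residues, then gap, degenerate and special symbols), and digitize/textize are hypothetical helpers; with pyhmmer you would let pyhmmer.easel perform the conversion for you.

```python
from __future__ import annotations

# Assumed to mirror Easel's amino alphabet: 20 canonical residues, a gap,
# degenerate symbols (B, J, Z, O, U, X), then '*' and '~'.
AMINO_SYMBOLS = "ACDEFGHIKLMNPQRSTVWY-BJZOUX*~"


def digitize(sequence: str, symbols: str = AMINO_SYMBOLS) -> list[int]:
    """Map each residue letter to its integer index in `symbols`."""
    index = {symbol: code for code, symbol in enumerate(symbols)}
    try:
        return [index[residue] for residue in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"invalid residue {err.args[0]!r}") from None


def textize(codes: list[int], symbols: str = AMINO_SYMBOLS) -> str:
    """Inverse of `digitize`: turn integer codes back into residue letters."""
    return "".join(symbols[code] for code in codes)
```

For example, digitize("MKV") gives [10, 8, 17], and textize turns those codes back into "MKV".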
This library is still a work in progress. It follows semantic versioning,
so API changes will be documented, but since v0.10 the API has been mostly
stable. It should already pack enough features to run biological analyses
or workflows involving hmmsearch, hmmscan, nhmmer, phmmer, hmmbuild
and hmmalign.
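Because the project follows semantic versioning and the API has settled since v0.10, a downstream package may want to check which release is installed. A minimal sketch using the standard library's importlib.metadata follows; parse_version and has_stable_api are hypothetical helpers that only handle plain numeric releases such as 0.11.1 (not pre-release tags), and the 0.10 threshold is taken from the paragraph above.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain release string like '0.11.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def has_stable_api(text: str, minimum: str = "0.10") -> bool:
    """True if the release `text` is at or past `minimum`."""
    return parse_version(text) >= parse_version(minimum)


def installed_pyhmmer_version() -> str | None:
    """Installed pyhmmer version string, or None if it is not installed."""
    try:
        return version("pyhmmer")
    except PackageNotFoundError:
        return None
```

Tuple comparison makes (0, 10, 15) >= (0, 10) true and (0, 7, 5) >= (0, 10) false, so has_stable_api("0.7.5") returns False.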
🔧 Installing
pyhmmer can be installed from PyPI,
which hosts some pre-built CPython wheels for Linux and MacOS on x86-64 and Arm64, as well as the code required to compile from source with Cython:
```console
$ pip install pyhmmer
```
Compilation for UNIX PowerPC is not tested in CI, but should work out of the box. Note that non-UNIX operating systems (such as Windows) are not supported by HMMER.
A Bioconda package is also available:
```console
$ conda install -c bioconda pyhmmer
```
🔖 Citation
PyHMMER is scientific software, with a published paper in Bioinformatics. Please cite both PyHMMER and HMMER if you are using it in an academic work, for instance as:
PyHMMER (Larralde et al., 2023), a Python library binding to HMMER (Eddy, 2011).
Detailed references are available on the Publications page of the online documentation.
📖 Documentation
A complete API reference can
be found in the online documentation, or
directly from the command line using
pydoc:
```console
$ pydoc pyhmmer.easel
$ pydoc pyhmmer.plan7
```
💡 Example
Use pyhmmer to run hmmsearch to search for Type 2 PKS domains
(t2pks.hmm)
inside proteins extracted from the genome of Anaerococcus provencensis
(938293.PRJEB85.HG003687.faa).
This will produce an iterable over
TopHits that can be used for further sorting/querying in Python.
Processing happens in parallel using Python threads, and a TopHits
object is yielded for every HMM passed in the input iterable.
```python
import pyhmmer

# Load all target protein sequences into memory, in digital mode.
with pyhmmer.easel.SequenceFile("pyhmmer/tests/data/seqs/938293.PRJEB85.HG003687.faa", digital=True) as seqfile:
    sequences = seqfile.read_block()

# Search every HMM in the file against the in-memory sequences, using 4 threads.
with pyhmmer.plan7.HMMFile("pyhmmer/tests/data/hmms/txt/t2pks.hmm") as hmmfile:
    for hits in pyhmmer.hmmsearch(hmmfile, sequences, cpus=4):
        print(f"HMM {hits.query.name.decode()} found {len(hits)} hits in the target sequences")
```
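The TopHits objects yielded above can be filtered and sorted with ordinary Python. The sketch below uses a hypothetical Hit dataclass with name, score and evalue fields as a stand-in, so that it runs without pyhmmer or the test files; the real hit objects expose comparable information, but check the API reference for exact attribute names. The hit values in the demo are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for one hit: target name, bit score, E-value."""
    name: str
    score: float
    evalue: float


def best_hits(hits: list[Hit], max_evalue: float = 1e-5, top: int = 3) -> list[Hit]:
    """Keep hits at or below an E-value cutoff, highest bit score first."""
    significant = [hit for hit in hits if hit.evalue <= max_evalue]
    significant.sort(key=lambda hit: hit.score, reverse=True)
    return significant[:top]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    demo = [
        Hit("seqA", 120.5, 1e-35),
        Hit("seqB", 30.2, 2e-6),
        Hit("seqC", 12.0, 0.5),
        Hit("seqD", 85.0, 1e-24),
    ]
    print([hit.name for hit in best_hits(demo)])  # ['seqA', 'seqD', 'seqB']
```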
Have a look at more in-depth examples such as building a HMM from an alignment, analysing the active site of a hit, or fetching marker genes from a genome in the Examples page of the online documentation.
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the bug in a minimal, easily reproducible example.
🏗️ Contributing
Contributions are more than welcome! See CONTRIBUTING.md for more details.
⏱️ Benchmarks
Benchmarks were run on an i7-10710U CPU running at 1.10 GHz with 6 physical / 12
logical cores, using a FASTA file containing 4,489 protein sequences extracted
from the genome of Escherichia coli
(562.PRJEB4685)
and the version 33.1 of the Pfam HMM library containing
18,259 domains. Commands were run 3 times on a warm SSD. Plain lines show
the times for pressed HMMs, and dashed lines the times for HMMs in text format.
Raw numbers can be found in the benches folder.
These results suggest that phmmer should be run with the number of logical cores,
while hmmsearch should be run with the number of physical cores (or fewer).
One possible explanation is that HMMER's
platform-specific code requires too many SIMD
registers per thread to benefit from simultaneous multithreading.
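The core-count advice above can be turned into a small helper for choosing the cpus argument. This is a sketch under stated assumptions: suggested_cpus and physical_cores are hypothetical helpers, physical cores are counted with psutil (which appears in the project's dependency list) when it is installed, and without psutil the helper falls back to the logical count, which over-subscribes hmmsearch on machines with simultaneous multithreading.

```python
from __future__ import annotations

import os


def physical_cores() -> int | None:
    """Physical core count via psutil if it is installed, else None."""
    try:
        import psutil  # optional third-party package
    except ImportError:
        return None
    return psutil.cpu_count(logical=False)  # may itself return None


def suggested_cpus(tool: str) -> int:
    """Thread count following the benchmark advice for a given tool."""
    logical = os.cpu_count() or 1
    # Without a physical count, fall back to the logical one; never exceed it.
    physical = min(physical_cores() or logical, logical)
    if tool == "phmmer":
        return logical
    if tool == "hmmsearch":
        return physical
    raise ValueError(f"no recommendation for {tool!r}")
```

The result can then be passed along as, for instance, pyhmmer.hmmsearch(hmmfile, sequences, cpus=suggested_cpus("hmmsearch")).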
To read more about how PyHMMER achieves better parallelism than HMMER for many-to-many searches, have a look at the Performance page of the documentation.
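The Performance page is the authoritative description of PyHMMER's strategy. As a rough, hypothetical illustration of one way to parallelise a many-to-many search, the sketch below gives each worker thread a whole query and shares the in-memory targets between threads, rather than splitting the targets of a single query. search_one is a made-up substring "search" standing in for an HMM search; because it is pure Python it will not actually run faster under the GIL, so only the scheduling pattern is illustrated.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator


def search_one(query: str, targets: list[str]) -> list[str]:
    """Hypothetical toy 'search': the targets that contain `query`."""
    return [target for target in targets if query in target]


def search_all(
    queries: Iterable[str], targets: list[str], cpus: int = 4
) -> Iterator[tuple[str, list[str]]]:
    """Search every query against the shared targets, one query per task.

    Results are yielded in the same order as `queries`, one result set
    per input query.
    """
    with ThreadPoolExecutor(max_workers=cpus) as pool:
        # Each task handles a whole query; `targets` is shared, not copied.
        # Executor.map yields results in submission order.
        yield from pool.map(lambda query: (query, search_one(query, targets)), queries)
```

Because Executor.map preserves input order, the caller sees one result per query in a predictable sequence even when threads finish out of order.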
🔍 See Also
Building a HMM from scratch? Then you may be interested in the pyfamsa
package, providing bindings to FAMSA,
a very fast multiple sequence aligner. In addition, you may want to trim alignments:
in that case, consider pytrimal, which
wraps trimAl 2.0.
If, despite all the advantages listed earlier, you would rather use HMMER
through its CLI, this package will not be of great help. You can instead check
the hmmer-py package developed
by Danilo Horta at the EMBL-EBI.
⚖️ License
This library is provided under the MIT License.
The HMMER3 and Easel code is available under the
BSD 3-clause license.
See vendor/hmmer/LICENSE and vendor/easel/LICENSE for more information.
This project is in no way affiliated with, sponsored, or otherwise endorsed by the original HMMER authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
Owner
- Name: Martin Larralde
- Login: althonos
- Kind: user
- Location: Heidelberg, Germany
- Company: EMBL / LUMC, @zellerlab
- Twitter: althonos
- Repositories: 91
- Profile: https://github.com/althonos
PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.
Citation (CITATION.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  PyHMMER: A Python library binding to HMMER for efficient
  sequence analysis
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Martin
    family-names: Larralde
    email: martin.larralde@embl.de
    affiliation: European Molecular Biology Laboratory
    orcid: 'https://orcid.org/0000-0002-3947-4444'
  - given-names: Georg
    family-names: Zeller
    email: zeller@embl.de
    affiliation: European Molecular Biology Laboratory
    orcid: 'https://orcid.org/0000-0003-1429-7485'
identifiers:
  - type: doi
    value: 10.1093/bioinformatics/btad214
    description: Bioinformatics Application Note
repository-code: 'https://github.com/althonos/pyhmmer'
url: 'https://pyhmmer.readthedocs.io'
repository: 'https://git.embl.de/larralde/pyhmmer'
abstract: >-
  PyHMMER provides Python integration of the popular profile
  Hidden Markov Model software HMMER via Cython bindings.
  This allows annotation of protein sequences with profile
  HMMs and building new ones directly with Python. PyHMMER
  increases flexibility of use, allowing creating queries
  directly from Python code, launching searches and
  obtaining results without I/O, or accessing previously
  unavailable statistics like uncorrected p-values. A new
  parallelization model greatly improves performance when
  running multithreaded searches, while producing the exact
  same results as HMMER.
  PyHMMER supports all modern Python versions (Python 3.6+)
  and similar platforms as HMMER (x86 or PowerPC UNIX
  systems). Pre-compiled packages are released via PyPI
  (https://pypi.org/project/pyhmmer/) and Bioconda
  (https://anaconda.org/bioconda/pyhmmer). The PyHMMER
  source code is available under the terms of the
  open-source MIT licence and hosted on GitHub
  (https://github.com/althonos/pyhmmer); its documentation
  is available on ReadTheDocs
  (https://pyhmmer.readthedocs.io). Supplementary data are
  available at Bioinformatics online.
keywords:
  - bioinformatics
  - hmm
license: MIT
preferred-citation:
  type: article
  authors:
    - given-names: Martin
      family-names: Larralde
      email: martin.larralde@embl.de
      affiliation: European Molecular Biology Laboratory
      orcid: 'https://orcid.org/0000-0002-3947-4444'
    - given-names: Georg
      family-names: Zeller
      email: zeller@embl.de
      affiliation: European Molecular Biology Laboratory
      orcid: 'https://orcid.org/0000-0003-1429-7485'
  doi: 10.1093/bioinformatics/btad214
  journal: "Bioinformatics"
  month: 5
  title: "PyHMMER: a Python library binding to HMMER for efficient sequence analysis"
  issue: 5
  volume: 39
  year: 2023
```
GitHub Events
Total
- Create event: 4
- Release event: 3
- Issues event: 21
- Watch event: 20
- Delete event: 2
- Issue comment event: 34
- Push event: 57
- Pull request event: 2
- Fork event: 1
Last Year
- Create event: 4
- Release event: 3
- Issues event: 21
- Watch event: 20
- Delete event: 2
- Issue comment event: 34
- Push event: 57
- Pull request event: 2
- Fork event: 1
Committers
Last synced: almost 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Martin Larralde | m****e@e****e | 810 |
| Martin Larralde | m****e@e****r | 295 |
| Zachary Kurtz | z****z@g****m | 6 |
| Zachary Kurtz | z****z@g****m | 6 |
| tmsincomb | t****b@g****m | 1 |
| Artem | 3****i | 1 |
| Humood Alanzi | 7****i | 1 |
| Valentyn Bezshapkin | 6****z | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 78
- Total pull requests: 18
- Average time to close issues: 2 months
- Average time to close pull requests: 26 days
- Total issue authors: 52
- Total pull request authors: 9
- Average comments per issue: 2.96
- Average comments per pull request: 2.56
- Merged pull requests: 16
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 18
- Pull requests: 4
- Average time to close issues: 2 months
- Average time to close pull requests: 11 days
- Issue authors: 14
- Pull request authors: 2
- Average comments per issue: 1.89
- Average comments per pull request: 0.75
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jpjarnoux (7)
- apcamargo (4)
- zdk123 (4)
- chtsai0105 (4)
- valentynbez (3)
- willhuynh11 (2)
- seanrjohnson (2)
- arajkovic (2)
- EvanKomp (2)
- Sann5 (2)
- erfanshekarriz (2)
- vagkaratzas (2)
- jolespin (2)
- oschwengers (2)
- alex-hh (1)
Pull Request Authors
- althonos (6)
- zdk123 (3)
- arajkovic (2)
- jolespin (2)
- imgbot[bot] (1)
- halanzi (1)
- rtviii (1)
- valentynbez (1)
- tmsincomb (1)
Packages
- Total packages: 3
- Total downloads:
  - pypi: 3,695,876 last month
- Total docker downloads: 32
- Total dependent packages: 19 (may contain duplicates)
- Total dependent repositories: 6 (may contain duplicates)
- Total versions: 69
- Total maintainers: 2
pypi.org: pyhmmer
Cython bindings and Python interface to HMMER3.
- Documentation: https://pyhmmer.readthedocs.io/en/stable/
- License: MIT License Copyright (c) 2020-2025 Martin Larralde <martin.larralde@embl.de> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Latest release: 0.11.1, published 8 months ago
Maintainers (1)
pypi.org: pyhmmer-arm
Cython bindings and Python interface to HMMER3.
- Homepage: https://github.com/althonos/pyhmmer
- Documentation: https://pyhmmer.readthedocs.io/en/stable/
- License: MIT
- Latest release: 0.7.5, published over 2 years ago
Maintainers (1)
spack.io: py-pyhmmer
HMMER is a biological sequence analysis tool that uses profile hidden Markov models to search for sequence homologs. HMMER3 is developed and maintained by the Eddy/Rivas Laboratory at Harvard University. pyhmmer is a Python package, implemented using the Cython language, that provides bindings to HMMER3.
- Homepage: https://github.com/althonos/pyhmmer
- License: []
- Latest release: 0.10.15, published about 1 year ago
Dependencies
- auditwheel *
- codecov *
- coverage *
- cython *
- psutil *
- setuptools >=46.4
- wheel >=0.35.0
- cython *
- dna_features_viewer *
- ipykernel *
- ipython *
- nbsphinx *
- pygments *
- pygments-style-monokailight *
- recommonmark *
- semantic_version *
- setuptools >=46.4
- sphinx *
- taxopy *
- KSXGitHub/github-actions-deploy-aur v2.2.5 composite
- actions/cache v2 composite
- actions/checkout v1 composite
- actions/checkout v2 composite
- actions/download-artifact v2 composite
- actions/setup-python v2 composite
- actions/upload-artifact v2 composite
- addnab/docker-run-action v2 composite
- pypa/gh-action-pypi-publish master composite
- rasmus-saks/release-a-changelog-action v1.0.1 composite
- actions/cache v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- codecov/codecov-action v1 composite
- numpy * test