https://github.com/althonos/mini3di
A NumPy port of the foldseek code for encoding protein structures to 3di.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.4%) to scientific vocabulary
Statistics
- Stars: 68
- Watchers: 2
- Forks: 3
- Open Issues: 0
- Releases: 4
Metadata Files
README.md
🚀 mini3di 
A NumPy port of the foldseek code for encoding structures to 3di.
🗺️ Overview
foldseek is a method developed
by van Kempen et al.[1] for the fast and accurate search of
protein structures. To search protein structures at a large scale,
it first encodes the 3D structure into sequences over a structural alphabet,
3di, which captures tertiary amino acid interactions.
mini3di is a pure-Python package to encode 3D structures of proteins into
the 3di alphabet, using the trained weights from the foldseek VQ-VAE model.
This library only depends on NumPy and is available for all modern Python versions (3.7+).
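The step that turns continuous structural features into discrete 3di states is the vector-quantization part of a VQ-VAE: each encoder output vector is replaced by the index of its nearest codebook vector. The sketch below shows only that nearest-centroid assignment in plain Python. The `quantize` function and the two-dimensional toy codebook are hypothetical illustrations, not mini3di's code and not the trained foldseek weights.

```python
import math
from typing import Sequence


def quantize(embedding: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector closest to `embedding`.

    Toy illustration of VQ-VAE quantization (Euclidean nearest neighbour);
    the codebook used below is made up, not foldseek's trained centroids.
    """
    best_index, best_dist = -1, math.inf
    for index, centroid in enumerate(codebook):
        dist = math.dist(embedding, centroid)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


# Hypothetical 2D codebook with three discrete states.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
print([quantize(e, codebook) for e in [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]])  # → [0, 1, 2]
```

In the real model the embeddings are higher-dimensional and the codebook has one centroid per 3di state, but the assignment rule is the same argmin over distances.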
🔧 Installing
Install the mini3di package directly from PyPI,
which hosts universal wheels that can be installed with pip:
```console
$ pip install mini3di
```
💡 Example
mini3di provides a single Encoder class, which expects the 3D coordinates
of the Cα, Cβ, N and C atoms from each peptide residue. For
residues without Cβ (Gly), simply write the coordinates as math.nan.
Call the encode_atoms method to get a sequence of 3di states:
```python
from math import nan
import mini3di

encoder = mini3di.Encoder()
states = encoder.encode_atoms(
    ca=[[32.9, 51.9, 28.8], [35.0, 51.9, 26.6], ...],
    cb=[[ nan,  nan,  nan], [35.3, 53.3, 26.4], ...],
    n=[ [32.1, 51.2, 29.8], [35.3, 51.5, 28.1], ...],
    c=[ [34.4, 51.7, 29.1], [36.1, 51.1, 25.8], ...],
)
```
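If your coordinates come from another source, the four lists can be assembled residue by residue, filling the Cβ slot with `nan` when it is missing (as for glycine). The sketch below assumes a hypothetical input format, a list of dicts mapping atom names to `(x, y, z)` tuples; `atoms_to_lists` is an illustrative helper, not part of mini3di.

```python
from math import nan
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def atoms_to_lists(residues: Sequence[Dict[str, Coord]]) -> Dict[str, List[List[float]]]:
    """Split per-residue atom dicts into ca/cb/n/c coordinate lists.

    Hypothetical helper: a missing CB atom (e.g. glycine) becomes [nan, nan, nan].
    """
    lists: Dict[str, List[List[float]]] = {"ca": [], "cb": [], "n": [], "c": []}
    for residue in residues:
        lists["ca"].append(list(residue["CA"]))
        lists["n"].append(list(residue["N"]))
        lists["c"].append(list(residue["C"]))
        # Glycine has no side-chain CB: write NaN coordinates instead.
        lists["cb"].append(list(residue.get("CB", (nan, nan, nan))))
    return lists


residues = [
    {"CA": (32.9, 51.9, 28.8), "N": (32.1, 51.2, 29.8), "C": (34.4, 51.7, 29.1)},  # Gly
    {"CA": (35.0, 51.9, 26.6), "CB": (35.3, 53.3, 26.4),
     "N": (35.3, 51.5, 28.1), "C": (36.1, 51.1, 25.8)},
]
coords = atoms_to_lists(residues)
# With mini3di installed, the lists map onto the keyword arguments shown above:
# states = mini3di.Encoder().encode_atoms(**coords)
```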
The states returned as output will be a NumPy array of state indices. To turn
it into a sequence, use the build_sequence method of the encoder:
```python
sequence = encoder.build_sequence(states)
print(sequence)
```
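Conceptually, turning states into a sequence is a lookup from each integer state index to a one-letter symbol. The sketch below illustrates that kind of lookup with a hypothetical alphabet string; the letter ordering is an assumption made for illustration only and may not match the real 3di assignment, so use `encoder.build_sequence` to get the actual mapping.

```python
from typing import Iterable

# Assumed ordering, for illustration only: 20 states mapped onto the 20
# amino-acid letters. The real 3di letter assignment may differ.
HYPOTHETICAL_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def states_to_letters(states: Iterable[int], alphabet: str = HYPOTHETICAL_ALPHABET) -> str:
    """Map each integer state index to the letter at that position."""
    # int() also accepts NumPy integer scalars, such as items of a states array.
    return "".join(alphabet[int(state)] for state in states)


print(states_to_letters([0, 2, 19]))  # → ADY
```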
The encoder can work directly with Biopython objects, if Biopython is available.
A helper method encode_chain is provided to extract the atom coordinates from
a Bio.PDB.Chain
and encode them directly. For instance, to encode all the chains from a
PDB file:
```python
import pathlib

import mini3di
from Bio.PDB import PDBParser

encoder = mini3di.Encoder()
parser = PDBParser(QUIET=True)
struct = parser.get_structure("8crb", pathlib.Path("tests", "data", "8crb.pdb"))

for chain in struct.get_chains():
    states = encoder.encode_chain(chain)
    sequence = encoder.build_sequence(states)
    print(chain.get_id(), sequence)
```
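To encode only part of a chain, or to inspect the coordinates before encoding, the atoms can also be pulled out of a `Bio.PDB.Chain` by hand and passed to `encode_atoms`. The helper below, `extract_backbone`, is a hypothetical sketch rather than how `encode_chain` is implemented: it assumes Biopython's residue interface (`"CB" in residue`, `residue["CA"].get_coord()`) and, as an arbitrary choice, simply skips residues missing any of Cα, N or C.

```python
from math import nan
from typing import Dict, List


def extract_backbone(chain) -> Dict[str, List[List[float]]]:
    """Collect CA/CB/N/C coordinates from a Bio.PDB-style chain.

    Hypothetical helper, not part of mini3di: residues lacking CA, N or C
    are skipped, and a missing CB (glycine) is written as NaN.
    """
    coords: Dict[str, List[List[float]]] = {"ca": [], "cb": [], "n": [], "c": []}
    for residue in chain:
        if not all(name in residue for name in ("CA", "N", "C")):
            continue  # e.g. waters or incomplete residues
        for key, name in (("ca", "CA"), ("n", "N"), ("c", "C")):
            coords[key].append([float(x) for x in residue[name].get_coord()])
        if "CB" in residue:
            coords["cb"].append([float(x) for x in residue["CB"].get_coord()])
        else:
            coords["cb"].append([nan, nan, nan])
    return coords


# Usage with the objects from the example above (requires Biopython and mini3di):
# coords = extract_backbone(chain)
# states = encoder.encode_atoms(**coords)
```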
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
🏗️ Contributing
Contributions are more than welcome! See
CONTRIBUTING.md
for more details.
📋 Changelog
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
⚖️ License
This library is provided under the BSD 3-clause license.
It includes some code ported from foldseek, which is licensed under the
GNU General Public License v3.0,
and relicensed with the permission of the authors.
This project is in no way affiliated with, sponsored by, or otherwise endorsed
by the original foldseek authors.
It was developed by Martin Larralde during his
PhD project at the European Molecular Biology Laboratory
in the Zeller team.
📚 References
- [1] Kempen, Michel van, Stephanie S. Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron L. M. Gilchrist, Johannes Söding, and Martin Steinegger. ‘Fast and Accurate Protein Structure Search with Foldseek’. Nature Biotechnology, 8 May 2023, 1–4. doi:10.1038/s41587-023-01773-0.
Owner
- Name: Martin Larralde
- Login: althonos
- Kind: user
- Location: Heidelberg, Germany
- Company: EMBL / LUMC, @zellerlab
- Twitter: althonos
- Repositories: 91
- Profile: https://github.com/althonos
PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.
GitHub Events
Total
- Issues event: 1
- Watch event: 39
- Issue comment event: 1
- Push event: 10
- Fork event: 1
Last Year
- Issues event: 1
- Watch event: 39
- Issue comment event: 1
- Push event: 10
- Fork event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Martin Larralde | m****e@e****e | 50 |
| Patrick Kunzmann | p****y@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 2
- Total pull requests: 1
- Average time to close issues: 2 months
- Average time to close pull requests: about 14 hours
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 4.0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: about 14 hours
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- kdinashi (1)
- genomeboy (1)
- johnlees (1)
Pull Request Authors
- padix-key (2)
Packages
- Total packages: 1
- Total downloads (PyPI): 415 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 4
- Total maintainers: 1
pypi.org: mini3di
A NumPy port of the foldseek code for encoding structures to 3di.
- Homepage: https://github.com/althonos/mini3di
- Documentation: https://mini3di.readthedocs.io/
- License: BSD-3-Clause
- Latest release: 0.2.1 (published over 1 year ago)
Dependencies
- actions/checkout v2 composite
- actions/checkout v1 composite
- actions/download-artifact v2 composite
- actions/setup-python v2 composite
- actions/upload-artifact v2 composite
- codecov/codecov-action v3 composite
- pypa/gh-action-pypi-publish master composite
- rasmus-saks/release-a-changelog-action v1.0.1 composite
- build *
- coverage *
- importlib-resources >=1.3
- setuptools >=46.4
- biopython * test
- numpy * test