https://github.com/althonos/lightmotif
A lightweight platform-accelerated library for biological motif scanning using position weight matrices.
Science Score: 59.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 18 DOI reference(s) in README
- ✓ Academic publication links: Links to ncbi.nlm.nih.gov
- ✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.4%) to scientific vocabulary
Statistics
- Stars: 53
- Watchers: 5
- Forks: 2
- Open Issues: 2
- Releases: 15
README.md
🎼🧬 lightmotif 
A lightweight platform-accelerated library for biological motif scanning using position weight matrices.
🗺️ Overview
Motif scanning with position weight matrices (also known as position-specific scoring matrices) is a robust method for identifying motifs of fixed length inside a biological sequence. It can be used to identify transcription factor binding sites in DNA, or protease cleavage sites in polypeptides. Position weight matrices are often viewed as sequence logos.
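To make the idea concrete, here is a minimal standard-library sketch of motif scanning. It is not the lightmotif API: the function names (`encode`, `build_pssm`, `scan`, `argmax`), the example sites, and the choice of a log2 odds ratio with a flat pseudocount are all assumptions made for illustration.

```rust
/// Map a nucleotide to a row index (A=0, C=1, G=2, T=3).
fn encode(c: u8) -> Option<usize> {
    match c.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Build a log2-odds matrix (one `[f32; 4]` per motif column) from aligned sites.
/// Assumption: the pseudocount is added to every cell before normalization.
fn build_pssm(sites: &[&str], pseudocount: f32, background: [f32; 4]) -> Vec<[f32; 4]> {
    let width = sites.first().map_or(0, |s| s.len());
    let mut counts = vec![[0.0f32; 4]; width];
    for site in sites {
        assert_eq!(site.len(), width, "all sites must have the same length");
        for (j, c) in site.bytes().enumerate() {
            counts[j][encode(c).expect("unexpected symbol")] += 1.0;
        }
    }
    counts
        .iter()
        .map(|col| {
            let total = col.iter().sum::<f32>() + 4.0 * pseudocount;
            let mut odds = [0.0f32; 4];
            for k in 0..4 {
                odds[k] = ((col[k] + pseudocount) / total / background[k]).log2();
            }
            odds
        })
        .collect()
}

/// Score every window of `seq` whose length equals the motif width.
fn scan(pssm: &[[f32; 4]], seq: &str) -> Vec<f32> {
    let encoded: Vec<usize> = seq
        .bytes()
        .map(|c| encode(c).expect("unexpected symbol"))
        .collect();
    if pssm.is_empty() || encoded.len() < pssm.len() {
        return Vec::new();
    }
    encoded
        .windows(pssm.len())
        .map(|w| w.iter().zip(pssm).map(|(&s, col)| col[s]).sum::<f32>())
        .collect()
}

/// Index of the highest score, if any.
fn argmax(scores: &[f32]) -> Option<usize> {
    scores
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

fn main() {
    // Hypothetical aligned binding sites, chosen only for this example.
    let pssm = build_pssm(&["TATAAT", "TATGAT", "TACAAT"], 0.1, [0.25; 4]);
    let scores = scan(&pssm, "GGCGTATAATGCC");
    println!("best position: {:?}", argmax(&scores));
}
```

Each window costs one table look-up per motif column, which is the baseline that an accelerated implementation improves on.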
The lightmotif library provides a Rust crate to run very efficient
searches for a motif encoded in a position weight matrix. The position
scanning combines several techniques to allow high-throughput processing
of sequences:
- Compile-time definition of alphabets and matrix dimensions.
- Sequence symbol encoding for fast table look-ups, as implemented in HMMER[1] or MEME[2]
- Striped sequence matrices to process several positions in parallel, inspired by Michael Farrar[3].
- Vectorized matrix row look-up using `permute` instructions of AVX2.
- High-throughput Gibbs sampler[4] implementation in oops and zoops modes, featuring deterministic results using randomness from the `rand` crate.
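The striped layout mentioned in the list can be sketched with plain arrays. The code below is an illustrative assumption, not the crate's internal representation. It uses a hypothetical width of 4 lanes, a padding symbol `PAD`, and extra wrap rows so that a window starting near the end of one lane can read into the start of the next. The innermost loop over lanes stands in for what a single SIMD instruction would do.

```rust
const LANES: usize = 4; // positions scored together (illustrative; real vectors are wider)
const PAD: u8 = 4; // extra symbol used past the end of the sequence

/// Lay out an encoded sequence so that row `i`, lane `j` holds position `j * rows + i`.
/// `motif_len - 1` wrap rows are appended so windows can cross lane boundaries.
fn stripe(encoded: &[u8], motif_len: usize) -> (usize, Vec<[u8; LANES]>) {
    let rows = (encoded.len() + LANES - 1) / LANES;
    let mut data = vec![[PAD; LANES]; rows + motif_len.saturating_sub(1)];
    for (i, row) in data.iter_mut().enumerate() {
        for (j, cell) in row.iter_mut().enumerate() {
            if let Some(&s) = encoded.get(j * rows + i) {
                *cell = s;
            }
        }
    }
    (rows, data)
}

/// Score all lanes of each row at once; each PSSM column has 5 entries (A, C, G, T, PAD).
fn score_striped(pssm: &[[f32; 5]], rows: usize, data: &[[u8; LANES]]) -> Vec<[f32; LANES]> {
    let mut out = vec![[0.0f32; LANES]; rows];
    for i in 0..rows {
        for (k, col) in pssm.iter().enumerate() {
            let symbols = data[i + k];
            for j in 0..LANES {
                out[i][j] += col[symbols[j] as usize];
            }
        }
    }
    out
}

/// Convert striped scores back to sequence order.
fn unstripe(scores: &[[f32; LANES]], rows: usize, n_positions: usize) -> Vec<f32> {
    (0..n_positions).map(|p| scores[p % rows][p / rows]).collect()
}

/// Reference implementation: plain sliding window.
fn score_naive(pssm: &[[f32; 5]], encoded: &[u8]) -> Vec<f32> {
    encoded
        .windows(pssm.len())
        .map(|w| w.iter().zip(pssm).map(|(&s, col)| col[s as usize]).sum::<f32>())
        .collect()
}

fn main() {
    // A=0, C=1, G=2, T=3; a made-up 3-column scoring matrix.
    let encoded: Vec<u8> = b"ACGTTGCAACGTAG"
        .iter()
        .map(|c| match c { b'A' => 0, b'C' => 1, b'G' => 2, _ => 3 })
        .collect();
    let pssm = vec![
        [1.0, -1.0, 0.0, 2.0, -100.0],
        [0.0, 2.0, -2.0, 1.0, -100.0],
        [-1.0, 0.0, 3.0, -3.0, -100.0],
    ];
    let (rows, data) = stripe(&encoded, pssm.len());
    let striped = score_striped(&pssm, rows, &data);
    let n_positions = encoded.len() + 1 - pssm.len();
    assert_eq!(unstripe(&striped, rows, n_positions), score_naive(&pssm, &encoded));
    println!("striped and naive scores agree on {n_positions} positions");
}
```

Because consecutive rows hold consecutive positions within every lane, moving the window by one position only means reading the next row, with no shuffling across lanes.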
Other crates from the ecosystem provide additional features if needed:
- `lightmotif-io` is a crate with parser implementations for various count matrix, frequency matrix and position-specific scoring matrix formats such as TRANSFAC or JASPAR.
- `lightmotif-tfmpvalue` is an exact reimplementation of the TFM-PVALUE[5] algorithm for converting between a score and a p-value for a given scoring matrix.

This is the Rust version; a Python package is available as well.
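For readers unfamiliar with the score-to-p-value conversion, the quantity involved can be sketched by brute force: the p-value of a threshold is the probability that a random word drawn from the background scores at least that threshold. The function below is a hypothetical illustration of that definition only. It enumerates all 4^n words, which is exactly the cost that TFM-PVALUE is designed to avoid, and it is not the algorithm implemented in `lightmotif-tfmpvalue`.

```rust
/// Probability that a word drawn from `background` scores at least `threshold`.
/// Brute force over all 4^n words; only usable for very short motifs.
fn pvalue_bruteforce(pssm: &[[f64; 4]], background: [f64; 4], threshold: f64) -> f64 {
    fn visit(
        pssm: &[[f64; 4]],
        background: &[f64; 4],
        threshold: f64,
        column: usize,
        score: f64,
        prob: f64,
    ) -> f64 {
        if column == pssm.len() {
            // A complete word: it contributes its probability if it passes the threshold.
            return if score >= threshold { prob } else { 0.0 };
        }
        (0..4)
            .map(|s| {
                visit(
                    pssm,
                    background,
                    threshold,
                    column + 1,
                    score + pssm[column][s],
                    prob * background[s],
                )
            })
            .sum::<f64>()
    }
    visit(pssm, &background, threshold, 0, 0.0, 1.0)
}

fn main() {
    // Made-up two-column matrix: an A scores 1, anything else scores 0.
    let pssm = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]];
    let uniform = [0.25; 4];
    for threshold in [0.0, 1.0, 2.0] {
        println!(
            "P(score >= {threshold}) = {}",
            pvalue_bruteforce(&pssm, uniform, threshold)
        );
    }
}
```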
💡 Example
```rust
use lightmotif::*;
use lightmotif::abc::Nucleotide;

// Create a count matrix from an iterable of motif sequences
let counts = CountMatrix::

// Create a PSSM with 0.1 pseudocounts and uniform background frequencies.
let pssm = counts.to_freq(0.1).to_scoring(None);

// Use the pipeline to encode the target sequence into a striped matrix
let seq = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG";
let encoded = EncodedSequence::encode(seq).unwrap();
let mut striped = encoded.to_striped();

// Organize layout of striped matrix to allow scoring with PSSM.
striped.configure(&pssm);

// Compute scores for every position of the matrix.
let scores = pssm.score(&striped);

// Scores can be extracted into a Vec

// Find the highest scoring position.
let best = scores.argmax().unwrap();
assert_eq!(best, 18);

// Find the positions above an absolute score threshold.
let indices = scores.threshold(10.0);
assert_eq!(indices, []);
```

This example uses a dynamic dispatch pipeline, which selects the best available backend (AVX2, SSE2, NEON, or a generic implementation) depending on the local platform.
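Runtime backend selection of this kind can be sketched with the feature-detection macros from the Rust standard library. The `Backend` enum and the `detect_backend` and `vector_bytes` functions below are hypothetical names for illustration; how lightmotif performs its own dispatch internally is not shown here.

```rust
/// Hypothetical set of backends, mirroring the ones named above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Avx2,
    Sse2,
    Neon,
    Generic,
}

/// Pick the widest instruction set that the running CPU reports.
fn detect_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
        if std::is_x86_feature_detected!("sse2") {
            return Backend::Sse2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    Backend::Generic
}

/// Vector register width in bytes for each backend (1 for scalar code).
fn vector_bytes(backend: Backend) -> usize {
    match backend {
        Backend::Avx2 => 32,
        Backend::Sse2 | Backend::Neon => 16,
        Backend::Generic => 1,
    }
}

fn main() {
    let backend = detect_backend();
    println!("selected {:?} ({} bytes per vector)", backend, vector_bytes(backend));
}
```

Detection happens at run time, so one compiled binary can use AVX2 where available and still run on older CPUs.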
⏱️ Benchmarks
Both benchmarks use the MX000001 motif from PRODORIC[6], and the complete genome of an Escherichia coli K12 strain.
Benchmarks were run on an i7-10710U CPU running @1.10GHz, compiled with `--target-cpu=native`.
Score every position of the genome with the motif weight matrix:

```console
test bench_avx2    ... bench:   4,510,794 ns/iter (+/- 9,570) = 1029 MB/s
test bench_sse2    ... bench:  26,773,537 ns/iter (+/- 57,891) = 173 MB/s
test bench_generic ... bench: 317,731,004 ns/iter (+/- 2,567,370) = 14 MB/s
```

Find the highest-scoring position for a motif in a 10kb sequence (compared to the PSSM algorithm implemented in `bio::pattern_matching::pssm`):

```console
test bench_avx2    ... bench:      12,797 ns/iter (+/- 380) = 781 MB/s
test bench_sse2    ... bench:      62,597 ns/iter (+/- 43) = 159 MB/s
test bench_generic ... bench:     671,900 ns/iter (+/- 1,150) = 14 MB/s
test bench_bio     ... bench:   1,193,911 ns/iter (+/- 2,519) = 8 MB/s
```
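The MB/s figures follow from the timings by simple arithmetic. The sketch below assumes that the 10kb sequence is exactly 10,000 bytes and that 1 MB is 10^6 bytes with the result truncated to an integer. Under those assumptions it reproduces the throughput column of the second benchmark, and it also gives the ratio between the slowest and fastest timings. The function name is hypothetical.

```rust
/// Throughput in MB/s (1 MB = 10^6 bytes), truncated to an integer.
/// bytes / ns is GB/s, hence the factor of 1000.
fn throughput_mb_s(bytes: u64, ns_per_iter: u64) -> u64 {
    bytes * 1_000 / ns_per_iter
}

fn main() {
    // Timings in ns/iter copied from the 10kb benchmark above.
    let timings = [
        ("avx2", 12_797u64),
        ("sse2", 62_597),
        ("generic", 671_900),
        ("bio", 1_193_911),
    ];
    // Assumption: "10kb" means 10,000 one-byte symbols.
    let bytes = 10_000u64;
    for (name, ns) in timings {
        println!("{name:>8}: {} MB/s", throughput_mb_s(bytes, ns));
    }
    let speedup = 1_193_911.0 / 12_797.0;
    println!("avx2 vs bio: {speedup:.1}x faster");
}
```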
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
📋 Changelog
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
⚖️ License
This library is provided under the open-source MIT license.
This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
📚 References
- [1] Eddy, Sean R. ‘Accelerated Profile HMM Searches’. PLOS Computational Biology 7, no. 10 (20 October 2011): e1002195. doi:10.1371/journal.pcbi.1002195.
- [2] Grant, Charles E., Timothy L. Bailey, and William Stafford Noble. ‘FIMO: Scanning for Occurrences of a given Motif’. Bioinformatics 27, no. 7 (1 April 2011): 1017–18. doi:10.1093/bioinformatics/btr064.
- [3] Farrar, Michael. ‘Striped Smith–Waterman Speeds Database Searches Six Times over Other SIMD Implementations’. Bioinformatics 23, no. 2 (15 January 2007): 156–61. doi:10.1093/bioinformatics/btl582.
- [4] Lawrence, Charles E., Stephen F. Altschul, Mark S. Boguski, Jun S. Liu, Andrew F. Neuwald, and John C. Wootton. ‘Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment’. Science 262, no. 5131 (8 October 1993): 208–14. doi:10.1126/science.8211139.
- [5] Touzet, Hélène, and Jean-Stéphane Varré. ‘Efficient and Accurate P-Value Computation for Position Weight Matrices’. Algorithms for Molecular Biology 2, no. 1 (2007): 1–12. doi:10.1186/1748-7188-2-15.
- [6] Dudek, Christian-Alexander, and Dieter Jahn. ‘PRODORIC: State-of-the-Art Database of Prokaryotic Gene Regulation’. Nucleic Acids Research 50, no. D1 (7 January 2022): D295–302. doi:10.1093/nar/gkab1110.
Owner
- Name: Martin Larralde
- Login: althonos
- Kind: user
- Location: Heidelberg, Germany
- Company: EMBL / LUMC, @zellerlab
- Twitter: althonos
- Repositories: 91
- Profile: https://github.com/althonos
PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.
GitHub Events
Total
- Issues event: 3
- Watch event: 14
- Delete event: 2
- Issue comment event: 6
- Push event: 54
- Create event: 4
Last Year
- Issues event: 3
- Watch event: 14
- Delete event: 2
- Issue comment event: 6
- Push event: 54
- Create event: 4
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Martin Larralde | m****e@e****e | 540 |
| Dirk Stolle | s****v@w****e | 3 |
| Jubilee | 4****e | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 4
- Average time to close issues: 3 months
- Average time to close pull requests: about 1 month
- Total issue authors: 3
- Total pull request authors: 2
- Average comments per issue: 3.25
- Average comments per pull request: 2.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 0
- Average time to close issues: 3 months
- Average time to close pull requests: N/A
- Issue authors: 3
- Pull request authors: 0
- Average comments per issue: 3.25
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- hannes-brt (2)
- Ebedthan (1)
- Thernn88 (1)
Pull Request Authors
- striezel (3)
- workingjubilee (1)
Packages
- Total packages: 6
- Total downloads:
  - pypi: 2,266 last-month
  - cargo: 69,715 total
- Total dependent packages: 7 (may contain duplicates)
- Total dependent repositories: 0 (may contain duplicates)
- Total versions: 72
- Total maintainers: 2
pypi.org: lightmotif
PyO3 bindings and Python interface to lightmotif, a library for platform-accelerated biological motif scanning using position weight matrices.
- Homepage: https://github.com/althonos/lightmotif/tree/main/lightmotif-py
- Documentation: https://lightmotif.readthedocs.io/
- License: MIT OR GPL-3.0-or-later
- Latest release: 0.10.0 (published 9 months ago)
Rankings
Maintainers (1)
crates.io: lightmotif
A lightweight platform-accelerated library for biological motif scanning using position weight matrices.
- Homepage: https://github.com/althonos/lightmotif
- Documentation: https://docs.rs/lightmotif/
- License: MIT
- Latest release: 0.10.0 (published 9 months ago)
Rankings
Maintainers (1)
crates.io: lightmotif-tfmpvalue
Rust reimplementation of TFMPvalue for the lightmotif crate.
- Homepage: https://github.com/althonos/lightmotif
- Documentation: https://docs.rs/lightmotif-tfmpvalue/
- License: GPL-3.0-or-later
- Latest release: 0.10.0 (published 9 months ago)
Rankings
Maintainers (1)
crates.io: lightmotif-py
PyO3 bindings and Python interface to the lightmotif crate.
- Homepage: https://github.com/althonos/lightmotif/tree/main/lightmotif-py
- Documentation: https://docs.rs/lightmotif-py/
- License: MIT OR GPL-3.0-or-later
- Latest release: 0.10.0 (published 9 months ago)
Rankings
Maintainers (1)
crates.io: lightmotif-transfac
TRANSFAC parser implementation for the lightmotif crate.
- Homepage: https://github.com/althonos/lightmotif/tree/main/lightmotif-transfac
- Documentation: https://docs.rs/lightmotif-transfac/
- License: MIT
- Latest release: 0.6.0 (published about 2 years ago)
Rankings
Maintainers (1)
crates.io: lightmotif-io
Parser implementations of several formats for the lightmotif crate.
- Homepage: https://github.com/althonos/lightmotif/tree/main/lightmotif-io
- Documentation: https://docs.rs/lightmotif-io/
- License: MIT
- Latest release: 0.10.0 (published 9 months ago)
Rankings
Maintainers (1)
Dependencies
- actions-rs/toolchain v1 composite
- actions/checkout v3 composite
- actions/download-artifact v2 composite
- actions/setup-python v2 composite
- actions/upload-artifact v3 composite
- actions/upload-artifact v2 composite
- docker/setup-qemu-action v2 composite
- pypa/cibuildwheel v2.11.3 composite
- pypa/gh-action-pypi-publish master composite
- rasmus-saks/release-a-changelog-action v1.0.1 composite
- actions-rs/cargo v1 composite
- actions-rs/tarpaulin v0.1 composite
- actions-rs/toolchain v1 composite
- actions/cache v2 composite
- actions/checkout v3 composite
- codecov/codecov-action v3 composite
- rasmus-saks/release-a-changelog-action v1.0.1 composite
- bio 1.1.0 development
- ipython *
- nbsphinx *
- pygments *
- pygments-style-monokailight *
- recommonmark *
- semantic_version *
- setuptools >=46.4