https://github.com/althonos/lightmotif

A lightweight platform-accelerated library for biological motif scanning using position weight matrices.

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 18 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.4%) to scientific vocabulary

Keywords

bioinformatics genomics pssm rust-library sequence-analysis sequence-motif simd
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: althonos
  • License: mit
  • Language: Rust
  • Default Branch: main
  • Homepage:
  • Size: 2.43 MB
Statistics
  • Stars: 53
  • Watchers: 5
  • Forks: 2
  • Open Issues: 2
  • Releases: 15
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License

README.md

🎼🧬 lightmotif

A lightweight platform-accelerated library for biological motif scanning using position weight matrices.

🗺️ Overview

Motif scanning with position weight matrices (also known as position-specific scoring matrices) is a robust method for identifying motifs of fixed length inside a biological sequence. It can be used to identify transcription factor binding sites in DNA, or protease cleavage sites in polypeptides. Position weight matrices are often viewed as sequence logos:

[Figure: sequence logo for motif MX000274]

The lightmotif library provides a Rust crate to run very efficient searches for a motif encoded in a position weight matrix. The position scanning combines several techniques to allow high-throughput processing of sequences:

  • Compile-time definition of alphabets and matrix dimensions.
  • Sequence symbol encoding for fast table look-ups, as implemented in HMMER[1] or MEME[2].
  • Striped sequence matrices to process several positions in parallel, inspired by Michael Farrar[3].
  • Vectorized matrix row look-up using permute instructions of AVX2.
  • High-throughput Gibbs sampler[4] implementation in OOPS (one occurrence per sequence) and ZOOPS (zero or one occurrence per sequence) modes, featuring deterministic results using randomness from the rand crate.
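To make the encoding and striping ideas above more concrete, here is a minimal sketch in plain Rust. It is not the crate's implementation: the helper names `encode` and `stripe`, the symbol ordering, and the padding value are all assumptions made for illustration. Symbols become small integers that can index a score table directly, and the encoded sequence is laid out so that each row holds positions that are far apart, which is what lets a SIMD backend score several positions with one instruction.

```rust
/// Encode nucleotides as small integers usable as table indices.
/// The ordering A, C, T, G, N is an assumption made for this sketch.
fn encode(seq: &str) -> Option<Vec<u8>> {
    seq.bytes()
        .map(|b| match b {
            b'A' => Some(0),
            b'C' => Some(1),
            b'T' => Some(2),
            b'G' => Some(3),
            b'N' => Some(4),
            _ => None, // unknown symbol: the whole encoding fails
        })
        .collect()
}

/// Lay out an encoded sequence in a striped matrix with `lanes` columns
/// (`lanes` must be non-zero). Row `r`, column `c` holds position
/// `c * rows + r`, so one row groups positions that are `rows` apart.
/// Missing cells are filled with `pad`.
fn stripe(encoded: &[u8], lanes: usize, pad: u8) -> Vec<Vec<u8>> {
    let rows = (encoded.len() + lanes - 1) / lanes;
    (0..rows)
        .map(|r| {
            (0..lanes)
                .map(|c| encoded.get(c * rows + r).copied().unwrap_or(pad))
                .collect()
        })
        .collect()
}

fn main() {
    let encoded = encode("ATGTCCCA").expect("only A, C, T, G, N are accepted");
    for row in stripe(&encoded, 4, 4) {
        println!("{row:?}");
    }
}
```

With four lanes and eight symbols, the first row holds positions 0, 2, 4 and 6, and the second row holds positions 1, 3, 5 and 7.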

Other crates from the ecosystem provide additional features if needed:

  • lightmotif-io is a crate with parser implementations for various count matrix, frequency matrix and position-specific scoring matrix formats such as TRANSFAC or JASPAR.
  • lightmotif-tfmpvalue is an exact reimplementation of the TFM-PVALUE[5] algorithm for converting between a score and a p-value for a given scoring matrix.
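As a rough illustration of what converting a score into a p-value involves, the sketch below computes the probability that a random sequence drawn from the background reaches a given score. It uses a plain dynamic program over scores rounded to a fixed granularity; this is a simplification and not the TFM-PVALUE algorithm itself. The function name `pvalue` and its `precision` parameter are hypothetical and do not come from the lightmotif-tfmpvalue API.

```rust
use std::collections::HashMap;

/// Probability that a background-distributed sequence scores at least
/// `threshold` against `pssm`. Scores are rounded to multiples of
/// `1 / precision` (hypothetical parameter, fixed granularity).
fn pvalue(pssm: &[[f64; 4]], background: [f64; 4], threshold: f64, precision: f64) -> f64 {
    // Distribution of rounded partial scores after each column.
    let mut dist: HashMap<i64, f64> = HashMap::new();
    dist.insert(0, 1.0);
    for column in pssm {
        let mut next: HashMap<i64, f64> = HashMap::new();
        for (&score, &prob) in &dist {
            for (symbol, &weight) in column.iter().enumerate() {
                let s = score + (weight * precision).round() as i64;
                *next.entry(s).or_insert(0.0) += prob * background[symbol];
            }
        }
        dist = next;
    }
    // Sum the probability mass at or above the rounded threshold.
    let t = (threshold * precision).round() as i64;
    let mut p = 0.0;
    for (&score, &prob) in &dist {
        if score >= t {
            p += prob;
        }
    }
    p
}

fn main() {
    // Toy two-column matrix with made-up weights.
    let pssm = [[1.0, 0.0, 0.0, -1.0], [1.0, 0.0, 0.0, -1.0]];
    let p = pvalue(&pssm, [0.25; 4], 2.0, 100.0);
    println!("P(score >= 2.0) = {p}");
}
```

In the toy matrix only one of the sixteen possible two-letter words reaches a score of 2.0, so the p-value at that threshold is 1/16.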

This is the Rust version; a Python package is available as well.

💡 Example

```rust
use lightmotif::*;
use lightmotif::abc::Nucleotide;

// Create a count matrix from an iterable of motif sequences
let counts = CountMatrix::<Dna>::from_sequences(
    ["GTTGACCTTATCAAC", "GTTGATCCAGTCAAC"]
        .into_iter()
        .map(|s| EncodedSequence::encode(s).unwrap()),
)
.unwrap();

// Create a PSSM with 0.1 pseudocounts and uniform background frequencies.
let pssm = counts.to_freq(0.1).to_scoring(None);

// Use the pipeline to encode the target sequence into a striped matrix
let seq = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG";
let encoded = EncodedSequence::encode(seq).unwrap();
let mut striped = encoded.to_striped();

// Organize layout of striped matrix to allow scoring with PSSM.
striped.configure(&pssm);

// Compute scores for every position of the matrix.
let scores = pssm.score(&striped);

// Scores can be extracted into a Vec, or indexed directly.
let v = scores.unstripe();
assert_eq!(scores[0], -23.07094);
assert_eq!(v[0], -23.07094);

// Find the highest scoring position.
let best = scores.argmax().unwrap();
assert_eq!(best, 18);

// Find the positions above an absolute score threshold.
let indices = scores.threshold(10.0);
assert_eq!(indices, []);
```

This example uses a dynamic dispatch pipeline, which selects the best available backend (AVX2, SSE2, NEON, or a generic implementation) depending on the local platform.
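For readers who want to see the arithmetic behind the first score in the example, the following dependency-free sketch rebuilds the matrix by hand. It assumes the pseudocount is added to each of the four nucleotide counts and that scores are log2-odds against a uniform background of 0.25; under those assumptions it reproduces the value -23.07094 for position 0. The helper names are hypothetical and the code is a naive scalar loop, not the crate's accelerated pipeline.

```rust
/// Map a nucleotide byte to a column index (the A, C, G, T order is arbitrary here).
fn symbol_index(b: u8) -> usize {
    match b {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        b'T' => 3,
        other => panic!("unsupported symbol: {}", other as char),
    }
}

/// Count how often each nucleotide occurs at each motif position.
/// Expects at least one site, all of the same length.
fn count_matrix(sites: &[&str]) -> Vec<[f64; 4]> {
    let width = sites[0].len();
    let mut counts = vec![[0.0; 4]; width];
    for site in sites {
        assert_eq!(site.len(), width, "all sites must have the same length");
        for (column, b) in counts.iter_mut().zip(site.bytes()) {
            column[symbol_index(b)] += 1.0;
        }
    }
    counts
}

/// Turn counts into log2-odds scores against a uniform background (assumed).
fn to_scoring(counts: &[[f64; 4]], pseudocount: f64) -> Vec<[f64; 4]> {
    counts
        .iter()
        .map(|column| {
            let total: f64 = column.iter().sum::<f64>() + 4.0 * pseudocount;
            let mut scores = [0.0; 4];
            for (score, count) in scores.iter_mut().zip(column) {
                let frequency = (count + pseudocount) / total;
                *score = (frequency / 0.25).log2();
            }
            scores
        })
        .collect()
}

/// Score the window of `sequence` starting at `position`.
fn score_at(pssm: &[[f64; 4]], sequence: &[u8], position: usize) -> f64 {
    assert!(position + pssm.len() <= sequence.len(), "window out of bounds");
    pssm.iter()
        .zip(&sequence[position..])
        .map(|(column, &b)| column[symbol_index(b)])
        .sum()
}

fn main() {
    let pssm = to_scoring(&count_matrix(&["GTTGACCTTATCAAC", "GTTGATCCAGTCAAC"]), 0.1);
    let seq = b"ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG";
    println!("score at position 0: {:.5}", score_at(&pssm, seq, 0));
}
```

At position 0 the window has two full matches, two half matches, and eleven mismatches, which gives 2 × log2(3.5) + 2 × log2(11/6) + 11 × log2(1/6) ≈ -23.0709.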

⏱️ Benchmarks

Both benchmarks use the MX000001 motif from PRODORIC[6], and the complete genome of an Escherichia coli K12 strain. Benchmarks were run on an i7-10710U CPU running at 1.10 GHz, compiled with --target-cpu=native.

  • Score every position of the genome with the motif weight matrix:

    ```console
    test bench_avx2    ... bench:   4,510,794 ns/iter (+/- 9,570) = 1029 MB/s
    test bench_sse2    ... bench:  26,773,537 ns/iter (+/- 57,891) = 173 MB/s
    test bench_generic ... bench: 317,731,004 ns/iter (+/- 2,567,370) = 14 MB/s
    ```

  • Find the highest-scoring position for a motif in a 10kb sequence (compared to the PSSM algorithm implemented in bio::pattern_matching::pssm):

    ```console
    test bench_avx2    ... bench:      12,797 ns/iter (+/- 380) = 781 MB/s
    test bench_sse2    ... bench:      62,597 ns/iter (+/- 43) = 159 MB/s
    test bench_generic ... bench:     671,900 ns/iter (+/- 1,150) = 14 MB/s
    test bench_bio     ... bench:   1,193,911 ns/iter (+/- 2,519) = 8 MB/s
    ```
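The MB/s figures follow from the time per iteration and the number of bytes processed. The sketch below redoes that arithmetic for the 10kb benchmark, assuming the sequence is 10,000 bytes (one byte per symbol) and that the result is truncated to an integer; the function name is hypothetical.

```rust
/// Throughput in MB/s from bytes processed and nanoseconds per iteration.
/// Bytes per nanosecond is GB/s, so multiplying by 1000 gives MB/s.
fn throughput_mb_per_s(bytes: u64, ns_per_iter: u64) -> u64 {
    bytes * 1000 / ns_per_iter
}

fn main() {
    // Timings copied from the 10kb benchmark above.
    let timings = [
        ("avx2", 12_797u64),
        ("sse2", 62_597),
        ("generic", 671_900),
        ("bio", 1_193_911),
    ];
    for (name, ns) in timings {
        println!("{name}: {} MB/s", throughput_mb_per_s(10_000, ns));
    }
}
```

On these numbers the AVX2 backend is roughly 93 times faster than the bio implementation (1,193,911 ns against 12,797 ns per iteration).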

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source MIT license.

This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

📚 References

  • [1] Eddy, Sean R. ‘Accelerated Profile HMM Searches’. PLOS Computational Biology 7, no. 10 (20 October 2011): e1002195. doi:10.1371/journal.pcbi.1002195.
  • [2] Grant, Charles E., Timothy L. Bailey, and William Stafford Noble. ‘FIMO: Scanning for Occurrences of a given Motif’. Bioinformatics 27, no. 7 (1 April 2011): 1017–18. doi:10.1093/bioinformatics/btr064.
  • [3] Farrar, Michael. ‘Striped Smith–Waterman Speeds Database Searches Six Times over Other SIMD Implementations’. Bioinformatics 23, no. 2 (15 January 2007): 156–61. doi:10.1093/bioinformatics/btl582.
  • [4] Lawrence, Charles E., Stephen F. Altschul, Mark S. Boguski, Jun S. Liu, Andrew F. Neuwald, and John C. Wootton. ‘Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment’. Science 262, no. 5131 (8 October 1993): 208–14. doi:10.1126/science.8211139.
  • [5] Touzet, Hélène, and Jean-Stéphane Varré. ‘Efficient and Accurate P-Value Computation for Position Weight Matrices’. Algorithms for Molecular Biology 2, no. 1 (2007): 1–12. doi:10.1186/1748-7188-2-15.
  • [6] Dudek, Christian-Alexander, and Dieter Jahn. ‘PRODORIC: State-of-the-Art Database of Prokaryotic Gene Regulation’. Nucleic Acids Research 50, no. D1 (7 January 2022): D295–302. doi:10.1093/nar/gkab1110.

Owner

  • Name: Martin Larralde
  • Login: althonos
  • Kind: user
  • Location: Heidelberg, Germany
  • Company: EMBL / LUMC, @zellerlab

PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.

GitHub Events

Total
  • Issues event: 3
  • Watch event: 14
  • Delete event: 2
  • Issue comment event: 6
  • Push event: 54
  • Create event: 4
Last Year
  • Issues event: 3
  • Watch event: 14
  • Delete event: 2
  • Issue comment event: 6
  • Push event: 54
  • Create event: 4

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 544
  • Total Committers: 3
  • Avg Commits per committer: 181.333
  • Development Distribution Score (DDS): 0.007
Past Year
  • Commits: 233
  • Committers: 1
  • Avg Commits per committer: 233.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Martin Larralde m****e@e****e 540
Dirk Stolle s****v@w****e 3
Jubilee 4****e 1
Committer Domains (Top 20 + Academic)
embl.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 4
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 3.25
  • Average comments per pull request: 2.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 0
  • Average time to close issues: 3 months
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 3.25
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hannes-brt (2)
  • Ebedthan (1)
  • Thernn88 (1)
Pull Request Authors
  • striezel (3)
  • workingjubilee (1)
Top Labels
Issue Labels
enhancement (2)
Pull Request Labels

Packages

  • Total packages: 6
  • Total downloads:
    • pypi 2,266 last-month
    • cargo 69,715 total
  • Total dependent packages: 7
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 72
  • Total maintainers: 2
pypi.org: lightmotif

PyO3 bindings and Python interface to lightmotif, a library for platform-accelerated biological motif scanning using position weight matrices.

  • Versions: 14
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 2,266 Last month
Rankings
Dependent packages count: 7.2%
Stargazers count: 15.6%
Average: 20.3%
Forks count: 23.1%
Dependent repos count: 35.4%
Maintainers (1)
Last synced: 6 months ago
crates.io: lightmotif

A lightweight platform-accelerated library for biological motif scanning using position weight matrices.

  • Versions: 15
  • Dependent Packages: 5
  • Dependent Repositories: 0
  • Downloads: 23,688 Total
Rankings
Stargazers count: 25.2%
Dependent repos count: 28.8%
Forks count: 30.9%
Dependent packages count: 33.8%
Average: 42.5%
Downloads: 93.7%
Maintainers (1)
Last synced: 6 months ago
crates.io: lightmotif-tfmpvalue

Rust reimplementation of TFMPvalue for the lightmotif crate.

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 11,776 Total
Rankings
Stargazers count: 23.1%
Dependent repos count: 28.2%
Forks count: 30.7%
Dependent packages count: 32.7%
Average: 42.7%
Downloads: 98.6%
Maintainers (1)
Last synced: 6 months ago
crates.io: lightmotif-py

PyO3 bindings and Python interface to the lightmotif crate.

  • Versions: 15
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 14,811 Total
Rankings
Stargazers count: 25.2%
Dependent repos count: 28.8%
Forks count: 30.9%
Dependent packages count: 33.8%
Average: 42.7%
Downloads: 94.7%
Maintainers (1)
Last synced: 6 months ago
crates.io: lightmotif-transfac

TRANSFAC parser implementation for the lightmotif crate.

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 8,675 Total
Rankings
Stargazers count: 25.2%
Dependent repos count: 28.8%
Forks count: 30.9%
Dependent packages count: 33.8%
Average: 42.7%
Downloads: 95.0%
Maintainers (1)
Last synced: 6 months ago
crates.io: lightmotif-io

Parser implementations of several formats for the lightmotif crate.

  • Versions: 8
  • Dependent Packages: 2
  • Dependent Repositories: 0
  • Downloads: 10,765 Total
Rankings
Dependent repos count: 30.7%
Dependent packages count: 36.1%
Average: 54.9%
Downloads: 98.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/python.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v3 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v3 composite
  • actions/upload-artifact v2 composite
  • docker/setup-qemu-action v2 composite
  • pypa/cibuildwheel v2.11.3 composite
  • pypa/gh-action-pypi-publish master composite
  • rasmus-saks/release-a-changelog-action v1.0.1 composite
.github/workflows/rust.yml actions
  • actions-rs/cargo v1 composite
  • actions-rs/tarpaulin v0.1 composite
  • actions-rs/toolchain v1 composite
  • actions/cache v2 composite
  • actions/checkout v3 composite
  • codecov/codecov-action v3 composite
  • rasmus-saks/release-a-changelog-action v1.0.1 composite
lightmotif-bench/Cargo.toml cargo
  • bio 1.1.0 development
Cargo.toml cargo
lightmotif/Cargo.toml cargo
lightmotif-io/Cargo.toml cargo
lightmotif-py/Cargo.toml cargo
lightmotif-tfmpvalue/Cargo.toml cargo
lightmotif-transfac/Cargo.toml cargo
lightmotif-py/docs/requirements.txt pypi
  • ipython *
  • nbsphinx *
  • pygments *
  • pygments-style-monokailight *
  • recommonmark *
  • semantic_version *
  • setuptools >=46.4
pyproject.toml pypi
setup.py pypi