https://github.com/althonos/diced

A Rust reimplementation of the MinCED method for identifying CRISPRs in full or assembled genomes.

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary

Keywords

bioinformatics crispr genomics python-bindings python-library rust-library
Last synced: 5 months ago

Repository

Basic Info
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 3
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License

README.md

🔪🧅 Diced

A Rust re-implementation of the MinCED algorithm to Detect Instances of CRISPRs in Environmental Data.

🗺️ Overview

MinCED is a method developed by Connor T. Skennerton to identify Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) in isolate and metagenomic-assembled genomes. It was derived from the CRISPR Recognition Tool [1]. It uses a fast scanning algorithm to identify candidate repeats, combined with an extension step to find maximally spanning regions of the genome that feature a CRISPR repeat.
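The scan-then-extend idea can be illustrated with a toy example. The sketch below is not the MinCED or Diced implementation: the function names, the window length `k`, and the distance bounds are hypothetical values chosen for illustration, and the real method applies further checks that are omitted here. It looks for a short window that occurs again at a plausible distance downstream, then grows the match to the right while both copies agree.

```rust
/// Hypothetical helper, not part of Diced: find the first window of
/// length `k` that occurs again with its start between `min_dist` and
/// `max_dist` positions downstream. Returns the start of both copies.
fn find_candidate(seq: &str, k: usize, min_dist: usize, max_dist: usize) -> Option<(usize, usize)> {
    if k == 0 || seq.len() < k {
        return None;
    }
    for start in 0..=(seq.len() - k) {
        let lo = start + min_dist;
        let hi = (start + max_dist + k).min(seq.len());
        // `get` returns `None` instead of panicking on an empty or
        // out-of-range window.
        if let (Some(kmer), Some(window)) = (seq.get(start..start + k), seq.get(lo..hi)) {
            if let Some(pos) = window.find(kmer) {
                return Some((start, lo + pos));
            }
        }
    }
    None
}

/// Hypothetical helper: extend two matching copies of length `k` to the
/// right while they keep agreeing, and return the extended length.
fn extend_right(seq: &str, first: usize, second: usize, k: usize) -> usize {
    let bytes = seq.as_bytes();
    let mut len = k;
    // Stop at the end of the sequence, or before the first copy would
    // run into the second one.
    while second + len < bytes.len()
        && first + len < second
        && bytes[first + len] == bytes[second + len]
    {
        len += 1;
    }
    len
}

fn main() {
    // Two copies of "GATTACAGG" separated by a short spacer.
    let seq = "TTGATTACAGGCCCCCGATTACAGGAA";
    if let Some((first, second)) = find_candidate(seq, 4, 6, 20) {
        let len = extend_right(seq, first, second, 4);
        println!("repeat {} at {} and {}", &seq[first..first + len], first, second);
    }
}
```

Restricting the search to a bounded window downstream of each position is what keeps the scan cheap: most positions are rejected after a single short substring search.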

Diced is a Rust reimplementation of the MinCED method, using the original Java code as a reference. It produces exactly the same results as MinCED, corrects some bugs (minced#35), and is much faster. The Diced implementation is available as a Rust library for convenience.

This is the Rust version; a Python package is available as well.

📋 Features

  • library interface: The Rust implementation is written as a library to facilitate reuse in other projects. It is used to implement a Python library, using PyO3 to generate a native extension.
  • zero-copy: The Scanner which iterates over candidate CRISPRs is zero-copy if provided with a simple &str reference, but it also supports data behind smart pointers such as Rc<str> or Arc<str>.
  • fast string matching: The Java implementation uses a handwritten implementation of the Boyer-Moore algorithm[2], while the Rust implementation uses the str::find method of the standard library, which implements the Two-way algorithm[3]. In addition, the memchr crate can be used as a fast SIMD-capable implementation of the memmem function.
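The zero-copy and string-matching points can be illustrated with the standard library alone. The sketch below is not Diced's code: `Occurrences` is a hypothetical type invented for this example. It is generic over `S: AsRef<str>`, so it can borrow a `&str` without copying or share ownership through `Rc<str>` or `Arc<str>`, and it relies on `str::find` to locate each occurrence of a pattern.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical iterator over the (possibly overlapping) positions of
/// `pattern` in a sequence. Not part of the Diced API.
struct Occurrences<'p, S: AsRef<str>> {
    sequence: S,
    pattern: &'p str,
    offset: usize,
}

impl<'p, S: AsRef<str>> Occurrences<'p, S> {
    fn new(sequence: S, pattern: &'p str) -> Self {
        Self { sequence, pattern, offset: 0 }
    }
}

impl<'p, S: AsRef<str>> Iterator for Occurrences<'p, S> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let text = self.sequence.as_ref();
        // `get` returns `None` once the offset has moved past the end.
        let rest = text.get(self.offset..)?;
        // `str::find` returns the byte index of the first match, if any.
        let found = rest.find(self.pattern)?;
        let position = self.offset + found;
        // Advance by one byte so that overlapping matches are reported;
        // this assumes ASCII input such as a nucleotide sequence.
        self.offset = position + 1;
        Some(position)
    }
}

fn main() {
    let genome = "ACGTTTACGTAAACGT";
    // Borrowed input: no copy of the sequence is made.
    let borrowed: Vec<usize> = Occurrences::new(genome, "ACGT").collect();
    // Shared ownership: the iterator keeps the sequence alive itself.
    let shared: Vec<usize> = Occurrences::new(Arc::<str>::from(genome), "ACGT").collect();
    let counted: Vec<usize> = Occurrences::new(Rc::<str>::from(genome), "ACGT").collect();
    println!("{:?} {:?} {:?}", borrowed, shared, counted);
}
```

Bounding the input by `AsRef<str>` rather than taking a `&str` directly lets the caller decide who owns the sequence, which matters when the iterator has to outlive the scope that loaded the data.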

💡 Example

Diced supports any sequence in string format.

```rust
// Read the first record of a FASTA file using the noodles-fasta crate.
let mut reader = std::fs::File::open("tests/data/Aquifex_aeolicus_VF5.fna")
    .map(std::io::BufReader::new)
    .map(noodles_fasta::Reader::new)
    .unwrap();
let record = reader.records().next().unwrap().unwrap();
// The scanner works on string data, so view the sequence bytes as UTF-8.
let seq = std::str::from_utf8(record.sequence().as_ref()).unwrap();

// Iterate over the CRISPRs found in the sequence, then over their repeats.
for crispr in diced::Scanner::new(&seq) {
    println!("{} to {}: {} repeats", crispr.start(), crispr.end(), crispr.len());
    for repeat in crispr.repeats() {
        println!(" - at {}: {}", repeat.start(), repeat.as_str());
    }
}
```

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source GPLv3 license, or later. The code for this implementation was derived from the MinCED source code, which is available under the GPLv3 as well.

This project is in no way affiliated with, sponsored, or otherwise endorsed by the original MinCED authors. It was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller team.

📚 References

  • [1] Bland, C., Ramsey, T. L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N. C., & Hugenholtz, P. (2007). 'CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats'. BMC bioinformatics, 8, 209. PMID:17577412 doi:10.1186/1471-2105-8-209.
  • [2] Boyer, R. S. & Moore, J. S. (1977). 'A fast string searching algorithm'. Commun. ACM 20, 10, 762–772. doi:10.1145/359842.359859
  • [3] Crochemore, M. & Perrin, D. (1991). 'Two-way string-matching'. J. ACM 38, 3, 650–674. doi:10.1145/116825.116845

Owner

  • Name: Martin Larralde
  • Login: althonos
  • Kind: user
  • Location: Heidelberg, Germany
  • Company: EMBL / LUMC, @zellerlab

PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.

GitHub Events

Total
  • Release event: 1
  • Watch event: 3
  • Push event: 7
  • Create event: 1
Last Year
  • Release event: 1
  • Watch event: 3
  • Push event: 7
  • Create event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 74
  • Total Committers: 1
  • Avg Commits per committer: 74.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 15
  • Committers: 1
  • Avg Commits per committer: 15.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Martin Larralde m****e@e****e 74
Committer Domains (Top 20 + Academic)
embl.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • cargo 3,268 total
    • pypi 476 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 6
  • Total maintainers: 2
pypi.org: diced

Rust re-implementation of the MinCED algorithm to Detect Instances of CRISPRs in Environmental Data.

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 476 Last month
Rankings
Dependent packages count: 10.8%
Average: 35.8%
Dependent repos count: 60.8%
Maintainers (1)
Last synced: 6 months ago
crates.io: diced-py

PyO3 bindings and Python interface to the diced crate.

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 3,268 Total
Rankings
Dependent repos count: 27.1%
Dependent packages count: 35.9%
Average: 53.2%
Downloads: 96.4%
Maintainers (1)
Last synced: 7 months ago

Dependencies

Cargo.lock cargo
  • adler 1.0.2
  • autocfg 1.3.0
  • bit-vec 0.6.3
  • bitflags 2.5.0
  • bstr 1.9.1
  • byteorder 1.5.0
  • bytes 1.6.0
  • cfg-if 1.0.0
  • crc32fast 1.4.2
  • crossbeam-channel 0.5.13
  • crossbeam-utils 0.8.20
  • equivalent 1.0.1
  • flate2 1.0.30
  • hashbrown 0.14.5
  • heck 0.4.1
  • indexmap 2.2.6
  • indoc 2.0.5
  • libc 0.2.155
  • lock_api 0.4.12
  • memchr 2.7.2
  • memoffset 0.9.1
  • miniz_oxide 0.7.3
  • noodles-bgzf 0.30.0
  • noodles-core 0.15.0
  • noodles-csi 0.35.0
  • noodles-fasta 0.38.0
  • noodles-gff 0.33.0
  • once_cell 1.19.0
  • parking_lot 0.12.3
  • parking_lot_core 0.9.10
  • percent-encoding 2.3.1
  • portable-atomic 1.6.0
  • proc-macro2 1.0.84
  • pyo3 0.21.2
  • pyo3-build-config 0.21.2
  • pyo3-ffi 0.21.2
  • pyo3-macros 0.21.2
  • pyo3-macros-backend 0.21.2
  • quote 1.0.36
  • redox_syscall 0.5.1
  • scopeguard 1.2.0
  • serde 1.0.203
  • serde_derive 1.0.203
  • smallvec 1.13.2
  • strsim 0.11.1
  • syn 2.0.66
  • target-lexicon 0.12.14
  • unicode-ident 1.0.12
  • unindent 0.2.3
  • windows-targets 0.52.5
  • windows_aarch64_gnullvm 0.52.5
  • windows_aarch64_msvc 0.52.5
  • windows_i686_gnu 0.52.5
  • windows_i686_gnullvm 0.52.5
  • windows_i686_msvc 0.52.5
  • windows_x86_64_gnu 0.52.5
  • windows_x86_64_gnullvm 0.52.5
  • windows_x86_64_msvc 0.52.5
Cargo.toml cargo
diced/Cargo.toml cargo
  • noodles-fasta 0.38.0 development
  • noodles-gff 0.33.0 development
  • memchr 2.7.2
  • strsim 0.11
diced-py/Cargo.toml cargo
diced-py/diced/tests/requirements.txt pypi
  • biopython * test
docs/requirements.txt pypi
  • ipython *
  • nbsphinx *
  • pygments *
  • pygments-style-monokailight *
  • recommonmark *
  • semantic_version *
  • setuptools >=46.4
  • setuptools-rust >=1.0
  • sphinx >=5.0
  • sphinxcontrib-jquery *
pyproject.toml pypi
.github/workflows/python.yml actions
  • actions/checkout v3 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v3 composite
  • actions/upload-artifact v2 composite
  • docker/setup-qemu-action v2 composite
  • dtolnay/rust-toolchain stable composite
  • pypa/cibuildwheel v2.19.0 composite
  • pypa/gh-action-pypi-publish release/v1 composite
  • rasmus-saks/release-a-changelog-action v1.0.1 composite
.github/workflows/rust.yml actions
  • actions-rs/cargo v1 composite
  • actions-rs/install v0.1 composite
  • actions-rs/toolchain v1 composite
  • actions/cache v2 composite
  • actions/checkout v1 composite
  • codecov/codecov-action v4.0.1 composite
  • rasmus-saks/release-a-changelog-action v1.0.1 composite