https://github.com/althonos/nafcodec

Rust coder/decoder for Nucleotide Archival Format (NAF) files.


Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary

Keywords

bioinformatics naf nom-parser nucleotide-sequences rust-library
Last synced: 5 months ago · JSON representation

Repository

Rust coder/decoder for Nucleotide Archival Format (NAF) files.

Basic Info
  • Host: GitHub
  • Owner: althonos
  • License: mit
  • Language: Rust
  • Default Branch: main
  • Homepage:
  • Size: 3.03 MB
Statistics
  • Stars: 6
  • Watchers: 2
  • Forks: 1
  • Open Issues: 1
  • Releases: 4
Topics
bioinformatics naf nom-parser nucleotide-sequences rust-library
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License

README.md

📦🧬 nafcodec

Rust coder/decoder for Nucleotide Archival Format (NAF) files.

🗺️ Overview

Nucleotide Archival Format (NAF) is a file format proposed by Kryukov et al.[1] in 2019 for storing compressed nucleotide or protein sequences, combining 4-bit encoding and Zstandard compression. NAF files can be compressed and decompressed using the original C implementation.

This crate provides a from-scratch Rust implementation of a NAF decoder, using nom to parse the binary format and zstd to handle Zstandard decompression. It provides a complete API for iterating over the contents of a NAF file.

This is the Rust version; a Python package is available as well.
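To make the 4-bit encoding idea concrete, here is a minimal sketch that packs two nucleotides into each byte. The code table and the nibble order used here are illustrative assumptions and are not taken from the NAF specification; the function names are hypothetical and not part of this crate.

```rust
/// Map a nucleotide to a 4-bit code.
/// NOTE: hypothetical table for illustration, not the one from the NAF specification.
fn encode_base(base: u8) -> Option<u8> {
    match base.to_ascii_uppercase() {
        b'A' => Some(0x1),
        b'C' => Some(0x2),
        b'G' => Some(0x4),
        b'T' => Some(0x8),
        b'N' => Some(0xF),
        _ => None,
    }
}

/// Inverse of `encode_base` for the same hypothetical table.
fn decode_code(code: u8) -> Option<u8> {
    match code {
        0x1 => Some(b'A'),
        0x2 => Some(b'C'),
        0x4 => Some(b'G'),
        0x8 => Some(b'T'),
        0xF => Some(b'N'),
        _ => None,
    }
}

/// Pack a sequence two bases per byte.
/// Assumption: the first base of each pair goes in the low nibble.
fn pack(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity((seq.len() + 1) / 2);
    for pair in seq.chunks(2) {
        let lo = encode_base(pair[0])?;
        let hi = match pair.get(1) {
            Some(&b) => encode_base(b)?,
            None => 0, // odd length: pad the last high nibble with zero
        };
        out.push(lo | (hi << 4));
    }
    Some(out)
}

/// Unpack `len` bases from a buffer produced by `pack`.
fn unpack(packed: &[u8], len: usize) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(len);
    for i in 0..len {
        let byte = *packed.get(i / 2)?;
        let code = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
        out.push(decode_code(code)?);
    }
    Some(out)
}
```

Under these assumptions, `pack(b"ACGTN")` yields three bytes for five bases, and `unpack` with the original length restores the sequence. In a real archive the packed bytes would then be passed to a Zstandard compressor.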

📋 Features

  • streaming decoder: The decoder is implemented using several readers, each accessing a different region of the compressed file, which allows streaming records without having to decode full blocks.
  • optional decoding: The decoder can skip the decoding of certain fields, for example ignoring quality strings when they are not needed.
  • flexible encoder: The encoder is implemented using an abstract storage interface for temporary data, which allows keeping sequences in memory or inside a temporary folder.
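The abstract storage idea behind the encoder can be sketched with a small trait. This is only an illustration of the pattern using the standard library: the names `Storage`, `InMemory`, `InFolder`, and `spool` are hypothetical and may not match the crate's actual types or signatures.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};
use std::path::PathBuf;

/// Hypothetical abstraction over where temporary data is kept.
trait Storage {
    /// A buffer that can be written, rewound, and read back.
    type Buffer: Read + Write + Seek;
    fn create_buffer(&self, name: &str) -> io::Result<Self::Buffer>;
}

/// Keeps temporary data in memory.
struct InMemory;

impl Storage for InMemory {
    type Buffer = Cursor<Vec<u8>>;
    fn create_buffer(&self, _name: &str) -> io::Result<Self::Buffer> {
        Ok(Cursor::new(Vec::new()))
    }
}

/// Keeps temporary data in files under an existing folder.
struct InFolder {
    root: PathBuf,
}

impl Storage for InFolder {
    type Buffer = File;
    fn create_buffer(&self, name: &str) -> io::Result<Self::Buffer> {
        OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .truncate(true)
            .open(self.root.join(name))
    }
}

/// Append sequences to a temporary buffer, then read everything back.
/// The caller decides where the data lives by choosing the storage type.
fn spool<S: Storage>(storage: &S, sequences: &[&[u8]]) -> io::Result<Vec<u8>> {
    let mut buffer = storage.create_buffer("sequence")?;
    for seq in sequences {
        buffer.write_all(seq)?;
    }
    buffer.seek(SeekFrom::Start(0))?;
    let mut out = Vec::new();
    buffer.read_to_end(&mut out)?;
    Ok(out)
}
```

With this shape, `spool(&InMemory, ...)` and `spool(&InFolder { root }, ...)` run the same code path; only the backing buffer differs, which is the trade-off between speed and memory use that the feature describes.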

🔌 Usage

Use a Decoder to iterate over the contents of a Nucleotide Archival Format file, reading from any BufRead + Seek implementor:

```rust
let mut decoder = nafcodec::Decoder::from_path("../data/LuxC.naf")
    .expect("failed to open nucleotide archive");

for result in decoder {
    let record = result.unwrap();
    // .. do something with the record .. //
}
```

All fields of the obtained Record are optional, and actually depend on the kind of data that was compressed. The decoder can be configured through a DecoderBuilder to ignore some fields to make decompression faster, even if they are present in the source archive:

```rust
let mut decoder = nafcodec::DecoderBuilder::new()
    .quality(false)
    .with_path("../data/phix.naf")
    .expect("failed to open nucleotide archive");

// the archive contains quality strings...
assert!(decoder.header().flags().test(nafcodec::Flag::Quality));

// ... but we configured the decoder to ignore them
for result in decoder {
    let record = result.unwrap();
    assert!(record.quality.is_none());
}
```
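As a rough illustration of why a seekable source is useful for the streaming design described under Features, the sketch below opens two independent readers over different regions of the same buffer and pairs their contents lazily. The toy layout (newline-separated identifiers followed by newline-separated sequences, uncompressed) and the names `region` and `records` are assumptions made for this example; they do not reflect the actual NAF layout or this crate's internals.

```rust
use std::io::{self, BufRead, Cursor, Read, Seek, SeekFrom};

/// Open an independent reader restricted to `len` bytes starting at `start`.
fn region(data: &[u8], start: u64, len: u64) -> io::Result<impl BufRead + '_> {
    let mut cursor = Cursor::new(data);
    cursor.seek(SeekFrom::Start(start))?;
    // `take` stops the reader at the end of the region.
    Ok(cursor.take(len))
}

/// Lazily pair each identifier with its sequence.
/// `ids` and `seqs` are (offset, length) pairs describing the two regions
/// of the hypothetical toy layout.
fn records(
    archive: &[u8],
    ids: (u64, u64),
    seqs: (u64, u64),
) -> io::Result<impl Iterator<Item = io::Result<(String, String)>> + '_> {
    let ids = region(archive, ids.0, ids.1)?.lines();
    let seqs = region(archive, seqs.0, seqs.1)?.lines();
    // One line is pulled from each region per record; nothing is read ahead
    // beyond what the iterator's consumer asks for.
    Ok(ids.zip(seqs).map(|(id, seq)| Ok((id?, seq?))))
}
```

Skipping a field in this picture amounts to never opening the reader for its region, which is the intuition behind turning off quality decoding in the builder example above.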

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source MIT license. The NAF specification is in the public domain.

This project is in no way affiliated with, sponsored, or otherwise endorsed by the original NAF authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

📚 References

  • [1] Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi. "Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences". Bioinformatics, Volume 35, Issue 19, October 2019, Pages 3826–3828. doi:10.1093/bioinformatics/btz144

Owner

  • Name: Martin Larralde
  • Login: althonos
  • Kind: user
  • Location: Heidelberg, Germany
  • Company: EMBL / LUMC, @zellerlab

PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.

GitHub Events

Total
  • Create event: 2
  • Release event: 2
  • Issues event: 3
  • Watch event: 1
  • Issue comment event: 6
  • Push event: 10
Last Year
  • Create event: 2
  • Release event: 2
  • Issues event: 3
  • Watch event: 1
  • Issue comment event: 6
  • Push event: 10

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 138
  • Total Committers: 1
  • Avg Commits per committer: 138.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 74
  • Committers: 1
  • Avg Commits per committer: 74.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Martin Larralde m****e@e****e 138
Committer Domains (Top 20 + Academic)
embl.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 1
  • Average time to close issues: 7 days
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 7 days
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • apcamargo (2)
Pull Request Authors
  • j-schlesinger (2)
Top Labels
Issue Labels
bug (1)
Pull Request Labels

Packages

  • Total packages: 3
  • Total downloads:
    • cargo 8,844 total
    • pypi 641 last-month
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 12
  • Total maintainers: 2
pypi.org: nafcodec

PyO3 bindings and Python interface to nafcodec, an encoder/decoder for Nucleotide Archive Format (NAF) files.

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 641 Last month
Rankings
Dependent packages count: 7.3%
Average: 37.9%
Dependent repos count: 68.5%
Maintainers (1)
Last synced: 6 months ago
crates.io: nafcodec

Rust coder/decoder for Nucleotide Archive Format (NAF) files.

  • Versions: 5
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 5,447 Total
Rankings
Dependent repos count: 30.4%
Dependent packages count: 30.8%
Forks count: 40.8%
Stargazers count: 42.4%
Average: 48.6%
Downloads: 98.4%
Maintainers (1)
Last synced: 6 months ago
crates.io: nafcodec-py

PyO3 bindings and Python interface to the nafcodec crate.

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 3,397 Total
Rankings
Dependent repos count: 30.5%
Dependent packages count: 30.8%
Forks count: 40.8%
Stargazers count: 42.4%
Average: 48.6%
Downloads: 98.5%
Maintainers (1)
Last synced: 6 months ago