https://github.com/althonos/nafcodec
Rust coder/decoder for Nucleotide Archival Format (NAF) files.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.9%) to scientific vocabulary
Statistics
- Stars: 6
- Watchers: 2
- Forks: 1
- Open Issues: 1
- Releases: 4
README.md
📦🧬 nafcodec 
Rust coder/decoder for Nucleotide Archive Format (NAF) files.
🗺️ Overview
Nucleotide Archive Format is a file format proposed by Kryukov et al.[1] in 2019 for storing nucleotide or protein sequences in compressed form, combining 4-bit encoding of nucleotides with Zstandard compression. NAF files can be compressed and decompressed with the original C implementation.
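To make the 4-bit encoding concrete, here is a minimal, standard-library-only sketch that packs two nucleotide codes per byte and unpacks them again. The code table (one bit each for T, G, C and A, so IUPAC ambiguity codes are bit unions) and the nibble order (first base in the low nibble) are assumptions of this sketch; check the NAF specification for the authoritative layout. `pack`, `unpack` and `code_of` are hypothetical helpers, not nafcodec API.

```rust
/// Assumed 4-bit code table: T = 1, G = 2, C = 4, A = 8, and every other
/// IUPAC symbol is the union of its bases (e.g. N = 15). Index 0 is a gap.
const CODES: &[u8; 16] = b"-TGKCYSBAWRDMHVN";

/// Look up the 4-bit code of one base (case-insensitive).
fn code_of(base: u8) -> Option<u8> {
    CODES
        .iter()
        .position(|&c| c == base.to_ascii_uppercase())
        .map(|i| i as u8)
}

/// Pack a nucleotide string two bases per byte (first base in the low nibble,
/// an assumption of this sketch). Returns None on an unknown symbol.
fn pack(seq: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity((seq.len() + 1) / 2);
    for pair in seq.as_bytes().chunks(2) {
        let lo = code_of(pair[0])?;
        // odd-length input: pad the last high nibble with 0
        let hi = match pair.get(1) {
            Some(&b) => code_of(b)?,
            None => 0,
        };
        out.push(lo | (hi << 4));
    }
    Some(out)
}

/// Unpack `len` bases from a packed buffer (the sequence length must be known,
/// since a padded final nibble is indistinguishable from a gap).
fn unpack(packed: &[u8], len: usize) -> String {
    packed
        .iter()
        .flat_map(|&byte| [byte & 0x0F, byte >> 4])
        .take(len)
        .map(|code| CODES[code as usize] as char)
        .collect()
}

fn main() {
    let seq = "ACGTN";
    let packed = pack(seq).expect("invalid nucleotide");
    println!("{} bases -> {} bytes", seq.len(), packed.len());
    assert_eq!(unpack(&packed, seq.len()), seq);
}
```

Halving the size before Zstandard ever runs is the point of the design: the entropy coder then works on a denser, already-regular stream.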
This crate provides a from-scratch Rust implementation of a NAF decoder,
using nom to parse the binary format
and zstd to handle Zstandard
decompression. It provides a complete API for iterating over
the contents of a NAF file.
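nafcodec parses the binary format with nom; the sketch below uses only the standard library to show one kind of primitive such a parser has to handle, a variable-length unsigned integer. The encoding assumed here (7 payload bits per byte, high bit set on every byte except the last, most significant group first) is our reading of the NAF specification, not code taken from the crate, and `read_varint` is a hypothetical name.

```rust
use std::io::{self, Read};

/// Read one variable-length unsigned integer (assumed NAF-style encoding:
/// base-128, continuation bit in the high bit, most significant group first).
fn read_varint<R: Read>(reader: &mut R) -> io::Result<u64> {
    let mut value: u64 = 0;
    // a u64 needs at most 10 groups of 7 bits
    for _ in 0..10 {
        let mut byte = [0u8; 1];
        reader.read_exact(&mut byte)?;
        // shift the accumulated value left by 7 bits, failing on overflow,
        // then append the low 7 bits of the current byte
        value = value
            .checked_mul(128)
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "varint overflow"))?
            | u64::from(byte[0] & 0x7F);
        if byte[0] & 0x80 == 0 {
            return Ok(value);
        }
    }
    Err(io::Error::new(io::ErrorKind::InvalidData, "varint too long"))
}

fn main() -> io::Result<()> {
    // two numbers back to back: 300 (0x82 0x2C) and 7 (0x07)
    let mut input: &[u8] = &[0x82, 0x2C, 0x07];
    let first = read_varint(&mut input)?;
    let second = read_varint(&mut input)?;
    println!("{first} {second}");
    Ok(())
}
```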
This is the Rust version; a Python package is available as well.
📋 Features
- streaming decoder: The decoder is implemented using separate readers, each accessing one region of the compressed file, which allows records to be streamed without decoding full blocks.
- optional decoding: The decoder can skip decoding certain fields, for instance ignoring quality strings when they are not needed.
- flexible encoder: The encoder uses an abstract storage interface for temporary data, which allows sequences to be kept in memory or in a temporary folder.
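The "separate readers per region" idea behind the streaming decoder can be illustrated with the standard library alone. This is a sketch under simplifying assumptions, not nafcodec's internals: the archive is an in-memory byte slice with a hypothetical layout (NUL-terminated names followed by concatenated, uncompressed sequences, with lengths supplied separately), and `region` and `read_records` are hypothetical helpers. Each region gets its own bounded reader, and records are produced by advancing the readers in lockstep; skipping a field would simply mean never opening its region.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read, Seek, SeekFrom};

/// Open an independent reader bounded to `len` bytes starting at `offset`.
/// Each region gets its own cursor, so readers never disturb each other.
fn region(data: &[u8], offset: u64, len: u64) -> io::Result<BufReader<io::Take<Cursor<&[u8]>>>> {
    let mut cursor = Cursor::new(data);
    cursor.seek(SeekFrom::Start(offset))?;
    Ok(BufReader::new(cursor.take(len)))
}

/// Produce (name, sequence) pairs by advancing the two region readers together.
/// Hypothetical layout: NUL-terminated names, then concatenated sequences.
fn read_records(
    archive: &[u8],
    names_len: u64,
    seqs_len: u64,
    lengths: &[u64],
) -> io::Result<Vec<(String, String)>> {
    let mut names = region(archive, 0, names_len)?;
    let mut seqs = region(archive, names_len, seqs_len)?;
    let mut records = Vec::new();
    for &len in lengths {
        let mut name = Vec::new();
        names.read_until(0, &mut name)?;
        name.pop(); // drop the NUL terminator
        let mut seq = vec![0u8; len as usize];
        seqs.read_exact(&mut seq)?;
        records.push((
            String::from_utf8_lossy(&name).into_owned(),
            String::from_utf8_lossy(&seq).into_owned(),
        ));
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    let names: &[u8] = b"r1\0r2\0";
    let seqs: &[u8] = b"ACGTGGA";
    let archive = [names, seqs].concat();
    for (name, seq) in read_records(&archive, names.len() as u64, seqs.len() as u64, &[4, 3])? {
        println!(">{name}\n{seq}");
    }
    Ok(())
}
```

In a real NAF file each region is additionally Zstandard-compressed, so each reader would wrap a decompressor rather than a plain cursor.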
🔌 Usage
Use a Decoder
to iterate over the contents of a Nucleotide Archive Format file,
reading from any BufRead +
Seek implementor:
```rust
let mut decoder = nafcodec::Decoder::from_path("../data/LuxC.naf")
    .expect("failed to open nucleotide archive");

for result in decoder {
    let record = result.unwrap();
    // .. do something with the record .. //
}
```
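Because the decoder accepts any `BufRead + Seek` implementor, both a buffered file and an in-memory `Cursor` qualify. The standard-library sketch below shows a function written against that same bound: it checks whether a source starts with the NAF format descriptor and rewinds it afterwards. The three descriptor bytes `01 F9 EC` are taken from our reading of the NAF specification and should be treated as an assumption; `looks_like_naf` is a hypothetical helper, not part of nafcodec.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Cursor, Read, Seek, SeekFrom};

/// Assumed NAF format descriptor (first three bytes of a NAF file).
const NAF_MAGIC: [u8; 3] = [0x01, 0xF9, 0xEC];

/// Check whether any `BufRead + Seek` source starts with the NAF descriptor,
/// then rewind it so a decoder could consume it from the beginning.
fn looks_like_naf<R: BufRead + Seek>(reader: &mut R) -> io::Result<bool> {
    reader.seek(SeekFrom::Start(0))?;
    let mut magic = [0u8; 3];
    let matches = match reader.read_exact(&mut magic) {
        Ok(()) => magic == NAF_MAGIC,
        // sources shorter than the descriptor are simply not NAF
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => false,
        Err(e) => return Err(e),
    };
    reader.seek(SeekFrom::Start(0))?;
    Ok(matches)
}

fn main() -> io::Result<()> {
    // in-memory source: Cursor<Vec<u8>> implements BufRead + Seek
    let mut mem = Cursor::new(vec![0x01u8, 0xF9, 0xEC, 0x02]);
    println!("in-memory: {}", looks_like_naf(&mut mem)?);
    // file source: BufReader<File> implements BufRead + Seek as well
    if let Ok(file) = File::open("../data/LuxC.naf") {
        let mut reader = BufReader::new(file);
        println!("file: {}", looks_like_naf(&mut reader)?);
    }
    Ok(())
}
```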
All fields of the returned Record
are optional; which ones are populated depends on the kind of data that was compressed.
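Since every field may be absent, downstream code has to handle each `Option` explicitly. The sketch below uses a hypothetical `SketchRecord` struct (its field names are illustrative, not nafcodec's exact types) and renders it as FASTQ when qualities are present, as FASTA otherwise, and not at all when the identifier or sequence is missing.

```rust
/// Hypothetical stand-in for a decoded record: every field is optional.
#[derive(Debug, Default)]
struct SketchRecord {
    id: Option<String>,
    comment: Option<String>,
    sequence: Option<String>,
    quality: Option<String>,
}

/// Render as FASTQ when qualities are present, FASTA otherwise.
/// Returns None if the id or sequence was not stored (or not decoded).
fn render(record: &SketchRecord) -> Option<String> {
    let id = record.id.as_deref()?;
    let seq = record.sequence.as_deref()?;
    let header = match record.comment.as_deref() {
        Some(comment) => format!("{id} {comment}"),
        None => id.to_string(),
    };
    Some(match record.quality.as_deref() {
        Some(qual) => format!("@{header}\n{seq}\n+\n{qual}\n"),
        None => format!(">{header}\n{seq}\n"),
    })
}

fn main() {
    let fasta = SketchRecord {
        id: Some("seq1".into()),
        sequence: Some("ACGT".into()),
        ..Default::default()
    };
    print!("{}", render(&fasta).unwrap());
    // a record without a sequence (e.g. only names were archived) renders nothing
    let names_only = SketchRecord { id: Some("seq2".into()), ..Default::default() };
    assert!(render(&names_only).is_none());
}
```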
The decoder can be configured through a
DecoderBuilder
to ignore some fields to make decompression faster, even if they are present
in the source archive:
```rust
let mut decoder = nafcodec::DecoderBuilder::new()
    .quality(false)
    .with_path("../data/phix.naf")
    .expect("failed to open nucleotide archive");

// the archive contains quality strings...
assert!(decoder.header().flags().test(nafcodec::Flag::Quality));

// ... but we configured the decoder to ignore them
for result in decoder {
    let record = result.unwrap();
    assert!(record.quality.is_none())
}
```
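The example above combines two pieces of information: which fields the archive header declares, and which fields the caller asked for. A field ends up in the record only if both agree. The sketch below models that rule with a hypothetical bit-flag enum and a builder in the same `.quality(false)` style; the bit values and the names `Field` and `Options` are assumptions for illustration, not nafcodec's definitions.

```rust
/// Hypothetical flag bits for fields an archive header may declare.
/// The values are an assumption of this sketch, not taken from nafcodec.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Field {
    Quality = 0x01,
    Sequence = 0x02,
}

/// Minimal builder in the `.quality(false)` style: every field starts enabled.
#[derive(Clone, Copy, Debug)]
struct Options {
    enabled: u8,
}

impl Options {
    fn new() -> Self {
        Options { enabled: 0xFF }
    }

    /// Enable or disable decoding of one field.
    fn set(mut self, field: Field, on: bool) -> Self {
        if on {
            self.enabled |= field as u8;
        } else {
            self.enabled &= !(field as u8);
        }
        self
    }

    fn quality(self, on: bool) -> Self {
        self.set(Field::Quality, on)
    }

    /// A field is decoded only if the archive stores it AND the caller wants it.
    fn decodes(&self, header_flags: u8, field: Field) -> bool {
        (header_flags & self.enabled & field as u8) != 0
    }
}

fn main() {
    // the header declares both sequences and qualities
    let header = Field::Sequence as u8 | Field::Quality as u8;
    let opts = Options::new().quality(false);
    println!("sequence: {}", opts.decodes(header, Field::Sequence));
    println!("quality: {}", opts.decodes(header, Field::Quality));
}
```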
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
📋 Changelog
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
⚖️ License
This library is provided under the open-source MIT license. The NAF specification is in the public domain.
This project is in no way affiliated with, sponsored, or otherwise endorsed by the original NAF authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
📚 References
- [1] Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi. "Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences". Bioinformatics, Volume 35, Issue 19, October 2019, Pages 3826–3828. doi:10.1093/bioinformatics/btz144
Owner
- Name: Martin Larralde
- Login: althonos
- Kind: user
- Location: Heidelberg, Germany
- Company: EMBL / LUMC, @zellerlab
- Twitter: althonos
- Repositories: 91
- Profile: https://github.com/althonos
PhD candidate in Bioinformatics, passionate about programming, SIMD-enthusiast, Pythonista, Rustacean. I write poems, and sometimes they are executable.
GitHub Events

| Event | Total | Last year |
|---|---|---|
| Create event | 2 | 2 |
| Release event | 2 | 2 |
| Issues event | 3 | 3 |
| Watch event | 1 | 1 |
| Issue comment event | 6 | 6 |
| Push event | 10 | 10 |
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Martin Larralde | m****e@e****e | 138 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 2
- Total pull requests: 1
- Average time to close issues: 7 days
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: 7 days
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- apcamargo (2)
Pull Request Authors
- j-schlesinger (2)
Packages
- Total packages: 3
- Total downloads:
  - cargo: 8,844 total
  - pypi: 641 last month
- Total dependent packages: 1 (may contain duplicates)
- Total dependent repositories: 0 (may contain duplicates)
- Total versions: 12
- Total maintainers: 2
pypi.org: nafcodec
PyO3 bindings and Python interface to nafcodec, an encoder/decoder for Nucleotide Archive Format (NAF) files.
- Homepage: https://github.com/althonos/nafcodec
- Documentation: https://nafcodec.readthedocs.io/
- License: MIT
- Latest release: 0.3.1 (published about 1 year ago)
- Maintainers: 1
crates.io: nafcodec
Rust coder/decoder for Nucleotide Archive Format (NAF) files.
- Homepage: https://github.com/althonos/nafcodec
- Documentation: https://docs.rs/nafcodec/
- License: MIT
- Latest release: 0.3.1 (published about 1 year ago)
- Maintainers: 1
crates.io: nafcodec-py
PyO3 bindings and Python interface to the nafcodec crate.
- Homepage: https://github.com/althonos/nafcodec
- Documentation: https://docs.rs/nafcodec-py/
- License: MIT
- Latest release: 0.3.1 (published about 1 year ago)