Recent Releases of https://github.com/bede/deacon
https://github.com/bede/deacon - 0.10.0
- Support for k-mer length up to 57 (previously 32) (@RagnarGrootKoerkamp)
- Rust
Published by bede 9 months ago
https://github.com/bede/deacon - 0.9.0
- Performance optimisations (https://github.com/bede/deacon/pull/27) deliver up to 80% faster filtering with unchanged accuracy (@RagnarGrootKoerkamp).
  - >2Gbp/s with uncompressed long read input.
  - >500Mbp/s with gzip-compressed long read input.
- Rust
Published by bede 10 months ago
https://github.com/bede/deacon - 0.8.1
- Fixes bug handling paired reads introduced in 0.8.0 which could lead to mispaired read output (@Kaibondchau)
- Fixes bug handling multiline FASTA input introduced in 0.8.0 (@RagnarGrootKoerkamp)
- Rust
Published by bede 10 months ago
https://github.com/bede/deacon - 0.8.0
- Faster filtering on multicore systems through improved work allocation using the Paraseq library (@noamteyssier). Filtering at >1Gbp/s is possible with uncompressed long sequences, and >500Mbp/s is achievable on many systems with Gzip-compressed long reads. Filtering Illumina reads is roughly twice as fast as before in my testing at ~200Mbp/s.
- Added independent absolute (-a) and relative (-r) match thresholds with respective default values of 2 and 0.01 (1%). The new default relative threshold improves search specificity for long sequences over the previous absolute-only default threshold, without affecting short read accuracy. These replace the previous dual-purpose -m parameter, which could accept either an absolute (integer) threshold or a relative (float) threshold.
- Minimizers containing ambiguous nucleotides are now ignored.
- deacon index now offers the ability to discard minimizers with information content below a specified scaled Shannon --entropy (-e) threshold. This is disabled by default.
- deacon filter now has a --debug mode which prints all records with minimizer matches to stderr, including the matched minimizer sequence(s).
- The default worst-case hash table capacity preallocation used in deacon index union operations can now be overridden with the new --capacity (-c) argument, in similar fashion to deacon index build.
- Rust
Published by bede 10 months ago
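The interplay of the absolute and relative thresholds described for 0.8.0 can be illustrated with a minimal sketch. The function name `is_match` and its parameters are hypothetical, and the assumption that both thresholds must be satisfied at once is ours, not confirmed by the release notes; this is not Deacon's actual code.

```rust
/// Hypothetical helper (not part of Deacon): decide whether a record
/// counts as an index match given its minimizer hit count.
///
/// ASSUMPTION: a record matches only if it meets BOTH the absolute
/// threshold (minimum number of hits) and the relative threshold
/// (minimum proportion of the record's minimizers that hit).
fn is_match(
    hits: usize,
    total_minimizers: usize,
    abs_threshold: usize,
    rel_threshold: f64,
) -> bool {
    // A record with no minimizers cannot match anything.
    if total_minimizers == 0 {
        return false;
    }
    let proportion = hits as f64 / total_minimizers as f64;
    hits >= abs_threshold && proportion >= rel_threshold
}
```

Under these assumptions and the stated defaults (2 and 0.01), two hits among 20 minimizers would pass, while two hits among 1000 minimizers would not, which is consistent with the note that the relative threshold mainly affects long sequences.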
https://github.com/bede/deacon - 0.7.0
- Deacon now uses the recently added simd-minimizers::iter_canonical_minimizer_values(), increasing filtering speed by up to 50% on Linux/x86_64 systems. Speeds of 1Gbp/s have been observed with uncompressed FASTA input. Thanks @RagnarGrootKoerkamp for a PR and improvements to simd-minimizers.
- Index format is now version 2. Existing indexes must be rebuilt for use with this version. A new version of the panhuman-1 index is available from Zenodo and object storage. Attempting to load an incompatible index throws an error.
- deacon index diff can now accept a fastx file or stream in place of a second index. This enables index masking using massive sequence collections without the need to first index them.
- Position-dependent IUPAC ambiguous base canonicalisation was replaced with a simpler and faster fixed mapping, meaning that records containing ambiguous IUPAC bases may be classified differently to before.
- deacon index union now automatically preallocates the required hash table capacity, eliminating slowdowns when combining indexes.
- Compatibility of the minimizer k and w values is now validated (k+w-1 must be odd) prior to indexing.
- Default index capacity is now 400M (was 500M).
- Rust
Published by bede 11 months ago
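The k and w compatibility rule noted for 0.7.0 (k+w-1 must be odd) can be sketched as a small check. The function name `validate_k_w` and the error messages are hypothetical and chosen for illustration only; Deacon's own validation may differ.

```rust
/// Hypothetical validator (not Deacon's actual code): checks the rule
/// stated in the 0.7.0 notes that k + w - 1 must be odd.
fn validate_k_w(k: usize, w: usize) -> Result<(), String> {
    // Guard against zero values so that k + w - 1 cannot underflow.
    if k == 0 || w == 0 {
        return Err("k and w must both be greater than zero".to_string());
    }
    let span = k + w - 1;
    if span % 2 == 0 {
        Err(format!("k + w - 1 = {span} is even; it must be odd"))
    } else {
        Ok(())
    }
}
```

With the defaults introduced in 0.3.0 (k=31, w=15), k+w-1 is 45 and the check passes; k=31 with w=16 gives 46 and would be rejected.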
https://github.com/bede/deacon - 0.6.0
- Much faster gzip (de)compression via zlib-rs (@tmaklin)
  - Mostly affects reading speed – directly writing gzip files is still best avoided if possible
- Support for reading and writing xz compressed files (in addition to Gzip and Zstandard)
- Adjustable output --compression-level for gzip, zst and xz formats (requested by @dutchscientist)
- Added report fields seqs_out_proportion and bp_out_proportion
- deacon filter now displays the number and percentage of retained reads and base pairs during filtering
- Rust
Published by bede 11 months ago
https://github.com/bede/deacon - 0.5.0
- Default filtering behaviour now passes index matches
  - Use --deplete (-d) to remove index matches
  - --invert has therefore been removed
- Renamed arguments:
  - --nucleotides (-n) to --prefix-length (-p)
  - --report to --summary
- Adds support for relative thresholds (floats between 0.0 and 1.0) for required minimizer hits to --matches
- Adds -O short argument name for --output2
- Adds tests
- Rust
Published by bede 12 months ago
https://github.com/bede/deacon - 0.4.0
- Faster indexing
- Adds non-interleaved paired output file support (--output2) (#4)
- --log argument renamed to --report
- Filter stats are now always sent to stderr (a JSON report can be written wherever one chooses)
- For paired input sequences, identical minimizer hits in both mates of a read pair are now counted only once (#5)
Thanks @pmenzel for feedback at ABPHM!
- Rust
Published by bede about 1 year ago
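The paired-read counting change in 0.4.0 (identical minimizer hits in both mates counted only once) can be illustrated with a set-based sketch. The function name `count_pair_hits` and the use of `u64` values to represent minimizer hits are assumptions made for illustration, not Deacon's actual data structures.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not Deacon's actual code): count the distinct
/// minimizer hits across both mates of a read pair.
///
/// ASSUMPTION: each hit is represented as a u64 value. Collecting into
/// a set means a hit shared by both mates is counted once; as a side
/// effect, repeats within a single mate are also collapsed.
fn count_pair_hits(mate1_hits: &[u64], mate2_hits: &[u64]) -> usize {
    let distinct: HashSet<u64> = mate1_hits
        .iter()
        .chain(mate2_hits.iter())
        .copied()
        .collect();
    distinct.len()
}
```

In this sketch, a pair whose mates each hit the same single minimizer yields a count of one rather than two, so overlapping mates do not inflate the hit count used for thresholding.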
https://github.com/bede/deacon - 0.3.0
- Initial implementation of parallel filtering
  - Up to 10x faster filtering from initial testing
  - Configurable with new --threads parameter
    - Uses available CPU cores by default (--threads 0)
- Default minimizer parameters changed to k=31 and w=15
- Added tests
- Rust
Published by bede about 1 year ago
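The convention that --threads 0 means "use the available CPU cores", as described for 0.3.0, might be resolved along the following lines. The function name `resolve_threads` and the fallback to a single thread are assumptions for illustration; the release notes do not say how Deacon detects the core count.

```rust
use std::thread;

/// Hypothetical helper (not Deacon's actual code): turn a requested
/// thread count into an effective one, treating 0 as "use all
/// available cores".
fn resolve_threads(requested: usize) -> usize {
    if requested == 0 {
        // available_parallelism() can fail on some platforms;
        // ASSUMPTION: fall back to a single thread in that case.
        thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1)
    } else {
        requested
    }
}
```

An explicit value such as 4 is returned unchanged, while 0 resolves to whatever parallelism the standard library reports for the current machine.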
https://github.com/bede/deacon - 0.2.0
- Faster indexing
- Paired read support using either third positional argument or interleaved stdin
- More accurate default parameters
- Optional argument changes
- Refactored lib.rs (@lmmx)
- Dependency updates
  - Bincode2
- Rust
Published by bede about 1 year ago
https://github.com/bede/deacon - 0.1.0
Initial experimental release
- Rust
Published by bede about 1 year ago