Recent Releases of https://github.com/bede/deacon

https://github.com/bede/deacon - 0.10.0

  • Support for k-mer length up to 57 (previously 32) (@RagnarGrootKoerkamp)

- Rust
Published by bede 9 months ago

https://github.com/bede/deacon - 0.9.0

Performance optimisations (https://github.com/bede/deacon/pull/27) deliver up to 80% faster filtering with unchanged accuracy (@RagnarGrootKoerkamp):
  • >2Gbp/s with uncompressed long read input
  • >500Mbp/s with gzip-compressed long read input

- Rust
Published by bede 10 months ago

https://github.com/bede/deacon - 0.8.1

  • Fixes a bug in paired read handling, introduced in 0.8.0, that could lead to mispaired read output (@Kaibondchau)
  • Fixes a bug in multiline FASTA input handling, introduced in 0.8.0 (@RagnarGrootKoerkamp)

- Rust
Published by bede 10 months ago

https://github.com/bede/deacon - 0.8.0

  • Faster filtering on multicore systems through improved work allocation using the Paraseq library (@noamteyssier). Filtering at >1Gbp/s is possible with uncompressed long sequences, and >500Mbp/s is achievable on many systems with gzip-compressed long reads. In my testing, filtering Illumina reads is roughly twice as fast as before, at ~200Mbp/s.
  • Added independent absolute (-a) and relative (-r) match thresholds with respective default values of 2 and 0.01 (1%). The new default relative threshold improves search specificity for long sequences over the previous absolute-only default threshold, without affecting short read accuracy. These replace the previous dual-purpose -m parameter, which could accept either an absolute (integer) threshold or a relative (float) threshold.
  • Minimizers containing ambiguous nucleotides are now ignored.
  • deacon index can now discard minimizers whose information content falls below a specified scaled Shannon entropy threshold, set with --entropy (-e). This is disabled by default.
  • deacon filter now has a --debug mode which prints all records with minimizer matches to stderr including the matched minimizer sequence(s).
  • The default worst-case hash table capacity preallocated by deacon index union can now be overridden with the new --capacity (-c) argument, as with deacon index build.
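The following minimal Rust sketch illustrates the two numeric criteria added in 0.8.0. It is not deacon's implementation. The names is_match and scaled_shannon_entropy are hypothetical. The sketch makes two assumptions:
  • A record counts as a match only when it meets both the absolute (-a) and relative (-r) thresholds.
  • "Scaled" entropy means Shannon entropy in bits divided by 2, the maximum for a four-letter alphabet.

```rust
/// Hypothetical match rule (assumption): a record must meet BOTH the absolute
/// threshold (-a, minimum number of minimizer hits) and the relative threshold
/// (-r, minimum proportion of the record's minimizers that are hits).
fn is_match(hits: usize, total_minimizers: usize, abs_threshold: usize, rel_threshold: f64) -> bool {
    if total_minimizers == 0 {
        return false; // no minimizers, nothing to match
    }
    let proportion = hits as f64 / total_minimizers as f64;
    hits >= abs_threshold && proportion >= rel_threshold
}

/// Shannon entropy (bits) of the A/C/G/T composition of `seq`, divided by 2 so
/// the result lies in [0, 1]. The scaling is an assumption about what "scaled"
/// means here; bytes other than A/C/G/T (any case) are skipped.
fn scaled_shannon_entropy(seq: &[u8]) -> f64 {
    let mut counts = [0usize; 4];
    let mut n = 0usize;
    for &b in seq {
        let idx = match b.to_ascii_uppercase() {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => continue,
        };
        counts[idx] += 1;
        n += 1;
    }
    if n == 0 {
        return 0.0;
    }
    let bits: f64 = counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n as f64;
            p * (1.0 / p).log2() // equals -p * log2(p)
        })
        .sum();
    bits / 2.0
}

fn main() {
    // Defaults from the release notes: -a 2, -r 0.01.
    let (a, r) = (2, 0.01);
    // Short read: 2 hits among 100 minimizers (2%) passes both thresholds.
    println!("short read match: {}", is_match(2, 100, a, r));
    // Long read: 5 hits among 10,000 minimizers (0.05%) fails the relative threshold.
    println!("long read match: {}", is_match(5, 10_000, a, r));
    // A homopolymer has minimal entropy; a balanced sequence has maximal entropy.
    println!("{} {}", scaled_shannon_entropy(b"AAAAAAAA"), scaled_shannon_entropy(b"ACGTACGT"));
}
```

Under these assumptions, a short read almost always clears a 1% relative threshold once it has 2 hits. A long read with only a handful of scattered hits does not clear it.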

- Rust
Published by bede 10 months ago

https://github.com/bede/deacon - 0.7.0

  • Deacon now uses the recently added simd-minimizers::iter_canonical_minimizer_values(), increasing filtering speed by up to 50% on Linux/x86_64 systems. Speeds of 1Gbp/s have been observed with uncompressed FASTA input. Thanks @RagnarGrootKoerkamp for a PR and improvements to simd-minimizers.
    • Index format is now version 2. Existing indexes must be rebuilt for use with this version. A new version of the panhuman-1 index is available from Zenodo and object storage. Attempting to load an incompatible index throws an error.
  • deacon index diff can now accept a fastx file or stream in place of a second index. This enables index masking using massive sequence collections without the need to first index them.
  • Position-dependent IUPAC ambiguous base canonicalisation was replaced with a simpler and faster fixed mapping, meaning that records containing ambiguous IUPAC bases may be classified differently to before.
  • deacon index union now automatically preallocates the required hash table capacity, eliminating slowdowns when combining indexes.
  • Compatibility of minimizer k and w is now validated (k+w-1 must be odd) prior to indexing.
  • Default index capacity is now 400M (was 500M).
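The k/w rule and the set semantics of index diff can be sketched in Rust as below. The function names are hypothetical. The index is simplified to a HashSet of minimizer hashes, which is not deacon's on-disk format. The FASTX stream is represented as an iterator of already-computed minimizer values.

```rust
use std::collections::HashSet;

/// Hypothetical check of the 0.7.0 rule: k + w - 1 (the number of bases covered
/// by a window of w consecutive k-mers) must be odd.
fn validate_kw(k: usize, w: usize) -> Result<(), String> {
    if k == 0 || w == 0 {
        return Err("k and w must both be at least 1".to_string());
    }
    let span = k + w - 1;
    if span % 2 == 1 {
        Ok(())
    } else {
        Err(format!("k + w - 1 = {span} is even, but must be odd"))
    }
}

/// Hypothetical model of index diff against a sequence stream: every minimizer
/// seen in the stream is removed from the index, so the stream never has to be
/// built into a second index first.
fn diff_with_stream<I>(index: &mut HashSet<u64>, stream_minimizers: I)
where
    I: IntoIterator<Item = u64>,
{
    for m in stream_minimizers {
        index.remove(&m);
    }
}

fn main() {
    println!("{:?}", validate_kw(31, 15)); // 31 + 15 - 1 = 45, odd
    println!("{:?}", validate_kw(31, 16)); // 31 + 16 - 1 = 46, even

    let mut index: HashSet<u64> = [1, 2, 3, 4].into_iter().collect();
    diff_with_stream(&mut index, vec![2, 4, 99]);
    let mut remaining: Vec<u64> = index.into_iter().collect();
    remaining.sort();
    println!("{remaining:?}");
}
```

With the default k=31 and w=15 introduced in 0.3.0, k+w-1 is 45, which passes the check.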

- Rust
Published by bede 11 months ago

https://github.com/bede/deacon - 0.6.0

  • Much faster gzip (de)compression via zlib-rs (@tmaklin)
    • Mostly affects reading speed – directly writing gzip files is still best avoided if possible
  • Support for reading and writing xz compressed files (in addition to Gzip and Zstandard)
  • Adjustable output --compression-level for gzip, zst and xz formats (requested by @dutchscientist)
  • Added report fields seqs_out_proportion and bp_out_proportion
  • deacon filter now displays the number and percentage of retained reads and base pairs during filtering
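The proportion fields presumably relate output counts to input counts. The Rust sketch below works under that assumption. FilterCounts and its seqs_in, seqs_out, bp_in and bp_out fields are hypothetical names, not deacon's report schema.

```rust
/// Hypothetical counters; field names are illustrative only.
struct FilterCounts {
    seqs_in: u64,
    seqs_out: u64,
    bp_in: u64,
    bp_out: u64,
}

/// Proportion of `num` over `den`, defined as 0.0 when there was no input.
fn proportion(num: u64, den: u64) -> f64 {
    if den == 0 {
        0.0
    } else {
        num as f64 / den as f64
    }
}

impl FilterCounts {
    fn seqs_out_proportion(&self) -> f64 {
        proportion(self.seqs_out, self.seqs_in)
    }
    fn bp_out_proportion(&self) -> f64 {
        proportion(self.bp_out, self.bp_in)
    }
}

fn main() {
    let c = FilterCounts { seqs_in: 1_000, seqs_out: 950, bp_in: 150_000, bp_out: 140_000 };
    println!(
        "Retained {}/{} reads ({:.1}%) and {}/{} bp ({:.1}%)",
        c.seqs_out, c.seqs_in, 100.0 * c.seqs_out_proportion(),
        c.bp_out, c.bp_in, 100.0 * c.bp_out_proportion()
    );
}
```

Returning 0.0 for empty input is one design choice. Reporting NaN or omitting the field would be equally defensible.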

- Rust
Published by bede 11 months ago

https://github.com/bede/deacon - 0.5.0

  • Default filtering behaviour now passes index matches
    • Use --deplete (-d) to remove index matches
    • --invert has therefore been removed
  • Renamed arguments:
    • --nucleotides (-n) to --prefix-length (-p)
    • --report to --summary
  • Adds support for relative thresholds (floats between 0.0 and 1.0) to --matches, alongside absolute counts of required minimizer hits
  • Adds -O short argument name for --output2
  • Adds tests
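The 0.5.0 behaviour can be modeled in Rust as follows:
  • A --matches value that parses as an integer is treated as an absolute hit count.
  • Otherwise, a float in [0.0, 1.0] is treated as a relative threshold.
  • --deplete inverts which records are kept.

This is a hypothetical sketch. The enum, the function names and the inclusive bounds are assumptions.

```rust
/// Hypothetical model of the 0.5.0 dual-purpose --matches value.
#[derive(Debug, PartialEq)]
enum MatchThreshold {
    Absolute(usize), // minimum number of minimizer hits
    Relative(f64),   // minimum proportion of minimizers that are hits
}

/// Integers are read as absolute counts; anything else must be a float in
/// [0.0, 1.0] (inclusive bounds are an assumption).
fn parse_matches(s: &str) -> Result<MatchThreshold, String> {
    if let Ok(n) = s.parse::<usize>() {
        return Ok(MatchThreshold::Absolute(n));
    }
    match s.parse::<f64>() {
        Ok(x) if (0.0..=1.0).contains(&x) => Ok(MatchThreshold::Relative(x)),
        _ => Err(format!("invalid --matches value: {s}")),
    }
}

/// Matching records are kept by default; with --deplete the non-matching ones are kept instead.
fn keep_record(is_match: bool, deplete: bool) -> bool {
    is_match != deplete
}

fn main() {
    for arg in ["2", "0.5", "1", "1.0", "1.5"] {
        println!("{arg} -> {:?}", parse_matches(arg));
    }
    println!("keep match by default: {}", keep_record(true, false));
    println!("keep match with --deplete: {}", keep_record(true, true));
}
```

The sketch exposes one drawback of a single dual-purpose value: "1" and "1.0" mean different things. This ambiguity may be part of why 0.8.0 split the setting into separate -a and -r arguments.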

- Rust
Published by bede 12 months ago

https://github.com/bede/deacon - 0.4.0

  • Faster indexing
  • Adds non-interleaved paired output file support (--output2) (#4)
  • --log argument renamed to --report
  • Filter stats are now always sent to stderr (a JSON report can be written wherever one chooses)
  • For paired input sequences, identical minimizer hits in both mates of a read pair are now counted only once (#5)
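Counting identical hits in both mates only once amounts to taking the union of the two mates' hit sets. The minimal Rust sketch below assumes each mate's hits are available as minimizer hash values. The name pair_hit_count is hypothetical.

```rust
use std::collections::HashSet;

/// Number of distinct minimizer hits across a read pair: a hit present in both
/// mates is counted once (the size of the union of the two hit sets).
fn pair_hit_count(mate1_hits: &[u64], mate2_hits: &[u64]) -> usize {
    mate1_hits
        .iter()
        .chain(mate2_hits)
        .collect::<HashSet<_>>()
        .len()
}

fn main() {
    let mate1: [u64; 2] = [10, 20];
    let mate2: [u64; 2] = [20, 30];
    // Naive per-mate counting would give 4; the shared hit 20 is counted once.
    println!("{}", pair_hit_count(&mate1, &mate2));
}
```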

Thanks @pmenzel for feedback at ABPHM!

- Rust
Published by bede about 1 year ago

https://github.com/bede/deacon - 0.3.0

  • Initial implementation of parallel filtering
    • Up to 10x faster filtering from initial testing
    • Configurable with new --threads parameter
      • Uses available CPU cores by default (--threads 0)
  • Default minimizer parameters changed to k=31 and w=15
  • Added tests
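Mapping --threads 0 to all available cores can be sketched in Rust with std::thread::available_parallelism from the standard library. The example is illustrative only. resolve_threads is a hypothetical helper, and deacon may determine the core count differently.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper: a requested thread count of 0 means "use all available
/// cores"; falls back to 1 if the core count cannot be determined.
fn resolve_threads(requested: usize) -> usize {
    if requested == 0 {
        thread::available_parallelism()
            .map(NonZeroUsize::get)
            .unwrap_or(1)
    } else {
        requested
    }
}

fn main() {
    println!("--threads 0 -> {}", resolve_threads(0));
    println!("--threads 4 -> {}", resolve_threads(4));
}
```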

- Rust
Published by bede about 1 year ago

https://github.com/bede/deacon - 0.2.0

  • Faster indexing
  • Paired read support using either a third positional argument or interleaved stdin
  • More accurate default parameters
  • Optional argument changes
  • Refactored lib.rs (@lmmx)
  • Dependency updates
    • Bincode2
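Interleaved input stores mate 1 and mate 2 of each pair as consecutive records. The minimal Rust sketch below groups such input into pairs. Record and deinterleave are hypothetical. A real implementation would stream records rather than collect them into a Vec first.

```rust
/// Hypothetical FASTA/FASTQ record, reduced to an identifier and a sequence.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    id: String,
    seq: String,
}

/// Group an interleaved list (mate 1, mate 2, mate 1, mate 2, ...) into pairs.
/// Rejects input with an odd number of records, which cannot be complete pairs.
fn deinterleave(records: Vec<Record>) -> Result<Vec<(Record, Record)>, String> {
    if records.len() % 2 != 0 {
        return Err(format!("odd number of records ({}) in interleaved input", records.len()));
    }
    let mut pairs = Vec::with_capacity(records.len() / 2);
    let mut it = records.into_iter();
    while let (Some(r1), Some(r2)) = (it.next(), it.next()) {
        pairs.push((r1, r2));
    }
    Ok(pairs)
}

fn rec(id: &str, seq: &str) -> Record {
    Record { id: id.to_string(), seq: seq.to_string() }
}

fn main() {
    let input = vec![rec("a/1", "ACGT"), rec("a/2", "TTGA"), rec("b/1", "GGCC"), rec("b/2", "CATG")];
    for (m1, m2) in deinterleave(input).unwrap() {
        println!("{} + {} ({} bp + {} bp)", m1.id, m2.id, m1.seq.len(), m2.seq.len());
    }
}
```

A stricter version might also check that the two identifiers in each pair refer to the same fragment.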

- Rust
Published by bede about 1 year ago

https://github.com/bede/deacon - 0.1.0

Initial experimental release

- Rust
Published by bede about 1 year ago