Recent Releases of diamond
diamond - DIAMOND v2.1.13
- Fixed an invalid error message for the cluster, deepclust and linclust workflows.
- Added the option --oid-output to output ordinal IDs instead of accessions for the clustering workflows, reducing their memory use.
- Added support for using the --multiprocessing feature on Windows.
- Using --multiprocessing requires explicitly setting --parallel-tmpdir.
- Fixed a bug that could cause a crash when the --target-indexed option was used.
- As of now, a macOS binary is available for the GitHub release, supporting both x86 and Apple silicon CPUs. Using BLAST databases is also supported.
- Added compatibility with later CMake versions (tested up to v4.0.3).
- Added CMake option -DCROSS_COMPILE to disable auto-detection of host architecture.
- Added compilation script to produce macOS fat binary.
- C++
Published by bbuchfink 10 months ago
diamond - DIAMOND v2.1.12
- Added support for the new NCBI taxonomic ranks "cellular root", "acellular root", "domain" and "realm".
- Added support for using BLAST databases to the Bioconda release (thanks @mencian).
- Fixed compiler errors for Clang 20.
- Enabled transitive closure computation in earlier clustering rounds and for bi-directional coverage clustering.
- Fixed an issue that could cause hits to be partially lost in frameshift alignment mode when they occurred in both query strands for the same target.
- Fixed an error parsing FASTQ files when quality value lines started with the @ character.
- Fixed a compiler error on macOS.
- C++
Published by bbuchfink 12 months ago
diamond - DIAMOND v2.1.11
- Improved the performance and sensitivity of the cluster, deepclust and linclust workflows.
- The --faster mode will by default use a minimizer sketch of fixed size per sequence instead of window-based minimizers.
- Added the option --sketch-size to enable seeding using a minimizer sketch of the given size per sequence.
- Cascaded clustering and iterated search will by default use the --fast mode with linearization in the second round.
- The --round-coverage parameter is now also applied to uni-directional coverage clustering.
- Cluster output files will correctly contain carriage returns on Windows.
- Fixed generation of the Docker container against the latest version of the NCBI toolkit.
- Fixed a bug that caused target coordinates not to be reported correctly in the tabular format in frameshift alignment mode.
- Added the options --ungapped-evalue and --ungapped-evalue-short to set e-value thresholds for the ungapped hit filter.
- Linearization of search or clustering rounds is limited to seeds of weight >= 10.
- Fixed an issue that could cause an "array size overflow" error when using very large .dmnd databases with taxonomic annotation.
- Fixed a bug that caused query letters to be printed as ARND instead of ACGT in the view workflow.
- Fixed a bug that caused using paired end input files to malfunction with an error message.
- Fixed a bug that could produce clustering errors when clustering at sequence identities >= 50% and processing the database in multiple super blocks.
- Fixed a bug that could cause a crash in global ranking mode.
- Accession parsing rules applied to database sequence accessions for the purpose of matching them to accessions in the taxonomy mapping file are now by default also applied to the accessions in the mapping file (disable using --no-parse-seqids).
- Fixed an issue that could cause increased memory use in the hash join stage.
- Added support for FASTA headers containing multiple sequence IDs separated by blank spaces (so far only the \1 character was supported as a separator).
- Fixed an issue that could cause hanging or crashes in the "Computing alignments" stage.
- --linsearch can now be used in conjunction with --iterate.
- Fixed a compiler error for GCC 4.8.5.
- Fixed a compiler error on Solaris.
- Fixed compiler errors on systems that do not support the sysinfo function.
- Fixed a "Bus error" occurring on Sparc systems.
- Compilation on Sparc systems can be performed without setting -DX86=OFF.
- Fixed two issues that could cause increased memory use in the computing alignments stage.
- Fixed a bug that caused superfluous quote characters in the JSON output format.
- Linear search modes will by default use full-matrix extension.
- Fixed an issue that could cause reduced performance in the masking sequences stage.
- Fixed a bug that could cause a crash when using mutual coverage thresholds in blastx mode.
- Fixed a bug that could cause a crash when the --include-lineage option was used.
- When reading protein sequences that unexpectedly only contain DNA letters, an error message is only produced if the first 10 sequences in the input file all exhibit the problem.
- Fixed a bug that caused setting --top 100 not to function correctly.
- Fixed a bug that caused target coordinates not to be reported correctly in the output of the realign workflow.
- Fixed a bug that did not permit using the --memory-limit/-M option for the realign workflow.
- Fixed an issue that could cause non-deterministic output in frameshift alignment mode.
- Fixed a bug that could cause a crash when using the XML output format in the view workflow.
- Fixed an issue that could cause non-deterministic output for identically-scoring HSPs in the same target.
- Disabled the default use of increased coverage and identity cutoffs in earlier clustering rounds.
- Optimized the performance of the extension stage when coverage or approximate identity filters are used.
- Optimized the performance of the extension stage when not using output fields that require alignment traceback.
- Fixed an issue that could cause an incorrect order of cascaded clustering rounds.
- C++
Published by bbuchfink over 1 year ago
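The v2.1.11 notes contrast window-based minimizers with a minimizer sketch of fixed size per sequence. The following is a minimal Python sketch of that general distinction, not DIAMOND's implementation: the hash function, the k-mer length and the function names are all assumptions chosen for illustration.

```python
import hashlib

def kmer_hashes(seq, k):
    """Return (hash, position) pairs for every k-mer of seq (hash choice is an assumption)."""
    pairs = []
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
        pairs.append((int.from_bytes(digest, "big"), i))
    return pairs

def window_minimizers(seq, k, w):
    """Positions of the minimum-hash k-mer in each window of w consecutive k-mers.

    The number of selected positions grows with the sequence length.
    """
    hashes = kmer_hashes(seq, k)
    if not hashes:
        return []
    picked = set()
    for start in range(max(len(hashes) - w + 1, 1)):
        picked.add(min(hashes[start:start + w])[1])
    return sorted(picked)

def fixed_size_sketch(seq, k, size):
    """Positions of the `size` k-mers with the smallest hashes.

    The number of selected positions is bounded by `size`, whatever the length.
    """
    smallest = sorted(kmer_hashes(seq, k))[:size]
    return sorted(pos for _, pos in smallest)
```

With window-based selection a longer sequence contributes more seeds, whereas the fixed-size variant caps the seed count per sequence, which is the trade-off the release note points at.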
diamond - DIAMOND v2.1.10
- Fixed a bug that could cause a crash when using a bi-directional coverage cutoff in query-indexed mode.
- Fixed a bug that caused the --include-lineage option to malfunction for targets with no taxonomic assignment available.
- C++
Published by bbuchfink over 1 year ago
diamond - DIAMOND v2.1.9
- Corrected the prefix of the query length field for the SAM format.
- Added the size modifiers 'T', 'M' and 'K' for the --memory-limit/-M option.
- Added the option --mutual-cover to cluster sequences by mutual coverage percentage of the cluster representative and member sequence.
- Added the option --symmetric for computing greedy vertex cover with symmetric edges.
- Fixed an issue that caused the --approx-id option and the approx_pident output field not to work correctly when using the --anchored-swipe option.
- Added the option --no-reassign to prevent reassignment to closest representative for the greedy vertex cover and clustering workflows.
- Added the option --connected-component-depth to activate clustering of connected components at a given maximum depth for the greedy vertex cover and the clustering workflows.
- Fixed a compiler error for Clang v17.
- Improved search performance when searching with mutual coverage threshold by filtering for sequence length ratio.
- Added the sensitivity mode --shapes-30x10 with sensitivity approximately equivalent to --mid-sensitive.
- Added the options --round-coverage and --round-approx-id to set per round cutoffs for cascaded clustering.
- The CMake switch -DKEEP_TARGET_ID is now obsolete and the corresponding function is always available.
- Added the option --include-lineage to the taxonomic classification format to include taxonomic lineage in the output.
- Added native support for the ARM NEON instruction set (contributed by @althonos).
- C++
Published by bbuchfink over 2 years ago
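The v2.1.9 notes mention that searches with a mutual coverage threshold are sped up by filtering for sequence length ratio. One plausible reading, which the release notes do not spell out, is that a pair whose lengths differ too much cannot reach the threshold on both sequences (ignoring gaps, an alignment covers at most as many residues of the longer sequence as the shorter one contains). The function name and the exact criterion below are assumptions.

```python
def may_satisfy_mutual_cover(len_a, len_b, min_cover_pct):
    """Cheap pre-filter: can this pair possibly reach the mutual coverage threshold?

    Assumes positive lengths and ignores gaps, so the best possible coverage of
    the longer sequence is 100 * shorter / longer percent.
    """
    shorter, longer = sorted((len_a, len_b))
    return 100.0 * shorter / longer >= min_cover_pct
```

For example, with an 80% threshold a pair of lengths 100 and 80 would still be aligned, while a pair of lengths 100 and 79 could be skipped before any extension is computed.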
diamond - DIAMOND v2.1.8
- Fixed an issue that could cause reduced performance when running in query-indexed mode.
- Added support for the JSON output format (option -f json-flat).
- Added the option --sam-query-len to output query length in SAM format.
- C++
Published by bbuchfink almost 3 years ago
diamond - DIAMOND v2.1.7
- Fixed a bug that caused taxonomy names not to be loaded correctly for the makedb workflow.
- Fixed a bug that caused a crash when using the --target-indexed option.
- Fixed an error when using the --tmpdir option for the makedb workflow.
- Added a warning message when sequence accessions are shortened due to parsing rules for the makedb workflow.
- Added the option --no-parse-seqids to disable parsing of sequence accessions.
- Changed the command line help to print options separated by command.
- Fixed an issue that the --ignore-warnings option could not be used for the makedb workflow.
- C++
Published by bbuchfink about 3 years ago
diamond - DIAMOND v2.1.6
- Fixed compatibility issues on older systems without support for AVX2.
- Fixed linker errors when compiled with -DX86=OFF.
- Fixed a compiler error on macOS systems.
- Fixed a bug that could cause missing tags in the XML output format and unaligned queries not to be reported correctly.
- Fixed a bug that caused the PAF output format not to work correctly.
- C++
Published by bbuchfink about 3 years ago
diamond - DIAMOND v2.1.5
- Disabled the use of frequency based seed masking when using the linear-time search feature with respect to the targets.
- Fixed a bug that caused a "Database file is not a BLAST database" error message for the prepdb workflow.
- Fixed a bug that caused a segmentation fault when using BLAST databases.
- Added line numbers for error messages when reading taxonomy mapping files.
- Fixed a bug that could cause a crash when using the greedy-vertex-cover workflow without the --out and --centroid-out options.
- Fixed a bug that caused the greedy-vertex-cover workflow to only produce a trivial clustering.
- Fixed a bug that caused the last codon of the -2 reading frame to be translated incorrectly.
- Reduced the memory use of the clustering workflow.
- Updated the bundled NCBI toolkit to the latest version.
- C++
Published by bbuchfink about 3 years ago
diamond - DIAMOND v2.1.4
- Leading spaces are now trimmed and tabulator characters escaped as \t in sequence titles, and a warning message is produced.
- Blank sequence titles are now replaced by N/A, and a warning message is produced.
- Fixed a bug that could cause a "Traceback error" in certain cases.
- Fixed a bug that caused the qlen and score output fields not to be reported correctly for the realign workflow.
- Added an error message when using unsupported output fields for the realign workflow.
- Fixed an issue that could cause a "Missing fields in input line" error when clustering.
- Optimized the performance of the linclust workflow.
- Reduced the memory use of the clustering workflow.
- Fixed a bug that caused using standard input as the query not to work.
- C++
Published by bbuchfink over 3 years ago
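The title handling described for v2.1.4 (trim leading spaces, escape tab characters as \t, replace blank titles by N/A, warn in each case) can be illustrated with a short Python sketch. The function name, the order of the steps and the treatment of whitespace-only titles as blank are assumptions; only the three rules themselves come from the release notes.

```python
def sanitize_title(title):
    """Return (cleaned_title, warn) following the v2.1.4 rules as read above.

    warn is True whenever the title had to be changed.
    """
    # Trim leading spaces and replace each tab by the two characters '\' and 't'.
    cleaned = title.lstrip(" ").replace("\t", "\\t")
    if cleaned == "":
        # Blank titles (assumed to include space-only titles) become N/A.
        return "N/A", True
    return cleaned, cleaned != title
```

Calling `sanitize_title("  sp|P1\tdesc")` yields the title with the leading spaces removed and the tab written as a literal backslash followed by `t`, together with a flag indicating that a warning is due.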
diamond - DIAMOND v2.1.3
- Fixed compiler errors for GCC 4.8.
- Fixed a GCC compiler error.
- Fixed a segfault issue occurring when compiled using GCC 12 on ARM64 systems.
- Fixed an issue that caused missing support for AVX2.
- C++
Published by bbuchfink over 3 years ago
diamond - DIAMOND v2.1.2
- The iterated search mode (option --iterate) now uses a linear-time feature as the first search round.
- Added the linclust command to cluster using only a single linear-time search round.
- Fixed compiler errors on macOS.
- Fixed a bug that caused invalid alignment traceback output for the DAA view workflow.
- Added the merge-daa workflow to merge DAA files.
- Fixed an error when using the --max-target-seqs/-k option for the DAA view workflow.
- Removed AVX2 support from the Windows release binary to ensure compatibility with older systems.
- Permitted the --ignore-warnings option for the cluster and deepclust workflows.
- Use unlinked temporary files for database blocks in clustering workflows.
- Fixed a bug that could cause invalid results when using a clustering step with linearization as the final round in combination with database processing in multiple super blocks.
- The --lin-stage1 option can now be used without compilation using the -DEXTRA=ON cmake option.
- Added the option to specify the _lin suffix for sensitivity keywords for the --iterate option to activate linear-time feature.
- Added the option --linsearch to activate linear-time feature for the search workflows.
- Fixed a bug that caused the ppos and positive output fields not to work for the realign workflow.
- Fixed an issue that caused motif masking not to work when compiled with link time optimization.
- C++
Published by bbuchfink over 3 years ago
diamond - DIAMOND v2.1.1
- Fixed compilation errors on non-x86 systems and for the clang compiler.
- Fixed an error message when running the recluster workflow.
- Fixed a bug that could cause an "invalid varint encoding" error when using the DAA format.
- Fixed a bug that could cause corrupted DAA output.
- Fixed a bug that caused an error in the view workflow.
- Adjusted the hit culling heuristic of the frameshift alignment mode to be less aggressive.
- C++
Published by bbuchfink over 3 years ago
diamond - DIAMOND v2.1.0
- Added the cluster workflow to cluster protein sequences.
- Added the realign workflow to generate clustering output.
- Added the recluster workflow to correct errors in clusterings.
- Added the reassign workflow to reassign cluster members to their closest centroid.
- Added the option -M/--memory-limit to set a memory limit for clustering workflows.
- Added the --approx-id option to filter alignments by approximate sequence identity and to set an approximate sequence identity threshold for clustering.
- Added the --member-cover option to set the coverage threshold of the cluster member sequence.
- Added the --cluster-steps option to set steps for cascaded clustering.
- Added the --clusters option to specify clustering input file.
- The blastx mode will now mask any open reading frame below the minimum required length as specified by --min-orf.
- The blastx mode will only count unmasked letters towards the block size.
- Fixed a bug that caused an error when using the global ranking mode.
- Added the fast mode as the first round in iterative searches.
- Fixed a bug that caused the program not to function on systems without support for SSE4.1.
- Improved multi-threaded load balancing of gapped extension computations.
- Improved performance of seed extension stage when HSP filter settings are used.
- Added the option --soft-masking with possible values 0 and tantan to permit soft-masking using the tantan algorithm.
- Fixed a bug that could cause an "inflate error" in multiprocessing mode.
- Added the option --swipe to compute full Smith Waterman alignments of all queries against all targets.
- Added the sensitivity mode --faster.
- Added the output fields approx_pident and corrected_bitscore to the tabular format.
- Added the --lin-stage1 option to linearize comparisons in the seeding stage by only considering hits against the longest query sequence for identical seeds (only supported when compiled with -DEXTRA=ON).
- Added the --kmer-ranking option to rank sequences when --lin-stage1 is used (only supported when compiled with -DKEEP_TARGET_ID=ON).
- Added the option --no-block-size-limit to deactivate upper limits for the block size when the --memory-limit option is used.
- Added the greedy-vertex-cover workflow to compute clustering based on alignments.
- Added the --edge-format option to set edge format for greedy vertex cover.
- Added the --edges option to set input file for greedy vertex cover.
- Added the --centroid-out option to output centroid sequences for greedy vertex cover.
- Added the --unaligned-targets option to generate an output file of unaligned targets.
- Fixed an issue that caused compilation to fail when using the Intel Compiler.
- Fixed an issue that could cause a segmentation fault in rare cases.
- The --header option can now be used with the parameter simple to enable simple headers for the tabular format, or without a parameter to enable headers for the clustering format.
- Added the option --mp-self to optimize self-alignment in multiprocessing mode.
- Added the option --query-or-subject-cover to report alignments if the query or the subject cover (or both) are above the given threshold.
- Removed support for the --comp-based-stats 2 option (now equivalent to --comp-based-stats 3).
- Removed hit culling in case of overlapping target ranges in frameshift alignment mode.
- Added the option --anchored-swipe to activate anchored SWIPE extension.
- C++
Published by bbuchfink over 3 years ago
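The greedy-vertex-cover workflow added in v2.1.0 computes a clustering from alignments given as edges. As a rough illustration of the general idea, here is a textbook-style greedy cover in Python. It treats edges as symmetric and breaks ties by name; both choices, like the function name and the data layout, are assumptions and not a description of DIAMOND's code.

```python
from collections import defaultdict

def greedy_vertex_cover(nodes, edges):
    """Cluster nodes greedily: repeatedly pick the node that covers the most
    still-unassigned neighbors as a centroid and assign those neighbors to it.

    nodes: iterable of sequence names; edges: iterable of (a, b) alignment pairs.
    Returns a dict mapping each centroid to the sorted list of its members.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    unassigned = set(nodes)
    clusters = {}
    while unassigned:
        # max() keeps the first maximum, so ties go to the smallest name.
        centroid = max(sorted(unassigned),
                       key=lambda n: len(neighbors[n] & unassigned))
        members = (neighbors[centroid] & unassigned) | {centroid}
        clusters[centroid] = sorted(members)
        unassigned -= members
    return clusters
```

Sequences without any alignment end up as singleton clusters, and a sequence connected to several centroids is assigned to whichever was picked first.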
diamond - DIAMOND v2.0.15
- Fixed a bug (present since v2.0.12) that caused the diamond view workflow to report a zero bit score for all alignments.
- C++
Published by bbuchfink about 4 years ago
diamond - DIAMOND v2.0.14
- Fixed a compiler error on Linux systems that do not define _SC_LEVEL3_CACHE_SIZE.
- Fixed an error when using --unal 1 with the cigar output field.
- Fixed an "illegal instruction" error on systems that did not support AVX2.
- Fixed a bug (present since v2.0.12) that could cause an error or suboptimal alignments when HSP filter settings were used.
- C++
Published by bbuchfink over 4 years ago
diamond - DIAMOND v2.0.13
- Fixed a bug that caused invalid bit scores in frameshift alignment mode.
- C++
Published by bbuchfink over 4 years ago
diamond - DIAMOND v2.0.12
- Fixed an error when using HSP filter settings together with a BLAST database.
- Optimized the performance of alignment traceback.
- A non-default setting of --max-hsps will now recompute a full-matrix Smith Waterman alignment with the ranges of the known HSPs masked in the target.
- A non-default setting for --max-hsps can now be used together with --ext full.
- The sensitivity levels used for iterated searches can now be manually set by using a space-separated list after the --iterate option.
- Seeds are masked based on complexity instead of frequency by default.
- Added the option --seed-cut to set a complexity cutoff for indexed seeds.
- Added the option --freq-masking to enable masking seeds based on frequency.
- The fast, default, mid-sensitive and sensitive modes will by default softmask a fixed set of highly abundant sequence motifs.
- Added the option --motif-masking (0,1) to enable or disable motif masking.
- Added the option --masking seg to enable SEG masking of target sequences (BLAST default) instead of tantan masking.
- Fixed a bug that caused the full_sseq output field to contain invalid information or to produce an error when using a BLAST database.
- Changed composition based statistics to use BLOSUM62 background frequencies.
- Fixed the zstd dependency in the Dockerfile.
- Added support for gap letters in BLAST databases.
- Fixed a bug that caused the --custom-matrix option not to function correctly.
- Changed the overlap for merging adjoining bands to >0.0.
- Use more moderate filtering of HSPs in the chaining stage.
- C++
Published by bbuchfink over 4 years ago
diamond - DIAMOND v2.0.11
- Fixed a bug that could cause invalid output when using --masking 0 combined with the global ranking mode.
- Enabled lazy repeat masking in the query-indexed and contiguous seed modes when using global ranking.
- Added detection of cache size to auto-enable query-indexed mode.
- C++
Published by bbuchfink almost 5 years ago
diamond - DIAMOND v2.0.10
- Using BLAST databases now requires a preprocessing step using the command prepdb. The command line is: diamond prepdb -d /path/to/database. This call runs quickly and will write some small auxiliary files into the database directory.
- Improved performance of searching small query files.
- Added the "iterative" search mode (option --iterate) to search the query dataset with increasing sensitivity, only searching queries at the target sensitivity that do not produce a significant alignment at a lower sensitivity search. For example, using --sensitive --iterate will first search the query file at default sensitivity, and then search again, in --sensitive mode, all query sequences that failed to align in the first round.
- Added the "global ranking" mode (option -g) to set a limit on the number of Smith Waterman extensions per query, with the target sequences ranked by their ungapped extension scores.
- Added the --fast sensitivity mode that is faster and less sensitive than the default mode.
- Reduced the time for loading target sequences from BLAST databases.
- Added the contiguous-seed mode (option --algo ctg) to improve performance for small query files.
- Added support for using --comp-based-stats (3,4) in combination with --ext full.
- Fixed a bug that could cause a "Traceback error" when using --comp-based-stats (3,4) in rare cases.
- Changed the full_sseq output field to always contain unmasked sequences.
- Fixed an issue that could cause target output order to be nondeterministic in case of identically scoring hits.
- Added support for reading zstd-compressed input files (auto-detected) and writing zstd-compressed output files (option --compress zstd) (requires compilation using cmake -DWITH_ZSTD=ON).
- Compilation with BLAST database support requires the zstd library.
- Added error message when reading protein sequences from FASTA files that only contain DNA letters (can be disabled using --ignore-warnings).
- C++
Published by bbuchfink almost 5 years ago
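The control flow of the iterative search mode described for v2.0.10 can be summarised in a few lines of Python: each round only sees the queries that no earlier, less sensitive round managed to align. The `search` callback, the level names and the toy data below are hypothetical stand-ins for illustration; this is a sketch of the idea, not DIAMOND's implementation.

```python
def iterate_search(queries, search, levels):
    """Run `search` at increasing sensitivity, re-searching only unaligned queries.

    search(batch, level) is assumed to return a dict mapping each query that
    produced a significant alignment to its hits.
    Returns (hits, still_unaligned).
    """
    hits = {}
    remaining = list(queries)
    for level in levels:
        if not remaining:
            break
        found = search(remaining, level)
        hits.update(found)
        remaining = [q for q in remaining if q not in found]
    return hits, remaining

def toy_search(batch, level):
    """Hypothetical search: q1 aligns at any level, q2 only in sensitive mode."""
    detectable = {"default": {"q1"}, "sensitive": {"q1", "q2"}}
    return {q: f"hit@{level}" for q in batch if q in detectable[level]}

result = iterate_search(["q1", "q2", "q3"], toy_search, ["default", "sensitive"])
```

Here `q1` is resolved in the first round and never reaches the slower sensitive round, `q2` is picked up in the second round, and `q3` remains unaligned.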
diamond - DIAMOND v2.0.9
- Reduced the memory use of database building with taxonomy mapping.
- Removed the limitation of sequence accession length.
- Fixed a bug that could cause using a BLAST database not to function correctly.
- Added support for using BLAST alias databases (created by blastdb_aliastool).
- Reduced the memory use of the seed hit sorting stage.
- Improved the consistency of results when running in query-indexed mode (--algo 1).
- Added the option --skip-missing-seqids to ignore cases of missing sequences in the database when using the --seqidlist option.
- The --min-orf parameter now defaults to 1 in frameshift alignment mode.
- Added support for using BLAST databases to the Docker container.
- C++
Published by bbuchfink about 5 years ago
diamond - DIAMOND v2.0.8
- Added support for directly using BLAST database files instead of the Diamond-formatted .dmnd database files. This feature is not yet available through all release channels. It can currently be accessed by downloading the GitHub release version or by compiling from source. Taxonomy features are not yet supported for BLAST databases.
- Added the option --seqidlist to filter the database by sequence accession (only supported for BLAST databases).
- Fixed a bug that caused the --dbsize option not to function correctly.
- Added the command makeidx and the option --target-indexed that provide an optimisation specialized for small databases (<10 Mb). (see: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics#small-database-optimization)
- Added the option --mp-recover to recover aborted runs in multiprocessing mode.
- C++
Published by bbuchfink about 5 years ago
diamond - DIAMOND v2.0.7
- Added support for computing full-matrix instead of banded Smith Waterman extensions (command line option --ext full).
- Added support for the new prot.accession2taxid.FULL.gz taxonomy mapping file from NCBI.
- Added the option --gapped-filter-evalue to set the e-value threshold of the gapped filter heuristic.
- Added setting the scores of the mask letter according to BLAST rules when a compositionally adjusted matrix is used.
- Changed formatting of e-values to print two decimals instead of one.
- Added the output field qseq_translated to print the translation of the aligned part of the query sequence.
- Added support for providing two input files to --query/-q when running alignment in blastx mode.
- Added the output field full_qseq_mate to print the sequence of the query's mate (enabled when using two query files in blastx mode).
- Fixed a bug that could cause a crash in blastx mode for very long queries.
- C++
Published by bbuchfink over 5 years ago
diamond - DIAMOND v2.0.6
- Changed the computation of expected values to use the method described in Park, Y., Sheetlin, S., Ma, N. et al. New finite-size correction for local alignment score distributions. BMC Res Notes 5, 286 (2012).
- Enabled the use of a custom scoring matrix without having to specify the statistical parameters (option --custom-matrix).
- Added support for compositional matrix adjust as described in Yi-Kuo Yu, Stephen F. Altschul, The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions, Bioinformatics, Volume 21, Issue 7, 1 April 2005, Pages 902–911. Three additional modes have been added that can be enabled by setting --comp-based-stats (2,3,4) (the feature is not enabled by default and does not support translated searches at the moment).
- Fixed a bug that could cause incorrect alignment coordinates, gap counts and sequence identities to be reported by diamond view.
- Targets are sorted by bit score instead of e-value in the alignment output when the --top parameter is used.
- Disabled support of custom scoring matrices for the DAA format.
- Fixed a bug that caused the use of a custom scoring matrix not to function correctly.
- Fixed an issue that caused the portable binary not to function on systems that did not support AVX.
- Added the option --no-unlink to prevent unlinking of temporary files.
- C++
Published by bbuchfink over 5 years ago
diamond - DIAMOND v2.0.5
- Fixed an issue that could cause high memory use in frameshift alignment mode.
- C++
Published by bbuchfink over 5 years ago
diamond - DIAMOND v2.0.4
- Fixed a bug that could cause the --max-target-seqs/-k, --ext-chunk-size and --file-buffer-size options not to function correctly on macOS.
- C++
Published by bbuchfink almost 6 years ago
diamond - DIAMOND v2.0.3
- Added a new sensitivity mode that is between the default mode and the sensitive mode in sensitivity (option --mid-sensitive).
- Added counters for total number of reference blocks, shapes and index chunks to the status messages.
- Fixed a bug (persisting since v2.0.2) that could cause secondary HSPs within one target not to be reported if the --max-hsps option was used with a non-default setting.
- Fixed a bug that could cause an invalid error message with regard to the database format in certain cases.
- The --no-self-hits option is no longer supported in blastx mode.
- Changed the semantics of the --no-self-hits option to check for equality of both sequence and sequence id, independent of the computed alignment.
- The selection of the top hit when using --top will respect the identity, coverage and no-self-hits filter settings (does not apply when frameshift alignment is enabled).
- The inclusion criterion for --top is applied to the bit score instead of the raw score and is no longer affected by integer rounding (does not apply when frameshift alignment is enabled).
- Improved the accuracy of the ranking heuristic.
- Added the options --ext-chunk-size and --no-ranking to control the ranking heuristic.
- C++
Published by bbuchfink almost 6 years ago
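The v2.0.3 change to the --top inclusion criterion can be pictured with a small Python sketch. It assumes that --top N keeps alignments whose score lies within N percent of the best score for the query, and it applies that test to floating-point bit scores rather than integer raw scores. The function name and the exact cutoff formula are assumptions, not taken from the release notes.

```python
def within_top(hits, top_pct):
    """Keep hits whose bit score is at most top_pct percent below the best one.

    hits: list of (target, bit_score) pairs for one query.
    """
    if not hits:
        return []
    best = max(score for _, score in hits)
    # The cutoff is computed on float bit scores, so no integer rounding occurs.
    cutoff = best * (1.0 - top_pct / 100.0)
    return [(target, score) for target, score in hits if score >= cutoff]
```

For hits with bit scores 200.0, 185.0 and 150.0 and a setting of 10, the first two would be kept under this reading, since the cutoff is 180.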
diamond - DIAMOND v2.0.2
- Fixed a bug (persisting since v2.0.0) that could cause incomplete results in blastx mode.
- Reduced the use of temporary disk space.
- Fixed an issue that could cause long runtimes when using the --taxon-list option.
- C++
Published by bbuchfink almost 6 years ago
diamond - DIAMOND v2.0.1
- Added feature for using the tool in a distributed computing environment. (See here for details: http://www.diamondsearch.org/index.php?pages/distributed_computing/)
- Fixed an issue that could cause increased memory usage and runtimes in certain cases.
- Fixed a bug that could cause a crash when using --comp-based-stats 0.
- Fixed a bug that could cause a crash for small input files in certain cases.
- Fixed a bug that could cause filtering hits for identity or range cover not to function correctly when using the tabular format without traceback being enabled.
- Added warning messages to recommend block size parameters based on system RAM.
- C++
Published by bbuchfink almost 6 years ago
diamond - DIAMOND v2.0.0
- Added the sensitivity modes --very-sensitive and --ultra-sensitive. Both modes are designed for finding distant hits of <40% identity with a sensitivity similar to BLAST, with the ultra-sensitive mode being the slightly more sensitive mode.
- The --block-size/-b parameter is set to 0.4 and the --index-chunks/-c parameter is set to 1 by default in the new sensitivity modes.
- Improved performance.
- Added the option --ext with possible values banded-fast and banded-slow to adjust band setup for Smith Waterman extensions (new default is banded-fast for the default and sensitive mode, and banded-slow otherwise).
- Added automatic disabling of alignment traceback if not required by the user-defined output fields in tabular output format.
- Changed the default value of the --max-hsps parameter (the maximum number of HSPs per target sequence to report for each query) to 1.
- Changed the default value of the --freq-sd parameter from 10 to 20 for the sensitive mode.
- Fixed a compiler error on FreeBSD.
- C++
Published by bbuchfink almost 6 years ago
diamond - DIAMOND v0.9.36
- Fixed a bug that could cause makedb to produce invalid database files when using taxonomy features.
- Fixed a bug that could cause a crash when running in query-indexed mode.
WARNING: This version contains a serious bug that can cause incomplete results in blastx mode. Using it is not recommended.
- C++
Published by bbuchfink almost 6 years ago
diamond - DIAMOND v0.9.35
- Fixed a bug in diamond view that would cause high memory usage and erroneous output.
- Reduced the use of temporary disk space.
- Fixed a database compatibility issue with big endian architectures.
- Fixed a bug that would cause a crash for query sequences shorter than 5 letters in blastx mode.
- Fixed a bug that would cause a crash when using a FASTA file as database parameter in blastx mode.
- Added support for the following new ranks in the NCBI taxonomy: biotype, clade, forma specialis, genotype, isolate, morph, pathogroup, serogroup, serotype, strain, subvariety.
WARNING: This version contains a serious bug that can cause incomplete results in blastx mode. Using it is not recommended.
- C++
Published by bbuchfink almost 6 years ago
diamond - DIAMOND v0.9.34
- Fixed a compiler error for native builds.
- Fixed a compiler error for GCC 4.8.
- Fixed a compiler error when support for SSSE3 was enabled without support for SSE4.1.
- Implemented asynchronous loading of seed hits.
WARNING: This version contains a serious bug that can cause incomplete results in blastx mode. Using it is not recommended.
- C++
Published by bbuchfink almost 6 years ago
diamond - DIAMOND v0.9.33
- Improved performance and sensitivity.
- Increased use of temporary disk space.
- Implemented support for the AVX2 instruction set.
- Fixed a bug on big-endian architectures.
- Fixed bugs for compilers with unsigned char.
- Fixed compiler errors for generic builds.
- Added compatibility of database files between little and big endian architectures.
- Fixed various issues related to "Illegal instruction" errors on macOS.
- Added option --file-buffer-size to set the size of the I/O buffers and set the default value to 64 MB.
Edit: fixed portability issue in the attached Linux binary.
WARNING: This version contains a serious bug that can cause incomplete results in blastx mode. Using it is not recommended.
- C++
Published by bbuchfink about 6 years ago
diamond - DIAMOND v0.9.32
- Fixed a bug that would generate an incorrect count of positive scoring letters in all output formats.
- Fixed a compiler error on macOS.
- Fixed an "illegal instruction" error on macOS.
- C++
Published by bbuchfink about 6 years ago
diamond - DIAMOND v0.9.31
- Improved performance.
- Composition based statistics use integer scoring. (This slightly changes all alignment scores.)
- Option --quiet will suppress startup message.
- Added output field scovhsp to print the subject coverage per HSP to the tabular format.
- Added option --culling-overlap to set the minimum overlap with a higher scoring hit for a hit to be deleted and changed the default value from 90% to 50%.
- Added command diamond test to run a series of regression tests.
- Fixed an off-by-one error of the query end position in the XML output format.
(Update 2020/06/08) Due to a bug, since this version DAA files are not backward compatible with previous versions when using frameshift alignment (option -F).
- C++
Published by bbuchfink about 6 years ago
diamond - DIAMOND v0.9.30
- Added support for output field cigar to the tabular format.
- Changed the maximum repeat period to 50 for tantan masking.
- Changed the tantan masking to ungapped mode.
- Improved the performance of repeat masking.
- Added output fields sskingdoms, skingdoms, and sphylums to print subject super kingdoms, subject kingdoms, and subject phyla.
- C++
Published by bbuchfink over 6 years ago
diamond - DIAMOND v0.9.29
- Fixed a bug that could cause taxonomy features to function incorrectly for databases created by versions 0.9.27 and 0.9.28. Please rebuild databases built with those versions if the --taxonmap option was used.
- C++
Published by bbuchfink over 6 years ago
diamond - DIAMOND v0.9.28
- Fixed a bug that could cause alignment score overflows for scores > 65535 in frameshift alignment mode.
- Fixed a clang compiler error.
- C++
Published by bbuchfink over 6 years ago
diamond - DIAMOND v0.9.27
- Improved performance of the seed matching stage.
- Seed frequency counts are computed based on hit seeds. (this slightly reduces performance while improving sensitivity)
- Added option --taxon-exclude to exclude list of taxon ids from search.
- Compiling from source will no longer perform a native build. Instead, a portable binary that contains code paths for multiple architectures will be produced, with dispatch logic that is invoked at runtime.
- C++
Published by bbuchfink over 6 years ago
diamond - DIAMOND v0.9.26
- Fixed a bug that could cause undefined behaviour when using a database file of format version < 2.
- Fixed a compiler error when compiled as generic C++.
- Program will no longer terminate with an error if unlink system call fails.
- Added option --tantan-minMaskProb to set minimum repeat probability for tantan masking and changed the default value to 0.9.
- Added option --tantan-maxRepeatOffset to set maximum tandem repeat period to consider and changed the default value to 15.
- Added option --tantan-ungapped to use tantan in ungapped mode and changed the default to gapped mode.
- Changed score matrix lambda calculation for tantan masking.
- Reference masking is recomputed during alignment runs.
- C++
Published by bbuchfink over 6 years ago
diamond - DIAMOND v0.9.25
- Added support for the sscinames output field to print subject scientific names to the tabular output format.
- Fixed a compiler error for GCC 8.2.
- Added option --stop-match-score to set the match score of stop codons.
- Fixed a bug that caused the qqual output field to not be correctly clipped to the aligned part of the query.
- Added output fields qseq_gapped and sseq_gapped to the tabular format.
- Raised compiler requirement to GCC 4.8.
- Fixed a bug that caused the final sequence positions to not be printed in the pairwise format.
- Allow using --min-score instead of --top for the LCA computation of the taxonomy output format.
- Reduced the number of temporary files.
- Added output field qstrand to the tabular format.
- Database format version changed to 3.
- Fixed a bug in the --range-culling mode that could cause undefined behaviour.
- C++
Published by bbuchfink almost 7 years ago
diamond - DIAMOND v0.9.24
- Fixed a compiler error on macOS.
- Added --header option to output header for tabular output format.
- The quality string output in tabular format (qqual field) is clipped to the aligned part of the query.
- Print * as quality string if quality values are not available in tabular output format.
- Added field full_qqual to print unclipped query quality values to the tabular format.
- Added field full_qseq to print unclipped query sequence to the tabular format.
- Added support for using the hyphen character - to denote the standard input for input file parameters.
- Status messages are written to stderr.
- Fixed a bug that could incorrectly report queries as unaligned in the output of the --un option.
- Added option --al to write aligned queries to file.
- Added options --alfmt and --unfmt to set the format of the aligned/unaligned query file (supported values: fasta, fastq).
- C++
Published by bbuchfink over 7 years ago
diamond - DIAMOND v0.9.23
- Added shortcut --long-reads to set suitable parameters for long read alignment: --range-culling (query range-based hit culling), --top 10 (locally report hits within 10% of the best alignment score) and -F 15 (use frameshift alignment mode).
- Fixed a performance issue for very long query sequences. The "long reads" mode can now efficiently align query sequences that are several megabases in length.
- Added support for using a FASTA file instead of a Diamond database as the --db parameter in alignment workflows. Note that this incurs substantial overhead and should not be used for large databases.
- Fixed an issue that could cause too high memory usage.
- Added output field qqual to print query FASTQ quality values to the tabular format.
- Changed license to GPL.
- Raised compiler requirement to GCC 4.6.
- Added option to use the DAA output format for diamond view.
diamond view. - Added CL (command line) and VN (version) fields to the @PG SAM format header line.
- C++
Published by bbuchfink over 7 years ago
diamond - DIAMOND v0.9.22
- Added output field full_sseq to tabular output format.
- Database sequences that exceed the maximum accession length will no longer cause an error.
- Added support for PAF output format.
- Optimized performance of database taxonomy filtering.
- C++
Published by bbuchfink about 8 years ago
diamond - DIAMOND v0.9.21
- Fixed compiler errors on some systems.
- C++
Published by bbuchfink about 8 years ago
diamond - DIAMOND v0.9.20
- Added Bioconda installation instructions to the manual.
- Added official docker release: https://hub.docker.com/r/buchfink/diamond/
- Fixed a bug that could cause corrupted output if compression was activated.
- Fixed an issue that could cause high memory usage by automatic use of the query-indexed algorithm.
- C++
Published by bbuchfink about 8 years ago
diamond - DIAMOND v0.9.19
This release provides the option to conduct filtered searches by taxonomy. Using --taxonlist followed by a comma-separated list of NCBI taxonomy IDs will search only against matching reference sequences. Any taxonomic rank can be used. For example, use --taxonlist 562 to search against all E. coli sequences; use --taxonlist 2 to search against all bacterial sequences.
This feature requires taxonomy data to be built directly into the database. The --taxonmap and --taxonnodes options now need to be provided exclusively to the makedb command if taxonomy features are to be used.
This release changes the database format and requires database rebuilding. To rebuild the database from an existing one, a Unix pipe can be used like this:
/path/to/old/diamond getseq -d dbfile | /path/to/new/diamond makedb -d newdb
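The taxonomy filter described above accepts IDs of any rank because a reference sequence matches when its taxon is a listed taxon or a descendant of one. The following Python sketch illustrates that idea only; the parent map is a tiny, simplified excerpt of the NCBI taxonomy chosen for illustration, and `lineage` and `matches_taxonlist` are hypothetical helper names, not part of DIAMOND.

```python
# Toy child -> parent map (assumed and heavily simplified; most
# intermediate ranks are left out on purpose).
PARENT = {
    562: 561,    # Escherichia coli -> Escherichia
    561: 543,    # Escherichia -> Enterobacteriaceae
    543: 2,      # Enterobacteriaceae -> Bacteria (ranks in between omitted)
    1423: 2,     # Bacillus subtilis -> Bacteria (ranks in between omitted)
    9606: 2759,  # Homo sapiens -> Eukaryota (ranks in between omitted)
    2: 1,        # Bacteria -> root
    2759: 1,     # Eukaryota -> root
}


def lineage(taxid):
    """Yield taxid followed by each of its ancestors up to the root."""
    while True:
        yield taxid
        if taxid not in PARENT:
            return
        taxid = PARENT[taxid]


def matches_taxonlist(seq_taxid, taxonlist):
    """True if the sequence's taxon or any ancestor is in taxonlist."""
    wanted = set(taxonlist)
    return any(t in wanted for t in lineage(seq_taxid))


# An E. coli sequence passes both a species-level and a domain-level filter.
print(matches_taxonlist(562, [562]), matches_taxonlist(562, [2]))
```

Filtering by ancestry rather than by exact ID is what makes a single high-rank ID such as 2 select every bacterial sequence in this toy model.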
- C++
Published by bbuchfink about 8 years ago
diamond - DIAMOND v0.9.18
- Optimized output writing performance.
- Fixed a bug in the XML output format.
- C++
Published by bbuchfink over 8 years ago
diamond - DIAMOND v0.9.17
- Added option --range-culling to restrict hit culling to overlapping query ranges. This feature is designed for long query DNA sequences that may span several genes. In these cases, the default of reporting the 25 best overall hits could cause hits to a lower scoring gene to be overshadowed. But just increasing the number of alignments reported will bloat the output size and reduce performance. Using this feature along with -k 25 (default), a hit will only be deleted if at least 50% of its query range is spanned by at least 25 higher or equal scoring hits. Using this feature along with --top 10, a hit will only be deleted if its score is more than 10% lower than that of a higher scoring hit over at least 50% of its query range. The percentage is configurable using --range-cover. Note that this feature is currently only available in frameshift alignment mode.
- Fixed a compiler error on FreeBSD.
- Fixed escape sequences in XML output.
- C++
Published by bbuchfink over 8 years ago
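The --range-culling rule from the v0.9.17 notes (the -k variant) can be sketched in Python as follows. This is one reading of the release note, not DIAMOND's implementation: it assumes "spanned" means per-position coverage of the hit's query range, and the `Hit` type and `is_culled` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query_start: int  # 0-based, inclusive
    query_end: int    # exclusive
    score: float


def is_culled(hit, others, k=25, range_cover=0.5):
    """Return True if at least range_cover of hit's query range is covered
    by at least k other hits with a higher or equal score (assumed reading)."""
    length = hit.query_end - hit.query_start
    if length <= 0:
        return False
    depth = [0] * length  # number of competing hits covering each position
    for other in others:
        if other.score < hit.score:
            continue
        lo = max(other.query_start, hit.query_start)
        hi = min(other.query_end, hit.query_end)
        for pos in range(lo, hi):
            depth[pos - hit.query_start] += 1
    covered = sum(1 for d in depth if d >= k)
    return covered / length >= range_cover


# A weaker hit on a second gene survives although 25 stronger hits exist,
# because those hits lie in a different part of the query.
strong = [Hit(0, 300, 500.0) for _ in range(25)]
second_gene = Hit(1000, 1300, 120.0)
print(is_culled(second_gene, strong))
```

Under this reading, a hit to a lower scoring gene is kept as long as the stronger hits do not pile up on its own stretch of the query, which is the situation the release note describes for long reads spanning several genes.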
diamond - DIAMOND v0.9.16
- Fixed a bug that caused an error for non-SSSE3 builds.
- C++
Published by bbuchfink over 8 years ago
diamond - DIAMOND v0.9.15
- Highly improved performance of frameshift alignment mode.
- C++
Published by bbuchfink over 8 years ago
diamond - DIAMOND v0.9.14
Added support for frameshift alignments (option -F to set the frameshift penalty). For example: diamond blastx -F 15 ...
Enabling this feature will have the aligner tolerate missing bases in DNA sequences and is most recommended for long, error-prone sequences like MinION reads.
In the pairwise output format, frameshifts will be indicated by \ and / for a shift by +1 and -1 nucleotide in the direction of translation, respectively.
Note that this feature is disabled by default. This release changes the DAA format. DAA files created by this version are not backward compatible with previous versions.
- C++
Published by bbuchfink over 8 years ago
diamond - DIAMOND v0.9.13
- Fixed query positions in pairwise format for translated searches.
- Changed default behaviour of --max-hsps option to report an unlimited number of HSPs for a single query/subject pair. Previously, only the single best HSP for a query/subject pair was reported. Since several HSPs are not uncommon for multidomain proteins and may contain valuable information, the default behaviour has been changed to report any HSP if its query and subject ranges are not enveloped by a higher scoring HSP.
- C++
Published by bbuchfink over 8 years ago
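The HSP reporting rule introduced in v0.9.13 can be illustrated with a short Python sketch. It assumes that "enveloped" means both the query range and the subject range lie inside those of a strictly higher scoring HSP; the `Hsp` type and the function names are hypothetical and do not reflect DIAMOND's internal code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hsp:
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    score: float


def is_enveloped(inner, outer):
    """True if inner's query and subject ranges both lie within outer's."""
    return (outer.query_start <= inner.query_start
            and inner.query_end <= outer.query_end
            and outer.subject_start <= inner.subject_start
            and inner.subject_end <= outer.subject_end)


def report_hsps(hsps):
    """Keep every HSP that is not enveloped by a higher scoring HSP."""
    return [
        h for h in hsps
        if not any(o.score > h.score and is_enveloped(h, o) for o in hsps)
    ]


best = Hsp(0, 100, 0, 100, 200.0)
nested = Hsp(10, 50, 10, 50, 80.0)           # inside `best`, so dropped
second_domain = Hsp(150, 250, 300, 400, 90.0)  # separate region, so kept
print(report_hsps([best, nested, second_domain]))
```

The second HSP of a multidomain protein is kept because it covers a different region, while a weaker alignment nested inside the best one adds no information and is dropped.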
diamond - DIAMOND v0.9.12
- Fixed dbinfo command to be able to read older database formats.
- Adjusted XML format for better compatibility with Blast2Go.
- Fixed a potential error when running multiple instances of Diamond.
- C++
Published by bbuchfink over 8 years ago
diamond - DIAMOND v0.9.11
- Added option --xml-blord-format for alternative-style XML format.
- Fixed a bug that could cause a crash when writing compressed output files.
- C++
Published by bbuchfink over 8 years ago
diamond - DIAMOND v0.9.10
- Added --strand option to choose query strand for translated searches.
- Added dbinfo command to show information about a database file.
- C++
Published by bbuchfink almost 9 years ago
diamond - DIAMOND v0.9.9
- Added taxonomic classification format.
- Fixed a bug in getseq printing masked residues.
- Fixed parsing of UniRef100_ sequence id prefixes.
- Added support for using the staxids output field in diamond view.
The taxonomic classification format is a new output format that does not print alignments, but only a taxonomic classification for each read using the LCA algorithm.
- C++
Published by bbuchfink almost 9 years ago
diamond - DIAMOND v0.9.7
- Fixed compiler errors.
- Changed XML format for better compatibility with Blast2Go.
- C++
Published by bbuchfink almost 9 years ago
diamond - DIAMOND v0.9.5
- Added support for named pipes.
- Added support for reading input files from stdin.
- Added more elaborate file I/O error messages.
- C++
Published by bbuchfink almost 9 years ago
diamond - DIAMOND v0.9.4
- Improved performance.
- Fixed a bug in the query-indexed algorithm.
- Empty sequences are ignored instead of generating an error.
- C++
Published by bbuchfink almost 9 years ago
diamond - DIAMOND v0.9.3
- Fixed a bug that could cause hanging.
- Fixed a bug that could cause an error when using the staxids output field and the --unal option.
- C++
Published by bbuchfink about 9 years ago
diamond - DIAMOND v0.9.2
- Fixed a compiler error.
- Improved performance for very small query files.
- C++
Published by bbuchfink about 9 years ago
diamond - DIAMOND v0.9.0
- improved performance
- improved support for alignments with long gaps
- removed SEG masking
- added low complexity masking using tantan
- changed license to AGPL
Note that this release requires database rebuilding.
- C++
Published by bbuchfink about 9 years ago
diamond - DIAMOND v0.8.38
- Fixed unclear std::exception error messages.
- Fixed sequence titles in XML format to be compatible with blast2go.
- XML and pairwise format contain full length titles by default.
- C++
Published by bbuchfink about 9 years ago
diamond - DIAMOND v0.8.37
- Added support for the staxids field to the tabular format, which makes it possible to generate a list of NCBI taxonomy IDs associated with the aligned subject sequence. A description of how to use the option is contained in the manual.
- Fixed a bug that would cause an error message for empty DAA files.
- All scoring matrices use the respective default gap penalties from BLAST.
- Added check for SSSE3 instruction set.
- Added diamond-sse2 to the binary package.
- C++
Published by bbuchfink about 9 years ago
diamond - DIAMOND v0.8.35
Fixed a compiler error on 32 bit systems.
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.33
- modified option --no-self-hits to also require matching sequence titles for filtering of a self hit
- fixed a bug that could cause a crash in the joining output blocks stage
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.32
- improved speed and sensitivity
- fixed an issue that could cause too high memory usage in certain cases
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.31
- Added compositional score adjustments (option --comp-based-stats (0,1)). This is a useful feature for filtering false positive hits and is enabled by default.
- Removed the --single-domain option and replaced it with --max-hsps, which sets the maximum number of HSPs per query/subject pair, to be consistent with BLAST. This option is set to 1 by default, as getting more than one HSP per subject seems to confuse users.
- Added option --no-self-hits to filter identical self-hits.
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.30
- slightly improved sensitivity
- added option to report unaligned queries: --unal (0=no, 1=yes)
- pairwise, XML and SAM format will report unaligned queries by default
- added option to filter alignments by subject cover (--subject-cover)
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.29
- fixed an issue that could cause a crash when using view on incomplete DAA files
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.28
- slightly improved sensitivity
- added support for the BLAST pairwise format (option -f 0)
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.27
Added support for gzip compressed files containing multiple gzip streams.
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.26
- The program now compiles as generic C++ code and thus can be used on hardware platforms other than the Intel/AMD x86-64.
- Added option to write unaligned queries to file (--un).
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.25
- fixed a bug with the qseq field in the blast tabular format
- added qtitle and btop fields to the blast tabular format
- fixed a bug that could cause a crash when passing a nonexistent input file
- fixed an issue that could cause unexpectedly long runtimes in certain cases
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.24
Added output of line numbers for errors that occur while reading the input file.
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.23
Added option to change the genetic code used for translation of query in blastx mode (option --query-gencode, see here for a list of possible values: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi).
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.22
Fixed the XML format to fill in the Hitid and Hitaccession fields.
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.21
Custom scoring matrices now also work when using the DAA format.
- C++
Published by bbuchfink over 9 years ago
diamond - DIAMOND v0.8.20
Added support for customizing the tabular output format. The format may now be specified similarly to BLAST:
diamond --outfmt 6 qseqid sseqid bitscore
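Output produced with a field list like the one above has one tab-separated line per alignment, with columns in the order the fields were requested. A minimal Python sketch for reading such output follows; the sample lines are made up for illustration, and `parse_tabular` is a hypothetical helper name.

```python
import csv
import io

# Field names in the order they were passed after --outfmt 6.
FIELDS = ["qseqid", "sseqid", "bitscore"]


def parse_tabular(handle, fields=FIELDS):
    """Yield one dict per alignment line of tab-separated output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        record = dict(zip(fields, row))
        if "bitscore" in record:
            record["bitscore"] = float(record["bitscore"])
        yield record


# Made-up sample lines standing in for a real output file.
sample = io.StringIO("read1\tprotA\t250.4\nread1\tprotB\t98.2\n")
hits = list(parse_tabular(sample))
best = max(hits, key=lambda h: h["bitscore"])
print(best["sseqid"])
```

Keeping the field list in one place means the parser only needs a different `fields` argument when the requested columns change.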
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.19
This release adds support for custom scoring matrices. Syntax for using a custom matrix file:
diamond blastp --custom-matrix custom.mat --lambda 0.3 --K 0.1 --gapopen 10 --gapextend 2
Note that the statistical parameters Lambda and K have to be provided for your matrix. One way to compute these values is to use the FASTA suite. The file format of the matrix file also follows the format used by the FASTA suite.
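Lambda and K are required because they convert raw alignment scores into bit scores and e-values. The Python sketch below uses the standard Karlin-Altschul formulas with the parameter values from the example command; the function names are illustrative, and DIAMOND's actual e-value computation may apply corrections that are not shown here.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized bit score: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, lam, k, query_len, db_len):
    """Expected number of chance hits: K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Values taken from the example command line: --lambda 0.3 --K 0.1
lam, k = 0.3, 0.1
print(bit_score(100, lam, k))
print(e_value(100, lam, k, query_len=300, db_len=1_000_000))
```

Because both quantities depend directly on Lambda and K, values that do not fit the custom matrix would make the reported bit scores and e-values misleading.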
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.18
Added a new command getseq that can be used to retrieve sequences from database files. Usage:
Print whole database in FASTA format: diamond getseq -d db
Print 3rd and 7th sequence in FASTA format: diamond getseq -d db --seq 3 7
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.17
Added a new alignment mode (option --more-sensitive) that provides some additional sensitivity compared to the sensitive mode.
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.16
With a new load balancing implementation, this release improves the performance particularly on systems with many cores and for longer sequences.
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.15
Fixed a crash and a compiler error on some systems.
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.14
Fixed a memory leak. Updating is recommended.
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.13
Fixed a compiler error on some older systems.
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.12
This release introduces a new database format that allows the block size parameter to be set for the alignment commands instead of the makedb command. This way the program's memory usage can be controlled more dynamically at runtime without database rebuilding.
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.11
Fixed a compiler error on some systems.
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.10
DIAMOND can now directly generate all of its output formats again - no need for the intermediate DAA file. Of course, the DAA format is still supported and should be used if you want to import the results into MEGAN.
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.9
This release comes with a substantial increase in performance and also improved accuracy of local alignments.
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.8
Fixed a compiler error on macOS.
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.7
By popular demand, DIAMOND now supports the BLAST XML output format (command line option -f xml). As these files can become very large, it is recommended to also activate compression of output files (option --compress 1).
- C++
Published by bbuchfink almost 10 years ago
diamond - DIAMOND v0.8.6
Fixed a problem of the Windows version that could cause errors for larger files & fixed a problem that could cause very long runtimes for certain highly repetitive sequences.
- C++
Published by bbuchfink almost 10 years ago