Releases | Open Source Science

gtdbtk - 2.5.0

Bug Fixes:

(#644, #641) Fixed compatibility with recent versions of NumPy (≥1.24), which removed the tostring() method from numpy.ndarray.
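As an illustration of this kind of fix, the sketch below prefers tobytes() and only falls back to the legacy tostring() name when tobytes() is missing. The helper name to_raw_bytes is hypothetical and not taken from the GTDB-Tk source; the demo uses the standard-library array type as a stand-in so it runs without NumPy, on the assumption that numpy.ndarray is handled the same way through its own tobytes() method.

```python
from array import array


def to_raw_bytes(buf):
    """Return the raw bytes of an array-like object.

    Prefers tobytes(); falls back to the legacy tostring() only when
    tobytes() is not available. (Hypothetical helper, for illustration.)
    """
    if hasattr(buf, "tobytes"):
        return buf.tobytes()
    return buf.tostring()  # legacy path for very old array types


# Stand-in for a numpy.ndarray so the sketch runs without NumPy installed.
values = array("B", [1, 2, 3])
raw = to_raw_bytes(values)
```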

Minor Changes:

(#650) Update CLI with an up-to-date taxon.

Major Changes:

GTDB-Tk now uses skani exclusively for genome clustering, replacing the previous Mash/skani hybrid approach. This change simplifies the CLI and removes the dependency on Mash, streamlining installation and execution.

- Python
Published by pchaumeil 6 months ago

gtdbtk - 2.4.1

Bug Fixes:

(#630) Fixed SyntaxWarning in Python 3.12 by using raw strings for regex in HMMResultsIO.py
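For context, Python 3.12 emits a SyntaxWarning for unrecognised escape sequences such as \d inside ordinary string literals, and a raw string avoids it. The pattern below is a made-up example and is not the regex used in HMMResultsIO.py.

```python
import re

# A non-raw literal such as "(\d+)\s+hits" triggers the warning on
# Python 3.12 because \d and \s are not valid string escapes.
# With a raw string the backslashes reach the regex engine untouched.
HIT_COUNT = re.compile(r"(\d+)\s+hits")  # hypothetical pattern

match = HIT_COUNT.search("found 12 hits")
count = int(match.group(1))
```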

Minor Changes:

(#631) gtdb_to_ncbi_majority_vote.py script has been included as part of the release

The GTDB-Tk version has been bumped to synchronise its release with GTDB R226.

- Python
Published by pchaumeil 10 months ago

gtdbtk - 2.4.0

Bug Fixes:

(#576) When all genomes fail the Prodigal step in classify_wf, the bac120 summary file is still produced, with all failed genomes listed as 'Unclassified'.
(#573) When running the three classify steps independently, a genome can be filtered out in the align step but still be classified in the identify step. To avoid duplicate rows, the genome is classified with a warning.
(#540) Empty files are skipped during the Mash sketch step; they are then caught in the Prodigal step and returned as 'Unclassified'.
(#549) --force has been modified to deal with #540. Prodigal wasn't returning the empty files as failed genomes; it was only skipping them. These genomes are now returned in the summary file and flagged as 'Unclassified'.

Major Changes:

FastANI has been replaced by skani as the primary tool for computing Average Nucleotide Identity (ANI). Users may notice slight variations in the results compared to those obtained using FastANI.
In the generated summary.tsv files, several columns have been renamed for clarity and consistency. The following columns have been affected:
- "fastani_reference" column has been renamed to "closest_genome_reference".
- "fastani_reference_radius" column has been renamed to "closest_genome_reference_radius".
- "fastani_taxonomy" column has been renamed to "closest_genome_taxonomy".
- "fastani_ani" column has been renamed to "closest_genome_ani".
- "fastani_af" column has been renamed to "closest_genome_af".

These changes have been implemented to improve the readability and understanding of the data within the summary.tsv files. Users should update their scripts or processes accordingly to reflect these renamed column headers.
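Scripts that still expect the old headers could be adapted with a small mapping along the following lines. This is only a sketch: the function names are hypothetical, and the demo row (including the user_genome column and the placeholder accession) is invented for illustration.

```python
import csv
import io

# Old -> new header names, taken from the list above.
RENAMED_COLUMNS = {
    "fastani_reference": "closest_genome_reference",
    "fastani_reference_radius": "closest_genome_reference_radius",
    "fastani_taxonomy": "closest_genome_taxonomy",
    "fastani_ani": "closest_genome_ani",
    "fastani_af": "closest_genome_af",
}


def upgrade_header(fieldnames):
    """Map old column names to the new ones, leaving others untouched."""
    return [RENAMED_COLUMNS.get(name, name) for name in fieldnames]


def upgrade_summary(src, dst):
    """Copy a tab-separated summary, rewriting only the header row."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(upgrade_header(next(reader)))
    writer.writerows(reader)


# Invented two-line summary for demonstration purposes.
old = io.StringIO(
    "user_genome\tfastani_reference\tfastani_ani\n"
    "genome_1\tGCF_000000000.1\t98.7\n"
)
new = io.StringIO()
upgrade_summary(old, new)
```

The same mapping can be applied in the other direction if a downstream tool still requires the pre-2.4.0 names.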

- Python
Published by pchaumeil almost 2 years ago

gtdbtk - 2.3.2

Bug Fixes:

(#528) (#529) setup.py has been modified to restrict pydantic version to >=1.9.2 and < 2.0a1

Minor Changes:

(#526) Mash stderr is now captured in a separate buffer (thanks @wasade for your contribution).

- Python
Published by pchaumeil over 2 years ago

gtdbtk - 2.3.1

-- Disregard this release

- Python
Published by pchaumeil over 2 years ago

gtdbtk - 2.3.0

Bug Fixes:

(#508) (#509) If ALL genomes for a specific domain are either filtered out or classified with ANI they are now reported in the summary file.

Minor changes:

(#491) (#498) Allow GTDB-Tk to show --help and -v without GTDBTK_DATA_PATH being set.
- WARNING: This is a breaking change if you are importing GTDB-Tk as a library and importing values from gtdbtk.config.config. Instead, use 'from gtdbtk.config.common import CONFIG' and then access values via CONFIG.<var>.
(#508) The Mash distance threshold has been changed from 0.1 to 0.15. This will increase the number of FastANI comparisons but will cover cases where genomes have a larger Mash distance but a small ANI.
(#497) Add a convert_to_species function to GTDB-Tk to replace GCA/GCF IDs with their GTDB species name.
Add --db_version flag to check_install to check the version of previous GTDB-Tk packages.
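The effect of the Mash distance change (#508) can be pictured as a pre-filter in front of the ANI step. The sketch below is an assumption about the general shape of such a filter, not GTDB-Tk's actual code: the function name, the tuple layout and the use of an inclusive comparison are all hypothetical.

```python
MASH_MAX_DISTANCE = 0.15  # new cutoff; the previous value was 0.1


def select_ani_candidates(mash_hits, max_distance=MASH_MAX_DISTANCE):
    """Keep (query, reference) pairs whose Mash distance is within the cutoff.

    mash_hits is an iterable of (query, reference, distance) tuples.
    """
    return [
        (query, ref)
        for query, ref, dist in mash_hits
        if dist <= max_distance
    ]


# Invented distances for illustration.
hits = [
    ("query_1", "ref_A", 0.05),
    ("query_1", "ref_B", 0.12),
    ("query_1", "ref_C", 0.20),
]
old_pairs = select_ani_candidates(hits, max_distance=0.1)
new_pairs = select_ani_candidates(hits)
```

With the wider cutoff, ref_B is now passed on for an ANI comparison, which is the extra work the release note describes.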

- Python
Published by pchaumeil almost 3 years ago

gtdbtk - 2.2.6

Bug Fixes:

(#493) Fix issue with --full-tree flag (related to skipping ANI steps)

Minor changes:

Change URL for documentation to 'https://ecogenomics.github.io/GTDBTk/installing/index.html'
Improve portability of the ANIscreen step by regenerating the paths of reference genomes in the current filesystem for mashdb.msh

- Python
Published by pchaumeil almost 3 years ago

gtdbtk - 2.2.5

Bug Fixes:

gtdbtk.json is now reset when the pipeline is re-run and the status of ani_screen is not 'complete'

Minor changes:

When using --genes, ANI steps are skipped and warnings are raised to inform the user that classification is less accurate.
(#486) Environment variables can be used in GTDBTK_DATA_PATH
is_consistent function in mash.py compares only the filenames, not the full paths
Add cutoff arguments to PfamScan (thanks @AroneyS for the contribution)
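Support for environment variables inside a configured path (#486) can be sketched with the standard library as follows. How GTDB-Tk implements this internally is not stated in the release note, so the helper name and the DEMO_DB_ROOT variable are hypothetical.

```python
import os


def resolve_data_path(raw_path):
    """Expand $VAR / ${VAR} references and a leading ~ in a configured path."""
    return os.path.expanduser(os.path.expandvars(raw_path))


os.environ["DEMO_DB_ROOT"] = "/data/db"  # hypothetical variable for the demo
resolved = resolve_data_path("${DEMO_DB_ROOT}/release")
```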

- Python
Published by pchaumeil almost 3 years ago

gtdbtk - 2.2.4

Bug Fixes:

(#475) If all genomes are classified using ANI, GTDB-Tk will skip the identify and align steps

Minor changes:

Add hidden '--skip_pplacer' flag to skip pplacer step (useful for debugging)
Improve documentation
Convert stagelogger to a Singleton class
Use existing ANI results if available

- Python
Published by pchaumeil almost 3 years ago

gtdbtk - 2.2.3

Bug Fixes:

Fix prodigal_fail_counter issue

- Python
Published by pchaumeil about 3 years ago

gtdbtk - 2.2.2

Bug Fix:

(#471): Fix pplacer issue

- Python
Published by pchaumeil about 3 years ago

gtdbtk - 2.2.1

(#470) Add missing Pydantic dependency.

- Python
Published by aaronmussig about 3 years ago

gtdbtk - 2.2.0

Minor changes:

(#433) Added additional checks to ensure that the --outgroup_taxon cannot be set to a domain (root, de_novo_wf).
(#459 / #462) Fix deprecated np.bool in prodigal_biolib.py. Special thanks to @neoformit for his contribution.
(#466) RED values are now rounded to 5 decimal places.
(#451) Extra checks have been added when Prodigal fails.
(#448) Warning has been added when all the genomes are filtered out and not classified.

Bug Fixes:

(#420) Fixed an issue where GTDB-Tk might hang when classifying TIGRFAM markers (identify, classify_wf, de_novo_wf). Special thanks to @lfenske-93 and @sjaenick for their contribution.
(#428) Fixed an issue where the --gtdbtk_classification_file would raise an error trying to read the classify summary (root, de_novo_wf).
(#439) Fixed the pipeline when using protein files instead of nucleotide files; the symlink now uses an absolute path.

- Python
Published by pchaumeil about 3 years ago

gtdbtk - 2.1.1

(#399) Fix --genes options
(#400) Modify config.py file to resolve this issue
Updated documentation (including #410, documentation for iTOL)

- Python
Published by pchaumeil over 3 years ago

gtdbtk - 2.1.0

Major changes:

GTDB-Tk now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple class-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 55 GB. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree use the --full-tree flag. This is the main change from v2.0.0. The split tree approach has been modified from order-level trees to class-level trees to resolve specific classification issues (see #383).
Genomes that cannot be assigned to a domain (e.g. genomes with no bacterial or archaeal markers or genomes with no genes called by Prodigal) are now reported in the gtdbtk.bac120.summary.tsv as 'Unclassified'
Genomes filtered out during the alignment step are now reported in the gtdbtk.bac120.summary.tsv or gtdbtk.ar53.summary.tsv as 'Unclassified Bacteria/Archaea'
--write_single_copy_genes flag is now available in the classify_wf and de_novo_wf workflows.

Features:

(#392) --write_single_copy_genes flag available in workflows.
(#387) specific memory requirements set in classify_wf depending on the classification approach.

Important

This version is not backwards compatible with GTDB package R207 v1. This version requires a new reference package

- Python
Published by pchaumeil almost 4 years ago

gtdbtk - 2.0.0

Major changes:

GTDB-Tk now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple order-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 35 GB. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree use the --full-tree flag.
Archaeal classification now uses a refined set of 53 archaeal-specific marker genes based on the recent publication by Dombrowski et al., 2020. This set of archaeal marker genes is now used by GTDB for curating the archaeal taxonomy.
All directories containing intermediate results are now removed by default at the end of the classify_wf and de_novo_wf pipelines. If you wish to retain these intermediate files use the --keep_intermediates flag.
All MSA files produced by the align step are now compressed with gzip.
The classification summary and failed genomes files are now the only files linked in the root directory of classify_wf.

Features:

convert_to_itol to convert trees into iTOL format (#373)
Output FASTA files are compressed by default (#369)
Intermediate files will be removed by default when using classify/de-novo workflows unless specified by --keep_intermediates (#369)
Add --genes flag for Error (#362)
A warning will be displayed if pplacer fails to place a genome (#360 / #356)

Important

This version is not backwards compatible with GTDB release 202.
This version requires a new reference package.

- Python
Published by aaronmussig almost 4 years ago

gtdbtk - 1.7.0

(#336) Warn the user if they have provided an incorrectly formatted taxonomy file.
(#348) Gracefully exit the program if no single copy hits could be identified.
(#351) Fixed an issue where GTDB-Tk would crash if spaces were present in the reference data path.
(#354) Added optional --tmpdir argument to set temporary directory (thanks @tr11-sanger ).

- Python
Published by aaronmussig over 4 years ago

gtdbtk - 1.6.0

(#337) Set minimum tqdm version to 4.35.0
(#335) Fixed typo in output log messages (@fplaza)
Removed the option to re-calculate RED values (--recalculate_red)

- Python
Published by aaronmussig over 4 years ago

gtdbtk - 1.5.1

Changelog:

#327 Disallow spaces in genome names/file paths due to downstream application issues.
#326 Disallow genome names that are blank.
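The two checks above could be expressed as a small validation step like the one below. This is a sketch only; the function name and the message wording are hypothetical and do not reflect GTDB-Tk's actual error messages.

```python
def check_genome(genome_id, path):
    """Return a list of problems found for one genome entry (empty if valid)."""
    problems = []
    if not genome_id.strip():
        problems.append("genome name is blank")
    elif " " in genome_id:
        problems.append("genome name contains a space")
    if " " in path:
        problems.append("file path contains a space")
    return problems


ok = check_genome("genome_1", "/data/genome_1.fna")
blank = check_genome("", "/data/genome_2.fna")
spaced = check_genome("my genome", "/data/my genome.fna")
```

Returning a list rather than raising on the first problem lets a caller report every offending entry in one pass.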

- Python
Published by aaronmussig over 4 years ago

gtdbtk - 1.5.0

Changes:

Updated to use PFAM 33.1 markers.
Updated to use GTDB R202 taxonomy (note, this will require an update to the reference package https://ecogenomics.github.io/GTDBTk/installing/index.html#gtdb-tk-reference-data)

Fixes:

Automatic drop of genome leads to error in downstream modules of classify_wf (#312)
--scratch_dir not working in v1.4.1 (#311)

- Python
Published by aaronmussig almost 5 years ago

gtdbtk - 1.4.1

Updated GitHub CI/CD to trigger docker build / tag version on release.
(#255) (#297) Fixed 'Namespace' object has no attribute errors by adding default arguments to argparse.

- Python
Published by aaronmussig about 5 years ago

gtdbtk - 1.4.0

Check if stdout is being piped to a file before adding colour.
(#283) Significantly improved classify performance (noticeable when running trees > 1,000 taxa).
Automatically cap pplacer CPUs to 64 unless specifying --pplacer_cpus to prevent pplacer from hanging.
(#262) Added --write_single_copy_genes to the identify command. Writes unaligned single-copy AR122/BAC120 marker genes to disk.
When running -version warn if GTDB-Tk is not running the most up-to-date version (disable via GTDBTK_VER_CHECK = False in config.py). If GTDB-Tk encounters an error it will silently continue (3 second timeout).
(#276) Renamed the column aa_percent to msa_percent in summary.tsv (produced by classify).
(#286) Fixed a file not found error when the reference data is a symbolic link (thanks davidealbanese!).
(#277) Fixed an issue where if the user overrides the translation table using the optional 3rd column in the batchfile, the other coding density would appear as -100. Both translation table densities are now reported.
The check_install command now also checks that all third party binaries can be found on the system path.
The align step is now approximately 10x faster.
(#289) Added --min_af to classify and classify_wf which allows the user to specify the minimum alignment fraction for FastANI.
Added the --mash_db command to re-use the GTDB-Tk Mash reference database in ani_rep.
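The first item in this list (only adding colour when stdout is not piped) is commonly implemented with an isatty() check. The sketch below shows that general technique under the assumption that ANSI escape codes are used; the function name and colour choice are hypothetical.

```python
import io
import sys

GREEN = "\033[32m"
RESET = "\033[0m"


def colourise(text, stream=sys.stdout):
    """Wrap text in ANSI colour codes only when the stream is a terminal."""
    if hasattr(stream, "isatty") and stream.isatty():
        return f"{GREEN}{text}{RESET}"
    return text


piped = io.StringIO()  # behaves like stdout redirected to a file
plain = colourise("Done.", stream=piped)
```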

- Python
Published by aaronmussig about 5 years ago

gtdbtk - 1.3.0

This version of GTDB-Tk requires a new version of the GTDB-Tk reference package (gtdbtk_r95_data.tar.gz) available here.

Features:

Updated reference package to use the GTDB Release 95 taxonomy.
Report if the species-specific ANI circumscription criteria are satisfied in the ani_closest.tsv file output by ani_rep.
Estimated time until completion has been dampened.

- Python
Published by aaronmussig over 5 years ago

gtdbtk - 1.2.0

Bug fixes:

(#241) Moved GTDB-Tk entry point to main.py instead of bin/gtdbtk to support execution in some HPC systems (gtdbtk will still be aliased on install).
(#251) Allow parsing of FastANI v1.0 output files. However, a warning will be displayed to update FastANI.
(#254) Fixed an issue where --scratch_dir would fail, and not clean-up the mmap file.

Features:

(#242) Added the decorate command allowing the de novo workflow to be run
(#244) Added the infer_rank method which establishes the taxonomic ranks of internal nodes of user trees based on RED
(#248) If the identify command is run on the same directory, genomes which were already processed will be skipped.
(#248) Improved pplacer output when running the classify command

- Python
Published by aaronmussig over 5 years ago

gtdbtk - 1.1.1

Fix an issue with tree parsing

- Python
Published by pchaumeil almost 6 years ago

gtdbtk - 1.1.0

- Bug fixes:
  - In rare cases pplacer would assign an empty taxonomy string which would raise an error.
  - (#229) Genomes using Windows line endings (\r\n) would raise an error.
  - (#227) CentOS machines would fail when using ~ in paths.
  - The bac120 symlink was pointing to the archaeal tree when using the root command.
- Features:
  - Updated the gtdb_to_ncbi_majority_vote.py script for translating taxonomy.
  - (#195) Added the --pplacer_cpus argument to specify the number of pplacer threads when running classify and classify_wf.
  - (#198) The --debug flag of align outputs aligned markers to disk before trimming.
  - (#225) An optional third column in the --batchfile will specify an override to which translation table should be used. Leave blank to automatically determine the translation table (default).
  - (#131) Users can now specify genomes which have NCBI accessions, as long as they are not GTDB-Tk representatives (a warning will be raised).
  - (#191) Added a new command ani_rep which calculates the ANI of input genomes to all GTDB representative genomes.
    - This command uses Mash in a pre-filtering step. If pre-filtering is enabled (default) then mash will need to be on the system path. To disable pre-filtering use the --no_mash flag.
  - (#230) Improved how markers are used in determining the correct domain, and gene selection for the alignment.
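To make the optional third batchfile column (#225) concrete, here is a minimal parsing sketch. It assumes a tab-separated file whose first two columns are the FASTA path and the genome ID; that layout, the function name and the demo paths are assumptions for illustration. The parser also tolerates Windows line endings, in the spirit of #229.

```python
import io


def parse_batchfile(handle):
    """Yield (fasta_path, genome_id, translation_table) per non-empty line.

    translation_table is None when the third column is absent or blank,
    meaning the table should be determined automatically.
    """
    for line_no, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")  # accept both \n and \r\n endings
        if not line.strip():
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            raise ValueError(f"line {line_no}: expected at least 2 columns")
        table = cols[2].strip() if len(cols) > 2 else ""
        yield cols[0], cols[1], (int(table) if table else None)


# Invented batchfile content: one override, one automatic.
demo = io.StringIO("/data/g1.fna\tgenome_1\t4\r\n/data/g2.fna\tgenome_2\n")
rows = list(parse_batchfile(demo))
```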

- Python
Published by aaronmussig almost 6 years ago

gtdbtk - 1.0.2

Fixed an issue where FastANI threads would time out when FastANI returned a non-zero exit code.
- Versions affected: 1.0.0, and 1.0.1.

- Python
Published by aaronmussig about 6 years ago

gtdbtk - 1.0.1

Bugfix for 3rd party software versions.

- Python
Published by pchaumeil about 6 years ago

gtdbtk - 1.0.0

Migrated to Python 3; you must be running Python 3.6 or later to use this version.
check_install now does an exhaustive check of the reference data.
Resolved an issue where gene calling would fail for low quality genomes (#192).
Improved FastANI multiprocessing performance.
Third party software versions are reported where possible.

- Python
Published by aaronmussig about 6 years ago

gtdbtk - 0.3.3

A bug has been fixed which affected classify and classify_wf when using the --batchfile argument with genome IDs that differed from the FASTA filename. This issue resulted in the assigned taxonomy being derived only from tree placement without any ANI calculations being considered. Consequently, in some cases genomes may have been classified as a new species within a genus when they should have been assigned to an existing species. If you have genomes with species assignments this bug did not impact you.
Progress is now displayed for: hmmalign, and pplacer.
Fixed an issue where the root command could not be run independently.
Improved MSA masking performance.

- Python
Published by aaronmussig over 6 years ago

gtdbtk - 0.3.2

FastANI calculations are more robust.
Optimisation of RED calculations.
Improved output messages when errors are encountered.

- Python
Published by aaronmussig over 6 years ago

gtdbtk - 0.3.1

Pplacer taxonomy is now available in the summary file.
FastANI species assignment will be selected over phylogenetic placement (Topology case).

- Python
Published by pchaumeil over 6 years ago

gtdbtk - 0.3.0

GTDB-Tk v0.3.0 has been released (we recommend all users update to this version):
- Best translation table displayed in summary file.
- GTDB-Tk now supports gzipped genomes as inputs (--extension .gz).
- By default, GTDB-Tk uses precalculated RED values.
- New option to recalculate RED value during classify step (--recalculate_red).
- New option to export the untrimmed reference MSA files.
- New option to skip_trimming during align step.
- New option to use a custom taxonomy file when rooting a tree.
- New FAQ page available.
- New output structure.
- This version requires a new version of the GTDB-Tk data package (gtdbtk_r89_data.tar.gz) available here

- Python
Published by aaronmussig over 6 years ago

gtdbtk - 0.2.2

Fix scratch_dir (--mmap-file) option.

- Python
Published by pchaumeil almost 7 years ago

gtdbtk - 0.2.1

GTDB-Tk v0.2.1 has been released (we recommend all users update to this version):
- Species classification is now based strictly on the ANI to reference genomes
- The "classify" function now reports the closest reference genome in the summary file even if the ANI is <95%
- The summary.tsv file has 4 new columns: aa_percent, red_values, fastani_reference_radius, and warnings
- By default, the "align" function now performs the same MSA trimming used by the GTDB
- New pplacer support for writing to a scratch file (--mmap-file option)
- Random seed option for MSA trimming has been added to allow for reproducible results
- Configuration of the data directory is now set using the environment variable GTDBTK_DATA_PATH (see pip installation)
- Perl dependencies have been removed
- Python libraries biolib, mpld3 and jinja have been removed
- This version requires a new version of the GTDB-Tk data package (gtdbtk.r86v2data.tar.gz) available here

- Python
Published by pchaumeil almost 7 years ago

gtdbtk - 0.1.6

The align step in the classify_wf and de_novo_wf functions has been fixed.
Improved summary file output.
The "align" function now supports the same custom trimming GTDB will be performing.
The closest reference genome is now returned in the summary file (even if the ANI is less than 95%).
Bug fixes.

- Python
Published by pchaumeil about 7 years ago

gtdbtk - 0.1.3

v0.1.3 resolves a bug that would occur when a user genome has a FastANI value >= 95% with reference genomes but not with the closest pplacer leaf node.

- Python
Published by pchaumeil over 7 years ago

gtdbtk - 0.1.2

Release 0.1.2 addresses a bug that would occasionally cause genomes to not be correctly associated with a reference genome in the pplacer tree. FastANI was still identifying correct species assignments.

- Python
Published by pchaumeil over 7 years ago

gtdbtk - 0.1.1

config_template.py updated
rooting of the tree is now fixed.

- Python
Published by pchaumeil over 7 years ago

gtdbtk - 0.1.0

GTDB-Tk is now using archived (.gz) fna files.
Optimised for R86 version
summary.tsv file is now the main output file.
fastani.tsv file is now combined with summary.tsv.
red_value.tsv file has been removed.
Each Pplacer placement on a species branch is now verified by FastANI, and the ANI is compared with all other species in the same genus to check Pplacer accuracy.
New functionality: "trim_msa" allows trimming of an untrimmed MSA (41155 AA for bac120 and 32675 AA for ar122) based on GTDB-Tk masks
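Mask-based trimming of this kind can be sketched as below. The sketch assumes the mask is a string of '0' and '1' characters with one character per alignment column, where '1' means the column is kept; that format and the function name are assumptions, and the toy sequences are invented.

```python
def trim_sequence(sequence, mask):
    """Keep only the alignment columns whose mask character is '1'."""
    if len(sequence) != len(mask):
        raise ValueError("sequence and mask must have the same length")
    return "".join(res for res, keep in zip(sequence, mask) if keep == "1")


# Toy alignment of 5 columns; real untrimmed MSAs are far longer.
msa = {"genome_1": "MK-LV", "genome_2": "MRALV"}
mask = "10110"
trimmed = {gid: trim_sequence(seq, mask) for gid, seq in msa.items()}
```

Because every sequence is filtered with the same mask, the trimmed rows stay aligned with each other.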

- Python
Published by pchaumeil over 7 years ago

gtdbtk - 0.0.6

Migration to R83 and integration of UBA genomes
add "debug" flag to classify options
error handling improvement

- Python
Published by pchaumeil almost 8 years ago

gtdbtk - 0.0.5

Small bug fix

- Python
Published by pchaumeil almost 8 years ago

gtdbtk - 0.0.4b1

fastANI Bug fixing

- Python
Published by pchaumeil almost 8 years ago

gtdbtk - 0.0.4-beta

First Beta version of GTDB-Tk

- Python
Published by pchaumeil almost 8 years ago
