Recent Releases of skder

skder - v1.3.3

  • Introduce new 'lowmemgreedy' dereplication mode in skder to efficiently handle really large datasets.
  • Fix issues with automated downloading of genomes from NCBI due to updating listings and add catch for bad files.
  • Switch cidder's skani-based secondary clustering from using skani triangle to skani dist.

Full Changelog: https://github.com/raufs/skDER/compare/v1.3.2...v1.3.3

- Python
Published by raufs 8 months ago

skder - v1.3.2

  • Fix: Correct the outdated default value of --skani-triangle-parameters in skDER. This parameter was recently updated to request a screening parameter of X% by default, where X = ANI threshold - 10%. However, it was still being set to -s 90.0 (the previous default) by mistake; this is now corrected.
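
As a rough illustration of the default described above, the screening value can be derived from the ANI threshold as sketched here. The function names and the clamp at zero are assumptions made for this example and are not taken from skDER's code.

```python
def default_screen_value(ani_threshold: float) -> float:
    """Return the screening value: the ANI threshold minus 10 points."""
    # Clamping at 0 is an assumption so that very low thresholds
    # never produce a negative screening value.
    return max(ani_threshold - 10.0, 0.0)


def default_triangle_parameters(ani_threshold: float) -> str:
    """Format the screening value as an '-s' argument string for skani."""
    return f"-s {default_screen_value(ani_threshold):.1f}"
```

With an ANI threshold of 99.5, this sketch yields the string "-s 89.5" rather than the previous fixed "-s 90.0".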

What's Changed

  • Update to v1.3.2 by @raufs in https://github.com/raufs/skDER/pull/10

New Contributors

  • @raufs made their first contribution in https://github.com/raufs/skDER/pull/10

Full Changelog: https://github.com/raufs/skDER/compare/v1.3.1...v1.3.2

- Python
Published by raufs 9 months ago

skder - v1.3.1

Minor fix: Update argument descriptions for inputs to clarify that genome/proteome files provided to CiDDER need to be uncompressed, unlike for skDER, where gzipped files are allowed. Also add a check for this in case users provide compressed inputs.

Full Changelog: https://github.com/raufs/skDER/compare/v1.3.0...v1.3.1

- Python
Published by raufs 10 months ago

skder - v1.3.0

  • Restructure code and introduce new modules to simplify the main programs of skder and cidder.
  • Incorporate faster way to download genomes from NCBI belonging to a single genus/species of interest in GTDB.
  • Incorporate the latest GTDB release R226.
  • Add new option to cidder to select additional representative genomes if X% of non-representative genomes are not contained by an individual representative genome. This is performed as a secondary step after the primary representative selection method.
  • Change the default ANI cutoff from 99.0% to 99.5% identity in skder to reflect thresholds coinciding with sequence type designations as reported by Rodriguez-R et al. 2023.
  • Lowered the default AF* threshold from 90% to 50% to reflect perspective/insight shared in the dRep documentation.
  • Allow proteomes to be provided as inputs for CiDDER.

* = initially wrote ANI by mistake.

Full Changelog: https://github.com/raufs/skDER/compare/v1.2.9...v1.3.0

- Python
Published by raufs 10 months ago

skder - v1.2.9

Minor updates

  • Make overwriting the output directory safer and introduce a user prompt.
  • Strip quotes from string arguments containing spaces in case they are not processed properly (related to https://github.com/raufs/skDER/issues/8).
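
A minimal sketch of the quote stripping mentioned above, assuming the goal is to remove one pair of matching quotes wrapped around an argument value. The helper name `strip_wrapping_quotes` is hypothetical and is not skDER's actual function.

```python
def strip_wrapping_quotes(value: str) -> str:
    """Remove one pair of matching quotes surrounding an argument string."""
    value = value.strip()
    # Only strip when the first and last characters are the same quote mark.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value
```

For example, an argument that arrives as the literal text "-s 90.0" (quotes included) would be reduced to -s 90.0 before being passed on.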

Full Changelog: https://github.com/raufs/skDER/compare/v1.2.8...v1.2.9

- Python
Published by raufs about 1 year ago

skder - v1.2.8

  • Minor changes: update the granet help function and graph creation so that representatives are listed last; their nodes are thus shown on top rather than hidden underneath non-representative genomes in really large graphs.

- Python
Published by raufs over 1 year ago

skder - v1.2.7

  • Update granet to make it deterministic and also introduce the --random-seed option to allow changing the layout if desired.
  • Fix indentation issues in help function and slight updates to logging and messages in CiDDER and skDER.
  • Create CiDDER_Results.txt file in CiDDER to capture the order in which representative genomes are selected.

Full Changelog: https://github.com/raufs/skDER/compare/v1.2.6...v1.2.7

- Python
Published by raufs over 1 year ago

skder - v1.2.6

  • Make "greedy" mode the default algorithm in skder
  • Correct stumbling on gzipped files with the new method to calculate N50s introduced in v1.2.4.
  • Introduce granet, for creating network visuals of genomes that show where the selected representative genomes fall.
  • Update help functions of skder and cidder to include citation notice for skani and CD-HIT, respectively.

What's Changed

  • Updated Docker folder by @Lolli-AK in https://github.com/raufs/skDER/pull/7

Full Changelog: https://github.com/raufs/skDER/compare/v1.2.5...v1.2.6

- Python
Published by raufs over 1 year ago

skder - v1.2.5

  • In skDER, change the default value of -p, which controls additional arguments to pass to skani triangle, from empty to -s 90.0. This raises the screening parameter to 90.0 from skani's default of 80.0.
  • Add support for providing directory paths, as well as files, to the -g/--genomes argument where the directories contain genome files to (also) include in skder/cidder analyses.
  • Begin development of Docker-based installation support, including a convenience bash wrapper (still in progress).
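
The directory support for -g/--genomes described above could look roughly like the following sketch. The helper names and the list of accepted FASTA suffixes are assumptions for illustration; the actual set of extensions that skder/cidder recognize may differ.

```python
from pathlib import Path

# Assumed suffix list for this example only.
FASTA_SUFFIXES = (".fasta", ".fa", ".fna", ".fas")


def looks_like_genome(path: Path) -> bool:
    """Check the file name for a FASTA suffix, optionally followed by .gz."""
    name = path.name.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    return name.endswith(FASTA_SUFFIXES)


def expand_genome_inputs(inputs):
    """Expand a mix of file and directory paths into a flat list of files."""
    genomes = []
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # Take genome-like files directly inside the directory.
            genomes.extend(
                sorted(f for f in path.iterdir()
                       if f.is_file() and looks_like_genome(f))
            )
        elif path.is_file():
            genomes.append(path)
    return genomes
```

Files given explicitly are kept as-is, while directories are only searched one level deep in this sketch.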

What's Changed

  • Docker files by @Lolli-AK in https://github.com/raufs/skDER/pull/6

New Contributors

  • @Lolli-AK made their first contribution in https://github.com/raufs/skDER/pull/6

Full Changelog: https://github.com/raufs/skDER/compare/v1.2.4...v1.2.5

- Python
Published by raufs over 1 year ago

skder - v1.2.4

  • Introduce mgecut - a program that can use PhiSpy or geNomad to predict MGEs in genomes and filter them out prior to genomic dereplication.
  • Integrate mgecut usage into skder and cidder.
  • Switch to simple python implementation of N50 adapted from: https://gist.github.com/dinovski/2bcdcc770d5388c6fcc8a656e5dbe53c instead of using pyfastx.
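
For readers unfamiliar with the statistic, a plain-Python N50 calculation can be sketched as below. This is a generic implementation written for this example, not the code from the linked gist or from skDER; the gzip handling by file suffix is likewise an assumption.

```python
import gzip


def contig_lengths(fasta_path):
    """Return the length of each sequence in a (optionally gzipped) FASTA file."""
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    lengths, current = [], 0
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current > 0:
                    lengths.append(current)
                current = 0
            else:
                current += len(line)
    if current > 0:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

Calling `n50(contig_lengths(path))` gives the N50 of a single assembly without any index files being written.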

Full Changelog: https://github.com/raufs/skDER/compare/v1.2.3...v1.2.4

- Python
Published by raufs over 1 year ago

skder - v1.2.3

  • update and further polish arguments (e.g. underscores to dashes).
  • add options for secondary clustering for cidder using protein cluster containment* and/or skani (https://github.com/raufs/skDER/issues/5).

Full Changelog: https://github.com/raufs/skDER/compare/v1.2.2...v1.2.3

- Python
Published by raufs over 1 year ago

skder - v1.2.2

  • Add memory option for CD-HIT usage in cidder and set default to unlimited.
  • Cosmetic changes to code / comments / help-function

Full Changelog: https://github.com/raufs/skDER/compare/v1.2.1...v1.2.2

- Python
Published by raufs over 1 year ago

skder - v1.2.1

  • Correct for default parameter settings in CiDDER
  • Add/update expected test results image

- Python
Published by raufs over 1 year ago

skder - v1.2.0

  • Introduce CiDDER - a CD-HIT based dereplication program to ensure you properly sample the pangenome space of a species/genus.
  • Introduce new option to skDER to test a range of cutoffs for ANI and AF and generate a heatmap of the number of representative genomes that result from different combinations. E.g. useful if you want to limit your analysis to X genomes but don't know what ANI/AF cutoffs to use.

Full Changelog: https://github.com/raufs/skDER/compare/v1.1.1...v1.2.0

- Python
Published by raufs over 1 year ago

skder - v1.1.1

  • add creation of "COMPLETED.txt" at the very end of skDER for incorporation in workflows.
  • add option to build indices locally when computing N50s instead of in the directory of the input genome

- Python
Published by raufs almost 2 years ago

skder - v1.1.0

  • Introduce ability to specify GTDB release and update to using GTDB R220 as default for when users request to auto-download and include all genomes from a particular genus/species.
  • Remove need for symlinking genomes locally, instead fastx index files are now written in the same folder as the input genomes and deleted afterwards.
  • Parallelization when computing N50 is done by splitting up number of genomes by the number of CPUs allocated and thus writing to at most X number of files at a time, where X is the number of CPUs. This is to address: https://github.com/raufs/skDER/issues/4
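
The chunking strategy described above can be sketched as follows: genomes are split into at most one chunk per CPU, and each chunk is processed sequentially by one worker, so no more than X files are handled at once. The function names are hypothetical, and the use of a thread pool here is an assumption for the example; skDER's own implementation may use a different mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, cpus):
    """Split items into at most `cpus` non-empty chunks (round-robin)."""
    cpus = max(1, cpus)
    chunks = [items[i::cpus] for i in range(cpus)]
    return [chunk for chunk in chunks if chunk]


def run_per_chunk(items, cpus, worker):
    """Apply `worker` to every item, with one sequential task per chunk."""
    chunks = split_into_chunks(list(items), cpus)

    def handle_chunk(chunk):
        # Items within a chunk are processed one after another.
        return [(item, worker(item)) for item in chunk]

    results = {}
    with ThreadPoolExecutor(max_workers=max(1, cpus)) as pool:
        for pairs in pool.map(handle_chunk, chunks):
            results.update(pairs)
    return results
```

Here `worker` would be whatever computes the N50 for one genome path; the returned dictionary maps each input to its result.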

- Python
Published by raufs almost 2 years ago

skder - v1.0.10

  • Minor change, added new argument to use https://ftp.ncbi.nlm.nih.gov/genomes instead of https://ftp.ncbi.nih.gov/genomes in case there are issues with connecting to the latter. This gets passed to ncbi-genome-download's -u argument.

Full Changelog: https://github.com/raufs/skDER/compare/v1.0.9...v1.0.10

- Python
Published by raufs almost 2 years ago

skder - v1.0.9

  • Support for gzipped files added (https://github.com/raufs/skDER/issues/4)
  • GTDB/NCBI downloaded genomes are now kept in gzip form
  • FASTA files ending in *.fas now allowed (https://github.com/raufs/skDER/issues/4)
  • If local input genomes are provided, default behavior is now to symlink files in the skDER results directory and do indexing for N50 calculation there.
  • FASTA confirmation is now optional (might parallelize in the future and turn back on as default, but it is currently iterative); it can take a while if there are a lot of files.

Full Changelog: https://github.com/raufs/skDER/compare/v1.0.8...v1.0.9

- Python
Published by raufs almost 2 years ago

skder - v1.0.8

  • Fix broken GTDB-based downloading feature.
  • Polish names for genomic assemblies downloaded based on GTDB species names.

Full Changelog: https://github.com/raufs/skDER/compare/v1.0.7...v1.0.8

- Python
Published by raufs over 2 years ago

skder - v1.0.7

  • Corrected faulty usage of the -s option in skani triangle and now set it to the default value. This should now result in the more accurate ANI estimates being used for the dereplication methods as intended.
  • Updated stats and runtime info for running dynamic/greedy approaches on the Wiki.
  • Added new secondary clustering option, -n which will report the relation/distance of all genomes in the input set to their nearest representative genome.

Full Changelog: https://github.com/raufs/skDER/compare/v1.0.6...v1.0.7

- Python
Published by raufs over 2 years ago

skder - v1.0.6

  • Mostly just updates to the README & help function.
  • Added missing library import statements in util.py

Full Changelog: https://github.com/raufs/skDER/compare/v1.0.5...v1.0.6

- Python
Published by raufs over 2 years ago

skder - v1.0.5

  • update packaging of program + installation guide

- Python
Published by raufs over 2 years ago

skder - v1.0.4

  • fix SKDER_PATH now that skder moved to bin/

- Python
Published by raufs over 2 years ago

skder - v1.0.2

updates for v.1.0.2

  • KEY: Correct overflow issue in C++ code related to integer multiplication in computing scores for dynamic dereplication approach
  • Introduce a greedy set cover dereplication approach as an alternate method
  • Improve code + documentation organization
  • Add test case
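
For context, a textbook greedy set cover over genomes can be sketched as below. This is a generic illustration under stated assumptions, not skDER's actual algorithm: the input `covers` (a mapping from each genome to the set of genomes it is similar enough to represent) and the alphabetical tie-breaking are both made up for the example, and the real method may rank candidates differently.

```python
def greedy_set_cover(covers):
    """Pick representatives until every genome is covered.

    covers: dict mapping each genome name to the set of genomes it can
    represent. Every genome is assumed to appear as a key and is treated
    as covering itself.
    """
    uncovered = set(covers)
    representatives = []
    while uncovered:
        # Choose the genome covering the most still-uncovered genomes;
        # sorting first makes ties resolve alphabetically.
        best = max(
            sorted(covers),
            key=lambda g: len((covers[g] | {g}) & uncovered),
        )
        representatives.append(best)
        uncovered -= covers[best] | {best}
    return representatives
```

Because each genome covers itself, every iteration removes at least one genome from the uncovered set and the loop always terminates.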

Full Changelog: https://github.com/raufs/skDER/compare/v1.0.1...v1.0.2

- Python
Published by raufs over 2 years ago

skder - v1.0.1

  • Create directory with representative genomes in the output directory.
  • Add version flag; change input for --genomes argument from accepting a directory to accepting multiple paths to genome files.
  • Update Enterococcus dereplication showcasing.

Full Changelog: https://github.com/raufs/skDER/compare/v1.0...v1.0.1

- Python
Published by raufs over 2 years ago

skder - v.1.0

First release of skDER.

- Python
Published by raufs over 2 years ago