Recent Releases of skder
skder - v1.3.3
- Introduce new 'lowmemgreedy' dereplication mode in skder to efficiently handle really large datasets.
- Fix issues with automated downloading of genomes from NCBI due to updating listings and add catch for bad files.
- Switch cidder's skani-based secondary clustering from using skani triangle to skani dist.
Full Changelog: https://github.com/raufs/skDER/compare/v1.3.2...v1.3.3
- Python
Published by raufs 8 months ago
skder - v1.3.2
- Fix: Correct the default value of --skani-triangle-parameters in skDER, which was outdated. This parameter was recently updated to request a screening parameter of X%, where X = ANI threshold - 10%, by default. It was however still being set to -s 90.0 (the previous default) by mistake - this is now corrected.
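The default described above (screening value X = ANI threshold - 10) can be sketched as follows. The function name is hypothetical and is not taken from the skDER code base; only the -s flag and the arithmetic come from the note above.

```python
def default_skani_triangle_parameters(ani_threshold: float) -> str:
    """Build the default extra-arguments string for skani triangle.

    Hypothetical helper: the screening value is the ANI threshold minus 10,
    as described in the release note.
    """
    screen = ani_threshold - 10.0
    return f"-s {screen:.1f}"


if __name__ == "__main__":
    # With the v1.3.0 default ANI cutoff of 99.5, the screening value is 89.5.
    print(default_skani_triangle_parameters(99.5))
```

Under this rule, a user who lowers the ANI cutoff also gets a lower screening value, rather than a fixed 90.0.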
What's Changed
- Update to v1.3.2 by @raufs in https://github.com/raufs/skDER/pull/10
New Contributors
- @raufs made their first contribution in https://github.com/raufs/skDER/pull/10
Full Changelog: https://github.com/raufs/skDER/compare/v1.3.1...v1.3.2
Published by raufs 9 months ago
skder - v1.3.1
Minor fix: Update argument descriptions for inputs to clarify that genome/proteome files input for CiDDER need to be uncompressed unlike for skDER where gzipped files are allowed. Also added a check for this in case users provide compressed inputs.
Full Changelog: https://github.com/raufs/skDER/compare/v1.3.0...v1.3.1
Published by raufs 10 months ago
skder - v1.3.0
- Restructure code and introduce new modules to simplify the main programs of skder and cidder.
- Incorporate faster way to download genomes from NCBI belonging to a single genus/species of interest in GTDB.
- Incorporate the latest GTDB release R226.
- Add new option to cidder to select additional representative genomes if X% of non-representative genomes are not contained by an individual representative genome. This is performed as a secondary step after the primary representative selection method.
- Change the default ANI cutoff from 99.0% to 99.5% identity in skder to reflect thresholds coinciding with sequence type designations as reported by Rodriguez-R et al. 2023.
- Lowered the default AF* threshold from 90% to 50% to reflect perspective/insight shared in the dRep documentation.
- Allow proteomes to be provided as inputs for CiDDER.
* = initially wrote ANI by mistake.
Full Changelog: https://github.com/raufs/skDER/compare/v1.2.9...v1.3.0
Published by raufs 10 months ago
skder - v1.2.9
Minor updates
- make overwriting the output directory safer and introduce a user prompt.
- strip away quotes in string arguments with spaces in case they are not processed properly. (related to https://github.com/raufs/skDER/issues/8).
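A minimal sketch of the quote stripping mentioned above. The helper name and the exact rule (remove one matching pair of wrapping quotes) are assumptions; the release note does not say how skDER implements it.

```python
def strip_wrapping_quotes(value: str) -> str:
    """Remove one pair of matching quotes wrapped around an argument value.

    Hypothetical helper: a value such as '"-s 90.0"' can reach the program
    with its quotes intact when the shell or a wrapper does not process them.
    """
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


if __name__ == "__main__":
    print(strip_wrapping_quotes('"-s 90.0"'))
```

Only a matching pair at both ends is removed, so quotes inside the value are left alone.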
Full Changelog: https://github.com/raufs/skDER/compare/v1.2.8...v1.2.9
Published by raufs about 1 year ago
skder - v1.2.7
- Update granet to make it deterministic and also introduce the --random-seed option to allow changing the layout if desired.
- Fix indentation issues in help function and slight updates to logging and messages in CiDDER and skDER.
- Create CiDDER_Results.txt file in CiDDER to capture the order in which representative genomes are selected.
Full Changelog: https://github.com/raufs/skDER/compare/v1.2.6...v1.2.7
Published by raufs over 1 year ago
skder - v1.2.6
- Make "greedy" mode the default algorithm in skder
- Correct stumbling on gzipped files with new method to calculate N50s introduced in v1.2.4.
- Introduce granet - for creating network visuals of genomes and where representative genomes selected fall.
- Update help functions of skder and cidder to include citation notice for skani and CD-HIT, respectively.
What's Changed
- Updated Docker folder by @Lolli-AK in https://github.com/raufs/skDER/pull/7
Full Changelog: https://github.com/raufs/skDER/compare/v1.2.5...v1.2.6
Published by raufs over 1 year ago
skder - v1.2.5
- In skDER, set the default value of -p - controlling additional arguments to pass to skani triangle - from nothing (empty) to -s 90.0 to increase the screening parameter's value to 90.0 from the default value of 80.0.
- Add support for providing directory paths, as well as files, to the -g/--genomes argument where the directories contain genome files to (also) include in skder/cidder analyses.
- Begin development of Docker-based installation support - including convenience bash wrapper (still in progress).
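Accepting both files and directories for the genomes argument could look roughly like the sketch below. The function name and the list of accepted suffixes are assumptions for illustration and may not match what skDER actually checks.

```python
from pathlib import Path

# Assumed FASTA suffixes; the set accepted by skDER may differ.
GENOME_SUFFIXES = (".fasta", ".fa", ".fna", ".fas")


def expand_genome_paths(paths):
    """Expand a mix of file and directory paths into a flat list of genome files.

    Hypothetical helper: directories contribute the FASTA-like files directly
    inside them (optionally gzipped); plain file paths are kept as given.
    """
    genomes = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            for child in sorted(path.iterdir()):
                name = child.name
                if name.endswith(".gz"):
                    name = name[:-3]  # look at the suffix underneath .gz
                if child.is_file() and name.endswith(GENOME_SUFFIXES):
                    genomes.append(child)
        elif path.is_file():
            genomes.append(path)
    return genomes
```

A caller would pass the raw values of the genomes argument and get back one list, regardless of whether each value pointed at a file or a folder.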
What's Changed
- Docker files by @Lolli-AK in https://github.com/raufs/skDER/pull/6
New Contributors
- @Lolli-AK made their first contribution in https://github.com/raufs/skDER/pull/6
Full Changelog: https://github.com/raufs/skDER/compare/v1.2.4...v1.2.5
Published by raufs over 1 year ago
skder - v1.2.4
- Introduce mgecut - a program that can use PhiSpy or geNomad to predict MGEs in genomes and filter them out prior to genomic dereplication.
- Integrate mgecut usage into skder and cidder.
- Switch to simple python implementation of N50 adapted from: https://gist.github.com/dinovski/2bcdcc770d5388c6fcc8a656e5dbe53c instead of using pyfastx.
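For readers unfamiliar with the statistic, a plain-Python N50 can be sketched as below. This is an independent illustration, not the gist's code or skDER's implementation; the function names are hypothetical. It also reads gzipped FASTA files, which v1.2.6 later had to fix for the new method.

```python
import gzip


def contig_lengths(fasta_path):
    """Return the length of every sequence in a (optionally gzipped) FASTA file."""
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    lengths, current = [], 0
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line.strip())
    if current:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the contig length at which half of the total assembly is reached.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig that brings the running total to at least half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input
```

Usage would be `n50(contig_lengths("genome.fna.gz"))`, where the file name is a placeholder.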
Full Changelog: https://github.com/raufs/skDER/compare/v1.2.3...v1.2.4
Published by raufs over 1 year ago
skder - v1.2.3
- update and further polish arguments (e.g. underscores to dashes).
- add options for secondary clustering for cidder using protein cluster containment* and/or skani (https://github.com/raufs/skDER/issues/5).
Full Changelog: https://github.com/raufs/skDER/compare/v1.2.2...v1.2.3
Published by raufs over 1 year ago
skder - v1.2.0
- Introduce CiDDER - a CD-HIT based dereplication program to ensure you properly sample the pangenome space of a species/genus.
- Introduce new option to skDER to test a bunch of cutoffs for ANI and AF and generate a heatmap on the number of representative genomes that results from different combinations. E.g. useful if you want to limit your analysis to X genomes but don't know what ANI/AF cutoffs to use.
Full Changelog: https://github.com/raufs/skDER/compare/v1.1.1...v1.2.0
Published by raufs over 1 year ago
skder - v1.1.0
- Introduce ability to specify GTDB release and update to using GTDB R220 as default for when users request to auto-download and include all genomes from a particular genus/species.
- Remove need for symlinking genomes locally, instead fastx index files are now written in the same folder as the input genomes and deleted afterwards.
- Parallelization when computing N50 is done by splitting up number of genomes by the number of CPUs allocated and thus writing to at most X number of files at a time, where X is the number of CPUs. This is to address: https://github.com/raufs/skDER/issues/4
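The chunking scheme described above (one batch of genomes per CPU, so at most X workers write files at once) might be sketched like this. The helper names are hypothetical, and a thread pool is used here only as a stand-in for whatever parallel backend skDER uses.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks roughly equal, non-empty batches."""
    n_chunks = max(1, min(n_chunks, len(items)))
    return [items[i::n_chunks] for i in range(n_chunks)]


def process_in_chunks(items, n_cpus, worker):
    """Run worker once per batch, with one batch per allocated CPU.

    Each worker handles its batch sequentially, so no more than n_cpus
    batches (and their temporary files) are active at the same time.
    """
    chunks = split_into_chunks(items, n_cpus)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        return list(pool.map(worker, chunks))
```

Here `worker` would be a function that computes N50 values for every genome in its batch; the results come back in batch order.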
Published by raufs almost 2 years ago
skder - v1.0.10
- Minor change, added new argument to use https://ftp.ncbi.nlm.nih.gov/genomes instead of https://ftp.ncbi.nih.gov/genomes in case there are issues with connecting to the latter. This gets passed to ncbi-genome-download's -u argument.
Full Changelog: https://github.com/raufs/skDER/compare/v1.0.9...v1.0.10
Published by raufs almost 2 years ago
skder - v1.0.9
- Support for gzipped files added (https://github.com/raufs/skDER/issues/4)
- GTDB/NCBI downloaded genomes are now kept in gzip form
- FASTA files ending in *.fas now allowed (https://github.com/raufs/skDER/issues/4)
- If local input genomes are provided, default behavior is now to symlink files in the skDER results directory and do indexing for N50 calculation there.
- FASTA confirmation now optional (might parallelize in the future and turn back on as default - but currently iterative) - it can take a while if there are a lot of files.
Full Changelog: https://github.com/raufs/skDER/compare/v1.0.8...v1.0.9
Published by raufs almost 2 years ago
skder - v1.0.7
- Corrected faulty usage of the -s option in skani triangle and now set it to the default value. This should now result in the more accurate ANI estimates being used for the dereplication methods as intended.
- Updated stats and runtime info for running dynamic/greedy approaches on the Wiki.
- Added new secondary clustering option, -n, which will report the relation/distance of all genomes in the input set to their nearest representative genome.
Full Changelog: https://github.com/raufs/skDER/compare/v1.0.6...v1.0.7
Published by raufs over 2 years ago
skder - v1.0.2
Updates for v1.0.2
- KEY: Correct overflow issue in C++ code related to integer multiplication in computing scores for dynamic dereplication approach
- Introduce a greedy set cover dereplication approach as an alternate method
- Improve code + documentation organization
- Add test case
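To illustrate the general idea behind a greedy set cover, here is a textbook version in Python. It is not skDER's implementation: the function name, the input format (each genome mapped to the set of genomes it is similar enough to represent), and the alphabetical tie-breaking are all assumptions.

```python
def greedy_set_cover(coverage):
    """Pick representatives greedily until every genome is covered.

    coverage maps each candidate genome to the set of genomes it can
    represent (for example, those passing the ANI/AF cutoffs against it).
    Every genome is treated as covering itself.
    """
    cover = {genome: set(others) | {genome} for genome, others in coverage.items()}
    uncovered = set()
    for members in cover.values():
        uncovered |= members

    representatives = []
    while uncovered:
        # Choose the candidate that covers the most still-uncovered genomes;
        # ties go to the alphabetically first name (an arbitrary choice here).
        best = max(sorted(cover), key=lambda g: len(cover[g] & uncovered))
        representatives.append(best)
        uncovered -= cover[best]
    return representatives


if __name__ == "__main__":
    example = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": set()}
    print(greedy_set_cover(example))
```

In the example, A is picked first because it covers A, B and C, and D is then picked because nothing else covers it.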
Full Changelog: https://github.com/raufs/skDER/compare/v1.0.1...v1.0.2
Published by raufs over 2 years ago
skder - v1.0.1
- Create directory with representative genomes in the output directory.
- Add version flag and change the input for the --genomes argument from accepting a directory to accepting multiple paths to genome files.
- Update Enterococcus dereplication showcasing.
Full Changelog: https://github.com/raufs/skDER/compare/v1.0...v1.0.1
Published by raufs over 2 years ago