Recent Releases of veba
veba - VEBA_v2.5.1
[2.5.1] - 2025.04.12
Added
- Added `install-gpu.sh` which installs GPU accelerated environments when applicable (i.e., `VEBA-binning-prokaryotic_env` and `VEBA-binning-viral_env`)
- Added `Dockerfile-GPU` which is experimental
Changed
- Changed `install.sh` so it only installs CPU-based environments (Issue #167)
- Changed `containerize_environments.sh` so it only installs CPU-based environments (Issue #167)
Deprecated
- Deprecated `VirFinder` algorithm in `binning-viral.py` so now only `geNomad` is supported
- Python
Published by jolespin about 1 year ago
veba - VEBA_v2.5.0
[2.5.0] - 2025.04.10
Added
- Added `VAMB` support to `binning-prokaryotic.py` (now a default binner) and `binning_wrapper.py`.
- Added automatic gzipping of output files based on `.gz` extension in `edgelist_to_clusters.py` using `pyexeggutor.open_file_writer`.
- Added `xxhash` dependency to `VEBA-binning-prokaryotic_env` for bin name reproducibility (Issue #140).
- Added `-e/--exclude` and `-d/--domain_predictions` options to `filter_binette_results.py` for removing eukaryotic genomes and setting up domain assignments (Issue #153).
- Added `semibin2-[biome]` option to `binning-prokaryotic.py` allowing specification of multiple biomes (e.g., `semibin2-global`, `semibin2-ocean`), replacing `--semibin2_biome` (Issue #155).
- Added `--semibin2_orf_finder` option to `binning_wrapper.py`.
- Added `genome_statistics.tsv.gz`, `gene_statistics.cds.tsv.gz`, `gene_statistics.rRNA.tsv.gz`, and `gene_statistics.tRNA.tsv.gz` outputs to `essentials.py`.
- Added `--identifiers`, `--index_name`, and `--no_header` options to `convert_metabat2_coverage.py` for broader applicability, including `VAMB`.
- Added `-l eukaryota_odb12` as default but also allow `--auto-lineage-euk` for `BUSCO` in `binning-eukaryotic.py`
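The extension-based gzipping mentioned above can be illustrated with a minimal standard-library sketch. The changelog attributes the real behavior to `pyexeggutor.open_file_writer`, whose signature is not shown here, so `open_writer` and `write_clusters` below are hypothetical stand-ins rather than the actual API.

```python
import gzip


def open_writer(path):
    """Hypothetical helper: text-mode handle, gzip-compressed when path ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "wt")
    return open(path, "w")


def write_clusters(path, node_to_cluster):
    """Write a two-column [node]<tab>[cluster] table; compression follows the extension."""
    with open_writer(path) as f:
        for node, cluster in node_to_cluster.items():
            print(node, cluster, sep="\t", file=f)
```

With this pattern the caller only changes the output filename (e.g., `clusters.tsv` versus `clusters.tsv.gz`) to switch compression on or off.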
Changed
- Changed `binning-eukaryotic.py` behavior to provide a solution to BUSCO Issue #447
- Changed `CHANGELOG.md` format to best practice Keep a Changelog
- Changed `prodigal-gv` to `pyrodigal-gv` in multithreaded mode for `binning-viral.py` for performance.
- Removed `metacoag` from the default set of binning algorithms in `binning-prokaryotic.py`.
- Updated `geNomad` to `v1.11.0` and `geNomad database` to `v1.8` to resolve numpy import errors (Issue #160).
- Updated `Pyrodigal` usage in `binning-eukaryotic.py` for organelles to allow piping and threading.
- Updated `BUSCO` to `v5.8.3` and associated databases.
- Updated `Tiara` to `Tiara-NAL` in `VEBA-binning-prokaryotic_env` and `VEBA-binning-eukaryotic_env` to enable `stdin` usage.
- Updated `biosynthetic.py` to use `antiSMASH v7` (Issue #159).
- Changed behavior when `--taxon fungi` is specified: precomputed genes are not used due to formatting issues.
- Simplified the method for adding headers to `Diamond` outputs in `biosynthetic.py`.
- Changed `Dockerfile` working directory from `/tmp/` to `/home/`.
- Integrated `Tiara` and `consensus_domain_classification.py` into the `binette` step of `binning-prokaryotic.py`.
- Renamed database identifier from `VDB` to `VEBA-DB`.
- Updated `CheckM2` and `Binette` versions in `binning-prokaryotic.py`.
- Updated `CheckM2 Diamond` database included in `VEBA-DB_v9` (Issue #154).
- Removed usage of precomputed genes in the `SemiBin2` wrapper due to SemiBin2/issue-#185.
- Allowed faulty return codes in iterative mode for `binette` to permit convergence in genome recovery.
Fixed
- Fixed `CONDA_ENVS_PATH` detection in the `veba` controller executable to correctly handle environments outside the base Conda directory.
- Fixed bug where `VFDB` hits were incorrectly counted as `MIBiG` in `biosynthetic.py` (Issue #141).
- Fixed `--tta_threshold` argument in `biosynthetic.py` which was previously defined but not connected to the command execution.
- Removed capitalization from column headers in `filter_binette_results.py` output.
- Fixed missing `--antismash_options` argument connection in `biosynthetic.py`.
Removed
- Removed `CONCOCT` support from `binning-eukaryotic.py`.
Deprecated
- Deprecated `amplicon.py` module in favor of external pipelines like `nf-core/ampliseq`.
- Python
Published by jolespin about 1 year ago
veba - VEBA_v2.4.2
v2.4.2 fixed a small bug where the de Bruijn graph for MEGAHIT wasn't included in the output directory if the graph was created.
- [2025.2.1] - Added `--megahit_build_de_bruijn_graph` to make de Bruijn graph construction for `MEGAHIT` optional in `assembly.py`
- Python
Published by jolespin over 1 year ago
veba - VEBA_v2.4.1
- [2025.2.1] - Added `--megahit_build_de_bruijn_graph` to make de Bruijn graph construction for `MEGAHIT` optional in `assembly.py`
- Python
Published by jolespin over 1 year ago
veba - VEBA_v2.4.0
- [2025.1.24] - Added `Initial_bins` to `Binette` results in `filter_binette_results.py`
- [2025.1.23] - Added `essentials.py` module
- [2025.1.16] - Added `--serialized_annotations` to `append_annotations_to_gff.py` to avoid overhead from reparsing the annotations
- [2025.1.15] - Fixed bug in `binning_wrapper.py` where script was looking for bins in the wrong directory for `MetaCoAG`
- [2025.1.14] - Fixed bug in `merge_annotations.py` where `diamond` outputs were queried incorrectly
- [2025.1.5] - Changed default `--busco_completeness` from `50` to `30` in `binning-eukaryotic.py`
- [2025.1.5] - Added `--busco_options` and `--busco_offline` arguments for `binning-eukaryotic.py`
- [2024.12.28] - Added `--semibin2_sequencing_type` to `binning_wrapper.py` and added functionality for `--long_reads`. Moved `--long_reads` argument to `parser_io` instead of `parser_featurecounts`
- [2024.12.27] - Fixed issue in `consensus_domain_classification.py` where `softmax` returns a `np.array` instead of a `pd.DataFrame`
- [2024.12.26] - Added support for precomputed coverage for `metadecoder` in `binning_wrapper.py`
- [2024.12.26] - Added support for `binette` and `tiara` in updated `binning_prokaryotic.py` module
- [2024.12.23] - Added `copy_attribute_in_gff.py` script which copies attributes to a source and destination attribute
- [2024.12.17] - Added `filter_binette_results.py` script
- [2024.12.16] - Added intermediate directory to `metacoag` in `binning_wrapper.py`
- [2024.12.12] - Added `metacoag` support and custom HMM support to `metadecoder` in `binning_wrapper.py`
- [2024.12.11] - Added `prepend_de-bruijn_path.py` script and use this in `assembly.py` and `assembly-long.py` to prepend prefix to SPAdes/Flye de Bruijn graph paths.
- [2024.12.10] - Changed default `--minimum_genome_size` to `200000` from `150000`
- [2024.12.9] - Added support for `SemiBin2` and `MetaDecoder` in `binning_wrapper.py`
- [2024.11.21] - Updated `--cluster_label_mode` default to `md5` instead of `numeric` to allow for easier cluster updates post hoc. Change reflected in `cluster.py`, `global_clustering.py`, `local_clustering.py`, and `update_genome_clusters.py`
- [2024.11.18] - Added `update_genome_clusters.py` which runs `skani` against all reference genome clusters. Does not do protein clustering nor does it update the graph, representatives, or proteins.
- [2024.11.15] - Added `--header simple` to `diamond` output in `annotate.py` and accounted for change in `merge_annotations.py`
- [2024.11.11] - Added `Enzymes` to `append_annotations_to_gff.py` script
- [2024.11.9] - Added `kofam.enzymes.list` and `kofam.pathways.list` in `VDB_v8.1` to provide subsets for `pykofamsearch`
- [2024.11.8] - Updating VEBA database `VDB_v8` to `VDB_v8.1` which adds serialized KOfam with enzyme support
- [2024.11.8] - Added `Enzymes` to `annotate.py` and `merge_annotations.py` [!untested]
- [2024.11.7] - Updated `pyhmmsearch` and `pykofamsearch` version in `VEBA-annotate_env.yml`, `VEBA-classify-eukaryotic_env.yml`, `VEBA-database_env`, and `VEBA-phylogeny_env`. Also updated executables in `annotate.py`, `classify-eukaryotic.py`, `phylogeny.py`, and `download_databases-annotate.sh`.
- [2024.11.7] - In `edgelist_to_clusters.py`, added `--cluster_label_mode {"numeric", "random", "pseudo-random", "md5", "nodes"}` to allow for different types of labels. Added `--threshold2` option for a second weight.
- [2024.11.7] - Added `--wrap` to `fasta_utility.py` and split id and descriptions in header so prefix/suffix is only added to id.
- [2024.11.7] - Added `prepend_gff.py` to prepend a prefix to contig and attribute identifiers
- [2024.11.7] - Changed default `--skani_minimum_af` to `50` from `15` as this is used in GTDB-Tk for determining species-level clusters in `cluster.py`, `global_clustering.py`, and `local_clustering.py`
- [2024.11.6] - Added `append_annotations_to_gff.py` script
- [2024.10.29] - Changed `manual` mode to `metaeuk` mode for preexisting `metaeuk` results
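To illustrate why an `md5` label mode can make post hoc cluster updates easier than `numeric` labels, here is a minimal sketch. It assumes the label is the MD5 hex digest of the sorted member identifiers joined by a delimiter; that scheme, the delimiter, and the function name are assumptions for illustration and are not confirmed by the changelog.

```python
import hashlib


def md5_cluster_label(nodes, delimiter="|"):
    """Assumed scheme: hash the sorted member identifiers so the label depends only on membership."""
    key = delimiter.join(sorted(nodes))
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# The label does not depend on the order in which members were discovered
label_a = md5_cluster_label(["genome_2", "genome_1"])
label_b = md5_cluster_label(["genome_1", "genome_2"])
print(label_a == label_b)
```

Under this assumption, a cluster whose membership is unchanged keeps its label when other clusters are added, whereas numeric labels can shift whenever the enumeration order changes.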
- Python
Published by jolespin over 1 year ago
veba - VEBA_v2.3.0
- [2024.9.21] - Added `KEGG Pathway Profiler` to `VEBA-database_env` and `VEBA-annotate_env` which replaces `MicrobeAnnotator-KEGG` for module completion ratios. Replacing `${VEBA_DATABASE}/Annotate/MicrobeAnnotator-KEGG` with `${VEBA_DATABASE}/Annotate/KEGG-Pathway-Profiler/` database files. Note: New module completion ratio output does not have class labels for KEGG modules.
- [2024.8.30] - Added `${N_JOBS}` to download scripts with default set to maximum threads available
- Python
Published by jolespin over 1 year ago
veba - VEBA_v2.2.1
- [2024.8.29] - Added `VERSION` file created in `download_databases.sh`
- [2024.7.11] - Alignment fraction threshold for genome clustering only applied to reference but should also apply to query. Added `--af_mode` with either `relaxed = max([Alignment_fraction_ref, Alignment_fraction_query]) > minimum_af` or `strict = (Alignment_fraction_ref > minimum_af) & (Alignment_fraction_query > minimum_af)` to `edgelist_to_clusters.py`, `global_clustering.py`, `local_clustering.py`, and `cluster.py`.
- [2024.7.3] - Added `pigz` to `VEBA-annotate_env` which isn't a problem with most `conda` installations but needed for `docker` containers.
- [2024.6.21] - Changed `choose_fastest_mirror.py` to `determine_fastest_mirror.py`
- [2024.6.20] - Added `-m/--include_mrna` to `compile_metaeuk_identifiers.py` for Issue #110
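The two `--af_mode` rules quoted above translate directly into a small predicate. This is a sketch of the stated formulas only; the function name and argument order are illustrative and not taken from `edgelist_to_clusters.py`.

```python
def passes_alignment_fraction(af_ref, af_query, minimum_af, af_mode="relaxed"):
    """Apply the relaxed or strict alignment-fraction rule from the changelog entry."""
    if af_mode == "relaxed":
        # Either direction clearing the threshold is enough
        return max([af_ref, af_query]) > minimum_af
    if af_mode == "strict":
        # Both reference and query must clear the threshold
        return (af_ref > minimum_af) and (af_query > minimum_af)
    raise ValueError(f"Unknown af_mode: {af_mode!r}")


# A lopsided alignment passes in relaxed mode but not in strict mode
print(passes_alignment_fraction(80.0, 20.0, 30.0, "relaxed"))
print(passes_alignment_fraction(80.0, 20.0, 30.0, "strict"))
```

The strict mode is the one that addresses the issue described in the entry, since it stops a small genome that is fully contained in a larger one from being clustered on the strength of one direction alone.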
- Python
Published by jolespin over 1 year ago
veba - VEBA_v2.2.0
Disclaimer: I made some large updates in this version and I believe everything has been adequately tested, but in case anything has slipped through the cracks you can use v2.1.0, which has been thoroughly tested in accordance with the NAR Espinoza 2024 paper. Benefits of using this version include much faster and more robust prokaryotic classifications and fast, scalable HMM-based annotation modeling.
Large performance updates for this version including:
* Updated GTDB-Tk 2.3.0 -> 2.4.0, which means the GTDB needed to be updated from r214.1 -> r220
* `VEBA-classify_env` was split up into `VEBA-classify-eukaryotic_env`, `VEBA-classify-prokaryotic_env`, and `VEBA-prokaryotic_env`
* `annotate.py`, `classify-eukaryotic.py`, and `phylogeny.py` (and their utility scripts) were rewritten to use PyHMMER (`pyhmmsearch` and `pykofamsearch`), which is faster than HMMSearch when multithreaded.
* KOFAM was changed to KOfam
- Python
Published by jolespin almost 2 years ago
veba - VEBA_v2.1.0-zen
This is the exact same version as VEBA_v2.1.0. New VEBA releases will now automatically be synced to Zenodo.
- Python
Published by jolespin almost 2 years ago
veba - VEBA_v2.1.0
Official release of VEBA v2.1.0 with updates to address peer reviewers. Mostly documentation but also including the following:
- [2024.4.30] - Added `concatenate_files.py` which can concatenate files (and mixed compressed/decompressed files) using either arguments, list file, or glob. Reason for this is that unix has a limit of arguments that can be used (e.g., `cat *.fasta > output.fasta` where `*.fasta` results in 50k files will crash)
- [2024.4.29] - Added `/volumes/workspace/` directory to Docker containers for situations when your input and output directories are the same.
- [2024.4.29] - `featureCounts` can only handle 64 threads at a time so added `min(64, opts.n_jobs)` for all the modules/scripts that use `featureCounts` commands.
- [2024.4.23] - Added `uniprot_to_enzymes.py` which reformats tables and fasta from https://www.uniprot.org/uniprotkb?query=ec%3A*
- [2024.4.18] - Developed a faster CLI implementation of `KofamScan` called `PyKofamSearch` which leverages `PyHmmer`. This will be used in future versions of VEBA.
- [2024.4.18] - Developed a faster CLI implementation of `HMMSearch` called `PyHMMSearch` which leverages `PyHmmer`. This will be used in future versions of VEBA.
- [2024.3.26] - Added `--metaeuk_split_memory_limit` to `metaeuk_wrapper.py`.
- [2024.3.26] - Added `-d/--genome_identifier_directory_index` to `scaffolds_to_bins.py` for directories that are structured `path/to/genomes/bin_a/reference.fasta` where you would use `-d -2`.
- [2024.3.26] - Added `--minimum_af` to `edgelist_to_clusters.py` with an option to accept 4 column inputs `[id_1]<tab>[id_2]<tab>[weight]<tab>[alignment_fraction]`. `global_clustering.py`, `local_clustering.py`, and `cluster.py` now use this by default `--af_threshold 30.0`. If you want to retain previous behavior, just use `--af_threshold 0.0`.
- [2024.3.18] - `edgelist_to_clusters.py` only includes edges where both nodes are in identifiers set. If `--identifiers` are provided, then only those identifiers are used. If not, then it includes all nodes.
- [2024.3.18] - Added `--export_representatives` argument for `edgelist_to_clusters.py` to output table with `[id_node]<tab>[id_cluster]<tab>[intra-cluster_connectivity]<tab>[representative]`. Also includes this information in `nx.Graph` objects.
- [2024.3.18] - Changed singleton weight to `np.nan` instead of `np.inf` for `edgelist_to_clusters.py` to allow for representative calculations.
- YouTube channel (https://www.youtube.com/@VEBA-Multiomics)
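Two of the v2.1.0 entries above are small enough to sketch directly: the `-d -2` directory index for genome identifiers and the 64-thread cap for `featureCounts`. The function names below are hypothetical and the code only illustrates the described behavior; it is not taken from `scaffolds_to_bins.py` or the VEBA modules.

```python
from pathlib import PurePosixPath


def genome_id_from_path(path, directory_index=-2):
    """Pick the genome identifier from a path component (e.g., -2 is the parent directory)."""
    return PurePosixPath(path).parts[directory_index]


def featurecounts_threads(n_jobs, maximum=64):
    """Cap the requested thread count, mirroring min(64, opts.n_jobs)."""
    return min(maximum, n_jobs)


# The example path from the changelog resolves to the bin directory name
print(genome_id_from_path("path/to/genomes/bin_a/reference.fasta"))
print(featurecounts_threads(128))
```

A negative index counts from the end of the path, which is why `-2` selects the directory that contains `reference.fasta` regardless of how deep the genome directories sit.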
- Python
Published by jolespin about 2 years ago
veba - VEBA_v2.1.0b (pre-release)
Beta release of VEBA v2.1.0b with updates to address peer reviewers. Mostly documentation but also including the following:
- [2024.4.30] - Added `concatenate_files.py` which can concatenate files (and mixed compressed/decompressed files) using either arguments, list file, or glob. Reason for this is that unix has a limit of arguments that can be used (e.g., `cat *.fasta > output.fasta` where `*.fasta` results in 50k files will crash)
- [2024.4.29] - Added `/volumes/workspace/` directory to Docker containers for situations when your input and output directories are the same.
- [2024.4.29] - `featureCounts` can only handle 64 threads at a time so added `min(64, opts.n_jobs)` for all the modules/scripts that use `featureCounts` commands.
- [2024.4.23] - Added `uniprot_to_enzymes.py` which reformats tables and fasta from https://www.uniprot.org/uniprotkb?query=ec%3A*
- [2024.4.18] - Developed a faster implementation of `KofamScan` called `PyKofamSearch` which leverages `PyHmmer`. This will be used in future versions of VEBA.
- [2024.3.26] - Added `--metaeuk_split_memory_limit` to `metaeuk_wrapper.py`.
- [2024.3.26] - Added `-d/--genome_identifier_directory_index` to `scaffolds_to_bins.py` for directories that are structured `path/to/genomes/bin_a/reference.fasta` where you would use `-d -2`.
- [2024.3.26] - Added `--minimum_af` to `edgelist_to_clusters.py` with an option to accept 4 column inputs `[id_1]<tab>[id_2]<tab>[weight]<tab>[alignment_fraction]`. `global_clustering.py`, `local_clustering.py`, and `cluster.py` now use this by default `--af_threshold 30.0`. If you want to retain previous behavior, just use `--af_threshold 0.0`.
- [2024.3.18] - `edgelist_to_clusters.py` only includes edges where both nodes are in identifiers set. If `--identifiers` are provided, then only those identifiers are used. If not, then it includes all nodes.
- [2024.3.18] - Added `--export_representatives` argument for `edgelist_to_clusters.py` to output table with `[id_node]<tab>[id_cluster]<tab>[intra-cluster_connectivity]<tab>[representative]`. Also includes this information in `nx.Graph` objects.
- [2024.3.18] - Changed singleton weight to `np.nan` instead of `np.inf` for `edgelist_to_clusters.py` to allow for representative calculations.
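The motivation given for `concatenate_files.py` (shell argument limits with very large globs, plus mixed compressed and uncompressed inputs) can be sketched with the standard library. This is an illustrative stand-in, not the script's actual implementation; the function signature is an assumption.

```python
import glob
import gzip
import shutil


def concatenate_files(pattern, output_path):
    """Concatenate every file matching a glob pattern, decompressing .gz inputs on the fly.

    The glob is expanded inside Python, so the shell's argument limit never applies.
    The output path must not itself match the pattern.
    """
    paths = sorted(glob.glob(pattern))
    with open(output_path, "wb") as out:
        for path in paths:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rb") as f:
                shutil.copyfileobj(f, out)
    return len(paths)
```

Note that this sketch copies bytes verbatim, so an input that lacks a trailing newline will run into the next file's first line.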
- Python
Published by jolespin about 2 years ago
veba - VEBA_v2.0.0
- Changed default assembly algorithm to `metaflye` instead of `flye` in `assembly-long.py`
- Added `number_of_genomes`, `number_of_genome-clusters`, `number_of_proteins`, and `number_of_protein-clusters` to `feature_compression_ratios.tsv.gz` from `cluster.py`
- Added `-A/--from_antismash` in `biosynthetic.py` to use preexisting `antiSMASH` results. Also changed `-i/--input` to `-i/--from_genomes`.
- Changed `antismash_genbanks_to_table.py` to `biosynthetic_genbanks_to_table.py` for future support of `DeepBGC` and `GECCO`
- Added `busco_version` parameter to `merge_busco_json.py` with default set to `5.4.x` and additional support for `5.6.x`.
- Added `CONDA_ENVS_PATH` to `update_environment_scripts.sh`, `update_environment_variables.sh`, and `check_installation.sh`
- Added `CONDA_ENVS_PATH` to `veba` to allow for custom environment locations
- Changed `install.sh` to support custom `CONDA_ENVS_PATH` argument `bash install.sh path/to/log path/to/envs/`
- Added `merge_counts_with_taxonomy.py`
- Python
Published by jolespin about 2 years ago
veba - VEBA_v1.5.0
Warning:
For this release, use the https://github.com/jolespin/veba/releases/download/v1.5.0/v1.5.0.zip asset not the "Source code" assets as those are out of date.
Release v1.5.0 Highlights:
- Added `VeryFastTree` to `phylogeny.py`
- Added `--blacklist` to `compile_eukaryotic_classifications.py`
- Added compatibility for `antismash_genbanks_to_table.py` to operate on `antiSMASH v7` genbanks
- Added `compile_phylogenomic_functional_categories.py` script which automates the methodology from Espinoza et al. 2022 (doi:10.1093/pnasnexus/pgac239)
- Fixed error in `annotations.protein_clusters.tsv` formatting from `annotate.py`
- Fixed situation where `unbinned.fasta` was not added in `binning-prokaryotic.py` and bad symlinks were created for GFF, rRNA, and tRNA when no genomes were detected.
- Fixed critical error where `classify_eukaryotic.py` was trying to access a deprecated database file from MicroEuk_v2.
Release v1.5.0 Details
* Cleaned up installation files
* Changed `veba/src/` to `veba/bin/`
* Changed `SCRIPT_VERSIONS` to `VEBA_SCRIPT_VERSIONS` which are now in `bin/` of conda environment
* Fixed header being offset in `annotations.protein_clusters.tsv` where it could not be read with Pandas.
* Fixed the creation of non-existing symlinks in `binning-prokaryotic.py` where "'*.gff'", "'*.rRNA'", and "'*.tRNA'" were created.
* Fixed .strip method on Pandas series in `antismash_genbanks_to_table.py` for compatibility with `antiSMASH 6 and 7`
* Fixed situation where `unbinned.fasta` is empty in `binning-prokaryotic.py` when there are no bins that pass qc.
* Fixed minor error in `coverage.py` where `samtools sort --reference` was getting `reads_table.tsv` and not `reference.fasta`
* Changed default behavior from deterministic to not deterministic for increase in speed in `assembly-long.py`. (i.e., `--no_deterministic` --> `--deterministic`)
* Added `VeryFastTree` as an option to `phylogeny.py` with `FastTree` remaining as the default.
* Changed default `--leniency` parameter on `classify_eukaryotic.py` and `consensus_genome_classification_ranked.py` to `1.0` and added `--leniecy_genome_classification` as a separate option.
* Added `--blacklist` option to `compile_eukaryotic_classifications.py` with a default value of `species:uncultured eukaryote` in `classify_eukaryotic.py`
* Fixed critical error where `classify_eukaryotic.py` was trying to access a deprecated database file from MicroEuk_v2.
* Fixed minor error with `eukaryotic_gene_modeling_wrapper.py` not allowing for `Tiara` to run in backend.
* Added `compile_phylogenomic_functional_categories.py` script which automates the methodology from [Espinoza et al. 2022 (doi:10.1093/pnasnexus/pgac239)](https://academic.oup.com/pnasnexus/article/1/5/pgac239/6762943)
- Python
Published by jolespin over 2 years ago
veba - VEBA_v1.4.2
- [2023.12.21] - `GTDB-Tk` changed name of archaea summary file so VEBA was not adding this to final classification. Fixed this in `classify-prokaryotic.py`.
- [2023.12.20] - Fixed files not being closed in `compile_custom_humann_database_from_annotations.py` and added options to use different annotation file formats (i.e., multilevel, header, and no header).
- Python
Published by jolespin over 2 years ago
veba - VEBA_v1.4.1
Release v1.4.1 Highlights:
- `VEBA` Modules:
  - Added `profile-taxonomic.py` module which uses `sylph` to build a sketch database for genomes and queries the genome database for taxonomic abundance.
  - Added long read support for `fastq_preprocessor`, `preprocess.py`, `assembly-long.py`, `coverage-long`, and all binning modules.
  - Redesigned `binning-eukaryotic` module to handle custom `MetaEuk` databases
  - Added new usage syntax `veba --module preprocess --params "${PARAMS}"` where the Conda environment is abstracted and determined automatically in the backend. Changed all the walkthroughs to reflect this change.
  - Added `skani` which is the new default for genome-level clustering based on ANI.
  - Added `Diamond DeepClust` as an alternative to `MMSEQS2` for protein clustering.
- `VEBA` Database (`VDB_v6`):
  - Completely rebuilt `VEBA's Microeukaryotic Protein Database` to produce a clustered database `MicroEuk100/90/50` similar to `UniRef100/90/50`. Available on doi:10.5281/zenodo.10139450.
Number of sequences:
- MicroEuk100 = 79,920,431 (19 GB)
- MicroEuk90 = 51,767,730 (13 GB)
- MicroEuk50 = 29,898,853 (6.5 GB)
Number of source organisms per dataset:
- MycoCosm = 2503
- PhycoCosm = 174
- EnsemblProtists = 233
- MMETSP = 759
- TARA_SAGv1 = 8
- EukProt = 366
- EukZoo = 27
- TARA_SMAGv1 = 389
- NR_Protists-Fungi = 48217
**Release v1.4.0 Details**
* [2023.12.15] - Added `profile-taxonomic.py` module which uses `sylph` to build a sketch database for genomes and queries the genome database similar to `Kraken` for taxonomic abundance.
* [2023.12.14] - Removed requirement to have `--estimated_assembly_size` for Flye per [Flye Issue #652](https://github.com/fenderglass/Flye/issues/652).
* [2023.12.14] - Added `sylph` to `VEBA-profile_env` for abundance profiling of genomes.
* [2023.12.13] - Dereplicate duplicate contigs in `concatenate_fasta.py`.
* [2023.12.12] - Added `--reference_gzipped` to `index.py` and `mapping.py` with new default being that the reference fasta is not gzipped.
* [2023.12.11] - Added `skani` as new default for genome clustering in `cluster.py`, `global_clustering.py`, and `local_clustering.py`.
* [2023.12.11] - Added support for long reads in `fastq_preprocessor`, `preprocess.py`, `assembly-long.py`, `coverage-long`, and all binning modules.
* [2023.11.28] - Fixed `annotations.protein_clusters.tsv.gz` from `merge_annotations.py` added in patch update of `v1.3.1`.
* [2023.11.14] - Added support for missing values in `compile_eukaryotic_classifications.py`.
* [2023.11.13] - Added `--metaeuk_split_memory_limit` argument with (experimental) default set to `36G` in `binning-eukaryotic.py` and `eukaryotic_gene_modeling.py`.
* [2023.11.10] - Added `--compressed 1` to `mmseqs createdb` in `download_databases.sh` installation script.
* [2023.11.10] - Added a check to `check_fasta_duplicates.py` and `clean_fasta.py` to make sure there are no `>` characters in fasta sequence caused from concatenating fasta files that are missing linebreaks.
* [2023.11.10] - Added `Diamond DeepClust` to `clustering_wrapper.py`, `global/local_clustering.py`, and `cluster.py`. Changed `mmseqs2_wrapper.py` to `clustering_wrapper.py`. Changed `easy-cluster` and `easy-linclust` to `mmseqs-cluster` and `mmseqs-linclust`.
* [2023.11.9] - Fixed viral quality in `merge_genome_quality_assessments.py`
* [2023.11.3] - Changed `consensus_genome_classification.py` to `consensus_genome_classification_ranked.py`. Also, default behavior to allow for missing taxonomic levels.
* [2023.11.2] - Fixed the `merge_annotations.py` resulting in a memory leak when creating the `annotations.protein_clusters.tsv.gz` output table. However, still need to correct the formatting for empty sets and string lists.
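The `>` check added on 2023.11.10 guards against a specific failure: when a fasta file without a final linebreak is concatenated with another, the next header is fused onto the last sequence line. A minimal sketch of such a check follows; the function name is hypothetical and this is not the code from `check_fasta_duplicates.py` or `clean_fasta.py`.

```python
def find_embedded_headers(lines):
    """Return 1-based line numbers of sequence lines that contain a '>' character."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Header lines legitimately start with '>'; only sequence lines are checked
        if not line.startswith(">") and ">" in line:
            bad.append(number)
    return bad


# "ACGT>seq_2" is what a missing linebreak between two files produces
print(find_embedded_headers([">seq_1", "ACGT>seq_2", "TTGA"]))
```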
- Python
Published by jolespin over 2 years ago
veba - VEBA_v1.3.0
Release v1.3.0:
- `VEBA` Modules:
  - Added `profile-pathway.py` module and associated scripts for building `HUMAnN` databases from de novo genomes and annotations. Essentially, a reads-based functional profiling method via `HUMAnN` using binned genomes as the database.
  - Added `marker_gene_clustering.py` script which identifies core marker proteins that are present in all genomes within a genome cluster (i.e., pangenome) and unique to only that genome cluster. Clusters in either protein or nucleotide space.
  - Added `module_completion_ratios.py` script which calculates KEGG module completion ratios for genomes and pangenomes. Automatically run in backend of `annotate.py`.
  - Updated `annotate.py` and `merge_annotations.py` to provide better annotations for clustered proteins.
  - Added `merge_genome_quality.py` and `merge_taxonomy_classifications.py` which compile genome quality and taxonomy, respectively, for all organisms.
  - Added BGC clustering in protein and nucleotide space to `biosynthetic.py`. Also, produces prevalence tables that can be used for further clustering of BGCs.
  - Added `pangenome_core_sequences` in `cluster.py` which writes both protein and CDS sequences for each genome cluster.
  - Added PDF visualization of newick trees in `phylogeny.py`.
- `VEBA` Database (`VDB_v5.2`):
  - Added `CAZy`
  - Added `MicrobeAnnotator-KEGG`
**Release v1.3.0 Details**
* Updated `annotate.py` and `merge_annotations.py` to handle `CAZy`. They also properly address clustered protein annotations now.
* Added `module_completion_ratio.py` script which is a fork of `MicrobeAnnotator` [`ko_mapper.py`](https://github.com/cruizperez/MicrobeAnnotator/blob/master/microbeannotator/pipeline/ko_mapper.py). Also included a database [Zenodo: 10020074](https://zenodo.org/records/10020074) which will be included in `VDB_v5.2`
* Added a checkpoint for `tRNAscan-SE` in `binning-prokaryotic.py` and `eukaryotic_gene_modeling_wrapper.py`.
* Added `profile-pathway.py` module and `VEBA-profile_env` environments which is a wrapper around `HUMAnN` for the custom database created from `annotate.py` and `compile_custom_humann_database_from_annotations.py`
* Added `GenoPype version` to log output
* Added `merge_genome_quality.py` which combines `CheckV`, `CheckM2`, and `BUSCO` results.
* Added `compile_custom_humann_database_from_annotations.py` which compiles a `HUMAnN` protein database table from the output of `annotate.py` and taxonomy classifications.
* Added functionality to `merge_taxonomy_classifications.py` to allow for `--no_domain` and `--no_header` which will serve as input to `compile_custom_humann_database_from_annotations.py`
* Added `marker_gene_clustering.py` script which gets core marker genes unique to each SLC (i.e., pangenome). Added `average_number_of_copies_per_genome` to protein clusters.
* Added `--minimum_core_prevalence` in `global_clustering.py`, `local_clustering.py`, and `cluster.py` which indicates the prevalence ratio at which protein clusters in an SLC are considered core. Also removed `--no_singletons` from `cluster.py` to avoid complications with marker genes. Relabeled `--input` to `--genomes_table` in clustering scripts/module.
* Added a check in `coverage.py` to see if the `mapped.sorted.bam` files are created, if they are then skip them. Not yet implemented for GNU parallel option.
* Changed default representative sequence format from table to fasta for `mmseqs2_wrapper.py`.
* Added `--nucleotide_fasta_output` to `antismash_genbank_to_table.py` which outputs the actual BGC DNA sequence. Changed `--fasta_output` to `--protein_fasta_output` and added output to `biosynthetic.py`. Changed BGC component identifiers to `[bgc_id]_[position_in_bgc]|[start]:[end]([strand])` to match with `MetaEuk` identifiers. Changed `bgc_type` to `protocluster_type`. `biosynthetic.py` now supports GFF files from `MetaEuk` (exon and gene features not supported by `antiSMASH`). Fixed error related to `antiSMASH` adding CDS (i.e., `allorf_[start]_[end]`) that are not in GFF so `antismash_genbank_to_table.py` failed in those cases.
* Added `ete3` to `VEBA-phylogeny_env.yml` and automatically renders trees to PDF.
* Added presets for `MEGAHIT` using the `--megahit_preset` option.
* The change for using `--mash_db` with `GTDB-Tk` violated the assumption that all prokaryotic classifications had a `msa_percent` field which caused the cluster-level taxonomy to fail. `compile_prokaryotic_genome_cluster_classification_scores_table.py` fixes this by using `fastani_ani` as the weight when genomes were classified using ANI and `msa_percent` for everything else. Initial error caused unclassified prokaryotic for all cluster-level classifications.
* Fixed small error where empty gff files with an asterisk in the name were created for samples that didn't have any prokaryotic MAGs.
* Fixed critical error where descriptions in header were not being removed in `eukaryota.scaffolds.list` and did not remove eukaryotic scaffolds in `seqkit grep` so `DAS_Tool` output eukaryotic MAGs in `identifier_mapping.tsv` and `__DASTool_scaffolds2bin.no_eukaryota.txt`
* Fixed `krona.html` in `biosynthetic.py` which was being created incorrectly from `compile_krona.py` script.
* Created `pangenome_core_sequences` in `global_clustering.py` and `local_clustering.py` which writes both protein and CDS sequences for each SLC. Also made default in `cluster.py` to NOT do local clustering, switching `--no_local_clustering` to `--local_clustering`.
* Fixed `pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects` in `biosynthetic.py` when `Diamond` finds multiple regions in one hit that match. Added `--sort_by` and `--ascending` to `concatenate_dataframes.py` along with automatic detection and removal of duplicate indices. Also added `--sort_by bitscore` in `biosynthetic.py`.
* Added core pangenome and singleton hits to clustering output
* Updated `--megahit_memory` default from 0.9 to 0.99
* Fixed error in `genomad_taxonomy_wrapper.py` where `viral_taxonomy.tsv` should have been `taxonomy.tsv`.
* Fixed minor error in `assembly.py` that was preventing users from using `SPAdes` programs that were not `spades.py`, `metaspades.py`, or `rnaspades.py`, which was the result of incorrect string formatting.
* Updated `bowtie2` in preprocess, assembly, and mapping modules. Updated `fastp` and `fastq_preprocessor` in preprocess module.
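The marker-gene idea described in this release (protein clusters that are core to one genome cluster and absent from every other) can be sketched with plain sets. This is an illustrative reading of the description, assuming "core" means a prevalence ratio of at least `minimum_core_prevalence` within the SLC; the function name and data layout are hypothetical and not taken from `marker_gene_clustering.py`.

```python
def marker_protein_clusters(genome_to_proteins, genome_to_slc, minimum_core_prevalence=1.0):
    """Return {slc: protein clusters that are core to that SLC and found in no other SLC}.

    genome_to_proteins: {genome_id: set of protein-cluster ids}
    genome_to_slc: {genome_id: genome-cluster (SLC) id}
    """
    slc_to_genomes = {}
    for genome, slc in genome_to_slc.items():
        slc_to_genomes.setdefault(slc, []).append(genome)

    core, present = {}, {}
    for slc, genomes in slc_to_genomes.items():
        counts = {}
        for genome in genomes:
            for protein in genome_to_proteins.get(genome, set()):
                counts[protein] = counts.get(protein, 0) + 1
        present[slc] = set(counts)
        # Core: prevalence across the SLC's genomes meets the threshold
        core[slc] = {p for p, n in counts.items() if n / len(genomes) >= minimum_core_prevalence}

    markers = {}
    for slc in core:
        elsewhere = set()
        for other, proteins in present.items():
            if other != slc:
                elsewhere |= proteins
        # Marker: core here and never observed in any other SLC
        markers[slc] = core[slc] - elsewhere
    return markers
```

Lowering `minimum_core_prevalence` below `1.0` tolerates incomplete genomes, at the cost of admitting protein clusters that are missing from some members of the SLC.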
- Python
Published by jolespin over 2 years ago
veba - VEBA_v1.2.0
Release v1.2.0:
- Fixed minor error in `binning-prokaryotic.py` where the `--veba_database` argument wasn't utilized and only the environment variable `VEBA_DATABASE` could be used.
- Updated the Docker images to have `/volumes/input`, `/volumes/output`, and `/volumes/database` directories to mount.
- Replaced `prodigal` with `pyrodigal` as it is faster and under active development.
- Added support for missing classifications in `compile_krona.py` and `consensus_genome_classification.py`.
- Updated `GTDB-Tk` from version `2.1.3` → `2.3.0` and `GTDB` from version `r202_v2` → `r214`. Changed `${VEBA_DATABASE}/Classify/GTDBTk` → `${VEBA_DATABASE}/Classify/GTDB`. Added `gtdb_r214.msh` to `GTDB` database for ANI screening.
- Added pangenome and singularity tables to `cluster.py` (and associated global/local clustering scripts) to output automatically.
- Added `compile_gff.py` to merge CDS, rRNA, and tRNA GFF files. Used in `binning-prokaryotic.py` and `binning-viral.py`. `binning-eukaryotic.py` uses the source of this in the backend of `filter_busco_results.py`. Includes GC content for contigs and various tags.
- Updated `BUSCO v5.3.2 -> v5.4.3` which changes the json output structure and made the appropriate changes in `filter_busco_results.py`.
- Added `eukaryotic_gene_modeling_wrapper.py` which 1) splits nuclear, mitochondrial, and plastid genomes; 2) performs gene modeling via `MetaEuk` and `Pyrodigal`; 3) performs rRNA detection via `BARRNAP`; 4) performs tRNA detection via `tRNAscan-SE`; 5) merges processed GFF files; and 6) calculates sequence statistics.
- Added `gene_biotype=protein_coding` to `P(y)rodigal(-GV)` GFF output.
- Added `VFDB` to `annotate.py` and database.
- Compiled and pushed `gtdb_r214.msh` mash file to Zenodo:8048187 which is now used by default in `classify-prokaryotic.py`. It is now included in `VDB_v5.1`.
- Cleaned up global and local clustering intermediate files. Added pangenome tables and singleton information to outputs.
- Python
Published by jolespin almost 3 years ago
veba - VEBA_v1.1.2
Release v1.1.2
- Created Docker images for all modules
- Replaced all absolute path symlinks with relative symlinks
- Changed `prokaryotic_taxonomy.tsv` and `prokaryotic_taxonomy.clusters.tsv` in `classify-prokaryotic.py` (along with eukaryotic and viral) files to `taxonomy.tsv` and `taxonomy.clusters.tsv` for uniformity.
- Updated all symlinks to relative links (also in `fastq_preprocessor`) to prepare for dockerization and updated all environments to use GenoPype 2023.4.13.
- Changed `nr` to `uniref` in `annotate.py` and added `propagate_annotations_from_representatives.py` script while simplifying `merge_annotations_and_taxonomy.py` to `merge_annotations.py` and excluding taxonomy operations.
- Changed `nr` to `UniRef90` and `UniRef50` in `VDB_v5`
- Changed `orfs_to_orthogroups.tsv` to `proteins_to_orthogroups.tsv` for consistency with the `cluster.py` module. Will eventually find some consistency with `scaffolds_to_bins`/`scaffolds_to_mags` but this will be later.
- Added a `scaffolds_to_mags.tsv` in the clustering output.
- Added `convert_counts_table.py` which converts a counts table (and metadata) to Pandas pickle, Anndata h5ad, or Biom hdf5
- Fixed output directory for `mapping.py` which now uses `output_directory/${NAME}` structure like `binning-*.py`.
- Removed "python" prefix for script calls and now uses shebang in script for executable. Also added single quotes around script filepath (e.g., `'[script_filepath]'`) to escape characters/spaces in filepath.
- Added support for `index.py` to accept individual `--references [file.fasta]` and `--gene_models [file.gff]`.
- Added `stdin` support for `scaffolds_to_bins.py` along with the ability to input genome tables `[id_genome][filepath]`. Also added progress bars.
- As a result of issues/22, `assembly.py`, `assembly-sequential.py`, `binning-*.py`, and `mapping.py` will use `-p --countReadPairs` for `featureCounts` and updates `subread 2.0.1 -> subread 2.0.3`. For `binning-*.py`, long reads can be used with the `--long_reads` flag.
- Updated `cluster.py` and associated `global_clustering.py`/`local_clustering.py` scripts to use `mmseqs2_wrapper.py` which now automatically outputs representative sequences.
- Added `check_fasta_duplicates.py` script that gives `0` and `1` exit codes for fasta without and with duplicates, respectively. Added `reformat_representative_sequences.py` to reformat representative sequences from `MMSEQS2` into either a table or fasta file where the identifiers are cluster labels. Removed `--dbtype` from `[global/local]_clustering.py`. Removed appended prefix for `.graph.pkl` and `dict.pkl` in `edgelist_to_clusters.py`. Added `mmseqs2_wrapper.py` and `hmmer_wrapper.py` scripts.
- Added an option to `merge_generalized_mapping.py` to include the sample index in a filepath and also an option to remove empty features (useful for Salmon). Added an `executable='/bin/bash'` option to the `subprocess.Popen` calls in `GenoPype` to address issues/23.
- Added `genbanks/[id_genome]/` to output directory of `biosynthetic.py` which has symlinks to all the BGC genbanks from `antiSMASH`.
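The exit-code convention described above for `check_fasta_duplicates.py` (`0` without duplicates, `1` with duplicates) might look something like the sketch below. It is not the script's source: the function names are hypothetical, and comparing only the identifier (the first whitespace-delimited token of each header) is an assumption.

```python
import io
from collections import Counter


def find_duplicate_ids(handle):
    """Return the sorted identifiers that appear more than once in a FASTA stream."""
    counts = Counter(
        line[1:].split()[0]
        for line in handle
        if line.startswith(">") and line[1:].strip()
    )
    return sorted(identifier for identifier, n in counts.items() if n > 1)


def exit_code(handle):
    """Return 0 when every identifier is unique and 1 otherwise."""
    return 1 if find_duplicate_ids(handle) else 0


# In-memory FASTA text stands in for real files.
unique = io.StringIO(">a desc\nACGT\n>b\nGGCC\n")
duplicated = io.StringIO(">a\nACGT\n>a other\nGGCC\n")
print(exit_code(unique), exit_code(duplicated))
```

A command-line wrapper would pass the result to `sys.exit` so that shell pipelines can branch on it.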
- Python
Published by jolespin about 3 years ago
veba - VEBA_v1.1.1
Minor updates from v1.1.0.
- Most important update includes fixing a broken `VEBA-binning-viral.yml` install recipe which had package conflicts for aria2 https://github.com/jolespin/veba/commit/30e8b0a6aa6612c4db201423b304fc57362f996b.
- Fixes on conda-related environment variables in the install scripts.
- Added MIBiG to database and `annotate.py`
- Added a composite label for annotations in `annotate.py`
- Added `--dastool_minimum_score` to `binning-prokaryotic.py` module
- Added a wrapper around `STAR` aligner
- Updated `merge_generalized_mapping.py` script to take in BAM files instead of being dependent on a specific directory.
- Added option to have no header in `subst_table.py`
- Python
Published by jolespin about 3 years ago
veba - VEBA_v1.1.0
Release v1.1.0
Modules:
- `annotate.py`
  - Added `NCBIfam-AMRFinder` AMR domain annotations
  - Added `AntiFam` contamination annotations
  - Uses `taxopy` instead of `ete3` in backend with `merge_annotations_and_score_taxonomy.py`
- `assembly.py`
  - Added a `transcripts_to_genes.py` script which creates a `genes_to_transcripts.tsv` table that can be used with `TransDecoder`.
- `binning-prokaryotic.py`
  - Updated `CheckM` → `CheckM2`. This removes the dependency of `GTDB-Tk` and EXTREMELY REDUCES compute resource requirements (e.g., memory and time) as `CheckM2` automatically handles candidate phyla radiation. With this, several backend scripts were deprecated. This cleans up the binning pipeline and error messages SUBSTANTIALLY.
  - Uses `binning_wrapper.py` for all binning. This makes it easier to add new binning algorithms in the future (e.g., `VAMB`). Also, check out the new multi-split binning functionality described below.
  - Added `--skip_concoct` in addition to the already existing `--skip_maxbin2` option as `MaxBin2` takes very long when there are a lot of contigs and `CONCOCT` takes a long time when there are a lot of samples (i.e., BAM files). `MetaBAT2` is not optional.
- `binning-viral.py`
  - Complete rewrite of this module which now uses `geNomad` as the default binning algorithm but still supports `VirFinder`.
  - If `VirFinder` is used, `genomad annotate` is run via the `genomad_taxonomy_wrapper.py` script included in the update.
  - Updated `Prodigal` → `Prodigal-GV` to handle additional viral genetic codes.
- `biosynthetic.py`
  - Introduces `component_id` and `bgc_id` which are unique, parseable, and informative. For example, `component_id = SRR17458614__CONCOCT__P.2__9|NODE_3319_length_2682_cov_2.840502|region001_1|2-2681(+)` contains the unique `bgc_id` (i.e., `SRR17458614__CONCOCT__P.2__9|NODE_3319_length_2682_cov_2.840502|region001`), shows that it is the 1st gene in the cluster (the `_1` in `region001_1`), and the gene start/end/strand. The `bgc_id` is composed of the `genome_id|contig_id|region_id`.
- `classify-prokaryotic.py`
  - Updated `GTDB-Tk v2.1.1` → `GTDB-Tk v2.2.3`. For now, `--skip_ani_screen` is the only option because of this thread. However, `--mash_db` may be an option in the near future.
  - Added functionality to classify prokaryotic genomes that were not binned via `VEBA` which is available with the `--genomes` option (`--prokaryotic_binning_directory` is still available which can leverage existing intermediate files).
- `classify-eukaryotic.py`
  - Added functionality to classify eukaryotic genomes that were not binned via `VEBA` which is available with the `--genomes` option (`--eukaryotic_binning_directory` is still available which can leverage existing intermediate files). This is implemented by using the `eukaryota_odb10` markers from the `VEBA Microeukaryotic Database` to substantially improve performance and decrease resources required for gene models.
- `classify-viral.py`
  - Complete rewrite of this module which does not rely on (deprecated) intermediate files from `CheckV`.
  - Uses taxonomy generated from `geNomad` and `consensus_genome_classification_unranked.py` (a wrapper around `taxopy`) that can handle the chaotic taxonomy of viruses.
  - Added functionality to classify viral genomes that were not binned via `VEBA` which is available with the `--genomes` option (`--viral_binning_directory` is still available which can leverage existing intermediate files).
- `cluster.py`
  - Complete rewrite of this module which now uses `MMSEQS2` as the orthogroup detection algorithm instead of `OrthoFinder`. `OrthoFinder` is overkill for creating protein clusters and it generates thousands of intermediate files (e.g., fasta, alignments, trees, etc.) which substantially increases the compute time. `MMSEQS2` has very similar performance with a fraction of the resources and compute time. Clustered the entire Plastisphere dataset on a local machine in ~30 minutes compared to several days on an HPC.
  - Now that the resources are minimal, clustering is performed at the global level as before (i.e., all samples in the dataset) and now also at the local level, optionally but ON by default, which clusters all genomes within a sample. Accompanying wrapper scripts are `global_clustering.py` and `local_clustering.py`.
  - The genomic and functional feature compression ratios (FCR) (described here) are now calculated automatically. The calculation is `1 - number_of_clusters/number_of_features` which can easily be converted into an unsupervised biodiversity metric. This is calculated at the global (original implementation) and local levels.
  - Input is now a table with the following columns: `[organism_type]<tab>[id_sample]<tab>[id_mag]<tab>[genome]<tab>[proteins]` and is generated easily with the `compile_genomes_table.py` script. This allows clustering to be performed for prokaryotes, eukaryotes, and viruses all at the same time.
  - SLC-specific orthogroups (SSO) are now referred to as SLC-specific protein clusters (SSPC).
  - Support zfilling (e.g., `zfill=3`, SLC7 → SLC007) for genomic and protein clusters.
  - Deprecated `fastani_to_clusters.py` to now use the more generalizable `edgelist_to_clusters.py` which is used for both genomic and protein clusters. This also outputs a `NetworkX` graph and a pickled dictionary `{"cluster_a":{"component_1", "component_2", ..., "component_n"}}`
- `phylogeny.py`
  - Updated `MUSCLE` to `v5` which has `-align` and `-super5` algorithms which are now accessible with `--alignment_algorithm`. Cannot use `stdin` so now the fasta files are not gzipped. The `merge_msa.py` script now outputs uncompressed fasta by default and can output gzipped with the `--gzip` flag.
- VEBA Database: `VDB_v3.1` → `VDB_v4`
  - Updated `CheckV DB v1.0` → `CheckV DB v1.5`
  - Added `geNomad DB v1.2`
  - Added `CheckM2 DB`
  - Removed `CheckM DB`
  - Removed `taxa.sqlite` and `taxa.sqlite.traverse.pkl`
  - Added `reference.eukaryota_odb10.list` and corresponding `MMSEQS2` database (i.e., `microeukaryotic.eukaryota_odb10`)
  - Added `NCBIfam-AMRFinder` marker set for annotation
  - Added `AntiFam` marker set for contamination
  - Marker sets HMMs are now all gzipped (previously could not gzip because of the `CheckM` CPR workflow)
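The feature compression ratio and the zero-padded cluster labels described for `cluster.py` above can be illustrated with a small sketch. The formula `1 - number_of_clusters/number_of_features` and the `SLC7 → SLC007` example come from the notes; the function names, the `SLC` prefix default, and the toy genome assignments are assumptions made for illustration.

```python
def feature_compression_ratio(number_of_features, number_of_clusters):
    """FCR = 1 - number_of_clusters/number_of_features (formula from the notes)."""
    if number_of_features <= 0:
        raise ValueError("number_of_features must be positive")
    return 1 - number_of_clusters / number_of_features


def zfill_cluster_label(label, prefix="SLC", zfill=3):
    """Zero-pad the numeric part of a cluster label, e.g. SLC7 -> SLC007."""
    if not label.startswith(prefix):
        raise ValueError(f"{label!r} does not start with {prefix!r}")
    return prefix + label[len(prefix):].zfill(zfill)


# Hypothetical genome -> cluster assignments, for illustration only.
genome_to_cluster = {
    "MAG_A": "SLC1",
    "MAG_B": "SLC1",
    "MAG_C": "SLC7",
    "MAG_D": "SLC7",
}

fcr = feature_compression_ratio(
    number_of_features=len(genome_to_cluster),
    number_of_clusters=len(set(genome_to_cluster.values())),
)
labels = sorted({zfill_cluster_label(c) for c in genome_to_cluster.values()})
print(fcr, labels)
```

Here four genomes collapse into two clusters, so the ratio is 0.5. A ratio near 0 means almost every feature forms its own cluster, and a ratio near 1 means heavy redundancy.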
Scripts:
- Added:
  - `append_geneid_to_transdecoder_gff.py`
  - `bowtie2_wrapper.py`
  - `compile_genomes_table.py`
  - `consensus_genome_classification_unranked.py`
  - `cut_table.py`
  - `cut_table_by_column_labels.py`
  - `drop_missing_values.py`
  - `edgelist_to_clusters.py`
  - `filter_checkm2_results.py`
  - `genomad_taxonomy_wrapper.py`
  - `global_clustering.py`
  - `local_clustering.py`
  - `partition_multisplit_bins.py`
  - `scaffolds_to_clusters.py`
  - `scaffolds_to_samples.py`
  - `transcripts_to_genes.py`
  - `transdecoder_wrapper.py` (Note: Requires separate environment to run due to dependency conflicts)
- Updated:
  - `antismash_genbanks_to_table.py` - Added option to output biosynthetic gene cluster (BGC) fasta. Adds unique (and parseable) BGC identifiers making the output much more useful.
  - `binning_wrapper.py` - This binning wrapper now includes functionality to use multi-split binning (i.e., concatenate contigs from different assemblies, map all reads to the contigs, bin all together, and then partition bins by sample). This concept AFAIK was first introduced in the `VAMB` paper.
  - `compile_reads_table.py` - Minimal change but now the extension excludes the `.` to make usage more consistent with other tools.
  - `consensus_genome_classification.py` - Changed the output to match that of `consensus_genome_classification_unranked.py`.
  - `filter_checkv_results.py` - Option to use taxonomy and viral summaries generated by `geNomad`.
  - `scaffolds_to_bins.py` - Support for getting scaffolds to bins for a list of genomes via `--genomes` argument while maintaining original support with `--binning_directory` argument.
  - `subset_table.py` - Added option to set index column and to drop duplicates.
  - `virfinder_wrapper.r` - Used to be `VirFinder_wrapper.R`. This now has an option to use FDR values instead of P values.
  - `merge_annotations_and_score_taxonomy.py` - Completely rewritten. Uses `taxopy` instead of `ete3`.
  - `merge_msa.py` - Output uncompressed protein fasta files by default and can compress with `--gzip` flag.
- Deprecated:
  - `adjust_genomes_for_cpr.py`
  - `filter_checkm_results.py`
  - `fastani_to_clusters.py`
  - `partition_orthogroups.py`
  - `partition_clusters.py`
  - `compile_viral_classifications.py`
  - `build_taxa_sqlite.py`
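Because the BGC identifiers introduced in this release are meant to be parseable, a short sketch of pulling one apart may be useful. It targets only the v1.1.0 layout shown in the `biosynthetic.py` notes above (`genome_id|contig_id|region_id_position|start-end(strand)`). The `Component` type and `parse_component_id` function are hypothetical names, not part of `VEBA`.

```python
import re
from typing import NamedTuple

# Matches the trailing location field, e.g. "2-2681(+)".
LOCATION = re.compile(r"^(\d+)-(\d+)\(([+-])\)$")


class Component(NamedTuple):
    genome_id: str
    contig_id: str
    region_id: str
    position: int
    start: int
    end: int
    strand: str

    @property
    def bgc_id(self):
        """bgc_id is genome_id|contig_id|region_id, as described in the notes."""
        return "|".join((self.genome_id, self.contig_id, self.region_id))


def parse_component_id(component_id):
    """Split a v1.1.0-style component_id into its fields (illustrative only)."""
    genome_id, contig_id, region_position, location = component_id.rsplit("|", 3)
    region_id, _, position = region_position.rpartition("_")
    match = LOCATION.match(location)
    if match is None:
        raise ValueError(f"Unrecognized location field: {location!r}")
    start, end, strand = match.groups()
    return Component(
        genome_id, contig_id, region_id, int(position), int(start), int(end), strand
    )


example = (
    "SRR17458614__CONCOCT__P.2__9|NODE_3319_length_2682_cov_2.840502"
    "|region001_1|2-2681(+)"
)
component = parse_component_id(example)
print(component.bgc_id, component.position, component.strand)
```

Later releases changed the component identifier layout, so a parser along these lines would need adjusting for newer outputs.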
Miscellaneous:
- Updated environments and now add versions to environments.
- Added `mamba` to installation to speed up.
- Added `transdecoder_wrapper.py` which is a wrapper around `TransDecoder` with direct support for `Diamond` and `HMMSearch` homology searches. Also includes `append_geneid_to_transdecoder_gff.py` which is run in the backend to clean up the GFF file and make them compatible with what is output by `Prodigal` and `MetaEuk` runs of `VEBA`.
- Added support for using `n_jobs -1` to use all available threads (similar to `scikit-learn` methodology).
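The `n_jobs -1` convention mentioned above can be sketched as follows. This is an assumption about how such an option is typically resolved rather than a copy of the `VEBA` code, and `resolve_n_jobs` is a hypothetical helper name.

```python
import os


def resolve_n_jobs(n_jobs):
    """Translate an n_jobs argument into a concrete thread count.

    -1 means "use every available CPU". Any other value below 1 is
    rejected here, which is a simplification compared with scikit-learn.
    """
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    if n_jobs == -1:
        return available
    if n_jobs < 1:
        raise ValueError(f"n_jobs must be -1 or a positive integer, got {n_jobs}")
    return n_jobs


print(resolve_n_jobs(4), resolve_n_jobs(-1))
```

Resolving the value once at startup keeps every downstream tool call working from the same explicit thread count.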
- Python
Published by jolespin about 3 years ago
veba - VEBA_v1.0.4
Release v1.0.4
- Added `biopython` to `VEBA-assembly_env` which is needed when running `MEGAHIT` as the scaffolds are rewritten and an error was raised. aea51c3
- Updated Microeukaryotic protein database to exclude a few higher eukaryotes that were present in database, changed naming scheme to hash identifiers (from `cat reference.faa | seqkit fx2tab -s -n > id_to_hash.tsv`). Switching database from FigShare to Zenodo. Uses database version `VDB_v3` which has the updated microeukaryotic protein database (`VDB-Microeukaryotic_v2`) 0845ba6
- Python
Published by jolespin over 3 years ago
veba - VEBA_v1.0.3e
If you have 1.0.3 ≤ version < 1.0.3e, you can update easily on Patch Fix #1
Release v1.0.3e
- Patch fix for `install_veba.sh` where `install/environments/VEBA-assembly_env.yml` raised a compatibility error when creating the `VEBA-assembly_env` environment c2ab957
- Patch fix for `VirFinder_wrapper.R` where `__version__ =` variable was throwing an R error when running `binning-viral.py` module. 19e8f38
- Patch fix for `filter_busco_results.py` where an error arose that produced empty `identifier_mapping.metaeuk.tsv` subset tables. 359e4569
- Patch fix for `compile_metaeuk_identifiers.py` where a Python error arose when duplicate gene identifiers were present. c248527
- Patch fix for `install_veba.sh` where `install/environments/VEBA-preprocess_env.yml` raised a compatibility error when creating the `VEBA-preprocess_env` environment 8ed6eea
- Added `biosynthetic.py` module which runs antiSMASH and converts genbank files to tabular format. 6c0ed82
- Added `megahit` support for `assembly.py` module (not yet available in `assembly-sequential.py`). 6c0ed82
- Changed `-P/--spades_program` to `-P/--program` for `assembly.py`. 6c0ed82
- Replaced penultimate step in `binning-prokaryotic.py` to use `adjust_genomes_for_cpr.py` instead of the extremely long series of bash commands. This will make it easier to diagnose errors in this critical step. 6c0ed82
- Added support for contig descriptions and added MAG identifier in fasta files in `binning-eukaryotic.py`. Now uses the `metaeuk_wrapper.py` script for the `MetaEuk` step. 6c0ed82
- Added separate option of `--run_metaplasmidspades` for `assembly-sequential.py` instead of making it mandatory (now it just runs `biosyntheticSPAdes` and `metaSPAdes` by default). 6c0ed82
- Added `--use_mag_as_description` in `partition_gene_models.py` script to include the MAG identifier in the contig description of the fasta header which is default in `binning-prokaryotic.py`. 6c0ed82
- Added `adjust_genomes_for_cpr.py` script to make the CPR adjustment step of `binning-prokaryotic.py` easier to run and understand. 6c0ed82
- Added support for fasta header descriptions in `binning-prokaryotic.py`. 6c0ed82
- Added functionality to `replace_fasta_descriptions.py` script to be able to use a string for replacing fasta headers in addition to the original functionality. 6c0ed82
- Python
Published by jolespin over 3 years ago
veba - VEBA_v1.0.2a
Release v1.0.2a
Not to be confused with v1.0.2 which is deprecated
- Updated GTDB-Tk in `VEBA-binning-prokaryotic_env` from `1.x` to `2.x` (this version uses much less memory): f3507dd
- Updated the GTDB-Tk database from `R202` to `R207_v2` to be compatible with GTDB-Tk v2.x: f3507dd
- Updated the GRCh38 no-alt analysis set to T2T CHM13v2.0 for the default human reference: 5ccb4e2
- Added an experimental `amplicon.py` module for short-read ASV detection via the DADA2 workflow of QIIME2: cd4ed2b
- Added additional functionality to `compile_reads_table.py` to handle advanced parsing of samples from fastq directories while also maintaining support for parsing filenames from `veba_output/preprocess`: cd4ed2b
- Added `sra-tools` to `VEBA-preprocess_env`: f3507dd
- Fixed symlinks to scripts for `install_veba.sh`: d1fad03
- Added missing `CHECKM_DATA_PATH` environment variable to `VEBA-binning-prokaryotic_env` and `VEBA-classify_env`: d1fad03
- ⚠️ In this version, contigs/scaffolds cannot have descriptions in fasta header for prokaryotic binning (Fixed in versions after 2022.11.07)
Module Versions:
amplicon.py __version__ = "2022.10.24"
annotate.py __version__ = "2021.7.8"
assembly.py __version__ = "2022.03.25"
binning-eukaryotic.py __version__ = "2022.10.20"
binning-prokaryotic.py __version__ = "2022.10.25"
binning-viral.py __version__ = "2022.7.13"
classify-eukaryotic.py __version__ = "2022.7.8"
classify-prokaryotic.py __version__ = "2022.06.07"
classify-viral.py __version__ = "2022.7.13"
cluster.py __version__ = "2022.10.16"
coverage.py __version__ = "2022.06.03"
index.py __version__ = "2022.02.17"
mapping.py __version__ = "2022.8.17"
phylogeny.py __version__ = "2022.06.22"
preprocess.py __version__ = "2022.01.19"
scripts/append_geneid_to_prodigal_gff.py __version__ = "2021.06.19"
scripts/binning_wrapper.py __version__ = "2022.04.11"
scripts/build_taxa_sqlite.py __version__ = "2022.04.18"
scripts/check_scaffolds_to_bins.py __version__ = "2021.08.20"
scripts/compile_binning.py __version__ = "2022.03.23"
scripts/compile_eukaryotic_classifications.py __version__ = "2022.7.8"
scripts/compile_metaeuk_identifiers.py __version__ = "2022.03.18"
scripts/compile_reads_table.py __version__ = "2022.10.24"
scripts/compile_scaffold_identifiers.py __version__ = "2022.02.23"
scripts/compile_viral_classifications.py __version__ = "2022.03.08"
scripts/concatenate_dataframes.py __version__ = "2022.03.24"
scripts/concatenate_fasta.py __version__ = "2022.02.17"
scripts/concatenate_gff.py __version__ = "2022.02.17"
scripts/consensus_domain_classification.py __version__ = "2022.02.28"
scripts/consensus_genome_classification.py __version__ = "2022.7.13"
scripts/consensus_orthogroup_annotation.py __version__ = "2022.02.02"
scripts/determine_trim_position.py __version__ = "2022.8.11"
scripts/fasta_to_saf.py __version__ = "2021.04.04"
scripts/fasta_utility.py __version__ = "2021.07.31"
scripts/fastani_to_clusters.py __version__ = "2021.11.16"
scripts/fastq_position_statistics.py __version__ = "2022.10.24"
scripts/filter_busco_results.py __version__ = "2022.04.04"
scripts/filter_checkm_results.py __version__ = "2022.03.28"
scripts/filter_checkv_results.py __version__ = "2021.08.10"
scripts/filter_hmmsearch_results.py __version__ = "2021.06.16"
scripts/genome_coverage_from_spades.py __version__ = "2022.7.14"
scripts/genome_spatial_coverage.py __version__ = "2022.08.17"
scripts/groupby_table.py __version__ = "2022.08.17"
scripts/hmmer_to_proteins.py __version__ = "2021.08.03"
scripts/insert_column_to_table.py __version__ = "2022.03.24"
scripts/merge_annotations_and_score_taxonomy.py __version__ = "2021.08.25"
scripts/merge_busco_json.py __version__ = "2022.03.10"
scripts/merge_contig_mapping.py __version__ = "2022.06.27"
scripts/merge_fastq_statistics.py __version__ = "2022.03.08"
scripts/merge_gtdbtk.py __version__ = "2022.03.24"
scripts/merge_msa.py __version__ = "2022.06.21"
scripts/merge_orf_mapping.py __version__ = "2021.03.27"
scripts/metaeuk_wrapper.py __version__ = "2022.08.27"
scripts/partition_clusters.py __version__ = "2021.08.12"
scripts/partition_gene_models.py __version__ = "2021.08.24"
scripts/partition_hmmsearch.py __version__ = "2022.06.20"
scripts/partition_multisplit_bins.py __version__ = "2022.04.08"
scripts/partition_orthogroups.py __version__ = "2022.04.01"
scripts/partition_unbinned.py __version__ = "2021.08.05"
scripts/replace_fasta_descriptions.py __version__ = "2022.9.1"
scripts/scaffolds_to_bins.py __version__ = "2021.03.26"
scripts/subset_table.py __version__ = "2022.04.20"
scripts/subset_table_by_column.py __version__ = "2022.04.20"
- Python
Published by jolespin over 3 years ago
veba - VEBA_v1.0.1
Small patch fix:
* Fixed the fatal binning-eukaryotic.py error: https://github.com/jolespin/veba/commit/7c5addf9ed6e8e45502274dd353f20b211838a41
* Fixed the minor file naming in cluster.py: https://github.com/jolespin/veba/commit/58038451dac0791899aa7fca3f9d79454cb9ed46
* Removes left-over human genome tar.gz during database download/config: https://github.com/jolespin/veba/commit/58038451dac0791899aa7fca3f9d79454cb9ed46
* ⚠️ In this version, contigs/scaffolds cannot have descriptions in fasta header for prokaryotic binning (Fixed in versions after 2022.11.07)
Module Versions:
annotate.py __version__ = "2021.7.8"
assembly.py __version__ = "2022.03.25"
binning-eukaryotic.py __version__ = "2022.10.20"
binning-prokaryotic.py __version__ = "2022.7.8"
binning-viral.py __version__ = "2022.7.13"
classify-eukaryotic.py __version__ = "2022.7.8"
classify-prokaryotic.py __version__ = "2022.06.07"
classify-viral.py __version__ = "2022.7.13"
cluster.py __version__ = "2022.10.16"
coverage.py __version__ = "2022.06.03"
index.py __version__ = "2022.02.17"
mapping.py __version__ = "2022.8.17"
phylogeny.py __version__ = "2022.06.22"
preprocess.py __version__ = "2022.01.19"
scripts/append_geneid_to_prodigal_gff.py __version__ = "2021.06.19"
scripts/binning_wrapper.py __version__ = "2022.04.11"
scripts/build_taxa_sqlite.py __version__ = "2022.04.18"
scripts/check_scaffolds_to_bins.py __version__ = "2021.08.20"
scripts/compile_binning.py __version__ = "2022.03.23"
scripts/compile_eukaryotic_classifications.py __version__ = "2022.7.8"
scripts/compile_metaeuk_identifiers.py __version__ = "2022.03.18"
scripts/compile_reads_table.py __version__ = "2021.7.18"
scripts/compile_scaffold_identifiers.py __version__ = "2022.02.23"
scripts/compile_viral_classifications.py __version__ = "2022.03.08"
scripts/concatenate_dataframes.py __version__ = "2022.03.24"
scripts/concatenate_fasta.py __version__ = "2022.02.17"
scripts/concatenate_gff.py __version__ = "2022.02.17"
scripts/consensus_domain_classification.py __version__ = "2022.02.28"
scripts/consensus_genome_classification.py __version__ = "2022.7.13"
scripts/consensus_orthogroup_annotation.py __version__ = "2022.02.02"
scripts/fasta_to_saf.py __version__ = "2021.04.04"
scripts/fasta_utility.py __version__ = "2021.07.31"
scripts/fastani_to_clusters.py __version__ = "2021.06.16"
scripts/filter_busco_results.py __version__ = "2022.04.04"
scripts/filter_checkm_results.py __version__ = "2022.03.28"
scripts/filter_checkv_results.py __version__ = "2021.08.10"
scripts/filter_hmmsearch_results.py __version__ = "2021.06.16"
scripts/genome_coverage_from_spades.py __version__ = "2022.7.14"
scripts/genome_spatial_coverage.py __version__ = "2022.08.17"
scripts/groupby_table.py __version__ = "2022.08.17"
scripts/hmmer_to_proteins.py __version__ = "2021.08.03"
scripts/insert_column_to_table.py __version__ = "2022.03.24"
scripts/merge_annotations_and_score_taxonomy.py __version__ = "2021.08.25"
scripts/merge_busco_json.py __version__ = "2022.03.10"
scripts/merge_contig_mapping.py __version__ = "2022.06.27"
scripts/merge_fastq_statistics.py __version__ = "2022.03.08"
scripts/merge_gtdbtk.py __version__ = "2022.03.24"
scripts/merge_msa.py __version__ = "2022.06.21"
scripts/merge_orf_mapping.py __version__ = "2021.03.27"
scripts/metaeuk_wrapper.py __version__ = "2022.08.27"
scripts/partition_clusters.py __version__ = "2021.08.12"
scripts/partition_gene_models.py __version__ = "2021.08.24"
scripts/partition_hmmsearch.py __version__ = "2022.06.20"
scripts/partition_multisplit_bins.py __version__ = "2022.04.08"
scripts/partition_orthogroups.py __version__ = "2022.04.01"
scripts/partition_unbinned.py __version__ = "2021.08.05"
scripts/scaffolds_to_bins.py __version__ = "2021.03.26"
scripts/subset_table.py __version__ = "2022.04.20"
scripts/subset_table_by_column.py __version__ = "2022.04.20"
- Python
Published by jolespin over 3 years ago
veba - VEBA_v1.0.0
Version released for manuscript submission.
- ⚠️ In this version, contigs/scaffolds cannot have descriptions in fasta header for prokaryotic binning (Fixed in versions after 2022.11.07)
Module Versions:
annotate.py __version__ = "2021.7.8"
assembly.py __version__ = "2022.03.25"
binning-eukaryotic.py __version__ = "2022.7.8"
binning-prokaryotic.py __version__ = "2022.7.8"
binning-viral.py __version__ = "2022.7.13"
classify-eukaryotic.py __version__ = "2022.7.8"
classify-prokaryotic.py __version__ = "2022.06.07"
classify-viral.py __version__ = "2022.7.13"
cluster.py __version__ = "2022.06.04"
coverage.py __version__ = "2022.06.03"
index.py __version__ = "2022.02.17"
mapping.py __version__ = "2022.8.17"
phylogeny.py __version__ = "2022.06.22"
preprocess.py __version__ = "2022.01.19"
scripts/append_geneid_to_prodigal_gff.py __version__ = "2021.06.19"
scripts/binning_wrapper.py __version__ = "2022.04.11"
scripts/build_taxa_sqlite.py __version__ = "2022.04.18"
scripts/check_scaffolds_to_bins.py __version__ = "2021.08.20"
scripts/compile_binning.py __version__ = "2022.03.23"
scripts/compile_eukaryotic_classifications.py __version__ = "2022.7.8"
scripts/compile_metaeuk_identifiers.py __version__ = "2022.03.18"
scripts/compile_reads_table.py __version__ = "2021.7.18"
scripts/compile_scaffold_identifiers.py __version__ = "2022.02.23"
scripts/compile_viral_classifications.py __version__ = "2022.03.08"
scripts/concatenate_dataframes.py __version__ = "2022.03.24"
scripts/concatenate_fasta.py __version__ = "2022.02.17"
scripts/concatenate_gff.py __version__ = "2022.02.17"
scripts/consensus_domain_classification.py __version__ = "2022.02.28"
scripts/consensus_genome_classification.py __version__ = "2022.7.13"
scripts/consensus_orthogroup_annotation.py __version__ = "2022.02.02"
scripts/fasta_to_saf.py __version__ = "2021.04.04"
scripts/fasta_utility.py __version__ = "2021.07.31"
scripts/fastani_to_clusters.py __version__ = "2021.06.16"
scripts/filter_busco_results.py __version__ = "2022.04.04"
scripts/filter_checkm_results.py __version__ = "2022.03.28"
scripts/filter_checkv_results.py __version__ = "2021.08.10"
scripts/filter_hmmsearch_results.py __version__ = "2021.06.16"
scripts/genome_coverage_from_spades.py __version__ = "2022.7.14"
scripts/genome_spatial_coverage.py __version__ = "2022.08.17"
scripts/groupby_table.py __version__ = "2022.08.17"
scripts/hmmer_to_proteins.py __version__ = "2021.08.03"
scripts/insert_column_to_table.py __version__ = "2022.03.24"
scripts/merge_annotations_and_score_taxonomy.py __version__ = "2021.08.25"
scripts/merge_busco_json.py __version__ = "2022.03.10"
scripts/merge_contig_mapping.py __version__ = "2022.06.27"
scripts/merge_fastq_statistics.py __version__ = "2022.03.08"
scripts/merge_gtdbtk.py __version__ = "2022.03.24"
scripts/merge_msa.py __version__ = "2022.06.21"
scripts/merge_orf_mapping.py __version__ = "2021.03.27"
scripts/metaeuk_wrapper.py __version__ = "2022.08.27"
scripts/partition_clusters.py __version__ = "2021.08.12"
scripts/partition_gene_models.py __version__ = "2021.08.24"
scripts/partition_hmmsearch.py __version__ = "2022.06.20"
scripts/partition_multisplit_bins.py __version__ = "2022.04.08"
scripts/partition_orthogroups.py __version__ = "2022.04.01"
scripts/partition_unbinned.py __version__ = "2021.08.05"
scripts/scaffolds_to_bins.py __version__ = "2021.03.26"
scripts/subset_table.py __version__ = "2022.04.20"
scripts/subset_table_by_column.py __version__ = "2022.04.20"
- Python
Published by jolespin over 3 years ago