Recent Releases of bakta
bakta - v1.11.1
This is the first v1.11 patch release (v1.11.1).
Fixes
- Fixed wrong replicon ID matching for user-provided regions: #372 #373 (Thanks @amcomeau)
Improvements
- Deactivated warning Numpy.core.getlimits.py UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero: #347 (Thanks @snail123815)
- Improved argument checks for bakta_db install: #374 (Thanks @pvanheus)
- Python
Published by oschwengers 8 months ago
bakta - v1.11 - Database ups and downs
This is the eleventh minor release (v1.11) introducing database schema version 6.
Compatible database scheme version: 6
Important
From this release on, Bakta will use xz instead of gz for all database files in order to mitigate increasing database file sizes.
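For scripts that handle database archives from both before and after this change, the compression format does not need to be hard-coded. The following is a minimal sketch, not Bakta's own download code: the archive and member names are made up for illustration, and it only relies on Python's standard tarfile module, whose "r:*" mode detects gz and xz transparently.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def write_archive(path: Path, mode: str, name: str, payload: bytes) -> None:
    """Write a single in-memory file into a tar archive (mode e.g. 'w:xz')."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    with tarfile.open(path, mode) as archive:
        archive.addfile(info, io.BytesIO(payload))


def read_member(path: Path, name: str) -> bytes:
    """Read one member; 'r:*' lets tarfile detect the compression itself."""
    with tarfile.open(path, "r:*") as archive:
        return archive.extractfile(name).read()


# Hypothetical file names, used only to demonstrate the round trip.
with tempfile.TemporaryDirectory() as tmp:
    payload = b"ACGT" * 1000
    results = {}
    for suffix, mode in ((".tar.gz", "w:gz"), (".tar.xz", "w:xz")):
        archive_path = Path(tmp) / f"db{suffix}"
        write_archive(archive_path, mode, "db/version.txt", payload)
        results[suffix] = read_member(archive_path, "db/version.txt")
```

The same read_member call works for both archives, so only the writing side needs to know which compression is in use.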
Published by oschwengers about 1 year ago
bakta - v1.10.4
This is the fourth v1.10 patch release (v1.10.4).
Fixes
- Fixed wrong version tags and timestamps in recovered result file from bakta_io: cbd38045a2a15a27e37d65e55404b6a8c38f299c
- Added missing N90 info to recovered bakta_io *.txt output files: c7a815574d2d51a7f27370117aa1aea8af7c3154
Published by oschwengers about 1 year ago
bakta - v1.10.2
This is the second v1.10 patch release (v1.10.2).
Improvements
- Made new bakta_io result file recovery backward compatible to v1.9 (at least): e068540cfbf41e4a55186c334a0b72932d29d158
Fixes
- Fixed wrong sequence type stored in REPLICON_CONTIG constant: #349
- Fixed plot warnings for truncated sequences: ff72dda4d6f1e10b5b72247164b2c78f24cf28e4
Published by oschwengers about 1 year ago
bakta - v1.10.1
This is the first v1.10 patch release (v1.10.1).
Improvements
- Improved handling of draft genome in circular plots: 01189833cf70ca9f1fe9e72773f7521a7bf729f0
- Added N90 to genome statistics: 7abfa9aa59e791104f652db316267f1635f408c5
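As a reminder of what N90 measures, here is a minimal sketch of the usual Nx definition (the length of the shortest sequence in the smallest set of longest sequences that covers at least x % of all bases). It is an illustration under that common definition, not Bakta's implementation; the function name and the example contig lengths are invented.

```python
def nx(lengths, percent):
    """Return the Nx value for a list of sequence lengths.

    Sequences are visited from longest to shortest; the length at which the
    running total first reaches `percent` % of all bases is returned.
    """
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if covered * 100 >= total * percent:
            return length
    return 0  # empty input


# Hypothetical draft assembly with five contigs (300 bases in total).
contigs = [100, 80, 60, 40, 20]
n50 = nx(contigs, 50)
n90 = nx(contigs, 90)
```

For these contigs, N50 is reached at the second-longest contig and N90 at the fourth, which is why N90 is the stricter indicator of fragmentation.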
Fixes
- Fixed changed Micromamba parameter in Dockerfile: #342 #345 (Thanks @pamelacamejom / @mjfos2r)
- Fixed INSDC note qualifiers: #346 (Thanks @Dx-wmc)
Published by oschwengers over 1 year ago
bakta - v1.10 - Novel in & novel out
This is the tenth minor release (v1.10) introducing user-provided HMMs, output file recovery, feature inference scores, and various improvements.
Compatible database scheme version: 5
Important
Since v1.10.0, Bakta requires Python >=3.9, uses pyCirclize instead of Circos, and discarded support of DeepSig!
An important decision had to be made for this release regarding supported Python versions, external dependencies and the features they affect. Both Circos and DeepSig seem to have been out of support for a long time. Hence, Circos was replaced by pyCirclize, a pure-Python, actively maintained library, enabling a couple of new features. As a result, Bakta's Python dependency had to be bumped to >=3.9, thus unfortunately losing compatibility with DeepSig. Dropping an existing feature feels odd and wrong, but as a developer, sticking to unmaintained external software for too long constantly increases your daily pain level and slows down the project as a whole. This hasn't been an easy decision, but a necessary one. So, if you depend on the detection of signal peptides, please keep using Bakta <=v1.9.4.
New features
- Accept user-provided trusted HMMs via --hmms: #309 #327 (Thanks @StefDiV)
- Recover output files based on Bakta's comprehensive JSON output via bakta_io: #304 #339 (Thanks @cpauvert)
- Write annotation inference metrics to new *.inference.tsv file: #314 #331 (Thanks @jvera888 / @gbouras13)
- Deactivate feature overlap filters via --skip-filter: #295 (Thanks @Daniel-Tichy)
Improvements
- Support translation table 25: #323 (Thanks @ndombrowski)
- Export CRISPR and tRNA nucleotide sequences: #336 (Thanks @YoungerRen)
- Replacement of Circos by pyCirclize: #344 (Thanks @acarafat / @alexweisberg)
- Introduce --label, --size and --dpi parameters in bakta_plot to customize plots: #344
- Add fixes for common submission errors: #330 (Thanks @kevinmyers)
- Improve INSDC compliance: 30b3928d62cab771f1b8f524946ba7c4aef9acf2 8ba8a146232861c28243b86f91624f4f455e042c
- Reduce the locus tag offset from 5 to 1 and allow specifications via --locus-tag-increment: #279 (Thanks @1996xjm)
- Revise internal/external designation of truncated features and (real) pseudogenes: #333
- Print msg if IPS and pseudogene detection is skipped using the db light version: #320 (Thanks @flashton2003)
- Review internal main data structure: #338
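To illustrate the effect of the locus tag increment mentioned above, the sketch below numbers features with a step of 1 (the new default) and a step of 5 (the former offset). The tag layout (prefix, underscore, five zero-padded digits) and the function name are assumptions made for this example only and are not taken from Bakta's code.

```python
def locus_tags(prefix, count, increment=1, width=5):
    """Generate `count` locus tags, spacing the numeric part by `increment`.

    The PREFIX_00001 layout is an assumption for illustration.
    """
    return [f"{prefix}_{i * increment:0{width}d}" for i in range(1, count + 1)]


# Hypothetical prefix; a step of 1 yields consecutive tags,
# a step of 5 leaves gaps between neighbouring features.
step_one = locus_tags("ABC", 3)
step_five = locus_tags("ABC", 3, increment=5)
```

Gaps between tags leave room for features added during later manual curation, whereas a step of 1 gives compact, consecutive numbering.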
Fixes
- Fix import of user-provided CDS spanning sequence edges: #324 #332 (Thanks @aekazakov)
- Fix translated protein sequences for CDS truncated at both sides: #340 (Thanks @tpillone)
- Fix wrong truncation label: #341
- Fix import of user-provided regions in combination with a metadata replicon table: #326
Full Changelog: https://github.com/oschwengers/bakta/compare/v1.9.4...v1.10.0
Published by oschwengers over 1 year ago
bakta - v1.9.4
This is the fourth v1.9 patch release (v1.9.4).
Fixes
- Fixed CDS gene translation in meta mode: #301 (Thanks @ohickl)
- Fixed the CRISPR parser for short (<10 nt) spacer sequences: #299 #302 (Thanks @ohickl)
- Fixed Pyrodigal and Pyhmmer version detection: #294 (Thanks @EricDeveaud)
Improvements
- Skip certain genes in user-provided regions: #288 #303 (Thanks @Dx-wmc)
Published by oschwengers over 1 year ago
bakta - v1.9.3
This is the third v1.9 patch release (v1.9.3).
Fixes
- fixed a regex in the CRISPR parser skipping spacers in some special cases: c7478033265ccb2218582f9d3121825dbf7ce91a #265 (Thanks @ZarulHanifah)
- fixed wrong Piler-CR CRISPR array stop position omitting the last spacer: #276
- tmp. pinned Diamond version to v2.1.8 due to an upstream bug: 0dd84fb60d9fc03c280287af1d237da2fd7d74db
Improvements
- added parent ID to CRISPR repeat/spacer features in GFF3 outputs: ae71c09199981004db8c870d198e8ce11b2e60f8
Published by oschwengers almost 2 years ago
bakta - v1.9.2
This is the second v1.9 patch release (v1.9.2).
Fixes
- Changed tRNA pseudogene type to unknown in INSDC compliant mode: https://github.com/oschwengers/bakta/commit/bb8366aa9856994eca6687fd5325f09b278b4e6d
- Updated DB build scripts for v5.1: #270
- Allow uneven number of CRISPR spacers/repeats: #265 #267 https://github.com/oschwengers/bakta/commit/7c1a7e88cf3e2c56174e9aa3ab39374ee53c3011 (Thanks @ZarulHanifah, @marade)
- Fixed wrong complete sequence inference for some cases: https://github.com/oschwengers/bakta/commit/231788bfb3a98dfcd1f2668e9d0a0b36f670a1b0 https://github.com/oschwengers/bakta/commit/24a43b9d137cc0d210f6a98ed526b3cc63d66e7e
- Fixed minor issues with DB downloads/updates: https://github.com/oschwengers/bakta/commit/4d609de8a2b553fcbd43b57222edfcc04a65362a https://github.com/oschwengers/bakta/commit/1a0413c6562a26b71b343701fdb28141dad4eb18 https://github.com/oschwengers/bakta/commit/ff836d166be82ebe6216b7961e65a68132c5df50
Published by oschwengers about 2 years ago
bakta - v1.9.1
This is the first v1.9 patch release (v1.9.1).
Fixes
- Fixed a Python KeyError when both --regions and --keep-contig-headers are used: d3d7a98973e61d4f8fc5625ea41ee4984be4ec60 (Thanks @thorellk)
- Fixed a bzip2 error in the Docker build process: 73ac39dd2bc3ee1063648ebc57322de038c37218 (Thanks @lukasjelonek)
Published by oschwengers over 2 years ago
bakta - v1.9 - Here's my region of interest
This is the ninth minor release (v1.9) introducing user-provided feature regions and various minor improvements.
Compatible database scheme version: 5
New features
- Support a priori user-provided feature regions via --regions either in GenBank or GFF3 format (currently, only CDS features are supported). CDS coordinates are imported, supersede de novo predicted CDS, and are subject to the regular internal annotation workflow. To provide functional annotation as well, use --proteins: #216 #245 #247 #250 #259 (Thanks @marade @PengfanZhang @simone-pignotti @thorellk)
Improvements
- Extract & export CRISPR spacer & repeat sequences: #171 #249 (Thanks @alexweisberg)
- Add support for Pyrodigal v3: #240 #244 (Thanks @jsgounot)
- Replace HMMER with PyHMMER: #219 (Thanks @jhahnfeld)
- Re-activate parallel pyrodigal gene prediction: #252 (Thanks @althonos)
- Introduce auxiliary scripts: #246 #251 (Thanks @AhmedElsherbini)
- Add Podman wrapper script: bd50faabfe7a2cbad7c63f453baae854503a9d46
- Update dependencies to latest versions: 3260441ed181176c71e39125cb58a1e0df69b7dc 33c02f92305c51d3741172d3851bedc7463f83c8 a1d2ffeb465cc9afbb124fe903a877b3ef562d22 b2656722dd5199416b308291c6c7188ccd4f9f58 1a2c48ee177800e21d3ff3e2356fc253407ecd65
Fixes
- Fix PyPI CD: #231
- Add missing --force parameter to Docker wrapper script: f39434ed670d1310a144df9defcd3f556e36ffbf
- Fix wrong runtime report: #243
Full Changelog: https://github.com/oschwengers/bakta/compare/v1.8.2...v1.9.0
Published by oschwengers over 2 years ago
bakta - v1.8.2
This is the second v1.8 patch release (v1.8.2).
Improvements:
- Replaced HMMER by PyHMMER: https://github.com/oschwengers/bakta/commit/3415e89924dc3699651cc959414f3c7bd5954de7 (Thanks @jhahnfeld)
- Refactored & improved CI scripts
- Refactored the code
- Tweaked code to expand required BioPython versions: https://github.com/oschwengers/bakta/commit/6a88c4bbbd2b2f2c7a7fb30f9de4bd2a67a980c0 (Thanks @alexweisberg)
- Deactivated --force parameter for the current working directory: https://github.com/oschwengers/bakta/commit/a039bece7a8d76ed0d5cb1eeb94dd01b63d4403d
- Added Pyrodigal to dependency checks: https://github.com/oschwengers/bakta/commit/17bcfc99c48f5c221ee94e5259d52ab0074ad41a
Bug Fixes:
- Fixed and improved the CWL wrapper: #221 #229 #230 (Thanks @bartns & @jjkoehorst)
Published by oschwengers over 2 years ago
bakta - v1.8.1
This is the first v1.8 patch release (v1.8.1) catching up an overlooked PR.
Improvements:
- Added all valid expert system hits to the final dbxrefs and JSON results: #198 #199 (Thanks @davidtong28)
Important:
#199 introduced a breaking change in the JSON data structure of the feature->expert section. expert was changed from a dictionary (expert system -> hit) to a flat list of expert hits, each now having a new type field.
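The shape of this change can be sketched as follows. Only the overall structure (a dictionary keyed by expert system becoming a flat list whose entries carry a type field) comes from the note above; the system names and hit fields in the example are hypothetical placeholders, not Bakta's actual JSON keys.

```python
def flatten_expert(expert_by_system):
    """Convert the old mapping (expert system -> hit) into a flat list of
    hits, recording the former dictionary key in a 'type' field."""
    return [{"type": system, **hit} for system, hit in expert_by_system.items()]


# Hypothetical pre-v1.8.1 structure: at most one hit per expert system.
old_expert = {
    "amrfinder": {"gene": "blaTEM", "product": "class A beta-lactamase"},
    "protein_sequences": {"gene": "dnaA", "product": "initiator protein DnaA"},
}

# New-style structure: a list, so several hits per system become possible.
new_expert = flatten_expert(old_expert)
```

Downstream parsers that indexed expert by system name would need to iterate over the list and filter on the type field instead.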
Published by oschwengers over 2 years ago
bakta - v1.8 - May the --force (parameter) be with you
This is the eighth minor release (v1.8) introducing a new output option and various minor improvements.
Compatible database scheme version: 5
Improvements:
- Introduced a new --force option explicitly allowing existing data to be overwritten: #200 (Thanks @Dx-wmc)
- Increased sensitivity of protein sequence expert system: #197 (Thanks @davidtong28)
- Improved compatibility of FNA output with NCBI Bankit Submission: #201 (Thanks @menickname)
- Improved --plasmid parameter functionality: #201 https://github.com/oschwengers/bakta/commit/426bfd39c1f83cf8ed240b61be2b867452bcf573
- Introduced output of bakta_proteins full annotation results as JSON: #204 (Thanks @Rridley7)
- Refactored QC and description of imported genome sequences: https://github.com/oschwengers/bakta/commit/af835b4313d3cb5931d68616d71d0d2623d13afc
Fixes:
- Fixed rare occasions of wrong 5' / 3' ("prime") characters in product descriptions: #215 (Thanks @axbazin)
Published by oschwengers almost 3 years ago
bakta - v1.7 - Lightweight database & harmonized gene symbols
This is the seventh minor release (v1.7) introducing a lightweight database version, various gene symbol improvements, and a metagenome mode.
Compatible database scheme version: 5
New features:
- Introduced a lightweight database version: https://github.com/oschwengers/bakta/pull/196 (Thanks @tseemann)
- Introduced an operon gene symbol harmonization feature: https://github.com/oschwengers/bakta/pull/190
- Introduced a simple metagenome mode: https://github.com/oschwengers/bakta/issues/191
- Added IS transposase to protein expert system: https://github.com/oschwengers/bakta/issues/10
Improvements:
- Improved CDS gene symbols: https://github.com/oschwengers/bakta/issues/186
- Amended tRNA & rRNA gene symbols: https://github.com/oschwengers/bakta/issues/192
- Amended uppercase ncRNA gene symbols: https://github.com/oschwengers/bakta/issues/194
- Added model IDs and dbxrefs to expert annotation systems: https://github.com/oschwengers/bakta/issues/183 (Thanks @davidtong28)
- Updated to Pyrodigal v2.1.0 fixing a bug in the SD motif-detection on reverse contig edges: https://github.com/oschwengers/bakta/commit/599fe709a090331ab6fb7bd3398a5a8ca9899688
Fixes:
- Fixed system-wide db path stored in software volume: https://github.com/oschwengers/bakta/pull/177 (Thanks @standage)
Published by oschwengers about 3 years ago
bakta - v1.6.1
This is the first v1.6 patch release (v1.6.1) fixing 2 bugs.
Improvements:
- Deactivated Circos' max contig limit: https://github.com/oschwengers/bakta/commit/0962df7243002875e8126aac85dd55b05034bdb6
Bug fixes:
- Fixed P(y)rodigal meta mode for short sequences and provided training files: https://github.com/oschwengers/bakta/issues/175 (Thanks @pimarin)
- Fixed an unbound variable in bakta_plot: https://github.com/oschwengers/bakta/issues/174 (Thanks @Rridley7)
Published by oschwengers about 3 years ago
bakta - v1.6 - Draw me a genome, using P(y)rodigal
This is the sixth minor release (v1.6) introducing the creation of circular genome/plasmid plots and fixing false de novo gene predictions by Prodigal.
Compatible database scheme version: 4
New features:
- Creation of circular genome/plasmid plots using Circos: https://github.com/oschwengers/bakta/issues/163 https://github.com/oschwengers/bakta/pull/166
Improvements:
- Replace Prodigal by pyrodigal fixing false gene prediction scores on reverse strands: https://github.com/oschwengers/bakta/issues/149 https://github.com/oschwengers/bakta/pull/165 (Thanks @jhahnfeld, @althonos)
- Improve tRNA product descriptions including anticodons: https://github.com/oschwengers/bakta/issues/170 https://github.com/oschwengers/bakta/pull/173 (Thanks @acvill)
Fixes:
- Fix issues with IUPAC ambiguity codes on tRNA prediction using tRNAscan-SE: https://github.com/oschwengers/bakta/issues/150
Published by oschwengers over 3 years ago
bakta - v1.5.1
This is the first v1.5 patch release (v1.5.1) fixing a crucial bug causing Diamond runtime errors during the pseudogene detection step. A patch upgrade is highly recommended!
Improvements:
- Added a --debug option in order to keep temporary files for debugging purposes: #137 #141 (Thanks @EricDeveaud)
- Bakta now obeys the number of available CPUs instead of mere CPU counts on Linux: #135 #139 (Thanks @EricDeveaud)
- The Docker image now allows the direct execution of Bakta via Singularity's exec mode in Nextflow: #138 #144 (Thanks @rujinlong & @lukasjelonek)
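The difference between the CPU count and the CPUs actually available to a process matters on clusters and in containers, where a job may be pinned to a subset of cores. A minimal sketch of that distinction using Python's standard library follows; it is not Bakta's implementation, and the function name is made up.

```python
import os


def available_cpus():
    """Return the number of CPUs this process may actually use.

    os.sched_getaffinity() exists on Linux only and respects CPU pinning
    (e.g. taskset, cgroup cpusets, scheduler allocations); elsewhere we fall
    back to the plain CPU count.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))  # 0 = the current process
    return os.cpu_count() or 1  # cpu_count() may return None


threads = available_cpus()
```

Using the affinity-based value as the default thread count avoids oversubscribing a node when a scheduler has granted only part of it.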
Bug fixes:
- Fixed an IndexError during the pseudogene detection: #130 #133 (Thanks @samnooij & @jhahnfeld)
- Fixed an off-by-1 error: #131 (Thanks @jhahnfeld)
Published by oschwengers over 3 years ago
bakta - v1.5 - Pseudogenes, they're coming...
This is the fifth minor release (v1.5) introducing the detection of pseudogenes and KEGG Kofams, along with several improvements.
Compatible database scheme version: 4
New features:
- detection of CDS pseudogenes: #4 (Thanks @jhahnfeld)
- pre-annotation of PSCs with KEGG's Kofams also massively increasing the number of available E.C. numbers: #9
Improvements:
- pre-annotation of PSCs with NCBI's NCBIfams leading to many improved functional annotations: https://github.com/oschwengers/bakta/commit/e268df28e848fcc28c318062d6be70421245d5d7 #102 (Thanks @hkaspersen)
- add CI tests for species / strain parameters: https://github.com/oschwengers/bakta/commit/a98670bad4a6fafd547a791f2f08de412525b929
- revert oriCVT inference tags in compliant mode: https://github.com/oschwengers/bakta/commit/4f31ab41bf97373d9483a7708ba4d2b16697dea7
- improve the functional pre-annotation: https://github.com/oschwengers/bakta/commit/46e645a9d20084ad9e784467ebea971c87223837 https://github.com/oschwengers/bakta/commit/de55d305d61e6f8fdb6926761a111c4d02b670c4 https://github.com/oschwengers/bakta/commit/1893220081b4f8452b0ab0337fb3973c41b9a538 https://github.com/oschwengers/bakta/commit/183557432c582df8c6c714fd166b58eb33972a57
Published by oschwengers over 3 years ago
bakta - v1.4.2
This is the second v1.4 patch release (v1.4.2) fixing wrong EC annotations in compliant mode.
Improvements:
- added GFF3 inference tags in compliant mode for ncRNA region, CRISPR, oriC, oriV, oriT: https://github.com/oschwengers/bakta/commit/85c9c4215fed5ffa7e7e846270db1ecec41c9a37 https://github.com/oschwengers/bakta/commit/d81d9b73667a8ed96d78b246a02e70d7f94b8922 https://github.com/oschwengers/bakta/commit/a5b31af772593f2a63cc8acd0433ba9465bfa8dc
- improved chromosome/plasmid auto detection: https://github.com/oschwengers/bakta/commit/9a83055a586a025e5f3e673ff09175f95504aede
- refactored code: https://github.com/oschwengers/bakta/commit/606c23742362bfa2dcd755dbec461f45a69d152e https://github.com/oschwengers/bakta/commit/21e647038932ed8426afcca7245d5bcdeee0f31d https://github.com/oschwengers/bakta/commit/b36172c274e2c187aacba19b79434f25d611576c
Bug fixes:
- fixed EC_number annotation in compliant mode: https://github.com/oschwengers/bakta/commit/2960e6bdac1bc35a36d8696a91eb5db335ec27fa
Published by oschwengers over 3 years ago
bakta - v1.4.1
This is the first v1.4 patch release (v1.4.1) fixing a tiny but critical bug that caused all imported contigs to be treated as complete sequences. A patch upgrade is highly recommended!
Improvements:
- removed trailing asterisk chars on AA import https://github.com/oschwengers/bakta/issues/97: https://github.com/oschwengers/bakta/commit/a447f327a4b7344883c1154e12ba7287ff8d86a2
- caught AA bulk annotation import errors https://github.com/oschwengers/bakta/issues/97: https://github.com/oschwengers/bakta/commit/58cedf148a847e946f21f2115b69244eba1cd692
Bug fixes:
- fixed fasta import contig attribute types https://github.com/oschwengers/bakta/issues/108
- fixed dnaA/repA revisions https://github.com/oschwengers/bakta/issues/108 (Thanks @conmeehan)
Published by oschwengers almost 4 years ago
bakta - v1.4 - Some exceptional translations
This is the fourth minor release (v1.4) introducing the detection of translational exceptions and protein bulk annotations.
Compatible database scheme version: 3
New features:
- detection & annotation of selenocysteine translational exceptions: #100
- support for protein bulk annotations (direct annotation of proteins w/o genomes): #101 (Thanks @conmeehan)
Improvements:
- maintainability improvements like Python type hints, code refactorings: #96 https://github.com/oschwengers/bakta/commit/943485c12fc60a0f07f2b72c38dae43c25e21155
- added a large(r) genome test dataset and a corresponding Nextflow script: https://github.com/oschwengers/bakta/commit/32e0f9e847f2b39abef58d3c3e694a4146d03f1e
- add more CI tests: https://github.com/oschwengers/bakta/commit/ce2c68f972029e7b7b11782cb18c5b2fd199f653 https://github.com/oschwengers/bakta/commit/4549f2582ec6f7388fa6f6ad177ea288c1066a1c
Bug fixes:
- fixed molecular weight in hypothetical tsv output https://github.com/oschwengers/bakta/commit/d83c1b8fa1604a98b36710d14c96514150a8e998
And of course all improvements and bug fixes from all v1.3.x patch releases.
Published by oschwengers almost 4 years ago
bakta - v1.3.3
This is the third v1.3 patch release (v1.3.3).
This is a quick fix reverting an alive-progress version bump to v2.1.0 that is not available via Conda. Sorry for any inconvenience caused.
Bug fixes:
- fixed a file permission issue on bakta_db update: https://github.com/oschwengers/bakta/commit/de42d3fad5cbe574b6c2b7a643ea246c247a2017
- pinned alive-progress version to v1.6.2 to fix potential API errors: https://github.com/oschwengers/bakta/commit/dbc79acc40b3999da916f2aaa319e2de3c432db0 https://github.com/oschwengers/bakta/commit/60ac3ed02537628bb9264d18e16080e4a9ee5b54
Published by oschwengers about 4 years ago
bakta - v1.3.2
This is the second v1.3 patch release (v1.3.2).
Bug fixes:
- fixed a file permission issue on bakta_db update: https://github.com/oschwengers/bakta/commit/de42d3fad5cbe574b6c2b7a643ea246c247a2017
- bumped alive-progress version to >=v2.1.0 to fix potential API errors: https://github.com/oschwengers/bakta/commit/dbc79acc40b3999da916f2aaa319e2de3c432db0
Published by oschwengers about 4 years ago
bakta - v1.3.1
This is the first v1.3 patch release (v1.3.1).
Bug fixes:
- fixed the locus-tag checks: use a more relaxed format as default and INSDC restrictions on --compliant: https://github.com/oschwengers/bakta/commit/e2651c3c7c85efbbceacff1ac28482b3c636b536 (Thanks @taylorreiter)
- skip /pseudo annotated genes in --proteins-provided GenBank files: https://github.com/oschwengers/bakta/commit/0f0165d0909ef26be29477751db301e68754f233 (Thanks Charlotte Reuschel)
Published by oschwengers about 4 years ago
bakta - v1.3 - Quo vadis, protein?
This is the third minor release (v1.3) introducing a new feature and many improvements and bug fixes.
Compatible database scheme version: 3
New features:
- prediction of signal peptides via DeepSig: #32 (Thanks @Anna-Rehm)
Improvements:
- provide summary with genome & annotation statistics as .txt file: #88 (Thanks @hkaspersen)
- accept user provided proteins in GenBank format: #89 (Thanks @Tonny-zhou)
- removed alignment gaps (-) in input sequences: #87 (Thanks @RotimiDada)
- improved sORF overlap filter runtime performance: https://github.com/oschwengers/bakta/commit/cb204fbe52539abd92059b67b08b5f3352dea0b0
- improved outputs: https://github.com/oschwengers/bakta/commit/76e07a4db073bf839a2280e8fad1951d44e6c9e8 https://github.com/oschwengers/bakta/commit/90bfe99c65565036c0943e70fd9daa4d2321f610
- added more checks & tests: https://github.com/oschwengers/bakta/commit/b779e1fda82302e9156af0c1be9a7285e263c21d https://github.com/oschwengers/bakta/commit/4e6c35d3177977e9cec9bcf6cc86116f55cdd590 https://github.com/oschwengers/bakta/commit/f4103f09b7dc6edb49164e82a83a9b931d42eee6 https://github.com/oschwengers/bakta/commit/08ebb037bca1e976d102fe293d775a70b80ebd36 https://github.com/oschwengers/bakta/commit/1672a876141711f22d1cc8dd930a6f397458616d https://github.com/oschwengers/bakta/commit/26a8ed2f30eff456b808f62b0646039f566ff3b7
- added --gram to CWL file: https://github.com/oschwengers/bakta/commit/22463196c55bb93f20f6161ade76512a0bb36b2e
Bug fixes:
- fixed tmRNA predictions that cross sequence origins: #90 (Thanks @LuisFF)
- moved oriC/T/V product to Note in INSDC and compliant GFF3 outputs: https://github.com/oschwengers/bakta/commit/2f5cb1dae8da57c6d00dfbe66c0b3d48769b128e https://github.com/oschwengers/bakta/commit/716302e63b66c34a0e30bcea83c8818b6a0d2255 https://github.com/oschwengers/bakta/commit/5e1a0d831fa4d948aa459a5b8340f2a18afa2a6c
- fixed sequence description on Fasta import: https://github.com/oschwengers/bakta/commit/bfdf67f53ed0e3995a4cdaa8c227e12c9b122848
- fixed Fasta import log error: https://github.com/oschwengers/bakta/commit/52096727d857d44e98cfbb821094dbc8853b4a8b
- fixed EDAM out types in CWL file: https://github.com/oschwengers/bakta/commit/a1a0ad116f1a0e8133aa41877f3b67af599820a6
And of course all improvements and bug fixes from all v1.2.x patch releases.
Published by oschwengers about 4 years ago
bakta - v1.2.4
This is the 4th v1.2 patch release (v1.2.4) fixing a database download issue.
Bug fixes:
- fixed the database download/update logic by restoring an accidentally removed URL: https://github.com/oschwengers/bakta/commit/247a0475ecd7570e094e6c85f15c2f8893e88369 (Thanks @andreaniml)
Published by oschwengers about 4 years ago
bakta - v1.2.3
This is the third v1.2 patch release (v1.2.3) providing some minor improvements and a bug fix.
Compatible database scheme version: v3
Improvements:
- added several CI argument tests: https://github.com/oschwengers/bakta/commit/e26b5f56d1aa02563bc71ca6d6b9700c181c07fa https://github.com/oschwengers/bakta/commit/f5c57efd63fb2bdbfb9185fc765299652fdb159c https://github.com/oschwengers/bakta/commit/af20f66b133d5fc70eaaae04af4099eaa737c2c0 https://github.com/oschwengers/bakta/commit/dfc9a66cbec94727a4da10badbf8b688f91cff35 https://github.com/oschwengers/bakta/commit/a37b5b54237a046f655c1b95049cb17d80c8a65c
- added locus and locus-tag prefix argument checks: https://github.com/oschwengers/bakta/commit/842437b5050a23afbb5ce6c6708cae18d94efb77 https://github.com/oschwengers/bakta/commit/31fd72e7881aa23ad2c51496729ca0550b894540 https://github.com/oschwengers/bakta/commit/c530e4c1ad8bd0ba68c6302e04a507df546f4170 https://github.com/oschwengers/bakta/commit/8e27f6834beb4b028eacd8d0954237a15195755c https://github.com/oschwengers/bakta/commit/8c58851acebd01fc33e09dd291049e00384c1e60
- polished readme and error messages
- added citation information
Bug fixes:
- fixed the creation of a user defined tmp directory: https://github.com/oschwengers/bakta/commit/753e6e80d49ddfe285f6225c8f3347877bafe526
Published by oschwengers over 4 years ago
bakta - v1.2.2
This is the second v1.2 patch release (v1.2.2) providing a couple of improvements and bug fixes.
Compatible database scheme version: v3
Improvements:
- set the BLAST_USAGE_REPORT environment variable to false, preventing blastn from hanging for 90 secs if no internet connection is available
- add a check for duplicated input sequence IDs: https://github.com/oschwengers/bakta/commit/888bb9b06bda62f88d8a93e0dfbd85eb7cf50e3e (Thanks @joyn-sromero)
- add further SQLite URI parameters to improve read-only access: https://github.com/oschwengers/bakta/commit/e658a2ea2e0f48434eade93198d9c6c346480cb2
- synchronize Conda environment dependencies and runtime dependency version checks: https://github.com/oschwengers/bakta/commit/95ef71fd40dbeb471a3c45e04bc389ccf2ca08df https://github.com/oschwengers/bakta/commit/c516656661eaedaa933c1689a0ec6dd4ddc8267b
- add AMRFinder protein hits to dbxref outputs: https://github.com/oschwengers/bakta/commit/505a2ce8743eb227e8422567f779d6d3c46b33b3
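The usage-report setting from the first improvement above can also be applied by wrapper scripts that call BLAST+ tools themselves. The sketch below shows one way to do so for child processes only; it assumes the variable is spelled BLAST_USAGE_REPORT, as in the NCBI BLAST+ documentation, and the helper name is invented. It is not taken from Bakta's code.

```python
import os


def blast_env(base=None):
    """Return a copy of the environment with BLAST usage reporting disabled.

    Copying leaves the parent process environment untouched.
    """
    env = dict(os.environ if base is None else base)
    env["BLAST_USAGE_REPORT"] = "false"
    return env


# Minimal demonstration with a stand-in environment.
env = blast_env({"PATH": "/usr/bin"})
# The copy would then be handed to the child process, e.g.
# subprocess.run(["blastn", "-version"], env=env, check=True)
```

Passing the modified copy via the env argument keeps the setting local to the spawned tool rather than changing it for the whole session.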
Bug fixes:
- fix NA issue on HMM-only AMRFinder hits: https://github.com/oschwengers/bakta/commit/71e2dac3c17136e55c258dff8cf2cfe700b2c670 https://github.com/oschwengers/bakta/commit/e9ff01cead6524c4c96a09fec1d8bd7df144d23e (Thanks @taylorreiter)
Published by oschwengers over 4 years ago
bakta - v1.2.0 - User provided proteins arrived ...
This is the second minor release (v1.2.0) introducing user provided proteins as well as many improvements.
Compatible database scheme version: 3 v3.0
New features:
- introduce --proteins accepting a user provided set of trusted protein sequences: https://github.com/oschwengers/bakta/commit/992d9295ab1e11ab12d5e7fd80e60d41db610b77 https://github.com/oschwengers/bakta/commit/1541f1059b0b99de9d5677e2f55683e53650f2b9 https://github.com/oschwengers/bakta/commit/691646defc0388571675deb4ebbec73fe0e45ce2 (Thanks to @Tonny-zhou)
Improvements:
- revise Dbxrefs for UniParc & UniRef: https://github.com/oschwengers/bakta/commit/02b5dc8d9453f5fe704d48405660ff592e7d9e02
- revise truncated dnaA/repA genes on rotated replicon sequences: https://github.com/oschwengers/bakta/commit/0c4ad10f29988cd616d94b3515a293569d4169e1 (Thanks to Jochen Blom)
- assign locus tags to gene features only: https://github.com/oschwengers/bakta/commit/c1e1c0584ab18abaa205ad8c9536cb1fad8f1f26
- extract nt seqs for ncrna-regions: https://github.com/oschwengers/bakta/commit/f77e308ec2b51e162ba603df67637b78d012c985
- generate unique feature IDs used as ID in GFF3: https://github.com/oschwengers/bakta/commit/df32038309228443e4bc714b0f572a19abb8a82d
- add products to oriC/oriV/oriT features: https://github.com/oschwengers/bakta/commit/2eabd93c4435bb8abd98b6d5dfde07b5b395691d
- reduce numbers of similar to aa sequence inference qualifiers in INSDC outputs to 1: https://github.com/oschwengers/bakta/commit/16401da73b76ab6c7228511875742cc82e4cdaf4
- improve and fix CWL file: https://github.com/oschwengers/bakta/commit/6776ca790242d9c6e96850d4f5beaa2eff1d826c https://github.com/oschwengers/bakta/commit/1d18e7740fc6cb378c80a144caaa00ad2085fa36 https://github.com/oschwengers/bakta/commit/5c80d15a3609d13b4280fb97ad2247d450e35eec
- extract & store Pfam HMM hits on hypothetical proteins: https://github.com/oschwengers/bakta/commit/2ba985a15cfeb9d07e8517a37ae003ead4e1fa7c
Published by oschwengers over 4 years ago
bakta - v1.1.1
This is the first v1.1 patch release (v1.1.1) providing a couple of improvements and minor bug fixes.
Compatible database scheme version: v3
Improvements:
- added a revision & refinement logic for gene symbols: https://github.com/oschwengers/bakta/commit/539137be5cd82b7b01960b09f1a69e2a6536ae17
- added further CDS product revision rules: https://github.com/oschwengers/bakta/commit/f8ec2aa69c9b01da542e0cd49a4878bb6281be3f https://github.com/oschwengers/bakta/commit/1fcca891087e16d0ab5b6844ac21a230901e09d4 https://github.com/oschwengers/bakta/commit/8915af7d5fc81e5e7d306bbc8e1e18ca4c717b10 https://github.com/oschwengers/bakta/commit/626981bb9a385f890c8f817f45d7d31e75d8f50b https://github.com/oschwengers/bakta/commit/321391b1d0a336541c03732a28c95e99ca4dc0b6 https://github.com/oschwengers/bakta/commit/0d93c2fd0edf1affa22c172fbe87045bc16ec16f https://github.com/oschwengers/bakta/issues/69 (Thanks @michoug)
Bug fixes:
- fixed AMRFinderDB issue in docker shell script: https://github.com/oschwengers/bakta/commit/e8253616dd37ad55ff61df64a7494253351692a2
- fixed None values of CDS rbs_motif attributes: https://github.com/oschwengers/bakta/commit/0727b2551207e1bf57514f3824572f004a11fd9e
- skip CDS annotation if no CDS are predicted: https://github.com/oschwengers/bakta/commit/3787e01a2e44eb5c80d6a19d8350a6bce3732415
- fix several CDS product revision rules
Published by oschwengers over 4 years ago
bakta - v1.1 - INSDC submission & annotation of MAGs: Hey! Ho! Let's go
This is the first minor release (v1.1) introducing several new features, many improvements and countless bug fixes.
Compatible database scheme version: 3 v3.0
New features:
- add new --compliant option for INSDC genome submissions: https://github.com/oschwengers/bakta/issues/69 (Thanks @michoug)
- introduce PSCCs (UniRef50) as a fallback if PSCs (UniRef90) are not detected, greatly improving the annotation of less represented species and metagenome-assembled genomes (MAGs): https://github.com/oschwengers/bakta/commit/84c808fb343cadb36b4afdbdd56dbf416b7512e3
- export nucleotide sequences: https://github.com/oschwengers/bakta/issues/57 (Thanks @mcroxen)
- revise suspect CDS product names: https://github.com/oschwengers/bakta/commit/f85970b7a9b6a55336b91570955777c062f03852
Improvements:
- various improvements and fixes in GFF3, GenBank and EMBL files to adhere to INSDC specs: https://github.com/oschwengers/bakta/issues/69
- improve internal DB download: https://github.com/oschwengers/bakta/commit/bdd665aa87bcb14d7ddaba22744b45f9b700b74a
- use Diamond version v2.0.11 and its --fast option: https://github.com/oschwengers/bakta/commit/c82f4e21aae75ecc8d48146aab8499c5967d8b1b https://github.com/oschwengers/bakta/commit/19c95d802cd038500e53fdfa30353420c89f5480 (Thanks to @bbuchfink for https://github.com/bbuchfink/diamond/issues/419)
- use stable CLI progress library alive-progress https://github.com/oschwengers/bakta/commit/0ac56256dda51e16fabce08b4500048bd1ea5d79
- improve GFF3 output regarding GFF3 specs: https://github.com/oschwengers/bakta/commit/697121f6192bbee883b3102578891d2de6beb2db
- store AMRFinderPlus DB within the Bakta DB directory: https://github.com/oschwengers/bakta/commit/04ff7b00c0b4b71738072710355a45a8f9fa8137 (Thanks to @LuisFF)
- redirect BAKTA-TMP directory to AMRFinderPlus to prevent stale files: https://github.com/oschwengers/bakta/commit/26f81fcc82e1001ce2021011fcbb40416a9c11c2
- detect & mark tmRNA on sequence edges: https://github.com/oschwengers/bakta/commit/782f640f920368216920f9bf8d0ff4f219c3d91d
- adhere to translation table in tmRNA prediction: https://github.com/oschwengers/bakta/commit/450d6b7224469e4bf39cbad9ac24d3e9e3dbc9f1
Bug fixes:
- fix var name in Docker shell script: https://github.com/oschwengers/bakta/commit/e6bab09dba5ff9ea478242edb6b2863e5dd2e122 (Thanks @joyn-sromero)
- fix inflated blastn thread numbers: https://github.com/oschwengers/bakta/commit/e9073e188c5888701bbcf5ab800058d191384463 (Thanks @LuisFF)
- fix location of features on sequence edges: https://github.com/oschwengers/bakta/commit/34873e7bb91984a34d43d69bf2eb9b24573ab289
- fix BLAST+ tool name in GFF3: https://github.com/oschwengers/bakta/commit/efff065f212f9a627995291b0590f8bc6e55b9fb
and many more ...
- Python
Published by oschwengers over 4 years ago
bakta - v1.0.4
This is the fourth v1.0 patch release (v1.0.4) fixing a couple of minor bugs and adding further minor improvements.
Compatible database scheme version: v2.0
Bug fixes:
- fixed an hmmsearch error if no CDS/sORF remain as hypotheticals: https://github.com/oschwengers/bakta/commit/9d4fc71672bf394d871209276c628641aa3b31e4 (Thanks Matthew Croxen)
- fixed a threadpool issue upon single core executions: https://github.com/oschwengers/bakta/commit/631fac67f5d3831ea108a4b1990a2e825274cbc1
Improvements:
- added GFF3 feature IDs to features w/o locus_tag: https://github.com/oschwengers/bakta/commit/08b8eae6fd35140b461db2717261d9015aa2abcb (Thanks @ZarulHanifah #54)
- add clipping of UniParc DB prefixes: https://github.com/oschwengers/bakta/commit/c861af8c86ea2eba1db0a55c6a3bc030d4f45c4e (Thanks Jochen Blom)
- Python
Published by oschwengers over 4 years ago
bakta - v1.0.3
This is the 3rd v1.0 patch release (v1.0.3).
Compatible database scheme version: v2.0
Bug fixes:
- activated the PSC lookup when all CDS/sORF are identified as IPS; its absence led to large numbers of false hypotheticals: https://github.com/oschwengers/bakta/commit/12df1af41ce7f92259551570eff399d76de5e1b3
Improvements:
- skip further analysis of spurious CDS/sORF: https://github.com/oschwengers/bakta/commit/f22e114234750957054ee4e94fef77309b74acc3
- Python
Published by oschwengers over 4 years ago
bakta - v1.0.2
This is the second v1.0 patch release (v1.0.2) fixing a couple of minor bugs and adding further minor improvements.
Compatible database scheme version: v2.0
Bug fixes:
- fix a JSON NaN serialization issue: https://github.com/oschwengers/bakta/commit/f8ff6b96b9c92cc54a9ed253e56252179e16dc6e (Thanks @lukasjelonek)
- fix calculation of mol weight and isoelectric points for hypotheticals: https://github.com/oschwengers/bakta/commit/b443e54dd66d2c7e2b6bdd9cbfdca242259952ba
- fix duplicated annotation of ncRNA region features: https://github.com/oschwengers/bakta/commit/30df23cf4b09728c6e9fde07d14c4d84d9b8e1cc
Improvements:
- accept ambiguity NT codes in input fasta files: https://github.com/oschwengers/bakta/commit/d4f615bca6d529dda08f97ea235f80fc1b755279
- add more tests: https://github.com/oschwengers/bakta/commit/d8e67c0ee60daf56259b2aa357f39764ad050faf (Thanks @Anna-Rehm)
- write INSDC chromosome tags only on explicit names: https://github.com/oschwengers/bakta/commit/efaf5cf593195eb5c33bd3b24964112b3c68a7e9
- write locus tag based GFF3 IDs to CRISPR and oriC/oriT/oriV: https://github.com/oschwengers/bakta/commit/04ed19535d79a3c3b5f9f6682d2659270f21fac4 https://github.com/oschwengers/bakta/commit/7a8242efa7b0be0fb145910a00a5858dcd7d3880 (Thanks @ZarulHanifah https://github.com/oschwengers/bakta/issues/54)
- Python
Published by oschwengers over 4 years ago
bakta - v1.0.1
This is the first v1.0 patch release (v1.0.1) fixing some bugs & typos and adding a couple of minor improvements.
Compatible database scheme version: v2.0
Bug fixes:
- fix calculation of GC & N50 stats: https://github.com/oschwengers/bakta/commit/2829a24c7a01c94bcd4bf7d93319f2139b10765f (Thanks Matthew Croxen)
- fix a typo in GenBank/EMBL output: https://github.com/oschwengers/bakta/commit/1595133c2aede5a853397b0499c123676905d6e7 (Thanks @davised)
Improvements:
- download/update AMRFinderPlus db upon Bakta db download/update: https://github.com/oschwengers/bakta/commit/94ccc695e2604dce19581d5bd1d2ed46f02d28c9
- add a short note to error msg if AMRFinderPlus db is not available: https://github.com/oschwengers/bakta/commit/750396feff7edb12c4e4c96e3c1ee2a15470adac
- add more sORF unit tests: https://github.com/oschwengers/bakta/commit/685a3bc045f4fec51d96f6d754989b5c9c31601b
- fix link to old db files: https://github.com/oschwengers/bakta/commit/140e34884ce555590cccef70bb8757562bcb463b (Thanks @davised)
- readme updates & improvements
- Python
Published by oschwengers almost 5 years ago
bakta - v1.0 - In all beginnings dwells a magic force
This is the first official and stable v1.0 release introducing many new features and countless bug fixes.
Compatible database scheme version: v2.0
New features and improvements over 0.5:
- expert annotation systems comprising AMRFinderPlus, BlastRules and VFDB
- support for EMBL flat files; INSDC-compliant and submission-ready, validated via ENA Webin-CLI
- db download from within Bakta: bakta_db --help
- parallelized sORF overlap filter significantly reduces wall clock runtimes
- replicon files can also be provided as CSV
- more integration tests added
- code reviews & optimizations
- updated readme and usage
as well as countless bug fixes.
- Python
Published by oschwengers almost 5 years ago
bakta - v1.0-rc3
This is the 3rd v1.0 release candidate introducing expert annotation systems incorporating AMRFinderPlus, BlastRules and VFDB.
Bug fixes over rc2:
- several INSDC qualifier issues (1d0f6dcc8a388935881bcdb001507fda05076c27)
- INSDC tRNA anticodon and pseudo qualifier issues (ff668d5ee3b8fbdda5f5eddb01b5c033da550c61 fb03559c160bcf921fea1931b6c3c8ede406f57d)
- lacking python __init__ file (0560102fe7c1fc57fe2900bc75dde955e827d976)
Plus several minor bug fixes & improvements.
- Python
Published by oschwengers almost 5 years ago
bakta - v1.0-rc2
This is the 2nd v1.0 release candidate introducing expert annotation systems incorporating AMRFinderPlus, BlastRules and VFDB.
Improvements over rc1:
- added parameter argument checks
- added more tests
- updated usage and readme
Bug fixes:
- fixed name of main entry module
- Python
Published by oschwengers almost 5 years ago
bakta - v1.0-rc1
This is a v1.0 release candidate introducing expert annotation systems incorporating AMRFinderPlus, BlastRules and VFDB.
Further improvements:
- db download logic
- parallelized sORF overlap filter
- support for EMBL flat file output
- replicon files can also be provided as CSV
- more integration tests added
- several code reviews & optimizations
Bug fixes:
- fix max sORF length to 29
- Python
Published by oschwengers almost 5 years ago
bakta - v0.5 - Hypotheticals: rule them all(most)
Improvements:
- analysis & output of hypothetical proteins added (pfam detection & calculation of sequence statistics)
- improved CI
- improved DB compilation process
- added annotation summary to GenBank output files
- added CWL file
Bug fixes:
- fix error msgs
- Python
Published by oschwengers about 5 years ago
bakta - v0.4 - Let there be tests...
Improvements:
- automated integration tests added
- genome & plasmid test sequences added
- massively improved fasta output performance
- several GenBank flat file output improvements
- code reviews
- adhere to translation table in sORF detection
Bug fixes:
- fix annotations of truncated CDS
- fix truncated CDS positions in GenBank
- Python
Published by oschwengers over 5 years ago
bakta -
Fix breaking bug in annotation statistics.
- Python
Published by oschwengers over 5 years ago
bakta - Patching the first bugs
This is a patch release comprising the following fixes:
- fix Prodigal training file logic (creation & usage)
- fix coding density calculation on edge features
as well as some minor improvements:
- introduce omit_readlock PRAGMA for SQLite queries
- add genome completeness info to JSON output
- Python
Published by oschwengers over 5 years ago
bakta - Initial Release
The first public release for broad testing & debugging.
- Python
Published by oschwengers over 5 years ago