Recent Releases of tesseract-ocr

tesseract-ocr - 5.5.1

What's Changed

  • Fix linear congruential random number generator by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4357
  • Make list classes templated by @egorpugin in https://github.com/tesseract-ocr/tesseract/pull/4356
  • add cli -c parameter(s) to init vectors by @zdenop in https://github.com/tesseract-ocr/tesseract/pull/4363
  • handle colormaps correctly by @zdenop in https://github.com/tesseract-ocr/tesseract/pull/4369
  • use constexpr for kDawgMagicNumber by @zdenop in https://github.com/tesseract-ocr/tesseract/pull/4378
  • Extend elf_aux_info() support for RISC-V on FreeBSD and OpenBSD by @brad0 in https://github.com/tesseract-ocr/tesseract/pull/4376
  • Fix building elf_aux_info() support on OpenBSD/arm by @brad0 in https://github.com/tesseract-ocr/tesseract/pull/4383
  • Fix invalid empty interval in punct_stripped() for all-punctuation words by @EnodoGH in https://github.com/tesseract-ocr/tesseract/pull/4404
  • Fix function addAvailableLanguages (issue #4416) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4417
  • Avoid error in pixSauvolaBinarizeTiled (issue #4390) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4418
  • Fix duplicated IDs in ALTO XML when multiple pages are present by @jankal in https://github.com/tesseract-ocr/tesseract/pull/4386
  • Remove unused include statements for tprintf.h by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4419
  • Use links to the git history and online release notes instead of local ChangeLog by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4423
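
For readers unfamiliar with the term in the first item above, a linear congruential generator advances its state with one multiply and one add per step. The following is a generic sketch only, assuming a 64-bit state and Knuth's MMIX constants; the class name Lcg64 is hypothetical, and this is not the code changed in the pull request.

```cpp
#include <cstdint>

// Generic 64-bit linear congruential generator (illustrative sketch only,
// not Tesseract's implementation). Unsigned overflow wraps modulo 2^64,
// which provides the modulus implicitly.
class Lcg64 {
public:
  explicit Lcg64(std::uint64_t seed) : state_(seed) {}

  // Advance the state: x_{n+1} = a * x_n + c (mod 2^64),
  // using Knuth's MMIX multiplier and increment.
  std::uint64_t next() {
    state_ = state_ * 6364136223846793005ULL + 1442695040888963407ULL;
    return state_;
  }

  // Map the top 53 bits of the next state to a double in [0, 1).
  double next_unit() {
    return static_cast<double>(next() >> 11) * (1.0 / 9007199254740992.0);
  }

private:
  std::uint64_t state_;
};
```

Two generators constructed with the same seed produce the same sequence, which is what makes such a generator useful for reproducible test runs.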

New Contributors

  • @brad0 made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4376
  • @EnodoGH made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4404
  • @jankal made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4386

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.5.0...5.5.1

- C++
Published by stweil 9 months ago

tesseract-ocr - 5.5.0

What's Changed

  • Fix TARGET_PDB_FILE error for static linking. by @hglee in https://github.com/tesseract-ocr/tesseract/pull/4271
  • Make regular usage of CMAKE_INSTALL_LIBDIR and GNUInstallDirs by @Zopolis4 in https://github.com/tesseract-ocr/tesseract/pull/4272
  • Ignore illegal TESSDATA_PREFIX (not existing filesystem entry, issue #4277) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4278
  • Fix confidence output for the PAGE XML renderer by @JKamlah in https://github.com/tesseract-ocr/tesseract/pull/4283
  • Set hOCR capabilities ocrp_dir and ocrp_lang unconditionally by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4301
  • Reduce clock syscalls by @heshpdx in https://github.com/tesseract-ocr/tesseract/pull/4303
  • Calculate row bounding box in single-word mode per #4304 by @Balearica in https://github.com/tesseract-ocr/tesseract/pull/4305
  • Replace access/_access by std::filesystem::exists by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4307
  • Modernize code for list of available models by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4308
  • Fix performance and other issues reported by Codacy by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4309
  • Remove unnecessary assignment and assertions by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4313
  • Update code for tprintf by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4306
  • Add C++ stream for log messages and use it in two debug messages by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4314
  • cmake: Correctly set the soversion based on SemVer properties by @Conan-Kudo in https://github.com/tesseract-ocr/tesseract/pull/4319
  • Replace deprecated runner macos-12 by macos-latest in GitHub actions by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4326
  • Modernize code for renderers and remove filename conversion for Windows by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4330
  • Fix some typos and grammar issues by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4337
  • Add GitHub action and Makefile target for Windows installer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4341
  • Support symbolic values for --oem and --psm options by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4344
  • Replace some tprintf by tesserr stream (fixes Windows compiler warnings) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4345
  • Add RISC-V V support by @hleft in https://github.com/tesseract-ocr/tesseract/pull/4346
  • Fix and improve Windows installer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4348
  • Remove Tensorflow support by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4350
  • Update submodule googletest to release v1.15.2 by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4352
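
Two items above concern filesystem checks: ignoring a TESSDATA_PREFIX that does not name an existing filesystem entry, and replacing access/_access by std::filesystem::exists. A minimal sketch of that kind of check, assuming C++17, might look like this; the helper name existing_prefix is hypothetical and is not a Tesseract function.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper (not part of Tesseract): return the given path only
// if it names an existing filesystem entry, otherwise return nothing so the
// caller can fall back to a default location.
std::optional<std::filesystem::path> existing_prefix(const std::string &value) {
  if (value.empty()) {
    return std::nullopt;
  }
  std::error_code ec;  // non-throwing overload; ec is set on OS-level errors
  const std::filesystem::path candidate(value);
  if (std::filesystem::exists(candidate, ec) && !ec) {
    return candidate;
  }
  return std::nullopt;
}
```

Using the std::error_code overload avoids exceptions for unreadable or malformed paths, so an invalid value can simply be ignored.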

New Contributors

  • @hglee made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4271
  • @Zopolis4 made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4272
  • @Conan-Kudo made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4319
  • @hleft made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4346

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.4.1...5.5.0

- C++
Published by stweil over 1 year ago

tesseract-ocr - 5.4.1

What's Changed

This release fixes a regression with legacy or mixed models (issue #4257).

  • Avoid FP overflow in NormEvidenceOf (fixes issue #4257) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4259
  • Update deprecated Node.js 16 GitHub actions by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4262
  • Fix code style issues which were reported by Codacy by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4263
  • Fix some issues which were reported by Codacy by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4266
  • Fix more Codacy issues by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4267
  • Several build fixes by @zdenop

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.4.0...5.4.1

- C++
Published by stweil over 1 year ago

tesseract-ocr - 5.4.0

What's Changed

This release provides an improved PDF renderer, adds a new PAGE XML renderer, extends the API to retrieve the text angle/gradient and has lots of smaller updates for code and documentation:

  • Update appveyor.yml - Url has changed by @softwaretirol in https://github.com/tesseract-ocr/tesseract/pull/4188
  • Fix grey result of indexed PNG in pdfrenderer. by @sjbronner in https://github.com/tesseract-ocr/tesseract/pull/4189
  • Fix some typos by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4191
  • normstrngs: add more hyphens and quotes by @bertsky in https://github.com/tesseract-ocr/tesseract/pull/4195
  • Rename frk -> deu_latf (ISO 639-3, ISO 15924) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4202
  • Fix some performance issues which were reported by Coverity Scan by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4204
  • Remove broken Dockerfile by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4205
  • PAGE XML renderer / export by @JKamlah in https://github.com/tesseract-ocr/tesseract/pull/4214
  • Remove unsupported OpenCL code and related API functions by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4220
  • facilitate vectorization for generic build by @heshpdx in https://github.com/tesseract-ocr/tesseract/pull/4223
  • Support training without lstmf files by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4215
  • Simplify GridSearch<...> variables using typedef equivalents by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4226
  • Use std::min and std::max for min & max operations in makerow.cpp::most_overlapping_row() by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4229
  • Fix a few typos in comments by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4227
  • Remove an unused variable in paragraphs.cpp::DetectParagraphs() by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4228
  • A few refactors in some files by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4225
  • Fix output and issues reported by Coverity Scan for PAGE XML renderer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4234
  • Update documentation by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4235
  • Fix some issues which were reported by GitHub code scanning by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4236
  • Improve CCUtil::main_setup (fixes issue #4230) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4239
  • Allow for text angle/gradient to be retrieved by @Balearica in https://github.com/tesseract-ocr/tesseract/pull/4070
  • Fix setup of datadir on installations with Conda (issue #4230) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4240
  • Fix FP exception in Wordrec::angle_change (issue #4242) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4243
  • Use AM_CPPFLAGS also for compilation of all sources by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4244
  • Fix some compiler warnings by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4245
  • Remove unused xmlns:xlink from ALTO renderer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4241
  • Fix some compiler warnings by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4246
  • Fixes #4247: remove unnecessary nullptr checks by @hribz in https://github.com/tesseract-ocr/tesseract/pull/4248
  • Avoid redundant conversion from std::string to char * to std::string by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4249
  • Replace strcpy and strncpy by new inline helper function by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4250
  • Make function Network::spec pure virtual by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4253
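
The item above about replacing strcpy and strncpy by an inline helper reflects a common pattern: a bounded copy that always null-terminates. The following is a sketch of that pattern only; the name copy_string and its signature are assumptions, and the helper actually added in the pull request may differ.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical bounded-copy helper (illustrative, not Tesseract's code):
// copies at most N-1 characters into a fixed-size array and always writes
// a terminating '\0', truncating the source if it does not fit.
template <std::size_t N>
inline void copy_string(char (&dst)[N], const char *src) {
  static_assert(N > 0, "destination must hold at least the terminator");
  std::size_t len = std::strlen(src);
  if (len >= N) {
    len = N - 1;  // truncate so the terminator still fits
  }
  std::memcpy(dst, src, len);
  dst[len] = '\0';
}
```

Deducing the destination size from the array type removes the separate length argument that strncpy callers often get wrong.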

New Contributors

  • @softwaretirol made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4188
  • @sjbronner made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4189
  • @JKamlah made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4214
  • @heshpdx made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4223
  • @Balearica made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4070
  • @hribz made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4248

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.4...5.4.0

- C++
Published by stweil over 1 year ago

tesseract-ocr - 5.4.0-rc2

What's Changed

  • Fix setup of datadir on installations with Conda (issue #4230) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4240
  • Fix FP exception in Wordrec::angle_change (issue #4242) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4243
  • Update sw.yml

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.4.0-rc1...5.4.0-rc2

- C++
Published by stweil almost 2 years ago

tesseract-ocr - 5.4.0-rc1

What's Changed

This release provides an improved PDF renderer, adds a new PAGE XML renderer, extends the API to retrieve the text angle/gradient and has lots of smaller updates for code and documentation:

  • Update appveyor.yml - Url has changed by @softwaretirol in https://github.com/tesseract-ocr/tesseract/pull/4188
  • Fix grey result of indexed PNG in pdfrenderer. by @sjbronner in https://github.com/tesseract-ocr/tesseract/pull/4189
  • Fix some typos by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4191
  • normstrngs: add more hyphens and quotes by @bertsky in https://github.com/tesseract-ocr/tesseract/pull/4195
  • Rename frk -> deu_latf (ISO 639-3, ISO 15924) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4202
  • Fix some performance issues which were reported by Coverity Scan by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4204
  • Remove broken Dockerfile by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4205
  • PAGE XML renderer / export by @JKamlah in https://github.com/tesseract-ocr/tesseract/pull/4214
  • Remove unsupported OpenCL code and related API functions by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4220
  • facilitate vectorization for generic build by @heshpdx in https://github.com/tesseract-ocr/tesseract/pull/4223
  • Support training without lstmf files by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4215
  • Simplify GridSearch<...> variables using typedef equivalents by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4226
  • Use std::min and std::max for min & max operations in makerow.cpp::most_overlapping_row() by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4229
  • Fix a few typos in comments by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4227
  • Remove an unused variable in paragraphs.cpp::DetectParagraphs() by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4228
  • A few refactors in some files by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4225
  • Fix output and issues reported by Coverity Scan for PAGE XML renderer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4234
  • Update documentation by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4235
  • Fix some issues which were reported by GitHub code scanning by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4236
  • Improve CCUtil::main_setup (fixes issue #4230) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4239
  • Allow for text angle/gradient to be retrieved by @Balearica in https://github.com/tesseract-ocr/tesseract/pull/4070

New Contributors

  • @softwaretirol made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4188
  • @sjbronner made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4189
  • @JKamlah made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4214
  • @heshpdx made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4223
  • @Balearica made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4070

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.4...5.4.0-rc1

- C++
Published by stweil almost 2 years ago

tesseract-ocr - 5.3.4

What's Changed

  • Fixes for autoconf, clang and sw builds
  • Send output of combine_tessdata -d to stdout instead of stderr. Fixes #4149 by @tfmorris in https://github.com/tesseract-ocr/tesseract/pull/4150
  • Move bail_out function before libtoolize check by @STMiki in https://github.com/tesseract-ocr/tesseract/pull/4151
  • Improve OCR for an image URL
    • Fail on curl download errors
    • Add new parameter curl_cookiefile for curl_easy_setopt by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4156
    • Set User-Agent: header field in HTTP request for curl downloads
  • Force TCP v4 for socket to ScrollView server. Fixes #3000 by @tfmorris in https://github.com/tesseract-ocr/tesseract/pull/4162
  • Fix some compiler warnings and avoid unnecessary conversions from std::string to char pointer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4174
  • Fix a tiny typo in publictypes.h by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4178
  • Other small improvements for code and documentation.

New Contributors

  • @STMiki made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4151
  • @sadra-barikbin made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4178

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.3...5.3.4

- C++
Published by stweil about 2 years ago

tesseract-ocr - 5.3.3

What's Changed

  • Disable -mfpu=neon for aarch64 by @hesmar in https://github.com/tesseract-ocr/tesseract/pull/4098
  • Fix build without git clone in cloned directory by @pkubaj in https://github.com/tesseract-ocr/tesseract/pull/4099
  • Fix some issues which were reported by Coverity Scan by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4097
  • Update ScrollView.java by @Parryword in https://github.com/tesseract-ocr/tesseract/pull/4103
  • Fix some code comments by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4113
  • Optimize function ImageFind::FindImages by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4114
  • Rename BibTex file to please GitHub by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4115
  • Fix Broken URLs in citations.bib by @kevinunger in https://github.com/tesseract-ocr/tesseract/pull/4118
  • initDSProfile: correct std::vector usage by @stima in https://github.com/tesseract-ocr/tesseract/pull/4124
  • Fix typo in stepblob.h by @eltociear in https://github.com/tesseract-ocr/tesseract/pull/4133
  • Fix regression in layout detection since 5.0.0 (fixes issue #4014) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4136
  • Update ScrollView.java by @Parryword in https://github.com/tesseract-ocr/tesseract/pull/4104
  • Fix loading of sublangs (regression) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4141

New Contributors

  • @hesmar made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4098
  • @Parryword made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4103
  • @kevinunger made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4118
  • @stima made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4124
  • @eltociear made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4133

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.2...5.3.3

- C++
Published by stweil over 2 years ago

tesseract-ocr - 5.3.2

What's Changed

  • fix: Fix snap package building by @brlin-tw in https://github.com/tesseract-ocr/tesseract/pull/4043
  • Support for Sgaw and W Pwo Karen languages in the Myanmar validator. by @ben417 in https://github.com/tesseract-ocr/tesseract/pull/4065
  • Replace bool array by more compact vector by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4067
  • Replace deprecated sprintf by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4068
  • Improve format of logging from lstmtraining by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4066
  • Clean code by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4071
  • Abort with error message if OSD is requested with LSTM-only model by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4073
  • Fix typos by @luzpaz in https://github.com/tesseract-ocr/tesseract/pull/4096

New Contributors

  • @ben417 made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4065
  • @luzpaz made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4096

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.1...5.3.2

- C++
Published by stweil over 2 years ago

tesseract-ocr - 5.3.1

What's Changed

  • Update README.md by @seupedro in https://github.com/tesseract-ocr/tesseract/pull/3992
  • Fix FP division by zero (issue #3995) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3996
  • Fix linkage of icu and pango by @autoantwort in https://github.com/tesseract-ocr/tesseract/pull/4006
  • Fix build with gcc 13 by including by @kraj in https://github.com/tesseract-ocr/tesseract/pull/4009
  • msvc debug: fix wrong lib name in generated pkgconfig file by @autoantwort in https://github.com/tesseract-ocr/tesseract/pull/4008
  • Fix libdir in tesseract.pc from CMake by @ferdnyc in https://github.com/tesseract-ocr/tesseract/pull/4013
  • Replace 'can not' by 'cannot' by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4015
  • Readme: Link to list of supported languages by @tooomm in https://github.com/tesseract-ocr/tesseract/pull/4027
  • Improve the DebugDump output by slightly adjusting the format. by @GerHobbelt in https://github.com/tesseract-ocr/tesseract/pull/4022
  • Fix issue #4010 by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/4041

New Contributors

  • @seupedro made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3992
  • @autoantwort made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4006
  • @kraj made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4009
  • @ferdnyc made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4013
  • @tooomm made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4027

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.0...5.3.1

- C++
Published by stweil almost 3 years ago

tesseract-ocr - 5.3.0

This is a new minor version of Tesseract 5.

What's Changed

  • Fix memory issues in ScrollView::MessageReceiver by @p12tic in https://github.com/tesseract-ocr/tesseract/pull/3872
  • autotools: Add rule for svpaint executable by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3873
  • Replace call of exit function by return statement in main function by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3878
  • Fix the build on CodeQL/Analyze by @arseniy-sonar in https://github.com/tesseract-ocr/tesseract/pull/3888
  • CI: Remove Ubuntu 18.04 by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3902
  • configure.ac: fix build on aarch64_be by @ffontaine in https://github.com/tesseract-ocr/tesseract/pull/3907
  • SW CI: Add paths filter by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3908
  • Create .mailmap by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3910
  • Fix tesseract.pc from cmake to match autotools by @jeroen in https://github.com/tesseract-ocr/tesseract/pull/3930
  • Update README.md by @nicholasz2510 in https://github.com/tesseract-ocr/tesseract/pull/3935
  • Fixed 2 errors by @Gitoffthelawn in https://github.com/tesseract-ocr/tesseract/pull/3938
  • fix issue #3940 - remove colormap before thresholding by @zdenop in https://github.com/tesseract-ocr/tesseract/pull/3942
  • Update upload-artifact action by @rettinghaus in https://github.com/tesseract-ocr/tesseract/pull/3949
  • Update checkout action to version 3 by @rettinghaus in https://github.com/tesseract-ocr/tesseract/pull/3948
  • Fix Markdownlint by @Saibamen in https://github.com/tesseract-ocr/tesseract/pull/3950
  • Fix broken links in CONTRIBUTING.md by @doraeric in https://github.com/tesseract-ocr/tesseract/pull/3951
  • pdfrenderer.cpp: Ignore non-text blocks by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3959
  • lstm.train: allow .box from .raw.png too by @bertsky in https://github.com/tesseract-ocr/tesseract/pull/3962
  • Fix a number of performance issues (reported by Coverity Scan) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3967
  • Fix training tools for legacy engine (issue #3925) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3970
  • Fix function tesseract::WriteFeature (issue #3925) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3972
  • Modernize function ObjectCache::DeleteUnusedObjects (fix issue with s… by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3978
  • More fixes for issue #3925 by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3977

New Contributors

  • @p12tic made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3872
  • @arseniy-sonar made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3888
  • @nicholasz2510 made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3935
  • @rettinghaus made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3949
  • @Saibamen made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3950
  • @doraeric made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3951

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.2.0...5.3.0

- C++
Published by stweil about 3 years ago

tesseract-ocr - 5.3.0-rc1

What's Changed

  • Fix memory issues in ScrollView::MessageReceiver by @p12tic in https://github.com/tesseract-ocr/tesseract/pull/3872
  • autotools: Add rule for svpaint executable by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3873
  • Replace call of exit function by return statement in main function by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3878
  • Fix the build on CodeQL/Analyze by @arseniy-sonar in https://github.com/tesseract-ocr/tesseract/pull/3888
  • CI: Remove Ubuntu 18.04 by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3902
  • configure.ac: fix build on aarch64_be by @ffontaine in https://github.com/tesseract-ocr/tesseract/pull/3907
  • SW CI: Add paths filter by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3908
  • Create .mailmap by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3910
  • Fix tesseract.pc from cmake to match autotools by @jeroen in https://github.com/tesseract-ocr/tesseract/pull/3930
  • Update README.md by @nicholasz2510 in https://github.com/tesseract-ocr/tesseract/pull/3935
  • Fixed 2 errors by @Gitoffthelawn in https://github.com/tesseract-ocr/tesseract/pull/3938
  • fix issue #3940 - remove colormap before thresholding by @zdenop in https://github.com/tesseract-ocr/tesseract/pull/3942
  • Update upload-artifact action by @rettinghaus in https://github.com/tesseract-ocr/tesseract/pull/3949
  • Update checkout action to version 3 by @rettinghaus in https://github.com/tesseract-ocr/tesseract/pull/3948
  • Fix Markdownlint by @Saibamen in https://github.com/tesseract-ocr/tesseract/pull/3950
  • Fix broken links in CONTRIBUTING.md by @doraeric in https://github.com/tesseract-ocr/tesseract/pull/3951
  • pdfrenderer.cpp: Ignore non-text blocks by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3959
  • lstm.train: allow .box from .raw.png too by @bertsky in https://github.com/tesseract-ocr/tesseract/pull/3962
  • Fix a number of performance issues (reported by Coverity Scan) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3967
  • Fix training tools for legacy engine (issue #3925) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3970
  • Fix function tesseract::WriteFeature (issue #3925) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3972
  • Modernize function ObjectCache::DeleteUnusedObjects (fix issue with s… by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3978
  • More fixes for issue #3925 by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3977

New Contributors

  • @p12tic made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3872
  • @arseniy-sonar made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3888
  • @nicholasz2510 made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3935
  • @rettinghaus made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3949
  • @Saibamen made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3950
  • @doraeric made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3951

Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.2.0...5.3.0-rc1

- C++
Published by stweil about 3 years ago

tesseract-ocr - 5.2.0

This is a new minor version of Tesseract 5.

  • Improvements and fixes for continuous integration, autoconf and cmake builds.
  • Set /Os for some 32 bit MS compilers (fixes #3769).
  • Improve comments and other documentation.
  • Add initial support for Intel AVX512F.
  • Fix for very large PDF files on 32 bit hosts (fixes #3805).
  • Fix NEON detection on FreeBSD.
  • Fix regression with UZN files (fixes #3837).
  • Fix calling delete[] for memory allocated by malloc in C API.
  • Add an API function to init tesseract with traineddata from memory (fixes #3691).
  • Replace direct access to Leptonica internal data structures by function calls and support latest releases of Leptonica.
  • Replace std::regex by std::string functions (fixes issue #3830).
  • Use compiled-in TESSDATA_PREFIX also on Windows (fixes #3767).
  • Add new parameter 'invert_threshold', change the default threshold from 0.5 to 0.7 and mark parameter 'tessedit_do_invert' as deprecated.

See also list of all changes.

- C++
Published by stweil over 3 years ago

tesseract-ocr - 5.1.0

This is a new minor version of Tesseract 5.

  • Handle image and line regions in output formats ALTO, hOCR and text.
  • New parameter curl_timeout for curl_easy_setopt.
  • Build fixes and improvements.
  • Catch nullptr in PageIterator::Orientation to improve robustness.
  • Remove unused code.

See also list of all changes.

- C++
Published by stweil almost 4 years ago

tesseract-ocr - 5.0.1

This is a bug fix release of Tesseract 5.0.

  • Add SPDX-License-Identifier to public include files.
  • Support redirections when running OCR on a URL.
  • Lots of fixes and improvements for cmake builds. Distributions should use the autoconf build.
  • Fix broken msys2 build with gcc 11.
  • Fix parameter certainty_scale (was duplicated).
  • Fix some compiler warnings and clean code.
  • Correctly detect amd64 and i386 on FreeBSD.
  • Add libarchive and libcurl in continuous integration actions.
  • Update submodule googletest to release v1.11.0.

See also list of all changes.

- C++
Published by stweil about 4 years ago

tesseract-ocr - 5.0.0

This is the final stable release of Tesseract 5.0.0.

  • Limit BCER to interval [0,1]
  • Improved build process
  • Cleaned code

See also list of all changes.

- C++
Published by stweil about 4 years ago

tesseract-ocr - 5.0.0-rc3

This is the third release candidate of Tesseract 5.0.0.

  • Improve training messages
  • Add RowAttributes getter to PageIterator

See also list of all changes.

- C++
Published by stweil about 4 years ago

tesseract-ocr - 4.1.3

This is a new stable release of Tesseract 4.1.

  • Fix broken autoconf build (issue #3642)

See also list of all changes.

- C++
Published by stweil over 4 years ago

tesseract-ocr - 4.1.2

This is a new stable release of Tesseract 4.1.

Note: The autoconf build is broken (see issue #3642), so please use 4.1.3.

  • Allow line images with larger width for training
  • Bug fixes
  • Build updates and fixes

See also list of all changes.

- C++
Published by stweil over 4 years ago

tesseract-ocr - 5.0.0-rc2

This is the second release candidate of Tesseract 5.0.0.

  • Fix regression for OCR with more than one model file
  • Bug fixes
  • Optimizations

See also list of all changes.

- C++
Published by stweil over 4 years ago

tesseract-ocr - 5.0.0-rc1

This is the first release candidate of Tesseract 5.0.0.

  • Enable fast float32 LSTM by default
  • Switch to NFC normalisation everywhere
  • Remove banner message
  • Disable music staff detection and removal
  • Add new command line option --loglevel
  • Bug fixes

See also list of all changes.

- C++
Published by stweil over 4 years ago

tesseract-ocr - 5.0.0-beta-20210916

This is a new pre-release of Tesseract 5.0.0.

  • Bug fixes
  • Extend URI support for Tesseract with libcurl
  • Rename processed TIFF output file and add page number if needed

See also list of all changes.

- C++
Published by stweil over 4 years ago

tesseract-ocr - 5.0.0-beta-20210815

This is a new pre-release of Tesseract 5.0.0.

  • Bug fixes
  • Modernize more code
  • More options for binarization
  • Improved support for ARM NEON
  • No longer depends on Abseil for unit tests
  • Support float for model training and text recognition (faster, requires less RAM)

See also list of all changes.

- C++
Published by stweil over 4 years ago

tesseract-ocr - 5.0.0-alpha-20210401

This is a new pre-release of Tesseract 5.0.0.

  • Replaced all remaining STRING by std::string
  • Replaced lots of GenericVector by std::vector
  • Replaced all malloc / free by C++ code
  • Modernized and formatted code

See also list of all changes.

- C++
Published by stweil almost 5 years ago

tesseract-ocr - 5.0.0-alpha-20201231

This is a new pre-release of Tesseract 5.0.0.

This release makes massive changes to the public API, which is a great step towards a final 5.0.0. All unit tests pass, but because of those changes more practical experience is needed.

  • the public API no longer uses the proprietary data types GenericVector and STRING
  • pdf.ttf is no longer needed because it is now integrated into the code

See also list of all changes.

- C++
Published by stweil about 5 years ago

tesseract-ocr - 5.0.0-alpha-20201224

This is a new pre-release of Tesseract 5.0.0.

It is considered to be production ready for end users, but nevertheless not stable because more incompatible API changes are planned.

  • improved performance (also on ARM / ARM64)
  • improved unit tests
  • many fixes
  • faster flat build with automake
  • support for latest macOS (including new M1 processor)

See also list of all changes.

- C++
Published by stweil about 5 years ago

tesseract-ocr - 4.1.1 Release

  • Implemented sw build (cppan is deprecated)
  • Improved cmake build
  • Code cleanup and optimization
  • A lot of bug fixes...

- C++
Published by zdenop about 6 years ago

tesseract-ocr - 4.1.0 Release

  • Added new renderers Alto, LSTMBox, WordStrBox.
  • Added character boxes in hOCR output.
  • Added Python training scripts (experimental) as an alternative to the shell scripts.
  • Better support for AVX / AVX2 / SSE.
  • Disable OpenMP support by default (see e.g. #1171, #1081).
  • Fix for bounding box problem.
  • Implemented support for whitelist/blacklist in LSTM engine.
  • Improved cmake configuration.
  • Code modernization and improvements.
  • A lot of bug fixes...
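
The whitelist/blacklist support mentioned above is typically driven through configuration variables passed with `-c`. The following is a minimal sketch that only assembles the arguments and does not run Tesseract. It assumes the variable names `tessedit_char_whitelist` and `tessedit_char_blacklist`, and the helper name `char_filter_args` is hypothetical.

```python
from typing import List, Optional


def char_filter_args(whitelist: Optional[str] = None,
                     blacklist: Optional[str] = None) -> List[str]:
    """Build '-c VAR=VALUE' arguments for character whitelisting/blacklisting."""
    args: List[str] = []
    if whitelist:
        # Restrict recognition to the given characters.
        args += ["-c", f"tessedit_char_whitelist={whitelist}"]
    if blacklist:
        # Exclude the given characters from recognition.
        args += ["-c", f"tessedit_char_blacklist={blacklist}"]
    return args


# Example: recognise digits only.
digits_only = char_filter_args(whitelist="0123456789")
print(digits_only)
```

The resulting list can be appended to a `tesseract` command line, for example when calling it through `subprocess.run`.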

A detailed changelog is on the wiki.

Windows installer can be downloaded from https://github.com/UB-Mannheim/tesseract/wiki.

- C++
Published by zdenop over 6 years ago

tesseract-ocr - 4.0.0 Release

Detailed release notes, changelog, and documentation can be found in the project wiki.

Windows installer can be downloaded from https://github.com/UB-Mannheim/tesseract/wiki.

- C++
Published by zdenop over 7 years ago

tesseract-ocr - 3.05.02 Release

Bug fix release

- C++
Published by zdenop over 7 years ago

tesseract-ocr - 3.05.01 Release

Bug fix release

- C++
Published by zdenop over 8 years ago

tesseract-ocr - 3.05.00 Release

  • Fine-tuned the hOCR output.
  • Added TSV as another optional output format.
  • Fixed ABI break introduced in 3.04.00 with the AnalyseLayout() method.
  • text2image tool - Enable all OpenType ligatures available in a font. This feature requires Pango 1.38 or newer.
  • Training tools - Replaced asserts with tprintf() and exit(1).
  • Fixed Cygwin compatibility.
  • Improved multipage TIFF processing.
  • Improved the embedded PDF font (pdf.ttf).
  • Enabled selection of OCR engine mode from command line.
  • Changed tesseract command line parameter '-psm' to '--psm'.
  • Added new C API for orientation and script detection, removed the old one.
  • Increased minimum autoconf version to 2.59.
  • Removed dead code.
  • Fixed many compiler warnings.
  • Fixed memory and resource leaks.
  • Fixed some issues with the 'Cube' OCR engine.
  • Fixed some OpenCL issues.
  • Added option to build Tesseract with CMake build system.
  • Implemented CPPAN support for easy Windows building.
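
To illustrate the renamed '--psm' parameter and the engine mode selection listed above, here is a small sketch that assembles a command line without executing it. It assumes the engine mode option is spelled `--oem`, as in current releases. The helper name `build_tesseract_command` and the file names are hypothetical.

```python
from typing import List


def build_tesseract_command(image: str, output_base: str,
                            psm: int, oem: int) -> List[str]:
    """Assemble a tesseract invocation using the double-dash option names."""
    return [
        "tesseract", image, output_base,
        "--psm", str(psm),  # page segmentation mode (formerly '-psm')
        "--oem", str(oem),  # OCR engine mode
    ]


# Hypothetical input image and output base name.
command = build_tesseract_command("page.png", "page", psm=6, oem=1)
print(" ".join(command))
```

Scripts that still pass the old single-dash '-psm' spelling need to be updated to the double-dash form.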

- C++
Published by zdenop about 9 years ago

tesseract-ocr - 3.04.01 release

Bug fix release for version 3.04

- C++
Published by zdenop about 10 years ago

tesseract-ocr - 3.04.00 release

  • Added OpenCL support (experimental)
  • Many bug fixes

- C++
Published by zdenop over 10 years ago