Recent Releases of tesseract-ocr
tesseract-ocr - 5.5.1
What's Changed
- Fix linear congruential random number generator by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4357
- Make list classes templated by @egorpugin in https://github.com/tesseract-ocr/tesseract/pull/4356
- add cli -c parameter(s) to init vectors by @zdenop in https://github.com/tesseract-ocr/tesseract/pull/4363
- handle colormaps correctly by @zdenop in https://github.com/tesseract-ocr/tesseract/pull/4369
- use constexpr for kDawgMagicNumber by @zdenop in https://github.com/tesseract-ocr/tesseract/pull/4378
- Extend elf_aux_info() support for RISC-V on FreeBSD and OpenBSD by @brad0 in https://github.com/tesseract-ocr/tesseract/pull/4376
- Fix building elf_aux_info() support on OpenBSD/arm by @brad0 in https://github.com/tesseract-ocr/tesseract/pull/4383
- Fix invalid empty interval in punct_stripped() for all-punctuation words by @EnodoGH in https://github.com/tesseract-ocr/tesseract/pull/4404
- Fix function addAvailableLanguages (issue #4416) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4417
- Avoid error in pixSauvolaBinarizeTiled (issue #4390) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4418
- Fix duplicated IDs in ALTO XML when multiple pages are present by @jankal in https://github.com/tesseract-ocr/tesseract/pull/4386
- Remove unused include statements for tprintf.h by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4419
- Use links to the git history and online release notes instead of local ChangeLog by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4423
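For readers unfamiliar with the term in the first item: a linear congruential generator derives each value from the previous one as x(n+1) = (a * x(n) + c) mod m. The sketch below is a generic illustration only, not the Tesseract code changed in that pull request. The class name Lcg and the constants (the widely published 64-bit multiplier and increment from Knuth's MMIX generator) are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// Generic 64-bit linear congruential generator (illustrative only; this is
// NOT the Tesseract implementation referenced in the release notes).
// Recurrence: state = a * state + c (mod 2^64), via unsigned wrap-around.
class Lcg {
 public:
  explicit Lcg(std::uint64_t seed) : state_(seed) {}

  // Advances the state and returns its upper 32 bits. With a power-of-two
  // modulus the low bits have short periods, so the high bits are used.
  std::uint32_t Next() {
    state_ = state_ * kMultiplier + kIncrement;  // wraps modulo 2^64
    return static_cast<std::uint32_t>(state_ >> 32);
  }

  // Returns a double in the half-open interval [0, 1).
  double NextUnit() {
    const double u = Next() / 4294967296.0;  // divide by 2^32
    assert(u >= 0.0 && u < 1.0);
    return u;
  }

 private:
  // Constants assumed for illustration (Knuth's MMIX generator).
  static constexpr std::uint64_t kMultiplier = 6364136223846793005ULL;
  static constexpr std::uint64_t kIncrement = 1442695040888963407ULL;
  std::uint64_t state_;
};
```

Two generators constructed with the same seed produce the same sequence, which is what makes this kind of generator useful for reproducible tests.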
New Contributors
- @brad0 made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4376
- @EnodoGH made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4404
- @jankal made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4386
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.5.0...5.5.1
- C++
Published by stweil 9 months ago
tesseract-ocr - 5.5.0
What's Changed
- Fix TARGET_PDB_FILE error for static linking. by @hglee in https://github.com/tesseract-ocr/tesseract/pull/4271
- Make regular usage of CMAKE_INSTALL_LIBDIR and GNUInstallDirs by @Zopolis4 in https://github.com/tesseract-ocr/tesseract/pull/4272
- Ignore illegal TESSDATA_PREFIX (not existing filesystem entry, issue #4277) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4278
- Fix confidence output for the PAGE XML renderer by @JKamlah in https://github.com/tesseract-ocr/tesseract/pull/4283
- Set hOCR capabilities ocrp_dir and ocrp_lang unconditionally by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4301
- Reduce clock syscalls by @heshpdx in https://github.com/tesseract-ocr/tesseract/pull/4303
- Calculate row bounding box in single-word mode per #4304 by @Balearica in https://github.com/tesseract-ocr/tesseract/pull/4305
- Replace access/_access by std::filesystem::exists by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4307
- Modernize code for list of available models by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4308
- Fix performance and other issues reported by Codacy by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4309
- Remove unnecessary assignment and assertions by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4313
- Update code for tprintf by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4306
- Add C++ stream for log messages and use it in two debug messages by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4314
- cmake: Correctly set the soversion based on SemVer properties by @Conan-Kudo in https://github.com/tesseract-ocr/tesseract/pull/4319
- Replace deprecated runner macos-12 by macos-latest in GitHub actions by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4326
- Modernize code for renderers and remove filename conversion for Windows by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4330
- Fix some typos and grammar issues by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4337
- Add GitHub action and Makefile target for Windows installer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4341
- Support symbolic values for --oem and --psm options by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4344
- Replace some tprintf by tesserr stream (fixes Windows compiler warnings) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4345
- Add RISC-V V support by @hleft in https://github.com/tesseract-ocr/tesseract/pull/4346
- Fix and improve Windows installer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4348
- Remove Tensorflow support by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4350
- Update submodule googletest to release v1.15.2 by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4352
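Two of the items above concern checking whether a path exists, including a TESSDATA_PREFIX that points to a non-existing filesystem entry. The following is a minimal sketch of such a check with std::filesystem::exists (C++17). The helper name UsableDataDir and its fallback behaviour are assumptions made for illustration; they are not taken from the pull requests.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper: returns `candidate` if it names an existing
// filesystem entry, otherwise returns `fallback`.
// An empty candidate (for example an unset environment variable) is rejected.
std::string UsableDataDir(const std::string &candidate,
                          const std::string &fallback) {
  // Assumption of this sketch: the fallback is a non-empty built-in default.
  assert(!fallback.empty());
  std::error_code ec;  // non-throwing overload: an error counts as "missing"
  if (!candidate.empty() && std::filesystem::exists(candidate, ec)) {
    return candidate;
  }
  return fallback;
}
```

A caller could pass the value of std::getenv("TESSDATA_PREFIX"), after checking it for a null pointer, as the candidate.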
New Contributors
- @hglee made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4271
- @Zopolis4 made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4272
- @Conan-Kudo made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4319
- @hleft made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4346
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.4.1...5.5.0
Published by stweil over 1 year ago
tesseract-ocr - 5.4.1
What's Changed
This release fixes a regression with legacy or mixed models (issue #4257).
- Avoid FP overflow in NormEvidenceOf (fixes issue #4257) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4259
- Update deprecated Node.js 16 GitHub actions by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4262
- Fix code style issues which were reported by Codacy by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4263
- Fix some issues which were reported by Codacy by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4266
- Fix more Codacy issues by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4267
- Several build fixes by @zdenop
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.4.0...5.4.1
Published by stweil over 1 year ago
tesseract-ocr - 5.4.0
What's Changed
This release provides an improved PDF renderer, adds a new PAGE XML renderer, extends the API to retrieve the text angle/gradient and has lots of smaller updates for code and documentation:
- Update appveyor.yml - Url has changed by @softwaretirol in https://github.com/tesseract-ocr/tesseract/pull/4188
- Fix grey result of indexed PNG in pdfrenderer. by @sjbronner in https://github.com/tesseract-ocr/tesseract/pull/4189
- Fix some typos by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4191
- normstrngs: add more hyphens and quotes by @bertsky in https://github.com/tesseract-ocr/tesseract/pull/4195
- Rename frk -> deu_latf (ISO 639-3, ISO 15924) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4202
- Fix some performance issues which were reported by Coverity Scan by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4204
- Remove broken Dockerfile by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4205
- PAGE XML renderer / export by @JKamlah in https://github.com/tesseract-ocr/tesseract/pull/4214
- Remove unsupported OpenCL code and related API functions by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4220
- facilitate vectorization for generic build by @heshpdx in https://github.com/tesseract-ocr/tesseract/pull/4223
- Support training without lstmf files by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4215
- Simplify GridSearch<...> variables using typedef equivalents by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4226
- Use std::min and std::max for min & max operations in makerow.cpp::most_overlapping_row() by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4229
- Fix a few typos in comments by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4227
- Remove an unused variable in paragraphs.cpp::DetectParagraphs() by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4228
- A few refactors in some files by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4225
- Fix output and issues reported by Coverity Scan for PAGE XML renderer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4234
- Update documentation by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4235
- Fix some issues which were reported by GitHub code scanning by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4236
- Improve CCUtil::main_setup (fixes issue #4230) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4239
- Allow for text angle/gradient to be retrieved by @Balearica in https://github.com/tesseract-ocr/tesseract/pull/4070
- Fix setup of datadir on installations with Conda (issue #4230) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4240
- Fix FP exception in Wordrec::angle_change (issue #4242) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4243
- Use AM_CPPFLAGS also for compilation of all sources by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4244
- Fix some compiler warnings by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4245
- Remove unused xmlns:xlink from ALTO renderer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4241
- Fix some compiler warnings by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4246
- Fixes #4247: remove unnecessary nullptr checks by @hribz in https://github.com/tesseract-ocr/tesseract/pull/4248
- Avoid redundant conversion from std::string to char * to std::string by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4249
- Replace strcpy and strncpy by new inline helper function by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4250
- Make function Network::spec pure virtual by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4253
New Contributors
- @softwaretirol made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4188
- @sjbronner made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4189
- @JKamlah made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4214
- @heshpdx made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4223
- @Balearica made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4070
- @hribz made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4248
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.4...5.4.0
Published by stweil over 1 year ago
tesseract-ocr - 5.4.0-rc2
What's Changed
- Fix setup of datadir on installations with Conda (issue #4230) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4240
- Fix FP exception in Wordrec::angle_change (issue #4242) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4243
- Update sw.yml
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.4.0-rc1...5.4.0-rc2
Published by stweil almost 2 years ago
tesseract-ocr - 5.4.0-rc1
What's Changed
This release provides an improved PDF renderer, adds a new PAGE XML renderer, extends the API to retrieve the text angle/gradient and has lots of smaller updates for code and documentation:
- Update appveyor.yml - Url has changed by @softwaretirol in https://github.com/tesseract-ocr/tesseract/pull/4188
- Fix grey result of indexed PNG in pdfrenderer. by @sjbronner in https://github.com/tesseract-ocr/tesseract/pull/4189
- Fix some typos by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4191
- normstrngs: add more hyphens and quotes by @bertsky in https://github.com/tesseract-ocr/tesseract/pull/4195
- Rename frk -> deu_latf (ISO 639-3, ISO 15924) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4202
- Fix some performance issues which were reported by Coverity Scan by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4204
- Remove broken Dockerfile by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4205
- PAGE XML renderer / export by @JKamlah in https://github.com/tesseract-ocr/tesseract/pull/4214
- Remove unsupported OpenCL code and related API functions by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4220
- facilitate vectorization for generic build by @heshpdx in https://github.com/tesseract-ocr/tesseract/pull/4223
- Support training without lstmf files by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4215
- Simplify GridSearch<...> variables using typedef equivalents by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4226
- Use std::min and std::max for min & max operations in makerow.cpp::most_overlapping_row() by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4229
- Fix a few typos in comments by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4227
- Remove an unused variable in paragraphs.cpp::DetectParagraphs() by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4228
- A few refactors in some files by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4225
- Fix output and issues reported by Coverity Scan for PAGE XML renderer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4234
- Update documentation by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4235
- Fix some issues which were reported by GitHub code scanning by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4236
- Improve CCUtil::main_setup (fixes issue #4230) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4239
- Allow for text angle/gradient to be retrieved by @Balearica in https://github.com/tesseract-ocr/tesseract/pull/4070
New Contributors
- @softwaretirol made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4188
- @sjbronner made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4189
- @JKamlah made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4214
- @heshpdx made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4223
- @Balearica made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4070
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.4...5.4.0-rc1
Published by stweil almost 2 years ago
tesseract-ocr - 5.3.4
What's Changed
- Fixes for autoconf, clang and sw builds
- Send output of combine_tessdata -d to stdout instead of stderr. Fixes #4149 by @tfmorris in https://github.com/tesseract-ocr/tesseract/pull/4150
- Move bail_out function before libtoolize check by @STMiki in https://github.com/tesseract-ocr/tesseract/pull/4151
- Improve OCR for an image URL
- Fail on curl download errors
- Add new parameter curl_cookiefile for curl_easy_setopt by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4156
- Set User-Agent: header field in HTTP request for curl downloads
- Force TCP v4 for socket to ScrollView server. Fixes #3000 by @tfmorris in https://github.com/tesseract-ocr/tesseract/pull/4162
- Fix some compiler warnings and avoid unnecessary conversions from std::string to char pointer by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4174
- Fix a tiny typo in publictypes.h by @sadra-barikbin in https://github.com/tesseract-ocr/tesseract/pull/4178
- Fixes for autoconf, clang and sw builds
- Other small improvements for code and documentation.
New Contributors
- @STMiki made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4151
- @sadra-barikbin made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4178
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.3...5.3.4
Published by stweil about 2 years ago
tesseract-ocr - 5.3.3
What's Changed
- Disable -mfpu=neon for aarch64 by @hesmar in https://github.com/tesseract-ocr/tesseract/pull/4098
- Fix build without git clone in cloned directory by @pkubaj in https://github.com/tesseract-ocr/tesseract/pull/4099
- Fix some issues which were reported by Coverity Scan by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4097
- Update ScrollView.java by @Parryword in https://github.com/tesseract-ocr/tesseract/pull/4103
- Fix some code comments by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4113
- Optimize function ImageFind::FindImages by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4114
- Rename BibTex file to please GitHub by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4115
- Fix Broken URLs in citations.bib by @kevinunger in https://github.com/tesseract-ocr/tesseract/pull/4118
- initDSProfile: correct std::vector usage by @stima in https://github.com/tesseract-ocr/tesseract/pull/4124
- Fix typo in stepblob.h by @eltociear in https://github.com/tesseract-ocr/tesseract/pull/4133
- Fix regression in layout detection since 5.0.0 (fixes issue #4014) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4136
- Update ScrollView.java by @Parryword in https://github.com/tesseract-ocr/tesseract/pull/4104
- Fix loading of sublangs (regression) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4141
New Contributors
- @hesmar made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4098
- @Parryword made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4103
- @kevinunger made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4118
- @stima made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4124
- @eltociear made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4133
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.2...5.3.3
Published by stweil over 2 years ago
tesseract-ocr - 5.3.2
What's Changed
- fix: Fix snap package building by @brlin-tw in https://github.com/tesseract-ocr/tesseract/pull/4043
- Support for Sgaw and W Pwo Karen languages in the Myanmar validator. by @ben417 in https://github.com/tesseract-ocr/tesseract/pull/4065
- Replace bool array by more compact vector by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4067
- Replace deprecated sprintf by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4068
- Improve format of logging from lstmtraining by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4066
- Clean code by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4071
- Abort with error message if OSD is requested with LSTM-only model by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4073
- Fix typos by @luzpaz in https://github.com/tesseract-ocr/tesseract/pull/4096
New Contributors
- @ben417 made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4065
- @luzpaz made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4096
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.1...5.3.2
Published by stweil over 2 years ago
tesseract-ocr - 5.3.1
What's Changed
- Update README.md by @seupedro in https://github.com/tesseract-ocr/tesseract/pull/3992
- Fix FP division by zero (issue #3995) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3996
- Fix linkage of icu and pango by @autoantwort in https://github.com/tesseract-ocr/tesseract/pull/4006
- Fix build with gcc 13 by including by @kraj in https://github.com/tesseract-ocr/tesseract/pull/4009
- msvc debug: fix wrong lib name in generated pkgconfig file by @autoantwort in https://github.com/tesseract-ocr/tesseract/pull/4008
- Fix libdir in tesseract.pc from CMake by @ferdnyc in https://github.com/tesseract-ocr/tesseract/pull/4013
- Replace 'can not' by 'cannot' by @stweil in https://github.com/tesseract-ocr/tesseract/pull/4015
- Readme: Link to list of supported languages by @tooomm in https://github.com/tesseract-ocr/tesseract/pull/4027
- Improve the DebugDump output by slightly adjusting the format. by @GerHobbelt in https://github.com/tesseract-ocr/tesseract/pull/4022
- Fix issue #4010 by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/4041
New Contributors
- @seupedro made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3992
- @autoantwort made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4006
- @kraj made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4009
- @ferdnyc made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4013
- @tooomm made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/4027
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.3.0...5.3.1
Published by stweil almost 3 years ago
tesseract-ocr - 5.3.0
This is a new minor version of Tesseract 5.
What's Changed
- Fix memory issues in ScrollView::MessageReceiver by @p12tic in https://github.com/tesseract-ocr/tesseract/pull/3872
- autotools: Add rule for svpaint executable by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3873
- Replace call of exit function by return statement in main function by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3878
- Fix the build on CodeQL/Analyze by @arseniy-sonar in https://github.com/tesseract-ocr/tesseract/pull/3888
- CI: Remove Ubuntu 18.04 by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3902
- configure.ac: fix build on aarch64_be by @ffontaine in https://github.com/tesseract-ocr/tesseract/pull/3907
- SW CI: Add paths filter by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3908
- Create .mailmap by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3910
- Fix tesseract.pc from cmake to match autotools by @jeroen in https://github.com/tesseract-ocr/tesseract/pull/3930
- Update README.md by @nicholasz2510 in https://github.com/tesseract-ocr/tesseract/pull/3935
- Fixed 2 errors by @Gitoffthelawn in https://github.com/tesseract-ocr/tesseract/pull/3938
- fix issue #3940 - remove colormap before thresholding by @zdenop in https://github.com/tesseract-ocr/tesseract/pull/3942
- Update upload-artifact action by @rettinghaus in https://github.com/tesseract-ocr/tesseract/pull/3949
- Update checkout action to version 3 by @rettinghaus in https://github.com/tesseract-ocr/tesseract/pull/3948
- Fix Markdownlint by @Saibamen in https://github.com/tesseract-ocr/tesseract/pull/3950
- Fix broken links in CONTRIBUTING.md by @doraeric in https://github.com/tesseract-ocr/tesseract/pull/3951
- pdfrenderer.cpp: Ignore non-text blocks by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3959
- lstm.train: allow .box from .raw.png too by @bertsky in https://github.com/tesseract-ocr/tesseract/pull/3962
- Fix a number of performance issues (reported by Coverity Scan) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3967
- Fix training tools for legacy engine (issue #3925) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3970
- Fix function tesseract::WriteFeature (issue #3925) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3972
- Modernize function ObjectCache::DeleteUnusedObjects (fix issue with s… by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3978
- More fixes for issue #3925 by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3977
New Contributors
- @p12tic made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3872
- @arseniy-sonar made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3888
- @nicholasz2510 made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3935
- @rettinghaus made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3949
- @Saibamen made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3950
- @doraeric made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3951
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.2.0...5.3.0
Published by stweil about 3 years ago
tesseract-ocr - 5.3.0-rc1
What's Changed
- Fix memory issues in ScrollView::MessageReceiver by @p12tic in https://github.com/tesseract-ocr/tesseract/pull/3872
- autotools: Add rule for svpaint executable by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3873
- Replace call of exit function by return statement in main function by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3878
- Fix the build on CodeQL/Analyze by @arseniy-sonar in https://github.com/tesseract-ocr/tesseract/pull/3888
- CI: Remove Ubuntu 18.04 by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3902
- configure.ac: fix build on aarch64_be by @ffontaine in https://github.com/tesseract-ocr/tesseract/pull/3907
- SW CI: Add paths filter by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3908
- Create .mailmap by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3910
- Fix tesseract.pc from cmake to match autotools by @jeroen in https://github.com/tesseract-ocr/tesseract/pull/3930
- Update README.md by @nicholasz2510 in https://github.com/tesseract-ocr/tesseract/pull/3935
- Fixed 2 errors by @Gitoffthelawn in https://github.com/tesseract-ocr/tesseract/pull/3938
- fix issue #3940 - remove colormap before thresholding by @zdenop in https://github.com/tesseract-ocr/tesseract/pull/3942
- Update upload-artifact action by @rettinghaus in https://github.com/tesseract-ocr/tesseract/pull/3949
- Update checkout action to version 3 by @rettinghaus in https://github.com/tesseract-ocr/tesseract/pull/3948
- Fix Markdownlint by @Saibamen in https://github.com/tesseract-ocr/tesseract/pull/3950
- Fix broken links in CONTRIBUTING.md by @doraeric in https://github.com/tesseract-ocr/tesseract/pull/3951
- pdfrenderer.cpp: Ignore non-text blocks by @amitdo in https://github.com/tesseract-ocr/tesseract/pull/3959
- lstm.train: allow .box from .raw.png too by @bertsky in https://github.com/tesseract-ocr/tesseract/pull/3962
- Fix a number of performance issues (reported by Coverity Scan) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3967
- Fix training tools for legacy engine (issue #3925) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3970
- Fix function tesseract::WriteFeature (issue #3925) by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3972
- Modernize function ObjectCache::DeleteUnusedObjects (fix issue with s… by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3978
- More fixes for issue #3925 by @stweil in https://github.com/tesseract-ocr/tesseract/pull/3977
New Contributors
- @p12tic made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3872
- @arseniy-sonar made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3888
- @nicholasz2510 made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3935
- @rettinghaus made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3949
- @Saibamen made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3950
- @doraeric made their first contribution in https://github.com/tesseract-ocr/tesseract/pull/3951
Full Changelog: https://github.com/tesseract-ocr/tesseract/compare/5.2.0...5.3.0-rc1
Published by stweil about 3 years ago
tesseract-ocr - 5.2.0
This is a new minor version of Tesseract 5.
- Improvements and fixes for continuous integration, autoconf and cmake builds.
- Set /Os for some 32 bit MS compilers (fixes #3769).
- Improve comments and other documentation.
- Add initial support for Intel AVX512F.
- Fix for very large PDF files on 32 bit hosts (fixes #3805).
- Fix NEON detection on FreeBSD.
- Fix regression with UZN files (fixes #3837).
- Fix calling delete[] for memory allocated by malloc in C API.
- Add an API function to init tesseract with traineddata from memory (fixes #3691).
- Replace direct access to Leptonica internal data structures by function calls and support latest releases of Leptonica.
- Replace std::regex by std::string functions (fixes issue #3830).
- Use compiled-in TESSDATA_PREFIX also on Windows (fixes #3767).
- Add new parameter 'invert_threshold', change the default threshold from 0.5 to 0.7 and mark parameter 'tessedit_do_invert' as deprecated.
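As an illustration of the std::regex item above: simple patterns can often be matched with plain std::string operations. The sketch below is a generic example, not the change made in Tesseract. The function name HasNumericSuffix and the pattern it stands in for are assumptions, and the equivalence with the regular expression is only claimed for strings without line breaks.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical example: returns true if `name` ends with a dot followed by
// one or more ASCII digits, i.e. the strings that the regular expression
//   ^.*\.[0-9]+$
// would match, but implemented without std::regex.
bool HasNumericSuffix(const std::string &name) {
  const std::string::size_type dot = name.rfind('.');
  if (dot == std::string::npos || dot + 1 == name.size()) {
    return false;  // no dot at all, or nothing after the last dot
  }
  for (std::string::size_type i = dot + 1; i < name.size(); ++i) {
    if (!std::isdigit(static_cast<unsigned char>(name[i]))) {
      return false;  // a non-digit after the last dot
    }
  }
  return true;
}

// Small self-check showing the intended behaviour of the sketch.
void CheckHasNumericSuffix() {
  assert(HasNumericSuffix("page.12"));
  assert(!HasNumericSuffix("page.12a"));
  assert(!HasNumericSuffix("page."));
}
```

Only the last dot needs to be inspected, because a purely numeric suffix cannot itself contain a dot.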
See also list of all changes.
Published by stweil over 3 years ago
tesseract-ocr - 5.1.0
This is a new minor version of Tesseract 5.
- Handle image and line regions in output formats ALTO, hOCR and text.
- New parameter curl_timeout for curl_easy_setopt.
- Build fixes and improvements.
- Catch nullptr in PageIterator::Orientation to improve robustness.
- Remove unused code.
See also list of all changes.
Published by stweil almost 4 years ago
tesseract-ocr - 5.0.1
This is a bug fix release of Tesseract 5.0.
- Add SPDX-License-Identifier to public include files.
- Support redirections when running OCR on a URL.
- Lots of fixes and improvements for cmake builds. Distributions should use the autoconf build.
- Fix broken msys2 build with gcc 11.
- Fix parameter certainty_scale (was duplicated).
- Fix some compiler warnings and clean code.
- Correctly detect amd64 and i386 on FreeBSD.
- Add libarchive and libcurl in continuous integration actions.
- Update submodule googletest to release v1.11.0.
See also list of all changes.
Published by stweil about 4 years ago
tesseract-ocr - 5.0.0
This is the final stable release of Tesseract 5.0.0.
- Limit BCER to interval [0,1]
- Improved build process
- Cleaned code
See also list of all changes.
Published by stweil about 4 years ago
tesseract-ocr - 5.0.0-rc3
This is the third release candidate of Tesseract 5.0.0.
- Improve training messages
- Add RowAttributes getter to PageIterator
See also list of all changes.
Published by stweil about 4 years ago
tesseract-ocr - 4.1.3
This is a new stable release of Tesseract 4.1.
- Fix broken autoconf build (issue #3642)
See also list of all changes.
Published by stweil over 4 years ago
tesseract-ocr - 4.1.2
This is a new stable release of Tesseract 4.1.
Note: The autoconf build is broken (see issue #3642), so please use 4.1.3.
- Allow line images with larger width for training
- Bug fixes
- Build updates and fixes
See also list of all changes.
Published by stweil over 4 years ago
tesseract-ocr - 5.0.0-rc2
This is the second release candidate of Tesseract 5.0.0.
- Fix regression for OCR with more than one model file
- Bug fixes
- Optimizations
See also list of all changes.
Published by stweil over 4 years ago
tesseract-ocr - 5.0.0-rc1
This is the first release candidate of Tesseract 5.0.0.
- Enable fast float32 LSTM by default
- Switch to NFC normalisation everywhere
- Remove banner message
- Disable music staff detection and removal
- Add new command line option --loglevel
- Bug fixes
See also list of all changes.
Published by stweil over 4 years ago
tesseract-ocr - 5.0.0-beta-20210916
This is a new pre-release of Tesseract 5.0.0.
- Bug fixes
- Extend URI support for Tesseract with libcurl
- Rename processed TIFF output file and add page number if needed
See also list of all changes.
Published by stweil over 4 years ago
tesseract-ocr - 5.0.0-beta-20210815
This is a new pre-release of Tesseract 5.0.0.
- Bug fixes
- Modernize more code
- More options for binarization
- Improved support for ARM NEON
- No longer depends on Abseil for unit tests
- Support float for model training and text recognition (faster, requires less RAM)
See also list of all changes.
Published by stweil over 4 years ago
tesseract-ocr - 5.0.0-alpha-20210401
This is a new pre-release of Tesseract 5.0.0.
- Replaced all remaining STRING by std::string
- Replaced lots of GenericVector by std::vector
- Replaced all malloc/free by C++ code
- Modernized and formatted code
See also list of all changes.
Published by stweil almost 5 years ago
tesseract-ocr - 5.0.0-alpha-20201231
This is a new pre-release of Tesseract 5.0.0.
It has massive changes in the public API which is a great step towards a final 5.0.0. All unit tests pass, but because of those changes more practical experience is needed.
- the public API no longer uses proprietary data types GenericVector, STRING
- pdf.ttf is no longer needed because it is now integrated into the code
See also list of all changes.
Published by stweil about 5 years ago
tesseract-ocr - 5.0.0-alpha-20201224
This is a new pre-release of Tesseract 5.0.0.
It is considered to be production ready for end users, but nevertheless not stable because more incompatible API changes are planned.
- improved performance (also on ARM / ARM64)
- improved unit tests
- many fixes
- faster flat build with automake
- support for latest macOS (including new M1 processor)
See also list of all changes.
Published by stweil about 5 years ago
tesseract-ocr - 4.1.1 Release
- Implemented sw build (cppan is deprecated)
- Improved cmake build
- Code cleanup and optimization
- A lot of bug fixes...
Published by zdenop about 6 years ago
tesseract-ocr - 4.1.0 Release
- Added new renderers Alto, LSTMBox, WordStrBox.
- Added character boxes in hOCR output.
- Added Python training scripts (experimental) as an alternative to the shell scripts.
- Better support for AVX / AVX2 / SSE.
- Disable OpenMP support by default (see e.g. #1171, #1081).
- Fix for bounding box problem.
- Implemented support for whitelist/blacklist in LSTM engine.
- Improved cmake configuration.
- Code modernization and improvements.
- A lot of bug fixes...
The detailed changelog is on the wiki.
Windows installer can be downloaded from https://github.com/UB-Mannheim/tesseract/wiki.
Published by zdenop over 6 years ago
tesseract-ocr - 4.0.0 Release
Detailed release notes, changelog and documentation can be found in the project wiki.
Windows installer can be downloaded from https://github.com/UB-Mannheim/tesseract/wiki.
Published by zdenop over 7 years ago
tesseract-ocr - 3.05.00 Release
- Made some fine tuning to the hOCR output.
- Added TSV as another optional output format.
- Fixed ABI break introduced in 3.04.00 with the AnalyseLayout() method.
- text2image tool - Enable all OpenType ligatures available in a font. This feature requires Pango 1.38 or newer.
- Training tools - Replaced asserts with tprintf() and exit(1).
- Fixed Cygwin compatibility.
- Improved multipage tiff processing.
- Improved the embedded pdf font (pdf.ttf).
- Enable selection of OCR engine mode from command line.
- Changed tesseract command line parameter '-psm' to '--psm'.
- Added new C API for orientation and script detection, removed the old one.
- Increased minimum autoconf version to 2.59.
- Removed dead code.
- Fixed many compiler warnings.
- Fixed memory and resource leaks.
- Fixed some issues with the 'Cube' OCR engine.
- Fixed some OpenCL issues.
- Added option to build Tesseract with CMake build system.
- Implemented CPPAN support for easy Windows building.
Published by zdenop about 9 years ago
tesseract-ocr - 3.04.01 release
Bug-fix release of version 3.04.
Published by zdenop about 10 years ago
tesseract-ocr - 3.04.00 release
- Added OpenCL support (experimental)
- Many bug fixes
Published by zdenop over 10 years ago