Recent Releases of ucto

ucto - v0.35

  • require latest ticcutils
  • updated GitHub CI

- C++
Published by kosloot over 1 year ago

ucto - v0.34

[Maarten van Gompel]

  • fall back when local config dir cannot be checked for whatever reason https://github.com/LanguageMachines/ucto/issues/97
  • extract custom configuration directory if provided, and fall back to that for includes https://github.com/LanguageMachines/ucto/issues/96
  • needs ticcutils >= 0.35

[Ko van der Sloot]

  • force use of C++17
  • minor code updates
  • streamlined GitHub CI file
  • adapted some foliatests to recent libfolia versions
  • refactored tests:
    • all shell scripts have the .sh extension now
    • use folialint or foliadiff to check folia results

- C++
Published by kosloot over 1 year ago

ucto - v0.33

  • added a batch mode: https://github.com/LanguageMachines/ucto/issues/94
  • improved handling of NonSpacing markers.
  • adapted some tests, based on the newest uctodata package (notably, French was not correctly implemented)

- C++
Published by kosloot about 2 years ago

ucto - v0.32.1

  • additional fix for https://github.com/LanguageMachines/ucto/issues/93 spurious BOM markers should be ignored in all cases
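
As a rough illustration of what ignoring spurious BOM markers can look like, here is a minimal Python sketch. It is not ucto's C++ implementation: the function name `strip_boms` is hypothetical, and treating every U+FEFF in the text as removable is an assumption made for this example.

```python
BOM = "\ufeff"  # U+FEFF: the byte order mark as it appears in decoded text


def strip_boms(text: str) -> str:
    """Remove every U+FEFF, whether leading or embedded (assumed policy)."""
    return text.replace(BOM, "")


# A leading BOM and a stray one in the middle are both dropped.
print(strip_boms("\ufeffHello\ufeff world"))  # prints: Hello world
```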

- C++
Published by kosloot about 2 years ago

ucto - v0.32

  • fix for https://github.com/LanguageMachines/ucto/issues/95
  • automagically generate an xml:id when not provided

- C++
Published by kosloot about 2 years ago

ucto - v0.31

  • fixed handling of the rare cases of Unidentifiable Characters. They were ignored, which led to incompatible text elements in FoLiA
  • some small refactoring, rooting out CppCheck warnings

- C++
Published by kosloot over 2 years ago

ucto - v0.30

[Ko van der Sloot]

  • using ticcutils >= 0.34. All Unicode is NFC normalized now
  • normalization performed for passthru too. All output should be in the same encoding (NFC)
  • fixed a problem when using the API from Frog
  • improving code quality
  • added (dangerous, and compile-time only) option to change the magic 'tokconfig-' value.

[Maarten van Gompel]

  • README.md: added demo screencast
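
To make the NFC normalization mentioned above concrete, here is a small Python sketch using the standard `unicodedata` module. It only illustrates what NFC does to equivalent character sequences; ucto itself performs normalization in C++ via ticcutils, and the helper name `to_nfc` is hypothetical.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the text in Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT (2 code points)
composed = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE (1 code point)

# Both spellings render identically but compare unequal until normalized.
print(decomposed == composed)          # prints: False
print(to_nfc(decomposed) == composed)  # prints: True
```

Normalizing all output to one form means downstream tools can compare tokens by code point without worrying about which spelling the input used.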

- C++
Published by kosloot over 2 years ago

ucto - v0.29

  • fixes for https://github.com/proycon/python-ucto/issues/16
  • added a new --copyclass option, (see comments in https://github.com/LanguageMachines/ucto/issues/68)
  • updated man page

- C++
Published by kosloot about 3 years ago

ucto - v0.28.1

[Maarten van Gompel]

  • Software metadata update only, no functional changes

- C++
Published by proycon over 3 years ago

ucto - v0.28

[Ko van der Sloot]

  • Made sure that TextCat is not initialized when not needed
  • Sentences inside quotes got an inconsistent xml:id (not invalid though)
  • Separated Debug and Log streams.
  • C++ code quality improved

- C++
Published by kosloot over 3 years ago

ucto - v0.27

[Ko van der Sloot]

  • removed dependency on libtar
  • fixed build when HAVE_TEXTCAT was not set. Improved guards against missing textcat support

[Maarten van Gompel]

  • guard against uninitialized/missing textcat (https://github.com/proycon/python-frog#22)
  • require latest libfolia, ticcutils and a more recent libxml2

- C++
Published by proycon over 3 years ago

ucto - v0.26

[Ko van der Sloot]

  • some code quality improvements
  • fix for https://github.com/LanguageMachines/ucto/issues/89
  • updated configure.ac
  • updated GitHub action

[Maarten van Gompel]

  • Added MAINTAINERS
  • updated codemeta.json
  • fix for https://github.com/fbkarsdorp/homebrew-lamachine/issues/17

- C++
Published by kosloot over 3 years ago

ucto - v0.25

[Ko van der Sloot]

  • Added a test for https://github.com/LanguageMachines/ucto/issues/87
  • Adapted to latest update in tokconfig-fra (uctodata 0.9)
  • Deal with unknown languages (as detected by ucto), using iso-639-3 'und' (https://github.com/LanguageMachines/ucto/issues/86)
  • don't tokenize unknown languages
  • configurable sentence splitter for "und" text
  • added tests
  • added code to set the separator (--seperators), so ucto can split on more than just spaces
  • migrated test wrapper to Python 3 (was still on 2.7)

[Maarten van Gompel]

  • Set up a Dockerfile
  • Added build-deps.sh to automatically download, build and install dependencies
  • Updated software metadata (codemeta.json) to latest requirements as proposed in CLARIAH
  • deprecated options -f and -x; they still work but are no longer advertised and give a deprecation notice (https://github.com/LanguageMachines/ucto/issues/88)
  • textcat.cfg is now searched for in the user config dir as well as the global config; also allow running without textcat if the config is missing entirely (same as if not compiled in)
  • added support for user-based configuration dirs ($XDG_CONFIG_HOME/ucto), which take precedence over global data dirs
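
The lookup order described above (user configuration directory first, global directory second) can be sketched as follows. This is an illustrative Python sketch, not ucto's actual C++ code: the function names, the `global_dir` parameter, and the example paths are all hypothetical. The fallback to `~/.config` when `XDG_CONFIG_HOME` is unset or empty follows the XDG Base Directory convention.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def config_search_path(global_dir: str,
                       env: Mapping[str, str] = os.environ) -> List[Path]:
    """Return candidate directories, highest precedence first."""
    xdg = env.get("XDG_CONFIG_HOME")
    # Unset or empty XDG_CONFIG_HOME falls back to ~/.config
    user_base = Path(xdg) if xdg else Path.home() / ".config"
    return [user_base / "ucto", Path(global_dir)]


def find_config(name: str, global_dir: str,
                env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the first existing file called `name`, or None if absent."""
    for directory in config_search_path(global_dir, env):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None  # caller may then run without the optional config
```

For example, `find_config("textcat.cfg", "/usr/share/ucto")` (a hypothetical global path) returns the user's copy if one exists, otherwise the global one, otherwise `None`.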

- C++
Published by proycon almost 4 years ago

ucto - v0.24.1

  • added UTF8 members to the API, to replace the variants that were converted to UnicodeString. This should help fix https://github.com/proycon/python-ucto/issues/11

- C++
Published by kosloot over 4 years ago

ucto - v0.24

  • fix for https://github.com/LanguageMachines/ucto/issues/84
  • added a solution for https://github.com/LanguageMachines/ucto/issues/53 (only partly)
  • added some UnicodeString members to the API
  • bumped library version to 6.0, because of API changes
  • code cleanup and refactoring

- C++
Published by kosloot over 4 years ago

ucto - v0.23

  • added support for the new 'tag' feature in FoLiA, only for tag="token"
  • fixed a problem with '-T full' option not always adding text
  • use the new TextPolicy class from libfolia
  • fix for https://github.com/LanguageMachines/ucto/issues/81
  • fix for https://github.com/LanguageMachines/ucto/issues/82
  • added code to handle several Unicode joiners
  • replaced TravisCI by GitHub action
  • %include files may have an extension now
  • added tests for new features

- C++
Published by kosloot almost 5 years ago

ucto - v0.22

[Ko vd Sloot]

  • Fix for Byte-order Marker problem #79

- C++
Published by proycon over 5 years ago

ucto - v0.21.1

  • fix for https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941498

- C++
Published by kosloot about 6 years ago

ucto - v0.21

  • Adapted to newest libfolia 2.4
  • adapted some tests
  • added an --allow-word-corrections option
  • improved handling of odd FoLiA

- C++
Published by kosloot about 6 years ago

ucto - v0.20

Bug fix release, solving:

  • https://github.com/LanguageMachines/frog/issues/84
  • https://github.com/LanguageMachines/frog/issues/83
  • https://github.com/LanguageMachines/ucto/issues/76
  • https://github.com/LanguageMachines/ucto/issues/74

- C++
Published by kosloot over 6 years ago

ucto - v0.19

Bug fix release, solving:

  • https://github.com/LanguageMachines/ucto/issues/72
  • some problems with the newest libfolia.
  • better provenance records

- C++
Published by kosloot over 6 years ago

ucto - v0.18

Bug fix release, solving https://github.com/LanguageMachines/ucto/issues/70

- C++
Published by kosloot almost 7 years ago

ucto - v0.17

Bug-fix release:

  • solved problems when tokenizing (partly-)tokenized FoLiA (but this is a very complicated situation. Might need more work)
  • solved problems with --passthru on FoLiA
  • avoid empty lines in FoLiA output
  • use the new generate_id attribute for provenance/processors
  • added more tests

KNOWN PROBLEM: On TravisCI/MacOSX some tests fail for unclear reasons.

- C++
Published by kosloot almost 7 years ago

ucto - v0.16

Major release supporting FoLiA 2.0

  • bug fixes for:
    • empty sentences in FoLiA introduced by NonBreakingSpace
    • provide provenance data

- C++
Published by kosloot about 7 years ago

ucto - v0.15

Stabilizing release for pre FoLiA 2.0

  • uses new folia::engine to process FoLiA
  • lots of refactoring and cleanup
  • some small bug fixes
  • added tests for corner cases in FoLiA
  • improved TextCat handling and debugging

- C++
Published by kosloot about 7 years ago

ucto - v0.14.1

  • fixed textcat installation problems on Debian and OpenBSD (https://github.com/LanguageMachines/ucto/issues/59)
  • typo in the man page fixed

- C++
Published by kosloot over 7 years ago

ucto - v0.14

[Ko van der Sloot]

  • updated usage() and removed -S option (never used)
  • make sure the right textclass is assigned to <w> nodes in FoLiA
  • minor code fixes/refactorings
  • added more tests
  • updated man.1 page

[Maarten van Gompel]

  • updated README.md

[Iris Hendrickx]

  • Updated and extended the manual

- C++
Published by kosloot over 7 years ago

ucto - v0.13.2

Bug fix release:

  • uctodata is mandatory, so don't install default rules anymore

- C++
Published by kosloot about 8 years ago

ucto - v0.13.1

Bug fix release:

  • configure now finds out the location of the uctodata files. This should make it work on Mac systems too

- C++
Published by kosloot about 8 years ago

ucto - v0.13

[Ko van der Sloot]

  • improved configure/build/test
  • added a --split option
  • fixed -P option
  • removed -S option (never used, and only half implemented)
  • added a --add-tokens option, to add special tokens for the default language
  • generally use the icu:: namespace
  • added more tests
  • fixed uninitialized variable.
  • added code to use an alternative search-path for uctodata

[Maarten van Gompel]

  • added codemeta.json

- C++
Published by kosloot about 8 years ago

ucto - v0.12

  • now use the UniFilter Unicode Filter from ticcutils
  • now use the UnicodeNormalizer from ticcutils
  • improved configuration. Support for Mac OSX added

- C++
Published by kosloot over 8 years ago

ucto - v0.11

Bug fix release:

  • problems with text inside <cell> elements

- C++
Published by kosloot over 8 years ago

ucto - v0.10

New release due to outdated files in the previous release.

- C++
Published by kosloot over 8 years ago

ucto - v0.9.9

Minor fix:

  • bumped the .so version to 3.0.0

- C++
Published by kosloot over 8 years ago

ucto - v0.9.8

Bug-fix release.

  • fixed utterance handling in FoLiA input. Don't try sentence detection!

- C++
Published by kosloot over 8 years ago

ucto - v0.9.7

  • added textredundancy option, default is 'minimal'
  • small adaptations to work with FoLiA 1.5 specs
  • set textclass on words when outputclass != inputclass
  • DON'T filter special characters when inputclass == outputclass
  • -F (folia input) is automatically set for .xml files
  • more robust against texts with embedded tabs, etc.
  • more and better tests added
  • better logging and error messaging
  • improved language handling. TODO: Language detection in FoLiA
  • bug fixes:
    • correctly handle xml-comment inside a
    • better id generation when parent has no id
    • better reaction on overly long 'words'

- C++
Published by kosloot over 8 years ago

ucto - v0.9.6

  • Moving data files from etc/ to share/, as they are more data files than configuration files that should be edited. Requires uctodata >= 0.4. Should solve debian packaging issues (#18)
  • Minor updates to the manual (#2)
  • Some refactoring/code cleanup, temper expectations regarding ucto's date-tagging abilities (#16, thanks also to @sanmai-NL)

- C++
Published by proycon over 9 years ago

ucto - v0.9.5

Bug fix release:

  • updated tokconfig-generic, which is removed from the uctodata package
  • configure no longer insists on the presence of uctodata, it merely warns when missing.

- C++
Published by kosloot over 9 years ago

ucto - v0.9.4

Major update

  • Language support
    • added support for multiple languages
    • auto detection of languages using textcat
  • some refactoring
  • no more call to exit()
  • Better logging and Warning messages
  • some folia output improvements
  • bug fixes
    • in passthru
    • issue #11

- C++
Published by kosloot over 9 years ago

ucto - v0.9.3

New release, implementing recursive rule application. Check for uctodata version >= 0.2 (but works reasonably well with older versions)

- C++
Published by kosloot over 9 years ago

ucto - v0.9.2

Minor update release to facilitate debian packaging, readme has been updated as well.

- C++
Published by proycon almost 10 years ago

ucto - v0.9.1

Bug fix release:

  • fixed autoconfig issue

- C++
Published by kosloot almost 10 years ago

ucto - v0.9.0

Major update

  • now use uctodata for language-specific information; ucto itself only supports a generic tokenizer
  • interactive use now uses the readline library
  • accept long options --help and --version
  • UTF16BE now works
  • better support for crooked Windows files in general
  • added a --normalize option to map tokens in a certain TokenClass to its generic name

- C++
Published by kosloot almost 10 years ago

ucto - v0.8.6

Fix in the ABBREVIATION rules to avoid spurious linebreaks

- C++
Published by kosloot about 10 years ago

ucto - v0.8.5

Bug fix release. Fixes issue #5

- C++
Published by kosloot about 10 years ago

ucto - v0.8.4

New release based on libfolia 1.0

- C++
Published by kosloot about 10 years ago

ucto - v0.8.2

First Ucto release from GIT

- C++
Published by kosloot over 10 years ago