Recent Releases of foliautils

foliautils - v0.23

  • adapted to most recent libfolia (2.21) and ticcutils
  • updated to C++17
  • numerous code refactorings
  • updated GitHub CI

- C++
Published by kosloot over 1 year ago

foliautils - v0.22

  • requires libfolia 2.19 or higher
  • fixes GitHub Actions on Mac OSX
  • small code updates and improvements

- C++
Published by kosloot about 2 years ago

foliautils - v0.21

  • a lot of code changes, many regarding hyphens
  • added an extractfinalhyphen function, used by several programs: FoLiA-abby, FoLiA-page and FoLiA-txt
  • FoLiA-txt: filter out ZWNJ characters. Avoid spurious LineBreaks
  • fix for https://github.com/LanguageMachines/foliautils/issues/68
  • FoLiA-idf was not quite working. Fixed
  • FoLiA-page: removed the --sent option, updated the man page, and lots of other fixes too
  • added experimental FoLiA-merge program: merging lemma/pos information into FoLiA files
  • more and better tests added
  • updates in README.MD

- C++
Published by kosloot about 2 years ago

foliautils - v0.20

[Ko van der Sloot]

  • Fix in FoLiA-txt: a <t-hbr> signals a newline, so adding an extra <br/> is not correct

- C++
Published by kosloot over 3 years ago

foliautils - v0.19

[Ko van der Sloot]

  • general C++ cleanup and refactoring
  • some fixes for building on Mac OSX
  • FoLiA-txt:
    • now we handle soft-hyphens
    • modifications to solve #67. --remove-end-hyphens is the default now. We create <t-hbr> nodes
    • modifications for https://github.com/proycon/foliapy/issues/25
    • Unicode awareness
  • FoLiA-2text:
    • added a --restore-formatting option, which outputs the text inside <t-hspace> and <t-hbr> nodes
  • FoLiA-abby:
    • handling of soft-hyphens
    • fixes for <br/> and <t-hbr>
    • preserve original spaces in <t-hspace>'s text
  • FoLiA-correct: small fix in program logic.

- C++
Published by kosloot over 3 years ago

foliautils - v0.18

[Ko van der Sloot]

  • FoLiA-page: only add LineBreak annotation when needed
  • added more tests to make check
  • adapted and fixed tests
  • fixed the ugly problem of temporarily disabling text checking
  • start using the "system" foliadiff
  • fix declarations

[Maarten van Gompel]

  • FoLiA-page: added a --nomarkup parameter to revert to the old behaviour, and an extra --nostrings parameter to omit the strings #65
  • added a note for the --sent option #65
  • added some comments for the ugly disable set_checktext patch; I don't like this but it seems needed (underlying libfolia issue?) #65
  • add linebreaks and t-str to the paragraph text (currently fails text validation)
  • added Dockerfile and instructions
  • codemeta.json: updated according to (proposed) CLARIAH requirements (CLARIAH/clariah-plus#38)

- C++
Published by proycon almost 4 years ago

foliautils - v0.17

  • needs libfolia 2.9 or above
  • replaced TravisCI by GitHub actions
  • FoLiA-correct:
    • fixed a problem with correcting FoLiA with both p and s nodes
    • added support for the FoLiA 'tag' feature
    • clearer error messages
    • fixed bugs in HEMP handling
    • better handling of Ucto's ABBREVIATION* tokens
    • fixed corrections when a word has 'space="no"'.
    • some smaller fixes
    • added more tests
  • FoLiA-clean:
    • improved, using new features from libfolia 2.9
  • FoLiA-2text:
    • replaced '--original' parameter by a '--correction-handling' parameter
    • implemented a --honour-tags option, to interpret tag="token" tags
    • some improvement in output-file naming
  • FoLiA-abby:
    • completely reworked the code
    • added '-S' and '-C' as alternatives for '--setname' and '--classname'
    • added a --keephyphens option
    • added a --addbreaks option
    • added a --addmetrics option to optionally add positional info to the paragraphs
    • improved handling of '-' (Hyphen)
    • add 'fontproperties', 'fontid' and 'font_style' as a feature node
    • improved handling of text with spaces at 'unexpected' locations
  • all modules:
    • Code refactoring and cleaning
    • added and improved tests
    • adapted man pages

- C++
Published by kosloot almost 5 years ago

foliautils - v0.16

[Ko vd Sloot]

  • requires libfolia 2.7 or above
  • provenance data is better for a lot of modules
  • added better checking on invalid NCnames in some modules
  • FoLiA-abby:
    • a lot of refactoring and additions to handle font/style information
  • FoLiA-pm:
    • Notes are handled correctly now
    • fixed error in xlink attributes
  • FoLiA-page:
    • more types of Page files are handled now
    • fixed annotation declarations
    • fixed offset calculation (due to change in FoLiA's opinion on those)
    • page number is added as a node and in the metadata
    • added a --trusttokens option. This means that Word items in the Page file are added as Words in the FoLiA, embedded in Sentences.
    • added a --norefs option to avoid adding references to the original texts
  • FoLiA-correct:
    • make sure that the default is to run on 1 thread
    • added a --rebase-inputclass option
  • FoLiA-alto:
    • the -t option was not always handled correctly

[Maarten van Gompel]

  • FoLiA-benchmark: guard against compiler optimisation #48

- C++
Published by kosloot over 5 years ago

foliautils - v0.15

[Maarten van Gompel]

  • FoLiA-txt: check if a string is empty after normalisation (fix for #46)

[Ko vd Sloot]

  • FoLiA-correct: fix one-off error in hemp handling (when no hemp was found) #45
  • some refactoring
  • centralized definition of XMLPARSEROPTIONS
  • bugfix in threading

- C++
Published by proycon almost 6 years ago

foliautils - v0.14

[Martin Reynaert]

  • updated man pages

[Ko vd Sloot]

  • added man pages
  • revised usage() in many modules
  • the default separator in FoLiA-stats is '' now
  • fix for: https://github.com/LanguageMachines/foliautils/issues/37
  • fix for: https://github.com/LanguageMachines/foliautils/issues/41
  • adapted to changes in libfolia
  • many small code refactorings
  • FoLiA-correct is improved a lot, allowing ngram corrections in FoLiA
  • FoLiA-stats accepts a 'wordin_doc' mode now
  • FoLiA-alto by default created nodes now. use --oldstring to get
  • improved a lot in tests/
  • many small fixes

- C++
Published by kosloot about 6 years ago

foliautils - v0.13

[Ko vd Sloot] bug fixes:

  • fix for https://github.com/LanguageMachines/foliautils/issues/35
  • fix for https://github.com/LanguageMachines/foliautils/issues/36

[Maarten van Gompel] new features:

  • FoLiA-wordtranslate.cxx: frog should be able to deal with spaces now, no need for ugly non-breaking space hack that now causes other problems in frog (LanguageMachines/frog#77)
  • FoLiA-wordtranslate.cxx: adding ability to constrain by language

- C++
Published by kosloot almost 7 years ago

foliautils - v0.12

Released for FoLiA 2.0

- C++
Published by kosloot about 7 years ago

foliautils - v0.11

  • Updated and added some tests
  • started moving common code to a separate file and build a library (libfoliautils)

    • hemp detection is one of them
  • FoLiA-stats:

    • added possibility to read a list of directories + file-names to process into separate output directories. (could be generalized to other programs)
    • better hemp detection
  • FoLiA-correct:

    • use same hemp detection as FoLiA-stats
  • FoLiA-abby:

    • support more flavors
  • FoLiA-clean:

    • avoid removing the last remaining text on nodes
    • cleaning of tokenization now works

- C++
Published by kosloot about 7 years ago

foliautils - v0.10

[Ko vd Sloot]

  • fixed icu:namespace issues
  • added FoLiA-abby, an ABBY to FoLiA converter
  • src/FoLiA-abby.cxx, src/FoLiA-page.cxx, src/FoLiA-pm.cxx:
    • allow 'none' value for --prefix
  • src/FoLiA-page.cxx, src/FoLiA-hocr.cxx: fixed Alignment info
  • src/FoLiA-correct.cxx:
    • fixed a problem with correction of the last word of a trigram
    • fix correction of paragraphs with only deeper text
    • the --rank option accepts more flavors of files
  • src/FoLiA-stats.cxx:
    • added a --detokenize option
  • several minor fixes, refactorings etc.
  • updated tests

- C++
Published by kosloot over 7 years ago

foliautils - v0.9.2

Bug fix release:

  • append small prefixes to output filenames, to ALWAYS avoid names starting with a numeric value: 'FPM-' for FoLiA-pm, 'FP-' for FoLiA-page, 'FH-' for FoLiA-hocr. Can be set with --prefix
  • FoLiA-stats.cxx:
    • added --collect to usage() and 'man' page
  • FoLiA-correct:
    • added --inputclass and --outputclass parameters (must be different)
    • don't crash on empty text

- C++
Published by kosloot about 8 years ago

foliautils - v0.9.1

Bug Fix release: - the tests directory wasn't included in the release

- C++
Published by kosloot about 8 years ago

foliautils - v0.9

[Ko vd Sloot]

  • FoLiA-stats.cxx:
    • added a --collect option, to create files with all n-grams together
    • clearer message in FoLiA-stats when no results were found
    • extract text from deeper nodes, if needed
    • fixed out-of-bounds problem
    • now fails when every input file fails
  • FoLiA-txt:
    • now fails when every input file fails
  • avoid xml:id's starting with a number. Add "id-" in front.
  • added more tests

[Maarten van Gompel]

  • added codemeta.json
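
The xml:id change in this release (putting "id-" in front of ids that would otherwise start with a number) could look roughly like the sketch below. This is only an illustration under that one stated rule; the function name safe_xml_id is hypothetical and is not taken from the foliautils source.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not the actual foliautils code): return an id that
// does not start with a digit by putting "id-" in front when needed.
std::string safe_xml_id(const std::string& candidate) {
    if (!candidate.empty() &&
        std::isdigit(static_cast<unsigned char>(candidate.front()))) {
        return "id-" + candidate;
    }
    // Ids that are empty or start with a non-digit are returned unchanged.
    return candidate;
}
```

With this sketch, an input such as "123abc" becomes "id-123abc", while "doc1" is left as it is.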

- C++
Published by kosloot about 8 years ago

foliautils - v0.8

  • added -R option to FoLiA-collect
  • FoLiA-collect now can work in parallel (-t option)
  • modernized configuration, with better Mac OSX support (including OpenMP)
  • all modules end with an exit code now.
  • added more tests to 'make check'
  • added output of Type-Token Ratios (also in degrees)
  • several bugfixes.
  • code cleanup and refactoring, some speedup too
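
For readers unfamiliar with the Type-Token Ratio mentioned in the list above: it is commonly defined as the number of distinct word forms (types) divided by the total number of tokens. The sketch below assumes that common definition. The function name is hypothetical, the code is not taken from foliautils, and it does not cover the "in degrees" variant.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: types / tokens, assuming the common definition of the
// Type-Token Ratio. Returns 0.0 for an empty token list to avoid dividing by zero.
double type_token_ratio(const std::vector<std::string>& tokens) {
    if (tokens.empty()) {
        return 0.0;
    }
    // The set keeps one copy of every distinct token, so its size is the type count.
    const std::unordered_set<std::string> types(tokens.begin(), tokens.end());
    return static_cast<double>(types.size()) /
           static_cast<double>(tokens.size());
}
```

For the tokens "a", "b", "a", "c" there are 3 types among 4 tokens, so the ratio is 0.75.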

- C++
Published by kosloot over 8 years ago

foliautils - v0.7

[Ko vd Sloot]

  • updated and expanded tests
  • fixed offset calculations in FoLiA-hocr, FoLiA-page.cxx and FoLiA-alto. We use unicode points now. (needed for folia v1.5 and above)
  • changed 'modes' in FoLiA-stats, to be a bit more comprehensible
  • fixed problem with metadatatype when 'foreign-data' is present
  • enhanced FoLiA-clean. Still not done...
  • switched to dynamic OMP scheduling in most programs (which process files with probably big differences in processing time)
  • small bugfixes
  • general cleanup and refactoring

[Maarten van Gompel]

  • Added and improved FoLiA-wordtranslate.cxx
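
The switch to Unicode code points for offsets matters because byte offsets and code point offsets diverge as soon as the text contains non-ASCII characters. The sketch below shows the difference, assuming valid UTF-8 input held in a std::string. The helper count_code_points is hypothetical and is not the foliautils implementation, which may well use a Unicode library instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in a UTF-8 string.
// Every code point has exactly one byte that is NOT a continuation byte
// (continuation bytes have the bit pattern 10xxxxxx), so counting those
// bytes gives the number of code points. Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 encoding of "café" the string holds 5 bytes but only 4 code points, so an offset to whatever follows it is 4 in code points and 5 in bytes.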

- C++
Published by kosloot over 8 years ago

foliautils - v0.6

foliautils 0.6 04-04-2017

This is an intermediate release! Work on some tools is developing rapidly, so the next releases won't take long. For now, backward compatibility is still mostly maintained.

[Ko van der Sloot]

  • uses libfolia 1.7 now!
  • FoLiA-correct now uses another output file naming scheme (breaks backward compatibility)
  • FoLiA-langcat now has a --tags parameter to select which nodes are searched
  • FoLiA-stats:
    • a new --separator option is added
    • added a --max-ngram option
    • added a --languages option for multiple languages
    • now we have a --aggregate option for multiple language statistics
    • fixed a bug in total counts
  • added a first version of FoLiA-clean program. Cleans up tests/tags in FoLiA files.
  • FoLiA-correct:
    • output statistics
    • verbosity option improved
  • added and improved a lot of tests

- C++
Published by kosloot about 9 years ago

foliautils - v0.5

  • based on libfolia 1.5 or higher
    • use recent ucto with textcat support
    • use ISO 639-3 language names
    • lots of code refactoring
    • improved tests
    • bug fixes in FoLiA-correct unigram correction
    • extended and improved FoLiA-pm a lot
    • changed default values for '--lang' and '--class' in FoLiA-stats (issue #3)
    • FoLiA-alto can now work without a Didl too (issue #2)
    • numerous additions...

- C++
Published by kosloot over 9 years ago

foliautils - v0.4

A new program, FoLiA-pm, is added. It converts Political Mashup files into FoLiA.

Needs libfolia v1.2

- C++
Published by kosloot about 10 years ago

foliautils - v0.3

New release. Now based on libfolia 1.0

- C++
Published by kosloot over 10 years ago

foliautils - v0.2

- C++
Published by kosloot over 10 years ago