Recent Releases of foliautils
foliautils - v0.23
- adapted to most recent libfolia (2,21) and ticcutils
- updated to C++17
- numereous code refactorings
- updated GitHub CI
- C++
Published by kosloot over 1 year ago
foliautils - v0.22
- requires libfolia 2.19 or higher
- fixes github actions on MacOSX
- small code updates and improvements
- C++
Published by kosloot about 2 years ago
foliautils - v0.21
- a lot off code changes. Many regarding hyphens
- added an extractfinalhyphen function, used by several programs FoLiA-abby, FoLiA-page and FoLiA-txt
- FoLiA-txt: filter out ZWNJ characters. Avoid spurious LineBreaks
- fix for https://github.com/LanguageMachines/foliautils/issues/68
- FoLiA-idf was not quite working. Fixed
- FoliA-page: removed --sent option. updated man page lots of other fixes too
- added experimental FoLiA-merge program: merging lemma/pos information into FoLiA files
- more and better tests added
- updates in README.MD
- C++
Published by kosloot about 2 years ago
foliautils - v0.20
[Ko van der Sloot]
* Fix in FoLiA-txt. A <t-hbr> signals a newline, so adding an extra <br/> is
not correct
- C++
Published by kosloot over 3 years ago
foliautils - v0.19
[Ko van der Sloot]
* general C++ cleanup and refactoring
* Some fixes for building on Mac OSX
* FoLiA-txt:
- now we handle soft-hyphens
- modifications to solve #67
--remove-end-hyphens is the default now. We create <t-hbr> nodes
- modifications for https://github.com/proycon/foliapy/issues/25
- Unicode awareness
* FoLiA-2text:
- added a --restore-formatting option, which outputs the text inside
<t-hspace> and <t-hbr> nodes
* FoLiA-abby:
- handling of soft-hyphens
- fixes for <br/> and <t-hbr>
- preserve original spaces in <t-hspace>'s text
* FoLiA-correct: small fix in program logic.
- C++
Published by kosloot over 3 years ago
foliautils - v0.18
[Ko van der Sloot] * FoLiA-page: only add LineBreak annotation when needed * added more tests to make check * adapted and fixed tests * fixed the ugly problem of temporally disabling text checking. * start using the "system" foliadiff * fix declarations
[Maarten van Gompel] * FoLiA-page: added a --nomarkup parameter to revert to the old behaviour, and an extra --nostrings parameter to omit the strings #65 * added a note for the --sent option #65 * Added some comments for the ugly disable set_checktext patch, I don't like this but it seems needed (underlying libfolia issue?) #65 * Add linebreaks and t-str to the paragraph text (currently fails text validation) * added Dockerfile and instructions * codemeta.json: updated according to (proposed) CLARIAH requirements (CLARIAH/clariah-plus#38)
- C++
Published by proycon almost 4 years ago
foliautils - v0.17
- needs libfolia 2.9 or above
- replaced TravisCI by GitHub actions
- FoLiA-correct:
- fixex a problem with correcting FoLia with both p and s nodes
- added support for the FoLiA 'tag' feature
- clearer error messages
- fixed bugs in HEMP handling
- better handling of Ucto's ABBREVIATION* tokens
- fixed corrections when a word has 'space="no"'.
- some smaller fixes
- added more tests
- FoLiA-clean:
- improved, using new features from libfolia 2.9
- FoLiA-2text:
- replaced '--original' parameter by a '--correction-handling' parameter
- implemented a --honour-tags option, to interpret tag="token" tags
- some improvement in output-file naming
- FoLiA-abby:
- complete reworked the code
- added '-S' and '-C' as alternatives for '--setname' and '--classname'
- added a --keephyphens option
- added a --addbreaks option
- addes option --addmetrics to optionally add positional info to the paragraphs
- improved handling of '-' (Hyphen)
- add 'fontproperties', 'fontid' and 'font_style' as a feature node
- improved handling of text with spaces at 'unexpected' locations
- all modules:
- Code refactoring and cleaning
- added and improved tests
- adapted man pages
- C++
Published by kosloot almost 5 years ago
foliautils - v0.16
[Ko vd Sloot]
* requires libfolia 2.7 or above
* provenance data is better for a lot of modules
* added better checking on invalid NCnames in some modules.
* FoLiA-abby:
- a lot of refactoring and additions to handle font/style information
* FoLiA-pm:
- Notes are handled correctly now
- fixed error in xlink attributes
* FoLiA-page:
- more types of Page files are handled now
- fixed annotation declarations
- fixed offset calculation (due to change in FoLiA's opinion on those)
- page number is added as a
node and in the metadata
- added a --trusttokens option. This means that Word items in the Page file
are added as Word's in the FoLiA, embedded in Sentences.
- added a --norefs option to avoid adding references to the original texts
* FoLiA-correct:
- make sure that the default is to run on 1 thread
- added a --rebase-inputclass option
* FoLiA-alto:
- the -t option was not always handled correctly
[Maarten van Gompel] * FoLiA-benchmark: guard against compiler optimisation #48
- C++
Published by kosloot over 5 years ago
foliautils - v0.15
[Maarten van Gompel] * FoLiA-txt: check if a string is empty after normalisation (fix for #46)
[Ko vd Sloot] * folia-correct: fix one-off error in hemp handling (when no hemp was found) #45 * some refactoring * centralized definition of XMLPARSEROPTIONS * bugfix in threading
- C++
Published by proycon almost 6 years ago
foliautils - v0.14
[Martin Reynaert] * updated man pages
[Ko vd Sloot]
* added man pages
* revised usage() in many modules
* the default separator in FoLiA-stats is '' now
* fix for: https://github.com/LanguageMachines/foliautils/issues/37
* fix for: https://github.com/LanguageMachines/foliautils/issues/41
* adapted to changes in libfolia
* many small code refactorings
* FoLiA-correct is improved a lot, allowing ngram corrections in FoLiA
* FoLiA-stats accepts a 'wordin_doc' mode now
* FoLiA-alto by default created
- C++
Published by kosloot about 6 years ago
foliautils - v0.13
[Ko vd Sloot] bug fixes: * fix for https://github.com/LanguageMachines/foliautils/issues/35 * fix for https://github.com/LanguageMachines/foliautils/issues/36
[Maarten van Gompel] new features: * FoLiA-wordtranslate.cxx: frog should be able to deal with spaces now, no need for ugly non-breaking space hack that now causes other problems in frog (LanguageMachines/frog#77) * FoLiA-wordtranslate.cxx: adding ability to constrain by language
- C++
Published by kosloot almost 7 years ago
foliautils - v0.11
- Updated and added some tests
started moving common code to a separate file and build a library (libfoliautils)
- hemp detection is one of them
FoLiA-stats:
- added possibility to read a list of directories + file-names to process into separate output directories. (could be generalized to other programs)
- better hemp detection
FoLiA-correct:
- use same hemp detection as FoLiA-stats
FoLiA-abby:
- support more flavors
FoLiA-clean:
- avoid removing the last remaining tekt on
nodes - cleaning of tokenization now works
- avoid removing the last remaining tekt on
- C++
Published by kosloot about 7 years ago
foliautils - v0.10
[Ko vd Sloot] * fixed icu:namespace issues * added FoLiA-abby, an ABBY to FoLiA convertor * src/FoLiA-abby.cxx, src/FoLiA-page.cxx, src/FoLiA-pm.cxx: - Allow 'none' value for --prefix * src/FoLiA-page.cxx, src/FoLiA-hocr.cxx: fixed Alignment info * src/FoLiA-correct.cxx: - fixed a problem with correction of the last word of a trigram. - fix correction of paragraphs with only deeper text - The --rank option accepts more flavors of files * src/FoLiA-stats.cxx: - added a --detokenize option * several minor fixes, refactorings etc. * updated tests
- C++
Published by kosloot over 7 years ago
foliautils - v0.9.2
Bug fix release: * append small prefixes to output filenames, to ALWAYS avoid names starting with a numeric value. 'FPM-' for FoLiA-pm. 'FP-' for FoLiA-page, 'FH-' for FoLiA-hocr Can bet set witth --prefix * FoLiA-stats.cxx: - added --collect to usage() and 'man' page * FoLiA-correct: - added --inputclass and --outputclass parameters (must be different) - Don't crash on empty text.
- C++
Published by kosloot about 8 years ago
foliautils - v0.9.1
Bug Fix release: - the tests directory wasn't included in the release
- C++
Published by kosloot about 8 years ago
foliautils - v0.9
[Ko vd Sloot] * FoLiA-stats.cxx: - added a --collect option, to create files with all n-grams together - clearer message in FoLiA-stats when no results were found - extract text from deeper nodes, if needed - fixed out-of-bounds problem - now fails when every input file fails * FoLiA-txt: - now fails when every input file fails * avoid xml:id's starting with a number. Add "id-" in front. * added more tests
[Maarten van Gompel] * added codemeta.json
- C++
Published by kosloot about 8 years ago
foliautils - v0.8
- added -R option to FoLiA-collect
- FoLiA-collect now can work in parallel (-t option)
- modernized configuration, whit better Max OSX support (including OpenMP)
- all modules end with an exit code now.
- added more tests to 'make check'
- added output of Type-Token Ratio's (also in degrees)
- several bugfixes.
- code cleanup and refactoring, some speedup too
- C++
Published by kosloot over 8 years ago
foliautils - v0.7
[ko vd Sloot] * updated and expanded tests * fixed offset calculations in FoLiA-hocr, FoLiA-page.cxx and FoLiA-alto. We use unicode points now. (needed for folia v1.5 and above) * Changed 'modes' in FoLiA-stats, to be a bit more comprehensible * fixed problem with metadatatype when 'foreign-data' is present * enhanced FoLiA-clean. Still not done... * switched to dynamic OMP scheduling in most programs. (which process files with probably big differences in processing time) * small bugfixes. * general cleanup and refactoring
[Maarten van Gompel] * Added and improved FoLiA-wordtranslate.cxx
- C++
Published by kosloot over 8 years ago
foliautils - v0.6
foliautils 0.6 04-04-2017 This is an intermediate release!! Work on some tools is developing rapidly. next releases won't take long. For now, backward compatibility is still maintained mostly.
[Ko van der Sloot]
* uses libfolia 1.7 now!
* FoLiA-correct now uses an other output file naming scheme (breaks backward compitablity)
* FoLiA-langcat now has a --tags parameter to select which
- C++
Published by kosloot about 9 years ago
foliautils - v0.5
- based on libfolia 1.5 or higher
- use recent ucto with textcat support
- use ISO 639-3 language names
- lot's of code refactoring
- improved tests
- bug fixes in FoLiA-correct unigram correction
- extended and improved FoLiA-pm a lot
- changed default values for '--lang' and '--class' in FoLiA-stats (issue #3)
- FoLiA-alto can now work without a Didl too (issue #2)
- numerous additions...
- C++
Published by kosloot over 9 years ago
foliautils - v0.4
A new program FoLiA-pm is added it converts Political Mashup files into FoLiA.
Needs libfolia v1.2
- C++
Published by kosloot about 10 years ago
foliautils - v0.3
New release. Now based on libfolia 1.0
- C++
Published by kosloot over 10 years ago