Recent Releases of ticcltools
ticcltools - v0.11
- require C++17
- require latest ticcutils
- NFC-encoded Unicode strings are now used everywhere
- testrank script results were outdated since 0.10
- removed dependency on libtar
- added --follow option to TiCCL-indexer(NT)
- several code refactorings and cleanups
- adapted tests
- updated GitHub CI
- C++
Published by kosloot about 1 year ago
ticcltools - v0.10
[Ko van der Sloot]
* LDcalc:
  - no longer filter out n-grams with common parts; this was too aggressive
  - removed some more commented-out old code
* chainclean: added a --caseless option (default is true)
* removed the Roaring versions of the code, which had lacked maintenance for years
* internally shifting towards UnicodeString in general
* a lot of C++ cleanup, with some refactoring and splitting up of long blobs of code
Published by kosloot about 3 years ago
ticcltools - v0.9
Ko van der Sloot:
* LDcalc: removed code to filter out ngrams with common parts (experimental)
Maarten van Gompel:
* Added Dockerfile: containerization support
* Changed repository status to unsupported!
Published by proycon over 3 years ago
ticcltools - v0.8
- using more recent functions from ticcutils
- use more code from ticcl_common
- attempt to solve https://github.com/LanguageMachines/ticcltools/issues/42
- some small code refactoring
Published by kosloot about 4 years ago
ticcltools - v0.7.1
[Ko vd Sloot]
* changed ICU requirement to at least 5.6
* some refactoring
* started implementing a solution for #42
* added error message when the index file is empty
Published by proycon over 5 years ago
ticcltools - v0.7
[Martin Reynaert]
* updated man pages
* updated README.md
[Ko van der Sloot] Numerous bug fixes and additions. Added a .so for common functions.
The bitType is changed to uint64_t (the biggest integer type possible), which required some code adaptations because values < 0 are no longer possible.
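As an illustration of the kind of adaptation an unsigned type calls for, here is a minimal sketch. The `bitType` alias below only mirrors the name used in this note, and `abs_diff` and `checked_sub` are hypothetical helpers, not functions taken from the ticcltools sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alias mirroring the release note; the real definition
// lives in the ticcltools sources and may differ.
using bitType = std::uint64_t;

// Absolute difference without forming a negative intermediate value.
// With an unsigned type, `a - b` wraps around when b > a, so the
// operands are ordered before subtracting.
inline bitType abs_diff(bitType a, bitType b) {
  return (a > b) ? (a - b) : (b - a);
}

// Subtraction that states its precondition instead of silently wrapping.
inline bitType checked_sub(bitType a, bitType b) {
  assert(a >= b && "unsigned subtraction would wrap around");
  return a - b;
}
```

Code that previously tested a signed difference for `< 0` has to compare the operands first, as `abs_diff` does.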
TICCL-unk:
- some changes in UNK detection
- added a --hemp option
- create a .fore.clean file when a background corpus is merged in
TICCL-stats:
- added a -n option to use a newline as delimiter
TICCL-indexer(NT):
- better and faster implementation
- added --confstats option
TICCL-LDcalc:
- added a --follow option for debugging purposes
- fix for https://github.com/LanguageMachines/ticcltools/issues/30
- added --low and --high parameters
TICCL-rank:
- added a --follow option for debugging purposes
- added --subtractartifrqfeature1 and --subtractartifrqfeature2 options
- replaced pairs_combined ranking by median ranking
- added an n-gram filter
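For readers unfamiliar with median ranking, the idea can be sketched as follows. This is not the TICCL-rank implementation: the function name `median_rank` and the assumption that each candidate carries one rank value per feature are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a candidate's per-feature rank values (hypothetical input;
// the feature set used by TICCL-rank is not described in these notes).
// The vector is taken by value because std::nth_element reorders it.
inline double median_rank(std::vector<double> ranks) {
  assert(!ranks.empty() && "median of an empty set is undefined");
  const std::size_t mid = ranks.size() / 2;
  std::nth_element(ranks.begin(), ranks.begin() + mid, ranks.end());
  const double upper = ranks[mid];
  if (ranks.size() % 2 == 1) {
    return upper;  // odd count: the single middle value
  }
  // Even count: average the two middle values. After nth_element, all
  // elements before `mid` are <= ranks[mid], so the lower middle value
  // is the maximum of that prefix.
  const double lower =
      *std::max_element(ranks.begin(), ranks.begin() + mid);
  return (lower + upper) / 2.0;
}
```

Compared with a combined score, a median is less sensitive to a single feature that ranks a candidate unusually high or low.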
TICCL-chain:
- added --nounk option
- fix for https://github.com/LanguageMachines/ticcltools/issues/38
- fix for https://github.com/LanguageMachines/ticcltools/issues/37
- use the alphabet file too with --alph
TICCL-chainclean: new module to clean chain ranked files
TICCL-anahash:
- also accept lexicons without frequencies (i.e. simple word lists)
- added a -o option
Published by kosloot almost 6 years ago
ticcltools - v0.6
Intermediate release, with a lot of new code to handle n-grams. A lot of refactoring was also done, for clearer and more maintainable code. This is still work in progress.
TICCL-unk:
- more extensive acronym detection
- fixed artifreq problems in 'clean' punctuated words
- added filters for 'unwanted' characters
- added a ligature filter to convert evil ligatures
- normalize all hyphens to a 'normal' one (-)
- use a better definition of punctuation (unicode character class is not good enough to decide)
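The hyphen normalization mentioned above could look roughly like the sketch below. The set of dash-like code points and the names `is_hyphen_like` and `normalize_hyphens` are assumptions made for illustration; the characters TICCL-unk actually handles are not listed in these notes.

```cpp
#include <cassert>
#include <string>

// True for a few Unicode dash-like code points. This list is an
// illustrative assumption, not the set used by TICCL-unk.
inline bool is_hyphen_like(char32_t c) {
  switch (c) {
    case 0x2010:  // HYPHEN
    case 0x2011:  // NON-BREAKING HYPHEN
    case 0x2012:  // FIGURE DASH
    case 0x2013:  // EN DASH
    case 0x2014:  // EM DASH
    case 0x2212:  // MINUS SIGN
      return true;
    default:
      return false;
  }
}

// Replace every dash-like code point with the ASCII hyphen-minus (-).
inline std::u32string normalize_hyphens(std::u32string text) {
  for (char32_t& c : text) {
    if (is_hyphen_like(c)) {
      c = U'-';
    }
  }
  return text;
}

// Usage example: an en dash between two letters becomes a plain hyphen.
inline void normalize_hyphens_example() {
  assert(normalize_hyphens(U"a\u2013b") == U"a-b");
}
```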
TICCL-lexstat:
- the 'separator' symbol should get freq=0, so it isn't counted
- the clip value is added to the output filename
TICCL-indexer:
- indexer and indexerNT now produce the same output, using different strategies when a --foci file is used.
TICCL-LDcalc: major overhaul for n-grams
- added an ngram point column to the output (so NOT backward compatible!)
- produce a '.short' list for short word corrections
- produce a '.ambi' file with a list of n-grams related to short words
- prune a lot of ngrams from the output
TICCL-rank:
- output is sorted now
- honor the ngram-points from the new LDcalc. (so NOT backward compatible!)
TICCL-chain: new module to chain ranked files
TICCL-lexclean:
- added a -x option for 'inverse' alphabet
TICCL-anahash:
- added a --list option to produce a list of words and anagram values
added metadata file: codemeta.json
Published by kosloot over 7 years ago
ticcltools - v0.5
- updated configuration, also for Mac OSX
- use of more ticcutils stuff: diacriticsfilter
- added a TICCL-mergelex program
- the OMPTHREADLIMIT environment variable was ignored sometimes
TICCL-unk:
- fixed a problem in artifreq handling
- changed acronym detection (work in progress)
- added -o option
TICCL-lexstat:
- added TTR output
- added -o option
TICCL-indexer:
- now also handles --foci file, with some speed-up
- added a -t option
TICCL-LDcalc:
- be less picky on a few wrong lines in the data
- added some tests
- when libroaring is installed we build roaring versions of some modules (experimental)
- updated man pages
Published by kosloot about 8 years ago
ticcltools - v0.4
- first official release.
- added functions to test on Word2Vec datafiles
- refactoring and modernizing stuff all around
Published by kosloot almost 9 years ago