Scientific Software
Updated 9 months ago
WordTokenizers.jl
WordTokenizers.jl: Basic tools for tokenizing natural language in Julia - Published in JOSS (2020)
Earth and Environmental Sciences (40%) · Engineering (40%)
Scientific Software · Peer-reviewed
Updated 3 months ago
KeemenaPreprocessing.jl: Unicode-Robust Cleaning, Multi-Level Tokenisation & Streaming Offset Bundling for Julia NLP
KeemenaPreprocessing.jl: Unicode-Robust Cleaning, Multi-Level Tokenisation & Streaming Offset Bundling for Julia NLP - Published in JOSS (2026)