Recent Releases of lexikanon
lexikanon - v0.6.1
Fix
- tokenizers: Add model validator after method (cfd8a06)
- normalizer: Change unescape_html type to Union[bool, str]
- tokenizer: Add formal_en normalizer to nltk config (b65d01a)
- stopwords: Add verbose condition to logging (a7f168b)
- lexikanon: Adjust NLTKTagger for tagsets and default tag (54ba18f)
Published by github-actions[bot] almost 3 years ago
lexikanon - v0.6.0
Feature
- tokenizer: Add additional postags (fe95c3e)
- tokenizer: Add additional postags to nltk config (a35f684)
- tokenizers/nltk: Add language support, improve tagset flexibility, download universal_tagset (82e2514)
- lexikanon: Add new nltk_universal configuration file (2c7880f)
- tokenizer: Add nltk_universal configuration (e68c5e1)
- tokenizers: Add MecabTagger and NLTKTagger (b766fb3)
- tokenizer/tagger: Implement NLTKTagger (4f2d945)
Fix
- tokenizer: Add punctuation postags to mecab.yaml (1624dbd)
- tokenizers: Adjust tokenizer base configurations (c96bfbf)
- MecabTagger: Correct config_group path (9b74064)
Published by github-actions[bot] almost 3 years ago
lexikanon - v0.5.0
Feature
- lexikanon: Add find_similar_docs_by_clustering configuration (51ddf56)
- lexikanon: Add find_similar_docs_by_clustering configuration (e2122cf)
- lexikanon: Add similarity.py for document similarity analysis (b1fc21b)
- pyproject.toml: Add scikit-learn dependency (7dd0014)
Published by github-actions[bot] almost 3 years ago
lexikanon - v0.4.0
Feature
- nltk: Add config group and name to NLTKTagger (fd5ba82)
- mecab: Add _config_group_ and _config_name_ fields in MecabTagger (75937aa)
- lexikanon: Add _config_group_ and _config_name_ to Tokenizer class (5ae39cb)
- normalizer: Add config group and config name attributes to classes (f55ea51)
- tokenizer/tagger: Add _config_group_ and _config_name_ in mecab.yaml and nltk.yaml (ed1be7c)
- tokenizer: Add _config_name_ in tokenizer configuration files (5a3d79b)
- lexikanon: Add _config_name_ in normalizer files (989d354)
Fix
- dependencies: Upgrade hyfi to 1.12.5 (0113560)
Published by github-actions[bot] almost 3 years ago
lexikanon - v0.3.1
Fix
- lexikanon: Simplify YAML configuration files (55def1f)
- dependencies: Upgrade hyfi to ^1.11.0 (1ea8090)
- lexikanon: Add 'num_heads' and 'num_tails' options in 'dataset_extract_tokens.yaml', 'dataset_tokenize.yaml', 'tokenize.py' and 'extract_tokens' (07dec64)
Published by github-actions[bot] almost 3 years ago
lexikanon - v0.3.0
Feature
- lexikanon/pipe: Add new tokenize module (307d2a7)
- lexikanon/pipe: Add new __init__.py file (f824369)
- lexikanon: Add dataset_extract_nouns configuration (e175605)
- lexikanon: Add dataset_extract_tokens.yaml configuration (f024732)
- lexikanon: Add new tokenizer configuration in the dataset_tokenize.yaml file (02f4044)
Fix
- lexikanon: Update logging and data display in tokenize functions (27b22e7)
Published by github-actions[bot] almost 3 years ago
lexikanon - v0.2.4
Fix
- stopwords: Add special methods for Stopwords (e1b8871)
- stopwords: Separate stopwords function and list, enhance logging control (fe2cf32)
- stopwords: Rename configuration variables (084f679)
- dependencies: Upgrade hyfi to 1.9.4 (7a0bc54)
Published by github-actions[bot] almost 3 years ago
lexikanon - v0.2.2
Fix
- tokenizer: Change SimpleTokenizer path in config (13d4ea0)
- tokenizer: Change MecabTokenizer import path (edc6e8d)
- stopwords: Update target path (da93a48)
- normalizer: Update target to lexikanon.normalizers.Normalizer (c33e060)
Published by github-actions[bot] almost 3 years ago
lexikanon - v0.2.0
Feature
- tests: Add stopwords test in lexikanon module (32be6ae)
- tests: Add new test cases in test_tokenizer.py (f4f6eb8)
- tests: Add normalizer test case in lexikanon (ef1d8c9)
- lexikanon/utils/hanja: Add table loading functionality (e3f14ee)
- lexikanon: Add hanja translation support (36df26b)
- hangul: Add support for Hangul character operations (130c699)
- lexikanon/utils/hanja: Add new translation functionality (d35534b)
- lexikanon/utils: Add hangle utilities to handle the Korean language (dba4474)
- lexikanon/utils: Add new util file with various text processing functions (6fd05ec)
- tokenizers: Add SimpleTokenizer, MecabTokenizer, NLTKTokenizer (6154fa0)
- tokenizers: Add NLTKTokenizer and NLTKTagger classes (44aacd0)
- tokenizers: Add MecabTokenizer and MecabTagger classes (23dec65)
- lexikanon/tokenizers: Add base tokenizer methods (dc826d1)
- stopwords: Add Stopwords class (3ae64af)
- lexikanon/resources/dictionaries/mecab: Add new ekon_v1 dictionary file (5a0cb8a)
- lexikanon/normalizers: Add new file normalizer.py with Normalizer class and associated configurations (94fda7f)
- lexikanon/normalizers: Add Normalizer (342b781)
- lexikanon: Add new tokenizer configuration (2c3ce80)
- tokenizer: Add nltk configuration files for tokenization and tagging (e700d9b)
- tokenizer: Add configuration for mecab tokenizer (05caee4)
- tokenizer: Add new tokenizer configuration file (ad9c0c2)
- stopwords: Add new stopwords configuration file (d403295)
- normalizer: Add new files for various character settings (2568cdd)
- dependencies: Add ftfy, nltk and ekonlpy (6b68952)
Published by github-actions[bot] almost 3 years ago