Recent Releases of pythainlp

pythainlp - PyThaiNLP v5.1.2 Released!

PyThaiNLP v5.1.2 is a bug fix release of PyThaiNLP v5.1.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

  • Documentation: https://pythainlp.github.io/docs/5.1
  • Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 5.1 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/900.

What's Changed

  • Update romanize docs and keep space #1110

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.1.1...v5.1.2

Contributors

Thanks to all the contributors. (Image made with contributors-img)

We build Thai NLP.

PyThaiNLP

- Python
Published by wannaphong about 1 year ago

pythainlp - PyThaiNLP v5.1.1 Released!

PyThaiNLP v5.1.1 is a bug fix release of PyThaiNLP v5.1.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

  • Documentation: https://pythainlp.github.io/docs/5.1
  • Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 5.1 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/900.

What's Changed

  • Refactor thai_consonants_all to use set in syllable.py #1087 by @allrob23
  • ThaiTransliterator: Select 1D CPU int64 tensor device #1089 by @jkingd0n

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.1.0...v5.1.1

- Python
Published by wannaphong about 1 year ago

pythainlp - PyThaiNLP v5.1.0 Released!

We released PyThaiNLP v5.1.0! This version adds new features, such as Thai Discourse Treebank (TDTB) POS tagging and Thai solar-to-lunar date conversion, and fixes several bugs.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

  • Documentation: https://pythainlp.github.io/docs/5.1
  • Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 5.1 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/900

What is new?

New features

  • Add Thai Discourse Treebank postag #910
  • Add Thai Universal Dependency Treebank postag #916
  • Add Thai G2P v2 Grapheme-to-Phoneme model #923
  • Add support for list of strings as input to sent_tokenize() #927
  • Add pythainlp.tools.safe_print to handle UnicodeEncodeError on console #969
  • Add Thai Solar Date convert to Thai Lunar Date #998
  • Add Thai pangram text #1045
  • Add pythainlp.llm #1043
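
As an illustration of the safe_print item above, here is a minimal standard-library sketch of the general technique: try to write the text, and on UnicodeEncodeError fall back to a lossy encoding that the stream can accept. The name safe_print_sketch and its stream parameter are hypothetical, and this is not the code from #969.

```python
import sys


def safe_print_sketch(text: str, stream=None) -> None:
    """Write text to a stream, replacing characters the stream cannot encode.

    Hypothetical helper for illustration only; not PyThaiNLP's safe_print.
    """
    stream = stream if stream is not None else sys.stdout
    try:
        stream.write(text + "\n")
    except UnicodeEncodeError:
        # Fall back to the stream's own encoding (or ASCII if unknown),
        # substituting "?" for every character it cannot represent.
        encoding = getattr(stream, "encoding", None) or "ascii"
        fallback = text.encode(encoding, errors="replace").decode(encoding)
        stream.write(fallback + "\n")


if __name__ == "__main__":
    safe_print_sketch("ภาษาไทย ok")
```

On a console that supports Thai the text is printed unchanged; on a limited console (for example an ASCII-only stream) the unsupported characters are replaced instead of raising an exception.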

Bug fixes

  • Fix collate() to consider tonemark in ordering #926
  • Fix maiyamok() expanding the wrong word #962
  • Fix nlpo3.load_dict() never printing an error message on failure #979

Remove

  • Remove clause_tokenize #1024

Deprecation and other API changes

  • 5.1
    • pythainlp.util.is_native_thai: use pythainlp.morpheme.is_native_thai instead
  • 5.2
    • pythainlp.cls: use pythainlp.classify instead
    • pythainlp.corpus.thai_synonym: use pythainlp.corpus.thai_synonyms instead
    • pythainlp.util.maiyamok: use pythainlp.util.expand_maiyamok instead
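
Renames like the ones above are commonly announced at runtime with a DeprecationWarning that names the replacement and the relevant versions. The following is a minimal standard-library sketch of such a helper. The function name, its parameters, the message wording, and the version strings in the usage are all hypothetical; it does not reproduce PyThaiNLP's own warn_deprecation.

```python
import warnings


def warn_deprecation_sketch(
    old: str,
    new: str = "",
    deprecated_in: str = "",
    removed_in: str = "",
) -> None:
    """Emit a DeprecationWarning describing a renamed API (illustrative only)."""
    message = f"'{old}' is deprecated"
    if deprecated_in:
        message += f" since version {deprecated_in}"
    if removed_in:
        message += f" and will be removed in version {removed_in}"
    message += "."
    if new:
        message += f" Use '{new}' instead."
    # stacklevel=2 attributes the warning to the caller of the deprecated API.
    warnings.warn(message, DeprecationWarning, stacklevel=2)


def old_function() -> str:
    # Placeholder names and versions, chosen only for the example.
    warn_deprecation_sketch("pkg.old_function", "pkg.new_function", "1.0", "2.0")
    return "result"


if __name__ == "__main__":
    warnings.simplefilter("always")
    old_function()
```

Callers keep working during the transition period and see where the replacement lives; running tests with warnings enabled is one way to find remaining uses of the old names before they are removed.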

Improve

  • Add more Thai political parties to the Thai dictionary https://github.com/PyThaiNLP/pythainlp/commit/2252dee57bd7be9503242fa734bf0abc48c5ddf1
  • Fix inconsistency in newmm-safe engine by copilot #1063
  • Update warn_deprecation to get deprecated and removal versions #1028
  • Remove unnecessary enumerate in expand_maiyamok #1029
  • Add SPDX FileType #1032
  • Fix bug in Longest Matching tokenizer to preprocess spaces consistently #1062
  • Add codemeta.json file to root directory #1053

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.0...v5.1.0

- Python
Published by wannaphong over 1 year ago

pythainlp - PyThaiNLP v5.1.0-beta2

Schedule

  • First Beta release: 27 December 2024
  • Production release: WIP

PyThaiNLP 5.1 Change Log #900

Docs: https://pythainlp.org/dev-docs/

What's Changed

  • Add pythainlp.llm by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1043
  • Add How to cut a new release doc by @bact in https://github.com/PyThaiNLP/pythainlp/pull/1051
  • Update pandas requirement from ==1.4.* to ==2.2.* by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1041
  • Bump sentence-transformers from 2.2.2 to 2.7.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1038
  • Bump pyicu from 2.8 to 2.14 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1052
  • Add pythainlp.lm.calculate_ngram_counts by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1054
  • Fixed #1055 bug: Tone detector + syllable sound bug by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1056
  • Fix inconsistency in newmm-safe engine by copilot by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1063
  • Fix bug in Longest Matching tokenizer to preprocess spaces consistently by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1062
  • [Ready] Reduce reload word tokenizer engine in word_tokenize by @new5558 in https://github.com/PyThaiNLP/pythainlp/pull/1064
  • Add display cell tokenizer by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1058
  • Add longest common subsequence algorithm by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1059
  • Bump transformers from 4.47.1 to 4.48.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1068
  • Bump protobuf from 5.29.2 to 5.29.3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1067
  • Fix custom dict error for unsupported tokenization engines by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1066
  • Add pythainlp.util.spelling by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1060
  • Add misspell command to CLI by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1057
  • Add codemeta.json file to root directory by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1069
  • Bump epitran from 1.25.1 to 1.26.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1072
  • Bump transformers from 4.48.0 to 4.48.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1071
  • Bump transformers from 4.48.1 to 4.48.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1074
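
For readers unfamiliar with the longest common subsequence algorithm mentioned in the list above (#1059), here is a minimal sketch of the textbook dynamic-programming approach in plain Python. The function name and signature are assumptions made for this example; the notes do not describe the library's own implementation or API.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (illustrative sketch)."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the characters.
    i, j = rows, cols
    chars = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            chars.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(longest_common_subsequence("AGGTAB", "GXTXAYB"))  # prints GTAB
```

The table costs O(len(a) * len(b)) time and memory, which is usually acceptable for word- or sentence-length inputs.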

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.1.0-beta1...v5.1.0-beta2

- Python
Published by wannaphong over 1 year ago

pythainlp - PyThaiNLP v5.1.0-beta1

Schedule

  • First Beta release: 27 December 2024
  • Production release: WIP

PyThaiNLP 5.1 Change Log #900

What's Changed

  • Add Thai Universal Dependency Treebank postag by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/916
  • Add Thai Discourse Treebank postag by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/910
  • Update tone_detector() API description by @bact in https://github.com/PyThaiNLP/pythainlp/pull/919
  • Add save and load for pythainlp.classify.param_free.GzipModel by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/908
  • Add Thai G2P v2 Grapheme-to-Phoneme model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/923
  • Bump transformers from 4.36.0 to 4.38.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/907
  • Add preprocess function to split whitespace before romanize by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/924
  • Fix collate() to consider tonemark in ordering by @WTFPUn in https://github.com/PyThaiNLP/pythainlp/pull/926
  • test: Add more cases to cover all possible Marttra by @HRNPH in https://github.com/PyThaiNLP/pythainlp/pull/929
  • Bump github/codeql-action from 2 to 3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/939
  • Bump actions/setup-python from 4 to 5 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/940
  • Bump peaceiris/actions-gh-pages from 3 to 4 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/937
  • Bump conda-incubator/setup-miniconda from 2 to 3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/936
  • Bump actions/stale from 6 to 9 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/938
  • Add support for list of strings as input to sent_tokenize() by @ayaan-qadri in https://github.com/PyThaiNLP/pythainlp/pull/927
  • Bump python-crfsuite from 0.9.9 to 0.9.11 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/943
  • Tidy up workflow files by @bact in https://github.com/PyThaiNLP/pythainlp/pull/946
  • Upgrade Python in CI to 3.10 by @bact in https://github.com/PyThaiNLP/pythainlp/pull/947
  • Fix nltk.downloader warning by @bact in https://github.com/PyThaiNLP/pythainlp/pull/949
  • Remove unused pytest by @bact in https://github.com/PyThaiNLP/pythainlp/pull/950
  • Unify unit test workflow across OSes by @bact in https://github.com/PyThaiNLP/pythainlp/pull/951
  • Specify a limited test suite by @bact in https://github.com/PyThaiNLP/pythainlp/pull/952
  • Use common warn_deprecation by @bact in https://github.com/PyThaiNLP/pythainlp/pull/956
  • Move sent_tokenize with default crfcut to testx by @bact in https://github.com/PyThaiNLP/pythainlp/pull/958
  • Merge new sent_tokenize test to fix-954 by @bact in https://github.com/PyThaiNLP/pythainlp/pull/959
  • Move more sent_tokenize test by @bact in https://github.com/PyThaiNLP/pythainlp/pull/960
  • Move more sent_tokenize test by @bact in https://github.com/PyThaiNLP/pythainlp/pull/961
  • Fix sent_tokenize(engine="whitespace") return value to be a list of string by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/957
  • Fix maiyamok() expanding the wrong word by @bact in https://github.com/PyThaiNLP/pythainlp/pull/962
  • Add version to deprecation warnings by @bact in https://github.com/PyThaiNLP/pythainlp/pull/963
  • Remove tests with Sonarcloud issue by @bact in https://github.com/PyThaiNLP/pythainlp/pull/964
  • Add test_tools to test suite by @bact in https://github.com/PyThaiNLP/pythainlp/pull/965
  • Add pythainlp.tools.safe_print to handle UnicodeEncodeError on console by @bact in https://github.com/PyThaiNLP/pythainlp/pull/969
  • Make CLI able to handle Unicode characters output on Windows console by @bact in https://github.com/PyThaiNLP/pythainlp/pull/968
  • Split test_tag and testx_tag by @bact in https://github.com/PyThaiNLP/pythainlp/pull/970
  • Add test_tag to __init__ by @bact in https://github.com/PyThaiNLP/pythainlp/pull/971
  • Add test_corpus to __init__ by @bact in https://github.com/PyThaiNLP/pythainlp/pull/972
  • Add test coverage by @bact in https://github.com/PyThaiNLP/pythainlp/pull/974
  • Add test_khavee to test suite by @bact in https://github.com/PyThaiNLP/pythainlp/pull/967
  • Create CHANGELOG.md by @bact in https://github.com/PyThaiNLP/pythainlp/pull/975
  • Add Compact Tests (testc) by @bact in https://github.com/PyThaiNLP/pythainlp/pull/976
  • Add testc_tools (misspell) by @bact in https://github.com/PyThaiNLP/pythainlp/pull/977
  • Fix warnings and types by @bact in https://github.com/PyThaiNLP/pythainlp/pull/978
  • Fix nlpo3.load_dict() never printing an error message on failure by @bact in https://github.com/PyThaiNLP/pythainlp/pull/979
  • Add tests.compact.transliterate (PyICU test) by @bact in https://github.com/PyThaiNLP/pythainlp/pull/980
  • Add documentation about compact install option by @bact in https://github.com/PyThaiNLP/pythainlp/pull/981
  • Bump symspellpy from 6.7.7 to 6.7.8 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/985
  • Bump sentencepiece from 0.1.99 to 0.2.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/982
  • Bump tensorflow from 2.13.1 to 2.18.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/988
  • Bump bpemb from 0.3.4 to 0.3.6 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/989
  • Add nlpo3 to compact install/test by @bact in https://github.com/PyThaiNLP/pythainlp/pull/987
  • Bump h5py from 3.1.0 to 3.12.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/991
  • Use "build" instead of setup.py + add "[cd build]" build trigger word by @bact in https://github.com/PyThaiNLP/pythainlp/pull/994
  • Add Thai Solar Date convert to Thai Lunar Date by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/998
  • Update requests requirement from ==2.31.* to ==2.32.* by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1003
  • Bump gensim from 4.3.2 to 4.3.3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1009
  • Update numpy requirement from ==1.22.* to ==1.26.* by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1007
  • Bump epitran from 1.9 to 1.25.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1006
  • Bump astral-sh/ruff-action from 1 to 2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1010
  • Bump spacy-thai from 0.7.1 to 0.7.8 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1014
  • Bump fairseq from 0.10.2 to 0.12.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1013
  • Bump transformers from 4.38.0 to 4.47.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1020
  • Bump panphon from 0.20.0 to 0.21.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1022
  • Remove clause_tokenize by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1024
  • Update warn_deprecation to get deprecated and removal versions by @bact in https://github.com/PyThaiNLP/pythainlp/pull/1028
  • Remove unnecessary enumerate in expand_maiyamok by @bact in https://github.com/PyThaiNLP/pythainlp/pull/1029
  • Add SPDX FileType by @bact in https://github.com/PyThaiNLP/pythainlp/pull/1032
  • Bump spylls from 0.1.5 to 0.1.7 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1035
  • Bump emoji from 0.5.4 to 0.6.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1036
  • Bump wtpsplit from 1.0.1 to 1.3.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1037
  • Simplify calculate_f_year_f_dev() by @bact in https://github.com/PyThaiNLP/pythainlp/pull/1031
  • Bump sacremoses from 0.0.41 to 0.1.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1034
  • Bump protobuf from 3.20.3 to 5.29.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1033
  • Bump protobuf from 5.29.1 to 5.29.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1042
  • Bump ufal-chu-liu-edmonds from 1.0.2 to 1.0.3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1040
  • Bump transformers from 4.47.0 to 4.47.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1039
  • Bump astral-sh/ruff-action from 2 to 3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1044
  • Add Thai pangram text by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1045
  • Fixed #1004 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1046
  • PyThaiNLP v5.1.0-beta1 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1047

New Contributors

  • @WTFPUn made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/926
  • @ayaan-qadri made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/927

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.5...v5.1.0-beta1

- Python
Published by wannaphong over 1 year ago

pythainlp - PyThaiNLP v5.0.5 Released!

PyThaiNLP v5.0.5 is a bug fix release of PyThaiNLP v5.0.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

  • Documentation: https://pythainlp.github.io/docs/5.0
  • Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.

What's Changed

  • Add clause_tokenize warnings #1026
  • Fix maiyamok() (merge back from #962)

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.4...v5.0.5

- Python
Published by wannaphong over 1 year ago

pythainlp - PyThaiNLP v5.0.4 Released!

PyThaiNLP v5.0.4 is a bug fix release of PyThaiNLP v5.0.3.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

  • Documentation: https://pythainlp.github.io/docs/5.0
  • Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.

What's Changed

  • Fixed #914 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/917

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.3...v5.0.4

- Python
Published by wannaphong about 2 years ago

pythainlp - PyThaiNLP v5.0.3 Released!

PyThaiNLP v5.0.3 is a bug fix release of PyThaiNLP v5.0.2.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

  • Documentation: https://pythainlp.github.io/docs/5.0
  • Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.

What's Changed

  • Create .editorconfig by @bact in https://github.com/PyThaiNLP/pythainlp/pull/909
  • Fix empty string ('') added (in some cases) when using word_tokenize with join_broken_num=True by @S2P2 in https://github.com/PyThaiNLP/pythainlp/pull/912

New Contributors

  • @S2P2 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/912

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.2...v5.0.3

- Python
Published by wannaphong about 2 years ago

pythainlp - PyThaiNLP v5.0.2 Released!

PyThaiNLP v5.0.2 is a bug fix release of PyThaiNLP v5.0.1.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

  • Documentation: https://pythainlp.github.io/docs/5.0
  • Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.

What's Changed

  • Update README and license header by @bact in https://github.com/PyThaiNLP/pythainlp/pull/902
  • Updated crfcut.py by @varunkatiyar819 in https://github.com/PyThaiNLP/pythainlp/pull/905

New Contributors

  • @varunkatiyar819 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/905

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.1...v5.0.2

- Python
Published by wannaphong about 2 years ago

pythainlp - PyThaiNLP v5.0.1 Released!

PyThaiNLP v5.0.1 is a bug fix release of PyThaiNLP v5.0.0.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

  • Documentation: https://pythainlp.github.io/docs/5.0
  • Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.

What's Changed

  • Fixed bug: ImportError pycrfsuite #901

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.0...v5.0.1

- Python
Published by wannaphong over 2 years ago

pythainlp - PyThaiNLP v5.0.0 Released!

We are excited to announce the latest release of PyThaiNLP: version 5.0! PyThaiNLP is a Python library for Thai natural language processing (NLP).

With PyThaiNLP 5.0, you can expect improved performance and accuracy for NLP tasks in Thai. We have also added new functions to make your NLP tasks even easier and more efficient.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

  • Documentation: https://pythainlp.github.io/docs/5.0
  • Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.

What is new?

License information

  • Use SPDX license identifier at the header of source code #876

Deprecation and other API changes

  • Change default NER to thainer-v2 https://github.com/PyThaiNLP/pythainlp/commit/5e97e7c4ebcf68bca64e4f942c8dfe3a5ab2ebc5
  • Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai https://github.com/PyThaiNLP/pythainlp/commit/524759ac1926fb9837bb9464f0a40cd984af2608

Dependency

  • Add tzdata as a dependency on Windows by @BLKSerene in #841

New API

  • Add pythainlp.coref for Thai coreference resolution #802
  • Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize() function #806
  • Add word approximation to pythainlp.soundex.sound #809 by @wannaphong
  • Add pythainlp.wsd for Thai word sense disambiguation #818 by @wannaphong
  • Add pythainlp.chat and WangChanGLM to pythainlp.generate #819 by @wannaphong
  • Add pythainlp.cls a param-free classification model #821 by @c4n
  • Add pythainlp.el entity linking #822 by @wannaphong
  • Add pythainlp.ancient by @wannaphong in #833
  • Add pythainlp.util.rhyme by @wannaphong in #849
  • Add remove_trailing_repeat_consonants by @konbraphat51 in #862
  • Add pythainlp.util.to_idn by @wannaphong in #875
  • Add pythainlp.corpus.find_synonyms by @wannaphong in #890
  • Add pythainlp.util.morse by @wannaphong in #891
  • Add pythainlp.morpheme by @wannaphong in #896
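
To show what converting a Thai domain name to its IDN (ASCII-compatible) form involves, as the to_idn item above suggests, here is a minimal sketch using Python's built-in "idna" codec. The helper names are hypothetical, and these notes do not say whether pythainlp.util.to_idn is implemented this way or what its exact signature is.

```python
def to_ascii_domain(domain: str) -> str:
    """Encode a Unicode domain name into its ASCII (Punycode-based) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(domain: str) -> str:
    """Decode an ASCII IDN form back into Unicode."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    ascii_form = to_ascii_domain("ไทย.com")
    print(ascii_form)                     # label starts with "xn--"
    print(to_unicode_domain(ascii_form))  # round-trips to the Thai form
```

The built-in codec follows the older IDNA 2003 rules; applications that need IDNA 2008 behaviour generally rely on a third-party package instead.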

Improve

  • Update code comments and clean up codes by @BLKSerene in #845
  • Improve the documentation by fixing typos and adding necessary details, explanations of the code, and missing information about models and examples by @Saharshjain78 in #850
  • Fix tests of khavee functions by @BLKSerene in #854
  • Update Git Actions versions by @bact in #878
  • Fix ruff args in workflow by @bact in #880
  • Revise ruff args in workflow by @bact in #881
  • Fix coref return type and add fallback by @bact in #883
  • Fix wrong/incompatible types, code readability by @bact in #884
  • Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
  • Add license info to /tests and README_TH.md by @bact in #886
  • phayathaibert, khavee, parse: Code clean up by @bact in #889
  • ruff: docstring-code-format = true by @bact in #892

Tokenizer

  • Add wtpsplit engine to sentence_tokenize #804
  • New paragraph_tokenize function to split Thai text into paragraphs #804
  • Add paragraph_threshold into paragraph_tokenize() function #806 by @pavaris-pm
  • Add 🪿 Han-solo by @wannaphong in #830
  • Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
  • Fix incorrect passing of flags to re.split by @hauntsaninja in #832
  • Add syllable_tokenize by @wannaphong in #834
  • Add wanchanberta_thai_grammarly by @wannaphong in #836
  • Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
  • Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
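
The re.split fix in the list above (#832) belongs to a well-known class of bugs: the third positional parameter of re.split is maxsplit, not flags, so a flag passed positionally is silently treated as a split limit. The sketch below reproduces the effect with the standard library only; the pattern and text are illustrative and are not taken from the PyThaiNLP code.

```python
import re

text = "aXbxc"

# Buggy call shape: re.split("x", text, re.IGNORECASE)
# The flag lands in the maxsplit slot. Its integer value (2) becomes a split
# limit and matching stays case-sensitive. The equivalent explicit call is:
limited = re.split("x", text, maxsplit=int(re.IGNORECASE))

# Intended call: pass the flag by keyword.
ignoring_case = re.split("x", text, flags=re.IGNORECASE)

print(limited)        # ['aXb', 'c']
print(ignoring_case)  # ['a', 'b', 'c']
```

Passing maxsplit and flags by keyword avoids the ambiguity, and recent Python versions warn when they are passed positionally.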

Tag

  • Add function for pos tag with transformers by @MpolaarbearM in #857
  • Update pos_tag_transformers function by @pavaris-pm in #865
  • Add PhayaThaiBERT engine with new features by @pavaris-pm in #873

Chat

  • Fixed bug #828

Translate

  • Add small100 to pythainlp.translate #815 by @wannaphong

Transliterate

  • Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
  • Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852

Corpus

  • Add pythainlp.corpus.thai_orst_words() Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
  • Add pythainlp.corpus.thai_wikipedia_titles() Thai word list (noun and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
  • Add pythainlp.corpus.thai_volubilis_words() Thai word list from Volubilis dictionary #870 by @konbraphat51
  • Add pythainlp.corpus.thai_icu_words() Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
  • Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882

Util

  • Add pythainlp.util.encoding #813 by @wannaphong
  • Add pythainlp.util.spell_words #817 by @wannaphong
  • Add pythainlp.util.remove_trailing_repeat_consonants() #862 by @konbraphat51

New Contributors

  • @pavaris-pm made their first contribution in #806
  • @hauntsaninja made their first contribution in #832
  • @Saharshjain78 made their first contribution in #850
  • @konbraphat51 made their first contribution in #856
  • @MpolaarbearM made their first contribution in #857

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.0.2...v5.0.0

- Python
Published by wannaphong over 2 years ago

pythainlp - PyThaiNLP v5.0.0-beta1

Schedule

  • First Beta release: 5 February 2024
  • Production release: 10 February 2024

See 5.0 Milestone.

What is new?

License information

  • Use SPDX license identifier at the header of source code #876

Deprecation and other API changes

  • Change default NER to thainer-v2 https://github.com/PyThaiNLP/pythainlp/commit/5e97e7c4ebcf68bca64e4f942c8dfe3a5ab2ebc5
  • Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai https://github.com/PyThaiNLP/pythainlp/commit/524759ac1926fb9837bb9464f0a40cd984af2608

Dependency

  • Add tzdata as a dependency on Windows by @BLKSerene in #841

New API

  • Add pythainlp.coref for Thai coreference resolution #802
  • Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize() function #806
  • Add word approximation to pythainlp.soundex.sound #809 by @wannaphong
  • Add pythainlp.wsd for Thai word sense disambiguation #818 by @wannaphong
  • Add pythainlp.chat and WangChanGLM to pythainlp.generate #819 by @wannaphong
  • Add pythainlp.cls a param-free classification model #821 by @c4n
  • Add pythainlp.el entity linking #822 by @wannaphong
  • Add pythainlp.ancient by @wannaphong in #833
  • Add pythainlp.util.rhyme by @wannaphong in #849
  • Add: remove_trailing_repeat_consonants by @konbraphat51 in #862
  • Add pythainlp.util.to_idn by @wannaphong in #875
  • Add pythainlp.corpus.find_synonyms by @wannaphong in #890
  • Add pythainlp.util.morse by @wannaphong in #891
  • Add pythainlp.morpheme by @wannaphong in #896

Improve

  • Update code comments and clean up codes by @BLKSerene in #845
  • Improve the documentation by fixing typos and adding necessary details, explanations of the code, and missing information about models and examples by @Saharshjain78 in #850
  • Fix tests of khavee functions by @BLKSerene in #854
  • Update Git Actions versions by @bact in #878
  • Fix ruff args in workflow by @bact in #880
  • Revise ruff args in workflow by @bact in #881
  • Fix coref return type and add fallback by @bact in #883
  • Fix wrong/incompatible types, code readability by @bact in #884
  • Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
  • Add license info to /tests and README_TH.md by @bact in #886
  • phayathaibert, khavee, parse: Code clean up by @bact in #889
  • ruff: docstring-code-format = true by @bact in #892

Tokenizer

  • Add wtpsplit engine to sentence_tokenize #804
  • New paragraph_tokenize function to split Thai text into paragraphs #804
  • Add paragraph_threshold into paragraph_tokenize() function #806 by @pavaris-pm
  • Add 🪿 Han-solo by @wannaphong in #830
  • Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
  • Fix incorrect passing of flags to re.split by @hauntsaninja in #832
  • Add syllable_tokenize by @wannaphong in #834
  • Add wanchanberta_thai_grammarly by @wannaphong in #836
  • Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
  • Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856

Tag

  • Add function for pos tag with transformers by @MpolaarbearM in #857
  • Update pos_tag_transformers function by @pavaris-pm in #865
  • Add PhayaThaiBERT engine with new features by @pavaris-pm in #873

Chat

  • Fixed bug #828

Translate

  • Add small100 to pythainlp.translate #815 by @wannaphong

Transliterate

  • Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
  • Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852

Corpus

  • Add pythainlp.corpus.thai_orst_words() Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
  • Add pythainlp.corpus.thai_wikipedia_titles() Thai word list (noun and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
  • Add pythainlp.corpus.thai_volubilis_words() Thai word list from Volubilis dictionary #870 by @konbraphat51
  • Add pythainlp.corpus.thai_icu_words() Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
  • Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882

Util

  • Add pythainlp.util.encoding #813 by @wannaphong
  • Add pythainlp.util.spell_words #817 by @wannaphong
  • Add pythainlp.util.remove_trailing_repeat_consonants() #862 by @konbraphat51

New Contributors

  • @pavaris-pm made their first contribution in #806
  • @hauntsaninja made their first contribution in #832
  • @Saharshjain78 made their first contribution in #850
  • @konbraphat51 made their first contribution in #856
  • @MpolaarbearM made their first contribution in #857

- Python
Published by wannaphong over 2 years ago

pythainlp - PyThaiNLP v5.0.0-dev2

What's Changed

  • Add pythainlp.morpheme by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/896

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.0-dev1...v5.0.0-dev2

- Python
Published by wannaphong over 2 years ago

pythainlp - PyThaiNLP v5.0.0-dev1

What's Changed

  • Add Thai word list from Volubilis dictionary by @konbraphat51 in https://github.com/PyThaiNLP/pythainlp/pull/870
  • Add Thai word list from Thai Wikipedia titles by @konbraphat51 in https://github.com/PyThaiNLP/pythainlp/pull/869
  • switch PyThaiNLP source code to SPDX license ID by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/876
  • Add pythainlp.util.to_idn by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/875
  • Update Git Actions versions by @bact in https://github.com/PyThaiNLP/pythainlp/pull/878
  • Fix ruff args in workflow by @bact in https://github.com/PyThaiNLP/pythainlp/pull/880
  • Revise ruff args in workflow by @bact in https://github.com/PyThaiNLP/pythainlp/pull/881
  • Add Thai word list from ICU BreakIterator dictionary by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/879
  • Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in https://github.com/PyThaiNLP/pythainlp/pull/882
  • Fix coref return type and add fallback by @bact in https://github.com/PyThaiNLP/pythainlp/pull/883
  • Fix wrong/incompatible types, code readability by @bact in https://github.com/PyThaiNLP/pythainlp/pull/884
  • Bump protobuf from 3.20 to 3.20.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/885
  • Add license info to /tests and README_TH.md by @bact in https://github.com/PyThaiNLP/pythainlp/pull/886
  • Add PhayaThaiBERT engine with new features [WIP] by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/873
  • phayathaibert, khavee, parse: Code clean up by @bact in https://github.com/PyThaiNLP/pythainlp/pull/889
  • Add pythainlp.corpus.find_synonyms by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/890
  • ruff: docstring-code-format = true by @bact in https://github.com/PyThaiNLP/pythainlp/pull/892
  • Add pythainlp.util.morse by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/891

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.0-dev0...v5.0.0-dev1

- Python
Published by wannaphong over 2 years ago

pythainlp - PyThaiNLP v5.0.0-dev0

What's Changed

  • Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/844
  • Update code comments and clean up codes by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/845
  • Improve the documentation by fixing typos and adding necessary details, explanations of the code, and missing information about models and examples by @Saharshjain78 in https://github.com/PyThaiNLP/pythainlp/pull/850
  • Fix ISO 11940 duplicate keys by @bact in https://github.com/PyThaiNLP/pythainlp/pull/851
  • Add pythainlp.util.rhyme by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/849
  • Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/852
  • Fix tests of khavee functions by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/854
  • Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in https://github.com/PyThaiNLP/pythainlp/pull/856
  • add function for pos tag with transformers by @MpolaarbearM in https://github.com/PyThaiNLP/pythainlp/pull/857
  • Add: remove_trailing_repeat_consonants() by @konbraphat51 in https://github.com/PyThaiNLP/pythainlp/pull/862
  • Update pos_tag_transformers function by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/865

New Contributors

  • @Saharshjain78 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/850
  • @konbraphat51 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/856
  • @MpolaarbearM made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/857

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.1.0-beta5...v5.0.0-dev0

- Python
Published by wannaphong over 2 years ago

pythainlp - PyThaiNLP v4.1.0-beta5

Docs: https://pythainlp.github.io/dev-docs/ Report bug: https://github.com/PyThaiNLP/pythainlp/issues

Install: pip install --pre pythainlp

See 4.1 Milestone.

What's Changed

  • Fix "List of possible extras" in README by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/839
  • Add tzdata as a dependency on Windows by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/841

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.1.0-beta4...v4.1.0-beta5

- Python
Published by wannaphong over 2 years ago

pythainlp - PyThaiNLP v4.1.0-beta4

Docs: https://pythainlp.github.io/dev-docs/ Report bug: https://github.com/PyThaiNLP/pythainlp/issues

Install: pip install --pre pythainlp

See 4.1 Milestone.

What's Changed

  • Fix incorrect passing of flags to re.split by @hauntsaninja in https://github.com/PyThaiNLP/pythainlp/pull/832
  • Add pythainlp.ancient by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/833
  • Add syllable_tokenize by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/834
  • Add wanchanberta_thai_grammarly by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/836
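
The re.split fix in #832 concerns a common Python pitfall: the signature is re.split(pattern, string, maxsplit=0, flags=0), so a flag passed as the third positional argument is read as a split limit instead of a flag. The sketch below reproduces that effect on made-up input. It is a general illustration of the pitfall, not the code changed in the PR.

```python
import re

text = "aXbXcXd"

# re.IGNORECASE is an integer flag with value 2. In the third positional
# slot it would be interpreted as maxsplit=2, and the pattern would stay
# case-sensitive. That misreading is emulated explicitly here.
flag_read_as_maxsplit = re.split("x", text, maxsplit=int(re.IGNORECASE))

# Passing the flag by keyword gives the intended case-insensitive split.
flag_by_keyword = re.split("x", text, flags=re.IGNORECASE)

print(flag_read_as_maxsplit)  # ['aXbXcXd']
print(flag_by_keyword)        # ['a', 'b', 'c', 'd']
```

Passing both maxsplit and flags by keyword avoids the ambiguity entirely.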

New Contributors

  • @hauntsaninja made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/832

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.1.0-beta3...v4.1.0-beta4

- Python
Published by wannaphong over 2 years ago

pythainlp - PyThaiNLP v4.1.0-beta3

What's Changed

  • Add 🪿 Han-solo by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/830

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.1.0-beta2...v4.1.0-beta3

- Python
Published by wannaphong almost 3 years ago

pythainlp - PyThaiNLP v4.1.0-beta2

What's Changed

  • Fixed bug #828. Thank you @tonezzz for reporting!

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.1.0-beta1...v4.1.0-beta2

- Python
Published by wannaphong almost 3 years ago

pythainlp - PyThaiNLP v4.1.0-beta1

Schedule

  • First beta release: 24 July 2023

Docs: https://pythainlp.github.io/dev-docs/ Report bug: https://github.com/PyThaiNLP/pythainlp/issues

Install: pip install --pre pythainlp

See 4.1 Milestone.

What is new?

Deprecation and other API changes

  • https://github.com/PyThaiNLP/pythainlp/commit/5e97e7c4ebcf68bca64e4f942c8dfe3a5ab2ebc5 Change the default NER to thainer-v2

New API

  • Add pythainlp.coref to support Thai coreference resolution #802
  • Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize function #806
  • Add word approximation to pythainlp.soundex.sound by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/809
  • Add pythainlp.wsd for Thai Word Sense Disambiguation by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/818
  • Add pythainlp.chat and WangChanGLM to pythainlp.generate by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/819
  • Add a param-free classification model (pythainlp.cls) by @c4n in https://github.com/PyThaiNLP/pythainlp/pull/821
  • Add pythainlp.el by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/822
  • Add pythainlp.util.abbreviation_to_full_text #826 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/826

Tokenizer

  • Add wtpsplit engine to sentence_tokenize #804
  • New paragraph_tokenize function to split Thai text into paragraphs. #804
  • add paragraph_threshold into paragraph_tokenize function by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/806

Translate

  • Add small100 to pythainlp.translate by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/815

Corpus

  • Add orst list by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/810
  • Add thai_synonym #825 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/825

Util

  • Add pythainlp.util.encoding by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/813
  • Add pythainlp.util.spell_words by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/817
  • Add pythainlp.util.abbreviation_to_full_text #826 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/826

New Contributors

  • @pavaris-pm made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/806
  • @falukelo made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/824

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.0.0...v4.1.0-beta1

- Python
Published by wannaphong almost 3 years ago

pythainlp - PyThaiNLP v4.0.2 Released!

PyThaiNLP v4.0.2 is a bug fix release of PyThaiNLP v4.0.

Upgrade: pip install -U pythainlp

Documentation: https://pythainlp.github.io/docs/4.0

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 4.0 Change Log

What's Changed

  • fixed bug by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/798
  • fig เอือน อวน by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/799

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.0.1...v4.0.2

Contributors

Thanks to all the contributors. (Image made with contributors-img)

If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.

- Python
Published by wannaphong about 3 years ago

pythainlp - PyThaiNLP v4.0.1 Released!

PyThaiNLP v4.0.1 is a bug fix release of PyThaiNLP v4.0.

Upgrade: pip install -U pythainlp

Documentation: https://pythainlp.github.io/docs/4.0

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 4.0 Change Log

What's Changed

  • Fix mishandling Karun in Kavee Matra Checker by @HRNPH in https://github.com/PyThaiNLP/pythainlp/pull/793
  • adding tonemark removal to fix mattra checking by @HRNPH in https://github.com/PyThaiNLP/pythainlp/pull/795

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.0.0...v4.0.1

Contributors

Thanks to all the contributors. (Image made with contributors-img)

If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.

- Python
Published by wannaphong about 3 years ago

pythainlp - PyThaiNLP 4.0 Released!

PyThaiNLP published its first version, 0.0.4, to PyPI six years ago, so PyThaiNLP 4.0 has a special codename: PyThaiNLP 4.0 (Real).

See 4.0 Milestone.

Documentation: https://pythainlp.github.io/docs/4.0

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 4.0 Change Log

If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.

What is new?

Deprecation and other API changes

  • Delete all LST20 model #728
  • https://github.com/PyThaiNLP/pythainlp/commit/947c7be9ce4199af58ecf042629dda4d752dbcd6 Change pythainlp.tools.misspell to pythainlp.tools.misspell.misspell

Improve

  • Reduce import time #719
  • Fix/broken numeric data format (#652) #723

Tokenizer

  • Add blackboard cls #732
  • Add rule to TCC and Change TCC rule for newmm #741

Tag

  • Add blackboard pos_tag #733
  • Add ThaiNER 2.0 #781

Util

  • Add pythainlp.util.count_thai_chars #748
  • Add thai_strptime and convert_years #767

Transliterate

  • Add Thai2Rom ONNX model #743

Khavee

  • add khavee to pythainlp #777
  • add aek/too checker function to khavee #779

Parse

  • Add ud_goeswith #757

Corpus

  • Add new science word #763

Full Changelog

  • Improve: Reduce import time by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/719
  • Create CITATION.cff by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/721
  • Fix/broken numeric data format (#652) by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/723
  • Add blackboard pos_tag to cls by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/734
  • Update perceptron.py by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/736
  • Feature/integrate transliteration dictionary (#681) by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/735
  • Delete all LST20 model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/728
  • Add blackboard cls by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/732
  • Add blackboard pos_tag by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/733
  • Add style.css: extend docs page width by @LXZE in https://github.com/PyThaiNLP/pythainlp/pull/742
  • Add rule to TCC and Change TCC rule for newmm by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/741
  • Setup action to check for code formatting by @new5558 in https://github.com/PyThaiNLP/pythainlp/pull/746
  • Add more test for TCC by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/747
  • Add Thai2Rom ONNX model by @new5558 in https://github.com/PyThaiNLP/pythainlp/pull/743
  • Add pythainlp.util.count_thai_chars by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/748
  • Feature: keyword extraction with keybert and frequency ranking by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/751
  • Add ud_goeswith by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/757
  • Bump tensorflow from 2.7.2 to 2.9.3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/758
  • Add new science word by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/763
  • Add thai_strptime and convert_years by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/767
  • Fix typo in thai_full_month_lists for February by @PhakphumV in https://github.com/PyThaiNLP/pythainlp/pull/770
  • Add pythainlp.util.phoneme by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/772
  • Add remove tone ipa by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/776
  • add khavee to pythainlp by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/777
  • Add khavee docs tests by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/778
  • add aek/too checker function to khavee by @HRNPH in https://github.com/PyThaiNLP/pythainlp/pull/779
  • Add Thai NER 2.0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/781
  • Add Copyright to the header files by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/782
  • Fixed some issues in Khavee. It's a problem with use อ by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/785
  • PyThaiNLP 4.0 beta 1 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/786
  • fix some bugs and add check_karu_lahu function by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/787
  • PyThaiNLP 4.0 Released! by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/789

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0...v4.0.0

Contributors

Thanks to all the contributors. (Image made with contributors-img)

If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.

New Contributors

  • @LXZE made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/742
  • @new5558 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/746
  • @PhakphumV made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/770
  • @kangkengkhadev made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/777
  • @HRNPH made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/779

- Python
Published by wannaphong about 3 years ago

pythainlp - PyThaiNLP v4.0.0-beta1

This post gives the change log for PyThaiNLP 4.0. PyThaiNLP published its first version, 0.0.4, to PyPI six years ago, so PyThaiNLP 4.0 has a special codename: PyThaiNLP 4.0 (Real).

This release is the first beta release of PyThaiNLP 4.0.

Schedule

  • Beta release: 1 April 2023
  • Production release: 14 April 2023

See 4.0 Milestone.

What is new?

Deprecation and other API changes

  • Delete all LST20 model #728
  • https://github.com/PyThaiNLP/pythainlp/commit/947c7be9ce4199af58ecf042629dda4d752dbcd6 Change pythainlp.tools.misspell to pythainlp.tools.misspell.misspell

Improve

  • Reduce import time #719
  • Fix/broken numeric data format (#652) #723

Tokenizer

  • Add blackboard cls #732
  • Add rule to TCC and Change TCC rule for newmm #741

Tag

  • Add blackboard pos_tag #733
  • Add ThaiNER 2.0 #781

Util

  • Add pythainlp.util.count_thai_chars #748
  • Add thai_strptime and convert_years #767

Transliterate

  • Add Thai2Rom ONNX model #743

Khavee

  • add khavee to pythainlp #777
  • add aek/too checker function to khavee #779

Parse

  • Add ud_goeswith #757

Corpus

  • Add new science word #763

What's Changed

  • Improve: Reduce import time by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/719
  • Create CITATION.cff by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/721
  • Fix/broken numeric data format (#652) by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/723
  • Add blackboard pos_tag to cls by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/734
  • Update perceptron.py by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/736
  • Feature/integrate transliteration dictionary (#681) by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/735
  • Delete all LST20 model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/728
  • Add blackboard cls by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/732
  • Add blackboard pos_tag by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/733
  • Add style.css: extend docs page width by @LXZE in https://github.com/PyThaiNLP/pythainlp/pull/742
  • Add rule to TCC and Change TCC rule for newmm by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/741
  • Setup action to check for code formatting by @new5558 in https://github.com/PyThaiNLP/pythainlp/pull/746
  • Add more test for TCC by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/747
  • Add Thai2Rom ONNX model by @new5558 in https://github.com/PyThaiNLP/pythainlp/pull/743
  • Add pythainlp.util.count_thai_chars by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/748
  • Feature: keyword extraction with keybert and frequency ranking by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/751
  • Add ud_goeswith by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/757
  • Bump tensorflow from 2.7.2 to 2.9.3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/758
  • Add new science word by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/763
  • Add thai_strptime and convert_years by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/767
  • Fix typo in thai_full_month_lists for February by @PhakphumV in https://github.com/PyThaiNLP/pythainlp/pull/770
  • Add pythainlp.util.phoneme by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/772
  • Add remove tone ipa by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/776
  • add khavee to pythainlp by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/777
  • Add khavee docs tests by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/778
  • add aek/too checker function to khavee by @HRNPH in https://github.com/PyThaiNLP/pythainlp/pull/779
  • Add Thai NER 2.0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/781
  • Add Copyright to the header files by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/782
  • Fixed some issues in Khavee. It's a problem with use อ by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/785
  • PyThaiNLP 4.0 beta 1 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/786

New Contributors

  • @LXZE made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/742
  • @new5558 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/746
  • @PhakphumV made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/770
  • @kangkengkhadev made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/777
  • @HRNPH made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/779

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0...v4.0.0-beta1

- Python
Published by wannaphong about 3 years ago

pythainlp - PyThaiNLP v3.1.1 Released!

PyThaiNLP v3.1.1 is an update release of PyThaiNLP v3.1.0.

What's Changed

  • pythainlp.tools.misspell changed to pythainlp.tools.misspell.misspell.
  • Add Reduce import time #719 to PyThaiNLP 3.1.1 #753
  • Doc: Lst20 deprecation warning for 3.1.1 (#749) #752 (Thank you @noppayut)

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0...v3.1.1

You can install or upgrade by pip install pythainlp==3.1.1.

Documentation: https://pythainlp.github.io/docs/3.1

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.1 change log

See 3.1 Milestone.

Contributors

Thanks to all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong over 3 years ago

pythainlp - PyThaiNLP v3.1.0 Released!

This is the release version for PyThaiNLP v3.1.0

You can install by pip install pythainlp==3.1.0.

Documentation: https://pythainlp.github.io/docs/3.1

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.1 change log

See 3.1 Milestone.

What is new?

Deprecation and other API changes

#687 Remove deprecated functions

  • pythainlp.word_vector: doesnt_match, get_model, most_similar_cosmul, sentence_vectorizer, similarity. Use the WordVector class instead.
  • pythainlp.util.delete_tone. Use pythainlp.util.remove_tonemark instead.
  • pythainlp.util.timetime. Use pythainlp.util.time_to_thaiword instead.
  • pythainlp.tokenize.syllable_tokenize. Use pythainlp.tokenize.subword_tokenize instead.
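
A project upgrading to 3.1 may want to locate calls to the removed functions before they fail at import time. The following is a minimal sketch using only the standard library. It covers just the two one-to-one renames from the list above, written with their underscore spellings (delete_tone to remove_tonemark, syllable_tokenize to subword_tokenize). The names REMOVED_IN_3_1 and find_removed_names are hypothetical helpers for illustration and are not part of PyThaiNLP.

```python
import re

# Removed function name -> suggested replacement, from the list above.
REMOVED_IN_3_1 = {
    "delete_tone": "pythainlp.util.remove_tonemark",
    "syllable_tokenize": "pythainlp.tokenize.subword_tokenize",
}


def find_removed_names(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, removed name, replacement) for each hit in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in REMOVED_IN_3_1.items():
            # Word boundaries keep names such as my_delete_tone from matching.
            if re.search(rf"\b{re.escape(old)}\b", line):
                hits.append((lineno, old, new))
    return hits


example = (
    "from pythainlp.util import delete_tone\n"
    "x = 1\n"
    "from pythainlp.tokenize import syllable_tokenize\n"
)
for lineno, old, new in find_removed_names(example):
    print(f"line {lineno}: {old} was removed, use {new}")
```

The word-vector functions are left out on purpose: the same names exist as WordVector methods, so a plain text scan would flag valid code. A later 4.1 beta in these notes adds a syllable_tokenize again, so the check only makes sense for code targeting the 3.1 line.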

Dependency Parsing

  • PyThaiNLP now supports dependency parsing 🎉 Add pythainlp.parse.dependency_parsing https://github.com/PyThaiNLP/pythainlp/pull/706

Named Entity Tagging

  • #665 Add Thai-NNER pythainlp.tag.NNER
  • #658 Add LST20NER ONNX model. It is the LST20NER model, fine-tuned from WangchanBERTa, converted to ONNX.

Transliteration

  • #659 Add ISO 11940 transliteration
  • #660 Add Thai W2P v0.2
  • #686 Add wunsen
  • #694 Wunsen Mandarin and Japanese update

PyThaiNLP Corpus downloader

  • #656 Add support zip/tar.gz to download corpus
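
Supporting both archive types usually comes down to choosing an unpacker from the file extension. The sketch below is a standard-library illustration of that idea, not PyThaiNLP's downloader: extract_corpus_archive is a hypothetical helper, and it relies on shutil.unpack_archive, which recognises .zip and .tar.gz by extension.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_corpus_archive(archive: Path, dest: Path) -> list[str]:
    """Unpack a .zip or .tar.gz archive into dest and list the extracted files."""
    dest.mkdir(parents=True, exist_ok=True)
    # unpack_archive picks the unpacker from the archive's file extension.
    shutil.unpack_archive(str(archive), str(dest))
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )


# Usage with a throwaway zip archive built on the fly.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    archive = tmp_path / "corpus.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("words.txt", "แมว\nหมา\n")
    extracted = extract_corpus_archive(archive, tmp_path / "out")
    print(extracted)  # ['words.txt']
```

Archives from untrusted sources need extra care, since member paths can point outside the destination directory.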

Text normalization

  • #673 Add a normalising rule for Lakkhangyao ๅ

Translate

  • #674 add gpu option

Text summarize

  • #679 Add mt5 cpe kmutt thai sentence sum

Util

  • #682 Add live-dead syllable classification
  • #684 Add live dead syllable classify
  • #690 Add tone detector

Soundex

  • #699 Add Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique

Other

  • #689 map NG tag to PART
  • #691 Remove TinyDB as a dependency
  • #692 Fix notifications that newer versions of corpora are available
  • Add warning about LST20 license

Contributors

New Contributors

  • @chameleonTK made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/673
  • @vikimark made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/674
  • @BLKSerene made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/691
  • @cakimpei made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/694

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.0.10...v3.1.0

All Contributors

Thanks to all the contributors. (Image made with contributors-img)

We build Thai NLP.

PyThaiNLP

- Python
Published by wannaphong over 3 years ago

pythainlp - PyThaiNLP v3.0.10 Released!

PyThaiNLP v3.0.10 is a bug fix release of PyThaiNLP v3.0.9.

Bug fixed

  • Fixed wrong tag mapping from LST20 to UD #711

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.0.9...v3.0.10

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log

Contributors

Thanks to all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong over 3 years ago

pythainlp - PyThaiNLP v3.1.0-beta0

This is the beta version for PyThaiNLP v3.1.

You can install by pip install --pre pythainlp==3.1.0b0.
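
The release tag and the pip version are spelled differently: this release is tagged v3.1.0-beta0 but installed as 3.1.0b0, and the development releases in these notes follow the same pattern (tag v3.1.0-dev3, pip version 3.1.0.dev3). A minimal sketch of that mapping is shown below. tag_to_pip_version is a hypothetical helper, and it only handles the -betaN and -devN suffixes that appear in these notes.

```python
import re


def tag_to_pip_version(tag: str) -> str:
    """Convert a release tag such as 'v3.1.0-beta0' to its pip version string."""
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)(?:-(beta|dev)(\d+))?", tag)
    if match is None:
        raise ValueError(f"unrecognised release tag: {tag!r}")
    base, kind, number = match.groups()
    if kind is None:
        return base  # final release, e.g. v4.0.2 -> 4.0.2
    if kind == "beta":
        return f"{base}b{number}"  # v3.1.0-beta0 -> 3.1.0b0
    return f"{base}.dev{number}"  # v3.1.0-dev3 -> 3.1.0.dev3


for tag in ("v3.1.0-beta0", "v3.1.0-dev3", "v4.0.2"):
    print(f"pip install --pre pythainlp=={tag_to_pip_version(tag)}")
```

The --pre flag matters for the beta and dev forms, because pip skips pre-releases by default when no exact version is pinned.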

Documentation: https://pythainlp.github.io/dev-docs/

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.1 change log

See 3.1 Milestone.

What is new?

Deprecation and other API changes

#687 Remove deprecated functions

  • pythainlp.word_vector: doesnt_match, get_model, most_similar_cosmul, sentence_vectorizer, similarity. Use the WordVector class instead.
  • pythainlp.util.delete_tone. Use pythainlp.util.remove_tonemark instead.
  • pythainlp.util.timetime. Use pythainlp.util.time_to_thaiword instead.
  • pythainlp.tokenize.syllable_tokenize. Use pythainlp.tokenize.subword_tokenize instead.

Dependency Parsing

  • PyThaiNLP now supports dependency parsing 🎉 Add pythainlp.parse.dependency_parsing https://github.com/PyThaiNLP/pythainlp/pull/706

Named Entity Tagging

  • #665 Add Thai-NNER pythainlp.tag.NNER
  • #658 Add LST20NER ONNX model. It is the LST20NER model, fine-tuned from WangchanBERTa, converted to ONNX.

Transliteration

  • #659 Add ISO 11940 transliteration
  • #660 Add Thai W2P v0.2
  • #686 Add wunsen
  • #694 Wunsen Mandarin and Japanese update

PyThaiNLP Corpus downloader

  • #656 Add support zip/tar.gz to download corpus

Text normalization

  • #673 Add a normalising rule for Lakkhangyao ๅ

Translate

  • #674 add gpu option

Text summarize

  • #679 Add mt5 cpe kmutt thai sentence sum

Util

  • #682 Add live-dead syllable classification
  • #684 Add live dead syllable classify
  • #690 Add tone detector

Soundex

  • #699 Add Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique

Other

  • #689 map NG tag to PART
  • #691 Remove TinyDB as a dependency
  • #692 Fix notifications that newer versions of corpora are available
  • Add warning about LST20 license

What's Changed

  • Add more words from Royal Society by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/653
  • Add support zip/tar.gz to download corpus by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/656
  • Update from dev by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/657
  • Add ISO 11940 transliteration by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/659
  • Add Thai W2P v0.2 and PyThaiNLP v3.0.6dev0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/660
  • Add LST20NER onnx model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/658
  • Add Thai-NNER by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/665
  • Update dev base from 3.0 base by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/668
  • PyThaiNLP 3.0.7 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/670
  • Update dev branch from pythainlp-3.0 branch by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/672
  • Normalise Lakkhangyao by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/673
  • add gpu option by @vikimark in https://github.com/PyThaiNLP/pythainlp/pull/674
  • Bump tensorflow from 2.5.3 to 2.6.4 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/677
  • Bump tensorflow from 2.6.4 to 2.7.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/678
  • Add mt5 cpe kmutt thai sentence sum by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/679
  • Add live-dead syllable classification by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/682
  • Fixed CI Bug by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/683
  • Add live dead syllable classify by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/684
  • Add wunsen by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/686
  • Add ThaiSum sentence segmentor by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/688
  • map NG tag to PART by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/689
  • Add tone detector by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/690
  • Remove deprecated function by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/687
  • Remove TinyDB as a dependency by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/691
  • Fix notifications that newer versions of corpora are available by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/692
  • Start PyThaiNLP v3.1.0-dev0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/693
  • Wunsen Mandarin and Japanese update by @cakimpei in https://github.com/PyThaiNLP/pythainlp/pull/694
  • Add Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/699
  • Fixed #700 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/701
  • Update add-word_detokenize from dev by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/703
  • Add word_detokenize by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/697
  • Move model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/705
  • Add pythainlp.parse.dependency_parsing by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/706

New Contributors

  • @chameleonTK made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/673
  • @vikimark made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/674
  • @BLKSerene made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/691
  • @cakimpei made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/694

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.0.9...v3.1.0-beta0

All Contributors

Thanks to all the contributors. (Image made with contributors-img)

We build Thai NLP.

PyThaiNLP

- Python
Published by wannaphong over 3 years ago

pythainlp - PyThaiNLP v3.1.0-dev3

This is a development release for PyThaiNLP v3.1.

You can install by pip install --pre pythainlp==3.1.0.dev3.

Documentation: https://pythainlp.github.io/dev-docs/

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.1 change log

See 3.1 Milestone.

What's Changed

  • Move model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/705
  • Add pythainlp.parse.dependency_parsing by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/706

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0-dev2...v3.1.0-dev3

All Contributors

Thanks to all the contributors. (Image made with contributors-img)

We build Thai NLP.

PyThaiNLP

- Python
Published by wannaphong over 3 years ago

pythainlp - PyThaiNLP v3.1.0-dev2

This is the development release for PyThaiNLP v3.1.

You can install by pip install --pre pythainlp==3.1.0.dev2.

Documentation: https://pythainlp.github.io/dev-docs/

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.1 change log

See 3.1 Milestone.

What's Changed

  • Add Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/699
  • Fixed #700 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/701
  • Update add-word_detokenize from dev by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/703
  • Add word_detokenize by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/697

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0-dev1...v3.1.0-dev2

All Contributors

Thanks to all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong over 3 years ago

pythainlp - PyThaiNLP v3.0.9 Released!

PyThaiNLP v3.0.9 is a bug fix release of PyThaiNLP v3.0.8.

Bug fixed

  • Fixed: the Thai W2P model version was 0.1 https://github.com/PyThaiNLP/pythainlp/commit/b1cddd934c9224e0f513b6ccb71021a8a3c51260

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log

Contributors

Thanks to all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong over 3 years ago

pythainlp - PyThaiNLP v3.1.0-dev1

This is the development release for PyThaiNLP v3.1.

You can install by pip install --pre pythainlp==3.1.0.dev1.

Documentation: https://pythainlp.github.io/dev-docs/

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.1 change log

See 3.1 Milestone.

What's Changed

  • Wunsen Mandarin and Japanese update by @cakimpei in https://github.com/PyThaiNLP/pythainlp/pull/694

New Contributors

  • @cakimpei made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/694

All Contributors

Thanks to all the contributors. (Image made with contributors-img)

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0-dev0...v3.1.0-dev1

- Python
Published by wannaphong almost 4 years ago

pythainlp - PyThaiNLP v3.1.0-dev0

This is the first development release for PyThaiNLP v3.1.

You can install by pip install --pre pythainlp==3.1.0.dev0.

Documentation: https://pythainlp.github.io/dev-docs/

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.1 change log

See 3.1 Milestone.

What is new?

Deprecation and other API changes

#687 Remove deprecated functions

  • pythainlp.word_vector: doesnt_match, get_model, most_similar_cosmul, sentence_vectorizer, similarity. Use the WordVector class instead.
  • pythainlp.util.delete_tone. Use pythainlp.util.remove_tonemark instead.
  • pythainlp.util.timetime. Use pythainlp.util.time_to_thaiword instead.
  • pythainlp.tokenize.syllable_tokenize. Use pythainlp.tokenize.subword_tokenize instead.

Named Entity Tagging

  • #665 Add Thai-NNER pythainlp.tag.NNER
  • #658 Add LST20NER ONNX model. It is the LST20NER model, fine-tuned from WangchanBERTa, converted to ONNX.

Transliteration

  • #659 Add ISO 11940 transliteration
  • #660 Add Thai W2P v0.2
  • #686 Add wunsen

PyThaiNLP Corpus downloader

  • #656 Add support zip/tar.gz to download corpus

Text normalization

  • #673 Add a normalising rule for Lakkhangyao ๅ

Translate

  • #674 add gpu option

Text summarize

  • #679 Add mt5 cpe kmutt thai sentence sum

Util

  • #682 Add live-dead syllable classification
  • #684 Add live dead syllable classify
  • #690 Add tone detector

Other

  • #689 map NG tag to PART
  • #691 Remove TinyDB as a dependency
  • #692 Fix notifications that newer versions of corpora are available

What's Changed

  • Add more words from Royal Society by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/653
  • Add support zip/tar.gz to download corpus by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/656
  • Update from dev by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/657
  • Add ISO 11940 transliteration by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/659
  • Add Thai W2P v0.2 and PyThaiNLP v3.0.6dev0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/660
  • Add LST20NER onnx model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/658
  • Add Thai-NNER by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/665
  • Update dev base from 3.0 base by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/668
  • PyThaiNLP 3.0.7 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/670
  • Update dev branche from pythainlp-3.0 branche by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/672
  • Normalise Lakkhangyao by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/673
  • add gpu option by @vikimark in https://github.com/PyThaiNLP/pythainlp/pull/674
  • Bump tensorflow from 2.5.3 to 2.6.4 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/677
  • Bump tensorflow from 2.6.4 to 2.7.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/678
  • Add mt5 cpe kmutt thai sentence sum by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/679
  • Add live-dead syllable classification by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/682
  • Fixed CI Bug by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/683
  • Add live dead syllable classify by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/684
  • Add wunsen by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/686
  • Add ThaiSum sentence segmentor by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/688
  • map NG tag to PART by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/689
  • Add tone detector by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/690
  • Remove deprecated function by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/687
  • Remove TinyDB as a dependency by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/691
  • Fix notifications that newer versions of corpora are available by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/692
  • Start PyThaiNLP v3.1.0-dev0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/693

Contributors

New Contributors

  • @chameleonTK made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/673
  • @vikimark made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/674
  • @BLKSerene made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/691

All Contributors

Thanks all the contributors. (Image made with contributors-img)

Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.0.8...v3.1.0-dev0

- Python
Published by wannaphong almost 4 years ago

pythainlp - PyThaiNLP v3.0.8 Released!

PyThaiNLP v3.0.8 is a bug fix release of PyThaiNLP 3.0.7.

Bug fixes

  • Fixed nercut bug. https://github.com/PyThaiNLP/pythainlp/pull/671 Thank you @kmining for your bug report.

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log

Contributors

Thanks all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong about 4 years ago

pythainlp - PyThaiNLP v3.0.7 Released!

PyThaiNLP v3.0.7 is a bug fix release of PyThaiNLP 3.0.5.

Bug fixes

  • Fixed nercut bug. https://github.com/PyThaiNLP/pythainlp/issues/666 Thank you @kmining for your bug report.

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log: https://github.com/PyThaiNLP/pythainlp/issues/545

Contributors

Thanks all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong about 4 years ago

pythainlp - PyThaiNLP v3.0.6 Released!

PyThaiNLP v3.0.6 is a bug fix release of PyThaiNLP 3.0.5.

Bug fixes

  • Fixed nercut bug. #666 Thank you @kmining for your bug report.

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log #545

Contributors

Thanks all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong about 4 years ago

pythainlp - PyThaiNLP v3.0.5 Released!

PyThaiNLP v3.0.5 is a bug fix release of PyThaiNLP 3.0.4.

Bug fixes

  • Fixed nercut bug. https://github.com/PyThaiNLP/pythainlp/commit/e9b89628c89dacc7b992dbe7c140e38f3ee52869

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log #545

Contributors

Thanks all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong over 4 years ago

pythainlp - PyThaiNLP v3.0.4 Released!

PyThaiNLP v3.0.4 is a bug fix release of PyThaiNLP 3.0.3.

Bug fixes

  • Removed pythainlp.tag.named_entity.ThaiNameTagger to fix the pycrfsuite import. https://github.com/PyThaiNLP/pythainlp/commit/cc628d8cde6d3ea83d22f0d582398a0fdcbe6d84

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log #545

Contributors

Thanks all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong over 4 years ago

pythainlp - PyThaiNLP v3.0.3 Released!

PyThaiNLP v3.0.3 is a bug fix release of PyThaiNLP 3.0.2.

Bug fixes

  • Fixed TypeError in pythainlp.spell.symspellpy https://github.com/PyThaiNLP/pythainlp/issues/650

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log #545

Contributors

Thanks all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong over 4 years ago

pythainlp - PyThaiNLP v3.0.2 Release!

PyThaiNLP v3.0.2 is a bug fix release of PyThaiNLP 3.0.1.

Bug fixes

  • Fixed incorrect code from #645

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log #545

Contributors

Thanks all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong over 4 years ago

pythainlp - PyThaiNLP v3.0.1 Release!

PyThaiNLP v3.0.1 is a bug fix release of PyThaiNLP 3.0.

Bug fixes

  • Removed the warning message in pythainlp.tag.thainer. Fixed #644
  • Added the PYTHAINLP_READ_MODE environment variable, which configures PyThaiNLP to run in read-only mode. Fixed #645
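An environment variable like this is typically set before the library is imported. The sketch below assumes the variable is spelled PYTHAINLP_READ_MODE and that the value "1" turns read-only mode on; both are assumptions, and read_only_enabled is a hypothetical helper used only for illustration.

```python
import os

# Set the variable before importing pythainlp so it is visible at import time.
# The value "1" is an assumption about how read-only mode is switched on.
os.environ["PYTHAINLP_READ_MODE"] = "1"


def read_only_enabled() -> bool:
    """Report whether the (assumed) read-only flag is set in this process."""
    return os.environ.get("PYTHAINLP_READ_MODE") == "1"
```

Read-only mode is mainly useful on systems where the data directory cannot be written, such as containers with a read-only file system.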

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log #545

Contributors

Thanks all the contributors. (Image made with contributors-img)

- Python
Published by wannaphong over 4 years ago

pythainlp - PyThaiNLP v3.0.0 Released!

After a long period of development, we have released PyThaiNLP 3.0. PyThaiNLP 3.0 has many improvements and new features to help with Thai language processing tasks.

You can install by pip install pythainlp or upgrade by pip install -U pythainlp.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log #545

If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.

News

Starting with PyThaiNLP 3.0, we no longer support Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.2.

We have updated the Thai word dictionary & rule for newmm. We recommend retraining your model if you use newmm for word tokenization in your model.

What is new?

Deprecation and other API changes

  • Deprecated syllable_tokenize. `syllable_tokenize` is deprecated, use `subword_tokenize` instead
  • pythainlp.tag.named_entity.ThaiNameTagger has moved to pythainlp.tag.thainer.ThaiNameTagger. The old class will be deprecated in PyThaiNLP version 3.1.
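A deprecation like this is commonly implemented by keeping the old name as a thin wrapper that emits a DeprecationWarning. The sketch below illustrates that general pattern only; it is not PyThaiNLP's code. The deprecated decorator is hypothetical, and both tokenizer functions are stand-ins that simply split text into characters.

```python
import functools
import warnings
from typing import Callable, List


def deprecated(replacement: str) -> Callable:
    """Build a decorator that warns whenever the wrapped function is called."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated, use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


def subword_tokenize(text: str) -> List[str]:
    # Stand-in only: split into characters, not real subword tokenization.
    return list(text)


@deprecated("subword_tokenize")
def syllable_tokenize(text: str) -> List[str]:
    # The old name forwards to the new one so existing callers keep working.
    return subword_tokenize(text)
```

Callers of the old name keep getting results while the warning tells them which function to switch to.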

Augment

  • Add Thai Text Augmentation

Corpus

  • Fix lots of misspellings in the dictionary (words_th.txt)
  • Add get_corpus_default_db and the thainer 1.5 model. You can now add a corpus to `default_db.json`, and the latest thainer model no longer needs to be loaded from the Internet.

Tag

  • Add TLTK (pos_tag and ner) - add a TLTK wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • Add NER class - NER class for Named-entity recognizer tasks.

Translate

  • Add pythainlp.translate.Translate Class
  • Add Chinese-Thai Machine Translation
  • Add Thai-French Machine Translation

Tokenization

  • Tokenize repeating dots and commas from numbers (fix #461)
  • Fix tokenmaxlen bug that makes it always zero
  • Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
  • Add SEFR CUT to pythainlp
  • Add TLTK (sentence_tokenize and word_tokenize) - add a TLTK wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • Add nlpo3

Transliterate

  • Refactor Royin Transliterate: Avoid embedded if blocks and simplified consonant replacing operations
  • Manually merge update-royin branch with dev branch to add O-ANG rule
  • Add TLTK (g2p and ipa) - add a TLTK wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • Add pythainlp.transliterate.puan

Word Vector

  • Fix tokenmaxlen bug that makes it always zero
  • Add pythainlp.word_vector.WordVector

Spell

  • Add more spelling engines
  • Add TLTK (spell) - add a TLTK wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.

Generate

  • Add pythainlp.generate to generate a text.

Tool

  • Add misspell module

Other

  • Add TLTK - add a TLTK wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • Update requirements from ssg 0.0.6 to ssg 0.0.8
  • Spoonerism: add support for words with more than three syllables
  • Add maiyamok: this function preprocesses MaiYaMok (ๆ) in a Thai sentence.
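One plausible reading of MaiYaMok preprocessing is to expand the repetition mark ๆ into a repeat of the word before it. The sketch below shows that idea on already-tokenized input; expand_mai_yamok is a hypothetical name, and the behavior of the actual maiyamok function may differ.

```python
from typing import List

MAI_YAMOK = "\u0e46"  # ๆ, the Thai repetition mark


def expand_mai_yamok(tokens: List[str]) -> List[str]:
    """Replace each standalone ๆ token with a copy of the preceding word."""
    expanded: List[str] = []
    for token in tokens:
        if token.strip() == MAI_YAMOK and expanded:
            # Repeat the most recent word instead of keeping the mark.
            expanded.append(expanded[-1])
        else:
            expanded.append(token)
    return expanded
```

A mark with no preceding word is kept as it is, since there is nothing to repeat.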

Contributors

Thanks all the contributors. (Image made with contributors-img)

If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.

This year marks PyThaiNLP's sixth year, and PyThaiNLP has more than one million downloads. I started developing PyThaiNLP to help me with Thai language processing tasks. Now, PyThaiNLP is used in research and other work worldwide. PyThaiNLP could not grow without its contributors, sponsors, and users.

Thank you for all your support.

Thank you for using PyThaiNLP.

Wannaphong Phatthiyaphaibun

PyThaiNLP Founder

27 January 2022

- Python
Published by wannaphong over 4 years ago

pythainlp - PyThaiNLP v3.0.0-beta0

PyThaiNLP 3.0 has many improvements and new features to help you with Thai language processing tasks. This release, PyThaiNLP v3.0.0-beta0, is the first beta release of PyThaiNLP 3.0.

You can install by pip install pythainlp==3.0.0b0.

Documentation: https://pythainlp.github.io/dev-docs/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 3.0 change log #545

If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.

News

Starting with PyThaiNLP 3.0, we no longer support Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.2.

We have updated the dictionary and rules for newmm. If you use newmm for word tokenization in your model, we recommend that you retrain your model.

What is new?

Deprecation and other API changes

  • Deprecated syllable_tokenize. `syllable_tokenize` is deprecated, use `subword_tokenize` instead
  • pythainlp.tag.named_entity.ThaiNameTagger has moved to pythainlp.tag.thainer.ThaiNameTagger. The old class will be deprecated in PyThaiNLP version 3.1.

Augment

  • Add Thai Text Augmentation

Corpus

  • Fix lots of misspellings in dictionary (words_th.txt)
  • Add get_corpus_default_db and the thainer 1.5 model. You can now add a corpus to `default_db.json`, and the latest thainer model no longer needs to be loaded from the Internet.

Tag

  • Add tltk (pos_tag and ner) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • Add NER class - NER class for Named-entity recognizer tasks.

Translate

  • Add pythainlp.translate.Translate Class
  • Add Chinese-Thai Machine Translation

Tokenization

  • Tokenize repeating dots and commas from numbers (fix #461)
  • Fix tokenmaxlen bug that makes it always zero
  • Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
  • Add SEFR CUT to pythainlp
  • Add tltk (sentence_tokenize and word_tokenize) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • Add nlpo3

Transliterate

  • Refactor Royin Transliterate: Avoid embedded if blocks and simplified consonant replacing operations
  • Manually merge update-royin branch with dev branch to add O-ANG rule
  • Add tltk (g2p and ipa) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • Add pythainlp.transliterate.puan

Word Vector

  • Fix tokenmaxlen bug that makes it always zero
  • Add pythainlp.word_vector.WordVector

Spell

  • Add more spelling engines
  • Add tltk (spell) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.

Generate

  • Add pythainlp.generate

Tool

  • Add misspell module

Other

  • Add tltk - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • Update requirements from ssg 0.0.6 to ssg 0.0.8
  • Spoonerism: add support for words with more than 3 syllables
  • Add maiyamok: this function preprocesses MaiYaMok (ๆ) in a Thai sentence.

Contributors

Thanks all the contributors. (Image made with contributors-img)

If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.

PyThaiNLP #ThaiNLP

- Python
Published by wannaphong over 4 years ago

pythainlp - PyThaiNLP v3.0.0-dev0

PyThaiNLP v3.0.0-dev0 is the first development release of PyThaiNLP 3.0 (for development only).

Docs: https://pythainlp.github.io/dev-docs/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

GitHub: https://github.com/PyThaiNLP/pythainlp

News

Starting with PyThaiNLP 2.4, we no longer support Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.1.

We have updated the dictionary and rules for newmm. If you use newmm for word tokenization in your model, we recommend that you retrain your model.

What is new?

Deprecation and other API changes

  • #550 Deprecated syllable_tokenize. `syllable_tokenize` is deprecated, use `subword_tokenize` instead
  • https://github.com/PyThaiNLP/pythainlp/commit/701fb3a7842b3abd0b2318ba9074f1902c2f32e9 pythainlp.tag.named_entity.ThaiNameTagger has moved to pythainlp.tag.thainer.ThaiNameTagger. The old class will be deprecated in PyThaiNLP version 2.5.

Augment

  • #580 Add Thai Text Augmentation

Corpus

  • #557 Fix lots of misspellings in dictionary (words_th.txt)
  • #576 Add get_corpus_default_db and the thainer 1.5 model. You can now add a corpus to `default_db.json`, and the latest thainer model no longer needs to be loaded from the Internet.

Tag

  • #599 Add tltk (pos_tag and ner) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • #600 Add NER class - NER class for Named-entity recognizer tasks.

Translate

  • #589 Add pythainlp.translate.Translate Class
  • #588 Add Chinese-Thai Machine Translation

Tokenization

  • #562 Tokenize repeating dots and commas from numbers (fix #461)
  • #585 Fix tokenmaxlen bug that makes it always zero
  • #594 Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
  • https://github.com/PyThaiNLP/pythainlp/commit/314411086707b60ba8790724301224916f4670b8 Add SEFR CUT to pythainlp
  • #599 Add tltk (sentence_tokenize and word_tokenize) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • #622 Add nlpo3

Transliterate

  • #566 Refactor Royin Transliterate: Avoid embedded if blocks and simplified consonant replacing operations
  • #585 Manually merge update-royin branch with dev branch to add O-ANG rule
  • #599 Add tltk (g2p and ipa) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • #624 Add pythainlp.transliterate.puan

Word Vector

  • #573 Fix tokenmaxlen bug that makes it always zero
  • #583 Add pythainlp.word_vector.WordVector

Spell

  • #591 Add more spelling engines
  • #599 Add tltk (spell) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.

Generate

  • #579 Add pythainlp.generate

Tool

  • #614 Add misspell module

Other

  • #599 Add tltk - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • https://github.com/PyThaiNLP/pythainlp/commit/e357cf8f9b626e3a633dc33b8557fe45dc837aba Update requirements from ssg 0.0.6 to ssg 0.0.8
  • #631 Spoonerism: add support for words with more than 3 syllables
  • #623 Add maiyamok: this function preprocesses MaiYaMok (ๆ) in a Thai sentence.

- Python
Published by wannaphong over 4 years ago

pythainlp - PyThaiNLP v2.3.2 Release!

PyThaiNLP v2.3.2 is a bug fix release of PyThaiNLP 2.3.

Bug fixes

  • Fixed clause_tokenize returning an empty list. #609

Documentation: https://pythainlp.github.io/docs/2.3/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

You can install or upgrade using pip install -U pythainlp

See PyThaiNLP 2.3 change log #445

- Python
Published by wannaphong almost 5 years ago

pythainlp - PyThaiNLP v2.4.0-dev0

PyThaiNLP v2.4.0-dev0 is the first development release of PyThaiNLP 2.4 (for development only).

Documentation: https://pythainlp.github.io/dev-docs/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 2.4 change log #545

News

Starting with PyThaiNLP 2.4, we no longer support Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.1.

We have updated the dictionary and rules for newmm. If you use newmm for word tokenization in your model, we recommend that you retrain your model.

Deprecation and other API changes

  • #550 Deprecated syllable_tokenize. `syllable_tokenize` is deprecated, use `subword_tokenize` instead
  • https://github.com/PyThaiNLP/pythainlp/commit/701fb3a7842b3abd0b2318ba9074f1902c2f32e9 pythainlp.tag.named_entity.ThaiNameTagger has moved to pythainlp.tag.thainer.ThaiNameTagger. The old class will be deprecated in PyThaiNLP version 2.5.

Augment

  • #580 Add Thai Text Augmentation

Corpus

  • #557 Fix lots of misspellings in dictionary (words_th.txt)
  • #576 Add get_corpus_default_db and the thainer 1.5 model. You can now add a corpus to `default_db.json`, and the latest thainer model no longer needs to be loaded from the Internet.

Tag

  • #599 Add tltk (pos_tag and ner) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
  • #600 Add NER class - NER class for Named-entity recognizer tasks.

Translate

  • #589 Add pythainlp.translate.Translate Class
  • #588 Add Chinese-Thai Machine Translation

Tokenization

  • #562 Tokenize repeating dots and commas from numbers (fix #461)
  • #585 Fix tokenmaxlen bug that makes it always zero
  • #594 Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
  • https://github.com/PyThaiNLP/pythainlp/commit/314411086707b60ba8790724301224916f4670b8 Add SEFR CUT to pythainlp
  • #599 Add tltk (sentence_tokenize and word_tokenize) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.

Transliterate

  • #566 Refactor Royin Transliterate: Avoid embedded if blocks and simplified consonant replacing operations
  • #585 Manually merge update-royin branch with dev branch to add O-ANG rule
  • #599 Add tltk (g2p and ipa) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.

Word Vector

  • #573 Fix tokenmaxlen bug that makes it always zero
  • #583 Add pythainlp.word_vector.WordVector

Spell

  • #591 Add more spelling engines
  • #599 Add tltk (spell) - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.

Generate

  • #579 Add pythainlp.generate

Other

  • #599 Add tltk - add a tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.

- Python
Published by wannaphong almost 5 years ago

pythainlp - PyThaiNLP v2.3.1 Release!

PyThaiNLP v2.3.1 is a bug fix release of PyThaiNLP 2.3.

Bug fixes

  • Fix gensim #546

Documentation: https://pythainlp.github.io/docs/2.3/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

You can install or upgrade using pip install -U pythainlp

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

  • NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, pass the version parameter to the ThaiNameTagger class: pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4') (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

  • License change:
  • #449 Fix: remove instances with [ or ] from etcc.txt
  • #467 Add: corpus.common.provinces() can now return romanized names
  • #476 Add: thai_family_names() to get a set of Thai family names
  • #487 Fix: thailand_provinces_th.csv not found issue
  • #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix

POS Tagger

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer needs NLTK as a dependency.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support

Transliterate

  • #485 Fix: Romanize failing on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)

Text Summarize

  • #523 Add mT5 text summarize to pythainlp.summarize

Chunk parser

  • #524 Add pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
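The keyboard distance idea can be sketched as follows: give each key a (row, column) position and take the Euclidean distance between the two positions. KEY_POSITIONS below is a small illustrative subset on an idealized, unstaggered grid and keyboard_distance is a hypothetical name; neither is taken from the library.

```python
import math
from typing import Dict, Tuple

# Illustrative subset of a Thai layout on an idealized grid: (row, column).
# Real keyboards are staggered, so this is a simplification.
KEY_POSITIONS: Dict[str, Tuple[int, int]] = {
    "ฟ": (1, 0),
    "ห": (1, 1),
    "ก": (1, 2),
    "ด": (1, 3),
    "ผ": (2, 0),
    "ป": (2, 1),
}


def keyboard_distance(a: str, b: str) -> float:
    """Euclidean distance between two keys; raises KeyError if unknown."""
    (row_a, col_a), (row_b, col_b) = KEY_POSITIONS[a], KEY_POSITIONS[b]
    return math.hypot(row_a - row_b, col_a - col_b)
```

Such a distance can be used, for example, to rank spelling candidates by how likely a typo is given neighboring keys.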

Thanks all the contributors. (Image made with contributors-img)

We build Thai NLP.

PyThaiNLP

- Python
Published by wannaphong about 5 years ago

pythainlp - PyThaiNLP v2.3.1-dev0

PyThaiNLP v2.3.1-dev0 is the development release of PyThaiNLP 2.3.1 (for development only).

Bug fixes

  • Fix gensim #546

Documentation: https://pythainlp.github.io/dev-docs/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 2.3 change log #445

- Python
Published by wannaphong about 5 years ago

pythainlp - PyThaiNLP v2.3.0 Release!

PyThaiNLP v2.3.0 is the production release of PyThaiNLP 2.3.

Documentation: https://pythainlp.github.io/docs/2.3/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

You can install or upgrade using pip install -U pythainlp

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

  • NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, pass the version parameter to the ThaiNameTagger class: pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4') (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

  • License change:
  • #449 Fix: remove instances with [ or ] from etcc.txt
  • #467 Add: corpus.common.provinces() can now return romanized names
  • #476 Add: thai_family_names() to get a set of Thai family names
  • #487 Fix: thailand_provinces_th.csv not found issue
  • #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix

POS Tagger

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer needs NLTK as a dependency.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support

Transliterate

  • #485 Fix: Romanize failing on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)

Text Summarize

  • #523 Add mT5 text summarize to pythainlp.summarize

Chunk parser

  • #524 Add pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
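Removing a word from a trie means clearing its end-of-word flag and pruning any nodes that no longer lead to another word. The class below is a self-contained sketch of that technique; it is not PyThaiNLP's Trie implementation, and its method names are chosen for illustration.

```python
from typing import Dict, Iterable, List


class _Node:
    __slots__ = ("children", "end")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.end = False  # True if a word ends at this node


class Trie:
    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = _Node()
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, _Node())
        node.end = True

    def __contains__(self, word: str) -> bool:
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.end

    def remove(self, word: str) -> bool:
        """Remove a word; return False if it was not stored."""
        path: List[_Node] = [self.root]
        for ch in word:
            nxt = path[-1].children.get(ch)
            if nxt is None:
                return False
            path.append(nxt)
        if not path[-1].end:
            return False
        path[-1].end = False
        # Walk back up, deleting nodes that no longer serve any word.
        for i in range(len(word) - 1, -1, -1):
            child = path[i + 1]
            if child.end or child.children:
                break
            del path[i].children[word[i]]
        return True
```

Pruning keeps the trie from accumulating dead branches when a tokenization dictionary is revised repeatedly.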

Thanks all the contributors. (Image made with contributors-img)

We build Thai NLP.

PyThaiNLP

- Python
Published by wannaphong about 5 years ago

pythainlp - PyThaiNLP v2.3.0-beta1

PyThaiNLP v2.3.0-beta1 is the first beta release of PyThaiNLP 2.3.

Documentation: https://pythainlp.github.io/dev-docs/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

  • NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, pass the version parameter to the ThaiNameTagger class: pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4') (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

  • License change:
  • #449 Fix: remove instances with [ or ] from etcc.txt
  • #467 Add: corpus.common.provinces() can now return romanized names
  • #476 Add: thai_family_names() to get a set of Thai family names
  • #487 Fix: thailand_provinces_th.csv not found issue
  • #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix

POS Tagger

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer needs NLTK as a dependency.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support

Transliterate

  • #485 Fix: Romanize failing on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)

Text Summarize

  • #523 Add mT5 text summarize to pythainlp.summarize

Chunk parser

  • #524 Add pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development

Links

  • Website: https://pythainlp.github.io
  • Docs: https://pythainlp.github.io/dev-docs/
  • GitHub: https://github.com/PyThaiNLP/pythainlp
  • Issues: https://github.com/PyThaiNLP/pythainlp/issues

Thanks all the contributors. (Image made with contributors-img)

We build Thai NLP.

PyThaiNLP

- Python
Published by wannaphong about 5 years ago

pythainlp - PyThaiNLP v2.3.0-dev1

PyThaiNLP v2.3.0-dev1 is the development release of PyThaiNLP 2.3 (for development only).

Documentation: https://pythainlp.github.io/dev-docs/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

  • NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, pass the version parameter to the ThaiNameTagger class: pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4') (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

  • License change:
  • #449 Fix: remove instances with [ or ] from etcc.txt
  • #467 Add: corpus.common.provinces() can now return romanized names
  • #476 Add: thai_family_names() to get a set of Thai family names
  • #487 Fix: thailand_provinces_th.csv not found issue
  • #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix

POS Tagger

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer needs NLTK as a dependency.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support

Transliterate

  • #485 Fix: Romanize failing on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)

Text Summarize

  • #523 Add mT5 text summarize to pythainlp.summarize

Chunk parser

  • #524 Add pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development

- Python
Published by wannaphong about 5 years ago

pythainlp - v2.3.0-dev0

PyThaiNLP v2.3.0-dev0 is the first development release of PyThaiNLP 2.3 (for development only).

Documentation: https://pythainlp.github.io/dev-docs/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

  • NER: the default model changes from ThaiNER 1.4 to ThaiNER 1.5. If you still need the ThaiNER 1.4 model, select it with the version argument of the ThaiNameTagger class: pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4') (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

  • License change:
  • #449 Fix: remove instances with [ or ] from etcc.txt
  • #467 Add: corpus.common.provinces() can now return romanized names
  • #476 Add: thai_family_names() to get a set of Thai family names
  • #487 Fix: thailand_provinces_th.csv not found issue
  • #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix

POS Tagger

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support

Transliterate

  • #485 Fix: Romanize failed on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)

Text Summarize

  • #523 Add mT5 text summarization to pythainlp.summarize

Chunk parser

  • #524 Add pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
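
The idea behind a keyboard distance such as thai_keyboard_dist() (#513) can be sketched as follows. This is a simplified illustration, not the library's implementation: the ROWS table is a hand-typed approximation of the unshifted Thai Kedmanee layout (without the leftmost key of the number row), it ignores the shift layer and the physical stagger between rows, and the name keyboard_dist is hypothetical.

```python
import math

# Assumption: unshifted Kedmanee rows, typed by hand for this example.
ROWS = [
    "ๅ/-ภถุึคตจขช",
    "ๆไำพะัีรนยบลฃ",
    "ฟหกดเ้่าสวง",
    "ผปแอิืทมใฝ",
]

# Map every character to its (row, column) position on the grid.
POSITION = {
    ch: (r, c)
    for r, row in enumerate(ROWS)
    for c, ch in enumerate(row)
}


def keyboard_dist(a, b):
    """Euclidean distance between two keys on the simplified grid."""
    (r1, c1), (r2, c2) = POSITION[a], POSITION[b]
    return math.hypot(r1 - r2, c1 - c2)


print(keyboard_dist("ฟ", "ห"))  # neighbouring keys on the home row
print(keyboard_dist("ก", "ป"))  # diagonal neighbours
```

A distance like this is useful for weighting likely typos, since characters on adjacent keys are more easily confused than distant ones.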

- Python
Published by wannaphong about 5 years ago

pythainlp - PyThaiNLP 2.2.6

PyThaiNLP 2.2.6 Released! This release is a bug fix release.

  • Update pythainlp.tag docs #492
  • thai_strftime: Normalize output for unsupported directive #490
  • Port pickle to JSON and add LST20 POS tag model to pythainlp.corpus #488

Thanks to the following contributors to 2.2.6: @c4n

Thanks to other contributors listed here: https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md

You can install or upgrade using pip install -U pythainlp

  • GitHub Releases: https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.2.6
  • Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
  • Tutorials: https://thainlp.org/pythainlp/tutorials/
  • GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

- Python
Published by wannaphong over 5 years ago

pythainlp - PyThaiNLP 2.2.5

PyThaiNLP 2.2.5 Released! This release is a bug fix release.

  • Fix: file not found for pythainlp.corpus #486

https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.2.5

You can install or upgrade using pip install -U pythainlp

  • Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
  • Tutorials: https://thainlp.org/pythainlp/tutorials/
  • GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

- Python
Published by wannaphong over 5 years ago

pythainlp - PyThaiNLP 2.2.4

  • #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels

- Python
Published by bact over 5 years ago

pythainlp - PyThaiNLP 2.2.3

This release is a bug fix release.

  • Fix crfcut: last segment was not included if it was not predicted as end-of-sentence #459

Installation

  • You can install or upgrade using pip install -U pythainlp

More information

  • Change log: https://github.com/PyThaiNLP/pythainlp/issues/330
  • Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
  • Tutorials: https://thainlp.org/pythainlp/tutorials/
  • GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

- Python
Published by wannaphong almost 6 years ago

pythainlp - PyThaiNLP 2.2.2

This release is a bug fix release.

Installation

  • You can install or upgrade using pip install -U pythainlp

More information

  • Change log: https://github.com/PyThaiNLP/pythainlp/issues/330
  • Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
  • Tutorials: https://thainlp.org/pythainlp/tutorials/
  • GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

- Python
Published by bact almost 6 years ago

pythainlp - PyThaiNLP 2.2.1

This release is a bug fix release.

  • Fix %O modifier for thai_strftime() #441
  • Fix db.json #442

Installation

  • You can install or upgrade using pip install -U pythainlp

More information

  • Change log: https://github.com/PyThaiNLP/pythainlp/issues/330
  • Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
  • Tutorials: https://thainlp.org/pythainlp/tutorials/
  • GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

- Python
Published by wannaphong almost 6 years ago

pythainlp - PyThaiNLP 2.2.0

English

Hello World. Today, we're happy to announce the availability of PyThaiNLP 2.2. It has been four years since PyThaiNLP's first release. Thank you very much for supporting PyThaiNLP.

Summary – Release Highlights

New Features

Tokenizer

  • Fix longest engine, last character is now consumed
  • Add CRFCut sentence segmentation

Transliteration

  • Add Thai Grapheme-to-Phoneme (Thai G2P) deep learning sequence-to-sequence model

Normalization

  • Add more normalize functions, like remove zero-width characters, remove duplicate spaces, etc.
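
As a rough illustration of the kind of clean-up these normalize functions perform, the sketch below removes zero-width characters and collapses repeated spaces using only the standard library. The function names and the exact set of characters treated as zero-width are assumptions for this example and may differ from what pythainlp.util.normalize does.

```python
import re

# Assumption: characters treated as zero-width in this sketch.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"
_ZERO_WIDTH_RE = re.compile(f"[{ZERO_WIDTH}]")
_DUP_SPACES_RE = re.compile(r" {2,}")


def strip_zero_width(text):
    """Remove zero-width characters, which are invisible but break matching."""
    return _ZERO_WIDTH_RE.sub("", text)


def collapse_spaces(text):
    """Collapse runs of spaces into one and trim both ends."""
    return _DUP_SPACES_RE.sub(" ", text).strip()


def clean(text):
    """Apply both steps; zero-width removal first so hidden gaps collapse too."""
    return collapse_spaces(strip_zero_width(text))


print(clean("สวัสดี\u200b   ชาว  โลก "))
```

Running such steps before tokenization helps because dictionary lookups fail on words that contain invisible characters.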

Utilities

  • Add thaiword_to_date() and thaiword_to_time()
  • Fix countthai() to handle the case where the text has only numbers and symbols

Command line

  • Improved command and sub-command syntax; see the command line docs for details

Others

  • Code improvement: Move non-init code out of __init__.py files, etc.
  • Remove dependency: Unigram POS tagger no longer needs the NLTK module

Installation

You can install or upgrade using pip install -U pythainlp

Change log: https://github.com/PyThaiNLP/pythainlp/issues/330

Documentation: https://www.thainlp.org/pythainlp/docs/2.2/

Tutorials: https://thainlp.org/pythainlp/tutorials/

GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

Thai (ภาษาไทย)

Hello World. Today, 24 June 2020 (B.E. 2563), we released PyThaiNLP 2.2. PyThaiNLP is now four years old. Thank you for using PyThaiNLP :)

Summary – Highlights

New features

Tokenizer

  • Fix the longest word tokenizer
  • Add the CRFCut sentence segmenter

Transliteration

  • Add transcription of Thai to IPA with Thai Grapheme-to-Phoneme (Thai G2P)

Normalization

  • Extend the normalize function, for example to remove duplicate spaces

Utilities

  • Add thaiword_to_date() and thaiword_to_time()
  • Improve countthai()

Command line

  • Improve the command and sub-command syntax; see the command line docs for details

Others

  • Code improvement: move code out of __init__.py files, etc.
  • Fewer external dependencies: the Unigram POS tagger works without NLTK

Installation

You can install or upgrade with pip install -U pythainlp

Change log: https://github.com/PyThaiNLP/pythainlp/issues/330

Documentation: https://www.thainlp.org/pythainlp/docs/2.2/

Tutorials: https://thainlp.org/pythainlp/tutorials/

GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

- Python
Published by wannaphong almost 6 years ago

pythainlp - PyThaiNLP 2.2.0-beta1

This is the first beta version of PyThaiNLP 2.2.

Installation: pip install --pre pythainlp

PyThaiNLP 2.2 change log #330

Documentation : https://www.thainlp.org/pythainlp/docs/dev/

Report bug : https://github.com/PyThaiNLP/pythainlp/issues

We build Thai NLP.

PyThaiNLP Team

- Python
Published by wannaphong almost 6 years ago

pythainlp - PyThaiNLP 2.2.0-dev1

Development version, for developers only.

PyThaiNLP 2.2 change log #330

Documentation : https://www.thainlp.org/pythainlp/docs/dev/

- Python
Published by wannaphong about 6 years ago

pythainlp - PyThaiNLP 2.2.0-dev0

Development version, for developers only.

PyThaiNLP 2.2 change log #330

Documentation : https://www.thainlp.org/pythainlp/docs/dev/

- Python
Published by wannaphong about 6 years ago

pythainlp - PyThaiNLP 2.1.4

This release is a bug fix release.

  • Remove NumPy and pandas requirements from base install (#353)
  • Fix longest matching bug (fail when the entire input text is a full word) (#357)
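
For readers unfamiliar with longest matching, the sketch below shows the basic greedy algorithm and the edge case named in #357, where the whole input is a single dictionary word. It is an illustrative stand-alone function, not the code of PyThaiNLP's longest engine; the function name and the fallback of emitting unknown characters one at a time are assumptions.

```python
def longest_match_tokenize(text, words):
    """Greedy longest matching against a set of dictionary words."""
    max_len = max(map(len, words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in words:
                break
        else:
            # No dictionary word starts here: emit a single character.
            candidate = text[i]
        tokens.append(candidate)
        i += len(candidate)
    return tokens


words = {"ทดสอบ", "ระบบ", "ทด"}
print(longest_match_tokenize("ทดสอบ", words))      # the input is one full word
print(longest_match_tokenize("ทดสอบระบบ", words))
```

The full-word case is a useful regression test because the matching window then reaches exactly to the end of the text.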

- Python
Published by bact over 6 years ago

pythainlp - PyThaiNLP 2.1.3

This release is a bug fix release.

  • numtoword: number to Thai word (#350)

Installation

You can install or upgrade using pip install -U pythainlp

Change log: https://github.com/PyThaiNLP/pythainlp/issues/181

Documentation: https://www.thainlp.org/pythainlp/docs/2.1/

Tutorials: https://thainlp.org/pythainlp/tutorials/

GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

- Python
Published by wannaphong over 6 years ago

pythainlp - PyThaiNLP 2.1.2

This release is a bug fix release.

  • ThaiNER HTML-like output: fixed incorrect HTML-like output. (#346)

Installation

You can install or upgrade using pip install -U pythainlp

Change log: https://github.com/PyThaiNLP/pythainlp/issues/181

Documentation: https://www.thainlp.org/pythainlp/docs/2.1/

Tutorials: https://thainlp.org/pythainlp/tutorials/

GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

- Python
Published by wannaphong over 6 years ago

pythainlp - PyThaiNLP 2.1.1

This release is a bug fix release.

  • newmm word tokenizer: Add graph size limit in _onecut() to avoid long wait for ambiguous text (#333)

Installation

You can install or upgrade using pip install -U pythainlp

Change log: https://github.com/PyThaiNLP/pythainlp/issues/181

Documentation: https://www.thainlp.org/pythainlp/docs/2.1/

Tutorials: https://thainlp.org/pythainlp/tutorials/

GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

- Python
Published by wannaphong over 6 years ago

pythainlp - PyThaiNLP 2.1

English

Hello World. Today, we're happy to announce the availability of PyThaiNLP 2.1. Since the project moved to GitHub, we have recorded over 197,000 downloads -- thank you for using PyThaiNLP.

Summary – Release Highlights

New Features

Tokenizer

  • AttaCut, a fast and accurate tokenizer, is now available through engine="attacut" in pythainlp.tokenize.word_tokenize(). Read more about AttaCut implementation at https://arxiv.org/abs/1911.07056, as presented at New in ML Workshop, NeurIPS 2019.
  • ssg, a syllable segmenter, is now available through engine="ssg" in pythainlp.tokenize.subword_tokenize()
  • Tokenization benchmark

Corpus

  • Add Thai female, male names corpus
  • Add PYTHAINLP_DATA_DIR environment variable to set location of downloaded data
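
A minimal sketch of how an environment variable can override a data directory is shown below. The helper name get_data_dir and the fallback location (a pythainlp-data folder in the user's home directory) are assumptions for illustration; check the PyThaiNLP documentation for the actual default.

```python
import os


def get_data_dir(environ=os.environ):
    """Return the data directory, preferring PYTHAINLP_DATA_DIR if it is set."""
    path = environ.get("PYTHAINLP_DATA_DIR")
    if not path:
        # Assumed fallback: a "pythainlp-data" folder under the home directory.
        path = os.path.join(os.path.expanduser("~"), "pythainlp-data")
    return path


print(get_data_dir({"PYTHAINLP_DATA_DIR": "/srv/models/pythainlp"}))
print(get_data_dir({}))
```

In a shell, the variable would typically be exported before starting Python so that downloaded models land on a disk with enough space.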

Named-Entity Tagger

  • Add HTML-like tag in output

Localization

  • New function: pythainlp.util.thai_time, time spell out to Thai words

Other improvements

  • Removing and updating many dependencies
  • Remove marisa-trie from pythainlp
  • Updated tutorial notebooks and documentation
  • Better command-line interface

Installation

You can install or upgrade using pip install -U pythainlp

Change log: https://github.com/PyThaiNLP/pythainlp/issues/181

Documentation: https://www.thainlp.org/pythainlp/docs/2.1/

Tutorials: https://thainlp.org/pythainlp/tutorials/

GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

Thai (ภาษาไทย)

Hello World. Today, 10 December 2019 (B.E. 2562), we released PyThaiNLP 2.1. PyThaiNLP has now been downloaded more than 197,000 times. Thank you for using PyThaiNLP.

Summary – Highlights

New features

Tokenizer

  • Add AttaCut, a fast and accurate word tokenizer, available through engine="attacut" in pythainlp.tokenize.word_tokenize(). Read how AttaCut works, as presented at New in ML Workshop, NeurIPS 2019, at https://arxiv.org/abs/1911.07056
  • Add ssg, a CRF-based syllable segmenter, available through engine="ssg" in pythainlp.tokenize.subword_tokenize()
  • Word tokenization benchmark

Corpus

  • Add a corpus of female and male names
  • Add the PYTHAINLP_DATA_DIR environment variable for setting where model data is downloaded

Named-Entity Tagger

  • Add HTML-like tags around text that contains names

Localization

  • Add pythainlp.util.thai_time to spell out time in Thai words

Other improvements

  • Remove and update many libraries
  • Remove marisa-trie from pythainlp
  • Improve tutorial notebooks and documentation
  • Improve the command-line interface

Installation

You can install or upgrade with pip install -U pythainlp

Change log: https://github.com/PyThaiNLP/pythainlp/issues/181

Documentation: https://www.thainlp.org/pythainlp/docs/2.1/

Tutorials: https://thainlp.org/pythainlp/tutorials/

GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP

PyThaiNLP Team

- Python
Published by wannaphong over 6 years ago

pythainlp - PyThaiNLP 2.1.dev8

We invite everyone to help test PyThaiNLP 2.1dev8. PyThaiNLP 2.1dev is a release for developers to test before the final release. PyThaiNLP 2.1 will have the following new features.

New features

  • Add pythainlp.benchmarks for evaluating Thai word tokenization
  • Add pythainlp.util.thai_time to spell out time in Thai, e.g. 8:17 becomes แปดนาฬิกาสิบเจ็ดนาที (24-hour format) or แปดโมงสิบเจ็ดนาที (6-hour format)

Tokenization

  • Add ssg as a Thai syllable segmenter
  • Add the attacut word tokenizer, a deep learning tokenizer built to address the speed of Thai word tokenization
  • Add "newmm-safe" to handle text that is ambiguous or takes abnormally long to tokenize, e.g. "หน้าด้านหน้าด้านหน้าด้านหน้าด้านหน้าด้าน"
  • Improve the dictionary used for tokenization

Model updated

  • The new thai2rom version runs on PyTorch instead of TF and uses much less RAM
  • ThaiNER 1.3, the latest version (ThaiNER) HTML -> SGML, can now emit output as HTML tags, e.g. 'วันที่ 15 ก.ย. 61 ทดสอบระบบเวลา '

Refactoring

  • Remove marisa-trie from PyThaiNLP, so installing PyThaiNLP should no longer run into installation problems (@korakot wrote a Trie in Python)
  • Remove fastai from the dependencies used in pythainlp.ulmfit
  • Clean up code and add tests, with over 90% coverage on Coveralls
  • Add MD5 checksums for models downloaded through pythainlp
  • Support relocating pythainlp-data easily by setting the PYTHAINLP_DATA_DIR environment variable to the desired path

See the PyThaiNLP 2.1 changes at https://github.com/PyThaiNLP/pythainlp/issues/181

To try it, run:

pip install -U --pre pythainlp

Important links

  • API documentation: https://www.thainlp.org/pythainlp/docs/dev/
  • Tutorials: https://thainlp.org/pythainlp/tutorials/
  • Report bugs, ask questions about PyThaiNLP, and report issues: https://github.com/PyThaiNLP/pythainlp/issues

Thanks to the contributors to this version: https://github.com/PyThaiNLP/pythainlp/graphs/contributors

We build Thai NLP. PyThaiNLP

#ThaiNLP #NLP #PyThaiNLP

- Python
Published by wannaphong over 6 years ago

pythainlp - PyThaiNLP 2.1.dev7

- Python
Published by wannaphong over 6 years ago

pythainlp - PyThaiNLP 2.1.dev6

- Python
Published by wannaphong over 6 years ago

pythainlp - PyThaiNLP 2.1.dev5

  • Change from marisa-trie to a Trie implementation written in python

- Python
Published by wannaphong over 6 years ago

pythainlp - PyThaiNLP 2.1.dev4

- Python
Published by wannaphong over 6 years ago

pythainlp - PyThaiNLP 2.0.7

PyThaiNLP 2.0.7 release change log

  • Bug fix: Include case THANTHAKHAT and SARA U, UU too (pythainlp.util.normalize) https://github.com/PyThaiNLP/pythainlp/pull/244

  • Upgrade: pip install -U pythainlp
  • Docs: https://thainlp.org/pythainlp/docs/2.0/
  • User guide: https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb

- Python
Published by wannaphong almost 7 years ago

pythainlp - PyThaiNLP 2.1.dev2

- Python
Published by wannaphong almost 7 years ago

pythainlp - PyThaiNLP 2.0.6

  • fixed #230
  • new train ThaiNER

- Python
Published by wannaphong almost 7 years ago

pythainlp - PyThaiNLP 2.0.5

  • Clean word lists in pythainlp.corpus (remove duplicates, etc.)
  • Fix/add return type hinting for functions in pythainlp.corpus
  • Fix deprecated inline flag for regular expression in pythainlp.corpus.tnc (Thai National Corpus)
  • Bug fix: reorder condition checks in pythainlp.tokenize.dict_trie so it catches Trie before Iterable

- Python
Published by bact about 7 years ago

pythainlp - PyThaiNLP 2.0.4

  • word_tokenize()'s argument whitespaces is now keep_whitespace to make it more explicit; the default behavior is to keep whitespace
  • word_tokenize() can now take a custom dictionary through the custom_dict parameter
    • dict_word_tokenize() will be deprecated soon

- Python
Published by bact about 7 years ago

pythainlp - PyThaiNLP 2.0.3

  • Fix TTC (Thai Textbook Corpus) always downloading a new file
  • Words and their frequencies from TTC (Thai Textbook Corpus) now have a local copy at ttc_freq.txt inside pythainlp.corpus.
  • Other refactoring and code improvements, including ones related to subword tokenization (Thai Character Cluster / TCC and ETCC), see #193

- Python
Published by bact about 7 years ago

pythainlp - PyThaiNLP 2.0.2

  • Fixed tree map
  • Subword tokeniser documentation improvement https://github.com/PyThaiNLP/pythainlp/pull/190

- Python
Published by wannaphong about 7 years ago

pythainlp - PyThaiNLP 2.0.1

  • Add Tokenizer from pythainlp.tokenize.Tokenizer 79432c2
  • NER fixes, code cleaning, and type hinting #186

- Python
Published by wannaphong about 7 years ago

pythainlp - PyThaiNLP 2.0

PyThaiNLP 2.0

PyThaiNLP is a Python library for natural language processing (NLP) of Thai language.

PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.

📖 For details on upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see From PyThaiNLP 1.7 to PyThaiNLP 2.0

📖 For ThaiNER users upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see Upgrade ThaiNER from PyThaiNLP 1.7 to PyThaiNLP 2.0

📫 follow us on Facebook Pythainlp

What's new in version 2.0 ?

  • New NorvigSpellChecker spell checker class, which can be initialized with custom dictionary.
  • Terminate Python 2 support. Remove all Python 2 compatibility code.
  • Remove old, obsolete, deprecated, and experimental code.
  • Thai2fit (Upgrade ULMFiT-related codes to fastai 1.0)
  • ThaiNER 1.0
  • Remove sentiment analysis
  • Improved word_tokenize (newmm, mm) and dict_word_tokenize
  • Improved POS-tagging
  • More and improved examples
  • see PyThaiNLP 2.0 change log

Links

  • User guide : English , ภาษาไทย

  • Docs: https://thainlp.org/pythainlp/docs/2.0/

  • GitHub: https://github.com/PyThaiNLP/pythainlp

  • Issues: https://github.com/PyThaiNLP/pythainlp/issues

Thank you for choosing us.

PyThaiNLP team

- Python
Published by wannaphong about 7 years ago

pythainlp - PyThaiNLP 2.0 Beta

PyThaiNLP is a Python package for text processing and linguistic analysis, similar to nltk but with focus on Thai language.

PyThaiNLP 2.0 Beta is for beta testing PyThaiNLP 2.0.

What's new in PyThaiNLP 2.0 ?

  • Consolidate documentation files
  • Thai2fit (Upgrade ULMFiT-related codes to fastai 1.0)
  • Remove Python 2 compatibility code
  • Remove temporary files, experiment files, and obsoleted files
  • Remove sentiment analysis
  • More consistent indentations in source code
  • Improved word_tokenize (newmm, mm) and dict_word_tokenize
  • Improved POS-tagging
  • More and improved examples
  • Improved test coverage with more test cases

More details https://github.com/PyThaiNLP/pythainlp/issues/118

Install

pip install https://github.com/PyThaiNLP/pythainlp/archive/2.0b.zip

Docs : https://thainlp.org/pythainlp/docs/2.0/index.html

Website : https://pythainlp.github.io/

GitHub : https://github.com/PyThaiNLP/pythainlp

Issues : https://github.com/PyThaiNLP/pythainlp/issues

Thank you for choosing us.

PyThaiNLP team

- Python
Published by wannaphong about 7 years ago

pythainlp - PyThaiNLP 1.7.4

  • Fixed #176
  • removed conllu from requirements.txt #175

- Python
Published by wannaphong about 7 years ago

pythainlp - PyThaiNLP 1.7.3

  • fixed import thai_syllable.txt

- Python
Published by wannaphong over 7 years ago

pythainlp - PyThaiNLP 1.7.2

  • fix sent_tokenize also split text by vertical line #166

- Python
Published by wannaphong over 7 years ago

pythainlp - PyThaiNLP 1.7.1

  • Remove duplicated code, more meaningful exception messages, report unknown engine names (@bact)
  • Move test folder, fix Flake8 errors (@zkan)

and more

- Python
Published by wannaphong over 7 years ago

pythainlp - PyThaiNLP 1.7.0.1

  • remove import test in PyThaiNLP
  • update README.md

- Python
Published by wannaphong over 7 years ago

pythainlp - PyThaiNLP 1.7.0

PyThaiNLP is a Python library for natural language processing (NLP) of Thai language.

What's new in PyThaiNLP 1.7 ?

  • Deprecate Python 2 support
  • Refactor pythainlp.tokenize.pyicu for readability
  • Add Thai NER model to pythainlp.ner
  • thai2vec v0.2 - larger vocab, benchmarking results on Wongnai dataset
  • Sentiment classifier based on ULMFit and various product review datasets
  • Add ULMFit utility to PyThaiNLP
  • Add Thai romanization model thai2rom
  • Retrain POS-tagging model
  • Improve word_tokenize (newmm, mm) and dict_word_tokenize
  • Documentation added

Install

pip install https://github.com/PyThaiNLP/pythainlp/archive/1.7.0.zip

Docs : https://thainlp.org/pythainlp/docs/1.7/

GitHub : https://github.com/PyThaiNLP/pythainlp

Issues : https://github.com/PyThaiNLP/pythainlp/issues

Thank you for choosing us.

PyThaiNLP team

- Python
Published by wannaphong over 7 years ago

pythainlp - PyThaiNLP 1.7 Beta 1

PyThaiNLP is a Python library for natural language processing (NLP) of Thai language.

PyThaiNLP 1.7 Beta 1 is for beta testing PyThaiNLP 1.7.

What's new in PyThaiNLP 1.7 ?

  • Deprecate Python 2 support
  • Refactor pythainlp.tokenize.pyicu for readability
  • Add Thai NER model to pythainlp.ner
  • thai2vec v0.2 - larger vocab, benchmarking results on Wongnai dataset
  • Sentiment classifier based on ULMFit and various product review datasets
  • Add ULMFit utility to PyThaiNLP
  • Add Thai romanization model thai2rom
  • Retrain POS-tagging model
  • Improve word_tokenize (newmm, mm) and dict_word_tokenize
  • Documentation added

Install

pip install https://github.com/PyThaiNLP/pythainlp/archive/1.7b1.zip

Docs : https://thainlp.org/pythainlp/docs/1.7/ (in progress)

Website : https://thainlp.org/pythainlp/ (in progress)

GitHub : https://github.com/PyThaiNLP/pythainlp

Issues : https://github.com/PyThaiNLP/pythainlp/issues

Thank you for choosing us.

PyThaiNLP team

- Python
Published by wannaphong over 7 years ago

pythainlp - PyThaiNLP 1.7 Alpha 2

PyThaiNLP 1.7 Alpha 2 is a test version for developers. It is not recommended for production use.

What's new in PyThaiNLP 1.7

Key points

  • Add pythainlp.ner, NER for PyThaiNLP
  • Officially drop support for Python 2.7
  • Add ULMFit utility to PyThaiNLP
  • Improve the word tokenizers, both newmm and mm
  • thai2vec v0.2
  • New sentiment analysis powered by deep learning
  • Add thai2rom, character-level Thai romanization built with deep learning
  • Retrain the POS tagger, extending the previous model

Installation

Run pip install https://github.com/PyThaiNLP/pythainlp/archive/1.7a2.zip

Report bugs or make suggestions at https://github.com/PyThaiNLP/pythainlp/issues

- Python
Published by wannaphong over 7 years ago

pythainlp - PyThaiNLP 1.7 Alpha 1

PyThaiNLP 1.7 Alpha 1 is a test version for developers. It is not recommended for production use.

What's new in PyThaiNLP 1.7

Key points

  • Officially drop support for Python 2.7
  • Add ULMFit utility to PyThaiNLP
  • Improve the word tokenizers, both newmm and mm
  • thai2vec v0.2
  • New sentiment analysis powered by deep learning
  • Add thai2rom, character-level Thai romanization built with deep learning
  • Retrain the POS tagger, extending the previous model

Installation

Run pip install https://github.com/PyThaiNLP/pythainlp/archive/1.7a1.zip

Report bugs or make suggestions at https://github.com/PyThaiNLP/pythainlp/issues

- Python
Published by wannaphong almost 8 years ago

pythainlp - PyThaiNLP 1.6.0.7

  • edit dropbox url for thai2vec

- Python
Published by wannaphong almost 8 years ago

pythainlp - PyThaiNLP 1.6.0.6

  • fixed #93

- Python
Published by wannaphong almost 8 years ago

pythainlp - PyThaiNLP 1.6.0.5

  • fix tcc rule https://github.com/PyThaiNLP/pythainlp/commit/729d32277e2cecd52fa237dcd97f9629d009d8a0

- Python
Published by wannaphong about 8 years ago

pythainlp - PyThaiNLP 1.6.0.4

  • fix url thai2vec

- Python
Published by wannaphong about 8 years ago

pythainlp - PyThaiNLP 1.6

What's new in PyThaiNLP 1.6

  • The newmm word tokenizer has been rewritten by @korakot using the Maximum Matching algorithm and TCC to fix errors when tokenizing words that are not in the dictionary, and it is now faster
  • Add cutkum (https://github.com/pucktada/cutkum) as one of the word tokenization engines
  • Add syllable_tokenize, a dictionary-based Thai syllable tokenizer
  • Add dict_word_tokenize for tokenizing with a dictionary of your choice
  • pythainlp.romanization using royin has been rewritten
  • pythainlp.sentiment has been retrained with the newmm tokenizer, giving more accurate results than before
  • Add pythainlp.word_vector.thai2vec, which makes https://github.com/cstorm125/thai2vec by @cstorm125 available
  • Add file storage in pythainlp-data for keeping PyThaiNLP data
  • Easier installation: code was written to replace pyicu, so pyicu no longer needs to be installed

Documentation: https://github.com/PyThaiNLP/pythainlp/blob/pythainlp1.6/docs/pythainlp-1-6-thai.md

Install with pip install -U pythainlp

- Python
Published by wannaphong over 8 years ago

pythainlp - PyThaiNLP 1.6 Beta 1

PyThaiNLP 1.6 Beta 1 is a test release for developers and the general public. The API is now stable.

What's new in PyThaiNLP 1.6

  • The newmm word tokenizer has been rewritten by @korakot using the Maximum Matching algorithm and TCC to fix errors when tokenizing words that are not in the dictionary, and it is now faster
  • Add cutkum (https://github.com/pucktada/cutkum) as one of the word tokenization engines
  • Add syllable_tokenize, a dictionary-based Thai syllable tokenizer
  • Add dict_word_tokenize for tokenizing with a dictionary of your choice
  • pythainlp.romanization using royin has been rewritten
  • pythainlp.sentiment has been retrained with the newmm tokenizer, giving more accurate results than before
  • Add pythainlp.word_vector.thai2vec, which makes https://github.com/cstorm125/thai2vec by @cstorm125 available
  • Add file storage in pythainlp-data for keeping PyThaiNLP data
  • Easier installation: code was written to replace pyicu, so pyicu no longer needs to be installed

Documentation: https://github.com/PyThaiNLP/pythainlp/blob/dev/docs/pythainlp-1-6-thai.md (being updated)

To try it, first remove the previous PyThaiNLP version with pip uninstall pythainlp

Then install with pip install https://github.com/PyThaiNLP/pythainlp/archive/1.6-beta-1.zip

If you find a bug, report it at https://www.facebook.com/pythainlp/ or https://github.com/PyThaiNLP/pythainlp/issues

Thank you for using PyThaiNLP :)

PyThaiNLP development team

- Python
Published by wannaphong over 8 years ago

pythainlp - PyThaiNLP 1.6 Alpha 2

What's new?

  • Faster word tokenization with newmm, with the tokenization code rewritten by @korakot, and improved Thai word tokenization performance https://github.com/PyThaiNLP/pythainlp/issues/65
  • Add pythainlp.word_vector.thai2vec, bringing thai2vec by @cstorm125 into PyThaiNLP

Before trying it, remove the old PyThaiNLP version with pip uninstall pythainlp

Install with pip install https://github.com/PyThaiNLP/pythainlp/archive/1.6a2.zip

- Python
Published by wannaphong over 8 years ago

pythainlp - PyThaiNLP 1.6 Alpha 1

PyThaiNLP 1.6 alpha 1 (for developers only)

What's new

  • Faster word tokenization by pre-building the Trie model
  • Add a Thai syllable tokenizer
  • Add an API that lets users of the module tokenize with their own dictionary
  • Change the default word tokenizer from icu to newmm
  • Fix tokenization errors by using TCC (Thai Character Clusters) to assist tokenization

Try it with:

pip install https://github.com/PyThaiNLP/pythainlp/archive/1.6a1.zip

- Python
Published by wannaphong over 8 years ago

pythainlp - PyThaiNLP 1.5.2

  • fix stopwords

- Python
Published by wannaphong over 8 years ago