Recent Releases of pythainlp
pythainlp - PyThaiNLP v5.1.2 Released!
PyThaiNLP v5.1.2 is a bug fix release of PyThaiNLP v5.1.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.1
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.1 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/900.
What's Changed
- Update romanize docs and keep space #1110
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.1.1...v5.1.2
Contributors
Thanks to all the contributors.
We build Thai NLP.
PyThaiNLP
- Python
Published by wannaphong about 1 year ago
pythainlp - PyThaiNLP v5.1.1 Released!
PyThaiNLP v5.1.1 is a bug fix release of PyThaiNLP v5.1.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.1
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.1 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/900.
What's Changed
- Refactor thai_consonants_all to use a set in syllable.py #1087 by @allrob23
- ThaiTransliterator: Select 1D CPU int64 tensor device #1089 by @jkingd0n
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.1.0...v5.1.1
Published by wannaphong about 1 year ago
pythainlp - PyThaiNLP v5.1.0 Released!
We have released PyThaiNLP v5.1.0! This version adds new features, such as Thai Discourse Treebank (TDTB) part-of-speech tagging and conversion from Thai solar dates to Thai lunar dates, and fixes several bugs.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.1
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.1 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/900
What is new?
New features
- Add Thai Discourse Treebank postag #910
- Add Thai Universal Dependency Treebank postag #916
- Add Thai G2P v2 Grapheme-to-Phoneme model #923
- Add support for list of strings as input to sent_tokenize() #927
- Add pythainlp.tools.safe_print to handle UnicodeEncodeError on console #969
- Add conversion from Thai solar dates to Thai lunar dates #998
- Add Thai pangram text #1045
- Add pythainlp.llm #1043
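The `safe_print` entry (#969) targets consoles whose encoding cannot represent Thai characters, where a plain `print()` raises `UnicodeEncodeError`. A minimal sketch of that idea, assuming a fallback that substitutes `?` for unencodable characters; `safe_print_sketch` is a hypothetical name, and the real `pythainlp.tools.safe_print` may handle the error differently:

```python
import io
import sys
from typing import Optional, TextIO


def safe_print_sketch(text: str, stream: Optional[TextIO] = None) -> None:
    """Print text; if the stream's encoding cannot represent it, print a
    copy with unencodable characters replaced by "?" instead of raising.

    Hypothetical helper for illustration, not pythainlp.tools.safe_print.
    """
    if stream is None:
        stream = sys.stdout
    try:
        print(text, file=stream)
    except UnicodeEncodeError:
        # Fall back to the stream's own encoding (ASCII if it is unknown).
        encoding = getattr(stream, "encoding", None) or "ascii"
        fallback = text.encode(encoding, errors="replace").decode(encoding)
        print(fallback, file=stream)


# Simulate a console that only accepts ASCII.
buffer = io.BytesIO()
ascii_console = io.TextIOWrapper(buffer, encoding="ascii", newline="\n")
safe_print_sketch("สวัสดี", ascii_console)  # plain print() would raise here
ascii_console.flush()
print(buffer.getvalue())  # b'??????\n'
```

Replacing characters keeps the program running at the cost of losing the Thai text on that console; writing to a UTF-8 stream avoids the fallback entirely.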
Bug fixes
- Fix collate() to consider tonemark in ordering #926
- Fix maiyamok() expanding the wrong word #962
- Fix nlpo3.load_dict() never printing an error message on failure #979
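Mai yamok (ๆ) marks repetition of the word before it, so "expanding the wrong word" means repeating something other than that preceding word. A minimal sketch of the expansion over an already-tokenized list, assuming whitespace tokens are skipped when looking back; `expand_maiyamok_sketch` is hypothetical and is not the `pythainlp.util.expand_maiyamok` implementation:

```python
from typing import List, Optional

MAI_YAMOK = "\u0e46"  # ๆ THAI CHARACTER MAIYAMOK


def expand_maiyamok_sketch(tokens: List[str]) -> List[str]:
    """Replace each mai yamok token with the nearest preceding word.

    Assumes the input is already word-tokenized. Whitespace tokens are kept
    but skipped when looking back for the word to repeat; a mai yamok with
    no preceding word is left unchanged.
    """
    result: List[str] = []
    last_word: Optional[str] = None
    for token in tokens:
        if token == MAI_YAMOK and last_word is not None:
            result.append(last_word)
        else:
            result.append(token)
            if token != MAI_YAMOK and not token.isspace():
                last_word = token
    return result


print(expand_maiyamok_sketch(["เด็ก", "เล็ก", "ๆ"]))  # ['เด็ก', 'เล็ก', 'เล็ก']
```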
Remove
- Remove clause_tokenize #1024
Deprecation and other API changes
- 5.1
  - pythainlp.util.is_native_thai: use pythainlp.morpheme.is_native_thai instead
- 5.2
  - pythainlp.cls: use pythainlp.classify instead
  - pythainlp.corpus.thai_synonym: use pythainlp.corpus.thai_synonyms instead
  - pythainlp.util.maiyamok: use pythainlp.util.expand_maiyamok instead
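For code that still uses the deprecated names listed above, a lookup table can drive a migration script or a lint check. A minimal sketch, assuming the list above is the complete set of renames for 5.1 and 5.2; `DEPRECATED_NAMES` and `suggest_replacement` are hypothetical helpers, not part of PyThaiNLP:

```python
import warnings
from typing import Dict

# Hypothetical lookup built from the deprecation list above (5.1 and 5.2).
DEPRECATED_NAMES: Dict[str, str] = {
    "pythainlp.util.is_native_thai": "pythainlp.morpheme.is_native_thai",
    "pythainlp.cls": "pythainlp.classify",
    "pythainlp.corpus.thai_synonym": "pythainlp.corpus.thai_synonyms",
    "pythainlp.util.maiyamok": "pythainlp.util.expand_maiyamok",
}


def suggest_replacement(name: str) -> str:
    """Return the replacement for a deprecated dotted name, warning if found;
    names that are not deprecated are returned unchanged."""
    new_name = DEPRECATED_NAMES.get(name)
    if new_name is None:
        return name
    warnings.warn(
        f"{name} is deprecated; use {new_name} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name


print(suggest_replacement("pythainlp.corpus.thai_synonym"))
# pythainlp.corpus.thai_synonyms
```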
Improve
- Add more Thai political party names to the Thai dictionary https://github.com/PyThaiNLP/pythainlp/commit/2252dee57bd7be9503242fa734bf0abc48c5ddf1
- Fix inconsistency in newmm-safe engine by copilot #1063
- Update warn_deprecation to get deprecated and removal versions #1028
- Remove unnecessary enumerate in expand_maiyamok #1029
- Add SPDX FileType #1032
- Fix bug in Longest Matching tokenizer to preprocess spaces consistently #1062
- Add codemeta.json file to root directory #1053
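The Longest Matching fix (#1062) concerns how spaces are preprocessed before dictionary matching. A minimal sketch of greedy longest matching that treats whitespace the same way everywhere: runs of whitespace become their own tokens and dictionary matches never cross them. This illustrates the technique under those assumptions; it is not the code of PyThaiNLP's Longest Matching tokenizer, and `longest_matching_sketch` is a hypothetical name:

```python
from typing import List, Set


def longest_matching_sketch(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy longest-matching word segmentation.

    Whitespace runs become their own tokens and dictionary matches never
    cross them, so spaces are treated the same wherever they appear.
    Characters not covered by any dictionary word become 1-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Find the end of the current whitespace or non-whitespace run.
        j = i
        is_space = text[i].isspace()
        while j < len(text) and text[j].isspace() == is_space:
            j += 1
        if is_space:
            tokens.append(text[i:j])
            i = j
            continue
        # Try the longest candidate inside this run first, then shrink.
        match = text[i]
        for end in range(min(j, i + max_len), i, -1):
            if text[i:end] in dictionary:
                match = text[i:end]
                break
        tokens.append(match)
        i += len(match)
    return tokens


words = {"ตา", "ตาก", "ลม", "ไป", "กิน", "ข้าว"}
print(longest_matching_sketch("ตากลม ไปกินข้าว", words))
# ['ตาก', 'ลม', ' ', 'ไป', 'กิน', 'ข้าว']
```

Because it always commits to the longest dictionary word at each position, greedy matching would still split ตากลม as ตาก|ลม even if กลม were in the dictionary and ตา|กลม were the intended reading; maximal-matching engines such as newmm exist to handle that kind of ambiguity.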
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.0...v5.1.0
Published by wannaphong over 1 year ago
pythainlp - PyThaiNLP v5.1.0-beta2
Schedule
- First Beta release: 27 December 2024
- Production release: WIP
PyThaiNLP 5.1 Change Log #900
Docs: https://pythainlp.org/dev-docs/
What's Changed
- Add pythainlp.llm by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1043
- Add How to cut a new release doc by @bact in https://github.com/PyThaiNLP/pythainlp/pull/1051
- Update pandas requirement from ==1.4.* to ==2.2.* by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1041
- Bump sentence-transformers from 2.2.2 to 2.7.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1038
- Bump pyicu from 2.8 to 2.14 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1052
- Add pythainlp.lm.calculate_ngram_counts by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1054
- Fixed #1055 bug: Tone detector + syllable sound bug by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1056
- Fix inconsistency in newmm-safe engine by copilot by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1063
- Fix bug in Longest Matching tokenizer to preprocess spaces consistently by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1062
- Reduce reloading of the word tokenizer engine in word_tokenize by @new5558 in https://github.com/PyThaiNLP/pythainlp/pull/1064
- Add display cell tokenizer by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1058
- Add longest common subsequence algorithm by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1059
- Bump transformers from 4.47.1 to 4.48.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1068
- Bump protobuf from 5.29.2 to 5.29.3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1067
- Fix custom dict error for unsupported tokenization engines by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1066
- Add pythainlp.util.spelling by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1060
- Add misspell command to CLI by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1057
- Add codemeta.json file to root directory by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1069
- Bump epitran from 1.25.1 to 1.26.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1072
- Bump transformers from 4.48.0 to 4.48.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1071
- Bump transformers from 4.48.1 to 4.48.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1074
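#1058 adds a display cell tokenizer. A display cell is taken here to mean a base character plus the non-spacing marks stacked on it (above/below vowels, tone marks), which is what fixed-width layout needs to count. A minimal sketch under that assumption using only `unicodedata`; `display_cells_sketch` is hypothetical and not PyThaiNLP's API:

```python
import unicodedata
from typing import List


def display_cells_sketch(text: str) -> List[str]:
    """Group text into display cells: each base character together with the
    non-spacing marks (Unicode category "Mn") that follow it, such as Thai
    above/below vowels and tone marks."""
    cells: List[str] = []
    for ch in text:
        if cells and unicodedata.category(ch) == "Mn":
            cells[-1] += ch  # stack the mark onto the previous cell
        else:
            cells.append(ch)
    return cells


print(display_cells_sketch("สวัสดี"))  # ['ส', 'วั', 'ส', 'ดี']
```

This simplification leaves SARA AM (ำ, category Lo) as its own cell even though part of it renders above the preceding consonant.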
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.1.0-beta1...v5.1.0-beta2
Published by wannaphong over 1 year ago
pythainlp - PyThaiNLP v5.1.0-beta1
Schedule
- First Beta release: 27 December 2024
- Production release: WIP
PyThaiNLP 5.1 Change Log #900
What's Changed
- Add Thai Universal Dependency Treebank postag by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/916
- Add Thai Discourse Treebank postag by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/910
- Update tone_detector() API description by @bact in https://github.com/PyThaiNLP/pythainlp/pull/919
- Add save and load for pythainlp.classify.param_free.GzipModel by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/908
- Add Thai G2P v2 Grapheme-to-Phoneme model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/923
- Bump transformers from 4.36.0 to 4.38.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/907
- Add preprocess function to split whitespace before romanize by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/924
- Fix collate() to consider tonemark in ordering by @WTFPUn in https://github.com/PyThaiNLP/pythainlp/pull/926
- test: Add more cases to cover all possible Marttra by @HRNPH in https://github.com/PyThaiNLP/pythainlp/pull/929
- Bump github/codeql-action from 2 to 3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/939
- Bump actions/setup-python from 4 to 5 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/940
- Bump peaceiris/actions-gh-pages from 3 to 4 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/937
- Bump conda-incubator/setup-miniconda from 2 to 3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/936
- Bump actions/stale from 6 to 9 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/938
- Add support for list of strings as input to sent_tokenize() by @ayaan-qadri in https://github.com/PyThaiNLP/pythainlp/pull/927
- Bump python-crfsuite from 0.9.9 to 0.9.11 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/943
- Tidy up workflow files by @bact in https://github.com/PyThaiNLP/pythainlp/pull/946
- Upgrade Python in CI to 3.10 by @bact in https://github.com/PyThaiNLP/pythainlp/pull/947
- Fix nltk.downloader warning by @bact in https://github.com/PyThaiNLP/pythainlp/pull/949
- Remove unused pytest by @bact in https://github.com/PyThaiNLP/pythainlp/pull/950
- Unify unit test workflow across OSes by @bact in https://github.com/PyThaiNLP/pythainlp/pull/951
- Specify a limited test suite by @bact in https://github.com/PyThaiNLP/pythainlp/pull/952
- Use common warn_deprecation by @bact in https://github.com/PyThaiNLP/pythainlp/pull/956
- Move sent_tokenize with default crfcut to testx by @bact in https://github.com/PyThaiNLP/pythainlp/pull/958
- Merge new sent_tokenize test to fix-954 by @bact in https://github.com/PyThaiNLP/pythainlp/pull/959
- Move more sent_tokenize test by @bact in https://github.com/PyThaiNLP/pythainlp/pull/960
- Move more sent_tokenize test by @bact in https://github.com/PyThaiNLP/pythainlp/pull/961
- Fix sent_tokenize(engine="whitespace") return value to be a list of strings by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/957
- Fix maiyamok() expanding the wrong word by @bact in https://github.com/PyThaiNLP/pythainlp/pull/962
- Add version to deprecation warnings by @bact in https://github.com/PyThaiNLP/pythainlp/pull/963
- Remove tests with Sonarcloud issue by @bact in https://github.com/PyThaiNLP/pythainlp/pull/964
- Add test_tools to test suite by @bact in https://github.com/PyThaiNLP/pythainlp/pull/965
- Add pythainlp.tools.safe_print to handle UnicodeEncodeError on console by @bact in https://github.com/PyThaiNLP/pythainlp/pull/969
- Make CLI able to handle Unicode characters output on Windows console by @bact in https://github.com/PyThaiNLP/pythainlp/pull/968
- Split testtag and testxtag by @bact in https://github.com/PyThaiNLP/pythainlp/pull/970
- Add testtag to __init__ by @bact in https://github.com/PyThaiNLP/pythainlp/pull/971
- Add testcorpus to __init__ by @bact in https://github.com/PyThaiNLP/pythainlp/pull/972
- Add test coverage by @bact in https://github.com/PyThaiNLP/pythainlp/pull/974
- Add test_khavee to test suite by @bact in https://github.com/PyThaiNLP/pythainlp/pull/967
- Create CHANGELOG.md by @bact in https://github.com/PyThaiNLP/pythainlp/pull/975
- Add Compact Tests (testc) by @bact in https://github.com/PyThaiNLP/pythainlp/pull/976
- Add testc_tools (misspell) by @bact in https://github.com/PyThaiNLP/pythainlp/pull/977
- Fix warnings and types by @bact in https://github.com/PyThaiNLP/pythainlp/pull/978
- Fix nlpo3.load_dict() never printing an error message on failure by @bact in https://github.com/PyThaiNLP/pythainlp/pull/979
- Add tests.compact.transliterate (PyICU test) by @bact in https://github.com/PyThaiNLP/pythainlp/pull/980
- Add documentation about compact install option by @bact in https://github.com/PyThaiNLP/pythainlp/pull/981
- Bump symspellpy from 6.7.7 to 6.7.8 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/985
- Bump sentencepiece from 0.1.99 to 0.2.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/982
- Bump tensorflow from 2.13.1 to 2.18.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/988
- Bump bpemb from 0.3.4 to 0.3.6 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/989
- Add nlpo3 to compact install/test by @bact in https://github.com/PyThaiNLP/pythainlp/pull/987
- Bump h5py from 3.1.0 to 3.12.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/991
- Use "build" instead of setup.py + add "[cd build]" build trigger word by @bact in https://github.com/PyThaiNLP/pythainlp/pull/994
- Add conversion from Thai solar dates to Thai lunar dates by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/998
- Update requests requirement from ==2.31.* to ==2.32.* by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1003
- Bump gensim from 4.3.2 to 4.3.3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1009
- Update numpy requirement from ==1.22.* to ==1.26.* by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1007
- Bump epitran from 1.9 to 1.25.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1006
- Bump astral-sh/ruff-action from 1 to 2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1010
- Bump spacy-thai from 0.7.1 to 0.7.8 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1014
- Bump fairseq from 0.10.2 to 0.12.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1013
- Bump transformers from 4.38.0 to 4.47.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1020
- Bump panphon from 0.20.0 to 0.21.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1022
- Remove clause_tokenize by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1024
- Update warn_deprecation to get deprecated and removal versions by @bact in https://github.com/PyThaiNLP/pythainlp/pull/1028
- Remove unnecessary enumerate in expand_maiyamok by @bact in https://github.com/PyThaiNLP/pythainlp/pull/1029
- Add SPDX FileType by @bact in https://github.com/PyThaiNLP/pythainlp/pull/1032
- Bump spylls from 0.1.5 to 0.1.7 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1035
- Bump emoji from 0.5.4 to 0.6.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1036
- Bump wtpsplit from 1.0.1 to 1.3.0 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1037
- Simplify calculatefyearfdev() by @bact in https://github.com/PyThaiNLP/pythainlp/pull/1031
- Bump sacremoses from 0.0.41 to 0.1.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1034
- Bump protobuf from 3.20.3 to 5.29.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1033
- Bump protobuf from 5.29.1 to 5.29.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1042
- Bump ufal-chu-liu-edmonds from 1.0.2 to 1.0.3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1040
- Bump transformers from 4.47.0 to 4.47.1 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1039
- Bump astral-sh/ruff-action from 2 to 3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/1044
- Add Thai pangram text by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1045
- Fixed #1004 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1046
- PyThaiNLP v5.1.0-beta1 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/1047
New Contributors
- @WTFPUn made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/926
- @ayaan-qadri made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/927
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.5...v5.1.0-beta1
Published by wannaphong over 1 year ago
pythainlp - PyThaiNLP v5.0.5 Released!
PyThaiNLP v5.0.5 is a bug fix release of PyThaiNLP v5.0.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.
What's Changed
- Add clause_tokenize warnings #1026
- Fix maiyamok() (merge back from #962)
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.4...v5.0.5
Published by wannaphong over 1 year ago
pythainlp - PyThaiNLP v5.0.4 Released!
PyThaiNLP v5.0.4 is a bug fix release of PyThaiNLP v5.0.3.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.
What's Changed
- Fixed #914 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/917
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.3...v5.0.4
Published by wannaphong about 2 years ago
pythainlp - PyThaiNLP v5.0.3 Released!
PyThaiNLP v5.0.3 is a bug fix release of PyThaiNLP v5.0.2.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.
What's Changed
- Create .editorconfig by @bact in https://github.com/PyThaiNLP/pythainlp/pull/909
- Fix empty string ('') being added in some cases when using word_tokenize with join_broken_num=True by @S2P2 in https://github.com/PyThaiNLP/pythainlp/pull/912
New Contributors
- @S2P2 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/912
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.2...v5.0.3
Published by wannaphong about 2 years ago
pythainlp - PyThaiNLP v5.0.2 Released!
PyThaiNLP v5.0.2 is a bug fix release of PyThaiNLP v5.0.1.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.
What's Changed
- Update README and license header by @bact in https://github.com/PyThaiNLP/pythainlp/pull/902
- Updated crfcut.py by @varunkatiyar819 in https://github.com/PyThaiNLP/pythainlp/pull/905
New Contributors
- @varunkatiyar819 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/905
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.1...v5.0.2
Published by wannaphong about 2 years ago
pythainlp - PyThaiNLP v5.0.1 Released!
PyThaiNLP v5.0.1 is a bug fix release of PyThaiNLP v5.0.0.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.
What's Changed
- Fixed bug: ImportError pycrfsuite #901
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.0...v5.0.1
Published by wannaphong over 2 years ago
pythainlp - PyThaiNLP v5.0.0 Released!
We are excited to announce the latest release of PyThaiNLP: version 5.0! PyThaiNLP is a Python library for Thai natural language processing (NLP).
With PyThaiNLP 5.0, you can expect improved performance and accuracy for NLP tasks in Thai. We have also added new functions to make your NLP tasks even easier and more efficient.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: https://github.com/PyThaiNLP/pythainlp/issues/788.
What is new?
License information
- Use SPDX license identifier at the header of source code #876
Deprecation and other API changes
- Change default NER to thainer-v2 https://github.com/PyThaiNLP/pythainlp/commit/5e97e7c4ebcf68bca64e4f942c8dfe3a5ab2ebc5
- Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai https://github.com/PyThaiNLP/pythainlp/commit/524759ac1926fb9837bb9464f0a40cd984af2608
Dependency
- Add tzdata as a dependency on Windows by @BLKSerene in #841
New API
- Add pythainlp.coref for Thai coreference resolution #802
- Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize() function #806
- Add word approximation to pythainlp.soundex.sound #809 by @wannaphong
- Add pythainlp.wsd for Thai word sense disambiguation #818 by @wannaphong
- Add pythainlp.chat and WangChanGLM to pythainlp.generate #819 by @wannaphong
- Add pythainlp.cls, a param-free classification model #821 by @c4n
- Add pythainlp.el entity linking #822 by @wannaphong
- Add pythainlp.ancient by @wannaphong in #833
- Add pythainlp.util.rhyme by @wannaphong in #849
- Add remove_trailing_repeat_consonants by @konbraphat51 in #862
- Add pythainlp.util.to_idn by @wannaphong in #875
- Add pythainlp.corpus.find_synonyms by @wannaphong in #890
- Add pythainlp.util.morse by @wannaphong in #891
- Add pythainlp.morpheme by @wannaphong in #896
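#821 describes `pythainlp.cls` as a param-free classification model, and the later `pythainlp.classify.param_free.GzipModel` name points to compression-based classification: texts are compared by normalized compression distance (NCD) and labelled by their nearest neighbours. A minimal sketch of that technique with the standard `gzip` module, assuming this is the approach the model follows; the function names and the toy training data are illustrative only:

```python
import gzip
from collections import Counter
from typing import Sequence, Tuple


def compressed_size(text: str) -> int:
    """Number of bytes after gzip-compressing the UTF-8 encoded text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: smaller means more similar."""
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_classify(text: str, train: Sequence[Tuple[str, str]], k: int = 1) -> str:
    """Majority label among the k training texts closest to `text` by NCD."""
    nearest = sorted(train, key=lambda pair: ncd(text, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train_data = [
    ("สวัสดีครับ ยินดีที่ได้รู้จัก", "greeting"),
    ("ราคาหุ้นวันนี้ปรับตัวลดลง", "finance"),
]
print(knn_classify("สวัสดีครับ ยินดีที่ได้รู้จัก", train_data))  # greeting
```

No parameters are learned: "training" is just keeping labelled examples, which is why saving and loading such a model amounts to storing that data.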
Improve
- Update code comments and clean up code by @BLKSerene in #845
- Improve the documentation by fixing typos and adding necessary details, code explanations, and missing details about models and examples by @Saharshjain78 in #850
- Fix tests of khavee functions by @BLKSerene in #854
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
- Add license info to /tests and README_TH.md by @bact in #886
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- ruff: docstring-code-format = true by @bact in #892
Tokenizer
- Add wtpsplit engine to sentence_tokenize #804
- New paragraph_tokenize function to split Thai text into paragraphs #804
- Add paragraph_threshold into paragraph_tokenize() function #806 by @pavaris-pm
- Add 🪿 Han-solo by @wannaphong in #830
- Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
- Fix incorrect passing of flags to re.split by @hauntsaninja in #832
- Add syllable_tokenize by @wannaphong in #834
- Add wanchanbertathaigrammarly by @wannaphong in #836
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
Tag
- Add function for pos tag with transformers by @MpolaarbearM in #857
- Update pos_tag_transformers function by @pavaris-pm in #865
- Add PhayaThaiBERT engine with new features by @pavaris-pm in #873
Chat
- Fixed bug #828
Translate
- Add small100 to pythainlp.translate #815 by @wannaphong
Transliterate
- Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852
Corpus
- Add pythainlp.corpus.thai_orst_words(): Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
- Add pythainlp.corpus.thai_wikipedia_titles(): Thai word list (nouns and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
- Add pythainlp.corpus.thai_volubilis_words(): Thai word list from Volubilis dictionary #870 by @konbraphat51
- Add pythainlp.corpus.thai_icu_words(): Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882
Util
- Add pythainlp.util.encoding #813 by @wannaphong
- Add pythainlp.util.spell_words #817 by @wannaphong
- Add pythainlp.util.remove_trailing_repeat_consonants() #862 by @konbraphat51
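`pythainlp.util.encoding` (#813) is listed without further detail. Assuming it deals with legacy Thai encodings, note that Python's standard library already ships a TIS-620 codec (`tis_620`). The sketch below uses only that codec and does not reproduce PyThaiNLP's own functions, whose names are not given in these notes:

```python
text = "ภาษาไทย"

# TIS-620 is a single-byte encoding: one byte per Thai character.
tis620_bytes = text.encode("tis_620")
utf8_bytes = text.encode("utf-8")
print(len(tis620_bytes), len(utf8_bytes))  # 7 21

# Decoding with the same codec restores the original string.
assert tis620_bytes.decode("tis_620") == text

# Repair mojibake: TIS-620 bytes that were wrongly decoded as Latin-1.
garbled = tis620_bytes.decode("latin-1")
repaired = garbled.encode("latin-1").decode("tis_620")
print(repaired == text)  # True
```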
New Contributors
- @pavaris-pm made their first contribution in #806
- @hauntsaninja made their first contribution in #832
- @Saharshjain78 made their first contribution in #850
- @konbraphat51 made their first contribution in #856
- @MpolaarbearM made their first contribution in #857
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.0.2...v5.0.0
Published by wannaphong over 2 years ago
pythainlp - PyThaiNLP v5.0.0-beta1
Schedule
- First Beta release: 5 February 2024
- Production release: 10 February 2024
See 5.0 Milestone.
What is new?
License information
- Use SPDX license identifier at the header of source code #876
Deprecation and other API changes
- Change default NER to thainer-v2 https://github.com/PyThaiNLP/pythainlp/commit/5e97e7c4ebcf68bca64e4f942c8dfe3a5ab2ebc5
- Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai https://github.com/PyThaiNLP/pythainlp/commit/524759ac1926fb9837bb9464f0a40cd984af2608
Dependency
- Add tzdata as a dependency on Windows by @BLKSerene in #841
New API
- Add pythainlp.coref for Thai coreference resolution #802
- Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize() function #806
- Add word approximation to pythainlp.soundex.sound #809 by @wannaphong
- Add pythainlp.wsd for Thai word sense disambiguation #818 by @wannaphong
- Add pythainlp.chat and WangChanGLM to pythainlp.generate #819 by @wannaphong
- Add pythainlp.cls, a param-free classification model #821 by @c4n
- Add pythainlp.el entity linking #822 by @wannaphong
- Add pythainlp.ancient by @wannaphong in #833
- Add pythainlp.util.rhyme by @wannaphong in #849
- Add remove_trailing_repeat_consonants by @konbraphat51 in #862
- Add pythainlp.util.to_idn by @wannaphong in #875
- Add pythainlp.corpus.find_synonyms by @wannaphong in #890
- Add pythainlp.util.morse by @wannaphong in #891
- Add pythainlp.morpheme by @wannaphong in #896
Improve
- Update code comments and clean up code by @BLKSerene in #845
- Improve the documentation by fixing typos and adding necessary details, code explanations, and missing details about models and examples by @Saharshjain78 in #850
- Fix tests of khavee functions by @BLKSerene in #854
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
- Add license info to /tests and README_TH.md by @bact in #886
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- ruff: docstring-code-format = true by @bact in #892
Tokenizer
- Add wtpsplit engine to sentence_tokenize #804
- New paragraph_tokenize function to split Thai text into paragraphs #804
- Add paragraph_threshold into paragraph_tokenize() function #806 by @pavaris-pm
- Add 🪿 Han-solo by @wannaphong in #830
- Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
- Fix incorrect passing of flags to re.split by @hauntsaninja in #832
- Add syllable_tokenize by @wannaphong in #834
- Add wanchanbertathaigrammarly by @wannaphong in #836
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
Tag
- Add function for pos tag with transformers by @MpolaarbearM in #857
- Update pos_tag_transformers function by @pavaris-pm in #865
- Add PhayaThaiBERT engine with new features by @pavaris-pm in #873
Chat
- Fixed bug #828
Translate
- Add small100 to pythainlp.translate #815 by @wannaphong
Transliterate
- Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852
Corpus
- Add pythainlp.corpus.thai_orst_words(): Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
- Add pythainlp.corpus.thai_wikipedia_titles(): Thai word list (nouns and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
- Add pythainlp.corpus.thai_volubilis_words(): Thai word list from Volubilis dictionary #870 by @konbraphat51
- Add pythainlp.corpus.thai_icu_words(): Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882
Util
- Add pythainlp.util.encoding #813 by @wannaphong
- Add pythainlp.util.spell_words #817 by @wannaphong
- Add pythainlp.util.remove_trailing_repeat_consonants() #862 by @konbraphat51
New Contributors
- @pavaris-pm made their first contribution in #806
- @hauntsaninja made their first contribution in #832
- @Saharshjain78 made their first contribution in #850
- @konbraphat51 made their first contribution in #856
- @MpolaarbearM made their first contribution in #857
Published by wannaphong over 2 years ago
pythainlp - PyThaiNLP v5.0.0-dev2
What's Changed
- Add pythainlp.morpheme by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/896
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.0-dev1...v5.0.0-dev2
Published by wannaphong over 2 years ago
pythainlp - PyThaiNLP v5.0.0-dev1
What's Changed
- Add Thai word list from Volubilis dictionary by @konbraphat51 in https://github.com/PyThaiNLP/pythainlp/pull/870
- Add Thai word list from Thai Wikipedia titles by @konbraphat51 in https://github.com/PyThaiNLP/pythainlp/pull/869
- switch PyThaiNLP source code to SPDX license ID by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/876
- Add pythainlp.util.to_idn by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/875
- Update Git Actions versions by @bact in https://github.com/PyThaiNLP/pythainlp/pull/878
- Fix ruff args in workflow by @bact in https://github.com/PyThaiNLP/pythainlp/pull/880
- Revise ruff args in workflow by @bact in https://github.com/PyThaiNLP/pythainlp/pull/881
- Add Thai word list from ICU BreakIterator dictionary by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/879
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in https://github.com/PyThaiNLP/pythainlp/pull/882
- Fix coref return type and add fallback by @bact in https://github.com/PyThaiNLP/pythainlp/pull/883
- Fix wrong/incompatible types, code readability by @bact in https://github.com/PyThaiNLP/pythainlp/pull/884
- Bump protobuf from 3.20 to 3.20.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/885
- Add license info to /tests and README_TH.md by @bact in https://github.com/PyThaiNLP/pythainlp/pull/886
- Add PhayaThaiBERT engine with new features [WIP] by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/873
- phayathaibert, khavee, parse: Code clean up by @bact in https://github.com/PyThaiNLP/pythainlp/pull/889
- Add pythainlp.corpus.find_synonyms by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/890
- ruff: docstring-code-format = true by @bact in https://github.com/PyThaiNLP/pythainlp/pull/892
- Add pythainlp.util.morse by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/891
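#875 adds `pythainlp.util.to_idn` without further detail here. Assuming it converts Thai domain names to their internationalized domain name (IDN) ASCII form, the same kind of conversion can be sketched with Python's built-in `idna` codec. `to_idn_sketch` is a hypothetical name, and its output may differ from PyThaiNLP's function; for example, Python's codec implements IDNA 2003, not IDNA 2008:

```python
def to_idn_sketch(domain: str) -> str:
    """Return the ASCII-compatible (Punycode, "xn--") form of a domain name.

    Hypothetical helper built on Python's built-in "idna" codec (IDNA 2003),
    not pythainlp.util.to_idn.
    """
    return domain.encode("idna").decode("ascii")


domain = "ตัวอย่าง.ไทย"
ascii_form = to_idn_sketch(domain)
print(ascii_form)  # each Thai label is rewritten with an "xn--" prefix

# The conversion is reversible for names like this one.
assert ascii_form.encode("ascii").decode("idna") == domain
```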
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.0-dev0...v5.0.0-dev1
Published by wannaphong over 2 years ago
pythainlp - PyThaiNLP v5.0.0-dev0
What's Changed
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/844
- Update code comments and clean up code by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/845
- Improve the documentation by fixing typos and adding necessary details, code explanations, and missing details about models and examples by @Saharshjain78 in https://github.com/PyThaiNLP/pythainlp/pull/850
- Fix ISO 11940 duplicate keys by @bact in https://github.com/PyThaiNLP/pythainlp/pull/851
- Add pythainlp.util.rhyme by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/849
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/852
- Fix tests of khavee functions by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/854
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in https://github.com/PyThaiNLP/pythainlp/pull/856
- add function for pos tag with transformers by @MpolaarbearM in https://github.com/PyThaiNLP/pythainlp/pull/857
- Add: removetrailingrepeat_consonants() by @konbraphat51 in https://github.com/PyThaiNLP/pythainlp/pull/862
- Update pos_tag_transformers function by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/865
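PR 862 adds remove_trailing_repeat_consonants(), presumably aimed at elongated informal spellings such as อร่อยยยย for อร่อย. A naive sketch of that idea, assuming we only want to collapse a run of identical characters at the end of a word, is below. collapse_trailing_repeats is a hypothetical name, and this is not how the PyThaiNLP function decides what to keep.

```python
def collapse_trailing_repeats(word: str) -> str:
    """Collapse a run of identical characters at the end of `word` to one.

    Naive sketch: no dictionary lookup, so a word that legitimately ends
    in a doubled letter would also be shortened.
    """
    if not word:
        return word
    last = len(word) - 1
    start = last
    # Walk left while the previous character equals the final character.
    while start > 0 and word[start - 1] == word[last]:
        start -= 1
    # Keep everything up to and including the first character of the run.
    return word[: start + 1]
```

With this sketch, collapse_trailing_repeats("อร่อยยยย") returns "อร่อย".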
New Contributors
- @Saharshjain78 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/850
- @konbraphat51 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/856
- @MpolaarbearM made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/857
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.1.0-beta5...v5.0.0-dev0
- Python
Published by wannaphong over 2 years ago
pythainlp - PyThaiNLP v4.1.0-beta5
Docs: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
Install: pip install --pre pythainlp
See 4.1 Milestone.
What's Changed
- Fix "List of possible extras" in README by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/839
- Add tzdata as a dependency on Windows by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/841
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.1.0-beta4...v4.1.0-beta5
- Python
Published by wannaphong over 2 years ago
pythainlp - PyThaiNLP v4.1.0-beta4
Docs: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
Install: pip install --pre pythainlp
See 4.1 Milestone.
What's Changed
- Fix incorrect passing of flags to re.split by @hauntsaninja in https://github.com/PyThaiNLP/pythainlp/pull/832
- Add pythainlp.ancient by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/833
- Add syllable_tokenize by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/834
- Add wanchanbertathaigrammarly by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/836
New Contributors
- @hauntsaninja made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/832
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.1.0-beta3...v4.1.0-beta4
- Python
Published by wannaphong over 2 years ago
pythainlp - PyThaiNLP v4.1.0-beta3
What's Changed
- Add 🪿 Han-solo by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/830
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.1.0-beta2...v4.1.0-beta3
- Python
Published by wannaphong almost 3 years ago
pythainlp - PyThaiNLP v4.1.0-beta2
What's Changed
- Fixed bug #828. Thank you @tonezzz for reporting!
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.1.0-beta1...v4.1.0-beta2
- Python
Published by wannaphong almost 3 years ago
pythainlp - PyThaiNLP v4.1.0-beta1
Schedule
- First beta release: 24 July 2023

Docs: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
Install: pip install --pre pythainlp
See 4.1 Milestone.
What is new?
Deprecation and other API changes
- https://github.com/PyThaiNLP/pythainlp/commit/5e97e7c4ebcf68bca64e4f942c8dfe3a5ab2ebc5 Change the default NER to thainer-v2
New API
- Add pythainlp.coref to support Thai coreference resolution #802
- Add wtpsplit to sentence segmentation and paragraph segmentation #804, and add paragraph_threshold to the paragraph_tokenize function #806
- Add word approximation to pythainlp.soundex.sound by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/809
- Add pythainlp.wsd for Thai Word Sense Disambiguation by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/818
- Add pythainlp.chat and WangChanGLM to pythainlp.generate by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/819
- Add a param-free classification model (pythainlp.cls) by @c4n in https://github.com/PyThaiNLP/pythainlp/pull/821
- Add pythainlp.el by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/822
- Add pythainlp.util.abbreviation_to_full_text #826 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/826
Tokenizer
- Add wtpsplit engine to sentence_tokenize #804
- New paragraph_tokenize function to split Thai text into paragraphs #804
- Add paragraph_threshold to the paragraph_tokenize function by @pavaris-pm in https://github.com/PyThaiNLP/pythainlp/pull/806
Translate
- Add small100 to pythainlp.translate by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/815
Corpus
- Add orst list by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/810
- Add thai_synonym #825 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/825
Util
- Add pythainlp.util.encoding by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/813
- Add pythainlp.util.spell_words by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/817
- Add pythainlp.util.abbreviation_to_full_text #826 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/826
New Contributors
- @pavaris-pm made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/806
- @falukelo made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/824
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.0.0...v4.1.0-beta1
- Python
Published by wannaphong almost 3 years ago
pythainlp - PyThaiNLP v4.0.2 Released!
PyThaiNLP v4.0.2 is a bug fix release of PyThaiNLP v4.0.
Upgrade: pip install -U pythainlp
Documentation: https://pythainlp.github.io/docs/4.0
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
What's Changed
- fixed bug by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/798
- Fix เอือน อวน by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/799
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.0.1...v4.0.2
Contributors
Thanks all the contributors. (Image made with contributors-img)
If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.
- Python
Published by wannaphong about 3 years ago
pythainlp - PyThaiNLP v4.0.1 Released!
PyThaiNLP v4.0.1 is a bug fix release of PyThaiNLP v4.0.
Upgrade: pip install -U pythainlp
Documentation: https://pythainlp.github.io/docs/4.0
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
What's Changed
- Fix mishandling Karun in Kavee Matra Checker by @HRNPH in https://github.com/PyThaiNLP/pythainlp/pull/793
- adding tonemark removal to fix mattra checking by @HRNPH in https://github.com/PyThaiNLP/pythainlp/pull/795
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v4.0.0...v4.0.1
Contributors
Thanks all the contributors. (Image made with contributors-img)
If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.
- Python
Published by wannaphong about 3 years ago
pythainlp - PyThaiNLP 4.0 Released!
PyThaiNLP published its first version, 0.0.4, to PyPI six years ago, so PyThaiNLP 4.0 has a special codename: PyThaiNLP 4.0 (Real).
See 4.0 Milestone.
Documentation: https://pythainlp.github.io/docs/4.0
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.
What is new?
Deprecation and other API changes
- Delete all LST20 model #728
- https://github.com/PyThaiNLP/pythainlp/commit/947c7be9ce4199af58ecf042629dda4d752dbcd6 Change pythainlp.tools.misspell to pythainlp.tools.misspell.misspell
Improve
- Reduce import time #719
- Fix/broken numeric data format (#652) #723
Tokenizer
- Add blackboard cls #732
- Add
rule to TCC and Change TCC rule for newmm #741
Tag
- Add blackboard pos_tag #733
- Add ThaiNER 2.0 #781
Util
- Add pythainlp.util.count_thai_chars #748
- Add thai_strptime and convert_years #767
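count_thai_chars (#748) counts Thai characters in a string. As an illustration only, the sketch below buckets characters by code point ranges of the Unicode Thai block (U+0E00 to U+0E7F). The function name and the category names are our own assumptions and are not the keys returned by the PyThaiNLP function.

```python
def count_by_thai_category(text: str) -> dict:
    """Count characters per (assumed) category using Thai-block code points."""
    counts = {
        "consonants": 0,
        "tone_marks": 0,
        "digits": 0,
        "other_thai": 0,
        "non_thai": 0,
    }
    for ch in text:
        cp = ord(ch)
        if 0x0E01 <= cp <= 0x0E2E:    # consonants ko kai .. ho nokhuk
            counts["consonants"] += 1
        elif 0x0E48 <= cp <= 0x0E4B:  # tone marks mai ek .. mai chattawa
            counts["tone_marks"] += 1
        elif 0x0E50 <= cp <= 0x0E59:  # Thai digits 0 .. 9
            counts["digits"] += 1
        elif 0x0E00 <= cp <= 0x0E7F:  # vowels, signs, and the rest of the block
            counts["other_thai"] += 1
        else:
            counts["non_thai"] += 1
    return counts
```

For สวัสดี 2566 this yields 4 consonants, 2 other Thai characters (the vowel signs), and 5 non-Thai characters (the space and the Arabic digits).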
Transliterate
- Add Thai2Rom ONNX model #743
Khavee
- add khavee to pythainlp #777
- add aek/too checker function to khavee #779
Parse
- Add ud_goeswith #757
Corpus
- Add new science word #763
Full Changelog
- Improve: Reduce import time by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/719
- Create CITATION.cff by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/721
- Fix/broken numeric data format (#652) by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/723
- Add blackboard pos_tag to cls by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/734
- Update perceptron.py by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/736
- Feature/integrate transliteration dictionary (#681) by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/735
- Delete all LST20 model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/728
- Add blackboard cls by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/732
- Add blackboard pos_tag by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/733
- Add style.css: extend docs page width by @LXZE in https://github.com/PyThaiNLP/pythainlp/pull/742
- Add
rule to TCC and Change TCC rule for newmm by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/741
- Setup action to check for code formatting by @new5558 in https://github.com/PyThaiNLP/pythainlp/pull/746
- Add more test for TCC by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/747
- Add Thai2Rom ONNX model by @new5558 in https://github.com/PyThaiNLP/pythainlp/pull/743
- Add pythainlp.util.count_thai_chars by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/748
- Feature: keyword extraction with keybert and frequency ranking by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/751
- Add ud_goeswith by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/757
- Bump tensorflow from 2.7.2 to 2.9.3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/758
- Add new science word by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/763
- Add thai_strptime and convert_years by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/767
- Fix typo in thai_full_month_lists for February by @PhakphumV in https://github.com/PyThaiNLP/pythainlp/pull/770
- Add pythainlp.util.phoneme by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/772
- Add remove tone ipa by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/776
- add khavee to pythainlp by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/777
- Add khavee docs tests by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/778
- add aek/too checker function to khavee by @HRNPH in https://github.com/PyThaiNLP/pythainlp/pull/779
- Add Thai NER 2.0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/781
- Add Copyright to the header files by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/782
- Fixed some issues in Khavee. It's a problem with use อ by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/785
- PyThaiNLP 4.0 beta 1 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/786
- fix some bugs and add check_karu_lahu function by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/787
- PyThaiNLP 4.0 Released! by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/789
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0...v4.0.0
Contributors
Thanks all the contributors. (Image made with contributors-img)
If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.
New Contributors
- @LXZE made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/742
- @new5558 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/746
- @PhakphumV made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/770
- @kangkengkhadev made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/777
- @HRNPH made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/779
- Python
Published by wannaphong about 3 years ago
pythainlp - PyThaiNLP v4.0.0-beta1
This post gives the change log for PyThaiNLP 4.0. PyThaiNLP published its first version, 0.0.4, to PyPI six years ago, so PyThaiNLP 4.0 has a special codename: PyThaiNLP 4.0 (Real).
This release is the first beta release of PyThaiNLP 4.0.
Schedule
- Beta release: 1 April 2023
- Production release: 14 April 2023
See 4.0 Milestone.
What is new?
Deprecation and other API changes
- Delete all LST20 model #728
- https://github.com/PyThaiNLP/pythainlp/commit/947c7be9ce4199af58ecf042629dda4d752dbcd6 Change pythainlp.tools.misspell to pythainlp.tools.misspell.misspell
Improve
- Reduce import time #719
- Fix/broken numeric data format (#652) #723
Tokenizer
- Add blackboard cls #732
- Add
rule to TCC and Change TCC rule for newmm #741
Tag
- Add blackboard pos_tag #733
- Add ThaiNER 2.0 #781
Util
- Add pythainlp.util.count_thai_chars #748
- Add thai_strptime and convert_years #767
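thai_strptime and convert_years (#767) deal with Thai dates, which are usually written in the Buddhist Era (B.E.). A minimal sketch of the underlying arithmetic, assuming the modern calendar in which the B.E. year equals the C.E. year plus 543, is below. ce_to_be, be_to_ce, and parse_thai_dmy are hypothetical helpers, not the PyThaiNLP signatures.

```python
from datetime import date

BE_OFFSET = 543  # B.E. year = C.E. year + 543 (modern calendar, assumed)


def ce_to_be(year_ce: int) -> int:
    """Convert a Common Era year to a Buddhist Era year."""
    return year_ce + BE_OFFSET


def be_to_ce(year_be: int) -> int:
    """Convert a Buddhist Era year to a Common Era year."""
    return year_be - BE_OFFSET


def parse_thai_dmy(text: str) -> date:
    """Parse a 'day/month/B.E. year' string such as '14/04/2566'."""
    day, month, year_be = (int(part) for part in text.split("/"))
    return date(be_to_ce(year_be), month, day)
```

parse_thai_dmy("14/04/2566") returns date(2023, 4, 14). The fixed offset ignores the pre-1941 Thai calendar, in which the year began on 1 April, so historical January to March dates would need extra handling.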
Transliterate
- Add Thai2Rom ONNX model #743
Khavee
- add khavee to pythainlp #777
- add aek/too checker function to khavee #779
Parse
- Add ud_goeswith #757
Corpus
- Add new science word #763
What's Changed
- Improve: Reduce import time by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/719
- Create CITATION.cff by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/721
- Fix/broken numeric data format (#652) by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/723
- Add blackboard pos_tag to cls by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/734
- Update perceptron.py by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/736
- Feature/integrate transliteration dictionary (#681) by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/735
- Delete all LST20 model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/728
- Add blackboard cls by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/732
- Add blackboard pos_tag by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/733
- Add style.css: extend docs page width by @LXZE in https://github.com/PyThaiNLP/pythainlp/pull/742
- Add
rule to TCC and Change TCC rule for newmm by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/741
- Setup action to check for code formatting by @new5558 in https://github.com/PyThaiNLP/pythainlp/pull/746
- Add more test for TCC by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/747
- Add Thai2Rom ONNX model by @new5558 in https://github.com/PyThaiNLP/pythainlp/pull/743
- Add pythainlp.util.count_thai_chars by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/748
- Feature: keyword extraction with keybert and frequency ranking by @noppayut in https://github.com/PyThaiNLP/pythainlp/pull/751
- Add ud_goeswith by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/757
- Bump tensorflow from 2.7.2 to 2.9.3 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/758
- Add new science word by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/763
- Add thai_strptime and convert_years by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/767
- Fix typo in thai_full_month_lists for February by @PhakphumV in https://github.com/PyThaiNLP/pythainlp/pull/770
- Add pythainlp.util.phoneme by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/772
- Add remove tone ipa by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/776
- add khavee to pythainlp by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/777
- Add khavee docs tests by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/778
- add aek/too checker function to khavee by @HRNPH in https://github.com/PyThaiNLP/pythainlp/pull/779
- Add Thai NER 2.0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/781
- Add Copyright to the header files by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/782
- Fixed some issues in Khavee. It's a problem with use อ by @kangkengkhadev in https://github.com/PyThaiNLP/pythainlp/pull/785
- PyThaiNLP 4.0 beta 1 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/786
New Contributors
- @LXZE made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/742
- @new5558 made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/746
- @PhakphumV made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/770
- @kangkengkhadev made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/777
- @HRNPH made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/779
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0...v4.0.0-beta1
- Python
Published by wannaphong about 3 years ago
pythainlp - PyThaiNLP v3.1.1 Released!
PyThaiNLP v3.1.1 is an update release of PyThaiNLP v3.1.0.
What's Changed
- pythainlp.tools.misspell changed to pythainlp.tools.misspell.misspell.
- Add Reduce import time #719 to PyThaiNLP 3.1.1 #753
- Doc: Lst20 deprecation warning for 3.1.1 (#749) #752 (Thank you @noppayut)
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0...v3.1.1
You can install or upgrade with pip install pythainlp==3.1.1.
Documentation: https://pythainlp.github.io/docs/3.1
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See 3.1 Milestone.
Contributors
Thanks all the contributors. (Image made with contributors-img)
- Python
Published by wannaphong over 3 years ago
pythainlp - PyThaiNLP v3.1.0 Released!
This is the release version for PyThaiNLP v3.1.0
You can install with pip install pythainlp==3.1.0.
Documentation: https://pythainlp.github.io/docs/3.1
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See 3.1 Milestone.
What is new?
Deprecation and other API changes
#687 Remove deprecated functions
- pythainlp.word_vector: doesnt_match, get_model, most_similar_cosmul, sentence_vectorizer, similarity. Use the WordVector class instead.
- pythainlp.util.delete_tone: use pythainlp.util.remove_tonemark instead.
- Remove pythainlp.util.timetime: use pythainlp.util.time_to_thaiword instead.
- pythainlp.tokenize.syllable_tokenize: use pythainlp.tokenize.subword_tokenize instead.
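pythainlp.util.remove_tonemark is the suggested replacement for delete_tone. The sketch below only illustrates what removing tone marks means at the character level, assuming the four Thai tone marks U+0E48 to U+0E4B (mai ek, mai tho, mai tri, mai chattawa). strip_tone_marks is a hypothetical name, not the library function.

```python
# Translation table deleting mai ek, mai tho, mai tri, mai chattawa (U+0E48..U+0E4B).
_TONE_MARKS = {cp: None for cp in range(0x0E48, 0x0E4C)}


def strip_tone_marks(text: str) -> str:
    """Delete Thai tone marks, leaving every other character unchanged."""
    return text.translate(_TONE_MARKS)
```

For example, strip_tone_marks("ไม้") returns "ไม".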
Dependency Parsing
- PyThaiNLP now supports dependency parsing 🎉 Add pythainlp.parse.dependency_parsing https://github.com/PyThaiNLP/pythainlp/pull/706
Named Entity Tagging
- #665 Add Thai-NNER (pythainlp.tag.NNER)
- #658 Add LST20NER ONNX model: the LST20NER model, fine-tuned from WangchanBERTa, exported to ONNX.
Transliteration
- #659 Add ISO 11940 transliteration
- #660 Add Thai W2P v0.2
- #686 Add wunsen
- #694 Wunsen Mandarin and Japanese update
PyThaiNLP Corpus downloader
- #656 Add support zip/tar.gz to download corpus
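#656 lets the corpus downloader handle zip and tar.gz archives. A standard-library-only sketch of format detection and extraction is below. extract_archive is a hypothetical helper, and we do not claim it mirrors the PyThaiNLP downloader code.

```python
import tarfile
import zipfile
from pathlib import Path
from typing import List


def extract_archive(archive: Path, dest: Path) -> List[str]:
    """Extract a .zip or .tar/.tar.gz file into `dest`; return member names."""
    dest.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            return zf.namelist()
    if tarfile.is_tarfile(archive):
        with tarfile.open(archive, "r:*") as tf:  # "r:*" auto-detects compression
            if hasattr(tarfile, "data_filter"):
                # Python versions with extraction filters: reject unsafe members
                # such as absolute paths or "..".
                tf.extractall(dest, filter="data")
            else:
                # Older Pythons: no filter support; only use trusted archives.
                tf.extractall(dest)
            return tf.getnames()
    raise ValueError(f"unsupported archive format: {archive}")
```

Detecting the format from the file contents rather than the extension keeps a mislabeled download from being handed to the wrong extractor.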
Text normalization
- #673 Add a normalising rule for Lakkhangyao ๅ
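#673 adds a normalising rule for Lakkhangyao ๅ (U+0E45), which is easily confused with sara aa า (U+0E32). The sketch below assumes a simple rule: keep ๅ only after ฤ or ฦ (as in ฤๅ and ฦๅ) and replace it with า everywhere else. This rule is our assumption, not necessarily the one merged in #673, and normalise_lakkhangyao is a hypothetical name.

```python
import re

# Lakkhangyao (U+0E45) not preceded by ru (U+0E24) or lu (U+0E26).
_MISUSED_LAKKHANGYAO = re.compile("(?<![\u0e24\u0e26])\u0e45")


def normalise_lakkhangyao(text: str) -> str:
    """Replace misused Lakkhangyao with sara aa (U+0E32); assumed rule."""
    return _MISUSED_LAKKHANGYAO.sub("\u0e32", text)
```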
Translate
- #674 add gpu option
Text summarize
- #679 Add mt5 cpe kmutt thai sentence sum
Util
- #682 Add live-dead syllable classification
- #684 Add live dead syllable classify
- #690 Add tone detector
Soundex
- #699 Add Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique
Other
- #689 map NG tag to PART
- #691 Remove TinyDB as a dependency
- #692 Fix notifications that newer versions of corpora are available
- Add warning about LST20 license
Contributors
New Contributors
- @chameleonTK made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/673
- @vikimark made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/674
- @BLKSerene made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/691
- @cakimpei made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/694
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.0.10...v3.1.0
All Contributors
Thanks all the contributors. (Image made with contributors-img)
We build Thai NLP.
PyThaiNLP
- Python
Published by wannaphong over 3 years ago
pythainlp - PyThaiNLP v3.0.10 Released!
PyThaiNLP v3.0.10 is a bug fix release of PyThaiNLP v3.0.9.
Bug Fixed
- Fixed wrong tag mapping from LST20 to UD #711
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.0.9...v3.0.10
You can install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
Contributors
Thanks all the contributors. (Image made with contributors-img)
- Python
Published by wannaphong over 3 years ago
pythainlp - PyThaiNLP v3.1.0-beta0
This is the beta version for PyThaiNLP v3.1.
You can install with pip install --pre pythainlp==3.1.0b0.
Documentation: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See 3.1 Milestone.
What is new?
Deprecation and other API changes
#687 Remove deprecated functions
- pythainlp.word_vector: doesnt_match, get_model, most_similar_cosmul, sentence_vectorizer, similarity. Use the WordVector class instead.
- pythainlp.util.delete_tone: use pythainlp.util.remove_tonemark instead.
- Remove pythainlp.util.timetime: use pythainlp.util.time_to_thaiword instead.
- pythainlp.tokenize.syllable_tokenize: use pythainlp.tokenize.subword_tokenize instead.
Dependency Parsing
- PyThaiNLP now supports dependency parsing 🎉 Add pythainlp.parse.dependency_parsing https://github.com/PyThaiNLP/pythainlp/pull/706
Named Entity Tagging
- #665 Add Thai-NNER (pythainlp.tag.NNER)
- #658 Add LST20NER ONNX model: the LST20NER model, fine-tuned from WangchanBERTa, exported to ONNX.
Transliteration
- #659 Add ISO 11940 transliteration
- #660 Add Thai W2P v0.2
- #686 Add wunsen
- #694 Wunsen Mandarin and Japanese update
PyThaiNLP Corpus downloader
- #656 Add support zip/tar.gz to download corpus
Text normalization
- #673 Add a normalising rule for Lakkhangyao ๅ
Translate
- #674 add gpu option
Text summarize
- #679 Add mt5 cpe kmutt thai sentence sum
Util
- #682 Add live-dead syllable classification
- #684 Add live dead syllable classify
- #690 Add tone detector
Soundex
- #699 Add Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique
Other
- #689 map NG tag to PART
- #691 Remove TinyDB as a dependency
- #692 Fix notifications that newer versions of corpora are available
- Add warning about LST20 license
What's Changed
- Add more words from Royal Society by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/653
- Add support zip/tar.gz to download corpus by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/656
- Update from dev by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/657
- Add ISO 11940 transliteration by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/659
- Add Thai W2P v0.2 and PyThaiNLP v3.0.6dev0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/660
- Add LST20NER onnx model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/658
- Add Thai-NNER by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/665
- Update dev base from 3.0 base by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/668
- PyThaiNLP 3.0.7 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/670
- Update dev branch from pythainlp-3.0 branch by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/672
- Normalise Lakkhangyao by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/673
- add gpu option by @vikimark in https://github.com/PyThaiNLP/pythainlp/pull/674
- Bump tensorflow from 2.5.3 to 2.6.4 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/677
- Bump tensorflow from 2.6.4 to 2.7.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/678
- Add mt5 cpe kmutt thai sentence sum by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/679
- Add live-dead syllable classification by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/682
- Fixed CI Bug by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/683
- Add live dead syllable classify by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/684
- Add wunsen by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/686
- Add ThaiSum sentence segmentor by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/688
- map NG tag to PART by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/689
- Add tone detector by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/690
- Remove deprecated function by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/687
- Remove TinyDB as a dependency by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/691
- Fix notifications that newer versions of corpora are available by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/692
- Start PyThaiNLP v3.1.0-dev0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/693
- Wunsen Mandarin and Japanese update by @cakimpei in https://github.com/PyThaiNLP/pythainlp/pull/694
- Add Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/699
- Fixed #700 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/701
- Update add-word_detokenize from dev by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/703
- Add word_detokenize by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/697
- Move model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/705
- Add pythainlp.parse.dependency_parsing by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/706
New Contributors
- @chameleonTK made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/673
- @vikimark made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/674
- @BLKSerene made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/691
- @cakimpei made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/694
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.0.9...v3.1.0-beta0
All Contributors
Thanks all the contributors. (Image made with contributors-img)
We build Thai NLP.
PyThaiNLP
- Python
Published by wannaphong over 3 years ago
pythainlp - PyThaiNLP v3.1.0-dev3
This is a development release for PyThaiNLP v3.1.
You can install with pip install --pre pythainlp==3.1.0.dev3.
Documentation: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See 3.1 Milestone.
What's Changed
- Move model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/705
- Add pythainlp.parse.dependency_parsing by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/706
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0-dev2...v3.1.0-dev3
All Contributors
Thanks all the contributors. (Image made with contributors-img)
We build Thai NLP.
PyThaiNLP
- Python
Published by wannaphong over 3 years ago
pythainlp - PyThaiNLP v3.1.0-dev2
This is the development release for PyThaiNLP v3.1.
You can install with pip install --pre pythainlp==3.1.0.dev2.
Documentation: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See 3.1 Milestone.
What's Changed
- Add Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/699
- Fixed #700 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/701
- Update add-word_detokenize from dev by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/703
- Add word_detokenize by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/697
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0-dev1...v3.1.0-dev2
All Contributors
Thanks all the contributors. (Image made with contributors-img)
- Python
Published by wannaphong over 3 years ago
pythainlp - PyThaiNLP v3.0.9 Released!
PyThaiNLP v3.0.9 is a bug fix release of PyThaiNLP v3.0.8.
Bug Fixed
- Fixed the Thai W2P model version to 0.1 https://github.com/PyThaiNLP/pythainlp/commit/b1cddd934c9224e0f513b6ccb71021a8a3c51260
You can install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
Contributors
Thanks all the contributors. (Image made with contributors-img)
- Python
Published by wannaphong over 3 years ago
pythainlp - PyThaiNLP v3.1.0-dev1
This is the development release for PyThaiNLP v3.1.
You can install with pip install --pre pythainlp==3.1.0.dev1.
Documentation: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See 3.1 Milestone.
What's Changed
- Wunsen Mandarin and Japanese update by @cakimpei in https://github.com/PyThaiNLP/pythainlp/pull/694
New Contributors
- @cakimpei made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/694
All Contributors
Thanks all the contributors. (Image made with contributors-img)
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.1.0-dev0...v3.1.0-dev1
- Python
Published by wannaphong almost 4 years ago
pythainlp - PyThaiNLP v3.1.0-dev0
This is the first development release for PyThaiNLP v3.1.
You can install with pip install --pre pythainlp==3.1.0.dev0.
Documentation: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See 3.1 Milestone.
What is new?
Deprecation and other API changes
#687 Remove deprecated functions
- pythainlp.word_vector: doesnt_match, get_model, most_similar_cosmul, sentence_vectorizer, similarity. Use the WordVector class instead.
- pythainlp.util.delete_tone: use pythainlp.util.remove_tonemark instead.
- Remove pythainlp.util.timetime: use pythainlp.util.time_to_thaiword instead.
- pythainlp.tokenize.syllable_tokenize: use pythainlp.tokenize.subword_tokenize instead.
Named Entity Tagging
- #665 Add Thai-NNER (pythainlp.tag.NNER)
- #658 Add LST20NER ONNX model: the LST20NER model, fine-tuned from WangchanBERTa, exported to ONNX.
Transliteration
- #659 Add ISO 11940 transliteration
- #660 Add Thai W2P v0.2
- #686 Add wunsen
PyThaiNLP Corpus downloader
- #656 Add support zip/tar.gz to download corpus
Text normalization
- #673 Add a normalising rule for Lakkhangyao ๅ
Translate
- #674 add gpu option
Text summarize
- #679 Add mt5 cpe kmutt thai sentence sum
Util
- #682 Add live-dead syllable classification
- #684 Add live dead syllable classify
- #690 Add tone detector
Other
- #689 map NG tag to PART
- #691 Remove TinyDB as a dependency
- #692 Fix notifications that newer versions of corpora are available
What's Changed
- Add more words from Royal Society by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/653
- Add support zip/tar.gz to download corpus by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/656
- Update from dev by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/657
- Add ISO 11940 transliteration by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/659
- Add Thai W2P v0.2 and PyThaiNLP v3.0.6dev0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/660
- Add LST20NER onnx model by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/658
- Add Thai-NNER by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/665
- Update dev base from 3.0 base by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/668
- PyThaiNLP 3.0.7 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/670
- Update dev branch from pythainlp-3.0 branch by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/672
- Normalise Lakkhangyao by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/673
- add gpu option by @vikimark in https://github.com/PyThaiNLP/pythainlp/pull/674
- Bump tensorflow from 2.5.3 to 2.6.4 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/677
- Bump tensorflow from 2.6.4 to 2.7.2 by @dependabot in https://github.com/PyThaiNLP/pythainlp/pull/678
- Add mt5 cpe kmutt thai sentence sum by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/679
- Add live-dead syllable classification by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/682
- Fixed CI Bug by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/683
- Add live dead syllable classify by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/684
- Add wunsen by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/686
- Add ThaiSum sentence segmentor by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/688
- map NG tag to PART by @chameleonTK in https://github.com/PyThaiNLP/pythainlp/pull/689
- Add tone detector by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/690
- Remove deprecated function by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/687
- Remove TinyDB as a dependency by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/691
- Fix notifications that newer versions of corpora are available by @BLKSerene in https://github.com/PyThaiNLP/pythainlp/pull/692
- Start PyThaiNLP v3.1.0-dev0 by @wannaphong in https://github.com/PyThaiNLP/pythainlp/pull/693
Contributors
New Contributors
- @chameleonTK made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/673
- @vikimark made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/674
- @BLKSerene made their first contribution in https://github.com/PyThaiNLP/pythainlp/pull/691
All Contributors
Thanks to all the contributors.
Full Changelog: https://github.com/PyThaiNLP/pythainlp/compare/v3.0.8...v3.1.0-dev0
- Python
Published by wannaphong almost 4 years ago
pythainlp - PyThaiNLP v3.0.8 Released!
PyThaiNLP v3.0.8 is a bug fix release of PyThaiNLP 3.0.7.
Bug fixes
- Fixed nercut bug: https://github.com/PyThaiNLP/pythainlp/pull/671. Thank you @kmining for your bug report.
Install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
Contributors
Thanks to all the contributors.
- Python
Published by wannaphong about 4 years ago
pythainlp - PyThaiNLP v3.0.7 Released!
PyThaiNLP v3.0.7 is a bug fix release of PyThaiNLP 3.0.5.
Bug fixes
- Fixed nercut bug: https://github.com/PyThaiNLP/pythainlp/issues/666. Thank you @kmining for your bug report.
Install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See the PyThaiNLP 3.0 change log: https://github.com/PyThaiNLP/pythainlp/issues/545
Contributors
Thanks to all the contributors.
- Python
Published by wannaphong about 4 years ago
pythainlp - PyThaiNLP v3.0.6 Released!
PyThaiNLP v3.0.6 is a bug fix release of PyThaiNLP 3.0.5.
Bug fixes
- Fixed nercut bug (#666). Thank you @kmining for your bug report.
Install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See the PyThaiNLP 3.0 change log: #545
Contributors
Thanks to all the contributors.
- Python
Published by wannaphong about 4 years ago
pythainlp - PyThaiNLP v3.0.5 Released!
PyThaiNLP v3.0.5 is a bug fix release of PyThaiNLP 3.0.4.
Bug fixes
- Fixed nercut bug: https://github.com/PyThaiNLP/pythainlp/commit/e9b89628c89dacc7b992dbe7c140e38f3ee52869
Install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See the PyThaiNLP 3.0 change log: #545
Contributors
Thanks to all the contributors.
- Python
Published by wannaphong over 4 years ago
pythainlp - PyThaiNLP v3.0.4 Released!
PyThaiNLP v3.0.4 is a bug fix release of PyThaiNLP 3.0.3.
Bug fixes
- Removed pythainlp.tag.named_entity.ThaiNameTagger to fix the pycrfsuite import issue: https://github.com/PyThaiNLP/pythainlp/commit/cc628d8cde6d3ea83d22f0d582398a0fdcbe6d84
Install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See the PyThaiNLP 3.0 change log: #545
Contributors
Thanks to all the contributors.
- Python
Published by wannaphong over 4 years ago
pythainlp - PyThaiNLP v3.0.3 Released!
PyThaiNLP v3.0.3 is a bug fix release of PyThaiNLP 3.0.2.
Bug fixes
- Fixed TypeError in pythainlp.spell.symspellpy: https://github.com/PyThaiNLP/pythainlp/issues/650
Install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See the PyThaiNLP 3.0 change log: #545
Contributors
Thanks to all the contributors.
- Python
Published by wannaphong over 4 years ago
pythainlp - PyThaiNLP v3.0.2 Release!
PyThaiNLP v3.0.2 is a bug fix release of PyThaiNLP 3.0.1.
Bug fixes
- Fixed incorrect code from #645.
Install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See the PyThaiNLP 3.0 change log: #545
Contributors
Thanks to all the contributors.
- Python
Published by wannaphong over 4 years ago
pythainlp - PyThaiNLP v3.0.1 Release!
PyThaiNLP v3.0.1 is a bug fix release of PyThaiNLP 3.0.
Bug fixes
- Removed the warning message in pythainlp.tag.thainer. Fixed #644
- Added the PYTHAINLP_READ_MODE environment variable to configure PyThaiNLP in read-only mode. Fixed #645
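The read-only mode from #645 is switched on through an environment variable. Below is a minimal sketch of such a switch. It assumes two things that you should check against the PyThaiNLP documentation:

- The variable is named `PYTHAINLP_READ_MODE`.
- A value such as `1` enables it.

`is_read_only_mode` and `save_corpus_file` are hypothetical helpers, not PyThaiNLP's code.

```python
import os

# Assumed variable name; check the PyThaiNLP docs for the exact spelling.
READ_MODE_VAR = "PYTHAINLP_READ_MODE"


def is_read_only_mode() -> bool:
    """Hypothetical helper: treat '1', 'true' or 'yes' as read-only mode on."""
    return os.getenv(READ_MODE_VAR, "").strip().lower() in {"1", "true", "yes"}


def save_corpus_file(path: str, data: str) -> None:
    """Hypothetical writer that refuses to touch the disk in read-only mode."""
    if is_read_only_mode():
        raise PermissionError(f"read-only mode is on; not writing {path}")
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)


os.environ[READ_MODE_VAR] = "1"
print(is_read_only_mode())  # True
```

In a shell, the same switch would be set before starting Python, for example with `export PYTHAINLP_READ_MODE=1`.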
Install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See the PyThaiNLP 3.0 change log: #545
Contributors
Thanks to all the contributors.
- Python
Published by wannaphong over 4 years ago
pythainlp - PyThaiNLP v3.0.0 Released!
After a long development period, we have released PyThaiNLP 3.0. PyThaiNLP 3.0 brings many improvements and new features for Thai language processing tasks.
Install with pip install pythainlp or upgrade with pip install -U pythainlp.
Documentation: https://pythainlp.github.io/docs/3.0/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See the PyThaiNLP 3.0 change log: #545
If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.
News
Starting with PyThaiNLP 3.0, we no longer support Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.2.
We have updated the Thai word dictionary and rules for newmm. If you use newmm for word tokenization in your model, we recommend retraining it.
What is new?
Deprecation and other API changes
- Deprecated `syllable_tokenize`: `syllable_tokenize` is deprecated; use `subword_tokenize` instead.
- `pythainlp.tag.named_entity.ThaiNameTagger` has moved to `pythainlp.tag.thainer.ThaiNameTagger`. The old class will be deprecated in PyThaiNLP 3.1.
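A deprecated function like this is commonly kept as a thin wrapper that emits a warning and forwards the call to its replacement. Below is a minimal sketch of that pattern using Python's standard `warnings` module.

The `subword_tokenize` here is a hypothetical stand-in that only splits on whitespace. It is not PyThaiNLP's function, which tokenizes Thai text with a selectable `engine`.

```python
import warnings
from typing import List


def subword_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for the replacement: splits on whitespace only.
    return text.split()


def syllable_tokenize(text: str) -> List[str]:
    """Deprecated alias kept for backward compatibility."""
    warnings.warn(
        "syllable_tokenize is deprecated, use subword_tokenize instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return subword_tokenize(text)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(syllable_tokenize("a b c"))   # ['a', 'b', 'c']
    print(caught[0].category.__name__)  # DeprecationWarning
```

Old code keeps working, while the warning tells users which call to migrate to before the alias is removed.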
Augment
- Add Thai Text Augmentation
Corpus
- Fix lots of misspellings in the dictionary (words_th.txt)
- Add `get_corpus_default_db` and the ThaiNER 1.5 model. You can now add a corpus to `default_db.json`, and the latest ThaiNER model no longer has to be downloaded from the Internet.
Tag
- Add TLTK (POS tagging and NER): adds a TLTK wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- Add `NER` class: a class for named-entity recognition tasks.
Translate
- Add `pythainlp.translate.Translate` class
- Add Chinese-Thai Machine Translation
- Add Thai-French Machine Translation
Tokenization
- Tokenize repeating dots and commas from numbers
- Fix tokenmaxlen bug that makes it always zero
- Tokenize repeating dots and commas from numbers (fix #461)
- Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
- Add SEFR CUT to pythainlp
- Add TLTK (sentence and word tokenization): adds a TLTK wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- Add nlpo3
Transliterate
- Refactor Royin transliteration: avoid nested if blocks and simplify consonant replacement operations
- Manually merge update-royin branch with dev branch to add O-ANG rule
- Add TLTK (G2P and IPA): adds a TLTK wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- Add pythainlp.transliterate.puan
Word Vector
- Fix tokenmaxlen bug that makes it always zero
- Add `pythainlp.word_vector.WordVector`
Spell
- Add more spelling engines
- Add TLTK (spell): adds a TLTK wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
Generate
- Add pythainlp.generate to generate text.
Tool
- Add misspell module
Other
- Add TLTK: adds a TLTK wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- Update requirements from ssg 0.0.6 to ssg 0.0.8
- Spoonerism: add support for words with more than three syllables
- Add `maiyamok`: a function for preprocessing Mai Yamok (ๆ) in a Thai sentence.
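Mai Yamok (ๆ) indicates that the preceding word is repeated, so a preprocessing step can expand it back into the repeated word. Below is a minimal sketch over an already-tokenized sentence. It makes two assumptions:

- The mark appears as its own token.
- It repeats only the single preceding word. In real text it can also repeat a short phrase, which this sketch does not handle.

`expand_maiyamok` is a hypothetical name and is not PyThaiNLP's `maiyamok` implementation.

```python
from typing import List, Optional

MAI_YAMOK = "\u0e46"  # ๆ


def expand_maiyamok(tokens: List[str]) -> List[str]:
    """Replace each standalone ๆ token with the last non-space word."""
    out: List[str] = []
    last_word: Optional[str] = None
    for tok in tokens:
        if tok.strip() == MAI_YAMOK and last_word is not None:
            out.append(last_word)  # repeat the previous word
            continue
        out.append(tok)
        if tok.strip():            # whitespace tokens do not count as words
            last_word = tok
    return out


print(expand_maiyamok(["เด็ก", "ๆ", "เล่น"]))  # ['เด็ก', 'เด็ก', 'เล่น']
```

Expanding the mark before later processing lets each repeated word be handled like any other token.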
Contributors
Thanks to all the contributors.
If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.
This year marks PyThaiNLP's sixth year, and PyThaiNLP has more than one million downloads. I started developing PyThaiNLP to help me with Thai language processing tasks. Today, PyThaiNLP is used in research and projects worldwide. PyThaiNLP could not have grown without its contributors, sponsors, and users.
Thank you for all your support.
Thank you for using PyThaiNLP.
Wannaphong Phatthiyaphaibun
PyThaiNLP Founder
27 January 2022
- Python
Published by wannaphong over 4 years ago
pythainlp - PyThaiNLP v3.0.0-beta0
PyThaiNLP 3.0 has many improvements and new features to help with Thai language processing tasks. This release, PyThaiNLP v3.0.0-beta0, is the first beta release of PyThaiNLP 3.0.
Install with pip install pythainlp==3.0.0b0.
Documentation: https://pythainlp.github.io/dev-docs/index.html Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 3.0 change log #545
If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.
News
Starting with PyThaiNLP 3.0, we will drop support for Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.2. We have updated the dictionary and rules for newmm. If you use newmm for word tokenization in your model, we recommend retraining it.
What is new?
Deprecation and other API changes
- Deprecated `syllable_tokenize`: `syllable_tokenize` is deprecated; use `subword_tokenize` instead.
- `pythainlp.tag.named_entity.ThaiNameTagger` has moved to `pythainlp.tag.thainer.ThaiNameTagger`. The old class will be deprecated in PyThaiNLP 3.1.
Augment
- Add Thai Text Augmentation
Corpus
- Fix lots of misspellings in the dictionary (words_th.txt)
- Add `get_corpus_default_db` and the ThaiNER 1.5 model. You can now add a corpus to `default_db.json`, and the latest ThaiNER model no longer has to be downloaded from the Internet.
Tag
- Add tltk (POS tagging and NER): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- Add `NER` class: a class for named-entity recognition tasks.
Translate
- Add `pythainlp.translate.Translate` class
- Add Chinese-Thai Machine Translation
Tokenization
- Tokenize repeating dots and commas from numbers
- Fix tokenmaxlen bug that makes it always zero
- Tokenize repeating dots and commas from numbers (fix #461)
- Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
- Add SEFR CUT to pythainlp
- Add tltk (sentence and word tokenization): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- Add nlpo3
Transliterate
- Refactor Royin transliteration: avoid nested if blocks and simplify consonant replacement operations
- Manually merge update-royin branch with dev branch to add O-ANG rule
- Add tltk (G2P and IPA): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- Add pythainlp.transliterate.puan
Word Vector
- Fix tokenmaxlen bug that makes it always zero
- Add `pythainlp.word_vector.WordVector`
Spell
- Add more spelling engines
- Add tltk (spell): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
Generate
- Add pythainlp.generate
Tool
- Add misspell module
Other
- Add tltk: adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- Update requirements from ssg 0.0.6 to ssg 0.0.8
- Spoonerism: add support for words with more than three syllables
- Add `maiyamok`: a function for preprocessing Mai Yamok (ๆ) in a Thai sentence.
Contributors
Thanks to all the contributors.
If you want to contribute to PyThaiNLP, you can read Contributing to PyThaiNLP.
PyThaiNLP #ThaiNLP
- Python
Published by wannaphong over 4 years ago
pythainlp - PyThaiNLP v3.0.0-dev0
PyThaiNLP v3.0.0-dev0 is the first development release of PyThaiNLP 3.0 (for development only).
Docs: https://pythainlp.github.io/dev-docs/index.html Report bug: https://github.com/PyThaiNLP/pythainlp/issues GitHub: https://github.com/PyThaiNLP/pythainlp
News
Starting with PyThaiNLP 2.4, we will drop support for Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.1. We have updated the dictionary and rules for newmm. If you use newmm for word tokenization in your model, we recommend retraining it.
What is new?
Deprecation and other API changes
- #550 Deprecated `syllable_tokenize`: `syllable_tokenize` is deprecated; use `subword_tokenize` instead - https://github.com/PyThaiNLP/pythainlp/commit/701fb3a7842b3abd0b2318ba9074f1902c2f32e9
- `pythainlp.tag.named_entity.ThaiNameTagger` has moved to `pythainlp.tag.thainer.ThaiNameTagger`. The old class will be deprecated in PyThaiNLP 2.5.
Augment
- #580 Add Thai Text Augmentation
Corpus
- #557 Fix lots of misspellings in the dictionary (words_th.txt)
- #576 Add `get_corpus_default_db` and the ThaiNER 1.5 model. You can now add a corpus to `default_db.json`, and the latest ThaiNER model no longer has to be downloaded from the Internet.
Tag
- #599 Add tltk (POS tagging and NER): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- #600 Add `NER` class: a class for named-entity recognition tasks.
Translate
- #589 Add `pythainlp.translate.Translate` class
- #588 Add Chinese-Thai Machine Translation
Tokenization
- #562 Tokenize repeating dots and commas from numbers
- #585 Fix tokenmaxlen bug that makes it always zero
- #562 Tokenize repeating dots and commas from numbers (fix #461)
- #594 Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
- https://github.com/PyThaiNLP/pythainlp/commit/314411086707b60ba8790724301224916f4670b8 Add SEFR CUT to pythainlp
- #599 Add tltk (sentence and word tokenization): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- #622 Add nlpo3
Transliterate
- #566 Refactor Royin transliteration: avoid nested if blocks and simplify consonant replacement operations
- #585 Manually merge update-royin branch with dev branch to add O-ANG rule
- #599 Add tltk (G2P and IPA): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- #624 Add pythainlp.transliterate.puan
Word Vector
- #573 Fix tokenmaxlen bug that makes it always zero
- #583 Add `pythainlp.word_vector.WordVector`
Spell
- #591 Add more spelling engines
- #599 Add tltk (spell): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
Generate
- #579 Add pythainlp.generate
Tool
- #614 Add misspell module
Other
- #599 Add tltk: adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- https://github.com/PyThaiNLP/pythainlp/commit/e357cf8f9b626e3a633dc33b8557fe45dc837aba Update requirements from ssg 0.0.6 to ssg 0.0.8
- Spoonerism: add support for words with more than three syllables #631
- Add `maiyamok` #623: a function for preprocessing Mai Yamok (ๆ) in a Thai sentence.
- Python
Published by wannaphong over 4 years ago
pythainlp - PyThaiNLP v2.3.2 Release!
PyThaiNLP v2.3.2 is a bug fix release of PyThaiNLP 2.3.
Bug fixes
- Fixed clause_tokenize returning an empty list. #609
Documentation: https://pythainlp.github.io/docs/2.3/index.html Report bug: https://github.com/PyThaiNLP/pythainlp/issues
You can install or upgrade using pip install -U pythainlp
See PyThaiNLP 2.3 change log #445
- Python
Published by wannaphong almost 5 years ago
pythainlp - PyThaiNLP v2.4.0-dev0
PyThaiNLP v2.4.0-dev0 is the first development release of PyThaiNLP 2.4 (for development only).
Documentation: https://pythainlp.github.io/dev-docs/index.html Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.4 change log #545
News
Starting with PyThaiNLP 2.4, we will drop support for Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.1. We have updated the dictionary and rules for newmm. If you use newmm for word tokenization in your model, we recommend retraining it.
Deprecation and other API changes
- #550 Deprecated `syllable_tokenize`: `syllable_tokenize` is deprecated; use `subword_tokenize` instead - https://github.com/PyThaiNLP/pythainlp/commit/701fb3a7842b3abd0b2318ba9074f1902c2f32e9
- `pythainlp.tag.named_entity.ThaiNameTagger` has moved to `pythainlp.tag.thainer.ThaiNameTagger`. The old class will be deprecated in PyThaiNLP 2.5.
Augment
- #580 Add Thai Text Augmentation
Corpus
- #557 Fix lots of misspellings in the dictionary (words_th.txt)
- #576 Add `get_corpus_default_db` and the ThaiNER 1.5 model. You can now add a corpus to `default_db.json`, and the latest ThaiNER model no longer has to be downloaded from the Internet.
Tag
- #599 Add tltk (POS tagging and NER): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- #600 Add `NER` class: a class for named-entity recognition tasks.
Translate
- #589 Add `pythainlp.translate.Translate` class
- #588 Add Chinese-Thai Machine Translation
Tokenization
- #562 Tokenize repeating dots and commas from numbers
- #585 Fix tokenmaxlen bug that makes it always zero
- #562 Tokenize repeating dots and commas from numbers (fix #461)
- #594 Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
- https://github.com/PyThaiNLP/pythainlp/commit/314411086707b60ba8790724301224916f4670b8 Add SEFR CUT to pythainlp
- #599 Add tltk (sentence and word tokenization): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
Transliterate
- #566 Refactor Royin transliteration: avoid nested if blocks and simplify consonant replacement operations
- #585 Manually merge update-royin branch with dev branch to add O-ANG rule
- #599 Add tltk (G2P and IPA): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
Word Vector
- #573 Fix tokenmaxlen bug that makes it always zero
- #583 Add `pythainlp.word_vector.WordVector`
Spell
- #591 Add more spelling engines
- #599 Add tltk (spell): adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
Generate
- #579 Add pythainlp.generate
Other
- #599 Add tltk: adds a tltk wrapper to PyThaiNLP functions such as NER, word_tokenize, and more.
- Python
Published by wannaphong almost 5 years ago
pythainlp - PyThaiNLP v2.3.1 Release!
PyThaiNLP v2.3.1 is a bug fix release of PyThaiNLP 2.3.
Bug fixes
- Fix gensim #546
Documentation: https://pythainlp.github.io/docs/2.3/index.html Report bug: https://github.com/PyThaiNLP/pythainlp/issues
You can install or upgrade using pip install -U pythainlp
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, set `version` in the ThaiNameTagger class:
`pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')` (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for `attacut.tokenize()`
- #502 Add: `corpus.util.revise_wordset()` to revise the tokenization dictionary
- #503 Add: `NERCut` tokenization engine
Corpus
- License change:
- All corpora, datasets, and documentation created by the PyThaiNLP project are now released under the Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
- All language models created by the PyThaiNLP project are released under the Creative Commons Attribution 4.0 International Public License (CC BY).
- #449 Fix: remove instances with `[` or `]` from etcc.txt
- #467 Add: `corpus.common.provinces()` can now return romanized names
- #476 Add: `thai_family_names()` to get a set of Thai family names
- #487 Fix: `thailand_provinces_th.csv` not found issue
- #492 Fix: remove erroneous `AITT` tag from the ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: `LST20` language model for part-of-speech tagging
- #468 Add: port `PerceptronTagger` from NLTK. POS tagging no longer depends on NLTK.
- #478 Update: ORCHID POS tags documentation
Named Entity Tagging
- #526 Update ThaiNER 1.4 to ThaiNER 1.5
- #538 Add ThaiNameTagger version and add ThaiNER 1.4 support
Transliterate
- #485 Fixed romanization failures in some examples
- #511 Add Thai W2P (Thai Word-to-Phoneme converter)
Text Summarize
- #523 Add mT5 text summarization to `pythainlp.summarize`
Chunk parser
- #524 Add `pythainlp.tag.chunk`
Util
- #481 Fix: `remove_repeat_vowels()` bug that removes spaces between different vowels
- #483 Add: `remove()` method to remove a word from a trie -- thanks @korakot
- #490 Fix: `thai_strftime()` - normalize output for unsupported directives (running on glibc and musl should produce the same output)
- #512 Add: `emoji_to_thai()` to convert emoji to Thai descriptions -- thanks @ppirch for the development
- #513 Add: `thai_keyboard_dist()` to calculate the Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
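A keyboard distance of this kind treats each key as a point on a grid and measures the straight-line (Euclidean) distance between two keys. Such a measure could help weight likely typos, since neighboring keys are easy to confuse.

Below is a minimal sketch using a hand-entered, partial Kedmanee layout. It covers two rows only and ignores the physical stagger between rows. The coordinates and `keyboard_dist` are illustrative assumptions, not PyThaiNLP's `thai_keyboard_dist` data or code.

```python
import math
from typing import Dict, Tuple

# Assumed (row, column) positions for a few Kedmanee keys; the physical
# stagger between keyboard rows is ignored.
KEY_POS: Dict[str, Tuple[int, int]] = {
    "ๆ": (0, 0), "ไ": (0, 1), "ำ": (0, 2), "พ": (0, 3),  # Q W E R
    "ฟ": (1, 0), "ห": (1, 1), "ก": (1, 2), "ด": (1, 3),  # A S D F
}


def keyboard_dist(a: str, b: str) -> float:
    """Euclidean distance between the grid positions of two keys."""
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(r1 - r2, c1 - c2)


print(keyboard_dist("ฟ", "ด"))  # 3.0
```

Characters missing from the table raise `KeyError` here. A full implementation would need the complete layout, including shifted characters.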
Thanks to all the contributors.
We build Thai NLP.
PyThaiNLP
- Python
Published by wannaphong about 5 years ago
pythainlp - PyThaiNLP v2.3.1-dev0
PyThaiNLP v2.3.1-dev0 is the development release of PyThaiNLP 2.3.1 (for development only).
Bug fixes
- Fix gensim #546
Documentation: https://pythainlp.github.io/dev-docs/index.html Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.3 change log #445
- Python
Published by wannaphong about 5 years ago
pythainlp - PyThaiNLP v2.3.0 Release!
PyThaiNLP v2.3.0 is the production release of PyThaiNLP 2.3.
Documentation: https://pythainlp.github.io/docs/2.3/index.html Report bug: https://github.com/PyThaiNLP/pythainlp/issues
You can install or upgrade using pip install -U pythainlp
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, set `version` in the ThaiNameTagger class:
`pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')` (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for `attacut.tokenize()`
- #502 Add: `corpus.util.revise_wordset()` to revise the tokenization dictionary
- #503 Add: `NERCut` tokenization engine
Corpus
- License change:
- All corpora, datasets, and documentation created by the PyThaiNLP project are now released under the Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
- All language models created by the PyThaiNLP project are released under the Creative Commons Attribution 4.0 International Public License (CC BY).
- #449 Fix: remove instances with `[` or `]` from etcc.txt
- #467 Add: `corpus.common.provinces()` can now return romanized names
- #476 Add: `thai_family_names()` to get a set of Thai family names
- #487 Fix: `thailand_provinces_th.csv` not found issue
- #492 Fix: remove erroneous `AITT` tag from the ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: `LST20` language model for part-of-speech tagging
- #468 Add: port `PerceptronTagger` from NLTK. POS tagging no longer depends on NLTK.
- #478 Update: ORCHID POS tags documentation
Named Entity Tagging
- #526 Update ThaiNER 1.4 to ThaiNER 1.5
- #538 Add ThaiNameTagger version and add ThaiNER 1.4 support
Transliterate
- #485 Fixed romanization failures in some examples
- #511 Add Thai W2P (Thai Word-to-Phoneme converter)
Text Summarize
- #523 Add mT5 text summarization to `pythainlp.summarize`
Chunk parser
- #524 Add `pythainlp.tag.chunk`
Util
- #481 Fix: `remove_repeat_vowels()` bug that removes spaces between different vowels
- #483 Add: `remove()` method to remove a word from a trie -- thanks @korakot
- #490 Fix: `thai_strftime()` - normalize output for unsupported directives (running on glibc and musl should produce the same output)
- #512 Add: `emoji_to_thai()` to convert emoji to Thai descriptions -- thanks @ppirch for the development
- #513 Add: `thai_keyboard_dist()` to calculate the Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
Thanks to all the contributors.
We build Thai NLP.
PyThaiNLP
- Python
Published by wannaphong about 5 years ago
pythainlp - PyThaiNLP v2.3.0-beta1
PyThaiNLP v2.3.0-beta1 is the first beta release of PyThaiNLP 2.3.
Documentation: https://pythainlp.github.io/dev-docs/index.html Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, set `version` in the ThaiNameTagger class:
`pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')` (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for `attacut.tokenize()`
- #502 Add: `corpus.util.revise_wordset()` to revise the tokenization dictionary
- #503 Add: `NERCut` tokenization engine
Corpus
- License change:
- All corpora, datasets, and documentation created by the PyThaiNLP project are now released under the Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
- All language models created by the PyThaiNLP project are released under the Creative Commons Attribution 4.0 International Public License (CC BY).
- #449 Fix: remove instances with `[` or `]` from etcc.txt
- #467 Add: `corpus.common.provinces()` can now return romanized names
- #476 Add: `thai_family_names()` to get a set of Thai family names
- #487 Fix: `thailand_provinces_th.csv` not found issue
- #492 Fix: remove erroneous `AITT` tag from the ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: `LST20` language model for part-of-speech tagging
- #468 Add: port `PerceptronTagger` from NLTK. POS tagging no longer depends on NLTK.
- #478 Update: ORCHID POS tags documentation
Named Entity Tagging
- #526 Update ThaiNER 1.4 to ThaiNER 1.5
- #538 Add ThaiNameTagger version and add ThaiNER 1.4 support
Transliterate
- #485 Fixed romanization failures in some examples
- #511 Add Thai W2P (Thai Word-to-Phoneme converter)
Text Summarize
- #523 Add mT5 text summarization to `pythainlp.summarize`
Chunk parser
- #524 Add `pythainlp.tag.chunk`
Util
- #481 Fix: `remove_repeat_vowels()` bug that removes spaces between different vowels
- #483 Add: `remove()` method to remove a word from a trie -- thanks @korakot
- #490 Fix: `thai_strftime()` - normalize output for unsupported directives (running on glibc and musl should produce the same output)
- #512 Add: `emoji_to_thai()` to convert emoji to Thai descriptions -- thanks @ppirch for the development
- #513 Add: `thai_keyboard_dist()` to calculate the Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
Links
- Website: https://pythainlp.github.io
- Docs: https://pythainlp.github.io/dev-docs/
- GitHub: https://github.com/PyThaiNLP/pythainlp
- Issues: https://github.com/PyThaiNLP/pythainlp/issues
Thanks to all the contributors.
We build Thai NLP.
PyThaiNLP
- Python
Published by wannaphong about 5 years ago
pythainlp - PyThaiNLP v2.3.0-dev1
PyThaiNLP v2.3.0-dev1 is a development release of PyThaiNLP 2.3 (for development only).
Documentation: https://pythainlp.github.io/dev-docs/index.html Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, set `version` in the ThaiNameTagger class:
`pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')` (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for `attacut.tokenize()`
- #502 Add: `corpus.util.revise_wordset()` to revise the tokenization dictionary
- #503 Add: `NERCut` tokenization engine
Corpus
- License change:
- All corpora, datasets, and documentation created by the PyThaiNLP project are now released under the Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
- All language models created by the PyThaiNLP project are released under the Creative Commons Attribution 4.0 International Public License (CC BY).
- #449 Fix: remove instances with `[` or `]` from etcc.txt
- #467 Add: `corpus.common.provinces()` can now return romanized names
- #476 Add: `thai_family_names()` to get a set of Thai family names
- #487 Fix: `thailand_provinces_th.csv` not found issue
- #492 Fix: remove erroneous `AITT` tag from the ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: `LST20` language model for part-of-speech tagging
- #468 Add: port `PerceptronTagger` from NLTK. POS tagging no longer depends on NLTK.
- #478 Update: ORCHID POS tags documentation
Named Entity Tagging
- #526 Update ThaiNER 1.4 to ThaiNER 1.5
- #538 Add ThaiNameTagger version and add ThaiNER 1.4 support
Transliterate
- #485 Fixed romanization failures in some examples
- #511 Add Thai W2P (Thai Word-to-Phoneme converter)
Text Summarize
- #523 Add mT5 text summarization to `pythainlp.summarize`
Chunk parser
- #524 Add `pythainlp.tag.chunk`
Util
- #481 Fix: `remove_repeat_vowels()` bug that removes spaces between different vowels
- #483 Add: `remove()` method to remove a word from a trie -- thanks @korakot
- #490 Fix: `thai_strftime()` - normalize output for unsupported directives (running on glibc and musl should produce the same output)
- #512 Add: `emoji_to_thai()` to convert emoji to Thai descriptions -- thanks @ppirch for the development
- #513 Add: `thai_keyboard_dist()` to calculate the Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
- Python
Published by wannaphong about 5 years ago
pythainlp - v2.3.0-dev0
PyThaiNLP v2.3.0-dev0 is the first development release of PyThaiNLP 2.3 (for development only).
Documentation: https://pythainlp.github.io/dev-docs/index.html Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, set `version` in the ThaiNameTagger class:
`pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')` (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for `attacut.tokenize()`
- #502 Add: `corpus.util.revise_wordset()` to revise the tokenization dictionary
- #503 Add: `NERCut` tokenization engine
Corpus
- License change:
- All corpora, datasets, and documentation created by the PyThaiNLP project are now released under the Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
- All language models created by the PyThaiNLP project are released under the Creative Commons Attribution 4.0 International Public License (CC BY).
- #449 Fix: remove instances with `[` or `]` from etcc.txt
- #467 Add: `corpus.common.provinces()` can now return romanized names
- #476 Add: `thai_family_names()` to get a set of Thai family names
- #487 Fix: `thailand_provinces_th.csv` not found issue
- #492 Fix: remove erroneous `AITT` tag from the ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: `LST20` language model for part-of-speech tagging
- #468 Add: port `PerceptronTagger` from NLTK. POS tagging no longer depends on NLTK.
- #478 Update: ORCHID POS tags documentation
Named Entity Tagging
- #526 Update ThaiNER 1.4 to ThaiNER 1.5
- #538 Add ThaiNameTagger version and add ThaiNER 1.4 support
Transliterate
- #485 Fixed romanization failures in some examples
- #511 Add Thai W2P (Thai Word-to-Phoneme converter)
Text Summarize
- #523 Add mT5 text summarization to pythainlp.summarize
Chunk parser
- #524 Add pythainlp.tag.chunk
Util
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
- #483 Add: remove() method to remove a word from a trie -- thanks @korakot
- #490 Fix: thai_strftime() - normalize output for unsupported directives (running on glibc and musl should produce the same output)
- #512 Add: emoji_to_thai() to convert emoji to Thai descriptions -- thanks @ppirch for the development
- #513 Add: thai_keyboard_dist() to calculate the Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
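A trie that supports word removal (#483) has to do more than clear the end-of-word flag. It should also prune branches that no longer lead to any word. The sketch below is a minimal character trie written on that assumption. The class names Trie and TrieNode are hypothetical here, and this is not PyThaiNLP's own Trie class.

```python
from typing import Dict, Iterable, List, Tuple


class TrieNode:
    __slots__ = ("children", "end")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.end = False  # True if a dictionary word ends at this node


class Trie:
    """Character trie with add/remove (hypothetical sketch)."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.end = True

    def __contains__(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.end

    def remove(self, word: str) -> None:
        """Remove word if present, pruning nodes that no other word needs."""
        path: List[Tuple[TrieNode, str]] = []
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                return  # word is not in the trie
            path.append((node, ch))
            node = child
        if not node.end:
            return
        node.end = False
        # Walk back towards the root, deleting nodes that now lead nowhere.
        for parent, ch in reversed(path):
            child = parent.children[ch]
            if child.end or child.children:
                break
            del parent.children[ch]

    def prefixes(self, text: str, start: int = 0) -> List[str]:
        """Return every dictionary word that is a prefix of text[start:]."""
        found: List[str] = []
        node = self.root
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.end:
                found.append(text[start : i + 1])
        return found
```

A tokenizer typically calls prefixes() at each position to list candidate words. When a word is removed, longer words that share its prefix survive. For example, after removing กิน, the word กินข้าว is still found.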
- Python
Published by wannaphong about 5 years ago
pythainlp - PyThaiNLP 2.2.6
PyThaiNLP 2.2.6 Released!
This release is a bug fix release.
- Update pythainlp.tag docs #492
- thai_strftime: Normalize output for unsupported directive #490
- Port pickle to JSON and add the LST20 POS tag model to pythainlp.corpus #488
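The thai_strftime fix (#490) keeps output for unsupported directives the same on every C library. The hypothetical mini formatter below illustrates that idea. It is not PyThaiNLP's thai_strftime(). It expands only %d, %B (Thai month name), %Y (Buddhist Era year, CE + 543) and %%. Every other directive is emitted verbatim and is never handed to the platform's strftime().

```python
import datetime
import re

# Thai month names, January to December.
THAI_MONTHS = [
    "มกราคม", "กุมภาพันธ์", "มีนาคม", "เมษายน", "พฤษภาคม", "มิถุนายน",
    "กรกฎาคม", "สิงหาคม", "กันยายน", "ตุลาคม", "พฤศจิกายน", "ธันวาคม",
]


def thai_date_format(d: datetime.date, fmt: str) -> str:
    """Format a date in Thai (hypothetical sketch, not thai_strftime())."""

    def expand(match: re.Match) -> str:
        directive = match.group(1)
        if directive == "d":
            return f"{d.day:02d}"
        if directive == "B":
            return THAI_MONTHS[d.month - 1]
        if directive == "Y":
            return str(d.year + 543)  # Buddhist Era year
        if directive == "%":
            return "%"
        # Unsupported directive: keep it as-is instead of delegating to the
        # C library, so glibc and musl systems produce identical output.
        return match.group(0)

    return re.sub(r"%(.)", expand, fmt)
```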
Thanks to the following contributors to 2.2.6: @c4n
Thanks to other contributors listed here: https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md
You can install or upgrade using pip install -U pythainlp
- GitHub Releases: https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.2.6
- Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP PyThaiNLP Team
- Python
Published by wannaphong over 5 years ago
pythainlp - PyThaiNLP 2.2.5
PyThaiNLP 2.2.5 Released!
This release is a bug fix release.
- Fix: file not found for pythainlp.corpus #486
https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.2.5
You can install or upgrade using pip install -U pythainlp
Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
- Python
Published by wannaphong over 5 years ago
pythainlp - PyThaiNLP 2.2.4
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
- Python
Published by bact over 5 years ago
pythainlp - PyThaiNLP 2.2.3
This release is a bug fix release.
- Fix crfcut: the last segment was not included if it was not predicted as end-of-sentence #459
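The crfcut bug comes from the step that groups tokens using per-token end-of-sentence predictions. If the last tokens are never labelled as a sentence end, they must still be emitted as a final sentence. The helper below is a hypothetical sketch of that grouping step. It assumes one boolean label per token and is not CRFCut's code.

```python
from typing import List, Sequence


def group_sentences(tokens: Sequence[str], is_end: Sequence[bool]) -> List[str]:
    """Join tokens into sentences using end-of-sentence labels (hypothetical helper)."""
    if len(tokens) != len(is_end):
        raise ValueError("tokens and is_end must have the same length")
    sentences: List[str] = []
    current: List[str] = []
    for token, end in zip(tokens, is_end):
        current.append(token)
        if end:
            sentences.append("".join(current))
            current = []
    # Flush trailing tokens even though no end-of-sentence label closed them;
    # skipping this step silently drops the last segment.
    if current:
        sentences.append("".join(current))
    return sentences
```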
Installation
- You can install or upgrade using pip install -U pythainlp
More information
- Change log: https://github.com/PyThaiNLP/pythainlp/issues/330
- Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
- Python
Published by wannaphong almost 6 years ago
pythainlp - PyThaiNLP 2.2.2
This release is a bug fix release.
- Remove entries with "[" or "]" from etcc.txt #449
- Update license information:
- All corpora, datasets, and documentation created by PyThaiNLP project are now released under Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
- All language models created by PyThaiNLP project are released under Creative Commons Attribution 4.0 International Public License (CC-by).
- For more information about corpora and models created by PyThaiNLP project, see PyThaiNLP Corpus.
- For other corpora and models that may be included with the PyThaiNLP distribution, please consult Corpus License.
Installation
- You can install or upgrade using pip install -U pythainlp
More information
- Change log: https://github.com/PyThaiNLP/pythainlp/issues/330
- Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
- Python
Published by bact almost 6 years ago
pythainlp - PyThaiNLP 2.2.1
This release is a bug fix release.
- Fix %O modifier for thai_strftime() #441
- Fix db.json #442
Installation
- You can install or upgrade using pip install -U pythainlp
More information
- Change log: https://github.com/PyThaiNLP/pythainlp/issues/330
- Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
- Python
Published by wannaphong almost 6 years ago
pythainlp - PyThaiNLP 2.2.0
English
Hello World. Today, we're happy to announce the availability of PyThaiNLP 2.2. It has been four years since PyThaiNLP's first release. Thank you very much for supporting PyThaiNLP.
Summary – Release Highlights
New Features
Tokenizer
- Fix the longest engine: the last character is now consumed
- Add CRFCut sentence segmentation
Transliteration
- Add Thai Grapheme-to-Phoneme (Thai G2P) deep learning sequence-to-sequence model
Normalization
- Add more normalize functions, like remove zero-width characters, remove duplicate spaces, etc.
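Two of the normalization steps named above, removing zero-width characters and collapsing duplicate spaces, come down to a regular-expression substitution each. The sketch below assumes a particular zero-width set (ZWSP, ZWNJ, ZWJ, WORD JOINER, BOM). It collapses only runs of ASCII spaces. The function names are hypothetical and are not PyThaiNLP's.

```python
import re

# Assumed set of zero-width characters: U+200B, U+200C, U+200D, U+2060, U+FEFF.
_ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")
# Two or more consecutive ASCII spaces.
_DUP_SPACES = re.compile(r" {2,}")


def remove_zero_width(text: str) -> str:
    """Delete invisible zero-width characters (hypothetical helper)."""
    return _ZERO_WIDTH.sub("", text)


def collapse_spaces(text: str) -> str:
    """Replace runs of spaces with a single space (hypothetical helper)."""
    return _DUP_SPACES.sub(" ", text)
```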
Utilities
- Add thaiword_to_date() and thaiword_to_time()
- Fix countthai() to handle a case where the text has only numbers and symbols
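The countthai() fix covers text that contains only numbers and symbols. In that case nothing is left to count once those characters are ignored. The hypothetical sketch below shows the guard. It ignores whitespace, digits and ASCII punctuation by assumption, counts characters in the Thai Unicode block (U+0E00 to U+0E7F), and returns 0.0 instead of dividing by zero. It is not PyThaiNLP's countthai() itself.

```python
import string

# Assumed default: characters that are neither counted as Thai nor as non-Thai.
_DEFAULT_IGNORE = string.whitespace + string.digits + string.punctuation


def thai_char_percentage(text: str, ignore_chars: str = _DEFAULT_IGNORE) -> float:
    """Percentage of Thai characters among non-ignored characters (hypothetical helper)."""
    considered = [ch for ch in text if ch not in ignore_chars]
    if not considered:
        # Only digits/symbols/whitespace: report 0% instead of dividing by zero.
        return 0.0
    thai = sum(1 for ch in considered if "\u0e00" <= ch <= "\u0e7f")
    return 100.0 * thai / len(considered)
```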
Command line
- Update command and sub-command syntax - see command line docs
Others
- Code improvement: move non-init code out of __init__.py files, etc.
- Remove dependency: Unigram POS tagger no longer needs the NLTK module
Installation
You can install or upgrade using pip install -U pythainlp
Change log: https://github.com/PyThaiNLP/pythainlp/issues/330
Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
Thai
Hello world. Today, 24 June 2020, we released PyThaiNLP 2.2. PyThaiNLP is now 4 years old. Thank you for using PyThaiNLP :)
Summary – Highlights
New Features
Tokenizer
- Fix the longest word tokenizer
- Add the CRFCut sentence segmenter
Transliteration
- Add transcription of Thai into IPA with Thai Grapheme-to-Phoneme (Thai G2P)
Normalization
- Add more capabilities to the normalize functions, such as removing duplicate spaces
Utilities
- Add thaiword_to_date() and thaiword_to_time()
- Improve countthai()
Command line
- Update command and sub-command syntax - see command line docs
Others
- Code improvement: move code out of __init__.py files, etc.
- Fewer external library requirements: the Unigram POS tagger works without NLTK
Installation
You can install or upgrade using the command pip install -U pythainlp
Change log: https://github.com/PyThaiNLP/pythainlp/issues/330
Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
- Python
Published by wannaphong almost 6 years ago
pythainlp - PyThaiNLP 2.2.0-beta1
This is the first beta version of PyThaiNLP 2.2.
Installation
pip install --pre pythainlp
PyThaiNLP 2.2 change log #330
Documentation : https://www.thainlp.org/pythainlp/docs/dev/
Report bug : https://github.com/PyThaiNLP/pythainlp/issues
We build Thai NLP.
PyThaiNLP Team
- Python
Published by wannaphong almost 6 years ago
pythainlp - PyThaiNLP 2.2.0-dev1
Development version, for developers only
PyThaiNLP 2.2 change log #330
Documentation : https://www.thainlp.org/pythainlp/docs/dev/
- Python
Published by wannaphong about 6 years ago
pythainlp - PyThaiNLP 2.2.0-dev0
Development version, for developers only
PyThaiNLP 2.2 change log #330
Documentation : https://www.thainlp.org/pythainlp/docs/dev/
- Python
Published by wannaphong about 6 years ago
pythainlp - PyThaiNLP 2.1.4
This release is a bug fix release.
- Remove NumPy and pandas requirements from base install (#353)
- Fix longest matching bug (fail when the entire input text is a full word) (#357)
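Both longest-matching fixes concern the edges of the input: the whole text being one word, and the last character being consumed. The sketch below is a minimal greedy longest-matching tokenizer that handles both cases. It assumes a plain set of words and falls back to a single character when no word matches. The function name is hypothetical, and this is not PyThaiNLP's "longest" engine.

```python
from typing import AbstractSet, List


def longest_match_tokenize(text: str, words: AbstractSet[str]) -> List[str]:
    """Greedy longest-matching segmentation (hypothetical sketch)."""
    max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; the upper bound may reach the end
        # of the text, so a word that fills the entire input is found.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on,
            # which also guarantees the final character is consumed.
            tokens.append(text[i])
            i += 1
    return tokens
```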
- Python
Published by bact over 6 years ago
pythainlp - PyThaiNLP 2.1.3
This release is a bug fix release.
numtoword: number to Thai word (#350)
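Spelling a number out in Thai follows a small set of place-value rules. A 1 in the tens place is read สิบ and a 2 is read ยี่สิบ. A trailing 1 in a multi-digit group becomes เอ็ด. Groups of six digits repeat with ล้าน (million). The sketch below shows those rules. It is hypothetical and is not the numtoword module's code. It handles non-negative integers only, and it always uses เอ็ด for a trailing 1 in a multi-digit group, although usage varies for numbers such as 101.

```python
DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
PLACES = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน"]  # units .. hundred-thousands


def _below_million(n: int) -> str:
    """Spell out 0 < n < 1,000,000."""
    digits = str(n)
    words = []
    for i, ch in enumerate(digits):
        d = int(ch)
        place = len(digits) - i - 1
        if d == 0:
            continue
        if place == 1 and d == 1:
            words.append("สิบ")  # 10 is สิบ, not หนึ่งสิบ
        elif place == 1 and d == 2:
            words.append("ยี่สิบ")  # 20 is ยี่สิบ, not สองสิบ
        elif place == 0 and d == 1 and len(digits) > 1:
            words.append("เอ็ด")  # 11, 21, ... end in เอ็ด
        else:
            words.append(DIGITS[d] + PLACES[place])
    return "".join(words)


def num_to_thai_words(n: int) -> str:
    """Spell out a non-negative integer in Thai (hypothetical helper)."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n == 0:
        return DIGITS[0]
    millions, rest = divmod(n, 1_000_000)
    text = num_to_thai_words(millions) + "ล้าน" if millions else ""
    if rest:
        text += _below_million(rest)
    return text
```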
Installation
You can install or upgrade using pip install -U pythainlp
Change log: https://github.com/PyThaiNLP/pythainlp/issues/181
Documentation: https://www.thainlp.org/pythainlp/docs/2.1/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
- Python
Published by wannaphong over 6 years ago
pythainlp - PyThaiNLP 2.1.2
This release is a bug fix release.
thainer: fixed incorrect HTML-like output (#346)
Installation
You can install or upgrade using pip install -U pythainlp
Change log: https://github.com/PyThaiNLP/pythainlp/issues/181
Documentation: https://www.thainlp.org/pythainlp/docs/2.1/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
- Python
Published by wannaphong over 6 years ago
pythainlp - PyThaiNLP 2.1.1
This release is a bug fix release.
newmm word tokenizer: add a graph size limit in _onecut() to avoid long waits on ambiguous text (#333)
Installation
You can install or upgrade using pip install -U pythainlp
Change log: https://github.com/PyThaiNLP/pythainlp/issues/181
Documentation: https://www.thainlp.org/pythainlp/docs/2.1/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
- Python
Published by wannaphong over 6 years ago
pythainlp - PyThaiNLP 2.1
English
Hello World. Today, we're happy to announce the availability of PyThaiNLP 2.1. Since the project moved to GitHub, we have recorded over 197,000 downloads -- thank you for using PyThaiNLP.
Summary – Release Highlights
New Features
Tokenizer
- AttaCut, a fast and accurate tokenizer, is now available through engine="attacut" in pythainlp.tokenize.word_tokenize(). Read more about AttaCut implementation at https://arxiv.org/abs/1911.07056, as presented at New in ML Workshop, NeurIPS 2019.
- ssg, a syllable segmenter, is now available through engine="ssg" in pythainlp.tokenize.subword_tokenize()
- Tokenization benchmark
Corpus
- Add corpora of Thai female and male names
- Add PYTHAINLP_DATA_DIR environment variable to set the location of downloaded data
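An environment-variable override for the data directory usually works like this: use the variable if it is set and non-empty, otherwise fall back to a default location. The sketch below assumes the default is a pythainlp-data folder in the user's home directory. The function name resolve_data_dir is hypothetical, and PyThaiNLP's actual lookup may differ. The environment mapping is passed in as a parameter so the logic is easy to test.

```python
import os
from typing import Mapping


def resolve_data_dir(
    environ: Mapping[str, str] = os.environ,
    env_var: str = "PYTHAINLP_DATA_DIR",
    default_name: str = "pythainlp-data",
) -> str:
    """Return the directory for downloaded data (hypothetical helper)."""
    path = environ.get(env_var)
    if path:
        # An explicit setting wins; expand "~" so users can write ~/somewhere.
        return os.path.expanduser(path)
    # Assumed default location: ~/pythainlp-data
    return os.path.join(os.path.expanduser("~"), default_name)
```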
Named-Entity Tagger
- Add HTML-like tag in output
Localization
- New function: pythainlp.util.thai_time, which spells out time in Thai words
Other improvements
- Remove and update many dependencies
- Remove marisa-trie from pythainlp
- Updated tutorial notebooks and documentation
- Better command-line interface
Installation
You can install or upgrade using pip install -U pythainlp
Change log: https://github.com/PyThaiNLP/pythainlp/issues/181
Documentation: https://www.thainlp.org/pythainlp/docs/2.1/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
Thai
Hello world. Today, 10 December 2019, we released PyThaiNLP 2.1. PyThaiNLP has now been downloaded more than 197,000 times. Thank you for using PyThaiNLP.
Summary – Highlights
New Features
Tokenizer
- Add AttaCut, a fast and accurate word tokenizer, available through engine="attacut" in pythainlp.tokenize.word_tokenize(). Read how AttaCut works, as presented at New in ML Workshop, NeurIPS 2019, at https://arxiv.org/abs/1911.07056
- Add ssg, a CRF-based syllable segmenter, available through engine="ssg" in pythainlp.tokenize.subword_tokenize()
- Tokenizer benchmark
Corpus
- Add corpora of female and male names
- Add PYTHAINLP_DATA_DIR environment variable for setting where model data is downloaded
Named-Entity Tagger
- Add HTML-like tags around text that contains names
Localization
- Add pythainlp.util.thai_time for converting time into Thai words
Other improvements
- Remove and update many libraries
- Remove marisa-trie from pythainlp
- Improve tutorial notebooks and documentation
- Improve the command-line interface
Installation
You can install or upgrade using the command pip install -U pythainlp
Change log: https://github.com/PyThaiNLP/pythainlp/issues/181
Documentation: https://www.thainlp.org/pythainlp/docs/2.1/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
- Python
Published by wannaphong over 6 years ago
pythainlp - PyThaiNLP 2.1.dev8
Everyone is invited to help test PyThaiNLP 2.1dev8. PyThaiNLP 2.1dev is a developer release for testing before the final release. PyThaiNLP 2.1 will have the following new features:
New features
- Add pythainlp.benchmarks for benchmarking Thai word tokenization
- Add pythainlp.util.thai_time for converting time into Thai words, e.g. 8:17 becomes แปดนาฬิกาสิบเจ็ดนาที (24-hour clock) or แปดโมงสิบเจ็ดนาที (6-hour clock)
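The 24-hour reading in the thai_time example follows the pattern hour + นาฬิกา + minute + นาที. The sketch below reproduces that pattern. The function name thai_time_24h is hypothetical, and this is not pythainlp.util.thai_time. It does not cover the 6-hour clock. As a design choice, it omits the minute part when the minute is zero.

```python
DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]


def two_digit_thai(n: int) -> str:
    """Spell out 0 <= n < 100 in Thai words."""
    tens, units = divmod(n, 10)
    if tens == 0:
        return DIGITS[units]
    if tens == 1:
        words = "สิบ"
    elif tens == 2:
        words = "ยี่สิบ"
    else:
        words = DIGITS[tens] + "สิบ"
    if units == 1:
        words += "เอ็ด"  # 11, 21, 31, ... end in เอ็ด
    elif units:
        words += DIGITS[units]
    return words


def thai_time_24h(hour: int, minute: int) -> str:
    """24-hour Thai reading of a time (hypothetical helper)."""
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("hour must be 0-23 and minute 0-59")
    text = two_digit_thai(hour) + "นาฬิกา"
    if minute:
        text += two_digit_thai(minute) + "นาที"
    return text
```

With these rules, thai_time_24h(8, 17) gives the reading from the example above, แปดนาฬิกาสิบเจ็ดนาที.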
Word tokenization
- Add ssg as part of Thai syllable tokenization
- Add the attacut word tokenizer, a deep-learning tokenizer built to address the speed of Thai word tokenization
- Add "newmm-safe" to handle ambiguous text, or text that takes abnormally long to tokenize, such as "หน้าด้านหน้าด้านหน้าด้านหน้าด้านหน้าด้าน"
- Improve the dictionary used for word tokenization
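Repeated text such as "หน้าด้านหน้าด้าน..." is a problem for "newmm-safe" to guard against because the number of possible segmentations grows exponentially. A rough illustration uses a tiny hypothetical four-word dictionary and counts the segmentations with dynamic programming. Every adjacent pair of หน้า/ด้าน is also a word in this dictionary, so the count follows the Fibonacci sequence. This only counts paths. It is not how newmm or newmm-safe is implemented.

```python
from typing import AbstractSet


def count_segmentations(text: str, words: AbstractSet[str]) -> int:
    """Count the ways text can be split entirely into dictionary words."""
    max_len = max(len(w) for w in words)
    ways = [0] * (len(text) + 1)
    ways[0] = 1  # one way to segment the empty prefix
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            if ways[start] and text[start:end] in words:
                ways[end] += ways[start]
    return ways[-1]


# Hypothetical dictionary for illustration only.
WORDS = {"หน้า", "ด้าน", "หน้าด้าน", "ด้านหน้า"}
```

With five repetitions, count_segmentations("หน้าด้าน" * 5, WORDS) is 89. Each extra repetition multiplies the count by roughly 2.6, which is why a bound on the search is needed.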
Model updates
- The new version of thai2rom runs on PyTorch instead of TF and uses much less RAM than before
- ThaiNER 1.3, the latest version (ThaiNER) HTML -> SGML, can now emit output as HTML tags, e.g. 'วันที่
Refactoring
- Remove marisa-trie from PyThaiNLP, so installing PyThaiNLP no longer runs into installation problems (@korakot wrote a Trie in Python)
- Remove fastai from the dependencies used in pythainlp.ulmfit
- Clean up code and add a test suite, with over 90% coverage on Coveralls
- Add MD5 checksums for models downloaded through pythainlp
- Support easily changing the location of pythainlp-data by setting the environment variable PYTHAINLP_DATA_DIR to the desired path
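Checking a downloaded model against an MD5 checksum means hashing the file, ideally in chunks, and comparing hex digests. The helpers below are a hypothetical sketch, not PyThaiNLP's downloader code. MD5 catches corrupted or truncated downloads, but it does not protect against deliberate tampering.

```python
import hashlib
from typing import Iterable


def md5_hexdigest(chunks: Iterable[bytes]) -> str:
    """MD5 hex digest of a stream of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large models are not read into memory at once."""
    with open(path, "rb") as f:
        return md5_hexdigest(iter(lambda: f.read(chunk_size), b""))


def verify_download(path: str, expected_md5: str) -> bool:
    """True if the file's MD5 matches the expected hex checksum (hypothetical helper)."""
    return file_md5(path) == expected_md5.strip().lower()
```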
See the PyThaiNLP 2.1 changes at https://github.com/PyThaiNLP/pythainlp/issues/181
You can try it using the command
pip install -U --pre pythainlp
Important links
- API documentation: https://www.thainlp.org/pythainlp/docs/dev/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- Report bugs, ask questions about PyThaiNLP, and report problems at https://github.com/PyThaiNLP/pythainlp/issues
Thanks to the contributors to this version: https://github.com/PyThaiNLP/pythainlp/graphs/contributors
We build Thai NLP. PyThaiNLP
#ThaiNLP #NLP #PyThaiNLP
- Python
Published by wannaphong over 6 years ago
pythainlp - PyThaiNLP 2.1.dev5
- Change from marisa-trie to a Trie implementation written in Python
- Python
Published by wannaphong over 6 years ago
pythainlp - PyThaiNLP 2.0.7
PyThaiNLP 2.0.7 release change log:
- Bug fix: also handle the THANTHAKHAT, SARA U, and SARA UU cases (pythainlp.util.normalize) https://github.com/PyThaiNLP/pythainlp/pull/244
Upgrade: pip install -U pythainlp
Docs: https://thainlp.org/pythainlp/docs/2.0/
User guide: https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb
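One common normalization step removes accidentally doubled combining marks. Examples are a repeated THANTHAKHAT (U+0E4C), SARA U (U+0E38) or SARA UU (U+0E39). The sketch below covers just that one step. It assumes the mark set is U+0E31 plus the ranges U+0E34 to U+0E3A and U+0E47 to U+0E4E. The function name is hypothetical, and this is not pythainlp.util.normalize, which also does other work.

```python
import re

# Thai combining vowels, tone marks, and diacritics (assumed set), any of
# which may be typed twice by accident.
_REPEATED_MARK = re.compile(r"([\u0e31\u0e34-\u0e3a\u0e47-\u0e4e])\1+")


def collapse_repeated_marks(text: str) -> str:
    """Collapse runs of the same Thai combining mark to one (hypothetical helper)."""
    return _REPEATED_MARK.sub(r"\1", text)
```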
- Python
Published by wannaphong almost 7 years ago
pythainlp - PyThaiNLP 2.0.6
- Fix #230
- Newly trained ThaiNER model
- Python
Published by wannaphong almost 7 years ago
pythainlp - PyThaiNLP 2.0.5
- Clean word lists in pythainlp.corpus (remove duplicates, etc.)
- Fix/add return type hinting for functions in pythainlp.corpus
- Fix deprecated inline flag for regular expressions in pythainlp.corpus.tnc (Thai National Corpus)
- Bug fix: reorder condition checks in pythainlp.tokenize.dict_trie so it catches Trie before Iterable
- Python
Published by bact about 7 years ago
pythainlp - PyThaiNLP 2.0.4
- word_tokenize()'s argument whitespaces is now keep_whitespace, to make it more explicit; the default behavior is to keep whitespace
- word_tokenize() can now take a custom dictionary through the custom_dict parameter
- dict_word_tokenize() will be deprecated soon
- Python
Published by bact about 7 years ago
pythainlp - PyThaiNLP 2.0.3
- Fix TTC (Thai Textbook Corpus) corpus always downloading a new file
- Words and their frequencies from TTC (Thai Textbook Corpus) now have a local copy at ttc_freq.txt inside pythainlp.corpus.
- Other refactoring and code improvements, including ones related to subword tokenization (Thai Character Cluster / TCC and ETCC), see #193
- Python
Published by bact about 7 years ago
pythainlp - PyThaiNLP 2.0.2
- Fixed tree map
- Subword tokeniser documentation improvement https://github.com/PyThaiNLP/pythainlp/pull/190
- Python
Published by wannaphong about 7 years ago
pythainlp - PyThaiNLP 2.0.1
- Add Tokenizer from pythainlp.tokenize.Tokenizer 79432c2
- NER fixes, code cleaning, and type hinting #186
- Python
Published by wannaphong about 7 years ago
pythainlp - PyThaiNLP 2.0
PyThaiNLP 2.0
PyThaiNLP is a Python library for natural language processing (NLP) of Thai language.
PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.
📖 For details on upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see From PyThaiNLP 1.7 to PyThaiNLP 2.0
📖 For ThaiNER users upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see Upgrade ThaiNER from PyThaiNLP 1.7 to PyThaiNLP 2.0
📫 Follow us on Facebook: Pythainlp
What's new in version 2.0?
- New NorvigSpellChecker spell checker class, which can be initialized with a custom dictionary.
- End Python 2 support. Remove all Python 2 compatibility code.
- Remove old, obsolete, deprecated, and experimental code.
- Thai2fit (Upgrade ULMFiT-related codes to fastai 1.0)
- ThaiNER 1.0
- Remove sentiment analysis
- Improved word_tokenize (newmm, mm) and dict_word_tokenize
- Improved POS-tagging
- More and improved examples
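NorvigSpellChecker is named after Peter Norvig's spelling-corrector approach. That approach generates candidates within edit distance 1, then 2, of the input, keeps only known words, and picks the most frequent one. The sketch below shows the approach with a custom word-frequency dictionary. The class name SimpleSpellChecker and the sample frequencies are hypothetical. As a simplifying assumption, candidate edits use only characters that appear in the dictionary. This is not PyThaiNLP's NorvigSpellChecker API.

```python
from collections import Counter
from typing import Dict, Iterable, Set


class SimpleSpellChecker:
    """Norvig-style corrector over a custom dictionary (hypothetical sketch)."""

    def __init__(self, word_freqs: Dict[str, int]) -> None:
        self.freqs = Counter(word_freqs)
        # Assumption: edits only insert/replace characters seen in the dictionary.
        self.alphabet = sorted({ch for word in self.freqs for ch in word})

    def known(self, words: Iterable[str]) -> Set[str]:
        return {w for w in words if w in self.freqs}

    def edits1(self, word: str) -> Set[str]:
        """All strings one delete, transpose, replace, or insert away from word."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [left + right[1:] for left, right in splits if right]
        transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
        replaces = [left + c + right[1:] for left, right in splits if right for c in self.alphabet]
        inserts = [left + c + right for left, right in splits for c in self.alphabet]
        return set(deletes + transposes + replaces + inserts)

    def edits2(self, word: str) -> Set[str]:
        return {e2 for e1 in self.edits1(word) for e2 in self.edits1(e1)}

    def candidates(self, word: str) -> Set[str]:
        # Prefer the word itself, then distance-1 words, then distance-2 words.
        return (
            self.known([word])
            or self.known(self.edits1(word))
            or self.known(self.edits2(word))
            or {word}
        )

    def correct(self, word: str) -> str:
        # Among the candidates, choose the most frequent word.
        return max(self.candidates(word), key=lambda w: self.freqs[w])
```

For example, with the frequencies {"เหลือง": 5, "เหลือ": 3, "เมือง": 10}, the misspelling เหลอง is one insertion away from เหลือง only, so that word is returned.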
Links
Docs: https://thainlp.org/pythainlp/docs/2.0/
GitHub: https://github.com/PyThaiNLP/pythainlp
Issues: https://github.com/PyThaiNLP/pythainlp/issues
Thank you for choosing us.
PyThaiNLP team
- Python
Published by wannaphong about 7 years ago
pythainlp - PyThaiNLP 2.0 Beta
PyThaiNLP is a Python package for text processing and linguistic analysis, similar to NLTK but with a focus on the Thai language.
PyThaiNLP 2.0 Beta is for beta testing PyThaiNLP 2.0.
What's new in PyThaiNLP 2.0?
- Consolidate documentation files
- Thai2fit (Upgrade ULMFiT-related codes to fastai 1.0)
- Remove Python 2 compatibility code
- Remove temporary files, experiment files, and obsoleted files
- Remove sentiment analysis
- More consistent indentations in source code
- Improved word_tokenize (newmm, mm) and dict_word_tokenize
- Improved POS-tagging
- More and improved examples
- Improved test coverage with more test cases
More details https://github.com/PyThaiNLP/pythainlp/issues/118
Install
pip install https://github.com/PyThaiNLP/pythainlp/archive/2.0b.zip
Docs : https://thainlp.org/pythainlp/docs/2.0/index.html
Website : https://pythainlp.github.io/
GitHub : https://github.com/PyThaiNLP/pythainlp
Issues : https://github.com/PyThaiNLP/pythainlp/issues
Thank you for choosing us.
PyThaiNLP team
- Python
Published by wannaphong about 7 years ago
pythainlp - PyThaiNLP 1.7.4
- Fixed #176
- removed conllu from requirements.txt #175
- Python
Published by wannaphong about 7 years ago
pythainlp - PyThaiNLP 1.7.3
- Fix importing thai_syllable.txt
- Python
Published by wannaphong over 7 years ago
pythainlp - PyThaiNLP 1.7.2
- fix sent_tokenize also split text by vertical line #166
- Python
Published by wannaphong over 7 years ago
pythainlp - PyThaiNLP 1.7.1
- Remove duplicated code, more meaningful exception messages, report unknown engine names (@bact)
- Move test folder, fix Flake8 errors (@zkan)
And more
- Python
Published by wannaphong over 7 years ago
pythainlp - PyThaiNLP 1.7.0.1
- remove import test in PyThaiNLP
- update README.md
- Python
Published by wannaphong over 7 years ago
pythainlp - PyThaiNLP 1.7.0
PyThaiNLP is a Python library for natural language processing (NLP) of Thai language.
What's new in PyThaiNLP 1.7?
- Deprecate Python 2 support
- Refactor pythainlp.tokenize.pyicu for readability
- Add Thai NER model to pythainlp.ner
- thai2vec v0.2 - larger vocab, benchmarking results on Wongnai dataset
- Sentiment classifier based on ULMFit and various product review datasets
- Add ULMFit utility to PyThaiNLP
- Add Thai romanization model thai2rom
- Retrain POS-tagging model
- Improve word_tokenize (newmm, mm) and dict_word_tokenize
- Documentation added
Install
pip install https://github.com/PyThaiNLP/pythainlp/archive/1.7.0.zip
Docs : https://thainlp.org/pythainlp/docs/1.7/
GitHub : https://github.com/PyThaiNLP/pythainlp
Issues : https://github.com/PyThaiNLP/pythainlp/issues
Thank you for choosing us.
PyThaiNLP team
- Python
Published by wannaphong over 7 years ago
pythainlp - PyThaiNLP 1.7 Beta 1
PyThaiNLP is a Python library for natural language processing (NLP) of Thai language.
PyThaiNLP 1.7 Beta 1 is for beta testing PyThaiNLP 1.7.
What's new in PyThaiNLP 1.7?
- Deprecate Python 2 support
- Refactor pythainlp.tokenize.pyicu for readability
- Add Thai NER model to pythainlp.ner
- thai2vec v0.2 - larger vocab, benchmarking results on Wongnai dataset
- Sentiment classifier based on ULMFit and various product review datasets
- Add ULMFit utility to PyThaiNLP
- Add Thai romanization model thai2rom
- Retrain POS-tagging model
- Improve word_tokenize (newmm, mm) and dict_word_tokenize
- Documentation added
Install
pip install https://github.com/PyThaiNLP/pythainlp/archive/1.7b1.zip
Docs : https://thainlp.org/pythainlp/docs/1.7/ (in progress)
Website : https://thainlp.org/pythainlp/ (in progress)
GitHub : https://github.com/PyThaiNLP/pythainlp
Issues : https://github.com/PyThaiNLP/pythainlp/issues
Thank you for choosing us.
PyThaiNLP team
- Python
Published by wannaphong over 7 years ago
pythainlp - PyThaiNLP 1.7 Alpha 2
PyThaiNLP 1.7 Alpha 2 is a test version for developers and is not recommended for production use.
What's new in PyThaiNLP 1.7
Key highlights
- Add pythainlp.ner, an NER module for PyThaiNLP
- Officially drop support for Python 2.7
- Add ULMFit utility to PyThaiNLP
- Improve the word tokenizers, both newmm and mm
- thai2vec v0.2
- New sentiment analysis powered by deep learning
- Add thai2rom, a character-level deep learning model for Thai romanization
- Retrain the POS tagging model
Installation
Install with the command pip install https://github.com/PyThaiNLP/pythainlp/archive/1.7a2.zip
Report bugs or send suggestions at https://github.com/PyThaiNLP/pythainlp/issues
- Python
Published by wannaphong over 7 years ago
pythainlp - PyThaiNLP 1.7 Alpha 1
PyThaiNLP 1.7 Alpha 1 is a test version for developers and is not recommended for production use.
What's new in PyThaiNLP 1.7
Key highlights
- Officially drop support for Python 2.7
- Add ULMFit utility to PyThaiNLP
- Improve the word tokenizers, both newmm and mm
- thai2vec v0.2
- New sentiment analysis powered by deep learning
- Add thai2rom, a character-level deep learning model for Thai romanization
- Retrain the POS tagging model
Installation
Install with the command pip install https://github.com/PyThaiNLP/pythainlp/archive/1.7a1.zip
Report bugs or send suggestions at https://github.com/PyThaiNLP/pythainlp/issues
- Python
Published by wannaphong almost 8 years ago
pythainlp - PyThaiNLP 1.6.0.7
- Update the Dropbox URL for thai2vec
- Python
Published by wannaphong almost 8 years ago
pythainlp - PyThaiNLP 1.6.0.5
- Fix TCC rule https://github.com/PyThaiNLP/pythainlp/commit/729d32277e2cecd52fa237dcd97f9629d009d8a0
- Python
Published by wannaphong about 8 years ago
pythainlp - PyThaiNLP 1.6
What's new in PyThaiNLP 1.6
- The newmm word tokenizer was rewritten by @korakot based on the Maximum Matching algorithm and TCC, fixing errors when tokenizing words that are not in the dictionary and making tokenization faster
- Add cutkum (https://github.com/pucktada/cutkum) as part of the word tokenization system
- Add syllable_tokenize, a Thai syllable tokenizer that uses a dictionary to segment syllables
- Add dict_word_tokenize for tokenizing with a dictionary of your choice
- pythainlp.romanization using royin was rewritten
- pythainlp.sentiment was retrained using the newmm tokenizer, giving more accurate results than before
- Add pythainlp.word_vector.thai2vec, which lets you use https://github.com/cstorm125/thai2vec by @cstorm125
- Add a file store in pythainlp-data for keeping PyThaiNLP's various data
- Easier installation: new code replaces pyicu, so pyicu no longer needs to be installed
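The newmm rewrite is described as based on Maximum Matching, that is, choosing the segmentation with the fewest dictionary words. The sketch below shows dictionary-based maximal matching with dynamic programming. It assumes a hypothetical toy dictionary and a single-character fallback for unknown text, and it prefers fewer unknown characters first, then fewer tokens. It leaves out the TCC constraint that newmm also uses and is not newmm's code. With the dictionary in the test, a greedy longest match would pick หาม first and produce ไป|หาม|เห|สี, while maximal matching finds ไป|หา|มเหสี.

```python
from typing import AbstractSet, List, Tuple


def maximal_matching(text: str, words: AbstractSet[str]) -> List[str]:
    """Segment text into the fewest dictionary words (hypothetical sketch)."""
    n = len(text)
    max_len = max((len(w) for w in words), default=1)
    inf = float("inf")
    # cost[i] = (unknown characters, token count) of the best split of text[:i]
    cost: List[Tuple[float, float]] = [(inf, inf)] * (n + 1)
    back = [0] * (n + 1)
    cost[0] = (0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            unknown, count = cost[start]
            if text[start:end] in words:
                candidate = (unknown, count + 1)
            elif end - start == 1:
                candidate = (unknown + 1, count + 1)  # unknown single character
            else:
                continue
            if candidate < cost[end]:
                cost[end] = candidate
                back[end] = start
    # Follow back-pointers from the end of the text to recover the tokens.
    tokens: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    tokens.reverse()
    return tokens
```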
Documentation: https://github.com/PyThaiNLP/pythainlp/blob/pythainlp1.6/docs/pythainlp-1-6-thai.md
Install with the command pip install -U pythainlp
- Python
Published by wannaphong over 8 years ago
pythainlp - PyThaiNLP 1.6 Beta 1
PyThaiNLP 1.6 Beta 1 is a test release for developers and the general public; its API is now stable.
What's new in PyThaiNLP 1.6
- The newmm word tokenizer was rewritten by @korakot based on the Maximum Matching algorithm and TCC, fixing errors when tokenizing words that are not in the dictionary and making tokenization faster
- Add cutkum (https://github.com/pucktada/cutkum) as part of the word tokenization system
- Add syllable_tokenize, a Thai syllable tokenizer that uses a dictionary to segment syllables
- Add dict_word_tokenize for tokenizing with a dictionary of your choice
- pythainlp.romanization using royin was rewritten
- pythainlp.sentiment was retrained using the newmm tokenizer, giving more accurate results than before
- Add pythainlp.word_vector.thai2vec, which lets you use https://github.com/cstorm125/thai2vec by @cstorm125
- Add a file store in pythainlp-data for keeping PyThaiNLP's various data
- Easier installation: new code replaces pyicu, so pyicu no longer needs to be installed
Documentation: https://github.com/PyThaiNLP/pythainlp/blob/dev/docs/pythainlp-1-6-thai.md (in progress)
To try it, first remove the previous PyThaiNLP version with the command pip uninstall pythainlp
then install with the command pip install https://github.com/PyThaiNLP/pythainlp/archive/1.6-beta-1.zip
If you find a bug, report it at https://www.facebook.com/pythainlp/ or https://github.com/PyThaiNLP/pythainlp/issues
Thank you for using PyThaiNLP :)
PyThaiNLP development team
- Python
Published by wannaphong over 8 years ago
pythainlp - PyThaiNLP 1.6 Alpha 2
What's new?
- Faster newmm word tokenization: @korakot rewrote the tokenizer code and improved Thai word tokenization performance https://github.com/PyThaiNLP/pythainlp/issues/65
- Add pythainlp.word_vector.thai2vec, integrating thai2vec by @cstorm125 into PyThaiNLP
Before trying it, remove the old PyThaiNLP version with the command pip uninstall pythainlp
Install with the command pip install https://github.com/PyThaiNLP/pythainlp/archive/1.6a2.zip
- Python
Published by wannaphong over 8 years ago
pythainlp - PyThaiNLP 1.6 Alpha 1
PyThaiNLP 1.6 alpha 1 (for developers only)
What's new
- Faster word tokenization by prebuilding a Trie model
- Add a Thai syllable tokenizer
- Add an API that lets module users tokenize with their own dictionary
- Change the default word tokenizer from icu to newmm
- Fix incorrect tokenization by also using TCC (Thai Character Clusters)
Try it with the command
pip install https://github.com/PyThaiNLP/pythainlp/archive/1.6a1.zip
- Python
Published by wannaphong over 8 years ago