Recent Releases of gibasa
gibasa - v1.1.1
What's Changed
tokenize now warns rather than throwing an error when an invalid input is given during partial parsing. With this change, tokenize is no longer aborted entirely even if an invalid string is given; parsing of such strings is simply skipped.
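The warn-and-skip behavior described here can be illustrated with a minimal base R sketch. This is not gibasa's implementation: `parse_one` and `parse_all` are hypothetical stand-ins that only show how an invalid element can produce a warning and an empty result instead of aborting the whole call.

```r
# Hypothetical stand-in for a parser: rejects NA or non-UTF-8 strings.
# It is NOT gibasa's tokenizer; it only illustrates the semantics.
parse_one <- function(x) {
  if (is.na(x) || !validUTF8(x)) stop("invalid input")
  strsplit(x, " ", fixed = TRUE)[[1]]
}

# Parse every element; on failure, warn and return an empty result
# for that element instead of aborting the whole call.
parse_all <- function(xs) {
  lapply(xs, function(x) {
    tryCatch(
      parse_one(x),
      error = function(e) {
        warning("skipped an invalid string: ", conditionMessage(e), call. = FALSE)
        character(0)
      }
    )
  })
}

inputs <- c("a b", NA, "c d e")
res <- suppressWarnings(parse_all(inputs))
lengths(res)
```

The valid strings are still parsed; only the invalid element comes back empty.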
Full Changelog: https://github.com/paithiov909/gibasa/compare/v1.1.0...v1.1.1
- C++
Published by paithiov909 almost 2 years ago
gibasa - v1.1.0
What's Changed
- chore(deps): update actions/setup-python action to v5 by @renovate in https://github.com/paithiov909/gibasa/pull/32
- Fix global_idf3 by @paithiov909 in https://github.com/paithiov909/gibasa/pull/35
- Refactor bind_tf_idf2 by @paithiov909 in https://github.com/paithiov909/gibasa/pull/36
Full Changelog: https://github.com/paithiov909/gibasa/compare/v1.0.1...v1.1.0
Published by paithiov909 over 2 years ago
gibasa - v1.0.1
New Feature: dictionary compiler is integrated 🚀
This release adds wrappers around the 'dictionary compiler' of MeCab. With source dictionaries and CSV files, you can build MeCab system/user dictionaries without leaving your R console.
Even in environments where MeCab is not installed, such as Posit Cloud, you can try this snippet right away!!
```r
require(gibasa)

if (requireNamespace("withr")) {
  # create a sample dictionary in temporary directory
  build_sys_dic(
    dic_dir = system.file("latin", package = "gibasa"),
    out_dir = tempdir(),
    encoding = "utf8"
  )
  # copy the 'dicrc' file
  file.copy(
    system.file("latin/dicrc", package = "gibasa"),
    tempdir()
  )
  # write a csv file and compile it into a user dictionary
  csv_file <- tempfile(fileext = ".csv")
  writeLines(
    c(
      "qa, 0, 0, 5, \u304f\u3041",
      "qi, 0, 0, 5, \u304f\u3043",
      "qu, 0, 0, 5, \u304f",
      "qe, 0, 0, 5, \u304f\u3047",
      "qo, 0, 0, 5, \u304f\u3049"
    ),
    csv_file
  )
  build_user_dic(
    dic_dir = tempdir(),
    file = (user_dic <- tempfile(fileext = ".dic")),
    csv_file = csv_file,
    encoding = "utf8"
  )
  # mocking a 'mecabrc' file to temporarily use the dictionary
  withr::with_envvar(
    c(
      "MECABRC" = if (.Platform$OS.type == "windows") {
        "nul"
      } else {
        "/dev/null"
      },
      "RCPP_PARALLEL_BACKEND" = "tinythread"
    ),
    {
      tokenize("quensan", sys_dic = tempdir(), user_dic = user_dic)
    }
  )
}
```
Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.9.5...v1.0.1
Published by paithiov909 over 2 years ago
gibasa - v0.9.2
Initial CRAN release 🚀😎✨
I'm excited to announce {gibasa} is now on CRAN!! Now you can more easily install {gibasa} from CRAN as well as from r-universe.
Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.8.1...v0.9.2
Published by paithiov909 about 3 years ago
gibasa - v0.8.0
What's changed
- [Breaking Change] Changed numbering style of 'sentence_id' when split is FALSE.
- Added grain_size argument to tokenize.
- Added new bind_lr function.
- Use RcppParallel::parallelFor instead of tbb::parallel_for.
Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.7.1...v0.8.0
Published by paithiov909 about 3 years ago
gibasa - v0.7.1
What's Changed
gibasa 0.7.1
- Fixed documentation. There are no visible changes.
gibasa 0.7.0
- tokenize can now accept a character vector in addition to a data.frame-like object.
- gbs_tokenize is now deprecated. Please use the tokenize function instead.
gibasa 0.6.4
- Refactored is_blank.
gibasa 0.6.3
- Added the partial argument to gbs_tokenize and tokenize. This argument controls the partial parsing mode, which, when activated, forces the tokenizer to extract the given chunks of sentences.
gibasa 0.6.2
- Friendlier errors are returned when an invalid dictionary path is provided.
- Added new posDebugRcpp function.
gibasa 0.6.1
- Revert some missing examples.
Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.6.0...v0.7.1
Published by paithiov909 over 3 years ago
gibasa - v0.5.1
- Added some new functions.
  - bind_tf_idf2 can calculate and bind the term frequency, inverse document frequency, and tf-idf of the tidy text dataset.
  - collapse_tokens, mute_tokens, and lexical_density can be used for handling a tidy text dataset of tokens.
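For readers unfamiliar with the quantities named above, here is a minimal base R sketch of the textbook tf-idf calculation on a toy tidy dataset. It does not call gibasa, and the weighting shown (relative frequency and a plain log ratio) is an assumption for illustration; the package's own weighting variants may differ. The column names and tokens are made up.

```r
# Toy tidy dataset: one row per (document, token) pair with a count.
counts <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2"),
  token  = c("neko", "inu", "neko", "tori"),
  n      = c(2, 1, 1, 3),
  stringsAsFactors = FALSE
)

# Term frequency: count divided by the total count within each document.
counts$tf <- counts$n / ave(counts$n, counts$doc_id, FUN = sum)

# Inverse document frequency:
# log(number of documents / number of documents containing the token).
# Each row is a unique (document, token) pair, so rows per token = document frequency.
n_docs <- length(unique(counts$doc_id))
doc_freq <- ave(rep(1, nrow(counts)), counts$token, FUN = sum)
counts$idf <- log(n_docs / doc_freq)

# tf-idf is the product of the two.
counts$tf_idf <- counts$tf * counts$idf
counts
```

A token that appears in every document ("neko" here) gets an idf of zero, so its tf-idf is zero regardless of how often it occurs.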
Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.5.0...v0.5.1
Published by paithiov909 over 3 years ago
gibasa - v0.5.0
What's Changed
- Include MeCab source code in package by @paithiov909 in https://github.com/paithiov909/gibasa/pull/15
- gibasa now includes the MeCab source, so users no longer need to pre-install the MeCab library when building and installing the package (using tokenize still requires MeCab and its dictionaries to be installed and available).
Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.4.1...v0.5.0
Published by paithiov909 over 3 years ago