Recent Releases of gibasa

gibasa - v1.1.2

This is a patch release. There are no user-visible changes.

Full Changelog: https://github.com/paithiov909/gibasa/compare/v1.1.1...v1.1.2

- C++
Published by paithiov909 over 1 year ago

gibasa - v1.1.1

What's Changed

  • tokenize now issues a warning instead of throwing an error when it receives invalid input during partial parsing. As a result, tokenize no longer aborts entirely when an invalid string is given; parsing of such strings is simply skipped.
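The warn-and-skip pattern described here can be pictured with a small base R sketch. It does not call gibasa: `parse_chunk` and `parse_all` are hypothetical helpers, and treating NA or empty strings as "invalid" is an assumption made only for illustration.

```r
# Hypothetical stand-in for a parser that rejects some inputs.
# Assumption: NA or empty strings count as "invalid" here.
parse_chunk <- function(x) {
  if (is.na(x) || !nzchar(x)) {
    stop("invalid input")
  }
  strsplit(x, " ", fixed = TRUE)[[1]]
}

# Warn about invalid inputs and skip them instead of aborting the whole run.
parse_all <- function(xs) {
  out <- lapply(xs, function(x) {
    tryCatch(
      parse_chunk(x),
      error = function(e) {
        warning("skipped an invalid input: ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
  # Drop the entries that were skipped
  Filter(Negate(is.null), out)
}

# The NA in the middle is skipped; the two valid strings are still parsed.
res <- suppressWarnings(parse_all(c("a b", NA, "c d e")))
lengths(res)
```

The point of the design is that one bad element costs a warning, not the results for every other element.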

Full Changelog: https://github.com/paithiov909/gibasa/compare/v1.1.0...v1.1.1

- C++
Published by paithiov909 almost 2 years ago

gibasa - v1.1.0

What's Changed

  • chore(deps): update actions/setup-python action to v5 by @renovate in https://github.com/paithiov909/gibasa/pull/32
  • Fix global_idf3 by @paithiov909 in https://github.com/paithiov909/gibasa/pull/35
  • Refactor bind_tf_idf2 by @paithiov909 in https://github.com/paithiov909/gibasa/pull/36

Full Changelog: https://github.com/paithiov909/gibasa/compare/v1.0.1...v1.1.0

- C++
Published by paithiov909 over 2 years ago

gibasa - v1.0.1

New Feature: dictionary compiler is integrated 🚀

This release adds wrappers around MeCab's 'dictionary compiler'. With source dictionaries and CSV files, you can build MeCab system/user dictionaries without leaving your R console.

Even in environments where MeCab is not installed, such as Posit Cloud, you can try this snippet right away!!

```r
require(gibasa)

if (requireNamespace("withr")) {
  # create a sample dictionary in temporary directory
  build_sys_dic(
    dic_dir = system.file("latin", package = "gibasa"),
    out_dir = tempdir(),
    encoding = "utf8"
  )
  # copy the 'dicrc' file
  file.copy(
    system.file("latin/dicrc", package = "gibasa"),
    tempdir()
  )
  # write a csv file and compile it into a user dictionary
  csv_file <- tempfile(fileext = ".csv")
  writeLines(
    c(
      "qa, 0, 0, 5, \u304f\u3041",
      "qi, 0, 0, 5, \u304f\u3043",
      "qu, 0, 0, 5, \u304f",
      "qe, 0, 0, 5, \u304f\u3047",
      "qo, 0, 0, 5, \u304f\u3049"
    ),
    csv_file
  )
  build_user_dic(
    dic_dir = tempdir(),
    file = (user_dic <- tempfile(fileext = ".dic")),
    csv_file = csv_file,
    encoding = "utf8"
  )
  # mocking a 'mecabrc' file to temporarily use the dictionary
  withr::with_envvar(
    c(
      "MECABRC" = if (.Platform$OS.type == "windows") {
        "nul"
      } else {
        "/dev/null"
      },
      "RCPP_PARALLEL_BACKEND" = "tinythread"
    ),
    {
      tokenize("quensan", sys_dic = tempdir(), user_dic = user_dic)
    }
  )
}
```

Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.9.5...v1.0.1

- C++
Published by paithiov909 over 2 years ago

gibasa - v0.9.5

Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.9.4...v0.9.5

- C++
Published by paithiov909 almost 3 years ago

gibasa - v0.9.4

Updated Makevars for Unix-alikes. Users can now use a file specified by the MECABRC environment variable or ~/.mecabrc to set up dictionaries.
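As a quick way to see which of these two configuration files is present on a given machine, a base R check along the following lines may help. It is only a sketch: the names `rc_candidates` and `rc_status` are made up for this example, and it deliberately implies no precedence between the two locations.

```r
# Candidate configuration files named in the note above.
# No precedence between them is implied here.
rc_candidates <- c(
  MECABRC = Sys.getenv("MECABRC", unset = NA_character_),
  home = path.expand("~/.mecabrc")
)

# A candidate counts as usable only if it is set and exists on disk
rc_exists <- vapply(
  rc_candidates,
  function(p) !is.na(p) && file.exists(p),
  logical(1)
)

rc_status <- data.frame(
  source = names(rc_candidates),
  path = unname(rc_candidates),
  exists = unname(rc_exists),
  row.names = NULL
)
print(rc_status)
```

To point a single R session at a different file, `Sys.setenv(MECABRC = ...)` with the path to your own mecabrc can be called before tokenizing.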

Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.9.3...v0.9.4

- C++
Published by paithiov909 almost 3 years ago

gibasa - v0.9.3

This is a patch release. For CRAN's checks, removed unnecessary C++ files.

- C++
Published by paithiov909 about 3 years ago

gibasa - v0.9.2

Initial CRAN release 🚀😎✨

I'm excited to announce {gibasa} is now on CRAN!! Now you can more easily install {gibasa} from CRAN as well as from r-universe.

Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.8.1...v0.9.2

- C++
Published by paithiov909 about 3 years ago

gibasa - v0.8.1

Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.8.0...v0.8.1

- C++
Published by paithiov909 about 3 years ago

gibasa - v0.8.0

What's Changed

  • [Breaking Change] Changed numbering style of 'sentence_id' when split is FALSE.
  • Added grain_size argument to tokenize.
  • Added new bind_lr function.
  • Use RcppParallel::parallelFor instead of tbb::parallel_for.

Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.7.1...v0.8.0

- C++
Published by paithiov909 about 3 years ago

gibasa - v0.7.1

What's Changed

gibasa 0.7.1

  • Fixed documentation. There are no user-visible changes.

gibasa 0.7.0

  • tokenize can now accept a character vector in addition to a data.frame like object.
  • gbs_tokenize is now deprecated. Please use the tokenize function instead.

gibasa 0.6.4

  • Refactored is_blank.

gibasa 0.6.3

  • Added the partial argument to gbs_tokenize and tokenize. This argument controls the partial parsing mode, which, when enabled, forces the tokenizer to extract the given chunks of sentences.

gibasa 0.6.2

  • Friendlier errors are returned when an invalid dictionary path is provided.
  • Added new posDebugRcpp function.

gibasa 0.6.1

  • Restored some missing examples.

Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.6.0...v0.7.1

- C++
Published by paithiov909 over 3 years ago

gibasa - v0.6.0

What's Changed

  • Functions added in version '0.5.1' were moved to the 'audubon' package (>= 0.4.0) by @paithiov909 in https://github.com/paithiov909/gibasa/pull/19

Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.5.1...v0.6.0

- C++
Published by paithiov909 over 3 years ago

gibasa - v0.5.1

  • Added some new functions.
    • bind_tf_idf2 can calculate and bind the term frequency, inverse document frequency, and tf-idf of a tidy text dataset.
    • collapse_tokens, mute_tokens, and lexical_density can be used for handling a tidy text dataset of tokens.
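For readers unfamiliar with the quantities mentioned above, here is a minimal base R sketch of tf-idf on a tidy token table. It does not call gibasa and uses the textbook definitions (tf as the within-document share, idf as log of documents over document frequency), which may differ from what bind_tf_idf2 computes. The toy data and column names are assumptions for illustration.

```r
# Toy tidy dataset: one row per (document, token) pair with a count
tokens <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2"),
  token  = c("cat", "dog", "cat", "fish"),
  n      = c(2, 1, 1, 3),
  stringsAsFactors = FALSE
)

# Term frequency: count divided by the total count in the same document
doc_total <- ave(tokens$n, tokens$doc_id, FUN = sum)
tokens$tf <- tokens$n / doc_total

# Inverse document frequency:
# log(number of documents / number of documents containing the token)
n_docs <- length(unique(tokens$doc_id))
doc_freq <- ave(rep(1, nrow(tokens)), tokens$token, FUN = sum)
tokens$idf <- log(n_docs / doc_freq)

# tf-idf is the product of the two
tokens$tf_idf <- tokens$tf * tokens$idf
tokens
```

In this toy data "cat" appears in every document, so its idf (and therefore its tf-idf) is zero, while "dog" and "fish" are weighted by how much of their document they make up.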

Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.5.0...v0.5.1

- C++
Published by paithiov909 over 3 years ago

gibasa - v0.5.0

What's Changed

  • Include MeCab source code in package by @paithiov909 in https://github.com/paithiov909/gibasa/pull/15
    • gibasa now includes the MeCab source, so users no longer need to pre-install the MeCab library when building and installing the package. Using tokenize still requires MeCab and its dictionaries to be installed and available.

Full Changelog: https://github.com/paithiov909/gibasa/compare/v0.4.1...v0.5.0

- C++
Published by paithiov909 over 3 years ago

gibasa - v0.4.1

  • tokenize now preserves the original order of docid_field.

- C++
Published by paithiov909 over 3 years ago

gibasa - v0.4.0

  • Added bind_tf_idf2 function and is_blank function.

- C++
Published by paithiov909 almost 4 years ago

gibasa - v0.3.0

- C++
Published by paithiov909 about 4 years ago

gibasa - v0.2.1

- C++
Published by paithiov909 about 4 years ago

gibasa - v0.1.3

- C++
Published by paithiov909 about 4 years ago