Releases | Open Source Science

collogetr - Bug fixes to comply with tidyr's `nest` and `unnest` new behaviour

This is a backward-compatible bug-fix release that follows the new behaviour of tidyr's nest() and unnest() functions, which require the data argument to be specified.

- R
Published by gederajeg almost 6 years ago

collogetr - Bug fix and update

Bug fixes

The bug was an error when pulling out the column nn, which was the result of tally() in the previous version of dplyr (i.e. v0.7.8). It was identified in the AppVeyor and Travis builds (cf. here and here, respectively), where column nn could not be found in .data. Column nn was used in one line of code in colloc_leipzig(). That has now been changed to n, and the builds for this release succeed with the updated dplyr version (v0.8.0.1) (cf. here and here for the AppVeyor and Travis builds, respectively).

Development

Add a new function, collex_llr(), to compute an association measure using the log-likelihood ratio.
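
For readers unfamiliar with the measure, the log-likelihood ratio (G2) for a 2x2 co-occurrence table can be sketched as follows. This is an illustrative Python sketch of the textbook formula, not the R code of collex_llr(); the function name log_likelihood_ratio and the cell labels a, b, c, d are assumptions made for the example.

```python
import math


def log_likelihood_ratio(a, b, c, d):
    """G2 statistic for the 2x2 table [[a, b], [c, d]].

    Hypothetical cell layout: rows are collocate present/absent,
    columns are node word present/absent.
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected count of each cell = row total * column total / grand total.
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
```

A table whose cells match their expected counts exactly gives a value of 0; larger values indicate a stronger departure from independence. The sketch does not handle an empty table (grand total of 0).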

- R
Published by gederajeg almost 7 years ago

collogetr - Bug fixes and updates

Bug fixes

Fix a bug in the search procedure. In this version, the corpus is first tokenised and the node word is then searched for by its exact word form.
Fix a bug in the output column names and the number of output columns when the save_interim argument is TRUE.

Development

Increase the test coverage of the code
Add lifecycle and repo status badges, including the AppVeyor build badge

Next release

Add log-likelihood as an alternative association measure
Add Multiple Distinctive Collexeme Analysis (MDCA) as an association measure for contrasting more than two near-synonymous node words. MDCA uses a one-tailed exact binomial test to determine the distinctive collocates of a node word in comparison to its near-synonyms.
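
The one-tailed exact binomial test mentioned above can be sketched as follows. This is a minimal Python illustration of one common formulation, not the package's implementation: the names binom_tail and mdca_p_value are hypothetical, and the choice of tail (upper when the observed count is at or above the expected count, lower otherwise) is an assumption.

```python
from math import comb


def binom_tail(x, n, p, upper=True):
    """Exact one-tailed binomial probability.

    Returns P(X >= x) if upper is True, otherwise P(X <= x),
    for X ~ Binomial(n, p).
    """
    ks = range(x, n + 1) if upper else range(0, x + 1)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in ks)


def mdca_p_value(observed, collocate_total, node_share):
    """p-value for one collocate with one node word (assumed setup).

    observed: frequency of the collocate with this node word
    collocate_total: frequency of the collocate across all node words
    node_share: this node word's share of all collocate tokens
    """
    expected = collocate_total * node_share
    # Test attraction if observed >= expected, repulsion otherwise.
    return binom_tail(
        observed, collocate_total, node_share, upper=observed >= expected
    )
```

For instance, a collocate that occurs 10 times overall and 9 times with a node word that accounts for half of the data yields mdca_p_value(9, 10, 0.5), the probability of 9 or more successes in 10 trials at p = 0.5.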

- R
Published by gederajeg over 7 years ago

collogetr - Minor update on LICENSE and Website

This is a minor update involving a change of license from GPL-2 to MIT. The update also includes setting up a GitHub website for the package. There are no additional functions, but there is more test coverage for the existing functions.

- R
Published by gederajeg over 7 years ago

collogetr - collogetr 1.0.0

Breaking changes

Existing functions

colloc_leipzig()
- A feature to search collocates for multiple node words in one go. These words have to be combined in the form of a character vector (e.g., c("membeli", "menjual")).
- Additional output of (i) the sentence match in which the collocates and the node word(s) are found, and (ii) window-span information for the collocates in relation to the node word(s) (e.g., r1 for collocates occurring one word to the right of the node).
assoc_prepare()
- Allows processing the input frequency data per corpus or combined across all corpus files.
- Allows selecting a given collocate span to focus on for the association measure.
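
The idea of retrieving window-span collocates for several node words in one pass can be sketched as below. This is an illustrative Python sketch, not the R code of colloc_leipzig(): the function window_collocates, its arguments, and the example sentence are made up for the illustration, and the left-hand label l1 is assumed by analogy with the r1 label mentioned above.

```python
def window_collocates(tokens, nodes, span=2):
    """Collect (node, collocate, position) triples from a tokenised sentence.

    position is 'l<k>' or 'r<k>' for a collocate k words to the left
    or right of the node word.
    """
    node_set = set(nodes)
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in node_set:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            # Skip the node itself and positions outside the sentence.
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            side = "l" if offset < 0 else "r"
            hits.append((tok, tokens[j], f"{side}{abs(offset)}"))
    return hits


# Made-up sentence, tokenised once and searched for two node words.
sentence = "saya membeli buku dan dia menjual buku".split()
pairs = window_collocates(sentence, ["membeli", "menjual"], span=1)
```

Because the sentence is tokenised once and every token is checked against the set of node words, adding more node words does not require another tokenisation pass.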

New functions

assoc_prepare_dca()
- Generates the required input data for performing Distinctive Collexeme/Collocates Analysis (DCA). It takes the output of assoc_prepare(), which in turn is fed with the output of colloc_leipzig().
collex_fye_dca()
- Performs DCA using the one-tailed Fisher-Yates Exact (FYE) test. It requires the output of assoc_prepare_dca().
dca_top_collex()
- The function to extract the top-n distinctive collocates/collexemes for a given word/construction.
collex_chisq()
- Computes an association measure using the chi-square statistic.
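
As a rough illustration of the chi-square measure named in the last item, the Pearson statistic for a 2x2 co-occurrence table can be computed as follows. This is a Python sketch of the textbook formula without continuity correction, not the R code of collex_chisq(); the function name chi_square_2x2 and the cell labels are assumptions.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]].

    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row[i] * col[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total
```

The sketch assumes that no row or column total is zero, since that would make an expected count zero.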

Future developments

The next iteration of the package will include:
- Other kinds of association measures commonly used in collocational studies, such as Mutual Information and Log-likelihood, and the inclusion of the odds ratio from the FYE test.
- Another function to retrieve collocates from different corpus types (e.g. from a corpus that is not parsed/split according to sentences as in the Indonesian Leipzig Corpora).

- R
Published by gederajeg over 7 years ago

collogetr - First release

The package contains one function, colloc_leipzig(), to retrieve window-span collocates from the Indonesian Leipzig Corpora. The function can currently search for only one word at a time, and it is slow because it tokenises the corpus on every call. So, to search for words X and Y in corpus C, two search calls are required, and corpus C needs to be tokenised in each of them.

The package also contains a function, assoc_prepare(), to prepare an input table for computing an association measure for collocational analysis using the Fisher-Yates Exact Test (collex_fye()).
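
For orientation, a one-tailed Fisher-Yates Exact test on a 2x2 co-occurrence table can be sketched as follows. This Python sketch sums hypergeometric probabilities directly and is not the R code of collex_fye(); the function name fisher_exact_greater, the cell layout, and the choice of the "greater" (attraction) tail are assumptions.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-tailed Fisher exact p-value for the table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left cell under the
    hypergeometric null with all row and column totals fixed.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    # Count the tables at least as extreme as the observed one.
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)
```

A small p-value suggests that the word and the node co-occur more often than expected by chance given their overall frequencies.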

The next release will update colloc_leipzig() to support multiple-pattern search and a more efficient procedure.

- R
Published by gederajeg over 7 years ago
