Recent Releases of collogetr
collogetr - Bug fixes to comply with tidyr's `nest` and `unnest` new behaviour
This is a backward-compatible bug-fix release that follows the new behaviour of tidyr's `nest()` and `unnest()` functions, which require the data argument to be specified.
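As a rough illustration of the newer calling convention, here is a minimal sketch assuming tidyr >= 1.0.0. The data frame and its column names are made up for the example and are not taken from collogetr's internals.

```r
library(tidyr)

# Toy collocate counts; column names are illustrative assumptions.
df <- data.frame(
  node = c("membeli", "membeli", "menjual"),
  collocate = c("rumah", "mobil", "rumah"),
  n = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Newer tidyr: name the new list-column and say which columns go into it.
nested <- nest(df, data = c(collocate, n))

# Newer tidyr: say explicitly which list-column(s) to expand.
unnested <- unnest(nested, cols = c(data))
```

Spelling out the columns in both calls avoids relying on the older implicit defaults.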
- R
Published by gederajeg almost 6 years ago
collogetr - Bug fix and update
Bug fixes
- The bug was an error raised when pulling out `nn`, the column name produced by `tally()` in the previous version of `dplyr` (i.e. v0.7.8). It was identified in the AppVeyor and Travis builds (cf. here and here respectively), where column `nn` was not found in `.data`. Column `nn` was used in one line of code in `colloc_leipzig()`. That line now uses `n`, and the builds for this release succeed with the updated `dplyr` version (v0.8.0.1) (cf. here and here for the AppVeyor and Travis builds respectively).
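The release itself simply switched the column from `nn` to `n`. As a separate, hedged illustration, one way to avoid depending on the default count-column name at all is the `name` argument of `tally()`, assuming a dplyr version that provides it (0.8.0 or later, as far as I know). The data and the name `n_total` are hypothetical.

```r
library(dplyr)

# Toy data that already has a column called `n`.
df <- data.frame(
  node = c("membeli", "membeli", "menjual"),
  n = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Sum the existing counts per node and name the result column explicitly,
# so later code never has to guess between `n` and `nn`.
totals <- df %>%
  group_by(node) %>%
  tally(wt = n, name = "n_total")

total_counts <- pull(totals, n_total)
```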
Development
- Add a new function called `collex_llr()` to compute the log-likelihood ratio as an association measure.
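For readers unfamiliar with the measure, the log-likelihood ratio for a 2x2 co-occurrence table can be sketched in base R as below. This is a generic illustration and not the code of `collex_llr()`; the function name `llr_2x2()` and the cell counts are hypothetical.

```r
# G2 = 2 * sum(O * log(O / E)) over the four cells of a 2x2 table.
# o11: node with collocate, o12: node without collocate,
# o21: other words with collocate, o22: everything else.
llr_2x2 <- function(o11, o12, o21, o22) {
  observed <- matrix(c(o11, o12, o21, o22), nrow = 2, byrow = TRUE)
  # Expected counts under independence: row total * column total / grand total.
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  # Cells with zero observed frequency contribute 0 to the sum.
  terms <- ifelse(observed > 0, observed * log(observed / expected), 0)
  2 * sum(terms)
}

g2 <- llr_2x2(30, 70, 120, 9780)
```

Larger values indicate a stronger departure from independence; the statistic alone does not say whether the collocate is attracted to or repelled by the node.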
- R
Published by gederajeg almost 7 years ago
collogetr - Bug fixes and updates
Bug fixes
- Fix a bug in the search procedure. In this version, the corpus is first tokenised and the node word is then matched by its exact word form.
- Fix a bug in the output column names and the number of output columns when the `save_interim` argument is `TRUE`.
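The difference between substring matching and tokenise-then-match can be sketched in base R as follows. The sentences are invented, and the tokenisation rule (splitting on non-letters) is an assumption that may differ from what the package does.

```r
sentences <- c(
  "Dia ingin membeli rumah baru.",
  "Mereka membelikan adiknya buku."
)
node <- "membeli"

# Substring search also matches "membelikan", which is a different word form.
substring_hits <- grepl(node, tolower(sentences), fixed = TRUE)

# Tokenise first, then compare whole tokens against the node word.
tokens <- strsplit(tolower(sentences), "[^[:alpha:]]+")
exact_hits <- vapply(tokens, function(tok) node %in% tok, logical(1))
```

Here only the first sentence counts as an exact match, whereas the substring search flags both.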
Development
- Increase the test coverage of the code
- Add lifecycle and repo status badges, including the AppVeyor build badge
Next release
- Add log-likelihood as an alternative association measure
- Add Multiple Distinctive Collexeme Analysis (MDCA) as an association measure for contrasting more than two near-synonymous node words. MDCA uses a one-tailed exact binomial test to determine the distinctive collocates of a node word in comparison to its near-synonyms.
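The binomial test behind MDCA can be sketched with base R's `binom.test()`. The counts below are hypothetical, and this only tests for attraction (`alternative = "greater"`); it is an illustration of the idea, not the planned implementation.

```r
# Hypothetical frequencies of one collocate with three near-synonymous nodes.
collocate_freq <- c(membeli = 40, menyewa = 10, menjual = 15)
# Hypothetical overall frequencies of the three nodes.
node_totals <- c(membeli = 1000, menyewa = 800, menjual = 1200)

# Expected share of the collocate for each node if it were not distinctive.
p_expected <- node_totals / sum(node_totals)
n_collocate <- sum(collocate_freq)

# One-tailed exact binomial test per node: is the collocate over-represented?
p_values <- vapply(names(collocate_freq), function(node) {
  binom.test(
    x = collocate_freq[[node]],
    n = n_collocate,
    p = p_expected[[node]],
    alternative = "greater"
  )$p.value
}, numeric(1))
```

A small p-value for a node suggests the collocate is distinctive for that node relative to the other two.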
- R
Published by gederajeg over 7 years ago
collogetr - Minor update on LICENSE and Website
This is a minor update that changes the license from GPL-2 to MIT and sets up a GitHub web page for the package. No functions were added, but test coverage for the existing functions has increased.
- R
Published by gederajeg over 7 years ago
collogetr - collogetr 1.0.0
Breaking changes
Existing functions
- `colloc_leipzig()`
  - A feature to search collocates for multiple node words in one go. These words have to be combined into a character vector (e.g., `c("membeli", "menjual")`).
  - Additional output of (i) the `sentence-match` in which the collocates and the node word(s) are found, and (ii) window `span` information of the collocates in relation to the node word(s) (e.g., `r1` for collocates occurring one word to the right of the node).
- `assoc_prepare()`
  - Allows processing the input frequency data per corpus or combined across all corpus files.
  - Allows selecting a given collocate `span` to focus on for the association measure.
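To make the window-span labels concrete, here is a small base R sketch that labels collocates around a node word. The `r1` label follows the notes above; the left-hand labels (`l1`, `l2`), the window size, and the output layout are assumptions for illustration only.

```r
tokens <- c("dia", "ingin", "membeli", "rumah", "baru")
node <- "membeli"
window <- 2  # hypothetical window size on each side

node_pos <- which(tokens == node)        # position of the node word
offsets <- setdiff(-window:window, 0)    # -2, -1, 1, 2
positions <- node_pos + offsets
keep <- positions >= 1 & positions <= length(tokens)

collocates <- data.frame(
  collocate = tokens[positions[keep]],
  # Negative offsets are left of the node, positive offsets are right of it.
  span = ifelse(
    offsets[keep] < 0,
    paste0("l", abs(offsets[keep])),
    paste0("r", offsets[keep])
  ),
  stringsAsFactors = FALSE
)
```

Filtering this table to, say, `span == "r1"` corresponds to focusing on one collocate span for the association measure.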
New functions
- `assoc_prepare_dca()`
  - The function to generate the required input data for performing Distinctive Collexeme/Collocates Analysis (DCA). It takes the output of `assoc_prepare()`, which in turn is fed with the output of `colloc_leipzig()`.
- `collex_fye_dca()`
  - The function to perform DCA using the one-tailed Fisher-Yates' Exact (FYE) test. It requires the output of `assoc_prepare_dca()`.
- `dca_top_collex()`
  - The function to extract the top-n distinctive collocates/collexemes for a given word/construction.
- `collex_chisq()`
  - The function to compute an association measure using the Chi-square statistic.
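The statistics underlying these functions can be sketched with base R's `fisher.test()` and `chisq.test()` on a single 2x2 table. The counts are hypothetical, and the package functions may prepare and report their results differently.

```r
# Hypothetical counts of the collocate "rumah" versus all other collocates
# for two contrasted node words.
tab <- matrix(
  c(40, 960,
    15, 1185),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    node = c("membeli", "menjual"),
    collocate = c("rumah", "other")
  )
)

# One-tailed Fisher-Yates exact test: is "rumah" over-represented with "membeli"?
fye <- fisher.test(tab, alternative = "greater")
fye_p <- fye$p.value
fye_odds <- fye$estimate  # conditional maximum-likelihood odds ratio

# Chi-square statistic for the same table, without continuity correction.
chi <- chisq.test(tab, correct = FALSE)
chi_stat <- chi$statistic
```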
Future developments
- The next iteration of the package will include:
  - Other kinds of association measures commonly used in collocational studies, such as Mutual Information and Log-likelihood, and the inclusion of the odds ratio from the FYE test.
  - Another function to retrieve collocates from different corpus types (e.g. from a corpus that is not parsed/split according to sentences as in the Indonesian Leipzig Corpora).
- R
Published by gederajeg over 7 years ago
collogetr - First release
The package contains one function called `colloc_leipzig()` to retrieve window-span collocates from the Indonesian Leipzig Corpora. The function can currently search for only one word at a time, and it is slow because it tokenises the corpus as part of each search. So, to search for words X and Y in corpus C, two search calls are required, and corpus C needs to be tokenised in each of them.
The package also contains a function, `assoc_prepare()`, that prepares an input table for computing an association measure in collocational analysis using the Fisher-Yates Exact test (`collex_fye()`).
The next release will extend `colloc_leipzig()` to support multiple-pattern search and a more efficient procedure.
- R
Published by gederajeg over 7 years ago