collogetr

Collocates retriever and Collocation association measure

https://github.com/gederajeg/collogetr

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.0%) to scientific vocabulary

Keywords

association-measures collexeme-analysis collocates-retriever collocation-extraction collostructional-analysis corpus corpus-linguistics dplyr indonesian-language leipzig leipzig-corpora-collection leipzig-corpus-files purrr stringr tidyr tidyverse
Last synced: 6 months ago

Repository

Collocates retriever and Collocation association measure

Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 6
Topics
association-measures collexeme-analysis collocates-retriever collocation-extraction collostructional-analysis corpus corpus-linguistics dplyr indonesian-language leipzig leipzig-corpora-collection leipzig-corpus-files purrr stringr tidyr tidyverse
Created almost 8 years ago · Last pushed almost 6 years ago
Metadata Files
Readme License Citation

README.Rmd

---
output: 
    github_document:
      html_preview: FALSE
link-citations: TRUE
bibliography: 'citation_README.bib'
csl: 'apa-single-spaced.csl'
---



[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active) [![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing) [![Travis-CI Build Status](https://travis-ci.org/gederajeg/collogetr.svg?branch=master)](https://travis-ci.org/gederajeg/collogetr) [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/gederajeg/collogetr?branch=master&svg=true)](https://ci.appveyor.com/project/gederajeg/collogetr) [![Coverage Status](https://img.shields.io/codecov/c/github/gederajeg/collogetr/master.svg)](https://codecov.io/github/gederajeg/collogetr?branch=master) [![DOI](https://img.shields.io/badge/doi-10.26180%2F5b7b9c5e32779-blue.svg)](http://dx.doi.org/10.26180/5b7b9c5e32779)


```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
scaps <- function(concept) {
  # wrap text in an HTML span that renders it in small caps
  paste("<span style='font-variant:small-caps;'>", concept, "</span>", sep = "")
}
```

# collogetr

## Overview

collogetr performs (i) collocate retrieval (currently from the sentence-based (Indonesian) [Leipzig Corpora](http://wortschatz.uni-leipzig.de/en/download)) and (ii) computation of collocational association measures. The function `colloc_leipzig()` retrieves window-span collocates for a set of word forms (viz. the *node word(s)* or *keyword(s)*).

Two other functions, `assoc_prepare()` and `assoc_prepare_dca()`, process the output of `colloc_leipzig()` into tabular/data-frame format, which then becomes the input for computing the **association measure** between the collocates and the node (as in Stefanowitsch and Gries' [-@stefanowitsch_collostructions_2003] [collostructional/collocation analysis](http://www.linguistics.ucsb.edu/faculty/stgries/teaching/groningen/index.html)) [@hoffmann_collostructional_2013; @stefanowitsch_corpora_2009; see also, @gries_more_2015]. The function `assoc_prepare()` generates the input data for [*Simple Collexeme/Collocational Analysis*](http://www.linguistics.ucsb.edu/faculty/stgries/research/2003_AS-STG_Collostructions_IJCL.pdf) (SCA), while `assoc_prepare_dca()` uses the output of `assoc_prepare()` to generate the input data for [*Distinctive Collexeme/Collocates Analysis*](http://www.linguistics.ucsb.edu/faculty/stgries/research/2004_STG-AS_ExtendingCollostructions_IJCL.pdf) (DCA) [@hilpert_distinctive_2006; @gries_extending_2004]. Based on the output of `assoc_prepare()`, SCA can then be computed with the default `collex_fye()`, which is based on the one-tailed Fisher-Yates Exact test. Other association measures available for SCA are (i) `collex_llr()` for the Log-Likelihood Ratio, (ii) `collex_MI()` for the Mutual Information score, (iii) `collex_TScore()` for the T-Score, (iv) `collex_chisq()` for the Chi-Square-based score, and (v) `collex_logOR()` for the Log~10~ Odds Ratio. DCA is computed with `collex_fye_dca()`.
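
For reference, here is a minimal sketch of the standard formulas behind two of the measures named above (Mutual Information and T-Score), assuming the usual textbook definitions rather than collogetr's exact internals; the counts are hypothetical:

```{r measure-sketch, eval = FALSE}
a <- 10        # observed co-occurrence frequency of node and collocate (hypothetical)
a_exp <- 2.5   # expected co-occurrence frequency (hypothetical)

mi <- log2(a / a_exp)             # Mutual Information score
t_score <- (a - a_exp) / sqrt(a)  # T-Score
```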

collogetr is built on top of the core packages in the [tidyverse](https://www.tidyverse.org).

## Installation

Install collogetr from GitHub with [devtools](https://github.com/hadley/devtools):

```{r eval = FALSE}
library(devtools)
install_github("gederajeg/collogetr")
```

## Usage

### Load collogetr
```{r load-collogetr, message = FALSE}
library(collogetr)
```

### Citation for collogetr

To cite collogetr in publications, run:

```{r cite-collogetr, eval = TRUE}
citation("collogetr")
```

### Package data

The package has three data sets for demonstration. The main one is `demo_corpus_leipzig`, whose documentation can be accessed via `?demo_corpus_leipzig`. The second is a list of Indonesian stopwords (i.e., `stopwords`) that can be filtered out when computing the collocational measures. The last one is `leipzig_corpus_path`, a character vector of full paths to the Leipzig Corpus files on my computer.

#### Accepted inputs

`colloc_leipzig()` accepts two types of corpus-input data:

1. A named list whose elements are character vectors, one per Leipzig Corpus file, as represented by `demo_corpus_leipzig`; its format is shown below:

```{r leipzig_list_input}
lapply(demo_corpus_leipzig[1:2], sample, 2)
```

2. Full paths to the Leipzig Corpus plain-text files, as in `leipzig_corpus_path`.

```{r leipzig_path_input}
leipzig_corpus_path[1:2]
```

In terms of the input strings for the `pattern` argument, `colloc_leipzig()` accepts three scenarios:

1. Plain string representing a whole word form, such as `"memberikan"` 'to give'

2. Regex of a whole word, such as `"^memberikan$"` 'to give'

3. Regex of a whole word with word boundary character (`\\b`), such as `"\\bmemberikan\\b"`.

All three forms are used to match the exact word form of the search pattern after the corpus file is tokenised into individual words. That is, input patterns following scenario 1 or 3 are turned into the exact search pattern of scenario 2 (i.e., with the beginning- and end-of-string anchors, hence `"^...$"`), so users can directly supply the scenario-2 form to the `pattern` argument. If more than one word is to be searched, put them into a character vector (e.g., `c("^memberi$", "^membawa$")`).
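
A minimal sketch of this normalisation (illustrative only, not collogetr's internal code; the helper `normalise_pattern()` is hypothetical):

```{r pattern-normalise-sketch, eval = FALSE}
# reduce scenario-1 and scenario-3 inputs to the exact-match form of scenario 2
normalise_pattern <- function(pattern) {
  bare <- gsub("\\\\b|\\^|\\$", "", pattern)  # strip literal "\b" and the ^/$ anchors
  paste0("^", bare, "$")
}
normalise_pattern(c("memberikan", "^memberikan$", "\\bmemberikan\\b"))
#> [1] "^memberikan$" "^memberikan$" "^memberikan$"
```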

### Demo

#### Retrieving the collocates

The code below shows how one may retrieve the collocates of the Indonesian verb *mengatakan* 'to say sth.'. `colloc_leipzig()` prints progress messages for each stage to the console, and it generates warning(s) when a search pattern or node word is not found in a given corpus file, or in all loaded corpus files.

```{r example, eval = TRUE, message = FALSE, warning = FALSE}
out <- colloc_leipzig(leipzig_corpus_list = demo_corpus_leipzig,
                       pattern = "mengatakan",
                       window = "r",
                       span = 1L,
                       save_interim = FALSE)
```

In the example above, the collocates are restricted to those occurring _one_ word (i.e., `span = 1L`) to the _right_ (`window = "r"`) of *mengatakan* 'to say'. The `"r"` character in `window` stands for *right*-side collocates (`"l"` for *left*-side collocates and `"b"` for *both* right- and left-side collocates). The `span` argument requires an integer (i.e., a whole number) indicating the range of words covered in the specified window. The `pattern` argument requires one or more exact word forms; if more than one, put them into a character vector (e.g., `c("mengatakan", "menjanjikan")`).

Setting `save_interim = FALSE` means that no output is saved to the computer; the results are only available in the console (i.e., in the `out` object). If `save_interim = TRUE`, the function saves the outputs into files on the computer. `colloc_leipzig()` has default file names for these outputs, set via the arguments (i) `freqlist_output_file`, (ii) `colloc_output_file`, (iii) `corpussize_output_file`, and (iv) `search_pattern_output_file`. It is recommended to store the output filenames in a character vector. See **Examples** "(2)" in the documentation of `colloc_leipzig()` for a call with `save_interim = TRUE`; a sketch is also given below.
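
For illustration, here is a sketch (not run) of such a call, using the output-file arguments named above; the file names are purely illustrative:

```{r save-interim-sketch, eval = FALSE}
out <- colloc_leipzig(leipzig_corpus_list = demo_corpus_leipzig,
                      pattern = "mengatakan",
                      window = "r",
                      span = 1L,
                      save_interim = TRUE,  # write interim outputs to disk
                      freqlist_output_file = "wordlist.txt",
                      colloc_output_file = "collocates.txt",
                      corpussize_output_file = "corpus_size.txt",
                      search_pattern_output_file = "search_pattern.txt")
```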

#### Exploring the output of `colloc_leipzig()`

The output of `colloc_leipzig()` is a list of `r length(out)` elements:

1. `colloc_df`; a table/tibble of the raw collocate data with columns for:
    1. the corpus names
    2. the sentence IDs in which the collocates and the node word(s) are found
    3. the collocates (column `w`)
    4. the span information (e.g., `"r1"` for one-word, right-side collocates)
    5. the node word
    6. the text/sentence match in which the collocates and the node are found
2. `freqlist_df`; a table/tibble of the word-frequency list in the loaded corpus
3. `corpussize_df`; a table/tibble of the total word-tokens in the loaded corpus
4. `pattern`; a character vector of the search pattern(s)/node word(s)

```{r colloc_output}
str(out)
```

The `freqlist_df` and `corpussize_df` are needed for computing the collocational strength between the search pattern and its collocates.

#### Preparing input data for *Simple Collexeme/Collocational Analysis* (SCA)

First, we need to call `assoc_prepare()` to generate the input data for SCA. The demo illustrates this with the in-console output of `colloc_leipzig()`. See **Examples** "2.2" in the documentation of `assoc_prepare()` for handling saved outputs (`?assoc_prepare`).

```{r assoc-prepare}
assoc_tb <- assoc_prepare(colloc_out = out, 
                          window_span = "r1",
                          per_corpus = FALSE, # combine all data across corpus
                          stopword_list = collogetr::stopwords,
                          float_digits = 3L)
```

Inspect the output of `assoc_prepare()`:

```{r assoc-prepare-head}
head(assoc_tb)
```

The `assoc_prepare()` and `collex_fye()` functions are designed following the tidy principle, so that the association/collocation measure is performed in a row-wise fashion, benefiting from the combination of a [*nested* column](http://r4ds.had.co.nz/many-models.html#list-columns-1) [cf., @wickham_r_2017, p. 409] for the input data (using `tidyr::nest()`) and `purrr`'s `map_*` functions. `assoc_prepare()` also calculates the expected co-occurrence frequencies between the collocates/collexemes and the node word/construction.
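
As a generic illustration of this nest-and-map pattern (using a built-in data set, not collogetr's internals):

```{r nest-map-sketch, eval = FALSE}
library(dplyr)
library(tidyr)
library(purrr)

# nest the rows of each group into a list-column of tibbles,
# then map a function over each nested tibble row-wise
mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(mean_mpg = map_dbl(data, ~ mean(.x$mpg)))
```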

The column `data` in `assoc_tb` above consists of nested tibbles stored as a list. Each tibble contains the data required to perform the association measure for the collocate in column `w` [@gries_more_2015; @gries_50-something_2013; @stefanowitsch_corpora_2009; @stefanowitsch_collostructions_2003]. This nested column can be inspected as follows (for the first row, namely the word *pihaknya* 'the party').

```{r inspect-nested-column}
# get the tibble in the `data` column for the first row
assoc_tb$data[[1]]
```

Column `a` indicates the co-occurrence frequency between the node word and the collocate in column `w`, while `a_exp` indicates their *expected co-occurrence frequency*. `n_w_in_corp` represents the total token/occurrence frequency of a given collocate, and `n_pattern` stores the total token/occurrence frequency of the node word in the corpus. Columns `b`, `c`, and `d` are required for association measures that are essentially based on a 2-by-2 cross-tabulation table. The `assoc` column indicates whether the value in `a` is higher than that in `a_exp`, signalling *attraction* or *positive association* between the node word and the collocate; the reverse, *repulsion* or *negative association*, holds when the value in `a` is lower than that in `a_exp`.
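
For reference, a minimal sketch of the standard 2-by-2 expected-frequency computation that `a_exp` reflects, assuming the usual contingency-table definition; the cell counts are hypothetical:

```{r a-exp-sketch, eval = FALSE}
obs <- c(a = 10, b = 190, c = 140, d = 9660)  # hypothetical 2-by-2 cell counts
n <- sum(obs)                                 # total tokens (all four cells)
a_exp <- (obs[["a"]] + obs[["b"]]) * (obs[["a"]] + obs[["c"]]) / n
a_exp
```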

#### *Simple Collexeme/Collocates Analysis* (SCA)

As in *Collostructional Analysis* [@stefanowitsch_collostructions_2003], `collex_fye()` uses a one-tailed *Fisher-Yates Exact* test whose *p*~FisherExact~-value is log-transformed to the base of 10 to indicate the collostruction strength between the collocates and the node word [@gries_converging_2005]. `collex_fye()` simultaneously performs the two uni-directional measures of *Delta P* [@gries_50-something_2013; @gries_more_2015, p. 524]: one shows the extent to which the presence of the node word cues the presence of the collocate/collexeme; the other, the extent to which the collocate/collexeme cues the presence of the node word.
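
To make this concrete, here is a base-R sketch of the computation just described, assuming it corresponds to what `collex_fye()` does internally; the cell counts are hypothetical:

```{r fye-sketch, eval = FALSE}
m <- matrix(c(10, 190, 140, 9660), nrow = 2, byrow = TRUE)  # hypothetical 2-by-2 table
p_fye <- fisher.test(m, alternative = "greater")$p.value    # one-tailed FYE p-value
collstr <- -log10(p_fye)                                    # collostruction strength

# the two uni-directional Delta P measures, under the standard definition
dp_node_cues_coll <- m[1, 1] / sum(m[1, ]) - m[2, 1] / sum(m[2, ])
dp_coll_cues_node <- m[1, 1] / sum(m[, 1]) - m[1, 2] / sum(m[, 2])
```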

Here is the code to perform SCA using `collex_fye()`:

```{r perform-fye}
# perform FYE test for Collexeme Analysis
am_fye <- collex_fye(df = assoc_tb, collstr_digit = 3)
```

Now we can retrieve the top-10 collocates most strongly attracted to *mengatakan* 'to say sth.'. The association strength is shown in the `collstr` column, which stands for *collostruction strength*: the higher the value, the stronger the association.

```{r attracted-collocates}
# get the top-10 most strongly attracted collocates
dplyr::top_n(am_fye, 10, collstr)
```

Column `a` contains the co-occurrence frequency of each collocate (`w`) with the `node` as its R1 collocate in the demo corpus. `p_fye` shows the one-tailed *p*~FisherExact~-value.

#### *Distinctive Collexeme/Collocate Analysis* (DCA)

The idea of distinctive collexemes/collocates is to contrast *two* functionally/semantically similar constructions or words in terms of the collocates that are (significantly) more frequent with one of the two contrasted constructions/words [@hilpert_distinctive_2006; see @gries_extending_2004]. `colloc_leipzig()` can retrieve the collocates of *two* such words by specifying the `pattern` argument with a character vector of the two words.

The following example uses one of the Leipzig corpus files (not included in the package, but freely downloadable from the Leipzig Corpora webpage), namely `"ind_mixed_2012_1M-sentences"`. The aim is to contrast the collocational preferences of two deadjectival transitive verbs based on the root *kuat* 'strong', framed within two causative morphological schemas: one with *per-* + ADJ and the other with ADJ + *-kan*. Theoretically, the *per-* schema indicates that the direct object of the verb is caused to have *more* of the characteristic indicated by the adjectival root, while the *-kan* schema indicates that the direct object is caused to have a characteristic it did not previously have. The focus here is on the R1 collocates of the verbs (i.e., the one word immediately to the right of each verb in the sentences).

```{r dca-data-retrieval, message = FALSE, warning = FALSE, eval = FALSE}
my_leipzig_path <- collogetr::leipzig_corpus_path[1]
dca_coll <- collogetr::colloc_leipzig(leipzig_path = my_leipzig_path, 
                                     pattern = c("memperkuat", "menguatkan"),
                                     window = "r",
                                     span = 1,
                                     save_interim = FALSE)
```

Then, we prepare the output into the format required for performing DCA with `collex_fye_dca()`.

```{r load-dca-coll, echo = FALSE}
dca_coll <- collogetr::dca_coll
```


```{r prepare-dca-input}
assoc_tb <- assoc_prepare(colloc_out = dca_coll,
                          window_span = "r1",
                          per_corpus = FALSE,
                          stopword_list = collogetr::stopwords,
                          float_digits = 3L)

# prepare the dca input table
dca_tb <- assoc_prepare_dca(assoc_tb)
```

Compute DCA for the two verbs and view a snippet of the results.

```{r dca-compute}
dca_res <- collex_fye_dca(dca_tb)
head(dca_res, 10)
```

The package also includes a function called `dca_top_collex()` to retrieve the top-*n* distinctive collocates of one of the two contrasted words. The `dist_for` argument can be given either the name of the contrasted word as a character vector, or the character ID of the construction/word: `dist_for = "a"` (or `"A"`) for the construction/word appearing in the second column of the output of `collex_fye_dca()`, and `dist_for = "b"` (or `"B"`) for the one appearing in the third column.

```{r top-collex-a}
# retrieve distinctive collocates for Construction A (i.e., memperkuat)
dist_for_a <- dca_top_collex(dca_res, dist_for = "memperkuat", top_n = 10)
head(dist_for_a)
```

The code below retrieves the distinctive collocates of *menguatkan* 'to strengthen', or Construction B.

```{r top-collex-b}
# retrieve distinctive collocates for Construction B (i.e., menguatkan)
dist_for_b <- dca_top_collex(dca_res, dist_for = "menguatkan", top_n = 10)
head(dist_for_b)
```


### Session info
```{r sessinfo}
devtools::session_info()
```


### References

Owner

  • Name: Gede Primahadi Wijaya Rajeg
  • Login: gederajeg
  • Kind: user
  • Location: Bali, Indonesia
  • Company: Universitas Udayana

I am interested in cognitive linguistics, corpus linguistics, and construction grammar. I use R for all things data science in linguistics.

Citation (citation_README.bib)

@incollection{hoffmann_collostructional_2013,
  location = {{Oxford}},
  title = {Collostructional Analysis},
  isbn = {978-0-19-539668-3},
  booktitle = {The {{Oxford}} Handbook of {{Construction Grammar}}},
  series = {Oxford Handbooks Online},
  publisher = {{Oxford University Press}},
  author = {Stefanowitsch, Anatol},
  editor = {Hoffmann, Thomas and Trousdale, Graeme},
  date = {2013},
  pages = {290--306},
  keywords = {cognitive grammar,cognitive linguistics,construction grammar},
  doi = {10.1093/oxfordhb/9780195396683.013.0016}
}

@article{hilpert_distinctive_2006,
  title = {Distinctive Collexeme Analysis and Diachrony},
  volume = {2},
  number = {2},
  journaltitle = {Corpus Linguistics and Linguistic Theory},
  author = {Hilpert, Martin},
  date = {2006},
  pages = {243--256},
  keywords = {collocation,collostruction,construction grammar,corpus linguistics,diachronic linguistics,quantitative corpus linguistics}
}

@article{gries_extending_2004,
  title = {Extending Collostructional Analysis: {{A}} Corpus-Based Perspective on 'Alternations'},
  volume = {9},
  number = {1},
  journaltitle = {International Journal of Corpus Linguistics},
  date = {2004},
  pages = {97-129},
  author = {Gries, Stefan Th. and Stefanowitsch, Anatol},
}

@article{stefanowitsch_collostructions_2003,
  title = {Collostructions: {{Investigating}} the Interaction of Words and Constructions},
  volume = {8},
  number = {2},
  journaltitle = {International Journal of Corpus Linguistics},
  date = {2003},
  pages = {209-243},
  author = {Stefanowitsch, Anatol and Gries, Stefan Th.},
}

@incollection{stefanowitsch_corpora_2009,
  location = {{Berlin}},
  title = {Corpora and Grammar},
  volume = {2},
  booktitle = {Corpus Linguistics: {{An}} International Handbook},
  publisher = {{Mouton de Gruyter}},
  date = {2009},
  pages = {933-951},
  author = {Stefanowitsch, Anatol and Gries, Stefan Th.},
  editor = {Lüdeling, Anke and Kytö, Merja}
}

@article{gries_more_2015,
  title = {More (Old and New) Misunderstandings of Collostructional Analysis: {{On Schmid}} and {{Küchenhoff}} (2013)},
  volume = {26},
  issn = {09365907},
  doi = {10.1515/cog-2014-0092},
  shorttitle = {More (Old and New) Misunderstandings of Collostructional Analysis},
  number = {3},
  journaltitle = {Cognitive Linguistics},
  shortjournal = {Cognitive Linguistics},
  date = {2015-08},
  pages = {505-536},
  keywords = {LINGUISTIC analysis (Linguistics),PSYCHOLINGUISTICS,SEMANTICS,construction grammar,FISHER exact test,association measures,attraction and reliance,collexeme analysis,Fisher-Yates exact test,collostructional analysis,Construction grammar,Semantics},
  author = {Gries, Stefan Th.},
}

@article{gries_50-something_2013,
  langid = {english},
  title = {50-Something Years of Work on Collocations: {{What}} Is or Should Be next …},
  volume = {18},
  issn = {1384-6655, 1569-9811},
  doi = {10.1075/ijcl.18.1.09gri},
  shorttitle = {50-Something Years of Work on Collocations},
  number = {1},
  journaltitle = {International Journal of Corpus Linguistics},
  date = {2013-05-23},
  pages = {137-166},
  author = {Gries, Stefan Th.}
}

@article{gries_converging_2005,
  title = {Converging Evidence: Bringing Together Experimental and Corpus Data on the Association of Verbs and Constructions},
  volume = {16},
  number = {4},
  journaltitle = {Cognitive Linguistics},
  date = {2005},
  pages = {635-676},
  author = {Gries, Stefan Th. and Hampe, Beate and Schönefeld, Doris},
}

@book{wickham_r_2017,
  location = {{Canada}},
  title = {R for {{Data Science}}},
  url = {http://r4ds.had.co.nz/},
  publisher = {{O'Reilly}},
  urldate = {2017-03-07},
  date = {2017},
  author = {Wickham, Hadley and Grolemund, Garrett},
}

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

DESCRIPTION cran
  • R >= 3.6 depends
  • assertthat >= 0.2.1 imports
  • dplyr >= 0.8.5 imports
  • lifecycle >= 0.2.0 imports
  • purrr >= 0.3.3 imports
  • readr >= 1.3.1 imports
  • rlang >= 0.4.3 imports
  • stringi >= 1.4.5 imports
  • stringr >= 1.4.0 imports
  • tibble >= 2.1.3 imports
  • tidyr >= 1.0.2 imports
  • covr * suggests
  • testthat >= 2.1.0 suggests