sehrnett

📖 A Very Nice Interface To Princeton's WordNet

https://github.com/chainsawriot/sehrnett

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ○
    .zenodo.json file
  • ○
    DOI references
  • ○
    Academic publication links
  • ○
    Committers with academic emails
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (18.3%) to scientific vocabulary

Keywords

r r-package rstats wordnet
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: chainsawriot
  • License: gpl-3.0
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 188 KB
Statistics
  • Stars: 6
  • Watchers: 2
  • Forks: 1
  • Open Issues: 1
  • Releases: 1
Topics
r r-package rstats wordnet
Created over 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
devtools::load_all()
```

# sehrnett 


[![R-CMD-check](https://github.com/chainsawriot/sehrnett/workflows/R-CMD-check/badge.svg)](https://github.com/chainsawriot/sehrnett/actions)
[![Codecov test coverage](https://codecov.io/gh/chainsawriot/sehrnett/branch/master/graph/badge.svg)](https://app.codecov.io/gh/chainsawriot/sehrnett?branch=master)
[![CRAN status](https://www.r-pkg.org/badges/version/sehrnett)](https://CRAN.R-project.org/package=sehrnett)
[![R-CMD-check](https://github.com/chainsawriot/sehrnett/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/chainsawriot/sehrnett/actions/workflows/R-CMD-check.yaml)


The goal of sehrnett is to provide a nice (and fast) interface to [Princeton's WordNet](https://wordnet.princeton.edu/). Unlike the original [wordnet package](https://cran.r-project.org/package=wordnet) (Feinerer et al., 2020), it does not require a local WordNet installation or an rJava setup.

The data is not included in the package. If it is not already available, please run `download_wordnet()` to download it (~100 MB zipped, ~400 MB unzipped) from the Internet. Please make sure you agree with the [WordNet License](https://wordnet.princeton.edu/license-and-commercial-use).

## Installation

``` r
devtools::install_github("chainsawriot/sehrnett")
```

## `get_lemmas`

The most basic function is `get_lemmas`. It returns basic information about the lemmas[^1] you provide.

```r
library(sehrnett)
```

```{r}
get_lemmas(c("very", "nice"))
```

```{r}
get_lemmas("nice")
```

```{r}
get_lemmas("nice", pos = "n")
```

Please note that some definitions in WordNet are considered pejorative or offensive, for example:

```{r}
get_lemmas("dog")
```

### Dot notation

The dot notation ("lemma.pos.sensenum") can be used to quickly look up a particular word sense. For example, searching for "king.n.10" pins down the sense of "king" as a chess piece.

```{r}
get_lemmas("king.n.10")
```
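To make the format concrete, here is a minimal base-R sketch of how such a string could be split into its three fields. `parse_dot_notation()` is a hypothetical helper written for illustration, not a `sehrnett` function. It assumes WordNet's one-letter POS codes (`n`, `v`, `a`, `s`, `r`) and an integer sense number. Matching the POS and sense number at the end of the string keeps lemmas that themselves contain dots intact.

```r
# Hypothetical helper (not part of sehrnett): split "lemma.pos.sensenum"
# into its fields. Assumes one-letter WordNet POS codes and integer senses.
parse_dot_notation <- function(x) {
  # POS and sense number are anchored at the end of the string,
  # so any dots before them stay inside the lemma.
  pattern <- "^(.+)\\.([nvasr])\\.([0-9]+)$"
  ok <- grepl(pattern, x)
  if (!all(ok)) {
    stop("Not in lemma.pos.sensenum format: ", paste(x[!ok], collapse = ", "))
  }
  data.frame(
    lemma = sub(pattern, "\\1", x),
    pos = sub(pattern, "\\2", x),
    sensenum = as.integer(sub(pattern, "\\3", x)),
    stringsAsFactors = FALSE
  )
}

parse_dot_notation(c("king.n.10", "nice.a.1"))
```

A string such as `"a.m.n.1"` is split into the lemma `"a.m"`, POS `"n"` and sense `1`.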

### Lemmatization

The [morphological processing](https://wordnet.princeton.edu/documentation/morphy7wn) of the original WordNet is partially implemented in `sehrnett`[^2]. Because WordNet's database contains information only about lemmas (e.g. *eat*), inflected variants (e.g. *ate*, *eaten*, *eating*) must be converted back to their lemmas before they can be queried. This process is known as [lemmatization](https://en.wikipedia.org/wiki/Lemmatisation).

To use the lemmatization in `sehrnett`, you need to provide exactly one `pos` and keep `lemmatize = TRUE` (the default).

```{r}
get_lemmas(c("ate", "ducking"), pos = "v")
```

```{r}
get_lemmas(c("loci", "lemmata", "boxesful"), pos = "n")
```

```{r}
get_lemmas(c("nicest", "stronger"), pos = "a")
```
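The general idea behind Morphy-style lemmatization can be sketched in a few lines of base R. Irregular forms are looked up in an exception list, and regular forms are handled by stripping a suffix, optionally adding an ending, and keeping only candidates that exist in the lexicon. This is a toy illustration, not `sehrnett`'s implementation: the tiny `lexicon` and `exceptions` objects are hypothetical stand-ins for WordNet's index and exception files, the suffix rules are modelled on the detachment rules in the Morphy documentation, and special cases such as *-ful* nouns (*boxesful*) and collocations are left out.

```r
# Toy sketch of Morphy-style lemmatization (not sehrnett's implementation).
# `lexicon` and `exceptions` are tiny hypothetical stand-ins for WordNet data.
lexicon <- list(
  n = c("locus", "lemma", "box", "dog"),
  v = c("eat", "duck"),
  a = c("nice", "strong")
)
exceptions <- list(
  n = c(loci = "locus", lemmata = "lemma"),
  v = c(ate = "eat", eaten = "eat"),
  a = character(0)
)
# Detachment rules per POS: replace `suffix` with `ending`.
rules <- list(
  n = list(suffix = c("s", "ses", "xes", "zes", "ches", "shes", "men", "ies"),
           ending = c("", "s", "x", "z", "ch", "sh", "man", "y")),
  v = list(suffix = c("s", "ies", "es", "es", "ed", "ed", "ing", "ing"),
           ending = c("", "y", "e", "", "e", "", "e", "")),
  a = list(suffix = c("er", "est", "er", "est"),
           ending = c("", "", "e", "e"))
)

lemmatize_one <- function(word, pos) {
  # 1. Irregular forms are resolved via the exception list.
  if (word %in% names(exceptions[[pos]])) {
    return(unname(exceptions[[pos]][[word]]))
  }
  # 2. A word that is already a lemma is returned unchanged.
  if (word %in% lexicon[[pos]]) {
    return(word)
  }
  # 3. Apply every matching rule and keep candidates found in the lexicon.
  r <- rules[[pos]]
  hits <- endsWith(word, r$suffix)
  if (!any(hits)) {
    return(NA_character_)
  }
  stems <- substring(word, 1, nchar(word) - nchar(r$suffix[hits]))
  candidates <- paste0(stems, r$ending[hits])
  found <- candidates[candidates %in% lexicon[[pos]]]
  if (length(found) == 0) NA_character_ else found[1]
}

# Vectorised wrapper: one lemma (or NA) per input word.
lemmatize <- function(words, pos) {
  vapply(words, lemmatize_one, character(1), pos = pos, USE.NAMES = FALSE)
}

lemmatize(c("ate", "ducking"), pos = "v")
lemmatize(c("nicest", "stronger"), pos = "a")
```

Checking candidates against the lexicon is what discards wrong guesses such as *ducke* or *nic*; a rule only "wins" if it produces a real lemma.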


## A practical example

Suppose you want to know the synonyms of the word "nuance" (very important for academic writing). First, look up the lemma "nuance" with `get_lemmas`.

```{r}
res <- get_lemmas("nuance")
res
```

A lemma can have multiple word senses, and you need to choose the one you want to convey; in this case, there is only one. You can then search with the `synsetid` (the identifier of a synset, i.e. a set of cognitive synonyms) of that word sense.

```{r}
# get_synonyms() is a wrapper around get_synsetids()
get_synsetids(res$synsetid[1])
```
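Searching by `synsetid` returns synonyms because a synset groups all lemmas that express the same sense: the synonyms of a word sense are simply the other lemmas sharing its `synsetid`. The toy sketch below illustrates this with a small hypothetical lemma-to-synset table (the ids are made up, and `synonyms_of()` is not a `sehrnett` function). It also shows why choosing the right sense matters, since the same lemma can belong to several synsets.

```r
# Toy lemma-to-synset table (illustrative only; the synsetids are made up).
senses <- data.frame(
  lemma = c("nuance", "nicety", "shade", "subtlety", "shade", "shadow"),
  synsetid = c(1L, 1L, 1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Synonyms of a word sense: the other lemmas in the same synset.
synonyms_of <- function(lemma, synsetid, table = senses) {
  same_synset <- table$synsetid == synsetid
  setdiff(table$lemma[same_synset], lemma)
}

synonyms_of("shade", 1L)
synonyms_of("shade", 2L)
```

The same lemma, *shade*, yields different synonyms depending on which synset (sense) is chosen.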

## Chainability

All `get_` functions can be chained with the magrittr pipe operator (`%>%`).

```{r}
"switch off" %>% get_lemmas(pos = "v") %>% get_synonyms()
```

## `get_outdegrees`

WordNet is indeed a network: synsets are connected to each other in a directed graph. Each node (a synsetid) can be linked to others by edges of different types, and each edge type is labelled with a `linkid`. You can list all available `linkid`s with `list_linktypes()`, and follow the outgoing edges of a given type with `get_outdegrees()`.

```{r}
list_linktypes()
```

```{r}
## all hypernyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_outdegrees(linkid = 1)
```

```{r}
## all hyponyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_outdegrees(linkid = 2)
```

```{r}
## all antonyms
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_outdegrees(linkid = 30)
```
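Conceptually, following the outgoing edges of one type is a filter on an edge list. The minimal sketch below uses a toy edge list with made-up synsetids. It borrows `linkid` 1 for hypernyms and 2 for hyponyms from the comments above, and `outdegrees()` is a hypothetical helper rather than `sehrnett`'s code.

```r
# Toy edge list of a directed, labelled graph (synsetids are made up).
# linkid 1 = hypernym, 2 = hyponym, as in the comments above.
links <- data.frame(
  from   = c(10L, 10L, 10L, 20L),
  to     = c(20L, 30L, 31L, 40L),
  linkid = c(1L, 2L, 2L, 1L)
)

# Return the targets of all outgoing edges of one type from the given nodes.
outdegrees <- function(synsetids, linkid, edges = links) {
  keep <- edges$from %in% synsetids & edges$linkid == linkid
  edges$to[keep]
}

outdegrees(10L, linkid = 1L)  # hypernym of node 10
outdegrees(10L, linkid = 2L)  # hyponyms of node 10
```

Because the result is again a set of synsetids, edges can be followed repeatedly: `outdegrees(outdegrees(10L, 1L), 1L)` walks two hypernym steps up the toy graph and returns `40L`.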

### Sugars

`sehrnett` provides several `get_` functions as syntactic sugar for common link types. For example:

```{r}
## all hyponyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_hyponyms()
```

```{r}
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_antonyms()
```

```{r}
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_derivatives()
```
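One common way to build such sugar in R is a function factory that fixes the `linkid` of a generic link-following function. The sketch below is a hypothetical illustration of that design, not `sehrnett`'s source: `follow_links()`, `make_link_getter()` and the toy edge list are invented, and only the `linkid` values 1, 2 and 30 are taken from the examples above.

```r
# Toy edge list (synsetids are made up; linkid 1 = hypernym, 2 = hyponym).
links <- data.frame(
  from   = c(10L, 10L, 10L),
  to     = c(20L, 30L, 31L),
  linkid = c(1L, 2L, 2L)
)

# Generic traversal: targets of outgoing edges with the given linkid.
follow_links <- function(synsetids, linkid, edges = links) {
  edges$to[edges$from %in% synsetids & edges$linkid == linkid]
}

# Factory: returns a function with the linkid baked in.
make_link_getter <- function(linkid) {
  force(linkid)
  function(synsetids) follow_links(synsetids, linkid = linkid)
}

hypernyms_of <- make_link_getter(1L)
hyponyms_of <- make_link_getter(2L)
antonyms_of <- make_link_getter(30L)

hyponyms_of(10L)
```

This keeps a single code path for graph traversal, while each sugar function only encodes which edge type it follows.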

---

[^1]: Yes, the plural of *lemma* can also be *lemmata*, you Latin-speaking people. 

[^2]: As in many other implementations (e.g. NLTK, Ruby's rwordnet and node-wordnet-magic), the morphological processing is only partial. Collocations and hyphenation are not supported. Therefore, please don't expect lemmatizing *asking for it* to yield *ask for it*, as described on WordNet's website.

Owner

  • Login: chainsawriot
  • Kind: user
  • Location: Germany
  • Company: @gesistsa

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 32
  • Total Committers: 1
  • Avg Commits per committer: 32.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 7
  • Committers: 1
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • chainsawriot (c****y@g****m): 32 commits

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 2
  • Total pull requests: 1
  • Average time to close issues: 1 day
  • Average time to close pull requests: 24 minutes
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • chainsawriot (2)
Pull Request Authors
  • chainsawriot (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 226 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
cran.r-project.org: sehrnett

A Very Nice Interface to 'WordNet'

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 226 Last month
Rankings
Forks count: 21.9%
Stargazers count: 22.5%
Dependent packages count: 29.8%
Average: 30.0%
Dependent repos count: 35.5%
Downloads: 40.2%
Maintainers (1)
Last synced: 9 months ago

Dependencies

DESCRIPTION cran
  • DBI * imports
  • RSQLite * imports
  • dplyr * imports
  • magrittr * imports
  • purrr * imports
  • tibble * imports
  • utils * imports
  • covr * suggests
  • testthat >= 3.0.0 suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite