glossarywho

Tidy data from the WHO Glossary (https://www.who.int/publications/i/item/9789240105485)

https://github.com/openwashdata/glossarywho

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.7%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 2
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.Rmd

---
output: github_document
always_allow_html: true
editor_options: 
  markdown: 
    wrap: 72
  chunk_output_type: console
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  message = FALSE,
  warning = FALSE,
  fig.retina = 2,
  fig.align = 'center'
)
```

# glossarywho



[![License: CC BY
4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)



The goal of glossarywho is to provide data from the [WHO Glossary](https://www.who.int/publications/i/item/9789240105485) in a tidy format. Access definitions by themes [here](https://openwashdata.github.io/glossarywho/articles/Themes.html) or use the search bar at the top of the page.

## Installation

You can install the development version of glossarywho from
[GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("openwashdata/glossarywho")
```

```{r}
## Run the following code in console if you don't have the packages
## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra"))
library(dplyr)
library(knitr)
library(readr)
library(stringr)
library(gt)
library(kableExtra)
```

Alternatively, you can download the individual datasets as a CSV or XLSX
file from the table below.

1.  Click Download CSV. A window opens that displays the CSV in
    your browser.
2.  Right-click anywhere inside the window and select "Save Page As...".
3.  Save the file in a folder of your choice.

```{r, echo=FALSE, message=FALSE, warning=FALSE}

extdata_path <- "https://github.com/openwashdata/glossarywho/raw/main/inst/extdata/"

read_csv("data-raw/dictionary.csv") |> 
  distinct(file_name) |> 
  # Escape the dot and anchor at the end so only the ".rda" extension is removed
  dplyr::mutate(file_name = str_remove(file_name, "\\.rda$")) |> 
  dplyr::rename(dataset = file_name) |> 
  mutate(
    CSV = paste0("[Download CSV](", extdata_path, dataset, ".csv)"),
    XLSX = paste0("[Download XLSX](", extdata_path, dataset, ".xlsx)")
  ) |> 
  knitr::kable()

```
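The download links can also be used from R without going through the browser. The following is a minimal sketch, assuming the exported files follow the `<dataset>.csv` naming used to build the table above; the helper names `glossary_csv_url()` and `read_glossary_csv()` are hypothetical and not part of the package.

```r
# Base URL of the exported files, as used to build the download table
extdata_path <- "https://github.com/openwashdata/glossarywho/raw/main/inst/extdata/"

# Hypothetical helper: build the download URL for one dataset
glossary_csv_url <- function(dataset) {
  paste0(extdata_path, dataset, ".csv")
}

# Hypothetical helper: read one dataset straight from GitHub
# (needs the readr package and an internet connection when called)
read_glossary_csv <- function(dataset) {
  readr::read_csv(glossary_csv_url(dataset))
}

csv_url <- glossary_csv_url("definitions")
csv_url

# Example call, left commented out because it downloads data:
# definitions_csv <- read_glossary_csv("definitions")
```

Reading from the URL avoids the manual "Save Page As..." step, at the cost of needing a connection every time the code runs.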

## Data

The package provides access to glossary terms, definitions, and thematic areas from the WHO Glossary. It contains two datasets: `definitions` and `themes`.

```{r}
library(glossarywho)
```

### definitions

The dataset `definitions` contains data about definitions from the WHO glossary. It has
`r nrow(definitions)` observations and `r ncol(definitions)`
variables.

```{r}
definitions |> 
  head(3) |> 
  gt::gt() |>
  gt::as_raw_html()
```
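Looking up terms by keyword is a common use of a glossary. The sketch below shows one way to do this in base R. It assumes the definition text is stored in a column named `Description`, as used in the word cloud example in this README; the helper `search_definitions()` and the toy data frame are hypothetical and only illustrate the call.

```r
# Hypothetical helper: keep the rows whose text column mentions a keyword.
# The match is case-insensitive; `keyword` is treated as a regular expression.
search_definitions <- function(data, keyword, column = "Description") {
  hits <- grepl(keyword, data[[column]], ignore.case = TRUE)
  data[hits, , drop = FALSE]
}

# Toy stand-in data (not taken from the WHO Glossary), used only to show the call
toy <- data.frame(
  Term = c("term a", "term b"),
  Description = c("Relates to drinking water.", "Relates to air quality."),
  stringsAsFactors = FALSE
)

search_definitions(toy, "water")

# With the package loaded, the same call would be:
# search_definitions(glossarywho::definitions, "water")
```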

### themes

The dataset `themes` contains data about thematic areas from the WHO glossary. It has
`r nrow(themes)` observations and `r ncol(themes)`
variables.

```{r}
themes |> 
  head(3) |> 
  gt::gt() |>
  gt::as_raw_html()
```


For an overview of the variable names in `definitions`, see the following table.

```{r echo=FALSE, message=FALSE, warning=FALSE}
readr::read_csv("data-raw/dictionary.csv") |>
  dplyr::filter(file_name == "definitions.rda") |>
  dplyr::select(variable_name:description) |> 
  knitr::kable() |> 
  kableExtra::kable_styling("striped") |> 
  kableExtra::scroll_box(height = "200px")
```

## Example

```{r}
library(glossarywho)
library(dplyr)    # count()
library(forcats)  # fct_reorder()
library(ggplot2)
# Plot a bar chart of count of definitions by thematic areas 
themes |> 
  count(`Thematic Area`) |> 
  ggplot2::ggplot(aes(x = fct_reorder(`Thematic Area`, n), y = n)) +
  geom_col(fill = "skyblue") +
  coord_flip() +
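  # coord_flip() swaps the displayed axes, but labs() still refers to the
  # original aesthetics: x is the thematic area and y is the count
  labs(title = "Count of definitions by thematic areas",
       x = "Thematic area",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 8)) +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank())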

```

```{r}
# Wordcloud of most common words from definitions
library(wordcloud)
library(tm)

# Create a corpus
corpus <- Corpus(VectorSource(definitions$Description))

# Clean the corpus
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)

# Create a document term matrix
dtm <- DocumentTermMatrix(corpus)

# Sort the word frequencies once so that words and freq stay aligned
word_freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)

# Create a wordcloud
wordcloud(words = names(word_freq),
          freq = word_freq,
          min.freq = 1,
          max.words = 100,
          random.order = FALSE,
          colors = brewer.pal(8, "Dark2"))
```
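A word cloud shows relative frequency but not the counts themselves. As a rough companion, the sketch below tabulates the most frequent words with base R only. It is a simplified stand-in for the `tm` pipeline above, not a reproduction of it: it lowercases the text and splits on non-letters, but does not remove stop words. The helper name `top_words()` is hypothetical.

```r
# Hypothetical helper: return the n most frequent words in a character vector
top_words <- function(text, n = 10) {
  # Lowercase, then split on any run of characters that are not a-z
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  # Drop empty strings and missing values left over from the split
  words <- words[!is.na(words) & nchar(words) > 0]
  head(sort(table(words), decreasing = TRUE), n)
}

# Toy input, used only to show the call
top_words(c("Water is water", "Air"), n = 3)

# With the package loaded, the same call would be:
# top_words(glossarywho::definitions$Description)
```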


## License

Data are available as
[CC-BY](https://github.com/openwashdata/glossarywho/blob/main/LICENSE.md).

## Citation

Please cite this package using:

```{r}
citation("glossarywho")
```

Owner

  • Name: openwashdata
  • Login: openwashdata
  • Kind: organization

Citation (CITATION.cff)

# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "glossarywho" in publications use:'
type: software
license: CC-BY-4.0
title: 'glossarywho: WHO Glossary'
version: 0.1.0
doi: 10.5281/zenodo.14754017
abstract: This package provides access to a tidy version of the WHO Glossary (https://www.who.int/publications/i/item/9789240105485)
  and thematic areas for each term.
authors:
- family-names: Dubey
  given-names: Yash
  email: ydubey@ethz.ch
  orcid: https://orcid.org/0009-0001-2849-970X
repository-code: https://github.com/openwashdata/glossarywho
url: https://github.com/openwashdata/glossarywho
date-released: '2025-01-28'
contact:
- family-names: Dubey
  given-names: Yash
  email: ydubey@ethz.ch
  orcid: https://orcid.org/0009-0001-2849-970X

GitHub Events

Total
  • Create event: 1
  • Issues event: 1
  • Release event: 1
  • Public event: 1
  • Push event: 3
Last Year
  • Create event: 1
  • Issues event: 1
  • Release event: 1
  • Public event: 1
  • Push event: 3

Dependencies

.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v4 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • R >= 3.5 depends