sweater
sweater: Speedy Word Embedding Association Test and Extras Using R - Published in JOSS (2022)
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 6 DOI reference(s) in README
- ✓ Academic publication links: Links to joss.theoj.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (18.0%) to scientific vocabulary
Keywords
bias-detection
r
textanalysis
wordembedding
Scientific Fields
Mathematics
Computer Science - 38% confidence
Last synced: 6 months ago
Repository
👚 Speedy Word Embedding Association Test & Extras using R
Basic Info
Statistics
- Stars: 30
- Watchers: 3
- Forks: 6
- Open Issues: 6
- Releases: 2
Topics
bias-detection
r
textanalysis
wordembedding
Created about 5 years ago · Last pushed 6 months ago
Metadata Files
Readme
License
Code of conduct
Citation
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
set.seed(46709394)
devtools::load_all()
```
# sweater
[CRAN](https://CRAN.R-project.org/package=sweater)
[JOSS](https://doi.org/10.21105/joss.04036)
[R-CMD-check](https://github.com/gesistsa/sweater/actions/workflows/R-CMD-check.yaml)
The goal of sweater (**S**peedy **W**ord **E**mbedding **A**ssociation **T**est & **E**xtras using **R**) is to test for associations among words in word embedding spaces. The methods provided by this package can also be used to test for unwanted associations, or biases.
The package provides speedy functions: they are either implemented in C++ or are fast yet accurate approximations of the original implementation proposed by Caliskan et al. (2017). See the benchmark [here](https://github.com/gesistsa/sweater/blob/master/paper/benchmark.md).
This package provides extra methods such as Relative Norm Distance, Embedding Coherence Test, SemAxis and Relative Negative Sentiment Bias.
If your goal is to reproduce the analysis in Caliskan et al. (2017), please consider using the [original Java program](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DX4VWP&version=2.0) or the R package [cbn](https://github.com/conjugateprior/cbn) by Lowe. To reproduce the analysis in Garg et al. (2018), please consider using the [original Python program](https://github.com/nikhgarg/EmbeddingDynamicStereotypes). To reproduce the analysis in Manzini et al. (2019), please consider using the [original Python program](https://github.com/TManzini/DebiasMulticlassWordEmbedding/).
Please cite this software as:
Chan, C., (2022). sweater: Speedy Word Embedding Association Test and Extras Using R. Journal of Open Source Software, 7(72), 4036, https://doi.org/10.21105/joss.04036
For a BibTeX entry, use the output from `citation(package = "sweater")`.
## Installation
Recommended: install the latest development version
``` r
remotes::install_github("gesistsa/sweater")
```
or the "stable" release
```r
install.packages("sweater")
```
## Notation of a query
All tests in this package use the concept of queries (see Badilla et al., 2020) to study associations in the input word embeddings `w`. This package uses the "STAB" notation from Brunet et al. (2019). [^1]
[^1]: In versions of this package before 0.1.0, the main parameters were named `S`, `T`, `A`, and `B`. That notation was later abandoned because the symbol `T` is bound by default to the logical value `TRUE` [as a global variable](https://stat.ethz.ch/R-manual/R-devel/library/base/html/logical.html), and it is considered [bad style](https://style.tidyverse.org/syntax.html) to use the symbol `T`. Accordingly, the parameters were renamed to `S_words`, `T_words`, `A_words`, and `B_words`, respectively. But in general, please stop using the symbol `T` to represent `TRUE`!
All tests depend on two types of words. The first type, namely `S_words` and `T_words`, is *target words* (or *neutral words* in Garg et al). In the case of studying biases, these are words that **should** have no bias. For instance, words such as "nurse" and "professor" can be used as target words to study the gender bias in word embeddings. One can also separate these words into two sets, `S_words` and `T_words`, to group words by their perceived bias. For example, Caliskan et al. (2017) grouped target words into two groups: mathematics ("math", "algebra", "geometry", "calculus", "equations", "computation", "numbers", "addition") and arts ("poetry", "art", "dance", "literature", "novel", "symphony", "drama", "sculpture"). Please note that `T_words` is not always required.
The second type, namely `A_words` and `B_words`, is *attribute words* (or *group words* in Garg et al). These are words with known properties in relation to the bias that one is studying. For example, Caliskan et al. (2017) used gender-related words such as "male", "man", "boy", "brother", "he", "him", "his", "son" to study gender bias. These words qualify as attribute words because we know they are related to a certain gender.
It is recommended to use the function `query()` to make a query and `calculate_es()` to calculate the effect size.
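Below is a minimal sketch of that workflow using the bundled `googlenews` vectors; the short word lists are purely illustrative, and the full examples later in this README use much longer lists.

``` r
library(sweater)
## target words go in S_words, attribute words in A_words;
## with only S_words and A_words supplied, query() guesses a suitable method
x <- query(googlenews,
           S_words = c("nurse", "professor"),
           A_words = c("he", "man", "boy", "son"))
calculate_es(x)  # extract the effect size as a plain number
```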
## Available methods
| Target words | Attribute words | Method | `method` argument | Suggested by `query`? | legacy functions [^legacy] |
|------------------|------------------|-------------------------------------------------------------|-------------------|-----------------------|----------------------------------------------------|
| S_words | A_words | Mean Average Cosine Similarity (Manzini et al. 2019) | "mac" | yes | mac(), mac_es() |
| S_words | A_words, B_words | Relative Norm Distance (Garg et al. 2018) | "rnd" | yes | rnd(), rnd_es() |
| S_words | A_words, B_words | Relative Negative Sentiment Bias (Sweeney & Najafian. 2019) | "rnsb" | no | rnsb(), rnsb_es() |
| S_words | A_words, B_words | Embedding Coherence Test (Dev & Phillips. 2019) | "ect" | no | ect(), ect_es(), plot_ect() |
| S_words | A_words, B_words | SemAxis (An et al. 2018) | "semaxis" | no | semaxis() |
| S_words | A_words, B_words | Normalized Association Score (Caliskan et al. 2017) | "nas" | no | nas() |
| S_words, T_words | A_words, B_words | Word Embedding Association Test (Caliskan et al. 2017) | "weat" | yes | weat(), weat_es(), weat_resampling(), weat_exact() |
| S_words, T_words | A_words, B_words | Word Embeddings Fairness Evaluation (Badilla et al. 2020) | To be implemented | | |
## Example: Mean Average Cosine Similarity
The simplest form of bias detection is Mean Average Cosine Similarity (Manzini et al. 2019). The same method was used in Kroon et al. (2020). `googlenews` is a subset of [the pretrained word2vec word embeddings provided by Google](https://code.google.com/archive/p/word2vec/).
By default, the `query()` function guesses the method you want to use from the combination of target words and attribute words provided (see the "Suggested by `query`?" column in the table above). You can also make this explicit by specifying the `method` argument. Printing the returned object shows the effect size (if available) as well as the functions that can further process the object: `calculate_es` and `plot`. Please read the help file of `calculate_es` (`?calculate_es`) for the meaning of the effect size for a specific test.
```{r, eval = FALSE}
require(sweater)
```
```{r mac_neg}
S1 <- c("janitor", "statistician", "midwife", "bailiff", "auctioneer",
"photographer", "geologist", "shoemaker", "athlete", "cashier",
"dancer", "housekeeper", "accountant", "physicist", "gardener",
"dentist", "weaver", "blacksmith", "psychologist", "supervisor",
"mathematician", "surveyor", "tailor", "designer", "economist",
"mechanic", "laborer", "postmaster", "broker", "chemist", "librarian",
"attendant", "clerical", "musician", "porter", "scientist", "carpenter",
"sailor", "instructor", "sheriff", "pilot", "inspector", "mason",
"baker", "administrator", "architect", "collector", "operator",
"surgeon", "driver", "painter", "conductor", "nurse", "cook",
"engineer", "retired", "sales", "lawyer", "clergy", "physician",
"farmer", "clerk", "manager", "guard", "artist", "smith", "official",
"police", "doctor", "professor", "student", "judge", "teacher",
"author", "secretary", "soldier")
A1 <- c("he", "son", "his", "him", "father", "man", "boy", "himself",
"male", "brother", "sons", "fathers", "men", "boys", "males",
"brothers", "uncle", "uncles", "nephew", "nephews")
## The same as:
## mac_neg <- query(googlenews, S_words = S1, A_words = A1, method = "mac")
mac_neg <- query(googlenews, S_words = S1, A_words = A1)
mac_neg
```
The returned object is an S3 object. Please refer to the help file of the method for the definition of all slots (in this case: `?mac`). For example, the magnitude of bias for each word in `S1` is available in the `P` slot.
```{r mac_neg2}
sort(mac_neg$P)
```
## Example: Relative Norm Distance
This analysis reproduces the analysis in Garg et al. (2018), namely Figure 1.
```{r}
B1 <- c("she", "daughter", "hers", "her", "mother", "woman", "girl",
"herself", "female", "sister", "daughters", "mothers", "women",
"girls", "females", "sisters", "aunt", "aunts", "niece", "nieces"
)
garg_f1 <- query(googlenews, S_words = S1, A_words = A1, B_words = B1)
garg_f1
```
The object can be plotted with the function `plot` to show the bias of each word in `S_words`. Words such as "nurse", "midwife", and "librarian" are more associated with the female attribute words (`B_words`), as indicated by their positive relative norm distance.
```{r rndplot, fig.height = 12}
plot(garg_f1)
```
The effect size is simply the sum of all relative norm distance values (Equation 3 in Garg et al. 2018). It is displayed when the object is printed. You can also use the function `calculate_es` to obtain the numeric result.
A more positive effect size indicates that the words in `S_words` are more associated with `B_words`. As the effect size here is negative, it indicates that the concept of occupation is more associated with `A_words`, i.e. male.
```{r}
calculate_es(garg_f1)
```
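As a quick sanity check (assuming, as with `mac`, that the per-word relative norm distance values are stored in the `P` slot), the effect size should simply be the sum of those values:

``` r
## should match calculate_es(garg_f1), up to floating point error
sum(garg_f1$P)
```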
## Example: SemAxis
This analysis attempts to reproduce the analysis in An et al. (2018).
You may obtain the word2vec word vectors trained on a Reddit corpus of Trump supporters from [here](https://github.com/ghdi6758/SemAxis). This package provides a tiny version of the data, `small_reddit`, for reproducing the analysis.
```{r semxaxisplot}
S2 <- c("mexicans", "asians", "whites", "blacks", "latinos")
A2 <- c("respect")
B2 <- c("disrespect")
res <- query(small_reddit, S_words = S2, A_words = A2, B_words = B2, method = "semaxis", l = 1)
plot(res)
```
## Example: Embedding Coherence Test
Embedding Coherence Test (Dev & Phillips, 2019) is similar to SemAxis. The only significant difference is that no "SemAxis" (the difference between the average word vectors of `A_words` and `B_words`) is calculated. Instead, it calculates two separate axes for `A_words` and `B_words`, and then the proximity of each word in `S_words` to the two axes. It is like doing two separate `mac` queries, except that `ect` averages the word vectors of `A_words` / `B_words` first.
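A rough sketch of that idea (not the package's internal code), assuming the embedding object is a matrix with words as row names and that all listed words are present in it:

``` r
## one "axis" per attribute set: the average of its word vectors
axis_A <- colMeans(googlenews[A1, , drop = FALSE])
axis_B <- colMeans(googlenews[B1, , drop = FALSE])
## cosine similarity of a single target word with each axis
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
c(A = cos_sim(googlenews["nurse", ], axis_A),
  B = cos_sim(googlenews["nurse", ], axis_B))
```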
It is important to note that `P` is a 2-D matrix. Hence, the plot is 2-dimensional. Words above the equality line are more associated with `B_words` and vice versa.
```{r ectplot}
res <- query(googlenews, S_words = S1, A_words = A1, B_words = B1, method = "ect")
res$P
plot(res)
```
The effect size can also be calculated. It is the Spearman correlation coefficient of the two rows in `P`. A higher value indicates more "coherence", i.e. less bias.
```{r}
res
```
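Because `P` has two rows (one per attribute set), the effect size can in principle be recomputed by hand; a sketch, assuming that row orientation:

``` r
## should match the effect size shown when printing the object
cor(res$P[1, ], res$P[2, ], method = "spearman")
```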
## Example: Relative Negative Sentiment Bias
This analysis attempts to reproduce the analysis in Sweeney & Najafian (2019).
Please note that the datasets `glove_sweeney`, `bing_pos` and `bing_neg` are not included in the package. If you are interested in reproducing the analysis, the 3 datasets are available from [here](https://github.com/gesistsa/sweater/tree/master/tests/testdata).
```{r}
load("tests/testdata/bing_neg.rda")
load("tests/testdata/bing_pos.rda")
load("tests/testdata/glove_sweeney.rda")
S3 <- c("swedish", "irish", "mexican", "chinese", "filipino",
"german", "english", "french", "norwegian", "american",
"indian", "dutch", "russian", "scottish", "italian")
sn <- query(glove_sweeney, S_words = S3, A_words = bing_pos, B_words = bing_neg, method = "rnsb")
```
The analysis shows that `indian`, `mexican`, and `russian` are more likely to be associated with negative sentiment.
```{r rnsbplot}
plot(sn)
```
The effect size from the analysis is the Kullback–Leibler divergence of P from the uniform distribution. It is extremely close to the value reported in the original paper (0.6225).
```{r}
sn
```
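The Kullback–Leibler divergence from the uniform distribution can also be written out directly. A sketch, assuming the `P` slot holds the vector of normalized negative-sentiment probabilities (all strictly positive):

``` r
p <- sn$P
sum(p * log(p / (1 / length(p))))  # KL divergence of P from the uniform distribution
```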
## Support for Quanteda Dictionaries
`rnsb` supports [quanteda](https://github.com/quanteda/quanteda) dictionaries as `S_words`. This support will be expanded to other methods later.
This analysis uses the data from [here](https://github.com/gesistsa/sweater/tree/master/tests/testdata).
For example, `newsmap_europe` is an abridged dictionary from the package newsmap (Watanabe, 2018). The dictionary contains keywords of European countries and has two levels: regional level (e.g. Eastern Europe) and country level (e.g. Germany).
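For readers unfamiliar with hierarchical quanteda dictionaries, a toy two-level dictionary looks like this (the region, country, and keyword choices are made up for illustration):

``` r
library(quanteda)
toy_dict <- dictionary(list(
  "Western Europe" = list(
    Germany = c("germany", "berlin"),
    France  = c("france", "paris")
  )
))
toy_dict
```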
```{r}
load("tests/testdata/newsmap_europe.rda")
load("tests/testdata/dictionary_demo.rda")
require(quanteda)
newsmap_europe
```
Country-level analysis
```{r rnsb2, fig.height = 10}
country_level <- rnsb(w = dictionary_demo, S_words = newsmap_europe, A_words = bing_pos, B_words = bing_neg, levels = 2)
plot(country_level)
```
Region-level analysis
```{r rnsb3}
region_level <- rnsb(w = dictionary_demo, S_words = newsmap_europe, A_words = bing_pos, B_words = bing_neg, levels = 1)
plot(region_level)
```
Comparison of the two effect sizes: please note the much smaller effect size from the region-level analysis. It reflects the more even distribution of P across regions than across countries.
```{r}
calculate_es(country_level)
calculate_es(region_level)
```
## Example: Normalized Association Score
Normalized Association Score (Caliskan et al., 2017) is similar to Relative Norm Distance above. It was used in Müller et al. (2023).
```{r nasplot, fig.height = 12}
S3 <- c("janitor", "statistician", "midwife", "bailiff", "auctioneer",
"photographer", "geologist", "shoemaker", "athlete", "cashier",
"dancer", "housekeeper", "accountant", "physicist", "gardener",
"dentist", "weaver", "blacksmith", "psychologist", "supervisor",
"mathematician", "surveyor", "tailor", "designer", "economist",
"mechanic", "laborer", "postmaster", "broker", "chemist", "librarian",
"attendant", "clerical", "musician", "porter", "scientist", "carpenter",
"sailor", "instructor", "sheriff", "pilot", "inspector", "mason",
"baker", "administrator", "architect", "collector", "operator",
"surgeon", "driver", "painter", "conductor", "nurse", "cook",
"engineer", "retired", "sales", "lawyer", "clergy", "physician",
"farmer", "clerk", "manager", "guard", "artist", "smith", "official",
"police", "doctor", "professor", "student", "judge", "teacher",
"author", "secretary", "soldier")
A3 <- c("he", "son", "his", "him", "father", "man", "boy", "himself",
"male", "brother", "sons", "fathers", "men", "boys", "males",
"brothers", "uncle", "uncles", "nephew", "nephews")
B3 <- c("she", "daughter", "hers", "her", "mother", "woman", "girl",
"herself", "female", "sister", "daughters", "mothers", "women",
"girls", "females", "sisters", "aunt", "aunts", "niece", "nieces"
)
nas_f1 <- query(googlenews, S_words = S3, A_words = A3, B_words = B3, method = "nas")
plot(nas_f1)
```
There is a very strong correlation between NAS and RND.
```{r}
cor.test(nas_f1$P, garg_f1$P)
```
## Example: Word Embedding Association Test
This example reproduces the detection of the "Math vs. Arts" gender bias in Caliskan et al. (2017).
```{r maths}
data(glove_math) # a subset of the original GloVe word vectors
S4 <- c("math", "algebra", "geometry", "calculus", "equations", "computation", "numbers", "addition")
T4 <- c("poetry", "art", "dance", "literature", "novel", "symphony", "drama", "sculpture")
A4 <- c("male", "man", "boy", "brother", "he", "him", "his", "son")
B4 <- c("female", "woman", "girl", "sister", "she", "her", "hers", "daughter")
sw <- query(glove_math, S4, T4, A4, B4)
# print the object to show the effect size
sw
```
## A note about the effect size
By default, the effect size from the function `weat_es` is adjusted by the pooled standard deviation (see page 2 of Caliskan et al. 2017). The standardized effect size can be interpreted the same way as Cohen's d (Cohen, 1988).
One can also get the unstandardized version (a.k.a. the test statistic in the original paper):
```{r}
## weat_es
calculate_es(sw, standardize = FALSE)
```
The original implementation assumes that `S` and `T` are of equal size. This assumption can be relaxed by pooling the standard deviation with a sample-size adjustment. The function `weat_es` does this when `S` and `T` are of different lengths.
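A sketch of the standard sample-size-adjusted pooling (the numbers below are hypothetical, and the package's internal computation may differ in detail):

``` r
## two-sample pooled standard deviation with sample-size adjustment
pooled_sd <- function(sd_S, sd_T, n_S, n_T) {
  sqrt(((n_S - 1) * sd_S^2 + (n_T - 1) * sd_T^2) / (n_S + n_T - 2))
}
pooled_sd(sd_S = 0.12, sd_T = 0.10, n_S = 8, n_T = 10)
```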
Also, the effect size can be converted to a point-biserial correlation, which is mathematically equivalent to Pearson's product-moment correlation (see McGrath & Meyer, 2006; Rosenthal, 1991).
```{r}
weat_es(sw, r = TRUE)
```
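For equal group sizes, the textbook conversion between a standardized effect size d and the point-biserial r is as follows (a sketch only; the package may use a sample-size-adjusted variant):

``` r
d_to_r <- function(d) d / sqrt(d^2 + 4)
d_to_r(1.0)
```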
## Exact test
The exact test described in Caliskan et al. (2017) is also available, but it takes a long time to calculate.
```r
## Don't do it. It takes a long time and is almost always significant.
weat_exact(sw)
```
Instead, please use the resampling approximation of the exact test. The p-value is very close to the reported 0.018.
```{r}
weat_resampling(sw)
```
## How to get help
* Read the documentation
* Search for [issues](https://github.com/gesistsa/sweater/issues)
## Contributing
Contributions in the form of feedback, comments, code, and bug reports are welcome.
* Fork the source code, modify, and issue a [pull request](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork).
* Issues, bug reports: [File a Github issue](https://github.com/gesistsa/sweater/issues).
## Code of Conduct
Please note that the sweater project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
## References
1. An, J., Kwak, H., & Ahn, Y. Y. (2018). SemAxis: A lightweight framework to characterize domain-specific word semantics beyond sentiment. arXiv preprint arXiv:1806.05521.
2. Badilla, P., Bravo-Marquez, F., & Pérez, J. (2020). WEFE: The word embeddings fairness evaluation framework. In Proceedings of the 29th International Joint Conference on Artificial Intelligence.
3. Brunet, M. E., Alkalay-Houlihan, C., Anderson, A., & Zemel, R. (2019, May). Understanding the origins of bias in word embeddings. In International Conference on Machine Learning (pp. 803-811). PMLR.
4. Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
5. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, 2nd Edition. Hillsdale: Lawrence Erlbaum.
6. Dev, S., & Phillips, J. (2019, April). Attenuating bias in word vectors. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 879-887). PMLR.
7. Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), E3635-E3644.
8. Manzini, T., Lim, Y. C., Tsvetkov, Y., & Black, A. W. (2019). Black is to criminal as caucasian is to police: Detecting and removing multiclass bias in word embeddings. arXiv preprint arXiv:1904.04047.
9. McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: the case of r and d. Psychological methods, 11(4), 386.
10. Müller, P., Chan, C. H., Ludwig, K., Freudenthaler, R., & Wessler, H. (2023). Differential Racism in the News: Using Semi-Supervised Machine Learning to Distinguish Explicit and Implicit Stigmatization of Ethnic and Religious Groups in Journalistic Discourse. Political Communication, 1-19.
11. Rosenthal, R. (1991). Meta-Analytic Procedures for Social Research. Newbury Park: Sage.
12. Sweeney, C., & Najafian, M. (2019, July). A transparent framework for evaluating unintended demographic bias in word embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1662-1667).
13. Watanabe, K. (2018). Newsmap: A semi-supervised approach to geographical news classification. Digital Journalism, 6(3), 294-309.
---
[^legacy]: Please use the `query` function. These functions are kept for backward compatibility.
Owner
- Name: Transparent Social Analytics
- Login: gesistsa
- Kind: organization
- Location: Germany
- Repositories: 2
- Profile: https://github.com/gesistsa
Open Science Tools maintained by Transparent Social Analytics Team, GESIS
Citation (CITATION.cff)
# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
cff-version: 1.2.0
message: 'To cite package "sweater" in publications use:'
type: software
license: GPL-3.0-or-later
title: 'sweater: Speedy Word Embedding Association Test and Extras Using R'
version: 0.1.8
doi: 10.21105/joss.04036
abstract: 'Conduct various tests for evaluating implicit biases in word embeddings:
Word Embedding Association Test (Caliskan et al., 2017), <https://doi.org/10.1126/science.aal4230>,
Relative Norm Distance (Garg et al., 2018), <https://doi.org/10.1073/pnas.1720347115>,
Mean Average Cosine Similarity (Mazini et al., 2019) <arXiv:1904.04047>, SemAxis
(An et al., 2018) <arXiv:1806.05521>, Relative Negative Sentiment Bias (Sweeney
& Najafian, 2019) <https://doi.org/10.18653/v1/P19-1162>, and Embedding Coherence
Test (Dev & Phillips, 2019) <arXiv:1901.07656>.'
authors:
- family-names: Chan
given-names: Chung-hong
email: chainsawtiney@gmail.com
orcid: https://orcid.org/0000-0002-6232-7530
preferred-citation:
type: article
title: 'sweater: Speedy Word Embedding Association Test and Extras Using R'
authors:
- family-names: Chan
given-names: Chung-hong
email: chainsawtiney@gmail.com
orcid: https://orcid.org/0000-0002-6232-7530
journal: Journal of Open Source Software
doi: 10.21105/joss.04036
url: https://github.com/gesistsa/sweater
volume: '7'
issue: '72'
year: '2022'
start: '4036'
repository: https://CRAN.R-project.org/package=sweater
repository-code: https://github.com/gesistsa/sweater
url: https://github.com/gesistsa/sweater
contact:
- family-names: Chan
given-names: Chung-hong
email: chainsawtiney@gmail.com
orcid: https://orcid.org/0000-0002-6232-7530
keywords:
- bias-detection
- r
- textanalysis
- wordembedding
references:
- type: software
title: Rcpp
abstract: 'Rcpp: Seamless R and C++ Integration'
notes: LinkingTo
url: https://www.rcpp.org
repository: https://CRAN.R-project.org/package=Rcpp
authors:
- family-names: Eddelbuettel
given-names: Dirk
- family-names: Francois
given-names: Romain
- family-names: Allaire
given-names: JJ
- family-names: Ushey
given-names: Kevin
- family-names: Kou
given-names: Qiang
- family-names: Russell
given-names: Nathan
- family-names: Ucar
given-names: Inaki
- family-names: Bates
given-names: Douglas
- family-names: Chambers
given-names: John
year: '2024'
- type: software
title: purrr
abstract: 'purrr: Functional Programming Tools'
notes: Imports
url: https://purrr.tidyverse.org/
repository: https://CRAN.R-project.org/package=purrr
authors:
- family-names: Wickham
given-names: Hadley
email: hadley@rstudio.com
orcid: https://orcid.org/0000-0003-4757-117X
- family-names: Henry
given-names: Lionel
email: lionel@rstudio.com
year: '2024'
- type: software
title: quanteda
abstract: 'quanteda: Quantitative Analysis of Textual Data'
notes: Imports
url: https://quanteda.io
repository: https://CRAN.R-project.org/package=quanteda
authors:
- family-names: Benoit
given-names: Kenneth
email: kbenoit@lse.ac.uk
orcid: https://orcid.org/0000-0002-0797-564X
- family-names: Watanabe
given-names: Kohei
email: watanabe.kohei@gmail.com
orcid: https://orcid.org/0000-0001-6519-5265
- family-names: Wang
given-names: Haiyan
email: whyinsa@yahoo.com
orcid: https://orcid.org/0000-0003-4992-4311
- family-names: Nulty
given-names: Paul
email: paul.nulty@gmail.com
orcid: https://orcid.org/0000-0002-7214-4666
- family-names: Obeng
given-names: Adam
email: quanteda@binaryeagle.com
orcid: https://orcid.org/0000-0002-2906-4775
- family-names: Müller
given-names: Stefan
email: stefan.mueller@ucd.ie
orcid: https://orcid.org/0000-0002-6315-4125
- family-names: Matsuo
given-names: Akitaka
email: a.matsuo@essex.ac.uk
orcid: https://orcid.org/0000-0002-3323-6330
- family-names: Lowe
given-names: William
email: lowe@hertie-school.org
orcid: https://orcid.org/0000-0002-1549-6163
year: '2024'
- type: software
title: LiblineaR
abstract: 'LiblineaR: Linear Predictive Models Based on the LIBLINEAR C/C++ Library'
notes: Imports
repository: https://CRAN.R-project.org/package=LiblineaR
authors:
- family-names: Helleputte
given-names: Thibault
email: thibault.helleputte@dnalytics.com
- family-names: Paul
given-names: Jérôme
- family-names: Gramme
given-names: Pierre
year: '2024'
- type: software
title: proxy
abstract: 'proxy: Distance and Similarity Measures'
notes: Imports
repository: https://CRAN.R-project.org/package=proxy
authors:
- family-names: Meyer
given-names: David
email: David.Meyer@R-project.org
- family-names: Buchta
given-names: Christian
year: '2024'
- type: software
title: data.table
abstract: 'data.table: Extension of `data.frame`'
notes: Imports
url: https://r-datatable.com
repository: https://CRAN.R-project.org/package=data.table
authors:
- family-names: Barrett
given-names: Tyson
email: t.barrett88@gmail.com
- family-names: Dowle
given-names: Matt
email: mattjdowle@gmail.com
- family-names: Srinivasan
given-names: Arun
email: asrini@pm.me
- family-names: Gorecki
given-names: Jan
- family-names: Chirico
given-names: Michael
- family-names: Hocking
given-names: Toby
orcid: https://orcid.org/0000-0002-3146-0865
year: '2024'
- type: software
title: cli
abstract: 'cli: Helpers for Developing Command Line Interfaces'
notes: Imports
url: https://cli.r-lib.org
repository: https://CRAN.R-project.org/package=cli
authors:
- family-names: Csárdi
given-names: Gábor
email: csardi.gabor@gmail.com
year: '2024'
- type: software
title: combinat
abstract: 'combinat: combinatorics utilities'
notes: Imports
repository: https://CRAN.R-project.org/package=combinat
authors:
- family-names: Chasalow
given-names: Scott
year: '2024'
- type: software
title: covr
abstract: 'covr: Test Coverage for Packages'
notes: Suggests
url: https://covr.r-lib.org
repository: https://CRAN.R-project.org/package=covr
authors:
- family-names: Hester
given-names: Jim
email: james.f.hester@gmail.com
year: '2024'
- type: software
title: testthat
abstract: 'testthat: Unit Testing for R'
notes: Suggests
url: https://testthat.r-lib.org
repository: https://CRAN.R-project.org/package=testthat
authors:
- family-names: Wickham
given-names: Hadley
email: hadley@posit.co
year: '2024'
version: '>= 3.0.0'
- type: software
title: 'R: A Language and Environment for Statistical Computing'
notes: Depends
url: https://www.R-project.org/
authors:
- name: R Core Team
institution:
name: R Foundation for Statistical Computing
address: Vienna, Austria
year: '2024'
version: '>= 3.5'
GitHub Events
Total
- Watch event: 3
- Delete event: 1
- Push event: 2
- Pull request event: 3
- Fork event: 1
- Create event: 1
Last Year
- Watch event: 3
- Delete event: 1
- Push event: 2
- Pull request event: 3
- Fork event: 1
- Create event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| chainsawriot | c****y@g****m | 131 |
| Christina Maimone | c****e@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 31
- Total pull requests: 15
- Average time to close issues: about 1 month
- Average time to close pull requests: about 12 hours
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 0.77
- Average comments per pull request: 0.13
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 2 days
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- cmaimone (15)
- chainsawriot (13)
- psychbruce (1)
Pull Request Authors
- chainsawriot (13)
- cmaimone (2)
- ArthurMuehl (1)
Top Labels
Issue Labels
documentation (2)
Pull Request Labels
Packages
- Total packages: 1
- Total downloads (cran): 519 last month
- Total dependent packages: 1
- Total dependent repositories: 1
- Total versions: 8
- Total maintainers: 1
cran.r-project.org: sweater
Speedy Word Embedding Association Test and Extras Using R
- Homepage: https://github.com/gesistsa/sweater
- Documentation: http://cran.r-project.org/web/packages/sweater/sweater.pdf
- License: GPL (≥ 3)
- Latest release: 0.1.8 (published over 2 years ago)
Rankings
Stargazers count: 10.4%
Forks count: 14.1%
Average: 17.5%
Dependent packages count: 18.1%
Downloads: 21.1%
Dependent repos count: 23.8%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.5 depends
- LiblineaR * imports
- Rcpp * imports
- cli * imports
- combinat * imports
- data.table * imports
- proxy * imports
- purrr * imports
- quanteda * imports
- covr * suggests
- testthat >= 3.0.0 suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/check-r-package v1 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite
- r-lib/actions/setup-r-dependencies v1 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-r v1 composite
- r-lib/actions/setup-r-dependencies v1 composite