conText

An R package for estimating and doing statistical inference on context-specific word embeddings.

https://github.com/prodriguezsosa/context

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.9%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: prodriguezsosa
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 11 MB
Statistics
  • Stars: 106
  • Watchers: 8
  • Forks: 20
  • Open Issues: 6
  • Releases: 0
Created over 5 years ago · Last pushed 10 months ago
Metadata Files
Readme

README.md

About

conText provides a fast, flexible, and transparent framework to estimate context-specific word and short-document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018), and to evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021).
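
To make the 'a la carte' idea concrete, here is a minimal base-R sketch on toy data. It assumes the usual formulation, in which a target word's embedding is the average of the pre-trained vectors of its context words, multiplied by a transformation matrix. The helper `alc_embed`, the toy vocabulary, and all numbers are hypothetical and are not part of conText; the package's own functions should be preferred for real work.

```r
# Toy pre-trained embeddings: 5 words in 3 dimensions (made-up numbers).
pre_trained <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    1, 1, 0,
    0, 1, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("border", "policy", "reform", "visa", "law"), NULL)
)

# Toy transformation matrix (a scaled identity, purely illustrative).
transform_matrix <- diag(2, 3)

# Context windows around two occurrences of a target word.
contexts <- list(
  c("border", "policy", "unknownword"),
  c("visa", "law")
)

# Hypothetical helper: average the vectors of in-vocabulary context words
# within each context, average across contexts, then apply the transform.
alc_embed <- function(contexts, pre_trained, transform_matrix) {
  context_means <- lapply(contexts, function(words) {
    known <- words[words %in% rownames(pre_trained)]
    if (length(known) == 0) return(NULL)  # skip contexts with no known words
    colMeans(pre_trained[known, , drop = FALSE])
  })
  context_means <- Filter(Negate(is.null), context_means)
  stopifnot(length(context_means) > 0)
  avg_context <- Reduce(`+`, context_means) / length(context_means)
  as.vector(transform_matrix %*% avg_context)
}

target_embedding <- alc_embed(contexts, pre_trained, transform_matrix)
print(target_embedding)
```

Out-of-vocabulary context words are simply dropped here; that is a simplification chosen for the sketch.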

How to Install

install.packages("conText")

Datasets

To use conText you will need three objects:

  1. A (quanteda) corpus with the documents and corresponding document variables you want to evaluate.
  2. A set of (GloVe) pre-trained embeddings.
  3. A transformation matrix specific to the pre-trained embeddings.

conText includes sample objects for all three, but these are meant only to illustrate how the functions are used. In this Dropbox folder we have included the raw versions of these objects, including the full Stanford GloVe 300-dimensional embeddings (labeled glove.rds) and the corresponding transformation matrix estimated by Khodak et al. (2018) (labeled khodakA.rds). We also provide an equivalent RDS file for the 2024 GloVe embeddings released in July 2025 (labeled glove2024.rds).
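
Because the transformation matrix is specific to a set of pre-trained embeddings, it can help to confirm that the two objects are compatible before using them together. The sketch below is a hypothetical base-R helper (`check_embedding_inputs` is not a conText function). It assumes the embeddings are stored as a numeric matrix with one word per row and the words as row names, and that the transformation matrix is square with the same dimension as the embeddings.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
check_embedding_inputs <- function(pre_trained, transform_matrix) {
  stopifnot(is.matrix(pre_trained), is.matrix(transform_matrix))
  problems <- character(0)
  if (is.null(rownames(pre_trained))) {
    problems <- c(problems, "pre_trained has no row names (one word per row expected)")
  }
  if (nrow(transform_matrix) != ncol(transform_matrix)) {
    problems <- c(problems, "transform_matrix is not square")
  }
  if (ncol(transform_matrix) != ncol(pre_trained)) {
    problems <- c(problems, "transform_matrix does not match the embedding dimension")
  }
  problems
}

# Toy objects standing in for real embeddings and a real transformation matrix.
toy_embeddings <- matrix(
  seq_len(12) / 10, nrow = 4,
  dimnames = list(c("a", "b", "c", "d"), NULL)
)
ok_result  <- check_embedding_inputs(toy_embeddings, diag(3))
bad_result <- check_embedding_inputs(toy_embeddings, diag(2))
print(ok_result)
print(bad_result)

# With the full files named above, assuming they were downloaded to the
# working directory and load as matrices:
# glove   <- readRDS("glove.rds")
# khodakA <- readRDS("khodakA.rds")
# check_embedding_inputs(glove, khodakA)
```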

Quick Start Guides

Check out this Quick Start Guide to get going with conText (last updated: 07/28/2025).

Latest Updates

As noted in Rodriguez et al. (2023) (p. 1272), distance measures typically used to compare representations in high-dimensional space (such as embedding vectors) exhibit statistical bias. In Green et al. (2025), we explore the severity of this problem for text-as-data applications and provide and validate a bias correction for the squared Euclidean distance. We implement this estimator and other recommendations from the paper in the latest update to the conText() function. Please refer to the Bias in Distance Measures vignette for additional information and the Quick Start Guide for examples of how to use the new version of the function and a description of changes in the output.
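
The source of this kind of bias can be illustrated with a small base-R simulation. The sketch uses simulated data and a generic textbook correction (subtracting the estimated sampling variance of each group mean from the squared Euclidean distance between means). It is not the estimator implemented in conText() and may differ from the one in Green et al. (2025); all variable names are hypothetical.

```r
set.seed(42)

d   <- 50  # embedding dimension
n_x <- 20  # observations in group x
n_y <- 20  # observations in group y

# Both groups come from the same distribution, so the true squared
# distance between their population means is exactly 0.
x <- matrix(rnorm(n_x * d), nrow = n_x)
y <- matrix(rnorm(n_y * d), nrow = n_y)

# Naive estimate: squared Euclidean distance between the sample means.
naive_sq_dist <- sum((colMeans(x) - colMeans(y))^2)

# Sampling noise of each group mean: sum of per-dimension sample variances
# divided by the group size. In expectation the naive estimate equals the
# true squared distance plus these two terms.
noise_x <- sum(apply(x, 2, var)) / n_x
noise_y <- sum(apply(y, 2, var)) / n_y

# Generic bias-corrected estimate (can be negative in finite samples).
corrected_sq_dist <- naive_sq_dist - noise_x - noise_y

print(c(naive = naive_sq_dist, corrected = corrected_sq_dist))
```

The noise terms grow with the dimension and shrink with group size, so the naive distance is inflated most for small groups in high-dimensional spaces.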

Multilanguage Resources

For those working in languages other than English, we provide a set of data and code resources here.

Owner

  • Name: Pedro L. Rodríguez
  • Login: prodriguezsosa
  • Kind: user
  • Company: Data Science Institute, Vanderbilt University

Research Scientist in Core Data Science at Meta.

GitHub Events

Total
  • Issues event: 2
  • Watch event: 8
  • Issue comment event: 7
  • Push event: 9
  • Pull request event: 2
  • Pull request review event: 2
  • Pull request review comment event: 7
  • Fork event: 2
Last Year
  • Issues event: 2
  • Watch event: 8
  • Issue comment event: 7
  • Push event: 9
  • Pull request event: 2
  • Pull request review event: 2
  • Pull request review comment event: 7
  • Fork event: 2

Issues and Pull Requests

All Time
  • Total issues: 15
  • Total pull requests: 20
  • Average time to close issues: 17 days
  • Average time to close pull requests: about 2 months
  • Total issue authors: 12
  • Total pull request authors: 10
  • Average comments per issue: 3.13
  • Average comments per pull request: 1.9
  • Merged pull requests: 16
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 2
  • Average time to close issues: about 1 hour
  • Average time to close pull requests: 2 days
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • frodew (2)
  • ldshuttleworth (2)
  • ArthurSpirling (2)
  • justinsavoie (1)
  • caslerdon (1)
  • ElisaWirsching (1)
  • fireescapefilms (1)
  • poppyseed-bagel (1)
  • porter-rachel (1)
  • kilbu (1)
  • prodriguezsosa (1)
  • EPINetz (1)
Pull Request Authors
  • cjbarrie (5)
  • prodriguezsosa (4)
  • ElisaWirsching (2)
  • sofiaaj (2)
  • xiliny (2)
  • davidycliao (1)
  • FriederRodewald (1)
  • CharlieCarter (1)
  • frodew (1)
  • MLBurnham (1)

Packages

  • Total packages: 1
  • Total downloads: 209 last month (CRAN)
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
cran.r-project.org: conText

'a la Carte' on Text (ConText) Embedding Regression

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 209 last month
Rankings
Stargazers count: 5.2%
Forks count: 14.9%
Average: 24.4%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Downloads: 36.9%
Maintainers (1)

Dependencies

DESCRIPTION cran
  • R >= 3.6.0 depends
  • Matrix >= 1.3 imports
  • dplyr * imports
  • fastDummies >= 1.6.3 imports
  • ggplot2 * imports
  • methods * imports
  • quanteda >= 3.0.0 imports
  • reshape2 >= 1.4.4 imports
  • stringr >= 1.4.0 imports
  • text2vec >= 0.6 imports
  • tidyr >= 1.1.3 imports
  • SnowballC >= 0.7.0 suggests
  • formatR * suggests
  • hunspell * suggests
  • knitr * suggests
  • rmarkdown * suggests