conText
An R package for estimating and doing statistical inference on context-specific word embeddings.
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.9%) to scientific vocabulary
Repository
Statistics
- Stars: 106
- Watchers: 8
- Forks: 20
- Open Issues: 6
- Releases: 0
Metadata Files
README.md

About
conText provides a fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018) and evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021).
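As a rough illustration of the 'a la carte' idea mentioned above: the embedding of a word in a particular context is obtained by averaging the pre-trained vectors of the surrounding words and then applying the transformation matrix. The sketch below uses base R only. The 3-dimensional vectors, the matrix `A`, and all object names are made up for illustration and are not part of conText.

```r
# Toy pre-trained embeddings: one row per word, 3 dimensions (made-up values).
pre_trained <- rbind(
  the = c(0.1, 0.0, 0.2),
  new = c(0.4, 0.3, 0.1),
  law = c(0.2, 0.5, 0.3),
  on  = c(0.0, 0.1, 0.1)
)

# Toy transformation matrix A (made-up): doubles every dimension.
A <- diag(2, nrow = 3)

# Words surrounding one instance of the target word.
context <- c("the", "new", "law", "on")

# Step 1: average the pre-trained vectors of the context words.
context_mean <- colMeans(pre_trained[context, , drop = FALSE])

# Step 2: apply the transformation matrix to get the context-specific embedding.
alc_embedding <- as.vector(A %*% context_mean)
```

In practice the transformation matrix is estimated from a large corpus rather than chosen by hand, and the package handles both steps internally.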
How to Install
install.packages("conText")
Datasets
To use conText you will need three objects:
- A (quanteda) corpus with the documents and corresponding document variables you want to evaluate.
- A set of (GloVe) pre-trained embeddings.
- A transformation matrix specific to the pre-trained embeddings.
conText includes sample objects for all three, but keep in mind that these are only meant to illustrate function implementations. In this Dropbox folder we have included the raw versions of these objects, including the full Stanford GloVe 300-dimensional embeddings (labeled glove.rds) and the corresponding transformation matrix estimated by Khodak et al. (2018) (labeled khodakA.rds). We also provide an equivalent RDS file for the 2024 GloVe embeddings released in July 2025 (labeled glove2024.rds).
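A minimal sketch of getting the three objects into an R session. It assumes the bundled sample objects are named `cr_sample_corpus`, `cr_glove_subset` and `cr_transform`, as in the package documentation. The `data/` path in the commented lines is hypothetical and stands for wherever you saved the downloaded files.

```r
library(conText)
library(quanteda)

# Sample objects shipped with the package (small, for illustration only).
sample_corpus    <- cr_sample_corpus   # quanteda corpus with document variables
pre_trained      <- cr_glove_subset    # subset of GloVe embeddings, one row per word
transform_matrix <- cr_transform       # transformation matrix for these embeddings

# Full-size objects: uncomment after downloading the RDS files.
# "data/" is a hypothetical local folder, not a path defined by the package.
# pre_trained      <- readRDS("data/glove.rds")
# transform_matrix <- readRDS("data/khodakA.rds")

# Sanity check: the embedding dimension must match the transformation matrix.
stopifnot(ncol(pre_trained) == nrow(transform_matrix))
```

Whichever embeddings you load, pair them with the transformation matrix estimated for that same set of embeddings.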
Quick Start Guides
Check out this Quick Start Guide to get going with conText (last updated: 07/28/2025).
Latest Updates
As noted in Rodriguez et al. (2023) (p. 1272), distance measures typically used to compare representations in high-dimensional space (such as embedding vectors) exhibit statistical bias. In Green et al. (2025), we explore the severity of this problem for text-as-data applications and provide and validate a bias correction for the squared Euclidean distance. We implement this estimator and other recommendations from the paper in the latest update to the conText() function. Please refer to the Bias in Distance Measures vignette for additional information and the Quick Start Guide for examples of how to use the new version of the function and a description of changes in the output.
Multilanguage Resources
For those working in languages other than English, we provide a set of data and code resources here.
Owner
- Name: Pedro L. Rodríguez
- Login: prodriguezsosa
- Kind: user
- Company: Data Science Institute, Vanderbilt University
- Website: prodriguezsosa.com
- Repositories: 5
- Profile: https://github.com/prodriguezsosa
Research Scientist in Core Data Science at Meta.
GitHub Events
Total
- Issues event: 2
- Watch event: 8
- Issue comment event: 7
- Push event: 9
- Pull request event: 2
- Pull request review event: 2
- Pull request review comment event: 7
- Fork event: 2
Last Year
- Issues event: 2
- Watch event: 8
- Issue comment event: 7
- Push event: 9
- Pull request event: 2
- Pull request review event: 2
- Pull request review comment event: 7
- Fork event: 2
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 15
- Total pull requests: 20
- Average time to close issues: 17 days
- Average time to close pull requests: about 2 months
- Total issue authors: 12
- Total pull request authors: 10
- Average comments per issue: 3.13
- Average comments per pull request: 1.9
- Merged pull requests: 16
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 2
- Average time to close issues: about 1 hour
- Average time to close pull requests: 2 days
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- frodew (2)
- ldshuttleworth (2)
- ArthurSpirling (2)
- justinsavoie (1)
- caslerdon (1)
- ElisaWirsching (1)
- fireescapefilms (1)
- poppyseed-bagel (1)
- porter-rachel (1)
- kilbu (1)
- prodriguezsosa (1)
- EPINetz (1)
Pull Request Authors
- cjbarrie (5)
- prodriguezsosa (4)
- ElisaWirsching (2)
- sofiaaj (2)
- xiliny (2)
- davidycliao (1)
- FriederRodewald (1)
- CharlieCarter (1)
- frodew (1)
- MLBurnham (1)
Packages
- Total packages: 1
- Total downloads: 209 last month (CRAN)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
cran.r-project.org: conText
'a la Carte' on Text (ConText) Embedding Regression
- Homepage: https://github.com/prodriguezsosa/conText
- Documentation: http://cran.r-project.org/web/packages/conText/conText.pdf
- License: GPL-3
- Latest release: 3.0.0 (published 9 months ago)
Dependencies
- R >= 3.6.0 depends
- Matrix >= 1.3 imports
- dplyr * imports
- fastDummies >= 1.6.3 imports
- ggplot2 * imports
- methods * imports
- quanteda >= 3.0.0 imports
- reshape2 >= 1.4.4 imports
- stringr >= 1.4.0 imports
- text2vec >= 0.6 imports
- tidyr >= 1.1.3 imports
- SnowballC >= 0.7.0 suggests
- formatR * suggests
- hunspell * suggests
- knitr * suggests
- rmarkdown * suggests