perccalc
perccalc: An R package for estimating percentiles from categorical variables - Published in JOSS (2019)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org
- ✓ Committers with academic emails: 2 of 5 committers (40.0%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software

Last synced: 6 months ago
Repository
Estimate percentile differences from ordered categorical data
Basic Info
- Host: GitHub
- Owner: cimentadaj
- License: other
- Language: R
- Default Branch: master
- Homepage: https://cimentadaj.github.io/perccalc/
- Size: 2.89 MB
Statistics
- Stars: 4
- Watchers: 1
- Forks: 3
- Open Issues: 0
- Releases: 2
Created over 8 years ago · Last pushed over 5 years ago
Metadata Files
Readme
Changelog
Contributing
License
README.Rmd
---
output:
  github_document:
    html_preview: false
---
# perccalc
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
options(tibble.print_min = 5, tibble.print_max = 5)
```
[CRAN status](http://cran.r-project.org/package=perccalc)
[Travis build status](https://travis-ci.org/cimentadaj/perccalc)
[AppVeyor build status](https://ci.appveyor.com/project/cimentadaj/perccalc)
[Codecov test coverage](https://codecov.io/gh/cimentadaj/perccalc?branch=master)
[DOI](https://doi.org/10.21105/joss.01796)
## Overview
`perccalc` is a direct implementation of the theoretical work of [Reardon (2011)](https://www.russellsage.org/publications/whither-opportunity), which makes it possible to estimate the difference between two percentiles of an ordered categorical variable. More concretely, given an ordered categorical variable and a continuous variable, the method estimates differences in the continuous variable between percentiles of the ordered categorical variable. This is a useful strategy for comparing results based on ordered categorical variables, which often have an alternative continuous measure, with results based on the percentiles of that continuous measure. It also opens an avenue for calculating percentile distributions and percentile differences for ordered categorical variables that don't necessarily have an alternative continuous measure, such as job occupation classifications.

The package has two main functions:
* `perc_diff`, for calculating percentile differences
* `perc_dist`, for calculating scores for all percentiles
## Installation
You can install and load the package with these commands:
```{r, eval = FALSE}
install.packages("perccalc") # for stable version
# or
devtools::install_github("cimentadaj/perccalc") # for development version
```
## Usage
To look at a real-world example, let's use data from the General Social Survey (GSS). This dataset contains the scores of respondents on a vocabulary test together with their age expressed in age groups (such as `30-39`, `40-49`, etc.). We're interested in calculating the difference in vocabulary test scores between older and younger respondents.

In many scenarios, we could calculate this difference by estimating the mean difference between two age groups (for example, ages `20-29` versus ages `60-69`). However, in many other settings we're specifically interested in the difference in vocabulary test scores by percentiles of the age variable. In particular, this could be of interest for studies looking to contrast their results with other studies that measure age as a continuous variable. In our example, age is a categorical variable, so we cannot calculate percentiles directly. The method implemented in this package introduces a strategy for calculating percentiles from ordered categories.
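To make the idea concrete, here is a minimal base R sketch of the general strategy described in Reardon (2011): place each ordered category at the midpoint of the percentile range it covers, regress the category means of the continuous variable on a cubic polynomial of those midpoints, and compare predictions at two percentiles. This is an illustration under simplifying assumptions (simulated data, hypothetical variable names, an unweighted fit, no standard errors) and is not the package's internal code.

```r
# Simulated toy data (hypothetical): an ordered category and a continuous score
set.seed(1)
age_levels <- c("18-29", "30-39", "40-49", "50-59", "60+")
age_group <- factor(
  sample(age_levels, 500, replace = TRUE),
  levels = age_levels,
  ordered = TRUE
)
score <- 5 + 0.3 * as.integer(age_group) + rnorm(500)

# Share of observations in each category and the cumulative bounds it spans
prop <- as.numeric(prop.table(table(age_group)))
cum_upper <- cumsum(prop)
cum_lower <- cum_upper - prop

# Each category sits at the midpoint of its percentile range (0 to 1 scale)
midpoint <- (cum_lower + cum_upper) / 2

# Mean score per category, modeled as a cubic function of the midpoints
cat_mean <- as.numeric(tapply(score, age_group, mean))
fit <- lm(cat_mean ~ midpoint + I(midpoint^2) + I(midpoint^3))

# Predicted score at the 90th and 10th percentiles, and their difference
pred <- predict(fit, newdata = data.frame(midpoint = c(0.9, 0.1)))
diff_90_10 <- unname(pred[1] - pred[2])
diff_90_10
```

The sketch only shows the mechanics; `perc_diff` additionally reports a standard error for the difference and can take sampling weights.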
Let's load our packages of interest and limit the GSS data to the year 1978.
```{r, message = FALSE, warning = FALSE}
library(perccalc)
library(dplyr)
library(ggplot2)
library(carData)
set.seed(213141)
data("GSSvocab")
gss <-
  as_tibble(GSSvocab) %>%
  filter(year == "1978") %>%
  mutate(weight = sample(1:3, size = nrow(.), replace = TRUE, prob = c(0.1, 0.5, 0.4)),
         ageGroup = factor(ageGroup, ordered = TRUE)) %>%
  select(ageGroup, vocab, weight)
```
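The `factor(ageGroup, ordered = TRUE)` call above relies on the levels of `ageGroup` already being in the intended order. When that is not guaranteed, the order can be stated explicitly. A minimal base R sketch, using hypothetical age labels and object names:

```r
# Hypothetical raw labels, in no particular order
age_raw <- c("30-39", "18-29", "60+", "40-49", "18-29")

# State the intended order of the categories explicitly
age_levels <- c("18-29", "30-39", "40-49", "50-59", "60+")
age_ord <- factor(age_raw, levels = age_levels, ordered = TRUE)

# Check that the result is an ordered factor with the expected level order
is.ordered(age_ord)
levels(age_ord)

# Ordered factors support comparisons that follow the level order
age_ord[1] < age_ord[3]
```

Setting `levels` by hand avoids depending on alphabetical sorting, which can put labels in the wrong order.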
Note that the categorical variable (`ageGroup`) has to be an ordered factor (this is a requirement of both functions). Moving to the example, `perc_diff` calculates the difference in the continuous variable by the percentiles of the ordered categorical variable. In our example, the question is: what is the difference in vocabulary test scores between the 90th and 10th percentiles of the age groups?
```{r}
perc_diff(gss, ageGroup, vocab, percentiles = c(90, 10))
```
It's about 0.21 points with a standard error of 0.39 points. In addition, you can optionally add weights with the `weights` argument.
```{r}
perc_diff(gss, ageGroup, vocab, percentiles = c(90, 10), weights = weight)
```
On the other hand, `perc_dist` (short for percentile distribution) estimates the score for every percentile, so the analysis is not limited to the difference between two percentiles.
```{r}
perc_dist <- perc_dist(gss, ageGroup, vocab)
perc_dist
```
We could visualize this in a more intuitive representation:
```{r}
perc_dist %>%
  ggplot(aes(percentile, estimate)) +
  geom_point() +
  geom_line() +
  theme_minimal() +
  labs(x = "Age group percentiles",
       y = "Vocabulary test scores")
```
This function also accepts weights through the `weights` argument, in the same way as `perc_diff`.
## Documentation and Support
Please visit https://cimentadaj.github.io/perccalc/ for documentation and
vignettes with real-world examples. If you want to file an issue or
contribute to the package in another way, please follow this
[guide](https://github.com/cimentadaj/perccalc/blob/master/.github/CONTRIBUTING.md). For
questions about the functionality, feel free to file an issue on GitHub.
- Reardon, Sean F. "The widening academic achievement gap between the rich and the poor: New evidence and possible explanations." *Whither Opportunity* (2011): 91-116.
Owner
- Name: Jorge Cimentada
- Login: cimentadaj
- Kind: user
- Location: Madrid
- Company: Senior Data Scientist @ eDreams
- Website: https://cimentadaj.github.io/
- Repositories: 6
- Profile: https://github.com/cimentadaj
JOSS Publication
perccalc: An R package for estimating percentiles from categorical variables
Published
December 08, 2019
Volume 4, Issue 44, Page 1796
Authors
Tags
categorical data analysis, achievement gaps
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jorge Cimentada | c****j@g****m | 128 |
| François Briatte | b****e | 3 |
| Mark A. Jensen | m****n@n****v | 2 |
| Daniel S. Katz | d****z@i****g | 2 |
| Jeroen | j****s@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 7
- Average time to close issues: about 2 months
- Average time to close pull requests: 3 months
- Total issue authors: 1
- Total pull request authors: 4
- Average comments per issue: 21.0
- Average comments per pull request: 0.29
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- amoeba (1)
Pull Request Authors
- briatte (4)
- jeroen (1)
- majensen (1)
- danielskatz (1)
Packages
- Total packages: 1
- Total downloads: 257 last month (CRAN)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 6
- Total maintainers: 1
cran.r-project.org: perccalc
Estimate Percentiles from an Ordered Categorical Variable
- Homepage: https://cimentadaj.github.io/perccalc/
- Documentation: http://cran.r-project.org/web/packages/perccalc/perccalc.pdf
- License: MIT + file LICENSE
- Latest release: 1.0.5 (published about 6 years ago)
Rankings
Forks count: 14.9%
Stargazers count: 24.2%
Dependent packages count: 29.8%
Average: 32.0%
Dependent repos count: 35.5%
Downloads: 55.8%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.4.0 depends
- multcomp * imports
- stats * imports
- tibble * imports
- MASS * suggests
- carData * suggests
- covr * suggests
- dplyr * suggests
- ggplot2 * suggests
- knitr * suggests
- magrittr * suggests
- rmarkdown * suggests
- spelling * suggests
- testthat * suggests
- tidyr >= 1.0.0 suggests
