srvyr
R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 19 committers (5.3%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (17.5%) to scientific vocabulary
Keywords
r
survey
Keywords from Contributors
data-manipulation
shiny
visualisation
grammar
geos
summary-tables
reproducibility
easy-to-use
parsing
fwf
Last synced: 6 months ago
Repository
R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data
Basic Info
- Host: GitHub
- Owner: gergness
- Language: R
- Default Branch: main
- Homepage: http://gdfe.co/srvyr/
- Size: 10 MB
Statistics
- Stars: 217
- Watchers: 9
- Forks: 28
- Open Issues: 18
- Releases: 22
Topics
r
survey
Created over 10 years ago · Last pushed 7 months ago
Metadata Files
Readme
Changelog
Code of conduct
README.Rmd
---
output:
  github_document
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```
# srvyr
[CRAN status](https://CRAN.R-project.org/package=srvyr)
[Build status](https://github.com/gergness/srvyr/actions)
[Coverage status](https://app.codecov.io/gh/gergness/srvyr?branch=main)
srvyr brings parts of [dplyr's](https://github.com/tidyverse/dplyr/) syntax to survey
analysis, using the [survey](https://CRAN.R-project.org/package=survey)
package.
srvyr focuses on calculating summary statistics from survey data, such as the
mean, total or quantile. It allows for the use of many dplyr verbs, such as
`summarize`, `group_by`, and `mutate`, the convenience of pipe-able functions,
rlang's style of non-standard evaluation and more consistent return types
than the survey package.
You can try it out:
```R
install.packages("srvyr")
# or for development version
# remotes::install_github("gergness/srvyr")
```
## Example usage
First, describe the variables that define the survey's structure with the function
`as_survey()`, passing the bare names of the columns that you would otherwise supply to
functions from the survey package such as `survey::svydesign()`, `survey::svrepdesign()` or
`survey::twophase()`.
```{r}
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)
```
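As a rough sketch of how the srvyr call above relates to the survey package, the block below builds the same stratified design both ways and checks that the estimated mean agrees. The object names (`dstrata_srvyr`, `dstrata_survey`, `dstrata_converted`) are made up for this illustration, and `ids = ~1` is assumed to match what srvyr uses when no ids are given.

```R
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# srvyr: bare column names
dstrata_srvyr <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# survey: one-sided formulas for the same design
dstrata_survey <- survey::svydesign(
  ids = ~1, strata = ~stype, weights = ~pw, data = apistrat
)

# as_survey() also accepts an existing survey design object
dstrata_converted <- as_survey(dstrata_survey)

# Both routes should give the same point estimate
srvyr_est <- dstrata_srvyr %>%
  summarise(api00 = survey_mean(api00))
survey_est <- survey::svymean(~api00, dstrata_survey)
```

Converting with `as_survey()` can be handy when a design object already exists and only the summary step needs the dplyr-style syntax.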
Now many of the dplyr verbs are available.
* `mutate()` adds or modifies a variable.
```{r}
dstrata <- dstrata %>%
  mutate(api_diff = api00 - api99)
```
* `summarise()` calculates summary statistics such as mean, total, quantile or ratio.
```{r}
dstrata %>%
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
```
* `group_by()` and then `summarise()` creates summaries by groups.
```{r}
dstrata %>%
  group_by(stype) %>%
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
```
* Functions from the survey package are still available:
```{r}
my_model <- survey::svyglm(api99 ~ stype, dstrata)
summary(my_model)
```
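The `summarise()` examples above only use `survey_mean()`. A minimal sketch of the other statistics mentioned (total, ratio and quantile) might look like the following; the output column names (`enroll_total`, `api_ratio`, `api00_q`) are chosen here just for illustration.

```R
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

other_stats <- dstrata %>%
  summarise(
    # estimated total enrollment across the population
    enroll_total = survey_total(enroll),
    # ratio of the 2000 API score to the 1999 API score
    api_ratio = survey_ratio(api00, api99),
    # quartiles of the 2000 API score
    api00_q = survey_quantile(api00, c(0.25, 0.5, 0.75))
  )

other_stats
```

Each statistic comes back as one or more columns of a single-row tibble, with the variability estimate (standard error by default) in a suffixed column next to the point estimate.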
## Cheat Sheet

## Learning more
Here are some free resources put together by the community about srvyr:
- **"How-to"s & examples of using srvyr**
  - srvyr's included vignette ["srvyr vs survey"](http://gdfe.co/srvyr/articles/srvyr-vs-survey.html) and the rest of the [pkgdown website](http://gdfe.co/srvyr/)
  - Stephanie Zimmer, Rebecca Powell and Isabella Velásquez's book [Exploring Complex Survey Data Analysis Using R](https://www.routledge.com/Exploring-Complex-Survey-Data-Analysis-Using-R-A-Tidy-Introduction-with-srvyr-and-survey/Zimmer-Powell-Velasquez/p/book/9781032302867?srsltid=AfmBOordog836itDOABXbcZM2BAE1WdJ6muu8sjgAIpO7WFu-x00D6HQ) (releasing in November 2024). See also their [2021 AAPOR Workshop "Tidy Survey Analysis in R using the srvyr Package"](https://github.com/szimmer/tidy-survey-aapor-2021)
  - "The Epidemiologist R Handbook" by Neale Batra et al. has a [chapter on survey analysis](https://epirhandbook.com/en/) with srvyr and survey package examples
  - Kieran Healy's book ["Data Visualization: A Practical Introduction"](https://socviz.co/modeling.html#plots-from-complex-surveys) has a section on using srvyr to visualize the ESS.
  - The IPUMS PMA team's blog had a series showing examples of using the [PMA COVID survey panel with weights](https://tech.popdata.org/pma-data-hub/index.html)
  - ["Open Case Studies: Vaping Behaviors in American Youth"](https://www.opencasestudies.org/ocs-bp-vaping-case-study/) by Carrie Wright, Michael Ontiveros, Leah Jager, Margaret Taub, and Stephanie Hicks is a detailed case study that includes using srvyr to analyze the National Youth Tobacco Survey.
  - The tidycensus package vignette ["Working with Census microdata"](https://walker-data.com/tidycensus/articles/pums-data.html) includes information about using the weights from the ACS retrieved from the census API.
  - ["The Joy of Calculating the Direct Standard Error for PUMS Estimates"](https://ldaly.github.io/giveinandblogit/) by GitHub user @ldaly
- **About survey statistics**
  - Thomas Lumley's book ["Complex Surveys: a guide to analysis using R"](http://r-survey.r-forge.r-project.org/svybook/)
  - [Chris Skinner. Jon Wakefield. "Introduction to the Design and Analysis of Complex Survey Data." Statist. Sci. 32 (2) 165 - 175, May 2017. 10.1214/17-STS614](https://projecteuclid.org/accountAjax/Download?downloadType=journal%20article&urlId=10.1214%2F17-STS614&isResultClick=True)
  - Sharon Lohr's textbook "Sampling: Design and Analysis", [Second](https://www.sharonlohr.com/sampling-design-and-analysis-2e) or [Third](https://www.sharonlohr.com/sampling-design-and-analysis-3e) Editions
  - "Survey weighting is a mess" is the opening to Andrew Gelman's ["Struggles with Survey Weighting and Regression Modeling"](https://sites.stat.columbia.edu/gelman/research/published/STS226.pdf)
  - Anthony Damico's website ["Analyze Survey Data for Free"](https://asdfree.com) has the weight specifications for a wide variety of public use survey datasets.
- **Working programmatically and/or on multiple columns at once (e.g. `dplyr::across` and `rlang`'s "curly curly" `{{}}`)**
  - dplyr's included package vignettes ["Column-wise operations"](https://dplyr.tidyverse.org/articles/colwise.html) & ["Programming with dplyr"](https://dplyr.tidyverse.org/articles/programming.html)
- **Non-English resources**
  - *Em português:* ["Análise de Dados Amostrais Complexos"](https://djalmapessoa.github.io/adac/) by Djalma Pessoa and Pedro Nascimento Silva
  - *En español:* ["Usando R para jugar con los microdatos del INEGI"](https://medium.com/tacosdedatos/usando-r-para-sacar-información-de-los-microdatos-del-inegi-b21b6946cf4f) by Claudio Daniel Pacheco Castro
  - Chapter 26 of The Epidemiologist R Handbook, translated:
    - *En français:* [Analyse d’enquête](https://epirhandbook.com/fr/new_pages/survey_analysis.fr.html)
    - *Tiếng Việt:* [Phân tích khảo sát](https://epirhandbook.com/vn/new_pages/survey_analysis.vn.html)
    - *En español:* [Análisis de encuestas](https://epirhandbook.com/es/new_pages/survey_analysis.es.html)
    - *日本語で:* [標本調査データ分析](https://epirhandbook.com/jp/new_pages/survey_analysis.jp.html)
    - *Em português:* [Analises de pesquisa de questionários (survey)](https://epirhandbook.com/pt/new_pages/survey_analysis.pt.html)
    - *Türkçe:* [Anket analizi](https://epirhandbook.com/tr/new_pages/survey_analysis.tr.html)
    - *На русском языке:* [Анализ опросов](https://epirhandbook.com/ru/new_pages/survey_analysis.ru.html)
  - *På norsk:* [Data med vekter i R](https://oyvindsolheim.com/code/vekter%20i%20r/) by Øyvind Bugge Solheim
- **Other cool stuff that uses srvyr**
  - A (free) graphical interface allowing exploratory data analysis of survey data without writing code: [iNZight](https://inzight.nz/) (and [survey data instructions](https://inzight.nz/docs/survey-specification.html))
  - ["serosurvey: Serological Survey Analysis For Prevalence Estimation Under Misclassification"](https://avallecam.github.io/serosurvey/) by Andree Valle Campos
  - Several packages on CRAN depend on srvyr; you can see them by looking at the [reverse Imports/Suggestions on CRAN](https://cran.r-project.org/package=srvyr).
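For the programmatic usage mentioned in the resource list above, here is a small sketch of what "curly curly" and `across()` can look like with a survey design. The helper `mean_by()` is hypothetical (it is not part of srvyr), and `dplyr::across()` is called with its namespace on the assumption that dplyr is installed alongside srvyr.

```R
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Hypothetical helper: `{{ }}` forwards bare column names into srvyr verbs
mean_by <- function(design, group, var) {
  design %>%
    group_by({{ group }}) %>%
    summarise(mean = survey_mean({{ var }}, vartype = "se"))
}

by_type <- mean_by(dstrata, stype, api00)

# across() applies one survey function to several columns at once
many_means <- dstrata %>%
  summarise(dplyr::across(c(api99, api00), survey_mean))
```

The dplyr vignettes linked above cover the general patterns; this only shows that they carry over to `tbl_svy` objects.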
**Still need help?**
I think the best way to get help is to form a specific question and ask it in some place like [posit's community website](https://forum.posit.co/) (known for its friendly community) or [stackoverflow.com](https://stackoverflow.com) (maybe not known for being quite as friendly, but probably has more people). If you think you've found a bug in srvyr's code, please file an [issue on GitHub](https://github.com/gergness/srvyr/issues/new), but note that I'm not a great resource for helping with specific issues, both because I have limited capacity and because I do not consider myself an expert in the statistical methods behind survey analysis.
**Have something to add?**
These resources were mostly found via vanity searches on Twitter & GitHub. If you know of anything I missed, or have written something yourself, [please let me know in this GitHub issue]()!
## What people are saying about srvyr
> minimal changes to my #r #dplyr script to incorporate survey weights, thanks to the amazing #srvyr and #survey packages. Thanks to @gregfreedman & @tslumley. Integrates soooo nicely into tidyverse
>
> --Brian Guay ([\@BrianMGuay on Jun 16, 2021](https://twitter.com/brianmguay/status/1405224564196622338))
> Spending my afternoon using `srvyr` for tidy analysis of weighted survey data in #rstats and it's so elegant. Vignette here: https://CRAN.R-project.org/package=srvyr/vignettes/srvyr-vs-survey.html
>
> --Chris Skovron ([\@cskovron on Nov 20, 2018](https://twitter.com/cskovron/status/1065015904784842752))
> 1. Yay!
>
> --Thomas Lumley, [in the Biased and Inefficient blog](https://notstatschat.tumblr.com/post/161225885311/pipeable-survey-analysis-in-r)
## Contributing
I do appreciate bug reports, suggestions and pull requests! I started this as a
way to learn about R package development, and am still learning, so you'll have
to bear with me. Please review the [Contributor Code of
Conduct](https://github.com/gergness/srvyr/blob/main/CODE_OF_CONDUCT.md), as all participants are required to abide by its
terms.
If you're unfamiliar with contributing to an R package, I recommend the guides
provided by RStudio's tidyverse team, such as Jim Hester's [blog
post](https://www.tidyverse.org/blog/2017/08/contributing/) or Hadley
Wickham's [R packages book](https://r-pkgs.org/).
Owner
- Name: Greg Freedman Ellis
- Login: gergness
- Kind: user
- Location: St Paul, MN
- Company: crunch.io
- Repositories: 7
- Profile: https://github.com/gergness
Interested in public health, statistics, and programming. Currently at crunch.io, previously at CHR, MPC, Iowa DPH and IHME.
GitHub Events
Total
- Issues event: 10
- Watch event: 8
- Issue comment event: 17
- Push event: 10
- Pull request review event: 2
- Pull request event: 6
- Fork event: 1
Last Year
- Issues event: 10
- Watch event: 8
- Issue comment event: 17
- Push event: 10
- Pull request review event: 2
- Pull request event: 6
- Fork event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| gergness | g****n@g****m | 572 |
| Ben Schneider | b****r@g****m | 31 |
| Stephanie Zimmer | s****r@g****m | 12 |
| carlganz | c****z@g****m | 9 |
| Pavel N. Krivitsky | p****y@u****u | 8 |
| tzoltak | t****k@z****g | 4 |
| Etienne Bacher | 5****r | 4 |
| BENSCHNEIDER\Ben Schneider | b****r@i****m | 4 |
| olivroy | o****1@h****m | 3 |
| Daniel Casey | d****y@k****v | 3 |
| Anthony Damico | a****o@g****m | 3 |
| gergness | g****s@g****m | 2 |
| gergness | g****g@G****l | 2 |
| Milan Bouchet-Valat | n****n@c****r | 2 |
| Romain Francois | r****n@r****m | 2 |
| florisvdh | f****e@i****e | 2 |
| Hadley Wickham | h****m@g****m | 1 |
| Lionel Henry | l****y@g****m | 1 |
| Paul | h****g@g****m | 1 |
Committer Domains (Top 20 + Academic)
inbo.be: 1
rstudio.com: 1
club.fr: 1
kingcounty.gov: 1
iqsresearch.com: 1
zozlak.org: 1
unsw.edu.au: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 90
- Total pull requests: 45
- Average time to close issues: 3 months
- Average time to close pull requests: 2 months
- Total issue authors: 45
- Total pull request authors: 13
- Average comments per issue: 2.91
- Average comments per pull request: 1.71
- Merged pull requests: 37
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 7
- Average time to close issues: 13 days
- Average time to close pull requests: 22 days
- Issue authors: 5
- Pull request authors: 4
- Average comments per issue: 2.0
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- gergness (17)
- szimmer (9)
- bschneidr (6)
- dcaseykc (4)
- krivit (4)
- nalimilan (3)
- yannsay-impact (3)
- pachadotdev (2)
- themichjam (2)
- etiennebacher (2)
- Hanzzman (2)
- dankshan (2)
- ABSOD (2)
- jpferreira33 (1)
- WaceroRuge (1)
Pull Request Authors
- bschneidr (21)
- gergness (10)
- szimmer (4)
- etiennebacher (3)
- krivit (2)
- olivroy (2)
- romainfrancois (1)
- zackarno (1)
- hadley (1)
- tzoltak (1)
- nalimilan (1)
- stephenashton-dhsc (1)
- dcaseykc (1)
Top Labels
Issue Labels
help wanted (2)
good first issue (1)
Pull Request Labels
Packages
- Total packages: 2
- Total downloads:
  - cran: 4,789 last-month
- Total docker downloads: 43,645
- Total dependent packages: 19 (may contain duplicates)
- Total dependent repositories: 40 (may contain duplicates)
- Total versions: 40
- Total maintainers: 1
cran.r-project.org: srvyr
'dplyr'-Like Syntax for Summary Statistics of Survey Data
- Homepage: http://gdfe.co/srvyr/
- Documentation: http://cran.r-project.org/web/packages/srvyr/srvyr.pdf
- License: GPL-2 | GPL-3
- Latest release: 1.3.0 (published over 1 year ago)
Rankings
Docker downloads count: 0.6%
Stargazers count: 2.1%
Forks count: 2.9%
Average: 3.5%
Dependent packages count: 3.7%
Dependent repos count: 4.1%
Downloads: 7.3%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/gergness/srvyr
- Documentation: https://pkg.go.dev/github.com/gergness/srvyr#section-documentation
- Latest release: v1.2.0 (published about 3 years ago)
Rankings
Dependent packages count: 5.5%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.1.2 depends
- dplyr >= 1.0 imports
- magrittr * imports
- methods * imports
- rlang * imports
- survey >= 4.1 imports
- tibble * imports
- tidyr * imports
- tidyselect * imports
- vctrs >= 0.3.0 imports
- DBI * suggests
- Matrix * suggests
- RSQLite * suggests
- convey * suggests
- dbplyr * suggests
- ggplot2 * suggests
- knitr * suggests
- laeken * suggests
- pander * suggests
- rmarkdown >= 2.2.2 suggests
- survival * suggests
- testthat * suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite