srvyr

R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data

https://github.com/gergness/srvyr

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 19 committers (5.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.5%) to scientific vocabulary

Keywords

r survey

Keywords from Contributors

data-manipulation shiny visualisation grammar geos summary-tables reproducibility easy-to-use parsing fwf
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: gergness
  • Language: R
  • Default Branch: main
  • Homepage: http://gdfe.co/srvyr/
  • Size: 10 MB
Statistics
  • Stars: 217
  • Watchers: 9
  • Forks: 28
  • Open Issues: 18
  • Releases: 22
Topics
r survey
Created over 10 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog Code of conduct

README.Rmd

---
output:
  github_document
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# srvyr 

[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/srvyr)](https://CRAN.R-project.org/package=srvyr)
[![R build status](https://github.com/gergness/srvyr/workflows/R-CMD-check/badge.svg)](https://github.com/gergness/srvyr/actions)
[![Codecov test coverage](https://codecov.io/gh/gergness/srvyr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/gergness/srvyr?branch=main)


srvyr brings parts of [dplyr's](https://github.com/tidyverse/dplyr/) syntax to survey
analysis, using the [survey](https://CRAN.R-project.org/package=survey) 
package.

srvyr focuses on calculating summary statistics from survey data, such as the
mean, total or quantile. It allows for the use of many dplyr verbs, such as
`summarize`, `group_by`, and `mutate`, the convenience of pipe-able functions,
rlang's style of non-standard evaluation and more consistent return types
than the survey package.

You can try it out:

```R
install.packages("srvyr")
# or for development version
# remotes::install_github("gergness/srvyr")
```

## Example usage

First, describe the variables that define the survey's structure with the function
`as_survey()`, passing the bare names of the columns that you would otherwise give to
functions from the survey package like `survey::svydesign()`, `survey::svrepdesign()` or
`survey::twophase()`.

```{r}
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# apistrat is a sample stratified by school type (stype); pw holds the sampling weights
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)
```
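
To make the link to the survey package concrete, here is a minimal sketch that builds the same stratified design both ways and compares the estimated mean of `api00`. The object names (`d_survey`, `d_srvyr`, `est_survey`, `est_srvyr`) are made up for illustration, and `ids = ~1` is an assumption that there is no cluster sampling, which is how srvyr treats the design when no ids are given.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# survey package: the design is described with one-sided formulas
d_survey <- survey::svydesign(
  ids = ~1, strata = ~stype, weights = ~pw, data = apistrat
)

# srvyr: the same design described with bare column names
d_srvyr <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# survey returns a svystat object; srvyr returns a tibble with one row
est_survey <- survey::svymean(~api00, d_survey)
est_srvyr <- d_srvyr %>%
  summarise(api00 = survey_mean(api00))

# The point estimates should agree
all.equal(as.numeric(coef(est_survey)), est_srvyr$api00)
```

The srvyr result keeps the estimate and its standard error as ordinary columns (`api00` and `api00_se`), which is what makes it convenient to pipe into further dplyr steps.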

Now many of the dplyr verbs are available.

* `mutate()` adds or modifies a variable.
```{r}
dstrata <- dstrata %>%
  mutate(api_diff = api00 - api99)
```

* `summarise()` calculates summary statistics such as mean, total, quantile or ratio.
```{r}
dstrata %>% 
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
```

* `group_by()` and then `summarise()` creates summaries by groups.
```{r}
dstrata %>% 
  group_by(stype) %>%
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
```

* Functions from the survey package are still available:
```{r}
my_model <- survey::svyglm(api99 ~ stype, dstrata)
summary(my_model)
```
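
The verbs above combine with srvyr's other summary functions. As a rough sketch (the output column names `enroll_total`, `api00_median`, `api_ratio` and `prop` are chosen here for illustration), the following computes a total, a median and a ratio in one `summarise()` call, then the estimated share of schools in each school type with `survey_prop()`:

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Several statistics in one summarise() call; by default each adds a
# standard error column alongside the estimate
overall <- dstrata %>%
  summarise(
    enroll_total = survey_total(enroll),
    api00_median = survey_median(api00),
    api_ratio    = survey_ratio(api00, api99)
  )

# Estimated proportion of schools in each school type (one row per group)
by_type <- dstrata %>%
  group_by(stype) %>%
  summarise(prop = survey_prop())

overall
by_type
```

Because each result is a tibble, the grouped proportions can be passed straight to other tidyverse tools without first extracting them from a survey-specific object.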

## Cheat Sheet


## Learning more

Here are some free resources put together by the community about srvyr:

- **"How-to"s & examples of using srvyr**
  - srvyr's included vignette ["srvyr vs survey"](http://gdfe.co/srvyr/articles/srvyr-vs-survey.html) and the rest of the [pkgdown website](http://gdfe.co/srvyr/)
  - Stephanie Zimmer, Rebecca Powell and Isabella Velásquez's book [Exploring Complex Survey Data Analysis Using R](https://www.routledge.com/Exploring-Complex-Survey-Data-Analysis-Using-R-A-Tidy-Introduction-with-srvyr-and-survey/Zimmer-Powell-Velasquez/p/book/9781032302867?srsltid=AfmBOordog836itDOABXbcZM2BAE1WdJ6muu8sjgAIpO7WFu-x00D6HQ) (releasing in November 2024). See also their [2021 AAPOR Workshop "Tidy Survey Analysis in R using the srvyr Package"](https://github.com/szimmer/tidy-survey-aapor-2021)
  - "The Epidemiologist R Handbook", by Neale Batra et al. has a [chapter on survey analysis](https://epirhandbook.com/en/) with srvyr and survey package examples
  - Kieran Healy's book ["Data Visualization: A Practical Introduction"](https://socviz.co/modeling.html#plots-from-complex-surveys) has a section on using srvyr to visualize the ESS.
  - The IPUMS PMA team's blog had a series showing examples of using the [PMA COVID survey panel with weights](https://tech.popdata.org/pma-data-hub/index.html)
  - ["Open Case Studies: Vaping Behaviors in American Youth"](https://www.opencasestudies.org/ocs-bp-vaping-case-study/) by Carrie Wright, Michael Ontiveros, Leah Jager, Margaret Taub, and Stephanie Hicks is a detailed case study that includes using srvyr to analyze the National Youth Tobacco Survey.
  - The tidycensus package vignette ["Working with Census microdata"](https://walker-data.com/tidycensus/articles/pums-data.html) includes information about using the weights from the ACS retrieved from the census API.
  - ["The Joy of Calculating the Direct Standard Error for PUMS Estimates"](https://ldaly.github.io/giveinandblogit/) by GitHub user @ldaly

- **About survey statistics**
  - Thomas Lumley's book ["Complex Surveys: a guide to analysis using R"](http://r-survey.r-forge.r-project.org/svybook/)
  - [Chris Skinner. Jon Wakefield. "Introduction to the Design and Analysis of Complex Survey Data." Statist. Sci. 32 (2) 165 - 175, May 2017. 10.1214/17-STS614](https://projecteuclid.org/accountAjax/Download?downloadType=journal%20article&urlId=10.1214%2F17-STS614&isResultClick=True)
  - Sharon Lohr's textbook "Sampling: Design and Analysis". [Second](https://www.sharonlohr.com/sampling-design-and-analysis-2e) or [Third](https://www.sharonlohr.com/sampling-design-and-analysis-3e) Editions
  - "Survey weighting is a mess" is the opening to Andrew Gelman's ["Struggles with Survey Weighting and Regression Modeling"](https://sites.stat.columbia.edu/gelman/research/published/STS226.pdf)
  - Anthony Damico's website ["Analyze Survey Data for Free"](https://asdfree.com) has the weight specifications for a wide variety of public use survey datasets.

- **Working programmatically and/or on multiple columns at once (e.g. `dplyr::across` and `rlang`'s "curly curly" `{{}}`)**
  - dplyr's included package vignettes ["Column-wise operations"](https://dplyr.tidyverse.org/articles/colwise.html) & ["Programming with dplyr"](https://dplyr.tidyverse.org/articles/programming.html)

- **Non-English resources**
  - *Em português:* ["Análise de Dados Amostrais Complexos"](https://djalmapessoa.github.io/adac/) by Djalma Pessoa and Pedro Nascimento Silva
  - *En español:* ["Usando R para jugar con los microdatos del INEGI"](https://medium.com/tacosdedatos/usando-r-para-sacar-información-de-los-microdatos-del-inegi-b21b6946cf4f) by Claudio Daniel Pacheco Castro
  - Chapter 26 of The Epidemiologist R Handbook, translated:
    - *En français:* [Analyse d’enquête](https://epirhandbook.com/fr/new_pages/survey_analysis.fr.html)
    - *Tiếng Việt:* [Phân tích khảo sát](https://epirhandbook.com/vn/new_pages/survey_analysis.vn.html)
    - *En español:* [Análisis de encuestas](https://epirhandbook.com/es/new_pages/survey_analysis.es.html)
    - *日本語で:* [標本調査データ分析](https://epirhandbook.com/jp/new_pages/survey_analysis.jp.html)
    - *Em português:* [Analises de pesquisa de questionários (survey)](https://epirhandbook.com/pt/new_pages/survey_analysis.pt.html)
    - *Türkçe:* [Anket analizi](https://epirhandbook.com/tr/new_pages/survey_analysis.tr.html)
    - *На русском языке:* [Анализ опросов](https://epirhandbook.com/ru/new_pages/survey_analysis.ru.html)
  - *På norsk:* [Data med vekter i R](https://oyvindsolheim.com/code/vekter%20i%20r/) by Øyvind Bugge Solheim

- **Other cool stuff that uses srvyr**
  - A (free) graphical interface allowing exploratory data analysis of survey data without writing code: [iNZight](https://inzight.nz/) (and [survey data instructions](https://inzight.nz/docs/survey-specification.html))
  - ["serosurvey: Serological Survey Analysis For Prevalence Estimation Under Misclassification"](https://avallecam.github.io/serosurvey/) by Andree Valle Campos
  - Several packages on CRAN depend on srvyr; you can see them by looking at the [reverse Imports/Suggestions on CRAN](https://cran.r-project.org/package=srvyr).

**Still need help?**

I think the best way to get help is to form a specific question and ask it in some place like
[posit's community website](https://forum.posit.co/) (known for its friendly community) or
[stackoverflow.com](https://stackoverflow.com) (maybe not known for being quite as friendly,
but probably has more people). If you think you've found a bug in srvyr's code, please file an
[issue on GitHub](https://github.com/gergness/srvyr/issues/new), but note that I'm not a great
resource for helping with specific issues, both because I have limited capacity and because I
do not consider myself an expert in the statistical methods behind survey analysis.

**Have something to add?**

These resources were mostly found via vanity searches on twitter & github. If you know of
anything I missed, or have written something yourself, [please let me know in this GitHub issue]()!

## What people are saying about srvyr

> minimal changes to my #r #dplyr script to incorporate survey weights, thanks to the amazing #srvyr and #survey packages. Thanks to @gregfreedman & @tslumley. Integrates soooo nicely into tidyverse
>
> --Brian Guay ([\@BrianMGuay on Jun 16, 2021](https://twitter.com/brianmguay/status/1405224564196622338))

> Spending my afternoon using `srvyr` for tidy analysis of weighted survey data in #rstats and it's so elegant. Vignette here: https://CRAN.R-project.org/package=srvyr/vignettes/srvyr-vs-survey.html
>
> --Chris Skovron ([\@cskovron on Nov 20, 2018](https://twitter.com/cskovron/status/1065015904784842752))

> 1. Yay!
>
> --Thomas Lumley, [in the Biased and Inefficient blog](https://notstatschat.tumblr.com/post/161225885311/pipeable-survey-analysis-in-r)

## Contributing

I do appreciate bug reports, suggestions and pull requests! I started this as a way to learn
about R package development, and am still learning, so you'll have to bear with me. Please
review the [Contributor Code of Conduct](https://github.com/gergness/srvyr/blob/main/CODE_OF_CONDUCT.md),
as all participants are required to abide by its terms.

If you're unfamiliar with contributing to an R package, I recommend the guides provided by
RStudio's tidyverse team, such as Jim Hester's [blog post](https://www.tidyverse.org/blog/2017/08/contributing/)
or Hadley Wickham's [R packages book](https://r-pkgs.org/).

Owner

  • Name: Greg Freedman Ellis
  • Login: gergness
  • Kind: user
  • Location: St Paul, MN
  • Company: crunch.io

Interested in public health, statistics, and programming. Currently at crunch.io, previously at CHR, MPC, Iowa DPH and IHME.

GitHub Events

Total
  • Issues event: 10
  • Watch event: 8
  • Issue comment event: 17
  • Push event: 10
  • Pull request review event: 2
  • Pull request event: 6
  • Fork event: 1
Last Year
  • Issues event: 10
  • Watch event: 8
  • Issue comment event: 17
  • Push event: 10
  • Pull request review event: 2
  • Pull request event: 6
  • Fork event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 666
  • Total Committers: 19
  • Avg Commits per committer: 35.053
  • Development Distribution Score (DDS): 0.141
Past Year
  • Commits: 31
  • Committers: 3
  • Avg Commits per committer: 10.333
  • Development Distribution Score (DDS): 0.161
Top Committers
Name                        Email          Commits
gergness                    g****n@g****m  572
Ben Schneider               b****r@g****m  31
Stephanie Zimmer            s****r@g****m  12
carlganz                    c****z@g****m  9
Pavel N. Krivitsky          p****y@u****u  8
tzoltak                     t****k@z****g  4
Etienne Bacher              5****r         4
BENSCHNEIDER\Ben Schneider  b****r@i****m  4
olivroy                     o****1@h****m  3
Daniel Casey                d****y@k****v  3
Anthony Damico              a****o@g****m  3
gergness                    g****s@g****m  2
gergness                    g****g@G****l  2
Milan Bouchet-Valat         n****n@c****r  2
Romain Francois             r****n@r****m  2
florisvdh                   f****e@i****e  2
Hadley Wickham              h****m@g****m  1
Lionel Henry                l****y@g****m  1
Paul                        h****g@g****m  1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 90
  • Total pull requests: 45
  • Average time to close issues: 3 months
  • Average time to close pull requests: 2 months
  • Total issue authors: 45
  • Total pull request authors: 13
  • Average comments per issue: 2.91
  • Average comments per pull request: 1.71
  • Merged pull requests: 37
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 7
  • Average time to close issues: 13 days
  • Average time to close pull requests: 22 days
  • Issue authors: 5
  • Pull request authors: 4
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gergness (17)
  • szimmer (9)
  • bschneidr (6)
  • dcaseykc (4)
  • krivit (4)
  • nalimilan (3)
  • yannsay-impact (3)
  • pachadotdev (2)
  • themichjam (2)
  • etiennebacher (2)
  • Hanzzman (2)
  • dankshan (2)
  • ABSOD (2)
  • jpferreira33 (1)
  • WaceroRuge (1)
Pull Request Authors
  • bschneidr (21)
  • gergness (10)
  • szimmer (4)
  • etiennebacher (3)
  • krivit (2)
  • olivroy (2)
  • romainfrancois (1)
  • zackarno (1)
  • hadley (1)
  • tzoltak (1)
  • nalimilan (1)
  • stephenashton-dhsc (1)
  • dcaseykc (1)
Top Labels
Issue Labels
help wanted (2) good first issue (1)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 4,789 last-month
  • Total docker downloads: 43,645
  • Total dependent packages: 19
    (may contain duplicates)
  • Total dependent repositories: 40
    (may contain duplicates)
  • Total versions: 40
  • Total maintainers: 1
cran.r-project.org: srvyr

'dplyr'-Like Syntax for Summary Statistics of Survey Data

  • Versions: 25
  • Dependent Packages: 19
  • Dependent Repositories: 40
  • Downloads: 4,789 Last month
  • Docker Downloads: 43,645
Rankings
Docker downloads count: 0.6%
Stargazers count: 2.1%
Forks count: 2.9%
Average: 3.5%
Dependent packages count: 3.7%
Dependent repos count: 4.1%
Downloads: 7.3%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/gergness/srvyr
  • Versions: 15
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.5%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.1.2 depends
  • dplyr >= 1.0 imports
  • magrittr * imports
  • methods * imports
  • rlang * imports
  • survey >= 4.1 imports
  • tibble * imports
  • tidyr * imports
  • tidyselect * imports
  • vctrs >= 0.3.0 imports
  • DBI * suggests
  • Matrix * suggests
  • RSQLite * suggests
  • convey * suggests
  • dbplyr * suggests
  • ggplot2 * suggests
  • knitr * suggests
  • laeken * suggests
  • pander * suggests
  • rmarkdown >= 2.2.2 suggests
  • survival * suggests
  • testthat * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite