excluder
excluder: An R package that checks for exclusion criteria in online data - Published in JOSS (2021)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 9 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Keywords
datacleaning
exclusion
mturk
qualtrics
r
r-package
rstats
Last synced: 6 months ago
Repository
Checks for Exclusion Criteria in Online Data
Basic Info
- Host: GitHub
- Owner: ropensci
- License: gpl-3.0
- Language: R
- Default Branch: main
- Homepage: https://docs.ropensci.org/excluder/
- Size: 947 KB
Statistics
- Stars: 9
- Watchers: 2
- Forks: 5
- Open Issues: 0
- Releases: 14
Topics
datacleaning
exclusion
mturk
qualtrics
r
r-package
rstats
Created about 5 years ago · Last pushed 8 months ago
Metadata Files
Readme
Changelog
Contributing
License
Codemeta
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# excluder
[Project status: active](https://www.repostatus.org/#active)
[Lifecycle: stable](https://lifecycle.r-lib.org/articles/stages.html#stable)
[CRAN package](https://cran.r-project.org/package=excluder)
[GitHub Actions](https://github.com/ropensci/excluder/actions)
[Codecov](https://app.codecov.io/gh/ropensci/excluder)
[rOpenSci software review](https://github.com/ropensci/software-review/issues/455)
[JOSS DOI: 10.21105/joss.03893](https://doi.org/10.21105/joss.03893)
[Zenodo DOI: 10.5281/zenodo.5648202](https://doi.org/10.5281/zenodo.5648202)
The goal of [`{excluder}`](https://docs.ropensci.org/excluder/) is to facilitate checking for, marking, and excluding rows of data frames for common exclusion criteria. This package applies to data collected from [Qualtrics](https://www.qualtrics.com/) surveys, and default column names come from importing data with the [`{qualtRics}`](https://docs.ropensci.org/qualtRics/) package.
This package may be most useful for screening [Mechanical Turk](https://www.mturk.com/) data for duplicate entries from the same location or IP address, or for entries from locations outside of the United States. It can also be used more generally to exclude rows based on response duration, preview status, progress, or screen resolution.
More details are available on the package [website](https://docs.ropensci.org/excluder/) and the [getting started vignette](https://docs.ropensci.org/excluder/articles/excluder.html).
## Installation
You can install the stable released version of `{excluder}` from [CRAN](https://cran.r-project.org/package=excluder) with:
```{r eval = FALSE}
install.packages("excluder")
```
You can install the development version from [GitHub](https://github.com/) with:
```{r eval = FALSE}
# install.packages("remotes")
remotes::install_github("ropensci/excluder")
```
## Verbs
This package provides three primary verbs:
* `mark` functions add a new column to the original data frame that labels the rows meeting the exclusion criteria. This is useful to label the potential exclusions for future processing without changing the original data frame.
* `check` functions search for the exclusion criteria and output a message with the number of rows meeting the criteria and a data frame of the rows meeting the criteria. This is useful for viewing the potential exclusions.
* `exclude` functions remove rows meeting the exclusion criteria. This is safest to do after checking the rows to ensure the exclusions are correct.
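As a rough illustration of how the three verbs differ, the sketch below applies each one to the preview criterion using the `qualtrics_text` example data bundled with the package. The object names `marked`, `previews`, and `kept` are placeholders chosen for this example, and the row-count comparisons follow from the descriptions above rather than from any documented guarantee.

```r
library(excluder)

# mark: returns every input row plus a new column flagging previews
marked <- mark_preview(qualtrics_text)

# check: returns only the rows that meet the criterion
previews <- check_preview(qualtrics_text)

# exclude: returns the input without the rows that meet the criterion
kept <- exclude_preview(qualtrics_text)

# Marking keeps all rows; checking and excluding split them in two
nrow(marked) == nrow(qualtrics_text)
nrow(previews) + nrow(kept) == nrow(qualtrics_text)
```

Each call also prints a message with the number of rows that met the criterion, which can serve as a quick sanity check before excluding anything.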
## Exclusion types
This package provides seven types of exclusions based on Qualtrics metadata. If you have ideas for other metadata exclusions, please submit them as [issues](https://github.com/ropensci/excluder/issues). Note that this package is intended to support exclusions based on general, frequently used metadata rather than on survey-specific data.
* `duplicates` works with rows that have duplicate IP addresses and/or locations (latitude/longitude).
* `duration` works with rows whose survey completion time is too short and/or too long.
* `ip` works with rows whose IP addresses are not found in the specified country (note: this exclusion type requires an internet connection to download the country's IP ranges).
* `location` works with rows whose latitude and longitude are not found in the United States.
* `preview` works with rows that are survey previews.
* `progress` works with rows in which the survey was not complete.
* `resolution` works with rows whose screen resolution is not acceptable.
## Usage
The verbs and exclusion types combine with `_` to create the functions, such as [`check_duplicates()`](https://docs.ropensci.org/excluder/reference/check_duplicates.html), [`exclude_ip()`](https://docs.ropensci.org/excluder/reference/exclude_ip.html), and [`mark_duration()`](https://docs.ropensci.org/excluder/reference/mark_duration.html). Multiple functions can be linked together using the [`{magrittr}`](https://magrittr.tidyverse.org/) pipe `%>%`. For datasets downloaded directly from Qualtrics, use [`remove_label_rows()`](https://docs.ropensci.org/excluder/reference/remove_label_rows.html) to remove the first two rows of labels and convert date and numeric columns in the metadata, and use [`deidentify()`](https://docs.ropensci.org/excluder/reference/deidentify.html) to remove standard Qualtrics columns with identifiable information (e.g., IP addresses, geolocation).
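A minimal sketch of the preprocessing steps described above, assuming the `qualtrics_raw` example data set that ships with the package stands in for a file downloaded directly from Qualtrics. The object name `cleaned` is a placeholder, and both functions are called with their defaults.

```r
library(excluder)

# qualtrics_raw still contains the two label rows from the Qualtrics export
cleaned <- qualtrics_raw %>%
  remove_label_rows() %>% # drop the label rows and convert metadata columns
  deidentify()            # drop columns with identifiable information

# Two fewer rows than the raw export, and no IP address column
nrow(qualtrics_raw) - nrow(cleaned)
"IPAddress" %in% names(cleaned)
```

Because `deidentify()` removes IP addresses and geolocation, any exclusions that depend on those columns (`duplicates`, `ip`, `location`) would need to run before it.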
### Marking
The `mark_*()` functions output the original data set with a new column specifying rows that meet the exclusion criteria. These can be piped together with `%>%` for multiple exclusion types.
```{r mark1}
library(excluder)
# Mark preview and short duration rows
df <- qualtrics_text %>%
  mark_preview() %>%
  mark_duration(min_duration = 200)
tibble::glimpse(df)
```
Use the [`unite_exclusions()`](https://docs.ropensci.org/excluder/reference/unite_exclusions.html) function to unite all of the marked columns into a single column.
```{r mark2}
# Collapse labels for preview and short duration rows
df <- qualtrics_text %>%
  mark_preview() %>%
  mark_duration(min_duration = 200) %>%
  unite_exclusions()
tibble::glimpse(df)
```
### Checking
The `check_*()` functions output messages about the number of rows that meet the exclusion criteria. Because checks return only the rows meeting the criteria, they **should not be connected via pipes** unless you want to subset the second check criterion within the rows that meet the first criterion. Thus, in general, `check_*()` functions should be used individually. If you want to view the potential exclusions for multiple criteria, use the `mark_*()` functions.
```{r check1}
# Check for preview rows
qualtrics_text %>%
  check_preview()
```
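To make the caution about piping checks concrete, the sketch below compares two checks run separately with the same two checks piped together. The variable names and the `min_duration = 100` threshold are illustrative choices, not recommendations.

```r
library(excluder)

# Each check run on the full data set
n_preview <- nrow(check_preview(qualtrics_text))
n_duration <- nrow(check_duration(qualtrics_text, min_duration = 100))

# Piped: the duration check only sees rows that were already previews
n_both <- qualtrics_text %>%
  check_preview() %>%
  check_duration(min_duration = 100) %>%
  nrow()

# The piped count can never exceed either individual count
c(preview = n_preview, duration = n_duration, piped = n_both)
```

The piped result counts only rows that meet both criteria, so it generally understates the number of rows that meet either one.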
### Excluding
The `exclude_*()` functions remove the rows that meet exclusion criteria. These, too, can be piped together. Since the output of each function is a subset of the original data with the excluded rows removed, the order of the functions will influence the reported number of rows meeting the exclusion criteria.
```{r exclude1}
# Exclude short duration then incomplete progress rows
df <- qualtrics_text %>%
  exclude_duration(min_duration = 100) %>%
  exclude_progress()
dim(df)
```
```{r exclude2}
# Exclude incomplete progress then short duration rows
df <- qualtrics_text %>%
  exclude_progress() %>%
  exclude_duration(min_duration = 100)
dim(df)
```
Though the order of functions should not influence the final data set, it can affect processing speed. For large files, it may be faster to remove preview and incomplete progress rows first and to check IP addresses and locations only after the other exclusions have been performed.
```{r exclude3}
# Exclude rows
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_progress() %>%
  exclude_duplicates() %>%
  exclude_duration(min_duration = 100) %>%
  exclude_resolution() %>%
  exclude_ip() %>%
  exclude_location()
```
## Citing this package
To cite `{excluder}`, use:
> Stevens, J. R. (2021). excluder: An R package that checks for exclusion criteria in online data. _Journal of Open Source Software_, 6(67), 3893. https://doi.org/10.21105/joss.03893
## Contributing to this package
[Contributions](https://docs.ropensci.org/excluder/CONTRIBUTING.html) to `{excluder}` are most welcome! Feel free to check out [open issues](https://github.com/ropensci/excluder/issues) for ideas. [Pull requests](https://github.com/ropensci/excluder/pulls) are encouraged, but you may want to [raise an issue](https://github.com/ropensci/excluder/issues/new/choose) or [contact the maintainer](mailto:jeffrey.r.stevens@protonmail.com) first.
Please note that the excluder project is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/). By contributing to this project, you agree to abide by its terms.
## Acknowledgments
I thank [Francine Goh](https://orcid.org/0000-0002-7364-4398) and Billy Lim for comments on an early version of the package, as well as the insightful feedback from [rOpenSci](https://ropensci.org/) editor [Mauro Lepore](https://orcid.org/0000-0002-1986-7988) and reviewers [Joseph O'Brien](https://orcid.org/0000-0001-9851-5077) and [Julia Silge](https://orcid.org/0000-0002-3671-836X). This work was funded by US National Science Foundation grant NSF-1658837.
The border collie and sheep featured in the logo were created by [PrecisionK9Krafts](https://www.etsy.com/shop/PrecisionK9Krafts).
Owner
- Name: rOpenSci
- Login: ropensci
- Kind: organization
- Email: info@ropensci.org
- Location: Berkeley, CA
- Website: https://ropensci.org/
- Twitter: rOpenSci
- Repositories: 307
- Profile: https://github.com/ropensci
JOSS Publication
excluder: An R package that checks for exclusion criteria in online data
Published
November 05, 2021
Volume 6, Issue 67, Page 3893
Authors
Tags
data exclusion Mechanical Turk online survey data Qualtrics
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"@type": "SoftwareSourceCode",
"identifier": "excluder",
"description": "Data that are collected through online sources such as Mechanical Turk may require excluding rows because of IP address duplication, geolocation, or completion duration. This package facilitates exclusion of these data for Qualtrics datasets.",
"name": "excluder: Checks for Exclusion Criteria in Online Data",
"relatedLink": "https://docs.ropensci.org/excluder/",
"codeRepository": "https://github.com/ropensci/excluder/",
"issueTracker": "https://github.com/ropensci/excluder/issues/",
"license": "https://spdx.org/licenses/GPL-3.0",
"version": "0.5.2",
"programmingLanguage": {
"@type": "ComputerLanguage",
"name": "R",
"url": "https://r-project.org"
},
"runtimePlatform": "R version 4.5.0 (2025-04-11)",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"author": [
{
"@type": "Person",
"givenName": "Jeffrey R.",
"familyName": "Stevens",
"email": "jeffrey.r.stevens@protonmail.com",
"@id": "https://orcid.org/0000-0003-2375-1360"
}
],
"copyrightHolder": [
{
"@type": "Person",
"givenName": "Jeffrey R.",
"familyName": "Stevens",
"email": "jeffrey.r.stevens@protonmail.com",
"@id": "https://orcid.org/0000-0003-2375-1360"
}
],
"maintainer": [
{
"@type": "Person",
"givenName": "Jeffrey R.",
"familyName": "Stevens",
"email": "jeffrey.r.stevens@protonmail.com",
"@id": "https://orcid.org/0000-0003-2375-1360"
}
],
"softwareSuggestions": [
{
"@type": "SoftwareApplication",
"identifier": "covr",
"name": "covr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=covr"
},
{
"@type": "SoftwareApplication",
"identifier": "knitr",
"name": "knitr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=knitr"
},
{
"@type": "SoftwareApplication",
"identifier": "lifecycle",
"name": "lifecycle",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=lifecycle"
},
{
"@type": "SoftwareApplication",
"identifier": "readr",
"name": "readr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=readr"
},
{
"@type": "SoftwareApplication",
"identifier": "rmarkdown",
"name": "rmarkdown",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rmarkdown"
},
{
"@type": "SoftwareApplication",
"identifier": "testthat",
"name": "testthat",
"version": ">= 3.0.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=testthat"
},
{
"@type": "SoftwareApplication",
"identifier": "withr",
"name": "withr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=withr"
}
],
"softwareRequirements": {
"1": {
"@type": "SoftwareApplication",
"identifier": "R",
"name": "R",
"version": ">= 3.5.0"
},
"2": {
"@type": "SoftwareApplication",
"identifier": "cli",
"name": "cli",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=cli"
},
"3": {
"@type": "SoftwareApplication",
"identifier": "curl",
"name": "curl",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=curl"
},
"4": {
"@type": "SoftwareApplication",
"identifier": "dplyr",
"name": "dplyr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=dplyr"
},
"5": {
"@type": "SoftwareApplication",
"identifier": "ipaddress",
"name": "ipaddress",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=ipaddress"
},
"6": {
"@type": "SoftwareApplication",
"identifier": "janitor",
"name": "janitor",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=janitor"
},
"7": {
"@type": "SoftwareApplication",
"identifier": "lubridate",
"name": "lubridate",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=lubridate"
},
"8": {
"@type": "SoftwareApplication",
"identifier": "magrittr",
"name": "magrittr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=magrittr"
},
"9": {
"@type": "SoftwareApplication",
"identifier": "maps",
"name": "maps",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=maps"
},
"10": {
"@type": "SoftwareApplication",
"identifier": "rlang",
"name": "rlang",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rlang"
},
"11": {
"@type": "SoftwareApplication",
"identifier": "stringr",
"name": "stringr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=stringr"
},
"12": {
"@type": "SoftwareApplication",
"identifier": "tidyr",
"name": "tidyr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=tidyr"
},
"13": {
"@type": "SoftwareApplication",
"identifier": "tidyselect",
"name": "tidyselect",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=tidyselect"
},
"SystemRequirements": null
},
"fileSize": "362.21KB",
"citation": [
{
"@type": "ScholarlyArticle",
"datePublished": "2021",
"author": [
{
"@type": "Person",
"givenName": [
"Jeffrey",
"R."
],
"familyName": "Stevens"
}
],
"name": "excluder: An R package that checks for exclusion criteria in online data",
"identifier": "10.21105/joss.03893",
"url": "https://doi.org/10.21105/joss.03893",
"pagination": "3893",
"@id": "https://doi.org/10.21105/joss.03893",
"sameAs": "https://doi.org/10.21105/joss.03893",
"isPartOf": {
"@type": "PublicationIssue",
"issueNumber": "67",
"datePublished": "2021",
"isPartOf": {
"@type": [
"PublicationVolume",
"Periodical"
],
"volumeNumber": "6",
"name": "Journal of Open Source Software"
}
}
}
]
}
GitHub Events
Total
- Issues event: 2
- Watch event: 1
- Delete event: 1
- Push event: 10
- Pull request event: 1
- Create event: 2
Last Year
- Issues event: 2
- Watch event: 1
- Delete event: 1
- Push event: 10
- Pull request event: 1
- Create event: 2
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jeffrey R. Stevens | j****s@g****m | 285 |
| Jeffrey Stevens | 5****5 | 2 |
| Romain Francois | r****n@r****m | 1 |
| Mauro Lepore | m****e@g****m | 1 |
Committer Domains (Top 20 + Academic)
rstudio.com: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 11
- Total pull requests: 8
- Average time to close issues: 8 days
- Average time to close pull requests: about 8 hours
- Total issue authors: 2
- Total pull request authors: 3
- Average comments per issue: 0.73
- Average comments per pull request: 0.5
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 1
- Average time to close issues: about 6 hours
- Average time to close pull requests: 17 minutes
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- JeffreyRStevens (10)
- maelle (1)
Pull Request Authors
- JeffreyRStevens (9)
- romainfrancois (1)
- maurolepore (1)
Top Labels
Issue Labels
bug (1)
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - cran: 651 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 6
- Total maintainers: 1
cran.r-project.org: excluder
Checks for Exclusion Criteria in Online Data
- Homepage: https://docs.ropensci.org/excluder/
- Documentation: http://cran.r-project.org/web/packages/excluder/excluder.pdf
- License: GPL (≥ 3)
- Latest release: 0.5.2 (published 9 months ago)
Rankings
Forks count: 11.3%
Stargazers count: 17.9%
Average: 27.1%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Downloads: 40.9%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.5.0 depends
- cli * imports
- curl * imports
- dplyr * imports
- iptools * imports
- janitor * imports
- lubridate * imports
- magrittr * imports
- maps * imports
- rlang * imports
- stringr * imports
- tidyr * imports
- tidyselect * imports
- covr * suggests
- knitr * suggests
- lifecycle * suggests
- readr * suggests
- rmarkdown * suggests
- testthat >= 3.0.0 suggests
- withr * suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/cache v2 composite
- actions/checkout v2 composite
- actions/upload-artifact main composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite
.github/workflows/test-coverage.yaml
actions
- actions/cache v2 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite
