assertr

Assertive programming for R analysis pipelines

https://github.com/tonyfischetti/assertr

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 1 of 22 committers (4.5%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.3%) to scientific vocabulary

Keywords

analysis-pipeline assertion-library assertion-methods assertions peer-reviewed predicate-functions r r-package rstats

Keywords from Contributors

genome reproducibility ropensci http-mock taxize ebird spocc drake makefile biology

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: tonyfischetti
License: other
Language: R
Default Branch: master
Homepage: https://docs.ropensci.org/assertr
Size: 13.9 MB

Statistics

Stars: 480
Watchers: 15
Forks: 34
Open Issues: 14
Releases: 6

Created about 11 years ago · Last pushed almost 2 years ago

Metadata Files

Readme Changelog License Codemeta

assertr

What is it?

The assertr package supplies a suite of functions designed to verify assumptions about data early in an analysis pipeline, so that data errors are caught at the source and can be addressed quickly.

This package does not need to be used with the magrittr/dplyr piping mechanism, but the examples in this README use it for clarity.

Installation

You can install the latest version from CRAN like this: `install.packages("assertr")`

or you can install the bleeding-edge development version like this: `install.packages("devtools")` followed by `devtools::install_github("ropensci/assertr")`

What does it look like?

This package offers five assertion functions, assert, verify, insist, assert_rows, and insist_rows, that are designed to be used shortly after data-loading in an analysis pipeline...

Let's say, for example, that R's built-in car dataset, mtcars, was not built-in but rather procured from an external source that was known for making errors in data entry or coding. Pretend we wanted to find the average miles per gallon for each number of engine cylinders. We might first want to confirm

- that it has the columns "mpg", "vs", "am", and "wt"
- that the dataset contains more than 10 observations
- that the column for 'miles per gallon' (mpg) is a positive number
- that the column for 'miles per gallon' (mpg) does not contain a datum that is outside 4 standard deviations from its mean
- that the am and vs columns (automatic/manual and v/straight engine, respectively) contain 0s and 1s only
- that each row contains at most 2 NAs
- that each row is unique jointly between the "mpg", "am", and "wt" columns
- that each row's Mahalanobis distance is within 10 median absolute deviations of all the distances (for outlier detection)

This could be written (in order) using assertr like this:

```r
library(dplyr)
library(assertr)

mtcars %>%
  verify(has_all_names("mpg", "vs", "am", "wt")) %>%
  verify(nrow(.) > 10) %>%
  verify(mpg > 0) %>%
  insist(within_n_sds(4), mpg) %>%
  assert(in_set(0,1), am, vs) %>%
  assert_rows(num_row_NAs, within_bounds(0,2), everything()) %>%
  assert_rows(col_concat, is_uniq, mpg, am, wt) %>%
  insist_rows(maha_dist, within_n_mads(10), everything()) %>%
  group_by(cyl) %>%
  summarise(avg.mpg=mean(mpg))

```

If any of these assertions were violated, an error would have been raised and the pipeline would have been terminated early.
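To observe this early termination without halting a whole script, a failing assertion can be wrapped in `tryCatch`. The following is a minimal sketch, assuming assertr is installed; the condition `mpg > 100` is a deliberately impossible check chosen for illustration and is not part of the original example.

```r
library(assertr)

# A passing check returns the data, so later pipeline steps can use it.
passed <- verify(mtcars, mpg > 0)

# A failing check raises an ordinary R error. tryCatch intercepts it here
# so the script keeps running; mpg > 100 is an illustrative condition only.
outcome <- tryCatch(
  {
    verify(mtcars, mpg > 100)
    "pipeline continued"
  },
  error = function(e) "pipeline stopped"
)

print(outcome)
```

Because the error is a regular R condition, the usual handlers (`tryCatch`, `try`) apply to it like any other failure.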

Let's see what the error message looks like when you chain a bunch of failing assertions together.

```r
> mtcars %>%
+   chain_start %>%
+   assert(in_set(1, 2, 3, 4), carb) %>%
+   assert_rows(rowMeans, within_bounds(0,5), gear:carb) %>%
+   verify(nrow(.)==10) %>%
+   verify(mpg < 32) %>%
+   chain_end
There are 7 errors across 4 verbs:
-
         verb redux_fn           predicate     column index value
1      assert     <NA>  in_set(1, 2, 3, 4)       carb    30   6.0
2      assert     <NA>  in_set(1, 2, 3, 4)       carb    31   8.0
3 assert_rows rowMeans within_bounds(0, 5) ~gear:carb    30   5.5
4 assert_rows rowMeans within_bounds(0, 5) ~gear:carb    31   6.5
5      verify     <NA>       nrow(.) == 10       <NA>     1    NA
6      verify     <NA>            mpg < 32       <NA>    18    NA
7      verify     <NA>            mpg < 32       <NA>    20    NA

Error: assertr stopped execution

```

What does `assertr` give me?

verify - takes a data frame (its first argument is provided by the %>% operator above) and a logical (boolean) expression. verify then evaluates that expression in the scope of the provided data frame. If any of the logical values in the expression's result are FALSE, verify raises an error that terminates any further processing of the pipeline.
assert - takes a data frame, a predicate function, and an arbitrary number of columns to apply the predicate function to. The predicate function (a function that returns a logical/boolean value) is then applied to every element of the selected columns, and assert raises an error if it finds any violations. Internally, the assert function uses dplyr's select function to extract the columns to test the predicate function on.
insist - takes a data frame, a predicate-generating function, and an arbitrary number of columns. For each column, the predicate-generating function is applied, returning a predicate. The predicate is then applied to every element of the selected columns, and insist raises an error if it finds any violations. The reason for using a predicate-generating function to return a predicate to use against each value in each of the selected columns is so that, for example, bounds can be dynamically generated based on what the data look like; this is the only way to, say, create bounds that check if each datum is within x z-scores, since the standard deviation isn't known a priori. Internally, the insist function uses dplyr's select function to extract the columns to test the predicate function on.
assert_rows - takes a data frame, a row reduction function, a predicate function, and an arbitrary number of columns to apply the predicate function to. The row reduction function is applied to the data frame and returns a value for each row. The predicate function is then applied to every element of the vector returned from the row reduction function, and assert_rows raises an error if it finds any violations. This functionality is useful, for example, in conjunction with the num_row_NAs() function to ensure that the number of missing values in each row stays below a given limit. Internally, the assert_rows function uses dplyr's select function to extract the columns to test the predicate function on.
insist_rows - takes a data frame, a row reduction function, a predicate-generating function, and an arbitrary number of columns to apply the predicate function to. The row reduction function is applied to the data frame and returns a value for each row. The predicate-generating function is then applied to the vector returned from the row reduction function, and the resultant predicate is applied to each element of that vector. insist_rows raises an error if it finds any violations. This functionality is useful, for example, in conjunction with the maha_dist() function to ensure that there are no flagrant outliers. Internally, the insist_rows function uses dplyr's select function to extract the columns to test the predicate function on.
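The difference between a predicate (as used by assert) and a predicate-generating function (as used by insist) can be sketched in base R. This is an illustration of the idea only, not assertr's implementation; `make_within_n_sds` is a hypothetical helper named for this example, and the choice of 2 standard deviations is arbitrary.

```r
# Hypothetical predicate generator: given a column, it returns a predicate
# whose bounds are computed from that column's own mean and standard deviation.
make_within_n_sds <- function(n) {
  function(column) {
    centre <- mean(column, na.rm = TRUE)
    spread <- sd(column, na.rm = TRUE)
    # The returned predicate is TRUE for values inside centre +/- n * spread.
    function(x) abs(x - centre) <= n * spread
  }
}

generator <- make_within_n_sds(2)     # only n is fixed in advance
predicate <- generator(mtcars$mpg)    # the bounds are derived from the data here
violations <- mtcars$mpg[!predicate(mtcars$mpg)]

print(violations)
```

The two-stage closure is what lets the bounds depend on the data: the generator is written before the column is known, and the predicate only exists once the column has been seen.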

assertr also offers four (so far) predicate functions designed to be used with the assert and assert_rows functions:

not_na - checks if an element is not NA
within_bounds - returns a predicate function that checks if a numeric value falls within the bounds supplied
in_set - returns a predicate function that checks if an element is a member of the set supplied (also allows inverse for "not in set")
is_uniq - checks to see if each element appears only once
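These predicates can also be called directly, which is a convenient way to see what each one returns before using it inside assert. A small sketch, assuming assertr is installed; the input values are made up for illustration.

```r
library(assertr)

# not_na is itself a predicate: it is applied straight to a value.
na_check <- c(not_na(5), not_na(NA))

# within_bounds and in_set are called first to build a predicate,
# and the resulting predicate is then applied to a value.
between_0_and_10 <- within_bounds(0, 10)
bounds_check <- c(between_0_and_10(5), between_0_and_10(11))

zero_or_one <- in_set(0, 1)
set_check <- c(zero_or_one(1), zero_or_one(2))

# is_uniq looks at a whole vector and flags every element that repeats.
uniq_check <- is_uniq(c("a", "b", "a"))

print(list(na_check, bounds_check, set_check, uniq_check))
```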

and predicate generators designed to be used with the insist and insist_rows functions:

within_n_sds - used to dynamically create bounds against which to check vector elements, based on standard z-scores
within_n_mads - a more robust method for dynamically creating bounds against which to check vector elements, based on 'robust' z-scores (using median absolute deviation)
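Why the MAD-based bounds are described as more robust can be seen with a small base R comparison. This is a sketch of the underlying statistics only; the sample values and the multiplier of 3 are made up for illustration, and assertr's own generators may differ in detail.

```r
# Made-up sample: five ordinary readings and one obvious outlier.
x <- c(10, 11, 12, 13, 14, 100)

# Classical z-score bounds: the outlier inflates both the mean and the
# standard deviation, so it can hide inside its own bounds.
within_sds <- abs(x - mean(x)) <= 3 * sd(x)

# Robust bounds: the median and the median absolute deviation (stats::mad)
# barely move when a single extreme value is present.
within_mads <- abs(x - median(x)) <= 3 * mad(x)

print(data.frame(x, within_sds, within_mads))
```

In this sample the standard-deviation bounds accept every value, including 100, while the MAD bounds reject only the outlier.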

and the following row reduction functions designed to be used with assert_rows and insist_rows:

num_row_NAs - counts number of missing values in each row
maha_dist - computes the Mahalanobis distance of each row (for outlier detection). It will coerce categorical variables into numerics if it needs to.
col_concat - concatenates the values in each row into a single string
duplicated_across_cols - checks if a row contains a duplicated value across columns
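A row reduction function is any function that maps a data frame to one value per row; the chained example above uses base R's `rowMeans` this way. The sketch below shows the idea in base R with hypothetical helpers (`count_row_nas`, `concat_row`) and made-up data. These are not assertr's implementations, and `stats::mahalanobis` returns squared distances, which may differ from what maha_dist reports.

```r
# Small illustrative data frame (made-up values).
df <- data.frame(
  a = c(1, NA, NA),
  b = c("x", "y", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of missing values in each row.
count_row_nas <- function(data) rowSums(is.na(data))

# Hypothetical helper: paste each row's values into a single string.
concat_row <- function(data) do.call(paste0, unname(as.list(data)))

na_counts <- count_row_nas(df)
row_keys <- concat_row(mtcars[, c("mpg", "am", "wt")])

# Squared Mahalanobis distance of each row from the column means.
numeric_cols <- mtcars[, c("mpg", "wt")]
sq_dist <- mahalanobis(numeric_cols, colMeans(numeric_cols), cov(numeric_cols))

print(na_counts)
print(head(row_keys, 3))
print(head(sq_dist, 3))
```

Each result has exactly one entry per row, which is what lets a predicate such as within_bounds or is_uniq be applied to it afterwards.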

and, finally, some other utilities for use with verify:

has_all_names - checks if the data frame or list has all supplied names
has_only_names - checks that a data frame or list has only the names requested
has_class - checks if passed data has a particular class
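These utilities are meant to be evaluated inside verify, where they can see the data being checked. A minimal sketch, assuming assertr is installed and that has_class takes the column names followed by a `class` argument; the `scores` data frame and the `grade` column name are made up for illustration.

```r
library(assertr)

# Made-up data frame used only for illustration.
scores <- data.frame(id = 1:3, score = c(0.5, 0.7, 0.9))

# Each call returns the data when the check passes, so calls can be stacked.
checked <- verify(scores, has_all_names("id", "score"))
checked <- verify(checked, has_only_names("id", "score"))
checked <- verify(checked, has_class("score", class = "numeric"))

# Asking for a column that does not exist makes verify raise an error.
outcome <- tryCatch(
  {
    verify(scores, has_all_names("id", "grade"))
    "all names present"
  },
  error = function(e) "missing column detected"
)

print(outcome)
```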

More info

For more info, check out the assertr vignette by running `vignette("assertr")` in R, or read it online.

Owner

Name: Tony Fischetti
Login: tonyfischetti
Kind: user
Location: NYC
Company: The New York Public Library

Website: onthelambda.com
Twitter: tonyfischetti
Repositories: 76
Profile: https://github.com/tonyfischetti

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "identifier": "assertr",
  "description": "Provides functionality to assert conditions that have to be met so that errors in data used in analysis pipelines can fail quickly. Similar to 'stopifnot()' but more powerful, friendly, and easier for use in pipelines.",
  "name": "assertr: Assertive Programming for R Analysis Pipelines",
  "relatedLink": [
    "https://docs.ropensci.org/assertr/",
    "https://CRAN.R-project.org/package=assertr"
  ],
  "codeRepository": "https://github.com/ropensci/assertr",
  "issueTracker": "https://github.com/ropensci/assertr/issues",
  "license": "https://spdx.org/licenses/MIT",
  "version": "3.0.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R version 4.1.1 (2021-08-10)",
  "provider": {
    "@id": "https://cran.r-project.org",
    "@type": "Organization",
    "name": "Comprehensive R Archive Network (CRAN)",
    "url": "https://cran.r-project.org"
  },
  "author": [
    {
      "@type": "Person",
      "givenName": "Tony",
      "familyName": "Fischetti",
      "email": "tony.fischetti@gmail.com"
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Tony",
      "familyName": "Fischetti",
      "email": "tony.fischetti@gmail.com"
    }
  ],
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "knitr",
      "name": "knitr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=knitr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "testthat",
      "name": "testthat",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=testthat"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "magrittr",
      "name": "magrittr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=magrittr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rmarkdown",
      "name": "rmarkdown",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rmarkdown"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "tibble",
      "name": "tibble",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=tibble"
    }
  ],
  "softwareRequirements": {
    "1": {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": ">= 3.1.0"
    },
    "2": {
      "@type": "SoftwareApplication",
      "identifier": "dplyr",
      "name": "dplyr",
      "version": ">= 0.7.0",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=dplyr"
    },
    "3": {
      "@type": "SoftwareApplication",
      "identifier": "MASS",
      "name": "MASS",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=MASS"
    },
    "4": {
      "@type": "SoftwareApplication",
      "identifier": "methods",
      "name": "methods"
    },
    "5": {
      "@type": "SoftwareApplication",
      "identifier": "stats",
      "name": "stats"
    },
    "6": {
      "@type": "SoftwareApplication",
      "identifier": "utils",
      "name": "utils"
    },
    "7": {
      "@type": "SoftwareApplication",
      "identifier": "rlang",
      "name": "rlang",
      "version": ">= 0.3.0",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rlang"
    },
    "SystemRequirements": null
  },
  "fileSize": "301.95KB",
  "releaseNotes": "https://github.com/ropensci/assertr/blob/master/NEWS",
  "readme": "https://github.com/ropensci/assertr/blob/master/README.md",
  "contIntegration": "https://travis-ci.org/ropensci/assertr",
  "keywords": [
    "predicate-functions",
    "analysis-pipeline",
    "assertions",
    "assertion-methods",
    "assertion-library",
    "r",
    "rstats",
    "r-package",
    "peer-reviewed"
  ]
}

GitHub Events

Total

Issues event: 1
Watch event: 9
Issue comment event: 4

Last Year

Issues event: 1
Watch event: 9
Issue comment event: 4

Committers

Last synced: 9 months ago

All Time

Total Commits: 216
Total Committers: 22
Avg Commits per committer: 9.818
Development Distribution Score (DDS): 0.338

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Tony Fischetti	t**i@g**m	143
Bernie Gray	b**3@g**m	11
Bill Denney	w**y@h**m	11
Krystian Igras	k**7@g**m	10
karldw	k****w	7
Michael Quinn	m**n@g**m	6
Daniel Possenriede	p**e@a**e	5
Michael Chirico	m**4@g**m	3
Jakub Nowicki	k**a@a**m	2
Maëlle Salmon	m**n@y**e	2
Peter Wicks Stringfield	p**d@g**m	2
Scott Chamberlain	m**s@g**m	2
TMOD	t**l@g**m	2
Angela Lucaci-Timoce	a**e@g**m	2
Alexander Matrunich	a**r@m**m	1
Filipe Filardi	f**i@g**m	1
Grace Li	g**8@b**u	1
Jeroen Ooms	j**s@g**m	1
Joshua Sturm	2****m	1
Karthik Ram	k**m@g**m	1
Alex Axthelm	A**m@c**v	1
Lorenz Walthert	l**t@i**m	1

Committer Domains (Top 20 + Academic)

che.in.gov: 1
berkeley.edu: 1
matrunich.com: 1
appsilon.com: 1
analyse-konzepte.de: 1
google.com: 1
humanpredictions.com: 1

Issues and Pull Requests

Last synced: 7 months ago

All Time

Total issues: 122
Total pull requests: 42
Average time to close issues: 7 months
Average time to close pull requests: about 1 month
Total issue authors: 44
Total pull request authors: 17
Average comments per issue: 2.3
Average comments per pull request: 1.74
Merged pull requests: 37
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

tonyfischetti (19)
wjschne (1)
maelle (1)
heiku-jiqu (1)
sckott (1)
jankatins (1)
AnthonyEbert (1)
3styleJam (1)
ajschumacher (1)

Pull Request Authors

maelle (1)
JoshuaSturm (1)
datalove (1)

Top Labels

Issue Labels

help wanted (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads: unknown

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 3

proxy.golang.org: github.com/tonyfischetti/assertr

Documentation: https://pkg.go.dev/github.com/tonyfischetti/assertr#section-documentation
License: other
Latest release: v3.0.1+incompatible
published about 2 years ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.4%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 7 months ago

Dependencies

DESCRIPTION cran

R >= 3.1.0 depends
MASS * imports
dplyr >= 0.7.0 imports
rlang >= 0.3.0 imports
stats * imports
utils * imports
knitr * suggests
magrittr * suggests
plyr * suggests
rmarkdown * suggests
testthat * suggests
tibble * suggests

.github/workflows/R-CMD-check.yaml actions

actions/checkout v3 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/test-coverage.yaml actions

actions/checkout v3 composite
actions/upload-artifact v3 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite
