validate

Professional data validation for the R environment

https://github.com/data-cleaning/validate

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, wiley.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary

Keywords

data-cleaning r validation

Keywords from Contributors

tidy-data report
Last synced: 6 months ago

Repository

Professional data validation for the R environment

Basic Info
  • Host: GitHub
  • Owner: data-cleaning
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 6.31 MB
Statistics
  • Stars: 425
  • Watchers: 17
  • Forks: 42
  • Open Issues: 48
  • Releases: 0
Topics
data-cleaning r validation
Created almost 12 years ago · Last pushed 8 months ago
Metadata Files
Readme

README.md


Easy data validation for the masses.

The validate R package makes it super-easy to check whether data lives up to the expectations you have based on domain knowledge. It works by allowing you to define data validation rules independently of the code or data set. Next, you can confront a data set, or various versions thereof, with the rules. Results can be summarized, plotted, and so on. Below is a simple example.

```r
library(validate)
check_that(iris, Sepal.Width < 0.5*Sepal.Length) |> summary()
#>   rule items passes fails nNA error warning                       expression
#> 1   V1   150     79    71   0 FALSE   FALSE Sepal.Width < 0.5 * Sepal.Length
```
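The separation of rules from data described above can be sketched with a reusable rule set. This is a minimal illustration, assuming the `validator()` and `confront()` functions exported by validate; the rule names `width_ratio` and `positive_length` and the choice of the `setosa` subset are made up for this example.

```r
library(validate)

# Define the rules once, independently of any particular data set.
# The rule names are illustrative, not part of the package.
rules <- validator(
  width_ratio     = Sepal.Width < 0.5 * Sepal.Length,
  positive_length = Sepal.Length > 0
)

# Confront two versions of the data with the same rule set.
full_result   <- confront(iris, rules)
subset_result <- confront(iris[iris$Species == "setosa", ], rules)

# Each summary has one row per rule, with pass/fail counts.
full_summary   <- summary(full_result)
subset_summary <- summary(subset_result)
print(full_summary)
print(subset_summary)
```

Keeping the rule set in its own object means the same checks can be rerun unchanged whenever a new version of the data arrives.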

With validate, data validation rules are treated as first-class citizens. This means you can import, export, annotate, investigate and manipulate data validation rules in a meaningful way.
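As a rough sketch of what treating rules as objects can look like in practice, the example below annotates, inspects, subsets, exports and re-imports a small rule set. It assumes the `label<-`, `variables()`, `export_yaml()` and `validator(.file = ...)` interfaces of validate; the rule names, label texts and temporary file are illustrative only.

```r
library(validate)

rules <- validator(
  width_ratio     = Sepal.Width < 0.5 * Sepal.Length,
  positive_length = Sepal.Length > 0
)

# Annotate: attach human-readable labels (the texts are made up here).
label(rules) <- c("Sepal width below half the sepal length",
                  "Sepal length is positive")

# Investigate: which variables do the rules refer to?
rule_vars <- variables(rules)

# Inspect the rules and their metadata as a data frame.
rule_table <- as.data.frame(rules)

# Manipulate: select a subset of the rules.
first_rule <- rules[1]

# Export to a YAML file and import the rules again.
rules_file <- tempfile(fileext = ".yaml")
export_yaml(rules, file = rules_file)
reloaded <- validator(.file = rules_file)
```

Storing rules in a file like this lets them be version-controlled and shared separately from the analysis code.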

To get started: see our data validation cookbook.

Citing

Please cite the JSS article

@article{van2021data,
  title   = {Data validation infrastructure for R},
  author  = {van der Loo, Mark PJ and de Jonge, Edwin},
  journal = {Journal of Statistical Software},
  year    = {2021},
  volume  = {97},
  issue   = {10},
  pages   = {1-33},
  doi     = {10.18637/jss.v097.i10},
  url     = {https://www.jstatsoft.org/article/view/v097i10}
}

To cite the theory, please cite our Wiley StatsRef chapter.

@article{loo2020data,
  title   = {Data Validation},
  year    = {2020},
  journal = {Wiley StatsRef: Statistics Reference Online},
  author  = {M.P.J. van der Loo and E. de Jonge},
  pages   = {1--7},
  doi     = {https://doi.org/10.1002/9781118445112.stat08255},
  url     = {https://onlinelibrary.wiley.com/doi/10.1002/9781118445112.stat08255}
}

Other Resources

Installation

The latest release can be installed from the R command line:

    install.packages("validate")

The development version can be installed as follows:

    git clone https://github.com/data-cleaning/validate
    cd validate
    make install

Note that the development version likely contains bugs (please report them!) and interfaces that may not be stable.

Owner

  • Name: Data cleaning for statistical purpose
  • Login: data-cleaning
  • Kind: organization

Software for cleaning data

GitHub Events

Total
  • Issues event: 9
  • Watch event: 21
  • Issue comment event: 10
  • Push event: 5
  • Pull request event: 4
  • Fork event: 3
Last Year
  • Issues event: 9
  • Watch event: 21
  • Issue comment event: 10
  • Push event: 5
  • Pull request event: 4
  • Fork event: 3

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 731
  • Total Committers: 13
  • Avg Commits per committer: 56.231
  • Development Distribution Score (DDS): 0.126
Past Year
  • Commits: 21
  • Committers: 2
  • Avg Commits per committer: 10.5
  • Development Distribution Score (DDS): 0.476
Top Committers
Name Email Commits
Mark van der Loo m****o@g****m 639
Edwin de Jonge e****e@g****m 71
Mayeul Kauffmann m****k 10
R. Mark Sharp r****p@m****m 2
Daniel Pritchard d****l@p****o 1
Evan Anway e****y@e****m 1
Mutahi Wachira m****a@g****m 1
newtux n****x@g****m 1
Jacqueline Tay j****y@J****l 1
flother f****r 1
Daniel Barnett 1****t 1
Etienne Bacher 5****r 1
Jon Calder j****r@g****m 1
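The All Time committer statistics above can be reproduced from the commit counts in the list. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; this page does not state the definition, but the assumption is consistent with the reported values.

```r
# Commit counts per committer, copied from the Top Committers list.
commits <- c(639, 71, 10, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1)

total_commits  <- sum(commits)
avg_per_author <- total_commits / length(commits)

# Assumed definition: DDS = 1 - (top committer's commits / total commits).
dds <- 1 - max(commits) / total_commits

print(total_commits)
print(round(avg_per_author, 3))
print(round(dds, 3))
```

Under this definition, a score near 0 means a single committer accounts for almost all commits.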

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 112
  • Total pull requests: 18
  • Average time to close issues: 9 months
  • Average time to close pull requests: 2 months
  • Total issue authors: 57
  • Total pull request authors: 15
  • Average comments per issue: 1.7
  • Average comments per pull request: 0.61
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 10
  • Pull requests: 6
  • Average time to close issues: 2 months
  • Average time to close pull requests: 9 days
  • Issue authors: 7
  • Pull request authors: 4
  • Average comments per issue: 0.6
  • Average comments per pull request: 0.67
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • markvanderloo (21)
  • matthiasgomolka (12)
  • edwindj (10)
  • flownt (4)
  • luenhchang (4)
  • grimmjulian (3)
  • ivy-yuan (2)
  • DJJ88 (2)
  • Wytzepakito (2)
  • bzlato (2)
  • dpritchard (2)
  • annennenne (2)
  • martinschmelzer (2)
  • PedroNSilva (1)
  • elikesprogramming (1)
Pull Request Authors
  • kyleGrealis (2)
  • earcanal (2)
  • SvenMeijs (2)
  • phorikx (2)
  • probjects (2)
  • MichaelChirico (2)
  • dpritchard (2)
  • lmeilibr (1)
  • mutahiwachira (1)
  • justjacqueline (1)
  • rmsharp (1)
  • mayeulk (1)
  • erm-eanway (1)
  • etiennebacher (1)
  • daniel-barnett (1)
Top Labels
Issue Labels
  • enhancement (26)
  • question (22)
  • bug (11)
  • wontfix (3)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • cran 2,166 last-month
  • Total docker downloads: 43,401
  • Total dependent packages: 12
    (may contain duplicates)
  • Total dependent repositories: 36
    (may contain duplicates)
  • Total versions: 28
  • Total maintainers: 1
cran.r-project.org: validate

Data Validation Infrastructure

  • Versions: 19
  • Dependent Packages: 12
  • Dependent Repositories: 36
  • Downloads: 2,166 Last month
  • Docker Downloads: 43,401
Rankings
Stargazers count: 0.9%
Forks count: 1.9%
Dependent repos count: 4.3%
Dependent packages count: 5.0%
Average: 7.9%
Downloads: 9.6%
Docker downloads count: 25.7%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: r-validate
  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 18.6%
Forks count: 27.1%
Average: 32.7%
Dependent repos count: 34.0%
Dependent packages count: 51.2%
Last synced: 6 months ago

Dependencies

pkg/DESCRIPTION cran
  • R >= 3.5.0 depends
  • methods * depends
  • graphics * imports
  • grid * imports
  • settings * imports
  • stats * imports
  • yaml * imports
  • bookdown * suggests
  • knitr * suggests
  • lumberjack * suggests
  • rmarkdown * suggests
  • rsdmx * suggests
  • tinytest >= 0.9.6 suggests