validate
Professional data validation for the R environment
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, wiley.com
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.1%) to scientific vocabulary
Repository
Statistics
- Stars: 425
- Watchers: 17
- Forks: 42
- Open Issues: 48
- Releases: 0
README.md
Easy data validation for the masses.
The validate R package makes it easy to check whether data lives up to the expectations you have based on domain knowledge. It works by allowing you to define data validation rules independently of the code or data set. Next, you can confront a data set, or various versions thereof, with the rules. Results can be summarized, plotted, and so on. Below is a simple example.
```r
library(validate)
check_that(iris, Sepal.Width < 0.5*Sepal.Length) |> summary()
  rule items passes fails nNA error warning                       expression
1   V1   150     79    71   0 FALSE   FALSE Sepal.Width < 0.5 * Sepal.Length
```
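Beyond the summary, it is often useful to see which records fail. The sketch below assumes the same rule as in the example, stored in a rule set via `validator()` and applied with `confront()`; the variable names `rules`, `out`, `res` and `bad` are illustrative only.

```r
library(validate)

# Same rule as in the example, stored in a reusable rule set.
rules <- validator(Sepal.Width < 0.5 * Sepal.Length)
out   <- confront(iris, rules)

# One row per record, one column per rule: TRUE (pass), FALSE (fail) or NA.
res <- values(out)
head(res)

# Records that fail at least one rule.
bad <- violating(iris, out)
nrow(bad)
```

Keeping the rule set in its own object means the same rules can be reused on another version of the data without restating them.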
With validate, data validation rules are treated as first-class citizens.
This means you can import, export, annotate, investigate and manipulate data
validation rules in a meaningful way.
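As a rough illustration of what treating rules as objects can look like, the sketch below annotates, inspects, combines, and round-trips a small rule set through a YAML file. The rule names, labels, and the temporary file are assumptions made up for this example, not part of the package.

```r
library(validate)

# Define named rules once, independent of any data set.
rules <- validator(
  width_ratio    = Sepal.Width < 0.5 * Sepal.Length,
  positive_petal = Petal.Length > 0
)

# Annotate: attach human-readable labels (illustrative text).
label(rules) <- c(
  "Sepal width is below half the sepal length",
  "Petal length is positive"
)

# Investigate: which variables do the rules use, and what do they look like?
variables(rules)
as.data.frame(rules)

# Manipulate: select a subset, or combine rule sets.
first_rule <- rules[1]
more_rules <- rules + validator(Petal.Width >= 0)

# Export to YAML and import again (a temporary file is used here).
rule_file <- tempfile(fileext = ".yaml")
export_yaml(rules, file = rule_file)
rules_again <- validator(.file = rule_file)
```

Storing rules in a file like this lets them be versioned and shared separately from the scripts that apply them.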
To get started: see our data validation cookbook.
Citing
Please cite the JSS article
```
@article{van2021data,
  title   = {Data validation infrastructure for R},
  author  = {van der Loo, Mark PJ and de Jonge, Edwin},
  journal = {Journal of Statistical Software},
  year    = {2021},
  volume  = {97},
  issue   = {10},
  pages   = {1--33},
  doi     = {10.18637/jss.v097.i10},
  url     = {https://www.jstatsoft.org/article/view/v097i10}
}
```
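If the package is installed, citation information can also be looked up from within R using base R's `citation()` function. This is a minimal sketch; what it prints depends on the citation metadata shipped with the installed version, so the entry above remains the reference to use.

```r
# Look up citation information for the installed package.
cit <- citation("validate")
print(cit)

# Convert to BibTeX for use in a reference manager.
toBibtex(cit)
```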
To cite the theory, please cite our Wiley StatsRef chapter.
```
@article{loo2020data,
  title   = {Data Validation},
  year    = {2020},
  journal = {Wiley StatsRef: Statistics Reference Online},
  author  = {M.P.J. van der Loo and E. de Jonge},
  pages   = {1--7},
  doi     = {10.1002/9781118445112.stat08255},
  url     = {https://onlinelibrary.wiley.com/doi/10.1002/9781118445112.stat08255}
}
```
Other Resources
- Tutorial material from the tutorial at uRos2024 (Greece)
- Tutorial material from our tutorial at useR!2021
- The Data Validation Cookbook
- Slides of the useR2016 talk (Stanford University, June 28 2016).
- Video of the satRdays talk (Hungarian Academy of Sciences, Sept 3 2016).
- Slides and exercises from the useR2018 tutorial.
- Materials for the uRos2018 workshop (The Hague, 2018)
- Materials for the ENBES|EESW workshop (Bilbao, 2019)
- Materials for the planned workshop at the Institute for Statistical Mathematics (Tokyo, 2020 - cancelled because of the COVID-19 situation)
Installation
The latest release can be installed from the R command line:
```r
install.packages("validate")
```
The development version can be installed as follows.
```bash
git clone https://github.com/data-cleaning/validate
cd validate
make install
```
Note that the development version likely contains bugs (please report them!) and interfaces that may not be stable.
Owner
- Name: Data cleaning for statistical purpose
- Login: data-cleaning
- Kind: organization
- Repositories: 34
- Profile: https://github.com/data-cleaning
Software for cleaning data
GitHub Events
Total
- Issues event: 9
- Watch event: 21
- Issue comment event: 10
- Push event: 5
- Pull request event: 4
- Fork event: 3
Last Year
- Issues event: 9
- Watch event: 21
- Issue comment event: 10
- Push event: 5
- Pull request event: 4
- Fork event: 3
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Mark van der Loo | m****o@g****m | 639 |
| Edwin de Jonge | e****e@g****m | 71 |
| Mayeul Kauffmann | m****k | 10 |
| R. Mark Sharp | r****p@m****m | 2 |
| Daniel Pritchard | d****l@p****o | 1 |
| Evan Anway | e****y@e****m | 1 |
| Mutahi Wachira | m****a@g****m | 1 |
| newtux | n****x@g****m | 1 |
| Jacqueline Tay | j****y@J****l | 1 |
| flother | f****r | 1 |
| Daniel Barnett | 1****t | 1 |
| Etienne Bacher | 5****r | 1 |
| Jon Calder | j****r@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 112
- Total pull requests: 18
- Average time to close issues: 9 months
- Average time to close pull requests: 2 months
- Total issue authors: 57
- Total pull request authors: 15
- Average comments per issue: 1.7
- Average comments per pull request: 0.61
- Merged pull requests: 14
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 10
- Pull requests: 6
- Average time to close issues: 2 months
- Average time to close pull requests: 9 days
- Issue authors: 7
- Pull request authors: 4
- Average comments per issue: 0.6
- Average comments per pull request: 0.67
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- markvanderloo (21)
- matthiasgomolka (12)
- edwindj (10)
- flownt (4)
- luenhchang (4)
- grimmjulian (3)
- ivy-yuan (2)
- DJJ88 (2)
- Wytzepakito (2)
- bzlato (2)
- dpritchard (2)
- annennenne (2)
- martinschmelzer (2)
- PedroNSilva (1)
- elikesprogramming (1)
Pull Request Authors
- kyleGrealis (2)
- earcanal (2)
- SvenMeijs (2)
- phorikx (2)
- probjects (2)
- MichaelChirico (2)
- dpritchard (2)
- lmeilibr (1)
- mutahiwachira (1)
- justjacqueline (1)
- rmsharp (1)
- mayeulk (1)
- erm-eanway (1)
- etiennebacher (1)
- daniel-barnett (1)
Packages
- Total packages: 2
- Total downloads:
  - cran: 2,166 last month
- Total docker downloads: 43,401
- Total dependent packages: 12 (may contain duplicates)
- Total dependent repositories: 36 (may contain duplicates)
- Total versions: 28
- Total maintainers: 1
cran.r-project.org: validate
Data Validation Infrastructure
- Homepage: https://github.com/data-cleaning/validate
- Documentation: http://cran.r-project.org/web/packages/validate/validate.pdf
- License: GPL-3
- Latest release: 1.1.5 (published about 2 years ago)
conda-forge.org: r-validate
- Homepage: https://github.com/data-cleaning/validate
- License: GPL-3.0-only
- Latest release: 1.1.1 (published almost 4 years ago)
Dependencies
- R >= 3.5.0 depends
- methods * depends
- graphics * imports
- grid * imports
- settings * imports
- stats * imports
- yaml * imports
- bookdown * suggests
- knitr * suggests
- lumberjack * suggests
- rmarkdown * suggests
- rsdmx * suggests
- tinytest >= 0.9.6 suggests