errorlocate
Find and replace erroneous fields in data using validation rules
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.6%) to scientific vocabulary
Keywords
data-cleaning
errors
invalidation
r
Repository
Basic Info
- Host: GitHub
- Owner: data-cleaning
- Language: R
- Default Branch: master
- Homepage: http://data-cleaning.github.io/errorlocate/
- Size: 6.99 MB
Statistics
- Stars: 22
- Watchers: 3
- Forks: 3
- Open Issues: 14
- Releases: 0
Topics
data-cleaning
errors
invalidation
r
Created over 10 years ago
· Last pushed over 1 year ago
Metadata Files
Readme
Changelog
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# Error localization
Find errors in data given a set of validation rules.
The `errorlocate` package helps to identify obvious errors in raw datasets.
It works in tandem with the package `validate`.
With `validate` you formulate data validation rules with which the data must comply.
For example:
- "age cannot be negative": `age >= 0`.
- "if a person is married, they must be older than 16 years": `if (married == TRUE) age > 16`.
- "Profit is turnover minus cost": `profit == turnover - cost`.
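The rules listed above could be written down and checked with `validate` roughly as follows. This is a minimal sketch: the data frame `people` and all of its values are made up for illustration and do not come from the package.

```r
library(validate)

# The three example rules, expressed as a validator object
rules <- validator(
  age >= 0,
  if (married == TRUE) age > 16,
  profit == turnover - cost
)

# Hypothetical records, invented for this sketch
people <- data.frame(
  age      = c(35, -2, 14),
  married  = c(TRUE, FALSE, TRUE),
  profit   = c(100, 50, 30),
  turnover = c(300, 200, 100),
  cost     = c(200, 150, 60)
)

# Confront the data with the rules and inspect the outcome per record and rule
cf <- confront(people, rules)
summary(cf)
v <- values(cf)  # logical matrix: one row per record, one column per rule
v
```

Each `FALSE` in `v` marks a rule that a record violates, but not which of the variables in that rule is wrong.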
While `validate` can check whether a record is valid, it does not identify
which variables are responsible for the invalidation. This may seem a simple task,
but it is actually quite tricky: a set of validation rules forms a web
of dependent variables, so changing a value in an invalid record to satisfy rule 1 may cause the record
to violate rule 2.
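To make the dependency between rules concrete, the sketch below repairs a record naively for one rule and then checks it again. The rule names `r1` and `r2`, the record, and the "repair" step are assumptions chosen for illustration; they are not part of `errorlocate`.

```r
library(validate)

# Two rules that share the variable `cost`
rules <- validator(
  r1 = profit == turnover - cost,
  r2 = cost >= 0
)

# Hypothetical record that violates r1 but satisfies r2
record <- data.frame(profit = 750, cost = 125, turnover = 200)
before <- values(confront(record, rules))
before

# Naive repair: recompute cost so that r1 holds
repaired <- record
repaired$cost <- repaired$turnover - repaired$profit  # cost becomes -550

# r1 now holds, but the new cost violates r2
after <- values(confront(repaired, rules))
after
```

Fixing one rule in isolation simply moved the violation to another rule, which is why the rules have to be considered jointly.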
`errorlocate` provides a small framework for record-based error detection and implements the Fellegi-Holt
algorithm. This algorithm assumes that no information is available other than the values of a record
and a set of validation rules. The algorithm minimizes the (weighted) number of values that need
to be adjusted to remove the invalidation.
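The effect of the weights can be sketched with `locate_errors()`, which reports the located errors without changing the data. The record and the weight vector `w` below are assumptions made for illustration; a higher weight expresses more trust in a variable, so the minimization prefers to flag other variables instead.

```r
library(errorlocate)  # validate is attached as a dependency

rules <- validator(
  profit == turnover - cost,
  cost >= 0.6 * turnover,
  turnover >= 0,
  cost >= 0
)

# Hypothetical record that violates the first rule
record <- data.frame(profit = 750, cost = 125, turnover = 200)

# Default: every variable carries the same weight
le_default <- locate_errors(record, rules)
values(le_default)   # logical matrix, TRUE marks a located error

# Assumption for this sketch: profit is trusted far more than cost and turnover
w <- c(profit = 10, cost = 1, turnover = 1)
le_weighted <- locate_errors(record, rules, weight = w)
values(le_weighted)
```

With equal weights, adjusting `profit` alone (to 75) satisfies every rule, so it is the cheapest solution. When `profit` is expensive to change, no single adjustment of `cost` or `turnover` can satisfy all rules, so the cheapest solution flags both of them.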
# Installation
`errorlocate` can be installed from CRAN:
```r
install.packages("errorlocate")
```
Beta versions can be installed with `drat`:
```r
drat::addRepo("data-cleaning")
install.packages("errorlocate")
```
The latest development version of `errorlocate` can be installed from GitHub with `devtools`:
```r
devtools::install_github("data-cleaning/errorlocate")
```
# Usage
```{r}
library(errorlocate)
rules <- validator( profit == turnover - cost
                  , cost >= 0.6 * turnover
                  , turnover >= 0
                  , cost >= 0 # implied by the two rules above
                  )
data <- data.frame(profit=750, cost=125, turnover=200)
data_no_error <- replace_errors(data, rules)
# faulty data was replaced with NA
print(data_no_error)
er <- errors_removed(data_no_error)
print(er)
summary(er)
er$errors
```
Owner
- Name: Data cleaning for statistical purpose
- Login: data-cleaning
- Kind: organization
- Repositories: 34
- Profile: https://github.com/data-cleaning
Software for cleaning data
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Edwin de Jonge | e****e@g****m | 269 |
| Mark van der Loo | m****o@g****m | 1 |
Issues and Pull Requests
Last synced: over 2 years ago
All Time
- Total issues: 41
- Total pull requests: 0
- Average time to close issues: 5 months
- Average time to close pull requests: N/A
- Total issue authors: 4
- Total pull request authors: 0
- Average comments per issue: 1.27
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- edwindj (37)
- markvanderloo (2)
- smartie5 (1)
- nickforr (1)
Pull Request Authors
Top Labels
Issue Labels
enhancement (11)
bug (9)
question (1)
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - cran: 413 last month
- Total docker downloads: 43,390
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 12
- Total maintainers: 1
cran.r-project.org: errorlocate
Locate Errors with Validation Rules
- Homepage: https://github.com/data-cleaning/errorlocate
- Documentation: http://cran.r-project.org/web/packages/errorlocate/errorlocate.pdf
- License: GPL-3
- Latest release: 1.1.2 (published 7 months ago)
Rankings
Stargazers count: 12.6%
Forks count: 17.8%
Average: 25.6%
Dependent packages count: 29.8%
Downloads: 32.3%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- validate * depends
- lpSolveAPI * imports
- methods * imports
- parallel * imports
- covr * suggests
- knitr * suggests
- rmarkdown * suggests
- testthat >= 2.1.0 suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action v4.4.1 composite
- actions/checkout v3 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/cache v2 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite