digittests
digitTests is an R package providing statistical tests for detecting irregular digit patterns. The package is also implemented with a graphical user interface in the Audit module of JASP (www.jasp-stats.org), a free and open-source statistical software program.
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (18.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: koenderks
- License: gpl-3.0
- Language: R
- Default Branch: development
- Homepage: https://koenderks.github.io/digitTests
- Size: 1.53 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
digitTests: Tests for Detecting Irregular Digit Patterns

digitTests is an R package providing statistical tests for detecting irregular digit patterns. Such irregular digit patterns can be an indication of potential data manipulation or fraud. Therefore, the type of tests that the package provides can be useful in (but is not limited to) the field of auditing to assess whether data have potentially been tampered with. However, real data are never perfect, so caution should be used when relying on the statistical decision metrics that the package provides.
The package is also implemented with a graphical user interface in the Audit module of JASP, a free and open-source statistical software program.
Overview
For complete documentation of the digitTests package, download the package manual.
1. Installation
The most recently released version of digitTests can be downloaded from CRAN by running the following command in R:
```r
install.packages('digitTests')
```
Alternatively, you can download the development version from GitHub using:
```r
devtools::install_github('koenderks/digitTests')
```
After installation, the package can be loaded with:
```r
library(digitTests)
```
2. Benchmarks
To validate the statistical results, the automated unit tests of digitTests regularly verify the main output of the package against the following benchmarks:
- benford.analysis (R package version 0.1.5)
- BenfordTests (R package version 1.2.0)
- BeyondBenford (R package version 1.4)
3. Intended usage
Function: extract_digits()
The workhorse of the package is the extract_digits() function. This function takes a vector of numbers and returns the requested digits, either including or excluding zeros (controlled by include.zero).
Full function with default arguments:
```r
extract_digits(x, check = 'first', include.zero = FALSE)
```
Supported options for the check argument:
| check | Returns |
| :----------- | :----------- |
| first | First digit |
| firsttwo | First and second digit |
| before | All digits before the decimal separator (.) |
| after | All digits after the decimal separator (.) |
| lasttwo | Last two digits |
| last | Last digit |
Example:
```r
x <- c(0.00, 0.20, 1.23, 40.00, 54.04)
extract_digits(x, check = 'first', include.zero = FALSE)

# [1] NA  2  1  4  5
```
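Extracting a leading digit boils down to reading off the first significant digit of each number. The base-R sketch below illustrates the idea with a hypothetical helper, `first_digit()`, which is not part of digitTests and is not claimed to match how `extract_digits()` is implemented. It only mimics the `check = 'first', include.zero = FALSE` case shown in the example, returning `NA` for a value of zero.

```r
# Hypothetical helper (not part of digitTests): leading significant digit of each value
first_digit <- function(x) {
  out <- rep(NA_integer_, length(x))
  ok <- is.finite(x) & x != 0           # zeros, NA and Inf get NA here
  # Scientific notation puts the leading non-zero digit first,
  # e.g. 54.04 -> "5.40400000000000e+01"
  sci <- sprintf("%.14e", abs(x[ok]))
  out[ok] <- as.integer(substr(sci, 1, 1))
  out
}

first_digit(c(0.00, 0.20, 1.23, 40.00, 54.04))
# [1] NA  2  1  4  5
```

Formatting in scientific notation sidesteps `floor(log10(x))` arithmetic, which can misplace the digit for values that sit just below a power of ten.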
Functions: distr.test() & distr.btest()
The functions distr.test() and distr.btest() take a vector of numeric values, extract the requested digits, and compare the frequencies of these digits to a reference distribution. The function distr.test() performs a frequentist hypothesis test of the null hypothesis that the digits are distributed according to the reference distribution and produces a p value. The function distr.btest() performs a Bayesian hypothesis test of the same null hypothesis against the alternative hypothesis that the digits are not distributed according to the reference distribution (using the prior parameters specified in alpha) and produces a Bayes factor (Kass & Raftery, 1995). The possible options for the check argument are the same as for extract_digits().
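If the frequentist test is a Pearson chi-squared goodness-of-fit test (the `X-squared` and `df = 8` fields in the package's example output for nine leading digits point that way, but this is an assumption), its core computation can be sketched in base R with `stats::chisq.test()`. The digit counts below are simulated, hypothetical data.

```r
# Benford probabilities for leading digits 1-9
benford_p <- log10(1 + 1 / (1:9))

# Hypothetical data: 500 leading digits drawn from Benford's law itself
set.seed(2022)
digits <- sample(1:9, size = 500, replace = TRUE, prob = benford_p)
counts <- tabulate(digits, nbins = 9)

# Pearson chi-squared goodness-of-fit test against the reference probabilities
res <- chisq.test(counts, p = benford_p)
res$statistic  # X-squared
res$parameter  # df = 9 - 1 = 8
res$p.value
```

Swapping `benford_p` for `rep(1/9, 9)` or any other probability vector gives the uniform and custom reference cases.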
Full function with default arguments:
```r
distr.test(x, check = 'first', reference = 'benford')
distr.btest(x, check = 'first', reference = 'benford', alpha = NULL, BF10 = TRUE, log = FALSE)
```
Supported options for the reference argument:
| reference | Reference distribution |
| :----------- | :----------- |
| benford | Benford's law |
| uniform | Uniform distribution |
| Vector of probabilities | Custom distribution |
Example:
Benford’s law (Benford, 1938) is a principle that describes a pattern in many naturally occurring numbers. According to Benford's law, each possible leading digit d in a naturally occurring, or non-manipulated, set of numbers occurs with probability:

$$p(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d \in \{1, 2, \dots, 9\}.$$
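Under Benford's law, the probability of leading digit d is log10(1 + 1/d). The snippet below computes the nine expected proportions and checks that they sum to one. The terms telescope: log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10) = 1.

```r
# Expected leading-digit probabilities under Benford's law: p(d) = log10(1 + 1/d)
d <- 1:9
benford_p <- log10(1 + 1 / d)
round(benford_p, 3)
# [1] 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046

# The probabilities telescope to log10(10) = 1
sum(benford_p)
```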
The distribution of leading digits in a data set of financial transaction values (e.g., the sinoForest data) can be extracted and tested against the expected frequencies under Benford's law using the code below.
```r
# Frequentist hypothesis test
distr.test(sinoForest$value, check = 'first', reference = 'benford')

#   Digit distribution test
#
# data: sinoForest$value
# n = 772, X-squared = 7.6517, df = 8, p-value = 0.4682
# alternative hypothesis: leading digit(s) are not distributed according to the benford distribution.

# Bayesian hypothesis test using default prior
distr.btest(sinoForest$value, check = 'first', reference = 'benford', BF10 = FALSE)

#   Digit distribution test
#
# data: sinoForest$value
# n = 772, BF01 = 6899678
# alternative hypothesis: leading digit(s) are not distributed according to the benford distribution.
```
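One common way to obtain such a Bayes factor, assumed here rather than taken from the package source, is to compare a multinomial model with fixed reference probabilities p (H0) to one with a Dirichlet(alpha) prior on the digit probabilities (H1). The marginal likelihood ratio then has a closed form in terms of the multivariate beta function. The helper `log_bf10()` and the prior `alpha = 1` for every digit are hypothetical choices for illustration; BF01 is simply 1 / BF10.

```r
# Log of the multivariate beta function: log(prod(gamma(a)) / gamma(sum(a)))
lmbeta <- function(a) sum(lgamma(a)) - lgamma(sum(a))

# Hypothetical helper: log Bayes factor BF10 for digit counts n, where
#   H0: the digit probabilities equal the reference vector p
#   H1: the digit probabilities follow a Dirichlet(alpha) prior
# The multinomial coefficient appears in both marginal likelihoods and cancels.
log_bf10 <- function(n, p, alpha = rep(1, length(p))) {
  lmbeta(alpha + n) - lmbeta(alpha) - sum(n * log(p))
}

benford_p <- log10(1 + 1 / (1:9))
counts_benford_like <- round(1000 * benford_p)  # hypothetical counts close to Benford
counts_all_nines <- c(rep(0, 8), 100)           # hypothetical, clearly non-Benford

exp(-log_bf10(counts_benford_like, benford_p))  # BF01: values above 1 favour H0
log_bf10(counts_all_nines, benford_p)           # large positive log(BF10) favours H1
```

Working on the log scale avoids overflow, since Bayes factors grow quickly with the sample size.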
Function: rv.test()
The function rv.test() analyzes the frequency with which values get repeated within a set of numbers. Unlike Benford's law and its generalizations, this approach examines the entire number at once, not only the first or last digit. For the technical details of this procedure, see Simonsohn (2019). The possible options for the check argument are the same as for extract_digits().
Full function with default arguments:
```r
rv.test(x, check = 'last', method = 'af', B = 2000)
```
Supported options for the method argument:
| method | Test statistic |
| :----------- | :----------- |
| af | Average frequency |
| entropy | Entropy |
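The two statistics in the table can be sketched as follows, assuming the average frequency is the mean, over all observations, of how often each observation's value occurs (one reading of Simonsohn, 2019), and the entropy is the Shannon entropy of the observed values with the natural logarithm. The helpers `average_frequency()` and `value_entropy()` are hypothetical and are not the package's internals.

```r
# Hypothetical helper: average frequency (AF) of values.
# For each observation, count how often its value occurs, then average over
# all observations; this equals sum(f^2) / n for value frequencies f.
average_frequency <- function(x) {
  f <- table(x)        # frequency of each distinct value
  sum(f^2) / sum(f)
}

# Hypothetical helper: Shannon entropy (natural log) of the observed values;
# lower entropy means values are more concentrated, i.e. more repetition.
value_entropy <- function(x) {
  p <- table(x) / length(x)
  -sum(p * log(p))
}

x <- c(10, 10, 10, 20, 30)
average_frequency(x)  # (3^2 + 1^2 + 1^2) / 5 = 2.2
value_entropy(x)
```

More repetition raises the average frequency and lowers the entropy, so the two statistics point in opposite directions.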
Example:
In this example, we analyze a data set from a (retracted) paper that describes three experiments run in Chinese factories, where workers were nudged to use more hand sanitizer. These data were shown to exhibit two classic markers of data tampering: impossibly similar means and an uneven distribution of last digits (Yu, Nelson, & Simonsohn, 2018). We can use the rv.test() function to test whether these data also contain more repeated values than would be expected if the data had not been tampered with.
```r
rv.test(sanitizer$value, check = 'lasttwo', B = 5000)

#   Repeated values test
#
# data: sanitizer$value
# n = 1600, AF = 1.5225, p-value = 0.0024
# alternative hypothesis: frequencies of repeated values are greater than for random data.
```
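A p value for the average frequency can be obtained by Monte Carlo simulation: compare the observed statistic with its distribution over data sets generated under a null model. In the hypothetical sketch below, the null is built by shuffling the last digits of integer values across observations. Whether rv.test() constructs its reference distribution this way is an assumption; Simonsohn (2019) describes the procedure the package is based on. The function `mc_repeated_values()` is illustrative only.

```r
# Average frequency: mean number of times each observation's value occurs
average_frequency <- function(x) {
  f <- table(x)
  sum(f^2) / sum(f)
}

# Hypothetical helper (not the rv.test() implementation): Monte Carlo test of
# whether integer data contain more repeated values than expected when the
# last digits are shuffled across observations.
mc_repeated_values <- function(x, B = 2000) {
  x <- round(x)                          # the sketch assumes integer-valued data
  last <- x %% 10                        # last digit of each value
  stem <- x - last                       # value with its last digit set to zero
  observed <- average_frequency(x)
  simulated <- replicate(B, {
    shuffled <- last[sample.int(length(last))]  # permute the last digits
    average_frequency(stem + shuffled)
  })
  # One-sided p value: share of simulated data sets at least as bunched,
  # counting the observed data as one of them
  p_value <- (1 + sum(simulated >= observed)) / (B + 1)
  list(AF = observed, p.value = p_value)
}

set.seed(1)
x <- sample(100:999, size = 200, replace = TRUE)  # hypothetical data
mc_repeated_values(x, B = 500)$p.value
```

Indexing with `sample.int()` avoids the `sample()` pitfall where a single numeric value is treated as the range `1:x`.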
4. References
- Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, 551-572.
- Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773-795.
- Simonsohn, U. (2019, May 25). Number-Bunching: A New Tool for Forensic Data Analysis.
- Yu, F., Nelson, L., & Simonsohn, U. (2018, December 5). In Press at Psychological Science: A New 'Nudge' Supported by Implausible Data.
Owner
- Name: Koen Derks
- Login: koenderks
- Kind: user
- Location: Amsterdam
- Company: Nyenrode Business University
- Website: https://koenderks.com
- Twitter: koenderks
- Repositories: 9
- Profile: https://github.com/koenderks
Assistant Professor at Nyenrode Business University & Developer at JASP (www.jasp-stats.org), free and open-source statistical software.
Citation (CITATION.cff)
# -----------------------------------------------------------
# CITATION file created with {cffr} R package, v0.2.2
# See also: https://docs.ropensci.org/cffr/
# -----------------------------------------------------------
cff-version: 1.2.0
message: 'To cite package "digitTests" in publications use:'
type: software
license: GPL-3.0-or-later
title: 'digitTests: Tests for Detecting Irregular Digit Patterns'
version: 0.1.2
abstract: Provides statistical tests and support functions for detecting irregular
  digit patterns in numerical data. The package includes tools for extracting digits
  at various locations in a number, tests for repeated values, and (Bayesian) tests
  of digit distributions.
authors:
- family-names: Derks
  given-names: Koen
  email: k.derks@nyenrode.nl
  orcid: https://orcid.org/0000-0002-5533-9349
preferred-citation:
  type: manual
  title: 'digitTests: Tests for Detecting Irregular Data Patterns'
  authors:
  - family-names: Derks
    given-names: Koen
    email: k.derks@nyenrode.nl
    orcid: https://orcid.org/0000-0002-5533-9349
  year: '2022'
  notes: R package version 0.1.2
  url: https://CRAN.R-project.org/package=digitTests
repository: https://CRAN.R-project.org/package=digitTests
repository-code: https://github.com/koenderks/digitTests
url: https://koenderks.github.io/digitTests/
date-released: '2022-06-16'
contact:
- family-names: Derks
  given-names: Koen
  email: k.derks@nyenrode.nl
  orcid: https://orcid.org/0000-0002-5533-9349
keywords:
- digit-analysis
- digits
- r
references:
- type: software
  title: graphics
  abstract: 'R: A Language and Environment for Statistical Computing'
  notes: Imports
  authors:
  - name: R Core Team
  location:
    name: Vienna, Austria
  year: '2022'
  url: https://www.R-project.org/
  institution:
    name: R Foundation for Statistical Computing
- type: software
  title: stats
  abstract: 'R: A Language and Environment for Statistical Computing'
  notes: Imports
  authors:
  - name: R Core Team
  location:
    name: Vienna, Austria
  year: '2022'
  url: https://www.R-project.org/
  institution:
    name: R Foundation for Statistical Computing
- type: software
  title: benford.analysis
  abstract: 'benford.analysis: Benford Analysis for Data Validation and Forensic Analytics'
  notes: Suggests
  authors:
  - family-names: Cinelli
    given-names: Carlos
  year: '2022'
  url: https://CRAN.R-project.org/package=benford.analysis
- type: software
  title: BenfordTests
  abstract: 'BenfordTests: Statistical Tests for Evaluating Conformity to Benford''s
    Law'
  notes: Suggests
  authors:
  - family-names: Joenssen
    given-names: Dieter William
    email: Dieter.Joenssen@googlemail.com
  year: '2022'
  url: https://CRAN.R-project.org/package=BenfordTests
- type: software
  title: BeyondBenford
  abstract: 'BeyondBenford: Compare the Goodness of Fit of Benford''s and Blondeau
    Da Silva''s Digit Distributions to a Given Dataset'
  notes: Suggests
  authors:
  - family-names: Stephane
    given-names: Blondeau Da Silva
  year: '2022'
  url: https://CRAN.R-project.org/package=BeyondBenford
- type: software
  title: knitr
  abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R'
  notes: Suggests
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  year: '2022'
  url: https://CRAN.R-project.org/package=knitr
- type: software
  title: rmarkdown
  abstract: 'rmarkdown: Dynamic Documents for R'
  notes: Suggests
  authors:
  - family-names: Allaire
    given-names: JJ
    email: jj@rstudio.com
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: McPherson
    given-names: Jonathan
    email: jonathan@rstudio.com
  - family-names: Luraschi
    given-names: Javier
    email: javier@rstudio.com
  - family-names: Ushey
    given-names: Kevin
    email: kevin@rstudio.com
  - family-names: Atkins
    given-names: Aron
    email: aron@rstudio.com
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  - family-names: Cheng
    given-names: Joe
    email: joe@rstudio.com
  - family-names: Chang
    given-names: Winston
    email: winston@rstudio.com
  - family-names: Iannone
    given-names: Richard
    email: rich@rstudio.com
    orcid: https://orcid.org/0000-0003-3925-190X
  year: '2022'
  url: https://CRAN.R-project.org/package=rmarkdown
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2022'
  url: https://CRAN.R-project.org/package=testthat
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| github-actions[bot] | 4****] | 162 |
| Koen Derks | k****s@h****m | 45 |
| koenderks | k****s | 2 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 1
- Total downloads:
  - cran: 289 last month
- Total dependent packages: 0
- Total dependent repositories: 3
- Total versions: 3
- Total maintainers: 1
cran.r-project.org: digitTests
Tests for Detecting Irregular Digit Patterns
- Homepage: https://koenderks.github.io/digitTests/
- Documentation: http://cran.r-project.org/web/packages/digitTests/digitTests.pdf
- License: GPL (≥ 3)
- Latest release: 0.1.2 (published over 3 years ago)
Maintainers (1)
Dependencies
- graphics * imports
- stats * imports
- BenfordTests * suggests
- BeyondBenford * suggests
- benford.analysis * suggests
- knitr * suggests
- rmarkdown * suggests
- testthat * suggests