digittests

digitTests is an R package providing statistical tests for detecting irregular digit patterns. The package is also implemented with a graphical user interface in the Audit module of JASP (www.jasp-stats.org), a free and open-source statistical software program.

https://github.com/koenderks/digittests

Keywords

digit-analysis digits r rstats statistics

Keywords from Contributors

interpretability standardization animal hack autograder generative-art report

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: koenderks
License: gpl-3.0
Language: R
Default Branch: development
Homepage: https://koenderks.github.io/digitTests
Size: 1.53 MB

Statistics

Stars: 2
Watchers: 2
Forks: 2
Open Issues: 0
Releases: 0

Archived

Topics

digit-analysis digits r rstats statistics

Created over 4 years ago · Last pushed over 3 years ago

Metadata Files

Readme License Code of conduct Citation

README.md

digitTests: Tests for Detecting Irregular Digit Patterns

digitTests is an R package providing statistical tests for detecting irregular digit patterns. Such irregular digit patterns can be an indication of potential data manipulation or fraud. Therefore, the type of tests that the package provides can be useful in (but not limited to) the field of auditing to assess whether data have potentially been tampered with. However, please note that real data will never be perfect, and therefore caution should be used when relying on the statistical decision metrics that the package provides.

The package is also implemented with a graphical user interface in the Audit module of JASP, a free and open-source statistical software program.

Overview

For complete documentation of the digitTests package, download the package manual.

Installation
Benchmarks
Intended usage
References

1. Installation

The most recently released version of digitTests can be downloaded from CRAN by running the following command in R:

```r
install.packages('digitTests')
```

Alternatively, you can download the development version from GitHub using:

```r
devtools::install_github('koenderks/digitTests')
```

After installation, the package can be loaded with:

```r
library(digitTests)
```

2. Benchmarks

To validate the statistical results, digitTests's automated unit tests regularly verify the main output from the package against the following benchmarks:

benford.analysis (R package version 0.1.5)
BenfordTests (R package version 1.2.0)
BeyondBenford (R package version 1.4)

3. Intended usage

Function: `extract_digits()`

The workhorse of the package is the extract_digits() function. It takes a vector of numbers and returns the requested digits, either including or excluding zeros.

Full function with default arguments:

```r
extract_digits(x, check = 'first', include.zero = FALSE)
```

Supported options for the check argument:

Example:

```r
x <- c(0.00, 0.20, 1.23, 40.00, 54.04)
extract_digits(x, check = 'first', include.zero = FALSE)
# [1] NA  2  1  4  5
```
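To make the idea of leading-digit extraction concrete, here is a minimal base R sketch. The helper `first_digit_sketch()` is hypothetical and is not the package's implementation; it assumes that zeros and non-finite values should map to `NA`, which matches the example output above for `include.zero = FALSE`.

```r
# Hypothetical helper (not part of digitTests): extract the first
# significant digit of each number via scientific notation.
first_digit_sketch <- function(x) {
  stopifnot(is.numeric(x))
  out <- rep(NA_integer_, length(x))
  # Zeros, NA, NaN and Inf have no leading digit and stay NA.
  ok <- is.finite(x) & x != 0
  if (any(ok)) {
    # "5.4040000000e+01" -> the first character is the leading digit.
    sci <- formatC(abs(x[ok]), format = "e", digits = 10)
    out[ok] <- as.integer(substr(sci, 1, 1))
  }
  out
}

x <- c(0.00, 0.20, 1.23, 40.00, 54.04)
first_digit_sketch(x)
# [1] NA  2  1  4  5
```

Going through a formatted string rather than `floor(x / 10^floor(log10(x)))` sidesteps floating-point surprises such as `0.3 / 0.1` evaluating to slightly less than 3.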

Functions: `distr.test()` & `distr.btest()`

The functions distr.test() and distr.btest() take a vector of numeric values, extract the requested digits, and compare the frequencies of these digits to a reference distribution. The function distr.test() performs a frequentist hypothesis test of the null hypothesis that the digits are distributed according to the reference distribution and produces a p value. The function distr.btest() performs a Bayesian hypothesis test of the same null hypothesis against the alternative hypothesis that the digits are not distributed according to the reference distribution (using the prior parameters specified in alpha) and produces a Bayes factor (Kass & Raftery, 1995). The possible options for the check argument are the same as for extract_digits().

Full function with default arguments:

```r
distr.test(x, check = 'first', reference = 'benford')
distr.btest(x, check = 'first', reference = 'benford', alpha = NULL, BF10 = TRUE, log = FALSE)
```

Supported options for the reference argument:

Example:

Benford's law (Benford, 1938) is a principle that describes a pattern in many naturally occurring numbers. According to Benford's law, each possible leading digit $d_i$ in a naturally occurring, or non-manipulated, set of numbers occurs with probability:

$p(d_i) = \log_{10}\left(1 + \frac{1}{d_i}\right)$
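As a quick numerical illustration, the Benford probabilities $\log_{10}(1 + 1/d)$ for the leading digits 1 through 9 can be tabulated in base R. This short sketch does not use the package; the variable names are chosen for illustration only.

```r
# Expected leading-digit probabilities under Benford's law for d = 1, ..., 9.
d <- 1:9
p <- log10(1 + 1 / d)
names(p) <- d

round(p, 3)
#     1     2     3     4     5     6     7     8     9
# 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046

# The terms telescope to log10(10), so the probabilities sum to 1.
sum(p)
```

A leading 1 is therefore expected in roughly 30% of the values, while a leading 9 is expected in fewer than 5%.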

The distribution of leading digits in a data set of financial transaction values (e.g., the sinoForest data) can be extracted and tested against the expected frequencies under Benford's law using the code below.

```r
# Frequentist hypothesis test
distr.test(sinoForest$value, check = 'first', reference = 'benford')

#
#  Digit distribution test
#
# data:  sinoForest$value
# n = 772, X-squared = 7.6517, df = 8, p-value = 0.4682
# alternative hypothesis: leading digit(s) are not distributed according to the benford distribution.

# Bayesian hypothesis test using default prior
distr.btest(sinoForest$value, check = 'first', reference = 'benford', BF10 = FALSE)

#
#  Digit distribution test
#
# data:  sinoForest$value
# n = 772, BF01 = 6899678
# alternative hypothesis: leading digit(s) are not distributed according to the benford distribution.
```
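The `X-squared` statistic with `df = 8` is consistent with a chi-square goodness-of-fit test on the nine leading-digit counts, and a Bayes factor for a point null against a Dirichlet alternative can be computed in closed form. The base R sketch below illustrates both on simulated digits. It is not the package's code: the simulated data, the helper `log_mbeta()`, and the uniform prior `alpha = rep(1, 9)` are assumptions made for illustration, and the default prior used by distr.btest() may differ.

```r
# Benford probabilities for leading digits 1..9.
p_benford <- log10(1 + 1 / (1:9))

# Simulated leading digits (illustrative data, not sinoForest).
set.seed(1)
digits <- sample(1:9, size = 500, replace = TRUE, prob = p_benford)
counts <- as.vector(table(factor(digits, levels = 1:9)))

# Frequentist: chi-square goodness-of-fit test with 9 - 1 = 8 degrees of freedom.
freq <- chisq.test(x = counts, p = p_benford)
freq$parameter  # df = 8
freq$p.value

# Bayesian: point null (Benford) versus a Dirichlet(alpha) alternative.
# log_mbeta() is the log of the multivariate beta function.
log_mbeta <- function(a) sum(lgamma(a)) - lgamma(sum(a))
alpha <- rep(1, 9)  # assumed uniform prior over digit probabilities

# Log marginal likelihoods; the multinomial coefficient cancels in the ratio.
log_m0 <- sum(counts * log(p_benford))
log_m1 <- log_mbeta(alpha + counts) - log_mbeta(alpha)

# BF01 > 1 indicates evidence in favor of the reference distribution.
bf01 <- exp(log_m0 - log_m1)
bf01
```

Working on the log scale and exponentiating only at the end keeps the computation stable for larger samples, where the marginal likelihoods themselves would underflow.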

Function: `rv.test()`

The function rv.test() analyzes the frequency with which values are repeated within a set of numbers. Unlike Benford's law and its generalizations, this approach examines the entire number at once, not only the first or last digit. For the technical details of this procedure, see Simonsohn (2019). The possible options for the check argument are the same as for extract_digits().

Full function with default arguments:

```r
rv.test(x, check = 'last', method = 'af', B = 2000)
```

Supported options for the method argument:

Example:

In this example we analyze a data set from a (retracted) paper that describes three experiments run in Chinese factories, where workers were nudged to use more hand sanitizer. These data were shown to exhibit two classic markers of data tampering: impossibly similar means and an uneven distribution of last digits (Yu, Nelson, & Simonsohn, 2018). We can use the rv.test() function to test whether these data also contain more repeated values than would be expected if the data had not been tampered with.

```r
rv.test(sanitizer$value, check = 'lasttwo', B = 5000)

#
#  Repeated values test
#
# data:  sanitizer$value
# n = 1600, AF = 1.5225, p-value = 0.0024
# alternative hypothesis: frequencies of repeated values are greater than for random data.
```
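The `AF` in this output presumably refers to the average frequency statistic from Simonsohn (2019), which is the sum of squared value frequencies divided by the sum of frequencies. The base R sketch below computes that statistic and compares it with a reference distribution obtained by shuffling last digits across observations. The helper `af_stat()`, the simulated integers, and the shuffling scheme are assumptions for illustration and may differ from what rv.test() does internally.

```r
# Hypothetical helper: average frequency, sum(f^2) / sum(f),
# where f holds the count of each distinct value.
af_stat <- function(x) {
  f <- as.vector(table(x))
  sum(f^2) / sum(f)
}

af_stat(c(1, 1, 2, 3))  # (2^2 + 1^2 + 1^2) / 4 = 1.5

# Illustrative integer data (not the sanitizer data).
set.seed(123)
x <- sample(100:999, size = 200, replace = TRUE)

# Split each value into its leading part and its last digit.
lead <- x %/% 10
last <- x %% 10

# Observed statistic.
obs <- af_stat(x)

# Reference distribution: shuffle the last digits across observations,
# rebuild the values, and recompute the statistic B times.
B <- 500
sim <- replicate(B, af_stat(lead * 10 + sample(last)))

# One-sided p-value: share of simulated statistics at least as large as observed.
p_value <- mean(sim >= obs)
p_value
```

Because every distinct value occurs at least once, the statistic is never below 1, and it grows as more values are repeated.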

4. References

Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, 551-572.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773-795.
Simonsohn, U. (2019, May 25). Number-Bunching: A New Tool for Forensic Data Analysis.
Yu, F., Nelson, L., & Simonsohn, U. (2018, December 5). In Press at Psychological Science: A New 'Nudge' Supported by Implausible Data.

Owner

Name: Koen Derks
Login: koenderks
Kind: user
Location: Amsterdam
Company: Nyenrode Business University

Website: https://koenderks.com
Twitter: koenderks
Repositories: 9
Profile: https://github.com/koenderks

Assistant Professor at Nyenrode Business University & Developer at JASP (www.jasp-stats.org), free and open-source statistical software.

Citation (CITATION.cff)

# -----------------------------------------------------------
# CITATION file created with {cffr} R package, v0.2.2
# See also: https://docs.ropensci.org/cffr/
# -----------------------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "digitTests" in publications use:'
type: software
license: GPL-3.0-or-later
title: 'digitTests: Tests for Detecting Irregular Digit Patterns'
version: 0.1.2
abstract: Provides statistical tests and support functions for detecting irregular
  digit patterns in numerical data. The package includes tools for extracting digits
  at various locations in a number, tests for repeated values, and (Bayesian) tests
  of digit distributions.
authors:
- family-names: Derks
  given-names: Koen
  email: k.derks@nyenrode.nl
  orcid: https://orcid.org/0000-0002-5533-9349
preferred-citation:
  type: manual
  title: 'digitTests: Tests for Detecting Irregular Digit Patterns'
  authors:
  - family-names: Derks
    given-names: Koen
    email: k.derks@nyenrode.nl
    orcid: https://orcid.org/0000-0002-5533-9349
  year: '2022'
  notes: R package version 0.1.2
  url: https://CRAN.R-project.org/package=digitTests
repository: https://CRAN.R-project.org/package=digitTests
repository-code: https://github.com/koenderks/digitTests
url: https://koenderks.github.io/digitTests/
date-released: '2022-06-16'
contact:
- family-names: Derks
  given-names: Koen
  email: k.derks@nyenrode.nl
  orcid: https://orcid.org/0000-0002-5533-9349
keywords:
- digit-analysis
- digits
- r
references:
- type: software
  title: graphics
  abstract: 'R: A Language and Environment for Statistical Computing'
  notes: Imports
  authors:
  - name: R Core Team
  location:
    name: Vienna, Austria
  year: '2022'
  url: https://www.R-project.org/
  institution:
    name: R Foundation for Statistical Computing
- type: software
  title: stats
  abstract: 'R: A Language and Environment for Statistical Computing'
  notes: Imports
  authors:
  - name: R Core Team
  location:
    name: Vienna, Austria
  year: '2022'
  url: https://www.R-project.org/
  institution:
    name: R Foundation for Statistical Computing
- type: software
  title: benford.analysis
  abstract: 'benford.analysis: Benford Analysis for Data Validation and Forensic Analytics'
  notes: Suggests
  authors:
  - family-names: Cinelli
    given-names: Carlos
  year: '2022'
  url: https://CRAN.R-project.org/package=benford.analysis
- type: software
  title: BenfordTests
  abstract: 'BenfordTests: Statistical Tests for Evaluating Conformity to Benford''s
    Law'
  notes: Suggests
  authors:
  - family-names: Joenssen
    given-names: Dieter William
    email: Dieter.Joenssen@googlemail.com
  year: '2022'
  url: https://CRAN.R-project.org/package=BenfordTests
- type: software
  title: BeyondBenford
  abstract: 'BeyondBenford: Compare the Goodness of Fit of Benford''s and Blondeau
    Da Silva''s Digit Distributions to a Given Dataset'
  notes: Suggests
  authors:
  - family-names: Stephane
    given-names: Blondeau Da Silva
  year: '2022'
  url: https://CRAN.R-project.org/package=BeyondBenford
- type: software
  title: knitr
  abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R'
  notes: Suggests
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  year: '2022'
  url: https://CRAN.R-project.org/package=knitr
- type: software
  title: rmarkdown
  abstract: 'rmarkdown: Dynamic Documents for R'
  notes: Suggests
  authors:
  - family-names: Allaire
    given-names: JJ
    email: jj@rstudio.com
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: McPherson
    given-names: Jonathan
    email: jonathan@rstudio.com
  - family-names: Luraschi
    given-names: Javier
    email: javier@rstudio.com
  - family-names: Ushey
    given-names: Kevin
    email: kevin@rstudio.com
  - family-names: Atkins
    given-names: Aron
    email: aron@rstudio.com
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  - family-names: Cheng
    given-names: Joe
    email: joe@rstudio.com
  - family-names: Chang
    given-names: Winston
    email: winston@rstudio.com
  - family-names: Iannone
    given-names: Richard
    email: rich@rstudio.com
    orcid: https://orcid.org/0000-0003-3925-190X
  year: '2022'
  url: https://CRAN.R-project.org/package=rmarkdown
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2022'
  url: https://CRAN.R-project.org/package=testthat

Committers

Last synced: over 2 years ago

All Time

Total Commits: 209
Total Committers: 3
Avg Commits per committer: 69.667
Development Distribution Score (DDS): 0.225

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
github-actions[bot]	4****]	162
Koen Derks	k**s@h**m	45
koenderks	k****s	2

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Packages

Total packages: 1
Total downloads:
- cran 289 last-month

Total dependent packages: 0
Total dependent repositories: 3
Total versions: 3
Total maintainers: 1

cran.r-project.org: digitTests

Tests for Detecting Irregular Digit Patterns

Homepage: https://koenderks.github.io/digitTests/
Documentation: http://cran.r-project.org/web/packages/digitTests/digitTests.pdf
License: GPL (≥ 3)
Latest release: 0.1.2
published over 3 years ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 3
Downloads: 289 Last month

Rankings

Forks count: 14.2%

Dependent repos count: 16.5%

Stargazers count: 25.5%

Dependent packages count: 28.7%

Average: 29.2%

Downloads: 61.2%

Maintainers (1)

k.derks@nyenrode.nl

Last synced: 6 months ago

Science Score: 31.0%
