unsum

Complete Listing of Original Samples of Underlying Raw Evidence

https://github.com/lhdjung/unsum

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.6%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```


# unsum: reconstruct raw data from summary statistics





The goal of unsum is to **un**do **sum**marization: reconstruct all possible samples that may underlie a given set of summary statistics. It currently supports sets of mean, SD, sample size, and scale bounds. This can be useful in forensic metascience to identify impossible or implausible reported numbers.

The package features *CLOSURE: Complete Listing of Original Samples of Underlying Raw Evidence*, a fast algorithm implemented in Rust. Go to [*Get started*](https://lhdjung.github.io/unsum/articles/unsum.html) to learn how to use it.

CLOSURE is exhaustive, which makes it computationally intensive. If your code takes too long to run, consider using [SPRITE](https://lukaswallrich.github.io/rsprite2/) instead (see *Previous work* below).
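To make "all possible samples" concrete, here is a minimal base R sketch that enumerates every candidate sample for a tiny case and keeps those whose rounded mean and SD match the reported values. It is a naive brute-force illustration, not the CLOSURE algorithm and not part of unsum: the function `brute_force_samples()` and its arguments are hypothetical, it assumes integer scale values and R's default rounding, and it is only feasible for very small `n`.

```r
# Hypothetical helper (not part of unsum): naive exhaustive search.
# Assumes integer scale values and n >= 2.
brute_force_samples <- function(mean_reported, sd_reported, n,
                                scale_min, scale_max, digits = 1) {
  values <- scale_min:scale_max
  # Enumerate all sorted samples (combinations with repetition).
  # Each column of `candidates` is one candidate sample.
  candidates <- combn(length(values) + n - 1, n, FUN = function(idx) {
    values[idx - seq_len(n) + 1]
  })
  # Keep candidates whose rounded mean and SD equal the reported ones.
  # Using round() here is a simplification of real-world rounding practice.
  keep <- apply(candidates, 2, function(x) {
    abs(round(mean(x), digits) - mean_reported) < 1e-8 &&
      abs(round(sd(x), digits) - sd_reported) < 1e-8
  })
  # Return one matching sample per row
  t(candidates[, keep, drop = FALSE])
}

found <- brute_force_samples(2.0, 1.0, n = 3, scale_min = 1, scale_max = 3)
found
# The only match is the sample 1, 2, 3
```

The number of candidates grows very quickly with `n` and the scale width, which is why an exhaustive search needs a far more efficient algorithm in practice.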

## Installation

You can install unsum with either of these:

``` r
install.packages("unsum")
# or
pak::pkg_install("unsum")
```

Your R version should be 4.2.0 or more recent.

## Demo

Start with `closure_generate()`, the package's main function. It generates all possible samples that match the given summary statistics:

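```{r example}
library(unsum)

# Reported mean and SD are passed as strings,
# sample size and scale bounds as numbers
data <- closure_generate(
  mean = "2.7",
  sd = "1.9",
  n = 130,
  scale_min = 1,
  scale_max = 5
)

# Print the CLOSURE results
data
```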

Visualize the overall distribution of values found in the samples:

```{r}
#| fig.alt: >
#|   Barplot of `data`, the CLOSURE output.
#|   It specifically visualizes the `f_average` column of
#|   the `frequency` tibble, but also gives percentage figures,
#|   similar to the `f_relative` column. The overall shape is
#|   a somewhat polarized distribution.
closure_plot_bar(data)
```

## Previous work

[SPRITE](https://lukaswallrich.github.io/rsprite2/) generates random datasets that could have led to the reported statistics. CLOSURE is exhaustive, so it always finds all possible datasets, not just a random sample of them. The flip side is that SPRITE, because it only samples, still runs fast in cases where CLOSURE may take too long.

[GRIM and GRIMMER](https://lhdjung.github.io/scrutiny/) test reported summary statistics for consistency, but CLOSURE is the ultimate consistency test: if it finds at least one distribution, the statistics are consistent; if it finds none, they cannot all be correct.
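For comparison, the idea behind a GRIM-style check can be sketched in a few lines of base R: with integer data, the sum of all values must be an integer, so a reported mean is only possible if some integer sum divided by `n` rounds to it. This is an illustrative sketch under those assumptions, not the scrutiny implementation; the function `grim_consistent()` is hypothetical, and it uses R's default `round()` rather than the several rounding procedures a careful test would consider.

```r
# Hypothetical helper (not the scrutiny API): GRIM-style mean check.
# Assumes integer-valued data and R's default rounding.
grim_consistent <- function(mean_reported, n, digits = 2) {
  # The true sum must be an integer close to mean * n
  target <- mean_reported * n
  sums <- c(floor(target), ceiling(target))
  # Consistent if any candidate sum reproduces the reported mean
  any(abs(round(sums / n, digits) - mean_reported) < 1e-8)
}

grim_consistent(5.19, n = 28)               # FALSE: no integer sum works
grim_consistent(5.18, n = 28)               # TRUE: 145 / 28 rounds to 5.18
grim_consistent(2.7, n = 130, digits = 1)   # TRUE: 351 / 130 is 2.7
```

A check like this only looks at the mean and sample size. An exhaustive search additionally takes the SD and scale bounds into account, so statistics that pass this check can still turn out to have no matching sample.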

[CORVIDS](https://github.com/katherinemwood/corvids) deserves credit as the first technique to reconstruct all possible underlying datasets. However, it takes a very long time to run, often prohibitively so. This is partly because the code is written in Python, but also because the algorithm is inherently much more complex than CLOSURE.

## About

The CLOSURE algorithm was originally written [in Python](https://github.com/larigaldie-n/CLOSURE-Python) by Nathanael Larigaldie. The R package unsum provides easy access to an optimized implementation in Rust, [closure-core](https://github.com/lhdjung/closure-core), via the amazing [extendr](https://extendr.github.io/) framework. Rust code tends to run much faster than R or Python code, and many applications of CLOSURE require that speed.

Owner

  • Name: Lukas Jung
  • Login: lhdjung
  • Kind: user
  • Location: Heidelberg, Germany

R developer and master's student at Heidelberg University.

GitHub Events

Total
  • Issues event: 2
  • Watch event: 2
  • Delete event: 7
  • Push event: 63
  • Pull request event: 12
  • Create event: 9
Last Year
  • Issues event: 2
  • Watch event: 2
  • Delete event: 7
  • Push event: 63
  • Pull request event: 12
  • Create event: 9

Packages

  • Total packages: 1
  • Total downloads:
    • CRAN: 165 last month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
cran.r-project.org: unsum

Reconstruct Raw Data from Summary Statistics

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 165 last month
Rankings
Dependent packages count: 26.2%
Dependent repos count: 32.3%
Average: 48.3%
Downloads: 86.4%
Maintainers (1)
Last synced: 10 months ago