ironseed

Improved Random Number Generator Seeding

https://github.com/reedacartwright/ironseed


Repository

Basic Info
  • Host: GitHub
  • Owner: reedacartwright
  • License: other
  • Language: R
  • Default Branch: main
  • Size: 319 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 2
  • Open Issues: 1
  • Releases: 1
Created 9 months ago · Last pushed 7 months ago

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# Ironseed


[![R-CMD-check](https://github.com/reedacartwright/ironseed/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/reedacartwright/ironseed/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/reedacartwright/ironseed/graph/badge.svg)](https://app.codecov.io/gh/reedacartwright/ironseed)
[![CRAN status](https://www.r-pkg.org/badges/version/ironseed)](https://CRAN.R-project.org/package=ironseed)


## Overview

Ironseed is an R package that improves seeding for R's built-in random number
generators. An ironseed is a finite-entropy (or fixed-entropy) hash digest that
can be used to generate an unlimited sequence of seeds for initializing the
state of a random number generator. It is inspired by the work of M.E. O'Neill
and others
[[1](https://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html),
[2](https://www.pcg-random.org/posts/simple-portable-cpp-seed-entropy.html),
[3](https://gist.github.com/imneme/540829265469e673d045)].

An ironseed is a 256-bit hash digest constructed from a variable-length sequence
of 32-bit inputs. Each ironseed consists of eight 32-bit sub-digests. The
sub-digests are values of 32-bit multilinear hashes
[[4](https://arxiv.org/pdf/1202.4961)] that accumulate entropy from the input
sequence. Each input is included in every sub-digest. The coefficients for the
multilinear hashes are generated by a [Weyl
sequence](https://en.wikipedia.org/wiki/Weyl_sequence).
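
To make the construction concrete, the following is a minimal sketch of a multilinear hash whose coefficients come from a Weyl sequence. It is a scaled-down toy and not the package's implementation: it hashes 16-bit words with 32-bit arithmetic so that every intermediate value is exactly representable in R's double-precision numbers, and it produces a single sub-digest rather than eight. The function name `toy_multilinear` and the default increment are assumptions chosen for illustration.

```r
# Toy multilinear hash: 16-bit inputs, 32-bit coefficients and accumulator.
# Assumption: `increment` is any odd constant below 2^32; it drives the
# Weyl sequence w_k = (k * increment) mod 2^32 used for the coefficients.
toy_multilinear <- function(x, increment = 2654435769) {
  stopifnot(is.numeric(x), all(x >= 0), all(x < 2^16), all(x == floor(x)))
  modulus <- 2^32
  # The first Weyl term acts as the constant coefficient.
  weyl <- increment %% modulus
  acc <- weyl
  for (xi in x) {
    # Advance the Weyl sequence to get a fresh coefficient for this input.
    weyl <- (weyl + increment) %% modulus
    # Products stay below 2^48, so double arithmetic is exact here.
    acc <- (acc + weyl * xi) %% modulus
  }
  # Keep the high 16 bits of the 32-bit accumulator as the hash value.
  acc %/% 2^16
}

toy_multilinear(c(1, 2, 3))
```

Because every input is multiplied by its own coefficient before being added to the accumulator, reordering the inputs generally changes the result.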

Multilinear hashes are also used to generate an output seed sequence from an
ironseed. Each 32-bit output value is generated by uniquely hashing the
sub-digests. The coefficients for the output are generated by a second Weyl
sequence.

To improve the observed randomness of each hash output, bits are mixed using
a finalizer adapted from SplitMix64
[[5](https://doi.org/10.1145/2714064.2660195)]. With the additional mixing from
the finalizer, the output seed sequence passes PractRand tests
[[6](https://pracrand.sourceforge.net/)].

## Installation

``` r
# Install the released version of the package from CRAN as usual:
install.packages("ironseed")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("reedacartwright/ironseed")
```

## Examples

### User Seeding

Ironseed can be used at the top of a script to robustly initialize R's built-in
random number generator. The resulting ironseed is returned invisibly, and a
message is generated notifying the user that initialization has occurred. This
message can be logged and later used to reproduce the run.

```{r}
#!/usr/bin/env -S Rscript --vanilla
ironseed::ironseed("Experiment", 20251031, 1)
runif(10)
```

```{r, include = FALSE}
ironseed:::rm_random_seed()
```

If your script is intended to be called multiple times as part of a large study,
you can also seed based on the command line arguments.

```{r}
#!/usr/bin/env -S Rscript --vanilla
args <- commandArgs(trailingOnly = TRUE)
ironseed::ironseed("A Simulation Script 1", args)
runif(10)
```

```{r, include = FALSE}
ironseed:::rm_random_seed()
```

Specific command line arguments can also be used. For large, nested studies, it
is useful for scripts to support seeding using multiple seeds. Ironseed makes
this easy to accomplish.

```{r}
#!/usr/bin/env -S Rscript --vanilla
args <- commandArgs(trailingOnly = TRUE)
ironseed::ironseed("A Simulation Script 2", args[grepl("--seed=", args)])
runif(10)
```
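
The argument filtering can be tried out without running a script. The sketch below uses only base R and a hypothetical `args` vector standing in for `commandArgs(trailingOnly = TRUE)`. It uses `startsWith()` instead of `grepl()`, which is slightly stricter because it only matches arguments that begin with `--seed=`.

```r
# Hypothetical arguments, standing in for commandArgs(trailingOnly = TRUE).
args <- c("--reps=100", "--seed=study-7", "--verbose", "--seed=42")

# Keep only the arguments that begin with the literal prefix "--seed=".
seed_args <- args[startsWith(args, "--seed=")]

# Optionally strip the prefix to recover the raw seed values.
seed_values <- substring(seed_args, nchar("--seed=") + 1)

seed_args
seed_values
```

Either vector could then be passed to `ironseed::ironseed()` in place of the filtered `args` in the example above.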

```{r, include = FALSE}
ironseed:::rm_random_seed()
```

### Automatic Seeding

Ironseed can also automatically initialize the random number generator using an
ironseed constructed from multiple sources of entropy. This occurs if no data
is passed to `ironseed()`.

```{r}
#!/usr/bin/env -S Rscript --vanilla
ironseed::ironseed()
runif(10)

# Since RNG initialization has already occurred, the next call simply
# returns the ironseed used in the previous seeding.
fe <- ironseed::ironseed()
fe
```

```{r, include = FALSE}
ironseed:::rm_random_seed()
```

The same result can be achieved with a single call. Note that the automatically
generated seed is different from the previous run.

```{r}
#!/usr/bin/env -S Rscript --vanilla
fe <- ironseed::ironseed()
runif(10)
fe
```

```{r, include = FALSE}
ironseed:::rm_random_seed()
```

### Reproducible Code

An ironseed can also be used directly to reproduce a previous initialization.
This is most useful when automatic seeding has been used, and the previously
generated seed has been logged.

```{r}
#!/usr/bin/env -S Rscript --vanilla
ironseed::ironseed("RW7vjwjeiHF-QG7RYPvrntR-6tGPoi65sVc-N1n5SQi5RH4")
runif(10)
```

```{r, include = FALSE}
ironseed:::rm_random_seed()
```

## Analysis

### Avalanche

A good hash function has good avalanche properties. If we change one
bit of information in the input, our goal is to change 50% of the bits
in the output. To test this, we will first build a function to
construct a random pair of ironseeds that differ by a single input
bit.

```{r}
rand_fe_pair <- function(w) {
  x <- sample(0:1, w, replace=TRUE)
  n <- sample(seq_along(x), 1)
  y <- x
  y[n] <- if(y[n] == 1) 0L else 1L
  x <- packBits(x, "integer")
  y <- packBits(y, "integer")
  x <- ironseed::ironseed(x, set_seed = FALSE)
  y <- ironseed::ironseed(y, set_seed = FALSE)
  list(x = x, y = y)
}
```
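
As a quick sanity check on this construction, the sketch below uses only base R to confirm that the two packed inputs differ in exactly one bit. The helper name `flip_one_bit` is hypothetical; it mirrors the first half of `rand_fe_pair()` without calling `ironseed()`.

```r
# Build two packed integer vectors of `w` bits that differ in a single bit.
# `w` must be a multiple of 32 because packBits() packs 32 bits per integer.
flip_one_bit <- function(w) {
  x <- sample(0:1, w, replace = TRUE)
  n <- sample(seq_along(x), 1)
  y <- x
  y[n] <- 1L - y[n] # flip the chosen bit
  list(x = packBits(x, "integer"), y = packBits(y, "integer"))
}

set.seed(1)
p <- flip_one_bit(256)
length(p$x) # number of 32-bit words in each input
sum(intToBits(p$x) != intToBits(p$y)) # Hamming distance between the inputs
```

A Hamming distance of 1 between the inputs is the starting point for the avalanche measurement: any spread in the output distance is then attributable to the hash.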

Next we will generate 100,000 pairs using 32-bit inputs. We will use
R's built-in seeding algorithm so that the results are independent of
Ironseed's seeding algorithm. We will also measure how many hash bits
were flipped by flipping one input bit.

```{r}
set.seed(20251220)
z <- replicate(100000, rand_fe_pair(32), simplify = FALSE)
dat <- sapply(z, \(a) sum(intToBits(a$x) != intToBits(a$y)))
```

```{r analysis_32}
mean(dat) # expectation: 128
sd(dat) # expectation: 8
hist(dat, breaks = 86:170, main = NULL)
```
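
The expectations noted in the comments follow from modeling each of the 256 output bits as flipping independently with probability 0.5, which makes the number of flipped bits binomially distributed. A short sketch of that arithmetic, with variable names chosen here for illustration:

```r
n_bits <- 256 # size of an ironseed in bits
p_flip <- 0.5 # ideal probability that any given output bit flips

# Mean and standard deviation of a Binomial(n_bits, p_flip) count.
expected_mean <- n_bits * p_flip
expected_sd <- sqrt(n_bits * p_flip * (1 - p_flip))

expected_mean
expected_sd
```

Observed values of `mean(dat)` and `sd(dat)` close to these figures indicate that the hash behaves like the ideal model under single-bit input changes.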

We will repeat the same analysis for 256-bit inputs.

```{r analysis_256}
set.seed(20251221)
z <- replicate(100000, rand_fe_pair(256), simplify = FALSE)
dat <- sapply(z, \(a) sum(intToBits(a$x) != intToBits(a$y)))
mean(dat) # expectation: 128
sd(dat) # expectation: 8
hist(dat, breaks = 86:170, main = NULL)
```

As one can see, the avalanche behavior of the input hash is excellent.

Owner

  • Name: Reed A. Cartwright
  • Login: reedacartwright
  • Kind: user
  • Location: Tempe, AZ
  • Company: Arizona State University

GitHub Events

Total
  • Create event: 3
  • Release event: 1
  • Issues event: 4
  • Watch event: 2
  • Issue comment event: 7
  • Push event: 63
  • Pull request event: 3
  • Fork event: 1
Last Year
  • Create event: 3
  • Release event: 1
  • Issues event: 4
  • Watch event: 2
  • Issue comment event: 7
  • Push event: 63
  • Pull request event: 3
  • Fork event: 1

Packages

  • Total packages: 1
  • Total downloads:
    • cran 248 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: ironseed

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 248 Last month
Rankings
Dependent packages count: 26.0%
Dependent repos count: 32.0%
Average: 48.0%
Downloads: 85.9%
Maintainers (1)
Last synced: 7 months ago

Dependencies

DESCRIPTION cran
  • tinytest * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v4 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/rhub.yaml actions
  • r-hub/actions/checkout v1 composite
  • r-hub/actions/platform-info v1 composite
  • r-hub/actions/run-check v1 composite
  • r-hub/actions/setup v1 composite
  • r-hub/actions/setup-deps v1 composite
  • r-hub/actions/setup-r v1 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • codecov/codecov-action v5 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite