blocklength

R package with a set of functions to select the optimal block-length for a dependent bootstrap (block-bootstrap). Includes the Hall, Horowitz, and Jing (1995) cross-validation method and the Politis and White (2004) Spectral Density Plug-in method.

https://github.com/alec-stashevsky/blocklength

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 14 DOI reference(s) in README
○
Academic publication links
○
Committers with academic emails
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (18.6%) to scientific vocabulary

Keywords

block-bootstrap block-resampling blocklength boot bootstrap cran depedent-bootstrap dependent horowitz inference meboot politis r resample stats time time-series time-series-analysis tseries

Last synced: 6 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: Alec-Stashevsky
License: gpl-2.0
Language: R
Default Branch: main
Homepage: https://alecstashevsky.com/r/blocklength/
Size: 2.8 MB

Statistics

Stars: 5
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 6

Topics

block-bootstrap block-resampling blocklength boot bootstrap cran depedent-bootstrap dependent horowitz inference meboot politis r resample stats time time-series time-series-analysis tseries

Created about 5 years ago · Last pushed 12 months ago

Metadata Files

Readme Changelog License Citation

README.Rmd

---
output: github_document
editor_options: 
  markdown: 
    wrap: 72
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  fig.align = "center",
  fig.ext = "svg",
  dev = "svg"
)

set.seed(32)
```


# blocklength 



[![R-CMD-check](https://github.com/Alec-Stashevsky/blocklength/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Alec-Stashevsky/blocklength/actions/workflows/R-CMD-check.yaml)
[![CRAN status](https://www.r-pkg.org/badges/version/blocklength)](https://CRAN.R-project.org/package=blocklength)
[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/blocklength?color=brightgreen)](https://CRAN.R-project.org/package=blocklength)
[![CRAN/METACRAN](https://img.shields.io/cran/l/blocklength)](https://CRAN.R-project.org/package=blocklength)
[![Codecov test coverage](https://codecov.io/gh/Alec-Stashevsky/blocklength/graph/badge.svg)](https://app.codecov.io/gh/Alec-Stashevsky/blocklength)



`blocklength` is an R package used to automatically select the
block-length parameter for a block-bootstrap. It is meant for use with
dependent data such as stationary time series.

## The Story

Regular bootstrap methods rely on assumptions that observations are
independent and identically distributed (*i.i.d.*), but this assumption
fails for many types of time series because we would expect the
observation in the previous period to have some explanatory power over
the current observation. This could occur in any time series from
unemployment rates, stock prices, biological data, etc. A time series
that is *i.i.d.* would look like white noise, since the following
observation would be totally independent of the previous one (random).

To get around this problem, we can retain some of this *time-dependence*
by breaking-up a time series into a number of blocks with length *l*.
Instead of sampling each observation randomly (with replacement) like a
regular bootstrap, we can resample these *blocks* at random. This way
within each block the time-dependence is preserved.

The problem with the block bootstrap is the high sensitivity to the
choice of block-length, or the number of blocks to break the time series
into.

The goal of `blocklength` is to simplify and automate the process of
selecting a block-length to perform a bootstrap on dependent data.
`blocklength` has several functions that take their name from the
authors who have proposed them. Currently, there are three methods
available:

1.  `hhj()` takes its name from the [Hall, Horowitz, and Jing
    (1995)](https://doi.org/10.1093/biomet/82.3.561)
    "HHJ" method to select the optimal block-length using a
    cross-validation algorithm which minimizes the mean squared error
    *(MSE)* incurred by the bootstrap at various block-lengths.

2.  `pwsd()` takes its name from the [Politis and White
    (2004)](https://doi.org/10.1081/ETC-120028836) Spectral Density
    "PWSD" Plug-in method to automatically select the optimal
    block-length using spectral density estimation via "flat-top" lag
    windows of [Politis and Romano
    (1995).](https://doi.org/10.1111/j.1467-9892.1995.tb00223.x)

3.  `nppi()` takes its name from the [Lahiri, Furukawa, and Lee
    (2007)](https://doi.org/10.1016/j.stamet.2006.08.002) Nonparametric
    Plug-In "NPPI" method to select the optimal block-length for block
    bootstrap procedures. The NPPI method estimates the leading term in
    the first-order expansion of the theoretically optimal block length
    by using resampling methods to construct consistent bias and
    variance estimators for the block-bootstrap. Specifically, this
    package implements the Moving Block Bootstrap (MBB) method of
    [Künsch (1989)](https://doi.org/10.1214/aos/1176347265) and
    the Moving Blocks Jackknife (MBJ) of [Liu and Singh
    (1992)](https://doi.org/10.1214/aos/1176348653) as the bias and
    variance estimators, respectively.

Under the hood, `hhj()` uses the moving block bootstrap (MBB) procedure
according to [Künsch
(1989)](https://projecteuclid.org/euclid.aos/1176347265) which resamples
blocks from a set of overlapping sub-samples with a fixed block-length.
However, the results of `hhj()` may be generalized to other block
bootstrap procedures such as the *stationary bootstrap* of [Politis and
Romano
(1994).](https://doi.org/10.1080/01621459.1994.10476870)

Compared to `pwsd()`, `hhj()` is more computationally intensive as it
relies on iterative sub-sampling processes that optimize the MSE
function over each possible block-length (or a select grid of
block-lengths), while `pwsd()` is a simpler "plug-in" rule that uses
auto-correlations, auto-covariance, and the spectral density of the
series to optimize the choice of block-length. Similarly, `nppi()` is
another "plug-in" rule, however, due to its heavy reliance on
resampling, it can also be computationally intensive compared to `pwsd()`.

For a detailed comparison, see the table below:

```{r table, echo=FALSE}
# Comparison table
table <- data.frame(
  rows = c(
    "**Method Type**",
    "**Computational Cost**",
    "**Primary Goal**", 
    "**Variance Estimation**",
    "**Bias Estimation**",
    "**Best for**", 
    "**Estimation Capacity**",
    "**Dependency\\***"
    ),
  NPPI = c(
    "Nonparametric resampling",
    "Medium (bootstrap resampling & jackknife)",
    "Minimize MSE of bootstrap estimator",
    "Moving Blocks Jackknife-After-Bootstrap (JAB)", 
    "Directly estimates bias from bootstrap",
    "General-purpose estimators, small sample sizes, and quantile estimation", 
    "Bootstrap bias, variance, distribution function, and quantile estimation",
    "User-defined parameters for initial block-length `l` and number of deletion blocks `m`"
    ),
  PWSD = c(
    "Spectral density estimation",
    "Low (direct ACF computation)", 
    "Estimate block length using spectral density",
    "Implicitly estimated via spectral density", 
    "Indirectly accounts for bias via ACF decay",
    "Block-length selection for circular and stationary bootstrap, time series with strong autocorrelation", 
    "Bootstrap sample mean only",
    "User-defined parameters for autocorrelation lag and implied hypothesis tests (4 total)"
    ),
  HHJ = c(
    "Subsampling-based cross-validation",
    "High (subsampling & cross-validation)", 
    "Minimize MSE via cross-validation",
    "Uses subsample-based variance estimation", 
    "Uses subsample-based bias estimation",
    "Estimating functionals with strong dependencies", 
    "Bootstrap variance and distribution function estimation", 
    "Requires user-defined parameters for `pilot_block_length` (*l\\**) and `sub_sample` size (*m*)"
    )
)

# Render table
knitr::kable(
  table,
  format = "markdown",
  col.names = c(
    "",
    "NPPI (Lahiri et al., 2007)",
    "PWSD (Politis & White, 2004)",
    "HHJ (Hall, Horowitz & Jing, 1995)"
    ),
  caption = "* All algorithms have default user-defined parameters recomended by the respective authors."
  )
```


## Installation

You can install the released version from
[CRAN](https://cran.r-project.org/package=blocklength) with:

``` r
install.packages("blocklength")
```

You can install the development version from
[GitHub](https://github.com/Alec-Stashevsky/blocklength) with:

``` r
# install.packages("devtools")
devtools::install_github("Alec-Stashevsky/blocklength")
```

## Use Case

We want to select the optimal block-length to perform a block bootstrap
on a simulated autoregressive *AR(1)* time series.

First we will generate the time series:

```{r series}
library(blocklength)

# Simulate AR(1) time series
series <- stats::arima.sim(model = list(order = c(1, 0, 0), ar = 0.5),
                           n = 500, rand.gen = rnorm)

# Coerce time series to data.frame (not necessary)
data <- data.frame("AR1" = series)
```

Now, we can find the optimal block-length to perform a block-bootstrap.
We do this using the three available methods.


### 1. The Hall, Horowitz, and Jing (1995) "HHJ" Method

```{r hhj}
## Using the HHJ Algorithm with overlapping subsamples of width 10
hhj(series, sub_sample = 10, k = "bias/variance")
```


### 2. The Politis and White (2004) Spectral Density Estimation "PWSD" Method

```{r pwsd}
# Using Politis and White (2004) Spectral Density Estimation
pwsd(data)
```

We can see that both methods produce similar results for a block-length
of 9 or 11 depending on the type of bootstrap method used.


### 3. The Lahiri, Furukawa, and Lee (2007) Nonparametric Plug-In "NPPI" Method

```{r nppi}
# Using Lahiri, Furukawa, and Lee (2007) Nonparametric Plug-In 
nppi(data, m = 8) 
```


## Acknowledgements

A big shoutout to Malina Cheeneebash for designing the `blocklength` hex
sticker! Also to Sergio Armella and [Simon P.
Couch](https://www.simonpcouch.com) for their help and feedback!

Owner

Name: Alec Stashevsky
Login: Alec-Stashevsky
Kind: user
Location: Los Angeles
Company: Fetch Rewards

Website: https://alecstashevsky.com/
Repositories: 2
Profile: https://github.com/Alec-Stashevsky

Applied Scientist with a passion for anthropology and interdisciplinary research. Currently working at the intersection of CV, NLP, and Graph ML.

GitHub Events

Total

Create event: 8
Release event: 3
Issues event: 6
Watch event: 1
Delete event: 5
Push event: 51
Pull request event: 8

Last Year

Create event: 8
Release event: 3
Issues event: 6
Watch event: 1
Delete event: 5
Push event: 51
Pull request event: 8

Committers

Last synced: over 2 years ago

All Time

Total Commits: 125
Total Committers: 3
Avg Commits per committer: 41.667
Development Distribution Score (DDS): 0.032

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Alec	a**y@p**m	121
Alec Stashevsky	a**c@a**m	2
Alec-Stashevsky	A****y	2

Committer Domains (Top 20 + Academic)

alecstashevsky.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 4
Total pull requests: 7
Average time to close issues: about 9 hours
Average time to close pull requests: about 16 hours
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 4
Pull requests: 7
Average time to close issues: about 9 hours
Average time to close pull requests: about 16 hours
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

Alec-Stashevsky (4)

Pull Request Authors

Alec-Stashevsky (7)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 669 last-month
Total docker downloads: 41,971

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 6
Total maintainers: 1

cran.r-project.org: blocklength

Select an Optimal Block-Length to Bootstrap Dependent Data (Block Bootstrap)

Homepage: https://alecstashevsky.com/r/blocklength
Documentation: http://cran.r-project.org/web/packages/blocklength/blocklength.pdf
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Latest release: 0.2.2
published 12 months ago

Versions: 6
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 669 Last month
Docker Downloads: 41,971

Rankings

Stargazers count: 24.2%

Forks count: 28.8%

Dependent packages count: 29.8%

Average: 31.1%

Dependent repos count: 35.5%

Downloads: 37.4%

Maintainers (1)

alec@alecstashevsky.com

Last synced: 6 months ago

Dependencies

.github/workflows/R-CMD-check.yaml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/upload-artifact main composite
r-lib/actions/setup-pandoc v1 composite
r-lib/actions/setup-r v1 composite

.github/workflows/pkgdown.yaml actions

actions/cache v2 composite
actions/checkout v2 composite
r-lib/actions/setup-pandoc v1 composite
r-lib/actions/setup-r v1 composite

.github/workflows/test-coverage.yaml actions

actions/cache v2 composite
actions/checkout v2 composite
r-lib/actions/setup-pandoc v1 composite
r-lib/actions/setup-r v1 composite

.github/workflows/update-citation-cff.yaml actions

actions/checkout v2 composite
r-lib/actions/setup-r v1 composite
r-lib/actions/setup-r-dependencies v1 composite

DESCRIPTION cran

stats * imports
tseries * imports
covr * suggests
knitr * suggests
parallel * suggests
rmarkdown * suggests
testthat * suggests

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

blocklength

Science Score: 39.0%

Keywords

Repository

Basic Info

Statistics

Topics

Metadata Files

README.Rmd

Owner

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Committer Domains (Top 20 + Academic)

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Packages

cran.r-project.org: blocklength

Rankings

Maintainers (1)

Dependencies