tsbootstrap
tsbootstrap: generate bootstrapped time series samples in Python
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 3 DOI reference(s) in README -
✓Academic publication links
Links to: arxiv.org, zenodo.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (12.4%) to scientific vocabulary
Repository
tsbootstrap: generate bootstrapped time series samples in Python
Basic Info
- Host: GitHub
- Owner: astrogilda
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://tsbootstrap.readthedocs.io/en/latest/
- Size: 2.5 MB
Statistics
- Stars: 81
- Watchers: 3
- Forks: 5
- Open Issues: 25
- Releases: 8
Metadata Files
README.md
📒 Table of Contents
- 🚀 Getting Started
- 🧩 Modules
- 🗺 Roadmap
- 🤝 Contributing
- 📄 License
- 📍 Time Series Bootstrapping Methods intro
- 👏 Contributors
🚀 Getting Started
⚡ Performance Update: 10-50x Faster with StatsForecast Backend
tsbootstrap now includes an optional high-performance backend powered by StatsForecast, delivering:
- 10-50x faster model fitting and forecasting
- 74% memory reduction for large-scale operations
- 100% backward compatibility with existing code
- Gradual rollout support with feature flags
Enable it with a simple environment variable:
```bash
export TSBOOTSTRAP_USE_STATSFORECAST=true
```
Or configure programmatically:
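```python
from tsbootstrap.time_series_model import TimeSeriesModel

# data: your time series as a NumPy array
model = TimeSeriesModel(X=data, model_type="arima", use_backend=True)
```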
See the backend documentation for details.
🎮 Using tsbootstrap
tsbootstrap provides a unified, sklearn-like interface to all bootstrap methods.
Example using a MovingBlockBootstrap - all bootstrap algorithms follow
the same interface!
```python
import numpy as np
from tsbootstrap import MovingBlockBootstrap

# Create custom time series data (univariate here; the bootstraps also handle multivariate series)
n_samples = 10
X = np.arange(n_samples)

# Instantiate the bootstrap object
mbb = MovingBlockBootstrap(n_bootstraps=3, rng=42, block_length=3)

# Generate bootstrapped samples
bootstrapped_samples = mbb.bootstrap(X, return_indices=False)

# Collect the bootstrap samples into a single array
X_bootstrapped = np.array(list(bootstrapped_samples))
```
📦 Installation and Setup
tsbootstrap is installed via pip, either from PyPI or locally.
✔️ Prerequisites
- Python (3.9 or higher)
- pip (latest version recommended), plus a suitable environment manager (venv, conda)

You can also consider using uv to speed up environment setup.
Installing from PyPI
To install the latest release of tsbootstrap directly from PyPI, run:
```sh
pip install tsbootstrap
```
To install with all optional dependencies:
```sh
pip install "tsbootstrap[all_extras]"
```
Bootstrap algorithms manage their own dependencies: if a required extra is not installed, the object raises an error at construction.
🧩 Modules
The tsbootstrap package contains various modules that handle tasks such as bootstrapping, time series simulation, and utility functions. This modular approach ensures flexibility, extensibility, and ease of maintenance.
root

| File | Summary |
| --- | --- |
| [setup.sh](https://github.com/astrogilda/tsbootstrap/blob/main/setup.sh) | Shell script for initial setup and environment configuration. |
| [commitlint.config.js](https://github.com/astrogilda/tsbootstrap/blob/main/commitlint.config.js) | Configuration for enforcing conventional commit messages. |
| [CITATION.cff](https://github.com/astrogilda/tsbootstrap/blob/main/CITATION.cff) | Citation metadata for the project. |
| [CODE_OF_CONDUCT.md](https://github.com/astrogilda/tsbootstrap/blob/main/CODE_OF_CONDUCT.md) | Guidelines for community conduct and interactions. |
| [CONTRIBUTING.md](https://github.com/astrogilda/tsbootstrap/blob/main/CONTRIBUTING.md) | Instructions for contributing to the project. |
| [.codeclimate.yml](https://github.com/astrogilda/tsbootstrap/blob/main/.codeclimate.yml) | Configuration for Code Climate quality checks. |
| [.gitignore](https://github.com/astrogilda/tsbootstrap/blob/main/.gitignore) | Specifies files and folders to be ignored by Git. |
| [.pre-commit-config.yaml](https://github.com/astrogilda/tsbootstrap/blob/main/.pre-commit-config.yaml) | Configuration for pre-commit hooks. |
| [poetry.toml](https://github.com/astrogilda/tsbootstrap/blob/main/poetry.toml) | Configuration file for Poetry package management. |
| [tsbootstrap_logo.png](https://github.com/astrogilda/tsbootstrap/blob/main/tsbootstrap_logo.png) | Project logo image. |

tsbootstrap

| File | Summary |
| --- | --- |
| [block_generator.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/block_generator.py) | Generates blocks for bootstrapping. |
| [markov_sampler.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/markov_sampler.py) | Implements sampling methods based on Markov models. |
| [time_series_model.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/time_series_model.py) | Defines base and specific time series models. |
| [block_length_sampler.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/block_length_sampler.py) | Samples block lengths for block bootstrapping methods. |
| [base_bootstrap.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/bootstrap.py) | Contains the implementation for different types of base, abstract bootstrapping classes for time series data. |
| [base_bootstrap_configs.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/bootstrap_configs.py) | Provides configuration classes for different base, abstract bootstrapping classes. |
| [block_bootstrap.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/bootstrap.py) | Contains the implementation for different types of block bootstrapping methods for time series data. |
| [block_bootstrap_configs.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/bootstrap_configs.py) | Provides configuration classes for different block bootstrapping methods. |
| [bootstrap.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/bootstrap.py) | Contains the implementation for different types of bootstrapping methods for time series data, including residual, distribution, markov, statistic-preserving, and sieve. |
| [time_series_simulator.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/time_series_simulator.py) | Simulates time series data based on various models. |
| [block_resampler.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/block_resampler.py) | Implements methods for block resampling in time series. |
| [best_lag.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/model_selection/best_lag.py) | Automatically selects optimal model orders for time series. |
| [ranklags.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/ranklags.py) | Provides functionalities to rank lags in a time series. |

utils

| File | Summary |
| --- | --- |
| [types.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/utils/types.py) | Defines custom types used across the project. |
| [validate.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/utils/validate.py) | Contains validation utilities. |
| [odds_and_ends.py](https://github.com/astrogilda/tsbootstrap/blob/main/src/tsbootstrap/utils/odds_and_ends.py) | Contains miscellaneous utility functions. |

🗺 Roadmap
This is an abridged version; for the complete and evolving list of plans and improvements, see Issue #144.
- Performance and Scaling: handling large datasets, distributed backend integration (Dask, Spark, Ray), profiling/optimization
- Tuning and AutoML: adaptive block length, adaptive resampling, evaluation-based parameter selection
- Real-time and Stream Data: stream bootstraps, data update interface
- Stage 2 sktime Integration: evaluation module, datasets, benchmarks, sktime forecasters in bootstraps
- API and Capability Extension: panel/hierarchical data, exogenous data, update/stream, model state management
- Scope Extension (TBD): time series augmentation, fully probabilistic models
🤝 Contributing
Contributions are always welcome!
See our good first issues for getting started.
Below is a quick start guide to contributing.
Developer setup
- Fork the tsbootstrap repository.
- Clone the fork to local:
```sh
git clone https://github.com/astrogilda/tsbootstrap
```
- In the local repository root, set up a Python environment, e.g., venv or conda.
- Editable install via pip, including developer dependencies:
```sh
pip install -e ".[dev]"
```
The editable install ensures that changes to the package are reflected in your environment.
- Set up git hooks and pre-commit:
```sh
# Install pre-commit hooks
pre-commit install

# Configure git to use the project's hooks
git config core.hooksPath .githooks
```
This ensures that docs requirements stay in sync with pyproject.toml and that other code quality checks run automatically.
Verifying the Installation
After installation, you can verify that tsbootstrap has been installed correctly by checking its version or by trying to import it in Python:
```sh
python -c "import tsbootstrap; print(tsbootstrap.__version__)"
```
This command should output the version number of tsbootstrap without any errors, indicating that the installation was successful.
That's it! You are now set up and ready to go. You can start using tsbootstrap for your time series bootstrapping needs.
Contribution workflow
Contributions are always welcome! Please follow these steps:
- Create a new branch with a descriptive name (e.g., new-feature-branch or bugfix-issue-123).
```sh
git checkout -b new-feature-branch
```
- Make changes to the project's codebase.
- Commit your changes to your local branch with a clear commit message that explains the changes you've made.
```sh
git commit -m 'Implemented new feature.'
```
- Push your changes to your forked repository on GitHub using the following command:
```sh
git push origin new-feature-branch
```
- Create a new pull request to the original project repository. In the pull request, describe the changes you've made and why they're necessary. The project maintainers will review your changes and provide feedback or merge them into the main branch.
🧪 Running Tests
To run all tests, in your developer environment, run:
```sh
pytest tests/
```
Individual bootstrap algorithms can be tested as follows:
```python
from tsbootstrap.utils import check_estimator

# my_bootstrap_algo: the bootstrap algorithm you want to test
check_estimator(my_bootstrap_algo)
```
Contribution guide
For more detailed information on how to contribute, please refer to our CONTRIBUTING.md guide.
📄 License
This project is licensed under the ℹ️ MIT License. See the LICENSE file for additional info.
👏 Contributors
Thanks goes to these wonderful people:
This project follows the all-contributors specification. Contributions of any kind welcome!
📍 Time Series Bootstrapping
tsbootstrap is a comprehensive project designed to implement an array of bootstrapping techniques specifically tailored for time series data. This project is targeted towards data scientists, statisticians, economists, and other professionals or researchers who regularly work with time series data and require robust methods for generating bootstrapped copies of univariate and multivariate time series data.
Overview
Time series bootstrapping is a nuanced resampling method that is applied to time-dependent data. Traditional bootstrapping methods often assume independence between data points, which is an assumption that does not hold true for time series data where a data point is often dependent on previous data points. Time series bootstrapping techniques respect the chronological order and correlations of the data, providing more accurate estimates of uncertainty or variability.
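To make the point concrete, here is a small NumPy-only illustration (not part of tsbootstrap) that simulates an AR(1) series with coefficient 0.9 and compares its lag-1 autocorrelation after an i.i.d. resample versus after resampling blocks of 25 consecutive points. The series, the block length, and the helper `lag1_autocorr` are illustrative choices.

```python
import numpy as np


def lag1_autocorr(x: np.ndarray) -> float:
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


rng = np.random.default_rng(0)

# Simulate a strongly autocorrelated AR(1) series: x_t = 0.9 * x_{t-1} + e_t
n = 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()

# Classical (i.i.d.) bootstrap: draw n points independently, destroying time order
iid_sample = rng.choice(x, size=n, replace=True)

# Block resampling: draw whole blocks of consecutive points, keeping local order
block_length = 25
starts = rng.integers(0, n - block_length + 1, size=n // block_length)
block_sample = np.concatenate([x[s:s + block_length] for s in starts])

print(f"original:      {lag1_autocorr(x):.2f}")
print(f"iid bootstrap: {lag1_autocorr(iid_sample):.2f}")  # expected near 0
print(f"block sample:  {lag1_autocorr(block_sample):.2f}")  # expected near the original
```

The i.i.d. resample loses the serial dependence almost entirely, while the block resample keeps most of it; only the joins between blocks break the dependence.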
Bootstrapping Methodology
The tsbootstrap project offers a diverse set of bootstrapping techniques that can be applied to either the entire input time series (classes prefixed with Whole), or after partitioning the data into blocks (classes prefixed with Block). These methodologies can be applied directly to the raw input data or to the residuals obtained after fitting one of the five statistical models defined in time_series_model.py (classes with Residual in their names).
Block Bootstrap
Block Bootstrap is a prevalent approach in time series bootstrapping. It involves resampling blocks of consecutive data points, thus respecting the internal structures of the data. There are several techniques under Block Bootstrap, each with its unique approach. tsbootstrap provides highly flexible block bootstrapping, allowing the user to specify the block length sampling, block generation, and block resampling strategies. For additional details, refer to block_length_sampler.py, block_generator.py, and block_resampler.py.
The Moving Block Bootstrap, Circular Block Bootstrap, Stationary Block Bootstrap, and NonOverlapping Block Bootstrap methods are all variations of the Block Bootstrap that use different methods to sample the data, maintaining various types of dependencies.
Bartlett's, Blackman's, Hamming's, Hanning's, and Tukey's Bootstrap methods are specific implementations of the Block Bootstrap that use different window shapes to taper the data, reducing the influence of data points far from the center. In tsbootstrap, these methods inherit from MovingBlockBootstrap, but can easily be modified to inherit from any of the other three base block bootstrapping classes.
Each method comes with its distinct strengths and weaknesses. The choice of method should be based on the characteristics of the data and the specific requirements of the analysis.
(i) Moving Block Bootstrap
This method is implemented in MovingBlockBootstrap and is used for time series data where blocks of data are resampled to maintain the dependency structure within the blocks. It's useful when the data has dependencies that need to be preserved. It's not recommended when the data does not have any significant dependencies.
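The core resampling step can be sketched in a few lines of NumPy. The `moving_block_bootstrap` helper below is hypothetical and uses a fixed block length; it is not the `MovingBlockBootstrap` implementation, which additionally supports configurable block-length sampling, block generation, and block resampling.

```python
import numpy as np


def moving_block_bootstrap(x: np.ndarray, block_length: int, rng: np.random.Generator) -> np.ndarray:
    """Return one moving-block bootstrap replicate of a 1-D array ``x``."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_length))
    # Overlapping blocks: a block may start at any index in [0, n - block_length]
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_length) for s in starts])
    # Concatenated blocks can overshoot n, so trim back to the original length
    return x[idx[:n]]


rng = np.random.default_rng(42)
x = np.arange(10)
replicate = moving_block_bootstrap(x, block_length=3, rng=rng)
print(replicate)
```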
(ii) Circular Block Bootstrap
This method is implemented in CircularBlockBootstrap and treats the data as if it is circular (the end of the data is next to the beginning of the data). It's useful when the data is cyclical or seasonal in nature. It's not recommended when the data does not have a cyclical or seasonal component.
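The only change from the moving block sketch is index arithmetic modulo the series length, so a block that runs past the end continues from the beginning. The helper name `circular_block_bootstrap` is hypothetical, and this is a simplified illustration rather than the `CircularBlockBootstrap` code.

```python
import numpy as np


def circular_block_bootstrap(x, block_length, rng):
    """One circular-block bootstrap replicate: blocks wrap from the end back to the start."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_length))
    # Because blocks wrap around, every index is a valid block start
    starts = rng.integers(0, n, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_length)) % n
    return x[idx.ravel()[:n]]


rng = np.random.default_rng(0)
x = np.arange(12)
replicate = circular_block_bootstrap(x, block_length=4, rng=rng)
# Each row is one block; a value of 11 may be followed by 0 where the block wraps
print(replicate.reshape(3, 4))
```

Wrapping gives every observation the same chance of appearing in a block, which avoids under-sampling points near the ends of the series.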
(iii) Stationary Block Bootstrap
This method is implemented in StationaryBlockBootstrap and randomly resamples blocks of data with block lengths that follow a geometric distribution. It's useful for time series data where the degree of dependency needs to be preserved, and it doesn't require strict stationarity of the underlying process. It's not recommended when the data has strong seasonality or trend components which violate the weak dependence assumption.
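A minimal sketch of the stationary bootstrap, assuming circular wrap-around and block lengths drawn from a geometric distribution with mean `mean_block_length` (both the helper name and parameter are illustrative, not the `StationaryBlockBootstrap` API):

```python
import numpy as np


def stationary_bootstrap(x, mean_block_length, rng):
    """One stationary-bootstrap replicate with Geometric block lengths (circular wrap)."""
    n = len(x)
    p = 1.0 / mean_block_length  # Geometric(p) has mean 1 / p
    pieces, total = [], 0
    while total < n:
        start = rng.integers(0, n)
        length = rng.geometric(p)  # random block length, always >= 1
        pieces.append(x[(start + np.arange(length)) % n])
        total += length
    return np.concatenate(pieces)[:n]


rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, 6 * np.pi, 200))
replicate = stationary_bootstrap(x, mean_block_length=10, rng=rng)
```

Randomizing the block length is what makes the resampled series stationary, at the cost of a little extra variance compared with fixed-length blocks.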
(iv) NonOverlapping Block Bootstrap
This method is implemented in NonOverlappingBlockBootstrap and resamples blocks of data without overlap. It's useful when the data has dependencies that need to be preserved and when overfitting is a concern. It's not recommended when the data does not have any significant dependencies or when the introduction of bias due to non-overlapping selection is a concern.
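For contrast with the moving block sketch, the non-overlapping variant partitions the series once into disjoint blocks and then samples whole blocks with replacement. This hypothetical helper handles 1-D input only and drops any remainder shorter than a block; `NonOverlappingBlockBootstrap` may handle these details differently.

```python
import numpy as np


def non_overlapping_block_bootstrap(x, block_length, rng):
    """Resample disjoint, fixed blocks of a 1-D array with replacement."""
    n_blocks = len(x) // block_length
    # Partition into disjoint blocks; a trailing remainder shorter than a block is dropped
    blocks = x[: n_blocks * block_length].reshape(n_blocks, block_length)
    chosen = rng.integers(0, n_blocks, size=n_blocks)
    return blocks[chosen].ravel()


rng = np.random.default_rng(3)
x = np.arange(12)
replicate = non_overlapping_block_bootstrap(x, block_length=3, rng=rng)
print(replicate.reshape(-1, 3))  # every row is one of [0 1 2], [3 4 5], [6 7 8], [9 10 11]
```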
(v) Bartlett's Bootstrap
Bartlett's method is a time series bootstrap method that uses a window or filter that tapers off as you move away from the center of the window. It's useful when you have a large amount of data and you want to reduce the influence of the data points far away from the center. This method is not advised when the tapering of data points is not desired or when the dataset is small as the tapered data points might contain valuable information. It is implemented in BartlettsBootstrap.
(vi) Blackman Bootstrap
Similar to Bartlett's method, Blackman's method uses a window that tapers off as you move away from the center of the window. The key difference is the shape of the window (Blackman window has a different shape than Bartlett). It's useful when you want to reduce the influence of the data points far from the center with a different window shape. It's not recommended when the dataset is small or tapering of data points is not desired. It is implemented in BlackmanBootstrap.
(vii) Hamming Bootstrap
Similar to the Bartlett and Blackman methods, the Hamming method uses a specific type of window function. It's useful when you want to reduce the influence of the data points far from the center with the Hamming window shape. It's not recommended for small datasets or when tapering of data points is not desired. It is implemented in HammingBootstrap.
(viii) Hanning Bootstrap
This method also uses a specific type of window function. It's useful when you want to reduce the influence of the data points far from the center with the Hanning window shape. It's not recommended for small datasets or when tapering of data points is not desired. It is implemented in HanningBootstrap.
(ix) Tukey Bootstrap
Similar to the Bartlett, Blackman, Hamming, and Hanning methods, the Tukey method uses a specific type of window function. It's useful when you want to reduce the influence of the data points far from the center with the Tukey window shape. It's not recommended for small datasets or when tapering of data points is not desired. It is implemented in TukeyBootstrap.
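The five tapered methods differ mainly in the window used. NumPy ships Bartlett, Blackman, Hamming, and Hanning windows, and SciPy provides a Tukey window via `scipy.signal.windows.tukey`. The sketch below builds each window for a block of length 8 and applies it by elementwise multiplication; that weighting step, the block length, and `alpha=0.5` for Tukey are illustrative assumptions, not a description of how tsbootstrap applies its tapers internally.

```python
import numpy as np
from scipy.signal import windows

block_length = 8
tapers = {
    "bartlett": np.bartlett(block_length),
    "blackman": np.blackman(block_length),
    "hamming": np.hamming(block_length),
    "hanning": np.hanning(block_length),
    "tukey": windows.tukey(block_length, alpha=0.5),  # alpha sets the tapered fraction
}

block = np.full(block_length, 2.0)  # a toy block of constant values
for name, weights in tapers.items():
    # Multiplying by the window shrinks values near the block edges
    print(f"{name:<9}", np.round(block * weights, 2))
```

Printing the weighted blocks side by side shows the practical difference: Hamming keeps a small non-zero weight at the edges, Tukey stays flat over the middle of the block, and the others fall to (nearly) zero at the ends.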
Residual Bootstrap
Residual Bootstrap is a method designed for time series data where a model is fit to the data, and the residuals (the difference between the observed and predicted data) are bootstrapped. It's particularly useful when a good model fit is available for the data. However, it's not recommended when a model fit is not available or is poor. tsbootstrap provides time series models through its backend system, supporting AR, ARIMA, SARIMA, and VAR (for multivariate input time series data), as well as automatic model selection with AutoARIMA. For more details, refer to time_series_model.py and the backend system in backends/.
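A minimal residual bootstrap, assuming an AR(1) model without intercept fitted by ordinary least squares. The hypothetical helpers `fit_ar1` and `residual_bootstrap` stand in for tsbootstrap's model backends: fit the model, resample its centered residuals with replacement, and run the fitted recursion forward to build a new series.

```python
import numpy as np


def fit_ar1(x):
    """Least-squares AR(1) fit without intercept: x_t ~ phi * x_{t-1}."""
    phi = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
    residuals = x[1:] - phi * x[:-1]
    return phi, residuals


def residual_bootstrap(x, rng):
    phi, residuals = fit_ar1(x)
    # Resample centered residuals with replacement (i.i.d.)
    resampled = rng.choice(residuals - residuals.mean(), size=len(x) - 1, replace=True)
    # Rebuild a series by running the fitted model forward from the first observed value
    out = np.empty(len(x), dtype=float)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = phi * out[t - 1] + resampled[t - 1]
    return out


rng = np.random.default_rng(7)
n = 1000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()

boot = residual_bootstrap(x, rng)
print(round(fit_ar1(x)[0], 2), round(fit_ar1(boot)[0], 2))
```

Because the dependence is carried by the fitted model rather than by blocks, the quality of the replicates hinges on how well that model fits, which is the caveat stated above.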
Statistic-Preserving Bootstrap
Statistic-Preserving Bootstrap is a unique method designed to generate bootstrapped time series data while preserving a specific statistic of the original data. This method can be beneficial in scenarios where it's important to maintain the original data's characteristics in the bootstrapped samples. It is implemented in StatisticPreservingBootstrap.
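As one hypothetical example of the idea, the sketch below preserves the mean and standard deviation exactly: it draws a moving-block resample and then applies an affine rescaling so both statistics match the original series. Which statistics `StatisticPreservingBootstrap` supports and how it adjusts the samples may differ; this only illustrates the principle.

```python
import numpy as np


def mean_std_preserving_bootstrap(x, block_length, rng):
    """Block-resample x, then rescale so the replicate's mean and std equal those of x."""
    n = len(x)
    starts = rng.integers(0, n - block_length + 1, size=int(np.ceil(n / block_length)))
    sample = np.concatenate([x[s:s + block_length] for s in starts])[:n]
    # Affine adjustment: standardize the resample, then map it onto x's mean and std
    standardized = (sample - sample.mean()) / sample.std()
    return standardized * x.std() + x.mean()


rng = np.random.default_rng(5)
x = np.cumsum(rng.standard_normal(300))  # a random-walk style series
replicate = mean_std_preserving_bootstrap(x, block_length=20, rng=rng)
```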
Distribution Bootstrap
Distribution Bootstrap generates bootstrapped samples by fitting a distribution to the residuals and then generating new residuals from the fitted distribution. The new residuals are then added to the fitted values to create the bootstrapped samples. This method is based on the assumption that the residuals follow a specific distribution (like Gaussian, Poisson, etc). It's not recommended when the distribution of residuals is unknown or hard to determine. It is implemented in DistributionBootstrap.
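The steps just described can be sketched with NumPy, assuming an AR(1) model without intercept and a Gaussian residual distribution fitted by maximum likelihood (sample mean and standard deviation). The helper `distribution_bootstrap` is hypothetical and much simpler than `DistributionBootstrap`.

```python
import numpy as np


def distribution_bootstrap(x, rng):
    # Fit AR(1) by least squares (no intercept) to get fitted values and residuals
    phi = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
    fitted = phi * x[:-1]
    residuals = x[1:] - fitted
    # Fit a Gaussian to the residuals (maximum-likelihood mean and std)
    mu, sigma = residuals.mean(), residuals.std()
    # Draw fresh residuals from the fitted distribution, not from the observed ones
    new_residuals = rng.normal(mu, sigma, size=len(residuals))
    # Add the new residuals to the fitted values; keep the first observation as-is
    return np.concatenate([x[:1], fitted + new_residuals])


rng = np.random.default_rng(11)
x = np.zeros(400)
for t in range(1, 400):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
replicate = distribution_bootstrap(x, rng)
```

Unlike the residual bootstrap, new residual values can fall outside the observed ones, which is useful when the assumed distribution is right and misleading when it is not.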
Markov Bootstrap
Markov Bootstrap is used for bootstrapping time series data where the residuals of the data are presumed to follow a Markov process. This method is especially useful in scenarios where the current residual primarily depends on the previous one, with little to no dependency on residuals from further in the past. Markov Bootstrap technique is designed to preserve this dependency structure in the bootstrapped samples, making it particularly valuable for time series data that exhibits Markov properties. However, it's not advisable when the residuals of the time series data exhibit long-range dependencies, as the Markov assumption of limited dependency may not hold true. It is implemented in MarkovBootstrap. See markov_sampler.py for implementation details.
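A simple stand-in for the idea, assuming residuals are discretized into quantile bins that act as the states of a first-order Markov chain. This is a swapped-in, simplified technique: the library lists hmmlearn among its dependencies, so the state model in markov_sampler.py may well differ. The helper `markov_residual_bootstrap` is hypothetical.

```python
import numpy as np


def markov_residual_bootstrap(residuals, n_states, rng):
    """Resample residuals along a first-order Markov chain over quantile-bin states."""
    n = len(residuals)
    # 1. Discretize residuals into n_states quantile bins; the bin index is the state
    edges = np.quantile(residuals, np.linspace(0, 1, n_states + 1))
    states = np.clip(np.searchsorted(edges, residuals, side="right") - 1, 0, n_states - 1)

    # 2. Estimate the transition matrix from consecutive state pairs
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (states[:-1], states[1:]), 1)
    counts += 1e-9  # tiny pseudo-count so no row is all zeros
    transition = counts / counts.sum(axis=1, keepdims=True)

    # 3. Simulate a state path, then draw each residual from its state's bin
    pools = [residuals[states == s] for s in range(n_states)]
    path = np.empty(n, dtype=int)
    path[0] = states[rng.integers(n)]
    for t in range(1, n):
        path[t] = rng.choice(n_states, p=transition[path[t - 1]])
    return np.array([rng.choice(pools[s]) for s in path])


rng = np.random.default_rng(2)
residuals = rng.standard_normal(500)
replicate = markov_residual_bootstrap(residuals, n_states=4, rng=rng)
```

Only the previous state influences the next one, which is exactly the limited-memory assumption described above.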
Sieve Bootstrap
Sieve Bootstrap is designed for handling dependent data, where the residuals of the time series data follow an autoregressive process. This method aims to preserve and simulate the dependencies inherent in the original data within the bootstrapped samples. It operates by approximating the autoregressive process of the residuals using a finite-order autoregressive model. The order of the model is determined from the data, and the residuals are then bootstrapped. The Sieve Bootstrap technique is particularly valuable for time series data that exhibits autoregressive properties. However, it's not advisable when the residuals of the time series data do not follow an autoregressive process. It is implemented in SieveBootstrap. See time_series_simulator.py for implementation details.
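A sketch of the sieve idea, assuming a least-squares AR(p) fit with intercept and order selection by AIC, with every candidate order fitted on the same target sample so the AIC values are comparable. The helpers `fit_ar` and `sieve_bootstrap` are hypothetical; tsbootstrap's own order selection (see best_lag.py) may use a different criterion. The input `x` can be the residual series from an earlier model fit.

```python
import numpy as np


def fit_ar(x, p):
    """Least-squares AR(p) fit with intercept; returns (coefs, residuals)."""
    n = len(x)
    # Design matrix: a column of ones, then lags 1..p
    X = np.column_stack([np.ones(n - p)] + [x[p - k:n - k] for k in range(1, p + 1)])
    y = x[p:]
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs, y - X @ coefs


def sieve_bootstrap(x, max_p, rng):
    n = len(x)
    # Choose the AR order by AIC, fitting every candidate on the same target sample
    best = None
    for p in range(1, max_p + 1):
        coefs, resid = fit_ar(x[max_p - p:], p)
        aic = len(resid) * np.log(resid.var()) + 2 * (p + 1)
        if best is None or aic < best[0]:
            best = (aic, p, coefs, resid)
    _, p, coefs, resid = best
    # Resample centered innovations i.i.d. and run the AR(p) recursion forward
    innovations = rng.choice(resid - resid.mean(), size=n, replace=True)
    out = np.empty(n)
    out[:p] = x[:p]  # initialize with the first p observed values
    for t in range(p, n):
        out[t] = coefs[0] + np.dot(coefs[1:], out[t - p:t][::-1]) + innovations[t]
    return out, p


rng = np.random.default_rng(4)
n = 600
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()
replicate, order = sieve_bootstrap(x, max_p=8, rng=rng)
print("selected AR order:", order)
```

The "sieve" is the growing family of AR(p) approximations: letting the data pick `p` is what allows the method to adapt to dependence of unknown form.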
Owner
- Name: Sankalp Gilda
- Login: astrogilda
- Kind: user
- Location: Gainesville, FL
- Website: www.linkedin.com/in/sankalp-gilda/
- Twitter: astrogilda
- Repositories: 141
- Profile: https://github.com/astrogilda
Machine Learning Engineer | Ph.D., Astronomy
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Gilda"
    given-names: "Sankalp"
    orcid: "https://orcid.org/0000-0002-3645-4501"
title: "tsbootstrap"
version: 0.1.5
doi: 10.5281/zenodo.8226495
date-released: 2024/04/23
url: "https://github.com/astrogilda/tsbootstrap"
```
GitHub Events
Total
- Create event: 10
- Issues event: 23
- Release event: 1
- Watch event: 11
- Delete event: 76
- Issue comment event: 25
- Push event: 115
- Pull request review event: 2
- Pull request event: 16
- Fork event: 1
Last Year
- Create event: 10
- Issues event: 23
- Release event: 1
- Watch event: 11
- Delete event: 76
- Issue comment event: 25
- Push event: 115
- Pull request review event: 2
- Pull request event: 16
- Fork event: 1
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 67
- Total pull requests: 111
- Average time to close issues: 16 days
- Average time to close pull requests: 6 days
- Total issue authors: 5
- Total pull request authors: 4
- Average comments per issue: 0.75
- Average comments per pull request: 0.98
- Merged pull requests: 93
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 7
- Pull requests: 16
- Average time to close issues: about 3 hours
- Average time to close pull requests: 6 days
- Issue authors: 3
- Pull request authors: 2
- Average comments per issue: 0.14
- Average comments per pull request: 0.0
- Merged pull requests: 14
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- astrogilda (44)
- fkiraly (22)
- benHeid (3)
- oldrichsmejkal (1)
- greenguy33 (1)
Pull Request Authors
- astrogilda (124)
- fkiraly (69)
- benHeid (9)
- dependabot[bot] (6)
Dependencies
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- actions/upload-artifact v2 composite
- codecov/codecov-action v3 composite
- docker://pandoc/core 3.1 composite
- jwalton/gh-find-current-pr v1 composite
- marocchino/sticky-pull-request-comment v2 composite
- wagoid/commitlint-github-action v4 composite
- alabaster 0.7.13
- arch 5.6.0
- attrs 23.1.0
- babel 2.12.1
- black 23.7.0
- blacken-docs 1.15.0
- build 0.10.0
- cachetools 5.3.1
- certifi 2023.7.22
- cfgv 3.3.1
- chardet 5.2.0
- charset-normalizer 3.2.0
- click 8.1.6
- colorama 0.4.6
- contourpy 1.1.0
- coverage 7.2.7
- cycler 0.11.0
- cython 3.0.0
- distlib 0.3.7
- docutils 0.18.1
- exceptiongroup 1.1.2
- filelock 3.12.2
- fonttools 4.42.0
- github-actions 0.0.1
- hmmlearn 0.3.0
- hypothesis 6.82.3
- identify 2.5.26
- idna 3.4
- imagesize 1.4.1
- importlib-metadata 6.8.0
- importlib-resources 6.0.1
- iniconfig 2.0.0
- jinja2 3.1.2
- joblib 1.3.1
- kiwisolver 1.4.4
- llvmlite 0.40.1
- lxml 4.9.3
- markupsafe 2.1.3
- matplotlib 3.7.2
- mypy-extensions 1.0.0
- nodeenv 1.8.0
- numba 0.57.1
- numpy 1.24.4
- packaging 23.1
- pandas 2.0.3
- pathspec 0.11.2
- patsy 0.5.3
- pillow 10.0.0
- pip 23.2.1
- pip-tools 6.13.0
- platformdirs 3.10.0
- pluggy 1.2.0
- pre-commit 3.3.3
- property-cached 1.6.4
- pyclustering 0.10.1.2
- pycobertura 3.2.1
- pygments 2.16.1
- pyparsing 3.0.9
- pyproject-api 1.5.3
- pyproject-hooks 1.0.0
- pyright 1.1.320
- pytest 7.4.0
- pytest-cov 4.1.0
- python-dateutil 2.8.2
- pytz 2023.3
- pyyaml 6.0.1
- requests 2.31.0
- ruamel-yaml 0.17.32
- ruamel-yaml-clib 0.2.7
- ruff 0.0.283
- scikit-learn 1.3.0
- scikit-learn-extra 0.3.0
- scipy 1.9.3
- setuptools 68.0.0
- six 1.16.0
- snowballstemmer 2.2.0
- sortedcontainers 2.4.0
- sphinx 7.1.2
- sphinx-rtd-theme 1.3.0rc1
- sphinxcontrib-applehelp 1.0.4
- sphinxcontrib-devhelp 1.0.2
- sphinxcontrib-htmlhelp 2.0.1
- sphinxcontrib-jquery 4.1
- sphinxcontrib-jsmath 1.0.1
- sphinxcontrib-qthelp 1.0.3
- sphinxcontrib-serializinghtml 1.1.5
- statsmodels 0.14.0
- tabulate 0.9.0
- threadpoolctl 3.2.0
- tomli 2.0.1
- tox 4.6.4
- tox-gh-actions 3.1.3
- typing-extensions 4.7.1
- typos 1.16.2
- tzdata 2023.3
- urllib3 2.0.4
- virtualenv 20.24.2
- wheel 0.41.1
- zipp 3.16.2
- black ~23.7 develop
- blacken-docs ~1.15 develop
- github-actions ~0.0 develop
- hypothesis ~6.82 develop
- pip-tools ~6.13 develop
- pre-commit ~3.3 develop
- pycobertura ~3.2 develop
- pyright ~1.1 develop
- pytest ~7.4 develop
- pytest-cov ~4.1 develop
- ruff ~0.0 develop
- tox ~4.6 develop
- tox-gh-actions ~3.1 develop
- typos ~1.16 develop
- arch ~5.6
- cython ~3.0
- hmmlearn ~0.3
- importlib-metadata ~6.8
- numba ~0.57
- pyclustering ~0.10
- python ^3.8
- scikit_learn_extra ~0.3