theft
R package for Tools for Handling Extraction of Features from Time series (theft)
Science Score: 59.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 1 DOI reference(s) in README -
✓Academic publication links
Links to: arxiv.org, pubmed.ncbi, ncbi.nlm.nih.gov, springer.com, ieee.org, zenodo.org -
✓Committers with academic emails
1 of 6 committers (16.7%) from academic institutions -
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (17.8%) to scientific vocabulary
Keywords
feature-extraction
machine-learning
r
time-series
Last synced: 6 months ago
·
JSON representation
Repository
R package for Tools for Handling Extraction of Features from Time series (theft)
Basic Info
- Host: GitHub
- Owner: hendersontrent
- License: other
- Language: R
- Default Branch: main
- Homepage: https://hendersontrent.github.io/theft/
- Size: 58 MB
Statistics
- Stars: 40
- Watchers: 1
- Forks: 6
- Open Issues: 7
- Releases: 58
Topics
feature-extraction
machine-learning
r
time-series
Created almost 5 years ago
· Last pushed 7 months ago
Metadata Files
Readme
License
README.Rmd
--- output: rmarkdown::github_document --- # theft[](https://www.r-pkg.org/pkg/theft) [](https://www.r-pkg.org/pkg/theft) [](https://zenodo.org/badge/latestdoi/351259952) Tools for Handling Extraction of Features from Time series (theft) ```{r, include = FALSE} knitr::opts_chunk$set( comment = NA, fig.width = 12, fig.height = 8, cache = FALSE) ``` ## Installation You can install the stable version of `theft` from CRAN: ```{r eval = FALSE} install.packages("theft") ``` You can install the development version of `theft` from GitHub using the following: ```{r eval = FALSE} devtools::install_github("hendersontrent/theft") ``` Please also check out our paper [Feature-Based Time-Series Analysis in R using the theft Package](https://arxiv.org/abs/2208.06146) which discusses the motivation and theoretical underpinnings of `theft` and walks through all of its functionality using the [Bonn EEG dataset](https://pubmed.ncbi.nlm.nih.gov/11736210/) --- a well-studied neuroscience dataset. ## General purpose `theft` is a software package for R that facilitates user-friendly access to a consistent interface for the extraction of time-series features. The package provides a single point of access to $>1100$ time-series features from a range of existing R and Python packages as well as enabling users to calculate their own features. The packages which `theft` 'steals' features from currently are: * [catch22](https://link.springer.com/article/10.1007/s10618-019-00647-x) (R; [see `Rcatch22` for the native implementation on CRAN](https://github.com/hendersontrent/Rcatch22)) * [feasts](https://feasts.tidyverts.org) (R) * [tsfeatures](https://github.com/robjhyndman/tsfeatures) (R) * [Kats](https://facebookresearch.github.io/Kats/) (Python) * [tsfresh](https://tsfresh.com) (Python) * [TSFEL](https://tsfel.readthedocs.io/en/latest/) (Python) As of `v0.6.1`, users can also calculate their own individual features or sets of features too! In addition, two basic feature sets `"quantiles"` (a set of 100 quantiles) and `"moments"` (the first four moments of the distribution: mean, variance, skewness, and kurtosis) are also available for users seeking to compute simple baselines against which to compare the more sophisticated feature sets (see [this recent paper](https://arxiv.org/abs/2303.17809) for more discussion on this idea). Note that `Kats`, `tsfresh` and `TSFEL` are Python packages. `theft` has built-in functionality for helping you install these libraries---all you need to do is install Python on your machine (preferably Python >=3.10). If you wish to access the Python feature sets, please run `?install_python_pkgs` in R after downloading `theft` or consult the vignette in the package for more information. For a comprehensive comparison of these six feature sets across a range of domains (including computation speed, within-set feature composition, and between-set feature correlations), please refer to the paper [An Empirical Evaluation of Time-Series Feature Sets](https://ieeexplore.ieee.org/document/9679937). Also note that as of `v0.8.2` parallelisation is supported for `"tsfresh"` and `"tsfel"` (see the vignette for more information)! ## Package extensibility The companion package [`theftdlc`](https://github.com/hendersontrent/theftdlc) ('`theft` downloadable content'---just like you get [DLCs and expansions](https://en.bandainamcoent.eu/elden-ring/elden-ring/shadow-of-the-erdtree) for video games) contains an extensive suite of functions for analysing, interpreting, and visualising time-series features calculated from `theft`. Collectively, these packages are referred to as the '`theft` ecosystem'.
A high-level overview of how the `theft` ecosystem for R is typically accessed by users is shown below. Note that prior to `v0.6.1` of, many of the `theftdlc` functions were contained in `theft` but under other names. To ensure the `theft` ecosystem is as user-friendly as possible and can scale to meet future demands, `theft` has been refactored to just perform feature extraction, while `theftdlc` handles all the processing, analysis, and visualisation of the extracted features.
Many more functions and options for customisation are available within the packages and users are encouraged to explore the vignettes and helper files for more information. ## Quick tour `theft` and `theftdlc` combine to create an intuitive and efficient workflow consistent with the broader [`tidyverts`](https://tidyverts.org) collection of packages for tidy time-series analysis. Here is a single code chunk that calculates features for a [`tsibble`](https://tsibble.tidyverts.org) (tidy temporal data frame) of some simulated time series processes, including Gaussian noise, AR(1), ARMA(1,1), MA(1), noisy sinusoid, and a random walk. `simData` comes with `theft`. We'll just use the [`catch22`](https://github.com/hendersontrent/Rcatch22) feature set and a custom set of mean and standard deviation for now. Using tidy principles and pipes, we can, in the same code chunk, feed the calculated features straight into `theftdlc`'s `project` function to project the 24-dimensional feature space into an interpretable two-dimensional space using principal components analysis: ```{r, message = FALSE, warning = FALSE, fig.height=6, fig.width=6} library(dplyr) library(theft) library(theftdlc) calculate_features(data = theft::simData, feature_set = "catch22", features = list("mean" = mean, "sd" = sd)) %>% project(norm_method = "RobustSigmoid", unit_int = TRUE, low_dim_method = "PCA") %>% plot() ``` In that example, `calculate_features` comes from `theft`, while `project` and the `plot` generic come from `theftdlc`. Similarly, we can perform time-series classification using a similar workflow to compare the performance of `catch22` against our custom set of the first two moments of the distribution: ```{r, message = FALSE, warning = FALSE} calculate_features(data = theft::simData, feature_set = "catch22", features = list("mean" = mean, "sd" = sd)) %>% classify(by_set = TRUE, n_resamples = 10, use_null = TRUE) %>% compare_features(by_set = TRUE, hypothesis = "pairwise") %>% head() ``` In this example, `classify` and `compare_features` come from `theftdlc`. We can also easily see how each set performs relative to an empirical null distribution (i.e., how much better does each set do than we would expect due to chance?): ```{r, message = FALSE, warning = FALSE} calculate_features(data = theft::simData, feature_set = "catch22", features = list("mean" = mean, "sd" = sd)) %>% classify(by_set = TRUE, n_resamples = 10, use_null = TRUE) %>% compare_features(by_set = TRUE, hypothesis = "null") %>% head() ``` Please see the vignette for more information and the full functionality of both packages. ## Citation If you use `theft` or `theftdlc` in your own work, please cite both the paper: T. Henderson and Ben D. Fulcher. [Feature-Based Time-Series Analysis in R using the theft Package](https://arxiv.org/abs/2208.06146). arXiv, (2022). and the software: ```{r, echo = FALSE} citation("theft") citation("theftdlc") ``` ## Acknowledgements Big thanks to [Joshua Moore](https://github.com/joshuabmoore) for his assistance in solving issues with the Python side of things, including the correct specification of dependencies for the `install_python_pkgs` function.
Owner
- Name: Trent Henderson
- Login: hendersontrent
- Kind: user
- Location: Canberra, Australia
- Company: Nous Group
- Website: https://www.orbisantanalytics.com/
- Twitter: trentlikesstats
- Repositories: 29
- Profile: https://github.com/hendersontrent
Senior data scientist and statistics PhD student. Mostly coding in R, Julia, and Stan. Interested in genetic programming, time series, and data vis
GitHub Events
Total
- Create event: 3
- Release event: 1
- Issues event: 6
- Watch event: 2
- Push event: 16
- Pull request event: 2
- Fork event: 2
Last Year
- Create event: 3
- Release event: 1
- Issues event: 6
- Watch event: 2
- Push event: 16
- Pull request event: 2
- Fork event: 2
Committers
Last synced: 9 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| Trent Henderson | t****1@o****m | 330 |
| Trent Henderson | t****n@T****l | 91 |
| Trent Henderson | t****n@v****u | 23 |
| Annie G. Bryant | a****t@g****m | 8 |
| Cumol | m****y@g****m | 2 |
| Trent Henderson | t****n@n****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 54
- Total pull requests: 76
- Average time to close issues: 2 months
- Average time to close pull requests: 1 day
- Total issue authors: 7
- Total pull request authors: 3
- Average comments per issue: 0.57
- Average comments per pull request: 0.14
- Merged pull requests: 74
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 3
- Average time to close issues: 6 days
- Average time to close pull requests: 10 days
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- hendersontrent (44)
- MislavSag (5)
- teunbrand (1)
- Steviey (1)
- IssyMiddleton (1)
- windwine (1)
- Cumol (1)
Pull Request Authors
- hendersontrent (78)
- anniegbryant (1)
- Cumol (1)
Top Labels
Issue Labels
enhancement (17)
question (12)
bug (11)
documentation (2)
not-urgent (2)
CRAN (2)
breaking-change (1)
Pull Request Labels
documentation (19)
enhancement (16)
CRAN (11)
breaking-change (7)
Packages
- Total packages: 2
-
Total downloads:
- cran 504 last-month
- Total docker downloads: 41,971
-
Total dependent packages: 0
(may contain duplicates) -
Total dependent repositories: 0
(may contain duplicates) - Total versions: 63
- Total maintainers: 1
proxy.golang.org: github.com/hendersontrent/theft
- Documentation: https://pkg.go.dev/github.com/hendersontrent/theft#section-documentation
- License: other
-
Latest release: v0.8.1
published 8 months ago
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.8%
Last synced:
7 months ago
cran.r-project.org: theft
Tools for Handling Extraction of Features from Time Series
- Homepage: https://hendersontrent.github.io/theft/
- Documentation: http://cran.r-project.org/web/packages/theft/theft.pdf
- License: MIT + file LICENSE
-
Latest release: 0.8.2
published 7 months ago
Rankings
Stargazers count: 12.2%
Forks count: 12.8%
Average: 23.8%
Downloads: 28.6%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Maintainers (1)
Last synced:
6 months ago
Dependencies
.github/workflows/pkgdown.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite
- r-lib/actions/setup-r-dependencies v1 composite
DESCRIPTION
cran
- R >= 3.5.0 depends
- R.matlab * imports
- Rcatch22 * imports
- Rtsne * imports
- broom * imports
- caret * imports
- dplyr * imports
- fabletools * imports
- feasts * imports
- ggplot2 * imports
- janitor * imports
- plotly * imports
- purrr * imports
- reshape2 * imports
- reticulate * imports
- rlang * imports
- scales * imports
- stats * imports
- tibble * imports
- tidyr * imports
- tsfeatures * imports
- tsibble * imports
- bslib * suggests
- cachem * suggests
- knitr * suggests
- lifecycle * suggests
- markdown * suggests
- pkgdown * suggests
- rmarkdown * suggests
- testthat * suggests
[](https://www.r-pkg.org/pkg/theft)
[](https://www.r-pkg.org/pkg/theft)
[](https://zenodo.org/badge/latestdoi/351259952)
Tools for Handling Extraction of Features from Time series (theft)
```{r, include = FALSE}
knitr::opts_chunk$set(
comment = NA, fig.width = 12, fig.height = 8, cache = FALSE)
```
## Installation
You can install the stable version of `theft` from CRAN:
```{r eval = FALSE}
install.packages("theft")
```
You can install the development version of `theft` from GitHub using the following:
```{r eval = FALSE}
devtools::install_github("hendersontrent/theft")
```
Please also check out our paper [Feature-Based Time-Series Analysis in R using the theft Package](https://arxiv.org/abs/2208.06146) which discusses the motivation and theoretical underpinnings of `theft` and walks through all of its functionality using the [Bonn EEG dataset](https://pubmed.ncbi.nlm.nih.gov/11736210/) --- a well-studied neuroscience dataset.
## General purpose
`theft` is a software package for R that facilitates user-friendly access to a consistent interface for the extraction of time-series features. The package provides a single point of access to $>1100$ time-series features from a range of existing R and Python packages as well as enabling users to calculate their own features. The packages which `theft` 'steals' features from currently are:
* [catch22](https://link.springer.com/article/10.1007/s10618-019-00647-x) (R; [see `Rcatch22` for the native implementation on CRAN](https://github.com/hendersontrent/Rcatch22))
* [feasts](https://feasts.tidyverts.org) (R)
* [tsfeatures](https://github.com/robjhyndman/tsfeatures) (R)
* [Kats](https://facebookresearch.github.io/Kats/) (Python)
* [tsfresh](https://tsfresh.com) (Python)
* [TSFEL](https://tsfel.readthedocs.io/en/latest/) (Python)
As of `v0.6.1`, users can also calculate their own individual features or sets of features too! In addition, two basic feature sets `"quantiles"` (a set of 100 quantiles) and `"moments"` (the first four moments of the distribution: mean, variance, skewness, and kurtosis) are also available for users seeking to compute simple baselines against which to compare the more sophisticated feature sets (see [this recent paper](https://arxiv.org/abs/2303.17809) for more discussion on this idea).
Note that `Kats`, `tsfresh` and `TSFEL` are Python packages. `theft` has built-in functionality for helping you install these libraries---all you need to do is install Python on your machine (preferably Python >=3.10). If you wish to access the Python feature sets, please run `?install_python_pkgs` in R after downloading `theft` or consult the vignette in the package for more information. For a comprehensive comparison of these six feature sets across a range of domains (including computation speed, within-set feature composition, and between-set feature correlations), please refer to the paper [An Empirical Evaluation of Time-Series Feature Sets](https://ieeexplore.ieee.org/document/9679937).
Also note that as of `v0.8.2` parallelisation is supported for `"tsfresh"` and `"tsfel"` (see the vignette for more information)!
## Package extensibility
The companion package [`theftdlc`](https://github.com/hendersontrent/theftdlc) ('`theft` downloadable content'---just like you get [DLCs and expansions](https://en.bandainamcoent.eu/elden-ring/elden-ring/shadow-of-the-erdtree) for video games) contains an extensive suite of functions for analysing, interpreting, and visualising time-series features calculated from `theft`. Collectively, these packages are referred to as the '`theft` ecosystem'.
A high-level overview of how the `theft` ecosystem for R is typically accessed by users is shown below. Note that prior to `v0.6.1` of, many of the `theftdlc` functions were contained in `theft` but under other names. To ensure the `theft` ecosystem is as user-friendly as possible and can scale to meet future demands, `theft` has been refactored to just perform feature extraction, while `theftdlc` handles all the processing, analysis, and visualisation of the extracted features.
Many more functions and options for customisation are available within the packages and users are encouraged to explore the vignettes and helper files for more information.
## Quick tour
`theft` and `theftdlc` combine to create an intuitive and efficient workflow consistent with the broader [`tidyverts`](https://tidyverts.org) collection of packages for tidy time-series analysis. Here is a single code chunk that calculates features for a [`tsibble`](https://tsibble.tidyverts.org) (tidy temporal data frame) of some simulated time series processes, including Gaussian noise, AR(1), ARMA(1,1), MA(1), noisy sinusoid, and a random walk. `simData` comes with `theft`. We'll just use the [`catch22`](https://github.com/hendersontrent/Rcatch22) feature set and a custom set of mean and standard deviation for now. Using tidy principles and pipes, we can, in the same code chunk, feed the calculated features straight into `theftdlc`'s `project` function to project the 24-dimensional feature space into an interpretable two-dimensional space using principal components analysis:
```{r, message = FALSE, warning = FALSE, fig.height=6, fig.width=6}
library(dplyr)
library(theft)
library(theftdlc)
calculate_features(data = theft::simData,
feature_set = "catch22",
features = list("mean" = mean, "sd" = sd)) %>%
project(norm_method = "RobustSigmoid",
unit_int = TRUE,
low_dim_method = "PCA") %>%
plot()
```
In that example, `calculate_features` comes from `theft`, while `project` and the `plot` generic come from `theftdlc`.
Similarly, we can perform time-series classification using a similar workflow to compare the performance of `catch22` against our custom set of the first two moments of the distribution:
```{r, message = FALSE, warning = FALSE}
calculate_features(data = theft::simData,
feature_set = "catch22",
features = list("mean" = mean, "sd" = sd)) %>%
classify(by_set = TRUE,
n_resamples = 10,
use_null = TRUE) %>%
compare_features(by_set = TRUE,
hypothesis = "pairwise") %>%
head()
```
In this example, `classify` and `compare_features` come from `theftdlc`.
We can also easily see how each set performs relative to an empirical null distribution (i.e., how much better does each set do than we would expect due to chance?):
```{r, message = FALSE, warning = FALSE}
calculate_features(data = theft::simData,
feature_set = "catch22",
features = list("mean" = mean, "sd" = sd)) %>%
classify(by_set = TRUE,
n_resamples = 10,
use_null = TRUE) %>%
compare_features(by_set = TRUE,
hypothesis = "null") %>%
head()
```
Please see the vignette for more information and the full functionality of both packages.
## Citation
If you use `theft` or `theftdlc` in your own work, please cite both the paper:
T. Henderson and Ben D. Fulcher. [Feature-Based Time-Series Analysis in R using the theft Package](https://arxiv.org/abs/2208.06146). arXiv, (2022).
and the software:
```{r, echo = FALSE}
citation("theft")
citation("theftdlc")
```
## Acknowledgements
Big thanks to [Joshua Moore](https://github.com/joshuabmoore) for his assistance in solving issues with the Python side of things, including the correct specification of dependencies for the `install_python_pkgs` function.