theft

R package for Tools for Handling Extraction of Features from Time series (theft)

https://github.com/hendersontrent/theft

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, pubmed.ncbi, ncbi.nlm.nih.gov, springer.com, ieee.org, zenodo.org
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.8%) to scientific vocabulary

Keywords

feature-extraction machine-learning r time-series
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 40
  • Watchers: 1
  • Forks: 6
  • Open Issues: 7
  • Releases: 58
Created almost 5 years ago · Last pushed 7 months ago
Metadata Files
Readme License

README.Rmd

---
output: rmarkdown::github_document
---

# theft 

[![CRAN version](https://www.r-pkg.org/badges/version/theft)](https://www.r-pkg.org/pkg/theft)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/theft)](https://www.r-pkg.org/pkg/theft)
[![DOI](https://zenodo.org/badge/351259952.svg)](https://zenodo.org/badge/latestdoi/351259952)

Tools for Handling Extraction of Features from Time series (theft)

```{r, include = FALSE}
knitr::opts_chunk$set(
comment = NA, fig.width = 12, fig.height = 8, cache = FALSE)
```

## Installation

You can install the stable version of `theft` from CRAN:

```{r eval = FALSE}
install.packages("theft")
```

You can install the development version of `theft` from GitHub using the following:

```{r eval = FALSE}
devtools::install_github("hendersontrent/theft")
```

Please also check out our paper [Feature-Based Time-Series Analysis in R using the theft Package](https://arxiv.org/abs/2208.06146) which discusses the motivation and theoretical underpinnings of `theft` and walks through all of its functionality using the [Bonn EEG dataset](https://pubmed.ncbi.nlm.nih.gov/11736210/) --- a well-studied neuroscience dataset.

## General purpose

`theft` is a software package for R that facilitates user-friendly access to a consistent interface for the extraction of time-series features. The package provides a single point of access to $>1100$ time-series features from a range of existing R and Python packages as well as enabling users to calculate their own features. The packages which `theft` 'steals' features from currently are:

* [catch22](https://link.springer.com/article/10.1007/s10618-019-00647-x) (R; [see `Rcatch22` for the native implementation on CRAN](https://github.com/hendersontrent/Rcatch22))
* [feasts](https://feasts.tidyverts.org) (R)
* [tsfeatures](https://github.com/robjhyndman/tsfeatures) (R)
* [Kats](https://facebookresearch.github.io/Kats/) (Python)
* [tsfresh](https://tsfresh.com) (Python)
* [TSFEL](https://tsfel.readthedocs.io/en/latest/) (Python)

As of `v0.6.1`, users can also calculate their own individual features or sets of features! In addition, two basic feature sets, `"quantiles"` (a set of 100 quantiles) and `"moments"` (the first four moments of the distribution: mean, variance, skewness, and kurtosis), are available for users seeking simple baselines against which to compare the more sophisticated feature sets (see [this recent paper](https://arxiv.org/abs/2303.17809) for more discussion on this idea).
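
To make the two baseline sets concrete, here is a minimal base-R sketch of what "the first four moments" and "a set of 100 quantiles" could look like for a single time series. The helpers `moment_features` and `quantile_features` are hypothetical names used only for illustration; they are not `theft` functions, and the probability grid (0.01 to 1.00) and the choice of population-standardised skewness and kurtosis are assumptions rather than a description of how `theft` computes its `"moments"` and `"quantiles"` sets.

```r
# Hypothetical helpers for illustration only (not part of theft).

# First four moments: mean, variance, skewness, and kurtosis.
# Skewness and kurtosis are standardised with the population variance
# (an assumption; other conventions exist).
moment_features <- function(x) {
  m  <- mean(x)
  d  <- x - m
  s2 <- mean(d^2)  # population variance, used only for standardising
  c(mean     = m,
    variance = stats::var(x),  # sample variance (n - 1 denominator)
    skewness = mean(d^3) / s2^1.5,
    kurtosis = mean(d^4) / s2^2)
}

# 100 quantiles at probabilities 0.01, 0.02, ..., 1.00 (an assumed grid).
quantile_features <- function(x) {
  probs <- (1:100) / 100
  stats::quantile(x, probs = probs, names = FALSE)
}

set.seed(123)
x <- rnorm(200)  # toy series standing in for real data

moments   <- moment_features(x)
quantiles <- quantile_features(x)
```

Functions like these take a numeric vector and return numeric values, which is the same shape as the `mean` and `sd` functions passed through the `features` argument in the quick tour further down.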

Note that `Kats`, `tsfresh` and `TSFEL` are Python packages. `theft` has built-in functionality for helping you install these libraries---all you need to do is install Python on your machine (preferably Python >=3.10). If you wish to access the Python feature sets, please run `?install_python_pkgs` in R after downloading `theft` or consult the vignette in the package for more information. For a comprehensive comparison of these six feature sets across a range of domains (including computation speed, within-set feature composition, and between-set feature correlations), please refer to the paper [An Empirical Evaluation of Time-Series Feature Sets](https://ieeexplore.ieee.org/document/9679937). 

Also note that as of `v0.8.2` parallelisation is supported for `"tsfresh"` and `"tsfel"` (see the vignette for more information)!

## Package extensibility

The companion package [`theftdlc`](https://github.com/hendersontrent/theftdlc) ('`theft` downloadable content'---just like you get [DLCs and expansions](https://en.bandainamcoent.eu/elden-ring/elden-ring/shadow-of-the-erdtree) for video games) contains an extensive suite of functions for analysing, interpreting, and visualising time-series features calculated from `theft`. Collectively, these packages are referred to as the '`theft` ecosystem'.

Hex stickers of the theft and theftdlc packages for R

A high-level overview of how users typically access the `theft` ecosystem for R is shown below. Note that prior to `v0.6.1` of `theft`, many of the `theftdlc` functions were contained in `theft` but under other names. To ensure the `theft` ecosystem is as user-friendly as possible and can scale to meet future demands, `theft` has been refactored to perform feature extraction only, while `theftdlc` handles all the processing, analysis, and visualisation of the extracted features.

Schematic of the theft ecosystem in R

Many more functions and options for customisation are available within the packages, and users are encouraged to explore the vignettes and help files for more information.

## Quick tour

`theft` and `theftdlc` combine to create an intuitive and efficient workflow consistent with the broader [`tidyverts`](https://tidyverts.org) collection of packages for tidy time-series analysis. Here is a single code chunk that calculates features for a [`tsibble`](https://tsibble.tidyverts.org) (tidy temporal data frame) of some simulated time series processes, including Gaussian noise, AR(1), ARMA(1,1), MA(1), noisy sinusoid, and a random walk. `simData` comes with `theft`. We'll just use the [`catch22`](https://github.com/hendersontrent/Rcatch22) feature set and a custom set of mean and standard deviation for now. Using tidy principles and pipes, we can, in the same code chunk, feed the calculated features straight into `theftdlc`'s `project` function to project the 24-dimensional feature space into an interpretable two-dimensional space using principal components analysis:

```{r, message = FALSE, warning = FALSE, fig.height=6, fig.width=6}
library(dplyr)
library(theft)
library(theftdlc)

calculate_features(data = theft::simData, 
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  project(norm_method = "RobustSigmoid",
          unit_int = TRUE,
          low_dim_method = "PCA") %>%
  plot()
```

In that example, `calculate_features` comes from `theft`, while `project` and the `plot` generic come from `theftdlc`.
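
For readers who want to see the idea behind the projection step without either package, the following base-R sketch normalises a small feature matrix and reduces it to two dimensions with `stats::prcomp`. It is a sketch under stated assumptions, not `theftdlc`'s implementation: the toy matrix is random, and a plain z-score stands in for the `"RobustSigmoid"` normalisation used above.

```r
# Toy feature matrix: 6 time series (rows) by 4 features (columns).
# Values are random and purely illustrative.
set.seed(123)
features <- matrix(
  rnorm(6 * 4),
  nrow = 6,
  dimnames = list(paste0("ts_", 1:6), paste0("feature_", 1:4))
)

# z-score each feature so that no single feature dominates the projection
# (an assumed stand-in for the normalisation step).
features_scaled <- scale(features)

# Principal components analysis; the data are already centred and scaled.
pca <- stats::prcomp(features_scaled, center = FALSE, scale. = FALSE)

# Two-dimensional embedding: one row per time series.
embedding <- pca$x[, 1:2]

# Proportion of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

Plotting `embedding[, 1]` against `embedding[, 2]` gives the same kind of two-dimensional view as the chunk above, with each point representing one time series.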

We can perform time-series classification with a similar workflow to compare the performance of `catch22` against our custom set of the first two moments of the distribution:

```{r, message = FALSE, warning = FALSE}
calculate_features(data = theft::simData, 
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  classify(by_set = TRUE,
           n_resamples = 10,
           use_null = TRUE) %>%
  compare_features(by_set = TRUE,
                   hypothesis = "pairwise") %>%
  head()
```

In this example, `classify` and `compare_features` come from `theftdlc`.

We can also easily see how each set performs relative to an empirical null distribution (i.e., how much better does each set do than we would expect due to chance?):

```{r, message = FALSE, warning = FALSE}
calculate_features(data = theft::simData, 
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  classify(by_set = TRUE,
           n_resamples = 10,
           use_null = TRUE) %>%
  compare_features(by_set = TRUE,
                   hypothesis = "null") %>%
  head()
```
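
The notion of "better than chance" can be illustrated in base R. One common way to build an empirical null is to shuffle the class labels many times and record the classifier's accuracy on each shuffle; this sketch takes that approach, which is an assumption for illustration and not a statement of how `theftdlc` constructs its null. The `accuracy` helper, the nearest-class-mean classifier, and the toy data are all hypothetical.

```r
set.seed(123)

# Toy data: one feature, two classes with different means.
n       <- 50
feature <- c(rnorm(n, mean = 0), rnorm(n, mean = 1.5))
labels  <- rep(c("A", "B"), each = n)

# Hypothetical classifier: assign each point to the nearest class mean
# and report in-sample accuracy (no resampling, for brevity).
accuracy <- function(x, y) {
  mu   <- tapply(x, y, mean)
  pred <- ifelse(abs(x - mu[["A"]]) <= abs(x - mu[["B"]]), "A", "B")
  mean(pred == y)
}

# Accuracy with the true labels.
observed <- accuracy(feature, labels)

# Empirical null: accuracy after shuffling the labels.
null_acc <- replicate(200, accuracy(feature, sample(labels)))

# Permutation p-value with the usual +1 correction.
p_value <- (1 + sum(null_acc >= observed)) / (1 + length(null_acc))
```

A small `p_value` indicates that the observed accuracy is rarely matched when the link between features and labels is broken.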

Please see the vignette for more information and the full functionality of both packages.

## Citation

If you use `theft` or `theftdlc` in your own work, please cite both the paper:

T. Henderson and B. D. Fulcher. [Feature-Based Time-Series Analysis in R using the theft Package](https://arxiv.org/abs/2208.06146). arXiv, (2022).

and the software:

```{r, echo = FALSE}
citation("theft")
citation("theftdlc")
```

## Acknowledgements

Big thanks to [Joshua Moore](https://github.com/joshuabmoore) for his assistance in solving issues with the Python side of things, including the correct specification of dependencies for the `install_python_pkgs` function.

Owner

  • Name: Trent Henderson
  • Login: hendersontrent
  • Kind: user
  • Location: Canberra, Australia
  • Company: Nous Group

Senior data scientist and statistics PhD student. Mostly coding in R, Julia, and Stan. Interested in genetic programming, time series, and data vis

GitHub Events

Total
  • Create event: 3
  • Release event: 1
  • Issues event: 6
  • Watch event: 2
  • Push event: 16
  • Pull request event: 2
  • Fork event: 2
Last Year
  • Create event: 3
  • Release event: 1
  • Issues event: 6
  • Watch event: 2
  • Push event: 16
  • Pull request event: 2
  • Fork event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 455
  • Total Committers: 6
  • Avg Commits per committer: 75.833
  • Development Distribution Score (DDS): 0.275
Past Year
  • Commits: 3
  • Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Trent Henderson | t****1@o****m | 330 |
| Trent Henderson | t****n@T****l | 91 |
| Trent Henderson | t****n@v****u | 23 |
| Annie G. Bryant | a****t@g****m | 8 |
| Cumol | m****y@g****m | 2 |
| Trent Henderson | t****n@n****m | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 54
  • Total pull requests: 76
  • Average time to close issues: 2 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 7
  • Total pull request authors: 3
  • Average comments per issue: 0.57
  • Average comments per pull request: 0.14
  • Merged pull requests: 74
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 3
  • Average time to close issues: 6 days
  • Average time to close pull requests: 10 days
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hendersontrent (44)
  • MislavSag (5)
  • teunbrand (1)
  • Steviey (1)
  • IssyMiddleton (1)
  • windwine (1)
  • Cumol (1)
Pull Request Authors
  • hendersontrent (78)
  • anniegbryant (1)
  • Cumol (1)
Top Labels
Issue Labels
  • enhancement (17)
  • question (12)
  • bug (11)
  • documentation (2)
  • not-urgent (2)
  • CRAN (2)
  • breaking-change (1)
Pull Request Labels
  • documentation (19)
  • enhancement (16)
  • CRAN (11)
  • breaking-change (7)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 504 last-month
  • Total docker downloads: 41,971
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 63
  • Total maintainers: 1
proxy.golang.org: github.com/hendersontrent/theft
  • Versions: 50
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.8%
Last synced: 7 months ago
cran.r-project.org: theft

Tools for Handling Extraction of Features from Time Series

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 504 Last month
  • Docker Downloads: 41,971
Rankings
Stargazers count: 12.2%
Forks count: 12.8%
Average: 23.8%
Downloads: 28.6%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/pkgdown.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
  • r-lib/actions/setup-r-dependencies v1 composite
DESCRIPTION cran
  • R >= 3.5.0 depends
  • R.matlab * imports
  • Rcatch22 * imports
  • Rtsne * imports
  • broom * imports
  • caret * imports
  • dplyr * imports
  • fabletools * imports
  • feasts * imports
  • ggplot2 * imports
  • janitor * imports
  • plotly * imports
  • purrr * imports
  • reshape2 * imports
  • reticulate * imports
  • rlang * imports
  • scales * imports
  • stats * imports
  • tibble * imports
  • tidyr * imports
  • tsfeatures * imports
  • tsibble * imports
  • bslib * suggests
  • cachem * suggests
  • knitr * suggests
  • lifecycle * suggests
  • markdown * suggests
  • pkgdown * suggests
  • rmarkdown * suggests
  • testthat * suggests