tsfeaturex

tsfeaturex: An R Package for Automating Time Series Feature Extraction - Published in JOSS (2019)

https://github.com/nelsonroque/tsfeaturex

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Economics Social Sciences - 60% confidence
Last synced: 4 months ago

Repository

Extract many time series features. Inspired by https://github.com/blue-yonder/tsfresh

Basic Info
  • Host: GitHub
  • Owner: nelsonroque
  • License: gpl-3.0
  • Language: HTML
  • Default Branch: master
  • Size: 1.26 MB
Statistics
  • Stars: 18
  • Watchers: 2
  • Forks: 6
  • Open Issues: 1
  • Releases: 2
Created almost 7 years ago · Last pushed over 5 years ago
Metadata Files
Readme License

README.md

R package: tsfeaturex

Description

Calculate many features (over 50) of a time series. Click here to view the full feature list

Dependencies

  • tidyverse (https://www.tidyverse.org/)

Imports

  • stats (https://www.R-project.org/)
  • psych (https://CRAN.R-project.org/package=psych)
  • e1071 (https://CRAN.R-project.org/package=e1071)
  • entropy (https://CRAN.R-project.org/package=entropy)
  • Langevin (https://CRAN.R-project.org/package=Langevin)
  • Hmisc (https://CRAN.R-project.org/package=Hmisc)
  • forecast (https://CRAN.R-project.org/package=forecast)
  • zoo (https://CRAN.R-project.org/package=zoo)
  • viridis (https://CRAN.R-project.org/package=viridis)

Acknowledgements

Special Thanks

  • Inspiration for automatic feature extraction: https://github.com/blue-yonder/tsfresh
  • Dr. Nilam Ram for code on probability of acute change
  • GitHub user 'stas-g' for peak-finding code: https://github.com/stas-g/findPeaks

Funding

  • Nelson Roque was supported by National Institute on Aging Grant T32 AG049676 to The Pennsylvania State University.

Roadmap

  • Push to CRAN (June 2019)
  • Extracting numerical features from text data (Q2 2019)
  • More features: Fast Fourier Transform (FFT), time-series components (seasonality, trend, random), and Friedrich coefficients (Q3 2019)
  • Extracting numerical features from image data (Q4 2019)

Statement of Need

In today's digital world, data collection and storage costs are quite low. Humans collectively output 2.5 quintillion bytes of data every day; by 2020, each person will generate ~1.7 MB every second [@ibmstats]. At this scale, intensive longitudinal data about human behavior facilitates new discoveries about the patterning of thought and action, and potentially better prediction and optimization of health and well-being. In raw form, the 2.5 quintillion bytes of data generated daily are noisy time-series that are difficult to interpret. Extraction of features from the time-series, however, allows:

  1. Researchers to reduce the dimensionality of their time-series data (e.g., reducing millions of time-stamped observations to a summary feature vector of length 100);

  2. Summary characterizations of time-series data that may be used as predictors, correlates, or outcomes in study of between-person differences; and

  3. Improved and detailed description of human behavior streams (e.g., characterizing a behavioral time series in terms of its features; the mean is 'X', the range is 'Y', the peaks are at 'T12' and 'T30').

Short data streams are easily summarized using basic features (e.g., mean, standard deviation, IQR). However, as time-series get longer, numerous other features may be needed or become accessible. The study of intraindividual variability has outlined a wide variety of time-series features that can be used to characterize between-person differences and within-person change, with features such as probability of acute change (PAC) or mean square of successive differences (MSSD) providing useful information about individuals' cognitive, emotional, and behavioral dynamics.
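As a rough illustration of the two features named above, the base R sketch below computes MSSD and a simple version of PAC. The function names `mssd` and `pac`, the `cutoff` argument, and the choice to count only increases at or above the cutoff are assumptions made for this example; the definitions used inside tsfeaturex may differ.

```r
# Hypothetical helpers for illustration; not the tsfeaturex implementation.

# Mean square of successive differences: the average squared change
# between adjacent (non-missing) observations.
mssd <- function(x) {
  d <- diff(x[!is.na(x)])  # note: dropping NAs joins observations across gaps
  mean(d^2)
}

# Probability of acute change, sketched here as the proportion of
# successive changes at or above a user-chosen cutoff (an assumption).
pac <- function(x, cutoff) {
  d <- diff(x[!is.na(x)])
  mean(d >= cutoff)
}

x <- c(1, 2, 4, 3, 7)      # successive differences: 1, 2, -1, 4
mssd_x <- mssd(x)          # (1 + 4 + 1 + 16) / 4 = 5.5
pac_x <- pac(x, cutoff = 2)  # 2 of 4 changes are >= 2, so 0.5
```

In practice the cutoff is usually chosen from the data (for example, a high percentile of all observed changes) rather than fixed by hand.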

Using tsfeaturex

Changelog

Click here to view the change log

Installation:

devtools::install_github("nelsonroque/tsfeaturex")

Usage:

```r{echo=true}
# load library
library(tsfeaturex)

# for reproducibility of this example
set.seed(516)

# create test data
dat <- data.frame(expand.grid(day=c(1:7),id=c(1:100)))
dat$y <- rnorm(nrow(dat),5,1.5)
dat$y[1:3] <- NA # introduce NAs to check

# run function
out.list <- extract_features(df=dat,group_var="id",value_var="y",features="all")

# convert list to data.frame (MapReduce)
final.df <- features_to_df(out.list, data.format="wide", group_var = "id")

# get feature correlations
cor.df <- feature_correlations(final.df, data.format="wide", id_var = "id")

# view results
View(final.df)
```
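For readers who want to see the general pattern behind the extraction and conversion steps in the usage example above, here is a minimal base R sketch of split-apply-combine on the same simulated data. It does not use tsfeaturex and is not how the package is implemented; the object names `per_id` and `wide` and the three features chosen are assumptions for illustration only.

```r
# Same simulated data as the usage example: 100 ids, 7 days each.
set.seed(516)
dat <- expand.grid(day = 1:7, id = 1:100)
dat$y <- rnorm(nrow(dat), 5, 1.5)
dat$y[1:3] <- NA  # the first three observations of id 1 are missing

# Split the values by id and compute a few features per group ("map").
per_id <- lapply(split(dat$y, dat$id), function(y) {
  y <- y[!is.na(y)]  # drop missing values within each group
  data.frame(n = length(y), mean = mean(y), sd = sd(y))
})

# Bind the per-id rows into one wide data frame ("reduce").
wide <- do.call(rbind, per_id)
wide$id <- as.integer(rownames(wide))
rownames(wide) <- NULL

# Correlations among the per-id features.
feature_cor <- cor(wide[, c("mean", "sd")])
```

Each row of `wide` describes one id, so the result has 100 rows, and id 1 is summarized from only its 4 non-missing observations.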

Report a bug

Click here to file an issue on GitHub, or feel free to reach out directly.

Request a New Feature

Click here to request a new feature on GitHub, or feel free to reach out directly.

Owner

  • Name: Nelson Roque
  • Login: nelsonroque
  • Kind: user
  • Location: Orlando, FL

Assistant Professor - Human Factors & Cognitive Psychology

JOSS Publication

tsfeaturex: An R Package for Automating Time Series Feature Extraction
Published
May 31, 2019
Volume 4, Issue 37, Page 1279
Authors
Nelson A. Roque
T32 Postdoctoral Fellow, Pennsylvania State University, Center for Healthy Aging, University Park, PA, USA
Nilam Ram
Professor, Pennsylvania State University, Human Development & Family Studies, University Park, PA, USA
Editor
Juanjo Bazán
Tags
time series, dynamics, variability, intra-individual variability

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 95
  • Total Committers: 2
  • Avg Commits per committer: 47.5
  • Development Distribution Score (DDS): 0.021
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Nelson Roque n****r@g****m 93
Kyle Niemeyer k****r@g****m 2

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 0
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • kyleniemeyer (2)
  • uribo (1)