https://github.com/filippomb/time-series-cluster-kernel

Kernel similarity for classification and clustering of multi-variate time series with missing values.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 16.3%, to scientific vocabulary)

Keywords

kernel-methods missing-data multivariate-timeseries time-series time-series-classification time-series-clustering

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: FilippoMB
License: mit
Language: Jupyter Notebook
Default Branch: main
Homepage: https://doi.org/10.1016/j.patcog.2017.11.030
Size: 4.63 MB

Statistics

Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created almost 2 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License

The Time Series Cluster Kernel (TCK) is a kernel similarity for multivariate time series with missing values. Once computed, the kernel can be used to perform tasks such as classification, clustering, and dimensionality reduction.

TCK is based on an ensemble of Gaussian Mixture Models (GMMs) for time series. The GMMs use time-varying means to handle time dependencies and informative Bayesian priors to handle missing values. The similarity between two time series is proportional to the number of times they are assigned to the same mixtures.
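The co-assignment idea can be illustrated with a toy sketch. It assumes that each ensemble member has already produced a hard component label for every time series; the function name `coassignment_kernel` and the label matrix are hypothetical and are not part of the tck API. The library itself fits GMMs and may use soft (posterior) assignments rather than hard labels.

```python
import numpy as np


def coassignment_kernel(labels):
    """Toy co-assignment similarity (illustrative, not the tck implementation).

    labels: integer array of shape [Q, N], where labels[q, i] is the mixture
    component that ensemble member q assigns to time series i.
    Returns an [N, N] matrix whose entry (i, j) is the fraction of ensemble
    members that put series i and series j in the same component.
    """
    labels = np.asarray(labels)
    # same[q, i, j] is True when member q assigns i and j to the same component.
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)


# Hypothetical assignments from 3 ensemble members for 4 time series.
labels = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 0, 1],
])
K = coassignment_kernel(labels)
```

In this sketch, series 0 and 1 share a component in two of the three members, so `K[0, 1]` is 2/3, while series 0 and 3 never share one, so `K[0, 3]` is 0.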

Installation

The recommended installation is with pip:

    pip install tck

Alternatively, you can install the library from source:

    git clone https://github.com/FilippoMB/Time-Series-Cluster-Kernel.git
    cd Time-Series-Cluster-Kernel
    pip install -e .

Quick start

The following scripts provide minimal examples that illustrate how to use the library for different tasks.

To run them, download the project and cd to the root folder:

    git clone https://github.com/FilippoMB/Time-Series-Cluster-Kernel.git
    cd Time-Series-Cluster-Kernel

Classification

    python examples/classification.py

Clustering

    python examples/clustering.py

The notebooks in the repository illustrate more advanced use cases: time series dimensionality reduction, cluster analysis, and visualization of the results.

⚠ Running on Windows

TCK uses multiprocessing. When using multiprocessing in Python on Windows, the entry point of the program must be protected with

    if __name__ == '__main__':

Please refer to the following examples.

Classification

    python examples/classification_windows.py

Clustering

    python examples/clustering_windows.py

Datasets

Data format

- TCK works with both univariate and multivariate time series. The dataset must be stored in a numpy array of shape [N, T, V], where N is the number of time series (samples), T is the number of time steps, and V is the number of variables (V=1 in the univariate case).
- If the time series in the same dataset have different numbers of time steps, T corresponds to the maximum length of the time series in the dataset. All time series shorter than T should be padded with trailing zeros to match the dimension T. Alternatively, one can use interpolation to stretch the shorter time series up to length T.
- The time series can contain missing data. Missing data are indicated by np.nan entries in the data array.
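The data format above can be sketched as follows. The helper `to_ntv_array` is a hypothetical name introduced only for illustration and is not part of tck; it assumes every input series is a 2-D array of shape [T_i, V] with the same number of variables V.

```python
import numpy as np


def to_ntv_array(series_list):
    """Stack variable-length series of shape [T_i, V] into one [N, T, V] array.

    Hypothetical helper, not part of tck. Shorter series are padded with
    trailing zeros; np.nan entries (missing values) are copied unchanged.
    """
    T = max(len(s) for s in series_list)      # longest series sets T
    V = np.asarray(series_list[0]).shape[1]   # number of variables
    X = np.zeros((len(series_list), T, V))    # zeros act as trailing padding
    for i, s in enumerate(series_list):
        s = np.asarray(s, dtype=float)
        X[i, : len(s), :] = s
    return X


# Two bivariate series: one with 3 time steps and a missing value, one with 1 step.
a = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
b = np.array([[7.0, 8.0]])
X = to_ntv_array([a, b])
```

Here `X` has shape (2, 3, 2): the second series is zero-padded from one time step to three, and the single missing value in the first series stays marked as np.nan.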

Available datasets

There are several univariate and multivariate time series classification datasets immediately available for test and benchmarking purposes.

The list of available datasets can be retrieved as follows:

    from tck.datasets import DataLoader

    downloader = DataLoader()
    downloader.available_datasets(details=True)  # Leave at False to just get the names

A dataset can be loaded as follows:

    Xtr, Ytr, Xte, Yte = downloader.get_data('Japanese_Vowels')

Configuration and detailed usage

There are a few hyperparameters that can be tuned to modify the behavior of TCK.

    tck = TCK(G, C)

- G is the number of GMMs.
- C is the number of components in the GMMs.

Higher values usually give better results, but the computations take longer.

    tck.fit(X, minN, minV, maxV, minT, maxT, I)

- minN: Minimum percentage of samples to be used in the training of the GMMs.
- minV: Minimum number of attributes to be sampled from the dataset.
- maxV: Maximum number of attributes to be sampled from the dataset.
- minT: Minimum length of time segments to be sampled from the dataset.
- maxT: Maximum length of time segments to be sampled from the dataset.
- I: Number of iterations for the MAP-EM algorithm.

TCK is usually less sensitive to these parameters, which can be left at their default values in most cases.

    Ktr = tck.predict(mode='tr-tr')
    Kte = tck.predict(Xte=Xte, mode='tr-te')

- If mode='tr-tr', returns the similarity matrix between training samples, i.e., Ktr[i,j] is the similarity between time series i and j in the training set.
- If mode='tr-te', it is necessary to pass the test set Xte as additional input. In the returned similarity matrix, Kte[i,j] is the similarity between time series i in the test set and time series j in the training set.
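As an illustration of how a test-versus-training similarity matrix with this orientation can be consumed, the sketch below classifies each test series with the label of its most similar training series. The function `predict_1nn` and the toy matrices are hypothetical and not part of tck; in practice the matrix would come from `tck.predict(Xte=Xte, mode='tr-te')`, and other kernel classifiers could be used instead.

```python
import numpy as np


def predict_1nn(Kte, Ytr):
    """1-nearest-neighbour classification from a precomputed similarity matrix.

    Hypothetical helper, not part of tck.
    Kte: array of shape [N_test, N_train], Kte[i, j] is the similarity between
         test series i and training series j.
    Ytr: array of shape [N_train] with the training labels.
    """
    Kte = np.asarray(Kte)
    Ytr = np.asarray(Ytr)
    nearest = Kte.argmax(axis=1)  # index of the most similar training series
    return Ytr[nearest]


# Toy values standing in for a real kernel: 2 test series, 3 training series.
Ytr = np.array([0, 0, 1])
Kte = np.array([
    [0.9, 0.4, 0.1],
    [0.2, 0.3, 0.8],
])
Ypred = predict_1nn(Kte, Ytr)
```

The first test series is most similar to training series 0 and the second to training series 2, so `Ypred` is `[0, 1]`.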

Citation

Please consider citing the original paper if you are using TCK in your research.

    @article{mikalsen2018time,
      title={Time series cluster kernel for learning similarities between multivariate time series with missing data},
      author={Mikalsen, Karl {\O}yvind and Bianchi, Filippo Maria and Soguero-Ruiz, Cristina and Jenssen, Robert},
      journal={Pattern Recognition},
      volume={76},
      pages={569--581},
      year={2018},
      publisher={Elsevier}
    }

Owner

Name: Filippo Maria Bianchi
Login: FilippoMB
Kind: user
Location: Tromsø
Company: UiT the Arctic University of Norway

Website: https://sites.google.com/view/filippombianchi/home
Repositories: 8
Profile: https://github.com/FilippoMB

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Packages

Total packages: 1
Total downloads:
- pypi 87 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 8
Total maintainers: 1

pypi.org: tck

Kernel similarity for classification and clustering of multi-variate time series with missing values.

Homepage: https://github.com/FilippoMB/Time-Series-Cluster-Kernel
Documentation: https://tck.readthedocs.io/
License: MIT License
Latest release: 1.0.0
published almost 2 years ago

Versions: 8
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 87 Last month

Rankings

Dependent packages count: 9.7%

Average: 36.7%

Dependent repos count: 63.8%

Maintainers (1)

FilippoMB

Last synced: 7 months ago

Dependencies

setup.py pypi

numpy >1.19.5
requests *
scikit_learn >=1.4
scipy *
tqdm *
