dtaidistance

Time series distances: Dynamic Time Warping (fast DTW implementation in C)

https://github.com/wannesm/dtaidistance

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 19 committers (5.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.6%) to scientific vocabulary

Keywords

c clustering distance-measure dtw dynamic-time-warping python timeseries

Keywords from Contributors

interactive graph-libraries notebook network-simulation hacking multi-agents application agents embedded optim
Last synced: 6 months ago · JSON representation ·

Repository

Time series distances: Dynamic Time Warping (fast DTW implementation in C)

Basic Info
  • Host: GitHub
  • Owner: wannesm
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 1.42 MB
Statistics
  • Stars: 1,185
  • Watchers: 28
  • Forks: 188
  • Open Issues: 21
  • Releases: 8
Topics
c clustering distance-measure dtw dynamic-time-warping python timeseries
Created about 9 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation Authors

README.md

PyPi Version Conda Version Documentation Status DOI

Time Series Distances

Library for time series distances (e.g. Dynamic Time Warping) used in the DTAI Research Group. The library offers a pure Python implementation and a fast implementation in C. The C implementation has only Cython as a dependency. It is compatible with Numpy and Pandas and implemented such that unnecessary data copy operations are avoided.

Documentation: http://dtaidistance.readthedocs.io

Example:

from dtaidistance import dtw
import numpy as np
s1 = np.array([0.0, 0, 1, 2, 1, 0, 1, 0, 0])
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2)

Citing this work:

Wannes Meert, Kilian Hendrickx, Toon Van Craenendonck, Pieter Robberechts, Hendrik Blockeel & Jesse Davis.
DTAIDistance (Version v2). Zenodo.
http://doi.org/10.5281/zenodo.5901139

New in v2:

  • Numpy is now an optional dependency, also to compile the C library (only Cython is required).
  • Small optimizations throughout the C code to improve speed.
  • The consistent use of ssize_t instead of int allows for larger data structures on 64 bit machines and be more compatible with Numpy.
  • The parallelization is now implemented directly in C (included if OpenMP is installed).
  • The max_dist argument turned out to be similar to Silva and Batista's work on PrunedDTW [7]. The toolbox now implements a version that is equal to PrunedDTW since it prunes more partial distances. Additionally, a use_pruning argument is added to automatically set max_dist to the Euclidean distance, as suggested by Silva and Batista, to speed up the computation (a new method ub_euclidean is available).
  • Support in the C library for multi-dimensional sequences in the dtaidistance.dtw_ndim package.
  • DTW Barycenter Averaging for clustering (v2.2).
  • Subsequence search and local concurrences (v2.3).
  • Support for N-dimensional time series (v2.3.7).

Installation

$ pip install dtaidistance

or

$ conda install -c conda-forge dtaidistance

The pip installation requires Numpy as a dependency to compile Numpy-compatible C code (using Cython). However, this dependency is optional and can be removed.

The source code is available at github.com/wannesm/dtaidistance.

If you encounter any problems during compilation (e.g. the C-based implementation or OpenMP is not available), see the documentation for more options.

Usage

Dynamic Time Warping (DTW) Distance Measure

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import numpy as np
s1 = np.array([0., 0, 1, 2, 1, 0, 1, 0, 0, 2, 1, 0, 0])
s2 = np.array([0., 1, 2, 3, 1, 0, 0, 0, 2, 1, 0, 0, 0])
path = dtw.warping_path(s1, s2)
dtwvis.plot_warping(s1, s2, path, filename="warp.png")

Dynamic Time Warping (DTW) Example

DTW Distance Measure Between Two Series

Only the distance measure based on two sequences of numbers:

from dtaidistance import dtw
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
distance = dtw.distance(s1, s2)
print(distance)

The fastest version (30-300 times) uses c directly but requires an array as input (with the double type), and (optionally) also prunes computations by setting max_dist to the Euclidean upper bound:

from dtaidistance import dtw
import array
s1 = array.array('d',[0, 0, 1, 2, 1, 0, 1, 0, 0])
s2 = array.array('d',[0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2, use_pruning=True)

Or you can use a numpy array (with dtype double or float):

from dtaidistance import dtw
import numpy as np
s1 = np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double)
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2, use_pruning=True)

Check the __doc__ for information about the available arguments:

print(dtw.distance.__doc__)

A number of options are foreseen to early stop some paths the dynamic programming algorithm is exploring or tune the distance measure computation:

  • window: Only allow for shifts up to this amount away from the two diagonals.
  • max_dist: Stop if the returned distance measure will be larger than this value.
  • max_step: Do not allow steps larger than this value.
  • max_length_diff: Return infinity if difference in length of two series is larger.
  • penalty: Penalty to add if compression or expansion is applied (on top of the distance).
  • psi: Psi relaxation to ignore begin and/or end of sequences (for cylical sequences) [2].
  • use_pruning: Prune computations based on the Euclidean upper bound.

DTW Distance Measure all warping paths

If, next to the distance, you also want the full matrix to see all possible warping paths:

from dtaidistance import dtw
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
distance, paths = dtw.warping_paths(s1, s2)
print(distance)
print(paths)

The matrix with all warping paths can be visualised as follows:

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import random
import numpy as np
x = np.arange(0, 20, .5)
s1 = np.sin(x)
s2 = np.sin(x - 1)
random.seed(1)
for idx in range(len(s2)):
    if random.random() < 0.05:
        s2[idx] += (random.random() - 0.5) / 2
d, paths = dtw.warping_paths(s1, s2, window=25, psi=2)
best_path = dtw.best_path(paths)
dtwvis.plot_warpingpaths(s1, s2, paths, best_path)

DTW Example

Notice the psi parameter that relaxes the matching at the beginning and end. In this example this results in a perfect match even though the sine waves are slightly shifted.

DTW Distance Measures Between Set of Series

To compute the DTW distance measures between all sequences in a list of sequences, use the method dtw.distance_matrix. You can set variables to use more or less c code (use_c and use_nogil) and parallel or serial execution (parallel).

The distance_matrix method expects a list of lists/arrays:

from dtaidistance import dtw
import numpy as np
series = [
    np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),
    np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),
    np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
ds = dtw.distance_matrix_fast(series)

or a matrix (in case all series have the same length):

from dtaidistance import dtw
import numpy as np
series = np.matrix([
    [0.0, 0, 1, 2, 1, 0, 1, 0, 0],
    [0.0, 1, 2, 0, 0, 0, 0, 0, 0],
    [0.0, 0, 1, 2, 1, 0, 0, 0, 0]])
ds = dtw.distance_matrix_fast(series)

DTW Distance Measures Between Set of Series, limited to block

You can instruct the computation to only fill part of the distance measures matrix. For example to distribute the computations over multiple nodes, or to only compare source series to target series.

from dtaidistance import dtw
import numpy as np
series = np.matrix([
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1],
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1]])
ds = dtw.distance_matrix_fast(series, block=((1, 4), (3, 5)))

The output in this case will be:

#  0     1    2    3       4       5
[[ inf   inf  inf     inf     inf  inf]    # 0
 [ inf   inf  inf  1.4142  0.0000  inf]    # 1
 [ inf   inf  inf  2.2360  1.7320  inf]    # 2
 [ inf   inf  inf     inf  1.4142  inf]    # 3
 [ inf   inf  inf     inf     inf  inf]    # 4
 [ inf   inf  inf     inf     inf  inf]]   # 5

Clustering

A distance matrix can be used for time series clustering. You can use existing methods such as scipy.cluster.hierarchy.linkage or one of two included clustering methods (the latter is a wrapper for the SciPy linkage method).

from dtaidistance import clustering
# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
cluster_idx = model1.fit(series)
# Augment Hierarchical object to keep track of the full tree
model2 = clustering.HierarchicalTree(model1)
cluster_idx = model2.fit(series)
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)

For models that keep track of the full clustering tree (HierarchicalTree or LinkageTree), the tree can be visualised:

model.plot("myplot.png")

Dynamic Time Warping (DTW) hierarchical clusteringt

Subsequence search

DTAIDistance supports various subsequence search algorithms like Subsequence Alignment, Subsequence KNN Search and Local Concurrences. See the documentation for more information.

Motif Discovery

While methods such as dtw.distance_matrix and subsequence.subsequencesearch can be used for motif discovery in time series (after windowing), a more efficient and effective algorithm based on time warping is available in the LoCoMotif package.

Dependencies

Optional:

Development:

Contact

  • https://people.cs.kuleuven.be/wannes.meert

References

  1. T. K. Vintsyuk, Speech discrimination by dynamic programming. Kibernetika, 4:81–88, 1968.
  2. H. Sakoe and S. Chiba, Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43–49, 1978.
  3. C. S. Myers and L. R. Rabiner, A comparative study of several dynamic time-warping algorithms for connected-word recognition. The Bell System Technical Journal, 60(7):1389–1409, Sept 1981.
  4. Mueen, A and Keogh, E, Extracting Optimal Performance from Dynamic Time Warping, Tutorial, KDD 2016
  5. D. F. Silva, G. E. A. P. A. Batista, and E. Keogh. On the effect of endpoints on dynamic time warping, In SIGKDD Workshop on Mining and Learning from Time Series, II. Association for Computing Machinery-ACM, 2016.
  6. C. Yanping, K. Eamonn, H. Bing, B. Nurjahan, B. Anthony, M. Abdullah and B. Gustavo. The UCR Time Series Classification Archive, 2015.
  7. D. F. Silva and G. E. Batista. Speeding up all-pairwise dynamic time warping matrix calculation, In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 837–845. SIAM, 2016.

License

DTAI distance code.

Copyright 2016-2022 KU Leuven, DTAI Research Group

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Owner

  • Login: wannesm
  • Kind: user
  • Location: Leuven, Belgium
  • Company: KU Leuven

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Meert"
  given-names: "Wannes"
  orcid: "https://orcid.org/0000-0001-9560-3872"
- family-names: "Hendrickx"
  given-names: "Kilian"
  orcid: "https://orcid.org/0000-0002-7335-4542"
- family-names: "Van Craenendonck"
  given-names: "Toon"
  orcid: "https://orcid.org/0000-0002-2175-3293"
- family-names: "Robberechts"
  given-names: "Pieter"
  orcid: "https://orcid.org/0000-0002-3734-0047"
- family-names: "Blockeel"
  given-names: "Hendrik"
  orcid: "https://orcid.org/0000-0003-0378-3699"
- family-names: "Davis"
  given-names: "Jesse"
  orcid: "https://orcid.org/0000-0002-3748-9263"
title: "DTAIDistance"
version: 2
doi: 10.5281/zenodo.3981067
date-released: 2020-08-11
url: "https://github.com/wannesm/dtaidistance"

GitHub Events

Total
  • Issues event: 11
  • Watch event: 96
  • Issue comment event: 9
  • Push event: 29
  • Pull request event: 1
  • Fork event: 4
  • Create event: 3
Last Year
  • Issues event: 11
  • Watch event: 96
  • Issue comment event: 9
  • Push event: 29
  • Pull request event: 1
  • Fork event: 4
  • Create event: 3

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 743
  • Total Committers: 19
  • Avg Commits per committer: 39.105
  • Development Distribution Score (DDS): 0.066
Past Year
  • Commits: 29
  • Committers: 3
  • Avg Commits per committer: 9.667
  • Development Distribution Score (DDS): 0.069
Top Committers
Name Email Commits
Wannes Meert w****t@c****e 694
Pieter Robberechts p****s@m****m 14
Kilian Hendrickx k****x@s****m 10
Aras Yurtman 4****y 8
Gust Verbruggen g****t@M****l 3
Nick Seeuws n****s@k****e 1
Dany Vohl m****e 1
Eric Ma e****l 1
Gust Verbruggen g****n@c****e 1
Marco Rossi d****r@m****m 1
Mazhar WPC m****4@g****m 1
Muhammad Yasirroni 4****i 1
Rietesh r****5@g****m 1
Sporrer, Sebastian s****r@d****e 1
Todd t****8@g****m 1
Wojciech Zieliński w****i@m****m 1
dependabot[bot] 4****] 1
toon t****k@g****m 1
wusai2333 w****3@h****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 111
  • Total pull requests: 25
  • Average time to close issues: 4 months
  • Average time to close pull requests: 5 days
  • Total issue authors: 85
  • Total pull request authors: 15
  • Average comments per issue: 3.12
  • Average comments per pull request: 0.36
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 6
  • Pull requests: 3
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 11 hours
  • Issue authors: 6
  • Pull request authors: 2
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • tommedema (9)
  • probberechts (4)
  • bartmch (3)
  • m-rossi (3)
  • kingwe-stack (3)
  • Ne-oL (3)
  • asyates (2)
  • macrocosme (2)
  • miladad8 (2)
  • Pranay144 (2)
  • ZihanChen1995 (2)
  • eikmeier-cn (2)
  • HendrikHuel (2)
  • JeongJuhyeon (1)
  • rrikit (1)
Pull Request Authors
  • aras-y (5)
  • probberechts (4)
  • nseeuws (3)
  • yasirroni (2)
  • laurence6 (2)
  • m-rossi (2)
  • damonhayhurst (2)
  • codeman6688 (2)
  • dependabot[bot] (2)
  • rietesh (1)
  • w4bo (1)
  • ssporrer-dlr (1)
  • joclement (1)
  • bramiozo (1)
  • Baael (1)
Top Labels
Issue Labels
bug (2) enhancement (1)
Pull Request Labels
dependencies (2)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 115,854 last-month
  • Total docker downloads: 526
  • Total dependent packages: 23
    (may contain duplicates)
  • Total dependent repositories: 63
    (may contain duplicates)
  • Total versions: 105
  • Total maintainers: 1
pypi.org: dtaidistance

Distance measures for time series (Dynamic Time Warping, fast C implementation)

  • Versions: 47
  • Dependent Packages: 23
  • Dependent Repositories: 62
  • Downloads: 115,854 Last month
  • Docker Downloads: 526
Rankings
Dependent packages count: 0.9%
Downloads: 1.2%
Dependent repos count: 1.9%
Stargazers count: 2.1%
Average: 2.1%
Docker downloads count: 2.9%
Forks count: 3.8%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/wannesm/dtaidistance
  • Versions: 51
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 6.9%
Last synced: 6 months ago
conda-forge.org: dtaidistance
  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 1
Rankings
Stargazers count: 13.6%
Forks count: 14.1%
Dependent repos count: 24.1%
Average: 25.8%
Dependent packages count: 51.5%
Last synced: 6 months ago

Dependencies

requirements-dev.txt pypi
  • pytest * development
  • pytest-benchmark * development
  • pytest-env * development
requirements.txt pypi
  • Cython *
setup.py pypi
  • numpy *
.github/workflows/deploy.yml actions
  • actions/checkout v2 composite
  • actions/checkout v1 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v2 composite
  • pypa/gh-action-pypi-publish release/v1 composite