https://github.com/baggepinnen/matrixprofile.jl

Time-series analysis using the Matrix profile in Julia

Keywords

anomaly-detection change-point-detection clustering data-mining dsp matrix-profile motif-analysis motif-discovery outlier-detection predictive-maintenance signal-processing similarity-search time-series time-series-analysis

Keywords from Contributors

particle-filter dynamical-systems computer-algebra-system mathematics bayesian-optimization bohb global-optimization hyperband hyperparameter-optimization parameter-tuning

Last synced: 5 months ago

Repository

Time-series analysis using the Matrix profile in Julia

Basic Info

Host: GitHub
Owner: baggepinnen
License: mit
Language: Julia
Default Branch: master
Homepage:
Size: 227 KB

Statistics

Stars: 34
Watchers: 5
Forks: 6
Open Issues: 1
Releases: 10

Topics

anomaly-detection change-point-detection clustering data-mining dsp matrix-profile motif-analysis motif-discovery outlier-detection predictive-maintenance signal-processing similarity-search time-series time-series-analysis

Created almost 6 years ago · Last pushed about 1 year ago

Metadata Files

Readme License

README.md

MatrixProfile

Time-series analysis using the matrix profile. The matrix profile `P` tells you which sub-sequences of a time series `T` are similar to each other, and which are most dissimilar from all others. This allows you to find repeated patterns, or motifs, as well as outliers and anomalies. Here's a blog post that introduces the matrix profile with some nice figures and examples: https://medium.com/towards-data-science/introduction-to-matrix-profiles-5568f3375d90 (sorry for linking to a Medium article, the one I used to link to has vanished).
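To make the definition concrete, here is a minimal brute-force sketch of what a matrix profile contains. It is not the package's implementation (which is far faster). The helper names `znorm` and `naive_matrix_profile`, the use of z-normalized Euclidean distance, and the exclusion zone of half a window are all assumptions made for illustration.

```julia
using Statistics  # mean, std

# z-normalize a window so that distances compare shape, not offset or scale
znorm(w) = (w .- mean(w)) ./ std(w)

# Brute-force matrix profile: for each window, the distance to its nearest
# neighbour outside the exclusion zone, and the index of that neighbour.
function naive_matrix_profile(T::AbstractVector{<:Real}, m::Integer)
    n = length(T) - m + 1                    # number of windows
    W = [znorm(T[i:i+m-1]) for i in 1:n]     # all z-normalized windows
    P = fill(Inf, n)                         # profile values
    I = zeros(Int, n)                        # profile indices
    excl = m ÷ 2                             # assumed exclusion zone (trivial matches)
    for i in 1:n, j in 1:n
        abs(i - j) <= excl && continue
        d = sqrt(sum(abs2, W[i] .- W[j]))
        if d < P[i]
            P[i], I[i] = d, j
        end
    end
    return P, I
end

# A sine pattern planted twice in noise, starting at samples 21 and 52
y0 = sin.(2pi .* range(0, stop=1, step=1/10))
T  = [randn(20); y0; randn(20); y0; randn(20)]
P, I = naive_matrix_profile(T, length(y0))
```

In this sketch the two planted windows are each other's nearest neighbours, so `P` drops to zero at both onsets and `I` points from one onset to the other.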

Installation

This package is registered and can be installed with
```julia
using Pkg
pkg"add MatrixProfile"
```

Usage

The function `matrix_profile` returns the matrix profile and profile indices. Here's an example where we insert a repeated pattern in an otherwise random time series.
```julia
using MatrixProfile, Plots
t  = range(0, stop=1, step=1/10)
y0 = sin.(2pi .* t)
T  = [randn(20); y0; randn(20); y0; randn(20)]
window_length = length(y0)
profile = matrix_profile(T, window_length)
plot(profile) # Should have minima at 21 and 52
```

The matrix profile has two sharp minima at the onsets of the repeated pattern. The parameter `window_length` determines the length of the pattern to search for.

Analysis across different time-series

If called like
```julia
profile = matrix_profile(A, B, m, [dist])
```
consecutive windows of `A` will be compared to the entire `B`. The resulting matrix profile will have a length that depends on `B`, and indicates with small values when a window of `A` appeared in `B`, and with large values when no window in `A` matched the corresponding window in `B`. This function is not symmetric: in general, `matrix_profile(A, B) != matrix_profile(B, A)`.
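The asymmetry can be seen in a brute-force sketch. This reflects one reading of the description above (one profile value per window of `B`, holding the distance to the closest window of `A`) and is not the package's code. The helper `naive_ab_profile` and the z-normalized Euclidean distance are assumptions.

```julia
using Statistics

znorm(w) = (w .- mean(w)) ./ std(w)

# For every window of B, the distance to the closest window of A.
function naive_ab_profile(A, B, m)
    WA = [znorm(A[i:i+m-1]) for i in 1:length(A)-m+1]
    WB = [znorm(B[j:j+m-1]) for j in 1:length(B)-m+1]
    return [minimum(sqrt(sum(abs2, wa .- wb)) for wa in WA) for wb in WB]
end

pattern = sin.(2pi .* range(0, stop=1, length=16))
A = [randn(30); pattern; randn(30)]     # 76 samples, pattern starts at 31
B = [randn(50); pattern; randn(100)]    # 166 samples, pattern starts at 51
m = length(pattern)

P_AB = naive_ab_profile(A, B, m)   # one value per window of B
P_BA = naive_ab_profile(B, A, m)   # one value per window of A
```

Swapping the arguments changes both the length of the result and the time axis it refers to: `P_AB` has its minimum where the pattern sits in `B`, `P_BA` where it sits in `A`.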

Runtime

`matrix_profile` benefits greatly in speed from the use of `Float32` instead of `Float64`, but may accumulate some error for very long time series (> 10⁶ perhaps). The computational time scales as the square of the length of `T`, but is invariant to the window length. Calculating the matrix profile of 2^17 ≈ 131k points takes less than a minute on a laptop.
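Why single precision can drift over long series is easy to demonstrate with a plain sequential sum. The sketch below is only an illustration of floating-point accumulation in general. It says nothing about how the package organizes its internal updates, and `running_sum` is a hypothetical helper.

```julia
# Sequential sum, one element at a time, carried out in precision F.
# (Base.sum uses pairwise summation, which would hide the effect.)
function running_sum(::Type{F}, x) where {F<:AbstractFloat}
    s = zero(F)
    for v in x
        s += F(v)
    end
    return s
end

n = 10^6
x = fill(0.1, n)

err32 = abs(running_sum(Float32, x) - 0.1 * n)   # drift in single precision
err64 = abs(running_sum(Float64, x) - 0.1 * n)   # drift in double precision
```

After a million additions the `Float32` result is off by hundreds, while the `Float64` result is off by roughly a millionth, which is why checking a `Float32` profile against a `Float64` one on a shorter excerpt can be worthwhile.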

If `dist` is provided, a generic (slow) method is used. If `dist` is not provided and the inputs `A`, `B` are one-dimensional vectors of numbers, a fast method is used. The fast method handles long time series: with `length(A) = length(B) = 100k` it takes less than 30 s.

If the time series is sampled very fast in relation to the time scale on which interesting things happen, you may try the function `resample(T, fraction::Real)` to reduce the amount of data to process. For example, `resample(T, desired_samplerate/original_samplerate)`.

Motif grouping

Using the fake data from the example above, we can do
```julia
k = 2
mot = motifs(profile, k; r=2, th=5)
plot(profile, mot)
plot(mot) # Motifs can be plotted on their own for a different view.
```
- `k` is the number of motifs to extract
- `r` controls how similar two windows must be to belong to the same motif. A higher value leads to more windows being grouped together.
- `th` is a threshold on how nearby in time two motifs are allowed to be.

Also see the function `anomalies(profile)` to find anomalies (or outliers) in the data, sometimes called discords.
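Conceptually, discords are the windows with the largest profile values. A minimal sketch on a plain vector of profile values follows. The helper `top_discords` and its exclusion radius `excl` are hypothetical and are not the package's `anomalies` implementation.

```julia
# Indices of the k largest profile values, at least `excl` samples apart
function top_discords(P::AbstractVector{<:Real}, k::Integer, excl::Integer)
    P = float.(P)                  # broadcasting allocates a copy we can mask
    found = Int[]
    for _ in 1:k
        all(==(-Inf), P) && break  # nothing left to pick
        i = argmax(P)
        push!(found, i)
        lo, hi = max(1, i - excl), min(length(P), i + excl)
        P[lo:hi] .= -Inf           # mask the neighbourhood of the pick
    end
    return found
end

P = [1.0, 1.2, 5.0, 4.8, 1.1, 0.9, 3.0, 1.0]
top_discords(P, 2, 1)   # returns [3, 7]
```

Masking the neighbourhood of each pick prevents the same anomaly from being reported several times through overlapping windows.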

Arbitrary metrics and input types

The matrix profile can be computed for any sequence of things that has a "time axis" and a notion of distance. The examples so far have dealt with one-dimensional arrays of real numbers with the Euclidean metric, for which the matrix profile is particularly efficient to compute. We do not have to limit ourselves to this setting, though, and `matrix_profile` accepts any array-like object and any distance function of the form `dist(x,y)`. The interface looks like this
```julia
profile = matrix_profile(T, m, dist)
```
If `T` is a high-dimensional array, time is considered to be the last axis. `T` can also be a vector of arbitrary Julia objects for which the function `dist(x,y)` is defined. Note that if `T` has a long time dimension, the matrix profile will be expensive to compute, 𝒪(n²log(n)). This method does not make use of the STOMP algorithm, since STOMP is limited to one-dimensional data under the Euclidean metric.
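The idea of a profile over arbitrary objects can be sketched without the package. In the brute-force version below, `generic_profile` and `hamming` are hypothetical names, and the convention that `dist` receives two whole windows is an assumption of this sketch rather than a statement about the package's calling convention.

```julia
# Brute-force matrix profile for any indexable sequence and any window distance.
# ASSUMPTION: `dist` is called with two windows of length m.
function generic_profile(T::AbstractVector, m::Integer, dist)
    n = length(T) - m + 1
    P = fill(Inf, n)
    I = zeros(Int, n)
    for i in 1:n, j in 1:n
        abs(i - j) < m && continue             # skip overlapping windows
        d = dist(view(T, i:i+m-1), view(T, j:j+m-1))
        if d < P[i]
            P[i], I[i] = d, j
        end
    end
    return P, I
end

# Hamming distance between two windows of symbols
hamming(x, y) = count(a != b for (a, b) in zip(x, y))

T = collect("xxxxxabcdeyyyyyabcdezzzzz")   # Vector{Char}, "abcde" at 6 and 16
P, I = generic_profile(T, 5, hamming)
```

Here the sequence is a vector of characters and the repeated word `abcde` shows up as a zero in `P` at both of its positions.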

MP distance

See `mpdist(A,B,m)`.

Segmentation / change-point detection

The most likely segmentation of a time series into two is calculated using `segment(p::Profile)`. A more detailed analysis can be performed using `sp = segment_profile(p::Profile)`, which returns a vector of the same length as `p`, where a low value at index `i` indicates that few nearest-neighbor arcs pass over index `i`. `sp` thus forms a sort of "segmentation profile".
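The notion of arcs passing over an index can be illustrated on a bare vector of profile indices. The sketch below only counts raw crossings. The edge correction described in the Matrix Profile VIII paper is omitted, and `arc_counts` is a hypothetical helper rather than the package's `segment_profile`.

```julia
# Count, for every position, how many nearest-neighbour arcs pass over it.
# `I[j]` is the index of the nearest neighbour of window j; an arc between
# j and I[j] is taken to cover the half-open range [min, max).
function arc_counts(I::AbstractVector{<:Integer})
    n = length(I)
    delta = zeros(Int, n + 1)
    for (j, nn) in enumerate(I)
        lo, hi = minmax(j, nn)
        delta[lo] += 1        # arc starts covering at lo
        delta[hi] -= 1        # and stops covering at hi
    end
    return cumsum(delta)[1:n]
end

# Two regimes: windows 1..4 only match each other, and so do windows 5..8
I = [3, 4, 1, 2, 7, 8, 5, 6]
arc_counts(I)   # returns [2, 4, 2, 0, 2, 4, 2, 0]
```

No arc spans the boundary between the two regimes, so the count falls to zero at index 4. The zero at the last index is an edge effect that the uncorrected count cannot distinguish from a real change point.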

Time series snippets

To summarize a time series in the form of a small number of snippets, we have the function `snippets`.
```julia
snips = snippets(T, 3, 100)
plot(snips)
```

The arguments to `snippets` are
- The time series
- The desired number of snippets
- The length of each snippet
- Optional `m`: the length of a small subsequence to be used internally, defaults to 10% of the snippet length.

This function can take a while to run for long time series. For `length(T) = 15k`, it takes less than a minute on a laptop. The time depends strongly on the internal window length parameter.

MASS

`mass(x::AbstractVector{T}, y::AbstractVector{T}, k)`
- `x`: Data
- `y`: Query
- `k`: Window size, must be at least `length(y)`

DAMP

damp(T, m, ind = length(T) ÷ 10)

DAMP algorithm for anomaly detection in time series with repeated patterns.
- `T`: Time series
- `m`: Subsequence length (choose as approximate period of repeating patterns)
- `ind`: Location of split point between training and test data, defaults to 10% of the data.

Returns the left-approximate Matrix Profile. Large values indicate anomalies.
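A left matrix profile compares each window only with the past. The brute-force sketch below computes it exactly and without any of DAMP's pruning, so it illustrates the returned quantity rather than the algorithm. The helper `naive_left_profile`, the z-normalized distance, and the choice to leave positions before `ind` at zero are assumptions.

```julia
using Statistics

znorm(w) = (w .- mean(w)) ./ std(w)

# Brute-force left matrix profile: each window from `ind` onwards is compared
# only with windows that end before it starts.
function naive_left_profile(T::AbstractVector{<:Real}, m::Integer, ind::Integer)
    n = length(T) - m + 1
    W = [znorm(T[i:i+m-1]) for i in 1:n]
    P = zeros(n)                               # positions before `ind` stay at 0
    for i in max(ind, m + 1):n
        P[i] = minimum(sqrt(sum(abs2, W[i] .- W[j])) for j in 1:i-m)
    end
    return P
end

period = 20
T = repeat(sin.(2pi .* (0:period-1) ./ period), 10)   # 200 samples, periodic
T[150:154] .+= 2.0                                      # injected anomaly
P = naive_left_profile(T, period, length(T) ÷ 10)
argmax(P)   # falls among the windows that overlap samples 150..154
```

Windows of the clean periodic signal always find an identical window one or more periods earlier, so only windows touching the injected bump receive a large value.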

References

- The STOMP algorithm used in `matrix_profile` is detailed in the paper Matrix profile II.
- The algorithm used in `segment` and `segment_profile` comes from Matrix profile VIII.
- The MP distance is described in Matrix profile XII.
- The algorithm for extraction of time-series snippets comes from Matrix profile XIII.
- The DAMP algorithm comes from https://www.cs.ucr.edu/~eamonn/DAMPlongversion.pdf

Owner

Name: Fredrik Bagge Carlson
Login: baggepinnen
Kind: user
Location: Lund, Sweden

Website: baggepinnen.github.io
Twitter: baggepinnen
Repositories: 59
Profile: https://github.com/baggepinnen

Control systems, system identification, signal processing and machine learning

GitHub Events

Total

Watch event: 5
Push event: 1
Fork event: 1

Last Year

Watch event: 5
Push event: 1
Fork event: 1

Committers

Last synced: almost 3 years ago

All Time

Total Commits: 61
Total Committers: 3
Avg Commits per committer: 20.333
Development Distribution Score (DDS): 0.443

Top Committers

Name	Email	Commits
Fredrik Bagge Carlson	b**n@g**m	34
Fredrik Bagge Carlson	c**b@u**g	21
github-actions[bot]	4**]@u**m	6

Committer Domains (Top 20 + Academic)

ulund.org: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 3
Total pull requests: 16
Average time to close issues: 4 days
Average time to close pull requests: 28 days
Total issue authors: 2
Total pull request authors: 2
Average comments per issue: 10.0
Average comments per pull request: 0.31
Merged pull requests: 12
Bot issues: 0
Bot pull requests: 11

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

baggepinnen (2)
JuliaTagBot (1)

Pull Request Authors

github-actions[bot] (11)
baggepinnen (5)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- julia 5 total

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 10

juliahub.com: MatrixProfile

Time-series analysis using the Matrix profile in Julia

Documentation: https://docs.juliahub.com/General/MatrixProfile/stable/
License: MIT
Latest release: 1.1.1
published over 2 years ago

Versions: 10
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 5 Total

Rankings

Dependent repos count: 9.9%

Stargazers count: 19.2%

Forks count: 19.4%

Average: 21.9%

Dependent packages count: 38.9%

Last synced: 6 months ago

Science Score: 13.0%
