https://github.com/datasig-ac-uk/signature_mahalanobis_knn

Methodology for anomaly detection on multivariate streams using path signatures and the variance norm.


Science Score: 31.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
    Organization datasig-ac-uk has institutional domain (datasig.web.ox.ac.uk)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.8%) to scientific vocabulary

Keywords

anomaly-detection anomaly-detection-algorithm hut23-1132 hut23-1376 machine-learning path-signature rough-paths time-series time-series-anomaly-detection
Last synced: 4 months ago

Repository

Methodology for anomaly detection on multivariate streams using path signatures and the variance norm.

Basic Info
  • Host: GitHub
  • Owner: datasig-ac-uk
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 25.5 MB
Statistics
  • Stars: 7
  • Watchers: 0
  • Forks: 3
  • Open Issues: 1
  • Releases: 0
Topics
anomaly-detection anomaly-detection-algorithm hut23-1132 hut23-1376 machine-learning path-signature rough-paths time-series time-series-anomaly-detection
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License

README.md

SigMahaKNN - Signature Mahalanobis KNN method

Anomaly detection on multivariate streams with Variance Norm and Path Signature


SigMahaKNN (signature_mahalanobis_knn) combines the variance norm (a generalisation of the Mahalanobis distance) with path signatures for anomaly detection on multivariate streams. The signature_mahalanobis_knn library is a Python implementation of the SigMahaKNN method described in Dimensionless Anomaly Detection on Multivariate Streams with Variance Norm and Path Signature.
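In the full-rank case, where the variance norm reduces to the classical Mahalanobis distance, one easily checked property is that distances do not change when the features undergo an invertible linear transformation, such as a change of units or a mixing of channels. The NumPy-only sketch below checks this; it does not use the library, and the helper name mahalanobis is hypothetical.

```python
import numpy as np


def mahalanobis(x1, x2, X):
    """Classical Mahalanobis distance between x1 and x2 with respect to the
    sample covariance of X (hypothetical helper; assumes the covariance is invertible)."""
    cov = np.cov(X, rowvar=False)
    diff = x1 - x2
    # Solve cov @ z = diff instead of forming the inverse explicitly
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
x1, x2 = rng.normal(size=3), rng.normal(size=3)

# An invertible linear map: rescale two channels and mix them into a third
A = np.array([[100.0, 0.0, 0.0],
              [0.0, 0.1, 0.0],
              [1.0, 2.0, 3.0]])

d_original = mahalanobis(x1, x2, X)
# Transform the two points and every row of the corpus with the same map
d_rescaled = mahalanobis(A @ x1, A @ x2, X @ A.T)
print(np.isclose(d_original, d_rescaled))  # True
```

The invariance follows because the covariance transforms as A Σ Aᵀ while the difference transforms as A(x1 − x2), so the factors of A cancel in the quadratic form.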

To find the examples from the paper, please see the paper-examples folder which includes notebooks for downloading and running the experiments.

The key contributions of this library are:

  • A simple and efficient implementation of the variance norm distance, provided by the signature_mahalanobis_knn.Mahalanobis class. The class has two main methods:
    • The fit method to fit the variance norm distance to a training dataset
    • The distance method to compute the distance between two numpy arrays x1 and x2
  • A simple and efficient implementation of the SigMahaKNN method, provided by the signature_mahalanobis_knn.SignatureMahalanobisKNN class. The class has two main methods:
    • The fit method to fit a model to a training dataset
      • The fit method can take as input either a corpus of streams (whose path signatures are computed using the sktime library with esig or iisignature) or a corpus of precomputed path signatures. This also opens up the possibility of using other feature representations with the variance norm distance for anomaly detection
      • Currently, the library uses either sklearn's NearestNeighbors class or pynndescent's NNDescent class to efficiently compute the nearest neighbour distances from a new data point to the corpus of training data
    • The conformance method to compute the conformance score for a set of new data points
      • Like the fit method, the conformance method can take as input either a corpus of streams (whose path signatures are computed using the sktime library with esig or iisignature) or a corpus of path signatures
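The nearest-neighbour step can be illustrated without sklearn or pynndescent. After projecting onto the principal directions of the data and dividing by the square root of each covariance eigenvalue ("whitening"), ordinary Euclidean distance equals the Mahalanobis distance for differences lying in the span of the data, so any Euclidean KNN routine can be reused. The sketch below assumes the conformance score is the distance from a test point to its nearest reference point, with the covariance estimated on one half of the corpus and distances measured to the other half. fit_whitener and conformance_scores are hypothetical helpers, brute-force search stands in for a KNN index, and this is not the library's implementation.

```python
import numpy as np


def fit_whitener(X_fit, svd_thres=1e-12):
    """Return a map under which Euclidean distance equals the Mahalanobis distance
    of X_fit, for differences lying in the span of the centred data (hypothetical)."""
    centred = X_fit - X_fit.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    keep = s > svd_thres * s[0]  # drop numerically zero directions
    # Covariance eigenvalues are s**2 / (n - 1); divide coordinates by their square root
    scale = np.sqrt(X_fit.shape[0] - 1) / s[keep]
    basis = vt[keep]
    # Components outside the kept directions are discarded by this projection
    return lambda Z: (np.atleast_2d(Z) @ basis.T) * scale


def conformance_scores(X_fit, X_ref, X_test):
    """Hypothetical helper: distance from each test point to its nearest
    reference point, measured in the whitened coordinates (brute force)."""
    whiten = fit_whitener(X_fit)
    ref, test = whiten(X_ref), whiten(X_test)
    # Pairwise Euclidean distances with shape (n_test, n_ref)
    dists = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=-1)
    return dists.min(axis=1)


rng = np.random.default_rng(1)
corpus = rng.normal(size=(300, 4))
X_fit, X_ref = corpus[:150], corpus[150:]  # assumed split of the corpus in two
# Two points copied from the reference set, plus one far-away point
X_test = np.vstack([X_ref[:2], 10 * np.ones((1, 4))])
scores = conformance_scores(X_fit, X_ref, X_test)
print(scores)
```

Points that coincide with reference data score (numerically) zero, while the distant point receives a large score. A KNN index replaces the quadratic-cost pairwise step when the corpus is large.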

Installation

The SigMahaKNN library is available on PyPI and can be installed with pip:

pip install signature_mahalanobis_knn

Usage

As noted above, the signature_mahalanobis_knn library has two main classes: Mahalanobis, a class for computing the variance norm distance, and SignatureMahalanobisKNN, a class for computing the conformance score for a set of new data points.

Computing the variance norm distance

To compute the variance norm (a generalisation of the Mahalanobis distance) for a pair of data points x1 and x2 given a corpus of training data X (a two-dimensional numpy array), you can use the Mahalanobis class as follows:

```python
import numpy as np
from signature_mahalanobis_knn import Mahalanobis

# create a corpus of training data
X = np.random.rand(100, 10)

# initialise the Mahalanobis class and fit it to the corpus
mahalanobis = Mahalanobis()
mahalanobis.fit(X)

# compute the variance norm distance between two data points
x1 = np.random.rand(10)
x2 = np.random.rand(10)
distance = mahalanobis.distance(x1, x2)
```

Here we have provided an example using the default initialisation of the Mahalanobis class. A few parameters can also be set when initialising the class (see details in Dimensionless Anomaly Detection on Multivariate Streams with Variance Norm and Path Signature):

  • subspace_thres: (float) threshold for deciding whether or not a point is in the subspace, default is 1e-3
  • svd_thres: (float) threshold for deciding the numerical rank of the data matrix, default is 1e-12
  • zero_thres: (float) threshold for deciding whether the distance should be set to zero, default is 1e-12
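To make the roles of the three thresholds concrete, here is a simplified, hypothetical NumPy class, VarianceNormSketch, that computes a variance-norm distance from the SVD of the centred corpus. How each threshold is applied here (svd_thres relative to the largest singular value, subspace_thres relative to the size of the difference, zero_thres as an absolute cut-off) is an assumption for illustration only; the paper and mahal_distance.py give the exact definitions.

```python
import numpy as np


class VarianceNormSketch:
    """Hypothetical, simplified variance-norm distance (not the library's Mahalanobis
    class). Threshold names mirror the documented parameters; their use here is assumed."""

    def __init__(self, subspace_thres=1e-3, svd_thres=1e-12, zero_thres=1e-12):
        self.subspace_thres = subspace_thres
        self.svd_thres = svd_thres
        self.zero_thres = zero_thres

    def fit(self, X):
        centred = X - X.mean(axis=0)
        _, s, vt = np.linalg.svd(centred, full_matrices=False)
        # Numerical rank: count singular values above svd_thres times the largest one
        rank = int(np.sum(s > self.svd_thres * s[0]))
        self.basis = vt[:rank]  # orthonormal rows spanning the data subspace
        self.eigvals = s[:rank] ** 2 / (X.shape[0] - 1)  # covariance eigenvalues
        return self

    def distance(self, x1, x2):
        diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
        coords = self.basis @ diff  # coordinates of diff in the data subspace
        residual = diff - self.basis.T @ coords  # part of diff outside the subspace
        # Assumed rule: a difference that leaves the subspace has infinite norm
        if np.linalg.norm(residual) > self.subspace_thres * np.linalg.norm(diff):
            return np.inf
        dist = float(np.sqrt(np.sum(coords**2 / self.eigvals)))
        # Assumed rule: distances below zero_thres are reported as exactly zero
        return 0.0 if dist < self.zero_thres else dist


rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
vn = VarianceNormSketch().fit(X)
print(vn.distance(X[0], X[1]))
```

With a full-rank corpus this agrees with the classical Mahalanobis distance. With a rank-deficient corpus (for example, one channel that is constant), differences within the data subspace still get finite distances, while any difference with a component outside it is assigned an infinite distance.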

Using the SigMahaKNN method for anomaly detection

To use the SigMahaKNN method for anomaly detection of multivariate streams, you can use the SignatureMahalanobisKNN class by first initialising the class and then using the fit and conformance methods to fit a model to a training dataset of streams and compute the conformance score for a set of new data streams, respectively:

```python
import numpy as np
from signature_mahalanobis_knn import SignatureMahalanobisKNN

# create a corpus of training data
# X is a three-dimensional numpy array with shape (n_samples, length, channels)
X = np.random.rand(100, 10, 3)

# initialise the SignatureMahalanobisKNN class
sigmahaknn = SignatureMahalanobisKNN()
sigmahaknn.fit(
    knn_library="sklearn",
    X_train=X,
    signature_kwargs={"depth": 3},
)

# create a set of test data streams
Y = np.random.rand(10, 10, 3)

# compute the conformance score for the test data streams
conformance_scores = sigmahaknn.conformance(X_test=Y, n_neighbors=5)
```

Note that here we have provided an example in which a corpus of streams is passed to fit and conformance to compute the conformance scores. The sktime library is used to compute the path signatures of the streams.

However, if you have already computed signatures or are using another feature representation method, you can pass the corpus of signatures to the fit and conformance methods instead of the streams. To do this, pass the signatures_train and signatures_test arguments to the fit and conformance methods, respectively.

```python
import numpy as np
from signature_mahalanobis_knn import SignatureMahalanobisKNN

# create a corpus of training data (signatures or other feature representations)
# features is a two-dimensional numpy array with shape (n_samples, n_features)
features = np.random.rand(100, 10)

# initialise the SignatureMahalanobisKNN class
sigmahaknn = SignatureMahalanobisKNN()
sigmahaknn.fit(
    knn_library="sklearn",
    signatures_train=features,
)

# create a set of test features
features_y = np.random.rand(10, 10)

# compute the conformance score for the test features
conformance_scores = sigmahaknn.conformance(signatures_test=features_y, n_neighbors=5)
```
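For readers who want to produce signature features by hand, the sketch below computes a depth-2 truncated path signature of each stream with plain NumPy, treating the stream as a piecewise-linear path and adding one segment at a time via Chen's identity. It is an illustration only: the library computes signatures through sktime (with esig or iisignature), which may use a different depth (depth 3 in the streams example above), term ordering, or stream augmentations. signature_depth2 and signature_features are hypothetical names.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path of shape (length, channels).
    Returns (level1, level2) with shapes (d,) and (d, d); the constant term 1 is omitted."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment delta;
        # level2 must be updated with the level1 value from before this segment
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2


def signature_features(streams):
    """Flattened depth-2 signatures for streams of shape (n_samples, length, channels)."""
    rows = []
    for stream in streams:
        s1, s2 = signature_depth2(stream)
        rows.append(np.concatenate([s1, s2.ravel()]))
    return np.array(rows)


# Streams with the same shape as in the earlier example: (n_samples, length, channels)
X = np.random.default_rng(2).random((100, 10, 3))
features = signature_features(X)  # shape (100, 3 + 3 * 3) = (100, 12)
```

The resulting array has the (n_samples, n_features) layout used above, so it could be passed as signatures_train or signatures_test. With 3 channels, depth 2 gives 3 + 9 = 12 features; each additional level multiplies the newest level's size by the number of channels.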

Repo structure

The core implementation of the SigMahaKNN method is in the src/signature_mahalanobis_knn folder:

  • mahal_distance.py contains the implementation of the Mahalanobis class to compute the variance norm distance
  • sig_maha_knn.py contains the implementation of the SignatureMahalanobisKNN class to compute the conformance scores for a set of new data points against a corpus of training data
  • utils.py contains some utility functions that are useful for the library
  • baselines/ is a folder containing some of the baseline methods we look at in the paper - see paper-examples/README.md for more details

Examples

There are various examples in the paper-examples folder.

Contributing

To take advantage of pre-commit, which will automatically format your code and run some basic checks before you commit:

pip install pre-commit  # or brew install pre-commit on macOS
pre-commit install  # will install a pre-commit hook into the git repo

After doing this, each time you commit, some linters will be applied to format the codebase. You can also run pre-commit run --all-files to run the checks on all files at any time.

See CONTRIBUTING.md for more information on running the test suite using nox.

Owner

  • Name: DataSig
  • Login: datasig-ac-uk
  • Kind: organization

A rough path between mathematics and data science

GitHub Events

Total
  • Watch event: 1
  • Issue comment event: 2
  • Push event: 1
  • Pull request event: 1
Last Year
  • Watch event: 1
  • Issue comment event: 2
  • Push event: 1
  • Pull request event: 1

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 21 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
pypi.org: signature-mahalanobis-knn

Using Nearest Neighbour-Variance Norm with Path Signatures for anomaly detection of streams

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 21 Last month
Rankings
Dependent packages count: 7.3%
Forks count: 29.9%
Average: 36.1%
Stargazers count: 38.9%
Dependent repos count: 68.4%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/cd.yml actions
  • actions/checkout v4 composite
  • actions/download-artifact v3 composite
  • actions/upload-artifact v3 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/ci.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3.1.4 composite
  • pre-commit/action v3.0.0 composite
pyproject.toml pypi
  • numba *
  • numpy *
  • scikit-learn *
  • sktime @git+https://github.com/sz85512678/sktime