Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: HPI-Information-Systems
  • License: mit
  • Language: Rust
  • Default Branch: main
  • Size: 956 KB
Statistics
  • Stars: 7
  • Watchers: 7
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Series2Graph++

[![release info](https://img.shields.io/badge/Release-1.1.0-blue)](https://gitlab.hpi.de/phillip.wenig/s2gpp/-/releases/1.1.0) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![pipeline status](https://gitlab.hpi.de/akita/s2gpp/badges/main/pipeline.svg)](https://gitlab.hpi.de/akita/s2gpp/-/commits/main) [![dependency status](https://deps.rs/crate/s2gpp/1.1.0/status.svg)](https://deps.rs/crate/s2gpp/1.1.0)

Series2Graph++ (S2G++) is a time series anomaly detection algorithm based on the Series2Graph (S2G) and DADS algorithms. S2G++ can handle multivariate time series, whereas S2G and DADS handle only univariate time series. S2G++ also borrows ideas from DADS to run distributed across a computer cluster. It is written in Rust and leverages the actix and actix-telepathy libraries.

Quick Start

Requirements

  • Rust 1.58
  • openblas
  • (Docker)

To make openblas available to the Rust build process, run the following on Debian (Linux):

```shell
sudo apt install build-essential gfortran libopenblas-base libopenblas-dev gcc
```

Installation

From source

```shell
git clone https://gitlab.hpi.de/akita/s2gpp
cd s2gpp
cargo build
```

Docker

The base image akita/rust-base must be available on your machine.

```shell
git clone https://gitlab.hpi.de/akita/s2gpp
cd s2gpp
docker build -t s2gpp .
```

Usage (bin)

Parameters

Pattern:

```shell
s2gpp --local-host <IP:Port> --pattern-length <Int> --latent <Int> --query-length <Int> --rate <Int> --threads <Int> --cluster-nodes <Int> --score-output-path <Path> [main --data-path <Path> | sub --mainhost <IP:Port>]
```

S2G++ expects one of two sub-commands, each with its own parameters:

  • main (The head computer in a cluster)
    • data-path (The path to the input time series)
  • sub (The other computers in a cluster; only necessary in a distributed setting)
    • mainhost (The IP address and port of the main computer in the cluster)

The general parameters must be given before the sub-command:

  • local-host (The IP address and port to bind the listener to.)
  • pattern-length (Size of the sliding window. It is independent of the anomaly length, but should ideally be larger.)
  • latent (Size of the latent embedding space. This space is the input for the subsequent PCA calculation.)
  • query-length (Size of the sliding windows used to find anomalies (query subsequences). query-length must be >= pattern-length!)
  • rate (Number of angles used to extract pattern nodes. A higher value leads to higher precision at the cost of increased computation time.)
  • threads (Number of helper threads started in addition to the main thread. (min=1))
  • cluster-nodes (Size of the computer cluster.)
  • score-output-path (Path the scores are written to.)
  • column-start-idx (How many columns to skip)
  • column-end-idx (Until which column to use (exclusive). Can also take negative numbers to count from the end.)
  • self-correction (Whether S2G++ will correct the direction of the time embedding if too few transactions are available)
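
As a rough illustration of how the pattern and the constraints above fit together, the following Python sketch assembles an argument list for either sub-command. The helper `build_s2gpp_args` is hypothetical and not part of S2G++; the flag names are taken from the pattern above, and the numeric values in the example call are arbitrary placeholders rather than recommended settings.

```python
from typing import List, Optional


def build_s2gpp_args(
    local_host: str,
    pattern_length: int,
    latent: int,
    query_length: int,
    rate: int,
    threads: int,
    cluster_nodes: int,
    score_output_path: str,
    data_path: Optional[str] = None,
    mainhost: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build the s2gpp argument list from the documented pattern."""
    # Constraints stated in the parameter descriptions.
    if query_length < pattern_length:
        raise ValueError("query-length must be >= pattern-length")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Exactly one sub-command: `main` needs data_path, `sub` needs mainhost.
    if (data_path is None) == (mainhost is None):
        raise ValueError("pass exactly one of data_path (main) or mainhost (sub)")

    # General parameters come first, the sub-command last.
    args = [
        "s2gpp",
        "--local-host", local_host,
        "--pattern-length", str(pattern_length),
        "--latent", str(latent),
        "--query-length", str(query_length),
        "--rate", str(rate),
        "--threads", str(threads),
        "--cluster-nodes", str(cluster_nodes),
        "--score-output-path", score_output_path,
    ]
    if data_path is not None:
        args += ["main", "--data-path", data_path]
    else:
        args += ["sub", "--mainhost", mainhost]
    return args


# Placeholder values for a two-node setup on one machine.
main_args = build_s2gpp_args(
    "127.0.0.1:1992", 100, 25, 150, 100, 2, 2, "scores.csv",
    data_path="data/ts_0.csv",
)
print(" ".join(main_args))
```

The resulting list could be handed to `subprocess.run` on a machine where the `s2gpp` binary is on the path.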

Input Format

The time series is expected as a CSV file with a header. Each column represents one channel of the time series. Time series files sometimes also include labels and an index. You can skip such columns with the column-start-idx / column-end-idx range pattern, which behaves like a Python range.
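
The range semantics can be pictured with plain Python slicing. The sketch below is only an analogy, not S2G++ code: `select_channels` is a hypothetical helper, the sample data is made up, and its default arguments are assumptions rather than the defaults of S2G++.

```python
import csv
import io
from typing import List, Optional, Tuple


def select_channels(
    csv_text: str,
    column_start_idx: int = 0,
    column_end_idx: Optional[int] = None,
) -> Tuple[List[str], List[List[float]]]:
    """Hypothetical helper: keep columns in [column_start_idx, column_end_idx)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the input is expected to carry a header row
    names = header[column_start_idx:column_end_idx]
    rows = [
        [float(value) for value in row[column_start_idx:column_end_idx]]
        for row in reader
        if row  # ignore blank lines
    ]
    return names, rows


# Made-up file with an index column in front and a label column at the end.
sample = """timestamp,channel_a,channel_b,is_anomaly
0,0.10,1.00,0
1,0.20,0.90,0
2,0.90,0.10,1
"""

# Skip one column at the start; -1 counts from the end and is exclusive.
names, rows = select_channels(sample, column_start_idx=1, column_end_idx=-1)
print(names)
print(rows)
```

Here a start index of 1 drops the index column and an end index of -1 drops the trailing label column, leaving only the two channels.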

Usage (lib)

Cargo.toml

```toml
[dependencies]
s2gpp = "1.1.0"
```

your Rust app

```rust
// Array1 and Array2 come from the ndarray crate.
use ndarray::{Array1, Array2};

fn some_fn(timeseries: Array2<f32>) -> Result<Array1<f32>, ()> {
    let params = s2gpp::Parameters::default();
    let anomaly_score = s2gpp::s2gpp(params, Some(timeseries))?.unwrap();
    Ok(anomaly_score)
}
```

Python

We have wrapped the Rust code in a Python package that can be used without installing Rust.

Installation

PyPI

```shell
pip install s2gpp
```

Build with Docker

```shell
make build-docker
pip install wheels/s2gpp-*.whl
```

Build from Source

```shell
make install
```

Usage

Single Machine

```python
from s2gpp import Series2GraphPP
import pandas as pd

ts = pd.read_csv("data/ts_0.csv").values

model = Series2GraphPP(pattern_length=100)
anomaly_scores = model.fit_predict(ts)
```
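
To try the example without a dataset at hand, a small file in the expected shape (CSV with header, one column per channel) can be generated first. The sketch below is an assumption-laden illustration: `write_synthetic_series`, the channel names, and the inverted stretch that mimics a correlation anomaly are all made up for this purpose and are not part of the s2gpp package.

```python
import csv
import math
import random
from pathlib import Path


def write_synthetic_series(
    path: Path,
    n_points: int = 1000,
    anomaly_start: int = 600,
    anomaly_end: int = 650,
    seed: int = 42,
) -> None:
    """Hypothetical helper: write a two-channel CSV with header to `path`."""
    rng = random.Random(seed)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["channel_a", "channel_b"])
        for t in range(n_points):
            base = math.sin(2 * math.pi * t / 50)
            a = base + rng.gauss(0, 0.05)
            # channel_b follows channel_a, except in the anomalous stretch,
            # where the sign is flipped to break the correlation.
            sign = -1.0 if anomaly_start <= t < anomaly_end else 1.0
            b = sign * base + rng.gauss(0, 0.05)
            writer.writerow([f"{a:.6f}", f"{b:.6f}"])
```

Calling `write_synthetic_series(Path("data/ts_0.csv"))` would create a file that the `pd.read_csv` call above can load.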

Distributed

```python
from s2gpp import DistributedSeries2GraphPP
from pathlib import Path


# run on one machine
def main_node():
    dataset_path = Path("data/ts_0.csv")

    model = DistributedSeries2GraphPP.main(local_host="127.0.0.1:1992", n_cluster_nodes=2, pattern_length=100)
    model.fit_predict(dataset_path)


# run on other machine
def sub_node():
    model = DistributedSeries2GraphPP.sub(local_host="127.0.0.1:1993", mainhost="127.0.0.1:1992", n_cluster_nodes=2, pattern_length=100)
    model.fit_predict()
```

Cite

Please cite this work when using it!

```bibtex
@inproceedings{wenig2024s2gpp,
  author    = {Wenig, Phillip and Papenbrock, Thorsten},
  title     = {{Series2Graph++: Distributed Detection of Correlation Anomalies in Multivariate Time Series}},
  booktitle = {{DaWaK 2024}},
  year      = {2024},
  doi       = {10.1007/978-3-031-68323-7_17}
}
```

References

[1] P. Boniol and T. Palpanas. Series2Graph: Graph-based Subsequence Anomaly Detection in Time Series. PVLDB (2020).

[2] J. Schneider, P. Wenig, and T. Papenbrock. Distributed detection of sequential anomalies in univariate time series. The VLDB Journal 30, 579–602 (2021).

Owner

  • Name: Information Systems at HPI
  • Login: HPI-Information-Systems
  • Kind: organization
  • Email: felix.naumann@hpi.de
  • Location: Germany

Ensuring reproducibility for all of our research

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Wenig
    given-names: Phillip
    orcid: https://orcid.org/0000-0002-8942-4322
title: "Series2Graph++"
version: 1.1.0
# doi: ...
date-released: 2023-12-01

GitHub Events

Total
  • Watch event: 3
Last Year
  • Watch event: 3

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 7 last-month
    • cargo 3,844 total
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 8
  • Total maintainers: 2
pypi.org: s2gpp

Algorithm for Highly Efficient Detection of Correlation Anomalies in Multivariate Time Series

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 7 Last month
Rankings
Dependent packages count: 10.1%
Dependent repos count: 21.6%
Stargazers count: 27.9%
Average: 29.5%
Forks count: 29.8%
Downloads: 58.0%
Maintainers (1)
Last synced: 7 months ago
crates.io: s2gpp

Algorithm for Highly Efficient Detection of Correlation Anomalies in Multivariate Time Series

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 3,844 Total
Rankings
Dependent repos count: 29.3%
Dependent packages count: 33.8%
Forks count: 37.5%
Average: 45.2%
Stargazers count: 50.6%
Downloads: 74.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

Cargo.toml cargo
  • actix-rt 2.2.0 development
  • port_scanner 0.1.5 development
  • rayon 1.5.0 development
  • actix =0.12.0
  • actix-broker 0.4.1
  • actix-telepathy =0.4.1
  • anyhow 1.0.41
  • console 0.15.0
  • csv 1.1.6
  • env_logger 0.9.0
  • futures-sink 0.3.21
  • indexmap 1.8
  • indicatif 0.16.2
  • itertools 0.10.3
  • kdtree 0.6.0
  • log 0.4
  • meanshift-rs 0.8.0
  • ndarray 0.15
  • ndarray-linalg 0.14
  • ndarray-stats 0.5
  • ndarray_einsum_beta 0.7.0
  • num-integer 0.1.44
  • num-traits 0.2.14
  • numpy 0.16
  • pyo3 0.16
  • serde 1.0
  • serde_with 1.9.1
  • sortedvec 0.5.0
  • structopt 0.3
  • tokio 1.12
Dockerfile docker
  • registry.gitlab.hpi.de/akita/i/rust-base latest build
requirements.txt pypi
  • maturin ==0.12.14
  • numpy ==1.21.6
  • patchelf ==0.14.5
  • scikit-learn ==1.1
  • twine ==4.0.0
pyproject.toml pypi
  • scikit-learn ~=1.1