contextual-anomaly-detector

Contextual anomaly detection tool application in building energy field based on Matrix Profile algorithm

https://github.com/baeda-polito/contextual-anomaly-detector

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.7%) to scientific vocabulary

Keywords

data-analytics docker energy energy-consumption matrix-profile python
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 3
  • Open Issues: 1
  • Releases: 4
Topics
data-analytics docker energy energy-consumption matrix-profile python
Created almost 5 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

Contextual Matrix Profile Calculation Tool

Matrix Profile is an algorithm capable of discovering motifs and discords in time series data. By calculating the (z-normalized) Euclidean distance between each subsequence of a time series and its nearest neighbor, it provides insights into potential anomalies and repetitive patterns. In the field of building energy management, it can be employed to detect anomalies in electrical load time series.
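To make the idea concrete, here is a minimal brute-force sketch of a matrix profile using only the Python standard library. It is not the implementation used by this tool: the function names, the toy load series, and the exclusion zone of `m // 2` samples around each subsequence are assumptions chosen for illustration, and the quadratic loop is only practical for short series.

```python
import math


def z_normalize(values):
    """Rescale a subsequence to zero mean and unit standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # A constant subsequence has no shape; map it to all zeros.
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def naive_matrix_profile(series, m):
    """Distance from every length-m subsequence to its nearest neighbor."""
    n_sub = len(series) - m + 1
    subs = [z_normalize(series[i:i + m]) for i in range(n_sub)]
    exclusion = max(1, m // 2)  # assumption: skip trivial, overlapping matches
    profile = []
    for i in range(n_sub):
        best = math.inf
        for j in range(n_sub):
            if abs(i - j) < exclusion:
                continue
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            best = min(best, dist)
        profile.append(best)
    return profile


# Toy load: a repeating pattern with one distorted repetition in the middle.
pattern = [1.0, 3.0, 2.0, 5.0]
load = pattern * 3 + [1.0, 1.0, 9.0, 1.0] + pattern * 3
profile = naive_matrix_profile(load, m=4)

# The discord is the subsequence whose nearest neighbor is farthest away.
discord_start = max(range(len(profile)), key=profile.__getitem__)
print(discord_start, round(profile[discord_start], 3))
```

Subsequences that repeat elsewhere in the series get a profile value close to zero (motifs), while the subsequences overlapping the distorted repetition get the largest values (discords).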

This tool is a Python implementation of the Matrix Profile algorithm that employs contextual information (such as external air temperature) to identify abnormal patterns in electrical load subsequences that start in predefined sub-daily time windows, as shown in the following figure.

Usage

The tool comes with a CLI that lets you run the script with the desired arguments:

```console
$ python -m src.cmp.main -h

Matrix profile

positional arguments:
  input_file     Path to file
  variable_name  Variable name
  output_file    Path to the output file

options:
  -h, --help     show this help message and exit
  -country       Country code (ex: IT, US, ...)
```

The arguments to pass to the script are the following:

  • input_file: The input dataset, given as a local path or an HTTP URL. When a URL is given, the tool downloads the dataset from it; because the URL is pre-signed, the tool does not need to handle authentication.
  • variable_name: The variable name to be used for the analysis (i.e., the column of the CSV that contains the electrical load under analysis).
  • output_file: The local path to the output HTML report. The platform then retrieves the report and uploads it to the object storage service for the user to review later.
  • country: The country code of the location where the building is located. This is used to get the holidays for that country.
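The arguments above could be wired up with `argparse` roughly as follows. This is a sketch reconstructed from the help text, not the tool's actual source: `build_parser`, `is_remote`, the `prog` name, and the `None` default for `-country` are assumptions.

```python
import argparse
from urllib.parse import urlparse


def build_parser():
    """Hypothetical parser mirroring the help text shown above."""
    parser = argparse.ArgumentParser(prog="cmp", description="Matrix profile")
    parser.add_argument("input_file", help="Path to file")
    parser.add_argument("variable_name", help="Variable name")
    parser.add_argument("output_file", help="Path to the output file")
    # Assumption: the country is optional and defaults to None.
    parser.add_argument("-country", default=None, help="Country code (ex: IT, US, ...)")
    return parser


def is_remote(input_file):
    """Return True when input_file looks like an HTTP(S) URL rather than a local path."""
    return urlparse(input_file).scheme in ("http", "https")


args = build_parser().parse_args(
    ["src/cmp/data/data.csv", "Total_Power", "report.html", "-country", "IT"]
)
print(args.input_file, args.variable_name, args.output_file, args.country)
print(is_remote(args.input_file))
```

Checking the URL scheme is one simple way to decide between reading a local file and downloading the dataset first.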

You can run the main script from the console using either a local file or data downloaded from an external URL. This repository comes with a sample dataset (data.csv) that you can use to generate a report by passing its local path as the input_file argument, as shown in the Run locally section.

Data format

The tool requires the user to provide a CSV file as input that contains the electrical power time series for a specific building, meter, or energy system (e.g., whole-building electrical power). The CSV uses a wide table format, as follows:

```csv
timestamp,column_1,temp
2019-01-01 00:00:00,116.4,-0.6
2019-01-01 00:15:00,125.6,-0.9
2019-01-01 00:30:00,119.2,-1.2
```

The csv must have the following columns:

  • timestamp [case sensitive]: The timestamp of the observation in the format YYYY-MM-DD HH:MM:SS, expressed in UTC. The tool converts this column into the index of the dataframe.
  • temp [case sensitive]: The external air temperature in degrees Celsius. This column is required to perform the thermal sensitivity analysis on the electrical load.
  • column_1: The dataframe may contain N arbitrary columns that refer to electrical load time series. The user specifies which column to analyze through the variable_name argument.
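The column requirements above can be checked before running the analysis. The sketch below uses only the standard library to keep it self-contained; the tool itself works with a dataframe, so `load_series` and `REQUIRED_COLUMNS` are hypothetical names and not part of its API.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = ("timestamp", "temp")  # case sensitive, per the list above


def load_series(csv_text, variable_name):
    """Parse the wide CSV and return (timestamps, load, temperature) lists."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in (*REQUIRED_COLUMNS, variable_name) if c not in header]
    if missing:
        raise ValueError(f"Missing required column(s): {missing}")
    timestamps, load, temp = [], [], []
    for row in reader:
        # Timestamps must follow the YYYY-MM-DD HH:MM:SS format.
        timestamps.append(datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S"))
        load.append(float(row[variable_name]))
        temp.append(float(row["temp"]))
    return timestamps, load, temp


sample = """timestamp,column_1,temp
2019-01-01 00:00:00,116.4,-0.6
2019-01-01 00:15:00,125.6,-0.9
2019-01-01 00:30:00,119.2,-1.2
"""
timestamps, load, temp = load_series(sample, "column_1")
print(timestamps[1] - timestamps[0], load, temp)
```

Passing a variable_name that is not in the header raises a `ValueError` that names the missing column, which surfaces typos before any computation starts.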

Run locally

Create a virtual environment, activate it, and install the dependencies:

  • Makefile:

```bash
make setup
```

  • Linux:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install
```

  • Windows:

```bash
python -m venv venv
venv\Scripts\activate
pip install poetry
poetry install
```

Now you can run the script from the console by passing the desired arguments. The following command passes the sample dataset data.csv as the input file and Total_Power as the variable to analyze. The output file is saved in the results folder.

```console
$ python -m src.cmp.main src/cmp/data/data.csv Total_Power src/cmp/results/reports/report.html

2024-08-13 12:45:42,821 INFO ⬇️ Downloading file from
2024-08-13 12:45:43,070 INFO 📊 Data processed successfully


CONTEXT 1 : Subsequences of 05:45 h (m = 23) that start in [00:00,01:00) (ctxfrom0000to0100m0545) 99.997% 0.0 sec

  • Cluster 1 (1.660 s) -> 1 anomalies
  • Cluster 2 (0.372 s) -> 3 anomalies
  • Cluster 3 (0.389 s) -> 4 anomalies
  • Cluster 4 (0.593 s) -> 5 anomalies
  • Cluster 5 (-) -> no anomalies green

[...]

2024-08-13 12:46:27,187 INFO TOTAL 0 min 44 s
2024-08-13 12:46:32,349 INFO 🎉 Report generated successfully on src/cmp/results/reports/report.html

```

At the end of the execution, the report is available at the path specified by the output_file argument; in this case, in the results folder.
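The CONTEXT line in the log above describes subsequences of 05:45 h (m = 23) that start in [00:00, 01:00). A minimal sketch of how such a context could be derived is shown here, assuming the 15-minute sampling interval of the sample CSV. The helper names are hypothetical, and the tool's own context definition may differ.

```python
from datetime import datetime, time, timedelta


def subsequence_length(duration, sampling):
    """Number of samples covered by a subsequence of the given duration."""
    return duration // sampling


def context_starts(timestamps, window_start, window_end):
    """Indices of samples whose time of day falls in [window_start, window_end)."""
    return [
        i for i, ts in enumerate(timestamps)
        if window_start <= ts.time() < window_end
    ]


sampling = timedelta(minutes=15)  # assumption: 15-minute data, as in the sample CSV
m = subsequence_length(timedelta(hours=5, minutes=45), sampling)

# Two days of 15-minute timestamps (192 samples).
first = datetime(2019, 1, 1)
timestamps = [first + k * sampling for k in range(2 * 96)]

starts = context_starts(timestamps, time(0, 0), time(1, 0))
print(m, starts)
```

Each day contributes four candidate start positions to this context, and every candidate subsequence spans m samples from its start.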

Run with Docker

Build the Docker image.

  • Makefile:

```bash
make docker-build
```

  • Linux:

```bash
docker build -t cmp .
```

Run the Docker image with the same arguments as before.

  • Makefile:

```bash
make docker-run
```

  • Linux:

```bash
docker run cmp data/data.csv Total_Power results/reports/report.html
```

At the end of the execution you can find the results in the results folder inside the docker container.

Cite

You can cite this work using either the BibTeX file or the following plain-text citation:

Chiosa, Roberto, et al. "Towards a self-tuned data analytics-based process for an automatic context-aware detection and diagnosis of anomalies in building energy consumption timeseries." Energy and Buildings 270 (2022): 112302.

References

  • Series Distance Matrix repository (https://github.com/predict-idlab/seriesdistancematrix)
  • Stumpy Package (https://stumpy.readthedocs.io/en/latest/)

License

This code is licensed under the MIT License - see the LICENSE file for details.

Owner

  • Name: BAEDA
  • Login: baeda-polito
  • Kind: organization
  • Email: baeda.lab@gmail.com
  • Location: Turin, Italy

Building Automation and Energy Data Analytics Lab

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Chiosa"
  given-names: "Roberto"
  orcid: "https://orcid.org/0000-0002-9896-526X"
- family-names: "Piscitelli"
  given-names: "Marco Savino"
- family-names: "Capozzoli"
  given-names: "Alfonso"
title: "Towards a self-tuned data analytics-based process for an automatic context-aware detection and diagnosis of anomalies in building energy consumption timeseries"
version: 2.0.4
doi: 10.1016/j.enbuild.2022.112302
date-released: 2021-09-01
url: "https://github.com/baeda-polito/matrix-profile"

GitHub Events

Total
  • Create event: 5
  • Issues event: 1
  • Release event: 2
  • Watch event: 3
  • Delete event: 1
  • Member event: 2
  • Push event: 13
  • Pull request review event: 1
  • Pull request review comment event: 1
  • Pull request event: 5
  • Fork event: 2
Last Year
  • Create event: 5
  • Issues event: 1
  • Release event: 2
  • Watch event: 3
  • Delete event: 1
  • Member event: 2
  • Push event: 13
  • Pull request review event: 1
  • Pull request review comment event: 1
  • Pull request event: 5
  • Fork event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 5
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 9 hours
  • Total issue authors: 2
  • Total pull request authors: 3
  • Average comments per issue: 2.6
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 2
  • Pull requests: 5
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 9 hours
  • Issue authors: 2
  • Pull request authors: 3
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 2
Top Authors
Issue Authors
Pull Request Authors
  • RobertoChiosa (2)
  • dependabot[bot] (2)
  • Vincenzo-26 (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (2) python (2) enhancement (2)

Dependencies

Dockerfile docker
  • python ${PYTHON_VERSION}-slim build
pyproject.toml pypi
  • kneed *
  • pandas *
  • seaborn *
requirements.txt pypi
  • Jinja2 ==3.1.4
  • MarkupSafe ==2.1.5
  • contourpy ==1.2.1
  • cycler ==0.12.1
  • fonttools ==4.52.4
  • kiwisolver ==1.4.5
  • kneed ==0.8.5
  • matplotlib ==3.9.0
  • numpy *
  • numpy ==1.26.4
  • packaging ==24.0
  • pandas ==2.2.2
  • pdfkit ==1.0.0
  • pillow ==10.3.0
  • plotly *
  • plotly ==5.22.0
  • pyparsing ==3.1.2
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • scipy ==1.13.1
  • seaborn ==0.13.2
  • six ==1.16.0
  • tenacity ==8.3.0
  • tzdata ==2024.1