icse-seip2025-anomaly-detector-public
This repository contains the code implementation for the ICSE SEIP 2025 paper titled "Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset."
https://github.com/msi-ru-cs/icse-seip2025-anomaly-detector-public
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 9 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: msi-ru-cs
- License: apache-2.0
- Language: Python
- Default Branch: master
- Size: 49.8 KB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
Anomaly Detection in Large-Scale Cloud Systems
This repository contains the code implementation for the ICSE SEIP 2025 paper titled "Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset." The preprint of the paper is available on arXiv (arXiv:2411.09047).
The repository includes scripts and modules for anomaly detection using Autoencoders (ANN and GRU models), NAB scoring, and related preprocessing tasks. The pipeline is modularized for flexibility and ease of use.
Project Setup: Anomaly Data Preparation
This guide helps you set up the environment for the "Anomaly Detection" project. Follow these steps to ensure you have the correct Python version and dependencies installed for replicability.
1 Requirements
- Python Version: Python 3.11.0
- Operating System: Windows, macOS, or Linux
- Tools:
  - Python installed on your system
  - pip for package management
  - A terminal or command-line interface
2 Setup Instructions
2.1 Clone the Repository
Clone the repository to your local machine:
bash
git clone https://github.com/msi-ru-cs/icse-seip2025-anomaly-detector-public.git
Navigate to the project directory:
bash
cd icse-seip2025-anomaly-detector-public
2.2 Verify Python Version
Ensure you have Python 3.11.0 installed:
bash
python --version
If Python 3.11.0 is not installed, download it from the official Python website and install it.
Setup Options
- Using Docker: All dependencies, directory structures, and data downloads are handled automatically. Refer to Section 5.1 (Using Docker) for execution details.
- Using a Virtual Environment: Continue with the steps below, then refer to Section 5.2 (Using Virtual Environment) for execution instructions.
2.3 Create a Virtual Environment
Create a virtual environment using Python 3.11.0:
bash
python -m venv venv
2.4 Activate the Virtual Environment
Activate the virtual environment:
- On Windows:
bash
venv\Scripts\activate
- On macOS/Linux:
bash
source venv/bin/activate
Verify that the virtual environment is using Python 3.11.0:
bash
python --version
2.5 Install Dependencies
Install all required libraries from the requirements.txt file:
bash
pip install -r requirements.txt
2.6 Run Tests
Run a test script or a few commands from the project to ensure everything is working correctly.
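One possible quick check for this step is sketched below. It is not a script shipped with the repository: the helper name check_setup is hypothetical, it compares only the major and minor Python version against 3.11, and the expected paths are taken from the directory layout and input files described in this README.

```python
import sys
from pathlib import Path

# Paths taken from this README (directory structure and input data sections).
EXPECTED_PATHS = [
    Path("conf"),
    Path("src"),
    Path("data/massaged/pivoted_data_all.parquet"),
    Path("data/labels/anomaly_windows.csv"),
    Path("results/model_experiments"),
    Path("trained_models"),
]


def check_setup(base_dir: Path = Path("."), required=(3, 11)) -> list[str]:
    """Return human-readable problems; an empty list means the check passed.

    Hypothetical helper: it only checks the Python version and that the
    expected files and directories exist under base_dir.
    """
    problems = []
    if sys.version_info[:2] != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} expected, found {sys.version.split()[0]}"
        )
    for rel in EXPECTED_PATHS:
        if not (base_dir / rel).exists():
            problems.append(f"missing: {rel.as_posix()}")
    return problems
```

For example, running `print(check_setup() or "setup looks complete")` from the project root lists anything that still needs attention.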
2.7 Optional: Update Dependencies
If additional libraries are needed, install them and update requirements.txt:
bash
pip install <library-name>
pip freeze > requirements.txt
2.8 Directory Structure
Use the following directory structure for your project:
plaintext
icse-seip2025-anomaly-detector-public/
├── conf/ # Configuration files (e.g., config.yaml)
├── src/ # Source code files
├── data/
│ ├── massaged/ # Pivoted input data
│ ├── labels/ # Anomaly window labels
├── results/
│ ├── model_experiments/ # Experiment results
├── trained_models/ # Saved trained models
On macOS/Linux, you can create these directories with the following shell command:
bash
mkdir -p conf src data/massaged data/labels results/model_experiments trained_models
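Because `mkdir -p` is not available in the Windows command prompt, the same directories can be created portably with Python's pathlib. This is a minimal sketch; the names PROJECT_DIRS and create_project_dirs are hypothetical and not part of the repository.

```python
from pathlib import Path

# Same directories as the mkdir -p command above.
PROJECT_DIRS = [
    "conf",
    "src",
    "data/massaged",
    "data/labels",
    "results/model_experiments",
    "trained_models",
]


def create_project_dirs(base_dir: Path = Path(".")) -> list[Path]:
    """Create the project directories (no error if they exist) and return their paths."""
    created = []
    for rel in PROJECT_DIRS:
        path = base_dir / rel
        path.mkdir(parents=True, exist_ok=True)  # equivalent of mkdir -p
        created.append(path)
    return created
```

Calling it more than once is safe, since `exist_ok=True` leaves existing directories untouched.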
3 Data Source
The data required for this project is provided in the following dataset:
Islam, M. S., Rakha, M. S., Pourmajidi, W., Sivaloganathan, J., Steinbacher, J., & Miranskyy, A. (2024). Dataset for the paper "Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset" (v1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14062900
3.1 Input Data
- Place pivoted data files (e.g., pivoted_data_all.parquet) in data/massaged/.
- Place the anomaly window labels file anomaly_windows.csv in data/labels/.
You can achieve this using the following commands in the terminal (on macOS/Linux):
shell
curl -L -o data/labels/anomaly_windows.csv "https://zenodo.org/records/14062900/files/anomaly_windows.csv?download=1"
curl -L -o data/massaged/pivoted_data_all.parquet "https://zenodo.org/records/14062900/files/pivoted_data_all.parquet?download=1"
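Where curl is unavailable (for example, on some Windows setups), the same two files can be fetched with Python's standard library. This is a sketch rather than a script from the repository; the helper name fetch_dataset is hypothetical, and the URLs are copied from the commands above. The parquet file holds the full telemetry table, so the download may take some time.

```python
import urllib.request
from pathlib import Path

# Zenodo record 14062900; URLs copied from the curl commands above.
DATASET_FILES = {
    "data/labels/anomaly_windows.csv":
        "https://zenodo.org/records/14062900/files/anomaly_windows.csv?download=1",
    "data/massaged/pivoted_data_all.parquet":
        "https://zenodo.org/records/14062900/files/pivoted_data_all.parquet?download=1",
}


def fetch_dataset(base_dir: Path = Path("."), retrieve=urllib.request.urlretrieve) -> list[Path]:
    """Download each dataset file to its expected location, skipping files already present."""
    saved = []
    for rel, url in DATASET_FILES.items():
        dest = base_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        if not dest.exists():
            # urlretrieve follows HTTP redirects, similar to curl -L.
            retrieve(url, dest)
        saved.append(dest)
    return saved
```

The `retrieve` parameter exists only so the function can be exercised without network access; in normal use, call `fetch_dataset()` from the project root.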
For manual setup using a virtual environment, after downloading the files, proceed to Section 5.2 Using Virtual Environment for execution details.
4 File Descriptions
4.1 Configuration
config.yaml: Stores the configuration for file paths, training/testing parameters, and model settings.
4.2 Scripts
- run_experiment__multi_models_GRU_ANN.py: The main script for coordinating anomaly detection experiments. It handles data preprocessing, model training, evaluation, and grid search for optimizing parameters.
- preprocessing.py: Handles data preparation, including loading 5XX features, adding time-related features (e.g., sine/cosine transformations), and filtering training/testing anomaly windows.
- anomaly_likelihood.py: Contains functions to compute anomaly likelihood using reconstruction errors and statistical analysis.
- nab_scoring.py: Implements NAB (Numenta Anomaly Benchmark) scoring, including options for standard scoring or custom profiles like reward_fn.
- plotting_module.py: Provides utilities for visualizing the results of anomaly detection, such as normalized 5XX counts and detected anomalies.
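preprocessing.py is described as adding sine/cosine transformations of time. The sketch below illustrates that general technique (cyclical encoding) for hour of day and day of week. The exact features and periods used in the repository are not stated here, so the function name cyclical_time_features and the choice of periods are assumptions.

```python
import math
from datetime import datetime


def cyclical_time_features(ts: datetime) -> dict[str, float]:
    """Encode hour of day and day of week as points on the unit circle.

    Hypothetical helper: sine/cosine pairs keep 23:30 and 00:30 close together,
    which a raw integer hour would not.
    """
    hour = ts.hour + ts.minute / 60.0  # fractional hour in [0, 24)
    weekday = ts.weekday()             # Monday = 0 ... Sunday = 6
    return {
        "hour_sin": math.sin(2 * math.pi * hour / 24),
        "hour_cos": math.cos(2 * math.pi * hour / 24),
        "dow_sin": math.sin(2 * math.pi * weekday / 7),
        "dow_cos": math.cos(2 * math.pi * weekday / 7),
    }
```

Both the sine and the cosine are needed: either one alone maps two different times of day to the same value.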
4.3 Results Directory (./results/model_experiments/)
Files:
- unweighted__<Model>_anomaly_detection_results.csv
  Contains detailed results from the unweighted anomaly detection experiments using the specified model (e.g., ANN or GRU), including metrics such as true positives, false positives, and anomaly windows, for the entire test period.
- unweighted__<Model>_anomaly_detection_results.png
  A graphical visualization of the results from the unweighted anomaly detection experiment, depicting anomalies and their corresponding NAB scores. The plot includes:
  - 5XX Count (Normalized): The normalized count of 5XX errors.
  - Predicted Anomalies (Red X): Anomalies detected by the model.
  - Ground Truth Anomalies: Highlighted areas based on categories like IssueTracker, InstantMessenger, and TestLog.
- unweighted__<Model>_results.csv
  Consolidated results of the unweighted experiments with the specified model, providing a summary of key performance metrics.
5 Execution
5.1 Using Docker
5.1.1 Build the Docker image using the provided Dockerfile:
bash
docker build -t anomaly-detector .
5.1.2 Run the Docker container:
bash
docker run -it anomaly-detector
5.2 Using Virtual Environment
The workflow is configured using the Hydra framework.
5.2.1 Configuring the Model Type
The model type (ANN or GRU) is configurable in the configuration file (conf/config.yaml). Set the following parameter under train_test_config:
yaml
train_test_config:
  use_model: ANN # Options: ANN or GRU
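Whichever model is selected, it is an autoencoder, and anomaly_likelihood.py is described as turning its reconstruction errors into an anomaly likelihood through statistical analysis. The sketch below shows one common way to do this, modeled on the Numenta/NAB anomaly-likelihood idea: compare a short-term mean of recent errors with the mean and standard deviation of a longer history and report the Gaussian cumulative probability. The function names and window sizes are assumptions, not the repository's values.

```python
import statistics


def reconstruction_error(x: list[float], x_hat: list[float]) -> float:
    """Mean squared error between an input vector and its autoencoder reconstruction."""
    if len(x) != len(x_hat) or not x:
        raise ValueError("x and x_hat must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def anomaly_likelihood(errors: list[float], long_window: int = 100,
                       short_window: int = 5) -> list[float]:
    """Likelihood that recent errors are unusually high compared with the longer history.

    For each step t, the mean and standard deviation are fitted on the previous
    long_window errors; the mean of the last short_window errors (including t) is
    standardized and passed through the standard normal CDF. Values near 1.0
    suggest anomalous behaviour; steps with fewer than two past errors get 0.5.
    """
    normal = statistics.NormalDist()
    likelihoods = []
    for t in range(len(errors)):
        history = errors[max(0, t - long_window):t]
        if len(history) < 2:
            likelihoods.append(0.5)
            continue
        mu = statistics.fmean(history)
        sigma = statistics.pstdev(history) or 1e-9  # avoid division by zero on a flat history
        recent = errors[max(0, t - short_window + 1):t + 1]
        z = (statistics.fmean(recent) - mu) / sigma
        likelihoods.append(normal.cdf(z))
    return likelihoods
```

A time step would then typically be flagged when its likelihood exceeds a threshold close to 1.0; the threshold the repository actually uses is not given in this README.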
5.2.2 Configuring NAB Scoring Profile
The NAB scoring profile used for evaluation can also be set in the configuration file (conf/config.yaml). Update the following parameter under the evaluation section:
yaml
evaluation: # Default parameters
  nab_scoring_profile: "reward_fn" # Options: "standard" or "reward_fn"
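For readers unfamiliar with NAB, the sketch below shows the core of its scoring scheme. A detection inside a labeled anomaly window earns a reward that shrinks with lateness through a scaled sigmoid, a missed window costs the false-negative weight, and a detection outside every window is penalized. The weights follow the public NAB "standard" and "reward_low_FN_rate" profiles; treating this repository's "reward_fn" as NAB's reward_low_FN_rate is an assumption. This is a simplified illustration (no normalization to NAB's 0 to 100 range), not the contents of nab_scoring.py.

```python
import math

# Weights from the public NAB profiles (tp: true positive, fp: false positive,
# fn: false negative). Mapping "reward_fn" to reward_low_FN_rate is an assumption.
PROFILES = {
    "standard": {"tp": 1.0, "fp": 0.11, "fn": 1.0},
    "reward_fn": {"tp": 1.0, "fp": 0.11, "fn": 2.0},
}


def scaled_sigmoid(y: float) -> float:
    """NAB-style scaled sigmoid: near +1 at a window start, 0 at its end, toward -1 after it."""
    if y > 3.0:
        return -1.0
    return 2.0 / (1.0 + math.exp(5.0 * y)) - 1.0


def nab_raw_score(detections, windows, profile: str = "standard") -> float:
    """Simplified, unnormalized NAB score.

    detections: time-step indices flagged as anomalous.
    windows: inclusive (start, end) index pairs of labeled anomaly windows.
    Only the earliest detection in each window is rewarded, as in NAB.
    """
    w = PROFILES[profile]
    windows = sorted(windows)
    detections = sorted(set(detections))
    score = 0.0
    for start, end in windows:
        hits = [d for d in detections if start <= d <= end]
        if hits:
            width = end - start + 1
            y = -(end - hits[0] + 1) / width  # -1 at window start, close to 0 at window end
            score += w["tp"] * scaled_sigmoid(y)
        else:
            score -= w["fn"]
    for d in detections:
        if any(start <= d <= end for start, end in windows):
            continue  # extra detections inside a window are neither rewarded nor penalized
        previous = [(s, e) for s, e in windows if e < d]
        if previous:
            s, e = previous[-1]
            y = (d - e) / (e - s + 1)  # relative distance past the preceding window
            score += w["fp"] * scaled_sigmoid(y)
        else:
            score -= w["fp"]  # false positive before any window: full penalty
    return score
```

With the "reward_fn" weights, each missed window costs twice as much, which favors detectors that catch more anomalies at the price of extra false positives.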
5.2.3 Run the Project
To run the project, execute the main script after setting up the required directories and input files:
bash
python src/run_experiment__multi_models_GRU_ANN.py
You can also override configuration parameters directly on the command line using Hydra's key=value syntax. For example:
bash
python src/run_experiment__multi_models_GRU_ANN.py train_test_config.use_model=GRU evaluation.nab_scoring_profile=reward_fn
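To compare every combination of the two documented options, the runs can be scripted. This is a hypothetical helper, not part of the repository; it reuses only the two overrides shown above and assumes it is launched from the project root with the virtual environment active.

```python
import itertools
import subprocess
import sys

SCRIPT = "src/run_experiment__multi_models_GRU_ANN.py"
MODELS = ["ANN", "GRU"]
NAB_PROFILES = ["standard", "reward_fn"]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """One command per (model, NAB profile) pair, using the Hydra overrides above."""
    return [
        [python, SCRIPT,
         f"train_test_config.use_model={model}",
         f"evaluation.nab_scoring_profile={profile}"]
        for model, profile in itertools.product(MODELS, NAB_PROFILES)
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run
```

Because the result file names listed in Section 4.3 include the model name but not the scoring profile, two runs with the same model may overwrite each other's outputs; copy results/model_experiments/ between runs if you need both.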
Notes for Replicability
- Use Python 3.11.0 to avoid compatibility issues.
- Keep the requirements.txt file up to date if new dependencies are added.
- Follow the steps above exactly to ensure a consistent environment across different setups.
Citation
If you use or study the code, please cite it as follows.
bibtex
@article{islam2024anomaly,
title={Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset},
author={Islam, Mohammad Saiful and Rakha, Mohamed Sami and Pourmajidi, William and Sivaloganathan, Janakan and Steinbacher, John and Miranskyy, Andriy},
journal={arXiv preprint arXiv:2411.09047},
year={2024},
doi={10.48550/arXiv.2411.09047}
}
If you encounter any issues, please feel free to reach out for support by opening an issue.
Owner
- Name: Mohammad Saiful Islam
- Login: msi-ru-cs
- Kind: user
- Location: Toronto, Canada
- Company: Ryerson University
- Repositories: 3
- Profile: https://github.com/msi-ru-cs
Citation (CITATION.cff)
cff-version: 1.2.0
title: >-
  Anomaly Detection in Large-Scale Cloud Systems: An
  Industry Case and Dataset
doi: 10.48550/arXiv.2411.09047
authors:
- family-names: Islam
  given-names: Mohammad Saiful
- family-names: Rakha
  given-names: Mohamed Sami
- family-names: Pourmajidi
  given-names: William
- family-names: Sivaloganathan
  given-names: Janakan
- family-names: Steinbacher
  given-names: John
- family-names: Miranskyy
  given-names: Andriy
year: 2024
journal: arXiv preprint arXiv:2411.09047
url: https://arxiv.org/abs/2411.09047
abstract: >-
  As Large-Scale Cloud Systems (LCS) become increasingly
  complex, effective anomaly detection is critical for
  ensuring system reliability and performance. However,
  there is a shortage of large-scale, real-world datasets
  available for benchmarking anomaly detection methods.
  To address this gap, we introduce a new high-dimensional
  dataset from IBM Cloud, collected over 4.5 months from the
  IBM Cloud Console. This dataset comprises 39,365 rows and
  117,448 columns of telemetry data. Additionally, we
  demonstrate the application of machine learning models for
  anomaly detection and discuss the key challenges faced in
  this process.
  This study and the accompanying dataset provide a resource
  for researchers and practitioners in cloud system
  monitoring. It facilitates more efficient testing of
  anomaly detection methods in real-world data, helping to
  advance the development of robust solutions to maintain
  the health and performance of large-scale cloud
  infrastructures.
keywords:
- ibm-cloud-console
- anomaly-detection
- software-engineering
- cloud-computing
- deep-learning
GitHub Events
Total
- Release event: 3
- Watch event: 4
- Member event: 2
- Push event: 10
- Pull request review comment event: 2
- Pull request review event: 8
- Pull request event: 10
- Fork event: 2
- Create event: 7
Last Year
- Release event: 3
- Watch event: 4
- Member event: 2
- Push event: 10
- Pull request review comment event: 2
- Pull request review event: 8
- Pull request event: 10
- Fork event: 2
- Create event: 7
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: about 12 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: about 12 hours
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- msi-ru-cs (3)
- janakan2466 (2)