ipal_datasets

Industrial datasets - datasets for evaluating industrial intrusion detection systems on IPAL.

https://github.com/fkie-cad/ipal_datasets

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 18 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com, acm.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.2%) to scientific vocabulary

Keywords

datasets electra elegant hai ids iec-104 ipal lemay modbus s7 swat wadi
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: fkie-cad
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 7.05 MB
Statistics
  • Stars: 42
  • Watchers: 4
  • Forks: 4
  • Open Issues: 0
  • Releases: 0
Topics
datasets electra elegant hai ids iec-104 ipal lemay modbus s7 swat wadi
Created over 4 years ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

IPAL - Datasets

This repository is part of IPAL - an Industrial Protocol Abstraction Layer. IPAL aims to establish an abstract representation of industrial network traffic for subsequent unified and protocol-independent industrial intrusion detection. IPAL consists of a transcriber to automatically translate industrial traffic into the IPAL representation, an IDS Framework implementing various industrial intrusion detection systems (IIDSs), and a collection of evaluation datasets. For details about IPAL, please refer to our publications listed below.

This repository provides a collection of datasets for evaluating industrial IDSs. To that end, it contains scripts to convert (transcribe) existing datasets into the IPAL format. It contains neither the raw datasets nor the datasets transcribed into IPAL. Instead, it ships placeholders that can be replaced after obtaining the original datasets from their respective publishers (see the links in the table below).

| Dataset | Type | Notes | Link |
| --- | --- | --- | --- |
| BATADAL | State | Dataset from the BATtle of the Attack Detection ALgorithms against a Water Distribution System. | BATADAL |
| ELEGANT | Packet (Modbus) | The ELEGANT dataset consists of a MiTM and a DoS part. So far, we consider only the MiTM part and not the DoS part. | IEEE Dataport |
| Electra | Packet (Modbus, S7) | Not all IPAL features are present, e.g., crc or length are missing. Also, the request data/address fields are not always correct. We skip a few duplicated packets. | Website |
| Energy Dataset | Packet (IEC-104) | A short PCAP of the WATTSON simulator from Fraunhofer FKIE. We use the manipulateTraces tool from the DTMC IDS paper to add attacks to the WATTSON PCAP. | Paper, manipulateTraces DTMC Paper |
| GeekLounge | Packet (S7) | The dataset does not contain any attacks. We added attacks according to the description of a paper. This results in 6 datasets with 3 attack types each on requests and responses of S7 packets. | Website, Paper |
| HAI | State | The dataset contains three training and five test files. Train and test are not in linear time order and have overlapping time regions. | Github |
| IEC61850SecurityDataset | Packet (GOOSE) | | Github |
| Lemay | Packet (Modbus) | Most attacks are not performed with Modbus and use different protocols not relevant for the transcriber. | Paper Github |
| MorrisDS1 | State | There exist different versions of the dataset (binary, ternary, or multiclass labels). We use the multiclass dataset. | Website |
| MorrisDS4 | Packet (Modbus) | There are minor differences between the Raw and Arff dataset. These differences affect only the attack packets. Default: Use the Arff dataset. | Website |
| PowerDuck | Packet (GOOSE) | | Paper |
| QUT_DNP3 | Packet (DNP3, GOOSE) | | Git Thesis |
| QUT_S7_Myers | Packet (S7) | TODO: Check Rules | Dataset Paper |
| QUT_S7comm | Packet (S7) | | Dataset Paper |
| Sherlock (v1) | State (and IEC-104) | Contains three differently sized scenarios of power grids. | Website Paper |
| SWaT | State | The attack dataset has an 81s gap which we fill with the previous state. The first 1800s are often skipped in literature. Version 0 of SWaT has a slightly different start of the training data. | iTrust |
| TEP-PASAD | State | The dataset consists of 5 different scenarios. Each scenario has its own training and test part combined in one single file. | Github |
| WADI | State | WADI has a large gap of ~73h in the training data. Note: we use the row number as index for the timestamp since WADI has a challenging time notation. | iTrust |
| WDT | Packet & State (Modbus) | | Paper |
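
The SWaT note in the table mentions filling a gap in the recording with the previous state. The following is a minimal sketch of that idea, not the repository's actual implementation. It assumes one sample per second, integer timestamps, and states represented as dictionaries; the function name `fill_gaps` and the sample values are hypothetical.

```python
def fill_gaps(rows, step=1):
    """Fill timestamp gaps by repeating the last seen state.

    rows: list of (timestamp, state) tuples sorted by timestamp, where
    timestamps are integers in seconds and states are dicts.
    Returns a new list in which every missing timestamp carries a copy
    of the state that preceded the gap.
    """
    filled = []
    for timestamp, state in rows:
        if filled:
            next_ts = filled[-1][0] + step
            # Emit copies of the previous state until the gap is closed.
            while next_ts < timestamp:
                filled.append((next_ts, dict(filled[-1][1])))
                next_ts += step
        filled.append((timestamp, state))
    return filled


# Illustrative data only: seconds 2 and 3 are missing.
rows = [(0, {"LIT101": 500.0}), (1, {"LIT101": 501.0}), (4, {"LIT101": 498.0})]
for timestamp, state in fill_gaps(rows):
    print(timestamp, state)
```

Repeating the last state keeps the time axis contiguous, which matters for detectors that assume evenly spaced samples, at the cost of introducing values that were never measured.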

Publications
  • Konrad Wolsing, Eric Wagner, Antoine Saillard, and Martin Henze. 2022. IPAL: Breaking up Silos of Protocol-dependent and Domain-specific Industrial Intrusion Detection Systems. In 25th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2022), October 26–28, 2022, Limassol, Cyprus. ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3545948.3545968
  • Wolsing, Konrad, Eric Wagner, and Martin Henze. "Poster: Facilitating Protocol-independent Industrial Intrusion Detection Systems." Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security. 2020 https://doi.org/10.1145/3372297.3420019

Getting started

If you are new to IPAL and want to learn about the general idea or try out our tutorials, please refer to IPAL's main repository: https://github.com/fkie-cad/ipal.

Prerequisites

Transcribing the datasets requires the ipal-transcriber and tshark to be installed (see IPAL - Transcriber and https://tshark.dev/setup/install/).

On certain operating systems, running all available scripts might require additional dependencies. Ensure that the following commands are available:
  • pv
  • gzip and gunzip from the gzip project or an alternative implementation with similar features
  • bash from the Bash project
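
A quick way to verify the prerequisites is to look each command up on the PATH. The sketch below uses Python's standard `shutil.which`; the helper name `missing_commands` is hypothetical, and the executable name `ipal-transcriber` is an assumption about how the transcriber is installed.

```python
import shutil

# Commands named in the prerequisites. The executable name
# "ipal-transcriber" is an assumption.
REQUIRED_COMMANDS = ["ipal-transcriber", "tshark", "pv", "gzip", "gunzip", "bash"]


def missing_commands(commands, which=shutil.which):
    """Return the commands that cannot be found on PATH, in input order."""
    return [cmd for cmd in commands if which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands(REQUIRED_COMMANDS)
    if missing:
        print("Missing commands:", ", ".join(missing))
    else:
        print("All required commands are available.")
```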

Install
  • After cloning the repository, initialise Git's submodules with git submodule init and git submodule update.

  • To transcribe a dataset into IPAL, one needs to obtain a copy of the original dataset, e.g., from the source listed in the table above. This dataset needs to be placed under [dataset-name]/raw/.

  • Use the transcribe.sh or transcribe.py scripts to convert the dataset into IPAL. The dataset will be exported to [dataset-name]/ipal.
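
Before transcribing, it can help to check that a dataset folder is laid out as described above. The following sketch is only an illustration: the helper `check_dataset` is hypothetical, and it assumes the transcribe scripts sit directly inside the dataset folder. It cannot tell placeholder files apart from the real raw data.

```python
from pathlib import Path


def check_dataset(repo_root, dataset_name):
    """Report whether a dataset folder looks ready to be transcribed.

    Assumes the layout [dataset-name]/raw/ for the input and
    [dataset-name]/ipal for the output, with the transcribe scripts
    located directly in the dataset folder (an assumption).
    """
    dataset_dir = Path(repo_root) / dataset_name
    raw_dir = dataset_dir / "raw"
    # True only if raw/ exists and holds at least one file.
    has_raw_files = raw_dir.is_dir() and any(
        p.is_file() for p in raw_dir.rglob("*")
    )
    scripts = [
        name
        for name in ("transcribe.sh", "transcribe.py")
        if (dataset_dir / name).is_file()
    ]
    return {
        "raw_dir": raw_dir,
        "has_raw_files": has_raw_files,
        "scripts": scripts,
        "ipal_dir": dataset_dir / "ipal",
    }


if __name__ == "__main__":
    # "SWaT" is used purely as an example folder name.
    print(check_dataset(".", "SWaT"))
```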

Development

Tooling

The set of tools used for development, code formatting, style checking, and testing can be installed with the following command:

    python3 -m pip install -r requirements-dev.txt

All tools can be executed manually with the following commands and report errors if encountered:

    black .
    flake8
    python3 -m pytest

A black and flake8 check of modified files before any commit can also be forced using Git's pre-commit hook functionality:

    pre-commit install

More information on the black and flake8 setup can be found at https://ljvmiranda921.github.io/notebook/2018/06/21/precommits-using-black-and-flake8/

Contributors

  • Konrad Wolsing (Fraunhofer FKIE & RWTH Aachen University)
  • Sven Zemanek (Fraunhofer FKIE)
  • Dominik Kus (RWTH Aachen University)

License

MIT License. See LICENSE for details.

Owner

  • Name: FKIE-CAD
  • Login: fkie-cad
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Wolsing"
  given-names: "Konrad"
  orcid: "https://orcid.org/0000-0002-7571-0555"
- family-names: "Wagner"
  given-names: "Eric"
  orcid: "https://orcid.org/0000-0003-3211-1015"
- family-names: "Saillard"
  given-names: "Antoine"
  orcid: "https://orcid.org/0000-0002-8376-2726"
- family-names: "Henze"
  given-names: "Martin"
  orcid: "https://orcid.org/0000-0001-8717-2523"
title: "IPAL - Datasets"
version: 1.3.8
doi: 10.1145/3545948.3545968
date-released: 2022-04-20
url: "https://github.com/fkie-cad/ipal_datasets"
preferred-citation:
  type: conference-paper
  authors:
  - family-names: "Wolsing"
    given-names: "Konrad"
    orcid: "https://orcid.org/0000-0002-7571-0555"
  - family-names: "Wagner"
    given-names: "Eric"
    orcid: "https://orcid.org/0000-0003-3211-1015"
  - family-names: "Saillard"
    given-names: "Antoine"
    orcid: "https://orcid.org/0000-0002-8376-2726"
  - family-names: "Henze"
    given-names: "Martin"
    orcid: "https://orcid.org/0000-0001-8717-2523"
  doi: 10.1145/3545948.3545968
  journal: In 25th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2022)
  month: 10
  title: "IPAL: Breaking up Silos of Protocol-dependent and Domain-specific Industrial Intrusion Detection Systems"
  year: 2022

GitHub Events

Total
  • Watch event: 13
  • Push event: 5
  • Create event: 1
Last Year
  • Watch event: 13
  • Push event: 5
  • Create event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

pyproject.toml pypi
requirements-dev.txt pypi
  • black * development
  • cffconvert * development
  • coverage * development
  • flake8 * development
  • isort * development
  • pre-commit * development
  • pytest * development
  • pytest-cov * development