idps-comparison-tool

https://github.com/stratosphereips/idps-comparison-tool

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (9.6%) to scientific vocabulary

Last synced: 9 months ago · JSON representation ·

Repository

Basic Info

Host: GitHub
Owner: stratosphereips
License: gpl-2.0
Language: Python
Default Branch: main
Size: 47.1 MB

Statistics

Stars: 1
Watchers: 3
Forks: 0
Open Issues: 1
Releases: 0

Created almost 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme Contributing License Citation

Installation

pip3 install -r requirements.txt

Usage

python3 main.py -s -e -g

for testing, use this command:

Example of using labeled ground truth dir

python3 main.py -gtd $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/ground_truth/ -s $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/slips/flows.sqlite -e $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/suricata/eve.json

Example of using ground truth file

python3 main.py -gtf $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/ground_truth/conn.log.labeled -s $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/slips/flows.sqlite -e $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/suricata/eve.json

python3 main.py -s $(pwd)/dataset/2023-02-20/slips/flows.sqlite -e $(pwd)/dataset/2023-02-20/suricata/eve.json -gtf $(pwd)/dataset/2023-02-20/zeek_labeled/conn.log.labeled

Comparison Tool Input

The tool needs the following 3 to run: 1. Slips db 2. suricata eve.json 3. a labeled conn.log file

Slips DB

Slips stores the AID hash for each conn.log flow in the sqlite db

The SQL table with the AID and label in Slips is called 'flows' inside the flows.sqlite db

This tool reads the flows.sqlite db, extracts the labels and AIDs, and stores them in its' own db stored in output/<date-time>/db.sqlite

it calculates the aid of each read flow on the fly using the (zeek's communityid + ts) combination by using the aidhash lib from pypi https://pypi.org/project/aid-hash

suricata eve.json

This tool reads Suricata's eve.json file

if the field event_type is set to 'alert', this tool marks this flow as malicious by suricata.

it calculates the aid of each read flow on the fly using the (communityid + ts) combination by using the aidhash lib from pypi https://pypi.org/project/aid-hash

Comparison tool output

The output directory of this tool can be specified using -o, for example:

python3 main.py -e suricata/eve.json -s slips/flows.sqlite -gtf zeek/conn.log.labeled -o some_dir

if -o is not given, it creates a new dir inside output/ with the date and time of the run.

The output of this tool consists of:

a sqlite db with labels per flow, and labels per time window, it also has the performance errors and, total flows count and the files read of each tool.

The sqlite db created by this tool is stored in a subdir in the output/ dir for example output/2023-07-10-14:04:16

the metrics printed in the CLI and to the output directory in results.txt at the end of the analysis
a metadata file with the versions and files used

How it works

This tool consists of 3 parsers, the ground truth parser runs first, once it's done the slips and suricata parsers start in parallel
The 3 parsers job is to store the label for each tool in the sqlite database
labels are stored per flow and per timewindow
the tool then retrieves the actual and predicted value of each of the given tools and passes them to the calculator for calculating the metrics

Parsers

The below diagram shows how we parse each tool and the ground truth given file.

Parsers result in the following tables

Comparison

Comparison Method 1: Comparison per timewindow

How labels per timewindow are calculated

Timewindow labels are detected in the following way:

A timewindow is 1h interval, the given pcap is split into as many 1h intervals as needed and each interval (timewindow) has 1 label, either malicious or benign

for slips

the slips database given to this tool using -s contains a table called alerts where slips stores the malcious timewindows with their label, start and end date.

Applying the timewindow concept for the ground truth

we read 1h worth of flows, once we find one 'malicious' label, we consider their entire timewindow as malicious, if there is no malicious flows in there, we mark that timewindow as benign

Applying the timewindow concept for suricata

Same as the ground truth. we read 1h worth of flows, once we find one 'malicious' label, we consider their entire timewindow as malicious, if there is no malicious flows in there, we mark that timewindow as benign

If a timewindow was detected by one of the tools, and not detected by the ground truth, for example negative timewindows in slips, we consider the gt label of it as "benign"

Comparison Method2: labels flow by flow

for slips

The slips database given to this tool using -s contains a table called flows where each flow is stored with its label. The flow is considered malicious by slips if it was part of an alert. Slips detects alerts based on a complex ensembling algorithm, check it Slips documentation for more about this.

for suricata

The eve.json given to this tool using -e contains flows and event_type = 'alert'. Each alerts is marked as malicious and each flow is marked as benign

for the ground truth

Ground truth flows are labeled using the netflow labeler. so each flow has a label either benign or malicious

Limitations

the labels in ground truth zeek dir have to be 'Malicious' or 'Benign' only. if any other label is present this tool will completely discard the flow.
ground truth dirs can either be json or tab separated zeek dir or conn.log file
all paths given as parameters to this tool must be absolute paths.
if any flow doesn't have a label by suricata or slips, this tool considers the flow as benign
slips now labels conn.log flows only, just like zeek does when zeek's community_id is enabled as a plugin
all flows and timewindows read by a tool, that don't have a matching flow/timewindow in the ground truth file, are discarded. The number of discarded flows is written in the cli at the end of the analysis.
we only read event_type= "flow" or "alert" in suricata eve.json files
the flows read by suricata, slips and the gt don't have to be the same, meaning that, the final flows count don't have to match because each tool reads the pcap differently
timewindow numbers may be negative if a flow is found with a flow timestamp < timestamp of the first flow seen
if a slips alert exists in parts of 2 timewindows
what we're doing here is marking bith timewindows as malicious ```
```
        1:30                   2:30
         │      slips alert     │
         ├──────────────────────┤
```
1:00 2:00 3:00 ├───────────────────────────┼────────────────────────────┤ │ tw 1 tw 2 │

```

FAQ

When given a PCAP with 24hs of flows, why are the sum of the ByTimewindow comparison not equal 24? Because each IP seen in the traffic, will have a label in each of the timewindows. Check the labelspertw table in the generated db.sqlite and sort by IPs for verification.

Used cmds

command for generating all zeek files in the dataset/

zeek -C -r <pcap> tcp_inactivity_timeout=60mins tcp_attempt_delay=1min

command for labeling conn.log files

python3 netflowlabeler.py -c labels.config -f /path/to/generated/conn.log

(optional) To label the rest of the Zeek files using an already labeled conn.log file (conn.log.labeled)

zeek-files-labeler.py -l conn.log.labeled -f folder-with-zeek-log-files

Checkout the scripts used for the comparison @ https://github.com/stratosphereips/Scripts-Used-for-Comparison

Owner

Name: Stratosphere IPS
Login: stratosphereips
Kind: organization
Location: Prague

Website: https://www.stratosphereips.org
Twitter: StratosphereIPS
Repositories: 25
Profile: https://github.com/stratosphereips

Cybersecurity Research Laboratory at the Czech Technical University in Prague. Creators of Slips, a free software machine learning-based behavioral IDS/IPS.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "YOUR_NAME_HERE"
  given-names: "YOUR_NAME_HERE"
  email: youremailhere
  affiliation: >-
      Stratosphere Laboratory, AIC, FEL, Czech
      Technical University in Prague
  orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Lisa"
  given-names: "Mona"
  email: youremailhere
  affiliation: >-
      Stratosphere Laboratory, AIC, FEL, Czech
      Technical University in Prague
  orcid: "https://orcid.org/0000-0000-0000-0000"
title: "repository-template"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2022-07-13
url: "https://github.com/stratosphereips/repository-template"

GitHub Events

Total

Push event: 9
Create event: 8

Last Year

Push event: 9
Create event: 8

Dependencies

.github/workflows/autotag.yml actions

actions/checkout v2 composite
anothrNick/github-tag-action 1.36.0 composite

requirements.txt pypi

PyYAML ==6.0.1
aid_hash *
communityid ==1.4
joblib ==1.3.2
numpy ==1.24.4
pyaml ==23.9.5
scikit-learn ==1.3.0
scipy ==1.10.1
sklearn ==0.0.post9
termcolor ==2.3.0
threadpoolctl ==3.2.0

idps-comparison-tool

Science Score: 44.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

Installation

Usage

Comparison Tool Input

Slips DB

suricata eve.json

Comparison tool output

How it works

Parsers

Comparison

Comparison Method 1: Comparison per timewindow

How labels per timewindow are calculated

for slips

Applying the timewindow concept for the ground truth

Applying the timewindow concept for suricata

Comparison Method2: labels flow by flow

for slips

for suricata

for the ground truth

Limitations

FAQ

Used cmds

Owner

Citation (CITATION.cff)

GitHub Events

Total

Last Year

Dependencies