idps-comparison-tool
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (9.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: stratosphereips
- License: gpl-2.0
- Language: Python
- Default Branch: main
- Size: 47.1 MB
Statistics
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Installation
pip3 install -r requirements.txt
Usage
python3 main.py -s
for testing, use this command:
Example of using labeled ground truth dir
python3 main.py -gtd $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/ground_truth/ -s $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/slips/flows.sqlite -e $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/suricata/eve.json
Example of using ground truth file
python3 main.py -gtf $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/ground_truth/conn.log.labeled -s $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/slips/flows.sqlite -e $(pwd)/dataset/CTU-Malware-Capture-Botnet-4/suricata/eve.json
python3 main.py -s $(pwd)/dataset/2023-02-20/slips/flows.sqlite -e $(pwd)/dataset/2023-02-20/suricata/eve.json -gtf $(pwd)/dataset/2023-02-20/zeek_labeled/conn.log.labeled
Comparison Tool Input
The tool needs the following 3 to run: 1. Slips db 2. suricata eve.json 3. a labeled conn.log file
Slips DB
Slips stores the AID hash for each conn.log flow in the sqlite db
The SQL table with the AID and label in Slips is called 'flows' inside the flows.sqlite db
This tool reads the flows.sqlite db, extracts the labels and AIDs,
and stores them in its' own db stored in output/<date-time>/db.sqlite
it calculates the aid of each read flow on the fly using the (zeek's communityid + ts) combination by using the aidhash lib from pypi https://pypi.org/project/aid-hash
suricata eve.json
This tool reads Suricata's eve.json file
if the field event_type is set to 'alert', this tool marks this flow as malicious by suricata.
it calculates the aid of each read flow on the fly using the (communityid + ts) combination by using the aidhash lib from pypi https://pypi.org/project/aid-hash
Comparison tool output
The output directory of this tool can be specified using -o, for example:
python3 main.py -e suricata/eve.json -s slips/flows.sqlite -gtf zeek/conn.log.labeled -o some_dir
if -o is not given, it creates a new dir inside output/ with the date and time of the run.
The output of this tool consists of:
- a sqlite db with labels per flow, and labels per time window, it also has the performance errors and, total flows count and the files read of each tool.
The sqlite db created by this tool is stored in a subdir in the output/ dir
for example
output/2023-07-10-14:04:16
the metrics printed in the CLI and to the output directory in results.txt at the end of the analysis
a metadata file with the versions and files used
How it works

- This tool consists of 3 parsers, the ground truth parser runs first, once it's done the slips and suricata parsers start in parallel
- The 3 parsers job is to store the label for each tool in the sqlite database
- labels are stored per flow and per timewindow
- the tool then retrieves the actual and predicted value of each of the given tools and passes them to the calculator for calculating the metrics
Parsers
The below diagram shows how we parse each tool and the ground truth given file.

Parsers result in the following tables
Comparison
Comparison Method 1: Comparison per timewindow
How labels per timewindow are calculated
Timewindow labels are detected in the following way:
A timewindow is 1h interval, the given pcap is split into as many 1h intervals as needed and each interval (timewindow) has 1 label, either malicious or benign
for slips
the slips database given to this tool using -s contains a table called alerts where slips stores the malcious timewindows with their label, start and end date.
Applying the timewindow concept for the ground truth
we read 1h worth of flows, once we find one 'malicious' label, we consider their entire timewindow as malicious, if there is no malicious flows in there, we mark that timewindow as benign
Applying the timewindow concept for suricata
Same as the ground truth. we read 1h worth of flows, once we find one 'malicious' label, we consider their entire timewindow as malicious, if there is no malicious flows in there, we mark that timewindow as benign
If a timewindow was detected by one of the tools, and not detected by the ground truth, for example negative timewindows in slips, we consider the gt label of it as "benign"
Comparison Method2: labels flow by flow
for slips
The slips database given to this tool using -s contains a table called flows where each flow is stored with its label. The flow is considered malicious by slips if it was part of an alert. Slips detects alerts based on a complex ensembling algorithm, check it Slips documentation for more about this.
for suricata
The eve.json given to this tool using -e contains flows and event_type = 'alert'. Each alerts is marked as malicious and each flow is marked as benign
for the ground truth
Ground truth flows are labeled using the netflow labeler. so each flow has a label either benign or malicious
Limitations
- the labels in ground truth zeek dir have to be 'Malicious' or 'Benign' only. if any other label is present this tool will completely discard the flow.
ground truth dirs can either be json or tab separated zeek dir or conn.log file
all paths given as parameters to this tool must be absolute paths.
if any flow doesn't have a label by suricata or slips, this tool considers the flow as benign
slips now labels conn.log flows only, just like zeek does when zeek's community_id is enabled as a plugin
all flows and timewindows read by a tool, that don't have a matching flow/timewindow in the ground truth file, are discarded. The number of discarded flows is written in the cli at the end of the analysis.
we only read event_type= "flow" or "alert" in suricata eve.json files
the flows read by suricata, slips and the gt don't have to be the same, meaning that, the final flows count don't have to match because each tool reads the pcap differently
timewindow numbers may be negative if a flow is found with a flow timestamp < timestamp of the first flow seen
if a slips alert exists in parts of 2 timewindows
what we're doing here is marking bith timewindows as malicious ```
1:30 2:30 │ slips alert │ ├──────────────────────┤1:00 2:00 3:00 ├───────────────────────────┼────────────────────────────┤ │ tw 1 tw 2 │
```
FAQ
- When given a PCAP with 24hs of flows, why are the sum of the ByTimewindow comparison not equal 24? Because each IP seen in the traffic, will have a label in each of the timewindows. Check the labelspertw table in the generated db.sqlite and sort by IPs for verification.
Used cmds
- command for generating all zeek files in the dataset/
zeek -C -r <pcap> tcp_inactivity_timeout=60mins tcp_attempt_delay=1min
- command for labeling conn.log files
python3 netflowlabeler.py -c labels.config -f /path/to/generated/conn.log
- (optional) To label the rest of the Zeek files using an already labeled conn.log file (conn.log.labeled)
zeek-files-labeler.py -l conn.log.labeled -f folder-with-zeek-log-files
Checkout the scripts used for the comparison @ https://github.com/stratosphereips/Scripts-Used-for-Comparison
Owner
- Name: Stratosphere IPS
- Login: stratosphereips
- Kind: organization
- Location: Prague
- Website: https://www.stratosphereips.org
- Twitter: StratosphereIPS
- Repositories: 25
- Profile: https://github.com/stratosphereips
Cybersecurity Research Laboratory at the Czech Technical University in Prague. Creators of Slips, a free software machine learning-based behavioral IDS/IPS.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "YOUR_NAME_HERE"
given-names: "YOUR_NAME_HERE"
email: youremailhere
affiliation: >-
Stratosphere Laboratory, AIC, FEL, Czech
Technical University in Prague
orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Lisa"
given-names: "Mona"
email: youremailhere
affiliation: >-
Stratosphere Laboratory, AIC, FEL, Czech
Technical University in Prague
orcid: "https://orcid.org/0000-0000-0000-0000"
title: "repository-template"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2022-07-13
url: "https://github.com/stratosphereips/repository-template"
GitHub Events
Total
- Push event: 9
- Create event: 8
Last Year
- Push event: 9
- Create event: 8
Dependencies
- actions/checkout v2 composite
- anothrNick/github-tag-action 1.36.0 composite
- PyYAML ==6.0.1
- aid_hash *
- communityid ==1.4
- joblib ==1.3.2
- numpy ==1.24.4
- pyaml ==23.9.5
- scikit-learn ==1.3.0
- scipy ==1.10.1
- sklearn ==0.0.post9
- termcolor ==2.3.0
- threadpoolctl ==3.2.0