Feature-augmented_Traffic_Attack_Detection_Based_on_UNSW-NB15_Dataset

Feature-augmented Traffic Attack Detection Based on UNSW-NB15 Dataset.

https://github.com/mycody0810/Feature-augmented_Traffic_Attack_Detection_Based_on_UNSW-NB15_Dataset

Abstract

This repository supports network threat detection by augmenting the features of the original UNSW-NB15 threat detection dataset. The newly added features (referred to as Type II features) further improve the expressive capability of the traffic data, and significant performance improvements have been validated across multiple models. The enhanced training and validation data can be used to train and validate new algorithms. Here, we share our processed and enhanced UNSW-NB15 dataset, together with the feature processing and model training code. If you have any questions, please feel free to contact us.

Dataset

UNSW-NB15 Dataset

UNSW-NB15 public dataset.

Part of the dataset is available here:

  • Statistical features of 2.5 million flow records: UNSW_NB15_x.csv (x = 1, 2, 3, 4)
  • Statistical features of 250,000 flow records: UNSW_NB15_training-set.csv, UNSW_NB15_testing-set.csv

The complete data is available here: The UNSW-NB15 Dataset

Deep Feature

The deep features extracted with the proposed deep feature extraction method:

  • feature_7.*.rar

Feature Description.xlsx: Detailed feature description document.

Code

Feature Extraction and Validation Framework

feature_extract

Data Alignment and Feature Extraction based on the original UNSW-NB15 dataset

File and Function Description

List of Files and Their Functions:

Data Alignment:

  1. step1: Process CSV
    • python file: mainformatUNSW-NB15_data.py
    • description: Clean and transform the data, and handle special fields
  2. step2: Insert Data into MongoDB
    • python file: mainUNSW-NB152_mongoDB.py
    • description: Store the data in MongoDB to improve processing performance
  3. step3: Parse PCAP
    • python file: mainparsingPCAP2packet_data.py
    • description: Extract basic (statistical) features from the communication content. The feature fields are listed in col_name.py
  4. step4: Data Alignment
    • python file: mainmatchingtestingtrainingset.py, mainmerge1cfeaturetestingtraining.py, maingivetheoptimalclass1featuretesting_training.py
    • description:
      • Match the testing/training data to the UNSW-NB15 dataset. The matching result includes one-to-many matches (indices).
      • Reduce one-to-many matches to one-to-one by random selection and a minimum-distance criterion.
      • Note: After this step, the indices of the testing and training records within the entire UNSW-NB15 dataset are known.
  5. step5:
    • python file: maincharacteristics1_category2.py
    • description:
      1. makemediacygroup5tuples_time: Collect the 5-tuples and the start and end times for the entire UNSW-NB15 dataset, and record the correspondence between 5-tuples and raw data.
      2. makemediacypktinfoset: List all statistical information extracted from the packet data within the time window, based on the 5-tuples and times of UNSW-NB15.
      3. calculateaggregatefeatures: Calculate statistical features within the time window of the PCAP, based on the results from 2.
      4. expand1category_features: Combine Type I and Type II features.
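
The alignment and aggregation steps above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the record layouts, the function names (group_packets_by_five_tuple, aggregate_window_features, pick_closest_candidate), the three example statistics, and the squared-distance measure are all assumptions, and the random-selection part of step4 is omitted.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layouts (assumptions, not the repository's schema):
#   flow:   {"five_tuple": (src_ip, src_port, dst_ip, dst_port, proto),
#            "start": float, "end": float}
#   packet: {"five_tuple": (...), "ts": float, "size": int}


def group_packets_by_five_tuple(packets):
    """Index packets by 5-tuple so each flow only scans its own packets."""
    index = defaultdict(list)
    for pkt in packets:
        index[pkt["five_tuple"]].append(pkt)
    return index


def aggregate_window_features(flow, packet_index):
    """Compute simple statistics over packets inside the flow's time window."""
    in_window = [
        p for p in packet_index.get(flow["five_tuple"], [])
        if flow["start"] <= p["ts"] <= flow["end"]
    ]
    sizes = [p["size"] for p in in_window]
    return {
        "pkt_count": len(sizes),
        "byte_total": sum(sizes),
        "mean_pkt_size": mean(sizes) if sizes else 0.0,
    }


def pick_closest_candidate(sample, candidates, keys):
    """Reduce a one-to-many match to one-to-one by minimum squared distance."""
    def distance(candidate):
        return sum((sample[k] - candidate[k]) ** 2 for k in keys)
    return min(candidates, key=distance)


# Illustrative input only.
FT = ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
packets = [
    {"five_tuple": FT, "ts": 1.0, "size": 100},
    {"five_tuple": FT, "ts": 2.0, "size": 300},
    {"five_tuple": FT, "ts": 9.0, "size": 50},  # outside the window below
    {"five_tuple": ("10.0.0.3", 5555, "10.0.0.2", 53, "udp"), "ts": 1.5, "size": 60},
]
packet_index = group_packets_by_five_tuple(packets)
flow = {"five_tuple": FT, "start": 0.5, "end": 3.0}
features = aggregate_window_features(flow, packet_index)

# A training sample that matched two rows of the full dataset.
sample = {"dur": 1.0, "sbytes": 400}
candidates = [
    {"idx": 7, "dur": 5.0, "sbytes": 900},
    {"idx": 9, "dur": 1.0, "sbytes": 410},
]
chosen = pick_closest_candidate(sample, candidates, keys=("dur", "sbytes"))
```

Indexing packets by 5-tuple first keeps the per-flow window scan small; in practice the features compared in the distance would likely need scaling so that one large-valued field does not dominate.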

model_validation

Feature Files

  • Feature Version 7: input/us_features/feature_7.csv
    • Description: Type I and Type II features (i.e., extracted using the deep feature extraction method)
    • You need to extract data/feature_7.*.rar to input/us_features/

Code:

  • algorithm/
    • model.py: Entry point for code
    • xxx.py: Definitions for various models
  • feature_process/
    • featurex.py: Processing for Feature Version X
    • feature: General feature processing
  • analysis/
    • dataset_analysis.py: Dataset analysis, including data imbalance
    • feature_analysis.py: Feature analysis, including feature importance
    • shap_analysis.py: SHAP (Shapley value) method for feature importance analysis
  • utils/
    • calculate_utils.py: Visualization of experimental results
    • sample_utils.py: Code related to partial dataset sampling
  • Entry Files
    • run.py: Entry for data processing, training, testing, and result analysis
    • params (example: run.py --kfold_random_state=0 --random_state=-1 --all_count=-1 --k_fold=5 --model_name="MLP" --feature_version="feature7/raw" --oversample_all=0)
      • model_name: Model algorithm, {RFC, MLP, KNN, LR, Efficient, Autoencoder}
      • feature_version: Feature selection, given as <feature file>/<subset> (for example feature7/raw); for feature file feature7, the subsets {raw, all} represent Type I and Type II features, respectively.
      • k_fold: Value of k for k-fold cross-validation.
      • kfold_random_state: Random seed for k-fold cross-validation.
      • random_state: Random seed for mini-batch sampling.
      • all_count: Number of samples for mini-batch sampling.
      • oversample_all: Sampling algorithm.
    • run.sh: Script to execute run.py
    • run_param.py: Entry for model hyperparameter tuning
  • requirements.txt
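
To make the parameters above concrete, here is a hedged sketch of how such flags could be parsed and how k_fold and kfold_random_state could drive a seeded split. It is not the actual code of run.py: the defaults are copied from the example command, the integer types are guessed from it, and kfold_indices is a hypothetical helper. Whether the repository stratifies its folds is not stated, so this sketch uses a plain shuffled split.

```python
import argparse
import random


def build_parser():
    """Hypothetical mirror of the flags listed for run.py."""
    parser = argparse.ArgumentParser(description="Sketch of run.py-style flags")
    parser.add_argument(
        "--model_name",
        choices=["RFC", "MLP", "KNN", "LR", "Efficient", "Autoencoder"],
        default="MLP",
    )
    parser.add_argument("--feature_version", default="feature7/raw")
    parser.add_argument("--k_fold", type=int, default=5)
    parser.add_argument("--kfold_random_state", type=int, default=0)
    parser.add_argument("--random_state", type=int, default=-1)
    parser.add_argument("--all_count", type=int, default=-1)
    parser.add_argument("--oversample_all", type=int, default=0)
    return parser


def kfold_indices(n_samples, k, seed):
    """Yield (train, validation) index lists for a seeded, shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Same flags as the example command in the list above.
args = build_parser().parse_args([
    "--kfold_random_state=0",
    "--random_state=-1",
    "--all_count=-1",
    "--k_fold=5",
    "--model_name=MLP",
    "--feature_version=feature7/raw",
    "--oversample_all=0",
])
splits = list(kfold_indices(10, args.k_fold, args.kfold_random_state))
```

Fixing the seed makes the fold assignment reproducible across runs, which is what allows results for different models or feature versions to be compared on identical splits.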

Result Records

  1. Training Records: output_us/
    • output_us/avg_result_record.csv
      • Records of experimental parameters and metrics
    • output_us/output_shap_info.csv
      • Records of experimental parameters and SHAP storage paths
  2. Model Records: model_us/
  3. SHAP Records: output_shap/
  4. Preprocessed Data: dataset_us/
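
As an illustration of how a record file such as output_us/avg_result_record.csv might be summarized, the sketch below averages one metric per group using only the standard library. The column names (model_name, feature_version, f1) and the in-memory sample rows are placeholders chosen for this example; the real file's columns are not documented here, and the values are not experimental results.

```python
import csv
import io
from collections import defaultdict


def load_records(handle):
    """Read CSV rows from an open file-like object into a list of dicts."""
    return list(csv.DictReader(handle))


def mean_metric_by(rows, group_key, metric_key):
    """Average a numeric column for each distinct value of group_key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_key]].append(float(row[metric_key]))
    return {group: sum(vals) / len(vals) for group, vals in buckets.items()}


# Placeholder content standing in for a real record file (made-up values).
sample = io.StringIO(
    "model_name,feature_version,f1\n"
    "MLP,feature7/raw,0.5\n"
    "MLP,feature7/all,1.0\n"
    "RFC,feature7/raw,0.25\n"
)
rows = load_records(sample)
summary = mean_metric_by(rows, "model_name", "f1")
```

With a real file, the same functions could be called as load_records(open(path, newline="")) after checking the actual header row.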

References

Efficient-CNN-BiLSTM-for-Network-IDS

Code for the paper: Efficient-CNN-BiLSTM-for-Network-IDS
The paper is available here: Efficient Deep CNN-BiLSTM Model for Network Intrusion Detection, Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition
