exploring-cybersecurity-data-science

Exploring Cybersecurity Data Science: Dimensionality Reduction and Cluster Analysis

https://github.com/muzzyb/exploring-cybersecurity-data-science

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.9%) to scientific vocabulary

Keywords

anomaly-detection cluster-analysis cybersecurity dimensionality-reduction factor-analysis-for-mixed-data-types hdbscan iforest isomap multidimensional-scaling pca t-sne umap
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: MuzzyB
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 50 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Exploring Cybersecurity Data Science

Cybersecurity incidents have become commonplace: hardly a day passes without news of a breach or the discovery of illegally obtained data on the dark web. This is despite the large amounts of data collected by the various devices used to protect information systems. With the high volume of network data available from multiple logging systems, human analysts are overwhelmed when they try to analyze it manually. Although the data flagged by monitoring devices is presumed to be malicious, a large portion of it consists of false positives. Most of these analysts are cybersecurity professionals with little data analytics/science knowledge, while analysts with data analytics/science knowledge usually have little cybersecurity knowledge. To find insights and problems in the data, data mining/data science, machine learning, deep learning, and artificial intelligence methods are increasingly used, so their users need to be knowledgeable in both data analytics/science and cybersecurity concepts and methods. In this paper we look at how data analytics techniques can aid the analysis of cybersecurity data by comparing dimensionality reduction techniques and clustering techniques on two cybersecurity datasets, NSL-KDD and UNSW-NB15.

Methodology

  1. NSLKDDData.py and UNSWNB15_Data.py contain classes that preprocess the respective datasets.
  2. inputData.py contains a class with functions that create the various input files (raw design matrix, one-hot encoded design matrix, and Gower matrix).
  3. DimRed contains a class with functions that create the various dimensionality reduction outputs (PCA, FAMD, UMAP, t-SNE, ISOMAP).
  4. Clustering.py contains a class with functions that create the various clustering outputs.
  5. nslkddd-01.py and unsw-01.py create the input files, nslkdddimRed75.py and unswdimRed75.py create the dimensionality reduction outputs, and cluster_uMap.py creates the clustering outputs.

Because the files are very large, each output was saved to a pickle file as soon as it was available to avoid rerunning the computation. The Dask parallel processing package was used wherever a method offered that option.
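Step 2 of the Methodology mentions a one-hot encoded design matrix and a Gower matrix. inputData.py itself is not reproduced here, so the following is only a minimal NumPy sketch, assuming a few invented records with NSL-KDD-style column names, of how these two representations can be built for mixed numeric and categorical features. The function names one_hot_design_matrix and gower_matrix are hypothetical. The Gower dissimilarity shown uses range-normalized absolute differences for numeric features and a 0/1 mismatch for categorical features, averaged with equal weights.

```python
import numpy as np

# Hypothetical toy records (values invented) using NSL-KDD-style column names.
records = [
    {"duration": 0, "protocol_type": "tcp", "src_bytes": 181, "flag": "SF"},
    {"duration": 2, "protocol_type": "udp", "src_bytes": 239, "flag": "SF"},
    {"duration": 0, "protocol_type": "tcp", "src_bytes": 0, "flag": "REJ"},
]
NUMERIC = ["duration", "src_bytes"]
CATEGORICAL = ["protocol_type", "flag"]


def one_hot_design_matrix(rows, numeric, categorical):
    """Hypothetical helper: keep numeric columns as-is and expand each
    categorical column into one 0/1 indicator column per observed level."""
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    columns = list(numeric) + [f"{c}={v}" for c in categorical for v in levels[c]]
    matrix = np.array([
        [float(r[c]) for c in numeric]
        + [1.0 if r[c] == v else 0.0 for c in categorical for v in levels[c]]
        for r in rows
    ])
    return columns, matrix


def gower_matrix(rows, numeric, categorical):
    """Hypothetical helper: pairwise Gower dissimilarity, i.e. the equally
    weighted mean of per-feature dissimilarities in [0, 1]."""
    n = len(rows)
    total = np.zeros((n, n))
    for c in numeric:
        x = np.array([float(r[c]) for r in rows])
        span = x.max() - x.min()
        if span > 0:  # a constant column contributes zero dissimilarity
            total += np.abs(x[:, None] - x[None, :]) / span
    for c in categorical:
        x = np.array([r[c] for r in rows])
        total += (x[:, None] != x[None, :]).astype(float)  # 1 where levels differ
    return total / (len(numeric) + len(categorical))


columns, X_onehot = one_hot_design_matrix(records, NUMERIC, CATEGORICAL)
D = gower_matrix(records, NUMERIC, CATEGORICAL)
```

The one-hot design matrix grows linearly with the number of records, whereas the Gower matrix is n × n, so its memory footprint grows quadratically with the number of records.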

This is a personal unsupervised project.
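The pickle caching mentioned in the Methodology can be sketched as a small compute-once wrapper. This is an illustration, not the repository's code: the helper names cached and pca_scores, the file name pca_scores.pkl, and the random stand-in design matrix are all hypothetical, and PCA is computed directly with NumPy's SVD rather than with whatever library the project's scripts use.

```python
import os
import pickle
import tempfile

import numpy as np


def cached(path, compute):
    """Hypothetical helper: load the pickled result at `path` if it exists;
    otherwise call `compute()` once, pickle the result, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    with open(path, "wb") as fh:
        pickle.dump(result, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return result


def pca_scores(X, n_components=2):
    """Hypothetical helper: PCA via SVD of the column-centred matrix.
    Returns component scores and explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # random stand-in for a real design matrix
calls = []  # records how many times the expensive step actually runs


def compute():
    calls.append(1)
    return pca_scores(X, n_components=2)


path = os.path.join(tempfile.mkdtemp(), "pca_scores.pkl")  # hypothetical file name
first = cached(path, compute)   # computed and written to disk
second = cached(path, compute)  # loaded from disk; compute() is not called again
```

Since pickle.load can execute arbitrary code, cached outputs like these should only be loaded from files you created or otherwise trust.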

Owner

  • Name: Muuzaani Nkhoma
  • Login: MuzzyB
  • Kind: user
  • Location: Minneapolis, MN

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software or images, please cite it as below."
authors:
  - family-names: "Nkhoma"
    given-names: "Muuzaani"
title: "Exploring Cybersecurity Data Science: Dimensionality Reduction, Cluster Analysis, and Anomaly Detection"
version: "1.0.0"
date-released: "2024-05-01"
url: "https://github.com/MuzzyB/Exploring-Cybersecurity-Data-Science"
