dsrl-apt-2023
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: shadab75
- License: mit
- Default Branch: main
- Size: 8.89 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
DSRL-APT-2023: A New Synthetic Dataset For Advanced Persistent Threats
Abstract
Detecting Advanced Persistent Threats (APTs) is crucial, and one effective approach is an intrusion detection system (IDS) integrated with supervised machine learning algorithms. These algorithms need a balanced dataset with enough attack samples to learn and recognize attack patterns. However, widely used APT datasets, such as DAPT2020 and SCVIC-APT-2021, are imbalanced, which limits the performance of machine learning-based IDS.
To address this, we introduce DSRL-APT-2023, a new balanced synthetic APT dataset generated by a CTGAN model trained on the DAPT2020 dataset.
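To make "balanced" concrete, the following minimal sketch measures class imbalance as the ratio between the largest and smallest class counts. The label names and counts are hypothetical, invented for illustration; they are not taken from DAPT2020 or DSRL-APT-2023, and this metric is one common choice rather than the one the authors necessarily used.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the samples as a fraction in [0, 1]."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Largest class count divided by the smallest (1.0 means perfectly balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels, for illustration only.
imbalanced = ["benign"] * 90 + ["reconnaissance"] * 6 + ["lateral_movement"] * 4
balanced = ["benign"] * 30 + ["reconnaissance"] * 30 + ["lateral_movement"] * 30

print(imbalance_ratio(imbalanced))  # 22.5
print(imbalance_ratio(balanced))    # 1.0
```

A ratio far above 1.0 signals that a classifier will see few examples of the rarest class, which is the problem the synthetic dataset is meant to address.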
Evaluation and Comparison
We evaluate and compare the performance of six common supervised machine learning algorithms:
- Decision Tree
- Support Vector Machine
- K-Nearest Neighbor
- Logistic Regression
- Random Forest
- Multi-Layer Perceptron
Alongside these, we also assess an intelligent intrusion detection system (IDS) based on tree-structured machine learning models. The evaluation measures attack detection on DSRL-APT-2023 and compares the results with those obtained on DAPT2020 and SCVIC-APT-2021.
Synthetic Dataset Generation Quality
We also compare the quality of the synthetic data generated by two prominent GANs, CopulaGAN and CTGAN. CTGAN performs slightly better at generating high-quality tabular data.
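The README does not say which quality metric was used. As an assumption, one widely used per-column measure for numeric features is the complement of the two-sample Kolmogorov-Smirnov statistic, where 1.0 means the real and synthetic columns have identical empirical distributions. The sketch below implements it with the standard library only; the sample values are made up.

```python
from bisect import bisect_right


def ks_complement(real, synthetic):
    """Return 1 - KS statistic between two numeric samples.

    The KS statistic is the largest gap between the two empirical CDFs,
    so a result of 1.0 means the distributions match exactly.
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    n, m = len(real_sorted), len(synth_sorted)
    max_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # occurs at one of the observed values.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / n
        cdf_synth = bisect_right(synth_sorted, x) / m
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


# Made-up column values, for illustration only.
real_column = [1, 2, 3, 4]
close_synthetic = [1, 2, 3, 4]
shifted_synthetic = [3, 4, 5, 6]

print(ks_complement(real_column, close_synthetic))    # 1.0
print(ks_complement(real_column, shifted_synthetic))  # 0.5
```

Averaging such a score over all numeric columns gives a single number per generator, which is one way two GANs could be ranked against the same real dataset.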
Results
Our results show that both the machine learning algorithms and the intelligent IDS detect attacks in the synthetic dataset accurately, as measured by F1-score.
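For reference, the F1-score of a class is the harmonic mean of precision and recall, which simplifies to 2·TP / (2·TP + FP + FN). The sketch below computes it per class and averages the results with equal weight (macro averaging). The README does not state which averaging was used, so macro averaging is an assumption here, and the labels are invented.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each class to its F1-score."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # With no true or predicted samples the score is undefined; use 0.0.
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Invented labels, for illustration only.
y_true = ["benign", "benign", "attack", "attack"]
y_pred = ["benign", "attack", "attack", "attack"]

print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro averaging treats every class equally, so a rare attack class counts as much as the benign majority, which is why it is often preferred on imbalanced intrusion data.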
Owner
- Login: shadab75
- Kind: user
- Repositories: 1
- Profile: https://github.com/shadab75
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this dataset, please cite it as below."
authors:
  - family-names: "Shadabfar"
    given-names: "Hossein"
    orcid: ""
  - family-names: "Dehghan"
    given-names: "Motahareh"
    orcid: ""
  - family-names: "Sadeghian"
    given-names: "Babak"
    orcid: ""
title: "DSRL-APT-2023: A New Synthetic Dataset For Advanced Persistent Threats"
version: 1.0.0
date-released: TBC
url: "https://github.com/shadab75/DSRL-APT-2023"
GitHub Events
Total
- Watch event: 2
- Issue comment event: 1
- Push event: 3
- Pull request event: 2
- Fork event: 1
Last Year
- Watch event: 2
- Issue comment event: 1
- Push event: 3
- Pull request event: 2
- Fork event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: about 1 month
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: about 1 month
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- sahnaseredini (1)