https://github.com/amr-yasser226/machine-learning-for-network-intrusion-detection

A complete pipeline for network intrusion detection that compares label encoding and one‑hot encoding, with SMOTE resampling, feature selection, and ensemble modeling using scikit‑learn and XGBoost. This was phase one of our university's "CSAI 253 - Machine Learning" course.

Keywords

csai-253 cybersecurity cybersecurity-training ensamble-methods feature-engineering imbalanced-learning machine-learning machine-learning-algorithms network-intrusion-detection one-hot-encoding sckit-learn smote tree-based-model xgboost zewailcity

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: amr-yasser226
License: MIT
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 6.47 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created 11 months ago · Last pushed 8 months ago

Metadata Files

Readme License

README.md

Machine Learning for Network Intrusion Detection

This repository implements a reproducible pipeline to detect network intrusions, comparing Label Encoding vs. One‑Hot Encoding and culminating in ensemble methods. The notebooks walk through data ingestion, cleaning, feature engineering, model training, and evaluation.
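To make the two encoding strategies concrete, here is a minimal sketch using pandas. The toy frame and its column names (`protocol`, `bytes`) are invented for illustration and are not taken from the project's dataset, and the notebooks may well use scikit‑learn encoders rather than the pandas calls shown here.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "protocol": ["tcp", "udp", "icmp", "tcp"],
    "bytes": [120, 80, 40, 300],
})

# Label encoding: each category is mapped to a single integer code.
# pandas sorts the categories, so icmp -> 0, tcp -> 1, udp -> 2.
label_encoded = df.copy()
label_encoded["protocol"] = label_encoded["protocol"].astype("category").cat.codes

# One-hot encoding with drop-first: k categories become k - 1 indicator
# columns; the dropped category ("icmp") is implied when all others are 0.
one_hot = pd.get_dummies(df, columns=["protocol"], drop_first=True)

print(label_encoded)
print(one_hot)
```

Label encoding keeps the feature count unchanged but imposes an arbitrary ordering on the categories; one-hot encoding avoids that ordering at the cost of extra columns.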

Repository Structure

```
.
├── Data
│   └── Project_Phase1_before_cleaning.csv
├── LICENSE
├── .gitattributes
├── .gitignore
├── model
│   ├── Final_(ADDED_LABEL_ENCODING).ipynb
│   └── Final_(One_Hot_Encoding).ipynb
├── REPORT.pdf
└── README.md
```

Data/
Raw CSV dataset (pre‑cleaning).
model/Final_(ADDED_LABEL_ENCODING).ipynb
Full pipeline using Label Encoding
1. Mount & load data
2. Missing‑value analysis & outlier handling (IQR + winsorization)
3. Data‑type corrections & new feature creation
4. Label encoding + mutual information for feature selection
5. SMOTE resampling for class imbalance
6. Training and comparing 5 classifiers (RF, KNN, SVM, Logistic Regression, Decision Tree)
7. Hyperparameter tuning, feature‑importance filtering, stacking ensembles
8. Final metrics & recommendation
model/Final_(One_Hot_Encoding).ipynb
Identical pipeline, but uses One‑Hot Encoding (drop‑first) instead of label encoding. Facilitates direct comparison of encoding strategies.
REPORT.pdf
Narrative report with tables, charts, and a concise recommendation.
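
The outlier-handling step (IQR + winsorization) can be sketched as follows. This is a minimal illustration, not the notebooks' code: the function name `winsorize_iqr`, the multiplier `k=1.5`, and the sample values are all assumptions.

```python
import numpy as np

def winsorize_iqr(values, k=1.5):
    """Clip values to the fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional Tukey fence; the notebooks may use another value.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    # Values beyond a fence are pulled back to the fence rather than dropped.
    return np.clip(values, q1 - k * iqr, q3 + k * iqr)

# Hypothetical feature column with one extreme value.
sample = [10, 12, 11, 13, 12, 500]
clipped = winsorize_iqr(sample)
print(clipped)
```

Clipping instead of deleting rows keeps every record in the dataset, which matters when the minority (attack) class is already small.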

Key Results

| Encoding       | Best Model    | Accuracy | False Negatives | Notes                             |
|----------------|---------------|---------:|----------------:|-----------------------------------|
| Label Encoding | Random Forest |   99.75% |               0 | Chosen for zero FN in test set    |
| One‑Hot        | XGBoost       |   99.82% |               1 | Slightly higher accuracy but 1 FN |

Random Forest (Label Encoding) achieved 99.75% accuracy with 0 false negatives, critical for intrusion detection.
XGBoost (One‑Hot Encoding) delivered 99.82% accuracy but incurred 1 false negative.
All other models (KNN, SVM, Logistic Regression, Decision Tree) performed competitively but with higher FN rates.
Stacking ensembles (RF/DT/SVM) did not improve upon a single Random Forest for zero-FN performance.
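
The results above rank a slightly less accurate model first because it misses no attacks. A toy sketch of that trade-off, with labels invented purely for illustration (1 = attack, 0 = normal traffic) and hypothetical helper names:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def false_negatives(y_true, y_pred, positive=1):
    """Count true positives-class samples (attacks) predicted as another class."""
    return sum(t == positive and p != positive for t, p in zip(y_true, y_pred))

# Invented toy labels, not project data.
y_true  = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # two false alarms, no missed attacks
model_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # no false alarms, one missed attack

for name, pred in [("model_a", model_a), ("model_b", model_b)]:
    print(name, accuracy(y_true, pred), false_negatives(y_true, pred))
```

Here `model_b` has the higher accuracy but lets one attack through, which mirrors why accuracy alone is not the selection criterion in this project.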

Quickstart

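```bash
git clone https://github.com/amr-yasser226/machine-learning-for-network-intrusion-detection.git
cd machine-learning-for-network-intrusion-detection

python3 -m venv venv
source venv/bin/activate
# Install the packages listed under Dependencies, plus JupyterLab
pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn xgboost jupyterlab

jupyter lab
```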

Open the two notebooks under model/ and run each one end-to-end.

Dependencies

Python 3.8+
pandas, numpy, matplotlib, seaborn
scikit‑learn, imbalanced‑learn
xgboost

What This Solves

Demonstrates best practices in EDA, feature engineering, and model evaluation
Compares two encoding strategies for categorical data
Addresses class imbalance with SMOTE
Benchmarks multiple classifiers and stacking ensembles
Prioritizes zero false negatives—paramount in intrusion detection
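
For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified, hand-rolled NumPy sketch of that idea, not the implementation used in the notebooks (which presumably rely on the `SMOTE` class from imbalanced‑learn); the function name `smote_sketch` and the toy points are assumptions.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority-class samples X_min.

    Simplified illustration of SMOTE's interpolation step; needs >= 2 samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]  # shape (n, k)

    # Pick a random base sample and one of its k neighbours for each new row.
    base = rng.integers(0, n, size=n_new)
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# Hypothetical minority-class points in two dimensions.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
print(synthetic)
```

Whichever implementation is used, resampling is normally applied to the training split only, so that the test set keeps the original class distribution.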

License

This project is released under the MIT License. See LICENSE for details.

Owner

Login: amr-yasser226
Kind: user

Repositories: 1
Profile: https://github.com/amr-yasser226

GitHub Events

Total

Watch event: 1
Push event: 2
Public event: 1

Last Year

Watch event: 1
Push event: 2
Public event: 1

Committers

Last synced: 7 months ago

All Time

Total Commits: 8
Total Committers: 1
Avg Commits per committer: 8.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 8
Committers: 1
Avg Commits per committer: 8.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Amr Yasser	1****6	8

Issues and Pull Requests

Last synced: 7 months ago

Science Score: 26.0%
