https://github.com/amr-yasser226/machine-learning-for-network-intrusion-detection
A complete pipeline for network intrusion detection comparing label encoding and one‑hot encoding, with SMOTE resampling, feature selection, and ensemble modeling using scikit‑learn and XGBoost. This was phase one of our university's "CSAI 253 - Machine Learning" course.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.4%) to scientific vocabulary
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Machine Learning for Network Intrusion Detection
This repository implements a reproducible pipeline to detect network intrusions, comparing Label Encoding vs. One‑Hot Encoding and culminating in ensemble methods. The notebooks walk through data ingestion, cleaning, feature engineering, model training, and evaluation.
Repository Structure
```
.
├── Data
│   └── Project_Phase1_before_cleaning.csv
├── LICENSE
├── .gitattributes
├── .gitignore
├── model
│   ├── Final_(ADDED_LABEL_ENCODING).ipynb
│   └── Final_(One_Hot_Encoding).ipynb
├── REPORT.pdf
└── README.md
```
Data/
Raw CSV dataset (pre‑cleaning).
model/Final_(ADDED_LABEL_ENCODING).ipynb
Full pipeline using Label Encoding:
- Mount & load data
- Missing‑value analysis & outlier handling (IQR + winsorization)
- Data‑type corrections & new feature creation
- Label encoding + mutual information for feature selection
- SMOTE resampling for class imbalance
- Training and comparing 5 classifiers (RF, KNN, SVM, Logistic Regression, Decision Tree)
- Hyperparameter tuning, feature‑importance filtering, stacking ensembles
- Final metrics & recommendation
model/Final_(One_Hot_Encoding).ipynb
Identical pipeline, but uses One‑Hot Encoding (drop‑first) instead of label encoding. Facilitates direct comparison of encoding strategies.
REPORT.pdf
Narrative report with tables, charts, and a concise recommendation.
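Two of the preprocessing ideas named in the notebook descriptions, IQR-based winsorization and the label versus one-hot (drop-first) encodings, can be sketched on a toy frame. This is a minimal illustration, not the notebooks' code: the column names `duration` and `protocol`, the helper `winsorize_iqr`, and the fence multiplier `k=1.5` are hypothetical, and the use of `LabelEncoder` and `pandas.get_dummies` is an assumption about how the encodings might be applied.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame; column names are hypothetical, not taken from the project dataset.
df = pd.DataFrame({
    "duration": [1.0, 2.0, 2.5, 3.0, 250.0],
    "protocol": ["tcp", "udp", "icmp", "tcp", "udp"],
})


def winsorize_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# The extreme value 250.0 is pulled back to the upper fence instead of being dropped.
df["duration"] = winsorize_iqr(df["duration"])

# Label encoding: one integer column; the integer order carries no meaning.
label_encoded = df.copy()
label_encoded["protocol"] = LabelEncoder().fit_transform(df["protocol"])

# One-hot encoding with drop-first: k categories become k - 1 indicator columns.
one_hot_encoded = pd.get_dummies(df, columns=["protocol"], drop_first=True)

print(label_encoded)
print(one_hot_encoded)
```

Label encoding keeps the feature count fixed, while one-hot encoding widens the frame by one column per extra category, which is the trade-off the two notebooks compare.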
Key Results
| Encoding | Best Model | Accuracy | False Negatives | Notes |
|----------------|----------------|---------:|----------------:|----------------------------------------|
| Label Encoding | Random Forest | 99.75% | 0 | Chosen for zero FN in test set |
| One‑Hot | XGBoost | 99.82% | 1 | Slightly higher accuracy but 1 FN |
- Random Forest (Label Encoding) achieved 99.75% accuracy with 0 false negatives, critical for intrusion detection.
- XGBoost (One‑Hot Encoding) delivered 99.82% accuracy but incurred 1 false negative.
- All other models (KNN, SVM, Logistic Regression, Decision Tree) performed competitively but with higher FN rates.
- Stacking ensembles (RF/DT/SVM) did not improve upon a single Random Forest for zero-FN performance.
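The comparison above rests on reading false negatives off the confusion matrix alongside accuracy. A minimal sketch of that comparison for a single Random Forest and an RF/DT/SVM stack follows. Everything here is an assumption for illustration: the data is synthetic (`make_classification`), the problem is treated as binary with 1 = attack, and the logistic-regression meta-learner and all hyperparameters are placeholders rather than the notebooks' settings. The printed numbers will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (1 = attack, 0 = benign).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("dt", DecisionTreeClassifier(random_state=0)),
            ("svm", SVC(random_state=0)),
        ],
        # Meta-learner choice is an assumption, not taken from the notebooks.
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

report = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    report[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "false_negatives": int(fn),  # attacks predicted as benign
    }
    print(name, report[name])
```

Ranking models by `false_negatives` first and `accuracy` second mirrors the selection rule described above, where a model with slightly lower accuracy is preferred if it misses no attacks.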
Quickstart
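```bash
git clone https://github.com/amr-yasser226/machine-learning-for-network-intrusion-detection.git
cd machine-learning-for-network-intrusion-detection

python3 -m venv venv
source venv/bin/activate

# Install the packages listed under Dependencies, plus JupyterLab
pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn xgboost jupyterlab

jupyter lab
```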
Open the two notebooks under model/ and run end-to-end.
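For a quick look at the raw data outside the notebooks, the CSV from the repository tree can be loaded directly. This is a small sketch under assumptions: it is run from the repository root, the helper name `load_and_summarize` is hypothetical, and no column names are assumed.

```python
from pathlib import Path

import pandas as pd

# Path taken from the repository tree; assumes the working directory is the repo root.
DATA_PATH = Path("Data") / "Project_Phase1_before_cleaning.csv"


def load_and_summarize(path):
    """Load a CSV and return (frame, per-column missing counts for columns with gaps)."""
    df = pd.read_csv(path)
    missing = df.isna().sum()
    return df, missing[missing > 0]


if DATA_PATH.exists():
    raw, missing_counts = load_and_summarize(DATA_PATH)
    print(raw.shape)
    print(missing_counts)
else:
    print(f"{DATA_PATH} not found; run this from the repository root.")
```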
Dependencies
- Python 3.8+
- pandas, numpy, matplotlib, seaborn
- scikit‑learn, imbalanced‑learn
- xgboost
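Before opening the notebooks, it can help to confirm that the environment satisfies the list above. The following check is a convenience sketch, not part of the repository; the helper `missing_modules` is hypothetical, and the import names differ from the package names for two entries (`scikit-learn` imports as `sklearn`, `imbalanced-learn` as `imblearn`).

```python
import sys
from importlib.util import find_spec

# Import names for the packages listed above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "imblearn", "xgboost"]


def missing_modules(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


python_ok = sys.version_info >= (3, 8)
missing = missing_modules(REQUIRED)

print("Python 3.8+:", python_ok)
print("Missing packages:", ", ".join(missing) if missing else "none")
```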
What This Solves
- Demonstrates best practices in EDA, feature engineering, and model evaluation
- Compares two encoding strategies for categorical data
- Addresses class imbalance with SMOTE
- Benchmarks multiple classifiers and stacking ensembles
- Prioritizes zero false negatives—paramount in intrusion detection
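To make the SMOTE step concrete, the core idea is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below illustrates that idea with NumPy only; it is a simplified stand-in, not the imbalanced-learn implementation the project depends on, and the function name `smote_sketch` and its parameters are hypothetical. In practice the `SMOTE` class from `imblearn.over_sampling` with `fit_resample` would typically be used, and a common precaution is to resample only the training split so that the test set keeps its original class balance.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base sample and one of its neighbours for each synthetic point.
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Four minority points on the unit square; requires k < number of minority samples.
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_minority, n_new=6)
print(synthetic)
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stays inside the region the minority class already occupies rather than duplicating rows exactly.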
License
This project is released under the MIT License. See LICENSE for details.
Owner
- Login: amr-yasser226
- Kind: user
- Repositories: 1
- Profile: https://github.com/amr-yasser226
GitHub Events
Total
- Watch event: 1
- Push event: 2
- Public event: 1
Last Year
- Watch event: 1
- Push event: 2
- Public event: 1