Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary

Keywords

big-data-analytics internet-of-things malicious-node online-machine-learning
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: deepaiimpactx
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 16.2 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
big-data-analytics internet-of-things malicious-node online-machine-learning
Created over 1 year ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

Streamlined Data Pipeline for Real-Time Threat Detection and Model Inference

BARS Architecture

Abstract

Real-time threat detection in streaming data is crucial yet challenging due to varying data volumes and speeds. This paper presents an architecture designed to manage large-scale, high-speed data streams using deep learning and machine learning models. The system utilizes Apache Kafka for high-throughput data transfer and a publish-subscribe model to facilitate continuous threat detection. Various machine learning techniques, including XGBoost, Random Forest, and LightGBM, are evaluated to identify the best model for classification. The ExtraTrees model achieves exceptional performance, with accuracy, precision, recall, and F1 score all reaching 99% on the SensorNetGuard dataset within this architecture. The PyFlink framework, with its parallel processing capabilities, supports real-time training and adaptation of these models. The system calculates prediction metrics every 2,000 data points, ensuring efficient and accurate real-time threat detection.
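The per-window metric cadence described above can be sketched as a small accumulator that emits a score every time a fixed number of labelled predictions arrives. This is an illustrative sketch, not the repository's implementation; the class and method names (`MetricWindow`, `record`) and the accuracy-only metric are assumptions.

```python
from collections import Counter

# Hypothetical sketch: accumulate labelled predictions and emit accuracy
# every `window_size` records, mirroring the "metrics every 2,000 data
# points" cadence from the abstract. Names here are illustrative, not
# taken from the repository.
class MetricWindow:
    def __init__(self, window_size=2000):
        self.window_size = window_size
        self.counts = Counter()

    def record(self, y_true, y_pred):
        """Record one labelled prediction.

        Returns the window's accuracy when the window fills, else None.
        """
        self.counts["total"] += 1
        if y_true == y_pred:
            self.counts["correct"] += 1
        if self.counts["total"] == self.window_size:
            accuracy = self.counts["correct"] / self.counts["total"]
            self.counts.clear()  # start a fresh window
            return accuracy
        return None
```

In a streaming job, each incoming (label, prediction) pair is fed to `record`; non-None return values are the per-window metrics to log or publish.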

🎒 Tech Stack

Jupyter Notebook Python Docker Apache Flink Apache Kafka YAML

Current Pipeline

(pipeline diagram)

🖥️ Run Locally

Clone the project

```bash
git clone https://github.com/deepaiimpactx/BARS
```

Go to the project directory

```bash
cd BARS
```

Build the images

```bash
docker-compose build
```

Start the Docker containers

```bash
docker compose up -d
```

Other useful commands

Check Kafka messages:

```shell
docker exec -it broker kafka-console-consumer --bootstrap-server localhost:9092 --topic output_topic --partition 0 --offset 4990 --max-messages 20
```
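Messages on the output topic arrive as raw bytes; when consuming them programmatically (for example with the confluent_kafka client already listed in the requirements), a small decoding helper is useful. The JSON field names below are assumptions for illustration, not the repository's actual schema; check the producer code for the real payload format.

```python
import json

# Hypothetical helper: decode one Kafka message payload (raw bytes)
# into a Python dict. The field names below are illustrative only.
def decode_message(payload: bytes) -> dict:
    return json.loads(payload.decode("utf-8"))

# Example payload, roughly as the console consumer would print it:
raw = b'{"node_id": 17, "prediction": "malicious", "score": 0.98}'
msg = decode_message(raw)
```

In a real consumer loop, `payload` would come from `message.value()` on each polled Kafka message.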

Run a PyFlink job:

```shell
docker-compose exec flink-jobmanager flink run -py /opt/flink/usr_jobs/classifier.py
```

To verify database records

PostgreSQL

Connect to the PostgreSQL Container:

```sh
docker exec -it postgres bash
```

Once inside the container, use the psql command-line tool to connect to the PostgreSQL database:

```sh
psql -U postgres -d postgres
```

Run SQL queries to check the data in your tables:

```sql
\dt            -- List all tables
SELECT * FROM sensor_data;
```
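The same verification can be scripted. The sketch below uses the stdlib sqlite3 module purely so the snippet is self-contained; in the pipeline itself the table lives in Postgres, where a client such as psycopg2 would take sqlite3's place. The sensor_data columns shown are assumptions, not taken from the repository's initdb scripts.

```python
import sqlite3

# Illustrative only: sqlite3 stands in for Postgres here. The
# sensor_data columns are assumed for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_data (node_id INTEGER, prediction TEXT)")
conn.executemany(
    "INSERT INTO sensor_data VALUES (?, ?)",
    [(1, "benign"), (2, "malicious")],
)
rows = conn.execute("SELECT * FROM sensor_data").fetchall()
row_count = len(rows)  # quick sanity check that records arrived
conn.close()
```

Against the real database, the equivalent check is simply that `SELECT COUNT(*) FROM sensor_data` grows as the Flink job writes predictions.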

Project Organization

.
├── academicPapers  <- Research paper
├── dash    <- Flask app for DL feature selection
│   ├── uploads
├── data    <- Directory for datasets organized by their processing stages
│   ├── external    <- Data from external sources
│   ├── interim     <- Intermediate, transformed data
│   │   ├── pred    <- Prediction data
│   │   ├── train   <- Training data
│   ├── processed   <- Cleaned and final data ready for modeling or analysis
│   └── raw         <- Raw, unprocessed data
├── initdb      <- Database initialization scripts for Postgres
├── kafka       <- Kafka-related scripts and services
│   ├── api
│   ├── consumer
├── notebooks       <- Jupyter notebooks for data exploration and analysis
├── pyflink     <- Directory for Flink in Python
│   ├── saved_models    <- Pickle-serialised ML models saved from PyFlink jobs; shared between the PyFlink Job and Task managers.
│   ├── usr_jobs        <- Directory for Python scripts to be submitted to Flink 
├── simulation      <- Directory for simulating batch and stream environments
│   └── sensorGuard     <- SensorNetGuard Dataset
├── src     <- Source code directory
│   ├── data    <- Scripts for data handling and processing
│   ├── features    <- Scripts for feature engineering
│   ├── models      <- Scripts related to model training and predictions
│   ├── visualization   <- Scripts for data visualization
├── uploads
├── LICENSE     <- Project license file
├── Makefile    <- Makefile for build commands 
├── README.md   <- Top-level README for developers using this project
├── docker-compose.yml      <- Docker Compose configuration for multi-container application
├── qodana.yaml     <- Configuration file for Qodana, a code quality and inspection tool
└── requirements.txt    <- Python dependencies for the project

Project structure based on the cookiecutter data science project template

Regenerate the structure with:

```shell
tree -L 3 --dirsfirst
```

👨‍💻 Authors


Owner

  • Name: DeepAI ImpactX
  • Login: deepaiimpactx
  • Kind: organization
  • Location: India

Citation (CITATION.cff)

cff-version: "1.2.0"
message: "If you use this work, please cite it using the following metadata."
title: "Streamlined Data Pipeline for Real-Time Threat Detection and Model Inference"
authors:
  - family-names: "Singh"
    given-names: "Rajkanwar"
  - family-names: "V"
    given-names: "Aravindan"
  - family-names: "Mishra"
    given-names: "Sanket"
  - family-names: "Singh"
    given-names: "Sunil Kumar"
date-released: "2025"
conference: "2025 17th International Conference on COMmunication Systems and NETworks (COMSNETS)"
pages: "1148-1153"
doi: "10.1109/COMSNETS63942.2025.10885573"
keywords:
  - "Training"
  - "Adaptation models"
  - "Accuracy"
  - "Pipelines"
  - "Publish-subscribe"
  - "Threat assessment"
  - "Real-time systems"
  - "Data models"
  - "Streams"
  - "Random forests"
  - "Malicious Node"
  - "Big Data Analytics"
  - "Online Machine Learning"
  - "Internet of Things"

GitHub Events

Total
  • Public event: 1
  • Push event: 2
Last Year
  • Public event: 1
  • Push event: 2

Dependencies

dash/Dockerfile docker
  • python 3.12-slim build
docker-compose.yml docker
  • confluentinc/cp-enterprise-control-center 7.4.0
  • confluentinc/cp-kafka 7.4.0
  • confluentinc/cp-schema-registry 7.4.0
  • confluentinc/cp-zookeeper 7.4.0
kafka/api/Dockerfile docker
  • python 3.12-slim build
kafka/consumer/Dockerfile docker
  • python 3.12-slim build
pyflink/Dockerfile docker
  • flink 1.18.0 build
dash/requirements.txt pypi
  • Flask *
  • confluent_kafka *
  • lightgbm *
  • pandas *
  • pyswarm *
  • scikit-learn *
  • tensorflow *
  • zoofs *
kafka/api/requirements.txt pypi
  • Flask ==2.2.5
  • confluent-kafka ==2.3.0
  • waitress *
kafka/consumer/requirements.txt pypi
  • Flask ==2.2.5
  • confluent-kafka ==2.3.0
pyflink/poetry.lock pypi
  • apache-beam 2.48.0
  • apache-flink 1.18.0
  • apache-flink-libraries 1.18.0
  • avro-python3 1.10.2
  • certifi 2024.6.2
  • cffi 1.16.0
  • charset-normalizer 3.3.2
  • cloudpickle 2.2.1
  • crcmod 1.7
  • dill 0.3.1.1
  • dnspython 2.6.1
  • docopt 0.6.2
  • fastavro 1.9.4
  • fasteners 0.19
  • find-libpython 0.4.0
  • grpcio 1.64.1
  • hdfs 2.7.3
  • httplib2 0.22.0
  • idna 3.7
  • kafka-python 2.0.2
  • numpy 1.24.4
  • objsize 0.6.1
  • orjson 3.10.3
  • pandas 2.2.2
  • pemja 0.3.0
  • proto-plus 1.23.0
  • protobuf 4.23.4
  • py4j 0.10.9.7
  • pyarrow 11.0.0
  • pycparser 2.22
  • pydot 1.4.2
  • pymongo 4.7.2
  • pyparsing 3.1.2
  • python-dateutil 2.9.0.post0
  • pytz 2024.1
  • regex 2024.5.15
  • requests 2.32.3
  • six 1.16.0
  • typing-extensions 4.12.1
  • tzdata 2024.1
  • urllib3 2.2.1
  • zstandard 0.22.0
pyflink/pyproject.toml pypi
  • apache-flink 1.18
  • kafka-python ^2.0.2
  • python >=3.10,<3.11
pyflink/requirements.txt pypi
  • apache-flink ==1.18
  • apache-flink-libraries ==1.18
  • confluent_kafka *
  • joblib *
  • keras *
  • lightgbm *
  • pickle5 *
  • river *
  • scikit-learn *
  • torch ==2.3.1
  • torchsampler ==0.1.2
  • xgboost *
  • zoofs *
requirements.txt pypi
  • Sphinx *
  • apache-airflow *
  • awscli *
  • click *
  • confluent-kafka ==2.3.0
  • coverage *
  • diagrams *
  • fastapi *
  • flake8 *
  • imbalanced-learn *
  • lightgbm *
  • matplotlib *
  • numpy *
  • pandas *
  • pydantic *
  • pyflink *
  • python-dotenv >=0.5.1
  • scikit-learn *
  • scipy *
  • seaborn *
  • uvicorn *
  • xgboost *