https://github.com/5uperpalo/fireman-project_frontend

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: 5uperpalo
  • Language: CSS
  • Default Branch: main
  • Size: 6.02 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed about 5 years ago
Metadata Files
Readme

README.MD

FIREMAN-project Frontend repository

Frontend for machine-learning prediction, related to the FIREMAN project and the main FIREMAN-project repository. This repository is a work in progress and part of the FIREMAN project activities. Its skeleton is based on Kafka Fraud Detector.

1. Considerations and design

1.1. Considerations

  • emulate real-world IoT scenario
  • pluggable approach, i.e. easily add/swap the imputer/classifier for a Python, Java, or other implementation
  • scalability
  • maintainability
  • robustness
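
The pluggable approach can be illustrated with a small sketch. It is not the repository's code: the `Imputer` and `Classifier` interfaces, `MeanImputer`, `ThresholdClassifier` and `Pipeline` are hypothetical names chosen for illustration, and the threshold classifier is only a stand-in for a real model. The point is that the pipeline depends on two small interfaces, so either component can be swapped without touching the rest.

```python
from statistics import mean
from typing import List, Optional, Protocol, Sequence

Row = Sequence[Optional[float]]  # None marks a missing value


class Imputer(Protocol):
    def transform(self, row: Row) -> List[float]: ...


class Classifier(Protocol):
    def predict(self, features: Sequence[float]) -> int: ...


class MeanImputer:
    """Fills missing values with per-column means learned from training rows."""

    def __init__(self, training_rows: Sequence[Row]) -> None:
        columns = zip(*training_rows)
        self.means = [mean(v for v in col if v is not None) for col in columns]

    def transform(self, row: Row) -> List[float]:
        return [m if v is None else v for v, m in zip(row, self.means)]


class ThresholdClassifier:
    """Hypothetical stand-in model: label 1 if the feature sum exceeds a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, features: Sequence[float]) -> int:
        return int(sum(features) > self.threshold)


class Pipeline:
    """Chains any imputer with any classifier that satisfy the interfaces above."""

    def __init__(self, imputer: Imputer, classifier: Classifier) -> None:
        self.imputer = imputer
        self.classifier = classifier

    def process(self, row: Row) -> int:
        return self.classifier.predict(self.imputer.transform(row))
```

Swapping in another implementation, for example a wrapper around a model served from a Java process, would only require an object with the same `transform` or `predict` method.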

1.2. Design

The Generator streams ("produces") data with missing values to the Collector via POST messages. The Collector streams the measurements to Kafka topics (kafka-network) for immediate processing and to a time-series database (TMDB, InfluxDB) for later use in new model training experiments in Airflow, MLflow, etc. The example kafka-network includes only one broker (more are not needed for development purposes) and Apache Zookeeper, which keeps track of the status of the Kafka cluster nodes, topics, partitions, etc. The imputer/classifier "consumes" a Kafka topic, imputes missing values with SimpleImputer and predicts the label with a RandomForest classifier. The predicted labels are sent back to Kafka. Telegraf (API) consumes the Kafka streams and forwards them to the Analytics dashboard (InfluxDB 2.x) for visualization. The example train/test data included in the repo are from UCI (the dataset is small).

  • data streams: Kafka or Faust; well-known, well-supported, data is replicated on brokers, well integrated with Python, Java, Scala, Spark, etc.
  • data processing: Python, Java, plus an easy way to incorporate the ML model lifecycle using MLflow, Airflow, etc.
  • data visualization: analytics dashboard provided by InfluxDB 2.x (the previous solution was built with Flask, Node JS, Socket.IO and FusionCharts)
  • data storage: time-series database, InfluxDB
  • it is possible to swap the Kafka client in the collector with Faust (Python stream processing) or to add KSQL to join/merge streams (Kafka topics) from sensors, e.g. solution with KSQL
  • [dev] possibility to test new models using saved {Python, Java, R} models with the corresponding MLflow API on a local machine; for some ideas read the following article
  • possibility to train/test/track new models in Apache Airflow or MLflow to periodically update and track the models, see the following article
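
As a rough illustration of the Generator step described above, the sketch below drops values at random and serializes the resulting measurement as JSON, ready to be sent in a POST body. The message fields (`sensor_id`, `values`), the function names and the missing-value rate are assumptions made for this example; the README does not describe the actual message schema.

```python
import json
import random
from typing import List, Optional


def mask_values(values: List[float], missing_rate: float,
                rng: random.Random) -> List[Optional[float]]:
    """Return a copy of `values` where each entry becomes None with probability `missing_rate`."""
    return [None if rng.random() < missing_rate else v for v in values]


def build_payload(sensor_id: str, values: List[Optional[float]]) -> bytes:
    """Serialize one measurement as UTF-8 JSON; None is written as JSON null."""
    # Hypothetical schema: the real Collector may expect different field names.
    message = {"sensor_id": sensor_id, "values": values}
    return json.dumps(message).encode("utf-8")


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same masking
    row = [0.21, 0.0, 1.5, 3.0]
    payload = build_payload("sensor-1", mask_values(row, 0.25, rng))
    print(payload.decode("utf-8"))
```

A downstream imputer can then treat every JSON `null` as a missing value to fill before classification.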

2. Starting/Running

Implementation is fully containerised. You will need Docker and Docker Compose to run it.

  • create a Docker network called kafka-network to enable communication between the Kafka
    docker network create storage
    docker network create api
  • create single-node Kafka cluster and run it in the background
    docker-compose -f docker-compose.kafka.yml up -d
  • start the (i) data generator, (ii) imputer/classifier, (iii) InfluxDB and (iv) Analytics Dashboard
    docker-compose -f docker-compose.yml up -d

2.1. Dashboard

Telegraf and InfluxDB are preconfigured with an initial password, organization, bucket and security token for mutual communication. For the InfluxDB configuration see docker-compose.yml. Telegraf is also preconfigured to consume Kafka topics, see telegraf/telegraf.conf. The InfluxDB documentation does not include the step necessary to load a dashboard as a resource when the database is initially started. As a workaround, either (i) import the dashboard in the GUI as shown on the screenshots below, or (ii) issue a command to run a script in the InfluxDB container that checks whether InfluxDB is running and imports the previously exported dashboard template:

    docker exec influxdb './script.sh'

The current version of the InfluxDB Docker image starts the influx service with the influxd run command after all initial commands have finished. Because the dashboard import needs the server to be already running, this creates issues with running scripts in the background at startup.

  • user/pass: admin/adminadmin
  • dashboard location: influxdb/spamucidataset.json
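
The "check that InfluxDB is running, then import" idea behind workaround (ii) can be sketched as a polling loop. This is not the repository's script.sh: the function names, the base URL `http://localhost:8086` and the retry settings are assumptions for illustration. The probe relies on the `/health` endpoint of the InfluxDB 2.x API, which reports `"status": "pass"` once the server is ready.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def influxdb_is_healthy(base_url: str = "http://localhost:8086") -> bool:
    """Return True if the InfluxDB 2.x /health endpoint reports status "pass"."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=2) as response:
            return json.load(response).get("status") == "pass"
    except (urllib.error.URLError, OSError, ValueError):
        # Server not reachable yet, or the response was not valid JSON.
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30,
                     delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_ready(influxdb_is_healthy)` returns `True` as soon as the server answers, at which point the dashboard template import can be triggered. Passing the probe as a parameter keeps the retry logic testable without a running database.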

3. Monitoring

3.1. NEW solution using InfluxDB 2.x dashboard

  • easy to use, more flexible than the previous solution (can include metrics monitoring Docker containers from Telegraf)
  • GUI accessible with user/pass admin/adminadmin
  • Telegraf communicates with InfluxDB using a token with a predefined combination of [organization, bucket] and consumes topics [spamdata, spampredictions], see /telegraf/telegraf.conf

3.2. PREV solution

4. Useful Docker commands

```
# build dockerfile - must be run from the folder with the Dockerfile definition
docker build -t [CONTAINER_TAG] .

# show list of images
docker images

# show list of containers
docker ps

# remove container (the -f parameter forces removal)
docker rm -f [CONTAINER_TAG]

# remove image
docker image remove [IMAGE_NAME]

# start/stop container
docker start [CONTAINER_TAG]
docker stop [CONTAINER_TAG]

# run container with port forwarding (host port first, then container port)
docker run -p localport:containerport [IMAGE_NAME]

# run linux bash in container
docker exec -it [CONTAINER_TAG] /bin/bash
```

5. Note

6. Appendix

Owner

  • Name: Pavol Mulinka
  • Login: 5uperpalo
  • Kind: user
  • Location: Barcelona, ES
  • Company: CTTC

Data Scientist / Machine learning Enthusiast & former network engineer

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 33
  • Total Committers: 1
  • Avg Commits per committer: 33.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Pavol Mulinka m****l@g****m 33

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

analytics/Dockerfile docker
  • nikolaik/python-nodejs python3.6-nodejs15 build
classifier/Dockerfile docker
  • python 3.8-slim build
collector/Dockerfile docker
  • python 3.8-slim build
docker-compose.kafka.yml docker
  • confluentinc/cp-kafka latest
  • confluentinc/cp-zookeeper latest
generator/Dockerfile docker
  • python 3.8-slim build
influxdb/Dockerfile docker
  • influxdb 2.0.6 build
telegraf/Dockerfile docker
  • telegraf 1.18.2 build
analytics/app/base/static/assets/js/package-lock.json npm
  • 481 dependencies
analytics/app/base/static/assets/js/package.json npm
  • express ^4.17.1
  • fusioncharts ^3.16.0
  • kafka-node ^5.0.0
  • pubnub ^4.20.0
  • socket.io ^2.4.1
  • webpack ^3.12.0
analytics/requirements.txt pypi
  • flask ==1.1.2
  • flask_login ==0.5.0
  • flask_wtf ==0.14.3
  • gunicorn ==20.0.4
  • python-decouple ==3.4
classifier/requirements.txt pypi
  • influxdb *
  • joblib *
  • kafka-python *
  • numpy *
  • sklearn *
collector/requirements.txt pypi
  • Flask ==1.1.2
  • Flask-HTTPAuth ==4.1.0
  • Werkzeug ==1.0.1
  • bottle *
  • celery ==4.4.4
  • flask-restful ==0.3.8
  • gunicorn ==20.0.4
  • influxdb *
  • jsonschema *
  • kafka-python *
  • pandas *
  • redis ==3.5.3
  • tornado >=4.2.0,<6.0.0
generator/requirements.txt pypi
  • influxdb *
  • kafka-python *