https://github.com/5uperpalo/fireman-project_frontend

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.5%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: 5uperpalo
Language: CSS
Default Branch: main
Size: 6.02 MB

Statistics

Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 0

Created over 5 years ago · Last pushed about 5 years ago

Metadata Files

Readme

FIREMAN-project Frontend repository

Machine learning prediction frontend for the FIREMAN project, a companion to the main FIREMAN-project repository. This repository is a work in progress and part of the FIREMAN project activities. The skeleton of the repository is based on Kafka Fraud Detector.

1. Considerations and design

1.1. Considerations

emulate real-world IoT scenario
pluggable approach, i.e. easy to add or swap the imputer/classifier for a Python, Java, or other implementation
scalability
maintainability
robustness
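
One way to picture the pluggable approach is a small registry that maps a strategy name to an imputer function, so a different implementation can be swapped in without touching the calling code. This is only a sketch: the registry, the `register` decorator, and both strategies are hypothetical and are not taken from the repository.

```python
from typing import Callable, Dict, List, Optional

Row = List[Optional[float]]
Imputer = Callable[[Row], List[float]]

# Hypothetical registry: strategy name -> imputer function.
IMPUTERS: Dict[str, Imputer] = {}


def register(name: str) -> Callable[[Imputer], Imputer]:
    """Decorator that adds an imputer to the registry under `name`."""
    def decorator(fn: Imputer) -> Imputer:
        IMPUTERS[name] = fn
        return fn
    return decorator


@register("zero")
def impute_zero(row: Row) -> List[float]:
    # Replace every missing value with 0.0.
    return [0.0 if v is None else v for v in row]


@register("row_mean")
def impute_row_mean(row: Row) -> List[float]:
    # Replace missing values with the mean of the values present in the row.
    present = [v for v in row if v is not None]
    fill = sum(present) / len(present) if present else 0.0
    return [fill if v is None else v for v in row]


def impute(row: Row, strategy: str = "zero") -> List[float]:
    """Look up the configured strategy and apply it to one row."""
    return IMPUTERS[strategy](row)
```

Swapping the imputer then comes down to changing the `strategy` string, for example from a configuration file or an environment variable.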

1.2. Design

Generator streams ("produces") data with missing values to Collector via POST messages. Collector streams the measurements to Kafka topics (kafka-network) for immediate processing and to a time-series database (InfluxDB) for later use in new model training experiments in Airflow, MLflow, etc. The example kafka-network includes only one broker (more are not needed for development) and Apache ZooKeeper, which keeps track of the status of the Kafka cluster nodes, topics, partitions, etc. Imputer/classifier "consumes" a Kafka topic, imputes missing values with SimpleImputer, and predicts a label with a RandomForest classifier. The predicted labels are sent back to Kafka. Telegraf (API) consumes the Kafka streams and forwards them to the analytics dashboard (InfluxDB 2.x) for visualization. The example train/test data included in the repo are from UCI; the dataset is small.

* data streams - Kafka or Faust: well known, well supported, data is replicated on brokers, well integrated with Python, Java, Scala, Spark, etc.
* data processing - Python or Java, with an easy way to incorporate the ML model lifecycle using MLflow, Airflow, etc.
* data visualization - analytics dashboard provided by InfluxDB 2.x (the previous solution was built with Flask, Node.js, Socket.IO and FusionCharts)
* data storage - time-series database, InfluxDB
* it is possible to swap the Kafka client in the collector with Faust (Python stream processing) or to add KSQL to join/merge streams (Kafka topics) from sensors, e.g. solution with KSQL
* [dev] possibility to test new models using saved {Python, Java, R} models with the corresponding MLflow API on a local machine; for some ideas read the following article
* possibility to train/test/track new models in Apache Airflow or MLflow to periodically update and track the models; see the following article
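
The data flow described above can be sketched without any running services. The example below is a simplified stand-in, not the repository's code: the `{"features": [...]}` message layout and all function names are assumptions, Kafka is replaced by plain byte payloads, and the imputation is a pure-Python column mean, which mirrors the default `mean` strategy of scikit-learn's SimpleImputer. The classifier step is left out.

```python
import json
import random
from typing import List, Optional, Sequence

Record = List[Optional[float]]


def generate_record(features: Sequence[float], missing_rate: float,
                    rng: random.Random) -> Record:
    """Emulate a sensor reading: drop each value with probability `missing_rate`."""
    return [None if rng.random() < missing_rate else v for v in features]


def encode(record: Record) -> bytes:
    # Assumed message layout; None is serialized as JSON null.
    return json.dumps({"features": record}).encode("utf-8")


def decode(payload: bytes) -> Record:
    return json.loads(payload.decode("utf-8"))["features"]


def column_means(training_rows: Sequence[Record]) -> List[float]:
    """Per-column mean of the values that are present in the training data."""
    means = []
    for column in zip(*training_rows):
        present = [v for v in column if v is not None]
        means.append(sum(present) / len(present) if present else 0.0)
    return means


def impute(record: Record, means: Sequence[float]) -> List[float]:
    """Fill each missing value with the mean learned for its column."""
    return [m if v is None else v for v, m in zip(record, means)]


if __name__ == "__main__":
    rng = random.Random(0)
    means = column_means([[1.0, 4.0], [3.0, None]])
    message = encode(generate_record([2.5, 7.0], missing_rate=0.5, rng=rng))
    print(impute(decode(message), means))
```

In the real pipeline the encoded payload would be published to a Kafka topic by the collector and read back by the imputer/classifier; only the transformation steps are shown here.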

2. Starting/Running

Implementation is fully containerised. You will need Docker and Docker Compose to run it.

create a Docker network called kafka-network to enable communication between the Kafka
```bash
docker network create storage
docker network create api
```
create single-node Kafka cluster and run in the background
```bash
docker-compose -f docker-compose.kafka.yml up -d
```
start the (i) data generator, (ii) imputer/classifier, (iii) InfluxDB and (iv) Analytics Dashboard
```bash
docker-compose -f docker-compose.yml up -d
```

2.1. Dashboard

Telegraf and InfluxDB are preconfigured with an initial password, organization, bucket and security token for mutual communication. For the InfluxDB configuration see docker-compose.yml. Telegraf is also preconfigured to consume Kafka topics, see telegraf/telegraf.conf. The InfluxDB documentation does not include the step necessary to load a dashboard as a resource when the database is initially started. As a workaround, either (i) import the dashboard in the GUI as shown on the screenshots below, or (ii) issue a command to run a script in the InfluxDB container that checks whether InfluxDB is running and imports the previously exported dashboard template.
```bash
docker exec influxdb './script.sh'
```
The current version of the InfluxDB Docker image starts the influx service with the influxd run command only after all initial commands have finished. Because the dashboard import needs the server to be already running, this causes issues with running scripts in the background at startup.
* user/pass: admin/adminadmin
* dashboard location: influxdb/spamucidataset.json
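
The "check that InfluxDB is running, then import" idea behind the workaround can be sketched as a small polling helper. This is not the repository's script.sh; it is a minimal illustration that assumes InfluxDB 2.x is reachable from the host at `http://localhost:8086` (the default port) and uses its `/health` endpoint. The function names and retry values are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def influxdb_is_healthy(url: str = "http://localhost:8086/health",
                        timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or non-2xx status: not ready yet.
        return False


def wait_until(check: Callable[[], bool], retries: int = 30,
               delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` up to `retries` times, pausing `delay` seconds between tries."""
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    if wait_until(influxdb_is_healthy):
        print("InfluxDB is up; the dashboard import can be started.")
    else:
        print("InfluxDB did not become healthy in time.")
```

Once the check succeeds, the import can be triggered with the `docker exec influxdb './script.sh'` command shown above.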

3. Monitoring

3.1. NEW solution using InfluxDB 2.x dashboard

easy to use, more flexible than the previous solution (can include Docker container metrics collected by Telegraf)
GUI accessible with user/pass admin/adminadmin
Telegraf communicates with InfluxDB using a token with a predefined combination of [organization, bucket] and consumes the topics [spamdata, spampredictions], see /telegraf/telegraf.conf
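
Telegraf ultimately writes points to InfluxDB in line protocol (`measurement,tags fields timestamp`). To give a feel for what a prediction could look like once it reaches the bucket, here is a small formatter. It is an illustrative sketch only: the tag and field names are hypothetical, the actual message format depends on the settings in telegraf/telegraf.conf, and escaping of spaces and commas in names and tag values is deliberately left out.

```python
from typing import Dict, Union

FieldValue = Union[bool, int, float, str]


def format_field(value: FieldValue) -> str:
    """Render one field value using line protocol type markers."""
    if isinstance(value, bool):  # must be checked before int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"       # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'        # strings are double-quoted


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, FieldValue], timestamp_ns: int) -> str:
    """Build a single line protocol record (no name or tag escaping)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    print(to_line_protocol("spampredictions", {"model": "rf"},
                           {"label": 1, "score": 0.5}, 1))
```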

3.2. PREV solution

Uses an adjusted Flask dashboard.

4. Useful Docker commands

```
# build an image - must be run from the folder containing the Dockerfile
docker build -t [CONTAINER_TAG] .

# show list of images
docker images

# show list of running containers (add -a to include stopped ones)
docker ps

# remove container (add -f parameter for forced remove)
docker rm -f [CONTAINER_TAG]

# remove image
docker image remove [IMAGE_NAME]

# start/stop container
docker start [CONTAINER_TAG]
docker stop [CONTAINER_TAG]

# run container with port forwarding (host port first, then container port)
docker run -p [LOCAL_PORT]:[CONTAINER_PORT] [IMAGE_NAME]

# run linux bash in container
docker exec -it [CONTAINER_TAG] /bin/bash
```

5. Note

The Jupyter notebook describes how we create the simple imputer, classifier and dataset
- the notebook uses functions from the FIREMAN imputation repo

6. Appendix

TODO
NOTES, useful notes

Owner

Name: Pavol Mulinka
Login: 5uperpalo
Kind: user
Location: Barcelona, ES
Company: CTTC

Website: https://5uperpalo.github.io/online-cv/
Repositories: 18
Profile: https://github.com/5uperpalo

Data Scientist / Machine learning Enthusiast & former network engineer

Committers

Last synced: 11 months ago

All Time

Total Commits: 33
Total Committers: 1
Avg Commits per committer: 33.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Pavol Mulinka	m**l@g**m	33

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Dependencies

analytics/Dockerfile docker

nikolaik/python-nodejs python3.6-nodejs15 build

classifier/Dockerfile docker

python 3.8-slim build

collector/Dockerfile docker

python 3.8-slim build

docker-compose.kafka.yml docker

confluentinc/cp-kafka latest
confluentinc/cp-zookeeper latest

generator/Dockerfile docker

python 3.8-slim build

influxdb/Dockerfile docker

influxdb 2.0.6 build

telegraf/Dockerfile docker

telegraf 1.18.2 build

analytics/app/base/static/assets/js/package-lock.json npm

481 dependencies

analytics/app/base/static/assets/js/package.json npm

express ^4.17.1
fusioncharts ^3.16.0
kafka-node ^5.0.0
pubnub ^4.20.0
socket.io ^2.4.1
webpack ^3.12.0

analytics/requirements.txt pypi

flask ==1.1.2
flask_login ==0.5.0
flask_wtf ==0.14.3
gunicorn ==20.0.4
python-decouple ==3.4

classifier/requirements.txt pypi

influxdb *
joblib *
kafka-python *
numpy *
sklearn *

collector/requirements.txt pypi

Flask ==1.1.2
Flask-HTTPAuth ==4.1.0
Werkzeug ==1.0.1
bottle *
celery ==4.4.4
flask-restful ==0.3.8
gunicorn ==20.0.4
influxdb *
jsonschema *
kafka-python *
pandas *
redis ==3.5.3
tornado >=4.2.0,<6.0.0

generator/requirements.txt pypi

influxdb *
kafka-python *
