https://github.com/5uperpalo/fireman-project_frontend
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 12.5%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: 5uperpalo
- Language: CSS
- Default Branch: main
- Size: 6.02 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.MD
FIREMAN-project Frontend repository
Machine learning prediction frontend for the FIREMAN project, related to the main FIREMAN-project repository. The repository is a work-in-progress project that is part of the FIREMAN project activities. The skeleton of the repository is based on Kafka Fraud Detector.
1. Considerations and design

1.1. Considerations
- emulate a real-world IoT scenario
- pluggable approach, i.e. easily add/swap the imputer/classifier for a Python, Java, or other implementation
- scalability
- maintainability
- robustness
1.2. Design
The Generator streams ("produces") data with missing values to the Collector via POST messages. The Collector streams the measurements to Kafka topics (kafka-network) for immediate processing and to a time-series database (TMDB, InfluxDB) for later use in new model-training experiments in Airflow, MLflow, etc. The example kafka-network includes only one broker (more are not needed for development purposes) and Apache ZooKeeper, which keeps track of the status of the Kafka cluster node[s], topics, partitions, etc. The imputer/classifier "consumes" a Kafka topic, imputes missing values with SimpleImputer and predicts the label with a RandomForest classifier. The predicted labels are sent back to Kafka. Telegraf (API) reads and consumes the Kafka streams and forwards them to the analytics dashboard (InfluxDB 2.x) for visualization. The example train/test data included in the repo come from UCI and are small in size.

* data streams - Kafka or Faust: well-known, well-supported, data is replicated on brokers, well integrated with Python, Java, Scala, Spark, etc.
* data processing - Python, Java + an easy way to incorporate the ML model lifecycle using MLflow, Airflow, etc.
* data visualization - analytics dashboard provided by InfluxDB 2.x (the previous solution was built with Flask, Node.js, Socket.IO and FusionCharts)
* data storage - time-series database, InfluxDB
* the Kafka client in the Collector can be swapped for Faust (Python stream processing), or KSQL can be added to join/merge streams (Kafka topics) from sensors, e.g. a solution with KSQL
* [dev] new models can be tested on a local machine using saved {Python, Java, R} models with the corresponding MLflow API; for some ideas, read the following article
* new models can be trained/tested/tracked in Apache Airflow or MLflow to periodically update and track them; see the following article
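The impute-then-classify step described above can be sketched in plain Python. This is a minimal, hypothetical sketch rather than the repository's code: it assumes each Kafka message is a JSON object with an `id` and a `features` list in which missing values are `null`, it reimplements mean imputation (the default strategy of scikit-learn's SimpleImputer) with the standard library only, and it accepts the classifier as any callable, so a trained RandomForest model could be passed in. The function names and the message schema are illustrative assumptions.

```python
import json
from statistics import mean
from typing import Callable, List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing measurement


def fit_means(training_rows: Sequence[Row]) -> List[float]:
    """Learn one mean per column, ignoring missing (None) entries."""
    n_cols = len(training_rows[0])
    means = []
    for col in range(n_cols):
        observed = [row[col] for row in training_rows if row[col] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        means.append(mean(observed) if observed else 0.0)
    return means


def impute(row: Row, means: Sequence[float]) -> List[float]:
    """Replace each missing value with the mean learned for its column."""
    return [means[i] if value is None else value for i, value in enumerate(row)]


def handle_message(raw: bytes, means: Sequence[float],
                   classify: Callable[[List[float]], int]) -> bytes:
    """Decode one message, impute it, predict a label and encode the result
    that would be sent back to Kafka (hypothetical message schema)."""
    record = json.loads(raw)
    features = impute(record["features"], means)
    label = classify(features)
    return json.dumps({"id": record["id"], "prediction": label}).encode("utf-8")


def toy_classifier(features: List[float]) -> int:
    """Stand-in for the RandomForest model: threshold on the first feature."""
    return int(features[0] > 1.5)


if __name__ == "__main__":
    train = [[1.0, 10.0], [3.0, None], [None, 30.0]]
    means = fit_means(train)  # column means: [2.0, 20.0]
    msg = json.dumps({"id": 7, "features": [None, 5.0]}).encode("utf-8")
    print(handle_message(msg, means, toy_classifier))
    # prints b'{"id": 7, "prediction": 1}'
```

Passing the imputer statistics and the classifier in as arguments keeps the consumer independent of how the model was built, which is one way to get the pluggable property listed in 1.1.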
2. Starting/Running
Implementation is fully containerised. You will need Docker and Docker Compose to run it.
- create the Docker networks that enable communication between the Kafka cluster and the other containers:
```bash
docker network create storage
docker network create api
```
- create a single-node Kafka cluster and run it in the background:
```bash
docker-compose -f docker-compose.kafka.yml up -d
```
- start the (i) data generator, (ii) imputer/classifier, (iii) InfluxDB and (iv) Analytics Dashboard:
```bash
docker-compose -f docker-compose.yml up -d
```

2.1. Dashboard
Telegraf and InfluxDB are preconfigured with an initial password, organization, bucket and security token for mutual communication; for the InfluxDB configuration, see docker-compose.yml. Telegraf is also preconfigured to consume the Kafka topics; see telegraf/telegraf.conf. The InfluxDB documentation does not describe how to load a dashboard as a resource when the database is initially started.
As a workaround, either (i) import the dashboard manually in the GUI, or (ii) run a script in the InfluxDB container that checks whether InfluxDB is running and then imports the previously exported dashboard template:
```bash
docker exec influxdb './script.sh'
```
The current InfluxDB Docker image starts the influx service with the influxd run command only after all initial commands have finished. Because the dashboard import needs the server to be running already, running the import script in the background at startup causes issues.
* user/pass: admin/adminadmin
* dashboard location: influxdb/spamucidataset.json
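Workaround (ii) depends on knowing when the server is actually up. A readiness check of that kind could look like the Python sketch below. It is an illustration under stated assumptions, not the content of script.sh: it assumes InfluxDB is reachable at `localhost:8086` (the InfluxDB 2.x default port) and polls the `/health` endpoint, which reports `"status": "pass"` once the server is ready. The function names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def influxdb_is_healthy(url: str = "http://localhost:8086/health") -> bool:
    """Return True if the InfluxDB 2.x /health endpoint reports status 'pass'.
    The URL is an assumption (default host and port)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return json.load(resp).get("status") == "pass"
    except (urllib.error.URLError, OSError, ValueError):
        # Server not up yet, connection refused, HTTP error or non-JSON answer.
        return False


def wait_until(check: Callable[[], bool], timeout_s: float = 60.0,
               interval_s: float = 1.0) -> bool:
    """Poll `check` until it returns True or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False
```

For example, calling `wait_until(influxdb_is_healthy)` before the import step avoids importing the dashboard while influxd is still starting; injecting the check as a callable also makes the polling loop easy to test without a running server.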

3. Monitoring
3.1. NEW solution using InfluxDB 2.x dashboard
- easy to use and more flexible than the previous solution (it can include Docker container monitoring metrics collected by Telegraf)
- GUI accessible with user/pass admin/adminadmin
- Telegraf communicates with InfluxDB using a token with a predefined [organization, bucket] combination and consumes the topics [spamdata, spampredictions]; see /telegraf/telegraf.conf
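Telegraf writes what it consumes to InfluxDB as points in line protocol: a measurement, optional tags, one or more fields and an optional timestamp. The Python sketch below builds one such line to show the shape of the data the dashboard queries. It is hypothetical: the measurement, tag and field names (for example `prediction`) are assumptions, and the real mapping from Kafka messages to points is set by the parser settings in telegraf/telegraf.conf.

```python
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value: FieldValue) -> str:
    """Format a field value using line-protocol type conventions."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # String fields are double-quoted; backslashes and quotes are escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, FieldValue],
                     timestamp_ns: Optional[int] = None) -> str:
    """Build one point: measurement[,tag=value...] field=value[,...] [timestamp]."""
    head = measurement.replace(",", r"\,").replace(" ", r"\ ")
    for key in sorted(tags):  # tags sorted by key, as InfluxDB recommends
        head += f",{_escape_key(key)}={_escape_key(tags[key])}"
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

For instance, `to_line_protocol("spampredictions", {"topic": "spampredictions"}, {"prediction": 1}, 1620000000000000000)` yields `spampredictions,topic=spampredictions prediction=1i 1620000000000000000`.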

3.2. PREV solution
- uses an adjusted Flask dashboard.

4. Useful Docker commands
```bash
# build an image from a Dockerfile - must be run from the folder containing the Dockerfile
docker build -t [IMAGE_TAG] .
# show the list of images
docker images
# show the list of running containers
docker ps
# remove a container (add the -f parameter to force removal of a running container)
docker rm [CONTAINER_NAME]
# remove an image
docker image remove [IMAGE_NAME]
# start/stop a container
docker start [CONTAINER_NAME]
docker stop [CONTAINER_NAME]
# run a container with port forwarding (host port first, then container port)
docker run -p [HOST_PORT]:[CONTAINER_PORT] [IMAGE_NAME]
# run a Linux bash shell in a running container
docker exec -it [CONTAINER_NAME] /bin/bash
```
5. Note
- the Jupyter notebook describes how the simple imputer, classifier and dataset were created
- the notebook uses functions from the FIREMAN imputation repository
6. Appendix
Owner
- Name: Pavol Mulinka
- Login: 5uperpalo
- Kind: user
- Location: Barcelona, ES
- Company: CTTC
- Website: https://5uperpalo.github.io/online-cv/
- Repositories: 18
- Profile: https://github.com/5uperpalo
Data Scientist / Machine learning Enthusiast & former network engineer
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Pavol Mulinka | m****l@g****m | 33 |
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- nikolaik/python-nodejs python3.6-nodejs15 build
- python 3.8-slim build
- python 3.8-slim build
- confluentinc/cp-kafka latest
- confluentinc/cp-zookeeper latest
- python 3.8-slim build
- influxdb 2.0.6 build
- telegraf 1.18.2 build
- 481 dependencies
- express ^4.17.1
- fusioncharts ^3.16.0
- kafka-node ^5.0.0
- pubnub ^4.20.0
- socket.io ^2.4.1
- webpack ^3.12.0
- flask ==1.1.2
- flask_login ==0.5.0
- flask_wtf ==0.14.3
- gunicorn ==20.0.4
- python-decouple ==3.4
- influxdb *
- joblib *
- kafka-python *
- numpy *
- sklearn *
- Flask ==1.1.2
- Flask-HTTPAuth ==4.1.0
- Werkzeug ==1.0.1
- bottle *
- celery ==4.4.4
- flask-restful ==0.3.8
- gunicorn ==20.0.4
- influxdb *
- jsonschema *
- kafka-python *
- pandas *
- redis ==3.5.3
- tornado >=4.2.0,<6.0.0
- influxdb *
- kafka-python *