Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to sciencedirect.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.2%) to scientific vocabulary
Keywords
Repository
NAOMI: Network AI Workflow Democratization
Basic Info
- Host: GitHub
- Owner: copandrej
- License: bsd-3-clause
- Language: Python
- Default Branch: main
- Homepage: https://doi.org/10.1016/j.jnca.2025.104180
- Size: 15.6 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
NAOMI: Network AI Workflow Democratization
NAOMI is a production MLOps solution designed for deployment on a heterogeneous Kubernetes cluster.
The system uses the Ray framework for data processing, model training, and inference, distributing the computational load across nodes. Data preparation is done using Pandas or Ray Data, with MinIO as an object store. Model training supports Keras, TensorFlow, and PyTorch, and is managed by Ray.
MLflow handles model storage and management, while trained models are deployed as inference API endpoints using Ray Serve or as Kubernetes deployments. Flyte orchestrates AI/ML workflows for retraining and redeployment of models, with retraining triggers based on monitored metrics. System monitoring is provided by Prometheus and Grafana.
Developers register workflows with Flyte and monitor the system, while users can trigger workflows, monitor progress, and access models in MLflow. The system is designed to run autonomously, delivering efficient production AI/ML workflows. It is modular and can be adjusted to different use cases and requirements.
Deployment
Installation video
Minimal requirements
- 12 CPU cores
- 32GB RAM
- 100GB Available disk space
1. Kubernetes cluster
Skip this step if you already have a Kubernetes cluster with the required addons.
- Install microk8s with the addons dns, storage, and ingress, or run the install script ./helper_scripts/system-install.sh
- (Optional) Run the install script ./helper_scripts/rasp-install.sh on any Raspberry Pi node you want to join to the cluster.
- (Optional) Ansible playbook for installing microk8s on multiple nodes: ./helper_scripts/microk8s_ansible/ (requires SSH access and Ansible)

NAOMI can also be deployed on k3s. In this case, run the install script ./helper_scripts/NAOMI-on-k3s.sh, which adjusts the k3s configuration to be compatible with NAOMI.
2. AI/ML workflow system
- Adjust configs in values_example.yaml, then deploy with helm:

```bash
helm repo add naomi_charts https://copandrej.github.io/NAOMI/
helm install naomi naomi_charts/NAOMI --version 0.3.0 --values values_example.yaml -n your_namespace
```

> [!IMPORTANT]
> Helm version should be between 3.14 and 3.17.
3. Environment
This step is only required for running example AI/ML workflows.
- Run the config script ./helper_scripts/env-prepare.sh on a VM to install the requirements and connect flytectl to the cluster for running AI/ML workflows.
Configurations
All configurations are set as helm values. Adjust configs in values_example.yaml and deploy with helm.
Documentation and all configurations can be found in helm_charts/SEMR/values.yaml.
The project is modular with 5 main components:
- AI/ML model store with MLflow
- Distributed computing and AI/ML training with Ray
- Workflow orchestration with Flyte
- Data storage with MinIO
- System monitoring with Prometheus & Grafana
All components can be disabled, enabled, and configured in the helm values.
Usage
After the system is deployed, users can access the components through the following dashboards. The system should be deployed in a closed network, as access to the dashboards and APIs is not secured.
Dashboards
- Ray: http://<node_ip>/ray/
- Flyte: http://<node_ip>:31082/
- MinIO: http://<node_ip>:30090/
- Grafana: http://<node_ip>:30000/
- MLflow: http://<node_ip>:31007/#/models
Components can be used separately or together to create AI/ML workflows.
To utilize the MLflow model store, users can use the MLflow API on http://<node_ip>:31007 (refer to the MLflow documentation: https://www.mlflow.org/docs/latest).
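For example, a minimal sketch of fetching a registered model with the MLflow Python client, assuming the endpoint above; the model name is illustrative, borrowed from the deployment example later in this README:

```python
import mlflow

# Endpoint assumed from the dashboards list above.
mlflow.set_tracking_uri("http://<node_ip>:31007")

# Load the latest registered version of a model by name;
# "CNN_spectrum" is an illustrative registered model name.
model = mlflow.pyfunc.load_model("models:/CNN_spectrum/latest")
```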
The MinIO object store is accessible with the default credentials minio:miniostorage.
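A minimal sketch of reading from the object store with s3fs (one of the workflow dependencies), assuming the S3 API is served on the same port as the MinIO dashboard listed above; adjust the endpoint to your deployment:

```python
import s3fs

# Connect to MinIO through its S3-compatible API using the
# default credentials above; the endpoint port is an assumption.
fs = s3fs.S3FileSystem(
    key="minio",
    secret="miniostorage",
    client_kwargs={"endpoint_url": "http://<node_ip>:30090"},
)
print(fs.ls("/"))  # list buckets
```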
The default Grafana dashboard credentials are admin:prom-operator.
If required, credentials can be changed in the helm values; other components and the AI/ML workflow examples then have to be updated with the new credentials.
The Ray cluster is a distributed computing framework and can be used through the Ray API (https://docs.ray.io/en/master/index.html); refer to the AI/ML workflow examples for how to send tasks to the Ray cluster.
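As a rough sketch of sending tasks to the cluster through the Ray client (the port is an assumption; 10001 is Ray's default client server port, adjust to your deployment):

```python
import ray

# Connect to the cluster through the Ray client.
ray.init(address="ray://<node_ip>:10001")

@ray.remote
def square(x: int) -> int:
    return x * x

# Fan four tasks out across the cluster and collect the results.
print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]
```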
Flyte orchestrates AI/ML workflows. To create and run workflows, refer to the AI/ML workflow examples.
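A minimal sketch of what a Flyte workflow definition looks like with flytekit; the task body is a stand-in, whereas the repository's examples launch Ray training jobs and log models to MLflow at this step:

```python
from flytekit import task, workflow

@task
def train(epochs: int) -> str:
    # Stand-in for a real training step.
    return f"trained for {epochs} epochs"

@workflow
def train_wf(epochs: int = 3) -> str:
    return train(epochs=epochs)
```

Such a workflow would be registered and run with pyflyte, as in the examples below.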
AI/ML workflow examples
QoE prediction
Workflow example in workflow_examples/qoe_prediction/.
Quality of Experience (QoE) prediction is a workflow example adapted from the O-RAN SC AI/ML Framework use case: https://docs.o-ran-sc.org/en/latest/projects.html#ai-ml-framework.
- Populate MinIO with the file insert.py in workflow_examples/qoe_prediction/populate_minio/ (change the IP endpoint of MinIO in the script).
- Run the workflow with the Flyte CLI; --bt_s is the batch size, --n is the dataset size (1, 10, 100):

```bash
pyflyte run --remote --env SYSTEM_IP=$(hostname -I | awk '{print $1}') --image copandrej/flyte_workflow:8 wf.py qoe_train --bt_s 10 --n 1
```

- Monitor the progress on the dashboards.
MNIST
A workflow example for distributed data processing, distributed model training, and retraining triggers based on metrics collection (it requires at least two Ray workers).
- Populate MinIO with the file populate.py in workflow_examples/mnist/populate_minio/ (change the IP endpoint of MinIO in the script).
- Run the workflow with the Flyte CLI from the workflow_examples/mnist/ directory:

```bash
pyflyte run --remote --env SYSTEM_IP=$(hostname -I | awk '{print $1}') --image copandrej/flyte_workflow:8 wf.py mnist_train
```

- Monitor the progress on the dashboards.
To schedule retraining based on cluster metrics...TO-DO
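The details are left as a TO-DO above; purely as an illustration of the idea, a metric-based trigger could poll Prometheus with prometheus-api-client (already among the dependencies) and kick off the Flyte retraining workflow when a threshold is crossed. The endpoint, query, and threshold below are all hypothetical:

```python
from prometheus_api_client import PrometheusConnect

# Hypothetical Prometheus endpoint and metric name.
prom = PrometheusConnect(url="http://<prometheus_endpoint>", disable_ssl=True)
result = prom.custom_query(query="model_accuracy")

# Retrain when the monitored metric drops below an illustrative threshold.
accuracy = float(result[0]["value"][1]) if result else 1.0
if accuracy < 0.9:
    print("Accuracy below threshold; trigger the retraining workflow via pyflyte")
```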
Model deployment with SEMR_inference helm charts
This is a separate use case for deploying ML models as a service using the SEMR_inference helm charts, for models stored in MLflow. If using the example AI/ML workflows, models are served as API endpoints using Ray Serve.
- Trained models are stored using the MLflow API:

```python
import os

import mlflow

# SEMR's model store endpoint
os.environ['MLFLOW_TRACKING_URI'] = 'http://<node_ip>:31007'

# Log the trained ML model to SEMR
mlflow.pytorch.log_model(model, "CNN_spectrum", registered_model_name="CNN_spectrum")
```
The model inference service has to be containerized:
- The Docker image template has to be modified with code for model inference: docker_build/model_deployment/api-endpoint.py (see the sketch after this list).
- Requirements for model inference have to be appended and imported: docker_build/model_deployment/requirements.txt.
- The Docker image has to be built and pushed to a Docker registry.
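As a rough sketch of what such an api-endpoint.py could contain (not the repository's actual template; the tracking URI, model name, route, and input format are illustrative assumptions):

```python
import os

import mlflow
import numpy as np
from fastapi import FastAPI

app = FastAPI()

# Pull the model from MLflow at startup; tracking URI and
# model name are illustrative assumptions.
mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI", "http://<node_ip>:31007"))
model = mlflow.pyfunc.load_model("models:/CNN_spectrum/latest")

@app.post("/predict")
def predict(features: list[list[float]]) -> dict:
    # mlflow.pyfunc models accept numpy arrays or pandas DataFrames.
    return {"prediction": model.predict(np.array(features)).tolist()}
```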
ML Models as a Service can be instantiated and configured using helm values overrides, specifying the model version, docker image, service port, number of replicas, and other configurations required by the service (helm_charts/SEMR_inference/values-overrides-*.yaml). When a new model version is uploaded to MLflow, the inference service can be re-instantiated using new configurations (values overrides). Docker images don't require any additional modification when models are retrained.
Repository structure
workflow_examples/
Examples of full MLOps workflows for QoE prediction and MNIST classification.
helper_scripts/
Install and configuration scripts for Kubernetes, distributed clusters, and setting up the environment.
docker_build/
Dockerfiles and scripts for building docker images for model deployment (docker_build/model_deployment/) and for Ray cluster (docker_build/ray_image/).
If the system is deployed on a multi-architecture cluster, docker images have to be built for each architecture.
helm_charts/
Helm charts SEMR and SEMR_inference.
SEMR is the main system helm chart; SEMR_inference is for model deployment.
Helm charts repository is hosted on GitHub pages: https://copandrej.github.io/NAOMI/
values_example.yaml
Example of helm values file for configuring the system.
System architecture
[Figure: system architecture diagram]
User workflow diagrams
[Figures: user workflow diagrams]
License
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Citation
Please cite our paper as follows:
@article{COP2025104180,
title = {An overview and solution for democratizing AI workflows at the network edge},
journal = {Journal of Network and Computer Applications},
volume = {239},
pages = {104180},
year = {2025},
issn = {1084-8045},
doi = {https://doi.org/10.1016/j.jnca.2025.104180},
url = {https://www.sciencedirect.com/science/article/pii/S1084804525000773},
author = {Andrej Čop and Blaž Bertalanič and Carolina Fortuna}
}
Acknowledgment
The authors would like to acknowledge funding from the European Union's Horizon Europe Framework Programme NANCY project under Grant Agreement No. 101096456.
Owner
- Name: Andrej Čop
- Login: copandrej
- Kind: user
- Location: Slovenia
- Repositories: 3
- Profile: https://github.com/copandrej
CS Student @ UNI-LJ
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Čop"
    given-names: "Andrej"
  - family-names: "Bertalanič"
    given-names: "Blaž"
  - family-names: "Fortuna"
    given-names: "Carolina"
title: "An Overview and Solution for Democratizing AI Workflows at the Network Edge"
url: "https://www.sciencedirect.com/science/article/pii/S1084804525000773"
preferred-citation:
  type: article
  authors:
    - family-names: "Čop"
      given-names: "Andrej"
    - family-names: "Bertalanič"
      given-names: "Blaž"
    - family-names: "Fortuna"
      given-names: "Carolina"
  title: "An overview and solution for democratizing AI workflows at the network edge"
  journal: "Journal of Network and Computer Applications"
  volume: "239"
  pages: "104180"
  year: 2025
  issn: "1084-8045"
  doi: "10.1016/j.jnca.2025.104180"
  url: "https://www.sciencedirect.com/science/article/pii/S1084804525000773"
GitHub Events
Total
- Issues event: 1
- Watch event: 1
- Delete event: 1
- Push event: 11
- Pull request event: 2
- Fork event: 2
- Create event: 1
Last Year
- Issues event: 1
- Watch event: 1
- Delete event: 1
- Push event: 11
- Pull request event: 2
- Fork event: 2
- Create event: 1
Dependencies
- argparse *
- datasets *
- evaluate *
- fastapi *
- filelock *
- flytekit *
- keras *
- kubernetes *
- numpy *
- pandas *
- pillow *
- python-multipart *
- pyyaml *
- ray ==2.6.3
- requests *
- scikit-learn *
- starlette *
- tensorflow *
- torch *
- torchvision *
- tqdm *
- zenml ==0.50.0
- rayproject/ray 2.10.0-py310 build
- python 3.10-slim-buster build
- evaluate *
- fastapi ==0.104.0
- flytekit >=1.5.0
- keras ==2.15.0
- kubernetes *
- mlflow ==2.10.2
- pandas <=2.1.4
- pillow *
- python-multipart ==0.0.7
- ray ==2.10.0
- requests *
- s3fs *
- tensorflow *
- torch *
- torchvision *
- transformers *
- fastapi ==0.104.0
- flytekit >=1.5.0
- keras ==2.15.0
- kubernetes *
- mlflow ==2.10.2
- pandas <=2.1.4
- pillow *
- prometheus-api-client ==0.5.5
- python-multipart ==0.0.7
- ray ==2.10.0
- s3fs *
- tensorflow *
