ml-sysops_project
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ho1447
- License: mit
- Language: Python
- Default Branch: main
- Size: 361 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Modular Speech Command Recognition System
Table of Contents
- Value Proposition
- Contributors
- System Diagram
- Summary of Outside Materials
- Summary of Infrastructure Requirements
- Detailed Design Plan
- Difficulty Points Achieved
Value Proposition
Voice-controlled interfaces are increasingly common in smart home devices, vehicles, and industrial machinery. Most systems today rely on proprietary cloud APIs like Google Assistant or Alexa, which introduce privacy risks, internet dependency, and latency. Our system improves on this by providing a cloud-native machine learning service that enables fast, customizable, and private speech command recognition.
We train and serve models on Chameleon Cloud, exposing a speech recognition API that can be used in existing smart systems. The system supports real-time command detection and can later be adapted for edge deployment.
Current non-ML status: Manual control interfaces, rule-based keyword spotting, or reliance on cloud APIs.
Business metrics: Recognition accuracy, latency per inference, and system responsiveness under noise.
Contributors
| Name | Responsible for | Link to their commits in this repo |
|-----------------------|------------------------------|------------------------------------|
| All team members | Overall system architecture | |
| Vorrapard Kumthongdee | Model training | https://github.com/ho1447/ML-SysOps_Project/commits/main/?author=vorrapard |
| Iris Ho | Model serving and monitoring | https://github.com/ho1447/ML-SysOps_Project/commits/main/?author=ho1447 |
| Angelina Huang | Data pipeline | https://github.com/ho1447/ML-SysOps_Project/commits/main/?author=phh242 |
| Jay Roy | Continuous X pipeline | https://github.com/ho1447/ML-SysOps_Project/commits/main/?author=jayroy9825 |
System Diagram

Summary of Outside Materials
| Material | How it was created | Conditions of use |
|------------------------------|--------------------------------------------------------------------------------|------------------------|
| Speech Commands v2 (3.34 GB) | Created by Google, includes 105k+ WAV clips of spoken commands | Free for academic use |
| Background noise data | Packaged with SCv2 dataset for audio augmentation | Free for academic use |
| Wav2Vec2.0 (95M parameters) | Pretrained self-supervised model for audio embeddings (HuggingFace) | Apache 2.0 License |
| SpeechBrain | Open-source toolkit for speech processing (feature extraction, classification) | MIT License |
Summary of Infrastructure Requirements
| Requirement | How many/when | Justification |
|-----------------|---------------------------------------------------|-------------------------------------------------------------|
| m1.medium VMs | 3 for entire project duration | Run API server, monitoring, preprocessing |
| gpu_mi100 | 4 hour block twice a week | Train models like Wav2Vec2.0 or CNN-based classifiers |
| Floating IPs | 1 for entire project duration, 1 for sporadic use | Expose API externally, test in canary/staging environments |
| Persistent Vols | 50 GB | Store dataset, processed features, model artifacts and logs |
Detailed Design Plan
Model Training and Training Platforms
Strategy:
- Use a three-part model:
  - Feature extraction (Mel spectrograms using torchaudio)
  - Noise classification model (CNN-based)
  - Speech command classification model (MobileNetV2 or Wav2Vec2.0)
- Train on Google Speech Commands v2 with augmentation
- Tune hyperparameters with Ray Tune
Tools:
- Ray Train for distributed training on Chameleon Cloud (instructions for running on Chameleon)
- MLflow to track experiment runs and parameters
Justification:
- Enables modular updates and robust performance in noisy conditions
- Scalable training supports model reuse or extension (e.g., multi-language)
Course links:
- Unit 4: Training at scale with Ray and augmentation
- Unit 5: MLflow for experiment logging
- ✅ Difficulty point: Ray Tune for HPO + multi-model setup
Model Serving and Monitoring Platforms
Strategy:
- Package models into a container and expose them via a FastAPI endpoint
- Perform inference using ONNX-optimized models on both a CPU server and an edge device (Raspberry Pi)
- Compare latency and concurrency behavior
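The latency and concurrency comparison could be scripted along the following lines. This is a minimal sketch, not the project's benchmark: `measure_latencies` and `summarize` are hypothetical helper names, and `predict` stands in for whatever callable sends one request to the serving endpoint (CPU server or Raspberry Pi).

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def measure_latencies(predict, payloads, concurrency=1):
    """Call predict(payload) for every payload using `concurrency` worker threads.

    Returns the per-request latencies in seconds, in payload order.
    """
    def timed_call(payload):
        start = time.perf_counter()
        predict(payload)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, payloads))


def summarize(latencies):
    """Return the median and 95th-percentile latency in milliseconds."""
    ordered = sorted(latencies)
    # Nearest-rank percentile: index of the value at or above 95% of samples.
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "p50_ms": statistics.median(ordered) * 1000.0,
        "p95_ms": ordered[p95_index] * 1000.0,
    }
```

Running the same payloads at several `concurrency` values against each target gives directly comparable median and tail latencies.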
Monitoring:
- Log prediction confidence, input quality (signal-to-noise)
- Use a dashboard to visualize misclassification trends and input stats
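A logged prediction record might look like the sketch below. It assumes the classifier returns raw logits and that a signal-to-noise estimate is computed elsewhere; `prediction_record` and its field names are illustrative, not the project's actual log schema.

```python
import json
import math


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_record(logits, labels, snr_db=None):
    """Build a JSON-serializable record with the top label and its confidence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "label": labels[best],
        "confidence": round(probs[best], 4),
        "snr_db": snr_db,  # hypothetical input-quality field, may be None
    }


if __name__ == "__main__":
    # One line of structured log output per request.
    print(json.dumps(prediction_record([2.0, 0.0], ["yes", "no"], snr_db=12.5)))
```

Records of this shape can be aggregated by a dashboard to track confidence drift over time.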
Course links:
- Unit 6: Serving via API and edge deployment on a low-resource device, with latency/concurrency monitoring
- Unit 7: Log-based and live monitoring of performance
- ✅ Difficulty point: ONNX and edge device deployment + dashboard for model degradation
Data Pipeline
Persistent storage:
- Object storage bucket on CHI@TACC (21.96 GB): docker-compose-etl.yaml
  - `speech_commands_v0.02`
  - `speech_commands_v0.02_processed`
  - `speech_commands_v0.02_processed_mel`
  - `speech_commands_test_set_v0.02`
  - `speech_commands_test_set_v0.02_processed`
  - `speech_commands_test_set_v0.02_processed_mel`
- Block storage volume on KVM@TACC (50 GB): docker-compose-block.yaml
  - MinIO
  - Postgres
  - MLflow
  - Jupyter
  - Prometheus
  - Grafana
  - Label Studio
Offline data:
- Training dataset: `speech_commands_v0.02_processed` and `speech_commands_v0.02_processed_mel`
  - `speech_commands_v0.02_processed` sample: `004ae714_nohash_0_doing_the_dishes.wav`
  - `speech_commands_v0.02_processed_mel` sample: `004ae714_nohash_0_doing_the_dishes.npy`
  - `speech_commands_v0.02`:
    - Consists of one-second .wav audio files, each containing an English word spoken by different speakers
    - Crowdsourced by Google, where participants were prompted to say a specific command such as "yes", "no", "stop", etc.
    - Also includes realistic background audio files (`doing_the_dishes.wav`, `running_tap.wav`) which can be mixed into training data to simulate noisy environments
    - Dataset sample
      - Speech Commands: `004ae714_nohash_0.wav`
      - Background noise: `doing_the_dishes.wav`
    - This data can be used by Alexa to initiate the voice assistant, control media playback, etc.
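Mixing a background clip into a speech clip can be sketched as follows. This is an illustration under stated assumptions, not the project's processing code: samples are plain floats in [-1.0, 1.0], the noise is tiled to the speech length, and the target signal-to-noise ratio `snr_db` is a hypothetical parameter.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Overlay `noise` on `speech`, scaling the noise to the requested SNR in dB.

    The SNR holds before clipping; heavy clipping would lower it slightly.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat the noise clip if it is shorter than the speech clip, then trim.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[: len(speech)]

    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return list(speech)  # silent noise: nothing to add

    # SNR_dB = 20 * log10(rms_speech / rms_noise), solved for the noise gain.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20.0))
    gain = target_noise_rms / noise_rms

    mixed = [s + gain * n for s, n in zip(speech, noise)]
    # Clip to the valid range so the result can be written back as PCM audio.
    return [max(-1.0, min(1.0, m)) for m in mixed]
```

Lower `snr_db` values produce noisier training examples; sampling it from a range per clip is one common way to vary difficulty.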
Data pipeline:
- Retrieves the data from its original source and loads it into the object store: docker-compose-etl.yaml
  - extract-data
    - Downloads `speech_commands_v0.02` and `speech_commands_test_set_v0.02`
    - Unzips `speech_commands_v0.02` and `speech_commands_test_set_v0.02`
  - process-data
    - Normalizes the .wav audio files in `speech_commands_v0.02` and `speech_commands_test_set_v0.02`
    - Overlays speech command audio files with background noise audio files, saving the results to:
      - `speech_commands_v0.02_processed`
      - `speech_commands_test_set_v0.02_processed`
    - Generates mel spectrograms for the processed audio files, saving the results to:
      - `speech_commands_v0.02_processed_mel`
      - `speech_commands_test_set_v0.02_processed_mel`
  - transform-data
    - Organizes `speech_commands_v0.02_processed` and `speech_commands_v0.02_processed_mel` into directories ("training", "validation", "evaluation") according to command labels
      - Assigns each file to a set based on a hash of its filename
      - Training:Validation:Evaluation = 8:1:1
  - load-data
    - Loads training data into the object store
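The hash-based 8:1:1 assignment in transform-data can be illustrated with a short sketch. The function name `which_set` and the choice of SHA-1 are assumptions for illustration; the source only states that a hash of the filename decides the set.

```python
import hashlib


def which_set(filename, validation_pct=10, evaluation_pct=10):
    """Assign a file to 'training', 'validation', or 'evaluation'.

    The bucket comes from a SHA-1 hash of the filename, so the assignment
    is stable across runs and machines and needs no stored split list.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # integer in 0..99
    if bucket < validation_pct:
        return "validation"
    if bucket < validation_pct + evaluation_pct:
        return "evaluation"
    return "training"
```

Because the split depends only on the name, adding new files later never moves existing files between sets. Hashing only the speaker prefix (the part before `_nohash_`) instead of the full name would additionally keep every clip from one speaker in the same set.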
Online data:
- Sends new data to the FastAPI inference endpoint during "production" use: onlinedatapipeline.py
  - Uses `speech_commands_test_set_v0.02_processed` as "new" data
  - Shuffles the file paths and sends each file to the FastAPI inference endpoint
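The shuffle-and-send loop might be sketched as below. This is not the contents of the project's script: `replay_files` and `post_wav` are hypothetical names, and the request format (raw WAV bytes with an `audio/wav` content type) is an assumption about the endpoint.

```python
import random
import urllib.request


def post_wav(path, url, timeout_s=10.0):
    """POST the raw bytes of one .wav file and return the response body as text.

    The request format is an assumption; adapt it to the real endpoint.
    """
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "audio/wav"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")


def replay_files(paths, send, seed=None):
    """Shuffle `paths` and pass each one to `send`.

    Returns (path, response) pairs in the order the files were sent.
    """
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    return [(path, send(path)) for path in shuffled]
```

Passing `send` as a parameter keeps the replay logic testable without a running server; in use it would wrap `post_wav` with the endpoint URL.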
Continuous X
1. Selecting Site
The Modular-Speech Continuous X Pipeline sets up infrastructure predominantly on KVM@TACC using Chameleon Cloud. We start by selecting the site.
```python
from chi import server, context

context.version = "1.0"
context.choose_site(default="KVM@TACC")
```
This pipeline glues together the Model Training, Evaluation, Serving, and Data Operations components. The ultimate goal is rapid development-to-deployment cycles with iterative improvements: this is the Ops in MLOps.
We'll provision resources and install tooling through infrastructure-as-code:
- Terraform: Manages our cloud infra declaratively.
- Ansible: Installs Kubernetes and Argo ecosystem tools.
- Argo CD: Enables GitOps-based continuous delivery.
- Argo Workflows: Powers the container-native orchestration of our ML pipelines.
Start by cloning the infrastructure repository:
```bash
git clone --recurse-submodules https://github.com/ho1447/ML-SysOps_Project.git
```
2. Setup Environment
Install Terraform:
```bash
mkdir -p /work/.local/bin
wget https://releases.hashicorp.com/terraform/1.10.5/terraform_1.10.5_linux_amd64.zip
unzip -o -q terraform_1.10.5_linux_amd64.zip
mv terraform /work/.local/bin
rm terraform_1.10.5_linux_amd64.zip
export PATH=/work/.local/bin:$PATH
```
Prepare the path for additional tools:
```bash
export PATH=/work/.local/bin:$PATH
export PYTHONUSERBASE=/work/.local
```
Install Kubespray dependencies:
```bash
PYTHONUSERBASE=/work/.local pip install --user -r ./Modular-Speech/continuous_X/ansible/k8s/kubespray/requirements.txt
```
3. Provision Infrastructure with Terraform
Navigate to the Terraform config directory:
```bash
cd /work/Modular-Speech/continuous_X/tf/kvm/
export PATH=/work/.local/bin:$PATH
unset $(set | grep -o "^OS_[A-Za-z0-9_]*")
```
Initialize and apply configuration:
```bash
terraform init
export TF_VAR_suffix=speech_proj
export TF_VAR_key=id_rsa_chameleon_speech
terraform validate
terraform apply -auto-approve
```
4. Ansible for Configuration Management
Ensure your environment is ready:
```bash
export PATH=/work/.local/bin:$PATH
export PYTHONUSERBASE=/work/.local
```
Check connectivity:
```bash
ansible -i inventory.yml all -m ping
```
Run a hello-world test:
```bash
ansible-playbook -i inventory.yml general/hello_host.yml
```
5. Deploy Kubernetes
SSH and prepare Kubernetes installation:
```bash
cd /work/.ssh/
ssh-add id_rsa_chameleon_speech

cd /work/Modular-Speech/continuous_X/ansible
ansible-playbook -i inventory.yml pre_k8s/pre_k8s_configure.yml
```
Deploy Kubernetes with Kubespray:
```bash
cd ./k8s/kubespray
ansible-playbook -i ../inventory/mycluster --become --become-user=root ./cluster.yml
```
6. Argo CD for Application Deployment
Set up ArgoCD for platform services:
```bash
cd /work/.ssh
ssh-add id_rsa_chameleon_speech

cd /work/Modular-Speech/continuous_X/ansible
ansible-playbook -i inventory.yml argocd/argocd_add_platform.yml
```
Platform includes:
- MinIO
- MLflow
- PostgreSQL
- Label Studio
- Grafana
- Prometheus
Deploy the initial container image for Modular-Speech:
```bash
ansible-playbook -i inventory.yml argocd/workflow_build_init.yml
```
Deploy staging environment:
```bash
ansible-playbook -i inventory.yml argocd/argocd_add_staging.yml
```
Canary and production environments:
```bash
ansible-playbook -i inventory.yml argocd/argocd_add_canary.yml
ansible-playbook -i inventory.yml argocd/argocd_add_prod.yml
```
7. Model Lifecycle - Part 1
To manually trigger training and evaluation:
- Use the `train-model` Argo Workflow template. Provide the public IPs for:
  - training endpoint
  - evaluation endpoint
  - MLflow
Model training is triggered through a REST API call that returns a RUN_ID, which the workflow then polls through MLflow's API.
The evaluation endpoint returns a model version, which is used to tag the container image.
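Polling a run until it finishes could look like the sketch below. It uses the MLflow REST endpoint `GET /api/2.0/mlflow/runs/get`; the helper names, polling interval, and timeout are assumptions, and the actual workflow template may poll differently.

```python
import json
import time
import urllib.parse
import urllib.request

# Run states after which MLflow will not change the status again.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "KILLED"}


def fetch_run_status(mlflow_url, run_id):
    """Ask the MLflow tracking server for the current status of one run."""
    query = urllib.parse.urlencode({"run_id": run_id})
    url = f"{mlflow_url.rstrip('/')}/api/2.0/mlflow/runs/get?{query}"
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    return payload["run"]["info"]["status"]


def wait_for_run(get_status, interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll get_status() until a terminal state is reached or the timeout expires."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"run still {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

In use, `get_status` would be something like `lambda: fetch_run_status(mlflow_url, run_id)`, with `mlflow_url` built from the MLflow public IP supplied to the workflow.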
8. Model Lifecycle - Part 2
Progress through environments:
- Staging: Test performance and integration.
- Canary: Serve a subset of real users.
- Production: Full rollout after validation.
To promote models:
```text
Argo Workflows > promote-model > Submit
```
This copies artifacts and builds new images for each environment using templates like build-container-image.yaml.
9. Teardown with Terraform
To remove infrastructure:
```bash
cd /work/Modular-Speech/continuous_X/tf/kvm
export TF_VAR_suffix=speech_proj
export TF_VAR_key=id_rsa_chameleon_speech
terraform destroy -auto-approve
```
Difficulty Points Achieved
We have satisfied 4 difficulty points across different units in our project proposal, ensuring our approach is robust, scalable, and aligned with the requirements.
Owner
- Login: ho1447
- Kind: user
- Repositories: 2
- Profile: https://github.com/ho1447
Citation (CITATIONS.bib)
@article{speechcommandsv2,
author = {{Warden}, P.},
title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1804.03209},
primaryClass = "cs.CL",
keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
year = 2018,
month = apr,
url = {https://arxiv.org/abs/1804.03209},
}
GitHub Events
Total
- Issues event: 2
- Member event: 3
- Push event: 72
- Fork event: 1
- Create event: 2
Last Year
- Issues event: 2
- Member event: 3
- Push event: 72
- Fork event: 1
- Create event: 2