saia-hpc
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: gwdg
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 626 KB
Statistics
- Stars: 6
- Watchers: 6
- Forks: 2
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
SAIA - the Scalable AI Accelerator - HPC
This repository contains the HPC-side components of the Scalable AI Accelerator (SAIA), which hosts AI services such as Chat AI. The implementation of the remaining components of the complete architecture for Chat AI can be found in two other repos:
- Stand-alone web interface: https://github.com/gwdg/chat-ai
- Server components, incl. API gateway and SSH proxy: https://github.com/gwdg/saia-hub

Together these repos provide the entire underlying mechanism for Chat AI, which can be generalized as a Slurm-native HPC web service.
SAIA HPC
This repository contains the tools and scripts to deploy a consistent and scalable service on a Slurm-based HPC center with the ability to integrate with the web server provided in SAIA Hub.
SSH-based proxy
In a typical HPC cluster setting, the high-performance compute nodes that are capable of running Large Language Models (LLMs) may not be directly accessible from the internet. In these circumstances, the requests from the web server would have to go through an entry point to the cluster, for example a login node or service node. Furthermore, direct tunneling and port forwarding may be forbidden as a security mechanism, and only certain protocols such as SSH may be allowed.
Therefore, the HPC proxy runs on the cloud server and uses an SSH key to establish a connection to the cluster's entry point, i.e. the login/service node. For security reasons, the SSH key hosted on the cloud server is restricted to always run a single script on the login node, namely `cloud_interface.sh`, and is never actually given a shell instance. This prevents direct access to the cluster even if the web server is compromised. The restriction is implemented by forcing a command for this specific SSH key (the `command=` option, equivalent to sshd's `ForceCommand` directive); this can be set in the `~/.ssh/authorized_keys` file of an HPC user or functional account without root access, like this:
```bash
command="/path/to/cloud_interface.sh",no-port-forwarding,no-X11-forwarding ssh-rsa <public_key>
```
Scheduler
The task of the scheduler script `scheduler.py` is to run reliable and scalable HPC services on a Slurm-based HPC cluster. A configuration for the desired services should be set in `config.json`, and once the scheduler is initialized with `python scheduler.py init`, everything else is done automatically.
When an SSH proxy is running and configured to connect to the HPC center, it periodically sends keep-alive prompts to maintain the established SSH connection. Due to the forced command, these prompts actually run `cloud_interface.sh`, which in turn periodically runs the scheduler. The scheduler maintains the state of active backend jobs in the `services/cluster.services` file and ensures that there are always sufficient available jobs to handle the incoming requests, scaling the number of jobs up and down based on demand. It can also log timestamps of user requests, which can be used for accounting purposes.
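The README describes this scaling behavior only at a high level. Purely as a toy illustration of the idea (not the actual logic in `scheduler.py` — every name and threshold below is a hypothetical assumption), a scale-up decision could look like this:

```python
def jobs_to_submit(active: int, pending: int, demand: int,
                   min_jobs: int = 1, max_jobs: int = 4) -> int:
    """Toy scaling rule: keep between min_jobs and max_jobs backend jobs,
    aiming for enough capacity to cover current demand.

    All parameter names and defaults are illustrative assumptions,
    not taken from the repository's scheduler."""
    target = max(min_jobs, min(demand, max_jobs))
    return max(0, target - (active + pending))

# e.g. one job running, demand for three -> submit two more
print(jobs_to_submit(active=1, pending=0, demand=3))  # → 2
```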
For each service, some configuration parameters must be provided, most notably a Slurm batch (`sbatch`) script with which the scheduler can submit Slurm jobs in order to host the service on a high-performance compute node, possibly with GPUs. The submission of Slurm jobs is handled automatically by the scheduler: it preemptively resubmits jobs that are about to expire and randomly assigns a port number to each job, on which the service should listen for incoming requests.
Services
The implementation of the scheduler and proxy has been abstracted from the service itself, meaning it should be possible to run any REST service within this framework. As for the Chat AI service, we simply use vLLM. vLLM is capable of hosting cutting-edge LLMs with state-of-the-art performance on HPC GPU nodes and even provides OpenAI API compatibility out of the box.
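For example, a minimal `sbatch` script for a vLLM-backed service might look like the sketch below. The job name, resource requests, the model placeholder, and the assumption that the scheduler hands the assigned port to the job via a `PORT` environment variable are all illustrative; the actual scripts ship in `sbatch/`.

```bash
#!/bin/bash
#SBATCH --job-name=chat-ai-backend   # hypothetical job name
#SBATCH --gres=gpu:1                 # request a GPU for LLM inference
#SBATCH --time=08:00:00              # jobs nearing this limit get resubmitted

# The scheduler is assumed to inject the randomly assigned port;
# vLLM's OpenAI-compatible server then listens on it.
python -m vllm.entrypoints.openai.api_server \
    --model <model_name> \
    --port "${PORT}"
```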
Getting Started
Clone this repository:

```bash
git clone https://github.com/gwdg/saia-hpc
```
Create the SSH key on the cloud server, and add a restricted entry via `ForceCommand` in the `authorized_keys` file in the HPC cluster following this template:

```bash
command="/path/to/cloud_interface.sh",no-port-forwarding,no-X11-forwarding ssh-rsa <public_key>
```
Initialize the cluster configuration:
- Replace the parameters in `config.json` with your custom service setup.
- Modify the corresponding service scripts in `sbatch/` accordingly.
- Run `python scheduler.py init` to initialize the scheduler with `config.json`.
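The schema of `config.json` is defined by this repository and not documented in the README. Purely as an orientation sketch, a service entry could carry fields along these lines (every field name and value below is a hypothetical placeholder, not taken from the actual code):

```json
{
  "services": {
    "chat-ai": {
      "sbatch_script": "sbatch/vllm.sbatch",
      "min_jobs": 1,
      "max_jobs": 4
    }
  }
}
```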
Acknowledgements
We thank all colleagues and partners involved in this project.
Citation
If you use SAIA or Chat AI in your research or services, please cite us as follows:
@misc{doosthosseiniSAIASeamlessSlurmNative2025,
  title = {{{SAIA}}: {{A Seamless Slurm-Native Solution}} for {{HPC-Based Services}}},
  shorttitle = {{{SAIA}}},
  author = {Doosthosseini, Ali and Decker, Jonathan and Nolte, Hendrik and Kunkel, Julian},
  year = {2025},
  month = jul,
  publisher = {Research Square},
  issn = {2693-5015},
  doi = {10.21203/rs.3.rs-6648693/v1},
  url = {https://www.researchsquare.com/article/rs-6648693/v1},
  urldate = {2025-07-29},
  archiveprefix = {Research Square}
}
Owner
- Name: Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
- Login: gwdg
- Kind: organization
- Email: github@gwdg.de
- Location: Göttingen
- Website: https://www.gwdg.de
- Repositories: 159
- Profile: https://github.com/gwdg
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: SAIA
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Ali
    family-names: Doosthosseini
    email: ali.doosthosseini@uni-goettingen.de
    affiliation: University of Göttingen
    orcid: 'https://orcid.org/0000-0002-0654-1268'
  - given-names: Jonathan
    family-names: Decker
    email: jonathan.decker@uni-goettingen.de
    affiliation: University of Göttingen
    orcid: 'https://orcid.org/0000-0002-7384-7304'
  - given-names: Hendrik
    family-names: Nolte
    email: hendrik.nolte@gwdg.de
    affiliation: GWDG
    orcid: 'https://orcid.org/0000-0003-2138-8510'
  - given-names: Julian
    name-particle: M.
    family-names: Kunkel
    email: julian.kunkel@gwdg.de
    affiliation: GWDG
    orcid: 'https://orcid.org/0000-0002-6915-1179'
identifiers:
  - type: doi
    value: 10.21203/rs.3.rs-6648693/v1
  - type: url
    value: 'https://www.researchsquare.com/article/rs-6648693/v1'
repository-code: 'https://github.com/gwdg/chat-ai'
url: 'https://chat-ai.academiccloud.de'
abstract: >-
  Recent developments indicate a shift toward web services
  that employ ever larger AI models, e.g., Large Language
  Models (LLMs), requiring powerful hardware for inference.
  High-Performance Computing (HPC) systems are commonly
  equipped with such hardware for the purpose of large scale
  computation tasks. However, HPC infrastructure is
  inherently unsuitable for hosting real-time web services
  due to network, security and scheduling constraints. While
  various efforts exist to integrate external scheduling
  solutions, these often require compromises in terms of
  security or usability for existing HPC users. In this
  paper, we present SAIA, a Slurm-native platform consisting
  of a scheduler and a proxy. The scheduler interacts with
  Slurm to ensure the availability and scalability of
  services, while the proxy provides external access, which
  is secured via confined SSH commands. We have demonstrated
  SAIA’s applicability by deploying a large-scale LLM web
  service that has served over 50,000 users.
keywords:
  - AI
  - HPC
  - Slurm
license: GPL-3.0
version: v0.8.1
date-released: '2024-02-22'
GitHub Events
Total
- Release event: 1
- Watch event: 6
- Push event: 3
- Fork event: 1
- Create event: 1
Last Year
- Release event: 1
- Watch event: 6
- Push event: 3
- Fork event: 1
- Create event: 1