saia-hub
Science Score: 57.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (12.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: gwdg
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 637 KB
Statistics
- Stars: 8
- Watchers: 4
- Forks: 1
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
SAIA - the Scalable AI Accelerator - Hub
This repository contains the server components of the Scalable AI Accelerator (SAIA), which hosts AI services such as Chat AI. The remaining components of the complete Chat AI architecture live in two other repos:
- Stand-alone web interface: https://github.com/gwdg/chat-ai
- HPC-side components, incl. scheduler and Slurm scripts: https://github.com/gwdg/saia-hpc

Together, these repos provide the entire underlying mechanism for Chat AI, which can be generalized as a Slurm-native HPC web service.
SAIA Hub
SAIA is the umbrella brand that hosts our AI services. It consists of an API gateway, which performs routing and user management, and several proxies that connect to backend services such as an HPC center or an external cloud provider, routing incoming requests to these upstream connections. All components can be deployed as Docker containers; the configuration is provided in the docker-compose.yml file.
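As an illustration, the skeleton of such a docker-compose.yml might look as follows. The service names match those used later in this README, but the images, build contexts, and other details are a hypothetical sketch, not the repo's actual file:

```yaml
services:
  db:                 # PostgreSQL database backing Kong
    image: postgres
    secrets: [POSTGRES_PASSWORD]
  kong-migrations:    # one-shot job that prepares the Kong schema
    image: kong
    depends_on: [db]
  kong:               # API gateway, the entrypoint for all requests
    image: kong
    depends_on: [db]
  proxy-kisski:       # SSH-based proxy towards the HPC cluster
    build: ./proxy
  proxy-azure:        # external proxy towards Azure-hosted models
    build: ./proxy
secrets:
  POSTGRES_PASSWORD:
    file: ./secrets/POSTGRES_PASSWORD
```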
API Gateway
The API Gateway, Kong OSS, is the entry point for incoming requests. Since Kong doesn't support OAuth integration out of the box, we run Apache with the mod_auth_openidc module solely to integrate with our OAuth provider, AcademicCloud's Single Sign-On (SSO); all requests are then routed directly to Kong. Kong performs user management, load balancing, and health checks, can issue API keys or tokens for direct access to certain routes, and integrates with monitoring services such as Prometheus.
SSH-based proxy
In a typical HPC cluster setting, the high-performance compute nodes capable of running Large Language Models (LLMs) may not be directly reachable from the internet. In that case, requests from the web server have to pass through an entry point to the cluster, for example a login node or service node. Furthermore, direct tunneling and port forwarding may be forbidden as a security measure, and only certain protocols, such as SSH, may be allowed.
Therefore, the HPC proxy runs on the cloud server and uses an SSH key to establish a connection to the cluster's entry point, i.e. the login/service node. For security reasons, the SSH key hosted on the cloud server is restricted to always run a single script on the login node, namely cloud_interface.sh, and is never given a shell instance. This prevents direct access to the cluster even if the web server is compromised. The restriction is implemented with a forced command (a command="..." option, equivalent to sshd's ForceCommand directive) on the key's entry in the ~/.ssh/authorized_keys file, which an HPC user or functional account can set without root access.
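Concretely, such a restricted entry has a fixed shape: a command= option, hardening flags, then the public key. A small sketch that assembles the line (script path and key material are hypothetical placeholders):

```shell
# Assemble a forced-command authorized_keys entry (hypothetical path/key)
SCRIPT="/path/to/cloud_interface.sh"
PUBKEY="ssh-ed25519 AAAAC3ExampleOnly user@cloud-server"
ENTRY=$(printf 'command="%s",no-port-forwarding,no-X11-forwarding %s' "$SCRIPT" "$PUBKEY")
echo "$ENTRY"   # append this line to ~/.ssh/authorized_keys on the entry point
```

The no-port-forwarding and no-X11-forwarding options close off tunneling side channels that the forced command alone would not prevent.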
External proxy
Alternatively, some services may be hosted on the internet by other providers such as OpenAI or Microsoft. To maintain full control over communication with these providers, all such requests are sent and received through the web server. The API gateway treats this external proxy as an upstream server, which in turn forwards incoming requests to the service providers.
Getting started
Make sure you have Docker installed:
bash
docker --version
Clone this repository and navigate to the root folder.
bash
git clone https://github.com/gwdg/saia-hub
cd saia-hub
Create the secrets/POSTGRES_PASSWORD file and set the database password in it.
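For example, a random password can be generated with openssl (one possible approach; any sufficiently strong password works):

```shell
# Create the secrets directory and generate a random database password in it
mkdir -p secrets
openssl rand -base64 24 > secrets/POSTGRES_PASSWORD
chmod 600 secrets/POSTGRES_PASSWORD   # restrict access to the secret file
```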
Start the database and Kong API gateway:
bash
docker compose up db -d
docker compose up kong-migrations
docker compose up kong -d
You should now be able to access the Kong Manager dashboard via localhost:8002, the admin API via localhost:8001, and the gateway itself via localhost:8000, or whatever port numbers are defined in docker-compose.yml.
SSH-based proxies
Create an SSH key on the cloud server, and add it as a ForceCommand-restricted entry to the ~/.ssh/authorized_keys file on the HPC cluster:
bash
command="/path/to/cloud_interface.sh",no-port-forwarding,no-X11-forwarding ssh-rsa <public_key>
Configure the secrets and environment variables in the docker-compose.yml file, following the template, in order to establish the connections to the upstream HPC services:
yaml
environment:
  - PORT=8721
  - HPC_HOST=1.2.3.4
  - HPC_USER=u12345
  - KEY_NAME=my-ssh-key
secrets:
  my-ssh-key:
    file: ./secrets/my-ssh-key # Path to SSH key
Then, start the proxy:
bash
docker compose build proxy-kisski
docker compose up proxy-kisski
It is possible to define multiple proxies in the docker-compose.yml file. Specific routes can then be configured for each proxy in Kong.
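To illustrate the mapping, a route-to-proxy binding can be expressed in Kong's declarative configuration format (the deployment here is database-backed, so in practice routes are created via Kong's admin API instead; the names and port below are hypothetical):

```yaml
_format_version: "3.0"
services:
  - name: proxy-kisski            # upstream: the SSH-based proxy container
    url: http://proxy-kisski:8721
    routes:
      - name: kisski-route
        paths:
          - /kisski               # requests under /kisski go to this proxy
```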
External proxies
The azure proxy enables access to OpenAI models hosted on Microsoft Azure. To run this proxy, create a secrets/openai_config.json file according to the information provided in docker-compose.yml and secrets/openai_config.json.sample. Then start the proxy:
bash
docker compose build proxy-azure
docker compose up proxy-azure
Database backup and restore
The two scripts tools/db_backup.sh and tools/db_restore.sh allow storing and restoring backups of the database, which contains all routes, services, consumers/users, and other configuration used by Kong.
Acknowledgements
We thank all colleagues and partners involved in this project.
Citation
If you use SAIA or Chat AI in your research or services, please cite us as follows:
@misc{doosthosseiniSAIASeamlessSlurmNative2025,
  title = {{{SAIA}}: {{A Seamless Slurm-Native Solution}} for {{HPC-Based Services}}},
  shorttitle = {{{SAIA}}},
  author = {Doosthosseini, Ali and Decker, Jonathan and Nolte, Hendrik and Kunkel, Julian},
  year = {2025},
  month = jul,
  publisher = {Research Square},
  issn = {2693-5015},
  doi = {10.21203/rs.3.rs-6648693/v1},
  url = {https://www.researchsquare.com/article/rs-6648693/v1},
  urldate = {2025-07-29},
  archiveprefix = {Research Square}
}
Owner
- Name: Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
- Login: gwdg
- Kind: organization
- Email: github@gwdg.de
- Location: Göttingen
- Website: https://www.gwdg.de
- Repositories: 159
- Profile: https://github.com/gwdg
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: SAIA
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Ali
    family-names: Doosthosseini
    email: ali.doosthosseini@uni-goettingen.de
    affiliation: University of Göttingen
    orcid: 'https://orcid.org/0000-0002-0654-1268'
  - given-names: Jonathan
    family-names: Decker
    email: jonathan.decker@uni-goettingen.de
    affiliation: University of Göttingen
    orcid: 'https://orcid.org/0000-0002-7384-7304'
  - given-names: Hendrik
    family-names: Nolte
    email: hendrik.nolte@gwdg.de
    affiliation: GWDG
    orcid: 'https://orcid.org/0000-0003-2138-8510'
  - given-names: Julian
    name-particle: M.
    family-names: Kunkel
    email: julian.kunkel@gwdg.de
    affiliation: GWDG
    orcid: 'https://orcid.org/0000-0002-6915-1179'
identifiers:
  - type: doi
    value: 10.21203/rs.3.rs-6648693/v1
  - type: url
    value: 'https://www.researchsquare.com/article/rs-6648693/v1'
repository-code: 'https://github.com/gwdg/chat-ai'
url: 'https://chat-ai.academiccloud.de'
abstract: >-
  Recent developments indicate a shift toward web services
  that employ ever larger AI models, e.g., Large Language
  Models (LLMs), requiring powerful hardware for inference.
  High-Performance Computing (HPC) systems are commonly
  equipped with such hardware for the purpose of large scale
  computation tasks. However, HPC infrastructure is
  inherently unsuitable for hosting real-time web services
  due to network, security and scheduling constraints. While
  various efforts exist to integrate external scheduling
  solutions, these often require compromises in terms of
  security or usability for existing HPC users. In this
  paper, we present SAIA, a Slurm-native platform consisting
  of a scheduler and a proxy. The scheduler interacts with
  Slurm to ensure the availability and scalability of
  services, while the proxy provides external access, which
  is secured via confined SSH commands. We have demonstrated
  SAIA’s applicability by deploying a large-scale LLM web
  service that has served over 50,000 users.
keywords:
  - AI
  - HPC
  - Slurm
license: GPL-3.0
version: v0.8.1
date-released: '2024-02-22'
GitHub Events
Total
- Create event: 2
- Issues event: 1
- Release event: 1
- Watch event: 8
- Delete event: 1
- Push event: 3
Last Year
- Create event: 2
- Issues event: 1
- Release event: 1
- Watch event: 8
- Delete event: 1
- Push event: 3
Dependencies
- python 3.11-slim-buster build
- python 3.12.8-slim-bookworm build
- fastapi ==0.104.1
- httpcore ==1.0.2
- httpx ==0.25.1
- openai ==1.12.0
- paramiko ==3.3.1
- pillow ==11.1.0
- pyyaml ==6.0.1
- requests ==2.31.0
- starlette ==0.27.0
- tiktoken ==0.8.0
- tqdm ==4.66.1
- uvicorn ==0.24.0.post1
- aiohttp ==3.11.11
- anyio ==4.8.0
- fastapi ==0.115.8
- httpcore ==1.0.7
- httpx ==0.28.1
- openai ==1.61.0
- orjson ==3.10.15
- paramiko ==3.5.0
- pyyaml ==6.0.2
- requests ==2.32.0
- starlette ==0.45.3
- tqdm ==4.67.1
- uvicorn ==0.34.0
- uvloop ==0.21.0