Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: gwdg
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 637 KB
Statistics
  • Stars: 8
  • Watchers: 4
  • Forks: 1
  • Open Issues: 1
  • Releases: 1
Created over 1 year ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

SAIA - the Scalable AI Accelerator - Hub

Documentation | Paper

This repository contains the server components of the Scalable AI Accelerator (SAIA), which hosts AI services such as Chat AI. The implementation of the remaining components of the complete Chat AI architecture can be found in two other repos:

- Stand-alone web interface: https://github.com/gwdg/chat-ai
- HPC-side components, incl. scheduler and Slurm scripts: https://github.com/gwdg/saia-hpc

Chat AI architecture

Together, these repos provide the entire underlying mechanism for Chat AI, which can be generalized as a Slurm-native HPC web service.

SAIA Hub

SAIA is the umbrella brand that hosts our AI services. It consists of an API gateway, which performs routing and user management, and several proxies that connect to backend services such as an HPC center or an external cloud service, routing incoming requests to these upstream connections. All components can be deployed as Docker containers; the configuration is provided in the docker-compose.yml file.

API Gateway

The API gateway, Kong OSS, is the entry point for incoming requests. Since Kong OSS does not support OAuth integration out of the box, we run Apache with the OpenIDC module (mod_auth_openidc) solely to integrate with our OAuth provider, AcademicCloud's Single Sign-On (SSO); all requests are then routed directly to Kong. Kong performs user management, load balancing, and health checks, offers API keys and tokens for direct access to certain routes, and integrates with monitoring services such as Prometheus.
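As a minimal illustration of how such routing can be configured, the sketch below builds Kong Admin API requests that register an upstream proxy as a service and attach a route to it. The service name, upstream URL, and path are hypothetical, and the Admin API address assumes Kong's default port; the repo's actual configuration may differ.

```python
import json
import urllib.request

KONG_ADMIN = "http://localhost:8001"  # assumed default Kong Admin API address


def admin_post(path, payload):
    """Build (but do not send) a POST request for the Kong Admin API."""
    return urllib.request.Request(
        f"{KONG_ADMIN}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Register a hypothetical upstream proxy as a Kong service...
svc = admin_post("/services", {"name": "proxy-hpc", "url": "http://proxy-hpc:8721"})
# ...and expose it under a route.
route = admin_post("/services/proxy-hpc/routes", {"paths": ["/v1"]})

# With Kong running, urllib.request.urlopen(svc) would submit the request.
```

Kong then load-balances and health-checks the registered service like any other upstream.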

SSH-based proxy

In a typical HPC cluster setting, the high-performance compute nodes that are capable of running Large Language Models (LLMs) may not be directly accessible from the internet. In these circumstances, the requests from the web server would have to go through an entry point to the cluster, for example a login node or service node. Furthermore, direct tunneling and port forwarding may be forbidden as a security mechanism, and only certain protocols such as SSH may be allowed.

Therefore, the HPC proxy runs on the cloud server and uses an SSH key to establish a connection to the cluster's entrypoint, i.e. the login/service node. For security reasons, the SSH key hosted on the cloud server is restricted to always run a single script on the login node, namely cloud_interface.sh, and is never given a shell. This prevents direct access to the cluster even if the web server is compromised. The restriction is implemented via the command= option in the ~/.ssh/authorized_keys file (the per-key equivalent of sshd's ForceCommand directive), which an HPC user or functional account can set without root access.
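For illustration, the sketch below composes the kind of restricted SSH invocation such a proxy could make; the host, user, and key path are placeholders, and the actual proxy code in this repo may differ. Whatever remote command is requested here, the authorized_keys restriction ensures only cloud_interface.sh runs on the cluster side.

```python
import subprocess


def ssh_argv(host, user, key_path, requested_command="run"):
    """Compose the argv for an SSH call to the cluster entrypoint.

    The requested command is effectively ignored: the command= option in
    authorized_keys forces cloud_interface.sh regardless of what we ask for.
    """
    return [
        "ssh",
        "-i", key_path,
        "-o", "BatchMode=yes",  # never prompt interactively; fail fast instead
        f"{user}@{host}",
        requested_command,
    ]


argv = ssh_argv("1.2.3.4", "u12345", "./secrets/my-ssh-key")
# subprocess.run(argv, input=payload, capture_output=True) would execute it.
```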

External proxy

Alternatively, some services may be hosted on the internet by other service providers such as OpenAI or Microsoft. To maintain full control over the communications to these service providers, it is vital to send and receive such requests from the web server. The API gateway treats this external proxy as an upstream server, which then indirectly forwards incoming requests to the service providers.
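As a sketch of what such forwarding involves (not the repo's actual implementation), the helper below maps an OpenAI-style API path onto the corresponding Azure OpenAI endpoint. The resource and deployment names are hypothetical placeholders.

```python
def to_azure(path, api_key, resource, deployment, api_version="2024-02-01"):
    """Rewrite an OpenAI-style request path into an Azure OpenAI URL plus the
    api-key header Azure expects. All names here are illustrative."""
    url = (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}{path}?api-version={api_version}"
    )
    return url, {"api-key": api_key}


url, headers = to_azure("/chat/completions", "<secret>", "my-resource", "gpt-4o")
```

The external proxy would apply a mapping like this to each incoming request, send it with an HTTP client, and relay the response back through the gateway.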

Getting started

Make sure you have Docker installed:

```bash
docker --version
```

Clone this repository and navigate to the root folder.

```bash
git clone https://github.com/gwdg/saia-hub
cd saia-hub
```

Create the secrets/POSTGRES_PASSWORD file and set the database password in it.

Start the database and the Kong API gateway:

```bash
docker compose up db -d
docker compose up kong-migrations
docker compose up kong -d
```

You should now be able to access the Kong Manager dashboard via localhost:8002, the Admin API via localhost:8001, and the gateway itself via localhost:8000, or whatever port numbers are defined in docker-compose.yml.

SSH-based proxies

Create an SSH key on the cloud server, and add it as a ForceCommand-restricted entry to the ~/.ssh/authorized_keys file on the HPC cluster:

```bash
command="/path/to/cloud_interface.sh",no-port-forwarding,no-X11-forwarding ssh-rsa <public_key>
```

Configure the secrets and environment variables in the docker-compose.yml file following the template in order to establish the connections to the upstream HPC services:

```bash
PORT=8721
HPC_HOST=1.2.3.4
HPC_USER=u12345
KEY_NAME=my-ssh-key
```

```yaml
my-ssh-key:
  file: ./secrets/my-ssh-key # Path to SSH key
```

Then, start the proxy:

```bash
docker compose build proxy-kisski
docker compose up proxy-kisski
```

It is possible to define multiple proxies in the docker-compose.yml file. Specific routes can be assigned to each proxy in Kong.

External proxies

The Azure proxy enables access to OpenAI models hosted through Microsoft Azure. To run this proxy, create a secrets/openai_config.json file according to the information provided in docker-compose.yml and secrets/openai_config.json.sample. Then, start the proxy:

```bash
docker compose build proxy-azure
docker compose up proxy-azure
```

Database backup and restore

The scripts tools/db_backup.sh and tools/db_restore.sh store and restore backups of the database, which contains all routes, services, consumers/users, and other configuration used by Kong.
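Backing up a containerized Postgres database typically boils down to running pg_dump inside the db container; the sketch below composes such a command. The service, user, and database names are assumptions for illustration, so check tools/db_backup.sh for what the repo actually does.

```python
import datetime


def backup_argv(service="db", user="kong", database="kong"):
    """Compose a docker-compose pg_dump invocation and a dated output path.
    Service/user/database names are assumed, not taken from the repo."""
    outfile = f"backups/{database}-{datetime.date.today().isoformat()}.sql"
    argv = [
        "docker", "compose", "exec", "-T", service,
        "pg_dump", "-U", user, database,
    ]
    return argv, outfile


argv, outfile = backup_argv()
# Running it: subprocess.run(argv, stdout=open(outfile, "wb"))
```

Restoring is the mirror image: feed the dump file to psql inside the same container.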

Acknowledgements

We thank all colleagues and partners involved in this project.

Citation

If you use SAIA or Chat AI in your research or services, please cite us as follows:

```bibtex
@misc{doosthosseiniSAIASeamlessSlurmNative2025,
  title = {{{SAIA}}: {{A Seamless Slurm-Native Solution}} for {{HPC-Based Services}}},
  shorttitle = {{{SAIA}}},
  author = {Doosthosseini, Ali and Decker, Jonathan and Nolte, Hendrik and Kunkel, Julian},
  year = {2025},
  month = jul,
  publisher = {Research Square},
  issn = {2693-5015},
  doi = {10.21203/rs.3.rs-6648693/v1},
  url = {https://www.researchsquare.com/article/rs-6648693/v1},
  urldate = {2025-07-29},
  archiveprefix = {Research Square}
}
```

Owner

  • Name: Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
  • Login: gwdg
  • Kind: organization
  • Email: github@gwdg.de
  • Location: Göttingen

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: SAIA
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Ali
    family-names: Doosthosseini
    email: ali.doosthosseini@uni-goettingen.de
    affiliation: University of Göttingen
    orcid: 'https://orcid.org/0000-0002-0654-1268'
  - given-names: Jonathan
    family-names: Decker
    email: jonathan.decker@uni-goettingen.de
    affiliation: University of Göttingen
    orcid: 'https://orcid.org/0000-0002-7384-7304'
  - given-names: Hendrik
    family-names: Nolte
    email: hendrik.nolte@gwdg.de
    affiliation: GWDG
    orcid: 'https://orcid.org/0000-0003-2138-8510'
  - given-names: Julian
    name-particle: M.
    family-names: Kunkel
    email: julian.kunkel@gwdg.de
    affiliation: GWDG
    orcid: 'https://orcid.org/0000-0002-6915-1179'
identifiers:
  - type: doi
    value: 10.21203/rs.3.rs-6648693/v1
  - type: url
    value: 'https://www.researchsquare.com/article/rs-6648693/v1'
repository-code: 'https://github.com/gwdg/chat-ai'
url: 'https://chat-ai.academiccloud.de'
abstract: >-
  Recent developments indicate a shift toward web services
  that employ ever larger AI models, e.g., Large Language
  Models (LLMs), requiring powerful hardware for inference.
  High-Performance Computing (HPC) systems are commonly
  equipped with such hardware for the purpose of large scale
  computation tasks. However, HPC infrastructure is
  inherently unsuitable for hosting real-time web services
  due to network, security and scheduling constraints. While
  various efforts exist to integrate external scheduling
  solutions, these often require compromises in terms of
  security or usability for existing HPC users. In this
  paper, we present SAIA, a Slurm-native platform consisting
  of a scheduler and a proxy. The scheduler interacts with
  Slurm to ensure the availability and scalability of
  services, while the proxy provides external access, which
  is secured via confined SSH commands. We have demonstrated
  SAIA’s applicability by deploying a large-scale LLM web
  service that has served over 50,000 users.
keywords:
  - AI
  - HPC
  - Slurm
license: GPL-3.0
version: v0.8.1
date-released: '2024-02-22'

GitHub Events

Total
  • Create event: 2
  • Issues event: 1
  • Release event: 1
  • Watch event: 8
  • Delete event: 1
  • Push event: 3
Last Year
  • Create event: 2
  • Issues event: 1
  • Release event: 1
  • Watch event: 8
  • Delete event: 1
  • Push event: 3

Dependencies

docker-compose.yml docker
proxy-azure/Dockerfile docker
  • python 3.11-slim-buster build
proxy-hpc/Dockerfile docker
  • python 3.12.8-slim-bookworm build
proxy-azure/requirements.txt pypi
  • fastapi ==0.104.1
  • httpcore ==1.0.2
  • httpx ==0.25.1
  • openai ==1.12.0
  • paramiko ==3.3.1
  • pillow ==11.1.0
  • pyyaml ==6.0.1
  • requests ==2.31.0
  • starlette ==0.27.0
  • tiktoken ==0.8.0
  • tqdm ==4.66.1
  • uvicorn ==0.24.0.post1
proxy-hpc/requirements.txt pypi
  • aiohttp ==3.11.11
  • anyio ==4.8.0
  • fastapi ==0.115.8
  • httpcore ==1.0.7
  • httpx ==0.28.1
  • openai ==1.61.0
  • orjson ==3.10.15
  • paramiko ==3.5.0
  • pyyaml ==6.0.2
  • requests ==2.32.0
  • starlette ==0.45.3
  • tqdm ==4.67.1
  • uvicorn ==0.34.0
  • uvloop ==0.21.0