Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: gwdg
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 626 KB
Statistics
  • Stars: 6
  • Watchers: 6
  • Forks: 2
  • Open Issues: 1
  • Releases: 1
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Changelog License Citation

README.md

SAIA - the Scalable AI Accelerator - HPC

Documentation | Paper

This repository contains the HPC-side components of the Scalable AI Accelerator (SAIA), which hosts AI services such as Chat AI. The remaining components of the complete Chat AI architecture live in two other repos:

  • Stand-alone web interface: https://github.com/gwdg/chat-ai
  • Server components, incl. API gateway and SSH proxy: https://github.com/gwdg/saia-hub

Chat AI architecture

Together, these repos provide the entire underlying mechanism for Chat AI, which can be generalized as a Slurm-native HPC web service.

SAIA HPC

This repository contains the tools and scripts to deploy a consistent and scalable service on a Slurm-based HPC center with the ability to integrate with the web server provided in SAIA Hub.

SSH-based proxy

In a typical HPC cluster setting, the high-performance compute nodes that are capable of running Large Language Models (LLMs) may not be directly accessible from the internet. In these circumstances, the requests from the web server would have to go through an entry point to the cluster, for example a login node or service node. Furthermore, direct tunneling and port forwarding may be forbidden as a security mechanism, and only certain protocols such as SSH may be allowed.

Therefore, the HPC proxy runs on the cloud server and uses an SSH key to establish a connection to the cluster's entry point, i.e. the login/service node. For security reasons, the SSH key hosted on the cloud server is restricted to always run a single script on the login node, namely cloud_interface.sh, and is never given a shell instance. This prevents direct access to the cluster even if the web server is compromised. The restriction is implemented with the command="..." option for this specific key (the per-key equivalent of sshd's ForceCommand directive), which can be set in the ~/.ssh/authorized_keys file of an HPC user or functional account without root access, like this:

```bash
command="/path/to/cloud_interface.sh",no-port-forwarding,no-X11-forwarding ssh-rsa <public_key>
```
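From the cloud side, each request then amounts to invoking ssh with the restricted key and passing the payload on stdin; the remote ForceCommand ignores whatever command is given and always runs cloud_interface.sh. The sketch below illustrates this pattern; the host, key path, and payload format are assumptions, not the actual saia-hpc interface.

```python
import subprocess

def build_ssh_command(host, key_path):
    """Assemble the ssh argv for the restricted channel.

    BatchMode prevents interactive password prompts; the remote side's
    per-key command="..." restriction runs cloud_interface.sh regardless
    of any command we might pass here.
    """
    return [
        "ssh", "-i", key_path,
        "-o", "BatchMode=yes",
        "-o", "StrictHostKeyChecking=accept-new",
        host,
    ]

def query_cluster(host, key_path, payload, timeout=30):
    """Deliver one request over SSH and return the script's stdout."""
    result = subprocess.run(
        build_ssh_command(host, key_path),
        input=payload, capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout
```

Because the key can only ever trigger cloud_interface.sh, a compromised cloud server gains no more capability than this single entry point.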

Scheduler

The task of the scheduler script scheduler.py is to run reliable and scalable HPC services on a Slurm-based HPC cluster. A configuration for the desired services should be set in config.json, and once the scheduler is initialized with python scheduler.py init, everything else is done automatically.
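As a rough illustration of what such a configuration might contain, the sketch below shows a hypothetical per-service layout and a loader; the actual schema of config.json in saia-hpc may differ.

```python
import json

# Hypothetical config.json layout -- field names are illustrative only.
EXAMPLE_CONFIG = {
    "services": {
        "chat-llama": {
            "sbatch_script": "sbatch/llama.sh",  # Slurm job template for this service
            "min_jobs": 1,   # keep at least one backend job warm
            "max_jobs": 4,   # upper bound when scaling out
        }
    }
}

def load_services(path):
    """Read the scheduler configuration and return the service map."""
    with open(path) as f:
        return json.load(f)["services"]
```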

When an SSH proxy is running and configured to connect to the HPC center, it periodically sends keep-alive prompts to maintain the established SSH connection. Due to the ForceCommand restriction, these prompts actually run cloud_interface.sh, which in turn periodically invokes the scheduler. The scheduler maintains the state of active backend jobs in the services/cluster.services file and ensures that enough jobs are available to handle incoming requests, scaling the number of jobs up and down based on demand. It can also log timestamps of user requests, which can be used for accounting.
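The demand-based scaling decision can be pictured as a simple policy like the one below. This is a sketch only: the per-job request capacity and the clamping rule are assumptions, not the scheduler's actual algorithm.

```python
import math

def target_job_count(pending_requests, min_jobs, max_jobs, requests_per_job=8):
    """Return how many backend jobs should exist for the current demand.

    Assumes each job can absorb ~requests_per_job concurrent requests;
    the result is clamped to the configured [min_jobs, max_jobs] range.
    """
    demand = math.ceil(pending_requests / requests_per_job)
    return max(min_jobs, min(max_jobs, demand))
```

With min_jobs=1 and max_jobs=4, zero pending requests still keeps one warm job alive, while a surge beyond the cap is served by the existing jobs' queues rather than unbounded submissions.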

For each service, some configuration parameters must be provided, most notably a Slurm batch (sbatch) script with which the scheduler can submit Slurm jobs in order to host the service on a high-performance compute node, possibly with GPUs. The submission of Slurm jobs is handled automatically by the scheduler: it preemptively resubmits jobs that are about to expire and randomly assigns a port number to each job, on which the service should listen for incoming requests.
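The port assignment and submission step could look roughly like the following. The port range and the SERVICE_PORT variable name are assumptions for illustration; the actual sbatch templates in sbatch/ define their own interface.

```python
import random
import subprocess

def pick_port(lo=20000, hi=40000):
    """Choose a random port for a new backend job (range is an assumption)."""
    return random.randint(lo, hi)

def submit_service_job(sbatch_script, port):
    """Submit one backend job via sbatch, telling it which port to bind.

    Returns the Slurm job ID parsed from sbatch's
    "Submitted batch job <id>" output.
    """
    out = subprocess.run(
        ["sbatch", f"--export=ALL,SERVICE_PORT={port}", sbatch_script],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.split()[-1])
```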

Services

The implementation of the scheduler and proxy has been abstracted from the service itself, meaning it should be possible to run any REST service within this framework. For the Chat AI service, we simply use vLLM, which can host cutting-edge LLMs with state-of-the-art performance on HPC GPU nodes and provides OpenAI API compatibility out of the box.
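Thanks to that OpenAI API compatibility, a backend job can be queried with any standard client. The stdlib-only sketch below targets vLLM's /v1/chat/completions route; the endpoint URL and model name are placeholders.

```python
import json
from urllib import request

def chat_payload(model, user_message):
    """Build an OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def ask(endpoint, model, message):
    """POST one chat request to a vLLM OpenAI-compatible server."""
    req = request.Request(
        endpoint.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(chat_payload(model, message)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires a running backend, e.g. on the port assigned by the
# scheduler): ask("http://localhost:23456", "<model-name>", "Hello!")
```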

Getting Started

Clone this repository

```bash
git clone https://github.com/gwdg/saia-hpc
```

Create the SSH key on the cloud server, and add a restricted command="..." entry to the authorized_keys file on the HPC cluster following this template:

```bash
command="/path/to/cloud_interface.sh",no-port-forwarding,no-X11-forwarding ssh-rsa <public_key>
```

Initialize the cluster configuration:

  • Replace the parameters in config.json with your custom service setup.
  • Modify the corresponding service scripts in sbatch/ accordingly.
  • Run python scheduler.py init to initialize the scheduler with config.json.

Acknowledgements

We thank all colleagues and partners involved in this project.

Citation

If you use SAIA or Chat AI in your research or services, please cite us as follows:

```bibtex
@misc{doosthosseiniSAIASeamlessSlurmNative2025,
  title         = {{{SAIA}}: {{A Seamless Slurm-Native Solution}} for {{HPC-Based Services}}},
  shorttitle    = {{{SAIA}}},
  author        = {Doosthosseini, Ali and Decker, Jonathan and Nolte, Hendrik and Kunkel, Julian},
  year          = {2025},
  month         = jul,
  publisher     = {Research Square},
  issn          = {2693-5015},
  doi           = {10.21203/rs.3.rs-6648693/v1},
  url           = {https://www.researchsquare.com/article/rs-6648693/v1},
  urldate       = {2025-07-29},
  archiveprefix = {Research Square}
}
```

Owner

  • Name: Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
  • Login: gwdg
  • Kind: organization
  • Email: github@gwdg.de
  • Location: Göttingen

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: SAIA
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Ali
    family-names: Doosthosseini
    email: ali.doosthosseini@uni-goettingen.de
    affiliation: University of Göttingen
    orcid: 'https://orcid.org/0000-0002-0654-1268'
  - given-names: Jonathan
    family-names: Decker
    email: jonathan.decker@uni-goettingen.de
    affiliation: University of Göttingen
    orcid: 'https://orcid.org/0000-0002-7384-7304'
  - given-names: Hendrik
    family-names: Nolte
    email: hendrik.nolte@gwdg.de
    affiliation: GWDG
    orcid: 'https://orcid.org/0000-0003-2138-8510'
  - given-names: Julian
    name-particle: M.
    family-names: Kunkel
    email: julian.kunkel@gwdg.de
    affiliation: GWDG
    orcid: 'https://orcid.org/0000-0002-6915-1179'
identifiers:
  - type: doi
    value: 10.21203/rs.3.rs-6648693/v1
  - type: url
    value: 'https://www.researchsquare.com/article/rs-6648693/v1'
repository-code: 'https://github.com/gwdg/chat-ai'
url: 'https://chat-ai.academiccloud.de'
abstract: >-
  Recent developments indicate a shift toward web services
  that employ ever larger AI models, e.g., Large Language
  Models (LLMs), requiring powerful hardware for inference.
  High-Performance Computing (HPC) systems are commonly
  equipped with such hardware for the purpose of large scale
  computation tasks. However, HPC infrastructure is
  inherently unsuitable for hosting real-time web services
  due to network, security and scheduling constraints. While
  various efforts exist to integrate external scheduling
  solutions, these often require compromises in terms of
  security or usability for existing HPC users. In this
  paper, we present SAIA, a Slurm-native platform consisting
  of a scheduler and a proxy. The scheduler interacts with
  Slurm to ensure the availability and scalability of
  services, while the proxy provides external access, which
  is secured via confined SSH commands. We have demonstrated
  SAIA’s applicability by deploying a large-scale LLM web
  service that has served over 50,000 users.
keywords:
  - AI
  - HPC
  - Slurm
license: GPL-3.0
version: v0.8.1
date-released: '2024-02-22'

GitHub Events

Total
  • Release event: 1
  • Watch event: 6
  • Push event: 3
  • Fork event: 1
  • Create event: 1
Last Year
  • Release event: 1
  • Watch event: 6
  • Push event: 3
  • Fork event: 1
  • Create event: 1