yascheduler

Yet another cloud computing scheduler for high-throughput scientific simulations

https://github.com/tilde-lab/yascheduler

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.7%) to scientific vocabulary

Keywords

ab-initio azure azure-api-management azure-cloud-services hetzner hetzner-api hetzner-cloud materials-informatics materials-science python queues scheduler upscale
Last synced: 6 months ago

Repository

Yet another cloud computing scheduler for high-throughput scientific simulations

Basic Info
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 4
  • Open Issues: 20
  • Releases: 5
Topics
ab-initio azure azure-api-management azure-cloud-services hetzner hetzner-api hetzner-cloud materials-informatics materials-science python queues scheduler upscale
Created over 6 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog License Citation

README.md

Yet another computing scheduler & cloud orchestration engine


Yascheduler is a simple job scheduler designed for submitting scientific calculations and copying back the results from the computing clouds.

Currently it supports several scientific simulation codes in chemistry and solid-state physics. Any other scientific simulation code can be supported via the declarative control template system (see the yascheduler.conf settings file). An example dummy C++ code with its configuration template is provided.

Installation

Use pip and PyPI: pip install yascheduler.

By default, no cloud connectors are installed. To install the appropriate connector, use one of the commands:

  • for Microsoft Azure: pip install yascheduler[azure]
  • for Hetzner Cloud: pip install yascheduler[hetzner]
  • for UpCloud: pip install yascheduler[upcloud]

The latest updates and bugfixes can be obtained by cloning the repository:

```sh
git clone https://github.com/tilde-lab/yascheduler.git
pip install yascheduler/
```

The installation procedure creates the configuration file located at /etc/yascheduler/yascheduler.conf. The file contains the credentials for Postgres database access, the directories used, the cloud providers, and the scientific simulation codes (called engines). Please check and amend this file with the correct credentials. The database and the system service should then be initialized with the yainit script.
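
As a quick orientation, a typical post-install sequence is sketched below; the editor command is just an illustration, while the config path and the yainit script are those described above.

```sh
# Review and fill in the generated configuration (any editor will do)
nano /etc/yascheduler/yascheduler.conf

# Initialize the Postgres database and the system service
yainit
```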

Usage

```python
from yascheduler import Yascheduler

yac = Yascheduler()
label = "test assignment"
engine = "pcrystal"
struct_input = str(...)  # simulation control file: crystal structure
setup_input = str(...)  # simulation control file: main setup, can include struct_input
result = yac.queue_submit_task(
    label, {"fort.34": struct_input, "INPUT": setup_input}, engine
)
print(result)
```

Alternatively, run it directly in the console with yascheduler (use the -l DEBUG option to change the log level).

Supervisor config reads, e.g.:

```ini
[program:scheduler]
command=/usr/local/bin/yascheduler
user=root
autostart=true
autorestart=true
stderr_logfile=/data/yascheduler.log
stdout_logfile=/data/yascheduler.log
```

File paths can be set using the environment variables:

  • YASCHEDULER_CONF_PATH

Configuration file.

Default: /etc/yascheduler/yascheduler.conf

  • YASCHEDULER_LOG_PATH

Log file path.

Default: /var/log/yascheduler.log

  • YASCHEDULER_PID_PATH

PID file.

Default: /var/run/yascheduler.pid
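
As a rough illustration, these variables can be exported before launching the scheduler; the paths below are placeholders for a custom deployment, not the defaults.

```sh
# Hypothetical non-default locations; adjust to your deployment
export YASCHEDULER_CONF_PATH=/opt/yascheduler/yascheduler.conf
export YASCHEDULER_LOG_PATH=/var/log/yascheduler/daemon.log
export YASCHEDULER_PID_PATH=/run/yascheduler.pid

yascheduler -l DEBUG
```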

Configuration File Reference

Database Configuration [db]

Connection to a PostgreSQL database.

  • user

The username to connect to the PostgreSQL server with.

  • password

The user password to connect to the server with. This parameter is optional.

  • host

The hostname of the PostgreSQL server to connect with.

  • port

The TCP/IP port of the PostgreSQL server instance.

Default: 5432

  • database

The name of the database instance to connect with.

Default: Same as user
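
A sketch of the [db] section, assuming a local PostgreSQL instance; the credentials are placeholders.

```ini
[db]
user = yascheduler
password = changeme
host = localhost
port = 5432
; database falls back to the user name if omitted
database = yascheduler
```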

Local Settings [local]

  • data_dir

Path to root directory of local data files. Can be relative to the current working directory.

Default: ./data (but it's always a good idea to set it explicitly!)

Example: /srv/yadata

  • tasks_dir

Path to the directory with task results.

Default: tasks under data_dir

Example: %(data_dir)s/tasks

  • keys_dir

Path to directory with SSH keys. Make sure it only contains the private keys.

Default: keys under data_dir

Example: %(data_dir)s/keys

  • engines_dir

Path to directory with engines repository.

Default: engines under data_dir

Example: %(data_dir)s/engines

  • webhook_reqs_limit

Maximum number of in-flight webhook HTTP requests.

Default: 5

  • conn_machine_limit

Maximum number of concurrent SSH connection attempts.

Default: 10

  • conn_machine_pending

Maximum number of pending SSH connection attempts.

Default: 10

  • allocate_limit

Maximum number of concurrent task or node allocation requests.

Default: 20

  • allocate_pending

Maximum number of pending task or node allocation requests.

Default: 1

  • consume_limit

Maximum number of concurrent downloads of task results.

Default: 20

  • consume_pending

Maximum number of pending downloads of task results.

Default: 1

  • deallocate_limit

Maximum number of concurrent node deallocation requests.

Default: 5

  • deallocate_pending

Maximum number of pending node deallocation requests.

Default: 1
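
An illustrative [local] section; data_dir is the value most commonly changed, and the two limits shown simply restate their defaults.

```ini
[local]
data_dir = /srv/yadata
tasks_dir = %(data_dir)s/tasks
keys_dir = %(data_dir)s/keys
engines_dir = %(data_dir)s/engines
; concurrency limits keep their defaults unless overridden
allocate_limit = 20
consume_limit = 20
```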

Remote Settings [remote]

  • data_dir

Path to root directory of data files on remote node. Can be relative to the remote current working directory (usually $HOME).

Default: ./data

Example: /src/yadata

  • tasks_dir

Path to the directory with task results on the remote node.

Default: tasks under data_dir

Example: %(data_dir)s/tasks

  • engines_dir

Path to directory with engines on remote node.

Default: engines under data_dir

Example: %(data_dir)s/engines

  • user

Default SSH username.

Default: root

  • jump_user

Username of default SSH jump host (if used).

  • jump_host

Host of default SSH jump host (if used).
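
A sketch of the [remote] section; the jump host values are placeholders for an optional bastion setup.

```ini
[remote]
data_dir = ./data
user = root
; optional SSH jump host; omit these two lines if not used
jump_user = jump
jump_host = bastion.example.org
```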

Providers [clouds]

All cloud provider settings are set in the [clouds] group. Each provider has its own settings prefix; combined example blocks are sketched after each provider subsection below.

These settings are common to all the providers:

  • *_max_nodes

The maximum number of nodes for a given provider. The provider is not used if the value is less than 1.

  • *_user

Per provider override of remote.user.

  • *_priority

Per-provider priority of node allocation. Providers are sorted in descending order, so the cloud with the highest value is used first.

  • *_idle_tolerance

Per provider idle tolerance (in seconds) for deallocation of nodes.

Default: different for providers, starting from 120 seconds.

  • *_jump_user

Username of this cloud SSH jump host (if used).

  • *_jump_host

Host of this cloud SSH jump host (if used).

Hetzner

Settings prefix is hetzner.

  • hetzner_token

API token with Read & Write permissions for the project.

  • hetzner_server_type

Server type (size).

Default: cx52

  • hetzner_location

Location name.

  • hetzner_image_name

Image name for new nodes.

Default: debian-11
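
A sketch of a Hetzner setup in the [clouds] group, combining the common prefixed settings above with the Hetzner-specific keys; the token and location values are placeholders.

```ini
[clouds]
hetzner_max_nodes = 3
hetzner_user = root
hetzner_priority = 10
hetzner_idle_tolerance = 120
hetzner_token = PASTE-YOUR-API-TOKEN-HERE
hetzner_server_type = cx52
hetzner_location = fsn1
hetzner_image_name = debian-11
```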

Azure

Azure Cloud should be pre-configured for yascheduler. See Cloud Providers.

Settings prefix is az.

  • az_tenant_id

Tenant ID of Azure Active Directory.

  • az_client_id

Application ID.

  • az_client_secret

Client Secret value from the Application Registration.

  • az_subscription_id

Subscription ID.

  • az_resource_group

Resource Group name.

Default: yascheduler-rg

  • az_user

SSH username. root is not supported.

  • az_location

Default location for resources.

Default: westeurope

  • az_vnet

Virtual network name.

Default: yascheduler-vnet

  • az_subnet

Subnet name.

Default: yascheduler-subnet

  • az_nsg

Network security group name.

Default: yascheduler-nsg

  • az_vm_image

OS image name.

Default: Debian

  • az_vm_size

Machine size.

Default: Standard_B1s
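
An illustrative Azure block in the same [clouds] group; the IDs and secret are placeholders to be taken from the Application Registration, and the remaining values restate the defaults listed above.

```ini
az_max_nodes = 2
az_tenant_id = 00000000-0000-0000-0000-000000000000
az_client_id = 00000000-0000-0000-0000-000000000000
az_client_secret = PASTE-CLIENT-SECRET-HERE
az_subscription_id = 00000000-0000-0000-0000-000000000000
az_resource_group = yascheduler-rg
az_user = yascheduler
az_location = westeurope
az_vm_size = Standard_B1s
```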

UpCloud

Settings prefix is upcloud.

  • upcloud_login

Username.

  • upcloud_password

Password.
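
And a minimal UpCloud block, again with placeholder credentials:

```ini
upcloud_max_nodes = 1
upcloud_login = my-upcloud-user
upcloud_password = changeme
```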

Engines [engine.*]

Supported engines should be defined in section(s) named [engine.name]. The name is an alphanumeric string representing the real engine name; once set, it cannot be changed later. A complete example section is sketched after the key reference below.

  • platforms

List of supported platforms, separated by space or newline.

Default: debian-10

Example: mY-cOoL-OS another-cool-os

  • platform_packages

A list of required packages, separated by space or newline, which will be installed by the system package manager.

Default: []

Example: openmpi-bin wget

  • deploy_local_files

A list of filenames, separated by space or newline, which will be copied from local %(engines_dir)s/%(engine_name)s to remote %(engines_dir)s/%(engine_name)s. Conflicts with deploy_local_archive and deploy_remote_archive.

Example: dummyengine

  • deploy_local_archive

The name of the local archive (.tar.gz) which will be copied from local %(engines_dir)s/%(engine_name)s to the remote machine and then unarchived to the remote %(engines_dir)s/%(engine_name)s. Conflicts with deploy_local_files and deploy_remote_archive.

Example: dummyengine.tar.gz

  • deploy_remote_archive

The URL of the engine archive (.tar.gz) which will be downloaded to the remote machine and then unarchived to %(engines_dir)s/%(engine_name)s. Conflicts with deploy_local_files and deploy_local_archive.

Example: https://example.org/dummyengine.tar.gz

  • spawn

This command is used by the scheduler to initiate calculations.

```sh
cp {task_path}/INPUT OUTPUT && mpirun -np {ncpus} --allow-run-as-root \
    -wd {task_path} {engine_path}/Pcrystal >> OUTPUT 2>&1
```

Example: {engine_path}/gulp < INPUT > OUTPUT

  • check_pname

Process name used to check that the task is still running. Conflicts with check_cmd.

Example: dummyengine

  • check_cmd

Command used to check that the task is still running. Conflicts with check_pname. See also check_cmd_code.

Example: ps ax -ocomm= | grep -q dummyengine

  • check_cmd_code

Expected exit code of the command from check_cmd. If the code matches, the task is considered to be running.

Default: 0

  • sleep_interval

Interval in seconds between task checks. Set it to a higher value if you are expecting long-running jobs.

Default: 10

  • input_files

A list of task input file names, separated by space or newline, that will be copied to the remote directory of the task before it is started. The first input is considered the main one.

Example: INPUT sibling.file

  • output_files

A list of task output file names, separated by space or newline, that will be copied from the remote directory of the task after it is finished.

Example: INPUT OUTPUT
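
Putting the keys together, a hypothetical [engine.dummy] section for the dummy engine mentioned above could read as follows; the file names and the spawn command are illustrative rather than taken from the shipped template.

```ini
[engine.dummy]
platforms = debian-11
platform_packages = openmpi-bin
deploy_local_files = dummyengine
spawn = {engine_path}/dummyengine {task_path}/INPUT > OUTPUT 2>&1
check_pname = dummyengine
sleep_interval = 10
input_files = INPUT sibling.file
output_files = INPUT OUTPUT
```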

AiiDA Integration

See the detailed instructions for the MPDS-AiiDA-CRYSTAL workflows as well as the ansible-mpds repository. In essence:

```sh
ssh aiidauser@localhost # important
reentry scan
verdi computer setup
verdi computer test $COMPUTER
verdi code setup
```

License

FOSSA Status (badge)

Owner

  • Name: Tilde
  • Login: tilde-lab
  • Kind: organization
  • Email: support@tilde.pro
  • Location: The Internet

Tilde Materials Informatics Virtual Lab

Citation (CITATION.cff)

cff-version: 1.2.0
title: yascheduler
type: software
license: MIT
authors:
  - given-names: Sergei
    family-names: Korolev
    orcid: 'https://orcid.org/0009-0003-0771-206X'
  - given-names: Andrey
    family-names: Sobolev
    orcid: 'https://orcid.org/0000-0001-5086-6601'
  - given-names: Evgeny
    family-names: Blokhin
    orcid: 'https://orcid.org/0000-0002-5333-3947'
doi: 10.5281/zenodo.7693555
url: 'https://github.com/tilde-lab/yascheduler'
repository-artifact: 'https://pypi.org/project/yascheduler'
keywords:
  - scheduler
  - materials science
  - ab initio
  - materials informatics
  - azure cloud
  - upscale cloud
  - hetzner cloud

GitHub Events

Total
  • Create event: 7
  • Release event: 3
  • Issues event: 6
  • Delete event: 2
  • Issue comment event: 12
  • Push event: 14
  • Pull request review comment event: 5
  • Pull request review event: 15
  • Pull request event: 17
Last Year
  • Create event: 7
  • Release event: 3
  • Issues event: 6
  • Delete event: 2
  • Issue comment event: 12
  • Push event: 14
  • Pull request review comment event: 5
  • Pull request review event: 15
  • Pull request event: 17

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 331
  • Total Committers: 7
  • Avg Commits per committer: 47.286
  • Development Distribution Score (DDS): 0.505
Past Year
  • Commits: 26
  • Committers: 4
  • Avg Commits per committer: 6.5
  • Development Distribution Score (DDS): 0.385
Top Committers
Name Email Commits
Sergey Korolev k****g@g****m 164
Evgeny Blokhin eb@t****o 142
Andrey Sobolev a****v@g****m 14
github-actions[bot] g****] 5
Anton Domnin a****n@g****m 4
whitesource-bolt-for-github[bot] 4****] 1
fossabot b****s@f****o 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 49
  • Total pull requests: 91
  • Average time to close issues: 6 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 5
  • Total pull request authors: 4
  • Average comments per issue: 0.96
  • Average comments per pull request: 0.69
  • Merged pull requests: 89
  • Bot issues: 6
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 16
  • Average time to close issues: N/A
  • Average time to close pull requests: 2 days
  • Issue authors: 1
  • Pull request authors: 3
  • Average comments per issue: 0.0
  • Average comments per pull request: 1.06
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • blokhin (38)
  • mend-bolt-for-github[bot] (6)
  • alinzh (2)
  • knopki (1)
  • akvatol (1)
Pull Request Authors
  • knopki (73)
  • blokhin (15)
  • akvatol (6)
  • fossabot (1)
Top Labels
Issue Labels
enhancement (20) bug (12) documentation (7) help wanted (7) security vulnerability (5) question (4) invalid (3) aiida (3) wontfix (3) Mend: dependency security vulnerability (1)
Pull Request Labels
enhancement (5) aiida (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 317 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 9
  • Total maintainers: 3
pypi.org: yascheduler

Yet another computing scheduler and cloud orchestration engine

  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 317 Last month
Rankings
Downloads: 6.7%
Dependent packages count: 7.5%
Average: 14.9%
Forks count: 17.1%
Stargazers count: 20.5%
Dependent repos count: 22.5%
Maintainers (3)
Last synced: 7 months ago

Dependencies

.github/workflows/linter.yml actions
  • actions/cache v3 composite
  • actions/checkout v2 composite
  • actions/setup-python v4 composite
  • mfinelli/setup-shfmt v1 composite
.github/workflows/pr.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/push.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • commitizen-tools/commitizen-action e41bf7f2029bc8175af362badd6fd0860a329b0f composite
  • softprops/action-gh-release de2c0eb89ae2a093876385947365aca7b0e5f844 composite
.github/workflows/release.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
pyproject.toml pypi
  • aiohttp ~=3.8
  • asyncssh ~=2.11
  • asyncstdlib ~=3.10
  • attrs ~=21.0
  • azure-identity ~=1.10.0
  • azure-mgmt-compute ~=27.2.0
  • azure-mgmt-network ~=20.0.0
  • backoff ~=2.1.2
  • hcloud ~=1.17
  • pg8000 ~=1.19
  • python-daemon ~=2.3
  • typing-extensions >= 4.2.0; python_version < '3.11'
  • upcloud_api ~=2.0