infra

Instructions for system administrators to deploy the eWaterCycle platform

https://github.com/ewatercycle/infra

Keywords

ansible ewatercycle hydrology

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 60% confidence

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: eWaterCycle
License: apache-2.0
Language: Jinja
Default Branch: main
Homepage:
Size: 2.27 MB

Statistics

Stars: 0
Watchers: 5
Forks: 0
Open Issues: 23
Releases: 6

Topics

ansible ewatercycle hydrology

Created over 7 years ago · Last pushed 8 months ago

Metadata Files

Readme License Citation

Instructions for system administrators to deploy the eWaterCycle platform

This repo contains (codified) instructions for deploying the eWaterCycle platform. The target audience of these instructions is system administrators. For more information on the eWaterCycle platform (and how to deploy it) see the eWaterCycle documentation.

The grading setup supports one class with one grader.

For instructions on how to use the machine as deployed by this repo see the User guide.

These instructions assume you have some basic knowledge of Vagrant and Ansible.

Setup of eWaterCycle platform on the SURF Research cloud

The hardware environment used by the eWaterCycle platform development team is the SURF Research Cloud. Starting a machine on the SURF Research Cloud requires that you have a research budget with SURF; for more info see the website of SURF. Once running, access to the machine can be shared with anyone.

The setup instructions in this repo will create an eWaterCycle application (a sort of VM template) that, when started, will create a machine with:

JupyterHub: to interactively generate forcings and perform experiments on hydrological models using the eWaterCycle Python package
- nbgrader for grading
- nbgitpuller to open a cloned git repository in Jupyter Lab from a URL
ERA5 and ERA-Interim global climate data, which can be used to generate forcings
Installed models and their example parameter sets

An application on the SURF Research cloud is provisioned by running an Ansible playbook (research-cloud-plugin.yml).

In addition to the standard VM storage, additional read-only datasets are mounted at /data/shared from a file server like a samba server or a dcache server. They may contain things like:

climate data, see https://ewatercycle.readthedocs.io/en/latest/system_setup.html#download-climate-data
observation
parameter-sets
singularity-images of hydrological models wrapped in grpc4bmi servers

See File server chapter for more information on the file server.

Previously the eWaterCycle platform consisted of multiple VMs on the SURF HPC cloud; see the v0.1.2 release for that code.

Setup of eWaterCycle platform on a local test VM

For developing the SURF Research Cloud applications locally you can use the Vagrant instructions.

SURF Research cloud catalog item registration

To register the eWaterCycle platform on the SURF Research cloud, follow the instructions in the SURF Research cloud developer document.

SURF Research cloud workspace

This chapter is intended for application deployers. A workspace is the name for a Virtual Machine (VM) on the SURF Research cloud. The workspace is created with the eWaterCycle application from the catalog.

Shared data source

The eWaterCycle system setup requires a lot of data files.

Two eWaterCycle catalog items have been created:

1. eWaterCycle dcache, uses dcache as shared data source. High capacity, but high latency storage accessible via WebDAV from anywhere on the Internet. Useful for research.
2. eWaterCycle samba, uses samba as shared data source. A low capacity, low latency file server that is only accessible from the private network of the SURF Research cloud. Useful for teaching.

The shared data is mounted read-only at /data/shared on the workspaces. In the following chapters you will need to choose which catalog item you want to use. Depending on the choice, you need to do certain things.
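Once a workspace is running, a quick check can confirm that the shared data is actually present and read-only. The following is a minimal sketch, not part of this repo: the helper name `describe_shared_mount` is hypothetical, and whether the read-only flag is reported depends on how the file system is mounted (for example, a FUSE mount may behave differently from a Samba mount).

```python
import os


def describe_shared_mount(path: str = "/data/shared") -> dict:
    """Report whether `path` exists, is a mount point, and is mounted read-only.

    Hypothetical helper for illustration; it only inspects the file system.
    """
    info = {"path": path, "exists": os.path.isdir(path),
            "is_mount": False, "read_only": False}
    if info["exists"]:
        info["is_mount"] = os.path.ismount(path)
        # f_flag holds the mount flags; ST_RDONLY is set for read-only mounts.
        flags = os.statvfs(path).f_flag
        info["read_only"] = bool(flags & os.ST_RDONLY)
    return info


if __name__ == "__main__":
    print(describe_shared_mount())
```

Running the script on a workspace prints a small dictionary; a missing directory or `read_only` being `False` suggests the shared data source was not mounted as expected.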

Preparations

Before you can create a workspace, complete the following steps.

Log in to SURF Research Cloud
Make sure you are allowed to use the eWaterCycle catalog item
Create new storage item for home directories (no capital letters)
- To store user files
- Use a size of 50 GB for simple experiments, or larger when the experiment requires it.
- As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
If shared data source is dcache then create new storage item for dcache cache
- To store cached files from dCache by rclone
- Use a size of 50 GB
- As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
If shared data source is samba then create new storage item for data
- To store training material like parameter sets, ready-to-use forcings, raw forcings and apptainer sif files for models.
- This storage item should be used later in the Samba file server.
If shared data source is samba then create a private network: under "Workspaces" "Storage" "IP addresses (advanced)" "Networks (advanced)"
- Name: file-storage-network
On https://portal.live.surfresearchcloud.nl/profile page in Collaborative organizations
- Create a secret named samba_password and a strong random password as value
- Create a secret named dcache_ro_token and a dcache read-only token as value

To become root on a VM, the user needs to be a member of the src_co_admin group on SRAM. See docs.

File Server

If you want to create an eWaterCycle machine (aka workspace) that uses a Samba file server (aka shared data source is samba), you need to create a Samba file server first.

Each collaborative organization should run a single file server. This file server will be used to store read-only shared data. The file server should be created with the following steps:

Create a new workspace
Select Samba Server application
Select a size with 2 cores and 16 GB RAM for large data storage; otherwise 1 core is enough.
Select data storage item, created in previous section
Select private network
Wait for machine to be running
Log in to the machine with ssh
1. Become root with sudo -i
2. Edit /etc/samba/smb.conf and in [samba-share] section replace read only = no with read only = yes
3. Restart samba server with systemctl restart smbd
Populate /data/volume_2/samba-share/ directory with training material. This directory will be shared with other machines.
(Optional) do this on the samba server after the git clone!
- For ENVM1502 you want to change the main yml file
- And also change the task yml file to only keep the "Download example parameter sets" and "Download example forcing."
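The smb.conf edit in the steps above (switching the [samba-share] section to read-only) can also be scripted. The following is a minimal sketch that only transforms the text of a config file; the function name `make_share_read_only` and the sample config are hypothetical, and it assumes the share section already contains a `read only` line. Back up the real file before overwriting it.

```python
import re

# Matches a section header such as "[samba-share]".
SECTION_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")
# Matches a "read only = <value>" line, remembering its indentation.
READ_ONLY_RE = re.compile(r"^(?P<indent>\s*)read only\s*=\s*\S+\s*$", re.IGNORECASE)


def make_share_read_only(conf_text: str, section: str = "samba-share") -> str:
    """Return conf_text with 'read only = yes' set inside the given section only."""
    current = None
    out = []
    for line in conf_text.splitlines():
        header = SECTION_RE.match(line)
        setting = READ_ONLY_RE.match(line)
        if header:
            current = header.group("name").strip()
        elif current == section and setting:
            line = f"{setting.group('indent')}read only = yes"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical sample; the real file lives at /etc/samba/smb.conf.
    sample = (
        "[global]\n"
        "   workgroup = WORKGROUP\n"
        "[samba-share]\n"
        "   path = /data/volume_2/samba-share\n"
        "   read only = no\n"
    )
    print(make_share_read_only(sample))
```

After writing the changed text back as root, the Samba server still needs the restart from step 3.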

See data documentation on how to populate the file server.

Workspace creation with dcache as shared data source

Steps to create an eWaterCycle workspace:

Create a new workspace
Select collaborative organisation (CO) for example ewatercycle-tudelft
Select eWaterCycle dcache catalog item
Select size of VM (cpus/memory) based on use case
Select storage item for home directories. Remember item you picked as you will need it in the workspace parameters.
Select storage item for dcache cache. Remember item you picked as you will need it in the workspace parameters.
Fill in all the workspace parameters. If you are not interested in grading then the following parameters can be left unchanged: 'Course repository', 'Course version', 'Grader user' and 'Students'.
Wait for machine to be running
Visit URL/IP
When done delete machine

End users should be invited to the collaborative organization in SRAM, or created as students, so they can log in.

See the User guide for what users have to do to log in or to use a GitHub repository.

Workspace creation with samba as shared data source

Steps to create an eWaterCycle workspace:

Create a new workspace
Select collaborative organisation (CO) for example ewatercycle-tudelft
Select eWaterCycle samba catalog item
Select size of VM (cpus/memory) based on use case
Select home storage item. Remember items you picked as you will need them in the workspace parameters.
Select the private network
Fill in all the workspace parameters. If you are not interested in grading then the following parameters can be left unchanged: 'Course repository', 'Course version', 'Grader user' and 'Students'.
Wait for machine to be running
Visit URL/IP
When done delete machine

End users should be invited to the collaborative organization in SRAM, or created as students, so they can log in.

See the User guide for what users have to do to log in or to use a GitHub repository.

Students

During creation you can set the students parameter to create local POSIX accounts for students. The format of the parameter value is <username1>:<password1>,<username2>:<password2>. Use an empty string for no students. Make sure to use strong passwords as anyone on the internet can access the machine.

You can use the python script createstudentpasswords.py to generate passwords. To use it, create a file "usernames.txt" with one username on each line. Then call the script to generate passwords. They will be stored in a new file called students.txt. See docs in script for more details. The passwords generated by the script should be distributed to the students.
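To illustrate the parameter format, here is a minimal sketch that generates random passwords and joins them into a students value. It is not the script from this repo: the function names are hypothetical, and the 20-character alphanumeric password is an assumption chosen so that passwords never contain the `:` and `,` delimiters.

```python
import secrets
import string

# Letters and digits only, so passwords cannot contain ':' or ','.
ALPHABET = string.ascii_letters + string.digits


def generate_credentials(usernames, length: int = 20) -> dict:
    """Map each non-empty username to a freshly generated random password."""
    credentials = {}
    for name in usernames:
        name = name.strip()
        if not name:
            continue  # skip blank lines
        if ":" in name or "," in name:
            raise ValueError(f"username {name!r} contains a delimiter")
        credentials[name] = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return credentials


def format_students_parameter(credentials: dict) -> str:
    """Join credentials as <username1>:<password1>,<username2>:<password2>."""
    return ",".join(f"{user}:{password}" for user, password in credentials.items())


if __name__ == "__main__":
    creds = generate_credentials(["alice", "bob"])  # hypothetical usernames
    print(format_students_parameter(creds))
```

An empty list of usernames yields an empty string, which matches the "no students" case described above.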

Example notebooks

To get example notebooks, end users should go to the machine's homepage and click one of the notebook links.

These links use nbgitpuller to sync a git repo and open a notebook in it.
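For deployers who want to add their own links, the sketch below builds an nbgitpuller link with the standard library. The hub URL, repository, and notebook name are hypothetical placeholders, and the sketch assumes JupyterHub is served at the root of the host and that the notebook should open in Jupyter Lab.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url: str, repo: str, branch: str, notebook_path: str) -> str:
    """Build a link that pulls `repo` and opens `notebook_path` in Jupyter Lab."""
    # nbgitpuller clones into a directory named after the repository.
    repo_dir = repo.rstrip("/").split("/")[-1]
    if repo_dir.endswith(".git"):
        repo_dir = repo_dir[:-4]
    query = urlencode({
        "repo": repo,
        "branch": branch,
        "urlpath": f"lab/tree/{repo_dir}/{notebook_path}",
    })
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


if __name__ == "__main__":
    # All values below are placeholders, not real eWaterCycle addresses.
    print(nbgitpuller_link(
        "https://hub.example.org",
        "https://github.com/example/notebooks",
        "main",
        "intro.ipynb",
    ))
```

The `user-redirect` part lets the same link work for every logged-in user, since JupyterHub fills in the user's own server.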

Restrict memory and cpu usage of JupyterHub

To restrict the memory and cpu usage of each Jupyter user, you can edit the /etc/jupyterhub/jupyterhub_config.py file on the workspace. Add the following lines to the file:

```python
# Each user can use at most 4G of memory and 1 CPU
c.SystemdSpawner.mem_limit = '4G'
c.SystemdSpawner.cpu_limit = 1.0
```

See JupyterHub Systemdspawner docs for more information.

Reload configuration with sudo systemctl restart jupyterhub.

By default, each user can use all the memory and CPU of the machine.

Docker images

In the eWaterCycle project we make Docker images. The images are hosted on Docker Hub and GitHub Container Registry. A project member can create issues here for permission to push images to Docker Hub or GitHub Container Registry.

AI Disclaimer

The documentation/software code in this repository has been generated and/or refined using GitHub CoPilot. All AI-output has been verified for correctness, accuracy and completeness, adapted where needed, and approved by the author.

Owner

Name: eWaterCycle
Login: eWaterCycle
Kind: organization

Website: http://www.ewatercycle.org
Repositories: 74
Profile: https://github.com/eWaterCycle

Citation (CITATION.cff)

# YAML 1.2
---
authors:
  -
    affiliation: "Netherlands eScience Center"
    family-names: Verhoeven
    given-names: Stefan
    orcid: "https://orcid.org/0000-0002-5821-2060"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Drost
    given-names: Niels
    orcid: "https://orcid.org/0000-0001-9795-7981"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Weel
    given-names: Berend
    orcid: "https://orcid.org/0000-0002-9693-9332"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Kalverla
    given-names: Peter
    orcid: "https://orcid.org/0000-0002-5025-7862"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Alidoost
    given-names: Fakhereh
    orcid: "https://orcid.org/0000-0001-8407-6472"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Andela
    given-names: Bouwe
    orcid: "https://orcid.org/0000-0001-9005-8940"
  -
    affiliation: "Delft University of Technology"
    family-names: Melotto
    given-names: Mark
    orcid: "https://orcid.org/0009-0005-2727-660X"
    email: "markmelotto@tudelft.nl"
  -

cff-version: "1.2.0"
identifiers:
  - type: doi
    value: 10.5281/zenodo.1462548
    description: This the identifier used to uniquely identify the software as a concept (i.e., version-agnostic).
license: Apache-2.0
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/eWaterCycle/infra"
title: "eWaterCycle infra"

GitHub Events

Total

Issues event: 12
Delete event: 3
Issue comment event: 16
Push event: 17
Pull request event: 10
Pull request review event: 6
Pull request review comment event: 3
Create event: 3

Last Year

Issues event: 12
Delete event: 3
Issue comment event: 16
Push event: 17
Pull request event: 10
Pull request review event: 6
Pull request review comment event: 3
Create event: 3

Committers

Last synced: about 2 years ago

All Time

Total Commits: 533
Total Committers: 9
Avg Commits per committer: 59.222
Development Distribution Score (DDS): 0.126

Past Year

Commits: 43
Committers: 2
Avg Commits per committer: 21.5
Development Distribution Score (DDS): 0.14

Top Committers

Name	Email	Commits
Stefan Verhoeven	s**n@g**m	466
Peter Kalverla	p**a@g**m	23
Berend Weel	b**l@e**l	12
Bouwe Andela	b**a@e**l	9
Bart Schilperoort	b**t@e**l	6
SarahAlidoost	f**t@e**l	6
Niels Drost	n**t@e**l	6
Niels Drost	n**t@e**l	4
ipelupessy	i****y	1

Committer Domains (Top 20 + Academic)

esciencecenter.nl: 5
esciencecente.nl: 1
gmx.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 81
Total pull requests: 43
Average time to close issues: 9 months
Average time to close pull requests: 30 days
Total issue authors: 8
Total pull request authors: 6
Average comments per issue: 1.79
Average comments per pull request: 1.86
Merged pull requests: 40
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 7
Pull requests: 6
Average time to close issues: about 1 month
Average time to close pull requests: 24 days
Issue authors: 4
Pull request authors: 2
Average comments per issue: 0.43
Average comments per pull request: 0.33
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

sverhoeven (64)
Peter9192 (9)
BSchilperoort (6)
SarahAlidoost (5)
RolfHut (2)
MarkMelotto (2)
nielsdrost (1)
bouweandela (1)

Pull Request Authors

sverhoeven (32)
BSchilperoort (5)
MarkMelotto (4)
Peter9192 (3)
SarahAlidoost (1)
nielsdrost (1)

Top Labels

Issue Labels

enhancement (2)
bug (1)

infra

Science Score: 67.0%