infra
Instructions for system administrators to deploy the eWaterCycle platform
- Instructions for system administrators to deploy the eWaterCycle platform
- Setup of eWaterCycle platform on the SURF Research cloud
- Setup of eWaterCycle platform on a local test VM
- SURF Research cloud catalog item registration
- SURF Research cloud workspace
- Shared data source
- Preparations
- File Server
- Workspace creation with dcache as shared data source
- Workspace creation with samba as shared data source
- Students
- Example notebooks
- Docker images
- AI Disclaimer
This repo contains (codified) instructions for deploying the eWaterCycle platform. The target audience of these instructions is system administrators. For more information on the eWaterCycle platform (and how to deploy it) see the eWaterCycle documentation.
When grading is used, the setup is limited to one class and one grader.
For instructions on how to use the machine as deployed by this repo see the User guide.
These instructions assume you have some basic knowledge of Vagrant and Ansible.
Setup of eWaterCycle platform on the SURF Research cloud
The hardware environment used by the eWaterCycle platform development team is the SURF Research Cloud. Starting a machine on the SURF Research Cloud requires that you have a research budget with SURF; for more information see the SURF website. Once running, access to the machine can be shared with anyone.
The setup instructions in this repo will create an eWaterCycle application (a sort of VM template) that, when started, will create a machine with:
- JupyterHub: to interactively generate forcings and perform experiments on hydrological models using the eWaterCycle Python package
- nbgrader for grading
- nbgitpuller to open a cloned git repository in JupyterLab from a URL
- ERA5 and ERA-Interim global climate data, which can be used to generate forcings
- Installed models and their example parameter sets
An application on the SURF Research cloud is provisioned by running an Ansible playbook (research-cloud-plugin.yml).
In addition to the standard VM storage, additional read-only datasets are mounted at /data/shared from a file server like a samba server or a dcache server. They may contain things like:
- climate data, see https://ewatercycle.readthedocs.io/en/latest/system_setup.html#download-climate-data
- observation
- parameter-sets
- singularity-images of hydrological models wrapped in grpc4bmi servers
See File server chapter for more information on the file server.
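Once a workspace is running, it can be useful to confirm that the shared data really is available under /data/shared. The snippet below is a minimal sketch of such a check, not part of this repo. The subdirectory names in `EXPECTED_SUBDIRS` are hypothetical and only loosely follow the list above; adjust them to the actual layout of your file server.

```python
from pathlib import Path

# Hypothetical directory names, loosely based on the list above;
# adjust them to the actual layout of your file server.
EXPECTED_SUBDIRS = ("climate-data", "observation", "parameter-sets", "singularity-images")


def missing_shared_dirs(root="/data/shared", expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are absent under root."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_shared_dirs()
    if missing:
        print("Missing under /data/shared:", ", ".join(missing))
    else:
        print("All expected shared data directories are present")
```

An empty result only shows that the directories exist; it does not verify their contents.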
Previously the eWaterCycle platform consisted of multiple VMs on the SURF HPC cloud; see the v0.1.2 release for that code.
Setup of eWaterCycle platform on a local test VM
To develop the SURF Research Cloud applications locally you can use the Vagrant instructions.
SURF Research cloud catalog item registration
To register the eWaterCycle platform on the SURF Research cloud, follow the instructions in the SURF Research cloud developer document.
SURF Research cloud workspace
This chapter is intended for application deployers. A workspace is the name for a Virtual Machine (VM) on the SURF Research cloud. The workspace is created with the eWaterCycle application from the catalog.
Shared data source
The eWaterCycle system setup requires a lot of data files.
Two eWaterCycle catalog items have been created:
1. eWaterCycle dcache, uses dcache as shared data source. High capacity, but high latency storage accessible via WebDAV from anywhere on the Internet. Useful for research.
2. eWaterCycle samba, uses samba as shared data source. A low capacity, low latency file server that is only accessible from the private network of the SURF Research cloud. Useful for teaching.
The shared data is mounted read-only at /data/shared on the workspaces.
In the following chapters you will need to choose which catalog item you want to use.
Depending on the choice, you need to take different steps.
Preparations
Before you can create a workspace, several steps need to be completed.
- Log into SURF Research Cloud
- Make sure you are allowed to use the eWaterCycle catalog item
- Create a new storage item for home directories (no capital letters)
  - To store user files
  - Use a size of 50GB for simple experiments, or bigger when the experiment requires it.
  - As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
- If the shared data source is dcache, create a new storage item for the dcache cache
  - To store files from dCache cached by rclone
  - Use a size of 50GB
  - As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
- If the shared data source is samba, create a new storage item for data
  - To store training material like parameter sets, ready-to-use forcings, raw forcings and apptainer sif files for models.
  - This storage item will be used later by the Samba file server.
- If shared data source is samba then create a private network: under "Workspaces" "Storage" "IP addresses (advanced)" "Networks (advanced)"
  - Name: `file-storage-network`
- On the https://portal.live.surfresearchcloud.nl/profile page, in Collaborative organizations
  - Create a secret named `samba_password` with a strong random password as value
  - Create a secret named `dcache_ro_token` with a dcache read-only token as value
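One way to obtain a strong random value for a secret such as the Samba password is Python's standard `secrets` module. This is only a sketch; the function name `generate_secret_value` is made up for illustration, and any other trustworthy password generator works just as well.

```python
import secrets


def generate_secret_value(nbytes=32):
    """Return a URL-safe random string built from nbytes of randomness."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Copy the printed value into the secret's value field in the portal.
    print(generate_secret_value())
```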
To become root on a VM, the user needs to be a member of the `src_co_admin` group on SRAM.
See docs.
File Server
If you want to create an eWaterCycle machine (aka workspace) that uses a Samba file server (aka shared data source is samba), you need to create a Samba file server first.
Each collaborative organization should run a single file server. This file server will be used to store read-only shared data. The file server should be created with the following steps:
- Create a new workspace
- Select `Samba Server` application
- Select size with 2 cores and 16 GB RAM for a big data storage, otherwise 1 core is enough.
- Select data storage item, created in previous section
- Select private network
- Wait for machine to be running
- Login to machine with ssh
  - Become root with `sudo -i`
  - Edit /etc/samba/smb.conf and in `[samba-share]` section replace `read only = no` with `read only = yes`
  - Restart samba server with `systemctl restart smbd`
- Populate `/data/volume_2/samba-share/` directory with training material. This directory will be shared with other machines.
- (Optional) do this on the samba server after the git clone!
- For ENVM1502 you want to change the main yml file
- And also change the task yml file to only keep the "Download example parameter sets" and "Download example forcing."
See data documentation on how to populate the file server.
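The smb.conf edit described above can also be scripted. The following is a minimal sketch, assuming the share section is named `[samba-share]` and contains a plain `read only = no` line as described; the helper name `make_share_read_only` is hypothetical. It works on the file contents as a string, so reading and writing /etc/samba/smb.conf (as root) is left to the caller.

```python
import re


def make_share_read_only(conf_text, section="samba-share"):
    """Set 'read only = yes' inside the given section of smb.conf text."""
    out = []
    in_section = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section header starts; track whether it is the target one.
            in_section = stripped[1:-1].strip() == section
        elif in_section and re.fullmatch(r"read only\s*=\s*no", stripped, flags=re.IGNORECASE):
            # Keep the original indentation, replace only the setting itself.
            line = line.replace(stripped, "read only = yes")
        out.append(line)
    return "\n".join(out) + "\n"
```

Settings in other sections are left untouched. The Samba server still needs to be restarted afterwards, as in the steps above.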
Workspace creation with dcache as shared data source
Steps to create an eWaterCycle workspace:
- Create a new workspace
- Select collaborative organisation (CO), for example `ewatercycle-tudelft`
- Select `eWaterCycle dcache` catalog item
- Select size of VM (cpus/memory) based on use case
- Select storage item for home directories. Remember the item you picked as you will need it in the workspace parameters.
- Select storage item for dcache cache. Remember the item you picked as you will need it in the workspace parameters.
- Fill in all the workspace parameters. They should look something like
  If you are not interested in grading then the following parameters can be left unchanged: 'Course repository', 'Course version', 'Grader user' and 'Students'.
- Wait for machine to be running
- Visit URL/IP
- When done delete machine
End users should be invited to the collaborative organization in SRAM or created as students so they can log in.
See the User guide for what users have to do to log in or use the GitHub repository.
Workspace creation with samba as shared data source
Steps to create an eWaterCycle workspace:
- Create a new workspace
- Select collaborative organisation (CO), for example `ewatercycle-tudelft`
- Select `eWaterCycle samba` catalog item
- Select size of VM (cpus/memory) based on use case
- Select home storage item. Remember the items you picked as you will need them in the workspace parameters.
- Select the private network
- Fill in all the workspace parameters. They should look something like
  If you are not interested in grading then the following parameters can be left unchanged: 'Course repository', 'Course version', 'Grader user' and 'Students'.
- Wait for machine to be running
- Visit URL/IP
- When done delete machine
End users should be invited to the collaborative organization in SRAM or created as students so they can log in.
See the User guide for what users have to do to log in or use the GitHub repository.
Students
During creation you can set the students parameter to create local POSIX accounts for students.
The format of the parameter value is `<username1>:<password1>,<username2>:<password2>`.
Use an empty string for no students.
Make sure to use strong passwords as anyone on the internet can access the machine.
You can use the python script createstudentpasswords.py to generate passwords. To use it, create a file "usernames.txt" with one username on each line. Then call the script to generate passwords. They will be stored in a new file called students.txt. See docs in script for more details. The passwords generated by the script should be distributed to the students.
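To illustrate the parameter format, the sketch below builds a students value from a list of usernames. It is not the script shipped in this repo; the function name `make_students_parameter` and the password length are assumptions made for this example. Passwords are drawn from letters and digits only, so they can never contain the `:` and `,` separators.

```python
import secrets
import string

# Letters and digits only, so passwords never contain ':' or ','.
ALPHABET = string.ascii_letters + string.digits


def make_students_parameter(usernames, length=16):
    """Build '<username1>:<password1>,<username2>:<password2>' with random passwords."""
    pairs = []
    for name in usernames:
        password = "".join(secrets.choice(ALPHABET) for _ in range(length))
        pairs.append(f"{name}:{password}")
    return ",".join(pairs)


if __name__ == "__main__":
    print(make_students_parameter(["student1", "student2"]))
```

An empty list of usernames yields an empty string, which matches the value to use when there are no students.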
Example notebooks
To get example notebooks, end users should go to the machine's homepage and click one of the notebook links.
These links use nbgitpuller to sync a git repo and open a notebook in it.
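For reference, an nbgitpuller link is a JupyterHub URL with the repository, branch and target path passed as query parameters. The sketch below builds such a link. The hub address, repository and notebook name are placeholders, and `hub_url` is assumed to already include any base URL prefix your JupyterHub is served under.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url, repo_url, branch, notebook_path):
    """Build a link that pulls repo_url and opens notebook_path in JupyterLab."""
    # nbgitpuller clones into a directory named after the repository.
    repo_dir = repo_url.rstrip("/").split("/")[-1]
    if repo_dir.endswith(".git"):
        repo_dir = repo_dir[:-4]
    query = urlencode({
        "repo": repo_url,
        "branch": branch,
        "urlpath": f"lab/tree/{repo_dir}/{notebook_path}",
    })
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


if __name__ == "__main__":
    # Placeholder values; replace with your workspace URL and repository.
    print(nbgitpuller_link(
        "https://hub.example.org",
        "https://github.com/example/notebooks",
        "main",
        "intro.ipynb",
    ))
```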
Restrict memory and cpu usage of JupyterHub
To restrict the memory and cpu usage of each Jupyter user, you can edit the /etc/jupyterhub/jupyterhub_config.py file on the workspace. Add the following lines to the file:
```python
# Each user can use at most 4G of memory and 1 CPU
c.SystemdSpawner.mem_limit = '4G'
c.SystemdSpawner.cpu_limit = 1.0
```

See JupyterHub Systemdspawner docs for more information.

Reload configuration with `sudo systemctl restart jupyterhub`.
By default each user can use all the memory and cpu of the machine.
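When picking limits, a rough back-of-the-envelope check is how many users could run at full load at the same time. The helper below is a simple sketch with a made-up name, `max_concurrent_users`. It treats the limits as if every user consumed them fully, which is a worst case; the limits are caps, not reservations.

```python
def max_concurrent_users(total_mem_gb, total_cpus, mem_limit_gb=4, cpu_limit=1.0):
    """Number of users that fit if every user fully uses both limits."""
    by_memory = total_mem_gb // mem_limit_gb
    by_cpu = total_cpus // cpu_limit
    return int(min(by_memory, by_cpu))


if __name__ == "__main__":
    # Example: a VM with 16 GB of memory and 8 cores, using the limits above.
    print(max_concurrent_users(16, 8))
```

In this example memory is the bottleneck: 16 GB divided by 4 GB per user allows 4 users, even though the 8 cores would allow 8.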
Docker images
In the eWaterCycle project we make Docker images. The images are hosted on Docker Hub and GitHub Container Registry. Project members can create an issue here to request permission to push images to Docker Hub or GitHub Container Registry.
AI Disclaimer
The documentation/software code in this repository has been generated and/or refined using GitHub Copilot. All AI output has been verified for correctness, accuracy and completeness, adapted where needed, and approved by the author.
Citation (CITATION.cff)
# YAML 1.2
---
authors:
  -
    affiliation: "Netherlands eScience Center"
    family-names: Verhoeven
    given-names: Stefan
    orcid: "https://orcid.org/0000-0002-5821-2060"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Drost
    given-names: Niels
    orcid: "https://orcid.org/0000-0001-9795-7981"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Weel
    given-names: Berend
    orcid: "https://orcid.org/0000-0002-9693-9332"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Kalverla
    given-names: Peter
    orcid: "https://orcid.org/0000-0002-5025-7862"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Alidoost
    given-names: Fakhereh
    orcid: "https://orcid.org/0000-0001-8407-6472"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Andela
    given-names: Bouwe
    orcid: "https://orcid.org/0000-0001-9005-8940"
  -
    affiliation: "Delft University of Technology"
    family-names: Melotto
    given-names: Mark
    orcid: "https://orcid.org/0009-0005-2727-660X"
    email: "markmelotto@tudelft.nl"
  -
cff-version: "1.2.0"
identifiers:
  - type: doi
    value: 10.5281/zenodo.1462548
    description: This the identifier used to uniquely identify the software as a concept (i.e., version-agnostic).
license: Apache-2.0
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/eWaterCycle/infra"
title: "eWaterCycle infra"