sds_env

Spatial Data Science Environment

https://github.com/jreades/sds_env

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary

Keywords

markdown python teaching
Last synced: 6 months ago

Repository

Spatial Data Science Environment

Basic Info
  • Host: GitHub
  • Owner: jreades
  • License: bsd-3-clause
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 69.5 MB
Statistics
  • Stars: 22
  • Watchers: 3
  • Forks: 23
  • Open Issues: 6
  • Releases: 0
Created over 5 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

sds_env: Spatial Data Science Platform

This is a fork of Dani's work (please see below for how to cite it). It removes R, which we don't need for teaching, and adds a few more Python packages that we do use. We've also added some JupyterLab extensions to make interacting with the Lab server a bit easier.

We previously experimented with four approaches to installation: VirtualBox, Vagrant, Docker, and Anaconda Python directly. Each of these has pros and cons, but after careful consideration we have concluded that Docker is the most robust way to ensure a consistent experience: all students end up with the same versions of each library, difficult-to-diagnose hardware/OS issues are minimised, and running/recovery is the most straightforward.

=========================

Most users should really be reading this through the GitHub.io web site!

=========================

To Dos

  1. Work out why the Intel (AMD64) image is so much larger than the Apple Silicon (ARM64) image. I can get a decent report using docker history --format "{{.Size}}\t{{.CreatedBy}}" --no-trunc jreades/sds:2023-intel | grep -e "[G]B" which traces the difference back to just two layers:
    • 6.52GB RUN |2 USERNAME=jovyan TARGETPLATFORM=linux/amd64 /bin/bash -c mamba env update -n base --quiet --file ./${yaml_nm} && conda clean --all --yes --force-pkgs-dirs && find /opt/conda/ -follow -type f -name '*.a' -delete && find /opt/conda/ -follow -type f -name '*.pyc' -delete && find /opt/conda/ -follow -type f -name '*.js.map' -delete && pip cache purge && rm -rf /home/$NB_USER/.cache/pip && rm ./${yaml_nm} # buildkit
    • 5.48GB RUN |2 USERNAME=jovyan TARGETPLATFORM=linux/amd64 /bin/bash -c fix-permissions $CONDA_DIR && fix-permissions $HOME # buildkit
    • My guess is that the second command's effect depends on the effects of the first: the mamba update modifies a lot of files, but they end up with a different/wrong set of permissions from what the fix-permissions script expects, so it then has to modify the permissions on all of them, which almost doubles the size of the image.
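
To compare the two builds, the same report can be wrapped in a small helper and run once per image. This is only a sketch: the function names are made up for illustration, and the only image tag known from the notes above is `jreades/sds:2023-intel`, so the tag of the ARM64 image has to be supplied by whoever runs it.

```shell
# Keep only the layers whose size is reported in GB and shorten each line
# so that two reports can be read side by side.
gb_layers() {
  grep -e "[G]B" | cut -c 1-100
}

# Report the GB-scale layers of one image (needs a local Docker daemon).
report_image() {
  docker history --format "{{.Size}}\t{{.CreatedBy}}" --no-trunc "$1" | gb_layers
}

# Usage:
#   report_image jreades/sds:2023-intel
```

If the guess about permissions is right, the layer created by `fix-permissions` should show up at GB scale in the Intel report only.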

Using UCL JupyterHub

Creating an Environment (Staff)

  1. Start up the UCL VPN.
  2. Connect to JupyterHub
  3. Authenticate using UCL credentials.
  4. Create a new terminal: File > New > Terminal

Incorrect Instructions from ISD

I think that these instructions are not correct (see below for the alternative), in the sense that the use of a symlink can cause problems and duplicated environments down the line:

```shell
course_name="casa0013"

ln -s /shared/.../casa/${course_name} $HOME/${course_name}

conda config --add envs_dirs /shared/groups/.../casa/${course_name}/envs

curl -o /tmp/casa0013.yml https://raw.githubusercontent.com/jreades/sds_env/master/conda/environment_py.yml

conda env create -n casa0013 -f /tmp/casa0013.yml
```
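
One way to check whether a symlink has led to the same directory being registered twice is to resolve every configured environment directory to its physical path and look for repeats. A minimal sketch; the function name is hypothetical, and the usage line assumes that `conda config --show envs_dirs` prints a YAML list with one `- path` entry per line:

```shell
# Resolve each directory read from stdin to its physical path and print
# any physical path that appears more than once.
duplicate_dirs() {
  while IFS= read -r dir; do
    # pwd -P prints the current directory with all symlinks resolved
    [ -d "$dir" ] && (cd "$dir" && pwd -P)
  done | sort | uniq -d
}

# Usage:
#   conda config --show envs_dirs | sed -n 's/^ *- //p' | duplicate_dirs
```

No output means that no two entries point at the same place.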

Revised Instructions

I now think that the correct way to do this is:

```shell
course_name="casa0013"

conda config --add envs_dirs /shared/groups/.../casa/envs

curl -o /tmp/casa0013.yml https://raw.githubusercontent.com/jreades/sds_env/master/conda/environment_py.yml

# -p takes the full path of the environment itself, so include the course name
conda env create -p /shared/groups/.../casa/envs/${course_name} -f /tmp/casa0013.yml
```

However, note that this now means you have .../casa/casa0013/envs/casa0013..., so it might be more sensible to set envs_dirs to just .../casa/envs and then have per-module environments underneath that.
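
With per-module environments under one shared directory, it is easy to list what has been created. A small sketch, relying on the fact that a conda environment directory contains a `conda-meta/` subdirectory; the function name is hypothetical and the shared path is left elided as above:

```shell
# Print the name of every conda environment directly under the directory
# given as $1. A directory counts as an environment if it has conda-meta/.
list_envs() {
  for d in "$1"/*/; do
    [ -d "${d}conda-meta" ] && basename "$d"
  done
  return 0
}

# Usage:
#   list_envs /shared/groups/.../casa/envs
```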

Tweaks to environment_py.yml:

Several shortcomings in the existing approach to generating environment_py.yml were identified, and the Makefile needs to be tweaked to address them:

  1. Remove anything with ‘linux’ in it.
  2. Remove SOMPY and mrmr.
  3. Remove version from gitpython.
  4. Remove python-graphviz entirely.

Additional issues may exist with replication to non-Linux systems.
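
Until the Makefile is updated, the four tweaks could be applied with a small filter rather than by hand. This is a rough sketch, not the Makefile rule itself: it assumes the exported file lists one package per line in the usual `  - name=version=build` form, the function name is hypothetical, and the matching is deliberately blunt (any line mentioning one of the names is dropped), so check the result before using it.

```shell
# Apply the four tweaks to an exported environment file read from stdin:
#   1-2, 4. drop lines mentioning linux, SOMPY, mrmr or python-graphviz
#   3.      strip the version (and build) from the gitpython entry
tweak_env() {
  grep -v -i -e 'linux' -e 'sompy' -e 'mrmr' -e 'python-graphviz' |
    sed -E 's/^([[:space:]]*-[[:space:]]*gitpython)[=<>~!].*$/\1/'
}

# Usage (the output file name is only an example):
#   tweak_env < environment_py.yml > environment_py_tweaked.yml
```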

Connecting to an Existing Environment (PGTAs & Students)

To connect to JupyterHub:

  1. Start up the UCL VPN.
  2. Connect to JupyterHub
  3. Authenticate using UCL credentials.
  4. If you see a URL that ends in tree? please replace this with lab? to get the JupyterLab interface and not the original Jupyter Notebook interface.
  5. Create a new terminal: File > New > Terminal
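
The URL change in step 4 can also be done mechanically. A minimal sketch, assuming the URL really does end in `tree?` (optionally followed by query parameters); the host name is a placeholder, not the real JupyterHub address:

```shell
# Swap the trailing /tree for /lab, keeping any query string that follows.
to_lab_url() {
  printf '%s\n' "$1" | sed -E 's#/tree(\?.*)?$#/lab\1#'
}

# Hypothetical URL, for illustration only
to_lab_url "https://jupyterhub.example.org/user/abc/tree?"
```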

Note that you need to replace ... with the appropriate path (this will be obvious once you are logged in):

```shell
course_name="casa0013"

conda config --append envs_dirs /shared/groups/.../casa/envs

jupyter contrib nbextension install --user
```
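
Once the shared directory has been appended to envs_dirs, the course environment should be listed by name in `conda env list`. A small sketch of that check; the helper name is hypothetical and it simply looks at the first column of the listing:

```shell
# Succeed if the environment named $1 appears in `conda env list` output
# read from stdin. Header lines start with '#', so they never match a name.
has_env() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage on the hub:
#   conda env list | has_env "casa0013" && echo "casa0013 is visible"
```

If the environment appears with a path but no name, the envs_dirs setting has probably not taken effect.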

Citing

This draws heavily on Dani Arribas-Bel's work for Liverpool. If you use this, you should cite him.

    @software{hadoop,
      author = {{Dani Arribas-Bel}},
      title = {\texttt{gds_env}: A containerised platform for Geographic Data Science},
      url = {https://github.com/darribas/gds_env},
      version = {3.0},
      date = {2019-08-06},
    }

Owner

  • Name: Jon Reades
  • Login: jreades
  • Kind: user
  • Location: London
  • Company: UCL

Associate Prof at the Centre for Advanced Spatial Analysis (UCL).

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: "Urban Spatial Science Platform"
message: >-
  Build files and general-purpose support content
  for students on CASA's Urban Spatial Science programme.
type: software
authors:
  - given-names: Jon
    family-names: Reades
    email: j.reades@ucl.ac.uk
    affiliation: University College London
    orcid: 'https://orcid.org/0000-0002-1443-9263'
repository-code: 'https://github.com/jreades/sds_env/'
url: 'https://jreades.github.io/sds_env/'
repository: 'https://jreades.github.io/sds_env/'
abstract: >-
  This is a set of practical 'setup guides' for our
  USS programme, covering the basic 'tools of the 
  trade' in getting up and running with Python in 
  a virtualised environment as well as other useful
  applications and technologies for coding.
keywords:
  - python
  - programming
  - docker
  - markdown
  - github
license: CC-BY-NC-4.0

GitHub Events

Total
  • Issues event: 1
  • Issue comment event: 1
  • Push event: 9
  • Pull request event: 1
  • Fork event: 1
Last Year
  • Issues event: 1
  • Issue comment event: 1
  • Push event: 9
  • Pull request event: 1
  • Fork event: 1