practical-ml-with-pytorch
Training material on writing machine learning code with PyTorch by ICCS
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 9 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Cambridge-ICCS
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://cambridge-iccs.github.io/practical-ml-with-pytorch/
- Size: 149 MB
Statistics
- Stars: 31
- Watchers: 3
- Forks: 39
- Open Issues: 5
- Releases: 1
Metadata Files
README.md

ICCS Practical Machine Learning with PyTorch
This repository contains documentation, resources, and code for the Introduction to
Machine Learning with PyTorch session designed and delivered by Jack Atkinson (@jatkinson1000),
Matt Archer (@ma595), and Jim Denholm (@jdenholm) of ICCS.
The material has been delivered at both the ICCS
and NCAS summer schools.
All materials, including slides and videos, are available such that individuals can cover the course in their own time.
A website for this workshop can be found at https://cambridge-iccs.github.io/practical-ml-with-pytorch/.
Contents
- Learning Objectives
- Teaching material
- Preparation and prerequisites
- Installation and setup
- JOSE Publication
- License information
- Contribution Guidelines and Support
Learning Objectives
The key learning objective from this workshop could be simply summarised as:
Provide the ability to develop ML models in PyTorch.
However, more specifically we aim to:
- provide an understanding of the structure of a PyTorch model and ML pipeline,
- introduce the different functionalities PyTorch might provide,
- encourage good research software engineering (RSE) practice, and
- exercise careful consideration and understanding of data used for training ML models.
With regard to specific ML content we cover:
- using ML for both classification and regression,
- artificial neural networks (ANNs) and convolutional neural networks (CNNs), and
- treatment of both tabular and image data.
Teaching Material
Slides
The slides for this workshop can be viewed on the ICCS Summer School Website:
- Teaching
- Climate Applications
The slides are generated from markdown using quarto. The raw markdown and html files can be found in the slides directory.
Exercises
The exercises for the course can be found in the exercises directory.
These take the form of partially complete Jupyter notebooks.
Videos
Videos from past workshops may be useful if you are following along independently.
These can be found on the ICCS YouTube channel
under the 2023 Summer School materials.
Worked Solutions
Worked solutions for all of the exercises can be found in the worked solutions directory.
These are for recapping after the course in case you missed anything, and contain ideal solutions complete with docstrings, outfitted with type hints, linted, and conforming to the black code style.
Preparation and prerequisites
To get the most out of the session we assume a basic understanding of a few areas and ask you to do some preparation in advance. Expected knowledge is outlined below, along with reading resources in case any of it is unfamiliar.
Mathematics and Machine Learning
Basic mathematics knowledge:
- calculus: differentiating a function
- matrix algebra: matrix multiplication and representing data as a matrix
- regression: fitting a function to data

Neural Networks:
- Awareness of high-level concepts
- We recommend the video series by 3Blue1Brown, at least chapters 1-3.
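As a quick refresher on how the calculus and regression prerequisites fit together, the following is a minimal sketch (not part of the course materials) that fits a straight line to data by gradient descent using only plain Python. The function name `fit_line`, the learning rate, the step count, and the toy data are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Difference between the current prediction and the target for each point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((w * x + b - y) ** 2) with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Step downhill along the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Illustrative data generated from the line y = 2x + 1 (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

The fitted values should come out close to the slope 2 and intercept 1 used to generate the data. Training a neural network follows the same pattern of differentiating a loss and updating parameters, with the derivatives computed automatically.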
Python
The course will be taught in Python using PyTorch.
Whilst no prior knowledge of PyTorch is expected we assume users are familiar with the basics of Python 3.
This includes:
- Basic mathematical operations
- Writing and running scripts/programs
- Writing and using functions
- The concept of object orientation,
  i.e. that an object, e.g. a dataset, can have functions/methods associated with it.
- Basic use of the following libraries:
  - numpy for mathematical and array operations
  - matplotlib for plotting and visualisation
  - pandas for storing and accessing tabular data
- Familiarity with the concept of a Jupyter notebook
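To illustrate the level of object orientation assumed, here is a small sketch of a dataset object with methods attached to it. The class `TabularDataset`, its method names, and the values are hypothetical and made up for this example; they are not taken from the course code.

```python
class TabularDataset:
    """A toy dataset object: it bundles rows of data with methods that act on them."""

    def __init__(self, rows):
        # Each row is a dictionary mapping a column name to a value.
        self.rows = list(rows)

    def __len__(self):
        # Lets us call len(dataset) to get the number of rows.
        return len(self.rows)

    def column(self, name):
        # Collect one column across all rows.
        return [row[name] for row in self.rows]

    def mean(self, name):
        # Average of a numeric column.
        values = self.column(name)
        return sum(values) / len(values)


# Made-up values, used only to show the object in action.
dataset = TabularDataset(
    [
        {"species": "A", "body_mass_g": 3000.0},
        {"species": "B", "body_mass_g": 4000.0},
        {"species": "C", "body_mass_g": 5000.0},
    ]
)

print(len(dataset))
print(dataset.column("species"))
print(dataset.mean("body_mass_g"))
```

If reading and extending a class like this feels comfortable, the object-oriented parts of the course should be accessible.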
git and GitHub
You will be expected to know how to:
- clone and/or fork a repository,
- commit, and
- push.
The workshop from the 2022 ICCS Summer School should provide the necessary knowledge.
Preparation
In preparation for the course please ensure that your computer contains the following:
- A text editor, e.g. vim/neovim, gedit, vscode, sublimetext etc., to open and edit code files
- A terminal emulator, e.g. GNOME Terminal, wezterm, Windows Terminal (Windows only), iTerm (Mac only)
- A Python virtual environment (see Installation and setup)

Note for Windows users: We have linked suitable applications for Windows in the above lists. However, you may wish to refer to Windows' getting started with Python information for a complete guide to getting set up on a Windows system.
If you require assistance or further information with any of these please reach out to us before a training session.
Installation and setup
There are three options for participating in this workshop for which instructions are provided below:
- via a local install
- on Google Colab
- on binder
We recommend the local install approach, especially if you forked the repository, as it is the easiest way to keep a copy of your work and push back to GitHub.
However, if you experience issues with the installation process or are unfamiliar with the terminal/installation process there is the option to run the notebooks in Google Colab or on binder.
Local Install
1. Clone or fork the repository
Navigate to the location on your system where you want to install this repository and clone
via https by running:
git clone https://github.com/Cambridge-ICCS/practical-ml-with-pytorch.git
This will create a directory practical-ml-with-pytorch/ with the contents of this repository.
Please note that if you have a GitHub account and want to preserve any work you do we suggest you first fork the repository and then clone your fork. This will allow you to push your changes and progress from the workshop back up to your fork for future reference.
2. Create a virtual environment
Before installing any Python packages it is important to first create a Python virtual environment. This provides an insulated environment inside which we can install Python packages without polluting the operating system's Python environment.
If you have never done this before don't worry: it is very good practice, especially when you are working on multiple projects, and easy to do.
python3 -m venv MLvenv
This will create a directory called MLvenv containing software for the virtual environment.
To activate the environment run:
source MLvenv/bin/activate
You can now work with Python from within this isolated environment, installing packages
as you wish without disturbing your base system environment.
When you have finished working on this project run:
deactivate
to deactivate the venv and return to the system python environment.
You can always boot back into the venv as you left it by running the activate command again.
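If you are unsure whether the environment is currently active, one way to check from Python itself is sketched below. It relies on the standard library attributes `sys.prefix` and `sys.base_prefix`, which differ inside an environment created with `python3 -m venv`; the helper name `in_virtual_environment` is an assumption for this example, and other environment managers may behave differently.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("virtual environment active:", in_virtual_environment())
```

When run with the environment activated, the interpreter path should sit inside the MLvenv directory.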
3. Install dependencies
It is now time to install the dependencies for our code, for example PyTorch.
The project has been packaged with a pyproject.toml so can be installed in one go.
From within the root directory in an active virtual environment run:
pip install .
This will download the relevant dependencies into the venv as well as setting up the
datasets that we will be using in the course.
Whilst the workshop should install and run with the latest versions of Python libraries,
it has been tested with the following versions of the major dependencies: torch 2.0.1,
pandas 2.1.0, palmerpenguins 0.1.4, ipykernel 6.25.2, matplotlib 3.8.0, notebook 7.0.3.
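To compare what is installed in your environment against the tested versions, a sketch along the following lines may help. It uses the standard library `importlib.metadata` module; the helper `installed_versions` and the `TESTED_VERSIONS` dictionary are names invented for this example, with the version numbers copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions the workshop has been tested with, as listed above.
TESTED_VERSIONS = {
    "torch": "2.0.1",
    "pandas": "2.1.0",
    "palmerpenguins": "0.1.4",
    "ipykernel": "6.25.2",
    "matplotlib": "3.8.0",
    "notebook": "7.0.3",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


for name, installed in installed_versions(TESTED_VERSIONS).items():
    shown = installed if installed is not None else "not found"
    print(f"{name}: installed {shown}, tested with {TESTED_VERSIONS[name]}")
```

A newer installed version is not necessarily a problem; the comparison is mainly useful when diagnosing unexpected behaviour.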
4. Run the notebook
From the current directory, launch the jupyter notebook server:
jupyter notebook
This command should then point you to the right location within your browser to use the notebook, typically http://localhost:8888/.
(Optional) Keep virtual environment persistent in jupyter Notebooks
The following step is sometimes useful if you're having trouble with your jupyter notebook finding the virtual environment. You will want to do this before
launching the jupyter notebook.
python -m ipykernel install --user --name=MLvenv
Google Colab
Running on Colab is useful as it allows you to access GPU resources.
To launch the notebooks in Google Colab click the following links for each of the exercises:
- Exercise 01 - Worked Solution 01
- Exercise 02 - Worked Solution 02
- Exercise 03 - Worked Solution 03
- Exercise 04 - Worked Solution 04
Notes:
- Running in Google Colab requires you to have a Google account.
- If you leave a Colab session your work will be lost, so be careful to save any work you want to keep.
binder
If you cannot operate using a local install, and do not wish to sign up for a Google account, the repository can be launched on binder.
Notes:
- If you leave a binder session your work will be lost, so be careful to save any work you want to keep.
- Due to the limited resources provided by binder you will struggle to run training in exercises 3 and 4.
JOSE Publication
This workshop has been published in JOSE, the Journal of Open Source Education, with DOI: 10.21105/jose.00239. The paper materials can be found in the JOSE_paper/ directory.
If you re-use or build on this material please cite this publication using the information in the CITATION.cff file.
@article{Atkinson2024,
  doi = {10.21105/jose.00239},
  url = {https://doi.org/10.21105/jose.00239},
  year = {2024},
  publisher = {The Open Journal},
  volume = {7},
  number = {76},
  pages = {239},
  author = {Jack Atkinson and Jim Denholm},
  title = {Practical machine learning with PyTorch},
  journal = {Journal of Open Source Education}
}
License
The code materials in this project are licensed under the MIT License.
The teaching materials are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Contribution Guidelines and Support
If you spot an issue with the materials please let us know by opening an issue here on GitHub clearly describing the problem.
If you are able to fix an issue that you spot, or an existing open issue, please get in touch by commenting on the issue thread.
Contributions from the community are welcome. To contribute back to the repository please first fork it, make the necessary changes to fix the problem, and then open a pull request back to this repository clearly describing the changes you have made. We will then perform a review and merge once ready.
If you would like support using these materials, adapting them to your needs, or delivering them please get in touch either via GitHub or via ICCS.
Owner
- Name: Institute of Computing for Climate Science
- Login: Cambridge-ICCS
- Kind: organization
- Website: https://cambridge-iccs.github.io/
- Twitter: Cambridge_ICCS
- Repositories: 8
- Profile: https://github.com/Cambridge-ICCS
Institute of Computing for Climate Science at the University of Cambridge
Citation (CITATION.cff)
cff-version: "1.2.0"
authors:
- family-names: Atkinson
  given-names: Jack
  orcid: "https://orcid.org/0000-0001-5001-4812"
- family-names: Denholm
  given-names: Jim
  orcid: "https://orcid.org/0000-0002-2389-3134"
contact:
- family-names: Atkinson
  given-names: Jack
  orcid: "https://orcid.org/0000-0001-5001-4812"
doi: 10.5281/zenodo.11401113
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Atkinson
    given-names: Jack
    orcid: "https://orcid.org/0000-0001-5001-4812"
  - family-names: Denholm
    given-names: Jim
    orcid: "https://orcid.org/0000-0002-2389-3134"
  date-published: 2024-06-23
  doi: 10.21105/jose.00239
  issn: 2577-3569
  issue: 76
  journal: Journal of Open Source Education
  publisher:
    name: Open Journals
  start: 239
  title: Practical machine learning with PyTorch
  type: article
  url: "https://jose.theoj.org/papers/10.21105/jose.00239"
  volume: 7
title: Practical machine learning with PyTorch
GitHub Events
Total
- Issues event: 17
- Watch event: 11
- Delete event: 9
- Issue comment event: 10
- Push event: 33
- Pull request review comment event: 13
- Pull request review event: 14
- Pull request event: 14
- Fork event: 7
- Create event: 11
Last Year
- Issues event: 17
- Watch event: 11
- Delete event: 9
- Issue comment event: 10
- Push event: 33
- Pull request review comment event: 13
- Pull request review event: 14
- Pull request event: 14
- Fork event: 7
- Create event: 11
Dependencies
- ipykernel *
- matplotlib *
- notebook *
- palmerpenguins >=0.1.4
- pandas *
- scikit-image *
- torch >=2.0
- torch_tools @ git+https://github.com/jdenholm/TorchTools.git
- torchvision >=0.13
