prudent-response-surface-models

Prudent Response Surface Models combine predictions with confidence scores and uncertainty levels, allowing their use in downstream analysis even for high-uncertainty or out-of-distribution inputs.

https://github.com/juntyr/prudent-response-surface-models

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary

Keywords

machine-learning out-of-distribution response-surface-model sosaa uncertainty-quantification

Repository

Basic Info
  • Host: GitHub
  • Owner: juntyr
  • License: cc-by-4.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 37.9 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 3 years ago · Last pushed almost 3 years ago

README.md

Prudent Response Surface Models

Exploring a Framework for Approximating Simulations with Confidence and Certainty


This repository contains all data, code, results, visualisations, and writing produced by Juniper Tyree for their Master's Thesis on "Prudent Response Surface Models" for the M.Sc. Theoretical and Computational Methods programme at the University of Helsinki.

The thesis was supervised by:

  • Prof. Michael Boy, University of Helsinki / LUT University, Finland
  • Dr Andreas Rupp, LUT University, Finland
  • Petri Clusius, University of Helsinki, Finland

Abstract

Response Surface Models (RSMs) are cheap, reduced-complexity, and usually statistical models that are fit to the response of more complex models to approximate their outputs with higher computational efficiency. In atmospheric science, there has been a continuous push to reduce the amount of training data required to fit an RSM. With this reduction in costly data gathering, RSMs can be used more ad hoc and quickly adapted to new applications. However, as the training data become less diverse, the risk increases that the RSM is eventually used on inputs for which it cannot make a reliable prediction. If the model gives no indication that its outputs can no longer be trusted, trust in the entire RSM decreases. We present a framework for building prudent RSMs that always output predictions with confidence and uncertainty estimates. We show how confidence and uncertainty can be propagated through downstream analysis such that even predictions on inputs outside the training domain or in areas of high variance can be integrated.

Specifically, we introduce the Icarus RSM architecture, which combines an out-of-distribution detector, a prediction model, and an uncertainty quantifier. Icarus-produced predictions and their uncertainties are conditioned on the confidence that the inputs come from the same distribution that the RSM was trained on. We put particular focus on exploring out-of-distribution detection, for which we conduct a broad literature review, design an intuitive evaluation procedure with three easily-visualisable toy examples, and suggest two methodological improvements. We also explore and evaluate popular prediction models and uncertainty quantifiers.
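The three-part structure described above can be illustrated with a minimal, dependency-free sketch. It is not the Icarus implementation: the class `ToyPrudentRSM`, the distance-based confidence score, the nearest-neighbour predictor, and the `k` and `length_scale` parameters are all hypothetical stand-ins chosen only to show how a prediction, an uncertainty level, and a confidence score might be reported together.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class PrudentPrediction:
    value: float        # point prediction
    uncertainty: float  # spread of the prediction (standard deviation)
    confidence: float   # in [0, 1]: how in-distribution the input looks


class ToyPrudentRSM:
    """Hypothetical 1-D stand-in, not the thesis implementation."""

    def __init__(self, xs, ys, k=3, length_scale=1.0):
        self.data = list(zip(xs, ys))
        self.k = k
        self.length_scale = length_scale

    def _confidence(self, x):
        # Out-of-distribution detector (assumed): confidence decays with
        # the distance from x to its nearest training input.
        nearest = min(abs(x - xi) for xi, _ in self.data)
        return math.exp(-nearest / self.length_scale)

    def predict(self, x):
        # Prediction model (assumed): mean response of the k nearest inputs.
        neighbours = sorted(self.data, key=lambda p: abs(p[0] - x))[: self.k]
        responses = [yi for _, yi in neighbours]
        value = statistics.fmean(responses)
        # Uncertainty quantifier (assumed): spread of those responses.
        uncertainty = statistics.pstdev(responses)
        return PrudentPrediction(value, uncertainty, self._confidence(x))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
rsm = ToyPrudentRSM(xs, ys)
inside = rsm.predict(2.0)    # within the training range
outside = rsm.predict(10.0)  # far outside the training range
print(inside)
print(outside)
```

In this toy setting the query at 2.0 receives full confidence, while the query at 10.0 still yields a prediction but with a confidence close to zero, so a downstream user can decide how much weight to give it.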

We use the one-dimensional atmospheric chemistry transport model SOSAA as an example of a complex model for this thesis. We produce a dataset of model inputs and outputs from simulations of the atmospheric conditions along air parcel trajectories that arrived at the SMEAR II measurement station in Hyytiälä, Finland, in May 2018. We evaluate several prediction models and uncertainty quantification methods on this dataset and construct a proof-of-concept SOSAA RSM using the Icarus RSM architecture. The SOSAA RSM is built on pairwise-difference regression using random forests and an auto-associative out-of-distribution detector with a confidence scorer, which is trained with both the original training inputs and new synthetic out-of-distribution samples. We also design a graphical user interface to configure the SOSAA model and trial the SOSAA RSM.
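As a rough illustration of the pairwise-difference idea mentioned above, the sketch below trains on differences between pairs of training points and then anchors each query on every training point; the spread of the resulting estimates serves as an uncertainty level. The thesis uses random forests as the base learner; to stay dependency-free, this sketch swaps in a one-parameter linear model, and the function names `fit_difference_slope` and `predict_with_spread` are hypothetical.

```python
import statistics


def fit_difference_slope(xs, ys):
    """Least-squares fit of w in (y_i - y_j) ~ w * (x_i - x_j) over all pairs."""
    num = den = 0.0
    for xi, yi in zip(xs, ys):
        for xj, yj in zip(xs, ys):
            dx, dy = xi - xj, yi - yj
            num += dx * dy
            den += dx * dx
    return num / den


def predict_with_spread(x, xs, ys, w):
    """Anchor the query on every training point and aggregate the estimates."""
    estimates = [yj + w * (x - xj) for xj, yj in zip(xs, ys)]
    return statistics.fmean(estimates), statistics.pstdev(estimates)


xs = [0.0, 1.0, 2.0, 3.0]

# Exactly linear response (y = 2x + 1): all anchors agree, so the spread is zero.
ys_linear = [1.0, 3.0, 5.0, 7.0]
w = fit_difference_slope(xs, ys_linear)
mean, spread = predict_with_spread(10.0, xs, ys_linear, w)

# Curved response (y = x**2): the linear base model is misspecified, the
# anchors disagree, and the disagreement shows up as a non-zero spread.
ys_curved = [0.0, 1.0, 4.0, 9.0]
w_curved = fit_difference_slope(xs, ys_curved)
mean_curved, spread_curved = predict_with_spread(10.0, xs, ys_curved, w_curved)

print(mean, spread)
print(mean_curved, spread_curved)
```

The design point is that each training point yields its own estimate for a query, so an ensemble of estimates comes for free and their disagreement can be reported alongside the prediction.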

We provide recommendations for out-of-distribution detection, prediction models, and uncertainty quantification based on our exploration of these three systems. We also stress-test the proof-of-concept SOSAA RSM implementation to reveal its limitations for predicting model perturbation outputs and show directions for valuable future research. Finally, our experiments affirm the importance of reporting predictions alongside well-calibrated confidence scores and uncertainty levels so that the predictions can be used with confidence and certainty in scientific research applications.

Motivation

We hope that our framework for building prudent response surface models using the Icarus RSM architecture inspires future work on more advanced implementations and enables the safe integration of machine-learning-based RSMs into more research. While these methods hold great promise, they can only be employed with confidence and certainty if conditioning on confidence scores and uncertainty levels is treated as a critical part of their design. The confidence and uncertainty of such prudent predictions can be propagated through analyses and thus allow for the rigorous analysis of any downstream conclusions.
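One simple way such propagation could look in practice is a confidence-weighted average with standard error propagation. This README does not describe the thesis's own propagation method, so the sketch below is only an assumption: the function `confidence_weighted_mean` is hypothetical, and it treats the individual uncertainties as independent standard deviations.

```python
import math


def confidence_weighted_mean(predictions):
    """Aggregate a list of (value, uncertainty, confidence) triples.

    Assumes independent errors, so the variance of the weighted sum is
    the sum of w_i**2 * sigma_i**2 with weights w_i proportional to confidence.
    """
    total = sum(c for _, _, c in predictions)
    if total == 0.0:
        raise ValueError("no prediction carries any confidence")
    weights = [c / total for _, _, c in predictions]
    mean = sum(w * v for w, (v, _, _) in zip(weights, predictions))
    std = math.sqrt(sum((w * u) ** 2 for w, (_, u, _) in zip(weights, predictions)))
    return mean, std


predictions = [
    (10.0, 1.0, 1.0),   # in-distribution, trusted
    (12.0, 1.0, 1.0),   # in-distribution, trusted
    (100.0, 5.0, 0.0),  # flagged as out-of-distribution, zero confidence
]
mean, std = confidence_weighted_mean(predictions)
print(mean, std)
```

Here the zero-confidence prediction does not influence the aggregate at all, while the two trusted predictions contribute equally and their uncertainties combine into the uncertainty of the mean.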

In particular, we hope that the new SOSAA GUI and RSMs built with Icarus support the integration of higher complexity atmospheric chemistry models into policy-making to fight climate change.

Overview of the Repository

This repository is divided into the following folders:

  • The sosaa/ submodule contains a public snapshot of version SOSAA@10618aa of the SOSAA model, which is otherwise not yet publicly available. Please refer to its README for more information.
  • The sosaa-data/experiments/ folder contains the code of the experiments conducted for this project. The Jupyter notebooks found in this directory are each connected to one similarly named section or chapter in the thesis. Note that you can also find the icarus.min.yml and icarus.yml files there, which allow you to set up a conda environment that matches the one used throughout this thesis.
  • The sosaa-data/trajectories/ submodule contains the SOSAA trajectories dataset that was produced for this thesis. Please refer to its README for more information.
  • The sosaa-gui/ submodule contains the graphical user interface for SOSAA. Please refer to its README for more information.
  • The thesis/ folder contains the Master's Thesis itself, written in LaTeX, as well as all figures and the SOSAA RSM evaluation results. The install.sh, compile.sh, and clean.sh helper scripts should allow you to reproduce the thesis PDF, main.pdf.

Citation

Please refer to the CITATION.cff file for citation metadata, and see https://citation-file-format.github.io for tools that convert it into a citation format of your choice.

License

Unless specified otherwise below, this repository is licensed under the CC BY 4.0 license (LICENSE or https://creativecommons.org/licenses/by/4.0/).

  • The sosaa-data/trajectories/ submodule, which contains the SOSAA trajectories dataset, is licensed under the CC0 1.0 license (sosaa-data/trajectories/LICENSE or https://creativecommons.org/publicdomain/zero/1.0/).

  • The sosaa-gui/ submodule, which contains the graphical user interface for SOSAA, is licensed under the GPL-3.0 license (sosaa-gui/LICENSE-GPL or https://www.gnu.org/licenses/gpl-3.0.html).

  • The thesis/ folder, which contains the Master's Thesis document, figures, and LaTeX source code, is licensed under the CC BY 4.0 license (thesis/LICENSE or https://creativecommons.org/licenses/by/4.0/).

Owner

  • Name: Juniper Tyree
  • Login: juntyr
  • Kind: user
  • Location: Helsinki
  • Company: University of Helsinki

PhD researcher at UH with a passion for the environment and Rust. Graduate of MEng Computing at ICL and MSc Theoretical & Computational Methods at UH.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Prudent Response Surface Models
message: >-
  If you use this work, please cite it using the metadata
  from this file.
type: software
authors:
  - given-names: Juniper
    family-names: Tyree
    email: juniper.tyree@helsinki.fi
    affiliation: University of Helsinki
    orcid: 'https://orcid.org/0000-0002-7923-9609'
  - given-names: Michael
    family-names: Boy
    email: michael.boy@helsinki.fi
    orcid: 'https://orcid.org/0000-0002-8107-4524'
    affiliation: 'University of Helsinki, LUT University'
  - given-names: Andreas
    family-names: Rupp
    email: andreas.rupp@lut.fi
    affiliation: LUT University
    orcid: 'https://orcid.org/0000-0001-5527-7187'
  - given-names: Petri
    family-names: Clusius
    email: petri.clusius@helsinki.fi
    orcid: 'https://orcid.org/0000-0003-3121-0775'
    affiliation: University of Helsinki
identifiers:
  - type: doi
    value: 10.5281/zenodo.7938527
    description: The Zenodo release
  - type: url
    value: 'http://urn.fi/URN:NBN:fi:hulib-202305151941'
    description: The HELDA record
  - type: url
    value: >-
      https://github.com/juntyr/prudent-response-surface-models/releases/tag/msc-tcm
    description: The GitHub release
repository-code: 'https://github.com/juntyr/prudent-response-surface-models'
abstract: >-
  Prudent Response Surface Models combine predictions with
  confidence scores and uncertainty levels, allowing their
  use in downstream analysis even for high-uncertainty or
  out-of-distribution inputs.
keywords:
  - SOSAA
  - response surface model
  - machine learning
  - out-of-distribution
  - uncertainty quantification
license: CC-BY-4.0
commit: msc-tcm
version: 1.0.0
date-released: '2023-04-27'

Committers

All Time
  • Total Commits: 2
  • Total Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Juniper Tyree j****e@h****i 2

Issues and Pull Requests

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0