tengen

Reference solar irradiance spectrum datasets manager

https://github.com/rayference/tengen

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary

Keywords

radiative-transfer reference-data solar-spectrum

Keywords from Contributors

mesh sequences interactive hacking network-simulation
Last synced: 6 months ago

Repository

Reference solar irradiance spectrum datasets manager

Basic Info
  • Host: GitHub
  • Owner: rayference
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 514 KB
Statistics
  • Stars: 3
  • Watchers: 0
  • Forks: 0
  • Open Issues: 1
  • Releases: 2
Topics
radiative-transfer reference-data solar-spectrum
Created over 4 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License Citation

README.md

Tengen

Reference solar irradiance spectrum datasets manager.


Aim

Many reference solar irradiance spectra have been made available. However, their original formats vary and are often non-standard. The aim of this repository is to gather all of these reference solar irradiance spectra in one place, in a unique and well-defined standard format, and in a manner that supports data traceability.

Motivation

The need to organise and manage reference solar irradiance spectrum datasets originated in the development of the Eradiate radiative transfer model. A radiative transfer model such as Eradiate takes a solar irradiance spectrum as an input to its simulations. It usually does not work directly with the original data; instead, it stores the data in its own specific format, which requires transforming the original data. This comes with two challenges:

  • the transformation algorithm must not introduce any error
  • the data traceability must be preserved

which Tengen aims to address.

Install

After cloning the repository and navigating to the root directory, install the project with uv:

```shell
uv sync
```

Usage

Simply run the notebooks you are interested in.

Notebooks

The work of downloading the raw data for each solar irradiance spectrum and converting it to a unique format is stored in Jupyter notebooks, under notebooks/. The idea is to have one notebook per solar irradiance spectrum, or per group of spectra when they belong together, e.g. different observation time periods or different spectral resolutions associated with the same observation data.

For example, the thuillier_2003.ipynb notebook downloads the raw data for the well-known Thuillier (2003) reference solar irradiance spectrum and converts it to the unique format.

Run a notebook

To generate the dataset(s), run the corresponding notebook(s).

Run from the command line

You can run a notebook from the command line, using the nbconvert library. For example, the whi_2008.ipynb notebook is executed with:

```shell
jupyter nbconvert --to notebook --execute notebooks/whi_2008.ipynb
```

Run all notebooks with:

```shell
jupyter nbconvert --to notebook --execute notebooks/*.ipynb
```
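Running each notebook in its own nbconvert call makes it easier to see which notebook fails. The following is a minimal Python sketch, assuming `jupyter nbconvert` is available in the project environment (e.g. via `uv run`); the function names `nbconvert_command` and `run_all_notebooks` are hypothetical and not part of Tengen.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def nbconvert_command(notebook: Path) -> list[str]:
    """Build the nbconvert command that executes a single notebook."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", str(notebook)]


def run_all_notebooks(directory: Path = Path("notebooks"), dry_run: bool = False) -> list[list[str]]:
    """Execute every notebook in `directory`, one nbconvert call per notebook.

    With check=True, subprocess.run raises CalledProcessError at the first
    failing notebook. With dry_run=True, the commands are only returned.
    """
    commands = [nbconvert_command(nb) for nb in sorted(directory.glob("*.ipynb"))]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

Calling `run_all_notebooks(dry_run=True)` first lists what would run without executing anything.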

Write a notebook

Each notebook follows a template defined by notebooks/template.ipynb. A notebook is divided into four sections:

  • a Setup section: this is where imports are made and global information about the dataset is set
  • a Download section: this is where the function to download the raw data is implemented
  • a Format section: this is where the function to format the raw data to the Tengen format is implemented
  • a Run section: identical in all notebooks; executing the cells in this section downloads and formats the dataset(s) and saves them either in temporary files or in the cache, depending on the value of UPDATE_CACHE.

To write a new notebook, copy the template and modify it to provide the required information and implement the required functions. When in doubt, use the existing notebooks as examples.
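The four-section layout can be pictured as plain Python functions. This is a hypothetical sketch, not the content of the template: the names `DATASET_ID`, `RAW_DATA_URL`, `download`, `parse` and `run`, the placeholder URL, and the two-column text layout of the raw file are all assumptions (real raw files vary widely in format), and saving the result is left out.

```python
from __future__ import annotations

import urllib.request

# --- Setup: imports and global information about the dataset ---
DATASET_ID = "example_2024"                   # hypothetical identifier
RAW_DATA_URL = "https://example.org/ssi.txt"  # placeholder URL, not a real source
UPDATE_CACHE = False  # change to True to update the cache when running this notebook


# --- Download: fetch the raw data ---
def download(url: str = RAW_DATA_URL) -> bytes:
    """Return the raw bytes served at `url`."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# --- Format: convert the raw data (assumed two-column text here) ---
def parse(raw: bytes) -> tuple[list[float], list[float]]:
    """Parse 'wavelength irradiance' lines, skipping blank lines and '#' comments."""
    wavelengths, irradiances = [], []
    for line in raw.decode("utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        w, ssi = line.split()[:2]
        wavelengths.append(float(w))
        irradiances.append(float(ssi))
    return wavelengths, irradiances


# --- Run: same in every notebook (saving omitted in this sketch) ---
def run() -> tuple[list[float], list[float]]:
    return parse(download())
```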

Before pushing your notebook to the repository, make sure to run nbstripout on it to remove cell outputs:

```shell
nbstripout notebooks/your_notebook.ipynb
```
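A quick check before pushing can catch notebooks that were not stripped. Below is a minimal standard-library sketch that reads the notebook JSON directly, assuming the nbformat 4 layout in which code cells carry `outputs` and `execution_count`; `cells_with_outputs` and `check` are hypothetical helpers, not part of Tengen or nbstripout.

```python
from __future__ import annotations

import json
from pathlib import Path


def cells_with_outputs(notebook: Path) -> list[int]:
    """Return indices of code cells that still carry outputs or an execution count."""
    nb = json.loads(notebook.read_text(encoding="utf-8"))
    return [
        i
        for i, cell in enumerate(nb.get("cells", []))
        if cell.get("cell_type") == "code"
        and (cell.get("outputs") or cell.get("execution_count") is not None)
    ]


def check(paths: list[Path]) -> bool:
    """Report notebooks with leftover outputs; return True if all are clean."""
    clean = True
    for path in paths:
        dirty = cells_with_outputs(path)
        if dirty:
            clean = False
            print(f"{path}: outputs in cells {dirty}; run nbstripout on it")
    return clean
```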

Dataset format and schema

Every notebook produces solar irradiance spectrum datasets with the same unique format and schema, which are described here.

Format

Datasets comply with the netCDF format.

Schema

Variables

The dataset contains one data variable:

  • the solar spectral irradiance, denoted ssi.

The solar spectral irradiance has two dimensions:

  • a time dimension, denoted t,
  • a wavelength dimension, denoted w.

The time dimension refers to the time at which the solar spectral irradiance was observed. Associated with these two dimensions are two coordinate variables, also denoted t and w, respectively.

| Symbol | Long name | Standard name | Units |
| :----: | :-----------------------: | :----------------------------------: | :-----------: |
| ssi | solar spectral irradiance | solar_irradiance_per_unit_wavelength | W / m**2 / nm |
| w | wavelength | radiation_wavelength | nm |
| t | time | time | days |
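To see how the variable, dimensions, coordinates and attributes in the table fit together, here is a minimal xarray sketch, assuming numpy and xarray (both project dependencies) are installed. `make_ssi_dataset` is a hypothetical helper, not a Tengen function, and the units strings are copied from the table above.

```python
import numpy as np
import xarray as xr


def make_ssi_dataset(w_nm, t_days, ssi) -> xr.Dataset:
    """Assemble a dataset with data variable ssi(t, w) and coordinates t and w."""
    return xr.Dataset(
        data_vars={
            "ssi": (
                ("t", "w"),
                np.asarray(ssi, dtype=float),
                {
                    "long_name": "solar spectral irradiance",
                    "standard_name": "solar_irradiance_per_unit_wavelength",
                    "units": "W / m**2 / nm",
                },
            )
        },
        coords={
            "t": ("t", np.asarray(t_days, dtype=float),
                  {"long_name": "time", "standard_name": "time", "units": "days"}),
            "w": ("w", np.asarray(w_nm, dtype=float),
                  {"long_name": "wavelength", "standard_name": "radiation_wavelength", "units": "nm"}),
        },
    )
```

Such a dataset can then be written with xarray's `Dataset.to_netcdf` method, which needs a netCDF backend such as netCDF4 or h5netcdf (both listed in the dependencies).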

Metadata

Dataset metadata comply with the NetCDF Climate and Forecast (CF) Metadata Conventions.

The following dataset metadata are set:

  • title: the title of the dataset
  • institution: the institution where the original data was produced
  • source: the method of production of the original data
  • history: the history of transformations that the original data has undergone
  • references: the publications or web-based references that describe the original data and/or the methods used to produce it
  • data_url: the URL the original data was downloaded from
  • data_url_datetime: the date and time at which the original data was downloaded
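The attribute list above maps naturally onto a dictionary of strings that can be assigned to a dataset's `attrs`. This is a hypothetical sketch: `make_global_attrs` and `missing_attrs` are illustrative names, and the ISO 8601 UTC timestamp for `data_url_datetime` is an assumption, since the format is not specified here.

```python
from __future__ import annotations

from datetime import datetime, timezone

REQUIRED_ATTRS = (
    "title", "institution", "source", "history",
    "references", "data_url", "data_url_datetime",
)


def make_global_attrs(*, title: str, institution: str, source: str, history: str,
                      references: str, data_url: str, downloaded_at: datetime) -> dict[str, str]:
    """Collect the dataset metadata listed above; timestamps are stored as UTC ISO 8601 (assumption)."""
    return {
        "title": title,
        "institution": institution,
        "source": source,
        "history": history,
        "references": references,
        "data_url": data_url,
        "data_url_datetime": downloaded_at.astimezone(timezone.utc).isoformat(timespec="seconds"),
    }


def missing_attrs(attrs: dict) -> list[str]:
    """Names of required attributes that are absent or empty."""
    return [name for name in REQUIRED_ATTRS if not attrs.get(name)]
```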

Traceability

Data traceability means that one is able to track all the transformations that a dataset has undergone from its original form to its current form. Tengen cannot guarantee data traceability, but it strives to provide the means to achieve it. When a notebook is run, the original data is downloaded and converted, i.e., transformed, to the Tengen format.


To preserve the traceability of the data, the following information is stored in the dataset metadata:

  • the date and time at which the dataset was created, including the corresponding Tengen version (history)
  • the original data URL (data_url)
  • the date and time at which the original data was downloaded (data_url_datetime)

The history attribute creates a link between the transformed data and the transformation algorithms (this repository), whereas the data_url and data_url_datetime attributes create a link between the original data and the transformed data.
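The CF conventions recommend that each line of the history attribute begin with a timestamp. Below is a hypothetical sketch of appending such a line and checking that both links are present; the `timestamp - message` line format, the helper names, and the example message are assumptions (v23.2.0 is the release named in CITATION.cff).

```python
from __future__ import annotations

from datetime import datetime, timezone


def append_history(attrs: dict[str, str], message: str, when: datetime | None = None) -> dict[str, str]:
    """Return a copy of `attrs` with one timestamped line appended to `history`."""
    when = when or datetime.now(timezone.utc)
    line = f"{when.isoformat(timespec='seconds')} - {message}"
    previous = attrs.get("history", "")
    return {**attrs, "history": f"{previous}\n{line}" if previous else line}


def traceability_links(attrs: dict[str, str]) -> dict[str, bool]:
    """Report whether each of the two links described above is recorded."""
    return {
        "transformed <-> algorithm": bool(attrs.get("history")),
        "original <-> transformed": bool(attrs.get("data_url")) and bool(attrs.get("data_url_datetime")),
    }
```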


If either of these two links is broken, the traceability of the data is lost.

The existence and accessibility of the original data cannot be guaranteed: data that was downloaded from a URL may no longer be available there at a later date.


This is the reason why Tengen cannot guarantee data traceability. This is also the reason why a cache system is provided.
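One way a cache can mitigate a disappearing source is a fallback: try the original URL first and, if that fails, reuse a raw copy kept in the cache. This is a hypothetical sketch of that idea, not Tengen's cache implementation; `fetch_raw`, the cached-file path and the injected `download` callable are assumptions. It catches `OSError`, the base class of both `urllib.error.URLError` and `requests.exceptions.RequestException`.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def fetch_raw(url: str, cached_copy: Path, download: Callable[[str], bytes]) -> tuple[bytes, str]:
    """Return the raw bytes and where they came from ("url" or "cache").

    The source label lets the caller record provenance next to data_url and
    data_url_datetime. Re-raises if the URL fails and no cached copy exists.
    """
    try:
        return download(url), "url"
    except OSError:
        if cached_copy.exists():
            return cached_copy.read_bytes(), "cache"
        raise
```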

Cache

A cache is managed that stores the original (raw) and formatted data. By default, running a notebook does not populate the cache. To enable it, find the following line in the Setup section of a notebook:

```python
UPDATE_CACHE = False  # change to True to update the cache when running this notebook
```

and change the value to True as indicated in the comment.
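The switch can be thought of as choosing where the Run section saves its files. In this hypothetical sketch, `CACHE_DIR`, the `raw/` and `formatted/` subdirectories, and `save_paths` are assumptions for illustration; the actual cache location and layout are not described in this README.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

UPDATE_CACHE = False  # change to True to update the cache when running this notebook
CACHE_DIR = Path("cache")  # hypothetical cache location


def save_paths(dataset_id: str, update_cache: bool, cache_dir: Path = CACHE_DIR) -> dict[str, Path]:
    """Pick output paths for raw and formatted data: the cache, or a fresh temporary directory."""
    root = cache_dir if update_cache else Path(tempfile.mkdtemp(prefix="tengen-"))
    paths = {
        "raw": root / "raw" / dataset_id,
        "formatted": root / "formatted" / f"{dataset_id}.nc",
    }
    for path in paths.values():
        path.parent.mkdir(parents=True, exist_ok=True)
    return paths
```

In a notebook, this would be called as `save_paths(DATASET_ID, UPDATE_CACHE)` so that flipping the flag is the only change needed.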

Owner

  • Name: Rayference
  • Login: rayference
  • Kind: organization
  • Location: Belgium

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Tengen
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Yvan
    family-names: Nollet
    affiliation: Rayference
    orcid: 'https://orcid.org/0000-0002-6241-444X'
identifiers:
  - type: url
    value: 'https://github.com/nollety/tengen/releases/tag/v23.2.0'
    description: The GitHub release URL of tag v23.2.0
repository-code: 'https://github.com/nollety/tengen'
abstract: Reference solar irradiance spectrum datasets manager.
keywords:
  - solar-spectrum
  - radiative-transfer
  - reference-data
license: MIT
commit: b518e4c2da0b4bb0c0d402efbed4badedc20f91c
version: v23.2.0
date-released: '2023-01-17'

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 87
  • Total Committers: 2
  • Avg Commits per committer: 43.5
  • Development Distribution Score (DDS): 0.011
Past Year
  • Commits: 29
  • Committers: 1
  • Avg Commits per committer: 29.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
| :--- | :--- | ---: |
| Yvan Nollet | y****t@r****u | 86 |
| dependabot[bot] | 4****] | 1 |

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 1
  • Total pull requests: 99
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 months
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.98
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 97
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • nollety (1)
Pull Request Authors
  • dependabot[bot] (97)
  • nollety (2)
Top Labels
Issue Labels
enhancement (1)
Pull Request Labels
dependencies (97) python (78) github_actions (19)

Dependencies

pyproject.toml pypi
  • Pygments ^2.9.0 develop
  • black ^21.12b0 develop
  • coverage ^5.4 develop
  • darglint ^1.8.0 develop
  • flake8 ^3.9.2 develop
  • flake8-bandit ^2.1.2 develop
  • flake8-bugbear ^21.4.3 develop
  • flake8-docstrings ^1.6.0 develop
  • flake8-rst-docstrings ^0.2.3 develop
  • ipykernel ^5.5.5 develop
  • jupyterlab ^3.2.4 develop
  • matplotlib ^3.5.0 develop
  • mypy ^0.902 develop
  • pep8-naming ^0.11.1 develop
  • pre-commit ^2.13.0 develop
  • pre-commit-hooks ^4.0.1 develop
  • pytest ^6.2.4 develop
  • reorder-python-imports ^2.5.0 develop
  • safety ^1.10.3 develop
  • sphinx ^4.0.2 develop
  • sphinx-autobuild ^2021.3.14 develop
  • sphinx-click ^3.0.1 develop
  • sphinx-rtd-theme ^0.5.2 develop
  • typeguard ^2.12.1 develop
  • xdoctest ^0.15.4 develop
  • Pint ^0.17
  • click ^8.0.1
  • dask ^2021.11.2
  • h5netcdf ^0.11.0
  • netCDF4 ^1.5.7
  • numpy ^1.20.3
  • pandas ^1.2.4
  • python ^3.7.1
  • requests ^2.25.1
  • xarray ^0.18.2