tengen

Reference solar irradiance spectrum datasets manager

https://github.com/rayference/tengen

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.7%) to scientific vocabulary

Keywords

radiative-transfer reference-data solar-spectrum

Keywords from Contributors

mesh sequences interactive hacking network-simulation

Last synced: 10 months ago

Repository

Reference solar irradiance spectrum datasets manager

Basic Info

Host: GitHub
Owner: rayference
License: mit
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 514 KB

Statistics

Stars: 3
Watchers: 0
Forks: 0
Open Issues: 1
Releases: 2

Topics

radiative-transfer reference-data solar-spectrum

Created about 5 years ago · Last pushed 10 months ago

Metadata Files

Readme Changelog License Citation

Tengen

Reference solar irradiance spectrum datasets manager.


Aim

A number of reference solar irradiance spectra have been made available. However, their original formats vary and are often non-standard. The aim of this repository is to gather all of these reference solar irradiance spectra in one place, in a single well-defined standard format, and in a manner that supports data traceability.

Motivation

The need to organise and manage reference solar irradiance spectrum datasets originated in the development of the Eradiate radiative transfer model. Such a model takes a solar irradiance spectrum as input to a radiative transfer simulation. It usually does not work directly with the original data but instead stores the corresponding data in a specific format, which means the original data must be transformed. This comes with two challenges:

the transformation algorithm must not introduce any error
the data traceability must be preserved

which Tengen aims to address.

Install

After cloning the repository and navigating to the root directory, install the project with uv:

```shell
uv sync
```

Usage

Simply run the notebooks you are interested in.

Notebooks

The code that downloads and converts the raw data for each solar irradiance spectrum to a single format is stored in Jupyter notebooks, under notebooks/. The idea is to have one notebook per solar irradiance spectrum, or per group of spectra if they belong together, e.g. different observation time periods or different spectral resolutions associated with the same observation data.

For example, the thuillier_2003.ipynb notebook downloads the raw data for the well-known Thuillier (2003) reference solar irradiance spectrum and converts it to the unique format.

Run a notebook

To generate the dataset(s), run the corresponding notebook(s).

Run from the command line

You can run a notebook from the command line, using the nbconvert library. For example, the whi_2008.ipynb notebook is executed with:

```shell
jupyter nbconvert --to notebook --execute notebooks/whi_2008.ipynb
```

Run all notebooks with:

```shell
jupyter nbconvert --to notebook --execute notebooks/*.ipynb
```
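The same command can also be driven from Python, for instance to run the notebooks one at a time so that a failure points at a single notebook. The following is a minimal sketch, assuming `jupyter` is available on the PATH; the helper names `nbconvert_command` and `run_all` are hypothetical and not part of Tengen.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook):
    """Build the nbconvert command line that executes one notebook."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", str(notebook),
    ]


def run_all(directory="notebooks"):
    """Execute every notebook found in `directory`, one process each."""
    for notebook in sorted(Path(directory).glob("*.ipynb")):
        # check=True raises CalledProcessError if the notebook fails to run
        subprocess.run(nbconvert_command(notebook), check=True)
```

Calling `run_all()` from the repository root would then execute each notebook in alphabetical order and stop at the first one that fails.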

Write a notebook

Each notebook follows a template defined by notebooks/template.ipynb. A notebook is divided into four sections:

* a Setup section: this is where imports are made and global information about the dataset is set
* a Download section: this is where the function to download the raw data is implemented
* a Format section: this is where the function to format the raw data to the Tengen format is implemented
* a Run section: identical in all notebooks; executing the cells in this section downloads and formats the dataset(s) and saves them in temporary files or in the cache, depending on the value of UPDATE_CACHE.
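The four sections can be pictured as the following skeleton. It sketches the structure only and is not the actual template: `DATA_URL`, `download`, `format_raw`, `run` and the two-column text layout of the raw data are all hypothetical.

```python
# --- Setup: imports and global information about the dataset ---
from datetime import datetime, timezone

UPDATE_CACHE = False  # flag named in this README; see the Cache section
DATA_URL = "https://example.org/spectrum.txt"  # hypothetical placeholder URL


# --- Download: fetch the raw data ---
def download(url, fetch):
    """Return the raw text and the time of download.

    `fetch` is any callable mapping a URL to text, so that this sketch
    can be run without network access.
    """
    return fetch(url), datetime.now(timezone.utc)


# --- Format: convert the raw data to the target layout ---
def format_raw(raw):
    """Parse hypothetical 'wavelength irradiance' lines into two lists."""
    w, ssi = [], []
    for line in raw.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        wavelength, irradiance = line.split()
        w.append(float(wavelength))
        ssi.append(float(irradiance))
    return {"w": w, "ssi": ssi}


# --- Run: download, then format, keeping the traceability metadata ---
def run(fetch):
    raw, downloaded_at = download(DATA_URL, fetch)
    dataset = format_raw(raw)
    dataset["data_url"] = DATA_URL
    dataset["data_url_datetime"] = downloaded_at.isoformat()
    return dataset
```

Passing the fetch function as an argument is a choice made for this sketch; a real notebook would more likely call an HTTP client directly in its Download section.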

To write a new notebook, begin by copying the template, then modify it to provide the required information and implement the required functions. If in doubt, use the existing notebooks as examples.

Before pushing your notebook to the repository, make sure to run nbstripout on it to remove cell outputs:

```shell
nbstripout notebooks/your_notebook.ipynb
```

Dataset format and schema

Every notebook produces solar irradiance spectrum datasets with the same format and schema, which are described here.

Format

Datasets comply with the netCDF format.

Schema

Variables

The dataset contains one data variable:

* the solar spectral irradiance, denoted ssi.

The solar spectral irradiance has two dimensions:

* a time dimension, denoted t,
* a wavelength dimension, denoted w.

The time dimension refers to the time at which the solar spectral irradiance was observed. Associated with these two dimensions are two coordinate variables, denoted t and w, respectively.
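To make the layout concrete, the sketch below represents a dataset as plain Python data and checks it against the schema described above. It is independent of any netCDF library; the function name `check_schema`, the dictionary layout and the sample values are hypothetical and purely illustrative.

```python
def check_schema(dataset):
    """Check that `dataset` has coordinates t and w and a variable ssi(t, w).

    `dataset` is a plain dict: "t" and "w" are lists of coordinate values,
    and "ssi" is a list with one row per time, each row holding one value
    per wavelength.
    """
    for name in ("t", "w", "ssi"):
        if name not in dataset:
            raise ValueError(f"missing variable: {name}")
    t, w, ssi = dataset["t"], dataset["w"], dataset["ssi"]
    if len(ssi) != len(t):
        raise ValueError("ssi must have one row per time value")
    if any(len(row) != len(w) for row in ssi):
        raise ValueError("each ssi row must have one value per wavelength")
    return True


example = {
    "t": ["2008-04-10"],         # one observation time (illustrative)
    "w": [500.0, 501.0, 502.0],  # three wavelengths (illustrative)
    "ssi": [[1.9, 1.9, 1.8]],    # shape (t, w) = (1, 3) (illustrative)
}
check_schema(example)
```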

Metadata

Dataset metadata comply with the NetCDF Climate and Forecast (CF) Metadata Conventions.

The following dataset metadata are set:

title: the title of the dataset
institution: the institution where the original data was produced
source: the method of production of the original data
history: the history of transformations that the original data has undergone
references: the publications or web-based references that describe the original data and/or the methods used to produce it
data_url: the URL from which the original data was downloaded
data_url_datetime: the date and time at which the original data was downloaded
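As an illustration, a small helper can report which of the attributes listed above are missing from a dataset's metadata. This is a sketch only: the attribute names come from the list above, while the helper `missing_attributes` and the placeholder values are hypothetical.

```python
REQUIRED_ATTRIBUTES = (
    "title",
    "institution",
    "source",
    "history",
    "references",
    "data_url",
    "data_url_datetime",
)


def missing_attributes(attrs):
    """Return the required attribute names that are absent or empty in `attrs`."""
    return [name for name in REQUIRED_ATTRIBUTES if not attrs.get(name)]


# Placeholder values only; they do not describe a real dataset.
attrs = {
    "title": "Example spectrum",
    "institution": "Example institution",
    "source": "Example production method",
    "data_url": "https://example.org/spectrum.txt",
}
print(missing_attributes(attrs))
# prints ['history', 'references', 'data_url_datetime']
```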

Traceability

Data traceability means that one is able to track all the transformations that a dataset has undergone from its original form to its current form. Tengen cannot guarantee data traceability but it strives to provide the means to achieve it. When a notebook is run, the original data is downloaded and converted, i.e., transformed, to the Tengen format.

To preserve the traceability of the data, the following information is stored in the dataset metadata:

the date and time at which the dataset was created, including the corresponding Tengen version (history)
the original data URL (data_url)
the date and time at which the original data was downloaded (data_url_datetime)

The history attribute creates a link between the transformed data and the transformation algorithms (this repository), whereas the data_url and data_url_datetime attributes create a link between the original data and the transformed data.

If any one of these two links is broken, the traceability of the data is lost.

Since the existence and accessibility of the original data cannot be guaranteed, data that was downloaded from a URL may no longer be available at a later date.

This is the reason why Tengen cannot guarantee data traceability. This is also the reason why a cache system is provided.
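The two links described above can be checked from the metadata alone. The sketch below is illustrative: the exact wording of a history entry is an assumption, and `history_entry`, `traceability_links` and `is_traceable` are hypothetical helpers, not Tengen functions. It only verifies that the attributes are present, not that the original URL is still reachable.

```python
from datetime import datetime, timezone


def history_entry(created_at, tengen_version):
    """Format one history line: creation time plus the Tengen version used.

    The wording of the line is an assumption made for this sketch.
    """
    return (
        f"{created_at.isoformat()} - dataset created by tengen, "
        f"version {tengen_version}"
    )


def traceability_links(attrs):
    """Report which of the two traceability links the metadata supports."""
    return {
        # link between the transformed data and the transformation algorithms
        "algorithm": bool(attrs.get("history")),
        # link between the original data and the transformed data
        "original_data": bool(attrs.get("data_url"))
        and bool(attrs.get("data_url_datetime")),
    }


def is_traceable(attrs):
    """True only if both links are present in the metadata."""
    return all(traceability_links(attrs).values())


# Placeholder values only; they do not describe a real dataset.
created = datetime(2023, 1, 17, 12, 0, tzinfo=timezone.utc)
attrs = {
    "history": history_entry(created, "v23.2.0"),
    "data_url": "https://example.org/spectrum.txt",
}
print(traceability_links(attrs))
# prints {'algorithm': True, 'original_data': False}
```

In this example the second link is reported as missing because data_url_datetime was not set.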

Cache

Tengen manages a cache that stores the original (raw) and formatted data. By default, running a notebook does not populate the cache. To populate it, modify the following line in the Setup section of a notebook:

```python
UPDATE_CACHE = False  # change to True to update the cache when running this notebook
```

and change the value to True as indicated in the comment.
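The effect of the flag can be pictured with the sketch below, which chooses an output location depending on its value. The cache location, the file name and the helper `output_path` are assumptions made for illustration; this README does not describe how the cache is laid out on disk.

```python
import tempfile
from pathlib import Path


def output_path(filename, update_cache, cache_dir):
    """Choose where a dataset file is written.

    With update_cache=True the file goes to the cache directory; otherwise
    it goes to a fresh temporary directory and the cache is left untouched.
    """
    if update_cache:
        directory = Path(cache_dir)
        directory.mkdir(parents=True, exist_ok=True)
    else:
        directory = Path(tempfile.mkdtemp(prefix="tengen-"))
    return directory / filename
```

For example, `output_path("spectrum.nc", False, "cache")` returns a path inside a new temporary directory, whereas passing `True` returns `cache/spectrum.nc` and creates the `cache` directory if needed.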

Owner

Name: Rayference
Login: rayference
Kind: organization
Location: Belgium

Repositories: 4
Profile: https://github.com/rayference

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Tengen
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Yvan
    family-names: Nollet
    affiliation: Rayference
    orcid: 'https://orcid.org/0000-0002-6241-444X'
identifiers:
  - type: url
    value: 'https://github.com/nollety/tengen/releases/tag/v23.2.0'
    description: The GitHub release URL of tag v23.2.0
repository-code: 'https://github.com/nollety/tengen'
abstract: Reference solar irradiance spectrum datasets manager.
keywords:
  - solar-spectrum
  - radiative-transfer
  - reference-data
license: MIT
commit: b518e4c2da0b4bb0c0d402efbed4badedc20f91c
version: v23.2.0
date-released: '2023-01-17'

GitHub Events

Total

Push event: 1

Last Year

Push event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 87
Total Committers: 2
Avg Commits per committer: 43.5
Development Distribution Score (DDS): 0.011

Past Year

Commits: 29
Committers: 1
Avg Commits per committer: 29.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Yvan Nollet	y**t@r**u	86
dependabot[bot]	4****]	1

Committer Domains (Top 20 + Academic)

rayference.eu: 1

Issues and Pull Requests

Last synced: over 2 years ago

All Time

Total issues: 1
Total pull requests: 99
Average time to close issues: N/A
Average time to close pull requests: about 2 months
Total issue authors: 1
Total pull request authors: 2
Average comments per issue: 0.0
Average comments per pull request: 0.98
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 97

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

nollety (1)

Pull Request Authors

dependabot[bot] (97)
nollety (2)

Top Labels

Issue Labels

enhancement (1)

Pull Request Labels

dependencies (97) python (78) github_actions (19)

Dependencies

pyproject.toml pypi

Pygments ^2.9.0 develop
black ^21.12b0 develop
coverage ^5.4 develop
darglint ^1.8.0 develop
flake8 ^3.9.2 develop
flake8-bandit ^2.1.2 develop
flake8-bugbear ^21.4.3 develop
flake8-docstrings ^1.6.0 develop
flake8-rst-docstrings ^0.2.3 develop
ipykernel ^5.5.5 develop
jupyterlab ^3.2.4 develop
matplotlib ^3.5.0 develop
mypy ^0.902 develop
pep8-naming ^0.11.1 develop
pre-commit ^2.13.0 develop
pre-commit-hooks ^4.0.1 develop
pytest ^6.2.4 develop
reorder-python-imports ^2.5.0 develop
safety ^1.10.3 develop
sphinx ^4.0.2 develop
sphinx-autobuild ^2021.3.14 develop
sphinx-click ^3.0.1 develop
sphinx-rtd-theme ^0.5.2 develop
typeguard ^2.12.1 develop
xdoctest ^0.15.4 develop
Pint ^0.17
click ^8.0.1
dask ^2021.11.2
h5netcdf ^0.11.0
netCDF4 ^1.5.7
numpy ^1.20.3
pandas ^1.2.4
python ^3.7.1
requests ^2.25.1
xarray ^0.18.2
