Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.7%) to scientific vocabulary
Repository
Reference solar irradiance spectrum datasets manager
Statistics
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 1
- Releases: 2
Metadata Files
README.md
Tengen
Reference solar irradiance spectrum datasets manager.
Aim
A number of reference solar irradiance spectra have been made available. However, their original formats vary and are often non-standard. The aim of this repository is to gather all of these reference solar irradiance spectra in one place, in a single well-defined standard format, and in a manner that supports data traceability.
Motivation
The need to organise and manage reference solar irradiance spectrum datasets originated in the development of the Eradiate radiative transfer model. Such a radiative transfer model takes a solar irradiance spectrum as input to a radiative transfer simulation. The model usually does not work directly with the original data but instead stores it in a specific format, which requires transforming the original data. This transformation comes with two challenges, which Tengen aims to address:
- the transformation algorithm must not introduce any error
- the data traceability must be preserved
Install
After cloning the repository and navigating to the root directory, install the project with uv:
```shell
uv sync
```
Usage
Simply run the notebooks you are interested in.
Notebooks
The work of downloading the raw data for each solar irradiance spectrum and
converting it to a common format is implemented in Jupyter notebooks, under
notebooks/.
The idea is to have one notebook per solar irradiance spectrum, or per group
of spectra that belong together, e.g. different observation time periods or
different spectral resolutions associated with the same observation data.
For example, the thuillier_2003.ipynb notebook downloads the raw data for
the well-known Thuillier (2003) reference solar irradiance spectrum and
converts it to the common format.
Run a notebook
To generate the dataset(s), run the corresponding notebook(s).
Run from the command line
You can run a notebook from the command line, using the nbconvert library.
For example, the whi_2008.ipynb notebook is executed with:
```shell
jupyter nbconvert --to notebook --execute notebooks/whi_2008.ipynb
```
Run all notebooks with:
```shell
jupyter nbconvert --to notebook --execute notebooks/*.ipynb
```
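The same command can also be driven from Python, for instance to run the notebooks one by one and report which ones failed. The sketch below is only an illustration and is not part of Tengen: the helper names `nbconvert_command` and `run_notebooks` are hypothetical, and only the `jupyter nbconvert --to notebook --execute` invocation is taken from the commands above.

```python
import subprocess
from pathlib import Path
from typing import List


def nbconvert_command(notebook: Path) -> List[str]:
    """Build the nbconvert command line shown above for one notebook."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", str(notebook)]


def run_notebooks(directory: Path) -> List[Path]:
    """Execute every notebook in `directory` and return those that failed."""
    failed = []
    for notebook in sorted(directory.glob("*.ipynb")):
        # check=False: keep going so one failing notebook does not hide the others
        result = subprocess.run(nbconvert_command(notebook), check=False)
        if result.returncode != 0:
            failed.append(notebook)
    return failed
```

Calling `run_notebooks(Path("notebooks"))` from the repository root would then execute each notebook in turn and return the list of notebooks whose execution did not succeed.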
Write a notebook
Each notebook follows a template defined by notebooks/template.ipynb. A notebook is divided into four sections:
* a Setup section: this is where imports are made and global information about the dataset is set
* a Download section: this is where the function to download the raw data is implemented
* a Format section: this is where the function to format the raw data to the Tengen format is implemented
* a Run section: identical in all notebooks; executing the cells in this section downloads and formats the dataset(s) and saves them in temporary files or in the cache, depending on the value of UPDATE_CACHE.
To write a new notebook, begin by copying the template, then modify it to provide the required information and method implementations. When in doubt, use the existing notebooks as examples.
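To make the four sections more concrete, here is a rough sketch of how they could map onto Python code. It is not the actual template: the function names `download`, `format_raw` and `run`, the placeholder URL and the two-column raw file layout are all assumptions made for illustration. Only the section names and the UPDATE_CACHE flag come from this README.

```python
import tempfile
import urllib.request
from pathlib import Path

# --- Setup: imports and global information about the dataset ---
UPDATE_CACHE = False  # flag described in this README
DATA_URL = "https://example.org/spectrum.txt"  # placeholder, not a real data source


# --- Download: fetch the raw data ---
def download(url: str, destination: Path) -> Path:
    """Download the raw file at `url` to `destination` and return its path."""
    urllib.request.urlretrieve(url, str(destination))
    return destination


# --- Format: convert the raw data (to the Tengen format in a real notebook) ---
def format_raw(raw_file: Path) -> dict:
    """Parse an assumed two-column text file (wavelength, irradiance)."""
    w, ssi = [], []
    for line in raw_file.read_text().splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        wavelength, irradiance = stripped.split()[:2]
        w.append(float(wavelength))
        ssi.append(float(irradiance))
    return {"w": w, "ssi": ssi}


# --- Run: download, then format ---
def run(url: str = DATA_URL) -> dict:
    """Download the raw data to a temporary file and format it."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_file = download(url, Path(tmp) / "raw.txt")
        return format_raw(raw_file)
```

A real notebook would return a netCDF-ready dataset from the Format step rather than a plain dictionary, and would honour UPDATE_CACHE when saving the result.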
Before pushing your notebook to the repository, make sure to run nbstripout
on it to remove cell outputs:
```shell
nbstripout notebooks/your_notebook.ipynb
```
Dataset format and schema
Every notebook produces solar irradiance spectrum datasets with the same format and schema, which are described here.
Format
Datasets comply with the netCDF format.
Schema
Variables
The dataset contains one data variable:
* the solar spectral irradiance, denoted ssi.
The solar spectral irradiance has two dimensions:
* a time dimension, denoted t,
* a wavelength dimension, denoted w.
The time dimension refers to the time at which the solar spectral irradiance was observed.
Associated with these two dimensions are two coordinate variables, also denoted t and w, respectively.
| Symbol | Long name | Standard name | Units |
| :----: | :-------------------------: | :------------------------------------: | :-------------: |
| ssi | solar spectral irradiance | solar_irradiance_per_unit_wavelength | W / m^2 / nm |
| w | wavelength | radiation_wavelength | nm |
| t | time | time | days |
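As an illustration of this schema, the sketch below assembles a plain nested dictionary describing one ssi(t, w) dataset. The layout (coords, data_vars, dims, data, attrs) mimics the one accepted by `xarray.Dataset.from_dict`, but whether Tengen builds its datasets this way is an assumption, and the helper name `make_dataset_dict` is hypothetical. The attribute values are copied from the table above.

```python
# Attributes of each variable, copied from the table above.
VARIABLE_ATTRS = {
    "ssi": {
        "long_name": "solar spectral irradiance",
        "standard_name": "solar_irradiance_per_unit_wavelength",
        "units": "W / m^2 / nm",
    },
    "w": {"long_name": "wavelength", "standard_name": "radiation_wavelength", "units": "nm"},
    "t": {"long_name": "time", "standard_name": "time", "units": "days"},
}


def make_dataset_dict(t, w, ssi):
    """Describe ssi(t, w) as a nested dictionary; `ssi` is a list of rows, one per time."""
    if len(ssi) != len(t) or any(len(row) != len(w) for row in ssi):
        raise ValueError("ssi must have shape (len(t), len(w))")
    return {
        "coords": {
            "t": {"dims": ("t",), "data": list(t), "attrs": dict(VARIABLE_ATTRS["t"])},
            "w": {"dims": ("w",), "data": list(w), "attrs": dict(VARIABLE_ATTRS["w"])},
        },
        "data_vars": {
            "ssi": {
                "dims": ("t", "w"),
                "data": [list(row) for row in ssi],
                "attrs": dict(VARIABLE_ATTRS["ssi"]),
            },
        },
    }
```

For example, `make_dataset_dict([0.0], [200.0, 201.0], [[0.01, 0.02]])` describes a single observation time with two wavelengths.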
Metadata
Dataset metadata comply with the NetCDF Climate and Forecast (CF) Metadata Conventions.
The following dataset metadata are set:
* title: the title of the dataset
* institution: the institution where the original data was produced
* source: the method of production of the original data
* history: the history of transformations that the original data has undergone
* references: the publications or web-based references that describe the original data and/or the methods used to produce it
* data_url: the URL from which the original data was downloaded
* data_url_datetime: the date and time at which the original data was downloaded
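A simple way to check that a dataset carries all of these attributes is sketched below. The attribute names are the ones listed above; the helper `missing_attributes` is hypothetical and not part of Tengen.

```python
# Global attributes listed in this README.
REQUIRED_ATTRS = (
    "title",
    "institution",
    "source",
    "history",
    "references",
    "data_url",
    "data_url_datetime",
)


def missing_attributes(attrs: dict) -> list:
    """Return the required attribute names that are absent or empty in `attrs`."""
    return [name for name in REQUIRED_ATTRS if not attrs.get(name)]
```

With an xarray dataset `ds`, the dictionary to check would be `ds.attrs`; an empty result means that every attribute is set.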
Traceability
Data traceability means that one is able to track all the transformations that a dataset has undergone from its original form to its current form. Tengen cannot guarantee data traceability, but it strives to provide the means to achieve it. When a notebook is run, the original data is downloaded and converted, i.e., transformed, to the Tengen format.

To preserve the traceability of the data, the following information is stored in the dataset metadata:
- the date and time at which the dataset was created, including the
  corresponding Tengen version (history)
- the original data URL (data_url)
- the date and time at which the original data was downloaded
  (data_url_datetime)

The history attribute creates a link between the transformed data and the
transformation algorithms (this repository), whereas the data_url and
data_url_datetime attributes create a link between the original data and the
transformed data.

If either of these two links is broken, the traceability of the data is lost.
Since the existence and accessibility of the original data cannot be guaranteed, data that was downloaded from a URL may not be available anymore at a later date.

This is the reason why Tengen cannot guarantee data traceability. This is also the reason why a cache system is provided.
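The three traceability attributes could be assembled as in the sketch below. The exact wording of the history entry is an assumption (this README only states that it records the creation date and time and the Tengen version), and the helper name `traceability_attributes` is hypothetical.

```python
from datetime import datetime, timezone


def traceability_attributes(data_url, downloaded_at, tengen_version, created_at=None):
    """Build the history, data_url and data_url_datetime attributes."""
    if created_at is None:
        created_at = datetime.now(timezone.utc)
    return {
        # Assumed layout: creation timestamp followed by the Tengen version.
        "history": f"{created_at.isoformat()} - data set creation - tengen, version {tengen_version}",
        "data_url": data_url,
        "data_url_datetime": downloaded_at.isoformat(),
    }
```

Using timezone-aware timestamps in ISO 8601 form keeps both dates unambiguous when the dataset is read back later.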
Cache
Tengen manages a cache that stores the original (raw) and formatted data. By default, running a notebook does not populate the cache. To populate it, modify the following line in the Setup section of a notebook:
```python
UPDATE_CACHE = False  # change to True to update the cache when running this notebook
```
and change the value to True as indicated in the comment.
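The effect of this flag can be pictured with the following sketch, which picks the directory where the data is written. This is not Tengen's actual implementation: the function name `output_directory` and the way the cache location is passed in are assumptions. Only the behaviour (temporary files by default, cache when UPDATE_CACHE is True) comes from this README.

```python
import tempfile
from pathlib import Path


def output_directory(update_cache: bool, cache_dir: Path) -> Path:
    """Return the cache directory if `update_cache` is set, else a fresh temporary directory."""
    if update_cache:
        # Create the cache directory on first use.
        cache_dir.mkdir(parents=True, exist_ok=True)
        return cache_dir
    return Path(tempfile.mkdtemp(prefix="tengen-"))
```

Keeping the default at False means that simply running a notebook never overwrites previously cached raw data, which matters when the original URL is no longer available.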
Owner
- Name: Rayference
- Login: rayference
- Kind: organization
- Location: Belgium
- Repositories: 4
- Profile: https://github.com/rayference
Citation (CITATION.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Tengen
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Yvan
    family-names: Nollet
    affiliation: Rayference
    orcid: 'https://orcid.org/0000-0002-6241-444X'
identifiers:
  - type: url
    value: 'https://github.com/nollety/tengen/releases/tag/v23.2.0'
    description: The GitHub release URL of tag v23.2.0
repository-code: 'https://github.com/nollety/tengen'
abstract: Reference solar irradiance spectrum datasets manager.
keywords:
  - solar-spectrum
  - radiative-transfer
  - reference-data
license: MIT
commit: b518e4c2da0b4bb0c0d402efbed4badedc20f91c
version: v23.2.0
date-released: '2023-01-17'
```
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yvan Nollet | y****t@r****u | 86 |
| dependabot[bot] | 4****] | 1 |
Issues and Pull Requests
Last synced: about 2 years ago
All Time
- Total issues: 1
- Total pull requests: 99
- Average time to close issues: N/A
- Average time to close pull requests: about 2 months
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.98
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 97
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- nollety (1)
Pull Request Authors
- dependabot[bot] (97)
- nollety (2)
Dependencies
- Pygments ^2.9.0 develop
- black ^21.12b0 develop
- coverage ^5.4 develop
- darglint ^1.8.0 develop
- flake8 ^3.9.2 develop
- flake8-bandit ^2.1.2 develop
- flake8-bugbear ^21.4.3 develop
- flake8-docstrings ^1.6.0 develop
- flake8-rst-docstrings ^0.2.3 develop
- ipykernel ^5.5.5 develop
- jupyterlab ^3.2.4 develop
- matplotlib ^3.5.0 develop
- mypy ^0.902 develop
- pep8-naming ^0.11.1 develop
- pre-commit ^2.13.0 develop
- pre-commit-hooks ^4.0.1 develop
- pytest ^6.2.4 develop
- reorder-python-imports ^2.5.0 develop
- safety ^1.10.3 develop
- sphinx ^4.0.2 develop
- sphinx-autobuild ^2021.3.14 develop
- sphinx-click ^3.0.1 develop
- sphinx-rtd-theme ^0.5.2 develop
- typeguard ^2.12.1 develop
- xdoctest ^0.15.4 develop
- Pint ^0.17
- click ^8.0.1
- dask ^2021.11.2
- h5netcdf ^0.11.0
- netCDF4 ^1.5.7
- numpy ^1.20.3
- pandas ^1.2.4
- python ^3.7.1
- requests ^2.25.1
- xarray ^0.18.2