xai_lightningprocesses

Source code belonging to the paper "Identifying Lightning Processes in ERA5 Soundings with Deep Learning"

https://github.com/noxthot/xai_lightningprocesses

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.0%) to scientific vocabulary

Repository

Basic Info

Host: GitHub
Owner: noxthot
License: mit
Language: Python
Default Branch: main
Size: 739 KB

Statistics

Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 3

Created over 3 years ago · Last pushed about 1 year ago

Identifying Lightning Processes in ERA5 Soundings with Deep Learning

This is the source code belonging to the paper of the same name [1].

Setup

While the code should be platform-independent, it was mainly tested using Ubuntu 20.04 and 22.04.

Paths

Processed data is found in Seafile/mlvapto/. Simply create a symbolic link (Ubuntu) or link (Windows) to that directory and name the link data. Raw data needs to be put into data_raw_aut (Austria) and data_raw_eu (EU).
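The link described above can also be created from Python, which behaves the same way on Ubuntu and Windows. This is only a minimal sketch: the helper name `link_data_dir` is hypothetical, and the location of the Seafile directory on your machine is an assumption you need to adapt.

```python
from pathlib import Path


def link_data_dir(target: Path, link: Path = Path("data")) -> Path:
    """Create a symbolic link `link` that points at the processed-data directory."""
    target = target.expanduser().resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"processed data directory not found: {target}")
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists; remove it first")
    # target_is_directory only matters on Windows, where directory links differ.
    link.symlink_to(target, target_is_directory=True)
    return link
```

For example, `link_data_dir(Path("~/Seafile/mlvapto"))` creates the link `data` in the current working directory, assuming Seafile syncs into your home directory. On Windows, creating symbolic links may require elevated privileges or developer mode.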

Python

PDM

Use pdm to install/sync all required modules: pdm sync

To run code / IPython / jupyter lab in this environment:

```bash
pdm run python

pdm run ipython

pdm run jupyter lab
```

To add a package: pdm add <PACKAGE_NAME>

Java

PySpark needs a specific OpenJDK version (openjdk-8).

To install on Ubuntu: sudo apt install openjdk-8-jdk

For other systems, follow this link.

Hadoop

Our code uses Spark, which makes use of Hadoop. The code also works without a local Hadoop installation, but it prints a warning. To install Hadoop under Ubuntu, simply download the latest stable Hadoop version here. Unpack the archive and add this line to your ~/.bashrc: export HADOOP_HOME="/path/to/your/unpacked/hadoop-x.x.x"
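To see whether the HADOOP_HOME setting is visible to Python before starting Spark, a small check along these lines may help. It is a sketch only: the function name `hadoop_home_status` is hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def hadoop_home_status(env=None) -> str:
    """Return a short message describing the HADOOP_HOME setting."""
    env = os.environ if env is None else env
    value = env.get("HADOOP_HOME")
    if not value:
        return "HADOOP_HOME is not set; Spark still runs but prints a warning."
    if not Path(value).is_dir():
        return f"HADOOP_HOME points to a missing directory: {value}"
    return f"HADOOP_HOME is set to {value}"


if __name__ == "__main__":
    print(hadoop_home_status())
```

Remember that changes to ~/.bashrc only take effect in shells opened afterwards.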

Usage

The provided code is quite memory-consuming and was executed on a workstation with 32GB RAM.

Data

How to retrieve the data is described in data-preprocessing/README_preprocessing.md.

Runnable Files (Neural Network)

The scripts should be run in the order in which they are listed below.

- etl.py: This data pipeline transforms the raw data (see previous subsection) into the format required for training, testing, and analysing with the following files.
- train.py: Trains a neural network on the transformed data.
- test.py: Evaluates the performance of the trained neural network on previously unseen test data. This file was used to compute the corresponding confusion matrix in Table 3.
- test_shap.py: Computes the Shapley values using the trained model on the test data.
- validation_scores.py: Computes the classification threshold such that the diurnal cycle is least biased on the validation data.
- analyse_shap_and_features.ipynb: Visualizes the SHAP and real values of the vertical profiles, distinguishing between true positives, false positives, and false negatives, and provides some plots regarding cloud top and bottom height. This file was used to generate Figures 1, 2, 3, and 4 of the paper.
- flash_case_study_final.ipynb: Visualizes network classifications at a specific time on a map of Austria. This file generates Figure 5 of the paper.
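The run order of the Python scripts listed above could be scripted roughly as follows. This is a sketch under assumptions: it presumes the scripts are started from the repository root without command-line arguments, and the names `PIPELINE`, `build_commands` and `run_pipeline` are hypothetical. The two notebooks are meant to be opened in jupyter lab and are left out.

```python
import subprocess

# Scripts in the order given in the list above; the notebooks are opened
# interactively in jupyter lab and are therefore not part of this runner.
PIPELINE = ["etl.py", "train.py", "test.py", "test_shap.py", "validation_scores.py"]


def build_commands(scripts=PIPELINE):
    """Return one `pdm run python <script>` command per script, in order."""
    return [["pdm", "run", "python", script] for script in scripts]


def run_pipeline(scripts=PIPELINE):
    """Run the scripts one after another, stopping at the first failure."""
    for command in build_commands(scripts):
        print("running:", " ".join(command))
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` stops at the first failing script, so a later step never runs on incomplete output from an earlier one.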

Runnable Files (Reference model):

- reference_model.R: Trains the reference model.
- reference_valpred.R: Stores the model output on the validation data (used for calculating the classification threshold later on).
- reference_test.py: Evaluates the trained reference model on previously unseen test data. This file was used to compute the corresponding confusion matrix in Table 3.
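The idea of choosing a classification threshold so that the diurnal cycle is least biased on the validation data can be illustrated with a toy example. The sketch below is not the code from validation_scores.py: it assumes the bias is measured as the sum over the hours of the day of the absolute difference between predicted and observed positive counts, and the function names `diurnal_bias` and `pick_threshold` are hypothetical.

```python
def diurnal_bias(probs, labels, hours, threshold):
    """Sum over hours of |predicted positives - observed positives| (assumed measure)."""
    predicted = [0] * 24
    observed = [0] * 24
    for p, y, h in zip(probs, labels, hours):
        predicted[h] += int(p >= threshold)
        observed[h] += int(y)
    return sum(abs(a - b) for a, b in zip(predicted, observed))


def pick_threshold(probs, labels, hours, candidates):
    """Return the candidate threshold with the smallest diurnal bias."""
    return min(candidates, key=lambda t: diurnal_bias(probs, labels, hours, t))


# Toy validation data: model output, true label, hour of day (0-23).
probs = [0.9, 0.6, 0.2, 0.8, 0.4, 0.1]
labels = [1, 0, 0, 1, 0, 0]
hours = [14, 14, 14, 15, 15, 15]
print(pick_threshold(probs, labels, hours, [0.3, 0.5, 0.7]))  # → 0.7
```

In this toy data, a threshold of 0.7 yields exactly one predicted positive in each of the two hours, matching the observed counts, so its bias is zero.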

Helper files

- ccc.py: Defines some global constants.
- stats.py and stats_flash.py: Used to calculate some of the constants defined in ccc.py.
- utils*.py: Helper functions containing various routines.

References

[1] Ehrensperger, G., Simon, T., Mayr, G. J., and Hell, T.: Identifying lightning processes in ERA5 soundings with deep learning, Geosci. Model Dev., 18, 1141–1153, https://doi.org/10.5194/gmd-18-1141-2025, 2025.

Owner

Name: GregorE
Login: noxthot
Kind: user
Location: Axams, Tirol

Website: https://ehrensperger.dev
Repositories: 2
Profile: https://github.com/noxthot

Loves to do full stack development in Julia and C++, script in bash and Python. Keen about data wrangling and science. Promoting DevOps culture.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Ehrensperger"
  given-names: "Gregor"
  orcid: "https://orcid.org/0000-0003-4816-0233"
- family-names: "Hell"
  given-names: "Tobias"
  orcid: "https://orcid.org/0000-0002-2841-3670"
- family-names: "Mayr"
  given-names: "Georg"
  orcid: "https://orcid.org/0000-0001-6661-9453"
- family-names: "Simon"
  given-names: "Thorsten"
  orcid: "https://orcid.org/0000-0002-3778-7738"
title: "xai_lightningprocesses"
version: 1.0
doi: 10.5281/zenodo.7321880
date-released: 2022-11-15
url: "https://github.com/noxthot/xai_lightningprocesses"
