diive

Time series processing library

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to: wiley.com, zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary

Keywords

analyses ch4-flux co2-flux data-correction ecosystem-fluxes eddy-covariance eddypro fluxnet gap-filling h2o-flux jupyter-notebooks n2o-flux outlier-detection plotting post-processing quality-screening random-forest-regression time-series time-series-analysis xgboost-regression

Last synced: 6 months ago

Repository

Time series processing library

Basic Info

Host: GitHub
Owner: holukas
License: gpl-3.0
Language: Python
Default Branch: main
Homepage: https://www.swissfluxnet.ethz.ch/
Size: 837 MB

Statistics

Stars: 13
Watchers: 2
Forks: 2
Open Issues: 71
Releases: 50

Created over 2 years ago · Last pushed 6 months ago

Metadata Files

Readme Changelog License Citation

README.md

diive is currently under active development with frequent updates.

Time series data processing

diive is a Python library for time series processing, in particular of ecosystem data. It was originally developed by the ETH Grassland Sciences group for Swiss FluxNet.

Recent updates: CHANGELOG
Recent releases: Releases

Overview of example notebooks

For many examples, see the notebooks here: Notebook overview
More notebooks are added regularly.

Current Features

Analyses

Daily correlation: calculate daily correlation between two time series (notebook example)
Decoupling: investigate binned aggregates (median) of a variable z in binned classes of x and y (notebook example)
Data gaps identification: (notebook example)
Grid aggregator: calculate z-aggregates in bins (classes) of x and y (notebook example)
Histogram calculation: calculate histogram from Series (notebook example)
Optimum range: find x range for optimum y
Percentiles: calculate percentiles 0-100 for a series (notebook example)
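As a rough illustration of the daily correlation idea, the sketch below computes a Pearson correlation per calendar day with plain pandas. It is not diive's implementation; the function name `daily_correlation` and the `min_records` cutoff are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def daily_correlation(s1: pd.Series, s2: pd.Series, min_records: int = 10) -> pd.Series:
    """Pearson correlation between s1 and s2, computed separately for each calendar day.

    Hypothetical helper (not part of diive): days with fewer than
    `min_records` paired, non-missing values return NaN.
    """
    pairs = pd.DataFrame({"a": s1, "b": s2}).dropna()
    by_day = pairs.groupby(pairs.index.date)
    corr = by_day.apply(
        lambda g: g["a"].corr(g["b"]) if len(g) >= min_records else np.nan
    )
    corr.index = pd.to_datetime(corr.index)  # date objects -> DatetimeIndex
    return corr


# Usage: two days of half-hourly data
idx = pd.date_range("2024-06-01", periods=96, freq="30min")
x = pd.Series(np.arange(96, dtype=float), index=idx)
y = 2 * x + 1
print(daily_correlation(x, y))  # both days: r close to 1.0
```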

Corrections

Offset correction for measurement: correct measurement by offset in comparison to replicate (notebook example)
Offset correction radiation: correct nighttime offset of radiation data and set nighttime to zero
Offset correction relative humidity: correct RH values > 100%
Offset correction wind direction: correct wind directions by offset, calculated based on reference time period (notebook example)
Set to threshold: set values above or below a threshold to the threshold value
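The RH, threshold, and wind direction corrections above largely come down to clipping and modular arithmetic. The sketch below shows this with pandas under assumed, hypothetical function names; it is not diive's API, and the wind direction offset is passed in directly rather than calculated from a reference time period.

```python
import numpy as np
import pandas as pd


def correct_rh(rh: pd.Series) -> pd.Series:
    """Set relative humidity values above 100% to 100% (NaN stays NaN)."""
    return rh.clip(upper=100)


def set_to_threshold(s: pd.Series, threshold: float, mode: str = "above") -> pd.Series:
    """Replace values above (or below) `threshold` with the threshold value."""
    if mode == "above":
        return s.clip(upper=threshold)
    if mode == "below":
        return s.clip(lower=threshold)
    raise ValueError("mode must be 'above' or 'below'")


def correct_wind_direction(wd: pd.Series, offset: float) -> pd.Series:
    """Add a constant offset (degrees) and wrap the result back into [0, 360)."""
    return (wd + offset) % 360


print(correct_rh(pd.Series([95.0, 101.5, np.nan])).tolist())
print(correct_wind_direction(pd.Series([350.0, 10.0]), 20.0).tolist())  # [10.0, 30.0]
```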

Create variable

Functions to create various variables.

Time since: calculate time since last occurrence, e.g. since last precipitation (notebook example)
Daytime/nighttime flag: calculate daytime flag, nighttime flag and potential radiation from latitude and longitude (notebook example)
Vapor pressure deficit: calculate VPD from air temperature and RH (notebook example)
Calculate ET from LE: calculate evapotranspiration from latent heat flux (notebook example)
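For orientation, the sketch below computes VPD with the Magnus-type saturation vapor pressure equation in its FAO-56 form and converts LE to ET with a constant latent heat of vaporization of 2.45 MJ kg-1. Both formula choices are assumptions; diive may use different coefficients or a temperature-dependent latent heat.

```python
import numpy as np


def vpd_kpa(ta_degc, rh_percent):
    """Vapor pressure deficit (kPa) from air temperature (degC) and RH (%).

    Saturation vapor pressure uses the FAO-56 Magnus-type equation
    (assumed here; not necessarily the formula used in diive).
    """
    es = 0.6108 * np.exp(17.27 * ta_degc / (ta_degc + 237.3))  # kPa
    ea = es * rh_percent / 100.0  # actual vapor pressure, kPa
    return es - ea


def et_from_le(le_wm2, seconds=1800, lambda_jkg=2.45e6):
    """Evapotranspiration (mm per interval) from latent heat flux (W m-2).

    LE [J s-1 m-2] * seconds / lambda [J kg-1] = kg m-2 = mm of water.
    Default interval: 30 min; lambda assumed constant at 2.45 MJ kg-1.
    """
    return le_wm2 * seconds / lambda_jkg


print(vpd_kpa(20.0, 50.0))  # about 1.17 kPa
print(et_from_le(245.0, seconds=3600))  # 0.36 mm per hour
```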

Eddy covariance high-resolution

Flux detection limit: calculate flux detection limit from high-resolution data (20 Hz)
Maximum covariance: find maximum covariance between turbulent wind and scalar
Turbulence: wind rotation to calculate turbulent departures of wind components and scalar (e.g. CO2)
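A minimal NumPy sketch of two of the ideas above: turbulent departures from a block average (Reynolds decomposition) and a lag search for the maximum absolute covariance between vertical wind w and a scalar c. Assumptions: simple block-mean removal (no detrending and no coordinate rotation), lags counted in records (at 20 Hz one record is 0.05 s), and hypothetical function names; diive's own implementation may differ.

```python
import numpy as np


def turbulent_departures(x: np.ndarray) -> np.ndarray:
    """Reynolds decomposition: fluctuation x' = x - block mean of x."""
    return x - np.mean(x)


def max_covariance_lag(w: np.ndarray, c: np.ndarray, max_lag: int):
    """Search lags -max_lag..max_lag (in records) for the largest |cov(w', c')|.

    A positive lag pairs w at time t with c at time t + lag,
    i.e. the scalar signal arrives `lag` records after the wind signal.
    """
    wp = turbulent_departures(np.asarray(w, dtype=float))
    cp = turbulent_departures(np.asarray(c, dtype=float))
    n = len(wp)
    best_lag, best_cov = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = wp[: n - lag], cp[lag:]
        else:
            a, b = wp[-lag:], cp[: n + lag]
        cov = float(np.mean(a * b))
        if abs(cov) > abs(best_cov):
            best_lag, best_cov = lag, cov
    return best_lag, best_cov


# Usage with synthetic 20 Hz data: the scalar lags the wind by 5 records (0.25 s)
rng = np.random.default_rng(0)
w = rng.normal(size=2000)
c = 0.8 * np.roll(w, 5) + rng.normal(scale=0.1, size=2000)
print(max_covariance_lag(w, c, max_lag=20))  # lag of 5 with a positive covariance
```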

Files

Input/output functions.

Detect files: detect expected and unexpected (irregular) files in a list of files
Split files: split multiple files into smaller parts and export them as (compressed) CSV files
Read single data files: read file using parameters (notebook example)
Read single data files: read file using pre-defined filetypes (notebook example)
Read multiple data files: read files using pre-defined filetype (notebook example)
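One way to detect expected and unexpected files, sketched with pandas: build the set of filenames expected from a date range and a naming pattern, then compare it with the files actually found. The naming pattern `data_%Y%m%d.csv` and the function `detect_files` are hypothetical; diive's file detection may work differently.

```python
import pandas as pd


def detect_files(found: list, start: str, end: str, freq: str, name_format: str) -> dict:
    """Compare found filenames against those expected from a date range and a naming format."""
    expected = {ts.strftime(name_format) for ts in pd.date_range(start, end, freq=freq)}
    found_set = set(found)
    return {
        "missing": sorted(expected - found_set),  # expected but not found
        "unexpected": sorted(found_set - expected),  # found but irregular
    }


found = ["data_20240601.csv", "data_20240603.csv", "notes.txt"]
print(detect_files(found, "2024-06-01", "2024-06-03", "D", "data_%Y%m%d.csv"))
# {'missing': ['data_20240602.csv'], 'unexpected': ['notes.txt']}
```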

Fits

Bin fitter: (notebook example)

Flux

Specific analyses of eddy covariance flux data.

USTAR threshold scenarios: display data availability under different USTAR threshold scenarios
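Data availability under different USTAR threshold scenarios can be sketched as counting how many flux records survive each threshold, where records with USTAR below the threshold are discarded. The scenario names and threshold values below are made-up examples, and the function is not diive's API.

```python
import numpy as np
import pandas as pd


def ustar_scenario_availability(flux: pd.Series, ustar: pd.Series, thresholds: dict) -> pd.DataFrame:
    """Records kept per USTAR threshold scenario (kept = flux available and USTAR >= threshold)."""
    available = flux.notna()
    rows = {}
    for scenario, thr in thresholds.items():
        kept = available & (ustar >= thr)  # NaN USTAR compares as False -> rejected
        rows[scenario] = {
            "threshold": thr,
            "records_kept": int(kept.sum()),
            "fraction_kept": float(kept.mean()),  # relative to all records
        }
    return pd.DataFrame.from_dict(rows, orient="index")


flux = pd.Series([1.0, 2.0, np.nan, 4.0])
ustar = pd.Series([0.05, 0.2, 0.3, 0.4])
scenarios = {"low": 0.1, "high": 0.35}  # hypothetical scenario names and thresholds (m s-1)
print(ustar_scenario_availability(flux, ustar, scenarios))
```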

Flux processing chain

Post-processing of eddy covariance flux data. For info about the Swiss FluxNet flux levels, see here.

Flux processing chain (notebook example)
- The notebook example shows the application of:
  - Level-2: quality flags
  - Level-3.1: storage correction
  - Level-3.2: outlier removal
  - Level-3.3: USTAR filtering using constant thresholds
  - Level-4.1: gap-filling using long-term random forest
Quick flux processing chain (notebook example)

Formats

Format data to specific formats.

Format: convert EddyPro fluxnet output files for upload to the FLUXNET database (notebook example)
Parquet files: load and save parquet files (notebook example)

Gap-filling

Fill gaps in time series with various methods.

XGBoostTS (notebook example (minimal), notebook example (more extensive))
RandomForestTS (notebook example)
Long-term gap-filling using RandomForestTS (notebook example)
Linear interpolation (notebook example)
Quick random forest gap-filling (notebook example)
MDS gap-filling of ecosystem fluxes (notebook example), approach by Reichstein et al., 2005
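To illustrate the basic idea behind random forest gap-filling, the sketch below trains scikit-learn's `RandomForestRegressor` on records where the target and all drivers are available, then predicts the target where it is missing. It is a simplified stand-in, not diive's `RandomForestTS`: it omits anything beyond a plain fit and predict, such as engineered features, per-year models, or gap-filling flags. The column names are hypothetical, and gaps where a driver is also missing stay unfilled.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rf_gapfill(df: pd.DataFrame, target: str, features: list, random_state: int = 42) -> pd.Series:
    """Fill missing values in df[target] using a random forest trained on complete rows."""
    X = df[features]
    y = df[target]
    drivers_ok = X.notna().all(axis=1)  # rows where every driver is available
    train = y.notna() & drivers_ok
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(X[train], y[train])
    filled = y.copy()
    gaps = y.isna() & drivers_ok  # gaps that can be predicted
    if gaps.any():
        filled.loc[gaps] = model.predict(X[gaps])
    return filled


# Usage with synthetic data
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "TA": rng.uniform(-5, 25, n),  # hypothetical driver: air temperature
    "SW_IN": rng.uniform(0, 800, n),  # hypothetical driver: shortwave radiation
})
df["NEE"] = -0.01 * df["SW_IN"] + 0.2 * df["TA"] + rng.normal(0, 0.5, n)
df.loc[df.sample(frac=0.1, random_state=1).index, "NEE"] = np.nan  # create gaps
df["NEE_filled"] = rf_gapfill(df, "NEE", ["TA", "SW_IN"])
```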

Outlier Detection

Multiple tests combined

Step-wise outlier detection: combine multiple outlier flags into one overall flag
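Combining several single-test flags into one overall flag can be sketched as follows: a record is flagged as outlier (2) if any test flagged it, otherwise it is OK (0). This assumes the 0=OK / 2=outlier convention stated for the single tests; the column names are hypothetical, and diive's step-wise procedure (running tests one after another) is not reproduced here.

```python
import pandas as pd


def combine_flags(flags: pd.DataFrame) -> pd.Series:
    """Overall flag per record: 2 if any single-test flag equals 2, else 0."""
    any_outlier = (flags == 2).any(axis=1)
    return any_outlier.astype(int) * 2


flags = pd.DataFrame({
    "FLAG_HAMPEL": [0, 2, 0, 0],  # hypothetical single-test flag columns
    "FLAG_ZSCORE": [0, 0, 2, 0],
})
print(combine_flags(flags).tolist())  # [0, 2, 2, 0]
```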

Single tests

Create single outlier flags where 0=OK and 2=outlier.

Absolute limits: define absolute limits (notebook example)
Absolute limits daytime/nighttime: define absolute limits separately for daytime and nighttime data (notebook example)
Hampel filter: based on Median Absolute Deviation (MAD) in a moving window (notebook example)
Hampel filter daytime/nighttime: based on MAD in a moving window, separately for daytime and nighttime data (notebook example)
Local standard deviation: identify outliers based on the local standard deviation from a running median (notebook example)
Local outlier factor: identify outliers based on the local outlier factor, across all data (notebook example)
Local outlier factor daytime/nighttime: identify outliers based on the local outlier factor, separately for daytime and nighttime data (notebook example)
Manual removal: remove time periods (from-to) or single records from a time series (notebook example)
Missing values: create a flag that indicates available and missing data in a time series (notebook example)
Trimming: remove values below a threshold and remove an equal number of records from the high end of the data (notebook example)
z-score: identify outliers based on the z-score across all time series data (notebook example)
z-score increments daytime/nighttime: identify outliers based on the z-score of double increments (notebook example)
z-score daytime/nighttime: identify outliers based on the z-score, separately for daytime and nighttime data (notebook example)
z-score rolling: identify outliers based on the rolling z-score (notebook example)
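A compact sketch of a Hampel filter using the 0/2 flag convention: a value is flagged when its distance from the centered rolling median exceeds `n_sigma` times the scaled MAD (1.4826 times the MAD approximates the standard deviation for normally distributed data). The window length, `n_sigma`, and the function name are assumptions; diive's parameters and edge handling may differ.

```python
import numpy as np
import pandas as pd


def hampel_flag(s: pd.Series, window: int = 11, n_sigma: float = 3.0) -> pd.Series:
    """Return 2 where a value deviates strongly from its local median, else 0."""
    roll = s.rolling(window, center=True, min_periods=1)
    med = roll.median()
    # Median absolute deviation within each (centered) window
    mad = roll.apply(lambda w: np.nanmedian(np.abs(w - np.nanmedian(w))), raw=True)
    limit = n_sigma * 1.4826 * mad
    flag = pd.Series(0, index=s.index)
    flag.loc[(s - med).abs() > limit] = 2
    return flag


# Usage: a smooth signal with one spike at position 50
s = pd.Series(np.sin(np.linspace(0, 6, 100)))
s.iloc[50] = 10.0
flags = hampel_flag(s)
print(flags[flags == 2].index.tolist())  # [50]
```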

Plotting

Cumulatives across all years for multiple variables (notebook example)
Cumulatives per year (notebook example)
Diel cycle per month (notebook example)
Heatmap date/time: showing values (z) of time series as date (y) vs time (x) (notebook example)
Heatmap year/month: plot monthly ranks across years (notebook example)
Histogram: includes options to show z-score limits and to highlight the peak distribution bin (notebook example)
Long-term anomalies: calculate and plot the long-term anomaly for a variable, per year, compared to a reference period (notebook example)
Ridgeline plot: looks a bit like a landscape (notebook example)
Time series plot: Simple (interactive) time series plot (notebook example)
ScatterXY plot (notebook example)
Various classes to generate heatmaps, bar plots, time series plots and scatter plots, among others

Quality control

Stepwise MeteoScreening from database (notebook example)

Resampling

Diel cycle: calculate diel cycle per month (notebook example)
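A diel cycle per month amounts to grouping by month and time of day and aggregating. Below is a pandas sketch; the mean as aggregate and the table layout (rows: time of day, columns: month) are assumptions, not necessarily what diive returns.

```python
import pandas as pd


def diel_cycle_per_month(s: pd.Series) -> pd.DataFrame:
    """Mean value per (month, time of day); rows are times of day, columns are months."""
    grouped = s.groupby([s.index.month, s.index.time])
    return grouped.mean().unstack(level=0)


# Usage: half-hourly data for January and February 2024
idx = pd.date_range("2024-01-01", "2024-02-29 23:30", freq="30min")
demo = pd.Series(idx.hour + 100 * idx.month, index=idx, dtype=float)
print(diel_cycle_per_month(demo).head())
```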

Stats

Time series stats (notebook example)

Timestamps

Continuous timestamp: create continuous timestamp based on number of records in the file and the file duration
Time resolution: detect time resolution from data (notebook example)
Timestamps: create and insert additional timestamps in various formats
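Detecting the time resolution and building a continuous timestamp can both be sketched in a few lines of pandas. Assumptions: the resolution is taken as the most frequent spacing between consecutive timestamps, and the continuous timestamp starts at a known file start time, with record spacing equal to the file duration divided by the number of records. The function names are hypothetical.

```python
import pandas as pd


def detect_time_resolution(index: pd.DatetimeIndex) -> pd.Timedelta:
    """Most frequent spacing between consecutive timestamps (robust to a few gaps)."""
    diffs = index.to_series().diff().dropna()
    return diffs.mode().iloc[0]


def continuous_timestamp(start: str, n_records: int, file_duration: str) -> pd.DatetimeIndex:
    """Evenly spaced timestamps: record spacing = file duration / number of records."""
    step = pd.Timedelta(file_duration) / n_records
    return pd.date_range(start=start, periods=n_records, freq=step)


# Usage: a 30-minute high-resolution file with 36000 records (20 Hz)
ts = continuous_timestamp("2024-06-01 12:00", 36000, "30min")
print(ts[1] - ts[0])  # 50 ms spacing
```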

Installation

diive is currently under active development using Python v3.11.

Using pip

pip install diive

Using poetry

poetry add diive

From source

Install directly from the .tar.gz archive of the desired version:

pip install https://github.com/holukas/diive/archive/refs/tags/v0.76.2.tar.gz

Create and use a conda environment for diive

One way to install and use diive with a specific Python version on a local machine:

Install miniconda
Start miniconda prompt
Create an environment named diive-env that contains Python 3.11: conda create --name diive-env python=3.11
Activate the new environment: conda activate diive-env
Install diive using pip: pip install diive
To start JupyterLab, type jupyter lab in the prompt

Owner

Name: Lukas Hörtnagl
Login: holukas
Kind: user
Location: Switzerland
Company: ETH Zurich

Website: https://www.swissfluxnet.ethz.ch/
Repositories: 2
Profile: https://github.com/holukas

Python apps for eddy covariance data analyses. Some repos are (still) on GitLab: https://gitlab.ethz.ch/users/holukas/groups

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Hörtnagl"
  given-names: "Lukas"
  orcid: "https://orcid.org/0000-0002-5569-0761"
title: "diive"
# version: 2.0.4
# doi: 10.5281/zenodo.1234
# date-released: 2017-12-18
url: "https://github.com/holukas/diive"

GitHub Events

Total

Create event: 20
Issues event: 120
Release event: 11
Watch event: 8
Delete event: 13
Issue comment event: 18
Push event: 158
Pull request event: 23

Last Year

Create event: 20
Issues event: 120
Release event: 11
Watch event: 8
Delete event: 13
Issue comment event: 18
Push event: 158
Pull request event: 23

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 196
Total pull requests: 51
Average time to close issues: 18 days
Average time to close pull requests: 5 days
Total issue authors: 3
Total pull request authors: 2
Average comments per issue: 0.09
Average comments per pull request: 0.1
Merged pull requests: 35
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 93
Pull requests: 11
Average time to close issues: 16 days
Average time to close pull requests: 6 days
Issue authors: 2
Pull request authors: 1
Average comments per issue: 0.1
Average comments per pull request: 0.0
Merged pull requests: 8
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

holukas (200)
luanakrebs (1)
inkenbrandt (1)

Pull Request Authors

holukas (51)
inkenbrandt (4)

Top Labels

Issue Labels

notebook (46) gap-filling (41) outlier-detection (36) fluxprocessingchain (35) quality-screening (23) unittest (21) createvar (18) plot (16) bug (15) vis (14) io (11) enhancement (9) analyses (9) random-forest (7) correction (5) files (5) resampling (3) xgboost (3) transform (2) stats (2) times (2) documentation (2) format (1) database (1) hires (1) python (1) config (1) cleanup (1)

Pull Request Labels

outlier-detection (4) gap-filling (3) fluxprocessingchain (3) notebook (2) analyses (2) unittest (2) stats (1) random-forest (1) enhancement (1) io (1) hires (1) files (1) times (1) bug (1) quality-screening (1)

Packages

Total packages: 1
Total downloads:
- pypi 235 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 38
Total maintainers: 1

pypi.org: diive

Time series processing

Documentation: https://diive.readthedocs.io/
License: GNU General Public License v3.0
Latest release: 0.89.0
published 7 months ago

Versions: 38
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 235 Last month

Rankings

Dependent packages count: 10.0%

Dependent repos count: 21.7%

Average: 30.6%

Downloads: 60.2%

Maintainers (1)

amp335

Last synced: 6 months ago

Dependencies

poetry.lock pypi

163 dependencies

pyproject.toml pypi

bokeh ^3.2.2
eli5 ^0.13.0
fitter ^1.6.0
matplotlib ^3.7.3
pandas ^2.1.0
prophet ^1.1.4
pyarrow ^13.0.0
pymannkendall ^1.4.3
python >=3.9,<3.11
pyyaml ^6.0.1
scikit-learn ^1.3.0
scipy ^1.11.2
seaborn ^0.12.2
shap ^0.42.1
statsmodels ^0.14.0
thymeboost ^0.1.16
uncertainties ^3.1.7
xgboost ^2.0.0
yellowbrick ^1.5