saqc

This is a read-only mirror; comments, issues and pull requests are very welcome at https://git.ufz.de/rdm-software/saqc.

https://github.com/helmholtz-ufz/saqc

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: sciencedirect.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (20.3%) to scientific vocabulary

Keywords

python quality-control
Last synced: 6 months ago

Repository

Statistics
  • Stars: 9
  • Watchers: 5
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Topics
python quality-control
Created about 6 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md



Project Status: Active – The project has reached a stable, usable state and is being actively developed.

SaQC: System for automated Quality Control

SaQC is a tool/framework/application for the quality control of time series data. It provides a growing collection of algorithms and methods to analyze, annotate and process time series data. It supports the end-to-end enrichment of metadata and provides several user interfaces: a Python API, a command line interface with a text-based configuration system, and a web-based user interface.

SaQC is designed with a particular focus on the needs of active data professionals, including sensor hardware-oriented engineers, domain experts, and data scientists, all of whom can benefit from its capabilities to improve the quality standards of given data products.

For a (continuously improving) overview of features, typical usage patterns, the specific system components and how to customize SaQC to your own needs, please refer to our online documentation.

Installation

SaQC is available on the Python Package Index (PyPI) and can be installed using pip:

```sh
python -m pip install saqc
```

Additionally, SaQC is available via conda and can be installed with:

```sh
conda create -c conda-forge -n saqc saqc
```

For more details, see the installation guide.
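One way to confirm that the installation is visible to the current interpreter is to query the installed package version. The sketch below uses only the Python standard library; the helper name `installed_version` is made up for this illustration and is not part of SaQC.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("saqc")
    if found is None:
        print("saqc is not installed in this environment")
    else:
        print(f"saqc version: {found}")
```

If the package is reported as missing although it was installed, the interpreter running the check is probably not the one belonging to the pip or conda environment used above.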

Usage

SaQC is both a command line application controlled by a text-based configuration and a Python module with a simple API.

SaQC as a command line application

The command line application is controlled by a semicolon-separated text file listing the variables in the dataset and the routines to inspect, quality control and/or process them. The content of such a configuration could look like this:

```
varname    ; test
#----------; ---------------------------------------------------------------------
SM2        ; align(freq="15Min")
'SM(1|2)+' ; flagMissing()
SM1        ; flagRange(min=10, max=60)
SM2        ; flagRange(min=10, max=40)
SM2        ; flagZScore(window="30d", thresh=3.5, method='modified', center=False)
Dummy      ; flagGeneric(field=["SM1", "SM2"], func=(isflagged(x) | isflagged(y)))
```
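To make the layout of such a file concrete, the following sketch splits each row into a variable name and a test expression. It only illustrates the two-column, semicolon-separated structure; it is not SaQC's own configuration reader, and the function name `split_config` as well as the treatment of `#` rows as comments are assumptions made for this example.

```python
from typing import List, Tuple

EXAMPLE_CONFIG = """\
varname    ; test
#----------; ---------------------------------------------------------------------
SM2        ; align(freq="15Min")
'SM(1|2)+' ; flagMissing()
SM1        ; flagRange(min=10, max=60)
"""


def split_config(text: str) -> List[Tuple[str, str]]:
    """Return (variable, test expression) pairs, skipping blanks, '#' rows and the header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first semicolon only, so the test expression stays intact.
        varname, _, test = line.partition(";")
        varname, test = varname.strip(), test.strip()
        if (varname, test) == ("varname", "test"):
            continue  # header row
        rows.append((varname, test))
    return rows


if __name__ == "__main__":
    for varname, test in split_config(EXAMPLE_CONFIG):
        print(f"{varname!r} -> {test}")
```

Each row pairs one variable (or a quoted regular expression matching several variables) with exactly one function call, and rows are listed in the order in which the steps should be applied.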

As soon as the basic inputs, dataset and configuration file, are prepared, run SaQC:

```sh
saqc \
    --config PATH_TO_CONFIGURATION \
    --data PATH_TO_DATA \
    --outfile PATH_TO_OUTPUT
```

A full SaQC run against provided example data can be invoked with:

```sh
saqc \
    --config https://git.ufz.de/rdm-software/saqc/raw/develop/docs/resources/data/config.csv \
    --data https://git.ufz.de/rdm-software/saqc/raw/develop/docs/resources/data/data.csv \
    --outfile saqc_test.csv
```
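When the command line application has to be started from a Python script (for example in a scheduled job), the call can be assembled as an argument list and handed to `subprocess`. This is a minimal sketch that uses only the three options shown above; the helper names `build_saqc_command` and `run_saqc` are hypothetical and not part of SaQC.

```python
import shlex
import subprocess
from typing import List


def build_saqc_command(config: str, data: str, outfile: str) -> List[str]:
    """Assemble the argument list for a `saqc` command line call."""
    return ["saqc", "--config", config, "--data", data, "--outfile", outfile]


def run_saqc(config: str, data: str, outfile: str) -> None:
    """Run saqc; raises CalledProcessError if the process exits with a non-zero code."""
    subprocess.run(build_saqc_command(config, data, outfile), check=True)


if __name__ == "__main__":
    cmd = build_saqc_command("config.csv", "data.csv", "saqc_test.csv")
    # Print the command instead of running it, so the sketch works without saqc installed.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.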

SaQC as a python module

The following snippet implements the same configuration given above through the Python-API:

```python
import pandas as pd
from saqc import SaQC

# Load the example dataset; the first column holds the timestamps
data = pd.read_csv(
    "https://git.ufz.de/rdm-software/saqc/raw/develop/docs/resources/data/data.csv",
    index_col=0, parse_dates=True,
)

# Each method call corresponds to one row of the configuration file above
qc = SaQC(data=data)
qc = (qc
      .align("SM2", freq="15Min")
      .flagMissing("SM(1|2)+", regex=True)
      .flagRange("SM1", min=10, max=60)
      .flagRange("SM2", min=10, max=40)
      .flagZScore("SM2", window="30d", thresh=3.5, method='modified', center=False)
      .flagGeneric(field=["SM1", "SM2"], target="Dummy", func=lambda x, y: (isflagged(x) | isflagged(y))))
```

A more detailed description of the Python API is available in the respective section of the documentation.
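For readers new to flag-based quality control, the logic behind the `flagMissing`, `flagRange` and `flagGeneric` steps can be pictured in plain Python. The sketch below is not SaQC's implementation: the boolean flag lists, the helper names and the treatment of the range as a closed interval are assumptions made for illustration, whereas SaQC itself operates on pandas objects and its own flag scheme.

```python
import math
from typing import List, Optional

Series = List[Optional[float]]


def flag_missing(values: Series) -> List[bool]:
    """Flag entries that are None or NaN."""
    return [v is None or math.isnan(v) for v in values]


def flag_range(values: Series, lower: float, upper: float) -> List[bool]:
    """Flag present values outside the closed interval [lower, upper]."""
    return [
        v is not None and not math.isnan(v) and not (lower <= v <= upper)
        for v in values
    ]


def combine_any(*flag_lists: List[bool]) -> List[bool]:
    """Element-wise OR: a position is flagged if any input flags it."""
    return [any(flags) for flags in zip(*flag_lists)]


if __name__ == "__main__":
    sm1: Series = [12.0, 5.0, None, 30.0]
    sm2: Series = [30.0, 20.0, 39.0, 45.0]

    sm1_flags = combine_any(flag_missing(sm1), flag_range(sm1, 10, 60))
    sm2_flags = combine_any(flag_missing(sm2), flag_range(sm2, 10, 40))
    # Mirrors the idea of the "Dummy" target: flagged wherever SM1 or SM2 is flagged.
    dummy_flags = combine_any(sm1_flags, sm2_flags)
    print(dummy_flags)
```

The point of the sketch is that each step only adds flags and never alters the data values, so steps can be chained and combined freely.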

Get involved

Contributing

You found a bug or you want to suggest new features? Please refer to our contributing guidelines to see how you can contribute to SaQC.

User support

If you need help or have questions, send an email to saqc-support@ufz.de.

Copyright and License

Copyright(c) 2021, Helmholtz-Zentrum für Umweltforschung GmbH -- UFZ. All rights reserved.

For full details, see LICENSE.

Publications

Lennart Schmidt, David Schäfer, Juliane Geller, Peter Lünenschloss, Bert Palm, Karsten Rinke, Corinna Rebmann, Michael Rode, Jan Bumberger, System for automated Quality Control (SaQC) to enable traceable and reproducible data streams in environmental science, Environmental Modelling & Software, 2023, 105809, ISSN 1364-8152, https://doi.org/10.1016/j.envsoft.2023.105809. (https://www.sciencedirect.com/science/article/pii/S1364815223001950)

How to cite SaQC

If SaQC is advancing your research, please cite as:

Schäfer, David, Palm, Bert, Lünenschloß, Peter, Schmidt, Lennart, & Bumberger, Jan. (2023). System for automated Quality Control - SaQC (2.3.0). Zenodo. https://doi.org/10.5281/zenodo.5888547

or

Lennart Schmidt, David Schäfer, Juliane Geller, Peter Lünenschloss, Bert Palm, Karsten Rinke, Corinna Rebmann, Michael Rode, Jan Bumberger, System for automated Quality Control (SaQC) to enable traceable and reproducible data streams in environmental science, Environmental Modelling & Software, 2023, 105809, ISSN 1364-8152, https://doi.org/10.1016/j.envsoft.2023.105809. (https://www.sciencedirect.com/science/article/pii/S1364815223001950)


Owner

  • Name: Helmholtz Centre for Environmental Research – UFZ
  • Login: Helmholtz-UFZ
  • Kind: organization
  • Location: Leipzig, Germany

We conduct research to support a sustainable use of our natural resources to benefit both mankind and the environment.

Citation (CITATION.cff)

cff-version: 1.2.0
title: SaQC - System for automated Quality Control
message: "Please cite this software using these metadata."
type: software
version: 2.0.0
doi: 10.5281/zenodo.5888547
date-released: "2021-11-25"
license: "GPL-3.0"
repository-code: "https://git.ufz.de/rdm-software/saqc"
keywords:
  - time series data
  - environmental sensor data
authors:
  - given-names: David
    family-names: Schäfer
    email: david.schaefer@ufz.de
    affiliation: >-
      Helmholtz Centre for Environmental Research -
      UFZ
    orcid: 'https://orcid.org/0000-0003-4517-6459'
  - given-names: Bert
    family-names: Palm
    email: bert.palm@ufz.de
    affiliation: >-
      Helmholtz Centre for Environmental Research -
      UFZ
    orcid: 'https://orcid.org/0000-0001-5106-9057'
  - given-names: Peter
    family-names: Lünenschloß
    email: peter.luenenschloss@ufz.de
    affiliation: >-
      Helmholtz Centre for Environmental Research -
      UFZ
    orcid: 'https://orcid.org/0000-0000-0000-0000'

GitHub Events

Total
  • Push event: 7
Last Year
  • Push event: 7

Issues and Pull Requests

Last synced: almost 2 years ago

All Time
  • Total issues: 1
  • Total pull requests: 14
  • Average time to close issues: 7 days
  • Average time to close pull requests: 9 months
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.71
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 11
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • AlexandreDecan (1)
Pull Request Authors
  • dependabot[bot] (11)
  • palmb (3)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (11)

Dependencies

.github/workflows/main.yml actions
  • actions/checkout v3 composite
  • conda-incubator/setup-miniconda v2 composite
docs/requirements.txt pypi
  • jupyter-sphinx ==0.3.2
  • m2r ==0.2.1
  • recommonmark ==0.7.1
  • sphinx <6
  • sphinx-automodapi ==0.14.1
  • sphinx-markdown-tables ==0.0.17
  • sphinx-tabs ==3.4.1
  • sphinx_autodoc_typehints ==1.18.2
  • sphinxcontrib-fulltoc ==1.2.0
requirements.txt pypi
  • Click ==8.1.3
  • dtw ==1.4.0
  • hypothesis ==6.55.0
  • matplotlib ==3.5.3
  • numba ==0.56.3
  • numpy ==1.21.6
  • outlier-utils ==0.0.3
  • pandas ==1.3.5
  • pyarrow ==9.0.0
  • scikit-learn ==1.0.2
  • scipy ==1.7.3
  • typing_extensions ==4.3.0
setup.py pypi
  • Click *
  • dtw *
  • matplotlib >=3.4
  • numba *
  • numpy *
  • outlier-utils *
  • pandas >=1.2,<1.5
  • pyarrow *
  • scikit-learn *
  • scipy *
  • typing_extensions *
tests/requirements.txt pypi
  • Markdown ==3.3.7 test
  • beautifulsoup4 ==4.11.1 test
  • pytest ==7.1.3 test
  • pytest-lazy-fixture ==0.6.3 test
  • requests ==2.27.1 test
pyproject.toml pypi