saqc
This is a read-only mirror; comments, issues and pull requests are very welcome at https://git.ufz.de/rdm-software/saqc.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 6 DOI reference(s) in README
- ✓ Academic publication links: Links to sciencedirect.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (20.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Helmholtz-UFZ
- License: other
- Language: Python
- Default Branch: develop
- Homepage: https://git.ufz.de/rdm-software/saqc
- Size: 23.8 MB
Statistics
- Stars: 9
- Watchers: 5
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
SaQC: System for automated Quality Control
SaQC is a tool/framework/application for the quality control of time series data.
It provides a growing collection of algorithms and methods to analyze, annotate and
process time series data. It supports the end-to-end enrichment of metadata
and provides various user interfaces: 1) a Python API, 2) a command line interface
with a text-based configuration system, and 3) a web-based user interface.
SaQC is designed with a particular focus on the needs of active data professionals,
including sensor hardware-oriented engineers, domain experts, and data scientists,
all of whom can benefit from its capabilities to improve the quality standards of given data products.
For a (continuously improving) overview of features, typical usage patterns,
the specific system components and how to customize SaQC to your own
needs, please refer to our
online documentation.
Installation
SaQC is available on the Python Package Index (PyPI) and
can be installed using pip:
```sh
python -m pip install saqc
```
Additionally, SaQC is available via conda and can be installed with:
```sh
conda create -c conda-forge -n saqc saqc
```
For more details, see the installation guide.
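After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of SaQC.

```python
from importlib import metadata


def installed_version(package: str = "saqc"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("saqc is not installed in this environment")
    else:
        print(f"saqc {version} is installed")
```

Running the check inside the conda environment or virtual environment you created helps catch the common case of installing into one environment and running from another.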
Usage
SaQC is both a command line application controlled by a text-based configuration
and a Python module with a simple API.
SaQC as a command line application
The command line application is controlled by a semicolon-separated text file listing the variables in the dataset and the routines to inspect, quality control and/or process them. The content of such a configuration could look like this:
```
varname    ; test
-----------; ---------------------------------------------------------------------
SM2        ; align(freq="15Min")
'SM(1|2)+' ; flagMissing()
SM1        ; flagRange(min=10, max=60)
SM2        ; flagRange(min=10, max=40)
SM2        ; flagZScore(window="30d", thresh=3.5, method='modified', center=False)
Dummy      ; flagGeneric(field=["SM1", "SM2"], func=(isflagged(x) | isflagged(y)))
```
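To make the structure of such a configuration concrete, here is a rough sketch that splits each row into a variable name (or quoted pattern) and the test expression applied to it. This is an illustration only, not SaQC's own parser; the function name `parse_config` and the rules for skipping the header and separator rows are assumptions.

```python
def parse_config(text: str):
    """Split a semicolon-separated configuration into (varname, test) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ";" not in line:
            continue  # ignore blank lines and lines without a separator
        # Split only on the first semicolon so the test expression stays intact.
        varname, test = (part.strip() for part in line.split(";", 1))
        # Assumption: the header row and dashed separator row carry no tests.
        if varname == "varname" or set(varname) <= set("-#"):
            continue
        # Quoted names such as 'SM(1|2)+' are patterns; drop the quotes.
        entries.append((varname.strip("'\""), test))
    return entries


example = """
varname    ; test
-----------; ------------------------------
SM2        ; align(freq="15Min")
'SM(1|2)+' ; flagMissing()
SM1        ; flagRange(min=10, max=60)
"""

if __name__ == "__main__":
    for varname, test in parse_config(example):
        print(f"{varname!r} -> {test}")
```

Each resulting pair corresponds to one step: the left column selects the variable(s), the right column names the routine and its parameters.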
As soon as the basic inputs, dataset and configuration file, are
prepared, run SaQC:
```sh
saqc \
    --config PATH_TO_CONFIGURATION \
    --data PATH_TO_DATA \
    --outfile PATH_TO_OUTPUT
```
A full SaQC run against provided example data can be invoked with:
```sh
saqc \
    --config https://git.ufz.de/rdm-software/saqc/raw/develop/docs/resources/data/config.csv \
    --data https://git.ufz.de/rdm-software/saqc/raw/develop/docs/resources/data/data.csv \
    --outfile saqc_test.csv
```
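If you want to trigger the command line application from a script, for example in a scheduled job, one option is to assemble the same arguments in Python. The sketch below uses only the `--config`, `--data` and `--outfile` options shown above; the helper names `build_saqc_command` and `run_saqc` are hypothetical, and `run_saqc` assumes the `saqc` executable is on your `PATH`.

```python
import shlex
import subprocess


def build_saqc_command(config: str, data: str, outfile: str):
    """Return the argument list for a saqc command line run."""
    return ["saqc", "--config", config, "--data", data, "--outfile", outfile]


def run_saqc(config: str, data: str, outfile: str):
    """Run saqc; raises subprocess.CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_saqc_command(config, data, outfile), check=True)


if __name__ == "__main__":
    command = build_saqc_command(
        "PATH_TO_CONFIGURATION", "PATH_TO_DATA", "PATH_TO_OUTPUT"
    )
    # Print the shell-quoted command instead of executing it.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.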
SaQC as a Python module
The following snippet implements the same configuration given above through the Python API:
```python
import pandas as pd
from saqc import SaQC

# Load the example dataset, using the first column as a datetime index
data = pd.read_csv(
    "https://git.ufz.de/rdm-software/saqc/raw/develop/docs/resources/data/data.csv",
    index_col=0, parse_dates=True,
)

# Chain the same steps as in the configuration file
qc = SaQC(data=data)
qc = (qc
      .align("SM2", freq="15Min")
      .flagMissing("SM(1|2)+", regex=True)
      .flagRange("SM1", min=10, max=60)
      .flagRange("SM2", min=10, max=40)
      .flagZScore("SM2", window="30d", thresh=3.5, method='modified', center=False)
      .flagGeneric(field=["SM1", "SM2"], target="Dummy", func=lambda x, y: (isflagged(x) | isflagged(y))))
```
A more detailed description of the Python API is available in the respective section of the documentation.
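For readers unfamiliar with the checks used in the example, the sketch below illustrates two of the underlying ideas in plain Python: a range check and the textbook modified Z-score (based on the median and the median absolute deviation, with the customary constant 0.6745). It is not SaQC's implementation. The function names are hypothetical, the bounds are treated as inclusive by assumption, and the rolling `window` used by `flagZScore` is left out.

```python
from statistics import median


def flag_range(values, lower, upper):
    """Return True for every value outside [lower, upper] (inclusive bounds assumed)."""
    return [not (lower <= v <= upper) for v in values]


def flag_modified_zscore(values, thresh=3.5):
    """Return True where the modified Z-score magnitude exceeds `thresh`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        # All deviations vanish at the median; nothing can be scored.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > thresh for v in values]


if __name__ == "__main__":
    readings = [20.1, 20.4, 19.8, 20.0, 55.0, 20.2, 5.0]
    print(flag_range(readings, lower=10, upper=40))
    print(flag_modified_zscore(readings, thresh=3.5))
```

Because the median and the median absolute deviation are barely affected by a few extreme values, the modified score tends to be more robust than a mean and standard deviation based Z-score.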
Get involved
Contributing
You found a bug or you want to suggest new features? Please refer to our contributing guidelines to see how you can contribute to SaQC.
User support
If you need help or have questions, send an email to saqc-support@ufz.de.
Copyright and License
Copyright (c) 2021, Helmholtz-Zentrum für Umweltforschung GmbH -- UFZ. All rights reserved.
- Documentation: Creative Commons Attribution 4.0 International
- Source code: GNU General Public License 3
For full details, see LICENSE.
Publications
Lennart Schmidt, David Schäfer, Juliane Geller, Peter Lünenschloss, Bert Palm, Karsten Rinke, Corinna Rebmann, Michael Rode, Jan Bumberger, System for automated Quality Control (SaQC) to enable traceable and reproducible data streams in environmental science, Environmental Modelling & Software, 2023, 105809, ISSN 1364-8152, https://doi.org/10.1016/j.envsoft.2023.105809. (https://www.sciencedirect.com/science/article/pii/S1364815223001950)
How to cite SaQC
If SaQC is advancing your research, please cite as:
Schäfer, David, Palm, Bert, Lünenschloß, Peter, Schmidt, Lennart, & Bumberger, Jan. (2023). System for automated Quality Control - SaQC (2.3.0). Zenodo. https://doi.org/10.5281/zenodo.5888547
or
Lennart Schmidt, David Schäfer, Juliane Geller, Peter Lünenschloss, Bert Palm, Karsten Rinke, Corinna Rebmann, Michael Rode, Jan Bumberger, System for automated Quality Control (SaQC) to enable traceable and reproducible data streams in environmental science, Environmental Modelling & Software, 2023, 105809, ISSN 1364-8152, https://doi.org/10.1016/j.envsoft.2023.105809. (https://www.sciencedirect.com/science/article/pii/S1364815223001950)
Owner
- Name: Helmholtz Centre for Environmental Research – UFZ
- Login: Helmholtz-UFZ
- Kind: organization
- Location: Leipzig, Germany
- Website: https://www.ufz.de/
- Repositories: 23
- Profile: https://github.com/Helmholtz-UFZ
We conduct research to support a sustainable use of our natural resources to benefit both mankind and the environment.
Citation (CITATION.cff)
cff-version: 1.2.0
title: SaQC - System for automated Quality Control
message: "Please cite this software using these metadata."
type: software
version: 2.0.0
doi: 10.5281/zenodo.5888547
date-released: "2021-11-25"
license: "GPL-3.0"
repository-code: "https://git.ufz.de/rdm-software/saqc"
keywords:
  - time series data
  - environmental sensor data
authors:
  - given-names: David
    family-names: Schäfer
    email: david.schaefer@ufz.de
    affiliation: >-
      Helmholtz Centre for Environmental Research -
      UFZ
    orcid: 'https://orcid.org/0000-0003-4517-6459'
  - given-names: Bert
    family-names: Palm
    email: bert.palm@ufz.de
    affiliation: >-
      Helmholtz Centre for Environmental Research -
      UFZ
    orcid: 'https://orcid.org/0000-0001-5106-9057'
  - given-names: Peter
    family-names: Lünenschloß
    email: peter.luenenschloss@ufz.de
    affiliation: >-
      Helmholtz Centre for Environmental Research -
      UFZ
    orcid: 'https://orcid.org/0000-0000-0000-0000'
GitHub Events
Total
- Push event: 7
Last Year
- Push event: 7
Issues and Pull Requests
Last synced: almost 2 years ago
All Time
- Total issues: 1
- Total pull requests: 14
- Average time to close issues: 7 days
- Average time to close pull requests: 9 months
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 1.0
- Average comments per pull request: 0.71
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 11
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- AlexandreDecan (1)
Pull Request Authors
- dependabot[bot] (11)
- palmb (3)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3 composite
- conda-incubator/setup-miniconda v2 composite
- jupyter-sphinx ==0.3.2
- m2r ==0.2.1
- recommonmark ==0.7.1
- sphinx <6
- sphinx-automodapi ==0.14.1
- sphinx-markdown-tables ==0.0.17
- sphinx-tabs ==3.4.1
- sphinx_autodoc_typehints ==1.18.2
- sphinxcontrib-fulltoc ==1.2.0
- Click ==8.1.3
- dtw ==1.4.0
- hypothesis ==6.55.0
- matplotlib ==3.5.3
- numba ==0.56.3
- numpy ==1.21.6
- outlier-utils ==0.0.3
- pandas ==1.3.5
- pyarrow ==9.0.0
- scikit-learn ==1.0.2
- scipy ==1.7.3
- typing_extensions ==4.3.0
- Click *
- dtw *
- matplotlib >=3.4
- numba *
- numpy *
- outlier-utils *
- pandas >=1.2,<1.5
- pyarrow *
- scikit-learn *
- scipy *
- typing_extensions *
- Markdown ==3.3.7 test
- beautifulsoup4 ==4.11.1 test
- pytest ==7.1.3 test
- pytest-lazy-fixture ==0.6.3 test
- requests ==2.27.1 test