epylabel
This repository contains the code for the manuscript "Ensemble-labeling of infectious diseases time series to evaluate early warning systems", with which you can reproduce the manuscript's results and figures.
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: robert-koch-institut
- License: mit
- Language: Python
- Default Branch: main
- Size: 4.84 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 1
Topics
Metadata Files
Readme.md
Documentation
Epylabel: Ensemble-labeling of infectious diseases time series
Andreas Hicketier¹, Moritz Bach¹, Philip Oedi¹, Alexander Ullrich¹, & Auss Abbood²
¹ Robert Koch-Institut | Unit 32
² Robert Koch-Institut | ZIG 1
Cite
Hicketier, A., Bach, M., Oedi, P., Ullrich, A., & Abbood, A. (2024). Epylabel: Ensemble-labeling of infectious diseases time series. Zenodo. https://doi.org/10.5281/zenodo.12665040
Abstract
This repository contains the code for the manuscript "Ensemble-labeling of Infectious Diseases Time Series to Evaluate Early Warning Systems" (Epylabel), with which the manuscript's results and figures can be reproduced. Developed at the Robert Koch Institute within the DAKI-FWS project, this Python/R-based tool combines several individual labeling techniques through a majority-voting ensemble to detect diverse outbreak patterns across varying spatial resolutions. The resulting labels were used to benchmark machine learning models and compare them with traditional outbreak detection methods.
Table of Contents
- Project Information
- Installation
- Running the Code
- Code
- Data
- Collaborate
- Publication platforms
- License
Project Information
This code was developed at the Robert Koch Institute as part of the project Daten- und KI-gestütztes Frühwarnsystem zur Stabilisierung der deutschen Wirtschaft (DAKI-FWS), funded by the Federal Ministry for Economic Affairs and Climate Action. The project launched on 1 December 2021 and ends on 30 November 2024. Together with over a dozen research and industry partners, we work on preventing economic losses like those seen during the COVID-19 pandemic with the help of early warning systems. These systems are not limited to infectious diseases; this code was developed within the work package on early warning for infectious diseases. For more information on the project, visit the DAKI-FWS website and the Digitale Technologien website of the German Federal Ministry for Economic Affairs and Climate Action.
Administrative and organizational information
This work was conducted by staff from Unit 32 | Surveillance with technical supervision by Alexander Ullrich and Auss Abbood from ZIG 1 | Information Centre for International Health Protection (INIG). The publication of the code as well as the quality management of the metadata is done by department MF 4 | Domain Specific Data and Research Data Management. Questions regarding data management and the publication infrastructure can be directed to the Open Data Team of the Department MF4 at OpenData@rki.de.
Motivation
Early warning systems (EWS) can help make informed public health decisions. Depending on the EWS, various evaluation strategies exist, such as simulating data with outbreaks or using expert-labeled data. In the absence of ground truth knowledge about outbreaks, we can use post-hoc labeling methods. While these perform well for a selection of well-behaved disease time series, they do not perform as well on heterogeneous COVID-19 time series. To close this evaluation gap, we propose an adaptive labeling method that produces useful labels on highly heterogeneous, non-stationary COVID-19 time series.
This repository allows you to use our ensemble labeling method. It detects various outbreak types, such as waves or short peaks, as they occur at different spatial resolutions, and uses a majority vote to assign outbreak labels post hoc for the evaluation of EWSs. This repository also contains evaluation experiments in which our labels were used to train machine learning models, which we compared with traditional outbreak detection methods.
Installation
Our scripts make use of Python and R. Please make sure you have both programming languages installed. We also encourage users to use conda as an environment management tool for this repo. After installing Anaconda or Miniconda, run the following commands in a properly configured shell:
```shell
conda env create -f environment.yml
conda activate epylabel
```
Running the Code
Warning: This repo uses rpy2, a Python library that enables running R code and libraries in Python. As of now, this library is not supported for Windows and this repo may not work for you if you use Windows.
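The warning above can be turned into an early check before any R code is called. The following is only a minimal sketch using the Python standard library; the helper name `rpy2_platform_ok` is hypothetical and not part of this repository.

```python
import platform
import warnings


def rpy2_platform_ok(system_name=None):
    """Return False on Windows, where this README reports rpy2 as unsupported.

    `system_name` is a hypothetical override that makes the check testable;
    by default the name of the running operating system is used.
    """
    name = platform.system() if system_name is None else system_name
    return name != "Windows"


if not rpy2_platform_ok():
    # Warn early instead of failing later inside an R call.
    warnings.warn("rpy2 is reported as unsupported on Windows; this repo may not work.")
```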
Reproduce Labels
To reproduce the labels presented in the manuscript, run `python paper_labels.py` after activating the conda environment. Note that you need to run the command from the folder containing this script.
Generate Figures
You can also reproduce the figures from the manuscript with `python paper_plots.py`.
Generating Docs
You can build the docs with Sphinx:
```shell
sphinx-build -b html docs/source/ docs/build/
```
Code
This repo uses a pipeline approach to compose the ensemble of labeling methods. Each labeling method inherits from the abstract class Transformation (see labeler.py). These classes need to implement the transform() method, which returns either labels or transformed data.
The Pipeline class allows you to execute transform operations of various labeling methods successively.
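The pattern described so far (an abstract base class with a transform() method, chained by a pipeline) can be illustrated with a small standalone sketch. The classes below are simplified stand-ins written for illustration and are not the actual code in labeler.py or pipeline.py; `AddOne` and `Threshold` are hypothetical toy steps.

```python
from abc import ABC, abstractmethod


class Transformation(ABC):
    """Simplified stand-in for the abstract base class described above."""

    @abstractmethod
    def transform(self, *data):
        """Return labels or transformed data."""


class AddOne(Transformation):
    """Hypothetical toy step that transforms the data."""

    def transform(self, data):
        return [x + 1 for x in data]


class Threshold(Transformation):
    """Hypothetical toy step that returns binary labels."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def transform(self, data):
        return [int(x > self.cutoff) for x in data]


class Pipeline:
    """Runs the transform() methods of its steps one after another."""

    def __init__(self, steps):
        if not steps:
            raise ValueError("a pipeline needs at least one step")
        self.steps = steps

    def transform(self, *data):
        # The first step receives all inputs; each later step receives
        # the output of the step before it.
        result = self.steps[0].transform(*data)
        for step in self.steps[1:]:
            result = step.transform(result)
        return result


labels = Pipeline([AddOne(), Threshold(cutoff=3)]).transform([1, 2, 3, 4])
print(labels)  # [0, 0, 1, 1]
```

In this sketch, a new labeling method only has to subclass `Transformation` and implement `transform()` to be usable as a pipeline step.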
Lastly, the Ensemble class implements the routine for the majority vote of each single labeling method in the ensemble. The code can be extended to use more labeling methods. Each method would only need to inherit from Transformation.
If another ensemble voting mechanism is desired, a new Ensemble class can be implemented where you specify your voting approach in the transform() method. This way, our code is open to new implementations and variations.
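To make the voting step concrete, here is a minimal standalone sketch of a majority vote over binary label series. It assumes that a parameter like `n_min` is the minimum number of methods that must agree on an outbreak; this reading, the function name `majority_vote`, and the sample labels are assumptions for illustration, not the repository's Ensemble implementation.

```python
def majority_vote(label_series, n_min):
    """Return 1 at each time point where at least n_min methods voted 1, else 0.

    `label_series` is a list of equally long lists of 0/1 labels, one per method.
    """
    lengths = {len(labels) for labels in label_series}
    if len(lengths) != 1:
        raise ValueError("all label series must cover the same time points")
    # zip(*...) groups the votes of all methods per time point.
    return [int(sum(votes) >= n_min) for votes in zip(*label_series)]


# Made-up labels from three hypothetical methods over five time points.
bcp_labels = [0, 1, 1, 0, 0]
sp_labels = [0, 1, 0, 0, 1]
wv_labels = [1, 1, 0, 0, 1]

ensemble_labels = majority_vote([bcp_labels, sp_labels, wv_labels], n_min=2)
print(ensemble_labels)  # [0, 1, 0, 0, 1]
```

A different voting rule, for example a unanimous vote, would only change the comparison inside the list comprehension.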
Below, you can find a shortened and commented version of paper_labels.py to illustrate how generating labels with our ensemble approach works.
```python
import pandas as pd

from epylabel.labeler import (Bcp, Changerate, Ensemble, Shapelet, WaveFinder)
from epylabel.pipeline import Pipeline
from paper_labels import StandardForm

# Instantiate single labeling methods with adequate parameters
cr = Changerate()
bcp = Bcp()
wv = WaveFinder()
sp = Shapelet()

# Instantiate ensemble
ens = Ensemble(n_min=2)

# Download RKI COVID-19 data
data_rki_url = (
    "https://raw.githubusercontent.com/robert-koch-institut/"
    "COVID-19_7-Tage-Inzidenz_in_Deutschland/main/"
    "COVID-19-Faelle_7-Tage-Inzidenz_Deutschland.csv"
)
data_rki = pd.read_csv(data_rki_url)

# Rearrange data
data_wide = Pipeline([StandardForm()]).transform(data_rki)
data_wide_faelle = Pipeline(
    [
        StandardForm("Faelle_neu"),
    ]
).transform(data_rki)

# Label data with single labeling methods
bcp_labels = Pipeline(
    [
        cr,
        bcp,
    ]
).transform(data_wide_faelle)
sp_labels = Pipeline([sp]).transform(data_wide)
wv_labels = Pipeline([wv]).transform(data_wide)

# Combine labeling methods in ensemble
bcp_sp_wv_labels = Pipeline([ens]).transform(bcp_labels, sp_labels, wv_labels)
```
Data
The code in this repository depends on reported COVID-19 cases in Germany. The main script paper_labels.py, which is explained in more detail in the Code section above, downloads data from the Robert Koch Institute's Open Data Repository on GitHub and produces the labels for it as described in the manuscript.
Three datasets are downloaded to build time series of newly reported cases. New cases are in the CSV column Faelle_neu. Region identifiers, which are named Bundesland_id for federal states and Landkreis_id for counties, are renamed to location by the script. The reporting date Meldedatum is renamed to target and the case numbers to value. Without regional stratification, i.e., for the time series of Germany as a whole, the column location gets the value 0. Age stratification of the data is ignored.
The repository uses the latest data from the RKI "7-Tage-Inzidenz der COVID-19-Fälle in Deutschland" dataset provided on GitHub:
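The renaming described above could be sketched with pandas as follows. This is an illustration only: the function `to_standard_form`, its `region_column` parameter, and the sample values are hypothetical, the handling of age groups is left out, and using the string "0" for the national series is an assumption.

```python
import pandas as pd

# Column names taken from the description above.
COLUMN_MAP = {"Meldedatum": "target", "Faelle_neu": "value"}


def to_standard_form(raw, region_column=None):
    """Rename RKI columns to location/target/value (hypothetical helper)."""
    renamed = raw.rename(columns=COLUMN_MAP)
    if region_column is None:
        # No regional stratification: the national series gets location 0.
        renamed = renamed.assign(location="0")
    else:
        renamed = renamed.rename(columns={region_column: "location"})
    return renamed[["location", "target", "value"]]


# Made-up sample rows standing in for the downloaded CSV files.
raw_states = pd.DataFrame(
    {
        "Meldedatum": ["2022-01-01", "2022-01-01"],
        "Bundesland_id": ["01", "02"],
        "Faelle_neu": [120, 340],
    }
)
raw_germany = pd.DataFrame({"Meldedatum": ["2022-01-01"], "Faelle_neu": [460]})

states = to_standard_form(raw_states, region_column="Bundesland_id")
germany = to_standard_form(raw_germany)
print(states)
print(germany)
```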
https://github.com/robert-koch-institut/COVID-19_7-Tage-Inzidenz_in_Deutschland
All versions of the data, which is currently updated daily, are also published on Zenodo.org:
Robert Koch-Institut (2024): 7-Tage-Inzidenz der COVID-19-Fälle in Deutschland, Berlin: Zenodo. DOI: 10.5281/zenodo.7129007
| Description | URL |
| --- | --- |
| COVID-19 cases in Germany per county | https://raw.githubusercontent.com/robert-koch-institut/COVID-19_7-Tage-Inzidenz_in_Deutschland/main/COVID-19-Faelle_7-Tage-Inzidenz_Landkreise.csv |
| COVID-19 cases in Germany per federal state | https://raw.githubusercontent.com/robert-koch-institut/COVID-19_7-Tage-Inzidenz_in_Deutschland/main/COVID-19-Faelle_7-Tage-Inzidenz_Bundeslaender.csv |
| COVID-19 cases in Germany without stratification | https://raw.githubusercontent.com/robert-koch-institut/COVID-19_7-Tage-Inzidenz_in_Deutschland/main/COVID-19-Faelle_7-Tage-Inzidenz_Deutschland.csv |
After the transformation, the data has the following structure:
| Column | Datatype | Description |
| --- | --- | --- |
| value | integer | Number of reported COVID-19 cases |
| target | string | Reporting date (yyyy-mm-dd) |
| location | string | Five-digit community identification code for counties, two-digit code for federal states, and 0 for the whole of Germany |
Formatting
Data is downloaded as a comma-separated .csv file. The character encoding is UTF-8. Values are separated by ",".
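As a small illustration of this format, the sketch below parses UTF-8 encoded, comma-separated bytes with the Python standard library. The sample bytes are made up and only stand in for a downloaded file; the column names follow the Data section above.

```python
import csv
import io

# Made-up sample content standing in for a downloaded .csv file.
sample_bytes = "Meldedatum,Faelle_neu\n2022-01-01,460\n2022-01-02,512\n".encode("utf-8")

# Decode as UTF-8 and split values on "," as described above.
with io.TextIOWrapper(io.BytesIO(sample_bytes), encoding="utf-8", newline="") as handle:
    rows = list(csv.DictReader(handle, delimiter=","))

total_cases = sum(int(row["Faelle_neu"]) for row in rows)
print(total_cases)  # 972
```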
Collaborate
If you want to participate in our project, feel free to fork this repo and send us pull requests. To make sure everything is working, please use pre-commit. It will run a few tests and lints before a commit can be made. To install the pre-commit hooks, run
pre-commit install
Publication platforms
This software publication is available on Zenodo.org, GitHub.com and OpenCoDE:
- https://zenodo.org/communities/robertkochinstitut
- https://github.com/robert-koch-institut
- https://gitlab.opencode.de/robert-koch-institut
License
Epylabel: Ensemble-labeling of infectious diseases time series is free and open-source software, published under the terms of the MIT license.
Owner
- Name: Robert Koch-Institut
- Login: robert-koch-institut
- Kind: organization
- Location: Berlin
- Website: http://www.rki.de
- Twitter: rki_de
- Repositories: 16
- Profile: https://github.com/robert-koch-institut
The RKI is the German federal government's central institution in the field of disease surveillance and prevention.
Citation (citation.cff)
cff-version: 1.2.0
type: software
title: 'Epylabel: Ensemble-labeling of infectious diseases time series'
abstract: >-
  This repository contains the code for the manuscript Ensemble-labeling of
  infectious diseases time series to evaluate early warning systems with which
  you can reproduce the manuscript's results and figures.
date-released: '2024-07-19'
keywords:
  - COVID-19
  - SARS-CoV-2
  - Inzidenz
  - Incidence
  - 7-Tage-Inzidenz
  - Infections
  - Infektion
  - Gesundheitsberichterstattung
  - Public health surveillance
  - Epidemiologie
  - Epidemiology
  - Germany
  - Open Data
  - Open Source
  - Python
  - R
  - RKI
message: Cite me!
url: https://robert-koch-institut.github.io/epylabel
license: MIT
doi: 10.5281/zenodo.12665040
version: '1.0'
authors:
  - family-names: Hicketier
    given-names: Andreas
    affiliation: Robert Koch-Institut
    orcid: 0009-0000-5882-852X
    email: hicketiera@rki.de
  - family-names: Bach
    given-names: Moritz
    affiliation: Robert Koch-Institut
    orcid: 0009-0003-3062-0585
  - family-names: Oedi
    given-names: Philip
    affiliation: Robert Koch-Institut
    orcid: 0009-0001-7112-505X
  - family-names: Ullrich
    given-names: Alexander
    affiliation: Robert Koch-Institut
    orcid: 0000-0002-4894-6124
  - family-names: Abbood
    given-names: Auss
    affiliation: Robert Koch-Institut
    orcid: 0000-0002-4428-168X
Dependencies
- robert-koch-institut/OpenData-Website main composite
- actions/checkout v4 composite
- robert-koch-institut/OpenData-Workflows/Create_release_on_tag_push main composite
- robert-koch-institut/OpenData-Workflows/Sync_OpenData_repo_to_OpenCoDE main composite