timeseriesflattener

timeseriesflattener: A Python package for summarizing features from (medical) time series - Published in JOSS (2023)

https://github.com/aarhus-psychiatry-research/timeseriesflattener

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

electronic-healthcare-data irregular-time-series machine-learning python python3 time-series-analysis

Keywords from Contributors

dependency-distance descriptive-statistics readability readability-scores spacy spacy-extension syntactic-analysis climate-science dimensionality-reduction pca

Scientific Fields

Mathematics Computer Science - 84% confidence
Last synced: 4 months ago

Repository

Converting irregularly spaced time series, such as electronic health records, into dataframes for tabular classification.

Basic Info
Statistics
  • Stars: 19
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 100
Created about 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Zenodo

README.md

Timeseriesflattener

Time series from e.g. electronic health records often have a large number of variables, are sampled at irregular intervals and tend to have a large number of missing values. Before this type of data can be used for prediction modelling with machine learning methods such as logistic regression or XGBoost, the data needs to be reshaped.

In essence, the time series need to be flattened so that each prediction time is represented by a set of predictor values and an outcome value. These predictor values can be constructed by aggregating the preceding values in the time series within a certain time window.
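The idea can be illustrated without the package. The following is a minimal sketch in plain Python, not the package's implementation: the function name `flatten`, the tuple layout, and the window convention (values at or after `prediction_time - lookbehind` and strictly before the prediction time) are assumptions made for illustration.

```python
import datetime as dt
import math
import statistics


def flatten(prediction_times, values, lookbehind, aggregate, fallback=math.nan):
    """Return one row per prediction time with an aggregated predictor value.

    Hypothetical helper: for each (entity_id, prediction_time), collect that
    entity's values observed within `lookbehind` before the prediction time
    and aggregate them. If the window is empty, use `fallback`.
    """
    rows = []
    for entity_id, pred_time in prediction_times:
        window = [
            value
            for value_id, value_time, value in values
            if value_id == entity_id
            and pred_time - lookbehind <= value_time < pred_time
        ]
        rows.append((entity_id, pred_time, aggregate(window) if window else fallback))
    return rows


# (entity id, prediction time)
prediction_times = [
    (1, dt.datetime(2020, 1, 1)),
    (1, dt.datetime(2020, 2, 1)),
    (2, dt.datetime(2020, 2, 1)),
]
# (entity id, measurement time, measured value), irregularly spaced
values = [
    (1, dt.datetime(2020, 1, 15), 1),
    (1, dt.datetime(2019, 12, 10), 2),
    (1, dt.datetime(2019, 12, 15), 3),
    (2, dt.datetime(2020, 1, 2), 4),
]

flat = flatten(prediction_times, values, dt.timedelta(days=30), statistics.mean)
print([value for _, _, value in flat])  # [2.5, 1, 4]
```

Each prediction time ends up as a single row, which is the tabular shape that models such as logistic regression or XGBoost expect.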

timeseriesflattener aims to simplify this process by providing an easy-to-use and fully-specified pipeline for flattening complex time series.

🔧 Installation

To get started, install timeseriesflattener with pip by running the following line in your terminal:

pip install timeseriesflattener

⚡ Quick start

```py
import datetime as dt

import numpy as np
import polars as pl

# Load a dataframe with times you wish to make a prediction
prediction_times_df = pl.DataFrame(
    {"id": [1, 1, 2], "date": ["2020-01-01", "2020-02-01", "2020-02-01"]}
)

# Load a dataframe with raw values you wish to aggregate as predictors
predictor_df = pl.DataFrame(
    {
        "id": [1, 1, 1, 2],
        "date": ["2020-01-15", "2019-12-10", "2019-12-15", "2020-01-02"],
        "predictor_value": [1, 2, 3, 4],
    }
)

# Load a dataframe specifying when the outcome occurs
outcome_df = pl.DataFrame({"id": [1], "date": ["2020-03-01"], "outcome_value": [1]})

# Specify how to aggregate the predictors and define the outcome
from timeseriesflattener import (
    MaxAggregator,
    MinAggregator,
    OutcomeSpec,
    PredictionTimeFrame,
    PredictorSpec,
    ValueFrame,
)

predictor_spec = PredictorSpec(
    value_frame=ValueFrame(
        init_df=predictor_df, entity_id_col_name="id", value_timestamp_col_name="date"
    ),
    lookbehind_distances=[dt.timedelta(days=1)],
    aggregators=[MaxAggregator(), MinAggregator()],
    fallback=np.nan,
    column_prefix="pred",
)

outcome_spec = OutcomeSpec(
    value_frame=ValueFrame(
        init_df=outcome_df, entity_id_col_name="id", value_timestamp_col_name="date"
    ),
    lookahead_distances=[dt.timedelta(days=1)],
    aggregators=[MaxAggregator(), MinAggregator()],
    fallback=np.nan,
    column_prefix="outc",
)

# Instantiate the Flattener and add the specifications
from timeseriesflattener import Flattener

result = Flattener(
    predictiontime_frame=PredictionTimeFrame(
        init_df=prediction_times_df, entity_id_col_name="id", timestamp_col_name="date"
    )
).aggregate_timeseries(specs=[predictor_spec, outcome_spec])
result.df
```

Output:

|     |  id | date                | prediction_time_uuid  | pred_test_feature_within_30_days_mean_fallback_nan | outc_test_outcome_within_31_days_maximum_fallback_0_dichotomous |
| --: | --: | :------------------ | :-------------------- | -------------------------------------------------: | --------------------------------------------------------------: |
|   0 |   1 | 2020-01-01 00:00:00 | 1-2020-01-01-00-00-00 |                                                2.5 |                                                               0 |
|   1 |   1 | 2020-02-01 00:00:00 | 1-2020-02-01-00-00-00 |                                                  1 |                                                               1 |
|   2 |   2 | 2020-02-01 00:00:00 | 2-2020-02-01-00-00-00 |                                                  4 |                                                               0 |
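The outcome column of the table above (a maximum within 31 days, with a fallback of 0) can be reproduced by hand. The sketch below is plain Python and does not use the package. The helper name `outcome_within` and the window convention (strictly after the prediction time, up to and including `prediction_time + lookahead`) are assumptions made for illustration.

```python
import datetime as dt


def outcome_within(prediction_times, outcomes, lookahead, fallback=0):
    """Hypothetical helper: take the maximum outcome value that occurs within
    `lookahead` after each prediction time, or `fallback` if none occurs."""
    rows = []
    for entity_id, pred_time in prediction_times:
        window = [
            value
            for outcome_id, outcome_time, value in outcomes
            if outcome_id == entity_id
            and pred_time < outcome_time <= pred_time + lookahead
        ]
        rows.append((entity_id, pred_time, max(window) if window else fallback))
    return rows


# Same prediction times and outcome as in the quick start
prediction_times = [
    (1, dt.datetime(2020, 1, 1)),
    (1, dt.datetime(2020, 2, 1)),
    (2, dt.datetime(2020, 2, 1)),
]
outcomes = [(1, dt.datetime(2020, 3, 1), 1)]

labels = outcome_within(prediction_times, outcomes, dt.timedelta(days=31))
print([label for _, _, label in labels])  # [0, 1, 0]
```

Only the second prediction time for entity 1 has the outcome on 2020-03-01 inside its 31-day window, so it is the only row labelled 1.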

📖 Tutorial

💬 Where to ask questions

| Type                            | Platform               |
| ------------------------------- | ---------------------- |
| 🚨 Bug Reports                  | GitHub Issue Tracker   |
| 🎁 Feature Requests & Ideas     | GitHub Issue Tracker   |
| 👩‍💻 Usage Questions              | GitHub Discussions     |
| 🗯 General Discussion           | GitHub Discussions     |

🎓 Projects

PSYCOP projects use timeseriesflattener; see the monorepo for more.

Owner

  • Name: PSYCOP, Aarhus Psychiatry Research
  • Login: Aarhus-Psychiatry-Research
  • Kind: organization
  • Location: Aarhus, Denmark

PSYchiatric Clinical Outcome Prediction (PSYCOP). Aarhus University Hospital - Psychiatry

JOSS Publication

timeseriesflattener: A Python package for summarizing features from (medical) time series
Published
March 29, 2023
Volume 8, Issue 83, Page 5197
Authors
Martin Bernstorff ORCID
Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark, Department of Clinical Medicine, Aarhus University, Aarhus, Denmark, Center for Humanities Computing, Aarhus University, Aarhus, Denmark
Kenneth Enevoldsen ORCID
Department of Clinical Medicine, Aarhus University, Aarhus, Denmark, Center for Humanities Computing, Aarhus University, Aarhus, Denmark
Jakob Damgaard ORCID
Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark, Psychosis Research Unit, Aarhus University Hospital - Psychiatry, Denmark
Andreas Danielsen ORCID
Department of Clinical Medicine, Aarhus University, Aarhus, Denmark, Psychosis Research Unit, Aarhus University Hospital - Psychiatry, Denmark
Lasse Hansen ORCID
Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark, Department of Clinical Medicine, Aarhus University, Aarhus, Denmark, Center for Humanities Computing, Aarhus University, Aarhus, Denmark
Editor
Marcel Stimberg ORCID
Tags
time series electronic health records medical time series feature extraction

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Bernstorff"
    given-names: "Martin"
    orcid: "https://orcid.org/0000-0002-0234-5390"
  - family-names: "Enevoldsen"
    given-names: "Kenneth"
    orcid: "https://orcid.org/0000-0001-8733-0966"
  - family-names: "Damgaard"
    given-names: "Jakob Grøhn"
    orcid: "https://orcid.org/0000-0001-7092-2391"
  - family-names: "Danielsen"
    given-names: "Andreas"
    orcid: "https://orcid.org/0000-0002-6585-3616"
  - family-names: "Hansen"
    given-names: "Lasse"
    orcid: "https://orcid.org/0000-0003-1113-4779"
message: If you use this software, please cite our article in Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: "Bernstorff"
    given-names: "Martin"
    orcid: "https://orcid.org/0000-0002-0234-5390"
  - family-names: "Enevoldsen"
    given-names: "Kenneth"
    orcid: "https://orcid.org/0000-0001-8733-0966"
  - family-names: "Damgaard"
    given-names: "Jakob Grøhn"
    orcid: "https://orcid.org/0000-0001-7092-2391"
  - family-names: "Danielsen"
    given-names: "Andreas"
    orcid: "https://orcid.org/0000-0002-6585-3616"
  - family-names: "Hansen"
    given-names: "Lasse"
    orcid: "https://orcid.org/0000-0003-1113-4779"
  date-published: 2023-01-26
  doi: 10.21105/joss.05197
  issn: 2475-9066
  issue: 83
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5197
  title: "timeseriesflattener: A Python package for summarizing features from (medical) time series"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05197"
  volume: 8
title: "timeseriesflattener: A Python package for summarizing features from (medical) time series"

GitHub Events

Total
  • Create event: 25
  • Release event: 3
  • Issues event: 3
  • Watch event: 3
  • Delete event: 23
  • Member event: 1
  • Issue comment event: 37
  • Push event: 17
  • Pull request review event: 20
  • Pull request event: 38
Last Year
  • Create event: 25
  • Release event: 3
  • Issues event: 3
  • Watch event: 3
  • Delete event: 23
  • Member event: 1
  • Issue comment event: 37
  • Push event: 17
  • Pull request review event: 20
  • Pull request event: 38

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 1,557
  • Total Committers: 13
  • Avg Commits per committer: 119.769
  • Development Distribution Score (DDS): 0.449
Past Year
  • Commits: 61
  • Committers: 6
  • Avg Commits per committer: 10.167
  • Development Distribution Score (DDS): 0.311
Top Committers
Name Email Commits
Martin Bernstorff m****f@g****m 858
Lasse l****0@g****m 240
dependabot[bot] 4****] 117
github-actions g****s@g****m 105
bokajgd b****d@g****m 83
sarakolding s****g@l****k 54
github-actions a****n@g****m 40
Kenneth Enevoldsen k****n@g****m 34
semantic-release s****e 9
frillecode f****5@g****m 7
Yaroslav Halchenko d****n@o****m 4
erikperfalk e****k@g****m 4
signekb s****k@g****m 2
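The all-time summary figures can be checked against the committer table. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is not stated on this page, so treat it as an assumption. The variable names are illustrative.

```python
# Commit counts copied from the Top Committers table, in table order
commit_counts = [858, 240, 117, 105, 83, 54, 40, 34, 9, 7, 4, 4, 2]

total_commits = sum(commit_counts)
committers = len(commit_counts)
avg_commits = total_commits / committers

# Assumed definition: DDS = 1 - (commits by top committer / total commits)
dds = 1 - max(commit_counts) / total_commits

print(total_commits, committers)  # 1557 13
print(f"{avg_commits:.3f}")       # 119.769
print(f"{dds:.3f}")               # 0.449
```

Under this assumption the results match the reported all-time values: 1,557 commits, 13 committers, 119.769 commits per committer, and a DDS of 0.449.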

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 18
  • Total pull requests: 83
  • Average time to close issues: 25 days
  • Average time to close pull requests: 12 days
  • Total issue authors: 4
  • Total pull request authors: 5
  • Average comments per issue: 2.44
  • Average comments per pull request: 1.65
  • Merged pull requests: 16
  • Bot issues: 0
  • Bot pull requests: 68
Past Year
  • Issues: 2
  • Pull requests: 46
  • Average time to close issues: 13 days
  • Average time to close pull requests: 14 days
  • Issue authors: 1
  • Pull request authors: 4
  • Average comments per issue: 1.0
  • Average comments per pull request: 1.43
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 40
Top Authors
Issue Authors
  • MartinBernstorff (63)
  • HLasse (23)
  • dependabot[bot] (3)
  • bokajgd (2)
  • sarakolding (1)
  • XiaoJia849 (1)
Pull Request Authors
  • dependabot[bot] (153)
  • MartinBernstorff (69)
  • HLasse (35)
  • sarakolding (12)
  • bokajgd (4)
Top Labels
Issue Labels
in-progress (45) Stale (9) dependencies (3)
Pull Request Labels
dependencies (155) Stale (7) closed-by-stalebot (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 284 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 1
  • Total versions: 102
  • Total maintainers: 1
pypi.org: timeseriesflattener

A package for converting time series data from e.g. electronic health records into wide format data.

  • Documentation: https://timeseriesflattener.readthedocs.io/
  • License: MIT License

    Copyright (c) 2022 PSYCOP Group, Aarhus University

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  • Latest release: 2.5.2
    published 6 months ago
  • Versions: 102
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 284 Last month
Rankings
Dependent packages count: 4.7%
Downloads: 5.7%
Average: 10.7%
Dependent repos count: 21.7%
Maintainers (1)
Last synced: 4 months ago

Dependencies

.github/workflows/generate_paper_pdf.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
.github/actions/test/action.yml actions
  • MartinBernstorff/cache-poetry-and-venv latest composite
  • actions/setup-python v4 composite
  • snok/install-poetry v1 composite
.github/actions/test_tutorials/action.yml actions
  • MartinBernstorff/cache-poetry-and-venv latest composite
  • actions/setup-python v4 composite
  • snok/install-poetry v1 composite
.github/workflows/dependabot_automerge.yml actions
  • hmarr/auto-approve-action v2 composite
.github/workflows/documentation.yml actions
  • JamesIves/github-pages-deploy-action 4.1.4 composite
  • MartinBernstorff/cache-poetry-and-venv latest composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • snok/install-poetry v1 composite
.github/workflows/main_test_and_release.yml actions
  • ./.github/actions/test * composite
  • ./.github/actions/test_tutorials * composite
  • actions/checkout v3 composite
  • actions/checkout v2 composite
  • relekang/python-semantic-release v7.32.0 composite
.github/workflows/pre-commit.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • pre-commit/action v3.0.0 composite
pyproject.toml pypi
  • black >=22.8.0,<22.10.1 develop
  • docformatter >=1.5.0, <1.5.2 develop
  • flake8 >=5.0.0, <5.0.6 develop
  • furo 2022.9.29 develop
  • mypy >=0.971,<0.992 develop
  • myst-parser >=0.18.1,<0.18.2 develop
  • pre-commit >=2.20.0, <2.20.2 develop
  • pylint >=2.15.5,<2.16.0 develop
  • pytest >=7.1.3, <7.1.5 develop
  • pytest-cov >=3.0.0,<4.0.1 develop
  • pytest-xdist >=2.4.0, <2.5.2 develop
  • sphinx >=5.3.0,<5.4.0 develop
  • sphinx-copybutton >=0.5.1,<0.5.2 develop
  • sphinx_design >=0.3.0,<0.3.1 develop
  • sphinxext-opengraph >=0.7.3,<0.7.4 develop
  • SQLAlchemy >=1.4.41, <1.5.42
  • catalogue >=2.0.0, <2.1.0
  • coloredlogs >14.0.0,<15.1.0
  • dask >=2022.9.0,<2022.12.0
  • deepchecks >=0.8.0,<0.10.0
  • dill >=0.3.0, <0.3.6
  • frozendict >=2.3.4,<2.4.0
  • jupyter >=1.0.0,<1.1.0
  • numpy >=1.23.3,<1.23.6
  • pandas >=1.4.0,<1.6.0
  • protobuf <=3.20.3
  • psutil >=5.9.1, <6.0.0
  • psycopmlutils >=0.2.4, <0.3.0
  • pyarrow >=9.0.0,<9.1.0
  • pydantic >=1.9.0, <1.10.0
  • pyodbc >=4.0.34, <4.0.36
  • python >=3.9, <3.11
  • scikit-learn >=1.1.2, <1.1.3
  • scipy >=1.8.0,<1.9.4
  • skimpy >=0.0.7,<0.1.0
  • srsly >=2.4.4,<2.4.6
  • wandb >=0.12.0,<0.13.5
  • wasabi >=0.9.1,<0.10.2