EHRtemporalVariability

R package for delineating temporal dataset shifts in Electronic Health Records

https://github.com/hms-dbmi/ehrtemporalvariability

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary

Keywords

biomedical-data-science biomedical-informatics data-quality data-quality-monitoring dataset-shifts electronic-health-records time variability visualization
Last synced: 6 months ago

Repository

R package for delineating temporal dataset shifts in Electronic Health Records

Basic Info
  • Host: GitHub
  • Owner: hms-dbmi
  • License: apache-2.0
  • Language: HTML
  • Default Branch: master
  • Homepage:
  • Size: 11.7 MB
Statistics
  • Stars: 17
  • Watchers: 6
  • Forks: 8
  • Open Issues: 0
  • Releases: 0
Created over 7 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License

README.md

EHRtemporalVariability

R package for delineating temporal dataset shifts in Electronic Health Records

What is this repository for?

Functions to delineate temporal dataset shifts in Electronic Health Records through the projection and visualization of dissimilarities among temporal batches of data. The package estimates the statistical distributions of the data over time and projects them onto non-parametric statistical manifolds, uncovering the patterns of latent temporal variability in the data. Dataset shifts can be explored and identified through visual analytics formats such as Data Temporal Heatmaps (DTHs) and Information Geometric Temporal (IGT) plots [1-3]. An additional EHRtemporalVariability Shiny app can be used to load and explore the package results, and it makes these functions available to users without experience in R coding.

Sample DTH and IGT plot of variable 'Diagnosis Code #1 - PheWAS Code' of the NHDS data

Background

When carrying out data science tasks on a dataset that spans a long period of time, one must be aware that changes of reference and nature-induced changes in the data acquisition context can occur. These changes will likely be reflected as changes in the statistical distributions of the data, in the form of dataset shifts. Temporal variability artifacts can introduce undesired heterogeneity in data over time, which can hinder data quality and challenge the secondary use of data, particularly for population and data-driven research, as well as machine learning. Statistical process control or time-series approaches can help detect changes in summary statistics of the data. However, there is a risk of information loss, especially with categorical variables that have a particularly high number of values, and with multimodal statistical distributions in which multiple sub-phenotypes are present. EHRtemporalVariability provides the means to visually and analytically delineate dataset shifts in multimodal and highly coded information, with no distributional assumptions made.
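The point about summary statistics losing information can be illustrated with a small base-R sketch. It does not use EHRtemporalVariability; the two synthetic batches, the bin width, and the choice of total variation distance are all assumptions made for illustration only.

```r
# Two synthetic numeric batches built from quantiles, so no random draws are involved.
batch_a <- qnorm(ppoints(1000), mean = 5, sd = 1)        # unimodal batch
batch_b <- c(qnorm(ppoints(500), mean = 2, sd = 0.5),    # bimodal batch:
             qnorm(ppoints(500), mean = 8, sd = 0.5))    # two sub-groups

# A summary statistic such as the mean sees no change between batches ...
mean_a <- mean(batch_a)
mean_b <- mean(batch_b)

# ... while the binned distributions barely overlap.
breaks <- seq(0, 10, by = 0.5)
p <- hist(batch_a, breaks = breaks, plot = FALSE)$counts / length(batch_a)
q <- hist(batch_b, breaks = breaks, plot = FALSE)$counts / length(batch_b)

# Total variation distance between the two histograms, in [0, 1].
tv_distance <- sum(abs(p - q)) / 2

cat("difference in means:", abs(mean_a - mean_b), "\n")
cat("total variation distance:", tv_distance, "\n")
```

A control chart on the mean would treat both batches as identical, whereas a comparison of the full distributions flags them as very different.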

Our method is based upon the estimation and comparison of data statistical distributions over time [1-3]. DTHs allow users to explore changes in absolute and relative frequencies over time, at multiple variable values simultaneously (e.g., frequencies of phenotypes). IGT plots project time batches as a series of points. The distances between them correspond to the dissimilarity of their statistical distributions. This yields an empirical layout of temporal relationships between batches, namely a non-parametric temporal statistical manifold. IGT plots allow users to visually identify four types of changes: trends, represented as continuously flowing time batches; abrupt changes, shown as gaps between groups of batches; temporal subgroups, depicted as clusters of batches; and seasonality, portrayed as temporal cycles. Additionally, batches are labeled by date and color-coded to distinguish seasonal effects. For more information on how to use EHRtemporalVariability to delineate and identify these changes, please see the EHRtemporalVariability vignette.
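The idea behind an IGT-style projection can be sketched in base R, without the package. This sketch assumes the Jensen-Shannon distance as the dissimilarity and classical multidimensional scaling (`cmdscale`) as the projection; neither choice is stated in this README, and the synthetic monthly batches and the helper `js_distance` are hypothetical.

```r
# Jensen-Shannon distance between two probability vectors
# (base-2 logarithms, so the result lies in [0, 1]).
js_distance <- function(p, q) {
  m <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  sqrt(max(0, (kl(p, m) + kl(q, m)) / 2))
}

# Synthetic coded variable with an abrupt shift after the sixth month.
set.seed(1)
codes <- c("A", "B", "C", "D")
months <- sprintf("2020-%02d", 1:12)
batches <- lapply(seq_along(months), function(i) {
  pr <- if (i <= 6) c(0.4, 0.3, 0.2, 0.1) else c(0.1, 0.2, 0.3, 0.4)
  factor(sample(codes, 500, replace = TRUE, prob = pr), levels = codes)
})
names(batches) <- months

# Rows are monthly batches, columns are relative frequencies per code
# (the kind of matrix a heatmap over time would display).
freq <- t(sapply(batches, function(b) prop.table(table(b))))

# Pairwise dissimilarities between batches.
n <- nrow(freq)
d <- matrix(0, n, n, dimnames = list(months, months))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- js_distance(freq[i, ], freq[j, ])
  }
}

# Project batches as points whose distances approximate the dissimilarities.
coords <- cmdscale(as.dist(d), k = 2)
plot(coords, type = "n", xlab = "D1", ylab = "D2")
text(coords, labels = months)
```

In the resulting plot the first six months and the last six months should form two separate groups with a gap between them, which is how an abrupt change is described above.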

Package Status

  • Version: 1.2.1
  • Authors: Carlos Sáez (UPV-HMS), Alba Gutiérrez-Sacristán (HMS), Isaac Kohane (HMS), Juan M García-Gómez (UPV), Paul Avillach (HMS)
  • Maintainer: Carlos Sáez (UPV-HMS)

Copyright: 2019 - Biomedical Data Science Lab, Universitat Politècnica de València, Spain (UPV) - Department of Biomedical Informatics, Harvard Medical School (HMS)

Documentation

Citation

If you use EHRtemporalVariability, please cite:

Sáez C, Gutiérrez-Sacristán A, Kohane I, García-Gómez JM, Avillach P. EHRtemporalVariability: delineating temporal data-set shifts in electronic health records. GigaScience, Volume 9, Issue 8, August 2020, giaa079. https://doi.org/10.1093/gigascience/giaa079

Consider also citing any of the original methods and case studies describing the approach:

[1]: Sáez C, Rodrigues PP, Gama J, Robles M, García-Gómez JM. Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality. Data Mining and Knowledge Discovery. 2015;29:950–75. https://doi.org/10.1007/s10618-014-0378-6

[2]: Sáez C, Zurriaga O, Pérez-Panadés J, Melchor I, Robles M, García-Gómez JM. Applying probabilistic temporal and multisite data quality control methods to a public health mortality registry in Spain: A systematic approach to quality control of repositories. Journal of the American Medical Informatics Association. 2016;23:1085–95. https://doi.org/10.1093/jamia/ocw010

[3]: Sáez C, García-Gómez JM. Kinematics of Big Biomedical Data to characterize temporal variability and seasonality of data repositories: Functional Data Analysis of data temporal evolution over non-parametric statistical manifolds. International Journal of Medical Informatics. 2018;119:109–24. https://doi.org/10.1016/j.ijmedinf.2018.09.015

Download

Install the latest released version from CRAN

```R
install.packages("EHRtemporalVariability")
```

Download the latest development code of EHRtemporalVariability from GitHub using devtools with

```R
devtools::install_github("hms-dbmi/EHRtemporalVariability")
```
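Once installed, a first session might look like the sketch below. The function and argument names (`estimateDataTemporalMap`, `estimateIGTProjection`, `plotDataTemporalMap`, `plotIGTProjection`) are recalled from the package documentation rather than taken from this README, so treat them as assumptions and check them against the vignette. The data frame `ehr` is synthetic and purely illustrative.

```r
# Synthetic stand-in for an EHR extract: one Date column plus two coded variables.
set.seed(42)
n <- 5000
days <- seq(as.Date("2018-01-01"), as.Date("2019-12-31"), by = "day")
ehr <- data.frame(
  date      = sample(days, n, replace = TRUE),
  diagnosis = factor(sample(c("D1", "D2", "D3"), n, replace = TRUE)),
  sex       = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Run the package steps only if the package is available.
if (requireNamespace("EHRtemporalVariability", quietly = TRUE)) {
  library(EHRtemporalVariability)

  # Assumed interface: one temporal map per variable, batched by month.
  maps <- estimateDataTemporalMap(
    data           = ehr,
    dateColumnName = "date",
    period         = "month"
  )

  # Assumed interface: project the batches of the first variable into 2 dimensions.
  igt <- estimateIGTProjection(dataTemporalMap = maps[[1]], dimensions = 2)

  # Assumed interface: build the DTH and the IGT plot.
  # Print these objects in an interactive session to render them.
  dth_plot <- plotDataTemporalMap(dataTemporalMap = maps[[1]])
  igt_plot <- plotIGTProjection(igtProjection = igt, dimensions = 2)
}
```

Because the synthetic variables are drawn from the same distribution every month, the plots should show no marked shift; real data with a change in coding or acquisition practice would be expected to show gaps, clusters, or trends.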

Owner

  • Name: Harvard Medical School - Department of Biomedical Informatics
  • Login: hms-dbmi
  • Kind: organization
  • Location: Boston

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 36
  • Total Committers: 3
  • Avg Commits per committer: 12.0
  • Development Distribution Score (DDS): 0.111
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Carlos Saez c****i@u****s 32
carsaesi s****s@g****m 3
Carlos Sáez 3****i 1
Committer Domains (Top 20 + Academic)
upv.es: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 348 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 12
  • Total maintainers: 1
cran.r-project.org: EHRtemporalVariability

Delineating Temporal Dataset Shifts in Electronic Health Records

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 348 Last month
Rankings
Forks count: 9.1%
Stargazers count: 15.1%
Average: 26.1%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Downloads: 40.9%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.3.0 depends
  • dplyr * depends
  • MASS * imports
  • RColorBrewer * imports
  • lubridate * imports
  • methods * imports
  • plotly * imports
  • scales * imports
  • viridis * imports
  • xts * imports
  • zoo * imports
  • BiocStyle * suggests
  • dbscan * suggests
  • devtools * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • webshot * suggests