medicaldata

Data Package for Medical Datasets

https://github.com/higgi13425/medicaldata

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.8%) to scientific vocabulary

Keywords

datasets
Last synced: 6 months ago

Repository

Data Package for Medical Datasets

Basic Info
Statistics
  • Stars: 57
  • Watchers: 2
  • Forks: 11
  • Open Issues: 1
  • Releases: 1
Topics
datasets
Created over 4 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```

# medicaldata 

## Overview

This is a data package with 19 medical datasets for teaching Reproducible Medical Research with R. The link to the pkgdown reference website for {medicaldata} is [here](https://higgi13425.github.io/medicaldata/) and in the links at the right. This package will be useful for anyone teaching R to medical professionals, including doctors, nurses, pharmacists, trainees, and students. 

These datasets range from reconstructed versions of James Lind's scurvy dataset (1757) and the original Streptomycin for Tuberculosis trial (1948), through a 2012 RCT of indomethacin to prevent post-ERCP pancreatitis that I was involved in, to cohort data on SARS-CoV-2 testing results (2020). Many of the datasets come from the American Statistical Association's TSHS (Teaching Statistics in the Health Sciences) [Resources Portal](https://www.causeweb.org/tshs/category/dataset/), maintained by [Carol Bigelow](https://www.umass.edu/sphhs/person/carol-bigelow) at the University of Massachusetts (with permission). A growing number of datasets in the dev version were generously donated by [Frank Harrell](https://www.fharrell.com) from his website [here](https://hbiostat.org/data/). These datasets are currently only in the [dev version](https://github.com/higgi13425/medicaldata/) of the package on github.com, which should make it to CRAN in June of 2023.

## How to Install and Use {medicaldata} Datasets

1. Install the stable, current CRAN version with `install.packages("medicaldata")`. If you want to try out the in-development version (which may have new datasets and vignettes, but which may also be intermittently wonky), install with:
`remotes::install_github("higgi13425/medicaldata")`
2. Then load the package with `library(medicaldata)`
3. Then you can list the datasets available with `data(package = "medicaldata")`
4. Then assign a particular dataset to a named object in your environment with:
`covid <- medicaldata::covid_testing`
where `covid` is the name of the new object, and `covid_testing` is the name of the dataset.
5. Articles (vignettes) on how to use the datasets can be found at the pkgdown [website](https://higgi13425.github.io/medicaldata/) under the **Articles** tab.
6. You can click on the links below to view the description document and/or codebook for each dataset. This information is also available under the Reference tab above, or within R by using `help(dataset_name)`.

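The listing and loading steps above can be wrapped in two small helpers. This is only a sketch: `list_datasets()` and `get_dataset()` are hypothetical names that are not part of {medicaldata}, and the usage at the bottom assumes the package has already been installed as described in step 1.

```r
# Hypothetical helpers (not part of {medicaldata}); a minimal sketch only.

# Return the names of the datasets shipped with an installed package.
list_datasets <- function(pkg = "medicaldata") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed.", call. = FALSE)
  }
  # data(package = ...) returns an object whose `results` matrix
  # has an "Item" column holding the dataset names.
  unname(utils::data(package = pkg)$results[, "Item"])
}

# Fetch one dataset by name, like writing pkg::name by hand
# (for example medicaldata::covid_testing).
get_dataset <- function(name, pkg = "medicaldata") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed.", call. = FALSE)
  }
  getExportedValue(pkg, name)
}

# This part only runs when {medicaldata} is installed.
if (requireNamespace("medicaldata", quietly = TRUE)) {
  print(list_datasets())
  covid <- get_dataset("covid_testing")
  str(covid)
}
```

Passing the package name as an argument is a design choice for the sketch, so the same helpers can be tried on any installed data package, such as the base `datasets` package.
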
## Please Donate Datasets

If you have access to data from a randomized, controlled clinical trial, or a prospective cohort study, or even a case-control study, please consider obtaining the appropriate permissions, anonymizing the data, and donating the dataset for teaching purposes to add to this package. Open an issue on the GitHub page (source code link at the top right) to start the discussion of a data donation. I am happy to help with anonymization.

## List of Datasets

Click on the links below for more details about the dataset itself in the Description Document, and more details about the variables included in the dataset in the Codebook. Note that each dataset also has a help file that you can use within R or RStudio, by entering `help("dataset_name")` in the Console pane. The fourth column of the table below (scroll to the right or widen your browser window) describes the study design, as requested by Dan Sjoberg of {gtsummary} fame.

| Dataset | Description document | Codebook | Design |
|:----------|:----------|:---------|:---------|
| strep_tb | [strep_tb_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/strep_tb_desc.html) | [strep_tb_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/strep_tb_codebook.html) | Randomized Controlled Trial (RCT) |
| scurvy | [scurvy_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/scurvy_desc.html) | [scurvy_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/scurvy_codebook.html) | RCT |
| indo_rct | [indo_rct_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/indo_rct_desc.html) | [indo_rct_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/indo_rct_codebook.html) | RCT |
| polyps | [polyps_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/polyps_desc.html) | [polyps_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/polyps_codebook.html) | RCT |
| cervical dystonia (dev) | [cdystonia_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/cdystonia_desc.html) | [cdystonia_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/cdystonia_codebook.html) | RCT |
| covid_testing | [covid_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/covid_desc.html) | [covid_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/covid_testing_codebook.html) | Retrospective cross-sectional |
| blood_storage | [blood_storage_desc](https://www.causeweb.org/tshs/datasets/Blood%20Storage%20Dataset%20Introduction.pdf) | [blood_storage_codebook](https://www.causeweb.org/tshs/datasets/Blood%20Storage%20Data%20Dictionary.pdf) | Retrospective Cohort Study |
| cytomegalovirus | [cytomegalovirus_desc](https://www.causeweb.org/tshs/datasets/Cytomegalovirus%20Dataset%20Introduction.pdf) | [cytomegalovirus_codebook](https://www.causeweb.org/tshs/datasets/Cytomegalovirus%20Data%20Dictionary.pdf) | Retrospective Cohort Study |
| esoph_ca | [esoph_ca_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/esoph_ca_desc.html) | [esoph_ca_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/esoph_ca_codebook.html) | Case-control study |
| laryngoscope | [laryngoscope_desc](https://www.causeweb.org/tshs/datasets/Laryngoscope%20Dataset%20Introduction.pdf) | [laryngoscope_codebook](https://www.causeweb.org/tshs/datasets/Laryngoscope%20Data%20Dictionary.pdf) | RCT |
| licorice_gargle | [licorice_gargle_desc](https://www.causeweb.org/tshs/datasets/Licorice%20Gargle%20Dataset%20Introduction.pdf) | [licorice_gargle_codebook](https://www.causeweb.org/tshs/datasets/Licorice%20Gargle%20Data%20Dictionary.pdf) | RCT |
| opt | [opt_desc](https://www.causeweb.org/tshs/datasets/OPT_Dataset_Introduction.pdf) | [opt_codebook](https://www.causeweb.org/tshs/datasets/OPT_Data_Dictionary.pdf) | RCT |
| cath (dev) | [cath_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/cath_desc.html) | [cath_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/cath_codebook.html) | Retrospective Cohort Study |
| smartpill | [smartpill_desc](https://www.causeweb.org/tshs/datasets/Smart%20Pill%20Dataset%20Introduction.pdf) | [smartpill_codebook](https://www.causeweb.org/tshs/datasets/Smart%20Pill%20Data%20Dictionary.pdf) | Prospective Cohort Study |
| supraclavicular | [supraclavicular_desc](https://www.causeweb.org/tshs/datasets/Supraclavicular%20Dataset%20Introduction.pdf) | [supraclavicular_codebook](https://www.causeweb.org/tshs/datasets/Supraclavicular%20Data%20Dictionary.pdf) | RCT |
| indometh | [indometh_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/indometh_desc.html) | [indometh_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/indometh_codebook.html) | Prospective Cohort Pharmacokinetic (PK) Study |
| theoph | [theoph_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/theoph_desc.html) | [theoph_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/theoph_codebook.html) | Prospective Cohort PK Study |
| diabetes (dev) | [diabetes_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/diabetes_desc.html) | [diabetes_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/diabetes_codebook.html) | Prospective Longitudinal Cohort Study |
| thiomon (dev) | [thiomon_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/thiomon_desc.html) | [thiomon_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/thiomon_codebook.html) | Retrospective Cohort Study, suitable for ML |
| abm (dev) | [abm_desc](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/description_docs/abm_desc.html) | [abm_codebook](https://htmlpreview.github.io/?https://github.com/higgi13425/medicaldata/blob/master/man/codebooks/abm_codebook.html) | Retrospective Cohort Study |

## Messy Datasets

I am doing a beta test of messy datasets, largely in Excel, with many annoying non-tidy and non-rectangular features that will help teach data cleaning/wrangling. These are not actually in the package itself (as they are not R files), but can be found in the GitHub repository. You can download and open these from the GitHub repo in all of their messy Excel glory by clicking on the URL links in the table below. You can also find them [here in the list on the GitHub repo](https://github.com/higgi13425/medicaldata/tree/master/data-raw/messy_data), where you can click on one of the *.xlsx files, then click on the `View Raw` button to download it.

You can read these datasets directly into R from the URLs in the table below with the example code found in the following code chunk, which reads in the `messy_infarct` dataset and assigns it to the object `infarct`.
It may be easiest to copy the entire code chunk below by hovering over the copy icon in the top right corner, then clicking to copy.

```{r, eval=FALSE}
# install.packages('openxlsx') # if not already installed
library(openxlsx)
url <- "https://github.com/higgi13425/medicaldata/raw/master/data-raw/messy_data/messy_infarct.xlsx"
# replace the filename "messy_infarct.xlsx" at the end of this long url path
# with the filename that you want to load.
# Or just copy the whole path from the URL column below.
infarct <- openxlsx::read.xlsx(url)
head(infarct)
```

### Available Messy Datasets (beta)

| Dataset | URL | Type of Messiness |
|:----------|:----------|:---------|
| messy_cirrhosis | "https://github.com/higgi13425/medicaldata/raw/master/data-raw/messy_data/messy_cirrhosis.xlsx" | Pivot Table |
| messy_infarct | "https://github.com/higgi13425/medicaldata/raw/master/data-raw/messy_data/messy_infarct.xlsx" | Pivot Table |
| messy_aki | "https://github.com/higgi13425/medicaldata/raw/master/data-raw/messy_data/messy_aki.xlsx" | unique ids, header and footer rows, empty rows & cols, messy varnames, no units, typos in factors, visit date in headers, dates |
| messy_bp | "https://github.com/higgi13425/medicaldata/raw/master/data-raw/messy_data/messy_bp.xlsx" | unite and separate, vars without units, visit num in headers, data entry errors |
| messy_glucose | "https://github.com/higgi13425/medicaldata/raw/master/data-raw/messy_data/messy_glucose.xlsx" | factors, vars without units, visit num in headers, header rows, empty rows/cols |

[![R-CMD-check](https://github.com/higgi13425/medicaldata/workflows/R-CMD-check/badge.svg)](https://github.com/higgi13425/medicaldata/actions)
[![CRAN status](https://www.r-pkg.org/badges/version/medicaldata)](https://cran.r-project.org/package=medicaldata)
[![](https://cranlogs.r-pkg.org/badges/medicaldata)](https://cran.r-project.org/package=medicaldata)
[![DOI](https://zenodo.org/badge/385090155.svg)](https://zenodo.org/badge/latestdoi/385090155)
[![R-CMD-check](https://github.com/higgi13425/medicaldata/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/higgi13425/medicaldata/actions/workflows/R-CMD-check.yaml)
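
The URLs of the messy datasets listed above differ only in the file name, so the URL can be built from the dataset name instead of being copied by hand. The following is a minimal sketch under that assumption: `messy_url()` and `read_messy()` are hypothetical helper names, not functions from {medicaldata} or {openxlsx}, and the base path is copied from the URL column of the table.

```r
# Hypothetical helpers; a sketch, not part of {medicaldata}.

# Build the raw GitHub URL for a messy dataset from its name,
# e.g. "messy_bp" becomes ".../messy_data/messy_bp.xlsx".
messy_url <- function(name) {
  base <- "https://github.com/higgi13425/medicaldata/raw/master/data-raw/messy_data"
  paste0(base, "/", name, ".xlsx")
}

# Read a messy dataset straight from GitHub with openxlsx::read.xlsx(),
# the same call used in the README chunk. Needs network access.
read_messy <- function(name) {
  if (!requireNamespace("openxlsx", quietly = TRUE)) {
    stop("Please install the 'openxlsx' package first.", call. = FALSE)
  }
  openxlsx::read.xlsx(messy_url(name))
}

# Interactive use only, so nothing is downloaded when the file is sourced.
if (interactive() && requireNamespace("openxlsx", quietly = TRUE)) {
  bp <- read_messy("messy_bp")
  head(bp)
}
```

Keeping the URL construction separate from the download makes it possible to check the generated path against the table before reading anything.
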

Owner

  • Name: Peter Higgins
  • Login: higgi13425
  • Kind: user
  • Location: Ann Arbor, Michigan
  • Company: University of Michigan

GitHub Events

Total
  • Issues event: 1
  • Watch event: 8
Last Year
  • Issues event: 1
  • Watch event: 8

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 272
  • Total Committers: 2
  • Avg Commits per committer: 136.0
  • Development Distribution Score (DDS): 0.004
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Peter Higgins h****5@y****m 271
Daniel Sjoberg d****g@g****m 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 7
  • Total pull requests: 1
  • Average time to close issues: 10 days
  • Average time to close pull requests: about 7 hours
  • Total issue authors: 7
  • Total pull request authors: 1
  • Average comments per issue: 2.43
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • therneau (1)
  • avehtari (1)
  • daszlosek (1)
  • ddsjoberg (1)
  • jromanowska (1)
  • rosemm (1)
  • vonthein (1)
Pull Request Authors
  • ddsjoberg (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 697 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 4
  • Total versions: 1
  • Total maintainers: 1
cran.r-project.org: medicaldata

Data Package for Medical Datasets

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 4
  • Downloads: 697 Last month
Rankings
Forks count: 7.3%
Stargazers count: 8.4%
Downloads: 12.8%
Average: 14.3%
Dependent repos count: 14.6%
Dependent packages count: 28.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.1 depends
  • flextable * suggests
  • janitor * suggests
  • knitr * suggests
  • markdown * suggests
  • rmarkdown * suggests
  • tidyverse * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/check-standard.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/document.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite