dsBinVal

dsBinVal: Conducting distributed ROC analysis using DataSHIELD - Published in JOSS (2023)

https://github.com/difuture-lmu/dsbinval

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: arxiv.org, joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

auc calibration datashield distributed-computing roc

Scientific Fields

Earth and Environmental Sciences Physical Sciences - 40% confidence

Repository

ROC-GLM and calibration analysis for DataSHIELD

Basic Info
  • Host: GitHub
  • Owner: difuture-lmu
  • License: lgpl-3.0
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 1.85 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 1
  • Open Issues: 3
  • Releases: 2
Topics
auc calibration datashield distributed-computing roc
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Contributing License Code of conduct

README.Rmd

---
output: github_document
---



```{r, include=FALSE}
options(width = 80)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "Readme_files/"
)

## Helper to determine OPAL version:
v_major  = seq(4L, 10L)
v_minor1 = seq_len(10L) - 1
v_minor2 = seq_len(10L) - 1

versions = expand.grid(v_major, v_minor1, v_minor2)
od = order(versions[, 1], versions[, 2], versions[, 3])
versions = versions[od, ]
vstrings = paste(versions[, 1], versions[, 2], versions[, 3], sep = ".")

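getOPALVersion = function(opal, versions) {
  # Walk the sorted candidate versions until one matches the server version.
  k = 1
  ov = opalr::opal.version_compare(opal, versions[k])
  while (ov != 0) {
    if (ov < 0) stop("Version is smaller than the smallest one from vector.")
    k = k + 1
    # Guard against running past the end of the candidate vector:
    if (k > length(versions)) stop("Version is larger than the largest one from vector.")
    ov = opalr::opal.version_compare(opal, versions[k])
  }
  return(versions[k])
}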

pkgs = c("here", "opalr", "DSI", "DSOpal", "dsBaseClient")
for (pkg in pkgs) {
  if (! requireNamespace(pkg, quietly = TRUE))
    install.packages(pkg, repos = c(getOption("repos"), "https://cran.obiba.org"))
}
devtools::install(quiet = TRUE, upgrade = "always")
library(DSI)
library(DSOpal)
library(dsBaseClient)

## Install packages on the DataSHIELD test machine:
surl     = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"

opal = opalr::opal.login(username = username, password = password, url = surl)
opal_version = getOPALVersion(opal, vstrings)

check1 = opalr::dsadmin.install_github_package(opal = opal, pkg = "dsBinVal", username = "difuture-lmu", ref = "main")
if (! check1)
  stop("[", Sys.time(), "] Was not able to install dsBinVal!")

check2 = opalr::dsadmin.publish_package(opal = opal, pkg = "dsBinVal")
if (! check2)
  stop("[", Sys.time(), "] Was not able to publish methods of dsBinVal!")

opalr::opal.logout(opal)

# To build the model for the example, download the CNSIM data sets from:
# https://github.com/datashield/DSLite/tree/master/data

if (FALSE) {
  dpath = "~/Downloads"
  dnames = paste0(dpath, "/CNSIM", seq_len(3), ".rda")
  dLoader = function(n) {
    load(n)
    dn = ls()
    dn = dn[grep("CNSIM", dn)]
    return(get(dn))
  }
  CNSIM = na.omit(do.call(rbind, lapply(dnames, dLoader)))
  mod = glm(DIS_DIAB ~ ., data = CNSIM, family = binomial())
  save(mod, file = here::here("Readme_files/mod.rda"))
}

```
[![R-CMD-check](https://github.com/difuture-lmu/dsBinVal/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/difuture-lmu/dsBinVal/actions/workflows/R-CMD-check.yaml) [![License: LGPL v3](https://img.shields.io/badge/License-LGPL%20v3-blue.svg)](https://www.gnu.org/licenses/lgpl-3.0) [![codecov](https://codecov.io/gh/difuture-lmu/dsBinVal/branch/main/graph/badge.svg?token=E8AZRM6XJX)](https://codecov.io/gh/difuture-lmu/dsBinVal) [![DOI](https://joss.theoj.org/papers/10.21105/joss.04545/status.svg)](https://doi.org/10.21105/joss.04545)

# ROC-GLM and Calibration for DataSHIELD


The package provides functionality to conduct and visualize ROC analysis and calibration on decentralized data, building on the [DataSHIELD](https://www.datashield.org/) infrastructure for distributed computing. It calculates the [**ROC-GLM**](https://www.jstor.org/stable/2676973?seq=1) with [**AUC confidence intervals**](https://www.jstor.org/stable/2531595?seq=1) as well as [**calibration curves**](https://www.geeksforgeeks.org/calibration-curves/) and the [**Brier score**](https://en.wikipedia.org/wiki/Brier_score). Calculating the ROC-GLM or assessing calibration requires pushing models to the servers and computing predictions there; the package provides both steps. From v5 onwards, DataSHIELD uses [privacy filters](https://data2knowledge.atlassian.net/wiki/spaces/DSDEV/pages/714768398/Disclosure+control), which this package also applies. Additionally, the package uses the old option `datashield.privacyLevel` (the minimal number of values required to allow sharing an aggregation) as a fallback. Instead of setting the option, the fallback privacy level is retrieved directly from the [`DESCRIPTION`](https://github.com/difuture-lmu/dsBinVal/blob/master/DESCRIPTION) file each time a function requires it. It is set to 5 by default. The methodology of the package is explained in detail [here](https://arxiv.org/abs/2203.10828).
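
To illustrate what a privacy level means in practice, here is a minimal base R sketch, not the package's actual filter code. The function name `guardedMean` and the variable `privacy_level` are hypothetical; the value 5 is the default stated above.

```r
# Hypothetical fallback privacy level (the package reads its own value
# from the DESCRIPTION file; 5 is the default stated in this readme).
privacy_level = 5L

# Share the mean of x only if enough non-missing values back the aggregate.
guardedMean = function(x, min_n = privacy_level) {
  n = sum(!is.na(x))
  if (n < min_n)
    stop("Only ", n, " values available, but at least ", min_n, " are required.")
  mean(x, na.rm = TRUE)
}

# Five values: the aggregate is released.
guardedMean(c(0.1, 0.4, 0.35, 0.8, 0.6))

# Two values: the aggregate is refused with an error.
res = try(guardedMean(c(0.1, 0.4)), silent = TRUE)
inherits(res, "try-error")
```

The idea is that an aggregate computed from very few observations could reveal individual values, so the server refuses to return it.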

## Installation

At the moment, there is no CRAN version available. Install the development version from GitHub:

```{r,eval=FALSE}
remotes::install_github("difuture-lmu/dsBinVal")
```

#### Register methods

It is necessary to register the assign and aggregate methods in the OPAL administration. These methods are registered automatically when publishing the package on OPAL (see [`DESCRIPTION`](https://github.com/difuture-lmu/dsBinVal/blob/main/DESCRIPTION)).

Note that the package needs to be installed at both locations, the server and the analyst's machine.

## Installation on DataSHIELD

There are two options. The first is to install the package through the Opal administration interface:

- Log into Opal and switch to the `Administration/DataSHIELD/` tab
- Click the `Add DataSHIELD package` button
- Select `GitHub` as source, and use `difuture-lmu` as user, `dsBinVal` as name, and `main` as Git reference.

The second option is to use the `opalr` package to install `dsBinVal` directly from `R`:
```{r, eval=FALSE}
### User credentials (here from the opal test server):
surl     = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"

### Install package and publish methods:
opal = opalr::opal.login(username = username, password = password, url = surl)

opalr::dsadmin.install_github_package(opal = opal, pkg = "dsBinVal", username = "difuture-lmu", ref = "main")
opalr::dsadmin.publish_package(opal = opal, pkg = "dsBinVal")

opalr::opal.logout(opal)
```

## Usage

A more sophisticated example is available [here](https://github.com/difuture-lmu/datashield-roc-glm-demo).

```{r}
library(dsBinVal)
```

#### Log into DataSHIELD server

```{r}
builder = newDSLoginBuilder()

surl     = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"

builder$append(
  server   = "ds1",
  url      = surl,
  user     = username,
  password = password,
  table    = "CNSIM.CNSIM1"
)
builder$append(
  server   = "ds2",
  url      = surl,
  user     = username,
  password = password,
  table    = "CNSIM.CNSIM2"
)
builder$append(
  server   = "ds3",
  url      = surl,
  user     = username,
  password = password,
  table    = "CNSIM.CNSIM3"
)

connections = datashield.login(logins = builder$build(), assign = TRUE)
```

#### Load test model, push to DataSHIELD, and calculate predictions

```{r}
# Load the model fitted locally on CNSIM:
load(here::here("Readme_files/mod.rda"))
# Model was calculated by:
#> glm(DIS_DIAB ~ ., data = CNSIM, family = binomial())

# Push the model to the DataSHIELD servers:
pushObject(connections, mod)

# Create a clean data set without NAs:
ds.completeCases("D", newobj = "D_complete")

# Calculate scores and save at the servers:
pfun =  "predict(mod, newdata = D, type = 'response')"
predictModel(connections, mod, "pred", "D_complete", predict_fun = pfun)

datashield.symbols(connections)
```

#### Calculate l2-sensitivity

```{r}
# In order to securely calculate the ROC-GLM, we have to assess the
# l2-sensitivity to set the privacy parameters of differential
# privacy adequately:
l2s = dsL2Sens(connections, "D_complete", "pred")
l2s

# Based on the results presented in https://arxiv.org/abs/2203.10828, we set the privacy parameters to
# - epsilon = 0.2, delta = 0.1 if        l2s <= 0.01
# - epsilon = 0.3, delta = 0.4 if 0.01 < l2s <= 0.03
# - epsilon = 0.5, delta = 0.3 if 0.03 < l2s <= 0.05
# - epsilon = 0.5, delta = 0.5 if 0.05 < l2s <= 0.07
# - epsilon = 0.5, delta = 0.5 if 0.07 < l2s, BUT the results may be poor!
```
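
The lookup described in the comments above can be written as a small helper. This is a sketch only: `choosePrivacyParams` and its `reliable` flag are hypothetical names that are not part of the package, and the thresholds are copied from the comments.

```r
# Map an l2-sensitivity value to the privacy parameters listed above.
# `reliable = FALSE` marks the last case, where results may be poor.
choosePrivacyParams = function(l2s) {
  stopifnot(is.numeric(l2s), length(l2s) == 1L, !is.na(l2s), l2s >= 0)
  if (l2s <= 0.01) return(list(epsilon = 0.2, delta = 0.1, reliable = TRUE))
  if (l2s <= 0.03) return(list(epsilon = 0.3, delta = 0.4, reliable = TRUE))
  if (l2s <= 0.05) return(list(epsilon = 0.5, delta = 0.3, reliable = TRUE))
  if (l2s <= 0.07) return(list(epsilon = 0.5, delta = 0.5, reliable = TRUE))
  list(epsilon = 0.5, delta = 0.5, reliable = FALSE)
}

# Example with a made-up sensitivity value:
str(choosePrivacyParams(0.02))
```

Whether and how these values are passed on is best checked in `?dsROCGLM`; the helper only encodes the table.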

#### Calculate ROC-GLM

```{r}
# The response must be encoded as integer/numeric vector:
ds.asInteger("D_complete$DIS_DIAB", "truth")
roc_glm = dsROCGLM(connections, truth_name = "truth", pred_name = "pred",
  dat_name = "D_complete", seed_object = "pred")
roc_glm

plot(roc_glm)
```
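
For intuition about what the ROC-GLM estimates, the following base R sketch fits it on pooled, simulated data. It is not the package's distributed algorithm and applies no differential privacy. The function names, the simulated data, and the threshold grid are all assumptions made for illustration.

```r
# Empirical AUC via the Mann-Whitney statistic (midranks handle ties).
empiricalAUC = function(truth, pred) {
  stopifnot(length(truth) == length(pred), all(truth %in% c(0, 1)))
  r  = rank(pred)
  n1 = sum(truth == 1)
  n0 = sum(truth == 0)
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# ROC-GLM on pooled data: ROC(t) = pnorm(a + b * qnorm(t)), fitted by a
# probit regression of the indicators 1[placement value <= t] on qnorm(t).
localROCGLM = function(truth, pred, thresholds = seq(0.05, 0.95, by = 0.05)) {
  # Placement value: share of negatives scoring above each positive score.
  pv = 1 - ecdf(pred[truth == 0])(pred[truth == 1])
  d = expand.grid(pv = pv, t = thresholds)
  d$y = as.integer(d$pv <= d$t)
  d$x = qnorm(d$t)
  fit = glm(y ~ x, family = binomial(link = "probit"), data = d)
  a = unname(coef(fit)[1])
  b = unname(coef(fit)[2])
  list(a = a, b = b, auc = pnorm(a / sqrt(1 + b^2)))
}

# Simulated scores: positives are shifted upwards on the latent scale.
set.seed(31415)
n = 400
truth = rbinom(n, 1, 0.5)
pred = plogis(rnorm(n, mean = truth))

localROCGLM(truth, pred)$auc
empiricalAUC(truth, pred)
```

On pooled data the two AUC values should be close, which makes the empirical AUC a convenient sanity check when all data can be accessed in one place.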

#### Assess calibration

```{r}
dsBrierScore(connections, "truth", "pred")

### Calculate and plot calibration curve:
cc = dsCalibrationCurve(connections, "truth", "pred")
cc

plot(cc)
```
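
The Brier score can be pooled across servers from simple aggregates, as the following base R sketch shows. It uses made-up toy data, is not the package's implementation, and ignores the privacy level (the toy servers hold fewer than 5 values). The binning of the calibration curve is likewise only one possible choice.

```r
# Three toy "servers", each holding its own truth and prediction vectors.
servers = list(
  ds1 = list(truth = c(0, 1, 1, 0), pred = c(0.2, 0.7, 0.9, 0.4)),
  ds2 = list(truth = c(1, 0),       pred = c(0.6, 0.1)),
  ds3 = list(truth = c(0, 1, 0),    pred = c(0.3, 0.8, 0.5))
)

# Each server shares only two numbers: sum of squared errors and count.
agg = lapply(servers, function(s) {
  c(sse = sum((s$truth - s$pred)^2), n = length(s$truth))
})
total = Reduce(`+`, agg)
brier_pooled = unname(total["sse"] / total["n"])

# Reference: the same score computed on the stacked data.
all_truth = unlist(lapply(servers, `[[`, "truth"), use.names = FALSE)
all_pred  = unlist(lapply(servers, `[[`, "pred"), use.names = FALSE)
brier_direct = mean((all_truth - all_pred)^2)
all.equal(brier_pooled, brier_direct)

# Calibration curve: mean prediction vs. observed frequency per bin.
bin = cut(all_pred, breaks = seq(0, 1, by = 0.25), include.lowest = TRUE)
calibration = data.frame(
  bin       = levels(bin),
  mean_pred = as.numeric(tapply(all_pred, bin, mean)),
  frac_pos  = as.numeric(tapply(all_truth, bin, mean)),
  n         = as.integer(table(bin))
)
calibration
```

A well-calibrated model has `mean_pred` close to `frac_pos` in every bin; the Brier score condenses the same comparison into one number.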

## Deploy information:

__Built by `r Sys.info()[["login"]]` (`r Sys.info()[["sysname"]]`) on `r as.character(Sys.time())`.__

This readme is built automatically after each push to the repository and weekly on Monday. The automatic build installs the package on the DataSHIELD test server and therefore also tests whether the package works on DataSHIELD servers. Additionally, the functionality is tested using [GH Actions](https://github.com/difuture-lmu/dsBinVal/actions/workflows/R-CMD-check.yaml) with [`tests/testthat/test_on_active_server.R`](https://github.com/difuture-lmu/dsBinVal/blob/main/tests/testthat/test_on_active_server.R). The system information of the local and remote machines is:


```{r, include=FALSE}
ri_l  = sessionInfo()
ri_ds = datashield.aggregate(connections, quote(getDataSHIELDInfo()))
client_pkgs = c("DSI", "DSOpal", "dsBaseClient", "dsBinVal")
remote_pkgs = c("dsBase", "resourcer", "dsBinVal")
```

- Local machine:
    - `R` version: `r ri_l$R.version$version.string`
    - Version of DataSHIELD client packages:


```{r, echo=FALSE}
dfv = installed.packages()[client_pkgs, ]
dfv = data.frame(Package = rownames(dfv), Version = unname(dfv[, "Version"]))
knitr::kable(dfv)
```

- Remote DataSHIELD machines:
    - OPAL version of the test instance: `r opal_version`
    - `R` version of `r names(ri_ds)[1]`: `r ri_ds[[1]]$session$R.version$version.string`
    - `R` version of `r names(ri_ds)[2]`: `r ri_ds[[2]]$session$R.version$version.string`
    - `R` version of `r names(ri_ds)[3]`: `r ri_ds[[3]]$session$R.version$version.string`
    - Version of server packages:


```{r, echo=FALSE}
dfv = do.call(cbind, lapply(names(ri_ds), function(nm) {
  out = ri_ds[[nm]]$pcks[remote_pkgs, "Version", drop = FALSE]
  colnames(out) = paste0(nm, ": ", colnames(out))
  as.data.frame(out)
}))
dfv = cbind(Package = rownames(dfv), dfv)
rownames(dfv) = NULL
knitr::kable(dfv)
```

```{r, include=FALSE}
datashield.logout(connections)
```

Owner

  • Name: difuture-lmu
  • Login: difuture-lmu
  • Kind: organization

JOSS Publication

dsBinVal: Conducting distributed ROC analysis using DataSHIELD
Published
February 21, 2023
Volume 8, Issue 82, Page 4545
Authors
Daniel Schalk ORCID
Department of Statistics, LMU Munich, Munich, Germany, DIFUTURE (DataIntegration for Future Medicine, www.difuture.de), LMU Munich, Munich, Germany, Munich Center for Machine Learning, Munich, Germany
Verena Sophia Hoffmann
Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany, DIFUTURE (DataIntegration for Future Medicine, www.difuture.de), LMU Munich, Munich, Germany
Bernd Bischl
Department of Statistics, LMU Munich, Munich, Germany, Munich Center for Machine Learning, Munich, Germany
Ulrich Mansmann
Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany, DIFUTURE (DataIntegration for Future Medicine, www.difuture.de), LMU Munich, Munich, Germany
Editor
Charlotte Soneson ORCID
Tags
DataSHIELD distributed computing distributed analysis privacy-preserving diagnostic tests prognostic model model validation ROC-GLM discrimination calibration Brier score

Committers


All Time
  • Total Commits: 339
  • Total Committers: 134
  • Avg Commits per committer: 2.53
  • Development Distribution Score (DDS): 0.442
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
schalkdaniel d****k@t****e 189
Daniel Schalk s****2@g****m 18
runner r****r@M****l 1 (28 separate entries with this masked address, 1 commit each)
and 104 more...

Issues and Pull Requests


All Time
  • Total issues: 13
  • Total pull requests: 6
  • Average time to close issues: 27 days
  • Average time to close pull requests: 22 days
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 0.92
  • Average comments per pull request: 0.17
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • AnthonyOfSeattle (7)
  • schalkdaniel (6)
Pull Request Authors
  • schalkdaniel (5)
  • csoneson (1)
Top Labels
Issue Labels
enhancement (2) important (1)
Pull Request Labels

Dependencies

DESCRIPTION cran
  • R >= 3.1.0 depends
  • DSI * imports
  • checkmate * imports
  • digest * imports
  • stringr * imports
  • DSOpal * suggests
  • ggplot2 * suggests
  • opalr * suggests
  • testthat * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • actions/upload-artifact main composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/draft-pdf.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
.github/workflows/lint.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/render-readme.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite