dsBinVal

dsBinVal: Conducting distributed ROC analysis using DataSHIELD - Published in JOSS (2023)

https://github.com/difuture-lmu/dsbinval

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: Links to arxiv.org, joss.theoj.org
○ Committers with academic emails
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords

auc calibration datashield distributed-computing roc

Scientific Fields

Earth and Environmental Sciences Physical Sciences - 40% confidence

Last synced: 6 months ago · JSON representation

Repository

ROC-GLM and calibration analysis for DataSHIELD

Basic Info

Host: GitHub
Owner: difuture-lmu
License: lgpl-3.0
Language: R
Default Branch: main
Homepage:
Size: 1.85 MB

Statistics

Stars: 0
Watchers: 0
Forks: 1
Open Issues: 3
Releases: 2

Topics

auc calibration datashield distributed-computing roc

Created almost 4 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog Contributing License Code of conduct

README.Rmd

---
output: github_document
---



```{r, include=FALSE}
options(width = 80)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "Readme_files/"
)

## Helper to determine OPAL version:
v_major  = seq(4L, 10L)
v_minor1 = seq_len(10L) - 1
v_minor2 = seq_len(10L) - 1

versions = expand.grid(v_major, v_minor1, v_minor2)
od = order(versions[, 1], versions[, 2], versions[, 3])
versions = versions[od, ]
vstrings = paste(versions[, 1], versions[, 2], versions[, 3], sep = ".")

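getOPALVersion = function(opal, versions) {
  # Walk through the sorted version strings until one matches the server version.
  k = 1
  ov = opalr::opal.version_compare(opal, versions[k])
  while (ov != 0) {
    if (ov < 0) stop("Version is smaller than the smallest one from vector.")
    k = k + 1
    # Guard against running past the end of the candidate vector:
    if (k > length(versions)) stop("Version is larger than the largest one from vector.")
    ov = opalr::opal.version_compare(opal, versions[k])
  }
  return(versions[k])
}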

pkgs = c("here", "opalr", "DSI", "DSOpal", "dsBaseClient")
for (pkg in pkgs) {
  if (! requireNamespace(pkg, quietly = TRUE))
    install.packages(pkg, repos = c(getOption("repos"), "https://cran.obiba.org"))
}
devtools::install(quiet = TRUE, upgrade = "always")
library(DSI)
library(DSOpal)
library(dsBaseClient)

## Install packages on the DataSHIELD test machine:
surl     = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"

opal = opalr::opal.login(username = username, password = password, url = surl)
opal_version = getOPALVersion(opal, vstrings)

check1 = opalr::dsadmin.install_github_package(opal = opal, pkg = "dsBinVal", username = "difuture-lmu", ref = "main")
if (! check1)
  stop("[", Sys.time(), "] Was not able to install dsBinVal!")

check2 = opalr::dsadmin.publish_package(opal = opal, pkg = "dsBinVal")
if (! check2)
  stop("[", Sys.time(), "] Was not able to publish methods of dsBinVal!")

opalr::opal.logout(opal)

# Build model for the example, therefore download the CNSIM data sets from:
# https://github.com/datashield/DSLite/tree/master/data

if (FALSE) {
  dpath = "~/Downloads"
  dnames = paste0(dpath, "/CNSIM", seq_len(3), ".rda")
  dLoader = function(n) {
    load(n)
    dn = ls()
    dn = dn[grep("CNSIM", dn)]
    return(get(dn))
  }
  CNSIM = na.omit(do.call(rbind, lapply(dnames, dLoader)))
  mod = glm(DIS_DIAB ~ ., data = CNSIM, family = binomial())
  save(mod, file = here::here("Readme_files/mod.rda"))
}

```
[![R-CMD-check](https://github.com/difuture-lmu/dsBinVal/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/difuture-lmu/dsBinVal/actions/workflows/R-CMD-check.yaml) [![License: LGPL v3](https://img.shields.io/badge/License-LGPL%20v3-blue.svg)](https://www.gnu.org/licenses/lgpl-3.0) [![codecov](https://codecov.io/gh/difuture-lmu/dsBinVal/branch/main/graph/badge.svg?token=E8AZRM6XJX)](https://codecov.io/gh/difuture-lmu/dsBinVal) [![DOI](https://joss.theoj.org/papers/10.21105/joss.04545/status.svg)](https://doi.org/10.21105/joss.04545)

# ROC-GLM and Calibration for DataSHIELD


The package provides functionality to conduct and visualize ROC analysis and calibration on decentralized data. The basis is the [DataSHIELD](https://www.datashield.org/) infrastructure for distributed computing. This package provides the calculation of the [**ROC-GLM**](https://www.jstor.org/stable/2676973?seq=1) with [**AUC confidence intervals**](https://www.jstor.org/stable/2531595?seq=1) as well as [**calibration curves**](https://www.geeksforgeeks.org/calibration-curves/) and the [**Brier score**](https://en.wikipedia.org/wiki/Brier_score). Calculating the ROC-GLM or assessing calibration requires pushing models to the servers and computing predictions there; this package provides both steps. Note that, from DataSHIELD v5 onwards, DataSHIELD uses [privacy filters](https://data2knowledge.atlassian.net/wiki/spaces/DSDEV/pages/714768398/Disclosure+control) that are also applied in this package. Additionally, this package uses the old option `datashield.privacyLevel` (the minimum number of values required before an aggregation may be shared) as a fallback. Instead of setting the option, the fallback privacy level is read directly from the [`DESCRIPTION`](https://github.com/difuture-lmu/dsBinVal/blob/master/DESCRIPTION) file each time a function needs it. This value is set to 5 by default. The methodology of the package is explained in detail [here](https://arxiv.org/abs/2203.10828).

## Installation

At the moment, there is no CRAN version available. Install the development version from GitHub:

```{r,eval=FALSE}
remotes::install_github("difuture-lmu/dsBinVal")
```

#### Register methods

It is necessary to register the assign and aggregate methods in the OPAL administration. These methods are registered automatically when publishing the package on OPAL (see [`DESCRIPTION`](https://github.com/difuture-lmu/dsBinVal/blob/main/DESCRIPTION)).

Note that the package needs to be installed in both locations: on the server and on the analyst's machine.

## Installation on DataSHIELD

There are two options. The first is to use the Opal administration interface:

- Log into Opal and switch to the `Administration/DataSHIELD/` tab
- Click the `Add DataSHIELD package` button
- Select `GitHub` as source, and use `difuture-lmu` as user, `dsBinVal` as name, and `main` as Git reference.

The second option is to use the `opalr` package to install `dsBinVal` directly from `R`:
```{r, eval=FALSE}
### User credentials (here from the opal test server):
surl     = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"

### Install package and publish methods:
opal = opalr::opal.login(username = username, password = password, url = surl)

opalr::dsadmin.install_github_package(opal = opal, pkg = "dsBinVal", username = "difuture-lmu", ref = "main")
opalr::dsadmin.publish_package(opal = opal, pkg = "dsBinVal")

opalr::opal.logout(opal)
```

## Usage

A more sophisticated example is available [here](https://github.com/difuture-lmu/datashield-roc-glm-demo).

```{r}
library(dsBinVal)
```

#### Log into DataSHIELD server

```{r}
builder = newDSLoginBuilder()

surl     = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"

builder$append(
  server   = "ds1",
  url      = surl,
  user     = username,
  password = password,
  table    = "CNSIM.CNSIM1"
)
builder$append(
  server   = "ds2",
  url      = surl,
  user     = username,
  password = password,
  table    = "CNSIM.CNSIM2"
)
builder$append(
  server   = "ds3",
  url      = surl,
  user     = username,
  password = password,
  table    = "CNSIM.CNSIM3"
)

connections = datashield.login(logins = builder$build(), assign = TRUE)
```

#### Load test model, push to DataSHIELD, and calculate predictions

```{r}
# Load the model fitted locally on CNSIM:
load(here::here("Readme_files/mod.rda"))
# Model was calculated by:
#> glm(DIS_DIAB ~ ., data = CNSIM, family = binomial())

# Push the model to the DataSHIELD servers:
pushObject(connections, mod)

# Create a clean data set without NAs:
ds.completeCases("D", newobj = "D_complete")

# Calculate scores and save at the servers:
pfun =  "predict(mod, newdata = D, type = 'response')"
predictModel(connections, mod, "pred", "D_complete", predict_fun = pfun)

datashield.symbols(connections)
```

#### Calculate l2-sensitivity

```{r}
# In order to securely calculate the ROC-GLM, we have to assess the
# l2-sensitivity to set the privacy parameters of differential
# privacy adequately:
l2s = dsL2Sens(connections, "D_complete", "pred")
l2s

# Based on the results presented in https://arxiv.org/abs/2203.10828, we set the privacy parameters to
# - epsilon = 0.2, delta = 0.1 if        l2s <= 0.01
# - epsilon = 0.3, delta = 0.4 if 0.01 < l2s <= 0.03
# - epsilon = 0.5, delta = 0.3 if 0.03 < l2s <= 0.05
# - epsilon = 0.5, delta = 0.5 if 0.05 < l2s <= 0.07
# - epsilon = 0.5, delta = 0.5 if 0.07 < l2s, BUT the results may not be reliable!
```
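
The thresholds listed in the comments above can be written as a small lookup function. This is a minimal sketch, not part of `dsBinVal`: the helper name `choosePrivacyParams` and the `reliable` flag are hypothetical and only encode the table from the chunk above.

```r
# Hypothetical helper (not part of dsBinVal): map an l2-sensitivity value to
# the privacy parameters listed in the comment table of the README.
choosePrivacyParams = function(l2s) {
  stopifnot(is.numeric(l2s), length(l2s) == 1L, !is.na(l2s), l2s >= 0)
  if (l2s <= 0.01) return(list(epsilon = 0.2, delta = 0.1, reliable = TRUE))
  if (l2s <= 0.03) return(list(epsilon = 0.3, delta = 0.4, reliable = TRUE))
  if (l2s <= 0.05) return(list(epsilon = 0.5, delta = 0.3, reliable = TRUE))
  if (l2s <= 0.07) return(list(epsilon = 0.5, delta = 0.5, reliable = TRUE))
  # Above 0.07 the same parameters are suggested, but results may be poor.
  # The `reliable` flag is an illustrative addition to signal that case.
  list(epsilon = 0.5, delta = 0.5, reliable = FALSE)
}

params = choosePrivacyParams(0.02)
str(params)
```

The returned values could then be passed on to the ROC-GLM call, assuming it accepts privacy parameters under these names.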

#### Calculate ROC-GLM

```{r}
# The response must be encoded as integer/numeric vector:
ds.asInteger("D_complete$DIS_DIAB", "truth")
roc_glm = dsROCGLM(connections, truth_name = "truth", pred_name = "pred",
  dat_name = "D_complete", seed_object = "pred")
roc_glm

plot(roc_glm)
```
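
To give an intuition for what the fitted ROC-GLM represents, the following sketch evaluates a binormal ROC curve, assuming the probit form described in the linked methodology. The function names and the values of `a` and `b` are illustrative assumptions, not estimates from the CNSIM example or functions of `dsBinVal`.

```r
# Binormal ROC curve implied by a probit ROC-GLM with intercept a and slope b:
# ROC(t) = pnorm(a + b * qnorm(t)), with AUC = pnorm(a / sqrt(1 + b^2)).
binormalROC = function(fpr, a, b) pnorm(a + b * qnorm(fpr))
binormalAUC = function(a, b) pnorm(a / sqrt(1 + b^2))

# Illustrative parameter values (assumptions, not fitted estimates):
a = 1.2
b = 0.9

fpr = seq(0, 1, length.out = 101)
tpr = binormalROC(fpr, a, b)
auc = binormalAUC(a, b)

# Cross-check the closed-form AUC against numerical integration of the curve:
auc_num = integrate(binormalROC, lower = 0, upper = 1, a = a, b = b)$value
all.equal(auc, auc_num, tolerance = 1e-3)

plot(fpr, tpr, type = "l", xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)
```

With `a = 0` and `b = 1` the curve coincides with the diagonal, which corresponds to an AUC of 0.5.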

#### Assess calibration

```{r}
dsBrierScore(connections, "truth", "pred")

### Calculate and plot calibration curve:
cc = dsCalibrationCurve(connections, "truth", "pred")
cc

plot(cc)
```
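
The Brier score is the mean squared difference between predicted probabilities and the 0/1 outcome, so it can be pooled from per-site aggregates without sharing individual values. The sketch below illustrates this idea on simulated data; it is not necessarily how `dsBrierScore` is implemented, and all object names are hypothetical.

```r
set.seed(1)

# Simulated sites (illustrative data, not CNSIM):
sites = lapply(c(40, 60, 80), function(n) {
  pred  = runif(n)
  truth = rbinom(n, size = 1, prob = pred)
  data.frame(truth = truth, pred = pred)
})

# Each site shares only two aggregates: the sum of squared errors and n.
site_aggs = lapply(sites, function(d) {
  list(sse = sum((d$pred - d$truth)^2), n = nrow(d))
})

sse_total = sum(sapply(site_aggs, function(a) a$sse))
n_total   = sum(sapply(site_aggs, function(a) a$n))
brier_pooled = sse_total / n_total

# Reference: Brier score computed on the stacked data.
all_data  = do.call(rbind, sites)
brier_ref = mean((all_data$pred - all_data$truth)^2)

all.equal(brier_pooled, brier_ref)
```

Pooling sums rather than per-site means keeps the result identical to the centralized score even when the sites differ in size.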

## Deploy information:

__Built by `r Sys.info()[["login"]]` (`r Sys.info()[["sysname"]]`) on `r as.character(Sys.time())`.__

This README is rebuilt automatically after each push to the repository and weekly on Mondays. The automatic build installs the package on the DataSHIELD test server and therefore also tests whether the package works on DataSHIELD servers. Additionally, the functionality is tested using [GH Actions](https://github.com/difuture-lmu/dsBinVal/actions/workflows/R-CMD-check.yaml) with [`tests/testthat/test_on_active_server.R`](https://github.com/difuture-lmu/dsBinVal/blob/main/tests/testthat/test_on_active_server.R). The system information of the local and remote machines is:


```{r, include=FALSE}
ri_l  = sessionInfo()
ri_ds = datashield.aggregate(connections, quote(getDataSHIELDInfo()))
client_pkgs = c("DSI", "DSOpal", "dsBaseClient", "dsBinVal")
remote_pkgs = c("dsBase", "resourcer", "dsBinVal")
```

- Local machine:
    - `R` version: `r ri_l$R.version$version.string`
    - Version of DataSHIELD client packages:


```{r, echo=FALSE}
dfv = installed.packages()[client_pkgs, ]
dfv = data.frame(Package = rownames(dfv), Version = unname(dfv[, "Version"]))
knitr::kable(dfv)
```

- Remote DataSHIELD machines:
    - OPAL version of the test instance: `r opal_version`
    - `R` version of `r names(ri_ds)[1]`: `r ri_ds[[1]]$session$R.version$version.string`
    - `R` version of `r names(ri_ds)[2]`: `r ri_ds[[2]]$session$R.version$version.string`
    - Version of server packages:


```{r, echo=FALSE}
dfv = do.call(cbind, lapply(names(ri_ds), function(nm) {
  out = ri_ds[[nm]]$pcks[remote_pkgs, "Version", drop = FALSE]
  colnames(out) = paste0(nm, ": ", colnames(out))
  as.data.frame(out)
}))
dfv = cbind(Package = rownames(dfv), dfv)
rownames(dfv) = NULL
knitr::kable(dfv)
```

```{r, include=FALSE}
datashield.logout(connections)
```

Owner

Name: difuture-lmu
Login: difuture-lmu
Kind: organization

Repositories: 3
Profile: https://github.com/difuture-lmu

JOSS Publication

dsBinVal: Conducting distributed ROC analysis using DataSHIELD

Published

February 21, 2023

DOI

10.21105/joss.04545

Volume 8, Issue 82, Page 4545

Authors

Daniel Schalk

Department of Statistics, LMU Munich, Munich, Germany, DIFUTURE (DataIntegration for Future Medicine, www.difuture.de), LMU Munich, Munich, Germany, Munich Center for Machine Learning, Munich, Germany

Verena Sophia Hoffmann
Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany, DIFUTURE (DataIntegration for Future Medicine, www.difuture.de), LMU Munich, Munich, Germany

Bernd Bischl
Department of Statistics, LMU Munich, Munich, Germany, Munich Center for Machine Learning, Munich, Germany

Ulrich Mansmann
Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany, DIFUTURE (DataIntegration for Future Medicine, www.difuture.de), LMU Munich, Munich, Germany

Editor

Charlotte Soneson

Committers

Last synced: 7 months ago

All Time

Total Commits: 339
Total Committers: 134
Avg Commits per committer: 2.53
Development Distribution Score (DDS): 0.442

Past Year

Commits: 1
Committers: 1
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
schalkdaniel	d**k@t**e	189
Daniel Schalk	s**2@g**m	18
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
and 104 more...

Committer Domains (Top 20 + Academic)

t-online.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 13
Total pull requests: 6
Average time to close issues: 27 days
Average time to close pull requests: 22 days
Total issue authors: 2
Total pull request authors: 2
Average comments per issue: 0.92
Average comments per pull request: 0.17
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

AnthonyOfSeattle (7)
schalkdaniel (6)

Pull Request Authors

schalkdaniel (5)
csoneson (1)

Top Labels

Issue Labels

enhancement (2) important (1)

Pull Request Labels

Dependencies

DESCRIPTION cran

R >= 3.1.0 depends
DSI * imports
checkmate * imports
digest * imports
stringr * imports
DSOpal * suggests
ggplot2 * suggests
opalr * suggests
testthat * suggests

.github/workflows/R-CMD-check.yaml actions

actions/checkout v2 composite
actions/upload-artifact main composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/draft-pdf.yml actions

actions/checkout v2 composite
actions/upload-artifact v1 composite
openjournals/openjournals-draft-action master composite

.github/workflows/lint.yaml actions

actions/cache v2 composite
actions/checkout v2 composite
r-lib/actions/setup-r v2 composite

.github/workflows/render-readme.yaml actions

actions/checkout v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite

.github/workflows/test-coverage.yaml actions

actions/cache v2 composite
actions/checkout v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
