dsBinVal
dsBinVal: Conducting distributed ROC analysis using DataSHIELD - Published in JOSS (2023)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: arxiv.org, joss.theoj.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Keywords
auc
calibration
datashield
distributed-computing
roc
Scientific Fields
Earth and Environmental Sciences
Physical Sciences -
40% confidence
Repository
ROC-GLM and calibration analysis for DataSHIELD
Basic Info
Statistics
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 3
- Releases: 2
Created over 3 years ago
· Last pushed over 1 year ago
Metadata Files
Readme
Changelog
Contributing
License
Code of conduct
README.Rmd
---
output: github_document
---
```{r, include=FALSE}
options(width = 80)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "Readme_files/"
)
## Helper to determine OPAL version:
v_major = seq(4L, 10L)
v_minor1 = seq_len(10L) - 1
v_minor2 = seq_len(10L) - 1
versions = expand.grid(v_major, v_minor1, v_minor2)
od = order(versions[, 1], versions[, 2], versions[, 3])
versions = versions[od, ]
vstrings = paste(versions[, 1], versions[, 2], versions[, 3], sep = ".")
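getOPALVersion = function(opal, versions) {
  # Walk through the sorted candidate versions until one matches the server:
  k = 1
  ov = opalr::opal.version_compare(opal, versions[k])
  while (ov != 0) {
    if (ov < 0) stop("Version is smaller than the smallest one from vector.")
    k = k + 1
    # Guard against running past the end of the candidate vector:
    if (k > length(versions)) stop("Version is larger than the largest one from vector.")
    ov = opalr::opal.version_compare(opal, versions[k])
  }
  return(versions[k])
}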
pkgs = c("here", "opalr", "DSI", "DSOpal", "dsBaseClient")
for (pkg in pkgs) {
  if (! requireNamespace(pkg, quietly = TRUE))
    install.packages(pkg, repos = c(getOption("repos"), "https://cran.obiba.org"))
}
devtools::install(quiet = TRUE, upgrade = "always")
library(DSI)
library(DSOpal)
library(dsBaseClient)
## Install packages on the DataSHIELD test machine:
surl = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"
opal = opalr::opal.login(username = username, password = password, url = surl)
opal_version = getOPALVersion(opal, vstrings)
check1 = opalr::dsadmin.install_github_package(opal = opal, pkg = "dsBinVal", username = "difuture-lmu", ref = "main")
if (! check1)
  stop("[", Sys.time(), "] Was not able to install dsBinVal!")
check2 = opalr::dsadmin.publish_package(opal = opal, pkg = "dsBinVal")
if (! check2)
  stop("[", Sys.time(), "] Was not able to publish methods of dsBinVal!")
opalr::opal.logout(opal)
# Build model for the example, therefore download the CNSIM data sets from:
# https://github.com/datashield/DSLite/tree/master/data
if (FALSE) {
  dpath = "~/Downloads"
  dnames = paste0(dpath, "/CNSIM", seq_len(3), ".rda")
  dLoader = function(n) {
    load(n)
    dn = ls()
    dn = dn[grep("CNSIM", dn)]
    return(get(dn))
  }
  CNSIM = na.omit(do.call(rbind, lapply(dnames, dLoader)))
  mod = glm(DIS_DIAB ~ ., data = CNSIM, family = binomial())
  save(mod, file = here::here("Readme_files/mod.rda"))
}
```
[R-CMD-check](https://github.com/difuture-lmu/dsBinVal/actions/workflows/R-CMD-check.yaml) [License: LGPL v3](https://www.gnu.org/licenses/lgpl-3.0) [codecov](https://codecov.io/gh/difuture-lmu/dsBinVal) [DOI: 10.21105/joss.04545](https://doi.org/10.21105/joss.04545)
# ROC-GLM and Calibration for DataSHIELD
The package provides functionality to conduct and visualize ROC analysis and calibration on decentralized data. It builds on the [DataSHIELD](https://www.datashield.org/) infrastructure for distributed computing. The package calculates the [**ROC-GLM**](https://www.jstor.org/stable/2676973?seq=1) with [**AUC confidence intervals**](https://www.jstor.org/stable/2531595?seq=1) as well as [**calibration curves**](https://www.geeksforgeeks.org/calibration-curves/) and the [**Brier score**](https://en.wikipedia.org/wiki/Brier_score). To calculate the ROC-GLM or assess calibration, models must be pushed to the servers and predictions computed there; the package provides functions for both steps. Note that DataSHIELD uses [privacy filters](https://data2knowledge.atlassian.net/wiki/spaces/DSDEV/pages/714768398/Disclosure+control) from v5 onwards, and this package uses them as well. Additionally, this package uses the old option `datashield.privacyLevel` (the minimal number of values required to allow sharing an aggregation) as a fallback. Instead of setting the option, we retrieve the fallback privacy level directly from the [`DESCRIPTION`](https://github.com/difuture-lmu/dsBinVal/blob/master/DESCRIPTION) file each time a function calls for it. This option is set to 5 by default. The methodology of the package is explained in detail [here](https://arxiv.org/abs/2203.10828).
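To illustrate what a privacy level of 5 means in practice, the following is a minimal sketch in base R. The function `safeMean()` is hypothetical and not part of `dsBinVal` or DataSHIELD; it only mimics the idea that an aggregate is released when at least `privacy_level` values contribute to it.

```r
# Hypothetical helper (an assumption for illustration, not a dsBinVal function):
# refuse to release an aggregate when fewer than `privacy_level` values
# contribute to it.
safeMean = function(x, privacy_level = 5L) {
  x = x[!is.na(x)]
  if (length(x) < privacy_level) {
    stop("Only ", length(x), " values available, but at least ",
      privacy_level, " are required to share an aggregation.")
  }
  mean(x)
}

# Six values: the mean is returned.
safeMean(c(0.1, 0.4, 0.3, 0.8, 0.9, 0.5))

# Three values: the call is refused with an error.
res = try(safeMean(c(0.1, 0.4, 0.3)), silent = TRUE)
inherits(res, "try-error")
```

The actual server-side checks in DataSHIELD are more involved; the sketch only conveys the threshold idea.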
## Installation
At the moment, there is no CRAN version available. Install the development version from GitHub:
```{r,eval=FALSE}
remotes::install_github("difuture-lmu/dsBinVal")
```
#### Register methods
It is necessary to register the assign and aggregate methods in the OPAL administration. These methods are registered automatically when publishing the package on OPAL (see [`DESCRIPTION`](https://github.com/difuture-lmu/dsBinVal/blob/main/DESCRIPTION)).
Note that the package needs to be installed at both locations: the server and the analyst's machine.
## Installation on DataSHIELD
There are two options. The first is to use the Opal administration interface:
- Log into Opal and switch to the `Administration/DataSHIELD/` tab
- Click the `Add DataSHIELD package` button
- Select `GitHub` as source, and use `difuture-lmu` as user, `dsBinVal` as name, and `main` as Git reference.
The second option is to use the `opalr` package to install `dsBinVal` directly from `R`:
```{r, eval=FALSE}
### User credentials (here from the opal test server):
surl = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"
### Install package and publish methods:
opal = opalr::opal.login(username = username, password = password, url = surl)
opalr::dsadmin.install_github_package(opal = opal, pkg = "dsBinVal", username = "difuture-lmu", ref = "main")
opalr::dsadmin.publish_package(opal = opal, pkg = "dsBinVal")
opalr::opal.logout(opal)
```
## Usage
A more sophisticated example is available [here](https://github.com/difuture-lmu/datashield-roc-glm-demo).
```{r}
library(dsBinVal)
```
#### Log into DataSHIELD server
```{r}
builder = newDSLoginBuilder()
surl = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"
builder$append(
  server = "ds1",
  url = surl,
  user = username,
  password = password,
  table = "CNSIM.CNSIM1"
)
builder$append(
  server = "ds2",
  url = surl,
  user = username,
  password = password,
  table = "CNSIM.CNSIM2"
)
builder$append(
  server = "ds3",
  url = surl,
  user = username,
  password = password,
  table = "CNSIM.CNSIM3"
)
connections = datashield.login(logins = builder$build(), assign = TRUE)
```
#### Load test model, push to DataSHIELD, and calculate predictions
```{r}
# Load the model fitted locally on CNSIM:
load(here::here("Readme_files/mod.rda"))
# Model was calculated by:
#> glm(DIS_DIAB ~ ., data = CNSIM, family = binomial())
# Push the model to the DataSHIELD servers:
pushObject(connections, mod)
# Create a clean data set without NAs:
ds.completeCases("D", newobj = "D_complete")
# Calculate scores and save at the servers:
pfun = "predict(mod, newdata = D, type = 'response')"
predictModel(connections, mod, "pred", "D_complete", predict_fun = pfun)
datashield.symbols(connections)
```
#### Calculate l2-sensitivity
```{r}
# In order to securely calculate the ROC-GLM, we have to assess the
# l2-sensitivity to set the privacy parameters of differential
# privacy adequately:
l2s = dsL2Sens(connections, "D_complete", "pred")
l2s
# Based on the results presented in https://arxiv.org/abs/2203.10828, we set
# the privacy parameters to
# - epsilon = 0.2, delta = 0.1 if l2s <= 0.01
# - epsilon = 0.3, delta = 0.4 if 0.01 < l2s <= 0.03
# - epsilon = 0.5, delta = 0.3 if 0.03 < l2s <= 0.05
# - epsilon = 0.5, delta = 0.5 if 0.05 < l2s <= 0.07
# - epsilon = 0.5, delta = 0.5 if 0.07 < l2s (but the results may be poor!)
```
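The thresholds listed in the comments of the chunk above can be written as a small lookup function. This is a sketch only: `choosePrivacyParameters()` is a hypothetical helper and is not exported by `dsBinVal`; it simply encodes the five cases from the comments.

```r
# Hypothetical helper (not part of dsBinVal): map the l2-sensitivity to the
# (epsilon, delta) pairs listed in the README comments.
choosePrivacyParameters = function(l2s) {
  stopifnot(is.numeric(l2s), length(l2s) == 1L, !is.na(l2s), l2s >= 0)
  if (l2s <= 0.01) return(list(epsilon = 0.2, delta = 0.1))
  if (l2s <= 0.03) return(list(epsilon = 0.3, delta = 0.4))
  if (l2s <= 0.05) return(list(epsilon = 0.5, delta = 0.3))
  if (l2s <= 0.07) return(list(epsilon = 0.5, delta = 0.5))
  # Above 0.07 the same values are used, but the results may be poor:
  warning("l2-sensitivity above 0.07: results may be unreliable.")
  list(epsilon = 0.5, delta = 0.5)
}

# Example with an assumed sensitivity of 0.02:
pp = choosePrivacyParameters(0.02)
pp$epsilon
pp$delta
```

Whether and how these values are passed on to the ROC-GLM call depends on the installed package version, so check the function documentation before relying on it.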
#### Calculate ROC-GLM
```{r}
# The response must be encoded as integer/numeric vector:
ds.asInteger("D_complete$DIS_DIAB", "truth")
roc_glm = dsROCGLM(connections, truth_name = "truth", pred_name = "pred",
  dat_name = "D_complete", seed_object = "pred")
roc_glm
plot(roc_glm)
```
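For intuition about the AUC that the ROC-GLM approximates, here is a minimal base R sketch of the empirical AUC on pooled, non-distributed data. It uses the Mann-Whitney rank statistic, which is a different estimator from the parametric ROC-GLM computed by `dsROCGLM()`. The function name `empiricalAUC()` and the toy data are assumptions for illustration.

```r
# Empirical AUC via the Mann-Whitney rank statistic on pooled data.
# This is NOT the distributed ROC-GLM; it is a local reference computation.
empiricalAUC = function(truth, pred) {
  stopifnot(length(truth) == length(pred), all(truth %in% c(0, 1)))
  n_pos = sum(truth == 1)
  n_neg = sum(truth == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  r = rank(pred)  # average ranks are used for ties
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data (made up for the example):
truth = c(0, 0, 1, 1)
pred = c(0.1, 0.4, 0.35, 0.8)
empiricalAUC(truth, pred)
```

On data that may be pooled, such a local value can serve as a rough plausibility check for the distributed estimate, keeping in mind that the differential privacy noise and the parametric model cause deviations.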
#### Assess calibration
```{r}
dsBrierScore(connections, "truth", "pred")
### Calculate and plot calibration curve:
cc = dsCalibrationCurve(connections, "truth", "pred")
cc
plot(cc)
```
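The two calibration measures can be sketched locally in base R to show what they quantify. The Brier score is the mean squared difference between predicted probability and observed outcome; a calibration curve compares, per bin of predicted probabilities, the mean prediction with the observed event frequency. The helpers `brierScore()` and `calibrationCurve()` below are hypothetical and run on pooled toy data; they are not the distributed implementations behind `dsBrierScore()` and `dsCalibrationCurve()`.

```r
# Local (non-distributed) reference implementations for illustration only.
brierScore = function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  mean((pred - truth)^2)
}

calibrationCurve = function(truth, pred, nbins = 5L) {
  stopifnot(length(truth) == length(pred))
  # Equally wide bins on [0, 1]:
  breaks = seq(0, 1, length.out = nbins + 1L)
  bin = cut(pred, breaks = breaks, include.lowest = TRUE)
  data.frame(
    bin = levels(bin),
    n = as.vector(table(bin)),                      # observations per bin
    mean_pred = as.vector(tapply(pred, bin, mean)), # mean predicted probability
    obs_freq = as.vector(tapply(truth, bin, mean))  # observed event frequency
  )
}

# Toy data (made up for the example):
truth = c(0, 0, 1, 1)
pred = c(0.1, 0.4, 0.35, 0.8)
brierScore(truth, pred)
cc_local = calibrationCurve(truth, pred, nbins = 2L)
cc_local
```

A well calibrated model has `mean_pred` close to `obs_freq` in every bin; empty bins appear with `n = 0` and `NA` entries.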
## Deploy information:
__Built by `r Sys.info()[["login"]]` (`r Sys.info()[["sysname"]]`) on `r as.character(Sys.time())`.__
This readme is built automatically after each push to the repository and weekly on Monday. The automatic build installs the package on the DataSHIELD test server and therefore also tests whether the functionality of the package works on DataSHIELD servers. Additionally, the functionality is tested using [GH Actions](https://github.com/difuture-lmu/dsBinVal/actions/workflows/R-CMD-check.yaml) with [`tests/testthat/test_on_active_server.R`](https://github.com/difuture-lmu/dsBinVal/blob/main/tests/testthat/test_on_active_server.R). The system information of the local and remote machines is:
```{r, include=FALSE}
ri_l = sessionInfo()
ri_ds = datashield.aggregate(connections, quote(getDataSHIELDInfo()))
client_pkgs = c("DSI", "DSOpal", "dsBaseClient", "dsBinVal")
remote_pkgs = c("dsBase", "resourcer", "dsBinVal")
```
- Local machine:
    - `R` version: `r ri_l$R.version$version.string`
    - Version of DataSHIELD client packages:
```{r, echo=FALSE}
dfv = installed.packages()[client_pkgs, ]
dfv = data.frame(Package = rownames(dfv), Version = unname(dfv[, "Version"]))
knitr::kable(dfv)
```
- Remote DataSHIELD machines:
    - OPAL version of the test instance: `r opal_version`
    - `R` version of `r names(ri_ds)[1]`: `r ri_ds[[1]]$session$R.version$version.string`
    - `R` version of `r names(ri_ds)[2]`: `r ri_ds[[2]]$session$R.version$version.string`
    - Version of server packages:
```{r, echo=FALSE}
dfv = do.call(cbind, lapply(names(ri_ds), function(nm) {
  out = ri_ds[[nm]]$pcks[remote_pkgs, "Version", drop = FALSE]
  colnames(out) = paste0(nm, ": ", colnames(out))
  as.data.frame(out)
}))
dfv = cbind(Package = rownames(dfv), dfv)
rownames(dfv) = NULL
knitr::kable(dfv)
```
```{r, include=FALSE}
datashield.logout(connections)
```
Owner
- Name: difuture-lmu
- Login: difuture-lmu
- Kind: organization
- Repositories: 3
- Profile: https://github.com/difuture-lmu
JOSS Publication
dsBinVal: Conducting distributed ROC analysis using DataSHIELD
Published
February 21, 2023
Volume 8, Issue 82, Page 4545
Authors
Daniel Schalk
Department of Statistics, LMU Munich, Munich, Germany, DIFUTURE (DataIntegration for Future Medicine, www.difuture.de), LMU Munich, Munich, Germany, Munich Center for Machine Learning, Munich, Germany
Verena Sophia Hoffmann
Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany, DIFUTURE (DataIntegration for Future Medicine, www.difuture.de), LMU Munich, Munich, Germany
Bernd Bischl
Department of Statistics, LMU Munich, Munich, Germany, Munich Center for Machine Learning, Munich, Germany
Ulrich Mansmann
Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany, DIFUTURE (DataIntegration for Future Medicine, www.difuture.de), LMU Munich, Munich, Germany
Tags
DataSHIELD, distributed computing, distributed analysis, privacy-preserving, diagnostic tests, prognostic model, model validation, ROC-GLM, discrimination, calibration, Brier score
GitHub Events
Total
Last Year
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| schalkdaniel | d****k@t****e | 189 |
| Daniel Schalk | s****2@g****m | 18 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| runner | r****r@M****l | 1 |
| and 104 more... | ||
Committer Domains (Top 20 + Academic)
t-online.de: 1
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 13
- Total pull requests: 6
- Average time to close issues: 27 days
- Average time to close pull requests: 22 days
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 0.92
- Average comments per pull request: 0.17
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- AnthonyOfSeattle (7)
- schalkdaniel (6)
Pull Request Authors
- schalkdaniel (5)
- csoneson (1)
Top Labels
Issue Labels
enhancement (2)
important (1)
Pull Request Labels
Dependencies
DESCRIPTION
cran
- R >= 3.1.0 depends
- DSI * imports
- checkmate * imports
- digest * imports
- stringr * imports
- DSOpal * suggests
- ggplot2 * suggests
- opalr * suggests
- testthat * suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v2 composite
- actions/upload-artifact main composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/draft-pdf.yml
actions
- actions/checkout v2 composite
- actions/upload-artifact v1 composite
- openjournals/openjournals-draft-action master composite
.github/workflows/lint.yaml
actions
- actions/cache v2 composite
- actions/checkout v2 composite
- r-lib/actions/setup-r v2 composite
.github/workflows/render-readme.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/cache v2 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite