secuTrialR

secuTrialR: Seamless interaction with clinical trial databases in R - Published in JOSS (2020)

https://github.com/swissclinicaltrialorganisation/secutrialr

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 6 DOI reference(s) in README and JOSS metadata
✓
Academic publication links
Links to: joss.theoj.org
✓
Committers with academic emails
1 of 16 committers (6.3%) from academic institutions
○
Institutional organization owner
✓
JOSS paper metadata
Published in Journal of Open Source Software

Keywords from Contributors

precision confidence-intervals sample-size-calculation shiny-app

Scientific Fields

Sociology Social Sciences - 64% confidence

Last synced: 6 months ago

Repository

Handling of data from the clinical data management system secuTrial

Basic Info

Host: GitHub
Owner: SwissClinicalTrialOrganisation
License: other
Language: R
Default Branch: master
Homepage: https://swissclinicaltrialorganisation.github.io/secuTrialR/
Size: 8.78 MB

Statistics

Stars: 9
Watchers: 10
Forks: 12
Open Issues: 22
Releases: 10

Created almost 7 years ago · Last pushed 11 months ago

Metadata Files

Readme Changelog License

README.Rmd

---
output:
  md_document:
    variant: markdown_github
  pdf_document: default
  html_document: default
---





# secuTrialR

`r badger::badge_custom("dev version", as.character(packageVersion("secuTrialR")), "blue", "https://github.com/SwissClinicalTrialOrganisation/secuTrialR")` [![](https://www.r-pkg.org/badges/version/secuTrialR?color=green)](https://cran.r-project.org/package=secuTrialR)    [![Actions Status](https://github.com/SwissClinicalTrialOrganisation/secuTrialR/workflows/R-CMD-check/badge.svg)](https://github.com/SwissClinicalTrialOrganisation/secuTrialR/actions) 

An R package to handle data from the clinical data management system (CDMS) [secuTrial](https://www.secutrial.com/en/).

## Installing from GitHub with devtools

Please note that `R versions >= 3.5` should be used to run `secuTrialR`.

```{r, eval = FALSE}
devtools::install_github("SwissClinicalTrialOrganisation/secuTrialR")
```

## Recommended export options

While the package strives to allow loading of as many types of secuTrial data exports
as possible, there are certain export options which are less likely to cause issues.
If possible it is suggested to export data which adheres to a suggested option set.
Thus, we suggest to work with exports which:
- are **zipped**
- are **English**
- have **reference values** stored **in a separate table**
- contain **Add-IDs**, **centre information**, **structure information**,  **form status**, **project setup**
- do **NOT** have the **meta data duplicated** into all tables
- are **UTF-8** encoded
- are **"CSV format"** or **"CSV format for MS Excel"**
- do **NOT** contain form **data of hidden fields**

If you use `read_secuTrial()` to read your export then it will inform you regarding deviations.

We also recommend using short names when exporting your data. Some users have reported issues when importing data with long names that do not occur with short names. This may (or may not) be related to upgrading secuTrial.

## Basic usage

An extensive applied manual/vignette is available
[here](https://github.com/SwissClinicalTrialOrganisation/secuTrialR/blob/master/vignettes/secuTrialR-package-vignette.pdf)
and is probably the best place to get started.

Load the package
```{r, echo = TRUE, warning=FALSE, message=FALSE}
library(secuTrialR)
```
Load a dataset 
```{r}
export_location <- system.file("extdata", "sT_exports", "lnames",
                               "s_export_CSV-xls_CTU05_long_ref_miss_en_utf8.zip",
                               package = "secuTrialR")
ctu05 <- read_secuTrial(export_location)
```
This will load all sheets from the export into an object of class `secuTrialdata`, which is basically a list. It will always contain `export_options` (which are parsed from the HTML ExportOptions file that secuTrial generates). By default, it will also contain all other files in the dataset. secuTrialR automatically strips dates from the file names. The new file names can be seen via `ctu05$export_options$data_names`. The function also adds [labels to variables](#variable-labels) and data.frames, converts [categorical variables to `factor`s](#prepare-factors) and ensures that [dates are `Date`s and date-times are `POSIXct`](#prepare-dates).
`read_secuTrial` is a wrapper for the functions described below, so it is possible to achieve more flexibility by using the individual functions (if necessary).
Individual tables can be extracted from the `ctu05` object via `tab <- ctu05$tab`, where `tab` is the table of interest.
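
As a minimal sketch of the extraction described above, the following loads the bundled example export, lists the cleaned file names and pulls out a single table. The table name `ctu05baseline` is the one used in the factor and date examples of this README; with your own export the names will differ.

```r
library(secuTrialR)

# path to the example export shipped with the package
export_location <- system.file("extdata", "sT_exports", "lnames",
                               "s_export_CSV-xls_CTU05_long_ref_miss_en_utf8.zip",
                               package = "secuTrialR")
ctu05 <- read_secuTrial(export_location)

# file names after the dates have been stripped
ctu05$export_options$data_names

# extract one table; `[[` with the name as a string also works
baseline <- ctu05[["ctu05baseline"]]
dim(baseline)
```

Using `[[` with a string is convenient when the table name is held in a variable, for example when looping over several forms.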

### Wrapped functions


#### Load the dataset
```{r}
# prepare path to example export
export_location <- system.file("extdata", "sT_exports", "BMD",
                               "s_export_CSV-xls_BMD_short_en_utf8.zip",
                               package = "secuTrialR")
# load all export data
bmd_export <- read_secuTrial_raw(data_dir = export_location)

# load a second dataset
export_location <- system.file("extdata", "sT_exports", "lnames",
                               "s_export_CSV-xls_CTU05_long_ref_miss_en_utf8.zip",
                               package = "secuTrialR")
ctu05_raw <- read_secuTrial_raw(export_location)

# View names of the bmd_export object
names(bmd_export)
```

`read_secuTrial_raw` returns an object of class `secuTrialdata`, which is basically a list. It will always contain `export_options` (which are parsed from the HTML ExportOptions file that secuTrial generates). By default, it will also contain all other files in the dataset. secuTrialR automatically strips dates from the file names. The new file names can be seen via `bmd_export$export_options$data_names`.


`bmd_export` is a list, with class `secuTrialdata`. To prevent it from printing all data to the console, a special print method returns some useful information about the objects within `bmd_export` instead. The information returned includes the original file name in the datafile, its name in the `secuTrialdata` object, together with the number of rows and columns and a column indicating whether the object is metadata or not:
```{r}
bmd_export
```

Individual tables can be extracted from the `bmd_export` object via `tab <- bmd_export$tab`, where `tab` is the table of interest.



#### Variable labels
For creating tables, it is often useful to have access to variable labels. secuTrialR supports two main methods for handling them - a named list, or via variable attributes. The list approach works as follows.
```{r}
labs <- labels_secuTrial(bmd_export)
# query the list with the variable name of interest
labs[["age"]]

```

The attribute based approach adds labels as an attribute to a variable, which can then be accessed via `label(var)`.
```{r}
labelled <- label_secuTrial(bmd_export)
label(labelled$bmd$age)
```
Labels can be added to new variables or changed via 
```{r}
label(labelled$bmd$age) <- "Age (years)"
label(labelled$bmd$age)
```
Where units have been defined in the secuTrial database, they can be accessed or changed analogously (here, age had no unit assigned, but we can add one).
```{r}
units(labelled$bmd$age)
units(labelled$bmd$age) <- "years"
units(labelled$bmd$age)
```
There is a drawback to the attribute based approach - labels will not be propagated if variables are derived and may be lost if variables are edited.
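
The drawback can be illustrated with base R alone. This is only a sketch: it assumes the label is stored in an attribute called `"label"`, which is an assumption for illustration and not a statement about how `secuTrialR` stores labels internally.

```r
# Assumption for this sketch: the label lives in an attribute called "label".
age <- c(25, 40, 61)
attr(age, "label") <- "Age (years)"

# deriving a new variable with cut() returns a fresh factor without the label
age_group <- cut(age, breaks = c(0, 50, 120), labels = c("<=50", ">50"))
attr(age_group, "label")

# subsetting a plain vector also drops the attribute
attr(age[age > 30], "label")

# the label has to be re-attached explicitly after deriving
attr(age_group, "label") <- "Age group"
attr(age_group, "label")
```

In practice this means that labels for derived variables need to be set again by hand, or looked up from the named list instead.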

Currently, `label_secuTrial` should be used prior to `dates_secuTrial` or `factorize_secuTrial` so that labels and units are propagated to factor and date variables.


 
#### Prepare factors
It is often useful to have categorical variables as factors (R knows how to handle factors). secuTrialR can prepare factors easily.
```{r, error=TRUE}
factors <- factorize_secuTrial(ctu05_raw)
```
This function loops through each table of the dataset, creating new factor variables where necessary. The new variables have the same name as the original but with `.factor` appended (e.g. a new variable called `sex.factor` would be added to the relevant form).

```{r}
# original variable
str(factors$ctu05baseline$gender)
# factor
str(factors$ctu05baseline$gender.factor)
# cross tabulation
table(original = factors$ctu05baseline$gender, factor = factors$ctu05baseline$gender.factor)
```


#### Prepare dates
Date(time)s are a very common data type. However, they cannot easily be used in their export format. This is also easily rectified in secuTrialR:


```{r}
dates <- dates_secuTrial(ctu05_raw)
```

Date variables are converted to `Date` class, and datetimes are converted to `POSIXct` class. Rather than overwriting the original variable, new variables are added with the new class. This is a safety mechanism in case `NA`s are accidentally created.

```{r}
dates$ctu05baseline[c(1, 7), c("aspirin_start", "aspirin_start.date",
                              "hiv_date", "hiv_date.datetime")]
```

secuTrial exports containing date variables sometimes include incomplete dates, e.g. the day or the month may be missing.
During date conversion (i.e. `dates_secuTrial()`), `secuTrialR` currently creates `NA`s from such incomplete date entries.

Incomplete dates are not approximated to exact dates, since this can lead to false conclusions and biases.
Users are, however, informed about this behaviour with a `warning()`. Subsequent approximation of incomplete dates can be manually performed.
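
A manual approximation could look roughly like the following base R sketch. Everything here is an assumption for illustration: the raw values are hypothetical strings (`YYYYMMDD`, `YYYYMM` or `YYYY`), and filling in the middle of the month or year is only one of several conventions. Check the actual format in your export and the literature below before choosing a rule. The flag column keeps imputed values identifiable.

```r
# Hypothetical raw values: complete, day missing, month and day missing, missing
raw <- c("20190318", "201904", "2018", NA)

# Pad incomplete entries to a full date (mid-month / mid-year convention)
impute_date <- function(x) {
  n <- nchar(x)
  padded <- ifelse(n == 8, x,
            ifelse(n == 6, paste0(x, "15"),
            ifelse(n == 4, paste0(x, "0701"), NA_character_)))
  as.Date(padded, format = "%Y%m%d")
}

imputed <- data.frame(
  raw     = raw,
  date    = impute_date(raw),
  # TRUE where a date was approximated rather than read as recorded
  imputed = !is.na(raw) & nchar(raw) < 8
)
imputed
```

Keeping the flag makes it possible to run sensitivity analyses with and without the approximated dates.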

Recommended literature on incomplete dates/date imputation:\
[Dubois and Hebert 2001](https://www.cambridge.org/core/services/aop-cambridge-core/content/view/F50311F9FFAB56176CDDC9FFBF66F655/S1041610202008025a.pdf/imputation_of_missing_dates_of_death_or_institutionalization_for_timetoevent_analyses_in_the_canadian_study_of_health_and_aging.pdf) \
[Bowman 2006](https://www.lexjansen.com/phuse/2006/po/PO11.pdf) \


#### Recommended approach if not using `read_secuTrial`

```{r, eval=FALSE}
f <- "PATH_TO_FILE"
d <- read_secuTrial_raw(f)
l <- label_secuTrial(d)
fa <- factorize_secuTrial(l)
dat <- dates_secuTrial(fa)

# or, if you like pipes
library(magrittr)
f <- "PATH_TO_FILE"
d <- read_secuTrial_raw(f)
dat <- d %>% 
  label_secuTrial() %>%
  factorize_secuTrial() %>%
  dates_secuTrial()
```



### Exploratory helpers
`secuTrialR` has a couple of functions to help get to grips with a secuTrial data export. They are intended to be used in an exploratory manner only.

#### as.data.frame
Working with a list can be tiresome so `secuTrialR` provides an `as.data.frame` method to save the `data.frames` in the list to an environment of your choice. 
As a demonstration, we'll create a new environment (`env`) and create the `data.frame`s in there. In practice, using `.GlobalEnv` would probably be more useful.

```{r}
env <- new.env()
ls(env)
names(ctu05)
as.data.frame(ctu05, envir = env)
ls(env)
```

There are also options for selecting specific forms (option `data.frames`), changing names based on a named vector (option `data.frames`) or regex (options `regex` and `rep`), and specifying whether metadata objects should be returned (option `meta`).
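
As a hedged sketch of the `data.frames` option, the following writes only one form into a fresh environment. It assumes that passing a single unnamed table name selects that table without renaming it; see `?as.data.frame.secuTrialdata` for the authoritative description of the options.

```r
library(secuTrialR)

export_location <- system.file("extdata", "sT_exports", "lnames",
                               "s_export_CSV-xls_CTU05_long_ref_miss_en_utf8.zip",
                               package = "secuTrialR")
ctu05 <- read_secuTrial(export_location)

# write only the baseline form into a dedicated environment
env_subset <- new.env()
as.data.frame(ctu05, envir = env_subset, data.frames = "ctu05baseline")
ls(env_subset)
```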


#### Recruitment over time
Recruitment is an important cornerstone for every clinical trial. `secuTrialR` allows for straightforward visualization of recruitment
over time for a given export file.

```{r, eval = TRUE}
# show plot
# note that there is no line for Universitätsspital 
# Basel because only one participant is registered for this centre
plot_recruitment(ctu05, cex = 1.5, rm_regex = "\\(.*\\)$")
# return the plot data
plot_recruitment(ctu05, return_data = TRUE)
```

Furthermore, recruitment per year and center can be returned.

```{r, eval = TRUE}
annual_recruitment(ctu05, rm_regex = "\\(.*\\)$")
```


#### Form status summary statistics
If you are not sure how complete the data in your export is, it may be useful to get a quick overview of how well the forms
have been filled.

```{r, eval = TRUE}
count_summary <- form_status_summary(ctu05)
tail(count_summary)
```

As you can see, the majority of forms have been completely filled. None of the forms were saved empty, with warnings or with errors.
For a more participant-centred overview you can run the following.

```{r, eval = FALSE}
form_status_counts(ctu05)
```

This will give you a count based overview per participant id and form. Please note that both `form_status_summary` 
and `form_status_counts` only work with saved forms since unsaved form data is not available in secuTrial exports.

#### Visit plan
secuTrialR can provide a depiction of the visit structure, although only where the visit plan is fixed:
```{r, eval = FALSE}
vs <- visit_structure(ctu05)
plot(vs)
```


#### Data dictionary
It can be difficult to find the variable you're looking for. secuTrialR provides the `dictionary_secuTrial` function to help:

```{r}
head(dictionary_secuTrial(ctu05))
```
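
To search the dictionary for a keyword, one option is to match a pattern against every column. The sketch below makes no assumption about the column names of the dictionary; the search term `"date"` is just an example.

```r
library(secuTrialR)

export_location <- system.file("extdata", "sT_exports", "lnames",
                               "s_export_CSV-xls_CTU05_long_ref_miss_en_utf8.zip",
                               package = "secuTrialR")
ctu05 <- read_secuTrial(export_location)
dict <- dictionary_secuTrial(ctu05)

# keep every row in which any column matches the search term
pattern <- "date"
hit_rows <- apply(dict, 1, function(row) any(grepl(pattern, row, ignore.case = TRUE)))
dict[hit_rows, ]
```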

#### Linking different forms

Linkages amongst forms can be explored with the `links_secuTrial` function. This relies on the `igraph` package to create a network. It is possible to interact with the network, e.g. move nodes around in order to read the labels better. The device ID is returned to the console, but can be ignored. Forms are plotted in deep yellow, variables in light blue.

```{r, eval=FALSE}
links_secuTrial(bmd_export)
```
![](inst/extdata/graphics/map.png)


#### Sampling random participants

During study monitoring it is common practice to check random participants from a study database. These
participants should be retrieved in a reproducible fashion. The below function allows this for a loaded 
secuTrial data export.

```{r}
# retrieve at least 25 percent of participants recorded after March 18th 2019 
# from the centres "Inselspital Bern" and "Charité Berlin"
return_random_participants(ctu05, percent = 0.25, seed = 1337, date = "2019-03-18",
                           centres = c("Inselspital Bern (RPACK)", "Charité Berlin (RPACK)"))
```
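
The idea behind reproducible sampling can be sketched in base R. This is not the implementation of `return_random_participants()`; the participant ids and the helper `sample_participants()` are hypothetical and only illustrate why a fixed seed makes the selection repeatable.

```r
# hypothetical participant ids
ids <- sprintf("RPACK-INS-%03d", 1:11)

sample_participants <- function(ids, percent, seed) {
  set.seed(seed)                       # fixing the seed makes the draw repeatable
  n <- ceiling(length(ids) * percent)  # round up so at least `percent` is covered
  sort(sample(ids, size = n))
}

first  <- sample_participants(ids, percent = 0.25, seed = 1337)
second <- sample_participants(ids, percent = 0.25, seed = 1337)

# both draws select the same participants
identical(first, second)
length(first)
```

Recording the seed together with the export used allows a monitor to reproduce the same selection later.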

## For contributors
### Testing with devtools

```{r, eval = FALSE}
# run tests
devtools::test("secuTrialR")
# spell check -> will contain some technical terms beyond the below list which is fine
ignore_words <- c("AdminTool", "allforms", "casenodes", "CDMS", "codebook",
                  "codebooks", "datetime", "dir", "Hmisc", "igraph",
                  "labelled", "mnp", "savedforms", "secutrial", "secuTrial",
                  "secuTrialdata", "tcltk", "tibble")
devtools::spell_check("secuTrialR", ignore = ignore_words)
```

### Linting with lintr


```{r, eval = FALSE}
# lint the package -> should be clean
library(lintr)
lint_package("secuTrialR", linters = with_defaults(camel_case_linter = NULL,
                                                   object_usage_linter = NULL,
                                                   line_length_linter(125)))
```

### Building the vignette
```{r, eval = FALSE}
library(rmarkdown)
render("vignettes/secuTrialR-package-vignette.Rmd",
       output_format=c("pdf_document"))
```

### Generating the README file

The README file is automatically generated on GitHub via a GitHub action.

### Handling dependencies

Dependencies on other R packages are to be declared in the `DESCRIPTION` file under `Imports:` and in
the specific `roxygen2` documentation of the functions relying on the dependency. It is suggested to
be as explicit as possible, i.e. import only the functions that are needed rather than entire packages.

Example to import `str_match`, `str_length` and `str_wrap` from the `stringr` package (see [read_secuTrial_raw.R](R/read_secuTrial_raw.R)):
```{r, eval = FALSE}
#' @importFrom stringr str_match str_length str_wrap
```

### Preparing a release on CRAN

```bash
# build the package archive
R CMD build secuTrialR
# check the archive (should return "Status: OK", no WARNINGs, no NOTEs)
# in this example for version 0.9.0
R CMD check secuTrialR_0.9.0.tar.gz
```

### Versioning and releases

The version number is made up of three digits. The first digit
is reserved for major releases which may break backwards compatibility.
The second and third digits are used for medium and minor changes respectively.
Versions released on CRAN will be tagged and saved as releases on GitHub.
The version released on CRAN is regarded as the stable version while
the master branch on GitHub is regarded as the current development version.
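
For illustration, base R can split and compare such version numbers. The version strings below are examples taken from this page, and the component names `major`, `medium` and `minor` simply mirror the wording above.

```r
# split a version number into its three components
v <- package_version("0.9.0")
parts <- unlist(unclass(v))
names(parts) <- c("major", "medium", "minor")
parts

# versions compare component by component, not as strings
package_version("1.3.3") > package_version("1.0.9")

# check the installed version, if the package is available
if (requireNamespace("secuTrialR", quietly = TRUE)) {
  packageVersion("secuTrialR") >= "1.0.9"
}
```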

#### Release checklist

Compile/Update:
* README.Rmd
* vignette
* pkgdown page
* NEWS.md

### Guidelines for contributors

Requests for new features and bug fixes should first be documented as an [Issue](https://github.com/SwissClinicalTrialOrganisation/secuTrialR/issues) on GitHub.
Subsequently, in order to contribute to this R package you should fork the main repository.
After you have made your changes please run the 
[tests](README.md#testing-with-devtools)
and 
[lint](README.md#linting-with-lintr) your code as 
indicated above. Please also increment the version number and recompile the `README.md` to increment the dev-version badge (requires installing the package after editing the `DESCRIPTION` file). If all tests pass and linting confirms that your 
coding style conforms you can send a pull request (PR). Changes should also be mentioned in the `NEWS` file.
The PR should have a description to help the reviewer understand what has been 
added/changed. New functionalities must be thoroughly documented, have examples 
and should be accompanied by at least one [test](tests/testthat/) to ensure long term 
robustness. The PR will only be reviewed if all travis checks are successful. 
The person sending the PR should not be the one merging it.

A depiction of the core functionalities for loading can be found [here](inst/extdata/graphics/secuTrialR.png).

### Citation  [![DOI](https://joss.theoj.org/papers/10.21105/joss.02816/status.svg)](https://doi.org/10.21105/joss.02816)

If you use and benefit from `secuTrialR` in your work please cite it as:  
Wright et al., (2020). secuTrialR: Seamless interaction with clinical trial databases in R.
Journal of Open Source Software, 5(55), 2816, https://doi.org/10.21105/joss.02816

Owner

Name: Swiss Clinical Trial Organisation
Login: SwissClinicalTrialOrganisation
Kind: organization
Email: info@scto.ch

Website: https://www.scto.ch/
Repositories: 6
Profile: https://github.com/SwissClinicalTrialOrganisation

The Swiss Clinical Trial Organisation (SCTO) is the central cooperation platform for patient-oriented clinical research in Switzerland.

JOSS Publication

secuTrialR: Seamless interaction with clinical trial databases in R

Published

November 20, 2020

DOI

10.21105/joss.02816

Volume 5, Issue 55, Page 2816

Authors

Patrick R. Wright

University Hospital Basel, Clinical Trial Unit, Basel, Switzerland, Data Management Platform of the Swiss Clinical Trial Organisation (SCTO)

Alan G. Haynes

CTU Bern, University of Bern, Statistics and Methodology Platform of the Swiss Clinical Trial Organisation (SCTO)

Milica Markovic

University Hospital Basel, Clinical Trial Unit, Basel, Switzerland, Data Management Platform of the Swiss Clinical Trial Organisation (SCTO)

Editor

Charlotte Soneson

GitHub Events

Total

Create event: 1
Release event: 1
Issues event: 1
Issue comment event: 6
Pull request event: 1

Last Year

Create event: 1
Release event: 1
Issues event: 1
Issue comment event: 6
Pull request event: 1

Committers

Last synced: 7 months ago

All Time

Total Commits: 1,192
Total Committers: 16
Avg Commits per committer: 74.5
Development Distribution Score (DDS): 0.428

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Patrick R. Wright	p**t@u**h	682
aghaynes	a**s@g**m	360
markomi	m**c@h**m	58
Render action	r**n@g**m	29
markomi	m**c@u**h	27
DrEspresso	C**a@u**h	15
Gilles Dutilh	g**h@g**m	6
sgrieder	4****r	5
a-lenz	a**z@c**h	3
Daniel S. Katz	d**z@i**g	1
Anka	r**r@M**l	1
Anka	r**r@M**l	1
dutilhg	d**g@k**h	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1

Committer Domains (Top 20 + Academic)

usb.ch: 3 kolmogorow.usb.ch: 1 ieee.org: 1 ctu.unibe.ch: 1 github.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 64
Total pull requests: 51
Average time to close issues: 3 months
Average time to close pull requests: 7 days
Total issue authors: 12
Total pull request authors: 6
Average comments per issue: 2.05
Average comments per pull request: 1.94
Merged pull requests: 46
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

PatrickRWright (32)
aghaynes (12)
DrEspresso (8)
miljukov-o (2)
sachsmc (2)
nicoleb7 (2)
sgrieder (1)
gillesdutilh (1)
Inessakraft (1)
suvi-subra (1)
markomi (1)
pianeu (1)

Pull Request Authors

PatrickRWright (24)
aghaynes (20)
DrEspresso (4)
sgrieder (3)
markomi (3)
danielskatz (1)

Top Labels

Issue Labels

enhancement (23) bug (9) documentation (7) partly addressed (1) wontfix (1)

Pull Request Labels

Packages

Total packages: 2
Total downloads:
- cran 620 last-month

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 0
(may contain duplicates)
Total versions: 12
Total maintainers: 1

cran.r-project.org: secuTrialR

Handling of Data from the Clinical Data Management System 'secuTrial'

Homepage: https://github.com/SwissClinicalTrialOrganisation/secuTrialR
Documentation: http://cran.r-project.org/web/packages/secuTrialR/secuTrialR.pdf
License: MIT + file LICENSE
Latest release: 1.3.3
published over 1 year ago

Versions: 6
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 620 Last month

Rankings

Forks count: 6.0%

Stargazers count: 19.8%

Average: 24.9%

Dependent packages count: 29.8%

Downloads: 33.6%

Dependent repos count: 35.5%

Maintainers (1)

alan.haynes@unibe.ch

Last synced: 6 months ago

conda-forge.org: r-secutrialr

Homepage: https://github.com/SwissClinicalTrialOrganisation/secuTrialR
License: MIT
Latest release: 1.0.9
published almost 5 years ago

Versions: 6
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 34.0%

Forks count: 39.0%

Average: 44.2%

Dependent packages count: 51.2%

Stargazers count: 52.6%

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.5 depends
dplyr * imports
haven >= 2.2.0 imports
lubridate * imports
magrittr * imports
purrr * imports
readr * imports
readxl * imports
rlang * imports
stringr * imports
tibble * imports
tidyr * imports
igraph * suggests
knitr * suggests
lintr * suggests
rmarkdown * suggests
tcltk * suggests
testthat * suggests
tufte * suggests

.github/workflows/R-CMD-full.yaml actions

actions/cache v1 composite
actions/checkout v2 composite
actions/upload-artifact master composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-tinytex v2 composite

.github/workflows/render-readme-pkgdown.yaml actions

actions/checkout master composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
