Estimating statistics from multi-state models using simulation with multistateutils

Estimating statistics from multi-state models using simulation with multistateutils - Published in JOSS (2018)

https://github.com/stulacy/multistateutils

Scientific Fields

Engineering Computer Science - 33% confidence

Last synced: 9 months ago · JSON representation

Repository

Utility funtions for parametric multi-state modelling in R

Basic Info

Host: GitHub
Owner: stulacy
License: gpl-3.0
Language: R
Default Branch: master
Homepage:
Size: 2.94 MB

Statistics

Stars: 7
Watchers: 1
Forks: 3
Open Issues: 3
Releases: 5

Created about 8 years ago · Last pushed over 5 years ago

Metadata Files

Readme Changelog Contributing License

README.Rmd

---
output: github_document
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
set.seed(17)  # reproducible examples / runtimes
```

[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/multistateutils)](https://cran.r-project.org/package=multistateutils)
[![Travis-CI Build Status](https://travis-ci.org/stulacy/multistateutils.svg?branch=master)](https://travis-ci.org/stulacy/multistateutils)
[![Coverage Status](https://img.shields.io/codecov/c/github/stulacy/multistateutils/master.svg)](https://codecov.io/github/stulacy/multistateutils?branch=master)
[![DOI](https://zenodo.org/badge/126024410.svg)](https://zenodo.org/badge/latestdoi/126024410)
[![JOSS Status](http://joss.theoj.org/papers/67db63fbfec0d7b9aacc672bb37e2173/status.svg)](http://joss.theoj.org/papers/67db63fbfec0d7b9aacc672bb37e2173)


# multistateutils

`multistateutils` provides a number of useful functions for analyzing parametric multi-state models. In this way it does not aim to supplement the modelling strategies found in `mstate`, `msm`, or `flexsurv`, but rather provide tools for subsequent analysis. It is designed to be used with semi-Markov multi-state models of healthcare data, but can be used for any system that can be discretized into the states being entered at observed time-points, with parametric families providing appropriate fits for these transition rates. 

It currently provides several features:

  - Estimate transition probabilities
  - Estimate length of stay
  - Draw predicted pathways through the state space with dynamic prediction

Examples of these features are provided below in *Examples*.

## Installation

You can install the latest release version from CRAN with:

```{r installcran, eval=FALSE}
install.packages("multistateutils")
```

Or the development version can also be installed directly from GitHub.

```{r gh-installation, eval = FALSE}
install.packages("devtools")  # install devtools if it isn't already
devtools::install_github("stulacy/multistateutils", build_vignettes=TRUE)
```

## Examples

This section provides a brief overview of the features in `multistateutils`, please see the *Examples* vignette for a thorough description.

### Setup

In these examples we'll use the `ebmt3` data set provided in the `mstate` package. It describes recovery after bone marrow transplant. We'll model it with an illness-death model, with *transplant* as the starting state, platelet recovery *pr* as the illness state, and relapse-free survival *rfs* as the death state.

```{r}
library(mstate)
data(ebmt3)
tmat <- trans.illdeath()                             # Form transition matrix
long <- msprep(time=c(NA, 'prtime', 'rfstime'),      # Convert data to long
               status=c(NA, 'prstat', 'rfsstat'), 
               data=ebmt3, 
               trans=tmat, 
               keep=c('age', 'dissub'))
head(long)
```

`multistateutils` is designed for applied modelling with an overall aim of prediction, particularly in health-related areas. For this reason it is designed to work with parametric transition models built with `flexsurv`. The example below fits a Weibull model to each transition using the `age` and `dissub` covariates.

```{r}
library(flexsurv)
models <- lapply(1:3, function(i) {
    flexsurvreg(Surv(time, status) ~ age + dissub, data=long, dist='weibull')
})
```

## Estimating transition probabilities

Transition probabilities are defined as the probability of being in a state $j$ at a time $t$, given being in state $h$ at time $s$, as shown below where $X(t)$ gives the state an individual is in at $t$. This is all conditional on the individual parameterised by their covariates and history, which for this semi-Markov model only influences transition probabilities through state arrival times.

$$P_{h,j}(s, t) = \Pr(X(t) = j\ |\ X(s) = h)$$

We'll estimate the transition probabilities of an individual with the covariates `age=20-40` and `dissub=AML` at 1 year after transplant.

```{r}
newdata <- data.frame(age="20-40", dissub="AML")
```

```{r example1}
library(multistateutils)
predict_transitions(models, newdata, tmat, times=365)
```

This functionality is already provided in `flexsurv::pmatrix.simfs`, but it is limited to assessing only one individual at a time, with $s=0$

```{r example2}
pmatrix.simfs(models, tmat, newdata=newdata, t=365)
```

`predict_transitions` on the other hand can estimate transition probabilities for:

  - Multiple individuals
  - Multiple values of $t$ 
  - Multiple values of $s$ 
  - Any combination of the above, all from the same simulation
  - A mixture of distributions for the transition models

The ability to generate probabilities by varying $s$ and $t$ within a single simulation makes it very efficient for generating dynamic predictions.

```{r}
predict_transitions(models, newdata, tmat, times=c(1, 2)*365, 
                    start_times = c(0.25, 0.75) * 365)
```

## Length of stay

Similarly, the length of stay functionality provided by `flexsurv::totlos.simfs` has also been extended to allow for estimates at multiple time-points, states, and individuals to be calculated simultaneously. As shown below, the function parameters are very similar and the estimates are very close to those produced by `totlos.simf`.

```{r}
length_of_stay(models, 
               newdata=newdata,
               tmat, times=365.25*3,
               start=1)
```

```{r}
totlos.simfs(models, tmat, t=365.25*3, start=1, newdata=newdata)
```

Again, the advantage of this implementation is that estimates can be produced from multiple conditions without needing to rerun the simulation.

```{r}
length_of_stay(models, 
               newdata=data.frame(age=c(">40", ">40"),
                                  dissub=c('CML', 'AML')),
               tmat, times=c(1, 3, 5)*365.25)
```

## State flow diagram

Another feature in `multistateutils` is a visualization of a predicted pathway through the state transition model, calculated using *dynamic prediction* and provided in the function `plot_predicted_pathway`. It estimates state occupancy probabilities at discrete time-points and displays the flow between them in the manner of a Sankey diagram as shown below.

The output of this function is an interactive HTML widget that can be manipulated to layout the diagram to better suit your needs. Unfortunately widgets can't be embedded in GitHub READMEs so the image below is just a screenshot, but have a look at the version in the vignette for an working example.

$$P_{h,j}(s, t) = \Pr(X(t) = j\ |\ X(s) = h)$$

```{r, eval=FALSE}
time_points <- seq(0, 10, by=2) * 365.25
plot_predicted_pathway(models, tmat, newdata, time_points, 'healthy')
```

![](vignettes/state_pathway.png)

## Discrete event simulation

The underlying simulation engine can be easily adapted to run cohort-wide simulation, where the output statistics of interest are global rather individual level measures.
This is useful for estimating healthcare related values, such as the average time spent receiving treatment for a given disease population.

These discrete event simulations are run with the `cohort_simulation` function as below.
They simply require a list of models as specified above, a cohort population, and the transition matrix.
It outputs a long table where each row is an observed state entry for a given individual. 

```{r}
sim <- cohort_simulation(models, ebmt3, tmat)
head(sim)
```

By default the simulation runs until every individual reaches a sink state, but a longitudinal specification can be provided, such that the individuals enter the simulation at set incident times and the simulation terminates after a predefined period.
This functionality is useful when estimating statistics over a finite time-period, such as the expected number of deaths in a 5-year range, but it requires a model of the incidence process (i.e. how often a new patient is diagnosed).
See the *Examples* vignette for further details.

## `msprep2`

A large part of the work involved with multi-state modelling is basic munging of the data into a format suitable for modelling.
The `msprep` function from the `mstate` package helps a lot here by converting a wide data frame where each row corresponds to a user with state entry times given in column, into a long format where each row represents a possible state transition.

It does, however, have some slight limitations:

  - Having to form a wide table in the first place with an entry time and status indicator for each state can be a bit time consuming and isn't necessarily a natural way of storing such data
  - This wide format only allows an individual to enter a state once
  - For censored state transitions it can be awkward having to replicate the censoring time for each non-visited state

For these reasons I've created a modified version of `msprep`, imaginatively called `msprep2`, that performs the same role but accepts the input data as a tidy table of state entry times rather than a wide table at the individual level.

In this format, each row in the input data frame corresponds to a known state entry time, and so non-visited states are simply omitted.
The input data frame only needs 3 columns: an individual id, a state id, and the time of entry.
Covariates and censoring information are provided by separate data frames that link back on the id, providing a cleaner interface and one that works well with data that is stored in relational databases. 

The example below shows the data preparation for 2 individuals:

  1. Enters state 2 at $t=23$ then no more transitions until last follow-up at $t=744$
  2. Enters state 2 at $t=35$ then enters the absorptive state 3 at $t=360$

```{r}
entry <- data.frame(id=c(1, 2, 2),
                    state=c(2, 2, 3),
                    time=c(23, 35, 360))
cens <- data.frame(id=1, censor_time=744)
covars <- data.frame(id=1:2, age=c('>40', '20-40'), dissub=c('CML', 'AML'))
msprep2(entry, tmat, censors = cens, covars = covars)
```

By specifying state entry as a long table there is now no limit to how many times a state can be entered by an individual. 
Let's demonstrate this by extending the illness-death model to allow a patient to recover (i.e. transition from illness back to healthy).

```{r}
states <- c('healthy', 'illness', 'death')
tmat2 <- matrix(c(NA, 3, NA, 1, NA, NA, 2, 4, NA), nrow=3, ncol=3, 
                dimnames=list(states, states))
tmat2
```

I'll create a dummy dataset with one individual moving from healthy->illness->death at times 6 and 11, while patient 2 goes from healthy->illness, then is cured and goes back to healthy, before moving back to illness and finally dying. 

```{r}
multistate_entry <- data.frame(id=c(rep(1, 2),
                                    rep(2, 4)),
                               state=c('illness', 'death',
                                       'illness', 'healthy', 'illness', 'death'),
                               time=c(6, 11,
                                      7, 12, 17, 22))
multistate_entry
```

As seen below, `msprep2` has no problem converting this into a list of possible transitions.
Note that we don't need to pass in anything to `censors` because we have complete follow-up on both patients.

```{r}
msprep2(multistate_entry, tmat2)
```

## Upcoming features

There is currently a web-app (not publicly accessible but the [source code is on Github](https://github.com/stulacy/RDES-Shiny)) that provides a graphical interface for the entire multi-state modelling process and simulation process for a cohort simulation. I'd like to tidy this up and get it functioning with this new version of `multistateutils` and also provide an interface for individual level simulations, such as estimating transition probabilities.

Owner

Name: Stuart Lacy
Login: stulacy
Kind: user
Location: York, UK

Website: http://www.stuartlacy.co.uk
Repositories: 4
Profile: https://github.com/stulacy

Research Software Engineer @wacl-york and R enthusiast.

JOSS Publication

Estimating statistics from multi-state models using simulation with multistateutils

Published

October 08, 2018

DOI

10.21105/joss.00893

Volume 3, Issue 30, Page 893

Authors

Stuart Lacy

Department of Health Sciences, University of York, UK

Editor

Arfon Smith

GitHub Events

Total

Last Year

Committers

Last synced: 10 months ago

All Time

Total Commits: 187
Total Committers: 3
Avg Commits per committer: 62.333
Development Distribution Score (DDS): 0.075

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
stulacy	s**y@g**m	173
Stuart Lacy	s**y@e**k	11
Raoul W	r****w	3

Committer Domains (Top 20 + Academic)

egu.york.ac.uk: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 9
Total pull requests: 2
Average time to close issues: about 1 month
Average time to close pull requests: 18 days
Total issue authors: 5
Total pull request authors: 1
Average comments per issue: 2.67
Average comments per pull request: 1.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

rrrlw (3)
sdaza (2)
jenniferthompson (2)
erscott (1)
ccuadradon (1)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

Estimating statistics from multi-state models using simulation with multistateutils

Science Score: 95.0%

Scientific Fields

Repository

Basic Info

Statistics

Metadata Files

README.Rmd

Owner

JOSS Publication

Estimating statistics from multi-state models using simulation with multistateutils

Authors

Editor

Tags

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Committer Domains (Top 20 + Academic)

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels