Bernadette

Bernadette: Bayesian Inference and Model Selection for Stochastic Epidemics in R - Published in JOSS (2023)

https://github.com/bernadette-eu/bernadette

Scientific Fields

Physics Physical Sciences - 40% confidence

Artificial Intelligence and Machine Learning Computer Science - 40% confidence

Last synced: 6 months ago · JSON representation

Repository

Bayesian analysis of infectious disease transmission dynamics, with a particular focus on Coronavirus Disease 2019 (COVID-19)

Basic Info

Host: GitHub
Owner: bernadette-eu
License: gpl-3.0
Language: R
Default Branch: master
Homepage:
Size: 153 MB

Statistics

Stars: 2
Watchers: 1
Forks: 4
Open Issues: 1
Releases: 3

Created over 4 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog Contributing License Code of conduct

README.Rmd

---
output: 
  github_document:
    pandoc_args: "--webtex"
---



# Bernadette 


[![R-CMD-check](https://github.com/bernadette-eu/Bernadette/workflows/R-CMD-check/badge.svg)](https://github.com/bernadette-eu/Bernadette/actions)
[![CRAN status](https://www.r-pkg.org/badges/version/Bernadette)](https://cran.r-project.org/package=Bernadette)
[![r-universe](https://bernadette-eu.r-universe.dev/badges/Bernadette)](https://Bernadette.r-universe.dev/ui/#package:Bernadette)
[![Last-commit](https://img.shields.io/github/last-commit/bernadette-eu/Bernadette)](https://github.com/bernadette-eu/Bernadette/commits/main)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![downloads](https://cranlogs.r-pkg.org/badges/Bernadette)](https://shinyus.ipub.com/cranview/)
[![total](https://cranlogs.r-pkg.org/badges/grand-total/Bernadette)](https://shinyus.ipub.com/cranview/)
[![status](https://joss.theoj.org/papers/8ee5ffed37ccbd0a8de519553da6ceea/status.svg)](https://joss.theoj.org/papers/8ee5ffed37ccbd0a8de519553da6ceea)
[![DOI](https://zenodo.org/badge/377866248.svg)](https://zenodo.org/badge/latestdoi/377866248)



## Contribution

This R package is provided for use under the [GPL-3.0](https://www.gnu.org/licenses/gpl-3.0.en.html) License by the author.
Please use the [issue tracker](https://github.com/bernadette-eu/Bernadette/issues) for bug reports and feature requests.
All contributions are welcome, in particular those that improve the approach or the robustness of the code base. We also welcome additions and extensions to the underlying model either in the form of options or improvements.

## Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](https://github.com/bernadette-eu/Bernadette/blob/master/CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms.

## Overview

The **Bernadette** ("Bayesian inference and model selection for stochastic epidemics") R package provides a framework for Bayesian analysis of infectious disease transmission dynamics via diffusion driven multi-type epidemic models with time-varying coefficients, with a particular focus on Coronavirus Disease 2019 (COVID-19). 
For models fit using Markov chain Monte Carlo, it allows for computation of approximate leave-one-out cross-validation (LOO, LOOIC) or the Widely Applicable Information Criterion (WAIC) using the **loo** package for model checking and comparison.

**Bernadette** complements packages for the modeling of disease transmission dynamics, see the R Epidemics Consortium website (https://www.repidemicsconsortium.org/), by implementing the Bayesian hierarchical modeling approach described in [1]. 
It links the observed age-stratified mortality counts for the population of a given country over time to latent infections via an over-dispersed count model.
The change in infections over time is governed by a deterministic multi-type compartmental model which is driven by potentially non-scalar diffusion processes to adequately capture the temporal evolution of the age-stratified transmission rates. 
**Bernadette** relaxes the assumption of a homogeneous population, incorporates age structure and accounts for the presence of social structures via publicly available contact matrices. 
This allows for further evidence synthesis utilizing information from contact surveys and for sharing statistical strength across age groups and time.
Further, the Bayesian evidence synthesis approach implemented in the **Bernadette** package enables learning the age-stratified virus transmission rates from one group to another, with a particular focus on the transmission rate between and onto vulnerable groups which could support public health authorities in devising interventions targeting these groups.
The effective reproduction number is estimated from the daily age-stratified mortality counts, accounting for variations in transmissibility that are not obvious from reported infection counts. 
While estimation of the uncertainty over the effective reproduction number is itself a challenging problem, it is propagated naturally via Markov chain Monte Carlo.

Even though **Bernadette** is motivated by the analysis of healthcare surveillance data related to COVID-19, it provides a template for implementation to a broader range of infectious disease epidemics and outbreaks.

## Website

The website for the BERNADETTE Marie Sklodowska-Curie Action (MSCA) is available at: https://bernadette-eu.github.io/.

## Installation

Installation of the package requires you to have a working C++ toolchain. 
To ensure that this is working, please first install **rstan** by following these [installation instructions](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started) from the Rstan wiki.

The following must return ```TRUE``` before continuing:

```
if (!require("pkgbuild")) {
  install.packages("pkgbuild")
}
pkgbuild::has_build_tools(debug = TRUE)
```

Having installed **rstan**, you can then install the latest development version of **Bernadette** (requires the [devtools](https://github.com/r-lib/devtools) package) from [GitHub](https://github.com/) with:

```{r, eval=F, echo=T}
if (!require("devtools")) {
  install.packages("devtools")
}
devtools::install_github("bernadette-eu/Bernadette", dependencies = TRUE, build_vignettes = FALSE)
``` 

or the latest version on CRAN
```{r, eval=F, echo=T}
install.packages("Bernadette")
``` 

## Load any required libraries
```{r, eval = T, results='hide', message=FALSE, warning=FALSE}
lib <- c("stats",
         "ggplot2",
         "rstan",
         "bayesplot",
         "Bernadette",
         "loo")
lapply(lib, require, character.only = TRUE)
``` 

## Example - COVID-19 in Greece

The time series of new daily age-specific mortality and incidence counts are ordered by epidemiological date. The time horizon in the example datasets for Greece spans from 2020-08-31 to 2021-03-28 (210 days). 

```{r, results='hide', message=FALSE, warning=FALSE}
Sys.setlocale("LC_ALL", "English")
data(age_specific_mortality_counts)
data(age_specific_cusum_infection_counts)
```

```{r mortality_time_series, echo = FALSE, fig.height=5, fig.width=7}
sub <- age_specific_mortality_counts[c(3, 6:ncol(age_specific_mortality_counts))]
mortality_df_long <- stats::reshape(sub,
                                    direction = "long",
                                    varying   = list(names(sub)[2:ncol(sub)]),
                                    v.names   = "New_Deaths",
                                    idvar     = c("Date"),
                                    timevar   = "Group",
                                    times     = colnames(sub)[-1])

ggplot(mortality_df_long,
       aes(Date       = Date,
           New_Deaths = New_Deaths)) +
  geom_point(aes(x   = Date,
                 y   = New_Deaths,
                 fill= "Reported deaths")) +
  facet_wrap(. ~ Group,
              scales = "free_y", 
              ncol   = 2,
              strip.position = "top")+
  scale_fill_manual(values = c('Reported deaths' = 'black'), guide = "none") +
  labs(x = "Epidemiological Date",
       y = "Number of new deaths") +
  scale_x_date(date_breaks = "1 month", date_labels =  "%b %Y") +
  theme_bw() +
  theme(panel.spacing    = unit(0.1,"cm"),
        axis.text.x      = element_text(angle = 45, hjust = 1),
        legend.position  = "bottom",
        legend.title     = element_blank() )
``` 

Let's import the rest of the datasets required for model fitting. First, import and plot the age distribution for Greece in 2020:

```{r}
age_distr <- age_distribution(country = "Greece", year = 2020)
plot_age_distribution(age_distr)
```

Information is available for sixteen 5-year age groups, with the last one being "75+". We
shall consider three age groups, \{0-39, 40-64, 65+\}, by providing the following lookup table:

```{r}
lookup_table <- data.frame(Initial = age_distr$AgeGrp,
                           Mapping = c(rep("0-39",  8),
                                       rep("40-64", 5),
                                       rep("65+"  , 3)))
aggr_age <- aggregate_age_distribution(age_distr, lookup_table)
plot_age_distribution(aggr_age)
```

Next, import the projected contact matrix for Greece:

```{r, fig.height=5, fig.width=12}
conmat <- contact_matrix(country = "GRC")
plot_contact_matrix(conmat)

```

Aggregate the contact matrix to the age groups \{0-39, 40-64, 65+\}:

```{r}
aggr_cm <- aggregate_contact_matrix(conmat, lookup_table, aggr_age)
plot_contact_matrix(aggr_cm)
```

Obtain estimates of the age-specific infection-fatality-ratio:

```{r}
ifr_mapping <- c(rep("0-39", 8), rep("40-64", 5), rep("65+", 3))
aggr_age_ifr <- aggregate_ifr_react(age_distr, ifr_mapping, age_specific_infection_counts)
```

Provide the infection-to-death distribution with a mean of 24.19 days:

```{r}
ditd <- itd_distribution(ts_length  = nrow(age_specific_mortality_counts),
                         gamma_mean = 24.19231,
                         gamma_cv   = 0.3987261)
```

## Modeling framework

The aforementioned data streams and expert knowledge are integrated into a coherent modeling framework via a Bayesian evidence synthesis approach. Τhe modeling process is separated into a latent epidemic process and an observation process in an effort to reduce sensitivity to observation noise and to allow for more flexibility in modeling different forms of data.

### Diffusion-driven multi-type transmission process

The transmission of COVID-19 is modeled through an age-stratified deterministic Susceptible-Exposed-Infectious-Removed (SEIR) compartmental model with Erlang-distributed latent and infectious periods. In particular, we introduce Erlang-distributed stage durations in the Exposed and Infected compartments and relax the mathematically convenient but unrealistic assumption of exponential stage durations by assuming that each of the Exposed and Infected compartments are defined by two stages, with the same rate of loss of latency ($\tau$) and infectiousness ($\gamma$) in both stages.
Removed individuals are assumed to be immune to reinfection for at least the duration of the study period.

The population is stratified into $\alpha \in \{1,\ldots,A\}$ age groups and the total size of the age group is denoted by $N_{\alpha} = S_t^{\alpha} + E_{1,t}^{\alpha} + E_{2,t}^{\alpha} + I_{1,t}^{\alpha} + I_{2,t}^{\alpha} + R_t^{\alpha}$, where $S_t^{\alpha}$ represents the number of susceptible, $E_t^{\alpha} = \sum_{j=1}^{2}E_{j,t}^{\alpha}$ represents the number of exposed but not yet infectious, $I_t^{\alpha} = \sum_{j=1}^{2}I_{j,t}^{\alpha}$ is the number of infected and $R_t^{\alpha}$ is the number of removed individuals at time $t$ at age group $\alpha$. 
The number of individuals in each compartment is scaled by the total population $N = \sum_{\alpha = 1}^{A}N_{\alpha}$, so that the sum of all compartments equals to one.
The latent epidemic process is expressed by the following non-linear system of ordinary differential equations (ODEs)
$$
\begin{cases}
\frac{dS^{\alpha}_{t}}{dt}   & = -\lambda_{\alpha}(t) S^{\alpha}_{t}
\\
\frac{dE^{\alpha}_{1,t}}{dt} & = \lambda_{\alpha}(t) S^{\alpha}_{t} - \tau E^{\alpha}_{1,t}
\\
\frac{dE^{\alpha}_{2,t}}{dt} & = \tau \left( E^{\alpha}_{1,t} - E^{\alpha}_{2,t}\right)
\\
\frac{dI^{\alpha}_{1,t}}{dt} & = \tau E^{\alpha}_{2,t} - \gamma I^{\alpha}_{1,t}
\\
\frac{dI^{\alpha}_{2,t}}{dt} & = \gamma \left( I^{\alpha}_{1,t} - I^{\alpha}_{2,t}\right)
\\
\frac{dR^{\alpha}_{t}}{dt}   & = \gamma I^{\alpha}_{2,t},
\end{cases}
$$
where the mean latent and infectious periods are $d_E = \frac{2}{\tau}$, $d_I = \frac{2}{\gamma}$, respectively. The number of new infections in age group $\alpha$ at day $t$ is 

$$
\Delta^{\text{infec}}_{t, \alpha} = \int_{t-1}^{t} \tau E^{\alpha}_{2,s} ds.
$$

The time-dependent force of infection $\lambda_{\alpha}(t)$ for age group $\alpha \in \{1,\ldots,A\}$ is expressed as
$$
\lambda_{\alpha}(t) = \sum_{\alpha'=1}^{A}\left[ m_{\alpha,\alpha'}(t) \frac{\left(I^{\alpha'}_{1,t} + I^{\alpha'}_{2,t}\right)}{\mathbb{N}_{\alpha'}}\right],
$$

which is a function of the proportion of infectious individuals in each age group $\alpha' \in \{1,\ldots,A\}$, via the compartments $I^{\alpha'}_{1,t}$, $I^{\alpha'}_{2,t}$ divided by the total size of the age group $\mathbb{N}_{\alpha'}$, and the time-varying person-to-person transmission rate from group $\alpha$ to group $\alpha'$, $m_{\alpha,\alpha'}(t)$.
We parameterize the transmission rate between different age groups $(\alpha,\alpha') \in \{1,\ldots,A\}^2$ by

\(
m_{\alpha,\alpha'}(t) = 
\beta^{\alpha\alpha'}_{t} 
\cdot
C_{\alpha,\alpha'},
\)

breaking down the transmission rate matrix into its biological and social components: the social component is represented by the average number of contacts between individuals of age group $\alpha$ and age group $\alpha'$ via the contact matrix element $C_{\alpha,\alpha'}$; 
$\beta^{\alpha\alpha'}_{t}$ is the time-varying transmissibility of the virus, the probability that a contact between an infectious person in age group $\alpha$ and a susceptible person in age group $\alpha'$ leads to transmission at time $t$.

The formulation below may be viewed as a stochastic extension to the deterministic multi-type SEIR model, using diffusion processes for the coefficients $\beta^{\alpha\alpha'}_{t}$, driven by independent Brownian motions

$$
\begin{cases}
\beta^{\alpha\alpha'}_{t} & = \exp(x^{\alpha\alpha'}_{t})
\\
x^{\alpha\alpha'}_{t} \mid x^{\alpha\alpha'}_{t - 1}, \sigma^{\alpha\alpha'}_x & \sim \operatorname{N}(x^{\alpha\alpha'}_{t - 1}, (\sigma^{\alpha\alpha'})^2_x)
\\
dx^{\alpha\alpha'}_{t} & = \sigma^{\alpha\alpha'}_{x}dW^{\alpha\alpha'}_{t}
\\
dW^{\alpha\alpha'}_{t} & \sim \operatorname{N}(0,d_{t}),
\end{cases}
$$

with volatilities $\sigma^{\alpha\alpha'}_x$, corresponding to the case of little information on the shape of $\beta^{\alpha\alpha'}_{t}$.
The volatility $\sigma^{\alpha\alpha'}_x$ plays the role of the regularizing factor: higher values of $\sigma^{\alpha\alpha'}_x$ lead to greater changes in $\beta^{\alpha\alpha'}_{t}$. 
The exponential transformation avoids negative values which have no biological meaning. 

A major advantage of considering a diffusion process for modeling $\beta^{\alpha\alpha'}_{t}$ is its ability to capture and quantify the randomness of the underlying transmission dynamics, which is particularly useful when the dynamics are not completely understood. 
The diffusion process accounts for fluctuations in transmission that are influenced by non-modeled phenomena, such as new variants, mask-wearing propensity, etc. 
The diffusion process also allows for capturing the effect of unknown extrinsic factors on the age-stratified force of infection,
for monitoring of the temporal evolution of the age-specific transmission rate without the implicit inclusion of external variables and for tackling non-stationarity in the data.

We propose a diffusion-driven multi-type latent transmission model which assigns independent Brownian motions to $\log(\beta^{11}_{t}), \log(\beta^{22}_{t}), \ldots, \log(\beta^{AA}_{t})$ with respective age-stratified volatility parameters $\sigma_{x,\alpha}, \alpha \in \{1,\ldots,A\}$, for reasons of parsimony and interpretability. 
The contact matrix is scaled by the age-stratified transmissibility in order to obtain the transmission rate matrix process
$$
m_{\alpha,\alpha'}(t) = \beta^{\alpha\alpha'}_{t}
\cdot
C_{\alpha,\alpha'}
\equiv 
\beta^{\alpha\alpha}_{t}
\cdot
C_{\alpha,\alpha'}
$$

under the assumption $\beta^{\alpha\alpha'}_t \equiv \beta^{\alpha\alpha}_t, \alpha \neq \alpha'$. 

### Observation process

Denote the number of observed deaths on day $t = 1, \ldots, T$ in age group $\alpha \in \{1,\ldots,A\}$ by $y_{t,\alpha}$. 
A given infection may lead to observation events (i.e deaths) in the future. 
A link between $y_{t,\alpha}$ and the expected number of new age-stratified infections is established via the function

$$
d_{t,\alpha} = \mathbb{E}[y_{t,\alpha}] = \widehat{\text{IFR}}_{\alpha} \times \sum_{s = 1}^{t-1}h_{t-s} \Delta^{\text{infec}}_{s, \alpha}
$$

on the new expected age-stratified mortality counts, $d_{t,\alpha}$, the estimated age-stratified infection fatality rate, $\widehat{\text{IFR}}_{\alpha}$, and the infection-to-death distribution $h$, where $h_s$ gives the probability the death occurs on the $s^{th}$ day after infection, is assumed to be gamma distributed with mean 24.2 days and coefficient of variation 0.39, that is

$$
h \sim \operatorname{Gamma}(6.29, 0.26).
$$
We allow for over-dispersion in the observation processes to account for noise in the underlying data streams, for example due to day-of-week effects on data collection, and link $d_{t,\alpha}$ to $y_{t,\alpha}$ through an over-dispersed count model

$$
y_{t,\alpha}\mid \theta \sim \operatorname{NegBin}\left(d_{t,\alpha}, \xi_{t,\alpha}\right),
$$
where $\xi_{t,\alpha} = \frac{d_{t,\alpha}}{\phi}$, such that $\mathbb{V}[y_{t,\alpha}] = d_{t,\alpha}(1+\phi)$. 
The log-likelihood of the observed deaths is given by

$$
\ell^{Deaths}(y\mid \theta) = \sum_{t=1}^{T}\sum_{\alpha=1}^{A}\text{logNegBin}\left(y_{t,\alpha}\mid d_{t,\alpha}, \xi_{t,\alpha}\right),
$$
where $y \in \mathbb{R}^{T \times A}_{0,+}$ are the surveillance data on deaths for all time-points.

### Prior specification

The unknown quantities that need to be inferred are $\theta= (x^{\alpha,\alpha}_{0}, x^{\alpha,\alpha}_{1}, C_{\alpha,\alpha'}, \sigma_{x,\alpha}, \rho, \phi)$ for $(\alpha,\alpha') \in \{1,\ldots,A\}^2$. 
$\rho$ expresses the initial proportion of exposed (at time $t_0$). 
$L^{synth}$ is the Cholesky factor of the observed contact matrix and $L$ is the Cholesky factor of the random contact matrix $C$, see [1].
The prior distributions for the model parameters that are considered in this example are:

$$
\begin{aligned}
\rho& \sim \operatorname{Beta}(\mathbb{E}[\rho] = 0.1, \mathbb{V}[\rho] = 0.05\cdot\mathbb{E}[\rho])
\\
\tilde{L}_{i,j} & \sim \operatorname{Normal}(0,1),~(i,j) \in \{1, \ldots, A\}^2, i \leq j
\\
L_{i,j} & = L^{synth}_{i,j} + (0.05\cdot L^{synth}_{i,j})\cdot \tilde{L}_{i,j},~(i,j) \in \{1, \ldots, A\}^2, i \leq j
\\
x^{\alpha,\alpha}_{0} & \sim \operatorname{N}(0,1)
\\
x^{\alpha,\alpha}_{1} & \sim \operatorname{N}(0,1)
\\
\sigma_{x,\alpha} & \sim \operatorname{N}_+(0,1)
\\
\phi & \sim \operatorname{Exp}(0.2).
\end{aligned}
$$

## Model fitting

Find the number of cores in your machine and indicate how many will be used for parallel processing:

```{r, eval=F, echo=T}
parallel::detectCores()
rstan_options(auto_write = TRUE)

# Here we sample from six Markov chains in parallel:
chains <- 6

options(mc.cores = chains)
```

The current functionality of the package allows the end-user to define the number of changes of the effective contact rate during the course of 7 days.
Modeling the time evolution of the effective contact rate at a more granular time-scale (i.e. at the time-points
of the observations) would require setting ```ecr_changes = 1``` in the main function ```stan_igbm```. In an effort to reduce the cost of the computational effort, we allow for changes every 7 days, setting ```ecr_changes = 7```.

The following is an optional step. We perform maximization of the joint posterior distribution. The resulting point estimates are, then, used to define the initial points that will be fed to the MCMC sampler.

```{r, eval=F, echo=T}
igbm_fit_init <- stan_igbm(y_data                      = age_specific_mortality_counts,
                           contact_matrix              = aggr_cm,
                           age_distribution_population = aggr_age,
                           age_specific_ifr            = aggr_age_ifr[[3]],
                           itd_distr                   = ditd,
                           incubation_period           = 3,
                           infectious_period           = 4,
                           likelihood_variance_type    = "linear",
                           ecr_changes                 = 7,
                           prior_scale_x0              = 1,
                           prior_scale_x1              = 1,
                           prior_scale_contactmatrix   = 0.05,
                           pi_perc                     = 0.1,
                           prior_volatility            = normal(location = 0, scale = 1),
                           prior_nb_dispersion         = exponential(rate = 1/5),
                           algorithm_inference         = "optimizing",
                           seed                        = 1,
                           hessian                     = TRUE)
```

```{r, eval=F, echo=T}
sampler_init <- function(){
  list(x0          = igbm_fit_init$par[names(igbm_fit_init$par) %in% "x0"],
       x_init      = igbm_fit_init$par[grepl("x_init",   names(igbm_fit_init$par), fixed = TRUE)],
       x_noise     = igbm_fit_init$par[grepl("x_noise[", names(igbm_fit_init$par), fixed = TRUE)],
       L_raw       = igbm_fit_init$par[grepl("L_raw[",   names(igbm_fit_init$par), fixed = TRUE)],
       pi          = igbm_fit_init$par[names(igbm_fit_init$par) %in% "pi"],
       volatilities= igbm_fit_init$par[grepl("volatilities",  names(igbm_fit_init$par), fixed = TRUE)],
       phiD        = igbm_fit_init$par[names(igbm_fit_init$par) %in% "phiD"]
  )
}
```

We sample from the posterior distribution of the parameters using the Hamiltonian Monte Carlo algorithm implemented in Stan - Note that a longer time horizon and/or a greater number of age groups in ```y_data``` will lead to increased CPU time:

```{r, eval=F, echo=T}
igbm_fit <- stan_igbm(y_data                      = age_specific_mortality_counts,
                      contact_matrix              = aggr_cm,
                      age_distribution_population = aggr_age,
                      age_specific_ifr            = aggr_age_ifr[[3]],
                      itd_distr                   = ditd,
                      incubation_period           = 3,
                      infectious_period           = 4,
                      likelihood_variance_type    = "linear",
                      ecr_changes                 = 7,
                      prior_scale_x0              = 1,
                      prior_scale_x1              = 1,
                      prior_scale_contactmatrix   = 0.05,
                      pi_perc                     = 0.1,
                      prior_volatility            = normal(location = 0, scale = 1),
                      prior_nb_dispersion         = exponential(rate = 1/5),
                      algorithm_inference         = "sampling",
                      nBurn                       = 500,
                      nPost                       = 500,
                      nThin                       = 1,
                      chains                      = chains,
                      adapt_delta                 = 0.80,
                      max_treedepth               = 19,
                      seed                        = 1,
                      init                        = sampler_init)
``` 


```{r, eval=T, echo=T}
# Import the saved object:
giturl <- "https://github.com/bernadette-eu/Bernadette/blob/master/Example_files/Experiment_v112.RData?raw=true"
load(url(giturl))
```

The implementation of MCMC under this setting took about 5.4 hours:

```{r, eval=T, echo=T}
time_run <- get_elapsed_time(igbm_fit)
apply(time_run, 1, sum)
```

## Summarise the distributions of estimated parameters and derived quantities using the posterior draws. 

```{r, eval=T, echo=T}
print_summary <- summary(object = igbm_fit,
                         y_data = age_specific_mortality_counts)
round(print_summary$summary, 3)
``` 

Standard functions from the **rstan** package can be used for further assessment of the model:
  
```{r, eval=T, echo=T}
# Check for divergent transitions:
rstan::check_divergences(igbm_fit)
``` 

```{r, eval=T, echo=T}
# Prints the number (and percentage) of iterations that saturated the max treedepth:
rstan::check_treedepth(igbm_fit)
``` 

## MCMC diagnostic plots

Example - Pairs plots between some parameters:
  
```{r, eval=T, echo=T}
volatilities_names <- paste0("volatilities","[", 1:ncol(age_specific_mortality_counts[,-c(1:5)]),"]")
posterior_1        <- as.array(igbm_fit)
``` 

```{r, eval=T, echo=T}
bayesplot::mcmc_pairs(posterior_1,
                      pars          = volatilities_names,
                      off_diag_args = list(size = 1.5))
``` 

## Plot the posterior estimates of epidemiological quantities

**Visualize the posterior distribution of the random contact matrix:**
  
```{r, eval=T, echo=T}
plot_posterior_cm(igbm_fit, y_data = age_specific_mortality_counts)
```

**Visualize the posterior distribution of the age-specific transmission rate:**
  
```{r, eval=T, echo=T}
post_transmrate_summary <- posterior_transmrate(object = igbm_fit,
                                                y_data = age_specific_mortality_counts)
plot_posterior_transmrate(post_transmrate_summary)
```

The piecewise-constant behavior in the posterior estimates of the age-specific transmission rate is caused by the assumption that the effective contact rate is allowed to change every 7 days. The respective posterior trajectories will be smoothed when more frequent changes are allowed. 

**Visualize the posterior distribution of the effective reproduction number:**
  
```{r, eval=T, echo=T}
post_rt_summary <- posterior_rt(object                      = igbm_fit,
                                y_data                      = age_specific_mortality_counts,
                                age_distribution_population = aggr_age,
                                infectious_period           = 4)
plot_posterior_rt(post_rt_summary)
```

**Visualize the posterior distribution (age-specific or aggregated daily estimates) of the infection counts:**
  
```{r, eval=T, echo=T}
post_inf_summary <- posterior_infections(object = igbm_fit,
                                         y_data = age_specific_mortality_counts)
```

```{r, eval=T, echo=T}
plot_posterior_infections(post_inf_summary, type = "age-specific")
```

```{r, eval=T, echo=T}
plot_posterior_infections(post_inf_summary, type = "aggregated")
```

**Visualize the posterior distribution (age-specific or aggregated daily estimates) of the mortality counts:**
  
```{r, eval=T, echo=T}
post_mortality_summary <- posterior_mortality(object = igbm_fit,
                                              y_data = age_specific_mortality_counts)
```

```{r, eval=T, echo=T}
plot_posterior_mortality(post_mortality_summary, type = "age-specific")
```

```{r, eval=T, echo=T}
plot_posterior_mortality(post_mortality_summary, type = "aggregated")
```

## Model checking and comparison with loo

For more information about the **loo** package visit the [dedicated website](https://mc-stan.org/loo/).

```{r, eval=T, echo=T}
# (for bigger models use as many cores as possible)
log_lik_1 <- loo::extract_log_lik(igbm_fit, merge_chains = FALSE)
r_eff_1   <- loo::relative_eff(exp(log_lik_1), cores = 4)
```

The estimated LOO Information Criterion and the respective estimated effective number of parameters are given by

```{r, eval=T, echo=T}
loo_1 <- loo::loo(log_lik_1, r_eff = r_eff_1, cores = 4)
print(loo_1)
```

The estimated Widely Applicable Information Criterion and the respective estimated effective number of parameters are given by

```{r, eval=T, echo=T}
waic_1 <- loo::waic(log_lik_1)
```

```{r, eval=T, echo=T}
print(waic_1)
```

## References

[1] Bouranis, L., Demiris, N., Kalogeropoulos, K., & Ntzoufras, I. (2022). _Bayesian analysis of diffusion-driven multi-type epidemic models with application to COVID-19_. arXiv. 
  
## Original Computing Environment

```{r, eval=F, echo=T}
writeLines(readLines(file.path(Sys.getenv("HOME"), ".R/Makevars")))
```

```{r, eval=F, echo=T}
CXX14FLAGS=-O3 -Wno-unused-variable -Wno-unused-function
CXX14 = $(BINPREF)g++ -m$(WIN) -std=c++1y
CXX11FLAGS=-O3 -Wno-unused-variable -Wno-unused-function
```

```{r, eval=F, echo=T}
sessionInfo()
```

```{r, eval=F, echo=T}
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
  [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
  [1] loo_2.4.1            Bernadette_1.1.4     bayesplot_1.8.1      rstan_2.21.3         StanHeaders_2.21.0-7
[6] ggplot2_3.3.5       

loaded via a namespace (and not attached):
  [1] tidyselect_1.1.1   purrr_0.3.4        reshape2_1.4.4     colorspace_2.0-2   vctrs_0.3.8        generics_0.1.0
[7] stats4_4.1.0       utf8_1.2.1         rlang_1.1.1        pkgbuild_1.2.0     pillar_1.6.3       glue_1.4.2    
[13] withr_2.4.2        DBI_1.1.1          matrixStats_0.59.0 lifecycle_1.0.1    plyr_1.8.6         stringr_1.4.0 
[19] munsell_0.5.0      gtable_0.3.0       codetools_0.2-18   labeling_0.4.2     inline_0.3.19      callr_3.7.0   
[25] ps_1.6.0           parallel_4.1.0     fansi_0.5.0        rstantools_2.1.1   Rcpp_1.0.10        scales_1.1.1  
[31] backports_1.2.1    checkmate_2.0.0    RcppParallel_5.1.4 farver_2.1.0       gridExtra_2.3      digest_0.6.28 
[37] stringi_1.6.2      processx_3.5.2     dplyr_1.0.7        grid_4.1.0         cli_3.6.1          tools_4.1.0   
[43] magrittr_2.0.1     tibble_3.1.2       crayon_1.4.1       pkgconfig_2.0.3    ellipsis_0.3.2     prettyunits_1.1.1 
[49] ggridges_0.5.3     assertthat_0.2.1   rstudioapi_0.13    R6_2.5.1           compiler_4.1.0  
```

Owner

Name: bernadette-eu
Login: bernadette-eu
Kind: user
Location: Athens, Greece
Company: Athens University of Economics and Business

Website: https://bernadette-eu.github.io/
Twitter: BernadetteMsca
Repositories: 4
Profile: https://github.com/bernadette-eu

JOSS Publication

Bernadette: Bayesian Inference and Model Selection for Stochastic Epidemics in R

Published

September 26, 2023

DOI

10.21105/joss.05612

Volume 8, Issue 89, Page 5612

Authors

Lampros Bouranis

Department of Statistics, Athens University of Economics and Business, Athens, Greece

Editor

Samuel Forbes

GitHub Events

Total

Watch event: 1
Push event: 3

Last Year

Watch event: 1
Push event: 3

Committers

Last synced: 7 months ago

All Time

Total Commits: 217
Total Committers: 6
Avg Commits per committer: 36.167
Development Distribution Score (DDS): 0.041

Past Year

Commits: 3
Committers: 1
Avg Commits per committer: 3.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
bernadette-eu	b**b@g**m	208
Jeffrey Pullin	j**n@g**m	3
Hugo Gruson	B****o	2
Lampros Bouranis	l**s@u**e	2
Arfon Smith	a****n	1
Andrew Johnson	a**n@a**m	1

Committer Domains (Top 20 + Academic)

arjohnsonau.com: 1 ucdconnect.ie: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 9
Total pull requests: 5
Average time to close issues: 1 day
Average time to close pull requests: 24 days
Total issue authors: 3
Total pull request authors: 5
Average comments per issue: 1.0
Average comments per pull request: 1.2
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

rowlandseymour (5)
bernadette-eu (2)
strengejacke (2)

Pull Request Authors

Bisaloo (2)
jeffreypullin (1)
arfon (1)
bernadette-eu (1)
andrjohns (1)

Top Labels

Issue Labels

enhancement (1) bug (1)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

Bernadette

Science Score: 93.0%

Scientific Fields

Repository

Basic Info

Statistics

Metadata Files

README.Rmd

Owner

JOSS Publication

Bernadette: Bayesian Inference and Model Selection for Stochastic Epidemics in R

Authors

Editor

Tags

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Committer Domains (Top 20 + Academic)

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Dependencies