https://github.com/athenaisgautier/slgp-package

https://github.com/athenaisgautier/slgp-package

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (3.0%) to scientific vocabulary
Last synced: 6 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: AthenaisGautier
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Size: 48.2 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed 9 months ago
Metadata Files
Readme License

README.html




















Introduction to SLGP Package

Athénaïs Gautier

This vignette serves as a startup guide to SLGP modeling, providing a practical introduction to the implementation of Spatial Logistic Gaussian Processes (SLGPs).

Dataset

We illustrate the model’s capabilities using the Boston Housing dataset @harrison_hedonic_1978, a widely used benchmark in statistical modeling and regression analysis.

For this vignette, we focus on modeling the distribution of median home values (med) as a function of the proportion of pre-1940 owner-occupied units (age). This example highlights the ability of SLGPs to capture complex, spatially dependent distributions in data that exhibit heterogeneity and multi-modality.

library(dplyr)
#> 
#> Attachement du package : 'dplyr'
#> Les objets suivants sont masqués depuis 'package:stats':
#> 
#>     filter, lag
#> Les objets suivants sont masqués depuis 'package:base':
#> 
#>     intersect, setdiff, setequal, union
# Load the dataset (available in MASS package)
if (!requireNamespace("MASS", quietly = TRUE)) install.packages("MASS")
data("Boston", package = "MASS")
df <- Boston %>%
  mutate(age_bin = cut(age, breaks = seq(0, 100, by = 10), include.lowest = FALSE)) %>%
  group_by(age_bin) %>%
  mutate(age_bin = paste0(age_bin, "\nn=", n()))%>%
  ungroup()%>%
  mutate(age_bin = factor(age_bin, 
                          levels = sort(unique(age_bin), decreasing = FALSE))) %>%
  data.frame()

range_response <- c(0, 50) # Can use range(df$medv), or user defined range as we do here
range_x <- c(0, 100) # Can use range(df$age), or user defined range as we do here

We represent the data.

library(ggplot2)
library(ggpubr)
#> Warning: le package 'ggpubr' a été compilé avec la version R 4.4.2
library(viridis)
#> Le chargement a nécessité le package : viridisLite
# Scatterplot: med vs. age
scatter_plot <- ggplot(df, aes(x = age, y = medv)) +
  geom_point(alpha = 0.5, color = "navy") +
  labs(x = "Proportion of owner-occupied units\nbuilt prior to 1940 [AGE, %]", 
       y = "Median value of owner-occupied homes [MEDV, k$]",
       title = "Median value vs. age of homes") +
  theme_bw()+
  coord_cartesian(xlim=range_x,
                  ylim=range_response)

# Histogram: Distribution of med by age bin
hist_plot <- ggplot(df, aes(x = medv)) +
  geom_histogram(mapping=aes(y=after_stat(density)),
                 position = "identity", breaks = seq(0, 50, 2.5),
                 fill="darkgrey", col="grey50", lwd=0.2, alpha=0.7) +
  geom_rug(sides = "b", color = "navy", alpha = 0.5)+
  facet_wrap(~ age_bin, scales = "free_y", nrow=2) +
  labs(x = "Median value of owner-occupied homes [MEDV, k$]", 
       y = "Probability density", 
       title = "Histogram of median housing values by AGE group") +
  theme_bw()+
  coord_cartesian(xlim=range_response,
                  ylim=c(0, 0.25))
ggarrange(scatter_plot, hist_plot, ncol = 2, nrow = 1,
          widths = c(0.3, 0.7))
A visual representation of the dependency of the median value of owner-occupied homes on proportion of owner-occupied units constructed before 1940 in the Boston Housing dataset.

A visual representation of the dependency of the median value of owner-occupied homes on proportion of owner-occupied units constructed before 1940 in the Boston Housing dataset.

# ggsave("./Figures/scatter.pdf", width=10, height=3.5)

We see that there is a general trend where older homes tend to have lower values, with exceptions likely due to survivor bias: older homes that remain tend to be of higher structural quality. This dataset provides an test case for SLGP modeling, offering a compact, one-dimensional covariate space, heterogeneously distributed data, and shifting distributional shapes.

SLGP model specifications

To model the distributional changes observed in the Boston Housing dataset, we now introduce the Spatial Logistic Gaussian Process (SLGP) model. SLGPs provide a flexible non-parametric framework for modeling spatially dependent probability densities. By transforming a Gaussian Process (GP) through exponentiation and normalization, SLGPs ensure positivity and integration to one, making them well-suited for density estimation.

Prior

library(SLGP)

modelPrior <- slgp(medv~age, # Use a formula to specify predictors VS response
                   # Can use medv~. for all variables,
                   # Or medv ~ age + var2 + var3 for more variables
                   data=df,
                   method="none", #Maximum a posteriori estimation scheme
                   basisFunctionsUsed = "RFF",
                   interpolateBasisFun="WNN", # Accelerate inference
                   hyperparams = list(lengthscale=c(0.15, 0.15), 
                                      # Applied to normalised data
                                      # So 0.15 is 15% of the range of values
                                      sigma2=1), 
                   # Will be re-selected with sigmaEstimationMethod
                   sigmaEstimationMethod = "heuristic", # Set to heuristic for numerical stability                 
                   predictorsLower= c(range_x[1]),
                   predictorsUpper= c(range_x[2]),
                   responseRange= range_response,
                   opts_BasisFun = list(nFreq=200,
                                        MatParam=5/2),
                   seed=1)


library(tidyr)
nrep <- 3
set.seed(8)
p <- ncol(modelPrior@coefficients)
modelPrior@coefficients <- matrix(rnorm(n=nrep*p), nrow=nrep)

dfGrid <- data.frame(expand.grid(seq(range_x[1], range_x[2], 5), 
                                 seq(range_response[1], range_response[2],, 101)))
colnames(dfGrid) <- c("age", "medv")
predPrior <- predictSLGP_newNode(SLGPmodel=modelPrior,
                                 newNodes = dfGrid)
colnames(predPrior) <- c("age", "medv", paste0("Draw from the prior n°", seq(nrep)))
predPrior <- predPrior%>%
  pivot_longer(-c("age", "medv"))
scale_factor <- 200
ggplot()  +
  labs(y = "Proportion of owner-occupied units built prior to 1940 [AGE, %]", 
       x = "Median value of owner-occupied homes [MEDV, k$]",
       title = "Samples from the SLGP Prior for the pdfs of MEDV at AGE, visualised across slices") +
  theme_bw()+
  geom_ribbon(data=predPrior,
              mapping=aes(x=medv, ymax=scale_factor*value+age, 
                          ymin=age, group=-age, fill=age),
              col="grey", alpha=0.9)+
  # geom_point(data=df,
  #            mapping=aes(x = medv, y = age), alpha = 0.5, color = "navy")+
  scale_fill_viridis(option = "plasma",
                     guide = guide_colorbar(nrow = 1,
                                            title = "Indexing variable: Proportion of owner-occupied units built prior to 1940",
                                            barheight = unit(2, units = "mm"),
                                            barwidth = unit(55, units = "mm"),
                                            title.position = 'top',
                                            label.position = "bottom",
                                            title.hjust = 0.5))+
  theme(legend.position = "bottom")+
  coord_flip()+
  facet_grid(.~name)
Samples from the SLGP Prior for the pdfs of MEDV at AGE, visualised across slices.

Samples from the SLGP Prior for the pdfs of MEDV at AGE, visualised across slices.

# ggsave(paste0("./Figures/ribbonsPrior",  ".pdf"), width=10, height=5)

selected_values <- c(20, 50, 95)
gap <- 2.5
df_filtered <- df %>%
  mutate(interval=findInterval(age, c(0, 
                                      selected_values[1]-gap, 
                                      selected_values[1]+gap, 
                                      selected_values[2]-gap, 
                                      selected_values[2]+gap, 
                                      selected_values[3]-gap, 
                                      selected_values[3]+gap)))%>%
  filter(interval %in% c(2, 4, 6))%>%
  group_by(interval)%>%
  mutate(category = paste0("Age close to ", c("", selected_values[1],
                                              "", selected_values[2],
                                              "", selected_values[3])[interval], 
                           "\nn=", n()))
names <- sort(unique(df_filtered$category))
dfGrid <- data.frame(expand.grid(selected_values, 
                                 seq(range_response[1], range_response[2],, 101)))
colnames(dfGrid) <- c("age", "medv")
predPrior <- predictSLGP_newNode(SLGPmodel=modelPrior,
                                 newNodes = dfGrid)
colnames(predPrior) <- c("age", "medv", paste0("Draw from the prior n°", seq(nrep)))
predPrior <- predPrior%>%
  pivot_longer(-c("age", "medv"))
predPrior$category <-ifelse(predPrior$age==selected_values[1], names[1],
                            ifelse(predPrior$age==selected_values[2], names[2], names[3]))

ggplot(mapping=aes(x = medv)) +
  geom_histogram(df_filtered,
                 mapping=aes(y=after_stat(density)),
                 position = "identity", breaks = seq(0, 50, 2.5),
                 fill="darkgrey", col="grey50", lwd=0.2, alpha=0.7) +
  geom_rug(data=df_filtered, sides = "b", color = "navy", alpha = 0.5)+
  geom_line(data=predPrior, mapping=aes(y=value, group=name), 
            color = "black", lwd=0.1, alpha=0.5)+
  geom_line(data=predPrior, mapping=aes(y=value, group=name, col=name), lwd=1.1)+
  facet_wrap(~ category, scales = "free_y", nrow=1) +
  labs(x = "Median value of owner-occupied homes [k$]", 
       y = "Probability density", 
       title = "Histogram of median value at bins centered at several 'age' values, with width 5\nversus draws from a SLGP prior") +
  theme_bw()+
  theme(legend.position="bottom",
        legend.direction = "horizontal",
        legend.title = element_blank())+
  coord_cartesian(xlim=range_response,
                  ylim=c(0, 0.25))
Samples from the SLGP Prior versus histograms of median value at bins centered at several 'age' values

Samples from the SLGP Prior versus histograms of median value at bins centered at several ‘age’ values


# ggsave(paste0("./Figures/histPrior",  ".pdf"), width=10, height=5)

Maximum a posteriori estimate

For a quicker approximation, MAP estimation provides a point estimate, offering a balance between computational efficiency and the depth of inference. It is the fastest estimation scheme we propose, however MAP does not facilitate uncertainty quantification because it yields a non-probabilistic estimate of the underlying density field, focusing instead on identifying the mode of the posterior distribution.

modelMAP <- slgp(medv~age, # Use a formula to specify predictors VS response
                 # Can use medv~. for all variables,
                 # Or medv ~ age + var2 + var3 for more variables
                 data=df,
                 method="MAP", #Maximum a posteriori estimation scheme
                 basisFunctionsUsed = "RFF",
                 interpolateBasisFun="WNN", # Accelerate inference
                 hyperparams = list(lengthscale=c(0.15, 0.15), 
                                    # Applied to normalised data
                                    # So 0.15 is 15% of the range of values
                                    sigma2=1), 
                 # Will be re-selected with sigmaEstimationMethod
                 sigmaEstimationMethod = "heuristic", 
                 # Set to heuristic for numerical stability                 
                 predictorsLower= c(range_x[1]),
                 predictorsUpper= c(range_x[2]),
                 responseRange= range_response,
                 opts_BasisFun = list(nFreq=200,
                                      MatParam=5/2),
                 seed=1)
# Or equivalent, re-use the same basis functions 
# and hyper parameters as in the prior we saw

modelMAP <- retrainSLGP(SLGPmodel=modelPrior, 
                        newdata = df, 
                        method="MAP")
# Or equivalent, more explicit in the re-using of the elements
# From the SLGP prior

modelMAP <- slgp(medv~age, 
                 data=df,
                 method="MAP", #Maximum a posteriori estimation scheme
                 basisFunctionsUsed = "RFF",
                 interpolateBasisFun="WNN", # Accelerate inference
                 hyperparams = modelPrior@hyperparams, 
                 sigmaEstimationMethod = "none",# Already selected in the prior
                 predictorsLower= c(range_x[1]),
                 predictorsUpper= c(range_x[2]),
                 responseRange= range_response,
                 opts_BasisFun = modelPrior@opts_BasisFun,
                 BasisFunParam = modelPrior@BasisFunParam,
                 seed=1)

We can represent the conditional density as a colormap.

dfGrid <- data.frame(expand.grid(seq(range_x[1], range_x[2],, 101), 
                                 seq(range_response[1], range_response[2],, 101)))
colnames(dfGrid) <- c("age", "medv")
pred <- predictSLGP_newNode(SLGPmodel=modelMAP,
                            newNodes = dfGrid)

ggplot()  +
  labs(y = "Proportion of owner-occupied units\nbuilt prior to 1940 [%]", 
       x = "Median value of owner-occupied homes [k$]",
       title = "Median value vs. age of homes") +
  theme_bw()+
  geom_raster(data=pred,
              mapping=aes(x=age, y=medv, fill=pdf_1))+
  geom_point(data=df,
             mapping=aes(x=age, y=medv), alpha = 0.5, 
             pch="x", col="grey")+
  scale_fill_viridis(option = "viridis",
                     guide = guide_colorbar(nrow = 1,
                                            title = "Probability density of med at age",
                                            barheight = unit(2, units = "mm"),
                                            barwidth = unit(55, units = "mm"),
                                            title.position = 'top',
                                            label.position = "bottom",
                                            title.hjust = 0.5))+
  theme(legend.position = "bottom")
Predictive probability density of medv at age, as predicted by a SLGP.

Predictive probability density of medv at age, as predicted by a SLGP.



library(SLGP)
library(viridis)
dfGrid <- data.frame(expand.grid(seq(range_x[1], range_x[2], 5), 
                                 seq(range_response[1], range_response[2],, 101)))
colnames(dfGrid) <- c("age", "medv")
pred <- predictSLGP_newNode(SLGPmodel=modelMAP,
                            newNodes = dfGrid)

scale_factor <- 100
ggplot()  +
  labs(y = "Proportion of owner-occupied units\nbuilt prior to 1940 [%]", 
       x = "Median value of owner-occupied homes [k$]",
       title = "Median value vs. age of homes") +
  theme_bw()+
  geom_ribbon(data=pred,
              mapping=aes(x=medv, ymax=scale_factor*pdf_1+age, 
                          ymin=age, group=-age, fill=age),
              col="grey", alpha=0.9)+
  geom_point(data=df,
             mapping=aes(x = medv, y = age), alpha = 0.5, color = "navy")+
  scale_fill_viridis(option = "plasma",
                     guide = guide_colorbar(nrow = 1,
                                            title = "Indexing variable: Proportion of owner-occupied units built prior to 1940",
                                            barheight = unit(2, units = "mm"),
                                            barwidth = unit(55, units = "mm"),
                                            title.position = 'top',
                                            label.position = "bottom",
                                            title.hjust = 0.5))+
  theme(legend.position = "bottom")+
  coord_flip()
Predictive probability density of medv at age, seen over slices.

Predictive probability density of medv at age, seen over slices.

Laplace approximation estimate

By integrating the MAP approach with Laplace approximation, we refine our estimation strategy by approximating the posterior distribution with a multivariate Gaussian. This method strikes a balance between the full Bayesian inference of MCMC and the computational efficiency of MAP estimation. By leveraging both the gradient and Hessian of the posterior, it captures essential curvature information, providing a more informed approximation of the posterior landscape.

modelLaplace <- slgp(medv~age, # Use a formula to specify predictors VS response
                     # Can use medv~. for all variables,
                     # Or medv ~ age + var2 + var3 for more variables
                     data=df,
                     method="Laplace", #Maximum a posteriori + Laplace approximation
                     basisFunctionsUsed = "RFF",
                     interpolateBasisFun="WNN", # Accelerate inference
                     hyperparams = list(lengthscale=c(0.15, 0.15), 
                                        # Applied to normalised data
                                        # So 0.15 is 15% of the range of values
                                        sigma2=1), 
                     # Will be re-selected with sigmaEstimationMethod
                     sigmaEstimationMethod = "heuristic", # Set to heuristic for numerical stability                 
                     predictorsLower= c(range_x[1]),
                     predictorsUpper= c(range_x[2]),
                     responseRange= range_response,
                     opts_BasisFun = list(nFreq=100,
                                          MatParam=5/2))
# Define the three selected values
selected_values <- c(20, 50, 95)
gap <- 2.5

dfGrid <- data.frame(expand.grid(seq(3), 
                                 seq(range_response[1], range_response[2],, 101)))
colnames(dfGrid) <- c("ID", "medv")
dfGrid$age <- selected_values[dfGrid$ID]
pred <- predictSLGP_newNode(SLGPmodel=modelLaplace,
                            newNodes = dfGrid)
pred$meanpdf <- rowMeans(pred[, -c(1:3)])

library(tidyr)
# Filter the data: keep values within ±5 of the selected ones
df_filtered <- df %>%
  mutate(interval=findInterval(age, c(0, 
                                      selected_values[1]-gap, 
                                      selected_values[1]+gap, 
                                      selected_values[2]-gap, 
                                      selected_values[2]+gap, 
                                      selected_values[3]-gap, 
                                      selected_values[3]+gap)))%>%
  filter(interval %in% c(2, 4, 6))%>%
  group_by(interval)%>%
  mutate(category = paste0("Age close to ", c("", selected_values[1],
                                              "", selected_values[2],
                                              "", selected_values[3])[interval], 
                           "\nn=", n()))
names <- sort(unique(df_filtered$category))
pred$category <- names[pred$ID]

set.seed(1)
selected_cols <- sample(seq(1000), size=10, replace=FALSE)
df_plot <- pred %>%
  dplyr::select(c("age", "medv", "category", 
                  paste0("pdf_", selected_cols)))%>%
  pivot_longer(-c("age", "medv", "category"))


ggplot(mapping=aes(x = medv)) +
  geom_histogram(df_filtered,
                 mapping=aes(y=after_stat(density)),
                 position = "identity", breaks = seq(0, 50, 2.5),
                 fill="darkgrey", col="grey50", lwd=0.2, alpha=0.7) +
  geom_rug(data=df_filtered, sides = "b", color = "navy", alpha = 0.5)+
  geom_line(data=df_plot, mapping=aes(y=value, group=name), 
            color = "black", lwd=0.1, alpha=0.5)+
  geom_line(data=pred, mapping=aes(y=meanpdf, group=category), color = "red")+
  facet_wrap(~ category, scales = "free_y", nrow=1) +
  labs(x = "Median value of owner-occupied homes [k$]", 
       y = "Probability density", 
       title = "Histogram of median value at bins centered at several 'age' values, with width 5\nversus SLGP draws from a laplace approximation (black curves) and average (red curve)") +
  theme_bw()+
  coord_cartesian(xlim=range_response,
                  ylim=c(0, 0.25))
Predictive probability density (and draws from a Laplace approximation) of medv at age, seen over 3 slices.

Predictive probability density (and draws from a Laplace approximation) of medv at age, seen over 3 slices.

MCMC estimate

This method allows us to explore the posterior distribution by drawing samples from it. It enables precise, exact inference of the posterior distribution of the underlying density field knowing the data. The main drawback of this approach being its higher computational cost

modelMCMC <- slgp(medv~age, # Use a formula to specify predictors VS response
                  # Can use medv~. for all variables,
                  # Or medv ~ age + var2 + var3 for more variables
                  data=df,
                  method="MCMC", #MCMC
                  basisFunctionsUsed = "RFF",
                  interpolateBasisFun="WNN", # Accelerate inference
                  hyperparams = list(lengthscale=c(0.15, 0.15), 
                                     # Applied to normalised data
                                     # So 0.15 is 15% of the range of values
                                     sigma2=1), 
                  # Will be re-selected with sigmaEstimationMethod
                  sigmaEstimationMethod = "heuristic", # Set to heuristic for numerical stability                 
                  predictorsLower= c(range_x[1]),
                  predictorsUpper= c(range_x[2]),
                  responseRange= range_response,
                  opts_BasisFun = list(nFreq=100,
                                       MatParam=5/2),
                  opts = list(stan_chains=2, stan_iter=1000))
#> 
#> SAMPLING FOR MODEL 'SLGP_Likelihood_composed' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 0.001252 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 12.52 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
#> Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
#> Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
#> Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
#> Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 16.954 seconds (Warm-up)
#> Chain 1:                16.522 seconds (Sampling)
#> Chain 1:                33.476 seconds (Total)
#> Chain 1: 
#> 
#> SAMPLING FOR MODEL 'SLGP_Likelihood_composed' NOW (CHAIN 2).
#> Chain 2: 
#> Chain 2: Gradient evaluation took 0.001649 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 16.49 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2: 
#> Chain 2: 
#> Chain 2: Iteration:   1 / 1000 [  0%]  (Warmup)
#> Chain 2: Iteration: 100 / 1000 [ 10%]  (Warmup)
#> Chain 2: Iteration: 200 / 1000 [ 20%]  (Warmup)
#> Chain 2: Iteration: 300 / 1000 [ 30%]  (Warmup)
#> Chain 2: Iteration: 400 / 1000 [ 40%]  (Warmup)
#> Chain 2: Iteration: 500 / 1000 [ 50%]  (Warmup)
#> Chain 2: Iteration: 501 / 1000 [ 50%]  (Sampling)
#> Chain 2: Iteration: 600 / 1000 [ 60%]  (Sampling)
#> Chain 2: Iteration: 700 / 1000 [ 70%]  (Sampling)
#> Chain 2: Iteration: 800 / 1000 [ 80%]  (Sampling)
#> Chain 2: Iteration: 900 / 1000 [ 90%]  (Sampling)
#> Chain 2: Iteration: 1000 / 1000 [100%]  (Sampling)
#> Chain 2: 
#> Chain 2:  Elapsed Time: 18.071 seconds (Warm-up)
#> Chain 2:                15.504 seconds (Sampling)
#> Chain 2:                33.575 seconds (Total)
#> Chain 2: 
#> Convergence diagnostics:
#>   * A R-hat close to 1 indicates a good convergence.
#> Here, the dimension of the sampled values is 200.
#> The range of the R-hats is: 0.99802 - 1.00126  * Effective Sample Sizes estimates the number of independent draws from the posterior.
#> Higher values are better.
#> The range of the ESS is: 1279.1 - 3000  * Checking the Bayesian Fraction of Missing Information is also a way to locate issues.
#> E-BFMI indicated no pathological behavior.
# Define the three selected values
pred <- predictSLGP_newNode(SLGPmodel=modelMCMC,
                            newNodes = dfGrid)
pred$meanpdf <- rowMeans(pred[, -c(1:3)])
pred$category <- names[pred$ID]

df_plot <- pred %>%
  dplyr::select(c("age", "medv", "category", 
                  paste0("pdf_", selected_cols)))%>%
  pivot_longer(-c("age", "medv", "category"))


ggplot(mapping=aes(x = medv)) +
  geom_histogram(df_filtered,
                 mapping=aes(y=after_stat(density)),
                 position = "identity", breaks = seq(0, 50, 2.5),
                 fill="darkgrey", col="grey50", lwd=0.2, alpha=0.7) +
  geom_rug(data=df_filtered, sides = "b", color = "navy", alpha = 0.5)+
  geom_line(data=df_plot, mapping=aes(y=value, group=name), 
            color = "black", lwd=0.1, alpha=0.5)+
  geom_line(data=pred, mapping=aes(y=meanpdf, group=category), color = "red")+
  facet_wrap(~ category, scales = "free_y", nrow=1) +
  labs(x = "Median value of owner-occupied homes [k$]", 
       y = "Probability density", 
       title = "Histogram of median value at bins centered at several 'age' values, with width 5\nversus SLGP draws from a MCMC (black curves) and average (red curve)") +
  theme_bw()+
  coord_cartesian(xlim=range_response,
                  ylim=c(0, 0.25))
Predictive probability density (and draws from a MCMC) of medv at age, seen over 3 slices.

Predictive probability density (and draws from a MCMC) of medv at age, seen over 3 slices.

Owner

  • Name: AthenaisGautier
  • Login: AthenaisGautier
  • Kind: user

GitHub Events

Total
  • Push event: 22
Last Year
  • Push event: 22

Dependencies

DESCRIPTION cran
  • R >= 4.3.0 depends
  • ggplot2 * depends
  • stats * depends
  • DiceDesign * imports
  • Rcpp >= 0.12.0 imports
  • RcppParallel >= 5.0.1 imports
  • dplyr * imports
  • methods * imports
  • mvnfast * imports
  • rstan >= 2.18.1 imports
  • rstantools >= 2.3.1.1 imports
  • tidyr * imports
  • truncnorm * imports