Recent Releases of gratia

gratia - gratia 0.11.1

A new version of gratia is out! The main reason for this release is to make gratia compliant with the upcoming 4.0.0 release of ggplot2. This has led to a small improvement in residuals_hist_plot() to better align the bins. I'd also already implemented a few more CDF functions for some of mgcv's families, so they came along for the release ride. Finally, a couple of bugs were fixed and partial_derivatives() got some quality of life improvements.

The full changelog for the release is below.

User visible changes

  • residuals_hist_plot() and hence appraise() now centre the middle bin of the histogram at 0. In part this was due to ggplot2's new binning algorithm leading to potentially odd choices for the bin breaks in the context of model residuals.
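To illustrate what centring the middle bin at 0 means, here is a minimal base R sketch. It is not gratia's implementation: the helper name centred_breaks() and the use of an odd number of bins are assumptions made for this example only.

```r
# Hypothetical helper: build histogram breaks so that the middle bin is
# centred on 0 (assumes an odd number of bins)
centred_breaks <- function(x, n_bins = 21) {
  stopifnot(n_bins %% 2 == 1)
  # bin width chosen so the outermost bin centres sit at +/- max(|x|)
  width <- 2 * max(abs(x)) / (n_bins - 1)
  # break k sits at (k - n_bins / 2) * width for k = 0, ..., n_bins
  (seq_len(n_bins + 1) - 1 - n_bins / 2) * width
}

set.seed(1)
res <- rnorm(200)                 # stand-in for model residuals
brks <- centred_breaks(res)
h <- hist(res, breaks = brks, plot = FALSE)

# centre of the middle bin
mid <- h$mids[(length(h$mids) + 1) / 2]
mid
```

Because the breaks are symmetric about 0, residuals just below and just above 0 fall in the same bin rather than being split across a break at 0.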

New features

  • quantile_residuals() now supports more of mgcv's families:

    1. scat(),
    2. nb(),
    3. betar(), and
    4. tw().
  • Several user friendliness improvements in partial_derivatives():

    • now better handles the case where there are multiple smooths for which partial derivatives are required,
    • correctly identifies smooths that involve random effect terms (i.e. any smooth or tensor product marginal smooths with bs %in% c("re", "fs")) and ignores them,
    • identifies and ignores univariate smooths, and
    • displays more informative error messages to explain what was wrong.

Part of discussion with @BenFN121 in #356.
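As a rough illustration of how partial_derivatives() is typically called, the sketch below fits a tensor product smooth to simulated data and differentiates it with respect to one margin. The simulated data set, the basis sizes, and the value at which z is held fixed are all assumptions chosen for the example.

```r
library(mgcv)
library(gratia)

# Simulated bivariate example data with covariates x and z
df <- data_sim("eg2", n = 1000, scale = 0.5, seed = 42)

# k is kept small here purely to keep the example fast
m <- gam(y ~ te(x, z, k = c(5, 5)), data = df, method = "REML")

# A slice through the surface: x varies, z is held at 0.4
ds <- data_slice(m, x = evenly(x, n = 100), z = 0.4)

# Partial derivative of te(x,z) with respect to x along that slice
pd_x <- partial_derivatives(m, data = ds, type = "central", focal = "x")
head(pd_x)
```

Only one variable may vary in the supplied data, which is why the slice holds z constant.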

Bug fixes

  • partial_derivatives() threw an error when the select argument was used.

    #356 reported by @BenFN121

  • draw.conditional_values() was setting a label on the fill aesthetic even if that aesthetic was not being used. In ggplot2 v4.0.0 this resulted in a warning, which is fixed.

Scientific Software - Peer-reviewed - R
Published by gavinsimpson 4 months ago

gratia - Version 0.10.0 released and on CRAN

Earlier this evening I wrapped up the source tarball for version 0.10.0 of gratia and submitted it to CRAN. Following automated checks, this new version of gratia is now available from CRAN 🥳

This is a small release of gratia to coincide with the final stages of the review process of a Journal of Open Source Software paper on gratia that I submitted earlier in the year. Apart from a slew of bug fixes, the main new feature in this release is conditional_values(), a tidy/ggplot version of mgcv::vis.gam() that is based on marginaleffects::plot_predictions(). conditional_values() is intended as a user-friendly way to visualise predicted values of the model that are conditional on supplied values of covariates. For more complex GAMs, such visualisations are an essential way to understand and interpret the fitted model.

New features

  • conditional_values() and its draw() method compute and plot predictions from a fitted GAM that are conditional on one or more covariates. The function is a wrapper around fitted_values() but gives the user simple ways to specify which covariates to condition on and what values those covariates should take. It provides similar functionality to marginaleffects::plot_predictions(), but is simpler. See #300.

  • penalty() and basis() can now reparameterize the smooth such that the resulting basis has an identity penalty matrix. This more clearly highlights the penalty null space, i.e. the functions that the penalty has no effect on.

  • draw.gam() and draw.smooth_estimates() gain argument caption which, if set to FALSE, suppresses the caption that shows the smooth basis type on the plot.

    #307

  • appraise() and qq_plot.gam() now allow the user to set a random seed that is used when generating reference quantiles with method = "uniform" or method = "simulate".
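A minimal sketch of conditional_values() follows, assuming gratia >= 0.10.0 is installed. The simulated data and the choice of x2 as the conditioning covariate are assumptions made for illustration.

```r
library(mgcv)
library(gratia)

df <- data_sim("eg1", n = 400, seed = 2)
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = df, method = "REML")

# Predictions conditional on x2; covariates not listed in `condition`
# are set to representative values by the function
cv <- conditional_values(m, condition = "x2")
cv

# draw(cv) plots the conditional predictions
```

Because the result is a tibble, it can also be passed to ggplot2 directly if the default draw() plot is not what is needed.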

Bug fixes

  • derivative_samples() was ignoring the scale argument. #293 Reported by @jonathonmellor

  • Argument level to derivative_samples() was included accidentally. As of v0.9.2.9002 this argument is deprecated and using it will now generate a warning. #291

  • draw() was not plotting cyclic P spline smooths. Reported by @Zuckerbrot

    #297

  • derivatives() would fail for "fs" smooths with other parametric effects in the model. Reported by @mahonmb #301

  • Partial residuals in partial_residuals() and draw.gam() were wrong for GAMs fitted with family = binomial() where the weights argument contained the binomial sample sizes because the prior weights were being used to form weighted working residuals. Now working weights are used instead. Reported by @emchuron #273

  • Internal function gammals_link() was expecting "theta" as a synonym for the scale parameter but the master table has "phi" coded as the synonym. Now both work as expected.

  • level() assumed that level would have only a single value even though it could handle multiple levels. #321

Published by gavinsimpson about 1 year ago

gratia - Version 0.9.2 of gratia now on CRAN

Version 0.9.2 released to CRAN, June 25, 2024

This patch release is largely motivated by the need to fix a few bugs that came to light recently as I was teaching my GAM course for Physalia and preparing a paper for submission to the Journal of Open Source Software. Version 0.9.1 was never released (the submission was rejected by CRAN as the package vignettes took the package over the 5 MB limit and CRAN finally said "Nope").

The entries below summarise the changes in this version of gratia. Nothing major here, but I have started building in support for location, scale, shape families in fitted_samples(), although currently only the location parameter of those models is supported.

Breaking changes

  • parametric_effects() slightly escaped the great renaming that happened for 0.9.0. Columns type and term did not gain a "." prefix. This is now rectified and these two columns are now .type and .term.
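The renamed columns can be checked with a small model that contains a parametric factor term. This is only a sketch: the simulated data and variable names (x, fac, y) are assumptions, and it presumes gratia >= 0.9.2.

```r
library(mgcv)
library(gratia)

set.seed(1)
n <- 200
df <- data.frame(
  x   = runif(n),
  fac = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
df$y <- sin(2 * pi * df$x) + as.integer(df$fac) + rnorm(n, sd = 0.3)

m <- gam(y ~ fac + s(x), data = df, method = "REML")

pe <- parametric_effects(m)
names(pe)  # the prefixed names .term and .type should appear here
```

Code written against earlier versions that refers to pe$term or pe$type needs updating to the prefixed names.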

User visible changes

  • Plots of random effects are now labelled with their smooth label. Previously, the title was taken from the variable involved in the smooth, but this doesn't work for terms like s(subject, continuous_var, bs = "re") for random slopes, which previously would have the title "subject". Now such terms will have title "s(subject,continuous_var)". Simple random intercept terms, s(subject, bs = "re"), are now titled "s(subject)". #287

  • The vignettes

    1. custom-plotting.Rmd, and
    2. posterior-simulation.Rmd

    were moved to vignettes/articles and thus are no longer available as package vignettes. Instead, they are accessible as Articles through the package website: https://gavinsimpson.github.io/gratia/

New features

  • fitted_samples() now works for gam() models with multiple linear predictors, but currently only the location parameter is supported. The parameter is indicated through a new variable .parameter in the returned object.
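As a hedged sketch of what this looks like in practice, the example below fits a Gaussian location-scale model with mgcv::gaulss() and draws a few posterior samples of the fitted values. The simulated data and the number of draws are assumptions for illustration.

```r
library(mgcv)
library(gratia)

set.seed(1)
n <- 300
df <- data.frame(x = runif(n))
# both the mean and the spread of y change with x
df$y <- sin(2 * pi * df$x) + rnorm(n, sd = 0.2 + 0.3 * df$x)

# Gaussian location-scale model: one linear predictor per parameter
m <- gam(list(y ~ s(x), ~ s(x)), family = gaulss(), data = df)

fs <- fitted_samples(m, data = df, n = 5, seed = 42)

# which parameter(s) the draws refer to
unique(fs$.parameter)
```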

Bug fixes

  • partial_residuals() was computing partial residuals from the deviance residuals. For compatibility with mgcv::plot.gam(), partial residuals are now computed from the working residuals. Reported by @wStockhausen #273

  • appraise() was not passing the ci_col argument on to qq_plot() and worm_plot(). Reported by Sate Ahmed.

  • mvn_method could not be passed on to the posterior sampling functions from the user-facing functions fitted_samples(), posterior_samples(), smooth_samples(), derivative_samples(), and response_derivatives(). Reported by @stefgehrig

    #279

  • fitted_values() works again for quantile GAMs fitted by qgam().

  • confint.gam() was not applying shift to the estimate and upper and lower interval. #280 reported by @TIMAVID & @rbentham

  • parametric_effects() and draw.parametric_effects() would forget about the levels of factors (intentionally), but this would lead to problems with ordered factors where the ordering of levels was not preserved. Now, parametric_effects() returns a named list of factor levels as attribute "factor_levels" containing the required information and the order of levels is preserved when plotting. #284 Reported by @mhpob

  • parametric_effects() would fail if there were parametric terms in the model but they were all interaction terms (which we don't currently handle). #282

Published by gavinsimpson over 1 year ago

gratia - gratia 0.9.0

Breaking changes

  • Many functions now return objects with different named variables. In order to avoid clashes with variable names used in users' models or data, a period (.) is now used as a prefix for generated variable names. The functions whose output names have changed are: smooth_estimates(), fitted_values(), fitted_samples(), posterior_samples(), derivatives(), partial_derivatives(), and derivative_samples(). In addition, add_confint() also adds newly-named variables.

    1. `est` is now `.estimate`,
    2. `lower` and `upper` are now `.lower_ci` and `.upper_ci`,
    3. `draw` and `row` are now `.draw` and `.row` respectively,
    4. `fitted`, `se`, `crit` are now `.fitted`, `.se`, `.crit`, respectively, and
    5. `smooth`, `by`, and `type` in `smooth_estimates()` are now `.smooth`, `.by`, `.type`, respectively.

  • derivatives() and partial_derivatives() now work more like smooth_estimates(); in place of the var and data columns, gratia now stores the data variables at which the derivatives were evaluated as columns in the object with their actual variable names.

  • The way spline-on-the-sphere (SOS) smooths (bs = "sos") are plotted has changed to use ggplot2::coord_sf() instead of the previously-used ggplot2::coord_map(). This change has been made as a result of coord_map() having been soft-deprecated ("superseded") for a few minor versions of ggplot2 already, and of changes to the guides system in version 3.5.0 of ggplot2.

The axes on plots created with coord_map() never really worked correctly and changing the angle of the tick labels never worked. As coord_map() is superseded, it didn't receive the updates to the guides system and, as a side effect of these changes, the code that plotted SOS smooths was producing a warning with the release of ggplot2 version 3.5.0.

The projection settings used to draw SOS smooths were previously controlled via arguments projection and orientation. These arguments do not affect ggplot2::coord_sf(). Instead, the projection used is controlled through the new argument crs, which takes a PROJ string detailing the projection to use or an integer that refers to a known coordinate reference system (CRS). The default projection used is +proj=ortho +lat_0=20 +lon_0=XX where XX is the mean of the longitude coordinates of the data points.
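The renamed variables listed under Breaking changes can be inspected directly. The following is a sketch assuming gratia >= 0.9.0; the simulated data and the choice of s(x2) are arbitrary.

```r
library(mgcv)
library(gratia)

df <- data_sim("eg1", n = 400, seed = 2)
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = df, method = "REML")

# smooth_estimates() output uses the dot-prefixed names, and add_confint()
# adds .lower_ci and .upper_ci
sm <- smooth_estimates(m, select = "s(x2)") |>
  add_confint()
names(sm)

# derivatives() stores the covariate under its own name (x2 here) rather
# than in generic var / data columns
d <- derivatives(m, select = "s(x2)")
names(d)
```

The dot prefix means a model covariate that happens to be called, say, se or lower no longer collides with the generated columns.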

Defunct and deprecated functions and arguments

Defunct

  • evaluate_smooth() was deprecated in gratia version 0.7.0. This function and all its methods have been removed from the package. Use smooth_estimates() instead.

Deprecated functions

The following functions were deprecated in version 0.9.0 of gratia. They will eventually be removed from the package as part of a clean up ahead of an eventual 1.0.0 release. These functions will become defunct by version 0.11.0 or 1.0.0, whichever is released first.

  • evaluate_parametric_term() has been deprecated. Use parametric_effects() instead.

  • datagen() has been deprecated. It never really did what it was originally designed to do, and has been replaced by data_slice().

Deprecated arguments

To make functions in the package more consistent, the arguments select, term, and smooth are all used for the same thing and hence the latter two have been deprecated in favour of select. If a deprecated argument is used, a warning will be issued but the value assigned to the argument will be assigned to select and the function will continue.
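For instance, smooths are now chosen with select across functions. The sketch below assumes gratia >= 0.9.0; the model and the smooths picked are arbitrary.

```r
library(mgcv)
library(gratia)

df <- data_sim("eg1", n = 400, seed = 2)
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = df, method = "REML")

# labels of the smooths in the model, as used by `select`
smooths(m)

# evaluate only two of the four smooths
sm <- smooth_estimates(m, select = c("s(x0)", "s(x3)"))
unique(sm$.smooth)
```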

User visible changes

  • smooth_samples() now uses a single call to the RNG to generate draws from the posterior of smooths. Prior to version 0.9.0, smooth_samples() would do a separate call to mvnfast::rmvn() for each smooth. As a result, a call to smooth_samples() on a model with multiple smooths will now produce different results from those generated previously. To regain the old behaviour, add rng_per_smooth = TRUE to the smooth_samples() call.

Note, however, that using per-smooth RNG calls with method = "mh" will be very inefficient as, with that method, posterior draws for all coefficients in the model are sampled at once. So, only use rng_per_smooth = TRUE with method = "gaussian".

  • The output of smooth_estimates() and its draw() method have changed for tensor product smooths that involve one or more 2D marginal smooths. Now, if no covariate values are supplied via the data argument, smooth_estimates() identifies if one of the marginals is a 2D surface and allows the covariates involved in that surface to vary fastest, ahead of terms in other marginals. This change has been made as it provides a better default when nothing is provided to data.

This also affects draw.gam().

  • fitted_values() now has some level of support for location, scale, shape families. Supported families are mgcv::gaulss(), mgcv::gammals(), mgcv::gumbls(), mgcv::gevlss(), mgcv::shash(), mgcv::twlss(), and mgcv::ziplss().

  • gratia now requires dplyr versions >= 1.1.0 and tidyselect >= 1.2.0.

  • A new vignette Posterior Simulation is available, which describes how to do posterior simulation from fitted GAMs using {gratia}.

New features

  • Soap film smooths using basis bs = "so" are now handled by draw(), smooth_estimates() etc. #8

  • response_derivatives() is a new function for computing derivatives of the response with respect to a (continuous) focal variable. First or second order derivatives can be computed using forward, backward, or central finite differences. The uncertainty in the estimated derivative is determined using posterior sampling via fitted_samples(), and hence can be derived from a Gaussian approximation to the posterior or using a Metropolis Hastings sampler (see below.)

  • derivative_samples() is the workhorse function behind response_derivatives(); it computes and returns posterior draws of the derivatives of any additive combination of model terms. Requested by @jonathonmellor #237

  • data_sim() can now simulate response data from gamma, Tweedie and ordered categorical distributions.

  • data_sim() gains two new example models "gwf2", simulating data only from Gu & Wahba's f2 function, and "lwf6", example function 6 from Luo & Wahba (1997 JASA 92(437), 107-116).

  • data_sim() can also simulate data for use with GAMs fitted using family = gfam() for grouped families where different types of data in the response are handled. #266 and part of #265

  • fitted_samples() and smooth_samples() can now use the Metropolis Hastings sampler from mgcv::gam.mh(), instead of a Gaussian approximation, to sample from the posterior distribution of the model or specific smooths respectively.

  • posterior_samples() is a new function in the family of fitted_samples() and smooth_samples(). posterior_samples() returns draws from the posterior distribution of the response, combining the uncertainty in the estimated expected value of the response and the dispersion of the response distribution. The difference between posterior_samples() and predicted_samples() is that the latter only includes variation due to drawing samples from the conditional distribution of the response (the uncertainty in the expected values is ignored), while the former includes both sources of uncertainty.

  • fitted_samples() can now use a matrix of user-supplied posterior draws. Related to #120

  • add_fitted_samples(), add_predicted_samples(), add_posterior_samples(), and add_smooth_samples() are new utility functions that add the respective draws from the posterior distribution to an existing data object for the covariate values in that object: obj |> add_posterior_samples(model). #50

  • basis_size() is a new function to extract the basis dimension (number of basis functions) for smooths. Methods are available for objects that inherit from classes "gam", "gamm", and "mgcv.smooth" (for individual smooths).

  • data_slice() gains a method for data frames and tibbles.

  • typical_values() gains a method for data frames and tibbles.

  • fitted_values() now works with models fitted using the mgcv::ocat() family. The predicted probability for each category is returned, alongside a Wald interval created using the standard error (SE) of the estimated probability. The SE and estimated probabilities are transformed to the logit (linear predictor) scale, a Wald credible interval is formed, which is then back-transformed to the response (probability) scale.

  • fitted_values() now works for GAMMs fitted using mgcv::gamm(). Fitted (predicted) values only use the GAM part of the model, and thus exclude the random effects.

  • link() and inv_link() work for models fitted using the cnorm() family.

  • A worm plot can now be drawn in place of the QQ plot with appraise() via new argument use_worm = TRUE. #62

  • smooths() now works for models fitted with mgcv::gamm().

  • overview() now returns the basis dimension for each smooth and gains an argument stars which, if TRUE, adds significance stars to the output and prints a legend in the tibble footer. Part of wish of @noamross #214

  • New add_constant() and transform_fun() methods for smooth_samples().

  • evenly() gains arguments lower and upper to modify the lower and / or upper bound of the interval over which evenly spaced values will be generated.

  • add_sizer() is a new function to add information on whether the derivative of a smooth is significantly changing (where the credible interval excludes 0). Currently, methods for derivatives() and smooth_estimates() objects are implemented. Part of request of @asanders11 #117

  • draw.derivatives() gains arguments add_change and change_type to allow derivatives of smooths to be plotted with indicators where the credible interval on the derivative excludes 0. Options allow for periods of decrease or increase to be differentiated via change_type = "sizer" instead of the default change_type = "change", which emphasises either type of change in the same way. Part of wish of @asanders11 #117

  • draw.gam() can now group factor by smooths for a given factor into a single panel, rather than plotting the smooths for each level in separate panels. This is achieved via new argument grouped_by. Requested by @RPanczak #89

draw.smooth_estimates() can now also group factor by smooths for a given factor into a single panel.

  • The underlying plotting code used by draw.smooth_estimates() for most univariate smooths can now add change indicators to the plots of smooths if those change indicators are added to the object created by smooth_estimates() using add_sizer(). See the example in ?draw.smooth_estimates.

  • smooth_estimates() can, when evaluating a 3D or 4D tensor product smooth, identify if one or more 2D smooths is a marginal of the tensor product. If users do not provide covariate values at which to evaluate the smooths, smooth_estimates() will focus on the 2D marginal smooth (or the first if more than one is involved in the tensor product), instead of following the ordering of the terms in the definition of the tensor product. #191

For example, in te(z, x, y, bs = c("cr", "ds"), d = c(1, 2)), the second marginal smooth is a 2D Duchon spline of covariates x and y. Previously, smooth_estimates() would have generated n values each for z and x and n_3d values for y, and then evaluated the tensor product at all combinations of those generated values. This would ignore the structure implicit in the tensor product, where we are likely to want to know how the surface estimated by the Duchon spline of x and y smoothly varies with z. Previously smooth_estimates() would generate surfaces of z and x, varying by y. Now, smooth_estimates() correctly identifies that one of the marginal smooths of the tensor product is a 2D surface and will focus on that surface varying with the other terms in the tensor product.

This improved behaviour is needed because in some bam() models it is not always possible to do the obvious thing and reorder the smooths when defining the tensor product to be te(x, y, z, bs = c("ds", "cr"), d = c(2, 1)). When discrete = TRUE is used with bam() the terms in the tensor product may get rearranged during model setup for maximum efficiency (see Details in ?mgcv::bam).

Additionally, draw.gam() now also works the same way.

  • New function null_deviance() that extracts the null deviance of a fitted model.

  • draw(), smooth_estimates(), fitted_values(), data_slice(), and smooth_samples() now all work for models fitted with scam::scam(). Where it matters, current support extends only to univariate smooths.

  • generate_draws() is a new low-level function for generating posterior draws from fitted model coefficients. generate_draws() is an S3 generic function so is extensible by users. Currently provides a simple interface to a simple Gaussian approximation sampler (gaussian_draws()) and the simple Metropolis Hastings sampler (mh_draws()) available via mgcv::gam.mh(). #211

  • smooth_label() is a new function for extracting the labels 'mgcv' creates for smooths from the smooth object itself.

  • penalty() has a default method that works with s(), te(), t2(), and ti(), which create a smooth specification.

  • transform_fun() gains argument constant to allow for the addition of a constant value to objects (e.g. the estimate and confidence interval). This enables a single obj |> transform_fun(fun = exp, constant = 5) instead of separate calls to add_constant() and then transform_fun(). Part of the discussion of #79

  • model_constant() is a new function that simply extracts the first coefficient from the estimated model.
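The last two items can be combined to move a smooth from the link scale to the response scale. The sketch below assumes that constant is added before fun is applied, which is what the description of transform_fun() above implies. The Poisson data and model are assumptions for illustration.

```r
library(mgcv)
library(gratia)

set.seed(1)
n <- 300
df <- data.frame(x = runif(n))
df$y <- rpois(n, lambda = exp(1 + sin(2 * pi * df$x)))

m <- gam(y ~ s(x), family = poisson(), data = df, method = "REML")

# first coefficient of the model (the intercept here), as a plain number
b0 <- as.numeric(model_constant(m))

sm <- smooth_estimates(m, select = "s(x)") |>
  add_confint()

# add the constant on the link scale, then back-transform with exp()
sm_resp <- sm |>
  transform_fun(fun = exp, constant = b0)
head(sm_resp)
```

Doing both steps in one call avoids the easy mistake of exponentiating first and adding the constant afterwards.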

Bug fixes

  • link(), inv_link(), and related family functions for the ocat() family weren't correctly identifying the family name and as a result would throw an error even when passed an object of the correct family.

  • link() and inv_link() now work correctly for the betar() family in a fitted GAM.

  • The print() method for lp_matrix() now converts the matrix to a data frame before conversion to a tibble. This makes more sense as it results in more typical behaviour as the columns of the printed object are doubles.

  • Constrained factor smooths (bs = "sz") where the factor is not the first variable mentioned in the smooth (i.e. s(x, f, bs = "sz") for continuous x and factor f) are now plottable with draw(). #208

  • parametric_effects() was unable to handle special parametric terms like poly(x) or log(x) in formulas. Reported by @fhui28 #212

  • parametric_effects() now works better for location, scale, shape models. Reported by @pboesu #45

  • parametric_effects() now works when there are missing values in one or more variables used in a fitted GAM. #219

  • response_derivatives() was incorrectly using .data with tidyselect selectors.

  • typical_values() could not handle logical variables in a GAM fit as mgcv stores these as numerics in the var.summary. This affected evenly() and data_slice(). #222

  • parametric_effects() would fail when two or more ordered factors were in the model. Reported by @dsmi31 #221

  • Continuous by smooths were being evaluated with the median value of the by variable instead of a value of 1. #224

  • fitted_samples() (and hence posterior_samples()) now handles models with offset terms in the formula. Offset terms supplied via the offset argument are ignored by mgcv:::predict.gam() and hence are ignored also by gratia. Reported by @jonathonmellor #231 #233

  • smooth_estimates() would fail on a "fs" smooth when a multivariate base smoother was used and the factor was not the last variable specified in the definition of the smooth: s(x1, x2, f, bs = "fs", xt = list(bs = "ds")) would work, but s(f, x1, x2, bs = "fs", xt = list(bs = "ds")) (or any ordering of variables that places the factor not last) would emit an obscure error. The ordering of the terms involved in the smooth now doesn't matter. Reported by @chrisaak #249.

  • draw.gam() would fail when plotting a multivariate base smoother used in an "sz" smooth. Now, this use case is identified and a message printed indicating that (currently) gratia doesn't know how to plot such a smooth. Reported by @chrisaak #249.

  • draw.gam() would fail when plotting a multivariate base smoother used in an "fs" smooth. Now, this use case is identified and a message printed indicating that (currently) gratia doesn't know how to plot such a smooth. Reported by @chrisaak #249.

  • derivative_samples() would fail with order = 2 and was only computing forward finite differences, regardless of type for order = 1. Partly reported by @samlipworth #251.

  • The draw() method for penalty() was normalizing the penalty to the range 0 to 1, not the claimed and documented -1 to 1, with argument normalize = TRUE. This is now fixed.

  • smooth_samples() was failing when data was supplied that contained more variables than were used in the smooth that was being sampled. Hence it would generally fail unless a single smooth was being sampled from or the model contained only a single smooth. The function was never intended to retain all the variables in data but was written in such a way that it would fail when relocating the data columns to the end of the posterior sampling object. #255

  • draw.gam() and draw.smooth_estimates() would fail when plotting a univariate tensor product smooth (e.g. te(x), ti(x), or t2(x)). Reported by @wStockhausen #260

  • plot.smooth() was not printing the factor level in subtitles for ordered factor by smooths.

Published by gavinsimpson almost 2 years ago

gratia - gratia version 0.8.1 on CRAN

Version 0.8.1 of gratia is on CRAN. Version 0.8.0 was not released due to changes necessitated by the 1.1.0 release of dplyr. The full list of changes in the 0.8.0 and 0.8.1 versions is given below.

gratia 0.8.1

User visible changes

  • smooth_samples() now returns objects with variables involved in smooths that have their correct name. Previously variables were named .x1, .x2, etc. Fixing #126 and improving compatibility with compare_smooths() and smooth_estimates() allowed the variables to be named correctly.

  • gratia now depends on version 1.8-41 or later of the mgcv package.

New features

  • draw.gam() can now handle tensor products that include a marginal random effect smooth. Beware plotting such smooths if there are many levels, however, as a separate surface plot will be produced for each level.

Bug fixes

  • Additional fixes for changes in dplyr 1.1.0.

  • smooth_samples() now works when sampling from posteriors of multiple smooths with different dimension. #126 reported by @Aariq

gratia 0.8.0

User visible changes

  • {gratia} now depends on R version 4.1 or later.

  • A new vignette "Data slices" is supplied with {gratia}.

  • Functions in {gratia} have been harmonised to use an argument named data instead of newdata for passing new data at which to evaluate features of smooths. A message will be printed if newdata is used from now on. Existing code does not need to be changed as data takes its value from newdata.

Note that due to the way ... is handled in R, if your R script uses the data argument, and is run with versions of gratia prior to 0.8.0 (when released; 0.7.3.8 if using the development version) the user-supplied data will be silently ignored. As such, scripts using data should check that the installed version of gratia is >= 0.8 and package developers should update to depend on versions >= 0.8 by using gratia (>= 0.8) in DESCRIPTION.

  • The order of the plots of smooths has changed in draw.gam() so that they again match the order in which smooths were specified in the model formula. See Bug Fixes below for more detail or #154.
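The version check recommended above for scripts that rely on the data argument could look something like the following. This is a sketch only: has_min_version() is a hypothetical helper written for this example, not a gratia function.

```r
# Hypothetical helper: TRUE if `pkg` is installed at version `min_version`
# or later, FALSE otherwise (including when `pkg` is not installed)
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

if (!has_min_version("gratia", "0.8.0")) {
  warning("gratia >= 0.8.0 is needed for the `data` argument to be honoured")
}
```

In a package, the equivalent guard is the gratia (>= 0.8) entry in DESCRIPTION mentioned above, so no run-time check is needed there.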

New features

  • Added basic support for GAMLSS (distributional GAMs) fitted with the gamlss() function from package GJRM. Support is currently restricted to a draw() method.

  • difference_smooths() can now include the group means in the difference, which many users expected. To include the group means use group_means = TRUE in the function call, e.g. difference_smooths(model, smooth = "s(x)", group_means = TRUE). Note: this function still differs from plot_diff() in package itsadug, which essentially computes differences of model predictions. The main practical difference is that other effects beyond the factor by smooth, including random effects, may be included with plot_diff().

This implements the main wish of #108 (@dinga92) and #143 (@mbolyanatz) despite my protestations that this was complicated in some cases (it isn't; the complexity just cancels out.)

  • data_slice() has been totally revised. Now, the user provides the values for the variables they want in the slice and any variables in the model that are not specified will be held at typical values (i.e. the value of the observation that is closest to the median for numeric variables, or the modal factor level.)

Data slices are now produced by passing name = value pairs for the variables and their values that you want to appear in the slice. For example

    m <- gam(y ~ s(x1) + x2 + fac)
    data_slice(m, x1 = evenly(x1, n = 100), x2 = mean(x2))

The value in the pair can be an expression that will be looked up (evaluated) in the data argument or the model frame of the fitted model (the default). In the above example, the resulting slice will be a data frame of 100 observations, comprising x1, which is a vector of 100 values spread evenly over the range of x1, a constant value of the mean of x2 for the x2 variable, and a constant factor level, the modal class of fac, for the fac variable of the model.
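A self-contained version of that example might look as follows. The simulated data frame and its effect sizes are assumptions added so that the code runs. Only the model formula and the data_slice() call come from the text above.

```r
library(mgcv)
library(gratia)

set.seed(1)
n <- 200
df <- data.frame(
  x1  = runif(n),
  x2  = rnorm(n),
  fac = factor(sample(c("a", "b", "c"), n, replace = TRUE,
                      prob = c(0.5, 0.3, 0.2)))
)
df$y <- sin(2 * pi * df$x1) + 0.5 * df$x2 + as.integer(df$fac) +
  rnorm(n, sd = 0.3)

m <- gam(y ~ s(x1) + x2 + fac, data = df, method = "REML")

# x1 varies over 100 evenly spaced values, x2 is fixed at its mean, and
# fac (not mentioned) is held at a typical value
ds <- data_slice(m, x1 = evenly(x1, n = 100), x2 = mean(x2))
ds
```

The resulting slice can be passed as the data argument to functions such as fitted_values() or smooth_estimates().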

  • partial_derivatives() is a new function for computing partial derivatives of multivariate smooths (e.g. s(x,z), te(x,z)) with respect to one of the margins of the smooth. Multivariate smooths of any dimension are handled, but only one of the dimensions is allowed to vary. Partial derivatives are estimated using the method of finite differences, with forward, backward, and central finite differences available. Requested by @noamross #101

  • overview() provides a simple overview of model terms for fitted GAMs.

  • The new bs = "sz" basis that was released with mgcv version 1.8-41 is now supported in smooth_estimates(), draw.gam(), and draw.smooth_estimates() and this basis has its own unique plotting method.

    #202

  • basis() now has a method for fitted GAM(M)s which can extract the estimated basis from the model and plot it, using the estimated coefficients for the smooth to weight the basis. #137

There is also a new draw.basis() method for plotting the results of a call to basis(). This method can now also handle bivariate bases.

tidy_basis() is a lower level function that does the heavy lifting in basis(), and is now exported. tidy_basis() returns a tidy representation of a basis supplied as an object inheriting from class "mgcv.smooth". These objects are returned in the $smooth component of a fitted GAM(M) model.

  • lp_matrix() is a new utility function to quickly return the linear predictor matrix for an estimated model. It is a wrapper to predict(..., type = "lpmatrix")

  • evenly() is a synonym for seq_min_max() and is preferred going forward. Gains argument by to produce sequences over a covariate that increment in units of by.

  • ref_level() and level() are new utility functions for extracting the reference or a specific level of a factor respectively. These will be most useful when specifying covariate values to condition on in a data slice.

  • model_vars() is a new, public facing way of returning a vector of variables that are used in a model.

  • difference_smooths() will now use the user-supplied data as points at which to evaluate a pair of smooths. Also note that the argument newdata has been renamed data. #175

  • The draw() method for difference_smooths() now uses better labels for plot titles to avoid long labels with even modest factor levels.

  • derivatives() now works for factor-smooth interaction ("fs") smooths.

  • draw() methods now allow the angle of tick labels on the x axis of plots to be rotated using argument angle. Requested by @tamas-ferenci #87

  • draw.gam() and related functions (draw.parametric_effects(), draw.smooth_estimates()) now add the basis to the plot using a caption. #155

  • smooth_coefs() is a new utility function for extracting the coefficients for a particular smooth from a fitted model. smooth_coef_indices() is an associated function that returns the indices (positions) in the vector of model coefficients (returned by coef(gam_model)) of those coefficients that pertain to the stated smooth.

  • draw.gam() now better handles patchworks of plots where one or more of those plots has fixed aspect ratios. #190
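
The finite-difference idea behind partial_derivatives() and the linear predictor matrix that lp_matrix() wraps can both be sketched with mgcv alone. This is a minimal illustration and not gratia's implementation: the simulated data, the step size eps, and the evaluation grid are all assumptions made for the example.

```r
library(mgcv)

set.seed(42)
n  <- 300
df <- data.frame(x = runif(n), z = runif(n))
df$y <- sin(2 * pi * df$x) * df$z + rnorm(n, sd = 0.1)

# A bivariate smooth, the kind of term partial_derivatives() targets
m <- gam(y ~ s(x, z), data = df, method = "REML")

# Let x vary while z is held fixed at 0.5 (assumed values)
grid <- data.frame(x = seq(0.1, 0.9, length.out = 5), z = 0.5)
eps  <- 1e-5  # assumed finite-difference step

# Linear predictor matrices at x + eps/2 and x - eps/2;
# this is the predict() call that lp_matrix() is described as wrapping
X_fwd <- predict(m, transform(grid, x = x + eps / 2), type = "lpmatrix")
X_bwd <- predict(m, transform(grid, x = x - eps / 2), type = "lpmatrix")

# Central difference of the basis functions, weighted by the coefficients
d_dx <- drop(((X_fwd - X_bwd) / eps) %*% coef(m))

# The same quantity computed from differences of the fitted values
f_fwd <- predict(m, transform(grid, x = x + eps / 2))
f_bwd <- predict(m, transform(grid, x = x - eps / 2))
d_dx_check <- as.numeric((f_fwd - f_bwd) / eps)
```

The intercept column cancels in the difference, so d_dx is the derivative of the smooth itself with respect to x at the chosen value of z.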

Bug fixes

  • draw.posterior_smooths() now plots posterior samples with a fixed aspect ratio if the smooth is isotropic. #148

  • derivatives() now ignores random effect smooths (for which derivatives don't make sense anyway). #168

  • confint.gam(..., method = "simultaneous") now works with factor by smooths where parm is passed the full name of a specific smooth s(x)faclevel.

  • The order of plots produced by gratia::draw.gam() again matches the order in which the smooths entered the model formula. Changes made to the internals of gratia::draw.gam() during the switch to smooth_estimates() led to a change in behaviour, resulting from the use of dplyr::group_split() and its internal coercion of a character vector to a factor. This factor is now created explicitly, and the levels set to the correct order. #154

  • Setting the dist argument, which sets response or smooth values to NA if they lie too far from the support of the data in multivariate smooths, would lead to an incorrect scale for the response guide. This is now fixed. #193

  • Argument fun to draw.gam() was not being applied to any parametric terms. Reported by @grasshoppermouse #195

  • draw.gam() was adding the uncertainty for all linear predictors to smooths when overall_uncertainty = TRUE was used. Now draw.gam() only includes the uncertainty for those linear predictors in which a smooth takes part. #158

  • partial_derivatives() works when provided with a single data point at which to evaluate the derivative. #199

  • transform_fun.smooth_estimates() was addressing the wrong variable names when trying to transform the confidence interval. #201

  • data_slice() doesn't fail with an error when used with a model that contains an offset term. #198

  • confint.gam() no longer uses evaluate_smooth(), which is soft deprecated. #167

  • qq_plot() and worm_plot() could compute the wrong deviance residuals used to generate the theoretical quantiles for some of the more exotic families (distributions) available in mgcv. This also affected appraise() but only for the QQ plot; the residuals shown in the other plots and the deviance residuals shown on the y-axis of the QQ plot were correct. Only the generation of the reference intervals/quantiles was affected.
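
The plot-ordering fix for draw.gam() described above comes down to how a character vector is coerced to a factor when data are split into groups. The sketch below uses base R's split() as a stand-in for dplyr::group_split(), and the smooth labels are hypothetical; it only illustrates the mechanism and is not the code used in gratia.

```r
# Smooth labels in the order they appear in a (hypothetical) model formula
smooths <- c("s(z)", "s(a)", "s(m)")
est <- data.frame(smooth = rep(smooths, each = 2), value = 1:6)

# Splitting on a character vector coerces it to a factor whose levels
# are sorted, so the formula order is lost
by_default <- names(split(est, est$smooth))

# Creating the factor explicitly, with levels in order of appearance,
# preserves the formula order
est$smooth <- factor(est$smooth, levels = unique(est$smooth))
by_formula <- names(split(est, est$smooth))

by_default
by_formula
```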

Published by gavinsimpson almost 3 years ago

gratia - gratia version 0.7.3 is released and on CRAN

gratia 0.7.3

This is a minor release for gratia, mainly motivated by a request to fix outputs from examples on M1 Macs where the results printed deviated markedly from the reference output generated on my Linux machine. The full entry for the release in NEWS.md is reproduced below.

User visible changes

  • Plots of smooths now use "Partial effect" for the y-axis label in place of "Effect", to better indicate what is displayed.

New features

  • confint.fderiv() and confint.gam() now return their results as a tibble instead of a common-or-garden data frame. The latter mostly already did this.

  • Examples for confint.fderiv() and confint.gam() were reworked, in part to remove some inconsistent output in the examples when run on M1 Macs.

Bug fixes

  • compare_smooths() failed when passed non-standard model "names" like compare_smooths(m_gam, m_gamm$gam) or compare_smooths(l[[1]], l[[2]]) even if the evaluated objects were valid GAM(M) models. Reported by Andrew Irwin #150

Published by gavinsimpson over 3 years ago

gratia - gratia version 0.7.2 is released and on CRAN

gratia 0.7.2 is available and on CRAN

Following the release of version 0.7.0, a couple of annoying bugs were identified which necessitated a patch release. I had implemented methods to plot partial effects for 3d and 4d smooths so decided to include these early enhancements in the patch release to try to shake out any bugs or problems with the implementation prior to a more substantial point (0.8.0) release later in the year (planned for September 2022 at the latest as gratia is needed for a GAM course). Similarly, the problem that delayed 0.7.1 (below) meant that a new plotting method to handle splines on the sphere snuck in to the release, for the same reasons as handling >2d smooths.

Due to an issue with the size of the package source tarball, which wasn't discovered until after submission to CRAN, 0.7.1 was never released.

While binaries for Windows and MacOS X systems are being built, you can install version 0.7.2 from R Universe: https://gavinsimpson.r-universe.dev/ui#builds

New features

  • draw.gam() and draw.smooth_estimates() can now handle splines on the sphere (s(lat, long, bs = "sos")) with special plotting methods using ggplot2::coord_map() to handle the projection to spherical coordinates. An orthographic projection is used by default, with an essentially arbitrary (and northern hemisphere-centric) default for the orientation of the view.

  • draw.gam() and draw.smooth_estimates(): {gratia} can now handle smooths of 3 or 4 covariates when plotting. As an example of what is possible, the figure below shows the estimated smooths from y ~ s(x,z) + s(year, bs = "cr") + ti(x,z, year, d = c(2,1), bs = c("tp", "cr")) for a space-time GAM modelling shrimp abundance. The layout has been tweaked a little (via the design argument to patchwork::plot_layout()) from the default you get with draw.gam() but otherwise it is unchanged.

    [Figure: space-time tensor product ti() smoother]

    For smooths of 3 covariates, the third covariate is handled with ggplot2::facet_wrap() and a set (default n = 16) of small multiples is drawn, each a 2d surface evaluated at the specified value of the third covariate. For smooths of 4 covariates, ggplot2::facet_grid() is used to draw the small multiples, with the default producing 4 rows by 4 columns of plots at the specific values of the third and fourth covariates. The number of small multiples produced is controlled by new arguments n_3d (default n_3d = 16) and n_4d (default n_4d = 4, yielding n_4d * n_4d = 16 facets) respectively.

    This only affects plotting; smooth_estimates() has been able to handle smooths of any number of covariates for a while.

    When handling higher-dimensional smooths, actually drawing the plots on the default device can be slow, especially with the default value of n = 100 (which for 3D or 4D smooths would result in 160,000 data points being plotted). As such it is recommended that you reduce n to a smaller value: n = 50 is a reasonable compromise of resolution and speed.

  • model_concurvity() returns concurvity measures from mgcv::concurvity() for estimated GAMs in a tidy format. The synonym concrvity() [sic] is also provided. A draw() method is provided which produces a bar plot or a heatmap of the concurvity values depending on whether the overall concurvity of each smooth or the pairwise concurvity of each smooth in the model is requested.

  • fitted_values() ensures that data (and hence the returned object) is a tibble rather than a common or garden data frame.

  • draw.gam() gains argument resid_col = "steelblue3" that allows the colour of the partial residuals (if plotted) to be changed.
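
The 160,000 figure quoted above for higher-dimensional smooths follows from simple arithmetic, sketched here in base R. The helper functions points_3d() and points_4d() are hypothetical names introduced for the illustration, and the sketch assumes each facet is an n by n grid, which is consistent with the number given in the text.

```r
# Points drawn for a smooth of 3 covariates: one n x n surface per facet
points_3d <- function(n = 100, n_3d = 16) n^2 * n_3d

# For a smooth of 4 covariates there are n_4d x n_4d facets
points_4d <- function(n = 100, n_4d = 4) n^2 * n_4d^2

points_3d()        # 160000 with the defaults
points_3d(n = 50)  # 40000
points_4d()        # 160000 with the defaults
points_4d(n = 50)  # 40000
```

Halving n therefore cuts the number of plotted points to a quarter.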

Bug fixes

  • draw.posterior_smooths() was redundantly plotting duplicate data in the rug plot. Now only the unique set of covariate values is used for drawing the rug.

  • data_sim() was not passing the scale argument in the bivariate example setting ("eg2").

  • draw() methods for gamm() and gamm4::gamm4() fits were not passing arguments on to draw.gam().

  • draw.smooth_estimates() would produce a subtitle with data for a continuous by smooth as if it were a factor by smooth. Now the subtitle only contains the name of the continuous by variable.

  • model_edf() was not using the type argument. As a result it only ever returned the default EDF type.

  • add_constant() methods weren't applying the constant to all the required variables.

  • draw.gam(), draw.parametric_effects() now actually work for a model with only parametric effects. #142 Reported by @Nelson-Gon

  • parametric_effects() would fail for a model with only parametric terms because predict.gam() returns empty arrays when passed exclude = character(0).

Published by gavinsimpson almost 4 years ago

gratia - gratia version 0.7.0 now on CRAN

gratia version 0.7.0 released

I am pleased to announce the release of version 0.7.0 of the gratia package. gratia is intended to make working with generalized additive models (GAMs) easier and to facilitate the production of high quality visualizations of estimated smooths and entire models using the ggplot2 package.

Version 0.7.0 of the package represents a significant milestone: the main user-facing and internal functions for evaluating estimated smooths at covariate values have been entirely replaced by new functions written from the ground up to be easier to extend and maintain than the original functions. These new functions are smooth_estimates() and parametric_effects(). Consequently, functions evaluate_smooth() and evaluate_parametric_term() are now soft-deprecated; a warning will be issued upon their first usage to encourage the use of the new functions.

smooth_estimates() and parametric_effects() are more capable and easier to extend than their deprecated forebears. They can return results for multiple smooth or parametric terms in a single call, while the internals allow for new smooth types that require specialist handling to be added without rewriting the main code base or extensive redesigns.

The main user-facing plotting function draw() for fitted GAMs and related models has been rewritten to use smooth_estimates() and parametric_effects(). Some small differences in behaviour may be encountered, but it is expected that previous code using gratia is backward compatible.

In addition to the major changes described above, version 0.7.0 also introduces a range of new functions to make the GAM-related aspects of your life a little bit easier.

  • fitted_values() produces fitted or estimated values from the model. These can be on the scale of the link function or the response and a credible interval is provided for the requested coverage on the chosen scale.
  • rootogram() provides rootogram diagnostics, mainly for count-based models (fitted with families poisson(), negbin(), nb(), and gaussian()), but other families may be supported in the future. The draw() method can plot various kinds of rootogram from the results of rootogram().
  • New helper functions typical_values(), factor_combos() and data_combos() for quickly creating data sets for producing predictions from fitted models where some of the covariates are fixed at some typical or representative values.
  • edf() extracts the effective degrees of freedom (EDF) of a fitted model or a specific smooth in the model. Various forms for the EDF can be extracted.
  • model_edf() returns the EDF of the overall model. If supplied with multiple models, the EDFs of each model are returned for comparison.
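
A short sketch of how some of the functions listed above might be used together. It assumes recent versions of gratia and mgcv are installed; the two models and the simulated data (from data_sim(), with an arbitrary seed) are chosen purely for illustration.

```r
library(mgcv)
library(gratia)

# Simulated data from the "eg1" setting; n and seed are arbitrary choices
df <- data_sim("eg1", n = 400, seed = 2)

m1 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = df, method = "REML")
m2 <- gam(y ~ s(x0) + s(x2), data = df, method = "REML")

# Fitted values on the response scale, with a credible interval
fv <- fitted_values(m1, data = df, scale = "response")

# EDF of each smooth in one model
edf(m1)

# Overall EDF of each model, side by side for comparison
model_edf(m1, m2)
```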

Additional new features and information on bugs fixed can be found in the news.

The package has a new pkgdown website, with search facility: https://gavinsimpson.github.io/gratia/

Finally, I know the documentation available for the package and individual functions isn't anywhere near as good as it could be. I have tried to provide examples for the user-facing functions in the package. In addition, this version of gratia comes with a Getting Started vignette, which shows some of the main functions for working with GAMs with gratia. Development on the package towards version 0.8.0 will have a focus on providing better documentation and additional vignettes to illustrate the range of functionality in the package.

Published by gavinsimpson almost 4 years ago

gratia - gratia version 0.5.1 now on CRAN

This release was prompted by an issue with an argument naming choice in the new smooth_estimates() function. Some additional functionality was completed prior to realising I needed to release 0.5.1.

User visible changes

  • The newdata argument to smooth_estimates() has been changed to data as was originally intended.

New features

  • smooth_estimates() can now handle

    • bivariate and multivariate thin plate regression spline smooths, e.g. s(x, z, a),
    • tensor product smooths (te(), t2(), & ti()), e.g. te(x, z, a),
    • factor smooth interactions, e.g. s(x, f, bs = "fs"),
    • random effect smooths, e.g. s(f, bs = "re").
  • penalty() provides a tidy representation of the penalty matrices of smooths. The tidy representation is most suitable for plotting with ggplot().

A draw() method is provided, which represents the penalty matrix as a heatmap.

Published by gavinsimpson almost 5 years ago

gratia - gratia version 0.5.0 now on CRAN

gratia 0.5.0

Covid-19 and teaching left me little development time, but a prompt from CRAN to address the use of {vdiffr} 📦 in package tests spurred me to wrap up some of the new features I had committed to the development version.

I also took the opportunity to complete the initial steps on a replacement for (or more accurately a successor to) evaluate_smooth(). Some early decisions I made when developing evaluate_smooth() meant that it was increasingly difficult to maintain and add support for more complex models, due to the way I had handled factor by variable smooths.

The replacement/successor is smooth_estimates(). At the moment it only handles simple 1-D smooths, but it should be much easier to accommodate other smooth types and more complex models with multiple linear predictors.

Eventually, once smooth_estimates() can handle the range of smooths and models that evaluate_smooth() can currently, I'll swap out instances of evaluate_smooth() from the higher-level functions that rely upon it. At the moment I don't plan on removing evaluate_smooth() from {gratia}, but its use will be at the very least soft-deprecated.

Some of the News for the release is copied below.

New features

  • Partial residuals for models can be computed with partial_residuals(). The partial residuals are the weighted residuals of the model added to the contribution of each smooth term (as returned by predict(model, type = "terms")).

Wish of #76 (@noamross)

Also, new function add_partial_residuals() can be used to add the partial residuals to data frames.

  • Users can now control to some extent what colour or fill scales are used when plotting smooths in those draw() methods that use them. This is most useful to change the fill scale when plotting 2D smooths, or to change the discrete colour scale used when plotting random factor smooths (bs = "fs").

The user can pass scales via arguments discrete_colour and continuous_fill.

  • The effects of certain smooths can be excluded from data simulated from a model using simulate.gam() and predicted_samples() by passing exclude or terms on to predict.gam(). This allows for excluding random effects, for example, from model predicted values that are then used to simulate new data from the conditional distribution. See the example in predicted_samples().

Wish of #74 (@hgoldspiel)

  • draw.gam() and related functions gain arguments constant and fun to allow for user-defined constants and transformations of smooth estimates and confidence intervals to be applied.

Part of wish of #79.

  • confint.gam() now works for 2D smooths also.

  • smooth_estimates() is an early version of code to replace (or more likely supersede) evaluate_smooth(). smooth_estimates() can currently only handle 1D smooths of the standard types.
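
The definition of partial residuals given above (weighted residuals added to the contribution of each smooth) can be sketched with mgcv alone. This is one reading of that definition, using the weighted working residuals as mgcv's plot.gam() does; it is not taken from the source of partial_residuals(), and the simulated data are an assumption of the example.

```r
library(mgcv)

set.seed(1)
n  <- 200
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$y <- sin(2 * pi * df$x1) + (df$x2 - 0.5)^2 + rnorm(n, sd = 0.2)

m <- gam(y ~ s(x1) + s(x2), data = df, method = "REML")

# Contribution of each smooth to the linear predictor: an n x 2 matrix
term_contrib <- predict(m, type = "terms")

# Working residuals weighted by the square root of the working weights
w_resid <- as.numeric(m$residuals) * sqrt(as.numeric(m$weights))

# Partial residuals: the same weighted residual is added to every column
p_resid <- term_contrib + w_resid

head(p_resid)
```

For this Gaussian, identity-link model the weights are all 1, so each column of p_resid is simply the smooth's contribution plus the ordinary residual.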

User visible changes

  • The meaning of parm in confint.gam has changed. This argument now requires a smooth label to match a smooth. A vector of labels can be provided, but partial matching against a smooth label only works with a single parm value.

The default behaviour remains unchanged however; if parm is NULL then all smooths are evaluated and returned with confidence intervals.

  • data_class() is no longer exported; it was only ever intended to be an internal function.

Published by gavinsimpson almost 5 years ago

gratia - Version 0.4.1 released to CRAN

Version 0.4.1 of gratia has been released to CRAN. Version 0.4.0 existed for a short while but the release to CRAN was pulled because of a last minute change needed to accommodate v 1.0.0 of dplyr that had gone overlooked in the testing for 0.4.0.

This gave me an opportunity to fix an additional bug (#73) as well.

The full list of changes is reproduced below for version 0.4.1 and 0.4.0.

gratia 0.4.1

User visible changes

  • draw.gam() with scales = "fixed" now applies to all terms that can be plotted, including 2d smooths.

Reported by @StefanoMezzini #73

Bug fixes

  • dplyr::combine() was deprecated. Switch to vctrs::vec_c().

  • draw.gam() with scales = "fixed" wasn't using fixed scales where 2d smooths were in the model.

Reported by @StefanoMezzini #73

gratia 0.4.0

New features

  • draw.gam() can now include partial residuals when drawing univariate smooths. Use residuals = TRUE to add partial residuals to each univariate smooth that is drawn. This feature is not available for smooths of more than one variable, by smooths, or factor-smooth interactions (bs = "fs").

  • The coverage of credible and confidence intervals drawn by draw.gam() can be specified via argument ci_level. The default is arbitrarily 0.95 for no other reason than (rough) compatibility with plot.gam().

This change has had the effect of making the intervals slightly narrower than in previous versions of gratia, where intervals were drawn at ± 2 × the standard error. The default intervals are now drawn at ± ~1.96 × the standard error.

  • New function difference_smooths() for computing differences between factor smooth interactions. Methods available for gam(), bam(), gamm() and gamm4::gamm4(). Also has a draw() method, which can handle differences of 1D and 2D smooths currently (handling 3D and 4D smooths is planned).

  • New functions add_fitted() and add_residuals() to add fitted values (expectations) and model residuals to an existing data frame. Currently methods available for objects fitted by gam() and bam().

  • data_sim() is a tidy reimplementation of mgcv::gamSim() with the added ability to use sampling distributions other than the Gaussian for all models implemented. Currently Gaussian, Poisson, and Bernoulli sampling distributions are available.

  • smooth_samples() can handle continuous by variable smooths such as in varying coefficient models.

  • link() and inv_link() now work for all families available in mgcv, including the location, scale, shape families, and the more specialised families described in ?mgcv::family.mgcv.

  • New evaluate_smooth(), data_slice(), family(), link(), and inv_link() methods for models fitted using gamm4() from the gamm4 package.

  • data_slice() can generate data for a 1-d slice (a single variable varying).

  • The colour of the points, reference lines, and simulation band in appraise() can now be specified via arguments

    • point_col,
    • point_alpha,
    • ci_col
    • ci_alpha
    • line_col

These are passed on to qq_plot(), observed_fitted_plot(), residuals_linpred_plot(), and residuals_hist_plot(), which also now take the new arguments where applicable.

  • Added utility functions is_factor_term() and term_variables() for working with models. is_factor_term() identifies if the named term is a factor using information from the terms() object of the fitted model. term_variables() returns a character vector of variable names that are involved in a model term. These are strictly for working with parametric terms in models.

  • appraise() now works for models fitted by glm() and lm(), as do the underlying functions it calls, especially qq_plot().

appraise() also works for models fitted with family gaulss(). Further location scale models and models fitted with extended family functions will be supported in upcoming releases.
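
The narrowing of the default intervals noted above (from ± 2 to ± ~1.96 standard errors) can be checked with a little base R arithmetic. The sketch assumes the multiplier is the standard normal quantile for the requested coverage, and the estimate and standard error are hypothetical values.

```r
ci_level <- 0.95

# Standard normal critical value for a two-sided interval
crit <- qnorm((1 + ci_level) / 2)
crit  # approximately 1.96

# Hypothetical estimate and standard error
est <- 0.8
se  <- 0.1

old_interval <- est + c(-1, 1) * 2 * se     # +/- 2 standard errors
new_interval <- est + c(-1, 1) * crit * se  # +/- ~1.96 standard errors

# The new interval is slightly narrower
diff(new_interval) < diff(old_interval)
```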

User visible changes

  • datagen() is now an internal function and is no longer exported. Use data_slice() instead.

  • evaluate_parametric_term() is now much stricter and can only evaluate main effect terms, i.e. those whose order, as stored in the terms object of the model, is 1.

Bug fixes

  • The draw() method for derivatives() was not getting the x-axis label for factor by smooths correctly, and instead was using NA for the second and subsequent levels of the factor.

  • The datagen() method for class "gam" couldn't possibly have worked for anything but the simplest models and would fail even with simple factor by smooths. These issues have been fixed, but the behaviour of datagen() has changed, and the function is now not intended for use by users.

  • Fixed an issue where model terms of the form factor1:factor2 were incorrectly identified as being numeric parametric terms. #68

Published by gavinsimpson over 5 years ago

gratia - gratia version 0.3.1

This version of gratia was prompted by changes in the upcoming 4.0.0 release of R, which changes the default for stringsAsFactors to FALSE. A number of tests relied inadvertently on the implicit coercion of character vectors to factors, and the derivative code assumed that data only contain numeric or factor variables.

New features

In addition, this version of gratia includes new functions for extracting the link functions from models, and has been updated to work with the forthcoming release of the tibble package.

  • New functions link() and inv_link() to access the link function and its inverse from fitted models and family functions.

Methods for classes: "glm", "gam", "bam", "gamm" currently. #58

  • Adds explicit family() methods for objects of classes "gam", "bam", and "gamm".

  • derivatives() now handles non-numeric variables when creating shifted data for finite differences. Fixes a problem with stringsAsFactors = FALSE default in R-devel. #64
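
For base R family objects, the link function and its inverse are stored as the linkfun and linkinv components. The sketch below uses only those components, as a stand-in for the kind of functions link() and inv_link() return; it does not call gratia itself, and the Poisson family and the value 5 are arbitrary choices.

```r
fam <- poisson(link = "log")

link_fun <- fam$linkfun  # maps the mean to the linear predictor scale
inv_fun  <- fam$linkinv  # maps the linear predictor back to the mean

eta <- link_fun(5)       # log(5)
mu  <- inv_fun(eta)      # back on the response scale

c(eta = eta, mu = mu)
```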

Bug fixes

  • Updated gratia to work with tibble versions >= 3.0

Published by gavinsimpson over 5 years ago

gratia - Bug fix release

This release fixes a bug in the use of the select argument to draw.gam(), which was resulting in the wrong smooths being plotted.

Published by gavinsimpson almost 7 years ago

gratia - First CRAN submission of gratia

gratia recently reached version 0.2-0 and after some last-minute teething issues related to a new release of the tibble package, gratia was submitted to CRAN.

The package is ready for public release and has been widely tested against a range of estimated models. In particular, the package is now used to support a paper that I've been involved with writing on hierarchical GAMs.

Published by gavinsimpson almost 7 years ago