ordgam

Additive proportional odds model for ordinal data

https://github.com/plambertuliege/ordgam

Last synced: 9 months ago · JSON representation

Repository

Additive proportional odds model for ordinal data

Basic Info

Host: GitHub
Owner: plambertULiege
Language: R
Default Branch: main
Size: 1.67 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 3 years ago · Last pushed almost 3 years ago

https://github.com/plambertULiege/ordgam/blob/main/

The *ordgam* R-package
================

The **ordgam** package enables to fit an additive proportional odds
model to ordinal data.

The combination of **Laplace approximations** and of **Bayesian
P-splines** (named *LPS*) enable fast and flexible inference in a
Bayesian framework. The Gaussian Markov random field prior assumed for the
penalized spline parameters and the Bernstein-von Mises theorem
typically provide reliable Laplace approximation to the posterior
distribution of these quantities. However, this accuracy can be
seriously compromised for some unpenalized parameters, especially when
the information synthesized by the prior and the likelihood is sparse.

We propose a refined version of the LPS methodology by splitting the
parameter space in two subsets,
$\pmb{\zeta}=(\pmb{\gamma}^\top,\pmb{\theta}^\top)^\top$ . The first set
involves the non-penalized parameters $\pmb{\gamma}$ for which the joint
posterior distribution $p(\pmb{\gamma}|\pmb{\lambda},{\cal D})$ is
approached from a non-Gaussian perspective with an approximation scheme
tailored to capture asymmetry and kurtosis, while the conditional
posterior distribution
$p(\pmb{\theta}|\pmb{\gamma},\pmb{\lambda},{\cal D})$ for the penalized
parameters $\pmb{\theta}$ in the complementary set undergoes the LPS
treatment with Laplace approximations, see Fig.1 (with
$\pmb{\eta}=\pmb{\lambda}$).

The marginal posterior density $p(\pmb{\nu}|{\cal D})$ for the log of
the penalty parameters, $\pmb{\nu}=\log(\pmb{\lambda})$, also could be
approximated using a product of skew-t densities. More details can be
found in Lambert & Gressani (2023)
[\[1\]](https://doi.org/10.1177/1471082X231181173).




Fig.1: Proposed strategy to account for skewness and kurtosis in the
posterior of the non-penalized parameters.



## The *ordgam* package in action

Let us illustrate the use of the *ordgam* package on a data subset
($n=552$) from the European Social Survey (ESS 2018) specific to the
French speaking respondents from Wallonia, one of the three regions in
Belgium. Each of the participants (aged at least 15) was asked to react
to the following statement, *Gay men and lesbians should be free to live
their own life as they wish*, with a positioning on a Likert scale going
from 1 (*Agree strongly*) to 5 (*Disagree strongly*).

``` r
## Package installation and loading
## install.packages("devtools")
## devtools::install_github("plambertULiege/ordgam")
library(ordgam) 
    
## Data reading
data(freehmsDataBE)
donnees = subset(freehmsDataBE,region=="WAL") ## Focus on Wallonia
head(donnees)
```

    ##    freehms   gndr age eduyrs region
    ## 2        1   Male  25     17    WAL
    ## 3        2 Female  76      6    WAL
    ## 7        1   Male  69     17    WAL
    ## 9        2   Male  36     10    WAL
    ## 12       1   Male  54      7    WAL
    ## 18       1 Female  28     14    WAL

### Model fitting

Let us fit a proportional odds model to these data with the number of
completed years of education ($14.1\pm 4.4$ years) and age
($47.3\pm 18.5$ years) entering as additive terms, $L=10$ recentered
B-splines spanning each covariate range, and Gamma priors for the
penalty parameters, $\lambda_j\sim{\cal G}(1,10^{-4})$ ($j=1,2$).

``` r
## Model fit with a Gamma prior for the penalty parameters
mod = ordgam(freehms ~ s(eduyrs) + s(age), data=donnees, descending = FALSE) 
print(mod)                                  
```

    ## Call:
    ##  ordgam(formula = freehms ~ s(eduyrs) + s(age), data = donnees,      descending = FALSE) 
    ## 
    ## Prior set on the regression paramaters 
    ## 
    ## Parameter estimation:
    ##       est      se     low      up     Z   Pval    
    ## 1  0.0272  0.1152 -0.1986  0.2530  0.24   0.81    
    ## 2  1.6318  0.1400  1.3574  1.9062 11.66 <2e-16 ***
    ## 3  2.5509  0.1863  2.1858  2.9160 13.69 <2e-16 ***
    ## 4  4.4168  0.4163  3.6008  5.2328 10.61 <2e-16 ***
    ## 
    ## Approximate significance of smooth terms:
    ##            edf    Tr     Pval  Chi2    Pval    
    ## f(eduyrs) 1.24  1.01 0.477617  1.15 0.35501    
    ## f(age)    2.55 18.77 0.000265 18.82 0.00017 ***
    ## 
    ## Selected penalty parameters :  191.7967 18.40125 
    ## Lambda log prior:  dgamma(lambda, 1, 1e-04, log = TRUE)
    ## 
    ## Likelihood - Information criterions (n = 552):
    ##      edf     logL logLmarg      AIC      BIC 
    ##     7.79  -598.66  -606.75  1212.90  1246.49 
    ## 
    ## NOTE: model the odds of a response value in the lower scale

The estimated additive terms can also be visualized:

``` r
plot(mod, mfrow=c(1,2))
```





### Asymmetric posterior for the non-penalized parameters

The asymmetry of the posterior for the non-penalized parameters
$\pmb{\gamma}$ can be visualized. The first step in the approach
requires the projection of $\pmb{\gamma}$ on the eigenvectors of the
variance-covariance matrix $\Sigma_\lambda^{\gamma\gamma}$, yielding
$\tilde{\pmb{\gamma}}$ and its approximately independent components. The
posterior density $p(\tilde\gamma_k|\lambda,{\cal D})$ is further
approximated using a skew-t density, revealing a non-negligible
asymmetry for the posterior of $\tilde\gamma_1$ (corresponding to the
direction of the eigenvector with the largest eigenvalue):

``` r
model = mod  ## Model for which the approximation is required
    
## Skew-t approximation to the marginal of gamma.tilde[k]
ngamma = with(model, nalpha+nfixed)  ## Number of non-penalized parms
gamt.ST = list()  ## Skew-t coefs --> dst(x,dp=coef)
for (k in 1:ngamma){ ## Loop over the gamma.tilde components
     x.grid = seq(-4,4,length=10)  ## Grid of values for gamma.tilde[k]
     lfy.grid = ordgam::lmarg.gammaTilde(x.grid,k=k,model)  ## log p(gamma.tilde[k] | D)
     gamt.ST[[k]] = ordgam:::STapprox(x.grid,lfy.grid)$dp  ## Approximate using a skew-t
}
    
## Visualization of the posterior for 
par(mfrow=c(2,2),mar=c(4,5,1,1))
for (k in 1:ngamma){ ## Loop over the gamma.tilde components
     xlab = bquote(tilde(gamma)[.(k)])
     ylab = bquote(p(tilde(gamma)[.(k)]~ "|"~lambda~","~D))
     xlim = sn::qst(c(.0001,.9999),dp=gamt.ST[[k]])
     curve(sn::dst(x,dp=gamt.ST[[k]]), xlim=xlim,
           xlab=xlab,ylab=ylab,col="blue",lwd=2,lty=1) 
}
```



The results can be re-expressed in the original parametrization, yieding
$p(\gamma_k|\lambda,{\cal D})$ and a noticable asymmetry for the
posterior density of $\gamma_4$:

``` r
## Sampling  using the (approximate) independence of its components
M = 10000
sample.tilde = matrix(nrow=M,ncol=ngamma)
for (k in 1:ngamma){
    ## ... or using skew-t approximation
    sample.tilde[,k] = c(sn::rst(M, dp=gamt.ST[[k]]))
}
    
## Revert sample to the original  parametrization
gam.hat = model$theta[1:ngamma]  ## MAP estimate
Sig = model$Sigma.theta[1:ngamma,1:ngamma] ## Variance-covariance
sv = svd(Sig) ; V = sv$u ; omega = sv$d  ## SVD
sample.gam = t(gam.hat + V %*% (sqrt(omega) * t(sample.tilde)))
    
## Skew-t approximation to the marginal of gam[k]
gam.ST = gam.ST2 = list()
for (k in 1:ngamma){
    temp = hist(sample.gam[,k],plot=FALSE)
    gam.ST[[k]] = ordgam:::STapprox(temp$mids,log(temp$density+1e-6))$dp
    ## temp = selm(sample.gam[,k] ~ 1, family="ST")
    ## gam.ST2[[k]] = coef(temp,"DP")
}
    
## Visualize p(gamma[k] | lambda,data)
par(mfrow=c(2,2),mar=c(4,5,1,1))
for (k in 1:ngamma){
    xlab = bquote(gamma[.(k)])
    ylab = bquote(p(gamma[.(k)]~ "|"~lambda~","~D))
    xlim = sn::qst(c(.0001,.9999),dp=gam.ST[[k]])
    curve(sn::dst(x,dp=gam.ST[[k]]), lwd=2,col="blue",
          xlim=xlim, xlab=xlab, ylab=ylab)
}
```



### Marginal posterior of the penalty parameters

Let us revisit the same model for the data collected in Flanders, the
most populated region of Belgium.

``` r
## Model fit with a Gamma prior for the penalty parameters
mod.FL = ordgam(freehms ~ s(eduyrs) + s(age), data=subset(freehmsDataBE,region=="FL"), 
                descending = FALSE) 
print(mod.FL)
```

    ## Call:
    ##  ordgam(formula = freehms ~ s(eduyrs) + s(age), descending = FALSE,
    ##         data = subset(freehmsDataBE, region == "FL")) 
    ## 
    ## Prior set on the regression paramaters 
    ## 
    ## Parameter estimation:
    ##      est     se    low     up     Z    Pval    
    ## 1 0.3176 0.0866 0.1479 0.4872  3.67 0.00024 ***
    ## 2 2.7105 0.1367 2.4426 2.9784 19.83 < 2e-16 ***
    ## 3 3.4898 0.1829 3.1312 3.8484 19.08 < 2e-16 ***
    ## 4 4.8852 0.3407 4.2174 5.5530 14.34 < 2e-16 ***
    ## 
    ## Approximate significance of smooth terms:
    ##            edf   Tr     Pval Chi2    Pval    
    ## f(eduyrs) 1.15 27.7 9.64e-07 29.0 1.0e-07 ***
    ## f(age)    2.57 34.6 1.41e-06 34.6 7.9e-08 ***
    ## 
    ## Selected penalty parameters :  296.6694 33.87409 
    ## Lambda log prior:  dgamma(lambda, 1, 1e-04, log = TRUE)
    ## 
    ## Likelihood - Information criterions (n = 1046):
    ##      edf     logL logLmarg      AIC      BIC 
    ##     7.72  -923.13  -932.09  1861.70  1899.94 
    ## 
    ## NOTE: model the odds of a response value in the lower scale

``` r
fhat.FL = plot(mod.FL, mfrow=c(1,2))
```



The estimated additive terms for education and age look a bit different
from the estimates for Wallonia with, here, the suggestion of a
statistically significant and linear effect of 'eduyrs'.

The marginal posterior distribution of the log of the penalty
parameters, $\nu_j=\log\lambda_j$, can be visualized and compared to the
their prior:

``` r
model = mod.FL
## log(lambda) marginal posteriors
par(mfrow=c(1,2),mar=c(4,5,3,1))
for (j in 1:2){ ## Loop over  components
  ## Plot of the approximating skew-t to the marginal posterior of 
  xlims = sn::qst(c(.001,.999),dp=model$nu.dp[[j]])
  xlab = bquote(log(lambda[.(j)]))
  ylab = bquote(p(log(lambda[.(j)])~ "|"~D))
  curve(sn::dst(x,dp=model$nu.dp[[j]]),xlims[1],xlims[2],lwd=2,col="red",
        main=names(model$lambda)[j],xlab=xlab,ylab=ylab,ylim=c(0,.4))
  ## ... compared to the prior for 
  curve(exp(model$lprior.lambda(exp(x)))*exp(x),add=T,lty=3,lwd=2)
  legend("topright", legend=c("Posterior","Prior"),
         lwd=c(2,2), lty=c(1,3), col=c("red","black"),
         horiz=TRUE, bty="n",cex=1)
}
```



While the first posterior associated to penalty parameter for eduyrs
does not depart from its prior with a large prior mean (as
$\lambda_1 \sim {\cal G}(1,10^{-4})$ has prior mean $10^4$) and suggests
linearity, the posterior associated to $\nu_2=\log(\lambda_2)$ suggests a
non-linear effect of age by pointing to smaller values for the penalty
parameter.

## License

**ordgam**: Additive proportional odds model for ordinal data using
Laplace approximations and Bayesian P-splines. Copyright (C) 2022-2023
Philippe Lambert

This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along
with this program. If not, see .

## References

\[1\] Lambert, P. and Gressani, O (2023). Penalty parameter selection
and asymmetry corrections to Laplace approximations in Bayesian
P-splines models. *Statistical Modelling* [*doi:10.1177/1471082X231181173*](https://doi.org/10.1177/1471082X231181173). Preprint: [*arXiv:2210.01668*](https://arxiv.org/abs/2210.01668)

\[2\] Lambert, P. (2023) R-package *ordgam* - CRAN and GitHub:
[plambertULiege/ordgam](https://github.com/plambertULiege/ordgam)

Owner

Name: Philippe Lambert
Login: plambertULiege
Kind: user
Company: ULiege & UCLouvain

Website: http://www.statsoc.ulg.ac.be
Repositories: 1
Profile: https://github.com/plambertULiege

Professor at ULiege and UCLouvain (Belgium)

GitHub Events

Total

Last Year

Packages

Total packages: 1
Total downloads:
- cran 573 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

cran.r-project.org: ordgam

Additive Model for Ordinal Data using Laplace P-Splines

Homepage: <https://github.com/plambertULiege/ordgam>
Documentation: http://cran.r-project.org/web/packages/ordgam/ordgam.pdf
License: GPL-3
Latest release: 0.9.1
published almost 3 years ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 573 Last month

Rankings

Dependent packages count: 28.0%

Forks count: 28.1%

Stargazers count: 34.9%

Dependent repos count: 36.6%

Average: 42.7%

Downloads: 85.8%

Maintainers (1)

p.lambert@uliege.be

Last synced: 10 months ago

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science