19 June 2018 (Nijmegen)

Non Linearities

GLMMs are still linear models on the link scale, but we can also go truly non-linear ….

Generalized Additive Models (GAMs)

example from Baayen et al. (2017), see also Tremblay and Newman (2015)

summary(dfr123.gamTPRS) # red line, right plot
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lrt ~ s(trial, m = 2, bs = "tp", k = 20)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.812733   0.007279   798.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df     F p-value    
## s(trial) 14.42  16.75 49.37  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.566   Deviance explained = 57.6%
## fREML = -144.76  Scale est. = 0.033589  n = 634

Random Effect or Smooth? It's all a matter of perspective

help("random.effects",package = "mgcv")
random.effects R Documentation

Random effects in GAMs

Description

The smooth components of GAMs can be viewed as random effects for estimation purposes. This means that more conventional random effects terms can be incorporated into GAMs in two ways. The first method converts all the smooths into fixed and random components suitable for estimation by standard mixed modelling software. Once the GAM is in this form then conventional random effects are easily added, and the whole model is estimated as a general mixed model. gamm and gamm4 from the gamm4 package operate in this way.

The second method represents the conventional random effects in a GAM in the same way that the smooths are represented — as penalized regression terms. This method can be used with gam by making use of s(…,bs="re") terms in a model: see smooth.construct.re.smooth.spec, for full details. The basic idea is that, e.g., s(x,z,g,bs="re") generates an i.i.d. Gaussian random effect with model matrix given by model.matrix(~x:z:g-1) — in principle such terms can take any number of arguments. This simple approach is sufficient for implementing a wide range of commonly used random effect structures. For example if g is a factor then s(g,bs="re") produces a random coefficient for each level of g, with the random coefficients all modelled as i.i.d. normal. If g is a factor and x is numeric, then s(x,g,bs="re") produces an i.i.d. normal random slope relating the response to x for each level of g. If h is another factor then s(h,g,bs="re") produces the usual i.i.d. normal g - h interaction. Note that a rather useful approximate test for zero random effect is also implemented for tsuch terms based on Wood (2013). If the precision matrix is known to within a multiplicative constant, then this can be supplied via the xt argument of s. See smooth.construct.re.smooth.spec for details and example.

Alternatively, but less straightforwardly, the paraPen argument to gam can be used: see gam.models. If smoothing parameter estimation is by ML or REML (e.g. gam(…,method="REML")) then this approach is a completely conventional likelihood based treatment of random effects.

gam can be slow for fitting models with large numbers of random effects, because it does not exploit the sparcity that is often a feature of parametric random effects. It can not be used for models with more coefficients than data. However gam is often faster and more relaiable than gamm or gamm4, when the number of random effects is modest.

To facilitate the use of random effects with gam, gam.vcomp is a utility routine for converting smoothing parameters to variance components. It also provides confidence intervals, if smoothness estimation is by ML or REML.

Note that treating random effects as smooths does not remove the usual problems associated with testing variance components for equality to zero: see summary.gam and anova.gam.

Author(s)

Simon Wood <simon.wood@r-project.org>;

References

Wood, S.N. (2013) A simple test for random effects in regression models. Biometrika 100:1005-1010

Wood, S.N. (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B) 73(1):3-36

Wood, S.N. (2008) Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society (B) 70(3):495-518

Wood, S.N. (2006) Low rank scale invariant tensor product smooths for generalized additive mixed models. Biometrics 62(4):1025-1036

See Also

gam.vcomp, gam.models, smooth.terms, smooth.construct.re.smooth.spec, gamm

Examples

## see also examples for gam.models, gam.vcomp, gamm
## and smooth.construct.re.smooth.spec

## simple comparison of lme and gam
require(mgcv)
require(nlme)
b0 <- lme(travel~1,data=Rail,~1|Rail,method="REML") 

b <- gam(travel~s(Rail,bs="re"),data=Rail,method="REML")

intervals(b0)
gam.vcomp(b)
anova(b)
plot(b)

## simulate example...
dat <- gamSim(1,n=400,scale=2) ## simulate 4 term additive truth

fac <- sample(1:20,400,replace=TRUE)
b <- rnorm(20)*.5
dat$y <- dat$y + b[fac]
dat$fac <- as.factor(fac)

rm1 <- gam(y ~ s(fac,bs="re")+s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="ML")
gam.vcomp(rm1)

fv0 <- predict(rm1,exclude="s(fac)") ## predictions setting r.e. to 0
fv1 <- predict(rm1) ## predictions setting r.e. to predicted values

See also: mgcv::gam, mgcv::gamm, gamm4::gamm4

Ordinal Data, Likert Scales and Other Ratings

Stevens (1946)

Liddell and Kruschke (2017)

sometimes \(1 + 1 \neq 2\)

Ordinal Regression

ordinal::clmm

Robust Regression

increase your breakdown point

outliers need not be special

Quantile Regression

lqmm

Rank Regression

rlme (seems incomplete – missing predict() and fitted() methods)

Heavy-Tailed Likelihoods

heavy (seems incomplete – missing predict() and fitted() methods) see also Wilcox (2012)

Modern Estimators

##                      not.robust.modified robust.modified
## Coef                                                    
## (Intercept)          22.8 (0.792)        23 (0.911)     
##                                                         
## VarComp                                                 
## (Intercept) | plate  1.40                0.96           
## (Intercept) | sample 1.80                2.12           
##                                                         
## sigma                0.609               0.573          
##                                                         
## deviance             379

robustlmm see also Wilcox (2012)

Regularized Regression

deal with the Stein paradox

prioritize prediction

select variables

bust the model complexity myth

LASSO (\(L_1\) regularized) Regression

## 
## Attaching package: 'glmmLasso'
## The following objects are masked from 'package:brms':
## 
##     acat, cumulative
## Linear mixed-effects model fit by REML
##  Data: soccer 
##        AIC      BIC    logLik
##   389.1851 409.5367 -183.5925
## 
## Random effects:
##  Formula: ~1 + ave.attend | team
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev       Corr  
## (Intercept) 3.2974279970 (Intr)
## ave.attend  0.0006841471 0     
## Residual    8.6433247210       
## 
## Fixed effects: points ~ transfer.spendings + ave.unfair.score + ball.possession +      tackles + ave.attend + sold.out 
##                       Value Std.Error DF  t-value p-value
## (Intercept)        46.38287  1.378129 25 33.65641  0.0000
## transfer.spendings  3.81344  1.538115 25  2.47930  0.0203
## ave.unfair.score   -0.49914  1.342518 25 -0.37180  0.7132
## ball.possession     1.17845  2.136306 25  0.55163  0.5861
## tackles            -0.23876  1.943919 25 -0.12282  0.9032
## ave.attend          3.18495  1.679808 25  1.89602  0.0696
## sold.out            5.15003  1.656701 25  3.10860  0.0046
##  Correlation: 
##                    (Intr) trnsf. av.nf. bll.ps tackls av.ttn
## transfer.spendings  0.010                                   
## ave.unfair.score    0.002  0.167                            
## ball.possession     0.012 -0.082 -0.245                     
## tackles            -0.007 -0.021  0.152 -0.560              
## ave.attend          0.022 -0.157  0.181 -0.353 -0.020       
## sold.out           -0.005 -0.305  0.030 -0.200 -0.161  0.023
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.7927449 -0.6703846 -0.1164276  0.6025327  2.2850764 
## 
## Number of Observations: 54
## Number of Groups: 23
## Warning in est.glmmLasso.RE(fix = fix, rnd = rnd, data = data, lambda =
## lambda, : Random slopes are not standardized back!
## Call:
## glmmLasso(fix = points ~ transfer.spendings + ave.unfair.score + 
##     ball.possession + tackles + ave.attend + sold.out, rnd = list(team = ~1 + 
##     ave.attend), data = soccer, lambda = 10, control = list(index = c(1, 
##     2, 3, 4, NA, 5), method = "REML", print.iter = FALSE))
## 
## 
## Fixed Effects:
## 
## Coefficients:
##                     Estimate StdErr z.value p.value
## (Intercept)        42.613749     NA      NA      NA
## transfer.spendings  2.977187     NA      NA      NA
## ave.unfair.score   -0.086569     NA      NA      NA
## ball.possession     0.199322     NA      NA      NA
## tackles             0.000000     NA      NA      NA
## sold.out            5.014555     NA      NA      NA
## ave.attend          2.938222     NA      NA      NA
## 
## Random Effects:
## 
## StdDev:
##                       team team:ave.attend
## team             5.0000000      -0.1503555
## team:ave.attend -0.1503555       5.0000000

least absolute shrinkage and selection operator, see also LARS,"least-angle regression", glmmLasso

Other Penalties

  • Ridge Regression
    • \(L_2\) penalty
    • will not exactly zero-out variables
    • a special case of Tikhonov regularization
  • Elastic Net Regression
    • combine ridge and LASSO
    • dynamic mixture of penalties
    • best of both worlds

the "shape" of constraints

Hastie, Tibshirani, and Friedman (2009); James et al. (2013)

Advertising Break

See how to do all of these things from a single package with a Bayesian perspective with brms in the Bayes course!

And we'll also discuss the other Bayesian packages for MEM: MCMCglmm, rstanarm, INLA, blme, etc.

Extending PCA to Regression

select (linear combinations of) variables

eliminate collinearity

reduce dimensionality

Principle components regression (PCR)

  • transform your predictors into principle components
  • regress against these components
  • (optional) transform back
  • variance of the dependent variable plays no role in determing loadings

Partial Least Squares (PLS)

  • "simultaneous" principle components with regression
  • variance of the dependent variable plays a role in determining loadings

Don't Limit Yourself to Lines

Bibliography

Baayen, Harald, Shravan Vasishth, Reinhold Kliegl, and Douglas Bates. 2017. “The Cave of Shadows: Addressing the Human Factor with Generalized Additive Mixed Models.” Journal of Memory and Language 94 (June): 206–34. doi:10.1016/j.jml.2016.11.006.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning. Springer Verlag. http://statweb.stanford.edu~tibs/ElemStatLearn/.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning with Applications in R. Springer. doi:10.1007/978-1-4614-7138-7.

Liddell, Torrin, and John Kruschke. 2017. “Analyzing Ordinal Data with Metric Models: What Could Possibly Go Wrong?” Open Science Framework. doi:10.17605/osf.io/9h3et.

Stevens, S S. 1946. “On the Theory of Scales of Measurement.” Science 103 (2684): 677–80.

Tremblay, Antoine, and Aaron J. Newman. 2015. “Modeling Nonlinear Relationships in Erp Data Using Mixed-Effects Regression with R Examples.” Psychophysiology 52 (August): 124–39. doi:10.1111/psyp.12299.

Wilcox, Rand R. 2012. Introduction to Robust Estimation and Hypothesis Testing. 3. ed. Academic Press.