DEM 7283 - Splines and Generalized Additive Models

Splines
- piece-wise splines
Cubic splines
Regression splines
Using smooth terms in other models

In this example, we will describe how to use splines to model nonlinearities in predictor variables and the generalized additive model.

Splines

Splines are nothing more than a set of linear or non-linear functions that are tied together to construct a model for data.

piece-wise splines

In the simplest example, consider a regression function whose form changes at some point in the data. So if we see data like this:

b0<-1; b1=.5;b2=35
x<-seq(1:100)
y1<-rnorm(70,b0+b1*x[1:70], 1) 
y2<-rnorm(30, b2, 1)

y<-c(y1, y2)
plot(y~x, main="linear fit to nonlinear data")

we might say, yes, I can fit a linear model to that:

plot(y~x, main="linear fit to nonlinear data")
lin<-lm(y~x)
summary(lin)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5338 -2.3689  0.4286  2.2613  5.5451 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.59458    0.63262   7.263 9.14e-11 ***
## x            0.37634    0.01088  34.604  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.139 on 98 degrees of freedom
## Multiple R-squared:  0.9243, Adjusted R-squared:  0.9236 
## F-statistic:  1197 on 1 and 98 DF,  p-value: < 2.2e-16

lines(fitted(lin)~x,col=3)

but we would be sorely mistaken, as this model clearly doesn’t fit the data.

Instead, we can model the change in the underlying model structure by using a series of knots in the function, or points along the data in which the form of the function changes.

Mathematically, knots can be written as:

$$Y(x) = \[\begin{Bmatrix} F_1(x) \text { for } x\in [x_1, x_2]\\ F_2(x) \text { for } x\in [x_2, x_3]\\ \cdots \\ F_{k}(x) \text { for } x\in [x_{k-1}, x_k]\\ \end{Bmatrix}\]

Where each of the $F_k (x)$ functions imply a different form in the interval between $x\in [x_{k-1}, x_k]$, where the $k$ breaks are at a given knot in the data.

This would generate a model that looks like this:

sp<-(lm(y~x*I(x<71)+x*I(x>71)))
summary(sp)

## 
## Call:
## lm(formula = y ~ x * I(x < 71) + x * I(x > 71))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.09929 -0.73705 -0.01578  0.74149  2.18256 
## 
## Coefficients: (1 not defined because of singularities)
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      32.28812    1.84687  17.483   <2e-16 ***
## x                 0.02251    0.02196   1.025    0.308    
## I(x < 71)TRUE   -31.34488    1.86228 -16.831   <2e-16 ***
## I(x > 71)TRUE     0.71688    1.05903   0.677    0.500    
## x:I(x < 71)TRUE   0.47920    0.02273  21.082   <2e-16 ***
## x:I(x > 71)TRUE        NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9896 on 95 degrees of freedom
## Multiple R-squared:  0.9927, Adjusted R-squared:  0.9924 
## F-statistic:  3236 on 4 and 95 DF,  p-value: < 2.2e-16

plot(y~x,main="Piece-wise linear fit to nonlinear data, knot at x=70")
lines(fitted(sp)~x,col=2)

Of course, this can be much more complicated, with multiple knots in the data, to model multiple changes in the underlying data generating fucntion

n <- 100 # number of data points
t <- seq(0,2*pi,,100)
a <- 3
b <- 2
c.norm <- rnorm(n)
amp <- 1

# generate data and calculate "y"
set.seed(1)
y2 <- a*sin(b*t)+c.norm*amp # Gaussian/normal error

# plot results
plot(t, y2, t="l", ylim=range(y2)*c(1,1.2), main="Sine curve with gaussian noise")

plot(t, y2, t="l", ylim=range(y2)*c(1,1.2), main="Terrible linear model")

abline(lm(y2~t), col=3)

Splines are often characterized by the number of knots used to build them

library(splines)
library(mgcv)

## Loading required package: nlme

## This is mgcv 1.8-28. For overview type 'help("mgcv-package")'.

plot(t, y2, t="l", ylim=range(y2)*c(1,1.2), main="Increasingly complex splines")
fit<-spline(y2,t)
lines(spline(t,y2, n = 4),col="darkgreen",lwd=2,type="l")
lines(spline(t,y2, n = 10),col="red",lwd=2,type="l")
lines(spline(t,y2, n = 20),col="blue",lwd=2,type="l")
lines(spline(t,y2, n = 30),col="orange",lwd=2,type="l")

Cubic splines

The splines above were piecewise linear, we can also fit piecewise cubic splines, where at each knot a cubic polynomial is used. In the plot below, we can see that by increasing the number of knots, we better approximate the nonlinearities in the data.

plot(t, y2, t="l", ylim=range(y2)*c(1,1.2), main="Increasingly complex splines")

fit<-smooth.spline(t, y2, df=2)
fit5<-smooth.spline(t, y2, df=5)
fit10<-smooth.spline(t, y2, df=10)
lines(fit, col="red")
lines(fit5, col="green")
lines(fit10, col="blue")

Regression splines

The generalized additive model (GAM) is a modeling framework that forms a linear predictor for a regression function through the combination of both smooth and linear terms.

The GAM model forms the linear predctor of a GLM as:

\[E(y)= \beta_0 + f(x_1) + \beta_1 x_2\]

where the $f(x_1)$ term is a regression spline of one of the variables. The models can be a mixture of linear and smooth terms.

library(mgcv) #one library for GAMs
library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:nlme':
## 
##     collapse

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

dat<-data.frame(y=y2, x=t)

gamfit<-gam(y~s(x), data=dat)
summary(gamfit)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(x)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.10273    0.09228   1.113    0.269
## 
## Approximate significance of smooth terms:
##       edf Ref.df    F p-value    
## s(x) 7.92  8.694 54.4  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.829   Deviance explained = 84.3%
## GCV = 0.93506  Scale est. = 0.85165   n = 100

plot(gamfit)

dat%>%
  ggplot(aes(x=x, y=y))+geom_point()+geom_smooth(method = "gam", formula = y ~s(x))

BRFSS DATA

load("~/Google Drive/classes/dem7283/class_19_7283/data/brfss16_mmsa.Rdata")
set.seed(12345)
#samps<-sample(1:nrow(brfss16m), size = 40000, replace=F)
#brfss16m<-brfss16m[samps,]
#The names in the data are very ugly, so I make them less ugly
nams<-names(brfss16m)
#we see some names are lower case, some are upper and some have a little _ in the first position. This is a nightmare.
newnames<-gsub(pattern = "_",replacement =  "",x =  nams)
names(brfss16m)<-tolower(newnames)

Recode variables

library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

library(dplyr)

#nice MSA name
brfss16m$mmsa_name<-substr(brfss16m$mmsaname, 1,nchar(brfss16m$mmsaname)-31)

#sex
brfss16m$male<-ifelse(brfss16m$sex==1, 1, 0)

#BMI
brfss16m$bmi<-ifelse(is.na(brfss16m$bmi5)==T, NA, brfss16m$bmi5/100)

#Healthy days
brfss16m$healthdays<-Recode(brfss16m$physhlth, recodes = "88=0; 77=NA; 99=NA")

#Healthy mental health days
brfss16m$healthmdays<-Recode(brfss16m$menthlth, recodes = "88=0; 77=NA; 99=NA")

brfss16m$badhealth<-Recode(brfss16m$genhlth, recodes="4:5=1; 1:3=0; else=NA")
#race/ethnicity
brfss16m$black<-Recode(brfss16m$racegr3, recodes="2=1; 9=NA; else=0")
brfss16m$white<-Recode(brfss16m$racegr3, recodes="1=1; 9=NA; else=0")
brfss16m$other<-Recode(brfss16m$racegr3, recodes="3:4=1; 9=NA; else=0")
brfss16m$hispanic<-Recode(brfss16m$racegr3, recodes="5=1; 9=NA; else=0")

brfss16m$race_eth<-Recode(brfss16m$racegr3, 
recodes="1='nhwhite'; 2='nh black'; 3='nh other';4='nh multirace'; 5='hispanic'; else=NA",
as.factor = T)
brfss16m$race_eth<-relevel(brfss16m$race_eth, ref = "nhwhite")

#insurance
brfss16m$ins<-Recode(brfss16m$hlthpln1, recodes ="7:9=NA; 1=1;2=0")

#income grouping
brfss16m$inc<-Recode(brfss16m$incomg, recodes = "9= NA;1='1_lt15k'; 2='2_15-25k';3='3_25-35k';4='4_35-50k';5='5_50kplus'", as.factor = T)
brfss16m$inc<-as.ordered(brfss16m$inc)
#education level
brfss16m$educ<-Recode(brfss16m$educa,
recodes="1:2='0Prim'; 3='1somehs'; 4='2hsgrad'; 5='3somecol'; 6='4colgrad';9=NA",
as.factor=T)
brfss16m$educ<-relevel(brfss16m$educ, ref='2hsgrad')

#employment
brfss16m$employ<-Recode(brfss16m$employ1,
recodes="1:2='employloyed'; 2:6='nilf'; 7='retired'; 8='unable'; else=NA",
as.factor=T)
brfss16m$employ<-relevel(brfss16m$employ, ref='employloyed')

#marital status
brfss16m$marst<-Recode(brfss16m$marital,
recodes="1='married'; 2='divorced'; 3='widowed'; 4='separated'; 5='nm';6='cohab'; else=NA",
as.factor=T)
brfss16m$marst<-relevel(brfss16m$marst, ref='married')

#Age cut into intervals
brfss16m$agec<-cut(brfss16m$age80, breaks=c(0,24,39,59,79,99))

#BMI, in the brfss16ma the bmi variable has 2 implied decimal places,
#so we must divide by 100 to get real bmi's

brfss16m$bmi<-brfss16m$bmi5/100

#smoking currently
brfss16m$smoke<-Recode(brfss16m$smoker3, 
recodes="1:2=1; 3:4=0; else=NA")
#brfss16m$smoke<-relevel(brfss16m$smoke, ref = "NeverSmoked")

brfss16m$obese<-ifelse(is.na(brfss16m$bmi)==T, NA, 
                       ifelse(brfss16m$bmi>30,1,0))

library(tidycensus)
library(dplyr)
usacs<-get_acs(geography = "metropolitan statistical area/micropolitan statistical area", year = 2011,
                variables=c( "DP05_0001E", "DP03_0009P", "DP03_0062E", "DP03_0119PE",
                             "DP05_0001E","DP02_0009PE","DP02_0008PE", "DP02_0040E","DP02_0038E",
                             "DP02_0066PE","DP02_0067PE","DP02_0080PE","DP02_0092PE",
                             "DP03_0005PE","DP03_0028PE","DP03_0062E","DP03_0099PE","DP03_0101PE",
                             "DP03_0119PE","DP04_0046PE","DP05_0072PE","DP05_0073PE",
                             "DP05_0066PE","DP05_0033","DP05_0032", "DP05_0072PE", "DP02_0113PE",  "DP04_0003PE") ,
                summary_var = "B01001_001",
                geometry = F, output = "wide")

## Getting data from the 2007-2011 5-year ACS

## Using the ACS Data Profile
## Using the ACS Data Profile

usacs<-usacs%>%
  mutate(totpop= DP05_0001E, fertrate = DP02_0040E,pwhite=DP05_0072PE/100, nwhite=DP05_0032E,nblack=DP05_0033E,
         pblack=DP05_0073PE/100 , phisp=DP05_0066PE/100, pfemhh=DP02_0008PE/100,
         phsormore=DP02_0066PE/100,punemp=DP03_0009PE/100, medhhinc=DP03_0062E,
         ppov=DP03_0119PE/100, pforn=DP02_0092PE/100,plep=DP02_0113PE/100,  pvacant=DP04_0003PE/100)%>%
  select(GEOID,NAME, totpop, pwhite, pblack, phisp, pfemhh, phsormore, punemp, medhhinc, ppov, pforn,plep, pvacant)


head(usacs)

Clean and standardize contextual data

myscale<-function(x){as.numeric(scale(x))}
usacs<-usacs %>%
  mutate_at(c("ppov","punemp", "pblack","phisp", "pfemhh", "phsormore", "medhhinc", "pvacant"),myscale)

usacs<-usacs%>%filter(complete.cases(.))

merged<-merge(x=brfss16m,
              y=usacs,
              by.x="mmsa",
              by.y="GEOID",
              all.x=F)

merged<-merged%>%
  select( mmsa, ststr, agec,bmi,healthmdays, age80, educ, black, hispanic, other,smoke,obese, badhealth,pblack,phisp, medhhinc,ppov,  male, mmsawt, mmsa_name )%>%
  filter(complete.cases(.))

Here is a GAM fit to the BRFSS, using smooth terms for age, bmi and number of poor mental health days

brfgam<-gam(badhealth~s(age80)+s(bmi)+s(healthmdays)+educ+male+black+hispanic+other, data=merged, weights = mmsawt/mean(mmsawt, na.rm=T), family=binomial)
summary(brfgam)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## badhealth ~ s(age80) + s(bmi) + s(healthmdays) + educ + male + 
##     black + hispanic + other
## 
## Parametric coefficients:
##              Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)  -1.65882    0.01641 -101.112  < 2e-16 ***
## educ0Prim     1.10644    0.03303   33.494  < 2e-16 ***
## educ1somehs   0.66010    0.02428   27.183  < 2e-16 ***
## educ3somecol -0.33752    0.01836  -18.382  < 2e-16 ***
## educ4colgrad -1.01834    0.02280  -44.671  < 2e-16 ***
## male          0.09580    0.01505    6.363 1.98e-10 ***
## black         0.37279    0.02125   17.541  < 2e-16 ***
## hispanic      0.57222    0.02145   26.677  < 2e-16 ***
## other         0.38646    0.03292   11.738  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                  edf Ref.df Chi.sq p-value    
## s(age80)       6.364  7.503   5585  <2e-16 ***
## s(bmi)         7.563  8.360   3067  <2e-16 ***
## s(healthmdays) 8.805  8.984  10047  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.194   Deviance explained = 19.1%
## UBRE = -0.27536  Scale est. = 1         n = 168050

plot(brfgam)

Versus the linear terms

brfgam_l<-gam(badhealth~age80+bmi+healthmdays+educ+male+black+hispanic+other, data=merged, weights = mmsawt/mean(mmsawt, na.rm=T), family=binomial)

summary(brfgam_l)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## badhealth ~ age80 + bmi + healthmdays + educ + male + black + 
##     hispanic + other
## 
## Parametric coefficients:
##                Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)  -5.1136949  0.0445867 -114.691   <2e-16 ***
## age80         0.0310745  0.0004443   69.943   <2e-16 ***
## bmi           0.0550070  0.0010734   51.248   <2e-16 ***
## healthmdays   0.0747987  0.0007573   98.777   <2e-16 ***
## educ0Prim     1.0924998  0.0326516   33.459   <2e-16 ***
## educ1somehs   0.6625019  0.0241257   27.460   <2e-16 ***
## educ3somecol -0.3322283  0.0182259  -18.228   <2e-16 ***
## educ4colgrad -1.0036175  0.0225264  -44.553   <2e-16 ***
## male          0.0183525  0.0147303    1.246    0.213    
## black         0.3670013  0.0210577   17.428   <2e-16 ***
## hispanic      0.5187349  0.0212329   24.431   <2e-16 ***
## other         0.3922489  0.0328170   11.953   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## R-sq.(adj) =  0.184   Deviance explained = 18.2%
## UBRE = -0.26773  Scale est. = 1         n = 168050

LRT for the difference in model fits

anova( brfgam_l, brfgam, test="Chisq")

Using smooth terms in other models

The regression splines can also be used in any generalized linear model using functions from the splines library.

The bs() function is a B-spline basis, and the ns() is the natural spline function. The natural splines often behave better at the boundaries of the data.

library(lme4)

## Loading required package: Matrix

## 
## Attaching package: 'lme4'

## The following object is masked from 'package:nlme':
## 
##     lmList

samp<-sample(1:dim(merged)[1], size = 10000)
fit_glmm_bs<-glmer(badhealth ~ bs(scale(age80))+male+black+hispanic+other+(1|mmsa_name),
                  data=merged[samp,],
                  family=binomial,
                  weights=mmsawt/mean(mmsawt),
                  control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=1e9)))

## Warning in eval(family$initialize, rho): non-integer #successes in a
## binomial glm!

fit_glmm<-glmer(badhealth ~ scale(age80)+male+black+hispanic+other+(1|mmsa_name),
                data=merged[samp,],
                family=binomial, 
                weights=mmsawt/mean(mmsawt),
                control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=1e9)))

## Warning in eval(family$initialize, rho): non-integer #successes in a
## binomial glm!

fit_glmm_ML<-glmer(badhealth ~ bs(scale(age80))+male+black+hispanic+other+bs(medhhinc)+(1|mmsa_name),
                data=merged[samp,],
                family=binomial, 
                weights=mmsawt/mean(mmsawt),
                control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=1e9)))

## Warning in eval(family$initialize, rho): non-integer #successes in a
## binomial glm!

summary(fit_glmm)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: badhealth ~ scale(age80) + male + black + hispanic + other +  
##     (1 | mmsa_name)
##    Data: merged[samp, ]
## Weights: mmsawt/mean(mmsawt)
## Control: 
## glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+09))
## 
##      AIC      BIC   logLik deviance df.resid 
##   7562.9   7613.3  -3774.4   7548.9     9993 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8617 -0.3972 -0.2430 -0.1210 10.3918 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  mmsa_name (Intercept) 0.18     0.4242  
## Number of obs: 10000, groups:  mmsa_name, 122
## 
## Fixed effects:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.787519   0.069669 -25.657   <2e-16 ***
## scale(age80)  0.412074   0.030026  13.724   <2e-16 ***
## male          0.004205   0.059282   0.071   0.9435    
## black         0.829131   0.082670  10.029   <2e-16 ***
## hispanic      1.184666   0.082449  14.368   <2e-16 ***
## other         0.263457   0.134076   1.965   0.0494 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) sc(80) male   black  hispnc
## scale(ag80)  0.052                            
## male        -0.432  0.050                     
## black       -0.250  0.134 -0.009              
## hispanic    -0.250  0.248 -0.001  0.285       
## other       -0.169  0.093 -0.005  0.162  0.197

summary(fit_glmm_bs)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: badhealth ~ bs(scale(age80)) + male + black + hispanic + other +  
##     (1 | mmsa_name)
##    Data: merged[samp, ]
## Weights: mmsawt/mean(mmsawt)
## Control: 
## glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+09))
## 
##      AIC      BIC   logLik deviance df.resid 
##   7564.0   7628.9  -3773.0   7546.0     9991 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7877 -0.3975 -0.2429 -0.1218 10.5942 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  mmsa_name (Intercept) 0.1806   0.425   
## Number of obs: 10000, groups:  mmsa_name, 122
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -2.634830   0.132319 -19.913  < 2e-16 ***
## bs(scale(age80))1  0.211551   0.322998   0.655   0.5125    
## bs(scale(age80))2  1.304305   0.202530   6.440 1.19e-10 ***
## bs(scale(age80))3  1.292442   0.174433   7.409 1.27e-13 ***
## male              -0.001442   0.059390  -0.024   0.9806    
## black              0.830534   0.082951  10.012  < 2e-16 ***
## hispanic           1.181008   0.082552  14.306  < 2e-16 ***
## other              0.257212   0.134117   1.918   0.0551 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) b((80))1 b((80))2 b((80))3 male   black  hispnc
## bs(sc(80))1 -0.732                                                
## bs(sc(80))2 -0.055 -0.361                                         
## bs(sc(80))3 -0.713  0.777   -0.338                                
## male        -0.248  0.018   -0.037    0.058                       
## black       -0.137 -0.063    0.051    0.037   -0.008              
## hispanic    -0.245  0.025    0.065    0.161    0.000  0.284       
## other       -0.138  0.022    0.008    0.073   -0.003  0.161  0.197

summary(fit_glmm_ML)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: badhealth ~ bs(scale(age80)) + male + black + hispanic + other +  
##     bs(medhhinc) + (1 | mmsa_name)
##    Data: merged[samp, ]
## Weights: mmsawt/mean(mmsawt)
## Control: 
## glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+09))
## 
##      AIC      BIC   logLik deviance df.resid 
##   7559.3   7645.8  -3767.6   7535.3     9988 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7586 -0.3981 -0.2415 -0.1214 10.5142 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  mmsa_name (Intercept) 0.1615   0.4019  
## Number of obs: 10000, groups:  mmsa_name, 122
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -2.091629   0.292201  -7.158 8.18e-13 ***
## bs(scale(age80))1  0.226036   0.323236   0.699   0.4844    
## bs(scale(age80))2  1.308335   0.202635   6.457 1.07e-10 ***
## bs(scale(age80))3  1.293460   0.174711   7.403 1.33e-13 ***
## male              -0.001209   0.059418  -0.020   0.9838    
## black              0.833798   0.082911  10.057  < 2e-16 ***
## hispanic           1.177991   0.082517  14.276  < 2e-16 ***
## other              0.266734   0.135113   1.974   0.0484 *  
## bs(medhhinc)1     -0.809058   0.771020  -1.049   0.2940    
## bs(medhhinc)2     -1.117894   0.610475  -1.831   0.0671 .  
## bs(medhhinc)3     -0.670659   0.564093  -1.189   0.2345    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) b((80))1 b((80))2 b((80))3 male   black  hispnc other 
## bs(sc(80))1 -0.321                                                       
## bs(sc(80))2 -0.012 -0.360                                                
## bs(sc(80))3 -0.304  0.777   -0.338                                       
## male        -0.117  0.020   -0.037    0.060                              
## black       -0.044 -0.063    0.051    0.038   -0.008                     
## hispanic    -0.116  0.026    0.063    0.161    0.001  0.284              
## other       -0.041  0.022    0.006    0.077   -0.001  0.161  0.198       
## bs(mdhhnc)1 -0.840 -0.010   -0.011   -0.025    0.006 -0.016  0.009 -0.024
## bs(mdhhnc)2  0.218 -0.003   -0.008    0.024   -0.005 -0.005 -0.008  0.039
## bs(mdhhnc)3 -0.481 -0.005    0.004   -0.038   -0.009 -0.013 -0.018 -0.115
##             bs(m)1 bs(m)2
## bs(sc(80))1              
## bs(sc(80))2              
## bs(sc(80))3              
## male                     
## black                    
## hispanic                 
## other                    
## bs(mdhhnc)1              
## bs(mdhhnc)2 -0.552       
## bs(mdhhnc)3  0.555 -0.434

anova(fit_glmm, fit_glmm_bs, fit_glmm_ML)

You can also fit GLMM’s within the GAM framework

library(gamm4)

gam_mm<-gamm4(badhealth ~ s(scale(age80))+male+black+hispanic+other,
              data=merged[samp,],
              random=~(1|mmsa_name),
              family = binomial)


summary(gam_mm$mer)
summary(gam_mm$gam)