This example will illustrate how to fit the extended Cox Proportional hazards model with Gaussian frailty to continuous duration data (i.e. person-level data) and a discrete-time (longitudinal) data set. In this example, I will use the event of a child dying before age 5. The data for this example come from the model.data Demographic and Health Survey for 2012 birth history recode file. This file contains information for all births to women in the survey.

The longitudinal data example uses data from the ECLS-K. Specifically, we will examine the transition into poverty between kindergarten and third grade.

#Load required libraries
library(foreign)
library(survival)
library(car)
library(survey)
library(coxme)
library(knitr)
library(lme4)
opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=TRUE)
#load the data
model.dat<-haven::read_dta("~/Google Drive/classes/dem7223/dem7223_19/data/zzbr62dt/ZZBR62FL.DTA")
#We form a subset of variables
sub<-data.frame(CASEID=model.dat$caseid,v008=model.dat$v008,bord=model.dat$bidx,
                csex=model.dat$b4,b2=model.dat$b2, b3=model.dat$b3, b5=model.dat$b5,
                b7=model.dat$b7, ibint=model.dat$b11, rural=model.dat$v025,
                educ=model.dat$v106,age=model.dat$v012,partneredu=model.dat$v701,
                partnerage=model.dat$v730, hhses=model.dat$v190,
                weight=model.dat$v005/1000000, psu=model.dat$v021, 
                strata=model.dat$v022, region=model.dat$v023)

sub$death.age<-ifelse(sub$b5==1,
                          ((((sub$v008))+1900)-(((sub$b3))+1900)) 
                          ,sub$b7)

#censoring indicator for death by age 5, in months (<=60 months)
sub$d.event<-ifelse(is.na(sub$b7)==T|sub$b7>60,0,1)
sub$d.eventfac<-factor(sub$d.event);
levels(sub$d.eventfac)<-c("Alive at Age 5", "Dead by Age 5")
table(sub$d.eventfac)
## 
## Alive at Age 5  Dead by Age 5 
##          19224           4442
#recodes
sub$male<-ifelse(sub$csex==1,1,0)
sub$educ.high<-ifelse(sub$educ %in% c(2,3), 1, 0)
sub$age2<-sub$age^2
sub$partnerhiedu<-ifelse(sub$partneredu<3,0,
                         ifelse(sub$partneredu%in%c(8,9),NA,1 ))

Fit the ordinary Cox model

Here I fit the ordinary Cox model without frailty, just for comparison sake.

#using coxph in survival library
fit.cox2<-coxph(Surv(death.age, d.event)~bord+male+educ.high+I(age/5)+I(hhses>3),
                data=sub, weights=weight)
summary(fit.cox2)
## Call:
## coxph(formula = Surv(death.age, d.event) ~ bord + male + educ.high + 
##     I(age/5) + I(hhses > 3), data = sub, weights = weight)
## 
##   n= 23666, number of events= 4442 
## 
##                       coef exp(coef)  se(coef)      z Pr(>|z|)    
## bord              0.165766  1.180297  0.007043 23.536  < 2e-16 ***
## male              0.075614  1.078547  0.030602  2.471   0.0135 *  
## educ.high        -0.043896  0.957053  0.053125 -0.826   0.4086    
## I(age/5)         -0.057113  0.944488  0.010958 -5.212 1.87e-07 ***
## I(hhses > 3)TRUE -0.184026  0.831914  0.035336 -5.208 1.91e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                  exp(coef) exp(-coef) lower .95 upper .95
## bord                1.1803     0.8472    1.1641    1.1967
## male                1.0785     0.9272    1.0158    1.1452
## educ.high           0.9571     1.0449    0.8624    1.0621
## I(age/5)            0.9445     1.0588    0.9244    0.9650
## I(hhses > 3)TRUE    0.8319     1.2020    0.7762    0.8916
## 
## Concordance= 0.607  (se = 0.005 )
## Likelihood ratio test= 623.2  on 5 df,   p=<2e-16
## Wald test            = 679.2  on 5 df,   p=<2e-16
## Score (logrank) test = 690.1  on 5 df,   p=<2e-16
plot(survfit(fit.cox2),  ylim=c(.8,1), xlim=c(0,60),
     ylab="S(t)", xlab="Age in Months")
title(main="Survival Function for Child Mortality")

Fit the Cox model with frailty at the regional level

The coxme() function in the coxme library Link will fit the Cox model with shared frailty, assuming a Gaussian frailty term. The model would look like:

\(h_j (t) = h_{0j} e^{(x'\beta + u_j)}\)

where

\(u_j \sim N(0, \sigma^2)\)

is a Normally distributed random effect, identical for each person in the jth group. This term raises or lowers the average hazard function the same way for each person within each group, but allows the overall risk for people in different groups to be different. This would be considered to be a random intercept model, if we were considering an ordinary linear or genearlized linear model.

fit.cox.f<-coxme(Surv(death.age, d.event)~bord+male+educ.high+I(age/5)+I(hhses>3)+(1|region),
                 data=sub, weights=weight)
summary(fit.cox.f)
## Cox mixed-effects model fit by maximum likelihood
##   Data: sub
##   events, n = 4442, 23666
##   Iterations= 10 53 
##                     NULL Integrated    Fitted
## Log-likelihood -42496.25  -42168.52 -42160.02
## 
##                    Chisq    df p    AIC    BIC
## Integrated loglik 655.46  6.00 0 643.46 605.06
##  Penalized loglik 672.45 10.29 0 651.87 586.03
## 
## Model:  Surv(death.age, d.event) ~ bord + male + educ.high + I(age/5) +      I(hhses > 3) + (1 | region) 
## Fixed coefficients
##                         coef exp(coef)   se(coef)     z       p
## bord              0.16407385 1.1783013 0.00704951 23.27 0.0e+00
## male              0.07622015 1.0792001 0.03060558  2.49 1.3e-02
## educ.high        -0.05391452 0.9475131 0.05376093 -1.00 3.2e-01
## I(age/5)         -0.05596300 0.9455741 0.01096637 -5.10 3.3e-07
## I(hhses > 3)TRUE -0.19942922 0.8191982 0.04246931 -4.70 2.7e-06
## 
## Random effects
##  Group  Variable  Std Dev    Variance  
##  region Intercept 0.10319525 0.01064926

This gives us the variance in child mortality by region, which is honestly pretty substantial. We can use a likelihood ratio test to see if the frailty model is significantly better than the ordinary Cox model:

anova(fit.cox.f, fit.cox2)
## Analysis of Deviance Table
##  Cox model: response is  Surv(death.age, d.event)
##  Model 1: ~bord + male + educ.high + I(age/5) + I(hhses > 3) + (1 | region)
##  Model 2: ~bord + male + educ.high + I(age/5) + I(hhses > 3)
##   loglik  Chisq Df P(>|Chi|)    
## 1 -42169                        
## 2 -42185 32.242  1 1.361e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(fit.cox.f)
## [1] 84340.63
AIC(fit.cox2)
## [1] 84379.28

Which it is, and this is supported by the AIC difference between the two models of 38.66 points.

So, what are the frailties in this case? We can get those from the frail portion of the model structure:

fit.cox.f$frail
## $region
##            1            2            3            4            5 
## -0.060602442 -0.160882759  0.001678816  0.013684448 -0.048005072 
##            6            7            8 
##  0.074802258  0.133004772  0.046319979
hist(fit.cox.f$frail$region)

Which shows the region region.7 has the highest frailty, which means that the average level of childhood mortality is highest in that region, while the lowest frailty is in region.2 .

Random slopes

If we were interested in whether a predictor variable had heterogeneous effects across the various groups within our data, we could include that in our model as well, and we would have effectively a random slope model:

\(h_j (t) = h_{0j} e^{(x'\beta + u_j+\gamma_j 'x)}\)

where \(\gamma_j\) is a group-specific effect of a particular predictor variable, and these two random effects will be distributed as:

\[ \left[\begin{array} {rrr} u_j \\ \gamma_j \end{array}\right] \sim \text{MVN}(0, \Sigma) \]

#See if higher birth order children face equal disadvantage in all regions
fit.cox.f2<-coxme(Surv(death.age, d.event)~bord+male+educ.high+I(age/5)+I(hhses>3)+(1+bord|region),
                  data=sub, weights=weight)
summary(fit.cox.f2)
## Cox mixed-effects model fit by maximum likelihood
##   Data: sub
##   events, n = 4442, 23666
##   Iterations= 10 53 
##                     NULL Integrated    Fitted
## Log-likelihood -42496.25  -42168.24 -42156.91
## 
##                    Chisq    df p    AIC    BIC
## Integrated loglik 656.01  8.00 0 640.01 588.82
##  Penalized loglik 678.68 12.38 0 653.92 574.73
## 
## Model:  Surv(death.age, d.event) ~ bord + male + educ.high + I(age/5) +      I(hhses > 3) + (1 + bord | region) 
## Fixed coefficients
##                         coef exp(coef)    se(coef)     z       p
## bord              0.16344882 1.1775651 0.008861874 18.44 0.0e+00
## male              0.07631351 1.0793009 0.030611447  2.49 1.3e-02
## educ.high        -0.05409460 0.9473425 0.053852298 -1.00 3.2e-01
## I(age/5)         -0.05580822 0.9457205 0.010967384 -5.09 3.6e-07
## I(hhses > 3)TRUE -0.19967351 0.8189981 0.042493875 -4.70 2.6e-06
## 
## Random effects
##  Group  Variable  Std Dev       Variance      Corr         
##  region Intercept  0.1162372113  0.0135110893 -0.4586696457
##         bord       0.0129927062  0.0001688104
anova(fit.cox.f, fit.cox.f2)
## Analysis of Deviance Table
##  Cox model: response is  Surv(death.age, d.event)
##  Model 1: ~bord + male + educ.high + I(age/5) + I(hhses > 3) + (1 | region)
##  Model 2: ~bord + male + educ.high + I(age/5) + I(hhses > 3) + (1 + bord |      region)
##   loglik Chisq Df P(>|Chi|)
## 1 -42169                   
## 2 -42168 0.559  2    0.7562

And it looks like there is not significant variation in this effect, because the model with the additional term does not fit the data significantly better than the model with only the random “intercept”.

Using Longitudinal Data

As in the other examples, I illustrate fitting these models to data that are longitudinal, instead of person-duration. In this example, we will examine how to fit the Cox model to a longitudinally collected data set.

First we load our data First we load our data

load("~/Google Drive/classes/dem7223/dem7223_19/data/eclsk_k5.Rdata")
names(eclskk5)<-tolower(names(eclskk5))
#get out only the variables I'm going to use for this example
myvars<-c( "childid","x_chsex_r", "x_raceth_r", "x1kage_r","x4age", "x5age", "x6age", "x7age", "x2povty","x4povty_i", "x6povty_i", "x8povty_i","x12par1ed_i", "s2_id", "w6c6p_6psu", "w6c6p_6str", "w6c6p_20")
eclskk5<-eclskk5[,myvars]


eclskk5$age1<-ifelse(eclskk5$x1kage_r==-9, NA, eclskk5$x1kage_r/12)
eclskk5$age2<-ifelse(eclskk5$x4age==-9, NA, eclskk5$x4age/12)
#for the later waves, the NCES group the ages into ranges of months, so 1= <105 months, 2=105 to 108 months. So, I fix the age at the midpoint of the interval they give, and make it into years by dividing by 12
eclskk5$age3<-ifelse(eclskk5$x5age==-9, NA, eclskk5$x5age/12)

eclskk5$pov1<-ifelse(eclskk5$x2povty==1,1,0)
eclskk5$pov2<-ifelse(eclskk5$x4povty_i==1,1,0)
eclskk5$pov3<-ifelse(eclskk5$x6povty_i==1,1,0)

#Recode race with white, non Hispanic as reference using dummy vars
eclskk5$race_rec<-Recode (eclskk5$x_raceth_r, recodes="1 = 'nhwhite';2='nhblack';3:4='hispanic';5='nhasian'; 6:8='other';-9=NA", as.factor = T)
eclskk5$race_rec<-relevel(eclskk5$race_rec, ref = "nhwhite")
eclskk5$male<-Recode(eclskk5$x_chsex_r, recodes="1=1; 2=0; -9=NA")
eclskk5$mlths<-Recode(eclskk5$x12par1ed_i, recodes = "1:2=1; 3:9=0; else = NA")
eclskk5$mgths<-Recode(eclskk5$x12par1ed_i, recodes = "1:3=0; 4:9=1; else =NA") 

Now, I need to form the transition variable, this is my event variable, and in this case it will be 1 if a child enters poverty between the first wave of the data and the third grade wave, and 0 otherwise.

NOTE I need to remove any children who are already in poverty age wave 1, because they are not at risk of experiencing this particular transition. Again, this is called forming the risk set

eclskk5<-subset(eclskk5, is.na(pov1)==F&is.na(pov2)==F&is.na(pov3)==F&is.na(age1)==F&is.na(age2)==F&is.na(age3)==F&pov1!=1)

Now we do the entire data set. To analyze data longitudinally, we need to reshape the data from the current “wide” format (repeated measures in columns) to a “long” format (repeated observations in rows). The reshape() function allows us to do this easily. It allows us to specify our repeated measures, time varying covariates as well as time-constant covariates.

e.long<-reshape(data.frame(eclskk5), idvar="childid", varying=list(c("age1","age2"),
                                                     c("age2", "age3")),
                v.names=c("age_enter", "age_exit"),
                times=1:2, direction="long" )
e.long<-e.long[order(e.long$childid, e.long$time),]

e.long$povtran<-NA

e.long$povtran[e.long$pov1==0&e.long$pov2==1&e.long$time==1]<-1
e.long$povtran[e.long$pov2==0&e.long$pov3==1&e.long$time==2]<-1

e.long$povtran[e.long$pov1==0&e.long$pov2==0&e.long$time==1]<-0
e.long$povtran[e.long$pov2==0&e.long$pov3==0&e.long$time==2]<-0

#find which kids failed in earlier time periods and remove them from the second & third period risk set
failed1<-which(is.na(e.long$povtran)==T)
e.long<-e.long[-failed1,]


e.long$age1r<-round(e.long$age_enter, 0)
e.long$age2r<-round(e.long$age_exit, 0)
e.long$time_start<-e.long$time-1
head(e.long[, c("childid","time_start" , "time", "age_enter", "age_exit", "pov1", "pov2", "pov3","povtran", "mlths")], n=10)
##             childid time_start time age_enter age_exit pov1 pov2 pov3
## 10000014.1 10000014          0    1  5.651667 7.161667    0    0    0
## 10000014.2 10000014          1    2  7.161667 7.644167    0    0    0
## 10000020.1 10000020          0    1  5.698333 7.380833    0    0    0
## 10000020.2 10000020          1    2  7.380833 7.780833    0    0    0
## 10000022.1 10000022          0    1  5.717500 7.306667    0    0    0
## 10000022.2 10000022          1    2  7.306667 7.748333    0    0    0
## 10000029.1 10000029          0    1  5.783333 7.238333    0    0    0
## 10000029.2 10000029          1    2  7.238333 7.723333    0    0    0
## 10000034.1 10000034          0    1  6.353333 7.775000    0    0    1
## 10000034.2 10000034          1    2  7.775000 8.295833    0    0    1
##            povtran mlths
## 10000014.1       0     0
## 10000014.2       0     0
## 10000020.1       0     0
## 10000020.2       0     0
## 10000022.1       0     0
## 10000022.2       0     0
## 10000029.1       0     1
## 10000029.2       0     1
## 10000034.1       0     0
## 10000034.2       1     0
#make an id that is the combination of state and strata 
e.long$sampleid<-paste(e.long$w6c6p_6str, e.long$w6c6p_6psu)
#within each sampling unit, sum the weights
wts<-tapply(e.long$w6c6p_20,e.long$sampleid,sum)
#make a data frame from this
wts<-data.frame(id=names(unlist(wts)), wt=unlist(wts))
#get the unique sampling location ids'
t1<-as.data.frame(table(e.long$sampleid))
#put all of this into a data set
wts2<-data.frame(ids=wts$id, sumwt=wts$wt, jn=t1$Freq)
#merge all of this back to the original data file
e.long<-merge(e.long, wts2, by.x="sampleid", by.y="ids", all.x=T)
#In the new data set, multiply the original weight by the fraction of the
#sampling unit total population each person represents
e.long$swts<-e.long$w6c6p_20*(e.long$jn/e.long$sumwt)

Fit basic Cox model

Now we fit the Cox model using full survey design. In the ECLS-K, I use the longitudinal weight for waves 1-7, as well as the associated psu and strata id’s for the longitudinal data from these waves from the parents of the child, since no data from the child themselves are used in the outcome.

#Fit the model
fitl1<-coxph(Surv(time = time, event = povtran)~mlths+mgths+race_rec, data=e.long, weights = swts)
summary(fitl1) 
## Call:
## coxph(formula = Surv(time = time, event = povtran) ~ mlths + 
##     mgths + race_rec, data = e.long, weights = swts)
## 
##   n= 3938, number of events= 221 
##    (57 observations deleted due to missingness)
## 
##                     coef exp(coef) se(coef)      z Pr(>|z|)    
## mlths             0.4842    1.6229   0.1864  2.597   0.0094 ** 
## mgths            -1.2299    0.2923   0.1544 -7.966 1.64e-15 ***
## race_rechispanic  0.9508    2.5877   0.1700  5.591 2.26e-08 ***
## race_recnhasian   0.3127    1.3671   0.3439  0.909   0.3632    
## race_recnhblack   0.9279    2.5293   0.2291  4.050 5.11e-05 ***
## race_recother     0.4838    1.6222   0.2929  1.652   0.0986 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                  exp(coef) exp(-coef) lower .95 upper .95
## mlths               1.6229     0.6162    1.1261    2.3387
## mgths               0.2923     3.4208    0.2160    0.3956
## race_rechispanic    2.5877     0.3864    1.8542    3.6112
## race_recnhasian     1.3671     0.7314    0.6967    2.6828
## race_recnhblack     2.5293     0.3954    1.6143    3.9628
## race_recother       1.6222     0.6165    0.9137    2.8800
## 
## Concordance= 0.746  (se = 0.02 )
## Likelihood ratio test= 205.6  on 6 df,   p=<2e-16
## Wald test            = 214.1  on 6 df,   p=<2e-16
## Score (logrank) test = 284.4  on 6 df,   p=<2e-16

Cox model with additive frailty

Now we fit the Cox model and doing fraily by the school the child attends. I use the weights calculated above, standardized to the within cluster sample size.

#Fit the model
fitl2<-coxme(Surv(time = time, event = povtran)~mlths+mgths+race_rec+(1|s2_id), e.long, weights = swts)
summary(fitl2) 
## Cox mixed-effects model fit by maximum likelihood
##   Data: e.long
##   events, n = 221, 3938 (57 observations deleted due to missingness)
##   Iterations= 7 47 
##                     NULL Integrated    Fitted
## Log-likelihood -1852.682  -1739.942 -1649.332
## 
##                    Chisq    df p    AIC    BIC
## Integrated loglik 225.48  7.00 0 211.48 187.69
##  Penalized loglik 406.70 86.68 0 233.34 -61.20
## 
## Model:  Surv(time = time, event = povtran) ~ mlths + mgths + race_rec +      (1 | s2_id) 
## Fixed coefficients
##                        coef exp(coef)  se(coef)     z       p
## mlths             0.4341737 1.5436870 0.2057034  2.11 3.5e-02
## mgths            -1.0785657 0.3400829 0.1628668 -6.62 3.5e-11
## race_rechispanic  0.9737078 2.6477435 0.1871651  5.20 2.0e-07
## race_recnhasian   0.4159912 1.5158725 0.3746802  1.11 2.7e-01
## race_recnhblack   0.9314125 2.5380917 0.2510170  3.71 2.1e-04
## race_recother     0.3648314 1.4402711 0.3346136  1.09 2.8e-01
## 
## Random effects
##  Group Variable  Std Dev   Variance 
##  s2_id Intercept 0.7655108 0.5860068
anova(fitl1, fitl2)
## Analysis of Deviance Table
##  Cox model: response is  Surv(time = time, event = povtran)
##  Model 1: ~mlths + mgths + race_rec
##  Model 2: ~mlths + mgths + race_rec + (1 | s2_id)
##    loglik  Chisq Df P(>|Chi|)    
## 1 -1749.9                        
## 2 -1739.9 19.918  1 8.083e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(fitl1)
## [1] 3511.802
AIC(fitl2)
## [1] 3472.021
hist(fitl2$frail$s2_id)

So, we see that the frailty model has a large variance, and that the AIC is lower than the ordinary Cox model, suggesting better model fit. The model likelihood ratio test also confirms that the frailty model fits better.

Discrete time frailty model

Basic Discrete time model

First, we fit the basic discrete time model for our outcome. Here I’m going to use the new standardized weights I created above in a regular glm() model:

fit1<-glm(povtran~as.factor(time)+mlths+mgths+race_rec-1, data=e.long, weights = swts, family=binomial(link="cloglog"))
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
arm::display(fit1, detail=T)
## glm(formula = povtran ~ as.factor(time) + mlths + mgths + race_rec - 
##     1, family = binomial(link = "cloglog"), data = e.long, weights = swts)
##                  coef.est coef.se z value Pr(>|z|)
## as.factor(time)1  -2.34     0.17  -14.12    0.00  
## as.factor(time)2  -2.90     0.18  -15.81    0.00  
## mlths              0.46     0.19    2.47    0.01  
## mgths             -1.22     0.15   -7.89    0.00  
## race_rechispanic   0.95     0.17    5.60    0.00  
## race_recnhasian    0.31     0.34    0.89    0.37  
## race_recnhblack    0.92     0.23    4.02    0.00  
## race_recother      0.49     0.29    1.65    0.10  
## ---
##   n = 3938, k = 8
##   residual deviance = 1537.7, null deviance = 7630.8 (difference = 6093.1)

Discrete time model with shared frailty

For the discrete time model, if the logit link is used, then we are effectively fitting a multilevel model for our outcome. The model with only a group frailty component will have the exact same form as the multilevel logit model with a random intecept at the group level: Fit the basic random intercept model using the complementary log-log link function:

\[log(-log(1-h(t))) = \beta_{0j} +x'\beta +Z\gamma'\]

with

\(\beta_{0j} = \beta_0 + Z\gamma'+ u_j\)

and

\(u_j\sim N(0, \sigma_u^2)\)

Where the intercepts (\(u_j\)) for each group vary randomly around the overall mean (\(\beta_0\)).

The individual level predictors are incorporated into \(x\), while the group level predictors (if any are measured) are included in \(Z\). If only a random intercept is specified, then \(Z\) is a vector of 1’s.

#this will take a lot longer to fit compared to the ordinary logit
#I also had to include some extra optimization details because the default maximizer didn't return a satisfactory model fit criteria.

fit2<-glmer(povtran~as.factor(time)+mlths+mgths+race_rec-1+(1|s2_id),
            data=e.long, weights = swts, family=binomial(link="cloglog"), 
            control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
## Warning in eval(family$initialize, rho): non-integer #successes in a
## binomial glm!
arm::display(fit2, detail=T)
## glmer(formula = povtran ~ as.factor(time) + mlths + mgths + race_rec - 
##     1 + (1 | s2_id), data = e.long, family = binomial(link = "cloglog"), 
##     control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05)), 
##     weights = swts)
##                  coef.est coef.se z value Pr(>|z|)
## as.factor(time)1  -2.62     0.20  -12.80    0.00  
## as.factor(time)2  -3.13     0.21  -14.56    0.00  
## mlths              0.40     0.21    1.93    0.05  
## mgths             -1.11     0.17   -6.62    0.00  
## race_rechispanic   0.99     0.19    5.27    0.00  
## race_recnhasian    0.49     0.40    1.21    0.23  
## race_recnhblack    0.93     0.25    3.67    0.00  
## race_recother      0.41     0.34    1.20    0.23  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  s2_id    (Intercept) 0.73    
##  Residual             1.00    
## ---
## number of obs: 3938, groups: s2_id, 297
## AIC = 1511.5, DIC = 1219.8
## deviance = 1356.6

AIC comparison of the two models:

AIC(fit1) #Regular GLM
## [1] 1638.724
AIC(fit2) #GLMM
## [1] 1511.47