Multilevel Modeling in R

Homework Questions

The attached data file has observations from 1000 patients at risk of developing diabetes. The variables in the file are: blood sugar level, BMI, W, XM, and Provider.

The outcome variable is blood sugar, the explanatory variable is each patient’s BMI, and patients are nested in providers. (You can ignore the ‘W’ & ‘XM’ variables).

Packages needed for this assignment

library(nlme)
library(lme4)
library(ICC)

We will be conducting our MLM analyses using the ‘lme4’ package.

The analyses will be excuted with the ‘lmer’ function.

To begin, read in the data & change variable names appropriately

dat<- read.table("ex9.2a.dat", quote="\"", comment.char="", header=F)
names(dat)<- c("bloodsugar","BMI", "W", "XM", "Provider") #Assign vbl names - ignore "W" and "XM"
head(dat) #make sure names worked correctly

##   bloodsugar       BMI         W        XM Provider
## 1  -2.376597  0.360803 -2.558013 -1.368350        1
## 2  -1.069854 -1.090049 -2.558013 -1.368350        1
## 3  -1.184909  0.461845 -2.558013 -1.368350        1
## 4  -0.571297 -0.211070 -2.558013 -1.368350        1
## 5  -0.450361 -0.716481 -2.558013 -1.368350        1
## 6  -1.901163 -0.504917 -0.885252 -0.163053        2

Set grouping variable as a factor

factor(dat$Provider) #Set nesting variable as a factor

Unconditional Model

Before even going through the steps of MLM, we first want to check and see if we even have enough between-group variance in our DV to proceed. To do so, we simply fit the unconditional (baseline) model, and then use this to estimate our Intraclass Coefficient. There are no predictors in this model!

uncon<-lmer(bloodsugar ~ 1 + (1|Provider), data=dat, REML=F)
uncon

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: bloodsugar ~ 1 + (1 | Provider)
##    Data: dat
##       AIC       BIC    logLik  deviance  df.resid 
##  3951.355  3966.078 -1972.678  3945.355       997 
## Random effects:
##  Groups   Name        Std.Dev.
##  Provider (Intercept) 1.787   
##  Residual             1.512   
## Number of obs: 1000, groups:  Provider, 110
## Fixed Effects:
## (Intercept)  
##       1.468

-The fixed effects represent our grand mean of bloodsugar, in this case, 1.468.

Our between-level variance is 1.787^2 (3.183).
Our within-level variance is 1.787^2 (2.2861) - this is captured in the residual variance.

Overall, there appears to be more between variance across providers in bloodsugar compared to within variance of bloodsugar for each individual provider. This is good, and should also be reflected in our ICC calculation.

Hand Calculating the ICC Based on Above Output:

Because the output yields the SDs, we can square those values to get the variance that will then be used to calculate the ICC.
As a reminder, we want the ICC to be larger. A small ICC suggests that observations are independent (i.e., no evidence for a clustering effect, and thus there is no reason to go forward with MLM). These values can range anywhere from 0-1.

1.7958^2/(1.796^2 + 1.512^2)

## [1] 0.585094

Calculating the ICC via R in

If we prefer to not do the hand calculation, we can use the “ICCest” command in the ‘ICC’ package to calculate the ICC. All we need to plug in is our cluster variable and our outcome variable of interest.

ICCest(Provider, #grouping variable goes here
       bloodsugar, #between level variable of interest here
       data = dat, 
       alpha = 0.05, 
       CI.type = c("THD", "Smith"))

## $ICC
## [1] 0.5901581
## 
## $LowerCI
## [1] 0.5185508
## 
## $UpperCI
## [1] 0.6642699
## 
## $N
## [1] 110
## 
## $k
## [1] 9.077982
## 
## $varw
## [1] 2.28843
## 
## $vara
## [1] 3.29526

R gives us a similar result from our hand calculation (ICC = .59)

Random Effects ANCOVA (Random Intercept, Fixed Predictor At Level 1)

For the next model, we add a level-1 predictor. We do this by replacing the ‘1’ of the previous model with the predictor (i.e., BMI). An intercept in this case is always assumed, so it is still estimated here (but only needs to be specified when there are no other added predictors).

Make sure to do REML=FALSE so the loglikelihood is computed for you.

mod.1 <- lmer(bloodsugar ~ BMI + (1 | Provider), data=dat, REML=F) 
summary(mod.1)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: bloodsugar ~ BMI + (1 | Provider)
##    Data: dat
## 
##      AIC      BIC   logLik deviance df.resid 
##   3777.3   3796.9  -1884.6   3769.3      996 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.6787 -0.6194  0.0192  0.6045  4.0876 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Provider (Intercept) 3.129    1.769   
##  Residual             1.885    1.373   
## Number of obs: 1000, groups:  Provider, 110
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  1.45422    0.17508   8.306
## BMI          0.61737    0.04433  13.927
## 
## Correlation of Fixed Effects:
##     (Intr)
## BMI -0.005

To Grand Mean Center, or Not to Grand Mean Center? That is the Question…

There are various reasons that make sense to grand mean center - but this is dependent also on your research question and variables you are using in the model. Essentially what grand mean centering does is subtracts the grand mean of the predictor (in this example, BMI) using the mean from the full sample. If we wanted to grand mean center our predictor, we can add the ‘scaling’ command in the same lmer regression equation from above. For this assignment, both ways our demonstrated, but results will be interpreted with the grand mean centering option.

You could also grand mean center the predictor manually and save it as an object like so:

BMI.grandmean<- (dat$BMI - mean(dat$BMI))

Same Random Effects ANCOVA with Grand Mean Centering of the Predictor

modcenter.1 <- lmer(bloodsugar ~ scale(BMI, center=T) + (1 | Provider), data=dat, REML=F)
summary(modcenter.1)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: bloodsugar ~ scale(BMI, center = T) + (1 | Provider)
##    Data: dat
## 
##      AIC      BIC   logLik deviance df.resid 
##   3777.3   3796.9  -1884.6   3769.3      996 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.6787 -0.6194  0.0192  0.6045  4.0876 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Provider (Intercept) 3.129    1.769   
##  Residual             1.885    1.373   
## Number of obs: 1000, groups:  Provider, 110
## 
## Fixed effects:
##                        Estimate Std. Error t value
## (Intercept)             1.47116    0.17508   8.403
## scale(BMI, center = T)  0.63063    0.04528  13.927
## 
## Correlation of Fixed Effects:
##             (Intr)
## sc(BMI,c=T) 0.002

Results

-The “Fixed Effects” section contains our values of γ00 (grand mean) and the grand slope of our level-1 predictor (γ10). Again, the grand mean is still ~1.47, and our grand slope of BMI is 0.63.

This means that an individual’s predicted bloodsugar level has a 0.63 increase for every one unit increase in the individual’s BMI.

-Our within-level residual variance of bloodsugar was 1.885.

-The between-level variance (the variance of the Providers around the grand mean) of bloodsugar was estimated to be 3.129.

This suggests that a substantial amount of the variability in bloodsugar scores can be explained by individual differences of BMI’s.

Random Intercept Random Slope (Random Coefficients Model)

This model contains a random intercept at the individual level & a predictor that is allowed to vary between groups (i.e., Providers). In other words, the effect of BMI on bloodsugar is allowed to vary between Providers. In order to estimate this model, the ‘1’ that indicates the intercept in the random part of the model specification is replaced by the variable (i.e., BMI) of which we want the effect to vary between the groups (i.e., Providers).

mod.2<-lmer(bloodsugar ~ BMI + (BMI | Provider), data=dat, REML=F)
summary(mod.2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: bloodsugar ~ BMI + (BMI | Provider)
##    Data: dat
## 
##      AIC      BIC   logLik deviance df.resid 
##   3343.5   3372.9  -1665.7   3331.5      994 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5399 -0.6434 -0.0006  0.6694  3.2168 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Provider (Intercept) 3.1346   1.770        
##           BMI         0.8046   0.897    0.88
##  Residual             1.0308   1.015        
## Number of obs: 1000, groups:  Provider, 110
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   1.4311     0.1726   8.292
## BMI           0.7596     0.0934   8.133
## 
## Correlation of Fixed Effects:
##     (Intr)
## BMI 0.791

Same Model from Above, Just With Grand Mean Centering of the Predictor:

modcenter.2<-lmer(bloodsugar ~ scale(BMI, center=T) + (BMI | Provider), data=dat, REML=F)
summary(modcenter.2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: bloodsugar ~ scale(BMI, center = T) + (BMI | Provider)
##    Data: dat
## 
##      AIC      BIC   logLik deviance df.resid 
##   3343.5   3372.9  -1665.7   3331.5      994 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5399 -0.6434 -0.0006  0.6694  3.2168 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Provider (Intercept) 3.1346   1.770        
##           BMI         0.8046   0.897    0.88
##  Residual             1.0308   1.015        
## Number of obs: 1000, groups:  Provider, 110
## 
## Fixed effects:
##                        Estimate Std. Error t value
## (Intercept)              1.4520     0.1746   8.315
## scale(BMI, center = T)   0.7759     0.0954   8.133
## 
## Correlation of Fixed Effects:
##             (Intr)
## sc(BMI,c=T) 0.797

Results

-The grand mean (again) is 1.452.

-The grand slope across Providers is 0.776. This means that a one unit increase in the level-1 predictor, BMI, increases bloodsugar by 0.776.

-The Provider random slope variance around the grand slope (how much Providers vary from each other) is 0.805.

-The variance of the random intercept between providers around the grand intercept is 3.135.

This suggests that there is much more variation around the intercepts (means) of the Providers compared to the variance of the Providers around the grand slope.

Compare Models

Out of the three models we selected, which one is the best? One way of evaluating this is through the -2LL statistic (often called the deviance) is an indicator of how much unexplained information there is after the model has been fitted, with large values of -2LL indicating poorly fitting models. We can obtain this statistic using the familiar ‘anova’ command with our 3 fitted models.

anova(uncon, modcenter.1, modcenter.2)

## Data: dat
## Models:
## uncon: bloodsugar ~ 1 + (1 | Provider)
## modcenter.1: bloodsugar ~ scale(BMI, center = T) + (1 | Provider)
## modcenter.2: bloodsugar ~ scale(BMI, center = T) + (BMI | Provider)
##             Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## uncon        3 3951.4 3966.1 -1972.7   3945.4                             
## modcenter.1  4 3777.3 3796.9 -1884.6   3769.3 176.07      1  < 2.2e-16 ***
## modcenter.2  6 3343.5 3372.9 -1665.7   3331.5 437.81      2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Results

Our random intercept random slope model (modcenter.2) appears to fit the best based on the AIC, BIC, and deviance values.