Multilevel Modeling in R
Homework Questions
The attached data file has observations from 1000 patients at risk of developing diabetes. The variables in the file are: blood sugar level, BMI, W, XM, and Provider.
- The outcome variable is blood sugar, the explanatory variable is each patient’s BMI, and patients are nested in providers. (You can ignore the ‘W’ & ‘XM’ variables).
Packages needed for this assignment
We will be conducting our MLM analyses using the ‘lme4’ package.
- The analyses will be excuted with the ‘lmer’ function.
To begin, read in the data & change variable names appropriately
dat<- read.table("ex9.2a.dat", quote="\"", comment.char="", header=F)
names(dat)<- c("bloodsugar","BMI", "W", "XM", "Provider") #Assign vbl names - ignore "W" and "XM"
head(dat) #make sure names worked correctly## bloodsugar BMI W XM Provider
## 1 -2.376597 0.360803 -2.558013 -1.368350 1
## 2 -1.069854 -1.090049 -2.558013 -1.368350 1
## 3 -1.184909 0.461845 -2.558013 -1.368350 1
## 4 -0.571297 -0.211070 -2.558013 -1.368350 1
## 5 -0.450361 -0.716481 -2.558013 -1.368350 1
## 6 -1.901163 -0.504917 -0.885252 -0.163053 2
Set grouping variable as a factor
Unconditional Model
Before even going through the steps of MLM, we first want to check and see if we even have enough between-group variance in our DV to proceed. To do so, we simply fit the unconditional (baseline) model, and then use this to estimate our Intraclass Coefficient. There are no predictors in this model!
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bloodsugar ~ 1 + (1 | Provider)
## Data: dat
## AIC BIC logLik deviance df.resid
## 3951.355 3966.078 -1972.678 3945.355 997
## Random effects:
## Groups Name Std.Dev.
## Provider (Intercept) 1.787
## Residual 1.512
## Number of obs: 1000, groups: Provider, 110
## Fixed Effects:
## (Intercept)
## 1.468
-The fixed effects represent our grand mean of bloodsugar, in this case, 1.468.
- Our between-level variance is 1.787^2 (3.183).
- Our within-level variance is 1.787^2 (2.2861) - this is captured in the residual variance.
Overall, there appears to be more between variance across providers in bloodsugar compared to within variance of bloodsugar for each individual provider. This is good, and should also be reflected in our ICC calculation.
Hand Calculating the ICC Based on Above Output:
- Because the output yields the SDs, we can square those values to get the variance that will then be used to calculate the ICC.
- As a reminder, we want the ICC to be larger. A small ICC suggests that observations are independent (i.e., no evidence for a clustering effect, and thus there is no reason to go forward with MLM). These values can range anywhere from 0-1.
## [1] 0.585094
Calculating the ICC via R in
If we prefer to not do the hand calculation, we can use the “ICCest” command in the ‘ICC’ package to calculate the ICC. All we need to plug in is our cluster variable and our outcome variable of interest.
ICCest(Provider, #grouping variable goes here
bloodsugar, #between level variable of interest here
data = dat,
alpha = 0.05,
CI.type = c("THD", "Smith"))## $ICC
## [1] 0.5901581
##
## $LowerCI
## [1] 0.5185508
##
## $UpperCI
## [1] 0.6642699
##
## $N
## [1] 110
##
## $k
## [1] 9.077982
##
## $varw
## [1] 2.28843
##
## $vara
## [1] 3.29526
R gives us a similar result from our hand calculation (ICC = .59)
Random Effects ANCOVA (Random Intercept, Fixed Predictor At Level 1)
For the next model, we add a level-1 predictor. We do this by replacing the ‘1’ of the previous model with the predictor (i.e., BMI). An intercept in this case is always assumed, so it is still estimated here (but only needs to be specified when there are no other added predictors).
- Make sure to do REML=FALSE so the loglikelihood is computed for you.
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bloodsugar ~ BMI + (1 | Provider)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 3777.3 3796.9 -1884.6 3769.3 996
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6787 -0.6194 0.0192 0.6045 4.0876
##
## Random effects:
## Groups Name Variance Std.Dev.
## Provider (Intercept) 3.129 1.769
## Residual 1.885 1.373
## Number of obs: 1000, groups: Provider, 110
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.45422 0.17508 8.306
## BMI 0.61737 0.04433 13.927
##
## Correlation of Fixed Effects:
## (Intr)
## BMI -0.005
To Grand Mean Center, or Not to Grand Mean Center? That is the Question…
There are various reasons that make sense to grand mean center - but this is dependent also on your research question and variables you are using in the model. Essentially what grand mean centering does is subtracts the grand mean of the predictor (in this example, BMI) using the mean from the full sample. If we wanted to grand mean center our predictor, we can add the ‘scaling’ command in the same lmer regression equation from above. For this assignment, both ways our demonstrated, but results will be interpreted with the grand mean centering option.
- You could also grand mean center the predictor manually and save it as an object like so:
Same Random Effects ANCOVA with Grand Mean Centering of the Predictor
modcenter.1 <- lmer(bloodsugar ~ scale(BMI, center=T) + (1 | Provider), data=dat, REML=F)
summary(modcenter.1)## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bloodsugar ~ scale(BMI, center = T) + (1 | Provider)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 3777.3 3796.9 -1884.6 3769.3 996
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6787 -0.6194 0.0192 0.6045 4.0876
##
## Random effects:
## Groups Name Variance Std.Dev.
## Provider (Intercept) 3.129 1.769
## Residual 1.885 1.373
## Number of obs: 1000, groups: Provider, 110
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.47116 0.17508 8.403
## scale(BMI, center = T) 0.63063 0.04528 13.927
##
## Correlation of Fixed Effects:
## (Intr)
## sc(BMI,c=T) 0.002
Results
-The “Fixed Effects” section contains our values of γ00 (grand mean) and the grand slope of our level-1 predictor (γ10). Again, the grand mean is still ~1.47, and our grand slope of BMI is 0.63.
- This means that an individual’s predicted bloodsugar level has a 0.63 increase for every one unit increase in the individual’s BMI.
-Our within-level residual variance of bloodsugar was 1.885.
-The between-level variance (the variance of the Providers around the grand mean) of bloodsugar was estimated to be 3.129.
- This suggests that a substantial amount of the variability in bloodsugar scores can be explained by individual differences of BMI’s.
Random Intercept Random Slope (Random Coefficients Model)
This model contains a random intercept at the individual level & a predictor that is allowed to vary between groups (i.e., Providers). In other words, the effect of BMI on bloodsugar is allowed to vary between Providers. In order to estimate this model, the ‘1’ that indicates the intercept in the random part of the model specification is replaced by the variable (i.e., BMI) of which we want the effect to vary between the groups (i.e., Providers).
mod.2<-lmer(bloodsugar ~ BMI + (BMI | Provider), data=dat, REML=F)
summary(mod.2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bloodsugar ~ BMI + (BMI | Provider)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 3343.5 3372.9 -1665.7 3331.5 994
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.5399 -0.6434 -0.0006 0.6694 3.2168
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Provider (Intercept) 3.1346 1.770
## BMI 0.8046 0.897 0.88
## Residual 1.0308 1.015
## Number of obs: 1000, groups: Provider, 110
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.4311 0.1726 8.292
## BMI 0.7596 0.0934 8.133
##
## Correlation of Fixed Effects:
## (Intr)
## BMI 0.791Same Model from Above, Just With Grand Mean Centering of the Predictor:
modcenter.2<-lmer(bloodsugar ~ scale(BMI, center=T) + (BMI | Provider), data=dat, REML=F)
summary(modcenter.2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bloodsugar ~ scale(BMI, center = T) + (BMI | Provider)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 3343.5 3372.9 -1665.7 3331.5 994
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.5399 -0.6434 -0.0006 0.6694 3.2168
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Provider (Intercept) 3.1346 1.770
## BMI 0.8046 0.897 0.88
## Residual 1.0308 1.015
## Number of obs: 1000, groups: Provider, 110
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.4520 0.1746 8.315
## scale(BMI, center = T) 0.7759 0.0954 8.133
##
## Correlation of Fixed Effects:
## (Intr)
## sc(BMI,c=T) 0.797Results
-The grand mean (again) is 1.452.
-The grand slope across Providers is 0.776. This means that a one unit increase in the level-1 predictor, BMI, increases bloodsugar by 0.776.
-The Provider random slope variance around the grand slope (how much Providers vary from each other) is 0.805.
-The variance of the random intercept between providers around the grand intercept is 3.135.
- This suggests that there is much more variation around the intercepts (means) of the Providers compared to the variance of the Providers around the grand slope.
Compare Models
Out of the three models we selected, which one is the best? One way of evaluating this is through the -2LL statistic (often called the deviance) is an indicator of how much unexplained information there is after the model has been fitted, with large values of -2LL indicating poorly fitting models. We can obtain this statistic using the familiar ‘anova’ command with our 3 fitted models.
## Data: dat
## Models:
## uncon: bloodsugar ~ 1 + (1 | Provider)
## modcenter.1: bloodsugar ~ scale(BMI, center = T) + (1 | Provider)
## modcenter.2: bloodsugar ~ scale(BMI, center = T) + (BMI | Provider)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## uncon 3 3951.4 3966.1 -1972.7 3945.4
## modcenter.1 4 3777.3 3796.9 -1884.6 3769.3 176.07 1 < 2.2e-16 ***
## modcenter.2 6 3343.5 3372.9 -1665.7 3331.5 437.81 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Results
Our random intercept random slope model (modcenter.2) appears to fit the best based on the AIC, BIC, and deviance values.