Class 5 Multilevel Model (Part 1)
Prepare the data
## Prepare the dataset
library(haven)
Cosmetic_Surgery <- read_sav("Cosmetic_Surgery.sav")
head(Cosmetic_Surgery)
## # A tibble: 6 x 9
## ID Post_QoL Base_QoL Surgery Clinic Age BDI Reason Gender
## <dbl> <dbl> <dbl> <dbl+lbl> <dbl> <dbl> <dbl> <dbl+lbl> <dbl+l>
## 1 1 71.3 73 0 [Waitin… 1 31 12 0 [Change … 0 [Fem…
## 2 2 77 74 0 [Waitin… 1 32 16 0 [Change … 0 [Fem…
## 3 3 73 80 0 [Waitin… 1 33 13 0 [Change … 0 [Fem…
## 4 4 68.9 76 0 [Waitin… 1 59 11 0 [Change … 1 [Mal…
## 5 5 69 71 0 [Waitin… 1 61 11 0 [Change … 1 [Mal…
## 6 6 68.5 72 0 [Waitin… 1 32 10 1 [Physica… 0 [Fem…
## Take a glance of the data
summary(Cosmetic_Surgery)
## ID Post_QoL Base_QoL Surgery
## Min. : 1.00 Min. :40.00 Min. :43.00 Min. :0.0000
## 1st Qu.: 69.75 1st Qu.:52.30 1st Qu.:57.00 1st Qu.:0.0000
## Median :138.50 Median :58.00 Median :63.00 Median :0.0000
## Mean :138.50 Mean :59.61 Mean :63.56 Mean :0.4746
## 3rd Qu.:207.25 3rd Qu.:67.00 3rd Qu.:71.00 3rd Qu.:1.0000
## Max. :276.00 Max. :88.20 Max. :91.00 Max. :1.0000
## Clinic Age BDI Reason
## Min. : 1.000 Min. :18.00 Min. : 0.00 Min. :0.0000
## 1st Qu.: 3.000 1st Qu.:31.00 1st Qu.:10.00 1st Qu.:0.0000
## Median : 6.000 Median :38.00 Median :20.00 Median :1.0000
## Mean : 5.754 Mean :39.17 Mean :23.05 Mean :0.6449
## 3rd Qu.: 8.000 3rd Qu.:48.00 3rd Qu.:36.00 3rd Qu.:1.0000
## Max. :10.000 Max. :65.00 Max. :63.00 Max. :1.0000
## Gender
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4312
## 3rd Qu.:1.0000
## Max. :1.0000
## Good news, we don't have any missing data in our dataset
Packages for multilevel modelling in R
## There are two famous packages to do the multilevel modeling. One is "nlme", and the other is the "lme4".
## In this class, we will focus on the previous package, which is the "nlme" package.
library(nlme)
## Also, for the graphic display, we need "ggplot2" package
library(ggplot2)
## And for manage the data structure or reshape the data, we need "car" and "reshape" package.
library(car)
## Loading required package: carData
install.packages("reshape")
##
## The downloaded binary packages are in
## /var/folders/jx/m17nczh57gb3cmr_r_zgfq1w0000gn/T//Rtmp6U6Uvo/downloaded_packages
library(reshape)
Section 1. Setting up the simple linear model
## At frist, we need to know whether there is a significant difference between clinics in their patient's life quality after the cosmetic surgery.
## We could use a simple regression model here.
SimpleLinearModel<- lm(Post_QoL~Clinic,data=Cosmetic_Surgery)
summary(SimpleLinearModel)
##
## Call:
## lm(formula = Post_QoL ~ Clinic, data = Cosmetic_Surgery)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.6125 -6.1622 -0.4786 5.1049 22.0049
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 70.0628 1.0701 65.47 <2e-16 ***
## Clinic -1.8168 0.1672 -10.87 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.787 on 274 degrees of freedom
## Multiple R-squared: 0.3011, Adjusted R-squared: 0.2986
## F-statistic: 118.1 on 1 and 274 DF, p-value: < 2.2e-16
## The output indicates that there is a significant difference between clinics in their patient's life quality after the cosmetic surgery. F(1,274)=118.1, p<.01.
Section 2. Setting up an Unconditional Model
## Fisrtly, we need to fit the unconditional model, allowing the intercepts to vary across contexts, in this case we want them to vary across clinics.
UnconditionalModel <- lme(Post_QoL ~ 1, data = Cosmetic_Surgery, random =~1|Clinic, method = "ML")
summary(UnconditionalModel)
## Linear mixed-effects model fit by maximum likelihood
## Data: Cosmetic_Surgery
## AIC BIC logLik
## 1911.473 1922.334 -952.7364
##
## Random effects:
## Formula: ~1 | Clinic
## (Intercept) Residual
## StdDev: 5.909691 7.238677
##
## Fixed effects: Post_QoL ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 60.08377 1.923283 266 31.24022 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.8828507 -0.7606631 -0.1378732 0.7075242 2.8607949
##
## Number of Observations: 276
## Number of Groups: 10
## To obtain the -2LL the same with our SPSS section, we just need to simply use the function logLik() and multiple "-2" to achieve the -2LL.
logLik(UnconditionalModel)*-2 # 1905.473 (df=3)
## 'log Lik.' 1905.473 (df=3)
Section 3. Random intercepts model
# Set up a random intercept model using lme() function
RandomInterceptModel <-lme(Post_QoL ~ Surgery + Base_QoL, data = Cosmetic_Surgery, random = ~1|Clinic, method = "ML")
summary(RandomInterceptModel)
## Linear mixed-effects model fit by maximum likelihood
## Data: Cosmetic_Surgery
## AIC BIC logLik
## 1847.49 1865.592 -918.745
##
## Random effects:
## Formula: ~1 | Clinic
## (Intercept) Residual
## StdDev: 3.039264 6.518986
##
## Fixed effects: Post_QoL ~ Surgery + Base_QoL
## Value Std.Error DF t-value p-value
## (Intercept) 29.563601 3.471879 264 8.515160 0.0000
## Surgery -0.312999 0.843145 264 -0.371228 0.7108
## Base_QoL 0.478630 0.052774 264 9.069465 0.0000
## Correlation:
## (Intr) Surgry
## Surgery 0.102
## Base_QoL -0.947 -0.222
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.8872666 -0.7537675 -0.0954987 0.5657241 3.0020852
##
## Number of Observations: 276
## Number of Groups: 10
# Calculate the -2LL
logLik(RandomInterceptModel)*-2 #1837.49
## 'log Lik.' 1837.49 (df=5)
## Do you still remeber the -2LL from our unconditional model? Now we need use it to make the comparison between two models.
## χ2 Differece = 1905.47 − 1837.49 = 67.98
## df_change =5−3=1
## If we look at the critical values for the chi-square statistic with 2 degree of freedom in the Appendix, they are 5.99 (p < .05); therefore, this change is highly significant.We choose random intercepts model.
# A simpler way to do such thing is to use the anova() function. This function compares the change in log likelihood without computing the chi-square change as well as the df change.
anova(RandomInterceptModel,UnconditionalModel)
## Model df AIC BIC logLik Test L.Ratio
## RandomInterceptModel 1 5 1847.490 1865.592 -918.7450
## UnconditionalModel 2 3 1911.473 1922.334 -952.7364 1 vs 2 67.98286
## p-value
## RandomInterceptModel
## UnconditionalModel <.0001
# We can use the StdDev in the Model summary to calculate the ICC
RandomInterceptModel_ICC <- 3.04^2/(3.04^2+6.52^2)
RandomInterceptModel_ICC # 17.85
## [1] 0.1785747
# Comparing to our SPSS section, where is the estimates? No worry, use VarCorr() to check these parameters.
VarCorr(RandomInterceptModel)
## Clinic = pdLogChol(1)
## Variance StdDev
## (Intercept) 9.237126 3.039264
## Residual 42.497179 6.518986
# Not cool enough? No worry, let's build a function to calculate the ICC automatically.
varests <- as.numeric(VarCorr(RandomInterceptModel))
ICClme <- function(out) {
varests <- as.numeric(VarCorr(out)[1:nrow(VarCorr(out))])
return(paste("ICC=",varests[1]/ (varests[1]+varests[-1])))
}
# Try the function
ICClme(RandomInterceptModel) # Note, this function could be used to different models.
## [1] "ICC= 0.178549339746615"
# Check the ICC for our unconditional model
ICClme(UnconditionalModel)
## [1] "ICC= 0.39994610805941"
## Base on the anova outcome, χ2(1) = 107.65, p < .0001. We can conclude then that the intercepts vary significantly across the different clinics. Multilevel madness must ensue.
Section 4. Random intercepts and slopes model
RandomInterceptSlopes <-lme(Post_QoL ~ Surgery + Base_QoL, data = Cosmetic_Surgery, random = ~Surgery|Clinic, method = "ML")
summary(RandomInterceptSlopes)
## Linear mixed-effects model fit by maximum likelihood
## Data: Cosmetic_Surgery
## AIC BIC logLik
## 1812.624 1837.967 -899.3119
##
## Random effects:
## Formula: ~Surgery | Clinic
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 6.132655 (Intr)
## Surgery 6.197489 -0.965
## Residual 5.912335
##
## Fixed effects: Post_QoL ~ Surgery + Base_QoL
## Value Std.Error DF t-value p-value
## (Intercept) 40.10253 3.892945 264 10.301334 0.0000
## Surgery -0.65453 2.110917 264 -0.310069 0.7568
## Base_QoL 0.31022 0.053506 264 5.797812 0.0000
## Correlation:
## (Intr) Surgry
## Surgery -0.430
## Base_QoL -0.855 -0.063
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.4114778 -0.6628574 -0.1138411 0.6833110 2.8334730
##
## Number of Observations: 276
## Number of Groups: 10
logLik(RandomInterceptSlopes)*-2 # 1798.624
## 'log Lik.' 1798.624 (df=7)
# Compare this model with the previous model
anova(RandomInterceptSlopes,RandomInterceptModel,UnconditionalModel)
## Model df AIC BIC logLik Test L.Ratio
## RandomInterceptSlopes 1 7 1812.624 1837.966 -899.3119
## RandomInterceptModel 2 5 1847.490 1865.592 -918.7450 1 vs 2 38.86626
## UnconditionalModel 3 3 1911.473 1922.334 -952.7364 2 vs 3 67.98286
## p-value
## RandomInterceptSlopes
## RandomInterceptModel <.0001
## UnconditionalModel <.0001
# Check the ICC for this model
VarCorr(RandomInterceptSlopes)
## Clinic = pdLogChol(Surgery)
## Variance StdDev Corr
## (Intercept) 37.60946 6.132655 (Intr)
## Surgery 38.40887 6.197489 -0.965
## Residual 34.95570 5.912335
ICClme(RandomInterceptSlopes)
## [1] "ICC= 0.494741991832759" "ICC= 0.518285358979433"
# Becasue we have two independent variables for this model, so we have two estimates for this model, leading to two ICC values.
Section 5. Adding an interaction term to the model
# Watch carefully, we can still use the lme() function. Just add an interaction term in our formula.
InteractionModel <- lme(Post_QoL ~ Surgery + Base_QoL + Reason + Surgery:Reason, data = Cosmetic_Surgery, random = ~Surgery|Clinic, method = "ML")
summary(InteractionModel)
## Linear mixed-effects model fit by maximum likelihood
## Data: Cosmetic_Surgery
## AIC BIC logLik
## 1807.045 1839.629 -894.5226
##
## Random effects:
## Formula: ~Surgery | Clinic
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 5.482359 (Intr)
## Surgery 5.417495 -0.946
## Residual 5.818911
##
## Fixed effects: Post_QoL ~ Surgery + Base_QoL + Reason + Surgery:Reason
## Value Std.Error DF t-value p-value
## (Intercept) 42.51781 3.875317 262 10.971441 0.0000
## Surgery -3.18768 2.185367 262 -1.458646 0.1459
## Base_QoL 0.30536 0.053125 262 5.747835 0.0000
## Reason -3.51515 1.140934 262 -3.080939 0.0023
## Surgery:Reason 4.22129 1.700269 262 2.482718 0.0137
## Correlation:
## (Intr) Surgry Bas_QL Reason
## Surgery -0.356
## Base_QoL -0.865 -0.078
## Reason -0.233 0.306 0.065
## Surgery:Reason 0.096 -0.505 0.024 -0.661
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.2331483 -0.6972193 -0.1541073 0.6326387 3.1641799
##
## Number of Observations: 276
## Number of Groups: 10
# Compare this model with the previous model
anova(InteractionModel,RandomInterceptSlopes,RandomInterceptModel,UnconditionalModel)
## Model df AIC BIC logLik Test L.Ratio
## InteractionModel 1 9 1807.045 1839.629 -894.5226
## RandomInterceptSlopes 2 7 1812.624 1837.966 -899.3119 1 vs 2 9.57852
## RandomInterceptModel 3 5 1847.490 1865.592 -918.7450 2 vs 3 38.86626
## UnconditionalModel 4 3 1911.473 1922.334 -952.7364 3 vs 4 67.98286
## p-value
## InteractionModel
## RandomInterceptSlopes 0.0083
## RandomInterceptModel <.0001
## UnconditionalModel <.0001
# Calculate the -2LL
VarCorr(InteractionModel)
## Clinic = pdLogChol(Surgery)
## Variance StdDev Corr
## (Intercept) 30.05626 5.482359 (Intr)
## Surgery 29.34925 5.417495 -0.946
## Residual 33.85972 5.818911
ICClme(InteractionModel)
## [1] "ICC= 0.505950710632734" "ICC= 0.470246407862322"
# The output shows that adding Reason to the model reduces the -2LL by 9.5782, so we choose the Interaction model.
# The ICC for the Interaction model is 47.02%. This tells us that about 47.02% of the variability in outcome difference occurs between level-2 units (clinic), with the remaining 52.98% occuring within level-2.