Class5_Multilevel Model

Class 5 Multilevel Model (Part 1)

Prepare the data

## Prepare the dataset
library(haven)
Cosmetic_Surgery <- read_sav("Cosmetic_Surgery.sav")
head(Cosmetic_Surgery)

## # A tibble: 6 x 9
##      ID Post_QoL Base_QoL    Surgery Clinic   Age   BDI      Reason  Gender
##   <dbl>    <dbl>    <dbl>  <dbl+lbl>  <dbl> <dbl> <dbl>   <dbl+lbl> <dbl+l>
## 1     1     71.3       73 0 [Waitin…      1    31    12 0 [Change … 0 [Fem…
## 2     2     77         74 0 [Waitin…      1    32    16 0 [Change … 0 [Fem…
## 3     3     73         80 0 [Waitin…      1    33    13 0 [Change … 0 [Fem…
## 4     4     68.9       76 0 [Waitin…      1    59    11 0 [Change … 1 [Mal…
## 5     5     69         71 0 [Waitin…      1    61    11 0 [Change … 1 [Mal…
## 6     6     68.5       72 0 [Waitin…      1    32    10 1 [Physica… 0 [Fem…

## Take a glance of the data
summary(Cosmetic_Surgery)

##        ID            Post_QoL        Base_QoL        Surgery      
##  Min.   :  1.00   Min.   :40.00   Min.   :43.00   Min.   :0.0000  
##  1st Qu.: 69.75   1st Qu.:52.30   1st Qu.:57.00   1st Qu.:0.0000  
##  Median :138.50   Median :58.00   Median :63.00   Median :0.0000  
##  Mean   :138.50   Mean   :59.61   Mean   :63.56   Mean   :0.4746  
##  3rd Qu.:207.25   3rd Qu.:67.00   3rd Qu.:71.00   3rd Qu.:1.0000  
##  Max.   :276.00   Max.   :88.20   Max.   :91.00   Max.   :1.0000  
##      Clinic            Age             BDI            Reason      
##  Min.   : 1.000   Min.   :18.00   Min.   : 0.00   Min.   :0.0000  
##  1st Qu.: 3.000   1st Qu.:31.00   1st Qu.:10.00   1st Qu.:0.0000  
##  Median : 6.000   Median :38.00   Median :20.00   Median :1.0000  
##  Mean   : 5.754   Mean   :39.17   Mean   :23.05   Mean   :0.6449  
##  3rd Qu.: 8.000   3rd Qu.:48.00   3rd Qu.:36.00   3rd Qu.:1.0000  
##  Max.   :10.000   Max.   :65.00   Max.   :63.00   Max.   :1.0000  
##      Gender      
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.4312  
##  3rd Qu.:1.0000  
##  Max.   :1.0000

## Good news, we don't have any missing data in our dataset

Packages for multilevel modelling in R

## There are two famous packages to do the multilevel modeling. One is "nlme", and the other is the "lme4".
## In this class, we will focus on the previous package, which is the "nlme" package.
library(nlme)
## Also, for the graphic display, we need "ggplot2" package
library(ggplot2)
## And for manage the data structure or reshape the data, we need "car" and "reshape" package.
library(car)

## Loading required package: carData

install.packages("reshape")

## 
## The downloaded binary packages are in
##  /var/folders/jx/m17nczh57gb3cmr_r_zgfq1w0000gn/T//Rtmp6U6Uvo/downloaded_packages

library(reshape)

Section 1. Setting up the simple linear model

## At frist, we need to know whether there is a significant difference between clinics in their patient's life quality after the cosmetic surgery.
## We could use a simple regression model here. 
SimpleLinearModel<- lm(Post_QoL~Clinic,data=Cosmetic_Surgery)
summary(SimpleLinearModel)

## 
## Call:
## lm(formula = Post_QoL ~ Clinic, data = Cosmetic_Surgery)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.6125  -6.1622  -0.4786   5.1049  22.0049 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  70.0628     1.0701   65.47   <2e-16 ***
## Clinic       -1.8168     0.1672  -10.87   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.787 on 274 degrees of freedom
## Multiple R-squared:  0.3011, Adjusted R-squared:  0.2986 
## F-statistic: 118.1 on 1 and 274 DF,  p-value: < 2.2e-16

## The output indicates that there is a significant difference between clinics in their patient's life quality after the cosmetic surgery. F(1,274)=118.1, p<.01.

Section 2. Setting up an Unconditional Model

## Fisrtly, we need to fit the unconditional model, allowing the intercepts to vary across contexts, in this case we want them to vary across clinics.
UnconditionalModel <- lme(Post_QoL ~ 1, data = Cosmetic_Surgery, random =~1|Clinic, method = "ML")
summary(UnconditionalModel)

## Linear mixed-effects model fit by maximum likelihood
##  Data: Cosmetic_Surgery 
##        AIC      BIC    logLik
##   1911.473 1922.334 -952.7364
## 
## Random effects:
##  Formula: ~1 | Clinic
##         (Intercept) Residual
## StdDev:    5.909691 7.238677
## 
## Fixed effects: Post_QoL ~ 1 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 60.08377  1.923283 266 31.24022       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.8828507 -0.7606631 -0.1378732  0.7075242  2.8607949 
## 
## Number of Observations: 276
## Number of Groups: 10

## To obtain the -2LL the same with our SPSS section, we just need to simply use the function logLik() and multiple "-2" to achieve the -2LL.
logLik(UnconditionalModel)*-2 # 1905.473 (df=3)

## 'log Lik.' 1905.473 (df=3)

Section 3. Random intercepts model

# Set up a random intercept model using lme() function
RandomInterceptModel <-lme(Post_QoL ~ Surgery + Base_QoL, data = Cosmetic_Surgery, random = ~1|Clinic, method = "ML")
summary(RandomInterceptModel)

## Linear mixed-effects model fit by maximum likelihood
##  Data: Cosmetic_Surgery 
##       AIC      BIC   logLik
##   1847.49 1865.592 -918.745
## 
## Random effects:
##  Formula: ~1 | Clinic
##         (Intercept) Residual
## StdDev:    3.039264 6.518986
## 
## Fixed effects: Post_QoL ~ Surgery + Base_QoL 
##                 Value Std.Error  DF   t-value p-value
## (Intercept) 29.563601  3.471879 264  8.515160  0.0000
## Surgery     -0.312999  0.843145 264 -0.371228  0.7108
## Base_QoL     0.478630  0.052774 264  9.069465  0.0000
##  Correlation: 
##          (Intr) Surgry
## Surgery   0.102       
## Base_QoL -0.947 -0.222
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.8872666 -0.7537675 -0.0954987  0.5657241  3.0020852 
## 
## Number of Observations: 276
## Number of Groups: 10

# Calculate the -2LL 
logLik(RandomInterceptModel)*-2 #1837.49

## 'log Lik.' 1837.49 (df=5)

## Do you still remeber the -2LL from our unconditional model? Now we need use it to make the comparison between two models.
## χ2 Differece = 1905.47 − 1837.49 = 67.98 
## df_change =5−3=1
## If we look at the critical values for the chi-square statistic with 2 degree of freedom in the Appendix, they are 5.99 (p < .05); therefore, this change is highly significant.We choose random intercepts model.
# A simpler way to do such thing is to use the anova() function. This function compares the change in log likelihood without computing the chi-square change as well as the df change.
anova(RandomInterceptModel,UnconditionalModel)

##                      Model df      AIC      BIC    logLik   Test  L.Ratio
## RandomInterceptModel     1  5 1847.490 1865.592 -918.7450                
## UnconditionalModel       2  3 1911.473 1922.334 -952.7364 1 vs 2 67.98286
##                      p-value
## RandomInterceptModel        
## UnconditionalModel    <.0001

# We can use the StdDev in the Model summary to calculate the ICC
RandomInterceptModel_ICC <- 3.04^2/(3.04^2+6.52^2)
RandomInterceptModel_ICC # 17.85

## [1] 0.1785747

# Comparing to our SPSS section, where is the estimates? No worry, use VarCorr() to check these parameters.
VarCorr(RandomInterceptModel)

## Clinic = pdLogChol(1) 
##             Variance  StdDev  
## (Intercept)  9.237126 3.039264
## Residual    42.497179 6.518986

# Not cool enough? No worry, let's build a function to calculate the ICC automatically.
varests <- as.numeric(VarCorr(RandomInterceptModel))
ICClme <- function(out) {
  varests <- as.numeric(VarCorr(out)[1:nrow(VarCorr(out))])
  return(paste("ICC=",varests[1]/ (varests[1]+varests[-1])))
}
# Try the function
ICClme(RandomInterceptModel) # Note, this function could be used to different models.

## [1] "ICC= 0.178549339746615"

# Check the ICC for our unconditional model
ICClme(UnconditionalModel)

## [1] "ICC= 0.39994610805941"

## Base on the anova outcome, χ2(1) = 107.65, p < .0001. We can conclude then that the intercepts vary significantly across the different clinics. Multilevel madness must ensue.

Section 4. Random intercepts and slopes model

RandomInterceptSlopes <-lme(Post_QoL ~ Surgery + Base_QoL, data = Cosmetic_Surgery, random = ~Surgery|Clinic, method = "ML")
summary(RandomInterceptSlopes)

## Linear mixed-effects model fit by maximum likelihood
##  Data: Cosmetic_Surgery 
##        AIC      BIC    logLik
##   1812.624 1837.967 -899.3119
## 
## Random effects:
##  Formula: ~Surgery | Clinic
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev   Corr  
## (Intercept) 6.132655 (Intr)
## Surgery     6.197489 -0.965
## Residual    5.912335       
## 
## Fixed effects: Post_QoL ~ Surgery + Base_QoL 
##                Value Std.Error  DF   t-value p-value
## (Intercept) 40.10253  3.892945 264 10.301334  0.0000
## Surgery     -0.65453  2.110917 264 -0.310069  0.7568
## Base_QoL     0.31022  0.053506 264  5.797812  0.0000
##  Correlation: 
##          (Intr) Surgry
## Surgery  -0.430       
## Base_QoL -0.855 -0.063
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.4114778 -0.6628574 -0.1138411  0.6833110  2.8334730 
## 
## Number of Observations: 276
## Number of Groups: 10

logLik(RandomInterceptSlopes)*-2 # 1798.624

## 'log Lik.' 1798.624 (df=7)

# Compare this model with the previous model 
anova(RandomInterceptSlopes,RandomInterceptModel,UnconditionalModel)

##                       Model df      AIC      BIC    logLik   Test  L.Ratio
## RandomInterceptSlopes     1  7 1812.624 1837.966 -899.3119                
## RandomInterceptModel      2  5 1847.490 1865.592 -918.7450 1 vs 2 38.86626
## UnconditionalModel        3  3 1911.473 1922.334 -952.7364 2 vs 3 67.98286
##                       p-value
## RandomInterceptSlopes        
## RandomInterceptModel   <.0001
## UnconditionalModel     <.0001

# Check the ICC for this model
VarCorr(RandomInterceptSlopes)

## Clinic = pdLogChol(Surgery) 
##             Variance StdDev   Corr  
## (Intercept) 37.60946 6.132655 (Intr)
## Surgery     38.40887 6.197489 -0.965
## Residual    34.95570 5.912335

ICClme(RandomInterceptSlopes)

## [1] "ICC= 0.494741991832759" "ICC= 0.518285358979433"

# Becasue we have two independent variables for this model, so we have two estimates for this model, leading to two ICC values.

Section 5. Adding an interaction term to the model

# Watch carefully, we can still use the lme() function. Just add an interaction term in our formula.
InteractionModel <- lme(Post_QoL ~ Surgery + Base_QoL + Reason + Surgery:Reason, data = Cosmetic_Surgery, random = ~Surgery|Clinic, method = "ML")
summary(InteractionModel)

## Linear mixed-effects model fit by maximum likelihood
##  Data: Cosmetic_Surgery 
##        AIC      BIC    logLik
##   1807.045 1839.629 -894.5226
## 
## Random effects:
##  Formula: ~Surgery | Clinic
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev   Corr  
## (Intercept) 5.482359 (Intr)
## Surgery     5.417495 -0.946
## Residual    5.818911       
## 
## Fixed effects: Post_QoL ~ Surgery + Base_QoL + Reason + Surgery:Reason 
##                   Value Std.Error  DF   t-value p-value
## (Intercept)    42.51781  3.875317 262 10.971441  0.0000
## Surgery        -3.18768  2.185367 262 -1.458646  0.1459
## Base_QoL        0.30536  0.053125 262  5.747835  0.0000
## Reason         -3.51515  1.140934 262 -3.080939  0.0023
## Surgery:Reason  4.22129  1.700269 262  2.482718  0.0137
##  Correlation: 
##                (Intr) Surgry Bas_QL Reason
## Surgery        -0.356                     
## Base_QoL       -0.865 -0.078              
## Reason         -0.233  0.306  0.065       
## Surgery:Reason  0.096 -0.505  0.024 -0.661
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.2331483 -0.6972193 -0.1541073  0.6326387  3.1641799 
## 
## Number of Observations: 276
## Number of Groups: 10

# Compare this model with the previous model 
anova(InteractionModel,RandomInterceptSlopes,RandomInterceptModel,UnconditionalModel)

##                       Model df      AIC      BIC    logLik   Test  L.Ratio
## InteractionModel          1  9 1807.045 1839.629 -894.5226                
## RandomInterceptSlopes     2  7 1812.624 1837.966 -899.3119 1 vs 2  9.57852
## RandomInterceptModel      3  5 1847.490 1865.592 -918.7450 2 vs 3 38.86626
## UnconditionalModel        4  3 1911.473 1922.334 -952.7364 3 vs 4 67.98286
##                       p-value
## InteractionModel             
## RandomInterceptSlopes  0.0083
## RandomInterceptModel   <.0001
## UnconditionalModel     <.0001

# Calculate the -2LL 
VarCorr(InteractionModel)

## Clinic = pdLogChol(Surgery) 
##             Variance StdDev   Corr  
## (Intercept) 30.05626 5.482359 (Intr)
## Surgery     29.34925 5.417495 -0.946
## Residual    33.85972 5.818911

ICClme(InteractionModel)

## [1] "ICC= 0.505950710632734" "ICC= 0.470246407862322"

# The output shows that adding Reason to the model reduces the -2LL by 9.5782, so we choose the Interaction model.
# The ICC for the Interaction model is 47.02%. This tells us that about 47.02% of the variability in outcome difference occurs between level-2 units (clinic), with the remaining 52.98% occuring within level-2.

Class5_Multilevel Model

Cheng Hua

7/17/2019

Class 5 Multilevel Model (Part 1)

Prepare the data

Packages for multilevel modelling in R

Section 1. Setting up the simple linear model

Section 2. Setting up an Unconditional Model

Section 3. Random intercepts model

Section 4. Random intercepts and slopes model

Section 5. Adding an interaction term to the model