In this example, I am documenting how to formulate equations for random intercepts and slopes in multilevel modeling along with the r code using the nlme package with interpretation of the output. In this current data set there are four variables id, school, sex (i.e. gender), and like (i.e. a measure on a scale 1 to 8 how much the student likes school). Here I am just tidying up the data by making the f and m 1 and 0 respectively as well as getting rid of nonsensical values such as 19.1 for the sex variable. Overall there are 1,380 participants. In this demonstration we will want to predict how much a student likes schools with the covariate sex while accounting for the nesting of students within school and make the model more complicated further into the example. In later models, we will use the level two covariate school funding.

##   id school sex like sf
## 4  1      1   1    2 12
## 5  2      1   1    5 12
## 6  3      1   1    6 12
## 7  4      1   1    3 12
## 8  5      1   1    7 12
## 9  6      1   0    6 12

Here I try to break down what each component of the equation means.

gamma00 is the average or mean like score across the schools. The indexes indicate where the parameter falls in each level. For example, the first 0 in the gamma00 means that it is the intercept for level one. The second zero means that it is the mean or intercept for level two. However, level two intercepts can vary by u0j, which will be discussed next.

u0j is the random deviation from the intercept or initial value for each school. This is a level two error term, which is why it only has the subscript j instead of i and j. It has a j, because j represents all of the 200 schools (i.e. from 1 to j schools). For example the unique deviation from the intercept for school one would be u01.

Beta10 is the average regression coefficient for the change associated with sex variable (sexij). It is the average change we would see given a one unit change in the independent variable (in this case identifying as a female). The first 1 index means that it is the first slope coefficient in level one and the second 0 means that it is the average slope coefficient (i.e. no effect of any level two variables), because in this model the slope is not varying by schools.

eij is the individual level one error term for each individual in each school. \[ Level~1:~~~{y_{ij} = \beta_{0j} + \beta_{10}(sex_{ij}) + e_{ij}}~~~ (1.1)\]

\[ Level~2~Intercept:~~~{\beta_{0j} = \gamma_{00} + u_{0j}} ~~~ (1.2)\]

\[Mixed~model: ~~~{y_{ij} = \gamma_{00} + \beta_{10}(sex_{ij}) + u_{0j} + e_{ij}} ~~~(1.3)\]

Here is the r-code for the multilevel model described in equation 1 using the nlme package. There is the fixed part, which will have the average intercept and the parameter estimate for sex across the schools. To model the random intercepts, we use the 1 to signify that this is a random intercepts model where we want an individual intercept for each school.

library(nlme)
model1 = lme(fixed = like ~ sex, random = ~1 | school, data = dat)
sumModel1 = summary(model1); sumModel1
## Linear mixed-effects model fit by REML
##  Data: dat 
##        AIC      BIC    logLik
##   5588.159 5609.072 -2790.079
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept) Residual
## StdDev:   0.3873544 1.802345
## 
## Fixed effects: like ~ sex 
##                 Value  Std.Error   DF  t-value p-value
## (Intercept)  5.165005 0.09943611 1349 51.94295  0.0000
## sex         -0.172538 0.10093339 1349 -1.70943  0.0876
##  Correlation: 
##     (Intr)
## sex -0.506
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.5297625 -0.6484376  0.1676498  0.6158628  2.1095839 
## 
## Number of Observations: 1380
## Number of Groups: 30

We are using REML, because it accounts for the included parameters correctly in the standard errors (i.e. making them larger by accounting for the reduced degrees of freedom).

AIC, BIC, and logLik are all model comparison statistics.

Random effects: The random effects demonstrate the amount of variance (they are reported in standard deviations) in the dependent variable associated with the schools (intercept) and the individual (residual). These variances can be used to measure the amount of variance associated between and within school using the following equation 2: \[{\rho_{1} = {\tau^2 / (\tau^2 +\sigma^2)} }~~~ (2)\]

When we plug in the random effect standard deviation values we need to square them to get the variances. We see that almost 20% of the variation in the dependent variable is associated with differences between schools. Therefore, there is some indication that students in schools are more alike on the like dependent variable. \[{ .177= {0.387 / (0.387 +1.802))} }\] Fixed effects: These are interpreted the same as parameter estimates in a regular single level regression.

Correlation (inter): The correlation between the fixed intercept and fixed slope for sex. If the value is above zero, then we could say that as the intercept increases so does the slope and the opposite for a negative relationship. In our case, as the intercept or mean for like move up, we see a decrease in the slope for female.

Standardized residuals: These are the residuals scaled by the standard deviation.

Now we can estimate the random slope model. The random slope model adds one unique component which is u1j. This is the random level two error for the slope allowing the slope to differ for each school for the covariate female. It is indexed with j, because it can vary for each school. It is multiplied by the covariate value of sex, because it is a slope therefore we need to multiply it by the covariate for which it is measuring the slope. \[ Level~1:~~~{y_{ij} = \beta_{0j} + \beta_{1j}(sex_{ij}) + e_{ij}}~~~ (3.1)\]

\[ Level~2~Intercept:~~~{\beta_{0j} = \gamma_{00} + u_{0j}} ~~~ (3.2)\]

\[ Level~2~Slope:~~~{\beta_{1j} = \gamma_{10} + u_{1j}} ~~~ (3.2)\] \[ Mixed~model:~~~{y_{ij} = \gamma_{00} + \gamma_{10}(sex_{ij}) +u_{0j} + u_{1j}(sex_{ij}) + e_{ij}}~~~ 3.3\]

For this model, in nlme, instead of using one, we now use the term that we want to have a random slope which is sex.

model2 = lme(fixed = like ~ sex, random = ~ sex | school, data = dat)
sumModel2 = summary(model2); sumModel2
## Linear mixed-effects model fit by REML
##  Data: dat 
##        AIC      BIC    logLik
##   5587.354 5618.725 -2787.677
## 
## Random effects:
##  Formula: ~sex | school
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.4884554 (Intr)
## sex         0.4409793 -0.672
## Residual    1.7907265       
## 
## Fixed effects: like ~ sex 
##                 Value Std.Error   DF  t-value p-value
## (Intercept)  5.175373 0.1135385 1349 45.58254  0.0000
## sex         -0.167497 0.1287471 1349 -1.30098  0.1935
##  Correlation: 
##     (Intr)
## sex -0.674
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.6382774 -0.6890349  0.1538854  0.6525723  2.0776952 
## 
## Number of Observations: 1380
## Number of Groups: 30

The only new parameter provided is the stdDev for the sex variable, which is the standard deviation in the slope coefficients for sex for each school.

For the next model let us revert back to the random intercepts only model found in equations 1. Now we want to estimate the effects of a level two variable school funding. This parameter estimate is located in the level two intercept section, because in this model, we are assuming that school funding does not affect the level one slope coefficient, sex; however, we are assuming it does affect the mean (i.e. intercept) for like for each school. One new part of this equation is gamma01, which is the average slope coefficient for the level two school funding variable. The second new part is found in the mixed model and it is the covariate school funding (sf), which is multiplied by gamma01, which is its slope coefficient. \[ Level~1:~~~{y_{ij} = \beta_{0j} + \beta_{10}(sex_{ij}) + e_{ij}}~~~ (4.1)\]

\[ Level~2~Intercept:~~~{\beta_{0j} = \gamma_{00} + \gamma_{01}(sf_{j}) + u_{0j}} ~~~ (4.2)\]

\[Mixed~model: ~~~{y_{ij} = \gamma_{00} + \beta_{10}(sex_{ij}) +\gamma_{01}(sf_{j}) + u_{0j} + e_{ij}} ~~~(4.3)\]

Modeling a level two covariate with random intercepts is easy in nlme. We just include the level two covariate in the model and nlme does the rest. The new variable is the school funding (sf) variable which is centered at 0 and has a standard deviation of 20.

model3 = lme(fixed = like ~ sex + sf, random = ~ sex | school, data = dat)
sumModel3 = summary(model3); sumModel3
## Linear mixed-effects model fit by REML
##  Data: dat 
##        AIC      BIC    logLik
##   5593.502 5630.096 -2789.751
## 
## Random effects:
##  Formula: ~sex | school
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.5049574 (Intr)
## sex         0.4356761 -0.69 
## Residual    1.7872192       
## 
## Fixed effects: like ~ sex + sf 
##                 Value  Std.Error   DF  t-value p-value
## (Intercept)  5.151508 0.11629448 1348 44.29710  0.0000
## sex         -0.183337 0.12815881 1348 -1.43055  0.1528
## sf           0.006920 0.00288272 1348  2.40052  0.0165
##  Correlation: 
##     (Intr) sex   
## sex -0.672       
## sf  -0.086 -0.048
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.6464944 -0.6817821  0.1511480  0.6538521  2.0563392 
## 
## Number of Observations: 1380
## Number of Groups: 30

Let us slightly change the data set so that we can create a model that makes sense for a random slopes model with the level two school funding variable. Let us change the covariate sex to intervention (interven). Now we can extend the model in equation 3 to include a random slopes component where school funding influences the slope of the coefficient for the intervention. Below is the new set of equations: \[ Level~1:~~~{y_{ij} = \beta_{0j} + \beta_{1j}(interven_{ij}) + e_{ij}}~~~ (5.1)\]

\[ Level~2~Intercept:~~~{\beta_{0j} = \gamma_{00} + \gamma_{01}(sf_{j}) + u_{0j}} ~~~ (5.2)\]

\[ Level~2~Slope:~~~{\beta_{1j} = \gamma_{10} +\gamma_{11}(sf_{j}) + u_{1j}} ~~~ (5.3)\]

\[Mixed~model: ~~~{y_{ij} = \gamma_{00} + \ + \gamma_{10}(interven_{ij}) +\gamma_{01}(sf_{j}) + \gamma_{11}(sf_{j})(interven_{ij}) + u_{0j} +u_{1j}(interven_{ij}) + e_{ij}} ~~~(5.4)\]

The only new part is gamma11, which is the slope coefficient for the level two variable school funding and its interaction with intervention and is interpreted as a standard interaction regression coefficient.

Now we have the full mixed model with the level two covariate school funding interaction effect with the intervention covariate included in the fixed effects part of the model. It is still relatively easy to implement this model in nlme and just add the interaction effect between intervention and school funding.

colnames(dat) = c("id", "school", "interven", "like", "sf")
model4 = lme(fixed = like ~ interven + sf + interven*sf, random = ~ interven | school, data = dat)
sumModel4 = summary(model4); sumModel4
## Linear mixed-effects model fit by REML
##  Data: dat 
##        AIC      BIC    logLik
##   5603.353 5645.169 -2793.677
## 
## Random effects:
##  Formula: ~interven | school
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.4996059 (Intr)
## interven    0.4109411 -0.664
## Residual    1.7876695       
## 
## Fixed effects: like ~ interven + sf + interven * sf 
##                 Value  Std.Error   DF  t-value p-value
## (Intercept)  5.160148 0.11594937 1347 44.50345  0.0000
## interven    -0.206248 0.12772435 1347 -1.61479  0.1066
## sf           0.004357 0.00399013 1347  1.09199  0.2750
## interven:sf  0.004853 0.00515821 1347  0.94081  0.3470
##  Correlation: 
##             (Intr) intrvn sf    
## interven    -0.658              
## sf          -0.123  0.094       
## interven:sf  0.087 -0.188 -0.689
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.6477755 -0.6902642  0.1582382  0.6603740  2.0359215 
## 
## Number of Observations: 1380
## Number of Groups: 30

As expected the only new parameter of interest is the fixed effect interaction term between school funding and the student’s intervention status.

References: Finch, W. H., Bolin, J. E., & Kelley, K. (2014). Multilevel modeling using R. Crc Press.