This week, we are going to use a larger version of the HSB dataset back from 1988. Same structure- students nested within schools. This is the same dataset that Garson uses in Chapter 3, so you can recreate what he has as well.

Load Some Packages to Help with the Analysis and Data Management:

library(tidyverse) 
library(Hmisc) # label()
library(lme4)
library(lmerTest)

Load in the Data and Use glimpse to Summarize the Data Structure

hsb <- read_csv("hsbmerged.csv")
Parsed with column specification:
cols(
  schoolid = col_double(),
  minority = col_double(),
  female = col_double(),
  ses = col_double(),
  mathach = col_double(),
  size = col_double(),
  sector = col_double(),
  pracad = col_double(),
  disclim = col_double(),
  himinty = col_double(),
  meanses = col_double()
)
glimpse(hsb)
Rows: 7,185
Columns: 11
$ schoolid <dbl> 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1…
$ minority <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ female   <dbl> 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0…
$ ses      <dbl> -1.528, -0.588, -0.528, -0.668, -0.158, 0.022, -0.618, -0.998, -0.888, -0.458, -1.448, -0.658, -0.468, -0.988…
$ mathach  <dbl> 5.876, 19.708, 20.349, 8.781, 17.898, 4.583, -2.832, 0.523, 1.527, 21.521, 9.475, 16.057, 21.178, 20.178, 20.…
$ size     <dbl> 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842, 842,…
$ sector   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ pracad   <dbl> 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0…
$ disclim  <dbl> 1.597, 1.597, 1.597, 1.597, 1.597, 1.597, 1.597, 1.597, 1.597, 1.597, 1.597, 1.597, 1.597, 1.597, 1.597, 1.59…
$ himinty  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ meanses  <dbl> -0.428, -0.428, -0.428, -0.428, -0.428, -0.428, -0.428, -0.428, -0.428, -0.428, -0.428, -0.428, -0.428, -0.42…

Quick Data Cleaning and Coding Tips!

The summary command from base R is a quick and easy way to explore a new dataset.

summary(hsb)
    schoolid       minority          female            ses               mathach            size          sector      
 Min.   :1224   Min.   :0.0000   Min.   :0.0000   Min.   :-3.758000   Min.   :-2.832   Min.   : 100   Min.   :0.0000  
 1st Qu.:3020   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:-0.538000   1st Qu.: 7.275   1st Qu.: 565   1st Qu.:0.0000  
 Median :5192   Median :0.0000   Median :1.0000   Median : 0.002000   Median :13.131   Median :1016   Median :0.0000  
 Mean   :5278   Mean   :0.2747   Mean   :0.5282   Mean   : 0.000143   Mean   :12.748   Mean   :1057   Mean   :0.4931  
 3rd Qu.:7342   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.: 0.602000   3rd Qu.:18.317   3rd Qu.:1436   3rd Qu.:1.0000  
 Max.   :9586   Max.   :1.0000   Max.   :1.0000   Max.   : 2.692000   Max.   :24.993   Max.   :2713   Max.   :1.0000  
     pracad          disclim           himinty        meanses         
 Min.   :0.0000   Min.   :-2.4160   Min.   :0.00   Min.   :-1.188000  
 1st Qu.:0.3200   1st Qu.:-0.8170   1st Qu.:0.00   1st Qu.:-0.317000  
 Median :0.5300   Median :-0.2310   Median :0.00   Median : 0.038000  
 Mean   :0.5345   Mean   :-0.1319   Mean   :0.28   Mean   : 0.006138  
 3rd Qu.:0.7000   3rd Qu.: 0.4600   3rd Qu.:1.00   3rd Qu.: 0.333000  
 Max.   :1.0000   Max.   : 2.7560   Max.   :1.00   Max.   : 0.831000  

Summary gives you a quick, high-level summary of all the variables in a dataset. It is also really useful for your own records. You can just copy and paste into a Word file and have it for your records.

Now, data cleaning and coding! We’re going to learn how to apply labels to (1) variables themselves, and (2) the values within a given variable. Both are useful and a good practice for cleaning up your data.

Let’s give some variables a label:

str(hsb$sector)
 'labelled' num [1:7185] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "label")= chr "School Sector"

Let’s create a factor version of sector and give specific values of sector a label:

str(hsb.clean$sector.factor)
 Factor w/ 2 levels "Public","Parochial": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "label")= chr "School Sector"
table(hsb.clean$sector.factor)

   Public Parochial 
     3642      3543 

Run summary again - notice how it handles factors…

summary(hsb.clean)
    schoolid       minority          female            ses               mathach            size          sector      
 Min.   :1224   Min.   :0.0000   Min.   :0.0000   Min.   :-3.758000   Min.   :-2.832   Min.   : 100   Min.   :0.0000  
 1st Qu.:3020   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:-0.538000   1st Qu.: 7.275   1st Qu.: 565   1st Qu.:0.0000  
 Median :5192   Median :0.0000   Median :1.0000   Median : 0.002000   Median :13.131   Median :1016   Median :0.0000  
 Mean   :5278   Mean   :0.2747   Mean   :0.5282   Mean   : 0.000143   Mean   :12.748   Mean   :1057   Mean   :0.4931  
 3rd Qu.:7342   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.: 0.602000   3rd Qu.:18.317   3rd Qu.:1436   3rd Qu.:1.0000  
 Max.   :9586   Max.   :1.0000   Max.   :1.0000   Max.   : 2.692000   Max.   :24.993   Max.   :2713   Max.   :1.0000  
     pracad          disclim           himinty        meanses            sector.factor  female.factor     minority.factor
 Min.   :0.0000   Min.   :-2.4160   Min.   :0.00   Min.   :-1.188000   Public   :3642   Male  :3390   Non-Minority:5211  
 1st Qu.:0.3200   1st Qu.:-0.8170   1st Qu.:0.00   1st Qu.:-0.317000   Parochial:3543   Female:3795   Minority    :1974  
 Median :0.5300   Median :-0.2310   Median :0.00   Median : 0.038000                                                     
 Mean   :0.5345   Mean   :-0.1319   Mean   :0.28   Mean   : 0.006138                                                     
 3rd Qu.:0.7000   3rd Qu.: 0.4600   3rd Qu.:1.00   3rd Qu.: 0.333000                                                     
 Max.   :1.0000   Max.   : 2.7560   Max.   :1.00   Max.   : 0.831000                                                     

Exploring Our Outcome, Match Achievement

Now, let’s take a look at the mathach variable using the describe function from the Hmisc package:

describe(hsb.clean$mathach)
hsb.clean$mathach : Std. Math Score 
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
    7185        0     6031        1    12.75     7.91    1.210    3.301    7.275   13.131   18.317   22.016   23.123 

lowest : -2.832 -2.830 -2.721 -2.658 -2.657, highest: 24.677 24.707 24.812 24.869 24.993
hist(hsb.clean$mathach, 
     main = "Distribution of Math Achievement Scores",
     sub = "N = 7,185. Students from the High School and Beyond Survey (1988).",
     xlab = "Standardized Math Achievement Score")

Running a Null Model

model.null <- lmer(mathach ~ (1|schoolid), REML = FALSE, data = hsb.clean)
summary(model.null)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: mathach ~ (1 | schoolid)
   Data: hsb.clean

     AIC      BIC   logLik deviance df.resid 
 47121.8  47142.4 -23557.9  47115.8     7182 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.06262 -0.75365  0.02676  0.76070  2.74184 

Random effects:
 Groups   Name        Variance Std.Dev.
 schoolid (Intercept)  8.553   2.925   
 Residual             39.148   6.257   
Number of obs: 7185, groups:  schoolid, 160

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  12.6371     0.2436 157.6209   51.87   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You also get AIC and BIC, two very common information criterion measures that are used to assess model fit:

Post-Estimation Task: Calculate ICC

These commands are run after you an any kind of MLM. They are called post-estimation commands because they require that you have the model saved in the memory (it is the last thing you ran). If you do something else and try to run them, you will likely get an error.

Here’s a quick and easy way to get our friend the ICC!

null.ICC
[1] 0.1793044

Using lmerTest to Evaluate Random Effects:

rand(model.null)
ANOVA-like table for random-effects: Single term deletions

Model:
mathach ~ (1 | schoolid)
               npar logLik   AIC    LRT Df Pr(>Chisq)    
<none>            3 -23558 47122                         
(1 | schoolid)    2 -24050 48104 983.92  1  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Always Interesting: Compare Our MLM to a Regular Regression

model.regular <- lm(mathach ~ NULL, data = hsb.clean)
summary(model.regular)

Call:
lm(formula = mathach ~ NULL, data = hsb.clean)

Residuals:
Std. Math Score 
     Min       1Q   Median       3Q      Max 
-15.5799  -5.4729   0.3831   5.5691  12.2451 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12.74785    0.08115   157.1   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.878 on 7184 degrees of freedom
LS0tCnRpdGxlOiAiTXVsdGlsZXZlbCBNb2RlbGluZywgTW9kdWxlIDM6IEludHJvIHRvIE1MTSBhbmQgdGhlIE51bGwgTW9kZWwiCmF1dGhvcjogIkRyLiBCcm9kYSIKb3V0cHV0OiBodG1sX25vdGVib29rIAotLS0KClRoaXMgd2Vlaywgd2UgYXJlIGdvaW5nIHRvIHVzZSBhIGxhcmdlciB2ZXJzaW9uIG9mIHRoZSBIU0IgZGF0YXNldCBiYWNrIGZyb20gMTk4OC4gU2FtZSBzdHJ1Y3R1cmUtIHN0dWRlbnRzIG5lc3RlZCB3aXRoaW4gc2Nob29scy4gVGhpcyBpcyB0aGUgc2FtZSBkYXRhc2V0IHRoYXQgR2Fyc29uIHVzZXMgaW4gQ2hhcHRlciAzLCBzbyB5b3UgY2FuIHJlY3JlYXRlIHdoYXQgaGUgaGFzIGFzIHdlbGwuCgojIExvYWQgU29tZSBQYWNrYWdlcyB0byBIZWxwIHdpdGggdGhlIEFuYWx5c2lzIGFuZCBEYXRhIE1hbmFnZW1lbnQ6CmBgYHtyLCBwcmludD1GQUxTRX0KbGlicmFyeSh0aWR5dmVyc2UpIApsaWJyYXJ5KEhtaXNjKSAjIGxhYmVsKCkKbGlicmFyeShsbWU0KQpsaWJyYXJ5KGxtZXJUZXN0KQoKYGBgCgojIExvYWQgaW4gdGhlIERhdGEgYW5kIFVzZSBgZ2xpbXBzZWAgdG8gU3VtbWFyaXplIHRoZSBEYXRhIFN0cnVjdHVyZQpgYGB7cn0KaHNiIDwtIHJlYWRfY3N2KCJoc2JtZXJnZWQuY3N2IikKCmdsaW1wc2UoaHNiKQpgYGAKCiMgUXVpY2sgRGF0YSBDbGVhbmluZyBhbmQgQ29kaW5nIFRpcHMhCgojIyBUaGUgYHN1bW1hcnlgIGNvbW1hbmQgZnJvbSBgYmFzZSBSYCBpcyBhIHF1aWNrIGFuZCBlYXN5IHdheSB0byBleHBsb3JlIGEgbmV3IGRhdGFzZXQuCmBgYHtyfQpzdW1tYXJ5KGhzYikKYGBgCgpTdW1tYXJ5IGdpdmVzIHlvdSBhIHF1aWNrLCBoaWdoLWxldmVsIHN1bW1hcnkgb2YgYWxsIHRoZSB2YXJpYWJsZXMgaW4gYSBkYXRhc2V0LiBJdCBpcyBhbHNvIHJlYWxseSB1c2VmdWwgZm9yIHlvdXIgb3duIHJlY29yZHMuIFlvdSBjYW4ganVzdCBjb3B5IGFuZCBwYXN0ZSBpbnRvIGEgV29yZCBmaWxlIGFuZCBoYXZlIGl0IGZvciB5b3VyIHJlY29yZHMuCgpOb3csIGRhdGEgY2xlYW5pbmcgYW5kIGNvZGluZyEgV2UncmUgZ29pbmcgdG8gbGVhcm4gaG93IHRvIGFwcGx5IGxhYmVscyB0byAoMSkgdmFyaWFibGVzIHRoZW1zZWx2ZXMsIGFuZCAoMikgdGhlIHZhbHVlcyB3aXRoaW4gYSBnaXZlbiB2YXJpYWJsZS4gQm90aCBhcmUgdXNlZnVsIGFuZCBhIGdvb2QgcHJhY3RpY2UgZm9yIGNsZWFuaW5nIHVwIHlvdXIgZGF0YS4KCiMjIExldCdzIGdpdmUgc29tZSB2YXJpYWJsZXMgYSBsYWJlbDoKYGBge3J9CmxhYmVsKGhzYiRtYXRoYWNoKSA9ICJTdGQuIE1hdGggU2NvcmUiCmxhYmVsKGhzYiRzZWN0b3IpID0gIlNjaG9vbCBTZWN0b3IiCgpzdHIoaHNiJHNlY3RvcikKYGBgCgojIyBMZXQncyBjcmVhdGUgYSBmYWN0b3IgdmVyc2lvbiBvZiBgc2VjdG9yYCBhbmQgZ2l2ZSBzcGVjaWZpYyB2YWx1ZXMgb2YgYHNlY3RvcmAgYSBsYWJlbDoKYGBge3J9CmhzYi5jbGVhbiA8LSBoc2IgJT4lCiAgbXV0YXRlKC4sCiAgICAgICAgIHNlY3Rvci5mYWN0b3IgPSBhc19mYWN0b3Ioc2VjdG9yKSwKICAgICAgICAgZmVtYWxlLmZhY3RvciA9IGFzX2ZhY3RvcihmZW1hbGUpLAogICAgICAgICBtaW5vcml0eS5mYWN0b3IgPSBhc19mYWN0b3IobWlub3JpdHkpKQoKbGV2ZWxzKGhzYi5jbGVhbiRzZWN0b3IuZmFjdG9yKSA9IGMoIlB1YmxpYyIsICJQYXJvY2hpYWwiKQpsZXZlbHMoaHNiLmNsZWFuJGZlbWFsZS5mYWN0b3IpID0gYygiTWFsZSIsICJGZW1hbGUiKQpsZXZlbHMoaHNiLmNsZWFuJG1pbm9yaXR5LmZhY3RvcikgPSBjKCJOb24tTWlub3JpdHkiLCAiTWlub3JpdHkiKQoKc3RyKGhzYi5jbGVhbiRzZWN0b3IuZmFjdG9yKQpgYGAKCmBgYHtyfQp0YWJsZShoc2IuY2xlYW4kc2VjdG9yLmZhY3RvcikKYGBgCgojIyBSdW4gYHN1bW1hcnlgIGFnYWluIC0gbm90aWNlIGhvdyBpdCBoYW5kbGVzIGZhY3RvcnMuLi4KYGBge3J9CnN1bW1hcnkoaHNiLmNsZWFuKQpgYGAKCgojIEV4cGxvcmluZyBPdXIgT3V0Y29tZSwgTWF0Y2ggQWNoaWV2ZW1lbnQgCgojIyBOb3csIGxldCdzIHRha2UgYSBsb29rIGF0IHRoZSBgbWF0aGFjaGAgdmFyaWFibGUgdXNpbmcgdGhlIGBkZXNjcmliZWAgZnVuY3Rpb24gZnJvbSB0aGUgYEhtaXNjYCBwYWNrYWdlOgpgYGB7cn0KZGVzY3JpYmUoaHNiLmNsZWFuJG1hdGhhY2gpCmBgYAoKYGBge3J9Cmhpc3QoaHNiLmNsZWFuJG1hdGhhY2gsIAogICAgIG1haW4gPSAiRGlzdHJpYnV0aW9uIG9mIE1hdGggQWNoaWV2ZW1lbnQgU2NvcmVzIiwKICAgICBzdWIgPSAiTiA9IDcsMTg1LiBTdHVkZW50cyBmcm9tIHRoZSBIaWdoIFNjaG9vbCBhbmQgQmV5b25kIFN1cnZleSAoMTk4OCkuIiwKICAgICB4bGFiID0gIlN0YW5kYXJkaXplZCBNYXRoIEFjaGlldmVtZW50IFNjb3JlIikKYGBgCgojIFJ1bm5pbmcgYSBOdWxsIE1vZGVsCmBgYHtyfQptb2RlbC5udWxsIDwtIGxtZXIobWF0aGFjaCB+ICgxfHNjaG9vbGlkKSwgUkVNTCA9IEZBTFNFLCBkYXRhID0gaHNiLmNsZWFuKQpzdW1tYXJ5KG1vZGVsLm51bGwpCgpgYGAKCllvdSBhbHNvIGdldCBBSUMgYW5kIEJJQywgdHdvIHZlcnkgY29tbW9uIGluZm9ybWF0aW9uIGNyaXRlcmlvbiBtZWFzdXJlcyB0aGF0IGFyZSB1c2VkIHRvIGFzc2VzcyBtb2RlbCBmaXQ6CgojIFBvc3QtRXN0aW1hdGlvbiBUYXNrOiBDYWxjdWxhdGUgSUNDCgpUaGVzZSBjb21tYW5kcyBhcmUgcnVuIGFmdGVyIHlvdSBhbiBhbnkga2luZCBvZiBNTE0uIFRoZXkgYXJlIGNhbGxlZCBwb3N0LWVzdGltYXRpb24gY29tbWFuZHMgYmVjYXVzZSB0aGV5IHJlcXVpcmUgdGhhdCB5b3UgaGF2ZSB0aGUgbW9kZWwgc2F2ZWQgaW4gdGhlIG1lbW9yeSAoaXQgaXMgdGhlIGxhc3QgdGhpbmcgeW91IHJhbikuIElmIHlvdSBkbyBzb21ldGhpbmcgZWxzZSBhbmQgdHJ5IHRvIHJ1biB0aGVtLCB5b3Ugd2lsbCBsaWtlbHkgZ2V0IGFuIGVycm9yLgoKSGVyZSdzIGEgcXVpY2sgYW5kIGVhc3kgd2F5IHRvIGdldCBvdXIgZnJpZW5kIHRoZSBJQ0MhCmBgYHtyfQoKbnVsbC5JQ0MgPC0gOC41NTMvKDguNTUzICsgMzkuMTQ4KQpudWxsLklDQwoKYGBgCgojIFVzaW5nIGBsbWVyVGVzdGAgdG8gRXZhbHVhdGUgUmFuZG9tIEVmZmVjdHM6CmBgYHtyfQpyYW5kKG1vZGVsLm51bGwpCmBgYAoKIyBBbHdheXMgSW50ZXJlc3Rpbmc6IENvbXBhcmUgT3VyIE1MTSB0byBhIFJlZ3VsYXIgUmVncmVzc2lvbgpgYGB7cn0KbW9kZWwucmVndWxhciA8LSBsbShtYXRoYWNoIH4gTlVMTCwgZGF0YSA9IGhzYi5jbGVhbikKc3VtbWFyeShtb2RlbC5yZWd1bGFyKQpgYGAK