General Introduction

This manual is a comprehensive introduction to hierarchical linear modeling (HLM) in R. Hierarchical linear models (HLM) allow for a more nuanced analysis of data with nested structures. Crucially, HLM accounts for sources of dependence in data sets. For example, in educational settings, students are nested within schools, and schools represent a source of dependence. It is plausible that school-specific characteristics influence educational outcomes. HLM analyses allow for an examination these group-level characteristics, which cannot be accounted for in traditional ordinary least squares regression. This manual will cover three main aspects of HLM: 1. Data Management; 2. Analysis and Output; 3. Graphing/Visualizing Data.

HLM Data Management

by Sarah Grover

In this section you will learn how to: 1. Load a Stata file into R 2. Add a new variable to a dataset 3. Merge data files 4. Write data to a csv file 5. Change data from long format to wide format 6. Change data from wide format to long format

Download HSB datasets in .dat format CSV file also available at: http://www.ats.ucla.edu/stat/paperexamples/singer/

Save the files into a folder on your computer. Set your working directory to the file that the datasets are saved in For Mac Users: If you are unsure what the appropriate file path is, you can look it up by right clicking on the file and selecting “Get Info” A dialogue box will pop up, the file path is under “Where:”

setwd('~/Desktop/')

Check that your working directory is set to the correct file. Running this command should return the file path that you set in the code above.

getwd()

## [1] "/Users/Sean/Desktop"

LOAD A STATA FILE INTO R

Sometimes people may share data with you that is not saved in a csv or other delimited file. If you have your data already in a csv file you can skip this step.

Install the ‘forgeign’ library and load it. This package allows you to upload data files from SPSS, SAS, and Stata to R.

For complete documentation see: http://cran.r-project.org/web/packages/foreign/foreign.pdf

### You only need to install the package once
### After it is installed, you only need to run the 
### library statement to load the package for each new
### active R session. 

install.packages('foreign', repos = 'http://cran.rstudio.com/')

## 
## The downloaded binary packages are in
##  /var/folders/nx/v3ny_c315513b4tqg0m0lryw0000gn/T//RtmpwgeYoo/downloaded_packages

library(foreign)

## Warning: package 'foreign' was built under R version 3.1.2

In this example, we read in a Stata file (.dta).
function = read.dta() Description: reads in a Stata version 5-12 binary format into a data frame. Note: this function does not support Stata formats after version 12 Usage: read.dta(file, convert.dates = T, convert.factors = T, missing.type = F, convert.underscore = F, warn.missing.labels = T) Arguments: file = filename or URL as a character string convert.dates = Convert Stata dates to Date class, and date-times to POSIXct class? Default is TRUE (i.e., yes). convert.factors = Use Stata value labels to create factors? (Stata Version 6.0 or later) Default is TRUE. missing.type = For Stata version 8 or later, store information about different types of missing data? Default is FALSE (i.e., no). convert.underscore = Convert “_" in Stata variable names to “.” in R names? Default is FALSE. warn.missing.labels = Warn if a variable is specified with value labels and those value labels are not present in the file? Default is TRUE.

Because we have no missing data or date information, we simply write the file path and leave all of the set default arguments.

### Reads in Level 1 (Student level) data
### 7185 observations of 5 variables

hsb1 <- read.dta('hsb1.dta')

### Reads in Level 2 (School level) data
### 160 observations of 7 variables

hsb2 <- read.dta('hsb2.dta')

CREATING A NEW VARIABLE AND ADDING IT TO A DATASET

We have now loaded the student data and the school data. The High School and Beyond (HSB) data was collected in 1980. This data is part of a longitudinal study of American youth conducted by the National Opinion Research Center on behalf of the National Center for Educaiton Statistics (NCES).

The student data (hsb1) includes the following variables: id: School id number, which school the student attends minority: White student = 0, Minority student (e.g., Black) = 1 female: Male = 0, Female = 1 ses: Composite SES, sum of 5 variables (standardized within grade prior to summing): Father’s occupation, Father’s education, Mother’s education Family income, and additive sum of following scale assessing whether the family had each of the following in their possession: (1) daily newspaper, (2) encyclopedia, (3) typewriter (4) electric dishwasher (5) 2 or more cars or trucks (6) more than 50 books (7) a room of your own (8) pocket calculator mathach: Mathematics test score in senior year of high school Mean = 12.7, SD = 6.8

We have a school id in the student level data, but we also need to add an identifier for each individual student. Because each row in hsb1 is the data for a single student, we simply create a variable ‘p’ that is numbered 1 through 7185, the total number of students in the dataset.

hsb1$p <- c(1:7185)

### This also works: 

hsb1$p <- c(1:nrow(hsb1))

MERGING DATA FILES

The school level data (hsb2) includes the following variables: id: School id number size: School size sector: Type of school, Catholic = 1, Public = 0 pracad: Proportion of students in the academic track disclim: Continuous measure of the disciplinary climate in the school himinty: School enrollment > 40% minority = 1 (n=44), school enrollment <= 40% minority = 0 (n=116) meanses: Average SES for the school, grand mean centered so that 0 = school with average SES level

The next step is to combine these two datasets into one, using school id as the matching key. To do this we will use the merge() function. Description: Merge two data frames by common columns or row names, or do other versions of database “join” operations. Usage: merge(x, y, by = intersect(names(x), names(y)), by.x = by, by.y = by, all = F, all.x = all, all.y = all, sort = T, suffixes = c(“.x”,“.y”), incomparables = NULL, …) Arguments: x, y: data frames to be merged into one by, by.x, by.y: specifications of columns to be used for merging, these are your id/matching key variables all.x: logical, should extra rows be added to the output for each row in x that has no matching row in y? Default is FALSE. all.y: logical, same as all.x but for the y file. all: logical, should extra rows be added to the output for each row in x and y that does not have a corresponding row in the other file? Default is FALSE. sort: logical, should the result be sorted on the ‘by’ columns? Default is TRUE. suffixes: a character vector of length 2 specifying the suffixes to be used for making unique names for variables that appear in both files but are not specified in the ‘by’ statement incomparables: values which cannot be matched. See ?match. This is intended to be used for merging on one column, so
these are incomparable values of that column.

### Merge hsb2 with hsb1 by school id and assign to object called hsb
### 7185 observations of 12 variables

hsb <- merge(hsb2, hsb1, by = c('id'))

WRITE DATA TO A CSV FILE

This data set is already in long format, so each school has a row for each student. This is the format that we would want our final dataset to be in to use it to conduct Multilevel Models (MLMs). You might want to export this dataset as a csv to share it with others. To export the dataset we use the function write.table() Description: prints the argument x, after converting it to a data frame, to a file Usage: write.table(x, file = “”, append = F, quote = T, sep = " “, eol =”“, na =”NA“, dec =”.“, row.names = T, col.names = T, qmethod = c(”escape“,”double“), fileEncoding =”“) Arguments: x: object to be written, preferably a matrix or data frame. If not, the function will attempt to coerce it to a data frame. file: character string with desired file name append: logical, Append output to the file? Only relevant if file is a character string. If TRUE, the output is appended to the file, if FALSE, any existing file of the same name is destroyed (i.e., writes over the existing file). Default is FALSE. quote: logical or numeric. If TRUE, any character or factor columns will be surrounded by double quotes. If a numeric vector, the numbers index the columns to be put in quotes. In both cases, the row and coumn names are quoted if written. If FALSE, nothing is quoted. Default is TRUE. sep: the field separator string. Values within each row of x are separated by this string. For csv sep = ‘,’. Default is ‘’ eol: the character(s) to print at the end of each line. For example, eol = ‘’ will produce Windows’ line endings on a Unix-alike OS, and eol = ‘’ will produce files as expected by Mac Excel (2004). Default is ‘’ na: the string to use for missing values in the data. For example na = ‘NA’, na = ‘.’, or na = ‘-999’. Default is ‘NA’. dec: the string to use for decimal points in numeric or complex columns: must be a single character. Default is ‘.’ row.names = logical or character vector of row names to be written. Default = TRUE. col.names = logical indicating whether the column names of x are to be written along with x, or a character vector of the column names to be written. This is useful if you want to rename all of the variable names in the dataset. Default is TRUE. qmethod = a character string specifying how to deal with embedded double quote characters when quoting strings. Must be ‘escape’ in which case the quote character is escaped in C style by a backslash, or ‘double’ in which case it is doubled. You can just specify the initial letter ‘e’ or ‘d’. Default is ‘escape’ fileEncoding: character string, if non-empty declares the encoding to be used on a file so the character data can be re-encoded as they are written.

### File writes to file specified in working directory
### We do row.names = F, because we don't want an additional vector
### of row numbers saved to our file. 

write.table(hsb,'HSBFinal.csv', sep=',', row.names=F)

There is also a function made specifically for writing csv files write.csv() Descritpion: This is another way of writing a csv file. The arguments are the same as the arguments for write.table, with a few exceptions: Usage: write.csv(…) …: arguments to write.table: append, col.names sep, dec, and qmethod cannot be altered. qmethod: default is ‘double’ If col.names = NA and row.names = T, a blank column is added, which is the convention used for CSV files to be read by spreadsheets.

### Alternative way to write csv file
write.csv(hsb,'HSBFinal2.csv',row.names = F)

For the next two sections you need to install and load the ‘reshape2’ package. Complete reference manual here: http://cran.r-project.org/web/packages/reshape/reshape.pdf

install.packages('reshape2', repos = 'http://cran.rstudio.com/')

## 
## The downloaded binary packages are in
##  /var/folders/nx/v3ny_c315513b4tqg0m0lryw0000gn/T//RtmpwgeYoo/downloaded_packages

library(reshape2)

## Warning: package 'reshape2' was built under R version 3.1.2

CONVERTING DATA FROM LONG FORMAT TO WIDE FORMAT

Sometimes you may need to convert a file back from wide format into long format. While R analyzes repeated measures and nested data (e.g., students nested within schools) in long format, SPSS uses wide format to analyze repeated measures and nested data.

We want data with one row per school, with a column for each students’ math achievement, SES, gender, and minority status. In order to do this, first we need to create a student level id variable that numbers students within each school, from the first student to the ith student in that school.

### Order the data by school id
### Creates a vector of school id values in ascending order

by.schoolid <- order(hsb$id)
hsb <- hsb[by.schoolid,]

### For Loop that Counts the number of students in a school and assigns
### a corresponding id number

## list of all unique school ids
school.list <- unique(hsb$id)

## empty vector to store unique participant ids in
plist <- vector()

## 1st for loop iterates through school list 
## 2nd for loop iterates through rows within each school and 
## counts from 1 to the number of rows in each school and then appends
## these numbers to plist before moving to the next school 
for(s in school.list) {
  for(i in 1:nrow(hsb[hsb$id==s,])){
    plist <- append(plist,i) 
  }
}

### creates a new variable in hsb that is the vector of student ids
## generated in the for loop above
hsb$participant <- plist

In the next step we use the function to change the data from long format to wide format. We want one row for each school, we keep the school level variables in the same format, but create a new column for each students’ math achievement score, gender, minority status, and ses.

We use the dcast function Description: reshapes existing object into data frame Usage: dcast(data, formula, fun.aggregate = NULL, …, margins = NULL, subset = NULL, fill = NULL, drop = T, value.var = guess_value(data)) Arguments: data: your data frame formula: casting formula; The cast formula has the following format: variable you don’t want to split by + other variable you don’t want to split by (in our example all of the school level vars) ~ variable you want to split by (in our example, student) fun.aggregate: aggregation function needed if variables do not identify a single observation for each output cell. Defaults to length
(with a message) if needed but not specified margins: vector of variable names (can include “grand_col” and “grand_row) to compute margins for, or TRUE to compute all margins. Any variables that can not be margined over will be silently dropped subset: quoted expression used to subset data prior to reshaping, e.g. subset = .(variable==”length“). fill: value with which to fill in structural missings, defaults to value from applying fun.aggregate to 0 length vector drop: should missing combinations dropped or kept? Default is TRUE. value.var: value to be placed under the new split variable columns

### Data frame with school level variables in each column, plus a column 
### for each students' math achievement score

hsb.wide.math <- dcast(hsb, id + size + sector + pracad + disclim + 
                         himinty +
                    meanses ~ participant, value.var='mathach')

### Creates a character vector of student# + .mathach
v <- c(1:67)
math <- paste(v,'.mathach',sep='')

### Replaces column names in hsb.wide.math

names(hsb.wide.math) <- c('id','size', 'sector', 'pracad', 
                          'disclim', 'himinty', 'meanses', math)

### Repeats reshape above but this time there is only a column for school id
### and a column for each students' minority status, will merge this and 
### other data files to create the final wide format data frame

hsb.wide.minority <- dcast(hsb, id ~ participant, value.var='minority')
minority <- paste(v,'.minority',sep='')
names(hsb.wide.minority) <- c('id',minority)

### Data frame with school id and each students' gender in a column
### to be merged with other data frames

hsb.wide.female <- dcast(hsb, id ~ participant, value.var='female')
female <- paste(v,'.female',sep='')
names(hsb.wide.female) <- c('id',female)

### Data frame with school id and each students' ses in a column
### to be merged with other data frames

hsb.wide.ses <- dcast(hsb, id ~ participant, value.var='ses')
ses <- paste(v,'.ses',sep='')
names(hsb.wide.ses) <- c('id',ses)

### Merging data files one by one to create final wide format 
### data frame

hsb.wide1 <- merge(hsb.wide.math, hsb.wide.minority, by=c('id'))

hsb.wide2 <- merge(hsb.wide1, hsb.wide.female, by=c('id'))

hsb.wide.final <- merge(hsb.wide2, hsb.wide.ses, by=c('id'))

### Writes  data frame to csv file in your working directory
### with NAs = '.' 

write.table(hsb.wide.final, 'hsb.wide.csv', sep = ',', row.names = F,
            na = '.')

CHANGING DATA FROM WIDE FORMAT TO LONG FORMAT

The next step is to change our data from wide format back to the original long format. To do this we use the melt function from the reshape2 package The melt function has several uses, below are details for it’s use with data frames. Usage: melt(data, id.vars, measure.vars, variable.name = “variable”, …, na.rm = F, value.name = “value”, factorsAsStrings = T) Arguments: data: the data frame you wish to alter (hsb.wide.final in our example). id.vars: the variables you wish to stay the same (in our example these are the school level variables). Can be input as a numeric vector indexing the columns or as a character vector of column names. measure.vars: the variables that you wish to collapse into one to long
format column. Can be a numeric vector indexing columns or a character vector of column names. variable.name: column name for measured variables (in our example this is student.id) Default label is measure. …: further arguments passed to or from other methods na.rm: Should NA values be removed from the data set? This will convert explicit missings to implicit missings. Default is FALSE. value.name: the name of the variable of values (in our example this is mathach, minority, female, and ses). Default label is value. factorsAsStrings: Control whether factors are converted to character when melted as measure variables. When FALSE, coercion is
forced if levels are not identical across the measure.vars

This example is pretty unusual in that we want to have a row for each student with a column for their math achievement score, their gender, minority status, and ses, plus their school’s variables. Most examples online only show one variable being transformed. Here we transform each of the four student level variables separately. I tried to write a function to automate this process since it is repetative, but I couldn’t get one to work. You can get the desired result by repeating use of the melt function for each new variable you want to create and then merge each of the resulting data frames together to get the data frame in the desired format.

### Get all column names and their index number

names(hsb.wide.final)

##   [1] "id"          "size"        "sector"      "pracad"      "disclim"    
##   [6] "himinty"     "meanses"     "1.mathach"   "2.mathach"   "3.mathach"  
##  [11] "4.mathach"   "5.mathach"   "6.mathach"   "7.mathach"   "8.mathach"  
##  [16] "9.mathach"   "10.mathach"  "11.mathach"  "12.mathach"  "13.mathach" 
##  [21] "14.mathach"  "15.mathach"  "16.mathach"  "17.mathach"  "18.mathach" 
##  [26] "19.mathach"  "20.mathach"  "21.mathach"  "22.mathach"  "23.mathach" 
##  [31] "24.mathach"  "25.mathach"  "26.mathach"  "27.mathach"  "28.mathach" 
##  [36] "29.mathach"  "30.mathach"  "31.mathach"  "32.mathach"  "33.mathach" 
##  [41] "34.mathach"  "35.mathach"  "36.mathach"  "37.mathach"  "38.mathach" 
##  [46] "39.mathach"  "40.mathach"  "41.mathach"  "42.mathach"  "43.mathach" 
##  [51] "44.mathach"  "45.mathach"  "46.mathach"  "47.mathach"  "48.mathach" 
##  [56] "49.mathach"  "50.mathach"  "51.mathach"  "52.mathach"  "53.mathach" 
##  [61] "54.mathach"  "55.mathach"  "56.mathach"  "57.mathach"  "58.mathach" 
##  [66] "59.mathach"  "60.mathach"  "61.mathach"  "62.mathach"  "63.mathach" 
##  [71] "64.mathach"  "65.mathach"  "66.mathach"  "67.mathach"  "1.minority" 
##  [76] "2.minority"  "3.minority"  "4.minority"  "5.minority"  "6.minority" 
##  [81] "7.minority"  "8.minority"  "9.minority"  "10.minority" "11.minority"
##  [86] "12.minority" "13.minority" "14.minority" "15.minority" "16.minority"
##  [91] "17.minority" "18.minority" "19.minority" "20.minority" "21.minority"
##  [96] "22.minority" "23.minority" "24.minority" "25.minority" "26.minority"
## [101] "27.minority" "28.minority" "29.minority" "30.minority" "31.minority"
## [106] "32.minority" "33.minority" "34.minority" "35.minority" "36.minority"
## [111] "37.minority" "38.minority" "39.minority" "40.minority" "41.minority"
## [116] "42.minority" "43.minority" "44.minority" "45.minority" "46.minority"
## [121] "47.minority" "48.minority" "49.minority" "50.minority" "51.minority"
## [126] "52.minority" "53.minority" "54.minority" "55.minority" "56.minority"
## [131] "57.minority" "58.minority" "59.minority" "60.minority" "61.minority"
## [136] "62.minority" "63.minority" "64.minority" "65.minority" "66.minority"
## [141] "67.minority" "1.female"    "2.female"    "3.female"    "4.female"   
## [146] "5.female"    "6.female"    "7.female"    "8.female"    "9.female"   
## [151] "10.female"   "11.female"   "12.female"   "13.female"   "14.female"  
## [156] "15.female"   "16.female"   "17.female"   "18.female"   "19.female"  
## [161] "20.female"   "21.female"   "22.female"   "23.female"   "24.female"  
## [166] "25.female"   "26.female"   "27.female"   "28.female"   "29.female"  
## [171] "30.female"   "31.female"   "32.female"   "33.female"   "34.female"  
## [176] "35.female"   "36.female"   "37.female"   "38.female"   "39.female"  
## [181] "40.female"   "41.female"   "42.female"   "43.female"   "44.female"  
## [186] "45.female"   "46.female"   "47.female"   "48.female"   "49.female"  
## [191] "50.female"   "51.female"   "52.female"   "53.female"   "54.female"  
## [196] "55.female"   "56.female"   "57.female"   "58.female"   "59.female"  
## [201] "60.female"   "61.female"   "62.female"   "63.female"   "64.female"  
## [206] "65.female"   "66.female"   "67.female"   "1.ses"       "2.ses"      
## [211] "3.ses"       "4.ses"       "5.ses"       "6.ses"       "7.ses"      
## [216] "8.ses"       "9.ses"       "10.ses"      "11.ses"      "12.ses"     
## [221] "13.ses"      "14.ses"      "15.ses"      "16.ses"      "17.ses"     
## [226] "18.ses"      "19.ses"      "20.ses"      "21.ses"      "22.ses"     
## [231] "23.ses"      "24.ses"      "25.ses"      "26.ses"      "27.ses"     
## [236] "28.ses"      "29.ses"      "30.ses"      "31.ses"      "32.ses"     
## [241] "33.ses"      "34.ses"      "35.ses"      "36.ses"      "37.ses"     
## [246] "38.ses"      "39.ses"      "40.ses"      "41.ses"      "42.ses"     
## [251] "43.ses"      "44.ses"      "45.ses"      "46.ses"      "47.ses"     
## [256] "48.ses"      "49.ses"      "50.ses"      "51.ses"      "52.ses"     
## [261] "53.ses"      "54.ses"      "55.ses"      "56.ses"      "57.ses"     
## [266] "58.ses"      "59.ses"      "60.ses"      "61.ses"      "62.ses"     
## [271] "63.ses"      "64.ses"      "65.ses"      "66.ses"      "67.ses"

### First we subset the data so it only includes the school level variables
## and the math achievement scores

hsb.sub1 <- hsb.wide.final[,c(1:74)] 
hsb.long.math <- melt(hsb.sub1, id.vars=c(1:7), 
                      measure.vars=c(8:74),
                      variable.name='student.id',
                      value.name='mathach',
                      na.rm=T)

## Sort by school id and student.id

sorted <- order(hsb.long.math$id, hsb.long.math$student.id)
hsb.long.math <- hsb.long.math[sorted,]

### Create a subset that only includes schoolid and minority status

hsb.sub2 <- hsb.wide.final[,c(1,75:141)]

### Change minority data frame to long format. Column for school id, 
### student id, and minority status

hsb.long.min <- melt(hsb.sub2, id.vars=c(1), 
                      measure.vars=c(2:68),
                      variable.name='student.id',
                      value.name='minority',
                      na.rm=T)

## Sort by school id and student.id

sorted2 <- order(hsb.long.min$id, hsb.long.min$student.id)
hsb.long.min <- hsb.long.min[sorted2,]

### Create a subset that only includes schoolid and gender

hsb.sub3 <- hsb.wide.final[,c(1,142:208)]

### Change female data frame to long format. Column for school id, 
### student id, and gender

hsb.long.fem <- melt(hsb.sub3, id.vars=c(1), 
                      measure.vars=c(2:68),
                      variable.name='student.id',
                      value.name='female',
                      na.rm=T)

## Sort by school id and student.id

sorted3 <- order(hsb.long.fem$id, hsb.long.fem$student.id)
hsb.long.fem <- hsb.long.fem[sorted3,]

### Create a subset that only includes schoolid and student ses

hsb.sub4 <- hsb.wide.final[,c(1,209:275)]

### Change ses data frame to long format. Column for school id, 
### student id, and ses

hsb.long.ses <- melt(hsb.sub4, id.vars=c(1), 
                      measure.vars=c(2:68),
                      variable.name='student.id',
                      value.name='ses',
                      na.rm=T)

## Sort by school id and student.id

sorted4 <- order(hsb.long.ses$id, hsb.long.ses$student.id)
hsb.long.ses <- hsb.long.ses[sorted4,]

### Same basic for loop as use previously to provide each student within
### each school with a unique identifier for them in that school. 
  
ilist <- vector()
  for(g in unique(hsb.long.math$id)) {
  for(i in 1:nrow(hsb.long.math[hsb.long.math$id==g,])){
    ilist <- append(ilist,i) 
  }
}

### Replace student id variable with unique student idenifier, which is 
### currently old variable name e.g., mathach.1, mathach.2, etc. 
### We can use the same list for each data frame because they have all been
### sorted in the same order. 

hsb.long.math$student.id <- ilist
hsb.long.min$student.id <- ilist
hsb.long.fem$student.id <- ilist
hsb.long.ses$student.id <- ilist

### Merging data frames one at a time, by school id and then student id. 
### Important to list school id first because students are nested within
### schools and so their identiiers are only unique within their school. 

hsb.long1 <- merge(hsb.long.math, hsb.long.min, by=c('id','student.id'))

hsb.long2 <- merge(hsb.long1, hsb.long.fem, by=c('id', 'student.id'))

hsb.long.final <- merge(hsb.long2, hsb.long.ses, by=c('id', 'student.id'))

### You may want to share the dataframe with others, for the final step
### we write the long format file as a csv

write.table(hsb.long.final,'hsb.long.csv', sep=',', row.names=F)

                      Additional References

Changing Data from Wide to Long Format and Back http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/

HLM Models and Analyses

by Steff Guillermo

INTRODUCTION

In this section, we will go over how to write the code for HLM models and run them in R. This section will focus on the proper syntax for estimating fixed and random effects and how to interpret the output in R. This section will build fairly simple and slightly more complex models, as the primary focus of this section is on proper model building and interpretation rather than an exhaustive review of highly complex HLM models. This section will also include a short review of different library options for HLM analyses.

LME4

The majority of this section will focus on HLM models using the lme4 library. The lme4 library will allow you to analyze data using restricted maximum likelihood estimation (REML) rather than ordinary least squares (OLS). This makes the lme4 library ideal for HLM analyses. Thus, in order to run HLM models, it is necessary to download the lme4 library. To install it, run the following code:

library(lme4)

## Warning: package 'lme4' was built under R version 3.1.2

## Loading required package: Matrix
## Loading required package: Rcpp

There is one major drawback to the lme4 library. When running HLM models in this package, the analyses do not give p values in the output. However, this can be remedied by installing the lmerTest library, which will give p values for your analyses. Thus, it is necessary to install the lmerTest library. To install, run the following code:

install.packages('lmerTest',repos="http://cran.rstudio.com/")

## 
## The downloaded binary packages are in
##  /var/folders/nx/v3ny_c315513b4tqg0m0lryw0000gn/T//RtmpwgeYoo/downloaded_packages

library(lmerTest)

## Warning: package 'lmerTest' was built under R version 3.1.3

## 
## Attaching package: 'lmerTest'
## 
## The following object is masked from 'package:lme4':
## 
##     lmer
## 
## The following object is masked from 'package:stats':
## 
##     step

LOGISTICAL ISSUES

This section of the code sets the working directory.

setwd("~/Desktop/")    
getwd()

## [1] "/Users/Sean/Desktop"

This section reads in the level 1 and level 2 data files. For a comprehensive review on data management, please see the first section of this manual.

library(foreign)
for (f in Sys.glob('*.dta')) 
  write.csv(read.dta(f), file = gsub('dta$', 'csv', f))

HSBdata1 <- read.table("hsb1.csv",header=T,sep=',',)
HSBdata2 <- read.table("hsb2.csv",header=T,sep=',',)

Merge the Level 1 and Level 2 Files. Again, for a comprehensive review on data management, please see the first section of this manual.

HSBfinal <- merge(HSBdata1, HSBdata2, by = c('id'))


HSBfinal$pnum <- HSBfinal$X.x

OBTAINING GROUP MEANS

Group mean centering variables is an important step in HLM. By group mean centering predictors, you receive an unadjusted estimate of your outcome variable. Group mean centering a predictor will give you an estimate of the outcome variable at the average level of that predictor for each school. For instance, group-mean centering SES will give you an estimate of the expected outcome for a student with the school’s average SES level.

The dataset we will be working with is the High School and Beyond Dataset (National Center for Educational Statistics).

Here is the code for group mean centering SES:

groupmeanSES <- aggregate(HSBfinal$ses, list(HSBfinal$id), FUN = mean, data=HSBfinal)
names(groupmeanSES)<- c('id','groupmeanSES')

The aggregate function allows you to aggregate the predictor of interest (in the above case, SES) by group. Since the subsequent analyses focus on school as the unit of analysis, we will aggregate across ‘id’, which represents the school ID in the dataset. When group mean centering variables in your data, you should aggregate across whatever group or variable you want to be the unit of analysis. The names function gives the new group mean centered variable the new name “groupmeanSES” in the data file, again aggregating by school id.

We will repeat the above process for various other predictors, such as gender and size of the school.

groupmeanfemale <- aggregate(HSBfinal$female, list(HSBfinal$id), FUN = mean, data=HSBfinal)
names(groupmeanfemale)<- c('id','groupmeanfemale')

groupmeanminority <- aggregate(HSBfinal$minority, list(HSBfinal$id), FUN = mean, data=HSBfinal)
names(groupmeanminority)<- c('id','groupmeanminority')

groupmeansize <- aggregate(HSBfinal$size, list(HSBfinal$id), FUN = mean, data=HSBfinal)
names(groupmeansize)<- c('id','groupmeansize')

groupmeansector <- aggregate(HSBfinal$sector, list(HSBfinal$id), FUN = mean, data=HSBfinal)
names(groupmeansector)<- c('id','groupmeansector')

Merge Grouped Mean Data with the overall dataset in order to use group mean centered variables in subsequent analyses.

HSBfinal2 <- merge(HSBfinal, groupmeanSES, by = c('id'))

NOTE ABOUT THE GROUPING VARIABLE

In this code, we are merging data again by the unit of analysis, school id. This can be seen in the section of the code, “by = c(‘id’)”. It is important to note that thus far, school id has been used in the aggregate function above and in the merge function below. This is because school id is the unit of analysis. Thus, in other nested structure datasets, ‘id’ in the above and below code should be replaced with whatever your grouping variable is (e.g., city ID if your data of interest are nested in city) in order to keep the unit of analysis consistent in group mean centering and data merging steps.

HSBfinal3 <- merge(HSBfinal2, groupmeanminority, by = c('id'))

HSBfinal4 <- merge(HSBfinal3, groupmeansize, by = c('id'))

HSBfinal5 <- merge(HSBfinal4, groupmeanfemale, by = c('id'))

HSBfinal6 <- merge(HSBfinal5, groupmeansector, by = c('id'))

MODEL SYNTAX/CODE

This following section will involve how to write R code for HLM analyses and how to interpret the output.

A SIMPLE MODEL

First, we will begin with an unconditional model, which will provide a general basis for more sophisticated models to follow. This unconditional model asks the question, “Is there variability in math achievement across schools?”

simplemodel <- lmer (mathach ~ 1 + 1|id , data=HSBfinal6)

In this model, the code for the model reads, “mathach ~ 1 + 1|id.” The “1|id” section of the model is specified in order to only estimate a random intercept. This intercept represents the average math achievement score across schools. The variance component, 8.614, tells us how much variance there is between schools in math achievement.

The residual section reads 39.148, and represents the variability that is left unexplained. To partition the variance, we take the variance component of the intercept and divide it by the sum of the residual variance and the intercept variance. This leaves us with: 8.614 / (8.614 + 39.148) = 0.18. This means that 18 percent of the variance in math achievement is due to school-level factors. The residual row under the Random Effects section tells us how much variability in math scores is left unexplained. We can use the group mean centered predictors (outlined above) to assess which ones influence math achievement and help explain that left over variability.

MORE SOPHISTICATED MODELS

Now, let’s build more complicated models. The following model examines whether a school’s average SES level influences math achievement.

ses.model <- lmer (mathach ~ 1 + groupmeanSES +(1|id), data=HSBfinal6)

Notice that the variance associated with the intercept is now 2.639. In the unconditional model, this value was 8.614, which means that our new model including the school’s mean SES accounts for some of the variance in math achievement scores.

Now, let’s examine another school-level factor: the proportion of minority students at a school. We will run this model and then compare this to our initial, unconditional model. Essentially, we are examining whether or not there is a minority gap in math achievement.

Note: The minority variable is coded as 0 = White, 1 = Black or Hispanic.

simplemodel.minority <- lmer (mathach ~ groupmeanminority + (1|id) , data=HSBfinal6)

The above output tells us that there is a significant minority gap across all schools, with minority students scoring, on average, 5.06 points lower than White students. Note that the variance component associated with the intercept is now 6.304, compared to 8.614 in our unconditional model. While this informs us that including minority in the model accounts for some of the variance, the variance component of the intercept in this model is still greater than the variance component in our SES model.

Notice that, in the model above, we receive an estimate for the slope of the minority variable in the Fixed Effects section, but we do not receive any information about the variance of this variable in the Random Effects section. What if we want to know whether the minority gap varies between schools? That is, does the slope for minority predicting math achievement vary from school to school? For that, we want to allow the slopes to vary across the different schools. In our model, we would want to estimate a random intercept and slope for the minority predictor, which will give us an estimate of the variance component associated with the minority predictor (the minority slope). For that test, we will want to run the following model:

minority.model.2 <- lmer (mathach ~ groupmeanminority + (groupmeanminority|id) , data=HSBfinal6)

The difference in the syntax between the two models lies in the last part of the code, where we specify either “(1|id)” or “(groupmeanminority|id).” This allows the slope for the groupmeanminority variable to vary across schools. Later on in this section, we will go through a comprehensive review of the differences between these two statements and their influence on the estimation of fixed versus random effects. After running the model above, we can see that there is now an estimate of the variance associated with the groupmeanminority slope. That variance component is 0.884, and conceptually, it tells us how much the effect of the minority gap on math achievement is due to schools. Again, we will review the fixed versus random estimations later on in this section. For now, let’s look at more complex models.

MULTIPLE PREDICTORS

Given the significant importance of the minority gap and the influence of SES thus far, let’s examine these effects simultaneously. In this model, we are predicting math achievement from the minority and SES variables, estimating a random intercept, and fixing the slope for both of the predictor variables (“1|id”).

ses.minority.model <- lmer (mathach ~ groupmeanminority + groupmeanSES + (1|id), data=HSBfinal6)

The output from the model above tells us that there are still significant effects of SES and a minority gap on math achievement. In this model, minority students score, on average, 1.54 points lower than their White counterparts and a one unit difference in school SES is associated with a 5.32 point change in expected math achievement (with higher SES schools having higher expected math achievement). The variance component associated with expected math achievement (the intercept) is 2.483. Adding both SES and minority variables reduced the variance component by 6.131, compared to our unconditional model.

AN INTERACTION MODEL

We will now run a model that looks at SES, the minority gap, and their interactive effects (again, estimating a random intercept, and fixing the slopes). This model will focus particularly on the proper syntax for including interactions in HLM models.

ses.minority.model.int <- lmer (mathach ~ groupmeanminority + groupmeanSES + groupmeanminority*groupmeanSES + (1|id), data=HSBfinal6)

By adding “groupmeanminoritygroupmeanSES" to the model, you have now included a term for estimating the interactive effects. Thus, in writing your model, the proper format to include interaction terms should be: “name of first predictor” ”name of second predictor." We can see that the interaction between groupmeanSES and groupmeanminority is not significant. Note that by including “(1|id)” at the end, we are telling the model to only estimate random effects for the intercept. The next section will focus on specifying random effects for the slopes of model predictors.

NOTE ON SYNTAX AND MODEL BUILDING: FIXED AND RANDOM EFFECTS

Now, the syntax for the models can be a bit tricky with fixed and random effects. Let’s go over three different models, the syntax, and exactly what these models are estimating. We will use the school-level predictor of SES.

In most of our models thus far, we have specified “(1|id)” at the end. This section of our code is the section where we specify the random effects in our model. The “(1|id)” statement indicates that we want a random estimate of the intercept “(1” analyzed at the level of the group “|id)”.

Let’s look at an example and examine a model that just estimates a random intercept. In this model, the code that reads, “~ 1 + (1|id)” indicates that we are asking the model to just estimate a random intercept. There are no predictors included in this model.

ses.model.1 <- lmer(mathach ~ 1 + (1|id), data=HSBfinal6)

As expected, we are given an intercept in the output under “Random Effects.”" But, what if we want to estimate an intercept and a best fit regression slope for our predictor of interest, SES? For that, we would run the following model:

ses.model.2 <- lmer(mathach ~ groupmeanSES + (1|id), data=HSBfinal6)

Notice how now, instead of a 1 following the tilda, we have the variable, “groupmeanSES.” This tells the model to estimate a slope for the effect of groupmeanSES on our outcome variable, math achievement. However, also note that the “(1|id)” does not allow the slope for SES to vary across schools–that is, we will estimate a random intercept but a fixed slope (best fit slope) for the effect of SES on math achievement across all schools.

Now, let’s see what the output looks like.

ses.model.2 <- lmer(mathach ~ groupmeanSES + (1|id), data=HSBfinal6)

As expected, we get an estimate of the intercept and a single estimate of the slope of SES, which is 5.86. Again, we also see the variance component associated with the intercept in the “Random Effects” section. But, what if we wanted the slopes for SES to vary across schools? That is, we want both the intercept and the slopes for SES predicting math achievement to vary (to be estimated as random effects). For that, we will run the following model:

ses.model.3 <- lmer(mathach ~  groupmeanSES + (groupmeanSES|id), data=HSBfinal6)

Notice how we still have “mathach ~ groupmeanSES” in our code. However, instead of “(1|id)”, we now have “(groupmeanSES|id)”. Remember, this section of the code is where the specify our random effects. Thus, we have specified both “groupmeanSES” and “id” to be random effects in our model. This will enable the slopes for SES predicting math achievement to vary from school to school. Let’s run the model.

ses.model.3 <- lmer(mathach ~  groupmeanSES + (groupmeanSES|id), data=HSBfinal6)

Notice now that we get a variance component in the random effects section for groupmeanSES. This tells us how much of the effect of SES on math achievement (the slope of SES predicting math achievement) varies due to school.

To review, here is a brief overview of the three different models, their descriptions, and important differences in the output:

ses.model.1 <- lmer(mathach ~ 1 + (1|id), data=HSBfinal6) : In this model, we are just estimating a random intercept. Here is the critical piece of the output to reflect that: summary(ses.model.1)

ses.model.2 <- lmer(mathach ~ groupmeanSES + (1|id), data=HSBfinal6) : In this model, we are estimating a random intercept and also a fixed slope for the effect of SES on math achievement. In this model, we are not allowing the slope to vary across schools, but rather, fixing a best fit slope for all schools. This is reflect in the “groupmeanSES” variable’s estimate under “Fixed Effects” and not “Random Effects”: summary(ses.model.2)

ses.model.3 <- lmer(mathach ~ groupmeanSES + (groupmeanSES|id), data=HSBfinal6) : In this model, we are still estimating the effect of SES on math achievement, but now, we are allowing the slope to vary between schools. We are also still estimating a random intercept. Notice now that, in contrast to the model above, the “groupmeanSES” variable is both a fixed and a random effect. In the “Random Effects” section, we now get an estimate of the variance component associated with the “groupmeanSES” variable, which tells us about the variability in the effect of SES on math achievement due to school. In the “Random Effects” section we also see a “Corr” section, which tells us whether the random effects of math achievement due to SES are related to the random effects of the slopes of the SES predictor: summary(ses.model.3)

Just note that, in all of the above models, it is necessary to specify your random effects. You can certainly specify more than one random effect, in which case you would simply include another term for your second random effect. Here is a template:

random.effect.model <- lmer(“outcome” ~ “predictor 1” + “predictor 2” + (“random effect 1” | “grouping unit”) + (“random effect 2” | “grouping unit”), data=yourdata)

OTHER PACKAGES

The present manual focused on the lme4 package and its applications to HLM. However, this is not to say that lme4 is the only package capable of conducting HLM analyses. The nlme library is also capable of running the models covered in this section. To install this library, run the following code:

library(nlme)

## 
## Attaching package: 'nlme'
## 
## The following object is masked from 'package:lme4':
## 
##     lmList

We will run our SES model from above using the nlme package. Recall that, above, our SES model looked like this:

ses.model <- lmer (mathach ~ 1 + groupmeanSES +(1|id), data=HSBfinal6)

The above model will undergo a few syntax changes to run under the nlme package. Here is the model:

attach(HSBfinal6)

## The following objects are masked _by_ .GlobalEnv:
## 
##     female, groupmeanfemale, groupmeanminority, groupmeansector,
##     groupmeanSES, groupmeansize, minority, ses

ses.model.new <- lme(mathach ~ 1 + groupmeanSES, data=HSBfinal6, random=~1|id)

Notice that the first part of the model, “mathach ~ 1 + groupmeanSES” remains the same. Also, the specifying the dataset, “data=HSBfinal6” is also the same. However, in the nlme model, you’ll notice that specifying random effects is a bit different. In nlme, there is a necessary “random” argument, where you need to write “random=”. Conceptually similar to lme4, this argument requires the user to specify the random effects in the model. This argument begins with “~” and specification of random effects follows. The “1|id” indicates that we are estimating an intercept and treating “id” as a random factor (again, remember that this is similar to the syntax of lme4).

The output will also look slightly different than that of lme4. The estimates (intercept, slope for the SES variable) remain the same, but nlme gives you AIC (Akaike Information Criteria), BIC (Bayesian Informaiton Criteria), and logLik (log likelihood) estimates, which are used for model fit. Also, a separate package is not required to calculate p values, as the nlme models will automatically compute these and display them in the output. This section is not intended to give a comprehensive review of the nlme library, but rather, to let the reader know that this library is also available for running HLM analyses. The above model gives the reader a general idea of the syntax for running lme models. When running this model, the output gives the same estimates as the lme4 models, however, there are slight differences in the content of the output. Most notably, the lme model automatically gives p values (without having to load other packages or libraries) and gives AIC, BIC, and logLik estimates. Another noticeable difference are the degrees of freedom. Lme4 will compute degrees of freedom using a Satterthwaite estimate. The estimation of degrees of freedom in nlme is a bit trickier. For a comprehensive review on the calculation of degrees of freedom in nlme, please refer to Pinheiro and Bates (2000), page 91.

CONCLUSION

This section gave an overview of the various different types of HLM models we can run using the lme4 library, the syntax for fixed and random effects, and how to interpret the output. The next section of the manual will cover various graphing techniques and creating different graphs for your HLM data.

Visualizing Multilevel Data

by Sean Hudson

This is a how-to manual for how to visualize multilevel data. The emphasis will be on graphing fixed and random effects from the HSB data set, a widely used dataset for multilevel didactic purposes. I will cover base R plot methods of visualization, as well as higher level methods through the “lattice” package.

Get appropriate packages:

install.packages('knitr',repos="http://cran.rstudio.com/")

## 
## The downloaded binary packages are in
##  /var/folders/nx/v3ny_c315513b4tqg0m0lryw0000gn/T//RtmpwgeYoo/downloaded_packages

library('knitr')
install.packages('lattice',repos="http://cran.rstudio.com/")

## 
## The downloaded binary packages are in
##  /var/folders/nx/v3ny_c315513b4tqg0m0lryw0000gn/T//RtmpwgeYoo/downloaded_packages

library('lattice')
install.packages('ggplot2',repos="http://cran.rstudio.com/")

## 
## The downloaded binary packages are in
##  /var/folders/nx/v3ny_c315513b4tqg0m0lryw0000gn/T//RtmpwgeYoo/downloaded_packages

library('ggplot2')
install.packages('lme4',repos="http://cran.rstudio.com/")

## 
## The downloaded binary packages are in
##  /var/folders/nx/v3ny_c315513b4tqg0m0lryw0000gn/T//RtmpwgeYoo/downloaded_packages

library('lme4')
install.packages('sjPlot',repos="http://cran.rstudio.com/")

## 
## The downloaded binary packages are in
##  /var/folders/nx/v3ny_c315513b4tqg0m0lryw0000gn/T//RtmpwgeYoo/downloaded_packages

library('sjPlot')

Variance Breakdown

Pie Chart

Let’s visualize the model in which we regress Math Achievement on Group Mean centered SES and the intercept, allowing both to vary randomly by school. Let’s look at how to do this using a “lower” level method, just looking at a breakdown of the error using a pie chart:

In this first chunk of code, I’m simply obtaining the variance components for the various levels of the model (i.e., the slope and intercept variation at level 2 and the remaining level 1 variance in math achievement) and adding them together to give us the total residual, which I will use to breakdown the variance in the pie chart.

##print out the variance breakdown
model <- lmer (mathach ~ groupmeanSES +(ses|id), data=HSBfinal2)
summary(model)

## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [merModLmerTest]
## Formula: mathach ~ groupmeanSES + (ses | id)
##    Data: HSBfinal2
## 
## REML criterion at convergence: 46742.3
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.04932 -0.72567  0.01349  0.75209  2.85108 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept)  2.935   1.713         
##           ses          4.915   2.217    -0.09
##  Residual             36.787   6.065         
## Number of obs: 7185, groups:  id, 160
## 
## Fixed effects:
##              Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)   12.7874     0.1584 151.8600   80.70   <2e-16 ***
## groupmeanSES   4.6762     0.4027 179.2100   11.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## groupmenSES -0.018

totalVar <- 2.94+4.92+36.79

Now it’s time for the pie chart, which is pretty straightforward.

pie(c(2.94/totalVar,4.92/totalVar,36.79/totalVar), labels = c('6.6%','11%', '82.4%' ),
    main = 'Breakdown of Variance', col = c('red','pink','lightgreen'))
legend(-1.75,1,legend =c('Level 2: Intercept','Level 2: Slope','Level 1'),
       col = c('red','pink','lightgreen'),
       pch = 22, pt.bg = c('red','pink','lightgreen'),
       cex = .75)

Visualizing Random and Fixed Effects

Perhaps the most useful way to visualize this multilevel model is to plot the fixed effect as well as the variation around the fixed effect for every school. In the chunk below, I’m grabbing the fixed effect intercept and slope and putting that information in an object, “fix”. Then I’m grabbing the random slopes and intercepts for each school, and putting them in an object named, “rand”. Finally, I’m adding the random slopes and intercepts to the fixed slope and intercept (and assigning that to the object “paramsStudent”) to obtain a Math Achievement ~ SES line for every school.

fix <- fixef(model)
rand <- ranef(model)
paramsStudent <- cbind((rand$id[1]+fix[1]),(rand$id[2]+fix[2]))

Next, I’m setting up the plot parameters, without yet adding the random or fixed effects. This is pretty straightforward: specifying the data and formula, setting the y and x limits based on the minimum and maximum values of each, and adding x, y, and main plot labels.

plot(data = HSBfinal2, mathach ~ groupmeanSES,type = 'n', 
     ylim = c(min(HSBfinal2$mathach),max(HSBfinal2$mathach)),
     xlim = c(min(HSBfinal2$groupmeanSES),max(HSBfinal2$groupmeanSES)),
     cex.main = .75,
     xlab = 'SES (Group-mean Centered)', ylab = "Math AchievementSES ",
     main = "Variability in Slope and Intercepts- School")

Now it’s time to add the regression lines for every school. I do this using a for-loop. It will iterate through however many schools there are (in this case, 160), and for each school, draw a line of best fit (in light blue), adding each line to the same graph.

plot(data = HSBfinal2, mathach ~ groupmeanSES,type = 'n', 
     ylim = c(min(HSBfinal2$mathach),max(HSBfinal2$mathach)),
     xlim = c(min(HSBfinal2$groupmeanSES),max(HSBfinal2$groupmeanSES)),
     cex.main = .75,
     xlab = 'SES (Group-mean Centered)', ylab = "Math AchievementSES ",
     main = "Variability in Slope and Intercepts- School")

for(i in 1:length(unique(HSBfinal2$id))){
  abline(a = paramsStudent[i,1], b = paramsStudent[i,2],col = 'lightblue')
  par<- par(new=F)
}

Finally, we add a red line representing the fixed effect.

plot(data = HSBfinal2, mathach ~ groupmeanSES,type = 'n', 
     ylim = c(min(HSBfinal2$mathach),max(HSBfinal2$mathach)),
     xlim = c(min(HSBfinal2$groupmeanSES),max(HSBfinal2$groupmeanSES)),
     cex.main = .75,
     xlab = 'SES (Group-mean Centered)', ylab = "Math AchievementSES ",
     main = "Variability in Slope and Intercepts- School")

for(i in 1:length(unique(HSBfinal2$id))){
  abline(a = paramsStudent[i,1], b = paramsStudent[i,2],col = 'lightblue')
  par<- par(new=F)
}

abline(a = fix[1], b = fix[2], lwd= 2,col = 'red')

Scatterplot of Intercepts and Slopes

What is the relationship between the random intercepts and slopes? This plot is pretty simple. I’m simply creating a scatterplot of each school’s slope and intercept (from the paramsStudent object created above).

plot(paramsStudent, ylim = c(3,9),xlim = c(8,18),cex.main = .75,
     xlab = 'Intercepts', ylab = "Slopes ",
     main = "Correlation betwen Random Slopes and Intercepts = - .09" )

Multilevel Graphing using Lattice

So that was a basic run through of some important multilevel graphs and how to create them using the basic R plot function. Now it’s time to look at some of the graphs we can make using the lattice package.

Scatterplot Panel

This first plot is a simple multiple scatterplot showing the relationship between math achievement and ses for a subset of the groups.

The first chunk here is simply subsetting the data so that it’s easier to visualize. Here we are just looking at a random sample of 20 groups.

groups <- unique(HSBfinal2$id)[sample(1:160,20)]
subset<- HSBfinal2[HSBfinal2$id%in%groups,]

Now let’s graph it, using the xyplot function in the lattice package.

xyplot(mathach~ses |as.factor(id),subset ,
       col.line = 'black',
       main = 'Variability in Math Achievement~SES Relationship')

This is the same, except now I want to add a regression line to each subplot. I’ll do this with the “type” parameter. The “p” argument simply tells the function that we are requesting points and the “r” argument tells the function we want a straight regression line.

xyplot(mathach~ses |as.factor(id),subset ,
       col.line = 'black',
       type = c("p", "r"),
       main = 'Variability in Math Achievement~SES Relationship')

If we want smooth regression lines, we can substitute “smooth” for the “r” argument.

suppressWarnings(xyplot(mathach~ses |as.factor(id),subset ,
       col.line = 'black',
       type = c("p", "smooth"),
       main = 'Variability in Math Achievement~SES Relationship'))

Grouped Data on Same Plot

Another useful thing we can do is specify the grouping variable using the “group” argument. The difference between the two ways of specification is that this method will include all of the data on a single plot, differentiating them via color. Obviously this can look pretty messy as the size of the grouping factor increases. For visualization purposes, it would be better to either use the previous method or further reduce the number of groups used for the graph.

groups <- unique(HSBfinal2$id)[sample(1:160,5)]
subset<- HSBfinal2[HSBfinal2$id%in%groups,]

xyplot(mathach~ses,subset ,
       type = c("p", "smooth"),
       group = HSBfinal2$id,
       main = 'Variability in Math Achievement~SES Relationship')

QQ plots

Now let’s make a qq plot via the qqmath function in the lattice package. These plots show the point estimates of the random intercepts and slopes, along with their associated errors.

qqmath(ranef(model, condVar = TRUE), strip = FALSE)

## $id

Sources

Finally, I’d like to acknowledge the variety of sources from which I gathered pieces of information presented in this how-to manual:

http://www.stat.ubc.ca/~jenny/STAT545A/block09_xyplotLattice.html

http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf

http://faculty.smu.edu/kyler/training/AERA_overheads.pdf

Introduction to Multilevel Modeling in R

Sarah Grover, Steff Guillermo, and Sean Hudson

May 3, 2015