LMM Practice

This week we practiced analyzing and modeling Linear Mixed Models

The following problems will serve as examples on how to code and analyze the R output

#1

Consider the Penicillin data set in lme4. The data come from a book published by Davies and Goldsmith in 1972. The goal was to “assess the variability between samples of penicillin by the B. subtilis method. In this test method a bulk-innoculated nutrient agar medium is poured into a Petri dish of approximately 90 mm diameter, known as a plate. When the medium has set, six small hollow cylinders or pots (about 4 mm in diameter) are cemented onto the surface at equally spaced intervals. A few drops of the penicillin solutions to be compared are placed in the respective cylinders, and the whole plate is placed in an incubator for a given time. Penicillin diffuses from the pots into the agar, and this produces a clear circular zone of inhibition of growth of the organisms, which can be readily measured. The diameter of the zone is related in a known way to the concentration of penicillin in the solution.”

library(lme4)
library(lattice)
data("Penicillin")

Pen <- Penicillin
attach(Pen)

What is the response in this data set?

The diameter of the zone is the response.

Random effects are defined by which variables?

The random effects would be defined by both the plate variable and sample variable.

Are the random effects nested or crossed?

The random effects appear to be crossed in this case.

Create at least plot and explain what it shows.

plot(sample,diameter)

plot(plate, diameter)

The first plot shows that there seems to be some variance in the samples of Pencillin.

The second plot shows that there could be some variance in the plates as well.

Find the grand mean using a model with the appropriate random effects.

randmodel <- lmer((diameter ~ 1 + (1|plate) + (1|sample)), data = Pen)
summary(randmodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: diameter ~ 1 + (1 | plate) + (1 | sample)
##    Data: Pen
## 
## REML criterion at convergence: 330.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.07923 -0.67140  0.06292  0.58377  2.97959 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  plate    (Intercept) 0.7169   0.8467  
##  sample   (Intercept) 3.7311   1.9316  
##  Residual             0.3024   0.5499  
## Number of obs: 144, groups:  plate, 24; sample, 6
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  22.9722     0.8086   28.41

The grand mean is 22.9722

Use confint command to create a 90% confidence interval for the grand mean

confint(randmodel, level =.9)[4,]
## Computing profile confidence intervals ...
##      5 %     95 % 
## 21.60872 24.33572

Which source of variability is the largest?

As I expected, after observing the graphs, the largest source of variability comes from the sample.

#2

Consider the Pastes data set in lme4. This data set is also from Davies and Goldsmith. A chemical paste product is delivered in casks. Several casks come from one “batch” of paste. The researchers randomly selected a couple casks from each batch. From each of the selected casks they took a sample and tested the samples. Each sample was tested twice. In this data set, a from batch A is unrelated to a in batch B.

What is the response?

The strength of the sample of paste is the response

Random effects are defined by which variables?

The variables of batch, cask, and sample are all random effects in this scenario.

Are the random effects nested or crossed?

These random effects are nested because there is a natural progression to these random effects.

Create at least one plot

xyplot(strength ~ batch|cask)

It appears there is a lot of variance in the strength of the paste by cask.

Find the grand mean using a model that includes the appropriate random effects.

pastemodel <- lmer(strength ~ 1 + (1|batch/cask/sample), data = Pastes)
summary(pastemodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: strength ~ 1 + (1 | batch/cask/sample)
##    Data: Pastes
## 
## REML criterion at convergence: 247
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4798 -0.5156  0.0095  0.4720  1.3897 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  sample:(cask:batch) (Intercept) 0.05655  0.2378  
##  cask:batch          (Intercept) 8.37721  2.8943  
##  batch               (Intercept) 1.65730  1.2874  
##  Residual                        0.67799  0.8234  
## Number of obs: 60, groups:  sample:(cask:batch), 30; cask:batch, 30; batch, 10
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  60.0533     0.6769   88.72

60.0533 is the grand mean.

Test to see if the lowest-level variance component is significant

reducedpaste <- lmer(strength ~ 1 + (1|batch), data = Pastes)

teststat <- 2*as.numeric(logLik(pastemodel) - logLik(reducedpaste))

pchisq(teststat, df = 1, lower.tail = FALSE)
## [1] 1.473854e-13

Small p-value indicates the full LMM (with 3 random effects) is significantly better.

#3

Consider the eggs data set in faraway. As described in Julian Faraway’s 2006 book: “Consistency between laboratory tests is important and yet the results may depend on who did the test and where the test was performed. In an experiment to test levels of consistency, a large jar of dried egg powder was divided up into a number of samples. Because the powder was homogenized, the fat content of the samples is the same, but this fact is withheld from the laboratories. Four samples were sent to each of six laboratories. Two of the samples were labeled as G and two as H, although in fact they were identical. The laboratories were then instructed to divide their samples into two parts and measure the fat content of each. So each laboratory reported eight measures, [and] each technician four measures.”

Investigate the data. To reiterate, after splitting the samples in half, each lab has 4 vials labeled “G” and 4 vials labeled “H.” Similar to the last problem, sample G for one technician is different from sample G for another technician. (Hint: sample is nested in technician) Also, technicians one and two are different for each lab

What is the response?

Fat is the response variable here.

Random effects defined by which variables?

Lab, Technician, and Sample are all random effects.

Random effects crossed or nested?

Due to the natural progression I believe these are nested random effects.

Find the grand mean using an appropriate model.

fatmodel <- lmer(Fat ~ 1 + (1|Lab/Technician/Sample),data = eggs)
summary(fatmodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Fat ~ 1 + (1 | Lab/Technician/Sample)
##    Data: eggs
## 
## REML criterion at convergence: -64.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.04098 -0.46576  0.00927  0.59713  1.54276 
## 
## Random effects:
##  Groups                  Name        Variance Std.Dev.
##  Sample:(Technician:Lab) (Intercept) 0.003065 0.05536 
##  Technician:Lab          (Intercept) 0.006980 0.08355 
##  Lab                     (Intercept) 0.005920 0.07694 
##  Residual                            0.007196 0.08483 
## Number of obs: 48, groups:  
## Sample:(Technician:Lab), 24; Technician:Lab, 12; Lab, 6
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  0.38750    0.04296   9.019

.38750 is the grand mean

What is the variance between technicians and between labs?

Technichians: .00698

Lab: .00592

What does removing the sample variance component do to the fixed effect?

refitmodel <- lmer(Fat ~ 1 + (1|Lab/Technician),data = eggs)
summary(refitmodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Fat ~ 1 + (1 | Lab/Technician)
##    Data: eggs
## 
## REML criterion at convergence: -62.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.17799 -0.42424  0.08043  0.67361  1.77544 
## 
## Random effects:
##  Groups         Name        Variance Std.Dev.
##  Technician:Lab (Intercept) 0.008002 0.08945 
##  Lab            (Intercept) 0.005920 0.07694 
##  Residual                   0.009239 0.09612 
## Number of obs: 48, groups:  Technician:Lab, 12; Lab, 6
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  0.38750    0.04296   9.019

The grand mean doesn’t change because removing a variance component does not change the intercept!