This week we practiced analyzing and modeling Linear Mixed Models
The following problems will serve as examples on how to code and analyze the R output
Consider the Penicillin data set in lme4. The data come from a book published by Davies and Goldsmith in 1972. The goal was to “assess the variability between samples of penicillin by the B. subtilis method. In this test method a bulk-innoculated nutrient agar medium is poured into a Petri dish of approximately 90 mm diameter, known as a plate. When the medium has set, six small hollow cylinders or pots (about 4 mm in diameter) are cemented onto the surface at equally spaced intervals. A few drops of the penicillin solutions to be compared are placed in the respective cylinders, and the whole plate is placed in an incubator for a given time. Penicillin diffuses from the pots into the agar, and this produces a clear circular zone of inhibition of growth of the organisms, which can be readily measured. The diameter of the zone is related in a known way to the concentration of penicillin in the solution.”
library(lme4)
library(lattice)
data("Penicillin")
Pen <- Penicillin
attach(Pen)
The diameter of the zone is the response.
The random effects would be defined by both the plate variable and sample variable.
The random effects appear to be crossed in this case.
plot(sample,diameter)
plot(plate, diameter)
The first plot shows that there seems to be some variance in the samples of Pencillin.
The second plot shows that there could be some variance in the plates as well.
randmodel <- lmer((diameter ~ 1 + (1|plate) + (1|sample)), data = Pen)
summary(randmodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: diameter ~ 1 + (1 | plate) + (1 | sample)
## Data: Pen
##
## REML criterion at convergence: 330.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.07923 -0.67140 0.06292 0.58377 2.97959
##
## Random effects:
## Groups Name Variance Std.Dev.
## plate (Intercept) 0.7169 0.8467
## sample (Intercept) 3.7311 1.9316
## Residual 0.3024 0.5499
## Number of obs: 144, groups: plate, 24; sample, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 22.9722 0.8086 28.41
The grand mean is 22.9722
confint(randmodel, level =.9)[4,]
## Computing profile confidence intervals ...
## 5 % 95 %
## 21.60872 24.33572
As I expected, after observing the graphs, the largest source of variability comes from the sample.
Consider the Pastes data set in lme4. This data set is also from Davies and Goldsmith. A chemical paste product is delivered in casks. Several casks come from one “batch” of paste. The researchers randomly selected a couple casks from each batch. From each of the selected casks they took a sample and tested the samples. Each sample was tested twice. In this data set, a from batch A is unrelated to a in batch B.
The strength of the sample of paste is the response
The variables of batch, cask, and sample are all random effects in this scenario.
These random effects are nested because there is a natural progression to these random effects.
xyplot(strength ~ batch|cask)
It appears there is a lot of variance in the strength of the paste by cask.
pastemodel <- lmer(strength ~ 1 + (1|batch/cask/sample), data = Pastes)
summary(pastemodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: strength ~ 1 + (1 | batch/cask/sample)
## Data: Pastes
##
## REML criterion at convergence: 247
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4798 -0.5156 0.0095 0.4720 1.3897
##
## Random effects:
## Groups Name Variance Std.Dev.
## sample:(cask:batch) (Intercept) 0.05655 0.2378
## cask:batch (Intercept) 8.37721 2.8943
## batch (Intercept) 1.65730 1.2874
## Residual 0.67799 0.8234
## Number of obs: 60, groups: sample:(cask:batch), 30; cask:batch, 30; batch, 10
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 60.0533 0.6769 88.72
60.0533 is the grand mean.
reducedpaste <- lmer(strength ~ 1 + (1|batch), data = Pastes)
teststat <- 2*as.numeric(logLik(pastemodel) - logLik(reducedpaste))
pchisq(teststat, df = 1, lower.tail = FALSE)
## [1] 1.473854e-13
Small p-value indicates the full LMM (with 3 random effects) is significantly better.
Consider the eggs data set in faraway. As described in Julian Faraway’s 2006 book: “Consistency between laboratory tests is important and yet the results may depend on who did the test and where the test was performed. In an experiment to test levels of consistency, a large jar of dried egg powder was divided up into a number of samples. Because the powder was homogenized, the fat content of the samples is the same, but this fact is withheld from the laboratories. Four samples were sent to each of six laboratories. Two of the samples were labeled as G and two as H, although in fact they were identical. The laboratories were then instructed to divide their samples into two parts and measure the fat content of each. So each laboratory reported eight measures, [and] each technician four measures.”
Investigate the data. To reiterate, after splitting the samples in half, each lab has 4 vials labeled “G” and 4 vials labeled “H.” Similar to the last problem, sample G for one technician is different from sample G for another technician. (Hint: sample is nested in technician) Also, technicians one and two are different for each lab
Fat is the response variable here.
Lab, Technician, and Sample are all random effects.
Due to the natural progression I believe these are nested random effects.
fatmodel <- lmer(Fat ~ 1 + (1|Lab/Technician/Sample),data = eggs)
summary(fatmodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Fat ~ 1 + (1 | Lab/Technician/Sample)
## Data: eggs
##
## REML criterion at convergence: -64.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.04098 -0.46576 0.00927 0.59713 1.54276
##
## Random effects:
## Groups Name Variance Std.Dev.
## Sample:(Technician:Lab) (Intercept) 0.003065 0.05536
## Technician:Lab (Intercept) 0.006980 0.08355
## Lab (Intercept) 0.005920 0.07694
## Residual 0.007196 0.08483
## Number of obs: 48, groups:
## Sample:(Technician:Lab), 24; Technician:Lab, 12; Lab, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.38750 0.04296 9.019
.38750 is the grand mean
Technichians: .00698
Lab: .00592
refitmodel <- lmer(Fat ~ 1 + (1|Lab/Technician),data = eggs)
summary(refitmodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Fat ~ 1 + (1 | Lab/Technician)
## Data: eggs
##
## REML criterion at convergence: -62.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.17799 -0.42424 0.08043 0.67361 1.77544
##
## Random effects:
## Groups Name Variance Std.Dev.
## Technician:Lab (Intercept) 0.008002 0.08945
## Lab (Intercept) 0.005920 0.07694
## Residual 0.009239 0.09612
## Number of obs: 48, groups: Technician:Lab, 12; Lab, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.38750 0.04296 9.019
The grand mean doesn’t change because removing a variance component does not change the intercept!