Part 1: Analysis of Variance

In this laboratory we developed a mixed effects model. A mixed effects model is a type of regression model that as its name suggests contains both fixed effects comprised of independent variables that explain variation and random effects comprised of independent variables that do not contribute to the explanation of variation. In summary, linear mixed effects models describe the relationship between a response variable and independent variables when the data are collected and summarized in groups.

We explored a simple example of a mixed-effect model; We used an experiment in which measurements of crop yields were taken from 10 randomly selected fields having three soil types; sand, clay or loam.

We used tapply() to determine the yield variance by soil type

The Fligner-Killeen test will give us a good idea whether or not our initial variance results are significant. The Fligner-Killeen test returns a p-value much greater than 0.05 so we must accept the null hypothesis that there are no differences and can move forward.

The boxplots reveals that we can expect, the lowest yield for sand and the highest for loam. There is apparently an outlier for clay. The boxplot representing the yield for sand does not even overlap that of loam. However, the boxplot for clay overlaps both loam and sand. To really determine what is going on we performed analysis of variance.

The conclusion from the summary.lm() output is that neither Loam nor Sand produces a significantly higher yield than Clay, as shown by the values output by the summary.lm() function. This is despite the fact that the original analysis of variance indicated that significant differences in the means exist with a p-value of 0.025.

When we use the output from the analysis to create plots that substantiate the verification of the assumptions. The first plot is our check on the constancy of variance. We already check this with the Fligner-Killeen test. Now, checking it graphically we are looking to verify that there isn’t any pattern in the plot of the residuals versus fitted values. The normal probability plot used to test the assumption of normality of errors. It looks good. One plot checks the residuals; another plot checks that there are no highly influential values that would distort the parameter estimates. So, our analysis of variance looks good.

Part 2: Mixed Effects Models

To analyse the Mixed Effects Model we considered an example in which we investigate the effect of toxic agents on 48 animals. These 48 animals were exposed to three poisons and four treatments. The data indicate the survival time in tens of hours. The response “time” is of data type numeric but the treatments and levels are of data type factor.

The box plot shows that the variance may be related to the mean response, i.e. the greater the mean response the greater the variance. Now, because of the more complex situation we did interaction plots. From interaction plots it was very hard to tell because the variation may be due to noise in the data. Since we have replication we directly tested for any interaction effects , which showed that the interaction (poison:treat) is not statistically significant but the main effects are.

we used the transformed data we stored in the data object “g” to complete our analysis, first the interaction effects and then the main effects.

Mixed Effects Models Part 3: A more complex mixed effects model

We looked at the data from an experiment that measured productivity on a manufacturing task according to the type of machine and the operator. This data set was named Machines. It was in the MEMSS package in R.

The comparison of the models fm1 and fm3 are shown on the line labeled “fm3” in the ANOVA table. We see that the additional parameter “Machine:Worker” in fm3 is highly significant. The change in the deviance, i.e. 68.4338, is very large. So, we prefer the more complex model, fm3, over fm1. The most complex model, fm2, adds four more parameters but based on the p-value is not statistically significant.Also we checked for normailty.

APPENDIX

Part 1: Analysis of Variance

c1 <- c(6, 10, 8, 6, 14, 17, 9, 11, 7, 11)
c2 <- c(17, 15, 3, 11, 14, 12, 12, 8, 10, 13)
c3 <- c(13, 16, 9, 12, 15, 16, 17, 13, 18, 14)
obs <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

yields <- data.frame(cbind(obs, c1, c2, c3), ncol=4)
colnames(yields) <- c("", "Sand", "Clay", "Loam")
yields

yields <- yields[,-1]
yields <- yields[,-5]
yields


sapply(list(yields$Sand,yields$Clay,yields$Loam),mean)

(frame <- stack(yields))

yields <- yields[,-4]
yields

names(frame) <- c("yield","soil")
attach(frame)
head(frame)

tapply(yield,soil,var)

fligner.test(yield~soil)

plot(yield~soil,col="green")

sum((yield-mean(yield))^2)

sum((yields$Sand-mean(yields$Sand))^2)

sum((yields$Sand-mean(yields$Clay))^2)

sum((yields$Sand-mean(yields$Loam))^2)


SSError <- sum(sapply(list(yields$Sand,yields$Clay,yields$Loam),function (x
) sum((x-mean(x))^2) ))
SSError

SSTreatments <- 414.7-315.5
SSTreatments


MSTreatments <- SSTreatments/2
MSTreatments

MSError <- SSError/27
MSError

tapply(yield,soil,var)
mean(tapply(yield,soil,var))


MSTreatments/MSError


qf(.95,2,27)

1-pf(4.24,2,27)

summary(aov(yield~soil))


yieldsModel<-aov(formula = yield ~ soil)

summary.lm(yieldsModel)

means <- tapply(yield, soil, mean)
means


means[2] - means[1]

means[3] - means[1]

model.tables(yieldsModel,se=T)

sqrt(11.69/10)

qt(0.975, 18)

2*(1 - pt(2.88, df = 18))

par(mfrow=c(2,2))
plot(aov(yield~soil))

Part 2: Mixed Effects Models

data(package = "lme4")

library(lme4)

## Loading required package: Matrix

fm1 <- lmer(Yield ~ 1 + (1 | Batch), Dyestuff)
fm1

summary(fm1)

library(boot)
data("poisons")
str(poisons)

plot(time ~ treat + poison, data = poisons)

interaction.plot(poisons$treat, poisons$poison, poisons$time)

interaction.plot(poisons$poison, poisons$treat, poisons$time)

g <- lm(time ~ poison*treat, poisons)
anova(g)


qqnorm(g$residuals)

g <- lm(log(time) ~ poison*treat, poisons)
plot(g$fitted, g$res, xlab = "Fitted", ylab = "Residuals")

g <- lm(1/time ~ poison*treat, poisons)
plot(g$fitted, g$res, xlab = "Fitted", ylab = "Residuals")

qqnorm(g$residuals)

anova(g)


g <- lm(1/time ~ poison+treat, poisons)
summary(g)


par(mfrow=c(2,2))
plot(lm(g))

Mixed Effects Models Part 3: A more complex mixed effects model

data(package = "MEMSS")
library(MEMSS)

## 
## Attaching package: 'MEMSS'

## The following objects are masked from 'package:datasets':
## 
##     CO2, Orange, Theoph

data("Machines")
str(Machines)

xtabs(~Machine + Worker, Machines)


fm1 <- lmer(score ~ Machine + (1|Worker), Machines, REML=FALSE)
fm2 <- lmer(score ~ Machine + (Machine | Worker), Machines, REML = FALSE)
fm3 <- lmer(score ~ Machine + (1|Worker) +(1|Machine:Worker), Machines, REML = FALSE)
anova(fm1, fm2, fm3)

Laboratory #1 - Mixed Effects Models

Vinay Mariyappa

February 14, 2017

Part 1: Analysis of Variance

Part 2: Mixed Effects Models

Mixed Effects Models Part 3: A more complex mixed effects model

APPENDIX

Part 1: Analysis of Variance

Part 2: Mixed Effects Models

Mixed Effects Models Part 3: A more complex mixed effects model