Below I construct an ANOVA table by hand to model a three by three two-factor experiment testing the effect of protein source and calorie amount in animal diets on the bodyweight of animals.
Factor A is food source (beef, pork, grain). Factor B is calories (low, medium, high). MSE is 8.75. a = 3, b = 3, n = 2.
As seen in the ANOVA table, the p-value for the hypothesis test of the null that all the protein factor main effects (row effects) are 0 is 0.02, good evidence that there is a significant effect. Likewise for the calories factor main effects (column effects). The significance test for presence of at least one significant interaction effect also has good evidence, with a p-value of 0.005. It makes sense to include main effect terms for each factor and interaction terms in our model.
As above, I constructed an ANOVA table “by hand” by using all the formulas that sum up the different components of the partitioned sum of squares. This time, both factor main effects are highly significant, with very small p-values. This means that at least one level of germination time (in days) and at least one level of temperature (degrees C) have an effect on the free alpha nitrogen content of rice malt.
However, the interaction term is not significant, with a p-value of 0.415. The interaction plot backs this up. I traced levels of the temperature factor over a plot of the response against the days factor, and all four lines move in parallel. There is no evidence for interaction, so for this model we only need main effect terms to describe the mean structure.
# fit ANOVA modelbarley.model <-aov(y~week*water, data = barley)print(anova(barley.model))
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
week 4 1321.13 330.28 5.5293 0.003645 **
water 1 1178.13 1178.13 19.7232 0.000251 ***
week:water 4 208.87 52.22 0.8742 0.496726
Residuals 20 1194.67 59.73
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show the code
# normality of residuals check (npp)qqnorm(residuals(barley.model))qqline(residuals(barley.model))
Show the code
# equal variance check (residuals against fitted)plot(barley.model$fitted.values, barley.model$residuals, xlab ="Fitted values", ylab ="Residuals", main ="Residuals vs. Fitted")
The interaction plot does not show strong evidence of interaction between the week and water factors, and the p-value for the interaction term in the ANOVA model reasonably supports this (p = 0.5). The weeks and the water factors both appear to have highly significant main effects. But the residuals are showing a departure from normality in both tails and the variance looks to slightly fan out along the fitted values. I will attempt a stabilizing Box-Cox transformation.
Show the code
library(MASS)# Box-Cox transformation, check the two plots againboxcoxResult <-boxcox(barley.model)
Show the code
lambda = boxcoxResult$x[which.max(boxcoxResult$y)]barley$yboxcox = (barley$y^lambda-1)/lambdabarley.model.boxcox <-aov(yboxcox~week*water, data = barley)print(anova(barley.model.boxcox))
Analysis of Variance Table
Response: yboxcox
Df Sum Sq Mean Sq F value Pr(>F)
week 4 39.339 9.835 5.6384 0.003313 **
water 1 41.234 41.234 23.6404 9.443e-05 ***
week:water 4 4.135 1.034 0.5926 0.671934
Residuals 20 34.884 1.744
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Warning in mcp2matrix(model, linfct = linfct): covariate interactions found --
default contrast might be inappropriate
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = yboxcox ~ week * water, data = barley)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0 -1.258 1.078 -1.167 0.257
(Adjusted p values reported -- single-step method)
The week and the water factors both have significant main effects on the number of seeds germinating. After a Tukey test, it appears that there is only a significant difference between 12 weeks and 1 week, indicating that a large duration of time is perhaps needed.
Problem 8.4
pacemakers problem
Show the code
library(ggplot2)library(MASS)substrates <-read.table("pr8.4", header =TRUE)substrates$flow <-as.factor(substrates$flow)substrates$temp <-as.factor(substrates$temp)substrates$laser <-as.factor(substrates$laser)# fit ANOVA modelmodel <-aov(y~flow*temp*laser, data = substrates)print(anova(model))
# normality of residuals check (npp)qqnorm(residuals(model))qqline(residuals(model))
Show the code
# equal variance check (residuals against fitted)plot(model$fitted.values, model$residuals, xlab ="Fitted values", ylab ="Residuals", main ="Residuals vs. Fitted")
I first check the main assumptions to determine the validity of an ANOVA model. The residuals sufficiently appear to come from a normal distribution. There is clearly unequal variance however, as the residuals fan out with increasing fitted values.
Show the code
library(MASS)# Box-Cox transformation, check the two plots againboxcoxResult <-boxcox(model)
Show the code
lambda = boxcoxResult$x[which.max(boxcoxResult$y)]substrates$yboxcox = (substrates$y^lambda-1)/lambdamodel.boxcox <-aov(yboxcox~flow*temp*laser, data = substrates)print(anova(model.boxcox))
plot(model.boxcox$fitted.values, model.boxcox$residuals, xlab ="Fitted values", ylab ="Residuals", main ="Residuals vs. Fitted")
After performing a Box-Cox transformation on the response with a lambda value of 0.18, the variance seems to have stabilized.
The ANOVA table shows that the firing profile time (8 hours vs. 13 hours) is the only factor main effect that is significant on the response. Therefore there is no evidence for the higher airflow and newer laser being better for this pacemaker substrate process.
Problem 9.3
In the pacemaker data in Problem 8.4, only the firing profile time factor had a significant main effect. The two-way (airflow x laser) interaction term as well as the three-way (airflow x profile x laser) term are also significant in the ANOVA table.
In the above code I run Scheffe’s test on all possible contrasts among the factor level means. There appear to possibly be some systematic patterns in the significance of differences in treatment means between different factor levels (the interactions). I don’t currently know what to say fully about these patterns.
Problem 9.5
Show the code
cells <-read.table("pr9.5", header =TRUE)cells$type <-as.factor(cells$type)# the amount variable is treated as continuous# fit ANOVA modelcells.model <-aov(y~amt*type, data = cells)print(anova(cells.model))
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
amt 1 15204.9 15204.9 65.488 4.020e-05 ***
type 1 22968.8 22968.8 98.927 8.837e-06 ***
amt:type 1 5569.2 5569.2 23.987 0.001197 **
Residuals 8 1857.4 232.2
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show the code
# normality of residuals check (npp)qqnorm(residuals(cells.model))qqline(residuals(cells.model))
Show the code
# equal variance check (residuals against fitted)plot(cells.model$fitted.values, cells.model$residuals, xlab ="Fitted values", ylab ="Residuals", main ="Residuals vs. Fitted")
Show the code
ScheffeTest(cells.model)
Warning in replications(paste("~", xx), data = mf): non-factors ignored: amt
Warning in replications(paste("~", xx), data = mf): non-factors ignored: amt,
type
Warning in ScheffeTest.aov(cells.model): 'which' specified some non-factors
which will be dropped
As seen in the ANOVA table, there are significant main effects and a significant interaction effect between the two factors. I am not sure how to run Scheffe’s test given that I had to treat the “amt” factor as continuous to not get NaNs / undefined values in the ANOVA table.