STAT 565 HW 4

Ryan Masson | Portland State University | 3/8/24

Exercise 8.1

Below I construct an ANOVA table by hand to model a three by three two-factor experiment testing the effect of protein source and calorie amount in animal diets on the bodyweight of animals.

Factor A is food source (beef, pork, grain). Factor B is calories (low, medium, high). MSE is 8.75. a = 3, b = 3, n = 2.

Show the code

library(knitr)
w <- matrix(c(76.0,86.8,101.8,
                       83.3,89.5,98.2,
                       83.8,83.5,86.2), 
                     nrow=3, 
                     ncol=3,
                     byrow = TRUE)

mu = mean(w)
a1 = mean(w[1,]) - mu
a2 = mean(w[2,]) - mu
a3 = mean(w[3,]) - mu
b1 = mean(w[,1]) - mu
b2 = mean(w[,2]) - mu
b3 = mean(w[,3]) - mu
ab11 = w[1,1] - mu - a1 - b1
ab12 = w[1,2] - mu - a1 - b2
ab13 = w[1,3] - mu - a1 - b3
ab21 = w[2,1] - mu - a2 - b1
ab22 = w[2,2] - mu - a2 - b2
ab23 = w[2,3] - mu - a2 - b3
ab31 = w[3,1] - mu - a3 - b1
ab32 = w[3,2] - mu - a3 - b2
ab33 = w[3,3] - mu - a3 - b3
SSa = (6*a1^2)+(6*a2^2)+(6*a3^2)
SSb = (6*b1^2)+(6*b2^2)+(6*b3^2)
SSab = 0
for (i in 1:3) {
  for(j in 1:3) {
    SSab = SSab + 2*((get(paste('ab',i,j,sep = '')))^2) 
  }
}
SSe = 8.75*9
MSa = SSa/2
MSb = SSb/2
MSab = SSab/4
MSe = SSe/9
p1 = 1 - pf(MSa/MSe, df1 = 2, df2 = 9)
p2 = 1 - pf(MSb/MSe, df1 = 2, df2 = 9)
p3 = 1 - pf(MSab/MSe, df1 = 4, df2 = 9)

anova_table <- data.frame(
  Source = c("Protein","Calories","Protein_Calories", "Error"),
  DF = c(2,2,4,9),
  SS = c(SSa,SSb,SSab,SSe),
  MS = c(MSa,MSb,MSab,MSe),
  "F" = c(MSa/MSe,MSb/MSe,MSab/MSe,''),
  p = c(p1,p2,p3,'')
)

knitr::kable(anova_table, format = "markdown")

Source	DF	SS	MS	F	p
Protein	2	104.5378	52.26889	5.97358730158729	0.0223370982843687
Calories	2	629.6578	314.82889	35.9804444444445	5.09158018454636e-05
Protein_Calories	4	274.7156	68.67889	7.84901587301587	0.00523033750241986
Error	9	78.7500	8.75000

As seen in the ANOVA table, the p-value for the hypothesis test of the null that all the protein factor main effects (row effects) are 0 is 0.02, good evidence that there is a significant effect. Likewise for the calories factor main effects (column effects). The significance test for presence of at least one significant interaction effect also has good evidence, with a p-value of 0.005. It makes sense to include main effect terms for each factor and interaction terms in our model.

Exercise 8.2

free alpha amino nitrogen content of rice malt

Show the code

ricemalt <- read.table('ex8.2', header = TRUE)

r <- matrix(c(ricemalt[1,3],ricemalt[2,3],
                     ricemalt[3,3],ricemalt[4,3],
                     ricemalt[5,3],ricemalt[6,3],
                     ricemalt[7,3],ricemalt[8,3],
                     ricemalt[9,3],ricemalt[10,3],
                     ricemalt[11,3],ricemalt[12,3]), 
                     nrow=3, 
                     ncol=4,
                     byrow = TRUE)

mu = mean(r)
a1 = mean(r[1,]) - mu
a2 = mean(r[2,]) - mu
a3 = mean(r[3,]) - mu
b1 = mean(r[,1]) - mu
b2 = mean(r[,2]) - mu
b3 = mean(r[,3]) - mu
b4 = mean(r[,4]) - mu
ab11 = r[1,1] - mu - a1 - b1
ab12 = r[1,2] - mu - a1 - b2
ab13 = r[1,3] - mu - a1 - b3
ab14 = r[1,4] - mu - a1 - b4
ab21 = r[2,1] - mu - a2 - b1
ab22 = r[2,2] - mu - a2 - b2
ab23 = r[2,3] - mu - a2 - b3
ab24 = r[2,4] - mu - a2 - b4
ab31 = r[3,1] - mu - a3 - b1
ab32 = r[3,2] - mu - a3 - b2
ab33 = r[3,3] - mu - a3 - b3
ab34 = r[3,4] - mu - a3 - b4
SSa = (8*a1^2)+(8*a2^2)+(8*a3^2)
SSb = (6*b1^2)+(6*b2^2)+(6*b3^2)+(6*b4^2)
SSab = 0
for (i in 1:3) {
  for(j in 1:4) {
    SSab = SSab + 2*((get(paste('ab',i,j,sep = '')))^2) 
  }
}
SStotal = 8097
SSe = (SStotal - SSa - SSb - SSab)
MSa = SSa/2
MSb = SSb/3
MSab = SSab/6
MSe = SSe/12
p1 = 1 - pf(MSa/MSe, df1 = 2, df2 = 12)
p2 = 1 - pf(MSb/MSe, df1 = 3, df2 = 12)
p3 = 1 - pf(MSab/MSe, df1 = 6, df2 = 12)

anova_table <- data.frame(
  Source = c("Days","Temperature","Days_Temperature", "Error"),
  DF = c(2,3,6,12),
  SS = c(SSa,SSb,SSab,SSe),
  MS = c(MSa,MSb,MSab,MSe),
  "F" = c(MSa/MSe,MSb/MSe,MSab/MSe,''),
  p = c(p1,p2,p3,'')
)

knitr::kable(anova_table, format = "markdown")

Source	DF	SS	MS	F	p
Days	2	5179.4633	2589.73167	120.720904335213	1.1267249799829e-08
Temperature	3	2518.0933	839.36444	39.1271559537985	1.7910765131024e-06
Days_Temperature	6	142.0167	23.66944	1.10335629564405	0.414667284797445
Error	12	257.4267	21.45222

Show the code

interaction.plot(ricemalt[,1], ricemalt[,2], ricemalt[,3])

As above, I constructed an ANOVA table “by hand” by using all the formulas that sum up the different components of the partitioned sum of squares. This time, both factor main effects are highly significant, with very small p-values. This means that at least one level of germination time (in days) and at least one level of temperature (degrees C) have an effect on the free alpha nitrogen content of rice malt.

However, the interaction term is not significant, with a p-value of 0.415. The interaction plot backs this up. I traced levels of the temperature factor over a plot of the response against the days factor, and all four lines move in parallel. There is no evidence for interaction, so for this model we only need main effect terms to describe the mean structure.

Problem 8.1

Show the code

barley <- read.table("pr8.1", header = TRUE)

barley$week <- as.factor(barley$week)
barley$water <- as.factor(barley$water)

interaction.plot(barley$week, barley$water, barley$y)

Show the code

# fit ANOVA model
barley.model <- aov(y~week*water, data = barley)
print(anova(barley.model))

Analysis of Variance Table

Response: y
           Df  Sum Sq Mean Sq F value   Pr(>F)    
week        4 1321.13  330.28  5.5293 0.003645 ** 
water       1 1178.13 1178.13 19.7232 0.000251 ***
week:water  4  208.87   52.22  0.8742 0.496726    
Residuals  20 1194.67   59.73                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

# normality of residuals check (npp)
qqnorm(residuals(barley.model))
qqline(residuals(barley.model))

Show the code

# equal variance check (residuals against fitted)
plot(barley.model$fitted.values, barley.model$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

The interaction plot does not show strong evidence of interaction between the week and water factors, and the p-value for the interaction term in the ANOVA model reasonably supports this (p = 0.5). The weeks and the water factors both appear to have highly significant main effects. But the residuals are showing a departure from normality in both tails and the variance looks to slightly fan out along the fitted values. I will attempt a stabilizing Box-Cox transformation.

Show the code

library(MASS)

# Box-Cox transformation, check the two plots again
boxcoxResult <- boxcox(barley.model)

Show the code

lambda = boxcoxResult$x[which.max(boxcoxResult$y)]
barley$yboxcox = (barley$y^lambda-1)/lambda

barley.model.boxcox <- aov(yboxcox~week*water, data = barley)
print(anova(barley.model.boxcox))

Analysis of Variance Table

Response: yboxcox
           Df Sum Sq Mean Sq F value    Pr(>F)    
week        4 39.339   9.835  5.6384  0.003313 ** 
water       1 41.234  41.234 23.6404 9.443e-05 ***
week:water  4  4.135   1.034  0.5926  0.671934    
Residuals  20 34.884   1.744                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

qqnorm(residuals(barley.model.boxcox))
qqline(residuals(barley.model.boxcox))

Show the code

plot(barley.model.boxcox$fitted.values, barley.model.boxcox$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

The transformation has somewhat stabilized the residuals and the variance.

Show the code

library(multcomp)

Loading required package: mvtnorm

Loading required package: survival

Loading required package: TH.data


Attaching package: 'TH.data'

The following object is masked from 'package:MASS':

    geyser

Show the code

summary(glht(barley.model.boxcox, linfct = mcp(week = "Tukey")))

Warning in mcp2matrix(model, linfct = linfct): covariate interactions found --
default contrast might be inappropriate


     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: aov(formula = yboxcox ~ week * water, data = barley)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)  
2 - 1 == 0   0.9052     1.0783   0.839   0.9150  
3 - 1 == 0   1.9517     1.0783   1.810   0.3955  
4 - 1 == 0   2.6096     1.0783   2.420   0.1507  
5 - 1 == 0   3.5859     1.0783   3.325   0.0249 *
3 - 2 == 0   1.0464     1.0783   0.970   0.8652  
4 - 2 == 0   1.7044     1.0783   1.581   0.5258  
5 - 2 == 0   2.6807     1.0783   2.486   0.1337  
4 - 3 == 0   0.6579     1.0783   0.610   0.9718  
5 - 3 == 0   1.6342     1.0783   1.515   0.5648  
5 - 4 == 0   0.9763     1.0783   0.905   0.8915  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

Show the code

summary(glht(barley.model.boxcox, linfct = mcp(water = "Tukey")))

Warning in mcp2matrix(model, linfct = linfct): covariate interactions found --
default contrast might be inappropriate


     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: aov(formula = yboxcox ~ week * water, data = barley)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0   -1.258      1.078  -1.167    0.257
(Adjusted p values reported -- single-step method)

The week and the water factors both have significant main effects on the number of seeds germinating. After a Tukey test, it appears that there is only a significant difference between 12 weeks and 1 week, indicating that a large duration of time is perhaps needed.

Problem 8.4

Show the code

library(ggplot2)
library(MASS)

substrates <- read.table("pr8.4", header = TRUE)
substrates$flow <- as.factor(substrates$flow)
substrates$temp <- as.factor(substrates$temp)
substrates$laser <- as.factor(substrates$laser)


# fit ANOVA model
model <- aov(y~flow*temp*laser, data = substrates)
print(anova(model))

Analysis of Variance Table

Response: y
                Df  Sum Sq Mean Sq  F value    Pr(>F)    
flow             1 0.00276 0.00276   0.4220    0.5341    
temp             1 1.37476 1.37476 210.4890 4.988e-07 ***
laser            1 0.00141 0.00141   0.2153    0.6550    
flow:temp        1 0.00141 0.00141   0.2153    0.6550    
flow:laser       1 0.00856 0.00856   1.3100    0.2855    
temp:laser       1 0.00076 0.00076   0.1158    0.7424    
flow:temp:laser  1 0.01156 0.01156   1.7694    0.2201    
Residuals        8 0.05225 0.00653                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

# normality of residuals check (npp)
qqnorm(residuals(model))
qqline(residuals(model))

Show the code

# equal variance check (residuals against fitted)
plot(model$fitted.values, model$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

I first check the main assumptions to determine the validity of an ANOVA model. The residuals sufficiently appear to come from a normal distribution. There is clearly unequal variance however, as the residuals fan out with increasing fitted values.

Show the code

library(MASS)

# Box-Cox transformation, check the two plots again
boxcoxResult <- boxcox(model)

Show the code

lambda = boxcoxResult$x[which.max(boxcoxResult$y)]
substrates$yboxcox = (substrates$y^lambda-1)/lambda

model.boxcox <- aov(yboxcox~flow*temp*laser, data = substrates)
print(anova(model.boxcox))

Analysis of Variance Table

Response: yboxcox
                Df Sum Sq Mean Sq  F value    Pr(>F)    
flow             1 0.0394  0.0394   2.2813   0.16938    
temp             1 5.7532  5.7532 333.3767 8.325e-08 ***
laser            1 0.0105  0.0105   0.6108   0.45701    
flow:temp        1 0.0316  0.0316   1.8315   0.21295    
flow:laser       1 0.1433  0.1433   8.3060   0.02045 *  
temp:laser       1 0.0006  0.0006   0.0329   0.86062    
flow:temp:laser  1 0.1670  0.1670   9.6761   0.01443 *  
Residuals        8 0.1381  0.0173                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

qqnorm(residuals(model.boxcox))
qqline(residuals(model.boxcox))

Show the code

plot(model.boxcox$fitted.values, model.boxcox$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

After performing a Box-Cox transformation on the response with a lambda value of 0.18, the variance seems to have stabilized.

The ANOVA table shows that the firing profile time (8 hours vs. 13 hours) is the only factor main effect that is significant on the response. Therefore there is no evidence for the higher airflow and newer laser being better for this pacemaker substrate process.

Problem 9.3

In the pacemaker data in Problem 8.4, only the firing profile time factor had a significant main effect. The two-way (airflow x laser) interaction term as well as the three-way (airflow x profile x laser) term are also significant in the ANOVA table.

Show the code

library(DescTools)
ScheffeTest(model.boxcox)


  Posthoc multiple comparisons of means: Scheffe Test 
    95% family-wise confidence level

$flow
           diff     lwr.ci    upr.ci   pval    
2-1 -0.09920915 -0.4243485 0.2259302 0.9213    

$temp
         diff    lwr.ci     upr.ci    pval    
2-1 -1.199293 -1.524432 -0.8741537 6.7e-06 ***

$laser
           diff     lwr.ci    upr.ci   pval    
2-1 -0.05133223 -0.3764715 0.2738071 0.9978    

$`flow:temp`
              diff     lwr.ci     upr.ci    pval    
2:1-1:1 -0.0103177 -0.4701341  0.4494987 1.00000    
1:2-1:1 -1.1104016 -1.5702180 -0.6505852 0.00017 ***
2:2-1:1 -1.2985022 -1.7583186 -0.8386858 5.2e-05 ***
1:2-2:1 -1.1000839 -1.5599003 -0.6402675 0.00018 ***
2:2-2:1 -1.2881845 -1.7480009 -0.8283681 5.5e-05 ***
2:2-1:2 -0.1881006 -0.6479170  0.2717158 0.75245    

$`flow:laser`
               diff     lwr.ci    upr.ci   pval    
2:1-1:1  0.09009270 -0.3697237 0.5499091 0.9921    
1:2-1:1  0.13796961 -0.3218468 0.5977860 0.9271    
2:2-1:1 -0.15054137 -0.6103578 0.3092750 0.8930    
1:2-2:1  0.04787692 -0.4119395 0.5076933 0.9999    
2:2-2:1 -0.24063407 -0.7004505 0.2191824 0.5156    
2:2-1:2 -0.28851099 -0.7483274 0.1713054 0.3296    

$`temp:laser`
               diff     lwr.ci     upr.ci    pval    
2:1-1:1 -1.18738309 -1.6471995 -0.7275667 0.00010 ***
1:2-1:1 -0.03942229 -0.4992387  0.4203941 0.99996    
2:2-1:1 -1.25062526 -1.7104417 -0.7908088 6.9e-05 ***
1:2-2:1  1.14796080  0.6881444  1.6077772 0.00013 ***
2:2-2:1 -0.06324216 -0.5230586  0.3965743 0.99910    
2:2-1:2 -1.21120297 -1.6710194 -0.7513865 8.7e-05 ***

$`flow:temp:laser`
                    diff     lwr.ci      upr.ci    pval    
2:1:1-1:1:1 -0.025333954 -0.6756126  0.62494466 1.00000    
1:2:1-1:1:1 -1.302809743 -1.9530884 -0.65253113 0.00064 ***
2:2:1-1:1:1 -1.097290399 -1.7475690 -0.44701178 0.00212 ** 
1:1:2-1:1:1 -0.054438548 -0.7047172  0.59584007 0.99997    
2:1:2-1:1:1 -0.049739990 -0.7000186  0.60053862 0.99998    
1:2:2-1:1:1 -0.972431970 -1.6227106 -0.32215335 0.00473 ** 
2:2:2-1:1:1 -1.554152498 -2.2044311 -0.90387388 0.00018 ***
1:2:1-2:1:1 -1.277475789 -1.9277544 -0.62719717 0.00074 ***
2:2:1-2:1:1 -1.071956446 -1.7222351 -0.42167783 0.00248 ** 
1:1:2-2:1:1 -0.029104594 -0.6793832  0.62117402 1.00000    
2:1:2-2:1:1 -0.024406037 -0.6746847  0.62587258 1.00000    
1:2:2-2:1:1 -0.947098017 -1.5973766 -0.29681940 0.00561 ** 
2:2:2-2:1:1 -1.528818545 -2.1790972 -0.87853993 0.00020 ***
2:2:1-1:2:1  0.205519344 -0.4447593  0.85579796 0.90802    
1:1:2-1:2:1  1.248371195  0.5980926  1.89864981 0.00087 ***
2:1:2-1:2:1  1.253069753  0.6027911  1.90334837 0.00085 ***
1:2:2-1:2:1  0.330377773 -0.3199008  0.98065639 0.54670    
2:2:2-1:2:1 -0.251342755 -0.9016214  0.39893586 0.79597    
1:1:2-2:2:1  1.042851852  0.3925732  1.69313047 0.00299 ** 
2:1:2-2:2:1  1.047550409  0.3972718  1.69782902 0.00290 ** 
1:2:2-2:2:1  0.124858429 -0.5254202  0.77513704 0.99302    
2:2:2-2:2:1 -0.456862099 -1.1071407  0.19341652 0.22989    
2:1:2-1:1:2  0.004698557 -0.6455801  0.65497717 1.00000    
1:2:2-1:1:2 -0.917993422 -1.5682720 -0.26771481 0.00684 ** 
2:2:2-1:1:2 -1.499713951 -2.1499926 -0.84943534 0.00023 ***
1:2:2-2:1:2 -0.922691980 -1.5729706 -0.27241336 0.00662 ** 
2:2:2-2:1:2 -1.504412508 -2.1546911 -0.85413389 0.00023 ***
2:2:2-1:2:2 -0.581720528 -1.2319991  0.06855809 0.08616 .  

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the above code I run Scheffe’s test on all possible contrasts among the factor level means. There appear to possibly be some systematic patterns in the significance of differences in treatment means between different factor levels (the interactions). I don’t currently know what to say fully about these patterns.

Problem 9.5

Show the code

cells <- read.table("pr9.5", header = TRUE)
cells$type <- as.factor(cells$type)
# the amount variable is treated as continuous

# fit ANOVA model
cells.model <- aov(y~amt*type, data = cells)
print(anova(cells.model))

Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)    
amt        1 15204.9 15204.9  65.488 4.020e-05 ***
type       1 22968.8 22968.8  98.927 8.837e-06 ***
amt:type   1  5569.2  5569.2  23.987  0.001197 ** 
Residuals  8  1857.4   232.2                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

# normality of residuals check (npp)
qqnorm(residuals(cells.model))
qqline(residuals(cells.model))

Show the code

# equal variance check (residuals against fitted)
plot(cells.model$fitted.values, cells.model$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

Show the code

ScheffeTest(cells.model)

Warning in replications(paste("~", xx), data = mf): non-factors ignored: amt

Warning in replications(paste("~", xx), data = mf): non-factors ignored: amt,
type

Warning in ScheffeTest.aov(cells.model): 'which' specified some non-factors
which will be dropped


  Posthoc multiple comparisons of means: Scheffe Test 
    95% family-wise confidence level

$type
     diff    lwr.ci    upr.ci    pval    
2-1 -87.5 -118.2259 -56.77412 7.5e-05 ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As seen in the ANOVA table, there are significant main effects and a significant interaction effect between the two factors. I am not sure how to run Scheffe’s test given that I had to treat the “amt” factor as continuous to not get NaNs / undefined values in the ANOVA table.