STAT 565 HW 4

Ryan Masson | Portland State University | 3/8/24

Exercise 8.1

animal weights

Below I construct an ANOVA table by hand to model a three by three two-factor experiment testing the effect of protein source and calorie amount in animal diets on the bodyweight of animals.

Factor A is food source (beef, pork, grain). Factor B is calories (low, medium, high). MSE is 8.75. a = 3, b = 3, n = 2.

Show the code
library(knitr)
w <- matrix(c(76.0,86.8,101.8,
                       83.3,89.5,98.2,
                       83.8,83.5,86.2), 
                     nrow=3, 
                     ncol=3,
                     byrow = TRUE)

mu = mean(w)
a1 = mean(w[1,]) - mu
a2 = mean(w[2,]) - mu
a3 = mean(w[3,]) - mu
b1 = mean(w[,1]) - mu
b2 = mean(w[,2]) - mu
b3 = mean(w[,3]) - mu
ab11 = w[1,1] - mu - a1 - b1
ab12 = w[1,2] - mu - a1 - b2
ab13 = w[1,3] - mu - a1 - b3
ab21 = w[2,1] - mu - a2 - b1
ab22 = w[2,2] - mu - a2 - b2
ab23 = w[2,3] - mu - a2 - b3
ab31 = w[3,1] - mu - a3 - b1
ab32 = w[3,2] - mu - a3 - b2
ab33 = w[3,3] - mu - a3 - b3
SSa = (6*a1^2)+(6*a2^2)+(6*a3^2)
SSb = (6*b1^2)+(6*b2^2)+(6*b3^2)
SSab = 0
for (i in 1:3) {
  for(j in 1:3) {
    SSab = SSab + 2*((get(paste('ab',i,j,sep = '')))^2) 
  }
}
SSe = 8.75*9
MSa = SSa/2
MSb = SSb/2
MSab = SSab/4
MSe = SSe/9
p1 = 1 - pf(MSa/MSe, df1 = 2, df2 = 9)
p2 = 1 - pf(MSb/MSe, df1 = 2, df2 = 9)
p3 = 1 - pf(MSab/MSe, df1 = 4, df2 = 9)

anova_table <- data.frame(
  Source = c("Protein","Calories","Protein_Calories", "Error"),
  DF = c(2,2,4,9),
  SS = c(SSa,SSb,SSab,SSe),
  MS = c(MSa,MSb,MSab,MSe),
  "F" = c(MSa/MSe,MSb/MSe,MSab/MSe,''),
  p = c(p1,p2,p3,'')
)

knitr::kable(anova_table, format = "markdown")
Source DF SS MS F p
Protein 2 104.5378 52.26889 5.97358730158729 0.0223370982843687
Calories 2 629.6578 314.82889 35.9804444444445 5.09158018454636e-05
Protein_Calories 4 274.7156 68.67889 7.84901587301587 0.00523033750241986
Error 9 78.7500 8.75000

As seen in the ANOVA table, the p-value for the hypothesis test of the null that all the protein factor main effects (row effects) are 0 is 0.02, good evidence that there is a significant effect. Likewise for the calories factor main effects (column effects). The significance test for presence of at least one significant interaction effect also has good evidence, with a p-value of 0.005. It makes sense to include main effect terms for each factor and interaction terms in our model.

Exercise 8.2

free alpha amino nitrogen content of rice malt
Show the code
ricemalt <- read.table('ex8.2', header = TRUE)

r <- matrix(c(ricemalt[1,3],ricemalt[2,3],
                     ricemalt[3,3],ricemalt[4,3],
                     ricemalt[5,3],ricemalt[6,3],
                     ricemalt[7,3],ricemalt[8,3],
                     ricemalt[9,3],ricemalt[10,3],
                     ricemalt[11,3],ricemalt[12,3]), 
                     nrow=3, 
                     ncol=4,
                     byrow = TRUE)

mu = mean(r)
a1 = mean(r[1,]) - mu
a2 = mean(r[2,]) - mu
a3 = mean(r[3,]) - mu
b1 = mean(r[,1]) - mu
b2 = mean(r[,2]) - mu
b3 = mean(r[,3]) - mu
b4 = mean(r[,4]) - mu
ab11 = r[1,1] - mu - a1 - b1
ab12 = r[1,2] - mu - a1 - b2
ab13 = r[1,3] - mu - a1 - b3
ab14 = r[1,4] - mu - a1 - b4
ab21 = r[2,1] - mu - a2 - b1
ab22 = r[2,2] - mu - a2 - b2
ab23 = r[2,3] - mu - a2 - b3
ab24 = r[2,4] - mu - a2 - b4
ab31 = r[3,1] - mu - a3 - b1
ab32 = r[3,2] - mu - a3 - b2
ab33 = r[3,3] - mu - a3 - b3
ab34 = r[3,4] - mu - a3 - b4
SSa = (8*a1^2)+(8*a2^2)+(8*a3^2)
SSb = (6*b1^2)+(6*b2^2)+(6*b3^2)+(6*b4^2)
SSab = 0
for (i in 1:3) {
  for(j in 1:4) {
    SSab = SSab + 2*((get(paste('ab',i,j,sep = '')))^2) 
  }
}
SStotal = 8097
SSe = (SStotal - SSa - SSb - SSab)
MSa = SSa/2
MSb = SSb/3
MSab = SSab/6
MSe = SSe/12
p1 = 1 - pf(MSa/MSe, df1 = 2, df2 = 12)
p2 = 1 - pf(MSb/MSe, df1 = 3, df2 = 12)
p3 = 1 - pf(MSab/MSe, df1 = 6, df2 = 12)

anova_table <- data.frame(
  Source = c("Days","Temperature","Days_Temperature", "Error"),
  DF = c(2,3,6,12),
  SS = c(SSa,SSb,SSab,SSe),
  MS = c(MSa,MSb,MSab,MSe),
  "F" = c(MSa/MSe,MSb/MSe,MSab/MSe,''),
  p = c(p1,p2,p3,'')
)

knitr::kable(anova_table, format = "markdown")
Source DF SS MS F p
Days 2 5179.4633 2589.73167 120.720904335213 1.1267249799829e-08
Temperature 3 2518.0933 839.36444 39.1271559537985 1.7910765131024e-06
Days_Temperature 6 142.0167 23.66944 1.10335629564405 0.414667284797445
Error 12 257.4267 21.45222
Show the code
interaction.plot(ricemalt[,1], ricemalt[,2], ricemalt[,3])

As above, I constructed an ANOVA table “by hand” by using all the formulas that sum up the different components of the partitioned sum of squares. This time, both factor main effects are highly significant, with very small p-values. This means that at least one level of germination time (in days) and at least one level of temperature (degrees C) have an effect on the free alpha nitrogen content of rice malt.

However, the interaction term is not significant, with a p-value of 0.415. The interaction plot backs this up. I traced levels of the temperature factor over a plot of the response against the days factor, and all four lines move in parallel. There is no evidence for interaction, so for this model we only need main effect terms to describe the mean structure.

Problem 8.1

barley problem
Show the code
barley <- read.table("pr8.1", header = TRUE)

barley$week <- as.factor(barley$week)
barley$water <- as.factor(barley$water)

interaction.plot(barley$week, barley$water, barley$y)

Show the code
# fit ANOVA model
barley.model <- aov(y~week*water, data = barley)
print(anova(barley.model))
Analysis of Variance Table

Response: y
           Df  Sum Sq Mean Sq F value   Pr(>F)    
week        4 1321.13  330.28  5.5293 0.003645 ** 
water       1 1178.13 1178.13 19.7232 0.000251 ***
week:water  4  208.87   52.22  0.8742 0.496726    
Residuals  20 1194.67   59.73                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show the code
# normality of residuals check (npp)
qqnorm(residuals(barley.model))
qqline(residuals(barley.model))

Show the code
# equal variance check (residuals against fitted)
plot(barley.model$fitted.values, barley.model$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

The interaction plot does not show strong evidence of interaction between the week and water factors, and the p-value for the interaction term in the ANOVA model reasonably supports this (p = 0.5). The weeks and the water factors both appear to have highly significant main effects. But the residuals are showing a departure from normality in both tails and the variance looks to slightly fan out along the fitted values. I will attempt a stabilizing Box-Cox transformation.

Show the code
library(MASS)

# Box-Cox transformation, check the two plots again
boxcoxResult <- boxcox(barley.model) 

Show the code
lambda = boxcoxResult$x[which.max(boxcoxResult$y)]
barley$yboxcox = (barley$y^lambda-1)/lambda

barley.model.boxcox <- aov(yboxcox~week*water, data = barley)
print(anova(barley.model.boxcox))
Analysis of Variance Table

Response: yboxcox
           Df Sum Sq Mean Sq F value    Pr(>F)    
week        4 39.339   9.835  5.6384  0.003313 ** 
water       1 41.234  41.234 23.6404 9.443e-05 ***
week:water  4  4.135   1.034  0.5926  0.671934    
Residuals  20 34.884   1.744                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show the code
qqnorm(residuals(barley.model.boxcox))
qqline(residuals(barley.model.boxcox))

Show the code
plot(barley.model.boxcox$fitted.values, barley.model.boxcox$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

The transformation has somewhat stabilized the residuals and the variance.

Show the code
library(multcomp)
Loading required package: mvtnorm
Loading required package: survival
Loading required package: TH.data

Attaching package: 'TH.data'
The following object is masked from 'package:MASS':

    geyser
Show the code
summary(glht(barley.model.boxcox, linfct = mcp(week = "Tukey")))
Warning in mcp2matrix(model, linfct = linfct): covariate interactions found --
default contrast might be inappropriate

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: aov(formula = yboxcox ~ week * water, data = barley)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)  
2 - 1 == 0   0.9052     1.0783   0.839   0.9150  
3 - 1 == 0   1.9517     1.0783   1.810   0.3955  
4 - 1 == 0   2.6096     1.0783   2.420   0.1507  
5 - 1 == 0   3.5859     1.0783   3.325   0.0249 *
3 - 2 == 0   1.0464     1.0783   0.970   0.8652  
4 - 2 == 0   1.7044     1.0783   1.581   0.5258  
5 - 2 == 0   2.6807     1.0783   2.486   0.1337  
4 - 3 == 0   0.6579     1.0783   0.610   0.9718  
5 - 3 == 0   1.6342     1.0783   1.515   0.5648  
5 - 4 == 0   0.9763     1.0783   0.905   0.8915  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
Show the code
summary(glht(barley.model.boxcox, linfct = mcp(water = "Tukey")))
Warning in mcp2matrix(model, linfct = linfct): covariate interactions found --
default contrast might be inappropriate

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: aov(formula = yboxcox ~ week * water, data = barley)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0   -1.258      1.078  -1.167    0.257
(Adjusted p values reported -- single-step method)

The week and the water factors both have significant main effects on the number of seeds germinating. After a Tukey test, it appears that there is only a significant difference between 12 weeks and 1 week, indicating that a large duration of time is perhaps needed.

Problem 8.4

pacemakers problem
Show the code
library(ggplot2)
library(MASS)

substrates <- read.table("pr8.4", header = TRUE)
substrates$flow <- as.factor(substrates$flow)
substrates$temp <- as.factor(substrates$temp)
substrates$laser <- as.factor(substrates$laser)


# fit ANOVA model
model <- aov(y~flow*temp*laser, data = substrates)
print(anova(model))
Analysis of Variance Table

Response: y
                Df  Sum Sq Mean Sq  F value    Pr(>F)    
flow             1 0.00276 0.00276   0.4220    0.5341    
temp             1 1.37476 1.37476 210.4890 4.988e-07 ***
laser            1 0.00141 0.00141   0.2153    0.6550    
flow:temp        1 0.00141 0.00141   0.2153    0.6550    
flow:laser       1 0.00856 0.00856   1.3100    0.2855    
temp:laser       1 0.00076 0.00076   0.1158    0.7424    
flow:temp:laser  1 0.01156 0.01156   1.7694    0.2201    
Residuals        8 0.05225 0.00653                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show the code
# normality of residuals check (npp)
qqnorm(residuals(model))
qqline(residuals(model))

Show the code
# equal variance check (residuals against fitted)
plot(model$fitted.values, model$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

I first check the main assumptions to determine the validity of an ANOVA model. The residuals sufficiently appear to come from a normal distribution. There is clearly unequal variance however, as the residuals fan out with increasing fitted values.

Show the code
library(MASS)

# Box-Cox transformation, check the two plots again
boxcoxResult <- boxcox(model) 

Show the code
lambda = boxcoxResult$x[which.max(boxcoxResult$y)]
substrates$yboxcox = (substrates$y^lambda-1)/lambda

model.boxcox <- aov(yboxcox~flow*temp*laser, data = substrates)
print(anova(model.boxcox))
Analysis of Variance Table

Response: yboxcox
                Df Sum Sq Mean Sq  F value    Pr(>F)    
flow             1 0.0394  0.0394   2.2813   0.16938    
temp             1 5.7532  5.7532 333.3767 8.325e-08 ***
laser            1 0.0105  0.0105   0.6108   0.45701    
flow:temp        1 0.0316  0.0316   1.8315   0.21295    
flow:laser       1 0.1433  0.1433   8.3060   0.02045 *  
temp:laser       1 0.0006  0.0006   0.0329   0.86062    
flow:temp:laser  1 0.1670  0.1670   9.6761   0.01443 *  
Residuals        8 0.1381  0.0173                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show the code
qqnorm(residuals(model.boxcox))
qqline(residuals(model.boxcox))

Show the code
plot(model.boxcox$fitted.values, model.boxcox$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

After performing a Box-Cox transformation on the response with a lambda value of 0.18, the variance seems to have stabilized.

The ANOVA table shows that the firing profile time (8 hours vs. 13 hours) is the only factor main effect that is significant on the response. Therefore there is no evidence for the higher airflow and newer laser being better for this pacemaker substrate process.

Problem 9.3

In the pacemaker data in Problem 8.4, only the firing profile time factor had a significant main effect. The two-way (airflow x laser) interaction term as well as the three-way (airflow x profile x laser) term are also significant in the ANOVA table.

Show the code
library(DescTools)
ScheffeTest(model.boxcox)

  Posthoc multiple comparisons of means: Scheffe Test 
    95% family-wise confidence level

$flow
           diff     lwr.ci    upr.ci   pval    
2-1 -0.09920915 -0.4243485 0.2259302 0.9213    

$temp
         diff    lwr.ci     upr.ci    pval    
2-1 -1.199293 -1.524432 -0.8741537 6.7e-06 ***

$laser
           diff     lwr.ci    upr.ci   pval    
2-1 -0.05133223 -0.3764715 0.2738071 0.9978    

$`flow:temp`
              diff     lwr.ci     upr.ci    pval    
2:1-1:1 -0.0103177 -0.4701341  0.4494987 1.00000    
1:2-1:1 -1.1104016 -1.5702180 -0.6505852 0.00017 ***
2:2-1:1 -1.2985022 -1.7583186 -0.8386858 5.2e-05 ***
1:2-2:1 -1.1000839 -1.5599003 -0.6402675 0.00018 ***
2:2-2:1 -1.2881845 -1.7480009 -0.8283681 5.5e-05 ***
2:2-1:2 -0.1881006 -0.6479170  0.2717158 0.75245    

$`flow:laser`
               diff     lwr.ci    upr.ci   pval    
2:1-1:1  0.09009270 -0.3697237 0.5499091 0.9921    
1:2-1:1  0.13796961 -0.3218468 0.5977860 0.9271    
2:2-1:1 -0.15054137 -0.6103578 0.3092750 0.8930    
1:2-2:1  0.04787692 -0.4119395 0.5076933 0.9999    
2:2-2:1 -0.24063407 -0.7004505 0.2191824 0.5156    
2:2-1:2 -0.28851099 -0.7483274 0.1713054 0.3296    

$`temp:laser`
               diff     lwr.ci     upr.ci    pval    
2:1-1:1 -1.18738309 -1.6471995 -0.7275667 0.00010 ***
1:2-1:1 -0.03942229 -0.4992387  0.4203941 0.99996    
2:2-1:1 -1.25062526 -1.7104417 -0.7908088 6.9e-05 ***
1:2-2:1  1.14796080  0.6881444  1.6077772 0.00013 ***
2:2-2:1 -0.06324216 -0.5230586  0.3965743 0.99910    
2:2-1:2 -1.21120297 -1.6710194 -0.7513865 8.7e-05 ***

$`flow:temp:laser`
                    diff     lwr.ci      upr.ci    pval    
2:1:1-1:1:1 -0.025333954 -0.6756126  0.62494466 1.00000    
1:2:1-1:1:1 -1.302809743 -1.9530884 -0.65253113 0.00064 ***
2:2:1-1:1:1 -1.097290399 -1.7475690 -0.44701178 0.00212 ** 
1:1:2-1:1:1 -0.054438548 -0.7047172  0.59584007 0.99997    
2:1:2-1:1:1 -0.049739990 -0.7000186  0.60053862 0.99998    
1:2:2-1:1:1 -0.972431970 -1.6227106 -0.32215335 0.00473 ** 
2:2:2-1:1:1 -1.554152498 -2.2044311 -0.90387388 0.00018 ***
1:2:1-2:1:1 -1.277475789 -1.9277544 -0.62719717 0.00074 ***
2:2:1-2:1:1 -1.071956446 -1.7222351 -0.42167783 0.00248 ** 
1:1:2-2:1:1 -0.029104594 -0.6793832  0.62117402 1.00000    
2:1:2-2:1:1 -0.024406037 -0.6746847  0.62587258 1.00000    
1:2:2-2:1:1 -0.947098017 -1.5973766 -0.29681940 0.00561 ** 
2:2:2-2:1:1 -1.528818545 -2.1790972 -0.87853993 0.00020 ***
2:2:1-1:2:1  0.205519344 -0.4447593  0.85579796 0.90802    
1:1:2-1:2:1  1.248371195  0.5980926  1.89864981 0.00087 ***
2:1:2-1:2:1  1.253069753  0.6027911  1.90334837 0.00085 ***
1:2:2-1:2:1  0.330377773 -0.3199008  0.98065639 0.54670    
2:2:2-1:2:1 -0.251342755 -0.9016214  0.39893586 0.79597    
1:1:2-2:2:1  1.042851852  0.3925732  1.69313047 0.00299 ** 
2:1:2-2:2:1  1.047550409  0.3972718  1.69782902 0.00290 ** 
1:2:2-2:2:1  0.124858429 -0.5254202  0.77513704 0.99302    
2:2:2-2:2:1 -0.456862099 -1.1071407  0.19341652 0.22989    
2:1:2-1:1:2  0.004698557 -0.6455801  0.65497717 1.00000    
1:2:2-1:1:2 -0.917993422 -1.5682720 -0.26771481 0.00684 ** 
2:2:2-1:1:2 -1.499713951 -2.1499926 -0.84943534 0.00023 ***
1:2:2-2:1:2 -0.922691980 -1.5729706 -0.27241336 0.00662 ** 
2:2:2-2:1:2 -1.504412508 -2.1546911 -0.85413389 0.00023 ***
2:2:2-1:2:2 -0.581720528 -1.2319991  0.06855809 0.08616 .  

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the above code I run Scheffe’s test on all possible contrasts among the factor level means. There appear to possibly be some systematic patterns in the significance of differences in treatment means between different factor levels (the interactions). I don’t currently know what to say fully about these patterns.

Problem 9.5

Show the code
cells <- read.table("pr9.5", header = TRUE)
cells$type <- as.factor(cells$type)
# the amount variable is treated as continuous

# fit ANOVA model
cells.model <- aov(y~amt*type, data = cells)
print(anova(cells.model))
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)    
amt        1 15204.9 15204.9  65.488 4.020e-05 ***
type       1 22968.8 22968.8  98.927 8.837e-06 ***
amt:type   1  5569.2  5569.2  23.987  0.001197 ** 
Residuals  8  1857.4   232.2                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show the code
# normality of residuals check (npp)
qqnorm(residuals(cells.model))
qqline(residuals(cells.model))

Show the code
# equal variance check (residuals against fitted)
plot(cells.model$fitted.values, cells.model$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

Show the code
ScheffeTest(cells.model)
Warning in replications(paste("~", xx), data = mf): non-factors ignored: amt
Warning in replications(paste("~", xx), data = mf): non-factors ignored: amt,
type
Warning in ScheffeTest.aov(cells.model): 'which' specified some non-factors
which will be dropped

  Posthoc multiple comparisons of means: Scheffe Test 
    95% family-wise confidence level

$type
     diff    lwr.ci    upr.ci    pval    
2-1 -87.5 -118.2259 -56.77412 7.5e-05 ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As seen in the ANOVA table, there are significant main effects and a significant interaction effect between the two factors. I am not sure how to run Scheffe’s test given that I had to treat the “amt” factor as continuous to not get NaNs / undefined values in the ANOVA table.