The problem with these data is that large estimated treatment means are heavily associated with smaller estimates of variance, just from looking at the table of means. This could negatively affect the validity of our basic F-tests. Something needs to be done to stabilize the variance so our inferences will remain reasonabably valid. A power transformation on the data, arrived at through the Box-Cox method, could do this. This would allow us to still do inference on difference of means, but our point estimates and confidence intervals would be less reliable. There are also more complex variance models that can give estimates in the original non-transformed scale, but for which inference is not as well-developed.
Exercise 6.2
Show the code
setwd("~/Documents/datawork/psu/STAT 565/")milk <-read.table("ex6.2", header =TRUE)milk$method <-as.factor(milk$method)milk_lm <-lm(count ~ method, data = milk)milk_anova <-aov(count ~ method, data = milk)summary(milk_anova)
Df Sum Sq Mean Sq F value Pr(>F)
method 2 1.833e+13 9.167e+12 95.48 4.27e-08 ***
Residuals 12 1.152e+12 9.601e+10
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA F-test for difference between the treatment means has a very small p-value, indicating that we would reject the null hypothesis of equality between the treatment means. However, this is the assumptions checking chapter of the textbook, so I proceed to do that:
Show the code
plot(milk_lm, which =2)
Show the code
plot(milk_lm, which =1)
From a normal probability plot and a residual plot, there appears to be a departure from normality in both tails of the data, and there is nonconstant variance. This could damage the validity of my F-test.
Show the code
library(MASS)boxcox(milk_lm) # use lambda 0
Show the code
milk$count_trans <-log(milk$count)milk_lm_trans <-lm(count_trans ~ method, data = milk)plot(milk_lm_trans, which =2)
Show the code
plot(milk_lm_trans, which =1)
Show the code
milk_anova_trans <-aov(count_trans ~ method, data = milk)summary(milk_anova_trans)
Df Sum Sq Mean Sq F value Pr(>F)
method 2 119.77 59.88 1283 1.02e-14 ***
Residuals 12 0.56 0.05
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I used the Box-Cox method to find an appropriate power transformation for the response variable. 0 turned out to be an appropriate power so I refit the model with the log of the response, and plotting showed a stabilization of variance and normality. The p-value of the subsequent F-test is very small, so I reject the null hypothesis of equality of treatment means.
Exercise 6.4
Show the code
delivery <-read.table("ex6.4", header =TRUE)delivery$company <-as.factor(delivery$company)delivery_model <-lm(breakage ~ company, data = delivery)plot(delivery_model, which =1)
Show the code
bc <-boxcox(delivery_model)
The 95% confidence interval for the transformation power is about (-0.67, 0.75).
Exercise 6.5
This plot tells me that there is some serial dependence in the data. The residuals show correlation over time; the independence assumption is violated.
This residual plot shows a right-opening megaphone shape as you move through the treatment groups. This is a tell-tale sign of unequal variance, so the equal variance assumption is violated.
This plot appears to show reasonably equal variance. It does not show the violation of one of our assumptions.
This normal quantile-quantile plot is of the data, not the residuals, so it does not address one of the main assumptions. The normality assumption refers to the error term in our model, not the data itself.
Problem 6.1
Show the code
grass <-read.table("pr6.1", header =TRUE)grass$trt <-as.factor(grass$trt) grass_model <-lm(y ~ trt, data = grass)plot(grass_model, which =2)
Show the code
plot(grass_model, which =1)
The plots are showing some non-normality of residuals and nonconstant variance. A power transformation could help.
Show the code
boxcox(grass_model) # lambda = 1.75
Show the code
grass$y_trans <- grass$y ^1.75grass_model_trans <-lm(y_trans ~ trt, data = grass)plot(grass_model_trans, which =2)
Show the code
plot(grass_model_trans, which =1) # little effect of transformation
Df Sum Sq Mean Sq F value Pr(>F)
trt 5 6398 1280 71.2 3.2e-11 ***
Residuals 18 323 18
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The transformation appeared to have little effect, which makes sense because 1 was in the Box-Cox confidence interval. An analysis of variance results in a p-value of about 3.2e-11. So the treatments are not equivalent.
Under nonirrigated conditions, there are now 4 treatment groups. We are testing for a quadratic effect of nitrogen, so consulting Table C.6 in the textbook, the contrast coefficients will be (1, -1, -1, 1). To apply this to our dataset that still includes irrigated treatment groups, the final contrast vector will be (1, 0, -1, -1, 1, 0).
Show the code
library(multcomp)
Loading required package: mvtnorm
Loading required package: survival
Warning: package 'survival' was built under R version 4.3.1
Loading required package: TH.data
Attaching package: 'TH.data'
The following object is masked from 'package:MASS':
geyser