# set global options
knitr::opts_chunk$set(
  echo = TRUE, # show code 
  warning = FALSE,  
  message = FALSE
  
) 

Case Study.

Winding speeds. In a completely randomized design to study the effect of the speed of winding thread (1: slow; 2: normal; 3: fast; 4: maximum) onto 75-yard spools, 16 runs of 10,000 spools each were made at each of the four winding speeds. The response variable is the number of thread breaks during the production run. Since the responses are counts, the researcher was concerned about the normality and equal variances assumptions of ANOVA model.

The dataset used in this question is “windingspeed.txt”.

  • Column 1: counts
  • Column 2: speed of winding thread (1: slow; 2: normal; 3: fast; 4: maximum)
  • Column 3: runs 1-16
setwd("~/Master's Program/Classes/Winter 2022/STA 106")
#Download data
# response = counts
# speed = factor
# run is the attempt, 16 attempts were made for each factor
windingspeed <- read.table(file="windingspeed.txt", col.name = c("counts", "speed of thread", "runs"))
View(windingspeed)
#Establish factor level means 
Y_bar_overall <- mean(windingspeed$counts)
Y_1 <- mean(windingspeed[windingspeed$speed.of.thread == 1,1])
n1 <- length(windingspeed[windingspeed$speed.of.thread == 1,1])
Y_2 <- mean(windingspeed[windingspeed$speed.of.thread == 2,1])
n2 <- length(windingspeed[windingspeed$speed.of.thread == 2,1])
Y_3 <- mean(windingspeed[windingspeed$speed.of.thread == 3,1])
n3 <- length(windingspeed[windingspeed$speed.of.thread == 3,1])
Y_4 <- mean(windingspeed[windingspeed$speed.of.thread == 4,1])
n4 <- length(windingspeed[windingspeed$speed.of.thread == 4,1])
#Establish Response Vector
level = c(rep(1,n1),rep(2,n2),rep(3,n3))
response <- windingspeed$counts #response is the observed response 
#variable outcomes for each observation, so we extract the observed values
#for the response variable

#Establish a vector of fitted values Y_hat
Y_hat=c(rep(Y_1,n1),rep(Y_2,n2),rep(Y_3,n3),rep(Y_4,n4))

#Establish vector of residuals
residuals <- Y_hat - response
# residuals
# sum(residuals)

#Establish SSE
#SSE = sum((Treatment mean - observed value)^2) for each treatment
SSE = sum((response-Y_hat)^2)

#Establish SSTR
#SSTR = sum((Treatment mean - overall mean)^2) for each treatment 
SSTR = sum((Y_hat - Y_bar_overall)^2)

#Establish SSTO
#SSTO = sum((overall mean - observed value)^2)
SSTO = sum((response-Y_bar_overall)^2)
# SSTO = SSE + SSTR check

#Establish ANOVA Table
n_T <- length(response)
r=4 #there are four unique speed (treatments)
AnovaTable = matrix(c(1,2,3,4,5,6,7,8,9),nrow = 3,ncol = 3)## initial setting


AnovaTable[1,1] = SSTR
AnovaTable[1,2] = r-1
AnovaTable[1,3] = AnovaTable[1,1]/AnovaTable[1,2]
 
AnovaTable[2,1] = SSE
AnovaTable[2,2] = n_T-r
AnovaTable[2,3] = AnovaTable[2,1]/AnovaTable[2,2]

SSTO = SSE + SSTR

AnovaTable[3,1] = SSTO
AnovaTable[3,2] = n_T-1
AnovaTable[3,3] = NA
 
rownames(AnovaTable) = c('Between treatments','Error (within treatments)','Total')
colnames(AnovaTable) = c('Squared Sum','Df','Mean Sum')

AnovaTable
##                           Squared Sum Df  Mean Sum
## Between treatments          1588.0469  3 529.34896
## Error (within treatments)    669.0625 60  11.15104
## Total                       2257.1094 63        NA
library(knitr)
library(kableExtra)

knitr::kable(AnovaTable, 
             "html",
             digits=6,
             caption = "ANOVA Table") %>% kable_styling() 
ANOVA Table
Squared Sum Df Mean Sum
Between treatments 1588.0469 3 529.34896
Error (within treatments) 669.0625 60 11.15104
Total 2257.1094 63 NA
  1. State the ANOVA model assumptions, and your strategy to verify whether or not the data meet those assumptions.

ANOVA has three major assumptions. (1) The first that the error terms are independent, this is typically verified in the design of the experiment. However, we could also verify by looking at a graphic representation of the residuals across factor levels and see if clustering is observed, if clustering is observed the error terms are not independent. (2) The second is whether the variance of error terms is equal across factor levels, this is the homogenity of error term variance. This is assessed by plotting the error term variance by boxplot. More formally this can also be assessed by the Hartley test if sample sizes across factor levels are equal and the error terms are normally distributed, or the Brown-Forscythe test which is non-parametric and doesn’t require equal sample sizes across factor levels. (3) The third is whether the error terms are normally distributed, assuming homogenity of error term variance checked in (2) this can be assessed via a QQ plot, since the number of observations for each factor level are small we need to pool all factor level observations together and assess normality via a QQ plot. If our error term variance is not equal we need to use studentized residuals to assess variance.

  1. To study whether or not the error variances are equal in a graphical way. What are your findings?
plot(windingspeed$speed.of.thread+rnorm(sd=0.05,n=n_T),residuals,xlab = 'speed',ylab='residuals',xaxt = 'n',
     main = 'Count of Breaks Residuals over Different Speeds of Thread',col = c("chartreuse4","deeppink4","darkgoldenrod1","blueviolet")[windingspeed$speed.of.thread] )
axis(side=1,at = c(1,2,3,4))
abline(h=0,col='cornflowerblue')

Looking at the plot it seems like error variances are not homogenous. It appears that the error variance grows larger with increasing speed.

  1. To study whether or not the error variances are equal using formal tests. State the alternatives. decision rule, and conclusion. What are your findings? Are your results consistent with the diagnosis in part (2)?
#Hartley Test
#Null all factor level error variances are equal
#Alternative hypothesis all factor level error variances are not equal
library(SuppDists)
si2=rep(NA,4)
for (i in 1:4){
  si2[i]=var(windingspeed$counts[windingspeed$speed.of.thread == i])
}

Hstat = (max(si2)/min(si2))

pmaxFratio(Hstat, df=16-1, k=4, lower.tail=FALSE)
## [1] 0.0003999325
# validate with automated function
library(PMCMRplus)

hartleyTest(counts~speed.of.thread, data=windingspeed)
## 
##  Hartley's maximum F-ratio test of homogeneity of variances
## 
## data:  counts by speed.of.thread
## F Max = 24.192, df = 15, k = 4, p-value = 0.0003999
#Brown-Forscythe Test
#first compute the median for each factor level
Ymedian = rep(0,4)
for(i in 1:4)
{
  Ymedian[i] = median(windingspeed$counts[windingspeed$speed.of.thread==i])
}


d = abs(windingspeed$counts-Ymedian[windingspeed$speed.of.thread])


# ANOVA 
#establish factor level means for d for each factor
mu_hat = rep(0,4)
for(i in 1:4)
{
  mu_hat[i] = mean(d[windingspeed$speed.of.thread==i])
}

# fitted value for d
d_hat=mu_hat[windingspeed$speed.of.thread]
  
MSTR = sum((d_hat-mean(d))^2)/(r-1)
MSE = sum((d-d_hat)^2)/(n_T-r)
F_BF = MSTR/MSE
## test statistic value

pf(F_BF, r-1,n_T-r, lower.tail=FALSE)
## [1] 3.040082e-05

Hartley test null hypothesis is that all factor level sample variances are equal, the alternative hypothesis is that not all factor level sample variances are equal. The decision rule is, if the p-value for our test statistic is less than or equal to the stipulated value of alpha, we reject the null hypothesis. By the Hartley test we get a p-value of .0003999. This is less than our alpha value of .05 therefore we reject the null hypothesis not all factor level sample variances are equal. The Brown-Forsythe test null hypothesis is that all factor level sample variances are equal, the alternative hypothesis is that not all factor level sample variances are equal. The Brown-Forsythe test gives us a p-value of 3.040082e-05. By both tests we reject the null hypothesis (though given we suspect deviations from normality by the qqplot we should prefer the Brown-Forscythe test). These findings of lack of homogenity of error term variance by formal tests are consistent with our findings in question 2.

  1. To study whether or not the normality assumption is met.
residuals <- as.data.frame(residuals <- Y_hat - response)


e <- residuals
si <- as.data.frame(si <- sqrt(si2))

Y_hat=c(rep(Y_1,n1),rep(Y_2,n2),rep(Y_3,n3),rep(Y_4,n4))


#Establish vector of residuals

r1 <- e[c(1:16),]/si[1,]
r2 <- e[c(17:32),]/si[2,]
r3 <- e[c(33:48),]/si[3,]
r4 <- e[c(49:64),]/si[4,]
studentized_residuals <- c(r1,r2,r3,r4)
# Assess normality of studentized residuals

qqnorm(studentized_residuals)
qqline(studentized_residuals)

hist(studentized_residuals)

# Assess normality by QQplot using non-studentized residuals

non_studentized_residuals <- as.numeric(Y_hat - response)
qqnorm(non_studentized_residuals)
qqline(non_studentized_residuals)

# Add histogram to view data spread
hist(non_studentized_residuals)

Since factor level sample sizes are small (16) we pool them together to assess normality, but based on the results from question 3 we do not have homogenity of variance so we need to used studentized residuals to assess normality. Based on the QQplot and the corresponding histogram our data does not look to be normally distributed, they appear to be a bit over-dispersed (they appear even more over-dispersed when we use the non-studentized residuals)

  1. Summarize your findings in (1)-(4) about whether the ANOVA assumptions are met.

In considering the independence of our error terms, it appears that the error terms are likely independent as we do not observe clustering and the experimental design seems like the observations should have been independent. In considering the homogeneity of variance assumption by graphical inspection factor level variances do not seem equal; further by both the Hartley test and the Brown-Forsythe test the factor level variances are not equal. Therefore our assumption of homogeneity of variance is not met. In considering the assumption of normality of error terms by making a qqplot of studentized residuals it doesn’t appear that our data is normally distributed and, based on a histogram of studentized_residuals it appears our data may be over-dispersed.

Regardless of your conclusions in (5), practice with transformations.

  1. For each winding speed, calculate \(\bar{Y}_{i}\), and \(s_{i} .\) Examine the relation and determine the transformation that is most appropriate here. What do you conclude?
r=4
Yi_bar= rep(0,r)
for (i in 1:r){
Yi_bar[i] <- mean(windingspeed[windingspeed$speed.of.thread == i,"counts"])
}

par(mfrow=c(1,3))
plot(1:4,si2/Yi_bar,main ="Square-root transformation",ylim = c(0,3))
plot(1:4,sqrt(si2)/Yi_bar,main ="Log transformation",ylim = c(0,3))
plot(1:4,sqrt(si2)/Yi_bar^2,main ="Reciprocal transformation",ylim = c(0,3))

Based on the graphs it appears as if either the log transformation or the reciprocal transformation could be used.

  1. Use the Box-Cox procedure to find an appropriate power transformation of \(Y\). first adding the constant \(1\) to each \(Y\) observation. Does \(\lambda=0 .\) a logarithmic transformation appear to be reasonable based on the Box-Cox procedure?
library(MASS)
boxcox(windingspeed$counts+1 ~ windingspeed$speed.of.thread)

Yes, the optimal value of lambda is approximately 0. When the value of lambda is 0, then the transformation of Y is a log transformation with base e or a natural log.

  1. The researcher decided to apply the logarithmic transformation \(Y^{\prime}=\log _{10} Y\) and examine its effectiveness. Obtain the transformed response data. fit ANOVA model. and obtain the residuals.
#first transform the data
windingspeed$transformedcounts=log10(windingspeed$counts)

#establish ANOVA parameters with the new transformed response variable
#establish factor level means
Yi_barnew= rep(0,r)
for (i in 1:r){
  Yi_barnew[i] = mean(windingspeed[windingspeed$speed.of.thread == i, "transformedcounts"])
}


# fitted value
Y_hatnew=c(rep(Yi_barnew[1],n1),rep(Yi_barnew[2],n2),rep(Yi_barnew[3],n3),rep(Yi_barnew[4],n4))

# residuals
print("New Transformed Residuals!")
## [1] "New Transformed Residuals!"
e_new = windingspeed$transformedcounts-Y_hatnew
e_new
##  [1]  0.070615229 -0.054323508 -0.230414767 -0.054323508  0.070615229
##  [6]  0.070615229 -0.054323508  0.246706488  0.167525242  0.070615229
## [11] -0.230414767  0.070615229  0.070615229 -0.230414767 -0.054323508
## [16]  0.070615229  0.105117704  0.038170915 -0.137920344  0.038170915
## [21]  0.105117704 -0.438950340  0.214262174 -0.041010331 -0.041010331
## [26]  0.214262174 -0.262859081  0.163109651  0.038170915 -0.137920344
## [31]  0.105117704  0.038170915  0.071146226 -0.229883769  0.138093016
## [36]  0.071146226 -0.008035020 -0.053792510  0.071146226  0.222413902
## [41] -0.162936980 -0.229883769  0.071146226  0.033357665 -0.229883769
## [46]  0.105908333 -0.008035020  0.138093016  0.036198678 -0.018158985
## [51] -0.349152204  0.106779752 -0.080306891 -0.152857559  0.009869739
## [56]  0.203689765 -0.152857559  0.185960998  0.061022261  0.127969051
## [61]  0.009869739  0.084503357 -0.240007734  0.167477592

New residuals shown above, ANOVA assumptions using these residuals assessed in next question.

  1. Test whether the ANOVA model assumptions are met in the transformed data. What are your findings about the effectiveness of the transformation?
#plot residuals to examine graphically if homogenity of variance is met 
plot(windingspeed$speed.of.thread+rnorm(sd=0.05,n=n_T),e_new,xlab = 'Speed of Thread',ylab='residuals of log10 transformed counts',xaxt = 'n',
     main = 'Winding speed log10 transformed counts residual plot',col = c("chartreuse4","deeppink4","darkgoldenrod1","blueviolet")[windingspeed$speed.of.thread] )
axis(side=1,at = c(1,2,3,4))
abline(h=0,col='cornflowerblue')

#Do Hartley and Brown-Forsythe tests to examine if homogenity of variance is met
#rigorously 

#Hartley first
#Ho all factor level variances are equal
#Ha not all factor level variances are equal

Si2_new <- rep(NA,4)
for (i in 1:4) {
  Si2_new[i] = var(windingspeed[windingspeed$speed.of.thread == i, "transformedcounts"])
}

Hstat_new <- (max(Si2_new)/min(Si2_new))
print("Hartley Test P-Value!")
## [1] "Hartley Test P-Value!"
pmaxFratio(Hstat_new, df=16-1, k=4, lower.tail=FALSE)
## [1] 0.830854
print("By Hartley Fail To Reject H0, factor level variances are indeed equal!")
## [1] "By Hartley Fail To Reject H0, factor level variances are indeed equal!"
#By Hartley we do not reject H0

#Brown-Forsythe next
#Ho all factor level variances are equal
#Ha not all factor level variances are equal

median = rep(0,4)
for(i in 1:4)
{
  Ymedian[i] = median(windingspeed$transformedcounts[windingspeed$speed.of.thread==i])
}


d = abs(windingspeed$transformedcounts-Ymedian[windingspeed$speed.of.thread])

# ANOVA 
#establish factor level means for d for each factor
mu_hat = rep(0,4)
for(i in 1:4)
{
  mu_hat[i] = mean(d[windingspeed$speed.of.thread==i])
}

# fitted value for d
d_hat=mu_hat[windingspeed$speed.of.thread]
  
MSTR = sum((d_hat-mean(d))^2)/(r-1)
MSE = sum((d-d_hat)^2)/(n_T-r)
F_BF = MSTR/MSE
## test statistic value
F_BF
## [1] 0.097864
print("Brown-Forsythe Test P-Value!")
## [1] "Brown-Forsythe Test P-Value!"
pf(F_BF, r-1,n_T-r, lower.tail=FALSE)
## [1] 0.9609102
print("By Brown-Forsythe test we fail to reject H0 as well!")
## [1] "By Brown-Forsythe test we fail to reject H0 as well!"
#Normality of Error Terms Test
#Since we met the assumption of error term variance homogenity we can simply 
#pool are factor level residuals together and use that to assess the normality
#of error terms (we don't need to use studentized residuals)
e_new = windingspeed$transformedcounts-Y_hatnew 
hist(e_new)

qqnorm(e_new)
qqline(e_new)

The three ANOVA assumptions are independence of error terms, homogeneity of error term variance, and normality of error terms. With consideration of our transformed response variable for this data set we can assume our first assumption is still met as we do not observe clustering of error terms and the experimental set up presumably had independent observations for the response variable. We can see that by the Brown-Forsythe test and the Hartley Test the resulting p-value for these tests for homogenity of factor level sample variances is (p>.05) as a result for both tests we fail to reject the null hypothesis that all factor level sample variances for the transformed response variable are equal. Finally in performing a QQplot and QQline we see a little deviation from a normal distribution of the error terms, based on the histogram of these residuals it appears the residuals are a bit left skewed, however, since our total observations are >30 we can say the departure from normality is not too serious considering the central limit theorem.