# set global options
knitr::opts_chunk$set(
echo = TRUE, # show code
warning = FALSE,
message = FALSE
)
Brainstorming. A researcher investigated whether brainstorming is more effective for larger groups than for smaller ones by setting up four groups of agribusiness executives, the group sizes being two, three, four, and five, respectively. He also set up four groups of agribusiness scientists, the group sizes being the same as for the agribusiness executives. The researchel gave each group the same problem: “How can Canada increase the value of its agricultural exports?” Each group was allowed 30 minutes to generate ideas. The variable of interest was the number ol different ideas proposed by the group. The results. classified by type of group (factor \(A\) ) and sizc of group (factor \(B\) ), were:
\[ \begin{array}{cccccc} & & \text { Factor B (size of group) } \\ & \begin{array}{c} \text { Factor A } \\ \text { (type of group) } \end{array} & \begin{array}{c} j=1 \\ \text { Two } \end{array} & \begin{array}{c} j=2 \\ \text { Three } \end{array} & \begin{array}{c} j=3 \\ \text { Four } \end{array} & \begin{array}{c} j=4 \\ \text { Five } \end{array} \\ \hline i=1 & \text { Agribusiness executives } & 18 & 22 & 31 & 32 \\ i=2 & \text { Agribusiness scientists } & 15 & 23 & 29 & 33 \end{array} \]
- (1). Plot the data. Does it appear that interaction effects are present? Does it appear that factor \(A\) and factor \(B\) main effects are present? Discuss.
#import data
i <- c(1,1,1,1,2,2,2,2)
j <- c(1,2,3,4,1,2,3,4)
y <- c(18,22,31,32,15,23,29,33)
plot(1:4,y[i==1], type='b',col="pink",xaxt = 'n',main = 'Interaction plot',xlab="Size of Group from 2 to 5",ylab="Number of Ideas",ylim=c(10,45))
lines(1:4,y[i==2], type='b',col="cornflowerblue")
axis(side=1,at = c(1,2,3,4),labels = c('2 (j=1)','3 (j=2)','4 (j=3)', '5 (j=4)'))
# Add a legend
legend(1, 42, legend=c("Executives (i=1)", "Scientists (i=2)"),
col=c("pink","cornflowerblue"), pch=15,
cex=0.8)
At any given value of j the differential in the response variable between i=1 & i=2 is not very big. So the main effect of A (type of group) may not really be present. However for any fixed value of i there does appear to be noticable difference in the response variable j so it seems possible that the main effect of B (size of the group) is present. The lines are not parallel, with multiple intersections. So it seems likely that interaction may be present.
a=2
b=4
# least squares estimate
## overall mean
Ybar_dot_dot=mean(y)
## factor A main effects
alpha_i_hat=rep(NA,a)
for(k in 1:a){
alpha_i_hat[k]=mean(y[i==k])-Ybar_dot_dot
}
## factor B main effects
beta_j_hat=rep(NA,b)
for(k in 1:b){
beta_j_hat[k]=mean(y[j==k])-Ybar_dot_dot
}
# calcualte SS
SSA = b*sum(alpha_i_hat^2)
SSB = a*sum(beta_j_hat^2)
#fitted values
y_hat = rep(0,a*b)
for(h in 1:length(y_hat))
{
y_hat[h] = Ybar_dot_dot+alpha_i_hat[i[h]]+beta_j_hat[j[h]]
}
e = y-y_hat
SSE = sum(e^2)
SSTO = sum((y-Ybar_dot_dot)^2)
AnovaTable = matrix(0,nrow = 4,ncol = 3)
AnovaTable[1,1] = SSA
AnovaTable[1,2] = a-1
AnovaTable[1,3] = SSA/(a-1)
AnovaTable[2,1] = SSB
AnovaTable[2,2] = b-1
AnovaTable[2,3] = SSB/(b-1)
AnovaTable[3,1] = SSE
AnovaTable[3,2] = (a-1)*(b-1)
AnovaTable[3,3] = SSE/((a-1)*(b-1))
AnovaTable[4,1] = SSTO
AnovaTable[4,2] = a*b-1
AnovaTable[4,3] = '-'
AnovaTable = as.data.frame(AnovaTable)
rownames(AnovaTable) = c('factor A','factor B','Error','Total')
colnames(AnovaTable) = c('SS','df','MS')
library(knitr)
library(kableExtra)
knitr::kable(AnovaTable,
"html",
digits=2,
align = "c",
caption = "ANOVA Table")%>%kable_styling(full_width = F)
| SS | df | MS | |
|---|---|---|---|
| factor A | 1.125 | 1 | 1.125 |
| factor B | 318.375 | 3 | 106.125 |
| Error | 6.375 | 3 | 2.125 |
| Total | 325.875 | 7 |
|
# F test for significance Factor A
MSA = (SSA/(a-1))
MSE = (SSE)/((a-1)*(b-1))
Fstar = MSA/MSE
alpha = .01
Fcritical = qf((1-alpha),df1=(a-1),df2=((a-1)*(b-1)))
if (Fstar > Fcritical){
print("Fstar > Fcritical. REJECT NULL HYPOTHESIS")
} else {
print("Fstar < Fcritical. FAIL TO REJECT NULL HYPOTHESIS")
}
## [1] "Fstar < Fcritical. FAIL TO REJECT NULL HYPOTHESIS"
# F test for significance of Factor B
MSB = (SSB/(b-1))
Fstar = MSB/MSE
alpha = .01
Fcritical = qf((1-alpha),df1=(b-1),df2=((a-1)*(b-1)))
if (Fstar > Fcritical){
print("Fstar > Fcritical. REJECT NULL HYPOTHESIS")
} else {
print("Fstar < Fcritical. FAIL TO REJECT NULL HYPOTHESIS")
}
## [1] "Fstar > Fcritical. REJECT NULL HYPOTHESIS"
To test for whether Factor A is significant use an F test at alpha = .01. Null hypothesis is that the main effect of factor A at all factor levels is equal to 0. Alternative hypothesis is that the main effect of factor A at all factor levels is not equal to 0. The test statistic is equal to MSA/MSE. If the test statistic is greater than the critical value of F at alpha = .01 with df1=a-1, df2=(a-1)*(b-1), then we reject the null hypothesis. In this case the test statistic was NOT greater than the critical value and subsequently we fail to reject the null hypothesis that the main effect of factor A at all factor levels is equal to zero.
To test for whether Factor b is significant use an F test at alpha = .01. Null hypothesis is that the main effect of factor b at all factor levels is equal to 0. Alternative hypothesis is that the main effect of factor b at all factor levels is not equal to 0. The test statistic is equal to MSb/MSE. If the test statistic is greater than the critical value of F at alpha = .01 with df1=b-1, df2=(a-1)*(b-1), then we reject the null hypothesis. In this case the test statistic WAS greater than the critical value and subsequently we reject the null hypothesis that the main effect of factor B at all factor levels is equal to zero.
g = 7
alpha=.10
B = qt(1-(.1/14),df=3)
#Factor A pairwise comparisons
Dhat_12 = mean(y[i==1])-mean(y[i==2])
CI12 <- c(Dhat_12-B*sqrt((MSE*2)/4),Dhat_12+B*sqrt((MSE*2)/4))
CI12
## [1] -4.545773 6.045773
#Factor B pairwise comparisons
Dhat_12 = mean(y[j==1])-mean(y[j==2])
CI12 <- c(Dhat_12-B*sqrt(MSE*(2/a)), Dhat_12+B*sqrt(MSE*(2/a)))
CI12
## [1] -13.489354 1.489354
Dhat_13 = mean(y[j==1])-mean(y[j==3])
CI13 <- c(Dhat_13-B*sqrt(MSE*(2/a)), Dhat_13+B*sqrt(MSE*(2/a)))
CI13
## [1] -20.989354 -6.010646
Dhat_14 = mean(y[j==1])-mean(y[j==4])
CI14 <- c(Dhat_14-B*sqrt(MSE*(2/a)), Dhat_14+B*sqrt(MSE*(2/a)))
CI14
## [1] -23.489354 -8.510646
Dhat_23 = mean(y[j==2])-mean(y[j==3])
CI23 = c(Dhat_23-B*sqrt(MSE*(2/a)), Dhat_23+B*sqrt(MSE*(2/a)))
CI23
## [1] -14.98935407 -0.01064593
Dhat_24 = mean(y[j==2])-mean(y[j==4])
CI24 = c(Dhat_24-B*sqrt(MSE*(2/a)),Dhat_24+B*sqrt(MSE*(2/a)))
CI24
## [1] -17.489354 -2.510646
Dhat_34 = mean(y[j==3])-mean(y[j==4])
CI34 = c(Dhat_34-B*sqrt(MSE*(2/a)), Dhat_34+B*sqrt(MSE*(2/a)))
The only pairwise difference of Factor A was not significant the average of i=1 minus i=2, .75 (-4.545773, 6.045773)
The following pairwise differences of Factor B were not significant the difference between j=1 and j=2, -6 (-13.489354,1.489354), the difference between j=3 and j=4, -2.5 (-9.989354, 4.989354). The following pairwise comparisons of Factor B were significant, the difference between j=1 and j=3, -13.5 (-20.989354, -6.010646) the difference between j=1 and j=4; -16 (-23.489354, -8.510646). The difference between j=2 and j=3, -7.5 (-14.98935407, -0.01064593). And lastly difference between j=2 and j=4, -10, (-17.489354, -2.510646).
g = 3
alpha=.05
B = qt(1-(.05/6),df=3)
Dhat_21 = mean(y[j==2])-mean(y[j==1])
Dhat_21
## [1] 6
CI1 <- c(Dhat_21-B*sqrt(MSE*(2/a)), Dhat_21+B*sqrt(MSE*(2/a)))
CI1
## [1] -1.079734 13.079734
Dhat_32 = mean(y[j==3])-mean(y[j==2])
Dhat_32
## [1] 7.5
CI2 <- c(Dhat_32-B*sqrt(MSE*(2/a)), Dhat_32+B*sqrt(MSE*(2/a)))
CI2
## [1] 0.4202663 14.5797337
Dhat_43 = mean(y[j==4])-mean(y[j==3])
Dhat_43
## [1] 2.5
CI3 <- c(Dhat_43-B*sqrt(MSE*(2/a)), Dhat_43+B*sqrt(MSE*(2/a)))
CI3
## [1] -4.579734 9.579734
Using alpha = .05 and a g=3 only the pairwise comparison of j=3 minus j=2 was significant (0.4202663, 14.5797337) with Dhat = 7.5. The other comparisons of j=2 minus j=1, 6 (-1.079734, 13.079734); and j=4 minus j=3, 2.5 (-4.579734, 9.579734) were both not significant. It would appear perhaps the confidence is too high.
# Examine what the Tukey and Sheffe multiples would be
# Sheffe
S= sqrt((b-1)*qf(1-.05,df1=b-1,df2=((a-1)*(b-1))))
#Tukey
T = (1/sqrt(2))*qtukey(1-.05,b,((a-1)*(b-1)))
In assessing the efficiency of the Sheffe, Tukey, and Bonferroni multipliers at alpha = .05. The Tukey multiplier has the smallest value at 4.825669, as compared to the Bonferroni multiplier used in question 5 which had a value of 4.856657. Thus Tukey is more efficenct than Bonferroni in this instance.
(7). It is desired to estimate \(\mu_{14} .\). Obtain a point estimate of \(\mu_{14}\), and Construct a 99 percent confidence interval for \(\mu_{14}\). Interpret your interval estimate.
(8). Conduct the Tukey test for additivity: use \(\alpha=.01\). State the alternatives, decision rule, and conclusion. If the additive model is not appropriate, what might you do?
alpha=0.01
D_hat= sum((alpha_i_hat[i])*(beta_j_hat[j])*y)/(sum(alpha_i_hat^2)*sum(beta_j_hat^2))
SSAB=sum((D_hat*(alpha_i_hat[i])*(beta_j_hat[j]))^2)
SSE=sum((y-Ybar_dot_dot-alpha_i_hat[i]-beta_j_hat[j]-D_hat*alpha_i_hat[i]*beta_j_hat[j])^2)
MSAB=SSAB/1
MSE=SSE/(a*b-a-b)
Fstar = MSAB/MSE
Fcritical = qf(1-.01,df1=1,df2=((a-1)*(b-1)))
if (Fstar > Fcritical) {
print("Fstar > Fcritical. REJECT NULL HYPOTHESIS")
} else {
print("Fstar < Fcritical. FAIL TO REJECT NULL HYPOTHESIS")
}
## [1] "Fstar < Fcritical. FAIL TO REJECT NULL HYPOTHESIS"
Under the Tukey test for additivity we stipulate that the null hypothesis is the interaction of main effects is zero. The alternative hypothesis is that the interaction of main effects is NOT zero. The test is an F test where F = MSAB/MSE, the decision rule is if F ≥ critical value we reject the null hypothesis. In this case the test statistic F (0.5987716) is less than the critical value (34.11622), therefore we fail to reject the null hypothesis that the interaction term is equal to zero.
If the additive model were not appropriate you may consider stipulating a larger alpha value/decreasing your confidence. You may also consider transforming the response variable by a box-cox transformation, a log transformation, or a square root transformation. These may work, but may not; if the interaction term can not be made insignificant that may just be a limitation of your data/model.