# set global options
knitr::opts_chunk$set(
  echo = TRUE, # show code 
  warning = FALSE,  
  message = FALSE
  
) 

Case Study.

Brainstorming. A researcher investigated whether brainstorming is more effective for larger groups than for smaller ones by setting up four groups of agribusiness executives, the group sizes being two, three, four, and five, respectively. He also set up four groups of agribusiness scientists, the group sizes being the same as for the agribusiness executives. The researchel gave each group the same problem: “How can Canada increase the value of its agricultural exports?” Each group was allowed 30 minutes to generate ideas. The variable of interest was the number ol different ideas proposed by the group. The results. classified by type of group (factor \(A\) ) and sizc of group (factor \(B\) ), were:

\[ \begin{array}{cccccc} & & \text { Factor B (size of group) } \\ & \begin{array}{c} \text { Factor A } \\ \text { (type of group) } \end{array} & \begin{array}{c} j=1 \\ \text { Two } \end{array} & \begin{array}{c} j=2 \\ \text { Three } \end{array} & \begin{array}{c} j=3 \\ \text { Four } \end{array} & \begin{array}{c} j=4 \\ \text { Five } \end{array} \\ \hline i=1 & \text { Agribusiness executives } & 18 & 22 & 31 & 32 \\ i=2 & \text { Agribusiness scientists } & 15 & 23 & 29 & 33 \end{array} \]

  • (1). Plot the data. Does it appear that interaction effects are present? Does it appear that factor \(A\) and factor \(B\) main effects are present? Discuss.
#import data
i <- c(1,1,1,1,2,2,2,2)
j <- c(1,2,3,4,1,2,3,4)
y <- c(18,22,31,32,15,23,29,33)


plot(1:4,y[i==1], type='b',col="pink",xaxt = 'n',main = 'Interaction plot',xlab="Size of Group from 2 to 5",ylab="Number of Ideas",ylim=c(10,45))
lines(1:4,y[i==2], type='b',col="cornflowerblue")     
axis(side=1,at = c(1,2,3,4),labels = c('2 (j=1)','3 (j=2)','4 (j=3)', '5 (j=4)'))
# Add a legend
legend(1, 42, legend=c("Executives (i=1)", "Scientists (i=2)"),
       col=c("pink","cornflowerblue"), pch=15,
       cex=0.8)

At any given value of j the differential in the response variable between i=1 & i=2 is not very big. So the main effect of A (type of group) may not really be present. However for any fixed value of i there does appear to be noticable difference in the response variable j so it seems possible that the main effect of B (size of the group) is present. The lines are not parallel, with multiple intersections. So it seems likely that interaction may be present.

  • (2). Obtain the ANOVA table.
a=2 
b=4
# least squares estimate 

## overall mean
Ybar_dot_dot=mean(y)

## factor A main effects
alpha_i_hat=rep(NA,a)
for(k in 1:a){
  alpha_i_hat[k]=mean(y[i==k])-Ybar_dot_dot
}
 

## factor B main effects
beta_j_hat=rep(NA,b)
for(k in 1:b){
  beta_j_hat[k]=mean(y[j==k])-Ybar_dot_dot
}

# calcualte SS

SSA = b*sum(alpha_i_hat^2)
SSB = a*sum(beta_j_hat^2)

#fitted values
y_hat = rep(0,a*b)

for(h in 1:length(y_hat))
{
    y_hat[h] = Ybar_dot_dot+alpha_i_hat[i[h]]+beta_j_hat[j[h]] 
}

e = y-y_hat


SSE = sum(e^2)
SSTO = sum((y-Ybar_dot_dot)^2)
AnovaTable = matrix(0,nrow = 4,ncol = 3)
AnovaTable[1,1] = SSA
AnovaTable[1,2] = a-1
AnovaTable[1,3] = SSA/(a-1)
AnovaTable[2,1] = SSB
AnovaTable[2,2] = b-1
AnovaTable[2,3] = SSB/(b-1)
AnovaTable[3,1] = SSE
AnovaTable[3,2] = (a-1)*(b-1)
AnovaTable[3,3] = SSE/((a-1)*(b-1))
AnovaTable[4,1] = SSTO
AnovaTable[4,2] = a*b-1
AnovaTable[4,3] = '-'

AnovaTable = as.data.frame(AnovaTable)
rownames(AnovaTable) = c('factor A','factor B','Error','Total')
colnames(AnovaTable) = c('SS','df','MS')

library(knitr)
library(kableExtra)
 
knitr::kable(AnovaTable, 
            "html",
             digits=2,
             align = "c",
             caption = "ANOVA Table")%>%kable_styling(full_width = F)
ANOVA Table
SS df MS
factor A 1.125 1 1.125
factor B 318.375 3 106.125
Error 6.375 3 2.125
Total 325.875 7
  • (3). Conduct separate tests for type of group and size of group main effects. In each test, use level of significance \(\alpha=.01\) and state the alternatives, decision rule, and conclusion.
  • # F test for significance Factor A 
    MSA = (SSA/(a-1))
    MSE = (SSE)/((a-1)*(b-1))
    Fstar = MSA/MSE
    alpha = .01
    Fcritical = qf((1-alpha),df1=(a-1),df2=((a-1)*(b-1)))
    if (Fstar > Fcritical){
      print("Fstar > Fcritical. REJECT NULL HYPOTHESIS")
    } else {
      print("Fstar < Fcritical. FAIL TO REJECT NULL HYPOTHESIS")
    }
    ## [1] "Fstar < Fcritical. FAIL TO REJECT NULL HYPOTHESIS"
    # F test for significance of Factor B
    MSB = (SSB/(b-1))
    Fstar = MSB/MSE
    alpha = .01
    Fcritical = qf((1-alpha),df1=(b-1),df2=((a-1)*(b-1)))
    if (Fstar > Fcritical){
      print("Fstar > Fcritical. REJECT NULL HYPOTHESIS")
    } else {
      print("Fstar < Fcritical. FAIL TO REJECT NULL HYPOTHESIS")
    }
    ## [1] "Fstar > Fcritical. REJECT NULL HYPOTHESIS"

    To test for whether Factor A is significant use an F test at alpha = .01. Null hypothesis is that the main effect of factor A at all factor levels is equal to 0. Alternative hypothesis is that the main effect of factor A at all factor levels is not equal to 0. The test statistic is equal to MSA/MSE. If the test statistic is greater than the critical value of F at alpha = .01 with df1=a-1, df2=(a-1)*(b-1), then we reject the null hypothesis. In this case the test statistic was NOT greater than the critical value and subsequently we fail to reject the null hypothesis that the main effect of factor A at all factor levels is equal to zero.

    To test for whether Factor b is significant use an F test at alpha = .01. Null hypothesis is that the main effect of factor b at all factor levels is equal to 0. Alternative hypothesis is that the main effect of factor b at all factor levels is not equal to 0. The test statistic is equal to MSb/MSE. If the test statistic is greater than the critical value of F at alpha = .01 with df1=b-1, df2=(a-1)*(b-1), then we reject the null hypothesis. In this case the test statistic WAS greater than the critical value and subsequently we reject the null hypothesis that the main effect of factor B at all factor levels is equal to zero.

    • (4). Make all pairwise comparisons among type of group and size of group; use the Bonferroni procedure with a 90 percent family confidence coefficient. State your findings.
    g = 7
    alpha=.10
    B = qt(1-(.1/14),df=3)
    
    
    #Factor A pairwise comparisons
    Dhat_12 = mean(y[i==1])-mean(y[i==2])
    CI12 <- c(Dhat_12-B*sqrt((MSE*2)/4),Dhat_12+B*sqrt((MSE*2)/4))
    CI12
    ## [1] -4.545773  6.045773
    #Factor B pairwise comparisons
    Dhat_12 = mean(y[j==1])-mean(y[j==2])
    CI12 <- c(Dhat_12-B*sqrt(MSE*(2/a)), Dhat_12+B*sqrt(MSE*(2/a)))
    CI12
    ## [1] -13.489354   1.489354
    Dhat_13 = mean(y[j==1])-mean(y[j==3])
    CI13 <- c(Dhat_13-B*sqrt(MSE*(2/a)), Dhat_13+B*sqrt(MSE*(2/a)))
    CI13
    ## [1] -20.989354  -6.010646
    Dhat_14 = mean(y[j==1])-mean(y[j==4])
    CI14 <- c(Dhat_14-B*sqrt(MSE*(2/a)), Dhat_14+B*sqrt(MSE*(2/a)))
    CI14
    ## [1] -23.489354  -8.510646
    Dhat_23 = mean(y[j==2])-mean(y[j==3])
    
    CI23 = c(Dhat_23-B*sqrt(MSE*(2/a)), Dhat_23+B*sqrt(MSE*(2/a)))
    CI23
    ## [1] -14.98935407  -0.01064593
    Dhat_24 = mean(y[j==2])-mean(y[j==4])
    
    CI24 = c(Dhat_24-B*sqrt(MSE*(2/a)),Dhat_24+B*sqrt(MSE*(2/a)))
    CI24
    ## [1] -17.489354  -2.510646
    Dhat_34 = mean(y[j==3])-mean(y[j==4])
    CI34 = c(Dhat_34-B*sqrt(MSE*(2/a)), Dhat_34+B*sqrt(MSE*(2/a)))

    The only pairwise difference of Factor A was not significant the average of i=1 minus i=2, .75 (-4.545773, 6.045773)

    The following pairwise differences of Factor B were not significant the difference between j=1 and j=2, -6 (-13.489354,1.489354), the difference between j=3 and j=4, -2.5 (-9.989354, 4.989354). The following pairwise comparisons of Factor B were significant, the difference between j=1 and j=3, -13.5 (-20.989354, -6.010646) the difference between j=1 and j=4; -16 (-23.489354, -8.510646). The difference between j=2 and j=3, -7.5 (-14.98935407, -0.01064593). And lastly difference between j=2 and j=4, -10, (-17.489354, -2.510646).

    • (5). Obtain confidence intervals for \(D_{1}=\mu_{.2}-\mu_{.1}, D_{2}=\mu_{.3}-\mu_{.2}\), and \(D_{3}=\mu_{.4}-\mu_{.3}\); use the Bonferroni procedure with a 95 percent family confidence coefficient. State your findings.
    g = 3
    alpha=.05
    B = qt(1-(.05/6),df=3)
    Dhat_21 = mean(y[j==2])-mean(y[j==1])
    Dhat_21
    ## [1] 6
    CI1 <- c(Dhat_21-B*sqrt(MSE*(2/a)), Dhat_21+B*sqrt(MSE*(2/a)))
    CI1
    ## [1] -1.079734 13.079734
    Dhat_32 = mean(y[j==3])-mean(y[j==2])
    Dhat_32
    ## [1] 7.5
    CI2 <- c(Dhat_32-B*sqrt(MSE*(2/a)), Dhat_32+B*sqrt(MSE*(2/a)))
    CI2
    ## [1]  0.4202663 14.5797337
    Dhat_43 = mean(y[j==4])-mean(y[j==3])
    Dhat_43
    ## [1] 2.5
    CI3 <- c(Dhat_43-B*sqrt(MSE*(2/a)), Dhat_43+B*sqrt(MSE*(2/a)))
    CI3
    ## [1] -4.579734  9.579734

    Using alpha = .05 and a g=3 only the pairwise comparison of j=3 minus j=2 was significant (0.4202663, 14.5797337) with Dhat = 7.5. The other comparisons of j=2 minus j=1, 6 (-1.079734, 13.079734); and j=4 minus j=3, 2.5 (-4.579734, 9.579734) were both not significant. It would appear perhaps the confidence is too high.

    • (6). Is the Bonferroni procedure used in part (5) the most efficient one here? Explain.
    # Examine what the Tukey and Sheffe multiples would be
    
    # Sheffe
    S= sqrt((b-1)*qf(1-.05,df1=b-1,df2=((a-1)*(b-1))))
    
    
    #Tukey
    T = (1/sqrt(2))*qtukey(1-.05,b,((a-1)*(b-1)))

    In assessing the efficiency of the Sheffe, Tukey, and Bonferroni multipliers at alpha = .05. The Tukey multiplier has the smallest value at 4.825669, as compared to the Bonferroni multiplier used in question 5 which had a value of 4.856657. Thus Tukey is more efficenct than Bonferroni in this instance.

    • (7). It is desired to estimate \(\mu_{14} .\). Obtain a point estimate of \(\mu_{14}\), and Construct a 99 percent confidence interval for \(\mu_{14}\). Interpret your interval estimate.

    • (8). Conduct the Tukey test for additivity: use \(\alpha=.01\). State the alternatives, decision rule, and conclusion. If the additive model is not appropriate, what might you do?

    alpha=0.01
    
    D_hat= sum((alpha_i_hat[i])*(beta_j_hat[j])*y)/(sum(alpha_i_hat^2)*sum(beta_j_hat^2))
    SSAB=sum((D_hat*(alpha_i_hat[i])*(beta_j_hat[j]))^2)
    SSE=sum((y-Ybar_dot_dot-alpha_i_hat[i]-beta_j_hat[j]-D_hat*alpha_i_hat[i]*beta_j_hat[j])^2)
    
    MSAB=SSAB/1
    MSE=SSE/(a*b-a-b)
    
    Fstar = MSAB/MSE
    
    
    Fcritical = qf(1-.01,df1=1,df2=((a-1)*(b-1)))
    
    
    if (Fstar > Fcritical) {
      print("Fstar > Fcritical. REJECT NULL HYPOTHESIS")
    } else {
      print("Fstar < Fcritical. FAIL TO REJECT NULL HYPOTHESIS")
    }
    ## [1] "Fstar < Fcritical. FAIL TO REJECT NULL HYPOTHESIS"

    Under the Tukey test for additivity we stipulate that the null hypothesis is the interaction of main effects is zero. The alternative hypothesis is that the interaction of main effects is NOT zero. The test is an F test where F = MSAB/MSE, the decision rule is if F ≥ critical value we reject the null hypothesis. In this case the test statistic F (0.5987716) is less than the critical value (34.11622), therefore we fail to reject the null hypothesis that the interaction term is equal to zero.

    If the additive model were not appropriate you may consider stipulating a larger alpha value/decreasing your confidence. You may also consider transforming the response variable by a box-cox transformation, a log transformation, or a square root transformation. These may work, but may not; if the interaction term can not be made insignificant that may just be a limitation of your data/model.