In Chapter 2, we explored Simple Comparative Experiments, which provide the foundation for comparing two treatment groups. These comparisons are conducted using completely randomized designs, supported by statistical hypothesis testing and confidence interval estimation. The chapter introduces four main types of statistical tests: (1) the Z-test, (2) the t-test, (3) the chi-squared test, and (4) the F-test—each suited to different assumptions about the data (e.g., known vs. unknown variance, normality, and sample independence). Chapter 3 builds upon this framework by extending the comparison from two groups to more than two, introducing Analysis of Variance (ANOVA). While Chapter 2 focuses on pairwise comparisons—which can inflate the Type I error rate when repeated—Chapter 3 addresses this by using the F-distribution to partition total variability and evaluate the overall effect of a factor with multiple levels in a single, unified test.

Consider the notation:

\[ y_{ij}, \quad i = \text{treatment index}, \quad j = \text{observation index} \]

Q1)

Consider the means model:

\[ y_{ij} = \mu_i + \epsilon_{ij} \]
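Equivalently, the effects model writes \(\mu_i = \mu + \tau_i\):

\[ y_{ij} = \mu + \tau_i + \epsilon_{ij} \]

so the null hypothesis below is the same as \(\tau_1 = \tau_2 = \cdots = \tau_a = 0\).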

We are using a one-way ANOVA to test whether at least one treatment mean differs:

\[ H_0 : \mu_1 = \mu_2 = \cdots = \mu_a \\ H_a : \mu_i \ne \mu_j \text{ for at least one pair } (i, j) \]

a) Calc \(SS_{treatment}\)

Consider the partitioning of the total sum of squares :

\[ SST = SS_{within} + SS_{between} \\ SS_{total} = SS_{error} + SS_{treatment} \\ \implies \\ SS_{treatment} = SS_{total} - SS_{error} \]

So : \(\boxed{927.71}\)
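For reference, the pieces can be back-solved from the boxed answers in parts (b) and (c) (these are not values read from the original problem table): \(SS_{error} = MS_{error} \cdot df_{error} = 7.4612 \times 25 \approx 186.53\), so \(SS_{total} \approx 927.71 + 186.53 = 1114.24\).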

b) Calc \(df_{factor}\)

Consider the following implication :

\[ SS_{total} = SS_{error} + SS_{treatment} \\ \implies \\ df_{total} = df_{error} + df_{treatment} \]

So we are looking for \(df_{factor} = df_{treatment}\)

\[ df_{total} = df_{error} + df_{treatment} \\ \implies \\ df_{treatment} = df_{total} - df_{error} \]

So : \(\boxed{4}\)
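These degrees of freedom follow from the F-test in part (e), where \(df_1 = 4\) and \(df_2 = 25\) imply \(a = 5\) treatment levels and \(N = 30\) observations:

\[ df_{treatment} = a - 1 = 4, \quad df_{error} = N - a = 25, \quad df_{total} = N - 1 = 29 \]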

c) Calc \(MS_{error}\)

Consider :

\[ MS_{error} \\ = \frac{SS_{E}}{N-a} \\ = \frac{SS_{E}}{df_{E}} \\ = \frac{SS_{W}}{df_{W}} \]

So : \(\boxed{7.4612}\)

d) Calc \(F_o\)

\[ F_{o} \\ = \frac{MS_{treatment}}{MS_{error}} \\ = \frac{MS_{B}}{MS_{W}} \]

So : \(\boxed{33.09521}\)

e) P-value \(F_o\)

# Upper-tail p-value of F0 under the F(4, 25) distribution
pf(33.09521, df1 = 4, df2 = 25, lower.tail = FALSE)
## [1] 1.184656e-09
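For reference, \(F_o \approx 33.1\) far exceeds the 5% critical value of the F distribution, which is a quick check in R:

# Upper 5% critical value of F with (4, 25) degrees of freedom
qf(0.95, df1 = 4, df2 = 25)  # approximately 2.76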

Q2)

This study is not a designed experiment; it is an observational study because participants were not randomly assigned to different levels of chocolate consumption.

As a result, it does not establish a cause-and-effect relationship between chocolate consumption and depression.

While the data may show an association, the lack of randomization and control for confounding variables means we cannot conclude that chocolate causes depression.

To establish causality, researchers would need to conduct a randomized controlled experiment where individuals are randomly assigned to consume varying amounts of chocolate, with other variables controlled, and then measure their depression levels over time.

Q3)

a) Hyp. test: Does mixing technique affect the strength of cement? (\(\alpha = 5\%\))

So what we are testing is :

\[ H_0 : \mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_a : \mu_i \ne \mu_j \text{ for at least one pair } (i, j) \]

From the box plot, it's clear we should reject the null hypothesis; let's verify this numerically:

# Perform one-way ANOVA
anova_result <- aov(Strength ~ Technique, data = cement_data)

# View the ANOVA table
summary(anova_result)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Technique    3 489740  163247   12.73 0.000489 ***
## Residuals   12 153908   12826                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the F-test, we find extraordinarily strong evidence against the null hypothesis; therefore it is likely that mixing technique affects the average tensile strength.

b) Box plots :
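The box plots can be generated from the same cement_data used in part (a) (a minimal sketch):

# Box plots of tensile strength by mixing technique
boxplot(Strength ~ Technique, data = cement_data,
        xlab = "Mixing Technique", ylab = "Tensile Strength")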

c) Use the Fisher LSD (\(\alpha = 5\%\))

\[ LSD = t_{\alpha/2,\, N-a} \cdot \sqrt{\frac{2 \, MSE}{n}} \]

The following computes the LSD and the pairwise comparisons:

# t critical value t_{0.025, 12} and MSE from the ANOVA table above
t <- 2.179
MSE <- 12826
# Observations per technique
n <- 4
LSD <- t * sqrt(2 * MSE / n)
print(LSD)
## [1] 174.497
##       1       2       3       4 
## 2971.00 3156.25 2933.75 2666.25
## 1 vs 2 1 vs 3 1 vs 4 2 vs 3 2 vs 4 3 vs 4 
## 185.25  37.25 304.75 222.50 490.00 267.50
## 1 vs 2 1 vs 3 1 vs 4 2 vs 3 2 vs 4 3 vs 4 
##   TRUE  FALSE   TRUE   TRUE   TRUE   TRUE
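The mean and comparison tables above can be reproduced along these lines (a sketch, reusing cement_data and the LSD computed above):

# Group means per technique
means <- tapply(cement_data$Strength, cement_data$Technique, mean)

# All pairwise absolute differences between group means
pairs <- combn(names(means), 2)
diffs <- abs(means[pairs[1, ]] - means[pairs[2, ]])
names(diffs) <- paste(pairs[1, ], "vs", pairs[2, ])

# Pairs whose difference exceeds the LSD are significant
diffs > LSD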

As indicated by the LSD and the comparisons, the differences are substantial: all pairs are significant except 1 vs. 3, which makes sense given the box plot above.

d/e) Diagnostics
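The diagnostic plots discussed below can be produced directly from the fitted model (a sketch):

# Residuals vs. fitted and normal Q-Q plot for the cement model
par(mfrow = c(1, 2))
plot(anova_result, which = 1)  # residuals vs. fitted values
plot(anova_result, which = 2)  # normal Q-Q plot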

From these diagnostic plots, we see:

Based on the Q-Q plot, it's clear that the normality assumption is met.

Based on the residuals vs. fitted values plot, the model does an okay job of capturing the general pattern of our categories. Furthermore, the residual spread shows no systematic pattern across fitted values. As a result, we uphold the assumption of constant variance (homoscedasticity).

Q4)

\[ HSD = q_{\alpha, a, N - a} \cdot \sqrt{\frac{MSE}{n}} \]

a) Tukey Test

# Tukey's Honestly Significant Difference (HSD) test
TukeyHSD(anova_result)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Strength ~ Technique, data = cement_data)
## 
## $Technique
##        diff        lwr        upr     p adj
## 2-1  185.25  -52.50029  423.00029 0.1493561
## 3-1  -37.25 -275.00029  200.50029 0.9652776
## 4-1 -304.75 -542.50029  -66.99971 0.0115923
## 3-2 -222.50 -460.25029   15.25029 0.0693027
## 4-2 -490.00 -727.75029 -252.24971 0.0002622
## 4-3 -267.50 -505.25029  -29.74971 0.0261838

So here we see fewer significant comparisons! With only

4-1 
4-2
4-3

being significant. Tukey’s HSD test reveals that Mixing Technique 4 produces significantly lower tensile strength compared to Techniques 1, 2, and 3. No other pairwise differences were statistically significant at the 0.05 level. This indicates that Technique 4 is clearly inferior in performance, while Techniques 1, 2, and 3 are statistically comparable in their tensile strength outcomes.

b) Fisher vs. Tukey :

Fisher’s Least Significant Difference (LSD) and Tukey’s Honestly Significant Difference (HSD) tests both serve to compare treatment means after a significant ANOVA result, but they differ substantially in statistical rigor. Fisher’s LSD is notably more liberal; it requires only a significant overall F-test and then conducts pairwise comparisons without correcting for multiple testing. This approach increases the likelihood of detecting differences—but at the cost of a higher Type I error rate. Tukey’s HSD, in contrast, rigorously controls the family-wise error rate across all comparisons by using the studentized range distribution. This makes it a more conservative and statistically sound choice, particularly when making multiple comparisons. Consequently, Tukey’s method is less likely to flag differences as significant unless the evidence is strong, which explains why some pairs deemed significant by LSD failed to reach significance under Tukey’s test in our analysis.

  • Tukey = safer for multiple comparisons

  • Fisher = more powerful but riskier

Q5)

a)

Based on the plot, the catalyst clearly makes a difference, with a clear and noticeable downward trend.

Furthermore, consider the following R results:
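This table can be reproduced with a call along these lines (a sketch; catalyst_data and its columns are assumed from the confidence-interval code further down, with Catalyst stored as a factor):

anova_model <- aov(Concentration ~ Catalyst, data = catalyst_data)
summary(anova_model)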

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Catalyst     3  85.68   28.56   9.916 0.00144 **
## Residuals   12  34.56    2.88                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we see a significant difference (p = 0.00144): we reject the null hypothesis and are statistically convinced of a likely difference in means.

b) Diagnostics

Okay, so the plots are a little worrisome; the Q-Q plot is somewhat wavy, with points falling above and below the line relative to what we would expect under normality. Generally, however, the normality assumption appears to hold. I would want more data to be fully confident in these differences.

c) 99% confidence interval on the mean for Catalyst 1
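The interval computed below is the standard confidence interval for a single treatment mean, using the pooled \(MSE\) from the ANOVA:

\[ \bar{y}_{1\cdot} \pm t_{\alpha/2,\, N-a} \cdot \sqrt{\frac{MSE}{n_1}} \]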

# Sample mean and count for Catalyst 1
mean1 <- mean(catalyst_data$Concentration[catalyst_data$Catalyst == 1])
n1 <- length(catalyst_data$Concentration[catalyst_data$Catalyst == 1])

# MSE from ANOVA
mse <- summary(anova_model)[[1]]["Residuals", "Mean Sq"]

# Standard error
se <- sqrt(mse / n1)

# t critical value for 99% CI with df = 12
t_critical <- qt(0.995, df = 12)

# CI bounds
lower <- mean1 - t_critical * se
upper <- mean1 + t_critical * se

c(mean = mean1, lower = lower, upper = upper)
##     mean    lower    upper 
## 56.90000 54.58171 59.21829

Q6)

a) Does there appear to be an effect?
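The table below comes from a one-way ANOVA fit along these lines (a sketch; the data frame and column names here are assumptions, since they are not shown):

# Hypothetical names: particle_data with columns Particles and Method
anova_model <- aov(Particles ~ Method, data = particle_data)
summary(anova_model)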

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Method       2   8964    4482   7.914 0.00643 **
## Residuals   12   6796     566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is strong statistical evidence that at least one method differs significantly in mean particle count. So, not all methods have the same effect on particle reduction. The graphic clearly displays a difference as well.

b) Diagnostics

# Diagnostic plots for the fitted model (residuals vs. fitted, normal Q-Q, ...)
plot(anova_model)

The residuals vs. fitted plot indicates that the variance increases with the fitted values, violating the constant-variance assumption. The normal Q-Q plot, however, indicates no serious violation of the normality assumption, with only minor deviations from what we would expect under a normal distribution.

c) Transformation to combat heteroskedasticity

Because constant variance is violated, the standard ANOVA results might not be fully trustworthy. We have a solid option: a variance-stabilizing transformation. Since the response is a count, a square-root transformation is a natural choice.
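A sketch of the transformed fit, under the same assumed names as in part (a):

# Square-root transformation, a common variance stabilizer for counts
anova_sqrt <- aov(sqrt(Particles) ~ Method, data = particle_data)
summary(anova_sqrt)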

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Method       2  63.90   31.95    9.84 0.00295 **
## Residuals   12  38.96    3.25                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After applying a square root transformation to stabilize variance, the ANOVA still shows a statistically significant effect of method on particle count (F = 9.84, p = 0.00295). This confirms that the choice of method significantly affects particle reduction. The residual diagnostics after transformation indicate that the assumption of constant variance is now reasonably met, with no apparent patterns in the residuals vs. fitted plot. Additionally, the normal probability plot of residuals shows points closely following the reference line, suggesting that the normality assumption holds well. Together, these improvements validate the ANOVA model assumptions, allowing us to trust the conclusion that method has a meaningful impact on particle count.