Q1) One-Way ANOVA

\[ H_0 : \mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_a : \mu_i \ne \mu_j \text{ for at least one pair } i \ne j \]

We want to know: at the \(\alpha = 5\%\) level, with \(\text{Power} = 1 - \beta = 0.9\), given these population means \(\{\mu_1 , \mu_2, \mu_3, \mu_4\}\) with \(Var(\epsilon)=5^2\), how large a sample ( \(n\) per group ) do we require to reject \(H_0\)?

So, first things first: what test are we gonna use?

F-test! Specifically the

\[ \boxed{\text{One-way ANOVA}} \]

Reasoning : ANOVA is the standard test for comparing more than two group means.

\[ F_o = \frac{MS_{treatment}}{MS_{error}} \]

Suppose we have 4 treatments \(t_1, t_2, t_3, t_4\) corresponding to the 4 population means respectively. We want to know if any of the treatments have an effect.

What we need for Sample size determination :

\[ \text{Effect-Size} = f = \sqrt{\frac{\sum_{i=1}^{k}(\mu_i - \bar{\mu}_{..})^2}{k \, \sigma^2}} \]

where \(k\) is the number of treatment groups and \(\bar{\mu}_{..}\) is the grand mean of the \(\mu_i\).

Therefore :

# Population means under the alternative
mu1 <- 50
mu2 <- 60
mu3 <- 50
mu4 <- 60
df <- data.frame(pop_means = c(mu1, mu2, mu3, mu4))
grand_mean <- mean(df$pop_means)
# Between-group sum of squared deviations from the grand mean
sum_of_squares <- sum((df$pop_means - grand_mean)^2)
# Cohen's f with sigma^2 = 5^2 = 25 and k = nrow(df) = 4 groups
effect_size <- sqrt(sum_of_squares/(nrow(df) * 25))
number_of_treatment_groups <- 4
power <- .9

library(pwr)
pwr.anova.test(
  k = number_of_treatment_groups, 
  n = NULL, 
  f = effect_size, 
  sig.level = 0.05, 
  power = power)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 4.658119
##               f = 1
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

Therefore, rounding \(n = 4.66\) up, we need about 5 observations per group.

Q2) Est. Sample Size

So this question is all about,

If we change the parameters, what happens?

a/b) Increasing \(\sigma\): \(\sigma = 6\) and \(\sigma = 7\)

effect_size2 <- sqrt(sum_of_squares/(nrow(df) * 36))  # sigma = 6, sigma^2 = 36
effect_size3 <- sqrt(sum_of_squares/(nrow(df) * 49))  # sigma = 7, sigma^2 = 49

pwr.anova.test(
  k = number_of_treatment_groups, 
  n = NULL, 
  f = effect_size2, 
  sig.level = 0.05, 
  power = .9)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 6.180857
##               f = 0.8333333
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group
pwr.anova.test(
  k = number_of_treatment_groups, 
  n = NULL, 
  f = effect_size3, 
  sig.level = 0.05, 
  power = .9)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 7.998751
##               f = 0.7142857
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

c) Sample size relationship

As we can see, as \(sd(\epsilon) = \sigma\) increases, the effect size shrinks and the required sample size grows: as we become less precise, \(n \rightarrow \infty\). In other words, the noisier the data, the more observations per group we require to be confident these treatments are different.
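
To make this concrete, here is a minimal sketch (reusing sum_of_squares and df from Q1) that tabulates the required per-group \(n\) as \(\sigma\) grows:

sigmas <- 5:10
n_required <- sapply(sigmas, function(s) {
  # Effect size shrinks as sigma grows, holding the means fixed
  f <- sqrt(sum_of_squares / (nrow(df) * s^2))
  ceiling(pwr.anova.test(k = 4, f = f, sig.level = 0.05, power = .9)$n)
})
data.frame(sigma = sigmas, n_required)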

d) Deep interpretation

Let's reconsider the relevant equations:

\[ \text{Effect-Size} = f = \sqrt{\frac{\sum_{i=1}^{k}(\mu_i - \bar{\mu}_{..})^2}{k \, \sigma^2}} \]

\[ F_{crit} = F_{\alpha, \, df_{treatment}, \, df_{error}} \\ F_o = \frac{MS_{treatment}}{MS_{error}} \\ \text{Reject } H_0 \text{ if } F_o > F_{crit} \]

Consider the following Desmos demonstration of the F-distribution: https://www.desmos.com/calculator/rvmyiktrex

Interpretation :

In practice, when planning how many samples we need for ANOVA, we should think carefully about the effect size, the F-distribution, and the formulas above. The effect size measures how much the group means differ relative to the noise in the data: the square root of the sum of squared differences between each group mean and the grand mean, divided by the number of treatment groups times the error variance. A larger effect size means we can detect differences with a smaller sample; a smaller effect size means we need a bigger one.

The critical value \(F_{crit}\) depends on the significance level \(\alpha\) and the degrees of freedom for treatments and error, and the decision rule is to reject the null hypothesis when the observed statistic \(F_o\) exceeds \(F_{crit}\). Visualizing the F-distributions, as in the Desmos demonstration, shows how under the alternative hypothesis the distribution shifts to the right (it becomes non-central), making it easier to exceed \(F_{crit}\) and so achieve higher power. As the effect size increases or the error variance decreases, this shift grows and fewer samples are needed.

Therefore, in practice, it is important to estimate a realistic effect size from prior studies or pilot experiments, consider the variability in the data, and use careful power calculations to choose a sample size that balances practicality against the ability to detect meaningful differences.
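
As a rough R sketch of that picture (my own illustration, assuming the Q1 design: \(k = 4\), \(n = 5\) per group, \(f = 1\), hence \(df_1 = 3\), \(df_2 = 16\), and non-centrality \(\lambda = k n f^2 = 20\), the parameterization pwr uses):

x <- seq(0, 15, length.out = 500)
df1 <- 3; df2 <- 16
ncp <- 4 * 5 * 1^2                      # lambda = k * n * f^2
f_crit <- qf(0.95, df1, df2)            # critical value at alpha = 0.05
plot(x, df(x, df1, df2), type = "l", ylab = "Density")  # central F under H0
lines(x, df(x, df1, df2, ncp = ncp), lty = 2)           # shifted right under Ha
abline(v = f_crit, col = "red")
# Power = P(F > F_crit) under the non-central distribution
pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)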

Q3) LSD

a) Analyze Data

# Fit the RCBD model
rcbd_model <- aov(Measurement ~ Oil + Truck, data = oil_data)

# Summary of the ANOVA
summary(rcbd_model)
##             Df  Sum Sq  Mean Sq F value   Pr(>F)    
## Oil          2 0.00671 0.003353   6.353   0.0223 *  
## Truck        4 0.09210 0.023025  43.626 1.78e-05 ***
## Residuals    8 0.00422 0.000528                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we are using an RCBD: we block by Truck and treat Oil as the treatment factor.

Based on our test, we find that our blocking is statistically significant:

Truck        43.626 1.78e-05 ***

So it was a good idea to block. And we find that the different oils appear to make a difference:

Oil           6.353   0.0223 * 

Conclusion :
There is evidence at the 5% level that different oils lead to different brake-specific fuel consumption (p = 0.0223). Blocking by Truck was effective and highly significant, indicating substantial truck-to-truck differences that were controlled for. The experiment shows that the choice of oil affects fuel economy in these diesel engines.

b) Fisher LSD method for Comparison :

In other words, now that we know that there is a significant difference in at least one of the means, we must ask, which ones?

Recall :

\[ LSD = t_{\alpha/2, \, df_{error}}\sqrt{\frac{2MS_{error}}{n}} \]

We then obtain \(|\bar{y}_{i.} - \bar{y}_{j.}|\) for each pair to do the pairwise comparisons. If \(|\bar{y}_{i.} - \bar{y}_{j.}| > LSD\), we say there is an indication that treatments \(i\) and \(j\) are significantly different.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
t_crit <- qt(1 - .05/2, df = 8)  # two-sided t critical value, df_error = 8
n_per_group <- 5                 # each oil appears once on each of the 5 trucks
LSD <- t_crit * sqrt(2 * 0.000528 / n_per_group)  # MS_error from the ANOVA table


oil_means <- 
  oil_data %>% 
  group_by(Oil) %>% 
  summarize(oil_mean = mean(Measurement)) 

abs_dif <- 
  c(
  abs(oil_means$oil_mean[1] - oil_means$oil_mean[2]), 
  abs(oil_means$oil_mean[2] - oil_means$oil_mean[3]),
  abs(oil_means$oil_mean[1] - oil_means$oil_mean[3])
  ) 

names(abs_dif) <- c("1 vs 2", "2 vs 3", "1 vs 3")

abs_dif > LSD
## 1 vs 2 2 vs 3 1 vs 3 
##   TRUE   TRUE  FALSE

Therefore, Oil 2 produces significantly different (higher) brake-specific fuel consumption compared to both Oil 1 and Oil 3. However, Oils 1 and 3 are similar and do not differ significantly.
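
As an optional cross-check (my addition, not part of the original analysis), the agricolae package implements Fisher's LSD directly from a fitted model; it should agree with the manual comparison above:

library(agricolae)
# Fisher LSD pairwise comparisons for the Oil treatment
LSD.test(rcbd_model, "Oil", console = TRUE)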

c) Analyze Residuals: Diagnostics

Okay, this is not looking good…

Despite the QQ-plot indicating normality of the residuals, if we take a look at the residuals vs. fitted plot, we see issues of non-constant variance with a clear U-ish pattern, indicating that we should transform our data. We also have a clear cluster of values around 0.5.

Or perhaps this would clear up with more data? It could be that we are just capturing noise.

Q4) Latin Squares

A) Plot Data and Analyze Experiment at \(\alpha = 5\%\)

Consider that we are doing this as a Latin squares design, whereby we control for 2 nuisance variables: namely, Batch and Day.
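
The ANOVA table below presumably comes from a fit along these lines (a sketch on my part; the data frame name ls_data and the column names Time, Ingredient, Batch, and Day are assumptions, since the original chunk is not shown):

# Latin square: one treatment factor plus two blocking factors
ls_model <- aov(Time ~ Ingredient + Batch + Day, data = ls_data)
summary(ls_model)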

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Ingredient   4 141.44   35.36  11.309 0.000488 ***
## Batch        4  15.44    3.86   1.235 0.347618    
## Day          4  12.24    3.06   0.979 0.455014    
## Residuals   12  37.52    3.13                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation :
Ingredients impact reaction time significantly, but Batch and Day effects are negligible, indicating that blocking doesn't appear to have been necessary, since controlling for the variation from the nuisance variables wasn't impactful.

##     p adj
## B-A FALSE
## C-A FALSE
## D-A  TRUE
## E-A  TRUE
## C-B FALSE
## D-B FALSE
## E-B FALSE
## D-C  TRUE
## E-C  TRUE
## E-D FALSE

After performing the Tukey HSD test, the pairs whose differences are significant are the following:

D-A  TRUE
E-A  TRUE
D-C  TRUE
E-C  TRUE

With the family-wise error rate controlled, the ingredient pairs with statistically different means are the ones above.
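
For reference, a sketch of how the logical column above could be produced (assuming the fitted model is named ls_model, as in the sketch from part A):

tukey_out <- TukeyHSD(ls_model, which = "Ingredient")
# TRUE wherever the adjusted p-value is below alpha = 0.05
tukey_out$Ingredient[, "p adj"] < 0.05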

C) Diagnostics

Normality :

According to the QQ-plot, what we see is rather normal residuals, although at the extremes we see over-prediction for lower values and under-prediction for higher values. This is not uncommon: at the extremes we are more subject to the effect of noise and often require more data to capture the true essence of those extreme values.

Heteroskedasticity :

Yes! I like this! A very beautiful random scatter!

Outliers :

Well, like I said, the extreme values appear to be off, though only by about 2 SD, which in itself isn't terrible, if not perfect; then again, we shouldn't really expect determinism (perfection) with such a small sample size.

Conclusion :

Ingredients significantly impacted reaction time, while Batch and Day effects were negligible, indicating that blocking was unnecessary. Tukey’s HSD test showed that Ingredients D and E differ significantly from A and C, and diagnostic plots confirmed that model assumptions were reasonably satisfied.

Q5) Balanced Incomplete Block Design

a) Parameters Def.

The parameters of the BIBD are as follows

\[ \text{a = Number of different treatments} \\ \text{b = Number of blocks } \\ \text{k = Number of treatments per block} \\ \text{r = Number of times each treatment appears across all blocks} \\ \lambda = \text{Number of times each pair of treatments appears together in the same block} \]

Therefore :

\[ a = 7 \\ b = 7 \\ k = 3 \\ r = 3 \\ \lambda = 1 \]
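
As a sanity check, these values satisfy the standard BIBD identities:

\[ bk = ar \implies 7 \cdot 3 = 7 \cdot 3 = 21 \\ \lambda(a-1) = r(k-1) \implies 1 \cdot (7-1) = 3 \cdot (3-1) = 6 \]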

b) Analysis

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## Concentration  6 2037.6   339.6  16.117 0.000451 ***
## Day            6  394.1    65.7   3.117 0.070148 .  
## Residuals      8  168.6    21.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation :
Hardwood concentration significantly impacts the strength of the paper. Day-to-day variation shows weak evidence of affecting strength but is not significant at the 0.05 level.

c) Diagnostics

Normality :

According to the QQ-plot, what we see is rather normal residuals, although at the extremes we see under-prediction for lower values and over-prediction for higher values.

Heteroskedasticity :

There is a funnel shape, indicating non-constant variance and hence heteroskedasticity. Therefore, we require a transformation. Suppose we try the Box-Cox method.
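
A minimal sketch of that step (assuming the fitted BIBD model object is named bibd_model; the widened lambda grid is my assumption, needed for the method to reach \(\lambda = 8\)):

library(MASS)
# Profile log-likelihood over candidate lambdas for y^lambda
bc <- boxcox(bibd_model, lambda = seq(-2, 10, 0.1))
(lambda_hat <- bc$x[which.max(bc$y)])   # lambda maximizing the likelihood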

## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select

##               Df    Sum Sq   Mean Sq F value   Pr(>F)    
## Concentration  6 7.596e+34 1.266e+34  24.880 9.24e-05 ***
## Day            6 1.953e+34 3.256e+33   6.399  0.00987 ** 
## Residuals      8 4.071e+33 5.088e+32                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion :

Several transformations, including log, square root, and the Box-Cox-recommended \(y^8\), were attempted to stabilize the variance. None substantially improved the heteroskedasticity, likely due to intrinsic data patterns and the small sample size. Thus, the analysis proceeded on the original scale, with cautious interpretation acknowledging modest variance instability.