The effective life of insulating fluids at an accelerated load of 35 kV is being studied. Test data have been obtained for four types of fluids. The results from a completely randomized experiment were as follows
(a) Is there any indication that the fluids differ? Use \(\alpha = 0.05\).
(b) Which fluid would you select, given that the objective is long life?
(c) Analyze the residuals from this experiment. Are the basic analysis of variance assumptions satisfied?
Creating the Dataframe:
ftype1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
ftype2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
ftype3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
ftype4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
# Observations
obs <- c(ftype1,ftype2,ftype3,ftype4)
# Mean from each treatment
means <-c(rep(mean(ftype1),6),rep(mean(ftype2),6), rep(mean(ftype3),6),rep(mean(ftype4),6))
# Residuals
res <- obs - means
# Creating the Dataframe:
df_fluids <- data.frame(ftype1,ftype2,ftype3,ftype4)
df_fluids <- pivot_longer(df_fluids, c(ftype1,ftype2,ftype3,ftype4))
df_fluids$name <- as.factor(df_fluids$name)
colnames(df_fluids) <-c("ftype","life")
rmarkdown::paged_table(df_fluids)
(a) Is there any indication that the fluids differ? Use \(\alpha = 0.05\).
We need to check the assumption of normality and constant variance.
## QQPlot
qqnorm(res, main="Normal QQ Plot")
## Residuals vs Pop means
plot(means,res, main="Residuals vs Populations means")
## Boxplot
boxplot(life ~ ftype, data=df_fluids, main ="BoxPlot of observations")
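The graphical checks can be complemented with formal tests. The following is a minimal sketch using base R, assuming the res vector and df_fluids data frame built above:
# Shapiro-Wilk test: H0 = the residuals are normally distributed
shapiro.test(res)
# Bartlett test: H0 = the variance of life is the same for every fluid type
bartlett.test(life ~ ftype, data = df_fluids)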
The hypotheses to test are: \[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \qquad H_1: \text{at least one } \mu_i \text{ differs} \]
# Testing ANOVA
aov.model <- aov(life ~ ftype, data = df_fluids)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## ftype 3 30.17 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CONCLUSIONS:
Since the p-value (0.0525) is greater than the significance level (0.05), we fail to reject the null hypothesis: there is no statistically significant difference between the fluids at \(\alpha = 0.05\), although the result is borderline.
The F-statistic is 3.047.
(b) Which fluid would you select, given that the objective is long life?
The p-value (0.0525) is greater than the significance level (\(\alpha = 0.05\)), so there is no statistically significant difference between the fluids. This implies that, from a statistical perspective, it cannot be concluded that one of the fluids has a significantly longer lifespan than the others.
However, we can take a practical approach: check which fluid has the longest average life.
## Boxplot
boxplot(life ~ ftype, data=df_fluids, main ="BoxPlot of observations")
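As a quick numerical check of the boxplot, the sample mean of each fluid type can be computed directly; a minimal sketch with base R's tapply on the data frame built above:
# Average life for each fluid type
tapply(df_fluids$life, df_fluids$ftype, mean)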
CONCLUSIONS: Although no statistically significant difference was found in the lifespan between the fluids, fluid type 3 has the highest average lifetime and, therefore, would be the best option based on the observed averages.
(c) Analyze the residuals from this experiment. Are the basic analysis of variance assumptions satisfied?
# Plotting Residuals vs Fitted
plot(aov.model, which = 1)
# Plotting QQ Normal
plot(aov.model, which = 2)
CONCLUSIONS: From the ANOVA plots, we observe that the residuals follow a linear trend in the Q-Q plot, indicating that the assumption of normality is satisfied. Additionally, the residuals vs fitted values plot shows that the residuals are distributed with roughly equal spread, suggesting that the assumption of constant variance is also met.
An experiment was performed to investigate the effectiveness of five insulating materials. Four samples of each material were tested at an elevated voltage level to accelerate the time to failure. The failure times (in minutes) are shown below:
(a) Do all five materials have the same effect on mean failure time?
(b) Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. What information is conveyed by these plots?
(c) Based on your answer to part (b), conduct another analysis of the failure time data and draw appropriate conclusions.
Creating the Dataframe:
mat1 <- c(110, 157, 194, 178)
mat2 <- c(1, 2, 4, 18)
mat3 <- c(880, 1256, 5276, 4355)
mat4 <- c(495, 7040, 5307, 10050)
mat5 <- c(7, 5, 29, 2)
# Observations
obs <- c(mat1, mat2, mat3, mat4, mat5)
# Mean from each treatment
means <-c(rep(mean(mat1),4),rep(mean(mat2),4), rep(mean(mat3),4),rep(mean(mat4),4), rep(mean(mat5),4))
# Residuals
res <- obs - means
# Creating the Dataframe:
df_materials <- data.frame(mat1, mat2, mat3, mat4, mat5)
df_materials <- pivot_longer(df_materials, c(mat1, mat2, mat3, mat4, mat5))
df_materials$name <- as.factor(df_materials$name)
colnames(df_materials) <-c("Material","Fail_time")
rmarkdown::paged_table(df_materials)
(a) Do all five materials have the same effect on mean failure time?
We need to check the assumption of normality and constant variance.
## QQPlot
qqnorm(res, main="Normal QQ Plot")
## Residuals vs Pop means
plot(means,res, main="Residuals vs Populations means")
## Boxplot
boxplot(Fail_time ~ Material, data=df_materials, main ="BoxPlot of observations")
Because the boxplot and residual plots show strongly unequal spread (the failure times range over several orders of magnitude), a non-parametric Kruskal-Wallis test is used instead of the usual ANOVA. The hypotheses to test are: \[ H_0: \text{the five materials have identical failure-time distributions} \qquad H_1: \text{at least one material differs} \]
# Testing Non-Parametric ANOVA
kruskal.test(Fail_time ~ Material, data=df_materials)
##
## Kruskal-Wallis rank sum test
##
## data: Fail_time by Material
## Kruskal-Wallis chi-squared = 16.873, df = 4, p-value = 0.002046
CONCLUSIONS:
Since the p-value (0.002046) is less than the significance level (0.05), we reject the null hypothesis. This indicates a significant difference in failure time between the materials.
The Kruskal-Wallis chi-squared statistic is 16.873.
(b) Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. What information is conveyed by these plots?
aov.model <- aov(Fail_time ~ Material, data=df_materials)
# Plotting Residuals vs Fitted
plot(aov.model, which = 1)
# Plotting QQ Normal
plot(aov.model, which = 2)
CONCLUSIONS:
From the ANOVA plots, we observe that the Q-Q plot does not follow a linear trend, indicating that the assumption of normality is not met. Additionally, the residuals vs fitted values plot shows that the residuals are not evenly spread, suggesting that the assumption of constant variance is not met.
In order to perform an ANOVA test, we need to transform the observations to meet the assumptions.
(c) Based on your answer to part (b), conduct another analysis of the failure time data and draw appropriate conclusions.
In order to perform an ANOVA test, we need to transform the observations to meet the assumptions.
boxcox(Fail_time ~ Material, data=df_materials)
The Box-Cox plot shows that \(\lambda = 1\) lies outside the 95% confidence interval, confirming the need for a variance-stabilizing transformation; the estimated \(\lambda\) is essentially 0.
lambda = 0.001
df_materials_v2 <- df_materials
df_materials_v2$Fail_time <- df_materials_v2$Fail_time^lambda
aov.model_2 <- aov(Fail_time ~ Material, data=df_materials_v2)
# Plotting Residuals vs Fitted
plot(aov.model_2, which = 1)
# Plotting QQ Normal
plot(aov.model_2, which = 2)
## Boxplot
boxplot(Fail_time ~ Material, data=df_materials_v2, main ="BoxPlot of observations")
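Since the estimated \(\lambda\) is essentially 0, the conventional Box-Cox choice is the log transformation (the power 0.001 used above acts as a numerical stand-in for it, because \(x^{\lambda} \approx 1 + \lambda \ln x\) for small \(\lambda\)). A minimal sketch of the same diagnostics on the log scale, using the hypothetical names df_materials_log and aov.model_log:
# Log transformation of the failure times (Box-Cox lambda ~ 0)
df_materials_log <- df_materials
df_materials_log$Fail_time <- log(df_materials_log$Fail_time)
aov.model_log <- aov(Fail_time ~ Material, data = df_materials_log)
# Residuals vs fitted and normal Q-Q on the log scale
plot(aov.model_log, which = 1)
plot(aov.model_log, which = 2)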
Applying a non-parametric ANOVA again, as the ANOVA assumptions are not met:
# Testing Non-Parametric ANOVA
kruskal.test(Fail_time ~ Material, data=df_materials_v2)
##
## Kruskal-Wallis rank sum test
##
## data: Fail_time by Material
## Kruskal-Wallis chi-squared = 16.873, df = 4, p-value = 0.002046
CONCLUSIONS:
After the transformation, the ANOVA plots show that the residuals follow a linear trend in the Q-Q plot, indicating that the assumption of normality is satisfied. However, the residuals vs fitted values plot reveals uneven spread of residuals (due to Material Type 1), suggesting that the assumption of constant variance is not fully met.
Since the ANOVA assumptions are still not met, we apply the non-parametric Kruskal-Wallis test again.
The Kruskal-Wallis test gives the same p-value (0.002046) as before; this is expected, because the power transformation is monotone and the test depends only on the ranks of the observations, which are unchanged. Since the p-value is less than the significance level (0.05), we continue to reject the null hypothesis.
A semiconductor manufacturer has developed three different methods for reducing particle counts on wafers. All three methods are tested on five different wafers and the after-treatment particle count is obtained. The data are shown below:
(a) Do all methods have the same effect on mean particle count?
(b) Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. Are there potential concerns about the validity of the assumptions?
(c) Based on your answer to part (b), conduct another analysis of the particle count data and draw appropriate conclusions.
Creating the Dataframe:
method1 <- c(31, 10, 21, 4, 1)
method2 <- c(62, 40, 24, 30, 35)
method3 <- c(53, 27, 120, 97, 68)
# Observations
obs <- c(method1, method2, method3)
# Mean from each treatment
means <-c(rep(mean(method1),5), rep(mean(method2),5), rep(mean(method3),5))
# Residuals
res <- obs - means
# Creating the Dataframe:
df_methods <- data.frame(method1, method2, method3)
df_methods <- pivot_longer(df_methods, c(method1, method2, method3))
df_methods$name <- as.factor(df_methods$name)
colnames(df_methods) <-c("Methods","Count")
rmarkdown::paged_table(df_methods)
(a) Do all methods have the same effect on mean particle count?
We need to check the assumption of normality and constant variance.
## QQPlot
qqnorm(res, main="Normal QQ Plot")
## Residuals vs Pop means
plot(means,res, main="Residuals vs Populations means")
## Boxplot
boxplot(Count ~ Methods, data=df_methods, main ="BoxPlot of observations")
Because the particle counts show clearly unequal spread across methods (see the boxplot), a non-parametric Kruskal-Wallis test is used here as well. The hypotheses to test are: \[ H_0: \text{the three methods have identical particle-count distributions} \qquad H_1: \text{at least one method differs} \]
# Testing Non-Parametric ANOVA
kruskal.test(Count ~ Methods, data=df_methods)
##
## Kruskal-Wallis rank sum test
##
## data: Count by Methods
## Kruskal-Wallis chi-squared = 8.54, df = 2, p-value = 0.01398
CONCLUSIONS:
Since the p-value (0.01398) is less than the significance level (0.05), we reject the null hypothesis. This indicates a significant difference in mean particle count between the methods.
The Kruskal-Wallis chi-squared statistic is 8.54.
(b) Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. Are there potential concerns about the validity of the assumptions?
aov.model <- aov(Count ~ Methods, data=df_methods)
# Plotting Residuals vs Fitted
plot(aov.model, which = 1)
# Plotting QQ Normal
plot(aov.model, which = 2)
CONCLUSIONS:
From the ANOVA plots, we observe that the Q-Q plot follows a roughly linear trend, indicating that the assumption of normality is met. However, the residuals vs fitted values plot shows that the residuals are not evenly spread, suggesting that the assumption of constant variance is not met.
In order to perform an ANOVA test, we need to transform the observations to meet the assumptions.
(c) Based on your answer to part (b), conduct another analysis of the particle count data and draw appropriate conclusions.
In order to perform an ANOVA test, we need to transform the observations to meet the assumptions.
boxcox(Count ~ Methods, data=df_methods)
The Box-Cox plot shows that \(\lambda = 1\) lies outside the 95% confidence interval, confirming the need for a variance-stabilizing transformation; the estimated \(\lambda\) is approximately 0.42.
lambda = 0.42
df_methods_v2 <- df_methods
df_methods_v2$Count <- df_methods_v2$Count^lambda
# Re-run Box-Cox on the transformed data: lambda = 1 should now lie inside the confidence interval
boxcox(Count ~ Methods, data=df_methods_v2)
aov.model_2 <- aov(Count ~ Methods, data=df_methods_v2)
# Plotting Residuals vs Fitted
plot(aov.model_2, which = 1)
# Plotting QQ Normal
plot(aov.model_2, which = 2)
## Boxplot
boxplot(Count ~ Methods, data=df_methods_v2, main ="BoxPlot of observations")
Applying an ANOVA test, as the assumptions are now met:
# Testing ANOVA on the transformed data
aov.model_2 <- aov(Count ~ Methods, data=df_methods_v2)
summary(aov.model_2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Methods 2 26.62 13.310 9.89 0.0029 **
## Residuals 12 16.15 1.346
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CONCLUSIONS:
After the transformation, the ANOVA plots show that the residuals follow a linear trend in the Q-Q plot, indicating that the assumption of normality is satisfied. Furthermore, the residuals vs. fitted values plot reveals an approximately even spread of residuals, suggesting that the assumption of constant variance is met.
From the ANOVA test on the transformed counts, the p-value is 0.0029, which is less than the significance level (0.05). Therefore, consistent with the Kruskal-Wallis result in part (a), we again reject the null hypothesis and conclude that the methods differ.
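Since the ANOVA on the transformed counts is significant, an optional follow-up (not required by the problem) is a pairwise comparison to identify which methods differ; a minimal sketch using base R's TukeyHSD on the model fitted above:
# Tukey HSD pairwise comparisons of the methods (on the transformed scale)
TukeyHSD(aov.model_2)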
Use the Kruskal–Wallis test for the experiment in Problem 3.23. Compare the conclusions obtained with those from the usual analysis of variance.
Data from experiment “Problem 3.23”
ftype1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
ftype2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
ftype3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
ftype4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
# Creating the Dataframe:
df_fluids <- data.frame(ftype1,ftype2,ftype3,ftype4)
df_fluids <- pivot_longer(df_fluids, c(ftype1,ftype2,ftype3,ftype4))
df_fluids$name <- as.factor(df_fluids$name)
Applying a non-parametric ANOVA
# Testing Non-Parametric ANOVA
kruskal.test(value ~ name, data=df_fluids)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015
CONCLUSIONS:
From the ANOVA test: the p-value is 0.0525, so we fail to reject the null hypothesis at the 0.05 significance level.
From the non-parametric Kruskal-Wallis test: the p-value is 0.1015, so we fail to reject the null hypothesis at the 0.05 significance level.
With both tests, we fail to reject the null hypothesis.
While the ANOVA result is borderline (p = 0.0525), the Kruskal-Wallis test, which is less sensitive to violations of the parametric assumptions, suggests that the differences between the groups are even less evident.
Considering both tests, we conclude that there is no evidence of significant differences between the fluid types.
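Because the Kruskal-Wallis statistic is built from ranks rather than raw values, the mean rank per fluid type helps interpret the borderline result; a minimal sketch (note that df_fluids here keeps the default name and value columns):
# Mean rank of the observations within each fluid type
tapply(rank(df_fluids$value), df_fluids$name, mean)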
Use the Kruskal–Wallis test for the experiment in Problem 3.23. Are the results comparable to those found by the usual analysis of variance?
Data from experiment “Problem 3.23”
ftype1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
ftype2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
ftype3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
ftype4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
# Creating the Dataframe:
df_fluids <- data.frame(ftype1,ftype2,ftype3,ftype4)
df_fluids <- pivot_longer(df_fluids, c(ftype1,ftype2,ftype3,ftype4))
df_fluids$name <- as.factor(df_fluids$name)
Applying a non-parametric ANOVA
# Testing Non-Parametric ANOVA
kruskal.test(value ~ name, data=df_fluids)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015
CONCLUSIONS:
From the ANOVA test: the p-value is 0.0525, so we fail to reject the null hypothesis at the 0.05 significance level.
The Kruskal-Wallis test, being non-parametric, provides a more conservative result with a p-value of 0.1015, which also indicates no significant differences between the groups.
Both tests suggest no strong evidence of significant differences between the groups.
The results are comparable (but not identical): ANOVA gives a borderline result (p = 0.0525, just above \(\alpha = 0.05\)), while the Kruskal-Wallis test (p = 0.1015) points even more clearly toward no significant differences.
A chemist wishes to test the effect of four chemical agents on the strength of a particular type of cloth. Because there might be variability from one bolt to another, the chemist decides to use a randomized block design, with the bolts of cloth considered as blocks. She selects five bolts and applies all four chemicals in random order to each bolt. The resulting tensile strengths follow. Analyze the data from this experiment (use \(\alpha = 0.05\)) and draw appropriate conclusions.
Entering Data
# Entering Data
ch1 <- c(73,68,74,71,67)
ch2 <- c(73,67,75,72,70)
ch3 <- c(75,68,78,73,68)
ch4 <- c(73,71,75,75,69)
blts <-c(1,2,3,4,5)
obs <- c(ch1,ch2,ch3,ch4)
chem <-c(rep(1,5),rep(2,5),rep(3,5),rep(4,5))
chem <-as.fixed(chem)
bolts <- c(rep(blts,4))
bolts <-as.fixed(bolts)
Hypothesis test:
\[ H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0 \qquad H_1: \tau_i \neq 0 \text{ for at least one } i \]
Linear effects model: \[ y_{i,j} = \mu + \tau_i + \beta_j + \epsilon_{i,j} \] where \(\mu\) is the overall mean, \(\tau_i\) is the effect of the \(i\)-th chemical agent (treatment), \(\beta_j\) is the effect of the \(j\)-th bolt (block), and \(\epsilon_{i,j}\) is the random error term.
Applying the RCBD (Randomized Complete Block Design) test:
# Applying RCBD (Randomized Complete Block Design) test
model <- lm(obs ~ chem + bolts)
gad(model)
## $anova
## Analysis of Variance Table
##
## Response: obs
## Df Sum Sq Mean Sq F value Pr(>F)
## chem 3 12.95 4.317 2.3761 0.1211
## bolts 4 157.00 39.250 21.6055 2.059e-05 ***
## Residuals 12 21.80 1.817
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CONCLUSIONS:
Since the p-value for the chemical agents (0.1211) is greater than the significance level (0.05), we fail to reject the null hypothesis, indicating no significant differences between the treatment means. There is insufficient evidence of an effect of the chemical agent on tensile strength.
The F-statistic for the chemical agents is 2.3761.
The bolts are treated as blocks to control their influence; the block effect is large (p = 2.059e-05), so blocking removed substantial bolt-to-bolt variability and the treatment comparison reflects true differences between chemicals rather than variability between bolts.
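For reference, the same RCBD table can be reproduced with base R's aov when chemicals and bolts are both treated as fixed factors; a minimal sketch in which factor() replaces GAD's as.fixed and chem_f/bolts_f are hypothetical names:
# Equivalent RCBD ANOVA with base R, both factors fixed
chem_f <- factor(rep(1:4, each = 5))
bolts_f <- factor(rep(blts, 4))
summary(aov(obs ~ chem_f + bolts_f))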
Assuming that chemical types and bolts are fixed, estimate the model parameters \(\tau_i\) and \(\beta_j\) in Problem 4.3.
Linear effects model: \[ y_{i,j} = \mu + \tau_i + \beta_j + \epsilon_{i,j} \] where \(\mu\) is the overall mean, \(\tau_i\) is the effect of the \(i\)-th chemical agent and \(\beta_j\) is the effect of the \(j\)-th bolt.
With chemicals and bolts both fixed, the estimators are \(\hat{\mu} = \bar{y}_{..}\), \(\hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..}\) and \(\hat{\beta}_j = \bar{y}_{.j} - \bar{y}_{..}\). Considering those equations:
# Entering Data
ch1 <- c(73,68,74,71,67)
ch2 <- c(73,67,75,72,70)
ch3 <- c(75,68,78,73,68)
ch4 <- c(73,71,75,75,69)
blts <-c(1,2,3,4,5)
obs <- c(ch1,ch2,ch3,ch4)
matrix_data <- rbind(ch1, ch2, ch3, ch4)
bolt1 <- matrix_data[,1]
bolt2 <- matrix_data[,2]
bolt3 <- matrix_data[,3]
bolt4 <- matrix_data[,4]
bolt5 <- matrix_data[,5]
# Calculate the means
grand_mean <- mean(obs)
mean_ch1 <- mean(ch1)
mean_ch2 <- mean(ch2)
mean_ch3 <- mean(ch3)
mean_ch4 <- mean(ch4)
mean_bolt1 <- mean(bolt1)
mean_bolt2 <- mean(bolt2)
mean_bolt3 <- mean(bolt3)
mean_bolt4 <- mean(bolt4)
mean_bolt5 <- mean(bolt5)
# Calculate the t_i
t1 <- mean_ch1 - grand_mean
t2 <- mean_ch2 - grand_mean
t3 <- mean_ch3 - grand_mean
t4 <- mean_ch4 - grand_mean
# Calculate the B_j
B1 <- mean_bolt1 - grand_mean
B2 <- mean_bolt2 - grand_mean
B3 <- mean_bolt3 - grand_mean
B4 <- mean_bolt4 - grand_mean
B5 <- mean_bolt5 - grand_mean
# Results
t_values <- c(t1, t2, t3, t4)
B_values <- c(B1, B2, B3, B4, B5)
values <- c(t_values, B_values)
results <- data.frame(values)
rownames(results) <- c("t1", "t2", "t3", "t4", "B1", "B2", "B3", "B4", "B5")
results
## values
## t1 -1.15
## t2 -0.35
## t3 0.65
## t4 0.85
## B1 1.75
## B2 -3.25
## B3 3.75
## B4 1.00
## B5 -3.25
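These effect estimates can also be cross-checked against a fitted RCBD model with base R's model.tables; a minimal sketch in which chem_fac, bolt_fac and rcbd_fit are hypothetical names built from the same layout as above:
# Effect estimates (deviations from the grand mean) from an aov fit
chem_fac <- factor(rep(1:4, each = 5))
bolt_fac <- factor(rep(1:5, times = 4))
rcbd_fit <- aov(obs ~ chem_fac + bolt_fac)
model.tables(rcbd_fit, type = "effects")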
The effect of five different ingredients (A, B, C, D, E) on the reaction time of a chemical process is being studied. Each batch of new material is only large enough to permit five runs to be made. Furthermore, each run requires approximately 1.5 hours, so only five runs can be made in one day. The experimenter decides to run the experiment as a Latin square so that day and batch effects may be systematically controlled. She obtains the data that follow. Analyze the data from this experiment (use \(\alpha = 0.05\)) and draw conclusions.
This is a valid Latin square design, as each treatment (ingredient A, B, C, D, E) appears exactly once in each row (day) and in each column (batch), preventing repetitions. This ensures orthogonality in the design. Additionally, it effectively controls two sources of variability—days and batches—which are known and manageable factors.
# Entering Data
b1 <- c(8,7,1,7,3)
b2 <- c(11,2,7,3,8)
b3 <- c(4,9,10,1,5)
b4 <- c(6,8,6,6,10)
b5 <- c(4,2,3,8,8)
obs <- c(b1,b2,b3,b4,b5)
days <- c(1,2,3,4,5)
ingredients <- c("A", "B", "D", "C", "E",
"C", "E", "A", "D", "B",
"B", "A", "C", "E", "D",
"D", "C", "E", "B", "A",
"E", "D", "B", "A", "C")
bq_day <- rep(days,5)
bq_batch <-c(rep(1,5),rep(2,5),rep(3,5),rep(4,5),rep(5,5))
# Create a Dataframe
df <- data.frame(obs, ingredients, bq_batch, bq_day)
df$ingredients <- as.factor(df$ingredients)
df$bq_batch <- as.factor(df$bq_batch)
df$bq_day <- as.factor(df$bq_day)
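A quick way to verify the Latin square property described above (each ingredient appearing exactly once per batch and once per day) is to cross-tabulate the design; a minimal sketch:
# Every cell should equal 1 in a proper Latin square
table(df$ingredients, df$bq_batch)
table(df$ingredients, df$bq_day)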
Hypothesis test:
\[ H_0: \tau_1 = \tau_2 = \dots = \tau_5 = 0 \qquad H_1: \tau_i \neq 0 \text{ for at least one ingredient } i \]
Linear effects model: \[ y_{i,j,k} = \mu + \tau_i + \beta_j + \alpha_k + \epsilon_{i,j,k} \] where \(\mu\) is the overall mean, \(\tau_i\) is the effect of the \(i\)-th ingredient (treatment), \(\beta_j\) is the effect of the \(j\)-th batch, \(\alpha_k\) is the effect of the \(k\)-th day, and \(\epsilon_{i,j,k}\) is the random error term.
# Applying ANOVA Test for Latin Square Designs
aov.model<-aov(obs ~ ingredients + bq_batch + bq_day, data=df)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## ingredients 4 141.44 35.36 11.309 0.000488 ***
## bq_batch 4 15.44 3.86 1.235 0.347618
## bq_day 4 12.24 3.06 0.979 0.455014
## Residuals 12 37.52 3.13
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CONCLUSIONS:
Since the p-value for the ingredients (0.000488) is less than the significance level (0.05), we reject the null hypothesis: the ingredients have a significant effect on the reaction time of the process. Neither blocking factor is significant (batches: p = 0.3476, days: p = 0.4550), but both sources of variability were controlled by the Latin square design.
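Consistent with the residual analysis carried out for the earlier problems, the assumptions of the Latin square model can be checked in the same way; a minimal sketch reusing the fitted aov.model:
# Residuals vs fitted and normal Q-Q for the Latin square model
plot(aov.model, which = 1)
plot(aov.model, which = 2)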
# Libraries
library(dplyr)
library(tidyr)
library(MASS)
library(GAD)
# QUESTION 3.23 #######################################################################################
ftype1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
ftype2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
ftype3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
ftype4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
# Observations
obs <- c(ftype1,ftype2,ftype3,ftype4)
# Mean from each treatment
means <-c(rep(mean(ftype1),6),rep(mean(ftype2),6), rep(mean(ftype3),6),rep(mean(ftype4),6))
# Residuals
res <- obs - means
# Creating the Dataframe:
df_fluids <- data.frame(ftype1,ftype2,ftype3,ftype4)
df_fluids <- pivot_longer(df_fluids, c(ftype1,ftype2,ftype3,ftype4))
df_fluids$name <- as.factor(df_fluids$name)
colnames(df_fluids) <-c("ftype","life")
rmarkdown::paged_table(df_fluids)
# PART A:
# Check the assumption of normality and constant variance.
## QQPlot
qqnorm(res, main="Normal QQ Plot")
## Residuals vs Pop means
plot(means,res, main="Residuals vs Populations means")
## Boxplot
boxplot(life ~ ftype, data=df_fluids, main ="BoxPlot of observations")
# Testing ANOVA
aov.model <- aov(life ~ ftype, data = df_fluids)
summary(aov.model)
# PART B:
## Boxplot
boxplot(life ~ ftype, data=df_fluids, main ="BoxPlot of observations")
# PART C:
# Plotting Residuals vs Fitted
plot(aov.model, which = 1)
# Plotting QQ Normal
plot(aov.model, which = 2)
# QUESTION 3.28 #######################################################################################
mat1 <- c(110, 157, 194, 178)
mat2 <- c(1, 2, 4, 18)
mat3 <- c(880, 1256, 5276, 4355)
mat4 <- c(495, 7040, 5307, 10050)
mat5 <- c(7, 5, 29, 2)
# Observations
obs <- c(mat1, mat2, mat3, mat4, mat5)
# Mean from each treatment
means <-c(rep(mean(mat1),4),rep(mean(mat2),4), rep(mean(mat3),4),rep(mean(mat4),4), rep(mean(mat5),4))
# Residuals
res <- obs - means
# Creating the Dataframe:
df_materials <- data.frame(mat1, mat2, mat3, mat4, mat5)
df_materials <- pivot_longer(df_materials, c(mat1, mat2, mat3, mat4, mat5))
df_materials$name <- as.factor(df_materials$name)
colnames(df_materials) <-c("Material","Fail_time")
rmarkdown::paged_table(df_materials)
# PART A:
# Check the assumption of normality and constant variance.
## QQPlot
qqnorm(res, main="Normal QQ Plot")
## Residuals vs Pop means
plot(means,res, main="Residuals vs Populations means")
## Boxplot
boxplot(Fail_time ~ Material, data=df_materials, main ="BoxPlot of observations")
# Testing Non-Parametric ANOVA
kruskal.test(Fail_time ~ Material, data=df_materials)
# PART B:
aov.model <- aov(Fail_time ~ Material, data=df_materials)
# Plotting Residuals vs Fitted
plot(aov.model, which = 1)
# Plotting QQ Normal
plot(aov.model, which = 2)
# PART C:
boxcox(Fail_time ~ Material, data=df_materials)
lambda = 0.001
df_materials_v2 <- df_materials
df_materials_v2$Fail_time <- df_materials_v2$Fail_time^lambda
aov.model_2 <- aov(Fail_time ~ Material, data=df_materials_v2)
# Plotting Residuals vs Fitted
plot(aov.model_2, which = 1)
# Plotting QQ Normal
plot(aov.model_2, which = 2)
## Boxplot
boxplot(Fail_time ~ Material, data=df_materials_v2, main ="BoxPlot of observations")
# Testing Non-Parametric ANOVA
kruskal.test(Fail_time ~ Material, data=df_materials_v2)
# QUESTION 3.29 #######################################################################################
method1 <- c(31, 10, 21, 4, 1)
method2 <- c(62, 40, 24, 30, 35)
method3 <- c(53, 27, 120, 97, 68)
# Observations
obs <- c(method1, method2, method3)
# Mean from each treatment
means <-c(rep(mean(method1),5), rep(mean(method2),5), rep(mean(method3),5))
# Residuals
res <- obs - means
# Creating the Dataframe:
df_methods <- data.frame(method1, method2, method3)
df_methods <- pivot_longer(df_methods, c(method1, method2, method3))
df_methods$name <- as.factor(df_methods$name)
colnames(df_methods) <-c("Methods","Count")
rmarkdown::paged_table(df_methods)
# PART A:
# Check the assumption of normality and constant variance.
## QQPlot
qqnorm(res, main="Normal QQ Plot")
## Residuals vs Pop means
plot(means,res, main="Residuals vs Populations means")
## Boxplot
boxplot(Count ~ Methods, data=df_methods, main ="BoxPlot of observations")
# Testing Non-Parametric ANOVA
kruskal.test(Count ~ Methods, data=df_methods)
# PART B:
aov.model <- aov(Count ~ Methods, data=df_methods)
# Plotting Residuals vs Fitted
plot(aov.model, which = 1)
# Plotting QQ Normal
plot(aov.model, which = 2)
# PART C:
boxcox(Count ~ Methods, data=df_methods)
lambda = 0.42
df_methods_v2 <- df_methods
df_methods_v2$Count <- df_methods_v2$Count^lambda
# Re-run Box-Cox on the transformed data: lambda = 1 should now lie inside the confidence interval
boxcox(Count ~ Methods, data=df_methods_v2)
aov.model_2 <- aov(Count ~ Methods, data=df_methods_v2)
# Plotting Residuals vs Fitted
plot(aov.model_2, which = 1)
# Plotting QQ Normal
plot(aov.model_2, which = 2)
## Boxplot
boxplot(Count ~ Methods, data=df_methods_v2, main ="BoxPlot of observations")
# Testing ANOVA on the transformed data
aov.model_2 <- aov(Count ~ Methods, data=df_methods_v2)
summary(aov.model_2)
# QUESTION 3.51 #######################################################################################
ftype1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
ftype2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
ftype3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
ftype4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
# Creating the Dataframe:
df_fluids <- data.frame(ftype1,ftype2,ftype3,ftype4)
df_fluids <- pivot_longer(df_fluids, c(ftype1,ftype2,ftype3,ftype4))
df_fluids$name <- as.factor(df_fluids$name)
# PART A:
# Testing Non-Parametric ANOVA
kruskal.test(value ~ name, data=df_fluids)
# QUESTION 3.52 #######################################################################################
ftype1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
ftype2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
ftype3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
ftype4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
# Creating the Dataframe:
df_fluids <- data.frame(ftype1,ftype2,ftype3,ftype4)
df_fluids <- pivot_longer(df_fluids, c(ftype1,ftype2,ftype3,ftype4))
df_fluids$name <- as.factor(df_fluids$name)
# PART A:
# Testing Non-Parametric ANOVA
kruskal.test(value ~ name, data=df_fluids)
# QUESTION 4.3 ########################################################################################
ch1 <- c(73,68,74,71,67)
ch2 <- c(73,67,75,72,70)
ch3 <- c(75,68,78,73,68)
ch4 <- c(73,71,75,75,69)
blts <-c(1,2,3,4,5)
obs <- c(ch1,ch2,ch3,ch4)
chem <-c(rep(1,5),rep(2,5),rep(3,5),rep(4,5))
chem <-as.fixed(chem)
bolts <- c(rep(blts,4))
bolts <-as.fixed(bolts)
# Applying RCBD (Randomized Complete Block Design) test
model <- lm(obs ~ chem + bolts)
gad(model)
# QUESTION 4.16 #######################################################################################
# Entering Data
ch1 <- c(73,68,74,71,67)
ch2 <- c(73,67,75,72,70)
ch3 <- c(75,68,78,73,68)
ch4 <- c(73,71,75,75,69)
blts <-c(1,2,3,4,5)
obs <- c(ch1,ch2,ch3,ch4)
matrix_data <- rbind(ch1, ch2, ch3, ch4)
bolt1 <- matrix_data[,1]
bolt2 <- matrix_data[,2]
bolt3 <- matrix_data[,3]
bolt4 <- matrix_data[,4]
bolt5 <- matrix_data[,5]
# Calculate the means
grand_mean <- mean(obs)
mean_ch1 <- mean(ch1)
mean_ch2 <- mean(ch2)
mean_ch3 <- mean(ch3)
mean_ch4 <- mean(ch4)
mean_bolt1 <- mean(bolt1)
mean_bolt2 <- mean(bolt2)
mean_bolt3 <- mean(bolt3)
mean_bolt4 <- mean(bolt4)
mean_bolt5 <- mean(bolt5)
# Calculate the t_i
t1 <- mean_ch1 - grand_mean
t2 <- mean_ch2 - grand_mean
t3 <- mean_ch3 - grand_mean
t4 <- mean_ch4 - grand_mean
# Calculate the B_j
B1 <- mean_bolt1 - grand_mean
B2 <- mean_bolt2 - grand_mean
B3 <- mean_bolt3 - grand_mean
B4 <- mean_bolt4 - grand_mean
B5 <- mean_bolt5 - grand_mean
# Results
t_values <- c(t1, t2, t3, t4)
B_values <- c(B1, B2, B3, B4, B5)
values <- c(t_values, B_values)
results <- data.frame(values)
rownames(results) <- c("t1", "t2", "t3", "t4", "B1", "B2", "B3", "B4", "B5")
results
# QUESTION 4.22 #######################################################################################
# Entering Data
b1 <- c(8,7,1,7,3)
b2 <- c(11,2,7,3,8)
b3 <- c(4,9,10,1,5)
b4 <- c(6,8,6,6,10)
b5 <- c(4,2,3,8,8)
obs <- c(b1,b2,b3,b4,b5)
days <- c(1,2,3,4,5)
ingredients <- c("A", "B", "D", "C", "E",
"C", "E", "A", "D", "B",
"B", "A", "C", "E", "D",
"D", "C", "E", "B", "A",
"E", "D", "B", "A", "C")
bq_day <- rep(days,5)
bq_batch <-c(rep(1,5),rep(2,5),rep(3,5),rep(4,5),rep(5,5))
# Create a Dataframe
df <- data.frame(obs, ingredients, bq_batch, bq_day)
df$ingredients <- as.factor(df$ingredients)
df$bq_batch <- as.factor(df$bq_batch)
df$bq_day <- as.factor(df$bq_day)
# Applying ANOVA Test for Latin Square Designs
aov.model<-aov(obs ~ ingredients + bq_batch + bq_day, data=df)
summary(aov.model)