Necessary libraries:

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.6     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.9
## v tidyr   1.2.1     v stringr 1.4.1
## v readr   2.1.2     v forcats 0.5.2
## Warning: package 'ggplot2' was built under R version 4.1.3
## Warning: package 'tibble' was built under R version 4.1.3
## Warning: package 'tidyr' was built under R version 4.1.3
## Warning: package 'readr' was built under R version 4.1.3
## Warning: package 'purrr' was built under R version 4.1.3
## Warning: package 'dplyr' was built under R version 4.1.3
## Warning: package 'stringr' was built under R version 4.1.3
## Warning: package 'forcats' was built under R version 4.1.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lawstat)
## Warning: package 'lawstat' was built under R version 4.1.3
library(agricolae)
## Warning: package 'agricolae' was built under R version 4.1.3
library(MASS)
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select

Answer to question no-3.23

Data entry and sorting:

fluid1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
fluid2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
fluid3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
fluid4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
fluid <- data.frame(fluid1, fluid2, fluid3, fluid4)
fluidlong <- pivot_longer(fluid, c(fluid1, fluid2, fluid3, fluid4))
fluidlong$name <- as.factor(fluidlong$name)

Answer no-3.23a

First, we check whether the data are approximately normal and whether the group variances are similar, to decide whether a parametric test is appropriate:

boxplot(fluid)

qqnorm(fluid1)
qqline(fluid1)

qqnorm(fluid2)
qqline(fluid2)

qqnorm(fluid3)
qqline(fluid3)

qqnorm(fluid4)
qqline(fluid4)

levene.test(fluidlong$value, fluidlong$name, location ="mean")
## 
##  Classical Levene's test based on the absolute deviations from the mean
##  ( none not applied because the location is not set to median )
## 
## data:  fluidlong$value
## Test Statistic = 0.1462, p-value = 0.9309

The initial checks show that the variances are similar, as indicated by the boxplot and by Levene's test (p-value = 0.9309). The normal Q-Q plots suggest that the data for each fluid type are approximately normally distributed. Hence we proceed with ANOVA.
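As a cross-check on the Levene result above, the classical (mean-based) Levene statistic can be reproduced in base R as a one-way ANOVA on the absolute deviations from each group mean. This is a minimal sketch; the names `fluid_df`, `abs_dev` and `levene_tab` are illustrative and not part of the original analysis.

```r
# Fluid life data from the data entry step, rebuilt in long form (base R only)
fluid_df <- data.frame(
  value = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
            16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
            21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
            19.3, 21.1, 16.9, 17.5, 18.3, 19.8),
  name  = factor(rep(paste0("fluid", 1:4), each = 6))
)

# Absolute deviation of each observation from its own group mean
fluid_df$abs_dev <- abs(fluid_df$value - ave(fluid_df$value, fluid_df$name))

# One-way ANOVA on the absolute deviations; the F statistic for `name`
# is the classical Levene statistic (location = "mean")
levene_tab <- anova(lm(abs_dev ~ name, data = fluid_df))
levene_F <- levene_tab[["F value"]][1]
levene_p <- levene_tab[["Pr(>F)"]][1]
print(c(F = levene_F, p = levene_p))
```

The printed values should agree with the `levene.test` output above (test statistic about 0.146, p-value about 0.93).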

If u1, u2, u3, u4 are the mean lives of fluid types 1-4 respectively,

Null hypothesis, Ho: u1 = u2 = u3 = u4

Alternative hypothesis, Ha: At least one of the means differs

anova <- aov(value~name, data = fluidlong)
summary(anova)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## name         3  30.16   10.05   3.047 0.0525 .
## Residuals   20  65.99    3.30                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the ANOVA result, the p-value (0.0525) is greater than alpha = 0.05. Hence we fail to reject the null hypothesis that the mean lives of all the fluids are the same.

At the 0.05 level there is no strong evidence that the fluids differ, although the result is borderline.
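The same decision can be framed as comparing the F statistic with a critical value. The sketch below refits the model on a base R copy of the data and reads off the F statistic, its p-value, and the critical value of the F distribution with 3 and 20 degrees of freedom. The object names (`fluid_df`, `fit`, `f_crit`) and the choice alpha = 0.05 are assumptions made for illustration.

```r
# Fluid life data in long form (base R only)
fluid_df <- data.frame(
  value = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
            16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
            21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
            19.3, 21.1, 16.9, 17.5, 18.3, 19.8),
  name  = factor(rep(paste0("fluid", 1:4), each = 6))
)

alpha <- 0.05  # assumed significance level

fit <- aov(value ~ name, data = fluid_df)
tab <- summary(fit)[[1]]

f_stat <- tab[["F value"]][1]  # MS(treatment) / MS(error)
p_val  <- tab[["Pr(>F)"]][1]
df1    <- tab[["Df"]][1]       # treatment degrees of freedom
df2    <- tab[["Df"]][2]       # error degrees of freedom

# Upper-alpha critical value of the F distribution
f_crit <- qf(1 - alpha, df1, df2)

# Reject Ho only if the observed F exceeds the critical value
reject <- f_stat > f_crit
print(c(F = f_stat, F_crit = f_crit, p = p_val))
print(reject)
```

Comparing `f_stat` with `f_crit` is equivalent to comparing `p_val` with `alpha`, so both routes give the same decision.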

Answer no-3.23b

boxplot(fluid)

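From the boxplot we see that fluid 3 has the highest median, maximum and minimum values compared to the other fluids. Its sample mean (20.95) is also the highest, although the boxplot itself does not display means.

Hence, we would choose fluid 3 if the objective is long fluid life, keeping in mind that the ANOVA in part (a) did not find the difference significant at alpha = 0.05.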
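Because a boxplot shows medians rather than means, the group summaries can also be computed directly. A minimal sketch, with `fluid_df`, `group_means` and `best` as illustrative names:

```r
# Fluid life data in long form (base R only)
fluid_df <- data.frame(
  value = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
            16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
            21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
            19.3, 21.1, 16.9, 17.5, 18.3, 19.8),
  name  = factor(rep(paste0("fluid", 1:4), each = 6))
)

# Per-fluid mean and median life
group_means   <- tapply(fluid_df$value, fluid_df$name, mean)
group_medians <- tapply(fluid_df$value, fluid_df$name, median)
print(round(rbind(mean = group_means, median = group_medians), 2))

# Fluid with the largest sample mean
best <- names(which.max(group_means))
print(best)
```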

Answer no-3.23c

We check the residuals of our ANOVA:

plot(anova)

In the normal Q-Q plot the residuals fall close to a straight line, indicating approximate normality. In the residuals vs fitted values plot the vertical spread of the residuals is roughly the same at each fitted value, indicating that the variance is approximately constant across groups.

Hence the model is adequate.

Answer to question no-3.51

We compare the ANOVA results from 3.23 with the Kruskal-Wallis test:

summary(anova)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## name         3  30.16   10.05   3.047 0.0525 .
## Residuals   20  65.99    3.30                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
kruskal.test(value~name, data = fluidlong)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by name
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015

The p-value from the Kruskal-Wallis test (0.1015) is higher than alpha = 0.05. Hence we fail to reject the null hypothesis that the lives of the different fluids have the same distribution.

With both ANOVA and the Kruskal-Wallis test we reach the same conclusion: there is not enough evidence that the fluid lives differ.
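To make the Kruskal-Wallis statistic less of a black box, it can be computed by hand from the ranks: H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), divided by a tie correction. The sketch below does this in base R and compares the result with `kruskal.test`; the names `fluid_df`, `H_adj` and `kw` are illustrative.

```r
# Fluid life data in long form (base R only)
fluid_df <- data.frame(
  value = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
            16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
            21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
            19.3, 21.1, 16.9, 17.5, 18.3, 19.8),
  name  = factor(rep(paste0("fluid", 1:4), each = 6))
)

N <- nrow(fluid_df)
r <- rank(fluid_df$value)                      # midranks are used for ties

rank_sums <- tapply(r, fluid_df$name, sum)     # R_i: rank sum per fluid
n_i       <- tapply(r, fluid_df$name, length)  # n_i: group sizes

# Uncorrected H statistic
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Tie correction: t = number of observations sharing each distinct value
tie_sizes <- as.vector(table(fluid_df$value))
H_adj <- H / (1 - sum(tie_sizes^3 - tie_sizes) / (N^3 - N))

# Chi-squared approximation with (number of groups - 1) degrees of freedom
p_val <- pchisq(H_adj, df = nlevels(fluid_df$name) - 1, lower.tail = FALSE)

kw <- kruskal.test(value ~ name, data = fluid_df)
print(c(by_hand = H_adj, kruskal_test = unname(kw$statistic), p = p_val))
```

The hand-computed statistic and p-value should match the `kruskal.test` output shown above (6.2177 and 0.1015).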

Answer to question no-3.52

We compare the ANOVA results from 3.23 with the Kruskal-Wallis results from 3.51:

summary(anova)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## name         3  30.16   10.05   3.047 0.0525 .
## Residuals   20  65.99    3.30                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
kruskal.test(value~name, data = fluidlong)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by name
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015

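The Kruskal-Wallis test gives a higher p-value (0.1015) than ANOVA (0.0525). A larger p-value is not stronger evidence for the null hypothesis; it only means the rank-based test finds the data less incompatible with it, which is expected because the Kruskal-Wallis test is usually somewhat less powerful than ANOVA when the normality assumption holds. In both tests we fail to reject the null hypothesis.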

Answer to question no-3.28

material1 <- c(110, 157, 194, 178)
material2 <- c(1, 2, 4 ,18) 
material3 <- c(880, 1256, 5276, 4355)
material4 <- c(495, 7040, 5307, 10050)
material5 <- c(7, 5, 29, 2)
material <- data.frame(material1, material2, material3, material4, material5)


materiallong <- pivot_longer(material, c(material1, material2, material3, material4, material5))
#materiallong
materiallong$name <- as.factor(materiallong$name)
#str(materiallong)

Answer to question no-3.28a

If u1, u2, u3, u4, u5 are the mean failure times for materials 1-5 respectively,

Null hypothesis, Ho: u1 = u2 = u3 = u4 = u5

Alternative hypothesis, Ha: At least one of the means differs.

anova2 <- aov(value~name, data=materiallong)
summary(anova2)
##             Df    Sum Sq  Mean Sq F value  Pr(>F)   
## name         4 103191489 25797872   6.191 0.00379 **
## Residuals   15  62505657  4167044                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA results, the p-value (0.00379) is lower than the significance level of 0.05. Hence we reject the null hypothesis that the mean failure times are the same for all the materials.

The materials do not all have the same mean failure time.

Answer to question no-3.28b

We plot the results of ANOVA as follows:

anova2 <- aov(value~name, data=materiallong)
plot(anova2)

In the normal probability plot the residuals do not follow a straight line, indicating that they deviate from normality.

In the residuals vs fitted values plot the spread of the residuals is not the same for all the groups; it widens as the fitted value increases, forming a funnel-like shape. This indicates that the group variances are not equal.
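The funnel shape can be quantified by looking at how the group standard deviation grows with the group mean. A common empirical guide, which is not part of the original analysis, is that if the standard deviation is roughly proportional to mean^a, a power transformation with lambda = 1 - a tends to stabilize the variance (lambda = 0 meaning a log transformation). The sketch below estimates `a` as the slope of log(sd) on log(mean); all object names are illustrative.

```r
# Failure time data in long form (base R only)
material_df <- data.frame(
  value = c(110, 157, 194, 178,
            1, 2, 4, 18,
            880, 1256, 5276, 4355,
            495, 7040, 5307, 10050,
            7, 5, 29, 2),
  name  = factor(rep(paste0("material", 1:5), each = 4))
)

# Per-material mean and standard deviation
grp_mean <- as.numeric(tapply(material_df$value, material_df$name, mean))
grp_sd   <- as.numeric(tapply(material_df$value, material_df$name, sd))
print(data.frame(material = levels(material_df$name),
                 mean = grp_mean, sd = grp_sd))

# Slope of log(sd) against log(mean) estimates the exponent a
spread_fit <- lm(log(grp_sd) ~ log(grp_mean))
slope <- unname(coef(spread_fit)[2])

# Rough guide to the variance-stabilizing power (assumption: sd ~ mean^a)
lambda_guess <- 1 - slope
print(c(slope = slope, lambda_guess = lambda_guess))
```

With only five groups this is a rough diagnostic rather than a formal estimate, but a slope near 1 points toward a log transformation.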

Answer to question no-3.28c

We perform a Box-Cox analysis on the data to try to stabilize the variance:

library(MASS)
lm_material <- lm(value~name, data = materiallong)
boxcox(lm_material)

materiallong_modified <- materiallong
materiallong_modified$value <- log(materiallong_modified$value)
anova_modified_material <- aov(value~name, materiallong_modified)
plot(anova_modified_material)

From the Box-Cox plot, we see that the value of lambda that maximizes the log-likelihood is close to zero. Hence we apply a log transformation to the data.

After the transformation, we fit the ANOVA again. This time the spread of the residuals is similar across groups in the residuals vs fitted values plot, and the points in the normal Q-Q plot fall close to a straight line, indicating that the residuals are fairly normally distributed.

The transformation changes the F statistic and p-value, but it does not change the conclusion from the ANOVA.
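The log-transformed model above is only plotted, so its ANOVA table is never printed. A minimal sketch that fits both models on a base R copy of the data and extracts the two p-values side by side (`material_df`, `fit_raw`, `fit_log` are illustrative names):

```r
# Failure time data in long form (base R only)
material_df <- data.frame(
  value = c(110, 157, 194, 178,
            1, 2, 4, 18,
            880, 1256, 5276, 4355,
            495, 7040, 5307, 10050,
            7, 5, 29, 2),
  name  = factor(rep(paste0("material", 1:5), each = 4))
)

# ANOVA on the original scale and on the log scale
fit_raw <- aov(value ~ name, data = material_df)
fit_log <- aov(log(value) ~ name, data = material_df)

print(summary(fit_log))

# p-values for the material effect under each model
p_raw <- summary(fit_raw)[[1]][["Pr(>F)"]][1]
p_log <- summary(fit_log)[[1]][["Pr(>F)"]][1]
print(c(p_raw = p_raw, p_log = p_log))
```

The two p-values will generally differ; what matters for the conclusion is whether each is below the chosen significance level.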

Answer to question no-3.29

Data entry and sorting:

method1 <- c(31, 10, 21, 4, 1)
method2 <- c(62, 40, 24, 30, 35)
method3 <- c(53, 27, 120, 97, 68)
method <- data.frame(method1, method2, method3)

methodlong <- pivot_longer(method, c(method1, method2, method3))
methodlong$name <- as.factor(methodlong$name)

Answer to question no-3.29a

If u1, u2, u3 are the mean counts for methods 1-3 respectively,

Null hypothesis, Ho: u1 = u2 = u3

Alternative hypothesis, Ha: At least one of the means differs.

anova3 <- aov(value~name, methodlong)
summary(anova3)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## name         2   8964    4482   7.914 0.00643 **
## Residuals   12   6796     566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see from the ANOVA results that the p-value (0.00643) is less than the significance level alpha = 0.05. Hence we reject the null hypothesis that the mean counts are the same for the different methods.

At least one of the method means differs significantly from the others.
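The ANOVA only says that some difference exists. To see which pairs of methods differ, a pairwise comparison can follow. The complete code at the end uses `LSD.test` from agricolae; the sketch below swaps in base R's `TukeyHSD` so that it runs without extra packages. The names `method_df`, `fit` and `tukey_tab` are illustrative.

```r
# Count data in long form (base R only)
method_df <- data.frame(
  value = c(31, 10, 21, 4, 1,
            62, 40, 24, 30, 35,
            53, 27, 120, 97, 68),
  name  = factor(rep(paste0("method", 1:3), each = 5))
)

fit <- aov(value ~ name, data = method_df)

# Tukey honest significant differences for all pairs of methods
tukey <- TukeyHSD(fit, "name", conf.level = 0.95)
print(tukey)

# Matrix with columns diff, lwr, upr and "p adj", one row per pair
tukey_tab <- tukey$name
```

Pairs whose adjusted p-value is below 0.05 (equivalently, whose interval excludes zero) are the ones that differ. These comparisons use the untransformed data, so they inherit the unequal-variance concern discussed in part (b).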

Answer to question no-3.29b

We plot the results of ANOVA as follows:

plot(anova3)

From the normal Q-Q plot, we see that the residuals are fairly normally distributed.

From the residuals vs fitted values plot, however, we see that the variances are not the same for all the groups. The spread widens with the fitted value in a funnel-like shape, indicating unequal variances across groups.

Answer to question no-3.29c

We perform a Box-Cox analysis on the data to try to stabilize the variance:

library(MASS)
lm_method <- lm(value~name, data = methodlong)
boxcox(lm_method)

lambda <- 0.35
methodlong_modified <- methodlong
methodlong_modified$value <- (methodlong_modified$value)^lambda
anova_modified_method <- aov(value~name, methodlong_modified)
boxcox(anova_modified_method)

plot(anova_modified_method)

From the Box-Cox plot we see that the 95% confidence interval for lambda runs from about 0.1 to 0.8. We choose lambda = 0.35. With the transformed data, the residuals are approximately normally distributed, and the variance is more stable, with a more even band of residuals in the residuals vs fitted values plot.

This does not change our initial conclusion: we still reject the null hypothesis based on the p-value.
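Instead of reading lambda off the plot, the profile log-likelihood returned by `boxcox` can be used to extract the maximizing lambda and an approximate 95% interval (the values of lambda whose log-likelihood lies within qchisq(0.95, 1) / 2 of the maximum). The sketch below also refits the ANOVA on the transformed response. The finer lambda grid and the object names are assumptions made for illustration.

```r
library(MASS)

# Count data in long form
method_df <- data.frame(
  value = c(31, 10, 21, 4, 1,
            62, 40, 24, 30, 35,
            53, 27, 120, 97, 68),
  name  = factor(rep(paste0("method", 1:3), each = 5))
)

# Profile log-likelihood over a fine grid of lambda values, without plotting
bc <- boxcox(lm(value ~ name, data = method_df),
             lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Lambda with the highest log-likelihood on the grid
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval from the likelihood-ratio cutoff
cutoff    <- max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[bc$y >= cutoff])
print(c(lambda_hat = lambda_hat, lower = lambda_ci[1], upper = lambda_ci[2]))

# ANOVA on the power-transformed response with the lambda used in the text
lambda <- 0.35
fit_bc <- aov(value^lambda ~ name, data = method_df)
print(summary(fit_bc))
p_bc <- summary(fit_bc)[[1]][["Pr(>F)"]][1]
```

Any lambda inside the interval is a defensible choice; a rounded value such as 0.5 (square root) is often preferred for interpretability when it falls inside the interval.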

Complete Code Chunk:

# Answer to problem no 3.23
library(tidyverse)
library(lawstat)
library(agricolae)
library(MASS)

fluid1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
fluid2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
fluid3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
fluid4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)

qqnorm(fluid1)
qqline(fluid1)
qqnorm(fluid2)
qqline(fluid2)
qqnorm(fluid3)
qqline(fluid3)
qqnorm(fluid4)
qqline(fluid4)


fluid <- data.frame(fluid1, fluid2, fluid3, fluid4)
boxplot(fluid)
fluidlong <- pivot_longer(fluid, c(fluid1, fluid2, fluid3, fluid4))
fluidlong$name <- as.factor(fluidlong$name)
str(fluidlong)
# ?levene.test  # uncomment to open the help page in an interactive session
levene.test(fluidlong$value, fluidlong$name, location ="mean")


anova <- aov(value~name, data = fluidlong)
summary(anova)
plot(anova)

# Answer to problem no 3.51
kruskal.test(value~name, data = fluidlong)

# Answer to problem no 3.28
material1 <- c(110, 157, 194, 178)
material2 <- c(1, 2, 4 ,18) 
material3 <- c(880, 1256, 5276, 4355)
material4 <- c(495, 7040, 5307, 10050)
material5 <- c(7, 5, 29, 2)

qqnorm(material1)
qqline(material1)
qqnorm(material2)
qqline(material2)
qqnorm(material3)
qqline(material3)
qqnorm(material4)
qqline(material4)
qqnorm(material5)
qqline(material5)

material <- data.frame(material1, material2, material3, material4, material5)
boxplot(material)

materiallong <- pivot_longer(material, c(material1, material2, material3, material4, material5))
materiallong
materiallong$name <- as.factor(materiallong$name)
str(materiallong)

levene.test(materiallong$value, materiallong$name)

anova2 <- aov(value~name, data=materiallong)
summary(anova2)
plot(anova2)

kruskal.test(value~name, data=materiallong)
# ?LSD.test  # uncomment to open the help page in an interactive session
lsdmodel <- LSD.test(anova2, "name")
summary(lsdmodel)
plot(lsdmodel)

lm_material <- lm(value~name, data = materiallong)
boxcox(lm_material)
materiallong_modified <- materiallong
materiallong_modified$value <- log(materiallong_modified$value)
anova_modified_material <- aov(value~name, materiallong_modified)
plot(anova_modified_material)

# Answer to problem no 3.29
method1 <- c(31, 10, 21, 4, 1)
method2 <- c(62, 40, 24, 30, 35)
method3 <- c(53, 27, 120, 97, 68)

qqnorm(method1)
qqline(method1)
qqnorm(method2)
qqline(method2)
qqnorm(method3)
qqline(method3)

method <- data.frame(method1, method2, method3)

methodlong <- pivot_longer(method, c(method1, method2, method3))
methodlong$name <- as.factor(methodlong$name)
str(methodlong)

anova3 <- aov(value~name, methodlong)
summary(anova3)

plot(anova3)

lm_method <- lm(value~name, data = methodlong)
boxcox(lm_method)
lambda <- 0.35
methodlong_modified <- methodlong
methodlong_modified$value <- (methodlong_modified$value)^lambda
anova_modified_method <- aov(value~name, methodlong_modified)
boxcox(anova_modified_method)
plot(anova_modified_method)




kruskal.test(value~name, data=methodlong)
plot(LSD.test(anova3, "name"))