Necessary libraries:
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.6 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.9
## v tidyr 1.2.1 v stringr 1.4.1
## v readr 2.1.2 v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(lawstat)
library(agricolae)
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
Data entry and reshaping:
fluid1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
fluid2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
fluid3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
fluid4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
fluid <- data.frame(fluid1, fluid2, fluid3, fluid4)
fluidlong <- pivot_longer(fluid, c(fluid1, fluid2, fluid3, fluid4))
fluidlong$name <- as.factor(fluidlong$name)
First, we check the normality of the data and the equality of the group variances to decide whether parametric tests are appropriate:
boxplot(fluid)
qqnorm(fluid1)
qqline(fluid1)
qqnorm(fluid2)
qqline(fluid2)
qqnorm(fluid3)
qqline(fluid3)
qqnorm(fluid4)
qqline(fluid4)
levene.test(fluidlong$value, fluidlong$name, location ="mean")
##
## Classical Levene's test based on the absolute deviations from the mean
## ( none not applied because the location is not set to median )
##
## data: fluidlong$value
## Test Statistic = 0.1462, p-value = 0.9309
The initial analysis suggests that the group variances are similar (as indicated by the boxplot and by Levene's test, p = 0.93), and the normal Q-Q plots for each fluid type show no strong departure from normality. Hence we proceed with ANOVA.
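The Q-Q plots can be complemented with formal tests. The sketch below (base R only; the data frame name `fluid` matches the one above) runs a Shapiro-Wilk normality test for each fluid type and Bartlett's test of equal variances. With only six observations per group these tests have low power, so they support rather than replace the visual checks.

```r
# Fluid-life data from the problem (same values as above)
fluid <- data.frame(
  fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

# Shapiro-Wilk p-value for each fluid type (H0: the sample comes from a normal distribution)
shapiro_p <- sapply(fluid, function(x) shapiro.test(x)$p.value)
round(shapiro_p, 3)

# Bartlett's test of equal variances; more sensitive to non-normality than Levene's test
bt <- bartlett.test(as.list(fluid))
bt
```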
If u1, u2, u3, u4 are the mean lifetimes of fluid types 1-4, respectively,
Null hypothesis, Ho: u1 = u2 = u3 = u4
Alternative hypothesis, Ha: At least one of the means differs
anova <- aov(value~name, data = fluidlong)
summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 30.16 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on the ANOVA, the p-value (0.0525) is slightly greater than α = 0.05. Hence we fail to reject the null hypothesis that the mean lifetimes of all the fluids are equal.
There is no strong evidence at the 5% level that the fluids differ, although the result is borderline.
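The decision rule can be made explicit by recomputing the one-way ANOVA table by hand: SS_Treatments = sum of n_i (group mean − grand mean)^2, SS_E is the pooled within-group sum of squares, and F = MS_Treatments / MS_E is compared with an F distribution on (k − 1, N − k) degrees of freedom. This is a minimal base-R sketch (the list name `fluid` is illustrative); its F statistic and p-value should agree with `summary(anova)`.

```r
# Fluid-life data (same values as above)
fluid <- list(
  fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)
values <- unlist(fluid)
k <- length(fluid)   # number of treatments
N <- length(values)  # total number of observations
n_i <- lengths(fluid)

# Between-group (treatment) and within-group (error) sums of squares
grand_mean <- mean(values)
group_means <- sapply(fluid, mean)
ss_treat <- sum(n_i * (group_means - grand_mean)^2)
ss_error <- sum(sapply(fluid, function(x) sum((x - mean(x))^2)))

# Mean squares, F statistic and upper-tail p-value
ms_treat <- ss_treat / (k - 1)
ms_error <- ss_error / (N - k)
f_stat <- ms_treat / ms_error
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```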
boxplot(fluid)
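The boxplot shows that fluid 3 has the highest median, maximum, and minimum values of the four fluids, and its sample mean is also the highest.
Hence, if the objective is a longer fluid life, we would choose fluid 3, keeping in mind that the difference is not statistically significant at α = 0.05.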
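A boxplot shows medians rather than means, so the comparison of fluid 3 with the others can be checked numerically. A minimal base-R sketch (the names `fluid` and `fluid_summary` are illustrative):

```r
# Fluid-life data (same values as above)
fluid <- list(
  fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

# One column per fluid, one row per summary statistic
fluid_summary <- sapply(fluid, function(x) {
  c(mean = mean(x), median = median(x), min = min(x), max = max(x))
})
round(fluid_summary, 2)

# Fluid with the largest value of each statistic
apply(fluid_summary, 1, function(row) names(row)[which.max(row)])
```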
We check the residuals of our ANOVA:
plot(anova)
From the ANOVA diagnostic plots, the points in the normal Q-Q plot of the residuals lie close to a straight line, indicating approximate normality. The residuals vs fitted values plot shows a similar spread at each fitted value, indicating roughly constant variance.
Hence the model is adequate.
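The visual residual check can be paired with a Shapiro-Wilk test on the ANOVA residuals. This sketch builds the long-format data in base R instead of `pivot_longer` (an assumption made to keep it dependency-free); row order differs from the tidyverse version, but the fitted model is the same.

```r
fluid <- data.frame(
  fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

# unlist() stacks the columns in order, so each name repeats nrow(fluid) times
fluid_long <- data.frame(
  name  = factor(rep(names(fluid), each = nrow(fluid))),
  value = unlist(fluid, use.names = FALSE)
)

fit <- aov(value ~ name, data = fluid_long)
res <- residuals(fit)
shapiro.test(res)  # H0: the residuals are normally distributed
```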
We compare the ANOVA results from Problem 3.23 with the Kruskal-Wallis test on the same data:
summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 30.16 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
kruskal.test(value~name, data = fluidlong)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015
The p-value from the Kruskal-Wallis test (0.1015) is greater than α = 0.05. Hence we fail to reject the null hypothesis that the lifetime distributions of the four fluids are identical.
The ANOVA and the Kruskal-Wallis test therefore lead to the same conclusion: there is no significant difference in lifetime among the fluids at the 5% level.
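The Kruskal-Wallis statistic is built from the pooled ranks: H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) − 3(N + 1), divided by a tie correction, and referred to a chi-squared distribution with k − 1 degrees of freedom. The sketch below recomputes it in base R (variable names are illustrative); it should reproduce the `kruskal.test` output above, including the correction for the single tie at 16.9.

```r
# Fluid-life data (same values as in Problem 3.23)
fluid <- list(
  fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)
values <- unlist(fluid, use.names = FALSE)
groups <- factor(rep(names(fluid), lengths(fluid)))

N <- length(values)
r <- rank(values)                 # pooled ranks; tied values get the average rank
R_i <- tapply(r, groups, sum)     # rank sum per group
n_i <- tapply(r, groups, length)  # group sizes

# Uncorrected Kruskal-Wallis statistic
H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)

# Tie correction: 1 - sum(t^3 - t) / (N^3 - N), where t is the size of each tie group
t_sizes <- as.vector(table(values))
H_corrected <- H / (1 - sum(t_sizes^3 - t_sizes) / (N^3 - N))

# Large-sample p-value from the chi-squared distribution with k - 1 df
p_value <- pchisq(H_corrected, df = nlevels(groups) - 1, lower.tail = FALSE)
c(H = H_corrected, p = p_value)
```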
The Kruskal-Wallis test gives a higher p-value (0.1015) than ANOVA (0.0525). A higher p-value means weaker evidence against the null hypothesis; it is not evidence in favor of the null hypothesis. With both tests, we fail to reject the null hypothesis at α = 0.05.
Data entry and reshaping:
material1 <- c(110, 157, 194, 178)
material2 <- c(1, 2, 4, 18)
material3 <- c(880, 1256, 5276, 4355)
material4 <- c(495, 7040, 5307, 10050)
material5 <- c(7, 5, 29, 2)
material <- data.frame(material1, material2, material3, material4, material5)
materiallong <- pivot_longer(material, c(material1, material2, material3, material4, material5))
materiallong$name <- as.factor(materiallong$name)
If u1, u2, u3, u4, u5 are the mean failure times for materials 1-5, respectively,
Null hypothesis, Ho: u1 = u2 = u3 = u4 = u5
Alternative hypothesis, Ha: At least one of the means differs.
anova2 <- aov(value~name, data=materiallong)
summary(anova2)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 4 103191489 25797872 6.191 0.00379 **
## Residuals 15 62505657 4167044
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the ANOVA results, the p-value (0.00379) is less than the significance level α = 0.05. Hence we reject the null hypothesis that the mean failure times are the same for all materials.
At least one material has a mean failure time that differs from the others.
We examine the residual diagnostic plots of the ANOVA model:
plot(anova2)
The points in the normal Q-Q plot of the residuals do not follow a straight line, indicating a departure from normality.
The residuals vs fitted values plot shows that the spread of the residuals is not the same for all groups: it widens as the fitted value increases, forming a funnel shape, which indicates unequal group variances.
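A funnel shape often means that the group standard deviation grows with the group mean. One empirical way to pick a variance-stabilizing power (an alternative to Box-Cox, not the method used above) is to regress log(s_i) on log(group mean): if the slope is alpha, the transformation y^(1 − alpha) is suggested, with the log transformation corresponding to alpha = 1. A minimal sketch; the names are illustrative.

```r
material <- list(
  material1 = c(110, 157, 194, 178),
  material2 = c(1, 2, 4, 18),
  material3 = c(880, 1256, 5276, 4355),
  material4 = c(495, 7040, 5307, 10050),
  material5 = c(7, 5, 29, 2)
)

group_mean <- sapply(material, mean)
group_sd <- sapply(material, sd)

# If sd_i is roughly proportional to mean_i^alpha, then y^(1 - alpha) stabilizes
# the variance; lambda = 0 is read as the log transformation
slope_fit <- lm(log(group_sd) ~ log(group_mean))
alpha_hat <- unname(coef(slope_fit)[2])
lambda_hat <- 1 - alpha_hat
round(c(alpha = alpha_hat, lambda = lambda_hat), 2)
```

For these data the slope should come out close to 1, which points toward the log transformation.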
We use the Box-Cox method to find a transformation that stabilizes the variance:
library(MASS)
lm_material <- lm(value~name, data = materiallong)
boxcox(lm_material)
materiallong_modified <- materiallong
materiallong_modified$value <- log(materiallong_modified$value)
anova_modified_material <- aov(value~name, materiallong_modified)
plot(anova_modified_material)
The Box-Cox plot shows that the optimal value of lambda is close to zero, so we apply a log transformation to the data.
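Instead of reading lambda off the plot, the maximizing value and an approximate 95% interval can be extracted from the object `MASS::boxcox` returns when `plotit = FALSE` (a list with the lambda grid `x` and the log-likelihood `y`). The interval keeps every lambda whose log-likelihood is within qchisq(0.95, 1) / 2 of the maximum. The grid `seq(-1, 1, by = 0.01)` is an assumption chosen for resolution.

```r
library(MASS)

material <- list(
  material1 = c(110, 157, 194, 178),
  material2 = c(1, 2, 4, 18),
  material3 = c(880, 1256, 5276, 4355),
  material4 = c(495, 7040, 5307, 10050),
  material5 = c(7, 5, 29, 2)
)
materiallong <- data.frame(
  name  = factor(rep(names(material), lengths(material))),
  value = unlist(material, use.names = FALSE)
)

# Keep the response and QR decomposition so boxcox() does not need to refit
lm_material <- lm(value ~ name, data = materiallong, y = TRUE, qr = TRUE)
bc <- boxcox(lm_material, lambda = seq(-1, 1, by = 0.01), plotit = FALSE)

lambda_best <- bc$x[which.max(bc$y)]
# Approximate 95% confidence interval for lambda
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2])
c(best = lambda_best, lower = ci[1], upper = ci[2])
```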
After the transformation, we refit the ANOVA. The residuals vs fitted values plot now shows a similar spread across the groups, and the points in the normal Q-Q plot lie close to a straight line, indicating that the residuals are approximately normally distributed.
The transformation changes the F statistic and p-value of the ANOVA, so the test conclusion should be re-checked on the log scale with summary(anova_modified_material).
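A minimal sketch of that re-check, refitting the one-way ANOVA on log(failure time) and following it with Tukey's HSD pairwise comparisons on the log scale. Tukey's HSD (`stats::TukeyHSD`) is swapped in here as a base-R choice; the script at the end uses `agricolae::LSD.test` on the untransformed model instead.

```r
material <- list(
  material1 = c(110, 157, 194, 178),
  material2 = c(1, 2, 4, 18),
  material3 = c(880, 1256, 5276, 4355),
  material4 = c(495, 7040, 5307, 10050),
  material5 = c(7, 5, 29, 2)
)
materiallong <- data.frame(
  name  = factor(rep(names(material), lengths(material))),
  value = unlist(material, use.names = FALSE)
)

# One-way ANOVA on the log scale
log_fit <- aov(log(value) ~ name, data = materiallong)
log_table <- summary(log_fit)[[1]]
log_table
log_p <- log_table[["Pr(>F)"]][1]

# Pairwise differences of log-means with family-wise 95% intervals
tk <- TukeyHSD(log_fit, "name", conf.level = 0.95)
tk$name
```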
Data entry and reshaping:
method1 <- c(31, 10, 21, 4, 1)
method2 <- c(62, 40, 24, 30, 35)
method3 <- c(53, 27, 120, 97, 68)
method <- data.frame(method1, method2, method3)
methodlong <- pivot_longer(method, c(method1, method2, method3))
methodlong$name <- as.factor(methodlong$name)
If u1, u2, u3 are the mean counts for methods 1-3, respectively,
Null hypothesis, Ho: u1 = u2 = u3
Alternative hypothesis, Ha: At least one of the means differs.
anova3 <- aov(value~name, methodlong)
summary(anova3)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 2 8964 4482 7.914 0.00643 **
## Residuals 12 6796 566
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA p-value (0.00643) is less than the significance level α = 0.05. Hence we reject the null hypothesis that the mean counts are the same for all methods.
At least one method's mean count differs significantly from the others.
We examine the residual diagnostic plots of the ANOVA model:
plot(anova3)
The normal Q-Q plot shows that the residuals are approximately normally distributed.
However, the residuals vs fitted values plot shows that the spread of the residuals is not the same for all groups: it widens as the fitted value increases, forming a funnel shape, which indicates unequal group variances.
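The funnel pattern can be confirmed from the group summaries: for count data, a variance that increases with the mean is typical and motivates a power transformation such as the square root. A minimal base-R sketch, with Bartlett's test of equal variances as a formal check (the name `method` is reused from above as a list):

```r
method <- list(
  method1 = c(31, 10, 21, 4, 1),
  method2 = c(62, 40, 24, 30, 35),
  method3 = c(53, 27, 120, 97, 68)
)

# Group means and variances side by side
group_stats <- rbind(mean = sapply(method, mean), variance = sapply(method, var))
round(group_stats, 1)

# Bartlett's test (H0: all group variances are equal)
bt <- bartlett.test(method)
bt
```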
We use the Box-Cox method to find a transformation that stabilizes the variance:
library(MASS)
lm_method <- lm(value~name, data = methodlong)
boxcox(lm_method)
lambda <- 0.35  # chosen from the Box-Cox plot above
methodlong_modified <- methodlong
methodlong_modified$value <- methodlong_modified$value^lambda
anova_modified_method <- aov(value~name, methodlong_modified)
boxcox(anova_modified_method)  # check: lambda for the transformed response should now be near 1
plot(anova_modified_method)
The Box-Cox plot shows that the 95% confidence interval for lambda spans roughly 0.1 to 0.8; we choose lambda = 0.35. After transforming the data, the normal Q-Q plot of the residuals is close to linear, and the residuals vs fitted values plot shows a more uniform spread, indicating that the variance is better stabilized.
The transformation changes the F statistic and p-value, so our initial conclusion to reject the null hypothesis should be confirmed on the transformed scale with summary(anova_modified_method).
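Because lambda = 0.35 is only one point in a fairly wide interval, it is worth checking that the conclusion does not hinge on the exact choice. The sketch below refits the ANOVA for several powers inside or near that interval (no transformation, square root, 0.35, and log) and collects the p-values; the helper `power_transform` and the list of lambdas are illustrative assumptions.

```r
method <- list(
  method1 = c(31, 10, 21, 4, 1),
  method2 = c(62, 40, 24, 30, 35),
  method3 = c(53, 27, 120, 97, 68)
)
methodlong <- data.frame(
  name  = factor(rep(names(method), lengths(method))),
  value = unlist(method, use.names = FALSE)
)

# Hypothetical helper: y^lambda, or log(y) when lambda = 0 (all counts are positive)
power_transform <- function(y, lambda) if (lambda == 0) log(y) else y^lambda

lambdas <- c(1, 0.5, 0.35, 0)
p_values <- sapply(lambdas, function(l) {
  d <- methodlong
  d$value <- power_transform(d$value, l)
  summary(aov(value ~ name, data = d))[[1]][["Pr(>F)"]][1]
})
names(p_values) <- paste0("lambda=", lambdas)
round(p_values, 4)
```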
# Answer to Problem 3.23
library(tidyverse)
library(lawstat)
library(agricolae)
library(MASS)
fluid1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
fluid2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
fluid3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
fluid4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
qqnorm(fluid1)
qqline(fluid1)
qqnorm(fluid2)
qqline(fluid2)
qqnorm(fluid3)
qqline(fluid3)
qqnorm(fluid4)
qqline(fluid4)
fluid <- data.frame(fluid1, fluid2, fluid3, fluid4)
boxplot(fluid)
fluidlong <- pivot_longer(fluid, c(fluid1, fluid2, fluid3, fluid4))
fluidlong$name <- as.factor(fluidlong$name)
str(fluidlong)
levene.test(fluidlong$value, fluidlong$name, location = "mean")
anova <- aov(value~name, data = fluidlong)
summary(anova)
plot(anova)
# Answer to Problem 3.51
kruskal.test(value~name, data = fluidlong)
# Answer to Problem 3.28
material1 <- c(110, 157, 194, 178)
material2 <- c(1, 2, 4 ,18)
material3 <- c(880, 1256, 5276, 4355)
material4 <- c(495, 7040, 5307, 10050)
material5 <- c(7, 5, 29, 2)
qqnorm(material1)
qqline(material1)
qqnorm(material2)
qqline(material2)
qqnorm(material3)
qqline(material3)
qqnorm(material4)
qqline(material4)
qqnorm(material5)
qqline(material5)
material <- data.frame(material1, material2, material3, material4, material5)
boxplot(material)
materiallong <- pivot_longer(material, c(material1, material2, material3, material4, material5))
materiallong
materiallong$name <- as.factor(materiallong$name)
str(materiallong)
levene.test(materiallong$value, materiallong$name)
anova2 <- aov(value~name, data=materiallong)
summary(anova2)
plot(anova2)
kruskal.test(value~name, data=materiallong)
lsdmodel <- LSD.test(anova2, "name")
lsdmodel$groups  # letter groupings of the material means
plot(lsdmodel)
lm_material <- lm(value~name, data = materiallong)
boxcox(lm_material)
materiallong_modified <- materiallong
materiallong_modified$value <- log(materiallong_modified$value)
anova_modified_material <- aov(value~name, materiallong_modified)
plot(anova_modified_material)
# Answer to Problem 3.29
method1 <- c(31, 10, 21, 4, 1)
method2 <- c(62, 40, 24, 30, 35)
method3 <- c(53, 27, 120, 97, 68)
qqnorm(method1)
qqline(method1)
qqnorm(method2)
qqline(method2)
qqnorm(method3)
qqline(method3)
method <- data.frame(method1, method2, method3)
methodlong <- pivot_longer(method, c(method1, method2, method3))
methodlong$name <- as.factor(methodlong$name)
str(methodlong)
anova3 <- aov(value~name, methodlong)
summary(anova3)
plot(anova3)
lm_method <- lm(value~name, data = methodlong)
boxcox(lm_method)
lambda <- 0.35
methodlong_modified <- methodlong
methodlong_modified$value <- (methodlong_modified$value)^lambda
anova_modified_method <- aov(value~name, methodlong_modified)
boxcox(anova_modified_method)
plot(anova_modified_method)
kruskal.test(value~name, data=methodlong)
plot(LSD.test(anova3, "name"))