Is there any indication that the fluids differ? Use alpha = 0.05
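There is no significant difference at alpha = 0.05: the ANOVA p-value is 0.0525, so we fail to reject the null hypothesis that the fluids have equal mean life.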
library(tidyr)
life_hrs <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8)
fluid_type <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)
fluid_type <- as.factor(fluid_type)
dat <- data.frame(life_hrs, fluid_type)  # data.frame() keeps fluid_type as a factor; cbind() would coerce it to integer codes
dat.model <- aov(life_hrs~fluid_type)
dat.model
## Call:
## aov(formula = life_hrs ~ fluid_type)
##
## Terms:
## fluid_type Residuals
## Sum of Squares 30.16500 65.99333
## Deg. of Freedom 3 20
##
## Residual standard error: 1.816498
## Estimated effects may be unbalanced
summary(dat.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## fluid_type 3 30.17 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Which fluid would you select, given that the objective is long life?
Fluid 3. It has the highest average life (20.95 hours, versus 18.65, 17.95 and 18.82 hours for fluids 1, 2 and 4), although the difference is not statistically significant at alpha = 0.05.
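The group averages behind this choice can be recomputed directly from the data. The following is a minimal sketch; `fluid_means` and `best_fluid` are names introduced here for illustration and are not part of the original analysis.

```r
# Life (hours) for four fluids, six observations each (same data as above)
life_hrs <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
              16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
              21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
              19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
fluid_type <- factor(rep(1:4, each = 6))

# Average life per fluid
fluid_means <- tapply(life_hrs, fluid_type, mean)
print(round(fluid_means, 2))

# Label of the fluid with the largest average life
best_fluid <- names(which.max(fluid_means))
print(best_fluid)
```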
Analyze the residuals from this experiment. Are the basic analysis of variance assumptions satisfied?
Yes. The residual plots show no clear violation of the normality or constant-variance assumptions.
plot(dat.model)
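As a complement to the plots, formal tests can be run on the same fit. This is a sketch and not part of the original analysis: it assumes that the Shapiro-Wilk test (normality of the residuals) and Bartlett's test (equal variances across fluids) are acceptable checks for this design. The names `fit`, `shapiro_p` and `bartlett_p` are illustrative.

```r
# Same fluid-life data as above
life_hrs <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
              16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
              21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
              19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
fluid_type <- factor(rep(1:4, each = 6))

fit <- aov(life_hrs ~ fluid_type)

# Normality of the residuals (null hypothesis: residuals are normal)
shapiro_p <- shapiro.test(residuals(fit))$p.value

# Homogeneity of variance (null hypothesis: equal variances in all groups)
bartlett_p <- bartlett.test(life_hrs ~ fluid_type)$p.value

print(c(shapiro = shapiro_p, bartlett = bartlett_p))
```

P-values above 0.05 in both tests would be consistent with the reading of the plots; small p-values would call for a closer look.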
Do all five materials have the same effect on mean failure time?
No, at least one material does not have the same effect.
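# Note: the variable names are swapped. `material` holds the failure times
# and `failure` holds the material labels (1 to 5).
material <- c(110,157,194,178,1,2,4,18,880,1256,5276,4355,495,7040,5307,10050,7,5,29,2)
failure <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5)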
failure <- as.factor(failure)
dat3 <- data.frame(material, failure)  # data.frame() keeps the grouping variable as a factor
dat3.model <- aov(material~failure)
dat3.model
## Call:
## aov(formula = material ~ failure)
##
## Terms:
## failure Residuals
## Sum of Squares 103191489 62505657
## Deg. of Freedom 4 15
##
## Residual standard error: 2041.334
## Estimated effects may be unbalanced
summary(dat3.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## failure 4 103191489 25797872 6.191 0.00379 **
## Residuals 15 62505657 4167044
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. What information is conveyed by these plots?
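The residuals-versus-predicted plot shows that the variance is not constant (the spread of the residuals grows with the predicted failure time), and the normal probability plot shows that the normality assumption is not valid.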
plot(dat3.model)
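The non-constant variance can also be seen numerically by comparing the mean and standard deviation within each material. The sketch below uses the names `failure_time`, `material_id` and `group_summary`, which are chosen here for readability and differ from the variable names in the original code.

```r
# Failure times for five materials, four observations each (same data as above)
failure_time <- c(110, 157, 194, 178,
                  1, 2, 4, 18,
                  880, 1256, 5276, 4355,
                  495, 7040, 5307, 10050,
                  7, 5, 29, 2)
material_id <- factor(rep(1:5, each = 4))

# Mean and standard deviation of the failure time within each material
group_summary <- data.frame(
  material = levels(material_id),
  mean = as.vector(tapply(failure_time, material_id, mean)),
  sd = as.vector(tapply(failure_time, material_id, sd))
)

# Sort by mean to see whether the spread grows with the level of the response
print(group_summary[order(group_summary$mean), ])
```

A standard deviation that rises steadily with the group mean is the pattern that a log transformation is often used to correct.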
Based on your answer to part (b) conduct another analysis of the failure time data and draw appropriate conclusions.
After applying a log transformation, we again conclude that at least one material differs from the others (F = 37.66, p = 1.18e-07).
material_log <- log(c(110,157,194,178,1,2,4,18,880,1256,5276,4355,495,7040,5307,10050,7,5,29,2))
dat3_3 <- data.frame(material_log, failure)  # the original converted dat3 here instead of dat3_3
dat3_3.model <- aov(material_log~failure)
dat3_3.model
## Call:
## aov(formula = material_log ~ failure)
##
## Terms:
## failure Residuals
## Sum of Squares 165.05646 16.43689
## Deg. of Freedom 4 15
##
## Residual standard error: 1.046801
## Estimated effects may be unbalanced
summary(dat3_3.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## failure 4 165.06 41.26 37.66 1.18e-07 ***
## Residuals 15 16.44 1.10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
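The F test on the log scale says only that at least one material differs. One possible follow-up, which is not part of the original analysis, is Tukey's HSD on the log-transformed data to see which pairs differ. This is a sketch under that assumption; `failure_time`, `material_id`, `log_fit` and `sig_pairs` are illustrative names.

```r
# Failure times for five materials, four observations each (same data as above)
failure_time <- c(110, 157, 194, 178,
                  1, 2, 4, 18,
                  880, 1256, 5276, 4355,
                  495, 7040, 5307, 10050,
                  7, 5, 29, 2)
material_id <- factor(rep(1:5, each = 4))

# One-way ANOVA on the log scale, as in the analysis above
log_fit <- aov(log(failure_time) ~ material_id)

# All pairwise comparisons with a 95% family-wise confidence level
tukey <- TukeyHSD(log_fit, conf.level = 0.95)
print(tukey)

# Pairs whose adjusted p-value is below 0.05
pairs <- tukey$material_id
sig_pairs <- rownames(pairs)[pairs[, "p adj"] < 0.05]
print(sig_pairs)
```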
Do all methods have the same effect on mean particle count?
No, at least one method has a different effect on mean particle count (p = 0.00643 < 0.05).
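# Note: the variable names are swapped. `method` holds the particle counts
# and `count` holds the method labels (1 to 3).
method <- c(31,10,21,4,1,62,40,24,30,35,53,27,120,97,68)
count <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)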
count <- as.factor(count)
dat2 <- data.frame(method, count)  # data.frame() keeps the grouping variable as a factor
dat2.model <- aov(method~count)
dat2.model
## Call:
## aov(formula = method ~ count)
##
## Terms:
## count Residuals
## Sum of Squares 8963.733 6796.000
## Deg. of Freedom 2 12
##
## Residual standard error: 23.79776
## Estimated effects may be unbalanced
summary(dat2.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## count 2 8964 4482 7.914 0.00643 **
## Residuals 12 6796 566
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. Are there potential concerns about the validity of the assumptions?
Yes. The residuals-versus-predicted plot gives evidence that the variance of the residuals is not constant, and the normality assumption is doubtful because the points do not follow the straight line in the normal probability plot.
plot(dat2.model)
Based on your answer to part (b) conduct another analysis of the particle count data and draw appropriate conclusions.
After a square-root transformation the methods still differ, and the evidence is stronger (F rises from 7.914 to 9.84, p = 0.00295). The transformation is appropriate here because the data are counts whose variance increases with the mean.
methodSqrt <- sqrt(c(31,10,21,4,1,62,40,24,30,35,53,27,120,97,68))
dat2_2 <- data.frame(methodSqrt, count)  # data.frame() keeps the grouping variable as a factor
dat2_2.model <- aov(methodSqrt~count)
dat2_2.model
## Call:
## aov(formula = methodSqrt ~ count)
##
## Terms:
## count Residuals
## Sum of Squares 63.89971 38.96322
## Deg. of Freedom 2 12
##
## Residual standard error: 1.801925
## Estimated effects may be unbalanced
summary(dat2_2.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## count 2 63.90 31.95 9.84 0.00295 **
## Residuals 12 38.96 3.25
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
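The choice of a square-root transformation can be cross-checked with a Box-Cox profile. This is a sketch, not part of the original analysis, and it assumes the `MASS` package is installed (it ships with standard R distributions). The names `particle_count`, `method_id` and `lambda_hat` are illustrative. A value of lambda near 0.5 would correspond to a square root and a value near 0 to a log.

```r
library(MASS)  # provides boxcox(); assumed to be available

# Particle counts for three methods, five observations each (same data as above)
particle_count <- c(31, 10, 21, 4, 1,
                    62, 40, 24, 30, 35,
                    53, 27, 120, 97, 68)
method_id <- factor(rep(1:3, each = 5))

# Profile log-likelihood over a grid of lambda values (no plot)
bc <- boxcox(lm(particle_count ~ method_id),
             lambda = seq(-1, 2, by = 0.05), plotit = FALSE)

# Grid value of lambda with the largest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```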
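The Kruskal-Wallis test also finds no significant difference between the fluids (p = 0.1015 > 0.05), so we fail to reject the null hypothesis. This agrees with the conclusion of the analysis of variance.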
kruskal.test(life_hrs~fluid_type, data = dat)
##
## Kruskal-Wallis rank sum test
##
## data: life_hrs by fluid_type
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015
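To show where the statistic comes from, the Kruskal-Wallis H can be recomputed by hand from the ranks. This is a worked sketch using the standard formula with the tie correction; `h_stat`, `p_value` and the other intermediate names are introduced here for illustration.

```r
# Same fluid-life data as above
life_hrs <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
              16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
              21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
              19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
fluid_type <- factor(rep(1:4, each = 6))

n_total <- length(life_hrs)
ranks <- rank(life_hrs)  # tied values receive the average of their ranks

# Rank sum and size of each group
rank_sums <- tapply(ranks, fluid_type, sum)
group_sizes <- tapply(ranks, fluid_type, length)

# Uncorrected H statistic
h_raw <- 12 / (n_total * (n_total + 1)) * sum(rank_sums^2 / group_sizes) -
  3 * (n_total + 1)

# Correction for ties: t is the number of observations sharing each value
tie_sizes <- table(life_hrs)
tie_correction <- 1 - sum(tie_sizes^3 - tie_sizes) / (n_total^3 - n_total)
h_stat <- h_raw / tie_correction

# Chi-squared approximation with (number of groups - 1) degrees of freedom
p_value <- pchisq(h_stat, df = nlevels(fluid_type) - 1, lower.tail = FALSE)
print(c(H = h_stat, p = p_value))
```

The result should match the `kruskal.test` output above (chi-squared = 6.2177, p-value = 0.1015).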