Q3.23) The effective life of insulating fluids at an accelerated load of 35 kV is being studied. Test data have been obtained for four types of fluids. The results from a completely randomized experiment were as follows:

Getting data

ftype <- c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
obs <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8) 
dat1 <- cbind(ftype,obs)
dat1 <- as.data.frame(dat1)
dat1$ftype <- as.factor(dat1$ftype)
dat1$obs <- as.numeric(dat1$obs)

Q3.23)a) Is there any indication that the fluids differ? Use \(\alpha = 0.05\).

Hypothesis :

Null Hypothesis : \(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)

Alternative Hypothesis : \(H_a\): at least one of the \(\mu_i\) differs

first_model <- aov(dat1$obs~dat1$ftype,data=dat1)
summary(first_model)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## dat1$ftype   3  30.17   10.05   3.047 0.0525 .
## Residuals   20  65.99    3.30                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion :- The p-value of 0.0525 is slightly greater than 0.05. Hence we fail to reject the null hypothesis at the 5% level: the data do not provide significant evidence that the fluids differ. The p-value is only marginally above the threshold, however, so a difference in some \(\mu_i\) cannot be ruled out. For now we proceed with the conclusion that we fail to reject the null hypothesis.
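As a rough cross-check of this decision rule, the p-value and the 5% critical value can be recomputed from the rounded F statistic and the degrees of freedom printed in the ANOVA table. This is only a sketch; the variable names are illustrative and are not part of the original analysis.

```r
# Values read from the ANOVA table (the F statistic is rounded to 3 decimals)
f_stat <- 3.047
df_trt <- 3    # treatment degrees of freedom
df_err <- 20   # residual degrees of freedom
alpha  <- 0.05

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_trt, df_err, lower.tail = FALSE)

# Critical value: H0 is rejected when the F statistic exceeds it
f_crit <- qf(1 - alpha, df_trt, df_err)

reject_h0 <- f_stat > f_crit
print(c(p_value = p_value, f_crit = f_crit))
print(reject_h0)
```

The F statistic falls just short of the critical value, which is why the p-value sits just above 0.05.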

Q3.23)b) Which fluid would you select, given that the objective is long life?

Tukey_model1 <- TukeyHSD(first_model)
Tukey_model1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dat1$obs ~ dat1$ftype, data = dat1)
## 
## $`dat1$ftype`
##           diff         lwr       upr     p adj
## 2-1 -0.7000000 -3.63540073 2.2354007 0.9080815
## 3-1  2.3000000 -0.63540073 5.2354007 0.1593262
## 4-1  0.1666667 -2.76873407 3.1020674 0.9985213
## 3-2  3.0000000  0.06459927 5.9354007 0.0440578
## 4-2  0.8666667 -2.06873407 3.8020674 0.8413288
## 4-3 -2.1333333 -5.06873407 0.8020674 0.2090635
plot(Tukey_model1)

Answer:

Mean life for fluid 1 is 18.65 hours

Mean life for fluid 2 is 17.95 hours

Mean life for fluid 3 is 20.95 hours

Mean life for fluid 4 is 18.8166667 hours

Hence, from the Tukey plot as well as from the means shown above, fluid 3 has the highest mean life (20.95 hours), so we suggest choosing fluid 3 when the objective is long life.
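The per-fluid means quoted in this answer can be reproduced directly from the data. A minimal sketch, rebuilding the data frame under the illustrative name `dat`:

```r
# Rebuild the fluid-life data: 4 fluid types, 6 observations each
dat <- data.frame(
  ftype = factor(rep(1:4, each = 6)),
  obs = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
          16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
          19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

# Mean life (hours) for each fluid type
fluid_means <- tapply(dat$obs, dat$ftype, mean)
print(round(fluid_means, 2))

# Fluid with the largest sample mean
best_fluid <- names(which.max(fluid_means))
print(best_fluid)
```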

We may also note that in part (a) we failed to reject the null hypothesis, although the p-value was only marginally greater than 0.05. The Tukey comparison above shows that fluids 2 and 3 differ (adjusted p-value 0.044, and the 95% interval for 3-2 does not contain zero), so there is a \(\mu_i\) which differs. Hence we restate our decision and say that we reject the null hypothesis.
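To read the significant pairs off the Tukey output programmatically rather than by eye, the comparison matrix can be filtered on its adjusted p-value column. A sketch, with `dat`, `fit` and `sig_pairs` as illustrative names:

```r
dat <- data.frame(
  ftype = factor(rep(1:4, each = 6)),
  obs = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
          16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
          19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

fit <- aov(obs ~ ftype, data = dat)
tukey <- TukeyHSD(fit, conf.level = 0.95)

# The result is a list with one matrix per factor; columns are
# diff, lwr, upr and "p adj"
pairs <- tukey$ftype

# Keep only the pairs whose adjusted p-value is below 0.05
sig_pairs <- pairs[pairs[, "p adj"] < 0.05, , drop = FALSE]
print(sig_pairs)
```

Writing the formula as `obs ~ ftype` with `data = dat` also gives the list element the short name `ftype` instead of `dat1$ftype`.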

Q3.23)c) Analyze the residuals from this experiment. Are the basic analysis of variance assumptions satisfied? There is nothing unusual in the residual plots

plot(first_model)

Answer:-

We can see from the normal probability plot of the residuals above that the points fall fairly close to a straight line, hence we can state that the residuals are approximately normally distributed

We can also see from the residuals vs fitted values plot that the points form a roughly rectangular band, which indicates that the variance is constant

Hence we can claim that the ANOVA assumptions of normality and constant variance are adequately satisfied
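The visual checks can be supplemented with formal tests. The sketch below assumes that a Shapiro-Wilk test on the residuals and Bartlett's test for equal variances are acceptable supplements here; neither is used in the original analysis, and the variable names are illustrative.

```r
dat <- data.frame(
  ftype = factor(rep(1:4, each = 6)),
  obs = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
          16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
          19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)
fit <- aov(obs ~ ftype, data = dat)

# Normality of the residuals (H0: residuals are normally distributed)
res_normality <- shapiro.test(residuals(fit))

# Equality of variances across fluids (H0: all group variances are equal)
var_equality <- bartlett.test(obs ~ ftype, data = dat)

print(res_normality$p.value)
print(var_equality$p.value)
```

Large p-values from both tests would agree with the conclusion drawn from the plots, while small ones would suggest a closer look.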

Q3.28)

mat <- c(rep(1,4),rep(2,4),rep(3,4),rep(4,4),rep(5,4))
time <- c(110,157,194,178,1,2,4,18,880,1256,5276,4355,495,7040,5307,10050,7,5,29,2)
dat2 <- cbind(mat,time)
dat2 <- as.data.frame(dat2)
dat2$mat <- as.factor(as.character(dat2$mat))
dat2$time <- as.numeric(dat2$time)
str(dat2)
## 'data.frame':    20 obs. of  2 variables:
##  $ mat : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
##  $ time: num  110 157 194 178 1 ...

Q3.28)A) Do all five materials have the same effect on mean failure time?

Hypothesis :

Null Hypothesis : \(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5\)

Alternative Hypothesis : \(H_a\): at least one of the \(\mu_i\) differs

second_model <- aov(dat2$time~dat2$mat,data = dat2)
summary(second_model)
##             Df    Sum Sq  Mean Sq F value  Pr(>F)   
## dat2$mat     4 103191489 25797872   6.191 0.00379 **
## Residuals   15  62505657  4167044                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion :- The p-value of 0.00379 is less than 0.05. Hence we reject the null hypothesis and conclude that at least one \(\mu_i\) differs, i.e. the materials do not all have the same effect on failure time.

Q3.28)B) Residual plots

plot(second_model)

Conclusion :- We can see that

1) The normal probability plot of the residuals shows that the residuals are not normally distributed, as the points do not fall along a straight line.

2) The residuals vs fitted values plot shows that the variance is not constant: the spread of the points forms a funnel shape rather than a rectangular band.

I would recommend a Box-Cox transformation, followed by re-testing the hypothesis and re-checking the adequacy of the ANOVA model

Q3.28)C) Transformation

boxcox(second_model)

As we can see, the 95% confidence interval for lambda lies close to zero, so we take lambda = 0. Hence we use the log of the data as the transformation, as suggested by the Box-Cox method
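Instead of reading lambda off the plot, the maximizing value and an approximate 95% interval can be pulled from the object returned by `boxcox()`. A sketch under a few assumptions: the `MASS` package is installed, the grid step of 0.01 is an arbitrary choice, and the names `dat`, `fit`, `lambda_hat` and `ci` are illustrative.

```r
library(MASS)

dat <- data.frame(
  mat = factor(rep(1:5, each = 4)),
  time = c(110, 157, 194, 178, 1, 2, 4, 18, 880, 1256, 5276, 4355,
           495, 7040, 5307, 10050, 7, 5, 29, 2)
)

# Keep the response and QR decomposition so boxcox() need not refit the model
fit <- lm(time ~ mat, data = dat, y = TRUE, qr = TRUE)

# Profile log-likelihood over a fine grid of lambda values, without plotting
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Lambda that maximizes the profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood is within
# qchisq(0.95, 1) / 2 of the maximum
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y >= cutoff])

print(lambda_hat)
print(ci)
```

When the interval is near zero, rounding lambda to 0 (the log transformation) is the conventional choice because it is easier to interpret than an arbitrary power.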

dat2$time_transformed <- log(dat2$time) 
head(dat2)
##   mat time time_transformed
## 1   1  110        4.7004804
## 2   1  157        5.0562458
## 3   1  194        5.2678582
## 4   1  178        5.1817836
## 5   2    1        0.0000000
## 6   2    2        0.6931472
second_model2 <- aov(dat2$time_transformed~dat2$mat,data = dat2)
summary(second_model2)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## dat2$mat     4 165.06   41.26   37.66 1.18e-07 ***
## Residuals   15  16.44    1.10                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(second_model2)

Answer: After using Box-Cox to determine the power of the transformation and transforming the data, the residual plots show that the ANOVA model is now adequate, with approximately normal residuals and constant variance.

The p-value is still less than 0.05, hence we still reject the null hypothesis and conclude that at least one mean \(\mu_i\) differs, i.e. there is a difference in failure time
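Because the test is now carried out on the log scale, the group means it compares correspond to geometric means of the original failure times. A small sketch of that back-transformation (names are illustrative, and this summary is not part of the original answer):

```r
dat <- data.frame(
  mat = factor(rep(1:5, each = 4)),
  time = c(110, 157, 194, 178, 1, 2, 4, 18, 880, 1256, 5276, 4355,
           495, 7040, 5307, 10050, 7, 5, 29, 2)
)

# Mean of log(time) within each material
log_means <- tapply(log(dat$time), dat$mat, mean)

# Exponentiating a mean of logs gives the geometric mean on the original scale
geo_means <- exp(log_means)
print(round(geo_means, 1))
```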

Q3.29)

Getting data

method <- c(rep(1,5),rep(2,5),rep(3,5))
count <- c(31,10,21,4,1,62,40,24,30,35,53,27,120,97,68)
dat3 <- cbind(method,count)
dat3 <- as.data.frame(dat3)
dat3$method <- as.factor(as.character(dat3$method))
dat3$count <- as.numeric(dat3$count)
str(dat3)
## 'data.frame':    15 obs. of  2 variables:
##  $ method: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 2 2 2 2 2 ...
##  $ count : num  31 10 21 4 1 62 40 24 30 35 ...

Q3.29)a) Do all methods have the same effect, or do they differ?

third_model <- aov(dat3$count~dat3$method,data = dat3)
summary(third_model)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## dat3$method  2   8964    4482   7.914 0.00643 **
## Residuals   12   6796     566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion :- The p-value of 0.00643 is less than 0.05. Hence we reject the null hypothesis and conclude that at least one \(\mu_i\) differs, i.e. the methods do not all have the same effect on the mean particle count.

Q3.29)B) Residual plots

plot(third_model)

Conclusion :- We can see that

1) The normal probability plot of the residuals shows that the residuals are approximately normally distributed, as the points fall fairly close to a straight line.

2) The residuals vs fitted values plot shows that the variance is not constant: the spread of the points forms a funnel shape rather than a rectangular band.

I would recommend a Box-Cox transformation, followed by re-testing the hypothesis and re-checking the adequacy of the ANOVA model

Q3.29)C) Transformation

boxcox(third_model)

As we can see, the 95% confidence interval for lambda is centered at about 0.4, so we take lambda = 0.4. Hence we raise the data to the power 0.4 as the transformation, as suggested by the Box-Cox method

dat3$count_transformed <- (dat3$count)^0.4 
head(dat3)
##   method count count_transformed
## 1      1    31          3.949523
## 2      1    10          2.511886
## 3      1    21          3.379774
## 4      1     4          1.741101
## 5      1     1          1.000000
## 6      2    62          5.211427
third_model2 <- aov(dat3$count_transformed~dat3$method,data = dat3)
summary(third_model2)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## dat3$method  2  21.21  10.605   9.881 0.00291 **
## Residuals   12  12.88   1.073                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(third_model2)

Answer: After using Box-Cox to determine the power of the transformation and transforming the data, the ANOVA model is now adequate, with approximately normal residuals and constant variance. The points lie closer to the straight line in the normal probability plot, and the residuals vs fitted values plot now shows a rectangular band, which indicates constant variance

The p-value is still less than 0.05, hence we still reject the null hypothesis and conclude that at least one mean differs, i.e. there is a difference in mean particle count between the methods
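The claim that the power transformation stabilizes the variance can also be checked numerically by comparing the spread of the within-method standard deviations before and after the transformation. A rough sketch; the max/min ratio used here is an informal diagnostic chosen for illustration, not a formal test, and the names are illustrative.

```r
dat <- data.frame(
  method = factor(rep(1:3, each = 5)),
  count = c(31, 10, 21, 4, 1, 62, 40, 24, 30, 35, 53, 27, 120, 97, 68)
)
dat$count_transformed <- dat$count^0.4

# Within-method standard deviations on each scale
sd_raw <- tapply(dat$count, dat$method, sd)
sd_trans <- tapply(dat$count_transformed, dat$method, sd)

# Ratio of largest to smallest SD; values nearer 1 mean more even spread
ratio_raw <- max(sd_raw) / min(sd_raw)
ratio_trans <- max(sd_trans) / min(sd_trans)

print(round(c(raw = ratio_raw, transformed = ratio_trans), 2))
```

A smaller ratio on the transformed scale is consistent with the more rectangular residual plot.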

Q3.51) Use the Kruskal–Wallis test for the experiment in Problem 3.23. Compare the conclusions obtained with those from the usual analysis of variance.

kruskal.test(dat1$obs~dat1$ftype,data = dat1)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  dat1$obs by dat1$ftype
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015

Answer : The p-value of 0.1015 is greater than 0.05, hence we fail to reject the null hypothesis and state that there is no significant evidence that the fluids differ.

The conclusions from the ANOVA F test in Q3.23)a) and from the Kruskal-Wallis test match, as we fail to reject the null hypothesis at the 5% level with both methods
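To see what `kruskal.test` computes, the statistic can be rebuilt by hand from the ranks: \(H = \frac{12}{N(N+1)} \sum_i R_i^2 / n_i - 3(N+1)\), divided by a tie correction because the value 16.9 occurs twice in the data. A sketch with illustrative variable names:

```r
dat <- data.frame(
  ftype = factor(rep(1:4, each = 6)),
  obs = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
          16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
          19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

n_total <- nrow(dat)
r <- rank(dat$obs)                        # tied values get their average rank
rank_sums <- tapply(r, dat$ftype, sum)    # R_i: rank sum per fluid
n_i <- tapply(r, dat$ftype, length)       # n_i: observations per fluid

# Uncorrected Kruskal-Wallis statistic
h_raw <- 12 / (n_total * (n_total + 1)) * sum(rank_sums^2 / n_i) -
  3 * (n_total + 1)

# Tie correction: 1 - sum(t^3 - t) / (N^3 - N), with t the size of each tie group
ties <- table(dat$obs)
tie_corr <- 1 - sum(ties^3 - ties) / (n_total^3 - n_total)
h_stat <- h_raw / tie_corr

# Chi-squared approximation with (number of groups - 1) degrees of freedom
p_value <- pchisq(h_stat, df = nlevels(dat$ftype) - 1, lower.tail = FALSE)

builtin <- kruskal.test(obs ~ ftype, data = dat)
print(c(by_hand = h_stat, builtin = unname(builtin$statistic)))
print(p_value)
```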

——————————– End ————————————

Source Code

library(dplyr)
library(MASS)
ftype <- c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
obs <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8) 
dat1 <- cbind(ftype,obs)
dat1 <- as.data.frame(dat1)
dat1$ftype <- as.factor(dat1$ftype)
dat1$obs <- as.numeric(dat1$obs)
first_model <- aov(dat1$obs~dat1$ftype,data=dat1)
summary(first_model)
Tukey_model1 <- TukeyHSD(first_model)
Tukey_model1
plot(Tukey_model1)
plot(first_model)
mat <- c(rep(1,4),rep(2,4),rep(3,4),rep(4,4),rep(5,4))
time <- c(110,157,194,178,1,2,4,18,880,1256,5276,4355,495,7040,5307,10050,7,5,29,2)
dat2 <- cbind(mat,time)
dat2 <- as.data.frame(dat2)
dat2$mat <- as.factor(as.character(dat2$mat))
dat2$time <- as.numeric(dat2$time)
str(dat2)
second_model <- aov(dat2$time~dat2$mat,data = dat2)
summary(second_model)
plot(second_model)
boxcox(second_model)
dat2$time_transformed <- log(dat2$time) 
head(dat2)
second_model2 <- aov(dat2$time_transformed~dat2$mat,data = dat2)
summary(second_model2)
plot(second_model2)
method <- c(rep(1,5),rep(2,5),rep(3,5))
count <- c(31,10,21,4,1,62,40,24,30,35,53,27,120,97,68)
dat3 <- cbind(method,count)
dat3 <- as.data.frame(dat3)
dat3$method <- as.factor(as.character(dat3$method))
dat3$count <- as.numeric(dat3$count)
str(dat3)
third_model <- aov(dat3$count~dat3$method,data = dat3)
summary(third_model)
plot(third_model)
boxcox(third_model)
dat3$count_transformed <- (dat3$count)^0.4 
head(dat3)
third_model2 <- aov(dat3$count_transformed~dat3$method,data = dat3)
summary(third_model2)
plot(third_model2)
kruskal.test(dat1$obs~dat1$ftype,data = dat1)