DOE HW Week 6

0.1 Question 3.23:

The effective life of insulating fluids at an accelerated load of 35 kV is being studied. Test data have been obtained for four types of fluids. The results from a completely randomized experiment were as follows:

Is there any indication that the fluids differ? Use α = 0.05.
Which fluid would you select, given that the objective is long life?
Analyze the residuals from this experiment. Are the basic analysis of variance assumptions satisfied?

Solution: 3.23a

u1 = mean life (in h) of fluid type 1 at the 35 kV load

u2 = mean life (in h) of fluid type 2 at the 35 kV load

u3 = mean life (in h) of fluid type 3 at the 35 kV load

u4 = mean life (in h) of fluid type 4 at the 35 kV load

Null hypothesis

H0: u1 = u2 = u3 = u4, that is, the mean lives of all four fluid types are equal.

Alternative hypothesis

Ha: At least one of the means (u's) differs from the others.

Reading the data
```
life <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8) 
fluid_type<- c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
dat5<-cbind(fluid_type,life)
dat5<- as.data.frame(dat5)
dat5$fluid_type<-as.factor(dat5$fluid_type)
dat5$life <- as.numeric(dat5$life)
```

Model<- aov(dat5$life~dat5$fluid_type,data=dat5)
summary(Model)

##                 Df Sum Sq Mean Sq F value Pr(>F)  
## dat5$fluid_type  3  30.16   10.05   3.047 0.0525 .
## Residuals       20  65.99    3.30                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
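The table reports F = 3.047 with a p-value of 0.0525, which is just above α = 0.05, so at that level we do not reject H0. As a cross-check, the F ratio can be rebuilt from the group means. The following is a minimal sketch, not part of the original solution; the names `ss_treat`, `ss_error`, `f_stat` and `f_crit` are chosen here for illustration.

```r
# Data for Problem 3.23, copied from the "Reading the data" step
life <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,
          21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8)
fluid_type <- factor(rep(1:4, each = 6))

a <- nlevels(fluid_type)   # number of treatments
N <- length(life)          # total number of observations

group_means <- tapply(life, fluid_type, mean)
group_sizes <- tapply(life, fluid_type, length)

# Between-treatment and within-treatment sums of squares
ss_treat <- sum(group_sizes * (group_means - mean(life))^2)
ss_error <- sum((life - ave(life, fluid_type))^2)  # ave() returns each obs's group mean

# F ratio, its p-value, and the 5% critical value
f_stat  <- (ss_treat / (a - 1)) / (ss_error / (N - a))
p_value <- pf(f_stat, a - 1, N - a, lower.tail = FALSE)
f_crit  <- qf(0.95, a - 1, N - a)

print(c(F = f_stat, p = p_value, F_crit = f_crit))
```

Comparing `f_stat` with `f_crit` is equivalent to comparing the p-value with 0.05.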

0.1.1 Solution 3.23b

Model2<- TukeyHSD(Model)
Model2

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dat5$life ~ dat5$fluid_type, data = dat5)
## 
## $`dat5$fluid_type`
##           diff         lwr       upr     p adj
## 2-1 -0.7000000 -3.63540073 2.2354007 0.9080815
## 3-1  2.3000000 -0.63540073 5.2354007 0.1593262
## 4-1  0.1666667 -2.76873407 3.1020674 0.9985213
## 3-2  3.0000000  0.06459927 5.9354007 0.0440578
## 4-2  0.8666667 -2.06873407 3.8020674 0.8413288
## 4-3 -2.1333333 -5.06873407 0.8020674 0.2090635

plot(Model2)
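Every interval in the Tukey output has the same half-width (about 2.935) because the design is balanced. A minimal sketch of where that number comes from, assuming the usual formula q(0.05; a, N - a) · sqrt(MSE / n); the variable names are illustrative only.

```r
# Data for Problem 3.23
life <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,
          21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8)
fluid_type <- factor(rep(1:4, each = 6))

fit <- aov(life ~ fluid_type)

n_per_group <- 6
df_error    <- df.residual(fit)          # N - a = 20
mse         <- deviance(fit) / df_error  # residual sum of squares / error df

# Studentized range quantile for 4 means and 20 error degrees of freedom
q_crit <- qtukey(0.95, nmeans = nlevels(fluid_type), df = df_error)

# Half-width of each Tukey interval in a balanced design
hsd <- q_crit * sqrt(mse / n_per_group)
print(hsd)
```

Any pair of fluids whose mean difference exceeds `hsd` in absolute value has an interval that excludes zero.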

Which fluid would you select, given that the objective is long life?

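From the Tukey output above, fluid 3 has the longest average life (its differences from fluids 1, 2 and 4 are all in its favor), so it is our choice. Note, however, that only the fluid 3 vs. fluid 2 difference is significant at the 5% level (adjusted p = 0.044).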
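The ranking of the fluids can also be read directly from the sample means. A short sketch (the object name `fluid_means` is ours, not from the original solution):

```r
# Data for Problem 3.23
life <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,
          21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8)
fluid_type <- factor(rep(1:4, each = 6))

# Mean life per fluid, sorted from longest to shortest
fluid_means <- tapply(life, fluid_type, mean)
print(sort(fluid_means, decreasing = TRUE))

# Label of the fluid with the longest mean life
best_fluid <- names(which.max(fluid_means))
print(best_fluid)
```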

0.1.2 Solution 3.23c

Analyze the residuals from this experiment. Are the basic analysis of variance assumptions satisfied?

We produce the diagnostic plots for the fitted model:

plot(Model)

Deductions:

1. The normal Q-Q plot of the residuals falls approximately along a straight line, so the normality assumption appears reasonable.

2. The residuals vs. fitted plot shows a roughly similar spread at each fitted value, so the constant-variance assumption appears to be satisfied.
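The plots can be supplemented with formal tests. The sketch below is an addition, not part of the original solution: it assumes a Shapiro-Wilk test on the residuals and Bartlett's test for equal variances are acceptable checks here. With only six observations per fluid both tests have limited power, so they should be read alongside the plots rather than instead of them.

```r
# Data for Problem 3.23
life <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,
          21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8)
fluid_type <- factor(rep(1:4, each = 6))

fit <- aov(life ~ fluid_type)

# H0: the residuals come from a normal distribution
normality <- shapiro.test(residuals(fit))

# H0: the four fluids have equal variances
equal_var <- bartlett.test(life ~ fluid_type)

print(normality)
print(equal_var)
```

A p-value above 0.05 in each test would be consistent with the graphical conclusions.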

0.2 Question 3.28

An experiment was performed to investigate the effectiveness of five insulating materials. Four samples of each material were tested at an elevated voltage level to accelerate the time to failure. The failure times (in minutes) are shown below:

a. Do all five materials have the same effect on mean failure time?

b. Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. What information is conveyed by these plots?

c. Based on your answer to part (b) conduct another analysis of the failure time data and draw appropriate conclusions.

0.2.1 Solution 3.28a

Reading the data:

material <- c(rep(1,4),rep(2,4),rep(3,4),rep(4,4),rep(5,4))
t <- c(110,157,194,178,1,2,4,18,880,1256,5276,4355,495,7040,5307,10050,7,5,29,2)
dat <- cbind(material,t)
dat <- as.data.frame(dat)
dat$material <- as.factor(as.character(dat$material))
dat$t<- as.numeric(dat$t)
str(dat)

Testing the hypothesis:

Null hypothesis:

H0: u1 = u2 = u3 = u4 = u5, that is, all five material means are equal.

Alternative hypothesis: Ha: At least one of the means (u's) differs.

Model3<- aov(dat$t~dat$material,data = dat)
summary(Model3)

##              Df    Sum Sq  Mean Sq F value  Pr(>F)   
## dat$material  4 103191489 25797872   6.191 0.00379 **
## Residuals    15  62505657  4167044                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: The p-value is 0.00379, which is less than 0.05.

As a result, we reject the null hypothesis. This implies that the material means are not all the same; at least one differs.

0.2.2 Solution 3.28b

Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. What information is conveyed by these plots?

plot(Model3)

Conclusion

The normal Q-Q plot shows that the residuals are not normally distributed, as the points do not fall on a straight line.

Also, the residuals vs. fitted plot shows a spread that changes markedly with the fitted value. This implies that the variance is not constant.
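The unequal spread can be seen numerically by comparing the standard deviation of each material with its mean. A minimal sketch (here `t_fail` is used instead of `t` to avoid masking the base function `t()`; the other names are illustrative):

```r
# Data for Problem 3.28
material <- factor(rep(1:5, each = 4))
t_fail <- c(110,157,194,178, 1,2,4,18, 880,1256,5276,4355,
            495,7040,5307,10050, 7,5,29,2)

# Mean and standard deviation of failure time for each material
grp_mean <- sapply(split(t_fail, material), mean)
grp_sd   <- sapply(split(t_fail, material), sd)
print(round(cbind(mean = grp_mean, sd = grp_sd), 1))

# Ratio of largest to smallest group standard deviation
sd_ratio <- max(grp_sd) / min(grp_sd)
print(sd_ratio)
```

The standard deviation grows roughly in proportion to the mean, which is the pattern a log transformation is designed to remove.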

0.2.3 Solution 3.28c

library(MASS)
boxcox(Model3)

Since the Box-Cox plot peaks at a lambda close to zero, we apply a log transformation to the failure times:
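Rather than reading lambda off the plot, the maximizing value can be extracted from the object that `boxcox()` returns. This is a sketch under the assumption that a grid search in steps of 0.01 is fine enough; `t_fail` and `lambda_hat` are names chosen here.

```r
library(MASS)

# Data for Problem 3.28
material <- factor(rep(1:5, each = 4))
t_fail <- c(110,157,194,178, 1,2,4,18, 880,1256,5276,4355,
            495,7040,5307,10050, 7,5,29,2)

# y = TRUE stores the response in the fit, so boxcox() does not need to refit the model
fit <- lm(t_fail ~ material, y = TRUE)

# Profile log-likelihood on a lambda grid, without drawing the plot
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Lambda with the highest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```

A value of `lambda_hat` near zero corresponds to the log transformation used in this solution.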

dat$time <- log(dat$t) 
head(dat)

##   material   t      time
## 1        1 110 4.7004804
## 2        1 157 5.0562458
## 3        1 194 5.2678582
## 4        1 178 5.1817836
## 5        2   1 0.0000000
## 6        2   2 0.6931472

Model4<-aov(dat$time~dat$material,data = dat)
summary(Model4)

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## dat$material  4 165.06   41.26   37.66 1.18e-07 ***
## Residuals    15  16.44    1.10                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now we plot the model

plot(Model4)

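After applying the log transformation (a power transformation with lambda = 0), the residuals appear approximately normal, since the points on the normal Q-Q plot fall close to a straight line.

Also, the residuals vs. fitted plot shows a fairly even spread, indicating approximately constant variance. With the transformed data the F test is still significant (p = 1.18e-07), so we again conclude that the materials differ in mean failure time.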
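The effect of the transformation on the spread can be quantified by comparing the group standard deviations before and after taking logs. A minimal sketch with illustrative names:

```r
# Data for Problem 3.28
material <- factor(rep(1:5, each = 4))
t_fail <- c(110,157,194,178, 1,2,4,18, 880,1256,5276,4355,
            495,7040,5307,10050, 7,5,29,2)

# Group standard deviations on the raw and on the log scale
sd_raw <- sapply(split(t_fail, material), sd)
sd_log <- sapply(split(log(t_fail), material), sd)

# Ratio of largest to smallest standard deviation on each scale
ratio_raw <- max(sd_raw) / min(sd_raw)
ratio_log <- max(sd_log) / min(sd_log)
print(c(raw = ratio_raw, log = ratio_log))
```

The ratio drops by about two orders of magnitude on the log scale, in line with the more even spread in the residual plot.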

0.3 Question 3.29

A semiconductor manufacturer has developed three different methods for reducing particle counts on wafers. All three methods are tested on five different wafers and the after treatment particle count obtained. The data are shown below:

a. Do all methods have the same effect on mean particle count?

b. Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. Are there potential concerns about the validity of the assumptions?

c. Based on your answer to part (b) conduct another analysis of the particle count data and draw appropriate conclusions.

0.3.1 Solution 3.29a

method<- c(rep(1,5),rep(2,5),rep(3,5))
count<- c(31,10,21,4,1,62,40,24,30,35,53,27,120,97,68)
dat1 <- cbind(method,count)
dat1 <- as.data.frame(dat1)
dat1$method <- as.factor(as.character(dat1$method))
dat1$count <- as.numeric(dat1$count)
str(dat1)

Let

u1 = mean particle count using method 1

u2 = mean particle count using method 2

u3 = mean particle count using method 3

Null hypothesis:

H0: u1 = u2 = u3

Alternative hypothesis:

Ha: At least one of the three method means differs.

Model5<- aov(dat1$count~dat1$method,data = dat1)
summary(Model5)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## dat1$method  2   8964    4482   7.914 0.00643 **
## Residuals   12   6796     566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We have a p-value of 0.00643, which is less than 0.05.

As a result, we reject the null hypothesis: the methods do not all have the same effect on mean particle count.

0.3.2 Solution 3.29b

plot(Model5)

0.3.3 Solution 3.29c

We use a Box-Cox plot to choose a variance-stabilizing transformation:

boxcox(Model5)

Since lambda appears to be about 0.47, we raise the counts to the power 0.47:
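The Box-Cox output also gives an approximate 95% confidence interval for lambda, which shows how precisely the exponent is determined. The sketch below assumes the standard likelihood-ratio cutoff (maximum log-likelihood minus half the 0.95 quantile of a chi-square with 1 degree of freedom); the names `lambda_hat` and `ci` are ours.

```r
library(MASS)

# Data for Problem 3.29
method <- factor(rep(1:3, each = 5))
count <- c(31,10,21,4,1, 62,40,24,30,35, 53,27,120,97,68)

# y = TRUE stores the response in the fit, so boxcox() does not need to refit the model
fit <- lm(count ~ method, y = TRUE)

# Profile log-likelihood on a lambda grid, without drawing the plot
bc <- boxcox(fit, lambda = seq(-1, 2, by = 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Grid values whose log-likelihood lies within the 95% cutoff of the maximum
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y >= cutoff])

print(lambda_hat)
print(ci)
```

If the interval contains 0.5, a square-root transformation would be a simpler choice that is statistically indistinguishable from the exponent used here.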

dat1$new_count<-(dat1$count)^0.47
head(dat1)

##   method count new_count
## 1      1    31  5.022732
## 2      1    10  2.951209
## 3      1    21  4.182569
## 4      1     4  1.918528
## 5      1     1  1.000000
## 6      2    62  6.957033

Model6<-aov(dat1$new_count~dat1$method,data = dat1)
summary(Model6)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## dat1$method  2  46.26  23.132   9.875 0.00292 **
## Residuals   12  28.11   2.343                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Plotting the model

plot(Model6)

Conclusion

The points on the normal Q-Q plot of the residuals fall close to a straight line, showing that the residuals are approximately normal.

The residuals vs. fitted plot shows a similar spread at each fitted value, so the constant-variance assumption appears reasonable.

We have a p-value of 0.0029, so we reject the null hypothesis and conclude that at least one of the method means differs.

0.4 Question 3.51 & 3.52

Use the Kruskal–Wallis test for the experiment in Problem 3.23.

Compare the conclusions obtained with those from the usual analysis of variance

reading the data

life <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8) 
fluid_type<- c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
dat5<-cbind(fluid_type,life)
dat5<- as.data.frame(dat5)
dat5$fluid_type<-as.factor(dat5$fluid_type)
dat5$life <- as.numeric(dat5$life)

Performing the test

kruskal.test(dat5$life~dat5$fluid_type,data = dat5)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  dat5$life by dat5$fluid_type
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015

Stating the Hypothesis:

Null: H₀: μ₁ = μ₂ = μ₃ = μ₄
Alternate: H₁: μ_i ≠ μ_j for at least one pair (i,j)

Now analyzing the Kruskal Wallis Test:

We have a p-value of 0.1015, which is higher than 0.05. As a result, we do not reject the null hypothesis.

The ANOVA led to the same decision, although with a smaller p-value (0.0525).

The Kruskal-Wallis test therefore agrees with the analysis of variance: at α = 0.05 there is no significant evidence that the mean lives of the four fluids differ. This is consistent with the result in Question 3.23.
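To see what `kruskal.test` computes, the H statistic can be rebuilt from the ranks. This is a sketch assuming the usual tie-corrected formula, H = [12 / (N(N+1)) · Σ R_i² / n_i − 3(N+1)] / [1 − Σ(t³ − t) / (N³ − N)], where R_i is the rank sum of fluid i and t runs over the sizes of the tied groups (the value 16.9 occurs twice in these data). Variable names are illustrative.

```r
# Data for Problem 3.23
life <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,
          21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8)
fluid_type <- factor(rep(1:4, each = 6))

N <- length(life)
r <- rank(life)                          # tied values receive their average rank

rank_sums <- tapply(r, fluid_type, sum)  # R_i
n_i       <- tapply(r, fluid_type, length)

# Uncorrected H statistic
h_raw <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Tie correction
ties   <- table(life)
h_stat <- h_raw / (1 - sum(ties^3 - ties) / (N^3 - N))

# Chi-square approximation with a - 1 degrees of freedom
p_value <- pchisq(h_stat, df = nlevels(fluid_type) - 1, lower.tail = FALSE)

builtin <- kruskal.test(life ~ fluid_type)
print(c(H = h_stat, p = p_value))
```

The manual values match the reported chi-squared of 6.2177 and p-value of 0.1015.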

0.5 Complete R Code

#Question 3.23a

life <- c(17.6,18.9,16.3,17.4,20.1,21.6,16.9,15.3,18.6,17.1,19.5,20.3,21.4,23.6,19.4,18.5,20.5,22.3,19.3,21.1,16.9,17.5,18.3,19.8) 
fluid_type<- c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
dat5<-cbind(fluid_type,life)
dat5<- as.data.frame(dat5)
dat5$fluid_type<-as.factor(dat5$fluid_type)
dat5$life <- as.numeric(dat5$life)

Model<- aov(dat5$life~dat5$fluid_type,data=dat5)
summary(Model)

#Question 3.23b
Model2<- TukeyHSD(Model)
Model2

#plotting the model

plot(Model2)


#Question 3.23c
plot(Model)

#Question 3.28a

material <- c(rep(1,4),rep(2,4),rep(3,4),rep(4,4),rep(5,4))
t <- c(110,157,194,178,1,2,4,18,880,1256,5276,4355,495,7040,5307,10050,7,5,29,2)
dat <- cbind(material,t)
dat <- as.data.frame(dat)
dat$material <- as.factor(as.character(dat$material))
dat$t<- as.numeric(dat$t)
str(dat)

#Testing the hypothesis
Model3<- aov(dat$t~dat$material,data = dat)
summary(Model3)

#Question 3.28b

plot(Model3)

#Question 3.28c

library(MASS)
boxcox(Model3)

#log transformation (lambda close to zero)
dat$time <- log(dat$t)
Model4<-aov(dat$time~dat$material,data = dat)
summary(Model4)

#plot the model
plot(Model4)


#Question 3.29
method<- c(rep(1,5),rep(2,5),rep(3,5))
count<- c(31,10,21,4,1,62,40,24,30,35,53,27,120,97,68)
dat1 <- cbind(method,count)
dat1 <- as.data.frame(dat1)
dat1$method <- as.factor(as.character(dat1$method))
dat1$count <- as.numeric(dat1$count)
str(dat1)

#Testing the hypothesis

Model5<- aov(dat1$count~dat1$method,data = dat1)
summary(Model5)


#Question 3.29b

plot(Model5)

#Question 3.29c

#choosing the transformation with a Box-Cox plot
boxcox(Model5)

#taking the power transformation

dat1$new_count<-(dat1$count)^0.47
head(dat1)

Model6<-aov(dat1$new_count~dat1$method,data = dat1)
summary(Model6)

plot(Model6)


#Question 3.51 & 3.52
kruskal.test(dat5$life~dat5$fluid_type,data = dat5)