1 Question 3.23:

The effective life of insulating fluids at an accelerated load of 35 kV is being studied. Test data have been obtained for four types of fluids. The results from a completely randomized experiment were as follows:

  1. Is there any indication that the fluids differ? Use \(\alpha=0.05\).

  2. Which fluid would you select, given that the objective is long life?

  3. Analyze the residuals from this experiment. Are the basic analysis of variance assumptions satisfied?

1.1 Solution:

PART A:

Reading the Data:

Life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6, 16.9, 15.3, 18.6, 17.1, 19.5, 20.3, 21.4, 23.6, 19.4, 18.5, 20.5, 22.3, 19.3, 21.1, 16.9, 17.5, 18.3, 19.8) 
Type <- c(rep(1,6), rep(2,6), rep(3,6), rep(4,6))
Data <- data.frame(Life, Type)
Data$Type <- as.factor(Data$Type)
str(Data)
## 'data.frame':    24 obs. of  2 variables:
##  $ Life: num  17.6 18.9 16.3 17.4 20.1 21.6 16.9 15.3 18.6 17.1 ...
##  $ Type: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 2 2 2 2 ...

Performing ANOVA:

First Stating the Hypothesis:

Null:     H0: μ1 = μ2 = μ3 = μ4
Alternate: H1: μi ≠ μj for at least one pair (i,j)

aov.model<-aov(Life~Type,data=Data)
summary(aov.model)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Type         3  30.17   10.05   3.047 0.0525 .
## Residuals   20  65.99    3.30                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion:

---> At α = 0.05 we fail to reject the null hypothesis (F = 3.047, p-value = 0.0525), so there is no significant indication that the fluids differ. However, since the p-value is only slightly above 0.05, the difference in means would be judged significant at a larger significance level such as α = 0.10.
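The same decision can be reached by comparing the observed F statistic with the critical value of the F distribution. The following is a minimal sketch, assuming the same data as above; the names `fit`, `tab`, `f_obs`, `f_crit` and `p_val` are illustrative and not part of the original analysis.

```r
# Data from the problem: 6 observations for each of the 4 fluid types
Life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6, 16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3, 19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
Type <- factor(rep(1:4, each = 6))

fit <- aov(Life ~ Type)
tab <- summary(fit)[[1]]            # ANOVA table as a data frame

f_obs  <- tab[["F value"]][1]       # observed F statistic for Type
p_val  <- tab[["Pr(>F)"]][1]        # corresponding p-value
f_crit <- qf(0.95, df1 = tab[["Df"]][1], df2 = tab[["Df"]][2])  # upper 5% point

# H0 is rejected at alpha = 0.05 only if the observed F exceeds the critical value
c(F = f_obs, F_crit = f_crit, p = p_val)
f_obs > f_crit
```

Comparing `f_obs` with `f_crit` is equivalent to comparing `p_val` with 0.05; both views make it easy to see how close the result is to the rejection boundary.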


PART B:

Performing Fisher's LSD test to identify which treatment mean is highest and which pairs of means differ:

library(agricolae)
?LSD.test
LSD.test(aov.model,"Type",console=TRUE)
## 
## Study: aov.model ~ "Type"
## 
## LSD t Test for Life 
## 
## Mean Square Error:  3.299667 
## 
## Type,  means and individual ( 95 %) CI
## 
##       Life      std r      LCL      UCL  Min  Max
## 1 18.65000 1.952178 6 17.10309 20.19691 16.3 21.6
## 2 17.95000 1.854454 6 16.40309 19.49691 15.3 20.3
## 3 20.95000 1.879096 6 19.40309 22.49691 18.5 23.6
## 4 18.81667 1.554885 6 17.26975 20.36358 16.9 21.1
## 
## Alpha: 0.05 ; DF Error: 20
## Critical Value of t: 2.085963 
## 
## least Significant Difference: 2.187666 
## 
## Treatments with the same letter are not significantly different.
## 
##       Life groups
## 3 20.95000      a
## 4 18.81667     ab
## 1 18.65000      b
## 2 17.95000      b

Conclusion:

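---> Given that the objective is long life, I would choose Fluid Type 3 because it has the highest mean life (20.95). According to the LSD groupings, Type 3 is significantly better than Types 1 and 2 but not significantly different from Type 4 (18.82), so Type 4 could also be considered. Because the overall F test in Part A was not significant at α = 0.05, these pairwise comparisons should be interpreted with caution.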
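The least significant difference reported by `LSD.test` can be reproduced in base R. This is a sketch under the assumption of a balanced design with n = 6 replicates per fluid; the names `mse`, `t_crit`, `lsd`, `means` and `diffs` are illustrative.

```r
Life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6, 16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3, 19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
Type <- factor(rep(1:4, each = 6))
fit  <- aov(Life ~ Type)

n      <- 6                             # replicates per fluid
df_err <- df.residual(fit)              # error degrees of freedom
mse    <- deviance(fit) / df_err        # mean square error
t_crit <- qt(1 - 0.05 / 2, df_err)      # two-sided 5% critical t value
lsd    <- t_crit * sqrt(2 * mse / n)    # least significant difference

means <- tapply(Life, Type, mean)
diffs <- abs(outer(means, means, "-"))  # all pairwise |mean_i - mean_j|

lsd
diffs > lsd                             # TRUE where a pair of means differs significantly
```

Two means are declared different when the absolute difference between them exceeds `lsd`, which is how the letter groupings in the output above are formed.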

PART C:

Analyzing Residuals:

plot(aov.model)

Conclusion:

---> There is nothing unusual in the residual plots: the normality and constant variance assumptions appear to be satisfied, and the model is adequate.
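The visual check can be supplemented with formal tests. The sketch below is an addition, not part of the original solution: it applies the Shapiro-Wilk test to the residuals and Bartlett's test to the group variances, both from the base `stats` package. The object names `sw` and `bt` are illustrative.

```r
Life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6, 16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3, 19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
Type <- factor(rep(1:4, each = 6))
fit  <- aov(Life ~ Type)

# Shapiro-Wilk test: H0 is that the residuals are normally distributed
sw <- shapiro.test(residuals(fit))

# Bartlett test: H0 is that the four fluids have equal variances
bt <- bartlett.test(Life ~ Type)

c(shapiro_p = sw$p.value, bartlett_p = bt$p.value)
```

Large p-values here are consistent with the assumptions, but with only six observations per fluid these tests have limited power, so the plots remain the primary diagnostic.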

2 Question 3.28:

An experiment was performed to investigate the effectiveness of five insulating materials. Four samples of each material were tested at an elevated voltage level to accelerate the time to failure. The failure times (in minutes) are shown below:

  1. Do all five materials have the same effect on mean failure time?

  2. Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. What information is conveyed by these plots?

  3. Based on your answer to part (b) conduct another analysis of the failure time data and draw appropriate conclusions.

2.1 Solution:

PART A:

Reading the Data:

Time <- c(110, 157, 194, 178, 1, 2, 4, 18, 880, 1256, 5276, 4355, 495, 7040, 5307, 10050, 7, 5, 29, 2)
Type <- c(rep(1,4), rep(2,4), rep(3,4), rep(4,4), rep(5,4))
Data <- data.frame(Time,Type)
Data$Type <- as.factor(Data$Type)
str(Data)
## 'data.frame':    20 obs. of  2 variables:
##  $ Time: num  110 157 194 178 1 ...
##  $ Type: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...

Performing ANOVA:

First Stating the Hypothesis:

Null Hypothesis: \(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5\)

Alternative Hypothesis: \(H_a:\) at least one \(\mu_i\) differs

aov.model<-aov(Time~Type,data=Data)
summary(aov.model)
##             Df    Sum Sq  Mean Sq F value  Pr(>F)   
## Type         4 103191489 25797872   6.191 0.00379 **
## Residuals   15  62505657  4167044                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion:

---> With a p-value of 0.00379, which is below the significance level of 0.05, we reject the null hypothesis: the five materials do not all have the same effect on mean failure time.

PART B:

Plotting the residuals using Anova Plots:

library(ggfortify)
library(ggplot2)
autoplot(aov.model)

Conclusion:

---> 1) The normal probability plot of the residuals shows that the residuals are not normally distributed, as the points do not fall along a straight line. 2) The residuals vs fitted values plot shows that the variance is not constant, as the spread of the residuals changes with the fitted value instead of forming a rectangular band. The requirements for a valid ANOVA are therefore violated.
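The unequal spread seen in the plot can be quantified by comparing the group standard deviations with the group means. This is a small illustrative sketch; `grp_mean`, `grp_sd` and `sd_ratio` are names introduced here.

```r
Time <- c(110, 157, 194, 178, 1, 2, 4, 18, 880, 1256, 5276, 4355,
          495, 7040, 5307, 10050, 7, 5, 29, 2)
Type <- factor(rep(1:5, each = 4))

grp_mean <- tapply(Time, Type, mean)   # mean failure time per material
grp_sd   <- tapply(Time, Type, sd)     # standard deviation per material
round(cbind(mean = grp_mean, sd = grp_sd), 1)

# Ratio of the largest to the smallest group standard deviation
sd_ratio <- max(grp_sd) / min(grp_sd)
sd_ratio
```

A standard deviation that grows with the mean, as it does here, is the pattern that a log or other power transformation is meant to address.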

PART C:

Based on Part B results we have to perform Data transformation (Either BoxCox or Natural Log) in order to stabilize the variance:

To visually see differences in variance, plotting Box Plot:

boxplot(Data$Time~Data$Type,xlab="Material Type",ylab="Failure Time",main="Boxplot of Observations")

---> Huge differences in variances between Failure Times data of different Material Types

Performing BoxCox Transformation:

library(MASS)
# use the factor Type stored in Data; the standalone Type vector is numeric
boxcox(Time~Type,data=Data)

---> In the Box-Cox maximum likelihood plot, λ = 1 is not inside the 95% confidence interval, which confirms that a transformation is required. Since the likelihood is maximized at a value of λ close to zero, we apply a natural log transformation to the data.
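Instead of reading λ off the plot, the maximizing value and an approximate 95% interval can be extracted from the object returned by `boxcox`. This is a sketch, assuming `Type` is treated as a factor so that the one-way ANOVA model is profiled; the grid of λ values and the names `bc`, `lambda_hat` and `ci` are choices made here.

```r
library(MASS)

Time <- c(110, 157, 194, 178, 1, 2, 4, 18, 880, 1256, 5276, 4355,
          495, 7040, 5307, 10050, 7, 5, 29, 2)
Type <- factor(rep(1:5, each = 4))

# Profile log-likelihood on a fine grid of lambda values, without plotting
bc <- boxcox(Time ~ Type, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

lambda_hat <- bc$x[which.max(bc$y)]   # lambda with the largest log-likelihood

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])

lambda_hat
ci
```

If the interval excludes 1 a transformation is indicated, and if it contains 0 the natural log is a convenient choice.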

Performing Natural Log Transformation on Failure times data and Analyzing Variances:

LogTime <- log(Time)
boxplot(LogTime~Data$Type,xlab="Material Type",ylab="log(Failure Time)",main="Boxplot of Log-Transformed Observations")

---> The box plot of the log-transformed data shows that the spread of the observations is now similar for Material Types 2, 3, 4, and 5, but still differs from that of Material Type 1.

Checking Residuals Plots for Log Transformed Data:

DataT<-data.frame(LogTime,Type)
DataT$Type <- as.factor(DataT$Type)
aovmodelT<-aov(LogTime~Type,data=DataT)
autoplot(aovmodelT)

---> In the residual plots, the normal probability plot roughly follows a straight line, but the variance is still not stabilized, as seen from the differing spread of the residuals across the fitted values in the “Residuals vs Fitted” plot.

Also,

---> The transformation did not fully stabilize the variance because the data are messy and contain outliers, so we now resort to a non-parametric alternative to ANOVA, the Kruskal-Wallis test:

kruskal.test(Time~Type,data=Data)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Time by Type
## Kruskal-Wallis chi-squared = 16.873, df = 4, p-value = 0.002046

Conclusion:

---> The Kruskal-Wallis test gives a p-value of 0.002046, which is less than α = 0.05. We therefore again reject the null hypothesis and conclude that the five materials do not all have the same effect on failure time. Because this test does not rely on the normality and constant variance assumptions that were violated in Part B, this conclusion is better supported than the one from the ANOVA on the untransformed data.
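To make clear what the test computes, the Kruskal-Wallis statistic can be built by hand from the ranks, H = 12 / (N(N+1)) · Σ R_i² / n_i − 3(N+1), followed by a correction for ties. The sketch below follows that formula; the names `r`, `R_i`, `n_i`, `ties`, `H` and `p_value` are illustrative.

```r
Time <- c(110, 157, 194, 178, 1, 2, 4, 18, 880, 1256, 5276, 4355,
          495, 7040, 5307, 10050, 7, 5, 29, 2)
Type <- factor(rep(1:5, each = 4))

N   <- length(Time)
r   <- rank(Time)                  # mid-ranks; tied values share the average rank
R_i <- tapply(r, Type, sum)        # rank sum for each material
n_i <- tapply(r, Type, length)     # sample size for each material

H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)

# Correction for ties (the value 2 occurs twice in the data)
ties <- table(r)
H <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

# Under H0, H is approximately chi-squared with (number of groups - 1) df
p_value <- pchisq(H, df = nlevels(Type) - 1, lower.tail = FALSE)
c(H = H, p = p_value)
```

Because only the ranks enter the calculation, the very large failure times of Materials 3 and 4 have no more influence than any other observation, which is why the test is robust to the outliers noted above.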

3 Question 3.29:

A semiconductor manufacturer has developed three different methods for reducing particle counts on wafers. All three methods are tested on five different wafers and the after treatment particle count obtained. The data are shown below:

  1. Do all methods have the same effect on mean particle count?

  2. Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. Are there potential concerns about the validity of the assumptions?

  3. Based on your answer to part (b) conduct another analysis of the particle count data and draw appropriate conclusions.

3.1 Solution:

PART A:

Reading the Data:

Count <- c(31, 10, 21, 4, 1, 62, 40, 24, 30, 35, 53, 27, 120, 97, 68)
Type <- c(rep(1,5), rep(2,5), rep(3,5))
Data <- data.frame(Count,Type)
Data$Type <- as.factor(Data$Type)
str(Data)
## 'data.frame':    15 obs. of  2 variables:
##  $ Count: num  31 10 21 4 1 62 40 24 30 35 ...
##  $ Type : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 2 2 2 2 2 ...

Performing ANOVA:

First Stating the Hypothesis:

\(H_0: \mu_1 = \mu_2 = \mu_3\)
\(H_a:\) at least one \(\mu_i\) differs

Where 1, 2, and 3 correspond to Method 1, Method 2, and Method 3

aov.model<-aov(Count~Type,data=Data)
summary(aov.model)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Type         2   8964    4482   7.914 0.00643 **
## Residuals   12   6796     566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion:

---> Since the p-value is 0.00643, we reject H0 at the 0.05 level of significance and conclude that at least one μi differs, which means at least one method has a different effect on mean particle count.

PART B:

Plotting the Residuals:

autoplot(aov.model)

The Normal Probability Plot (titled ‘Normal Q-Q’) shows that the residuals are normal

But,

If we look at the “Residuals vs Fitted” plot, we can see that the spread of the residuals differs between the three methods, so the constant variance assumption required for a valid ANOVA does not hold.

PART C:

As per Part B results, we need to perform Data Transformation to conclude appropriate results:

library(MASS)
boxplot(Data$Count~Data$Type,xlab="Method Type",ylab="Particle Count",main="Boxplot of Observations")

Since Variances vary, we will stabilize them using BoxCox transformation:

boxcox(Count~Type,data=Data)

---> λ = 1 is outside the confidence interval and the likelihood is maximized near λ = 0.4, so we transform the count data with λ = 0.4:

lambda <- 0.4
CountT<-Count^(lambda)

We look to see how the transformation did:

boxplot(CountT~Data$Type,xlab="Method Type",ylab="Particle Count",main="Boxplot of Observations")

boxcox(CountT~Data$Type)

---> The spread of the particle counts is now more uniform than before the transformation, and the Box-Cox plot for the transformed data shows that λ = 1 is inside the confidence interval, so no further transformation is needed. The ANOVA assumptions are now more reasonable.
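The transformation used above is the simple power Count^λ, whereas the Box-Cox family is usually written as (y^λ − 1)/λ. For λ ≠ 0 the two differ only by a linear rescaling, so the one-way ANOVA F statistic is the same for both. The sketch below checks this; `f_stat` is a hypothetical helper introduced here.

```r
Count <- c(31, 10, 21, 4, 1, 62, 40, 24, 30, 35, 53, 27, 120, 97, 68)
Type  <- factor(rep(1:3, each = 5))
lambda <- 0.4

y_power  <- Count^lambda                   # transformation used in the text
y_boxcox <- (Count^lambda - 1) / lambda    # standard Box-Cox form for lambda != 0

# Hypothetical helper: F statistic of the one-way ANOVA of y on Type
f_stat <- function(y) summary(aov(y ~ Type))[[1]][["F value"]][1]

c(power = f_stat(y_power), boxcox = f_stat(y_boxcox))
```

The simpler power form is therefore sufficient for the hypothesis test; the scaled form matters mainly when comparing likelihoods across different values of λ.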

ANOVA Analysis on Transformed Data:

DataT <- data.frame(CountT,Type)
DataT$Type <- as.factor(DataT$Type)
str(DataT)
## 'data.frame':    15 obs. of  2 variables:
##  $ CountT: num  3.95 2.51 3.38 1.74 1 ...
##  $ Type  : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 2 2 2 2 2 ...
aov.modelT<-aov(CountT~Type,data=DataT)
summary(aov.modelT)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Type         2  21.21  10.605   9.881 0.00291 **
## Residuals   12  12.88   1.073                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
autoplot(aov.modelT)

CONCLUSION:

---> After transforming the data, the ANOVA model is adequate: the residuals are approximately normal and the variance is approximately constant. The residuals in the residuals vs fitted plot now form a roughly rectangular band, which indicates constant variance. From the ANOVA on the transformed data the p-value is 0.00291, so at the 0.05 level of significance we conclude that at least one μi differs, which means that the method has a significant effect on mean particle count.

4 Question 3.51 & 3.52:

Use the Kruskal–Wallis test for the experiment in Problem 3.23.

3.51) Compare the conclusions obtained with those from the usual analysis of variance.
3.52) Are the results comparable to those found by the usual analysis of variance?

4.1 Solution:

Reading the Data:

Life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6, 16.9, 15.3, 18.6, 17.1, 19.5, 20.3, 21.4, 23.6, 19.4, 18.5, 20.5, 22.3, 19.3, 21.1, 16.9, 17.5, 18.3, 19.8) 
Type <- c(rep(1,6), rep(2,6), rep(3,6), rep(4,6))
Data <- data.frame(Life,Type)
Data$Type <- as.factor(Data$Type)
str(Data)
## 'data.frame':    24 obs. of  2 variables:
##  $ Life: num  17.6 18.9 16.3 17.4 20.1 21.6 16.9 15.3 18.6 17.1 ...
##  $ Type: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 2 2 2 2 ...

Stating the Hypothesis:

Null:     H0: μ1 = μ2 = μ3 = μ4
Alternate: H1: μi ≠ μj for at least one pair (i,j)

Now Performing Kruskal Wallis Test:

kruskal.test(Life~Type,data=Data)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Life by Type
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015

Conclusion:

---> The p-value of 0.1015 is greater than 0.05, so we fail to reject the null hypothesis: there is no significant evidence that the fluids differ. The ANOVA in Question 3.23 led to the same decision, but with a smaller p-value (0.0525). Both p-values exceed α = 0.05, so neither test detects a difference in life between the fluid types, although the ANOVA result is borderline while the Kruskal-Wallis result is not. The results and conclusions are therefore comparable to those found by the analysis of variance in Question 3.23.
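The comparison of the two tests can be put side by side in a few lines. This is an illustrative sketch using the same data; `p_aov` and `p_kw` are names introduced here.

```r
Life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6, 16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3, 19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
Type <- factor(rep(1:4, each = 6))

p_aov <- summary(aov(Life ~ Type))[[1]][["Pr(>F)"]][1]   # parametric F test
p_kw  <- kruskal.test(Life ~ Type)$p.value               # rank-based test

round(c(ANOVA = p_aov, Kruskal_Wallis = p_kw), 4)
c(ANOVA = p_aov, Kruskal_Wallis = p_kw) > 0.05   # TRUE means "fail to reject H0"
```

When the ANOVA assumptions hold, as the residual analysis in Question 3.23 suggested, the F test is generally the more powerful of the two, which is consistent with its smaller p-value here.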

5 Source Code:

getwd()

##Question 3.23:

#PART A:
#Reading the Data:
Life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6, 16.9, 15.3, 18.6, 17.1, 19.5, 20.3, 21.4, 23.6, 19.4, 18.5, 20.5, 22.3, 19.3, 21.1, 16.9, 17.5, 18.3, 19.8) 
Type <- c(rep(1,6), rep(2,6), rep(3,6), rep(4,6))
Data <- data.frame(Life, Type)
Data$Type <- as.factor(Data$Type)
str(Data)
#ANOVA:
aov.model<-aov(Life~Type,data=Data)
summary(aov.model)

#PART B:
library(agricolae)
LSD.test(aov.model,"Type",console=TRUE)

#PART C:
plot(aov.model)

##Question 3.28:
#Reading the data:
Time <- c(110, 157, 194, 178, 1, 2, 4, 18, 880, 1256, 5276, 4355, 495, 7040, 5307, 10050, 7, 5, 29, 2)
Type <- c(rep(1,4), rep(2,4), rep(3,4), rep(4,4), rep(5,4))
Data <- data.frame(Time,Type)
Data$Type <- as.factor(Data$Type)
str(Data)

#PART A:
aov.model<-aov(Time~Type,data=Data)
summary(aov.model)

#PART B:
library(ggfortify)
library(ggplot2)
autoplot(aov.model)

#PART C:
library(MASS)
boxplot(Data$Time~Data$Type,xlab="Material Type",ylab="Failure Time",main="Boxplot of Observations")
boxcox(Time~Type,data=Data)
LogTime <- log(Time)
boxplot(LogTime~Data$Type,xlab="Material Type",ylab="Failure Time",main="Boxplot of Observations")
DataT<-data.frame(LogTime,Type)
DataT$Type <- as.factor(DataT$Type)
str(DataT)
aovmodelT<-aov(LogTime~Type,data=DataT)
plot(aovmodelT)
kruskal.test(Time~Type,data=Data)

##Question 3.29:

#Reading the Data:
Count <- c(31, 10, 21, 4, 1, 62, 40, 24, 30, 35, 53, 27, 120, 97, 68)
Type <- c(rep(1,5), rep(2,5), rep(3,5))
Data <- data.frame(Count,Type)
Data$Type <- as.factor(Data$Type)
str(Data)

#PART A:
aov.model<-aov(Count~Type,data=Data)
summary(aov.model)
#PART B:
autoplot(aov.model)
#PART C:
library(MASS)
boxplot(Data$Count~Data$Type,xlab="Method Type",ylab="Particle Count",main="Boxplot of Observations")
boxcox(Count~Type,data=Data)
lambda <- 0.4
CountT<-Count^(lambda)
#we look to see how the transformation did
boxplot(CountT~Data$Type,xlab="Method Type",ylab="Particle Count",main="Boxplot of Observations")
boxcox(CountT~Data$Type)
#ANOVA Analysis on Transformed Data
DataT <- data.frame(CountT,Type)
DataT$Type <- as.factor(DataT$Type)
str(DataT)
aov.modelT<-aov(CountT~Type,data=DataT)
summary(aov.modelT)
autoplot(aov.modelT)

##Question 3.51 & 3.52:
#Reading the Data:
Life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6, 16.9, 15.3, 18.6, 17.1, 19.5, 20.3, 21.4, 23.6, 19.4, 18.5, 20.5, 22.3, 19.3, 21.1, 16.9, 17.5, 18.3, 19.8) 
Type <- c(rep(1,6), rep(2,6), rep(3,6), rep(4,6))
Data <- data.frame(Life,Type)
Data$Type <- as.factor(Data$Type)
str(Data)
kruskal.test(Life~Type,data=Data)