This report captures work done for the individual homework for Week 6. R code along with the results are provided. The required homework problems were taken from “Design and Analysis of Experiments 8th Edition”:
1) 3.23
2) 3.28
3) 3.29
4) 3.51
Answers to the questions are in blue.
The effective life of insulating fluids at an accelerated load of 35 kV is being studied. Test data have been obtained for four types of fluids. The results from a completely randomized experiment were as follows: [See table of values in book.]
(a) Is there any indication that the fluids differ? Use alpha = 0.05.
(b) Which fluid would you select, given that the objective is long life?
(c) Analyze the residuals from the experiment. Are the basic analysis of variance assumptions satisfied?
In order to understand if the fluids differ, we will run an ANOVA test, with the following hypotheses:
Null: H0: μ1 = μ2 = μ3 = μ4
Alternate: H1: μi ≠ μj for at least one pair (i,j)
# My_Factor, My_Response, My_Factor_Name, and My_Response_Name are assumed to be
# loaded earlier from the book's table (fluid type labels and life values).
My_ANOVA_Input_Table <- as.data.frame(cbind(My_Factor, My_Response))
names(My_ANOVA_Input_Table) <- c(My_Factor_Name, My_Response_Name)
# cbind() coerces everything to character, so restore the column types
My_ANOVA_Input_Table[[2]] <- as.numeric(My_ANOVA_Input_Table[[2]])
My_ANOVA_Input_Table[[1]] <- as.factor(My_ANOVA_Input_Table[[1]])
# Fit the one-way ANOVA of life on fluid type
model <- aov(My_ANOVA_Input_Table[[2]] ~ My_ANOVA_Input_Table[[1]],
             data = My_ANOVA_Input_Table)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## My_ANOVA_Input_Table[[1]] 3 30.17 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a p-value of 0.0525 and a significance level of 0.05, we just barely fail to reject the null hypothesis. At α = 0.05 there is not quite enough evidence to conclude that the mean effective lives of the four fluids differ, although the result is borderline.
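As a side note, the same model can be fit a little more directly by building the data frame with data.frame() instead of cbind(), which avoids the coercion to character and gives the ANOVA table a readable factor name. A minimal sketch, assuming My_Factor and My_Response hold the fluid-type labels and life values from the book's table:
# A sketch: build the data frame directly so no type conversions are needed
fluid_data <- data.frame(
  FluidType = as.factor(My_Factor),
  Life      = as.numeric(My_Response)
)
model_alt <- aov(Life ~ FluidType, data = fluid_data)
summary(model_alt)  # same F test as above, with the row labeled "FluidType"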
Which fluid would you select, given that the objective is long life?
Looking at the means of the Fluid Types:
aggregate(My_ANOVA_Input_Table[[2]], list(My_ANOVA_Input_Table[[1]]), FUN=mean)
## Group.1 x
## 1 FT1 18.65000
## 2 FT2 17.95000
## 3 FT3 20.95000
## 4 FT4 18.81667
I would choose Fluid Type 3 because it has the highest sample mean life (20.95), keeping in mind that the ANOVA in part (a) did not find the differences statistically significant at α = 0.05.
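To check whether Fluid Type 3's advantage is statistically meaningful, a Tukey HSD comparison could be run as a follow-up. A minimal sketch, using the named-column refit (model_alt) from the sketch above; given the overall p-value of 0.0525, the pairwise intervals can be expected to include zero.
# Tukey's 95% family-wise pairwise comparisons among the four fluid types,
# using the named-column refit (model_alt) sketched earlier
TukeyHSD(model_alt, conf.level = 0.95)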
Analyze the residuals from this experiment. Are the basic analysis of variance assumptions satisfied?
library(ggfortify)  # supplies autoplot() diagnostic plots for lm/aov objects
autoplot(model)
Looking at the diagnostic plots, the residuals show no serious departure from normality or from constant variance, so the basic ANOVA assumptions appear to be met.
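The visual checks could also be backed up with formal tests; a minimal sketch, assuming My_Factor and My_Response still hold the fluid types and life values:
# Formal checks to supplement the diagnostic plots (a sketch)
shapiro.test(residuals(model))          # H0: residuals are normally distributed
bartlett.test(My_Response ~ My_Factor)  # H0: equal variances across fluid types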
An experiment was performed to investigate the effectiveness of five insulating materials. Four samples of each material were tested at an elevated voltage level to accelerate the time to failure. The failure times (in minutes) are shown below: [See table of values in book.]
(a) Do all five materials have the same effect on mean failure time?
(b) Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. What information is conveyed by these plots?
(c) Based on your answer to part (b) conduct another analysis of the failure time data and draw appropriate conclusions.
In order to understand if the material types have an effect on mean failure time, we will run an ANOVA test, with the following hypotheses:
Null: H0: μ1 = μ2 = μ3 = μ4 = μ5
Alternate: H1: μi ≠ μj for at least one pair (i,j)
My_ANOVA_Input_Table <- as.data.frame(cbind(My_Factor, My_Response))
names(My_ANOVA_Input_Table) <- c(My_Factor_Name,My_Response_Name)
My_ANOVA_Input_Table[[2]] <- as.numeric(My_ANOVA_Input_Table[[2]])
My_ANOVA_Input_Table[[1]] <- as.factor(My_ANOVA_Input_Table[[1]])
model2 <- aov(My_ANOVA_Input_Table[[2]] ~ My_ANOVA_Input_Table[[1]],
data= My_ANOVA_Input_Table)
summary(model2)
## Df Sum Sq Mean Sq F value Pr(>F)
## My_ANOVA_Input_Table[[1]] 4 103191489 25797872 6.191 0.00379 **
## Residuals 15 62505657 4167044
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a p-value of 0.00379 and a significance level of 0.05, we reject the null hypothesis. There is evidence that at least one of the five materials has a different mean failure time.
Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. What information is conveyed by these plots?
autoplot(model2)
The Residuals vs Fitted plot shows that the variance is not constant across the fitted values, which violates a requirement for a valid ANOVA. The normal probability plot (titled 'Normal Q-Q') shows that the residuals are also not normal, another violation of the ANOVA assumptions.
Based on your answer to part (b) conduct another analysis of the failure time data and draw appropriate conclusions.
In order to get a valid ANOVA, we transform the data using a Box-Cox transformation. The need for this can be confirmed from the Box-Cox plot: the confidence interval for lambda does not include the value of 1.
library(MASS)  # provides boxcox()
boxcox(My_Response ~ My_Factor)
After transforming the data with a lambda of 0.0099 and re-running the Box-Cox plot, the confidence interval for lambda now covers the value of 1, and re-running the ANOVA gives acceptable residuals.
# Transform the response using the lambda estimated from the Box-Cox plot
lambda <- 0.0099
My_Response <- My_Response^(lambda)
# Check how the transformation did
boxcox(My_Response ~ My_Factor)
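Since the estimated lambda (0.0099) is essentially zero, this power transform is practically equivalent to taking logs, which is the more common choice for failure-time data. A minimal sketch, using a hypothetical copy of the original, untransformed failure times called My_Response_raw:
# A sketch: lambda near 0 in the Box-Cox family corresponds to a log transform.
# My_Response_raw is a hypothetical copy of the failure times taken before the
# power transformation above.
log_model <- aov(log(My_Response_raw) ~ My_Factor)
summary(log_model)  # conclusions should mirror the lambda = 0.0099 analysis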
We can now re-run the ANOVA with the same hypotheses as above.
#we re-run the ANOVA with transformed data
My_ANOVA_Input_Table <- as.data.frame(cbind(My_Factor, My_Response))
names(My_ANOVA_Input_Table) <- c(My_Factor_Name,My_Response_Name)
My_ANOVA_Input_Table[[2]] <- as.numeric(My_ANOVA_Input_Table[[2]])
My_ANOVA_Input_Table[[1]] <- as.factor(My_ANOVA_Input_Table[[1]])
model2_b <- aov(My_ANOVA_Input_Table[[2]] ~ My_ANOVA_Input_Table[[1]], data= My_ANOVA_Input_Table)
summary(model2_b)
## Df Sum Sq Mean Sq F value Pr(>F)
## My_ANOVA_Input_Table[[1]] 4 0.017773 0.004443 37.67 1.17e-07 ***
## Residuals 15 0.001769 0.000118
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
autoplot(model2_b)
Looking at the residuals now, they show reasonably constant variance and are approximately normally distributed. With a p-value of 1.17e-07 and a significance level of 0.05, we reject the null hypothesis. There is evidence that at least one of the five materials has a different mean failure time.
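As a follow-up, Tukey's HSD on the transformed scale can show which materials actually differ. A minimal sketch, refitting with readable column names and assuming My_ANOVA_Input_Table currently holds the transformed failure times:
# A sketch: refit on the transformed scale, then compare materials pairwise
material_data <- setNames(My_ANOVA_Input_Table, c("Material", "FailureTime"))
tukey_material <- aov(FailureTime ~ Material, data = material_data)
TukeyHSD(tukey_material)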
A semiconductor manufacturer has developed three different methods for reducing particle counts on wafers. All three methods are tested on five different wafers and the after treatment particle count obtained. The data are shown below: [See table of values in book.]
(a) Do all methods have the same effect on mean particle count?
(b) Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. Are there potential concerns about the validity of the assumptions?
(c) Based on your answer to part (b) conduct another analysis of the failure time data and draw appropriate conclusions.
In order to understand if the methods have an effect on mean particle count, we will run an ANOVA test, with the following hypotheses:
Null: H0: μ1 = μ2 = μ3
Alternate: H1: μi ≠ μj for at least one pair (i,j)
My_ANOVA_Input_Table <- as.data.frame(cbind(My_Factor, My_Response))
names(My_ANOVA_Input_Table) <- c(My_Factor_Name,My_Response_Name)
My_ANOVA_Input_Table[[2]] <- as.numeric(My_ANOVA_Input_Table[[2]])
My_ANOVA_Input_Table[[1]] <- as.factor(My_ANOVA_Input_Table[[1]])
model3 <- aov(My_ANOVA_Input_Table[[2]] ~ My_ANOVA_Input_Table[[1]], data= My_ANOVA_Input_Table)
summary(model3)
## Df Sum Sq Mean Sq F value Pr(>F)
## My_ANOVA_Input_Table[[1]] 2 8964 4482 7.914 0.00643 **
## Residuals 12 6796 566
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a p-value of 0.00643 and a significance level of 0.05, we reject the null hypothesis. There is evidence that at least one of the three methods has a different mean particle count.
Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. Are there potential concerns about the validity of the assumptions?
autoplot(model3)
The Residuals vs Fitted plot shows that the variance is not constant across the fitted values, which violates a requirement for a valid ANOVA. The normal probability plot (titled 'Normal Q-Q') shows that the residuals are approximately normal, so normality is not a concern.
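The variance concern can also be checked formally; a minimal sketch, assuming My_Factor and My_Response hold the method labels and the untransformed particle counts. Bartlett's test is a reasonable choice here because the Q-Q plot suggests the residuals are close to normal.
# H0: the particle-count variance is the same for all three methods (a sketch)
bartlett.test(My_Response ~ My_Factor)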
Based on your answer to part (b), conduct another analysis of the particle count data and draw appropriate conclusions.
As in the previous problem, in order to get a valid ANOVA we transform the data using a Box-Cox transformation. The Box-Cox plot confirms the need for this: the confidence interval for lambda does not include the value of 1.
boxcox(My_Response~My_Factor)
After transforming the data with a lambda of 0.5 and re-running the Box-Cox plot, the confidence interval for lambda now covers the value of 1, and re-running the ANOVA gives acceptable residuals.
# Transform the response using the lambda estimated from the Box-Cox plot
lambda <- 0.5
My_Response <- My_Response^(lambda)
# Check how the transformation did
boxcox(My_Response ~ My_Factor)
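A lambda of 0.5 is simply the square-root transformation, a standard variance-stabilizing choice for count data such as particle counts. A minimal sketch of the equivalent analysis, using a hypothetical copy of the original, untransformed counts called My_Response_raw:
# A sketch: lambda = 0.5 in the Box-Cox family is the square-root transform.
# My_Response_raw is a hypothetical copy of the particle counts taken before
# the power transformation above.
sqrt_model <- aov(sqrt(My_Response_raw) ~ My_Factor)
summary(sqrt_model)  # should match the lambda = 0.5 analysis above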
We can now re-run the ANOVA with the same hypotheses as above.
#we re-run the ANOVA with transformed data
My_ANOVA_Input_Table <- as.data.frame(cbind(My_Factor, My_Response))
names(My_ANOVA_Input_Table) <- c(My_Factor_Name,My_Response_Name)
My_ANOVA_Input_Table[[2]] <- as.numeric(My_ANOVA_Input_Table[[2]])
My_ANOVA_Input_Table[[1]] <- as.factor(My_ANOVA_Input_Table[[1]])
model3_b <- aov(My_ANOVA_Input_Table[[2]] ~ My_ANOVA_Input_Table[[1]], data= My_ANOVA_Input_Table)
summary(model3_b)
## Df Sum Sq Mean Sq F value Pr(>F)
## My_ANOVA_Input_Table[[1]] 2 63.90 31.95 9.84 0.00295 **
## Residuals 12 38.96 3.25
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
autoplot(model3_b)
Looking at the residuals now, they show reasonably constant variance and remain approximately normally distributed. With a p-value of 0.00295 and a significance level of 0.05, we reject the null hypothesis. There is evidence that at least one of the three methods has a different mean particle count.
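If one also wanted to identify the best method for reducing particle counts, Tukey's HSD on the square-root scale would show which methods are distinguishable. A minimal sketch, refitting with readable column names and assuming My_ANOVA_Input_Table currently holds the transformed counts:
# A sketch: compare the three methods pairwise on the square-root scale
method_data <- setNames(My_ANOVA_Input_Table, c("Method", "SqrtCount"))
tukey_method <- aov(SqrtCount ~ Method, data = method_data)
TukeyHSD(tukey_method)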
Use the Kruskal-Wallis test for the experiment in Problem 3.23. Compare the conclusions obtained with those from the usual analysis of variance.
The original results from problem 3.23 were as follows:
Null: H0: μ1 = μ2 = μ3 = μ4
Alternate: H1: μi ≠ μj for at least one pair (i,j)
My_ANOVA_Input_Table <- as.data.frame(cbind(My_Factor, My_Response))
names(My_ANOVA_Input_Table) <- c(My_Factor_Name,My_Response_Name)
My_ANOVA_Input_Table[[2]] <- as.numeric(My_ANOVA_Input_Table[[2]])
My_ANOVA_Input_Table[[1]] <- as.factor(My_ANOVA_Input_Table[[1]])
model <- aov(My_ANOVA_Input_Table[[2]] ~ My_ANOVA_Input_Table[[1]],
data= My_ANOVA_Input_Table)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## My_ANOVA_Input_Table[[1]] 3 30.17 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a p-value of 0.0525 and a significance level of 0.05, we narrowly fail to reject the null hypothesis.
Now running the data with the Kruskal-Wallis test, we get the following:
kruskal.test(My_Response~My_Factor,data=My_ANOVA_Input_Table)
##
## Kruskal-Wallis rank sum test
##
## data: My_Response by My_Factor
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015
The Kruskal-Wallis p-value of 0.1015 is even further from significance than the original ANOVA p-value of 0.0525, so the rank-based test is also unable to detect a difference among the fluids. That stands to reason: because the non-parametric test uses only the ranks of the observations, it generally has somewhat less power than the parametric ANOVA when the ANOVA assumptions hold. Both analyses therefore lead to the same conclusion of failing to reject the null hypothesis at α = 0.05.
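A related way to see the connection between the two tests is the rank transformation: replacing the observations by their ranks and running the usual ANOVA generally agrees closely with the Kruskal-Wallis test. A minimal sketch, assuming My_Response and My_Factor still hold the Problem 3.23 life values and fluid types:
# A sketch of the rank-transformation idea behind Kruskal-Wallis:
# run the ordinary ANOVA on the ranks of the observations
rank_model <- aov(rank(My_Response) ~ My_Factor)
summary(rank_model)  # the F test on ranks typically agrees with kruskal.test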