3.23

The effective life of insulating fluids at an accelerated load of 35 kV is being studied. Test data have been obtained for four types of fluids. The results from a completely randomized experiment were as follows:

Fluid Type | Life (in h) at 35 kV Load

# Read in the data table for 3.23
dat323<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/323Table.csv")
names(dat323) <- sub('X', '', names(dat323))
dat323

##      1    2    3    4
## 1 17.6 16.9 21.4 19.3
## 2 18.9 15.3 23.6 21.1
## 3 16.3 18.6 19.4 16.9
## 4 17.4 17.1 18.5 17.5
## 5 20.1 19.5 20.5 18.3
## 6 21.6 20.3 22.3 19.8

(a) Is there any indication that the fluids differ? Use α = 0.05.

Assuming constant variance and normally distributed errors within each fluid population, the one-way ANOVA gives P_F = 0.0525 > 0.05 = α. We therefore fail to reject the null hypothesis of equal mean life: at the 5% level there is no significant evidence that the fluids differ, although the result is borderline.

# One way ANOVA test 
aov323 <- aov(values ~ ind,data=(stack(dat323)))
summary(aov323)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## ind          3  30.17   10.05   3.047 0.0525 .
## Residuals   20  65.99    3.30                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
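
The F statistic in the table can be reproduced by hand from the sums of squares. The following is a minimal sketch, assuming the data are re-entered in long format from the table above; the variable names (`life`, `fluid`, `ss_trt`, `ss_err`) are illustrative and not part of the original analysis.

```r
# Fluid life data copied from the table above, in long format
life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,   # fluid 1
          16.9, 15.3, 18.6, 17.1, 19.5, 20.3,   # fluid 2
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3,   # fluid 3
          19.3, 21.1, 16.9, 17.5, 18.3, 19.8)   # fluid 4
fluid <- factor(rep(1:4, each = 6))

a <- nlevels(fluid)   # number of treatments
N <- length(life)     # total number of observations

group_means <- tapply(life, fluid, mean)
group_n     <- tapply(life, fluid, length)

# Between-treatment and within-treatment (error) sums of squares
ss_trt <- sum(group_n * (group_means - mean(life))^2)
ss_err <- sum((life - ave(life, fluid))^2)

# F ratio of mean squares and its upper-tail p-value
f_stat <- (ss_trt / (a - 1)) / (ss_err / (N - a))
p_val  <- pf(f_stat, df1 = a - 1, df2 = N - a, lower.tail = FALSE)

c(SS_trt = ss_trt, SS_err = ss_err, F = f_stat, p = p_val)
```

The values should agree with the `summary(aov323)` table up to rounding.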

(b) Which fluid would you select, given that the objective is long life?

We do not have statistical evidence that the means differ at α = 0.05, and the sample ranges overlap. If a choice must be made, the boxplots suggest fluid 3, which has the largest sample mean, but this preference is not statistically supported.

# boxplots comparing the 4 insulating fluids' effective life
boxplot(values ~ ind, data = stack(dat323), col=c("steelblue","firebrick2", "forestgreen","darkorange"),xlab = "insulating fluid",ylab = "effective life",main="effective life of insulating fluids")
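
To put a number on the visual comparison, the sample means can be tabulated per fluid. This is a small sketch with the data re-entered by hand from the table above; `life`, `fluid`, and `best` are illustrative names. It only ranks the observed means and says nothing about significance.

```r
# Fluid life data copied from the table above, in long format
life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,   # fluid 1
          16.9, 15.3, 18.6, 17.1, 19.5, 20.3,   # fluid 2
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3,   # fluid 3
          19.3, 21.1, 16.9, 17.5, 18.3, 19.8)   # fluid 4
fluid <- factor(rep(1:4, each = 6))

# Sample mean life per fluid, sorted from longest to shortest
fluid_means <- tapply(life, fluid, mean)
sort(fluid_means, decreasing = TRUE)

# Label of the fluid with the largest sample mean
best <- names(which.max(fluid_means))
best
```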

(c) Analyze the residuals from this experiment. Are the basic analysis of variance assumptions satisfied?

The equal-variance assumption, the more important of the two, is supported by the residuals versus predicted plot: the spread of the residuals is similar across the four fluids.

# AOV residuals versus predicted
aov323<-aov(values ~ ind,data=(stack(dat323)))
plot(aov323, 1)

The normality assumption, to which ANOVA is less sensitive, is supported by the normal probability plot: the points fall close to the reference line.

# AOV residual normal probability plot 
aov323<-aov(values ~ ind,data=(stack(dat323)))
plot(aov323, 2)
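
The graphical checks can be complemented by formal tests. The sketch below is an addition to the original analysis, not part of it: it assumes Bartlett's test for equal variances and the Shapiro-Wilk test on the residuals are acceptable companions to the plots, and it re-enters the data by hand. With only six observations per fluid, both tests have limited power, so the plots remain the primary evidence.

```r
# Fluid life data copied from the table above, in long format
life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,   # fluid 1
          16.9, 15.3, 18.6, 17.1, 19.5, 20.3,   # fluid 2
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3,   # fluid 3
          19.3, 21.1, 16.9, 17.5, 18.3, 19.8)   # fluid 4
fluid <- factor(rep(1:4, each = 6))

fit <- aov(life ~ fluid)

# Bartlett's test: H0 is that all fluids share the same variance
bart <- bartlett.test(life ~ fluid)

# Shapiro-Wilk test: H0 is that the residuals are normally distributed
shap <- shapiro.test(residuals(fit))

c(bartlett_p = bart$p.value, shapiro_p = shap$p.value)
```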

3.28

An experiment was performed to investigate the effectiveness of five insulating materials. Four samples of each material were tested at an elevated voltage level to accelerate the time to failure. The failure times (in minutes) are shown below:

Material | Failure Time (minutes)

# Read in the data table for 3.28
dat328<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/328Table.csv")
names(dat328) <- sub('X', '', names(dat328))
dat328

##     1  2    3     4  5
## 1 110  1  880   495  7
## 2 157  2 1256  7040  5
## 3 194  4 5276  5307 29
## 4 178 18 4355 10050  2

(a) Do all five materials have the same effect on mean failure time?

No. P_F = 0.00379 < 0.05 = α, so we reject the null hypothesis, assuming normality and equal variance.

# One way ANOVA test 
aov328 <- aov(values ~ ind,data=(stack(dat328)))
summary(aov328)

##             Df    Sum Sq  Mean Sq F value  Pr(>F)   
## ind          4 103191489 25797872   6.191 0.00379 **
## Residuals   15  62505657  4167044                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(b) Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. What information is conveyed by these plots?

The equal-variance assumption, the more important of the two, is not supported by the residuals versus predicted plot: the spread of the residuals differs greatly between materials.

# AOV residuals versus predicted
aov328<-aov(values ~ ind,data=(stack(dat328)))
plot(aov328, 1)

The normality assumption, to which ANOVA is less sensitive, is also not supported: the normal probability plot is clearly non-linear.

# AOV residual normal probability plot 
aov328<-aov(values ~ ind,data=(stack(dat328)))
plot(aov328, 2)
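
The unequal spread seen in the residual plot can also be summarized numerically by comparing each material's sample mean and standard deviation. This is a minimal sketch with the data re-entered from the table above; `time`, `material`, and `spread` are illustrative names.

```r
# Failure times (minutes) copied from the table above, in long format
time <- c(110, 157, 194, 178,        # material 1
          1, 2, 4, 18,               # material 2
          880, 1256, 5276, 4355,     # material 3
          495, 7040, 5307, 10050,    # material 4
          7, 5, 29, 2)               # material 5
material <- factor(rep(1:5, each = 4))

# Per-material mean and standard deviation
spread <- data.frame(
  mean = tapply(time, material, mean),
  sd   = tapply(time, material, sd)
)
spread
```

A standard deviation that grows with the mean is the usual signal that a variance-stabilizing transformation is worth trying.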

(c) Based on your answer to part (b) conduct another analysis of the failure time data and draw appropriate conclusions.

We need a transformation to stabilize the variance, and we use a non-parametric test in place of ANOVA because normality is also in doubt. The Box-Cox analysis below indicates a near-zero lambda, so the appropriate transformation is logarithmic, which brings the group variances much closer to equal.

# Boxcox power estimate
library(MASS)
boxcox(stack(dat328)[,1]~stack(dat328)[,2], plotit=TRUE )
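
Rather than reading lambda off the plot, the maximizing value can be extracted from the object that `boxcox()` returns. This is a sketch under assumptions: the data are re-entered by hand, and the lambda grid (`seq(-2, 2, by = 0.01)`) is an arbitrary choice that limits the resolution of the estimate.

```r
library(MASS)

# Failure times (minutes) copied from the table above, in long format
time <- c(110, 157, 194, 178,        # material 1
          1, 2, 4, 18,               # material 2
          880, 1256, 5276, 4355,     # material 3
          495, 7040, 5307, 10050,    # material 4
          7, 5, 29, 2)               # material 5
material <- factor(rep(1:5, each = 4))

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(lm(time ~ material), lambda = seq(-2, 2, by = 0.01),
             plotit = FALSE)

# Grid value of lambda with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat
```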

# Logarithmic transformation (log() is applied element-wise to every column)
dat328_log <- log(dat328)
dat328_log

##          1         2        3        4         5
## 1 4.700480 0.0000000 6.779922 6.204558 1.9459101
## 2 5.056246 0.6931472 7.135687 8.859363 1.6094379
## 3 5.267858 1.3862944 8.570924 8.576782 3.3672958
## 4 5.181784 2.8903718 8.379080 9.215328 0.6931472

# boxplots comparing the logarithmically transformed data from 5 insulating materials
boxplot(values ~ ind,data=(stack(dat328_log)), col=c("steelblue","firebrick2", "forestgreen", "darkorange", "plum"), xlab = "insulating material", ylab = "log failure time (log minutes)", main="log failure time by insulating material")

Using the Kruskal-Wallis test, a non-parametric test that does not assume normality, the p-value = 0.002046 < 0.05 = α, so we reject the null hypothesis: at least one material has a different failure-time distribution from the others.

# Kruskal-Wallis Test
kruskal.test(values ~ ind,data=(stack(dat328_log)))

## 
##  Kruskal-Wallis rank sum test
## 
## data:  values by ind
## Kruskal-Wallis chi-squared = 16.873, df = 4, p-value = 0.002046
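
Because the Kruskal-Wallis test uses only the ranks of the observations, a monotone transformation such as the logarithm does not change its result. The sketch below illustrates this with the data re-entered by hand; `kw_raw` and `kw_log` are illustrative names.

```r
# Failure times (minutes) copied from the table above, in long format
time <- c(110, 157, 194, 178,        # material 1
          1, 2, 4, 18,               # material 2
          880, 1256, 5276, 4355,     # material 3
          495, 7040, 5307, 10050,    # material 4
          7, 5, 29, 2)               # material 5
material <- factor(rep(1:5, each = 4))

# Same test on the raw and on the log-transformed response
kw_raw <- kruskal.test(time ~ material)
kw_log <- kruskal.test(log(time) ~ material)

# The rank-based statistic and p-value are identical in both cases
c(raw = unname(kw_raw$statistic), log = unname(kw_log$statistic))
c(raw = kw_raw$p.value, log = kw_log$p.value)
```

The transformation therefore matters for the ANOVA-based analyses, not for the Kruskal-Wallis conclusion.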

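Tukey's test, applied here to the ANOVA fit on the untransformed data (aov328), gives some idea of which pairs differ significantly: 4-1, 4-2, and 5-4. Because that fit violates the equal-variance assumption, these comparisons should be read with caution.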

# Use Tukey's test to determine Honest Significant Differences between the treatment means, 95% confidence
library(car)

## Loading required package: carData

TukeyHSD(aov328)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = values ~ ind, data = (stack(dat328)))
## 
## $ind
##         diff        lwr       upr     p adj
## 2-1  -153.50  -4610.737  4303.737 0.9999674
## 3-1  2782.00  -1675.237  7239.237 0.3454736
## 4-1  5563.25   1106.013 10020.487 0.0115524
## 5-1  -149.00  -4606.237  4308.237 0.9999710
## 3-2  2935.50  -1521.737  7392.737 0.2974817
## 4-2  5716.75   1259.513 10173.987 0.0093981
## 5-2     4.50  -4452.737  4461.737 1.0000000
## 4-3  2781.25  -1675.987  7238.487 0.3457196
## 5-3 -2931.00  -7388.237  1526.237 0.2988208
## 5-4 -5712.25 -10169.487 -1255.013 0.0094552

plot(TukeyHSD(aov328))
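
Since the comparisons above come from the fit to the untransformed data, one possible follow-up is to repeat Tukey's test on the log scale, where the variances are closer to equal. This is a sketch of that alternative, not a result reported in the original analysis; `fit_log` and `tukey_log` are illustrative names and the data are re-entered by hand.

```r
# Failure times (minutes) copied from the table above, in long format
time <- c(110, 157, 194, 178,        # material 1
          1, 2, 4, 18,               # material 2
          880, 1256, 5276, 4355,     # material 3
          495, 7040, 5307, 10050,    # material 4
          7, 5, 29, 2)               # material 5
material <- factor(rep(1:5, each = 4))

# One-way ANOVA on the log-transformed response
fit_log <- aov(log(time) ~ material)

# Tukey HSD comparisons of the material means on the log scale
tukey_log <- TukeyHSD(fit_log, conf.level = 0.95)
tukey_log$material
```

The differences are then differences in mean log failure time, so they describe ratios of failure times rather than differences in minutes.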

3.29

A semiconductor manufacturer has developed three different methods for reducing particle counts on wafers. All three methods are tested on five different wafers and the after treatment particle count obtained. The data are shown below:

Method | Count

# Read in the data table for 3.29
dat329<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/329Table.csv")
names(dat329) <- sub('X', '', names(dat329))
dat329

##    1  2   3
## 1 31 62  53
## 2 10 40  27
## 3 21 24 120
## 4  4 30  97
## 5  1 35  68

(a) Do all methods have the same effect on mean particle count?

No. P_F = 0.00643 < 0.05 = α, so we reject the null hypothesis, assuming normality and equal variance.

# One way ANOVA test 
aov329 <- aov(values ~ ind,data=(stack(dat329)))
summary(aov329)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## ind          2   8964    4482   7.914 0.00643 **
## Residuals   12   6796     566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(b) Plot the residuals versus the predicted response. Construct a normal probability plot of the residuals. Are there potential concerns about the validity of the assumptions?

Yes

The equal-variance assumption, the more important of the two, is not supported by the residuals versus predicted plot: the spread of the residuals differs greatly between methods.

# AOV residuals versus predicted
aov329<-aov(values ~ ind,data=(stack(dat329)))
plot(aov329, 1)

The normality assumption, to which ANOVA is less sensitive, is supported by the normal probability plot: the points fall near the reference line.

# AOV residual normal probability plot 
aov329<-aov(values ~ ind,data=(stack(dat329)))
plot(aov329, 2)

(c) Based on your answer to part (b) conduct another analysis of the particle count data and draw appropriate conclusions.

To stabilize the variance, we apply a power transformation with λ ≈ 0.4275 taken from the Box-Cox analysis, which brings the group variances closer to equal. One-way ANOVA on the transformed data gives P_F = 0.0029 < 0.05 = α, so we reject the null hypothesis, now with the equal-variance assumption better satisfied: the methods do not all have the same effect on mean particle count.

# Boxcox estimate
library(MASS)
boxcox(stack(dat329)[,1]~stack(dat329)[,2], plotit=TRUE )

# power transformation of the data using λ ≈ 0.4275 (applied element-wise to every column)
dat329_lambda <- dat329^0.4275
dat329_lambda

##          1        2        3
## 1 4.340674 5.837776 5.459187
## 2 2.676086 4.840393 4.091740
## 3 3.674927 3.890813 7.741943
## 4 1.808759 4.280252 7.068787
## 5 1.000000 4.571820 6.072920

# Verify that boxcox λ ≈ 1 after transformation = SUCCESS
boxcox(stack(dat329_lambda)[,1]~stack(dat329_lambda)[,2], plotit=TRUE )

# boxplots comparing 3 methods to reduce particles on wafers after a power transformation
boxplot(values ~ ind, data = stack(dat329_lambda), col=c("steelblue","firebrick2", "forestgreen"),xlab = "method",ylab = "particle count ^ 0.4275",main="transformed particle counts for different reduction methods")

# One-way ANOVA
aov329_lambda<-aov(values ~ ind,data=(stack(dat329_lambda)))
summary(aov329_lambda)

##             Df Sum Sq Mean Sq F value Pr(>F)   
## ind          2  28.96  14.479   9.891 0.0029 **
## Residuals   12  17.57   1.464                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
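
One simple way to check that the transformation helped is to compare the ratio of the largest to the smallest group standard deviation before and after. This is a rough sketch with the counts re-entered by hand; the helper `sd_ratio` is hypothetical and introduced only for illustration.

```r
# Particle counts copied from the table above, in long format
count <- c(31, 10, 21, 4, 1,       # method 1
           62, 40, 24, 30, 35,     # method 2
           53, 27, 120, 97, 68)    # method 3
method <- factor(rep(1:3, each = 5))

# Hypothetical helper: largest group sd divided by smallest group sd
sd_ratio <- function(y, g) {
  s <- tapply(y, g, sd)
  max(s) / min(s)
}

ratio_raw <- sd_ratio(count, method)            # original scale
ratio_pow <- sd_ratio(count^0.4275, method)     # after the power transformation

c(raw = ratio_raw, transformed = ratio_pow)
```

A ratio closer to 1 on the transformed scale indicates more homogeneous variances, though with five wafers per method the estimate is noisy.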

3.51

Use the Kruskal–Wallis test for the experiment in Problem 3.23. Compare the conclusions obtained with those from the usual analysis of variance.

p-value = 0.1015 > 0.05 = α, so we cannot reject the null hypothesis; the test does not indicate that the fluids differ. This agrees with the ANOVA, which assumes normality and equal variance in the samples and gave a smaller but still non-significant P_F = 0.0525: we cannot conclude that there is any difference in the effective life of these insulating fluids.

# Kruskal-Wallis Test
kruskal.test(values ~ ind,data=(stack(dat323)))

## 
##  Kruskal-Wallis rank sum test
## 
## data:  values by ind
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015
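
The Kruskal-Wallis statistic can be reproduced from the rank sums, H = 12 / (N(N+1)) · Σ R_i² / n_i − 3(N+1), divided by a correction factor when ties are present. The sketch below works through this with the data re-entered by hand; all variable names are illustrative.

```r
# Fluid life data copied from the table in Problem 3.23, in long format
life <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,   # fluid 1
          16.9, 15.3, 18.6, 17.1, 19.5, 20.3,   # fluid 2
          21.4, 23.6, 19.4, 18.5, 20.5, 22.3,   # fluid 3
          19.3, 21.1, 16.9, 17.5, 18.3, 19.8)   # fluid 4
fluid <- factor(rep(1:4, each = 6))

N <- length(life)
r <- rank(life)                      # tied values receive their average rank
rank_sum <- tapply(r, fluid, sum)    # R_i
n_i      <- tapply(r, fluid, length)

# Uncorrected statistic
H <- 12 / (N * (N + 1)) * sum(rank_sum^2 / n_i) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
ties   <- table(life)
H_corr <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

# Chi-squared approximation with a - 1 degrees of freedom
p_val <- pchisq(H_corr, df = nlevels(fluid) - 1, lower.tail = FALSE)
c(H = H_corr, p = p_val)
```

The corrected statistic should match the `kruskal.test` output above.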

3.52

Use the Kruskal–Wallis test for the experiment in Problem 3.23. Are the results comparable to those found by the usual analysis of variance?

Yes. The Kruskal-Wallis test does not assume normality, while ANOVA does. If the residuals had departed from normality, the Kruskal-Wallis test would be the more trustworthy of the two, but that is not the case here.

Complete Code

Here we display the complete R code used in this analysis.

# Read in the data table for 3.23
dat323<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/323Table.csv")
names(dat323) <- sub('X', '', names(dat323))
dat323

# One way ANOVA test 
aov323 <- aov(values ~ ind,data=(stack(dat323)))
summary(aov323)

# boxplots comparing the 4 insulating fluids' effective life
boxplot(values ~ ind, data = stack(dat323), col=c("steelblue","firebrick2", "forestgreen","darkorange"),xlab = "insulating fluid",ylab = "effective life",main="effective life of insulating fluids")

# AOV residuals versus predicted
aov323<-aov(values ~ ind,data=(stack(dat323)))
plot(aov323, 1)

# AOV residual normal probability plot 
aov323<-aov(values ~ ind,data=(stack(dat323)))
plot(aov323, 2)

# Read in the data table for 3.28
dat328<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/328Table.csv")
names(dat328) <- sub('X', '', names(dat328))
dat328

# One way ANOVA test 
aov328 <- aov(values ~ ind,data=(stack(dat328)))
summary(aov328)

# AOV residuals versus predicted
aov328<-aov(values ~ ind,data=(stack(dat328)))
plot(aov328, 1)

# AOV residual normal probability plot 
aov328<-aov(values ~ ind,data=(stack(dat328)))
plot(aov328, 2)

# Boxcox power estimate
library(MASS)
boxcox(stack(dat328)[,1]~stack(dat328)[,2], plotit=TRUE )

# Logarithmic transformation (log() is applied element-wise to every column)
dat328_log <- log(dat328)
dat328_log

# boxplots comparing the logarithmically transformed data from 5 insulating materials
boxplot(values ~ ind,data=(stack(dat328_log)), col=c("steelblue","firebrick2", "forestgreen", "darkorange", "plum"), xlab = "insulating material", ylab = "log failure time (log minutes)", main="log failure time by insulating material")

# Kruskal-Wallis Test
kruskal.test(values ~ ind,data=(stack(dat328_log)))

# Use Tukey's test to determine Honest Significant Differences between the treatment means, 95% confidence
library(car)
TukeyHSD(aov328)
plot(TukeyHSD(aov328))

# Read in the data table for 3.29
dat329<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/329Table.csv")
names(dat329) <- sub('X', '', names(dat329))
dat329

# One way ANOVA test 
aov329 <- aov(values ~ ind,data=(stack(dat329)))
summary(aov329)

# AOV residuals versus predicted
aov329<-aov(values ~ ind,data=(stack(dat329)))
plot(aov329, 1)

# AOV residual normal probability plot 
aov329<-aov(values ~ ind,data=(stack(dat329)))
plot(aov329, 2)

# Boxcox estimate
library(MASS)
boxcox(stack(dat329)[,1]~stack(dat329)[,2], plotit=TRUE )

# power transformation of the data using λ ≈ 0.4275 (applied element-wise to every column)
dat329_lambda <- dat329^0.4275
dat329_lambda

# Verify that boxcox λ ≈ 1 after transformation = SUCCESS
boxcox(stack(dat329_lambda)[,1]~stack(dat329_lambda)[,2], plotit=TRUE )

# boxplots comparing 3 methods to reduce particles on wafers after a power transformation
boxplot(values ~ ind, data = stack(dat329_lambda), col=c("steelblue","firebrick2", "forestgreen"),xlab = "method",ylab = "particle count ^ 0.4275",main="transformed particle counts for different reduction methods")

# One-way ANOVA
aov329_lambda<-aov(values ~ ind,data=(stack(dat329_lambda)))
summary(aov329_lambda)

# Kruskal-Wallis Test
kruskal.test(values ~ ind,data=(stack(dat323)))

Homework, week 6 - IE 5342

Forest Kingfisher

2021-10-09