Suppose we wish to design a new experiment that tests for a significant difference between the mean effective life of four insulating fluids at an accelerated load of 35kV. The variance of fluid life is estimated to be 3.5 hrs based on preliminary data. We would like this test to have a type 1 error probability of 0.05, and for this test to have an 80%probability of rejecting the assumption that the mean life of all the fluids are the same if there is a difference greater than 2 hours between the mean lives of the fluids, with a min of 18 hrs and max of 20 hrs. How many samples of each fluid will need to be collected to achieve this design criterion in the case of Min, Intermediate, and Max variability?
The solution in commented R is given here
#### Begin Problem 1 #######
#number of populations, k
pop_k=4
#Test for even or ddd from the maximum case
k_is_even <- function(x) {
pop_k %% 2 == 0
}
k_is_odd <- function(x) {
pop_k %% 2 == 1
}
#run the even or odd modulo functions, key on the result being even
test_k_even = k_is_even(pop_k)
#calculate the difference in means, d
d=(20-18)
#Calculate the "f" values for minimum, intermediate, and maximum case.
#These equations are taken from video lectures
min_f = d*sqrt(1/(2*pop_k)) #minimum case
intermediate_f = (d/2)*sqrt((pop_k+1)/(3*(pop_k-1)))#intermediate case
#We test for even or odd to decide which equation to use for max
if (test_k_even) {
# code to run if k is EVEN
max_f = (d/2)
} else {
# code to run if k is ODD
max_f = (d*sqrt(pop_k^2 - 1)/(2*pop_k))
}
For the minimum case, we would have the following:
#minimum run
pwr::pwr.anova.test(k=pop_k, n=NULL, f=min_f, sig.level=0.05, power=0.80)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 6.515965
## f = 0.7071068
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
So the required number of samples would be 7 for the minimum case.
For the intermediate case, we would have the following:
#intermediate run
pwr::pwr.anova.test(k=pop_k, n=NULL, f=intermediate_f, sig.level=0.05, power=0.80)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 5.979231
## f = 0.745356
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
So the required number of samples would be 6 for the intermediate case.
For the maximum case, we would have the following, k is even
#maximum run
pwr::pwr.anova.test(k=pop_k, n=NULL, f=max_f, sig.level=0.05, power=0.80)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 3.856403
## f = 1
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
Our maximum case would require 4 samples from this result.
The experimenter decided to collect six observations for each type of fluid, resulting in the following test data.
response | fluid.type |
---|---|
17.6 | 1 |
18.9 | 1 |
16.3 | 1 |
17.4 | 1 |
20.1 | 1 |
21.6 | 1 |
16.9 | 2 |
15.3 | 2 |
18.6 | 2 |
17.1 | 2 |
19.5 | 2 |
20.3 | 2 |
21.4 | 3 |
23.6 | 3 |
19.4 | 3 |
18.5 | 3 |
20.5 | 3 |
22.3 | 3 |
19.3 | 4 |
21.1 | 4 |
16.9 | 4 |
17.5 | 4 |
18.3 | 4 |
19.8 | 4 |
#do our One-Way test with AOV and get a summary
model<-aov(response~fluid.type,data=df) #write aov to an object, easy to call over and over
summary(model) #summary of an aov object gives back anova
## Df Sum Sq Mean Sq F value Pr(>F)
## fluid.type 3 30.16 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Show our NPP for all data to confirm normalcy
#plot(model,2) #NPP
#I will use the oneway.test, since the summary has more digits
fluid_compare<-oneway.test(response~fluid.type, var.equal = TRUE)
fluid_compare
##
## One-way analysis of means
##
## data: response and fluid.type
## F = 3.0473, num df = 3, denom df = 20, p-value = 0.05246
We can conclude from the p-value of 0.05246 that we would reject the null hypothesis and conclude that the fluids do not have the same mean performance
For ANOVA, we need to have the following: independent samples,an approximately normal population, all populations must have a similar variance.
To check for normal behaviour we examine the NPP for all the samples combined and each sample individualy in the following NPP plots.
#Show our NPP for all data to confirm normalcy
plot(model,2) #NPP
#Let's do a NPP for each fluid type to look out how normal our data is!
qqnorm(fluid1, xlab = "Type 1 samples ", main = "Normal Probability Type 1")
qqline(fluid1, col = "darkgreen", lwd = 2)
#fluid2 NPP
qqnorm(fluid2, xlab = "Type 2 Samples ", main = "Normal Probability Type 2")
qqline(fluid2, col = "goldenrod", lwd = 2)
#fluid3 NPP
qqnorm(fluid3, xlab = "Type 3 Samples ", main = "Normal Probability Type 3")
qqline(fluid3, col = "steelblue4", lwd = 2)
#Fluid4 NPP
qqnorm(fluid4, xlab = "Type 4 Samples ", main = "Normal Probability Type 4")
qqline(fluid4, col = "red3", lwd = 2)
It is evident from all of these plots that the data is normal. The data has a linear appearance.
In order to check for the variance requirement, we perform a box plot.
#Perform a box plot to examine variance and locality of means
boxplot(response~fluid.type, col=c("darkgreen", "goldenrod", "steelblue4", "red3")) # continous variable ~ factor variable
By examination, we can see that all the data has roughly the same variance. So this requirement is satisfied. It is also evident that the samples are independent of one another.
#Since we reject Ho, let's see which sample pairs differ
tukey_result<-TukeyHSD(model, conf.level = 0.90)
plot(tukey_result)
Using the Tukey’s test to compare samples to one another can visually see how sampels compare on a pairwise basis. Any pair from the confidence interval graph that includes “0” in the interval do not significantly differ from one another. So at the 90% confidence level pairs, 2-1, 3-1, 4-1, 4-2, and 4-3 do NOT significantly differ from each other. However the pair 3-2 does significantly differ from each other.
Here is the source code for this assignment:
knitr::opts_chunk$set(echo = TRUE,warning=FALSE, message=FALSE)
#Derrell Dunn
#HW 5342 HW9
#CRD/One-Way ANOVA and Multiple Comparisons in R
#Texas Tech University
#### Begin Problem 1 #######
#number of populations, k
pop_k=4
#Test for even or odd from the maximum case
k_is_even <- function(x) {
pop_k %% 2 == 0
}
k_is_odd <- function(x) {
pop_k %% 2 == 1
}
#run the even or odd modulo functions, key on the result being even
test_k_even = k_is_even(pop_k)
#calculate the difference in means, d
d=(20-18)
#Calculate the "f" values for minimum, intermediate, and maximum case.
#These equations are taken from video lectures
min_f = d*sqrt(1/(2*pop_k)) #minimum case
intermediate_f = (d/2)*sqrt((pop_k+1)/(3*(pop_k-1)))#intermediate case
#We test for even or odd to decide which equation to use for max
if (test_k_even) {
# code to run if k is EVEN
max_f = (d/2)
} else {
# code to run if k is ODD
max_f = (d*sqrt(pop_k^2 - 1)/(2*pop_k))
}
#minimum run
pwr::pwr.anova.test(k=pop_k, n=NULL, f=min_f, sig.level=0.05, power=0.80)
#intermediate run
pwr::pwr.anova.test(k=pop_k, n=NULL, f=intermediate_f, sig.level=0.05, power=0.80)
#maximum run
pwr::pwr.anova.test(k=pop_k, n=NULL, f=max_f, sig.level=0.05, power=0.80)
#### End of Problem 1 ####
#### Begin Problem @ #####
#
#Setup up data frame for use
fluid.type<-c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
fluid.type<-as.factor(fluid.type)
#Input the responses for our data
fluid1<-c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
fluid2<-c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
fluid3<-c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
fluid4<-c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
response<-c(fluid1,fluid2,fluid3,fluid4) #can create vectors from vectors
#Create our full data frame with populated responses
df<-data.frame(response,fluid.type)
#Perform a box plot to examine variance and locality of means
boxplot(response~fluid.type, col=c("darkgreen", "goldenrod", "steelblue4", "red3")) # continous variable ~ factor varaible
#Let's do a NPP for each fluid type to look out how normal our data is!
qqnorm(fluid1, xlab = "Type 1 samples ", main = "Normal Probability Type 1")
qqline(fluid1, col = "darkgreen", lwd = 2)
#fluid2 NPP
qqnorm(fluid2, xlab = "Type 2 Samples ", main = "Normal Probability Type 2")
qqline(fluid2, col = "goldenrod", lwd = 2)
#fluid3 NPP
qqnorm(fluid3, xlab = "Type 3 Samples ", main = "Normal Probability Type 3")
qqline(fluid3, col = "steelblue4", lwd = 2)
#Fluid4 NPP
qqnorm(fluid4, xlab = "Type 4 Samples ", main = "Normal Probability Type 4")
qqline(fluid4, col = "red3", lwd = 2)
#do our One-Way test with AOV and get a summary
model<-aov(response~fluid.type,data=df) #write aov to an object, easy to call over and over
summary(model) #summary of an aov object gives back anova
#Show our NPP for all data to confirm normalcy
plot(model,2) #NPP
#I will use the oneway.test, since tghe summary has more digits
fluid_compare<-oneway.test(response~fluid.type, var.equal = TRUE)
fluid_compare
#Since we reject Ho, let's see which sample pairs differ
tukey_result<-TukeyHSD(model, conf.level = 0.90)
plot(tukey_result)