We wish to design a new experiment to test for a significant difference between the mean effective lives of four different fluids at an accelerated load of 35 kV. Based on preliminary data, the variance of fluid life is estimated to be 3.5 hr². We would like this test to have a Type I error probability of 0.05 and an 80% probability of rejecting the hypothesis that the mean lives of all the fluids are equal when the largest and smallest mean lives differ by 2 hours or more, taking 18 hours as the minimum mean and 20 hours as the maximum.
For all cases, pwr.anova.test from the R package pwr will be used. This function requires an effect size, which is calculated here using Cohen’s method. That method first requires the value of “d”, the range of the means divided by sigma, where:
\[\sigma = \sqrt{\text{variance}}\]
“d” is calculated as follows:
\[d=\frac{\mu_4 - \mu_1}{\sigma} = \frac{20 - 18}{\sqrt{3.5}} = \frac{2}{\sqrt{3.5}} \approx 1.07\]
Each case will also use “k”, where: \[k=\text{number of populations}=4\] Each case has its own equation for the effect, which is given in the section for that case.
#Global Settings
library(pwr)
library(tidyr)
library(car)
#calculate d=range/sqrt(variance) for use in the effect
d=2/sqrt(3.5)
#Set the population count k
k=4
For the minimum variability case, one mean sits at each end of the range of 18 to 20 and the remaining means sit at its midpoint. Therefore the means are (18, 19, 19, 20).
To calculate the effect for this case we use the following equation:
\[\text{effect} = d\sqrt{\frac{1}{2k}}\]
Let’s look at the number of samples of each fluid needed to achieve this design criterion in the case of minimum variability:
#min variability scenario
library(pwr)
#The effect is given by
effect=d*sqrt(1/(2*k))
#Run the anova test for the given variables
pwr.anova.test(k=k,n=NULL,f=effect,sig.level = .05,power=.80)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 20.08368
## f = 0.3779645
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
For the minimum variability case, n = 20.08 must be rounded up to reach the target power, so we will need 21 samples of each fluid.
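Since the solved n is not a whole number, a whole group size has to be chosen. The sketch below is a base-R check rather than the pwr implementation itself: it assumes the noncentrality parameter of the F distribution is k·n·f², and the helper name anova_power is our own. It solves for n under the minimum variability effect and compares the power achieved with 20 samples and with the rounded-up group size.

```r
# Base-R sketch of the power of a balanced one-way ANOVA.
# Assumption: noncentrality parameter lambda = k * n * f^2.
anova_power <- function(n, k, f, sig.level = 0.05) {
  df1 <- k - 1          # numerator degrees of freedom
  df2 <- k * (n - 1)    # denominator degrees of freedom
  f_crit <- qf(sig.level, df1, df2, lower.tail = FALSE)
  pf(f_crit, df1, df2, ncp = k * n * f^2, lower.tail = FALSE)
}

k <- 4
d <- 2 / sqrt(3.5)
f_min <- d * sqrt(1 / (2 * k))   # minimum variability effect

# Solve for the (fractional) n that gives exactly 80% power
n_exact <- uniroot(function(n) anova_power(n, k, f_min) - 0.80,
                   interval = c(2, 200), tol = 1e-9)$root
n_per_group <- ceiling(n_exact)  # round up to a whole group size

power_at_20 <- anova_power(20, k, f_min)
power_at_n  <- anova_power(n_per_group, k, f_min)

print(c(n_exact = n_exact, n_per_group = n_per_group))
print(c(power_at_20 = power_at_20, power_at_n = power_at_n))
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the 80% target.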
For the intermediate variability case, the means are evenly spaced over the range of 18 to 20. Therefore the means are (18, 18.67, 19.33, 20).
To calculate the effect for this case we use the following equation:
\[\text{effect} = \frac{d}{2}\sqrt{\frac{k+1}{3(k-1)}}\]
Let’s look at the number of samples of each fluid needed to achieve this design criterion in the case of intermediate variability:
#Intermediate variability scenario
#The effect is given by
effect=(d/2)*sqrt((k+1)/(3*(k-1)))
#Run the anova test for the given variables
pwr.anova.test(k=k,n=NULL,f=effect,sig.level = .05,power=.80)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 18.17866
## f = 0.3984095
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
For the intermediate variability case, n = 18.18 must be rounded up to reach the target power, so we will need 19 samples of each fluid.
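The closed-form effect can be cross-checked against the hypothesized means themselves. The sketch below assumes equal group sizes, in which case Cohen's f is the population standard deviation of the group means divided by sigma; the helper name cohen_f is our own.

```r
# Cohen's f from hypothesized group means (equal group sizes assumed):
# f = (population SD of the means) / sigma
cohen_f <- function(means, sigma) {
  sqrt(mean((means - mean(means))^2)) / sigma
}

sigma <- sqrt(3.5)
k <- 4
d <- 2 / sigma

# Evenly spaced means over the range 18 to 20
means_intermediate <- seq(18, 20, length.out = k)

f_from_means  <- cohen_f(means_intermediate, sigma)
f_closed_form <- (d / 2) * sqrt((k + 1) / (3 * (k - 1)))

print(round(means_intermediate, 2))
print(c(f_from_means = f_from_means, f_closed_form = f_closed_form))
```

The two values agree, and both match the f reported by pwr.anova.test for this case.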
For the maximum variability case, the means sit at the ends of the range of 18 to 20, half at each end. Therefore the means are (18, 18, 20, 20).
The effect equation for maximum variability depends on whether “k”, the number of populations, is even or odd. Since “k” is even for this experiment, we calculate the effect for this case using the following equation:
\[\text{effect} = \frac{d}{2}\]
#max variability scenario
#The effect is given by (since k is even, otherwise it will change)
effect=d/2
#Run the anova test for the given variables
pwr.anova.test(k=k,n=NULL,f=effect,sig.level = .05,power=.80)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 10.56951
## f = 0.5345225
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
For the maximum variability case, n = 10.57 must be rounded up to reach the target power, so we will need 11 samples of each fluid.
Six observations were collected for each fluid type. The data are entered as follows:
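To illustrate how the maximum variability effect changes with the parity of k, the sketch below covers both branches. The odd-k expression d·sqrt(k² − 1)/(2k) is taken here as an assumption from Cohen's treatment, so it is checked numerically against means placed at the two ends of the range. The function names and the five-group example are hypothetical and not part of this experiment.

```r
# Maximum variability effect for even or odd k.
# Assumption: for odd k, f = d * sqrt(k^2 - 1) / (2 * k).
f_max_variability <- function(d, k) {
  if (k %% 2 == 0) d / 2 else d * sqrt(k^2 - 1) / (2 * k)
}

# Cohen's f computed directly from group means (equal group sizes assumed)
cohen_f <- function(means, sigma) {
  sqrt(mean((means - mean(means))^2)) / sigma
}

sigma <- sqrt(3.5)
d <- 2 / sigma

# Even k: two means at each end of the range (this experiment)
f_even       <- f_max_variability(d, 4)
f_even_check <- cohen_f(c(18, 18, 20, 20), sigma)

# Odd k: hypothetical five-group design, three means at one end and two at the other
f_odd       <- f_max_variability(d, 5)
f_odd_check <- cohen_f(c(18, 18, 18, 20, 20), sigma)

print(c(f_even = f_even, f_even_check = f_even_check))
print(c(f_odd = f_odd, f_odd_check = f_odd_check))
```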
#Input data from observations
Fluid_data<-data.frame(
Fluid1 = c(17.6,18.9,16.3,17.4,20.1,21.6),
Fluid2 = c(16.9,15.3,18.6,17.1,19.5,20.3),
Fluid3 = c(21.4,23.6,19.4,18.5,20.5,22.3),
Fluid4 = c(19.3,21.1,16.9,17.5,18.3,19.8)
)
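Six observations per fluid is fewer than any of the per-group sample sizes found in the design calculations. As a rough check, the sketch below estimates the power that n = 6 would give under each of the three effect sizes. It is a base-R approximation that assumes the noncentrality parameter is k·n·f²; the helper name anova_power is our own.

```r
# Base-R sketch of the power of a balanced one-way ANOVA.
# Assumption: noncentrality parameter lambda = k * n * f^2.
anova_power <- function(n, k, f, sig.level = 0.05) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  f_crit <- qf(sig.level, df1, df2, lower.tail = FALSE)
  pf(f_crit, df1, df2, ncp = k * n * f^2, lower.tail = FALSE)
}

k <- 4
d <- 2 / sqrt(3.5)

# Effect sizes for the three variability patterns
effects <- c(
  minimum      = d * sqrt(1 / (2 * k)),
  intermediate = (d / 2) * sqrt((k + 1) / (3 * (k - 1))),
  maximum      = d / 2
)

# Power with the six observations per fluid that were actually collected
power_at_6 <- sapply(effects, function(f) anova_power(n = 6, k = k, f = f))
print(round(power_at_6, 3))
```

Under every pattern the power falls short of the 80% target, so a non-significant result from these data would need to be read with caution.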
To test the hypotheses: \[H_0:\mu_1=\mu_2=\mu_3=\mu_4\] \[H_a:\text{at least one } \mu_i \text{ differs}\]
Using a significance level: \[\alpha=0.10\]
First, the data is rearranged using tidyr:
#Convert the data to tidy format
tidy_Fluid_data <- pivot_longer(Fluid_data, cols=everything(),names_to="FluidType",values_to ="Life")
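Before fitting the model, it can help to look at the group means and standard deviations in the long format. The sketch below uses the base-R function stack() in place of pivot_longer() so that it runs without extra packages; the object name long is our own.

```r
# Observed fluid lives (hours), six observations per fluid
Fluid_data <- data.frame(
  Fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  Fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  Fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  Fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

# stack() returns one row per observation with columns "values" and "ind"
long <- stack(Fluid_data)
names(long) <- c("Life", "FluidType")

# Per-fluid mean and standard deviation
group_means <- tapply(long$Life, long$FluidType, mean)
group_sds   <- tapply(long$Life, long$FluidType, sd)
print(round(cbind(mean = group_means, sd = group_sds), 3))
```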
Now we fit an analysis of variance model with “aov” and summarize the output:
#Run the anova test
aov.model<-aov(Life~FluidType, data=tidy_Fluid_data)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## FluidType 3 30.17 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the summary of the “aov” fit, the p-value is 0.0525. Since this experiment uses a significance level of 0.10 and the p-value is below it, we reject the null hypothesis.
\[\text{p-value} = 0.0525 < \alpha = 0.10\]
The diagnostic plots of the “aov” fit let us check how well the model assumptions hold.
The plots are produced as follows:
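The entries of the ANOVA table can be reproduced by hand, which makes clear where the F value and p-value come from. The sketch below is a base-R recomputation for a balanced design; the variable names are our own.

```r
# Observed fluid lives (hours), six observations per fluid
Fluid_data <- data.frame(
  Fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  Fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  Fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  Fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)
y <- as.matrix(Fluid_data)
n <- nrow(y)   # observations per fluid
k <- ncol(y)   # number of fluids

group_means <- colMeans(y)
grand_mean  <- mean(y)

# Between-group and within-group sums of squares
ss_between <- n * sum((group_means - grand_mean)^2)
ss_within  <- sum(sweep(y, 2, group_means)^2)

# Mean squares, F statistic and p-value
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (k * (n - 1))
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, k - 1, k * (n - 1), lower.tail = FALSE)

print(c(ss_between = ss_between, ss_within = ss_within))
print(c(f_stat = f_stat, p_value = p_value))
```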
plot(aov.model)
Looking at the “Residuals vs Fitted” plot, the spread of each fluid type seems roughly equal, with no concerns. Looking at the “Q-Q Residuals” plot, the residuals follow the line well and show no major outliers. Looking at the “Scale-Location” plot, the dip in the line is a possible concern, since it may indicate unequal variance across the groups. Looking at the “Constant Leverage” plot, there are no concerns to note.
Because the null hypothesis was rejected, Tukey’s test can be used to identify which fluids differ significantly, using a family-wise error rate of: \[\alpha=0.10\]
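Reading variance patterns from a plot is subjective, so a formal test can back up the visual impression. The sketch below uses Bartlett's test for equal variances and the Shapiro-Wilk test for normality of the residuals, both from base R; these are our choice of checks and are not part of the original analysis.

```r
# Observed fluid lives (hours), six observations per fluid
Fluid_data <- data.frame(
  Fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  Fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  Fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  Fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)
long <- stack(Fluid_data)               # base-R long format
names(long) <- c("Life", "FluidType")

fit <- aov(Life ~ FluidType, data = long)

# H0: all fluids have the same variance
# (car::leveneTest() is an alternative that is less sensitive to non-normality)
bartlett_result <- bartlett.test(Life ~ FluidType, data = long)

# H0: the residuals are normally distributed
shapiro_result <- shapiro.test(residuals(fit))

print(bartlett_result)
print(shapiro_result)
```

A large p-value from Bartlett's test would mean the data give no evidence against equal variances, which would temper the concern raised by the “Scale-Location” plot.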
#Use tukeys to look at which fluid(s) significantly differ using the family-wise error
TukeyHSD(aov.model,conf.level = 0.90)
## Tukey multiple comparisons of means
## 90% family-wise confidence level
##
## Fit: aov(formula = Life ~ FluidType, data = tidy_Fluid_data)
##
## $FluidType
## diff lwr upr p adj
## Fluid2-Fluid1 -0.7000000 -3.2670196 1.8670196 0.9080815
## Fluid3-Fluid1 2.3000000 -0.2670196 4.8670196 0.1593262
## Fluid4-Fluid1 0.1666667 -2.4003529 2.7336862 0.9985213
## Fluid3-Fluid2 3.0000000 0.4329804 5.5670196 0.0440578
## Fluid4-Fluid2 0.8666667 -1.7003529 3.4336862 0.8413288
## Fluid4-Fluid3 -2.1333333 -4.7003529 0.4336862 0.2090635
Looking at the output of Tukey’s test, only the Fluid3-Fluid2 comparison shows a significant difference: its adjusted p-value (0.044) is below 0.10 and its 90% confidence interval (0.43 to 5.57) does not contain zero. The confidence intervals are plotted as follows:
plot(TukeyHSD(aov.model,conf.level = 0.90))
The plot supports the conclusion that Fluid3 and Fluid2 differ significantly: the Fluid3-Fluid2 interval, the fourth in the plot, is the only one that does not include zero.
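The Tukey interval for Fluid3-Fluid2 can also be rebuilt by hand from the studentized range distribution. The sketch below assumes a balanced design, in which every pairwise interval has the same half-width q·sqrt(MSE/n); the variable names are our own.

```r
# Observed fluid lives (hours), six observations per fluid
Fluid_data <- data.frame(
  Fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  Fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  Fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  Fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)
y <- as.matrix(Fluid_data)
n <- nrow(y)
k <- ncol(y)
group_means <- colMeans(y)

# Error mean square and its degrees of freedom
df_error <- k * (n - 1)
mse <- sum(sweep(y, 2, group_means)^2) / df_error

# Half-width of every 90% family-wise interval (balanced design)
q_crit <- qtukey(0.90, nmeans = k, df = df_error)
half_width <- q_crit * sqrt(mse / n)

# Interval and adjusted p-value for Fluid3 - Fluid2
diff_3_2 <- group_means[["Fluid3"]] - group_means[["Fluid2"]]
interval_3_2 <- c(lwr = diff_3_2 - half_width, upr = diff_3_2 + half_width)
p_adj_3_2 <- ptukey(abs(diff_3_2) / sqrt(mse / n), nmeans = k,
                    df = df_error, lower.tail = FALSE)

print(interval_3_2)
print(p_adj_3_2)
```

Any pair of fluids whose means differ by more than the half-width is declared significantly different, which here is only Fluid3 and Fluid2.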