A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in cubic feet per second) are shown below.
type1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
type2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
type3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
type4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
Write the linear effects equation and the hypothesis you are testing
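X_ij = mu + T_i + e_ij, for i = 1, ..., 4 (estimation method) and j = 1, ..., 6 (replicate)
Here mu is the grand mean, T_i is the effect of the i-th method, and e_ij is the random error. We are testing whether the treatment effects T_i are all 0.
H0: T_1 = T_2 = T_3 = T_4 = 0 (equivalently, mu_1 = mu_2 = mu_3 = mu_4)
Ha: T_i ≠ 0 for at least one i (at least one method mean differs from the others)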
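To make the terms of the effects model concrete, the sketch below estimates the grand mean and the four treatment effects from the data. The names `grand_mean`, `group_means`, and `tau_hat` are illustrative and not part of the original analysis; the estimates use the usual sum-to-zero convention for a balanced design.

```r
# Rebuild the data in long format (columns: values, ind)
dafr <- stack(data.frame(
  type1 = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12),
  type2 = c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55),
  type3 = c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24),
  type4 = c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
))

grand_mean  <- mean(dafr$values)                   # estimate of mu
group_means <- tapply(dafr$values, dafr$ind, mean) # estimates of mu_i
tau_hat     <- group_means - grand_mean            # estimates of T_i

print(grand_mean)
print(tau_hat)
```

Under H0 all four estimated effects would be close to zero; because the design is balanced, they sum to zero by construction.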
Does it appear that the variance is constant?
# Combine the four samples and stack them into long format (columns: values, ind)
dafr <- stack(data.frame(type1, type2, type3, type4))
boxplot(type1, type2, type3, type4, main = 'Boxplot by Type', names = c('1', '2', '3', '4'))
The variances do not appear to be constant: in the boxplot the spread grows with the level of the response, and the residual plots below show the same pattern.
plot(aov(values~ind,dafr))
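The visual impression can be backed up with numbers. The sketch below computes the sample variance of each group and runs Bartlett's test of homogeneity of variances. This test is an addition, not part of the original analysis, and it is sensitive to departures from normality, so it should be read as a rough check only.

```r
# Rebuild the data in long format (columns: values, ind)
dafr <- stack(data.frame(
  type1 = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12),
  type2 = c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55),
  type3 = c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24),
  type4 = c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
))

# Sample variance within each estimation method
group_var <- tapply(dafr$values, dafr$ind, var)
print(group_var)

# Bartlett's test: H0 is that all group variances are equal
bt <- bartlett.test(values ~ ind, data = dafr)
print(bt)
```

A large gap between the smallest and largest group variance supports what the boxplot suggests.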
(nonparametric) Perform a Kruskal-Wallis test in R (α = 0.05)
kruskal.test(values~ind,dafr)
##
## Kruskal-Wallis rank sum test
##
## data: values by ind
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
Our p-value (9.771e-05) is well below α = 0.05, so we reject the null hypothesis and conclude that the four estimation methods do not all produce the same peak discharge estimates.
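As a check on what `kruskal.test` computes, the statistic can be rebuilt from the pooled ranks. The sketch below uses the standard formula H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), divided by the usual tie correction (the value 0.12 occurs twice). The variable names are illustrative.

```r
# Rebuild the data in long format (columns: values, ind)
dafr <- stack(data.frame(
  type1 = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12),
  type2 = c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55),
  type3 = c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24),
  type4 = c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
))

N  <- nrow(dafr)
r  <- rank(dafr$values)              # pooled ranks; ties get average ranks
Ri <- tapply(r, dafr$ind, sum)       # rank sum per method
ni <- tapply(r, dafr$ind, length)    # sample size per method

H_raw <- 12 / (N * (N + 1)) * sum(Ri^2 / ni) - 3 * (N + 1)

# Tie correction: t counts how many observations share each distinct value
t_counts <- table(dafr$values)
H <- H_raw / (1 - sum(t_counts^3 - t_counts) / (N^3 - N))

# Compare against the chi-squared distribution with k - 1 = 3 df
p_value <- pchisq(H, df = nlevels(dafr$ind) - 1, lower.tail = FALSE)
print(c(H = H, p = p_value))
```

The result should agree with the chi-squared value of 21.156 and the p-value reported in the output above.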
(parametric) Select an appropriate transformation using Box-Cox, transform the data, and test the hypothesis in R (α = 0.05)
library(MASS)
boxcox(values~ind,data=dafr)
Based on the Box-Cox plot, we choose λ = 0.5, that is, a square-root transformation of the response.
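The choice of λ can also be read off numerically instead of from the plot. The sketch below calls `boxcox` with `plotit = FALSE`, takes the λ that maximizes the profile log-likelihood on the default grid, and derives an approximate 95% interval from the usual chi-squared cutoff. The names `lambda_hat` and `ci` are illustrative, and the grid resolution (steps of 0.1) limits the precision.

```r
library(MASS)

# Rebuild the data in long format (columns: values, ind)
dafr <- stack(data.frame(
  type1 = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12),
  type2 = c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55),
  type3 = c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24),
  type4 = c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
))

# Profile log-likelihood over the default grid of lambda values, without plotting
bc <- boxcox(values ~ ind, data = dafr, plotit = FALSE)

# Lambda with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
inside <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[inside])

print(lambda_hat)
print(ci)
```

A convenient value inside the interval, such as 0.5 (square root) or 0 (log), is usually preferred over the exact maximizer because it is easier to interpret.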
dafr2 <- dafr
dafr2$values <- dafr2$values^.5
plot(aov(values~ind,dafr2))
The transformation stabilizes the variances somewhat. They still do not look completely equal, but they are clearly closer to each other than they were before the Box-Cox transformation.
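One way to quantify this improvement is to compare the ratio of the largest to the smallest group variance before and after the transformation. The sketch below is an illustrative check, not part of the original analysis; `var_ratio` is a hypothetical helper.

```r
# Rebuild the data in long format (columns: values, ind)
dafr <- stack(data.frame(
  type1 = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12),
  type2 = c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55),
  type3 = c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24),
  type4 = c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
))

# Hypothetical helper: largest group variance divided by the smallest
var_ratio <- function(y, g) {
  v <- tapply(y, g, var)
  max(v) / min(v)
}

ratio_raw  <- var_ratio(dafr$values, dafr$ind)       # original scale
ratio_sqrt <- var_ratio(dafr$values^0.5, dafr$ind)   # after lambda = 0.5

print(c(raw = ratio_raw, sqrt = ratio_sqrt))
```

A ratio closer to 1 after the transformation indicates that the group variances have moved closer together.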
summary(aov(values~ind,dafr2))
##             Df Sum Sq Mean Sq F value   Pr(>F)
## ind          3  32.69  10.898   81.17 2.27e-11 ***
## Residuals   20   2.69   0.134
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value (2.27e-11) is again far below α = 0.05, so we still reject the null hypothesis and conclude that the estimation methods differ in their mean (square-root transformed) peak discharge.
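The entries of the ANOVA table follow directly from the between-group and within-group sums of squares. The sketch below rebuilds the F statistic by hand for the square-root transformed data; the variable names are illustrative.

```r
# Rebuild the transformed data in long format (columns: values, ind)
dafr2 <- stack(data.frame(
  type1 = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12),
  type2 = c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55),
  type3 = c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24),
  type4 = c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
))
dafr2$values <- dafr2$values^0.5

k <- nlevels(dafr2$ind)   # number of methods
N <- nrow(dafr2)          # total number of observations

grand_mean  <- mean(dafr2$values)
group_means <- tapply(dafr2$values, dafr2$ind, mean)
group_n     <- tapply(dafr2$values, dafr2$ind, length)

# Between-group (treatment) and within-group (residual) sums of squares
ss_between <- sum(group_n * (group_means - grand_mean)^2)
ss_within  <- sum((dafr2$values - group_means[dafr2$ind])^2)

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)

f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

print(c(SS_between = ss_between, SS_within = ss_within, F = f_stat, p = p_value))
```

These values should match the `ind` and `Residuals` rows of the `summary(aov(...))` output above, up to rounding.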
The residuals of the untransformed model do, however, look approximately normal.
boxplot(type1^.5,type2^.5,type3^.5,type4^.5,main='Boxplot by Type(after transform)',names=c('1','2','3','4'))