A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in cubic feet per second) are shown below.

type1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
type2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
type3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
type4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)

a)

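Write the linear effects equation and the hypotheses you are testing.

X_ij = mu + tau_i + e_ij,  i = 1, ..., 4 (method),  j = 1, ..., 6 (replicate)

Here mu is the grand mean, tau_i is the effect of the i-th estimation method, and e_ij is the random error. We are testing whether every treatment effect tau_i is 0 or not.

H0: tau_1 = tau_2 = tau_3 = tau_4 = 0 (equivalently, mu_1 = mu_2 = mu_3 = mu_4)

Ha: tau_i ≠ 0 for at least one i (at least one method mean differs from the others)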
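
As a minimal sketch of how the terms of this model can be estimated from the data, the snippet below computes the grand mean, the four method effects, and the residuals. The names `grand_mean`, `tau_hat`, and `resid_ij` are illustrative choices, not part of the assignment.

```r
# Data from the problem statement
type1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
type2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
type3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
type4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dafr <- stack(data.frame(type1, type2, type3, type4))

# Estimate mu by the grand mean and tau_i by (group mean - grand mean)
grand_mean <- mean(dafr$values)
group_means <- tapply(dafr$values, dafr$ind, mean)
tau_hat <- group_means - grand_mean

# Residuals e_ij: observation minus its own group mean
resid_ij <- dafr$values - ave(dafr$values, dafr$ind)

print(grand_mean)
print(tau_hat)
```

Because the design is balanced (six observations per method), the estimated effects sum to zero, and each observation is recovered exactly as grand mean plus effect plus residual.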

b)

Does it appear that the variance is constant?

# Combine the four vectors and stack them into long format (columns: values, ind)
dafr <- data.frame(type1, type2, type3, type4)
dafr <- stack(dafr)

boxplot(type1,type2,type3,type4,main='Boxplot by Type',names=c('1','2','3','4'))

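The variances do not appear to be constant. The boxplot shows the spread growing with the level of the response (type 1 is the least variable and type 4 the most), and the residual plots below show the same pattern.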

plot(aov(values~ind,dafr))
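
The visual impression can be backed up with a few numbers. The following is a rough sketch that tabulates the mean and variance of each method and the ratio of the largest to the smallest group variance; `group_var` and `var_ratio` are names introduced here for illustration.

```r
# Data from the problem statement
type1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
type2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
type3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
type4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dafr <- stack(data.frame(type1, type2, type3, type4))

# Per-method mean and sample variance
group_mean <- tapply(dafr$values, dafr$ind, mean)
group_var <- tapply(dafr$values, dafr$ind, var)
print(round(cbind(mean = group_mean, variance = group_var), 3))

# Single-number summary: largest variance divided by smallest
var_ratio <- max(group_var) / min(group_var)
print(var_ratio)
```

On these data the sample variance rises with the group mean from type 1 through type 4, which is the pattern that a power transformation such as Box-Cox is meant to address.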

c)

(nonparametric) Perform a Kruskal-Wallis test in R (alpha = 0.05)

kruskal.test(values~ind,dafr)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  values by ind
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

Our p-value (9.771e-05) is well below the alpha of 0.05, so we reject the null hypothesis and conclude that the estimation methods do not all produce equivalent estimates of peak discharge.
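
To show where the reported statistic comes from, here is a sketch that reproduces the Kruskal-Wallis chi-squared by hand from the rank sums, including the usual correction for ties (the two 0.12 values in type 1 are tied). The variable names are illustrative; `kruskal.test()` remains the call actually used for the answer.

```r
# Data from the problem statement
type1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
type2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
type3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
type4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dafr <- stack(data.frame(type1, type2, type3, type4))

# Rank all 24 observations together (ties receive the average rank)
r <- rank(dafr$values)
N <- length(r)
rank_sums <- tapply(r, dafr$ind, sum)
n_i <- tapply(r, dafr$ind, length)

# Uncorrected statistic
H_raw <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Tie correction: 1 - sum(t^3 - t) / (N^3 - N), t = size of each tied set
tie_sizes <- as.numeric(table(dafr$values))
tie_corr <- 1 - sum(tie_sizes^3 - tie_sizes) / (N^3 - N)
H <- H_raw / tie_corr

# Chi-squared approximation with k - 1 degrees of freedom
p_value <- pchisq(H, df = length(n_i) - 1, lower.tail = FALSE)
print(c(H = H, p_value = p_value))
```

The rank sums are 23, 55, 93, and 129, so the groups barely overlap in rank, which is why the statistic is so large.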

d)

(parametric) Select an appropriate transformation using Box-Cox, transform the data, and test the hypothesis in R (alpha = 0.05)

library(MASS)
boxcox(values~ind,data=dafr)

Based on our Box-Cox graph, we choose lambda = 0.5, which corresponds to a square-root transformation of the data.
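
Reading lambda off the graph can be supplemented numerically. The sketch below, which assumes the default lambda grid of `boxcox()` (-2 to 2 in steps of 0.1), extracts the grid value that maximizes the profile log-likelihood and an approximate 95% interval; `lambda_hat` and `ci` are names chosen here for illustration.

```r
library(MASS)

# Data from the problem statement
type1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
type2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
type3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
type4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dafr <- stack(data.frame(type1, type2, type3, type4))

# plotit = FALSE returns the lambda grid (x) and log-likelihood (y) without plotting
bc <- boxcox(values ~ ind, data = dafr, plotit = FALSE)

# Grid value with the highest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y > cutoff])

print(lambda_hat)
print(ci)
```

A convenient value inside the interval, such as 0.5 (square root) or 0 (log), is usually preferred over the exact maximizer because it is easier to interpret.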

dafr2 <- dafr
dafr2$values <- dafr2$values^.5
plot(aov(values~ind,dafr2))

The transform brings the group variances closer together, although they still do not look completely equal. They are clearly more similar than they were before the Box-Cox transform.
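
One way to quantify how much closer the variances have become is to compare the ratio of the largest to the smallest group variance before and after the transform. This is only a sketch; `var_ratio()` is a hypothetical helper defined here, not a function from R or MASS.

```r
# Data from the problem statement
type1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
type2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
type3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
type4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dafr <- stack(data.frame(type1, type2, type3, type4))

# Hypothetical helper: largest group variance divided by smallest
var_ratio <- function(y, g) {
  v <- tapply(y, g, var)
  max(v) / min(v)
}

ratio_before <- var_ratio(dafr$values, dafr$ind)      # raw scale
ratio_after <- var_ratio(dafr$values^0.5, dafr$ind)   # square-root scale

print(c(before = ratio_before, after = ratio_after))
```

A ratio closer to 1 on the square-root scale supports the reading of the residual plots given above.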

summary(aov(values~ind,dafr2))
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## ind          3  32.69  10.898   81.17 2.27e-11 ***
## Residuals   20   2.69   0.134                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Our p-value here (2.27e-11) is again below the alpha of 0.05, so we still reject the null hypothesis and conclude that the estimation methods do not all produce the same mean peak discharge (on the square-root scale).
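
For reference, the entries of the ANOVA table can be rebuilt directly from the sums of squares. The sketch below does this for the square-root data; the variable names are illustrative, and `summary(aov(...))` is still the call used for the reported answer.

```r
# Data from the problem statement
type1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
type2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
type3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
type4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dafr <- stack(data.frame(type1, type2, type3, type4))

y <- dafr$values^0.5   # square-root transform (lambda = 0.5)
g <- dafr$ind

# Between-method and within-method sums of squares
fitted_y <- ave(y, g)                      # each observation's group mean
ss_trt <- sum((fitted_y - mean(y))^2)
ss_err <- sum((y - fitted_y)^2)

# Degrees of freedom: k - 1 and N - k
df_trt <- nlevels(g) - 1
df_err <- length(y) - nlevels(g)

# F statistic and its upper-tail p-value
F_stat <- (ss_trt / df_trt) / (ss_err / df_err)
p_value <- pf(F_stat, df_trt, df_err, lower.tail = FALSE)
print(c(SS_trt = ss_trt, SS_err = ss_err, F = F_stat, p = p_value))
```

These values should agree with the `ind` and `Residuals` rows of the table above up to rounding.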

All code used:

type1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
type2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
type3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
type4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)

# Model: X_ij = mu + tau_i + e_ij
# We are testing whether the treatment effects tau_i are all 0 or not.
# H0: mu_1 = mu_2 = mu_3 = mu_4
# Ha: at least one mu_i differs from the others

# Combine the four vectors and stack them into long format (columns: values, ind)
dafr <- data.frame(type1, type2, type3, type4)
dafr <- stack(dafr)

boxplot(type1,type2,type3,type4,main='Boxplot by Type',names=c('1','2','3','4'))

plot(aov(values~ind,dafr))
# The residuals do, however, look approximately normal.

kruskal.test(values~ind,dafr)

library(MASS)
boxcox(values~ind,data=dafr)

dafr2 <- dafr
dafr2$values <- dafr2$values^.5
boxplot(type1^.5,type2^.5,type3^.5,type4^.5,main='Boxplot by Type(after transform)',names=c('1','2','3','4'))

summary(aov(values~ind,dafr2))