library(dplyr)
library(knitr)
library(agricolae)
library(kableExtra)
library(tidyr)
library(MASS)
A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in \(ft^3/s\)) are shown below.
| EstMeth | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0.34 | 0.12 | 1.23 | 0.70 | 1.75 | 0.12 |
| 2 | 0.91 | 2.94 | 2.14 | 2.36 | 2.86 | 4.55 |
| 3 | 6.31 | 8.37 | 9.75 | 6.09 | 9.82 | 7.24 |
| 4 | 17.15 | 11.82 | 10.97 | 17.20 | 14.35 | 16.82 |
Write the linear effects equation and the hypotheses you are testing.
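A sketch of the effects model for this one-way layout, in standard notation (the symbols \(\mu\), \(\tau_i\), and \(\epsilon_{ij}\) are assumed here and are not defined in the original):

\[ y_{ij} = \mu + \tau_i + \epsilon_{ij}, \qquad i = 1, \dots, 4, \quad j = 1, \dots, 6 \]

Here \(y_{ij}\) is the \(j\)-th discharge estimate from method \(i\), \(\mu\) is the overall mean, \(\tau_i\) is the effect of method \(i\), and \(\epsilon_{ij}\) is the random error, so the method means in the hypotheses are \(\mu_i = \mu + \tau_i\).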
\(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)
\(H_a\): \(\mu_i \neq \mu_j\) for at least one pair \(i \neq j\).
Does it appear that the data is normally distributed? Does it appear that the variance is constant?
EstMethod1 <- c(0.34,0.12,1.23,0.70,1.75,0.12)
EstMethod2 <- c(0.91,2.94,2.14,2.36,2.86,4.55)
EstMethod3 <- c(6.31,8.37,9.75,6.09,9.82,7.24)
EstMethod4 <- c(17.15,11.82,10.97,17.20,14.35,16.82)
FloodFlowTable <- data.frame(EstMethod1,EstMethod2,EstMethod3,EstMethod4)
FloodFlowTableLong <- pivot_longer(FloodFlowTable,c(EstMethod1,EstMethod2,EstMethod3,EstMethod4))
FloodFlowAOV <- aov(value ~ name, data = FloodFlowTableLong)
FloodFlowAOVTable <- summary(FloodFlowAOV)
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| name | 3 | 708.67595 | 236.225315 | 76.28684 | 0 |
| Residuals | 20 | 61.93082 | 3.096541 | | |
plot(FloodFlowAOV,2)
The normal probability plot of the residuals is roughly linear in the middle but departs from the reference line in both tails. Because of these tails, we conclude that the residuals are not normally distributed.
boxplot(value~name, data = FloodFlowTableLong)
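The box plot shows that the spread grows with the level of the response: the boxes for methods 3 and 4 are much wider than those for methods 1 and 2. The variance is therefore not constant.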
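The two graphical checks can be backed by formal tests. The sketch below is a supplement, not part of the original analysis: it rebuilds the data under hypothetical names (`flood`, `method`, `discharge`) and applies the Shapiro-Wilk test to the ANOVA residuals and Bartlett's test to the group variances. Bartlett's test is itself sensitive to non-normality, so its result should be read with caution here.

```r
# Rebuild the discharge data in long form (base R only; names are illustrative).
flood <- data.frame(
  method = factor(rep(1:4, each = 6)),
  discharge = c(
    0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
    0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
    6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
    17.15, 11.82, 10.97, 17.20, 14.35, 16.82
  )
)

fit <- aov(discharge ~ method, data = flood)

# Shapiro-Wilk test on the residuals: a small p-value suggests non-normality.
normality_check <- shapiro.test(residuals(fit))

# Bartlett's test: a small p-value suggests unequal variances across methods.
variance_check <- bartlett.test(discharge ~ method, data = flood)

print(normality_check)
print(variance_check)
```

Small p-values from either test would point the same way as the plots, toward a nonparametric test or a variance-stabilizing transformation.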
Perform a Kruskal-Wallis test in R (\(\alpha=0.05\))
kruskal.test(value~name,data=FloodFlowTableLong)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
The calculated p-value of \(9.77 \times 10^{-5}\) is much smaller than \(\alpha=0.05\), so we reject the null hypothesis. Because the Kruskal-Wallis test is based on ranks rather than means, we conclude that the discharge estimates from at least one estimation method tend to be larger or smaller than those from the others.
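To make the rank-based nature of the test concrete, the statistic can be computed by hand. This is a minimal sketch using the standard formula \(H = \frac{12}{N(N+1)} \sum_i R_i^2 / n_i - 3(N+1)\) with the usual tie correction. The names `flood`, `method`, and `discharge` are illustrative, not from the original code.

```r
# Rebuild the discharge data in long form (names are illustrative).
flood <- data.frame(
  method = factor(rep(1:4, each = 6)),
  discharge = c(
    0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
    0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
    6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
    17.15, 11.82, 10.97, 17.20, 14.35, 16.82
  )
)

N <- nrow(flood)
ranks <- rank(flood$discharge)                 # tied values get average ranks

rank_sums <- tapply(ranks, flood$method, sum)  # R_i for each method
group_n   <- tapply(ranks, flood$method, length)

# Uncorrected Kruskal-Wallis statistic.
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / group_n) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tied groups.
tie_counts <- as.numeric(table(flood$discharge))
H_adj <- H / (1 - sum(tie_counts^3 - tie_counts) / (N^3 - N))

# Chi-squared approximation with (number of groups - 1) degrees of freedom.
p_value <- pchisq(H_adj, df = nlevels(flood$method) - 1, lower.tail = FALSE)

print(c(H = H_adj, p = p_value))
```

The result should agree with the `kruskal.test()` output above. The four rank sums barely overlap, which is why the statistic is so large.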
Select an appropriate transformation using Box-Cox, transform the data, and test the hypothesis in R (\(\alpha=0.05\)).
boxcox(value~name,data=FloodFlowTableLong)
From the Box-Cox plot, the value that maximizes the log-likelihood function is approximately \(\lambda=0.6\). We will now transform our data with that power.
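Rather than reading \(\lambda\) off the plot, the maximizing value and an approximate 95% interval can be taken from the object that `boxcox()` returns. This is a sketch under stated assumptions: the finer grid `lambda_grid` and the names `flood`, `method`, and `discharge` are illustrative choices, and the value found may differ slightly from the rounded \(\lambda=0.6\) used in the text.

```r
library(MASS)

# Rebuild the discharge data in long form (names are illustrative).
flood <- data.frame(
  method = factor(rep(1:4, each = 6)),
  discharge = c(
    0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
    0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
    6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
    17.15, 11.82, 10.97, 17.20, 14.35, 16.82
  )
)

# Profile log-likelihood on a fine grid of lambda values, without plotting.
lambda_grid <- seq(-2, 2, by = 0.01)
bc <- boxcox(discharge ~ method, data = flood,
             lambda = lambda_grid, plotit = FALSE)

# Grid value with the highest profile log-likelihood.
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood is within
# qchisq(0.95, 1) / 2 of the maximum (the same cutoff the plot draws).
inside <- bc$y >= max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[inside])

print(lambda_hat)
print(lambda_ci)
```

Any convenient power inside `lambda_ci` is a reasonable choice, and a rounded value is usually preferred for interpretability.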
lambda <- 0.6
FloodFlowTableLong$value <- FloodFlowTableLong$value^lambda
boxcox(value~name,data=FloodFlowTableLong)
FloodFlowTransformedAOV <- aov(value ~ name, data = FloodFlowTableLong)
summary(FloodFlowTransformedAOV)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 63.71 21.236 85.76 1.36e-11 ***
## Residuals 20 4.95 0.248
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
After the transformation, the Box-Cox plot shows that \(\lambda = 1\) now falls within the 95% confidence interval, so no further transformation is indicated. Re-running the ANOVA on the transformed data gives a p-value of \(1.36 \times 10^{-11}\), still far smaller than \(\alpha=0.05\), so we reject the null hypothesis and conclude that at least one estimation method's mean discharge (on the transformed scale) differs from the others.
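One quick way to see what the transformation achieved is to compare the largest and smallest group standard deviations before and after applying the power. The sketch below uses a hypothetical helper, `sd_ratio`, and illustrative names `flood`, `method`, and `discharge`. It is a rough diagnostic, not a formal test.

```r
# Rebuild the untransformed discharge data (names are illustrative).
flood <- data.frame(
  method = factor(rep(1:4, each = 6)),
  discharge = c(
    0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
    0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
    6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
    17.15, 11.82, 10.97, 17.20, 14.35, 16.82
  )
)

lambda <- 0.6  # power chosen from the Box-Cox plot in the text

# Hypothetical helper: ratio of the largest to the smallest group SD.
sd_ratio <- function(x, g) {
  s <- tapply(x, g, sd)
  max(s) / min(s)
}

ratio_raw         <- sd_ratio(flood$discharge, flood$method)
ratio_transformed <- sd_ratio(flood$discharge^lambda, flood$method)

print(c(raw = ratio_raw, transformed = ratio_transformed))
```

A ratio closer to 1 on the transformed scale indicates that the group variances are more nearly equal, which is what the ANOVA assumes.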