Setup

Load Libraries Into Session

library(dplyr)
library(knitr)
library(agricolae)
library(kableExtra)
library(tidyr)
library(MASS)

A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in \(ft^3/s\)) are shown below.

Estimation method   Obs 1   Obs 2   Obs 3   Obs 4   Obs 5   Obs 6
1                    0.34    0.12    1.23    0.70    1.75    0.12
2                    0.91    2.94    2.14    2.36    2.86    4.55
3                    6.31    8.37    9.75    6.09    9.82    7.24
4                   17.15   11.82   10.97   17.20   14.35   16.82

Part (a)

Write the linear effects equation and the hypotheses you are testing.

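Linear effects equation

\(y_{ij} = \mu + \tau_i + \epsilon_{ij}\), for estimation methods \(i = 1, \dots, 4\) and observations \(j = 1, \dots, 6\), where \(\mu\) is the overall mean, \(\tau_i\) is the effect of the \(i\)th estimation method, and \(\epsilon_{ij}\) is the random error. The mean for method \(i\) is \(\mu_i = \mu + \tau_i\).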

Hypotheses

\(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)

\(H_a\): \(\mu_i \neq \mu_j\) for at least one pair \(i \neq j\).

Part (b)

Does it appear that the data is normally distributed? Does it appear that the variance is constant?

Setting up Data Frame for Part (b)

EstMethod1 <- c(0.34,0.12,1.23,0.70,1.75,0.12)
EstMethod2 <- c(0.91,2.94,2.14,2.36,2.86,4.55)
EstMethod3 <- c(6.31,8.37,9.75,6.09,9.82,7.24)
EstMethod4 <- c(17.15,11.82,10.97,17.20,14.35,16.82)

FloodFlowTable <- data.frame(EstMethod1,EstMethod2,EstMethod3,EstMethod4)
FloodFlowTableLong <- pivot_longer(FloodFlowTable,c(EstMethod1,EstMethod2,EstMethod3,EstMethod4))

Running Analysis of Variance for the Part (b) Normal Probability Plot

FloodFlowAOV <- aov(value ~ name, data = FloodFlowTableLong)
FloodFlowAOVTable <- summary(FloodFlowAOV)
            Df     Sum Sq     Mean Sq   F value   Pr(>F)
name         3  708.67595  236.225315  76.28684        0
Residuals   20   61.93082    3.096541
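The Pr(>F) value of 0 shown in the table is a rounding artifact of how the table was rendered, not an exact zero. As a minimal sketch, the unrounded F statistic and p-value can be read directly from the summary object. The data are rebuilt in base R so the block runs on its own, and the names `flood`, `fit`, and `aov_row` are illustrative and not part of the original analysis.

```r
# Rebuild the long-format data (same values as the table in the problem statement).
flood <- data.frame(
  value = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
            0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
            6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
            17.15, 11.82, 10.97, 17.20, 14.35, 16.82),
  name = factor(rep(paste0("EstMethod", 1:4), each = 6))
)

fit <- aov(value ~ name, data = flood)

# summary() on an aov fit returns a list; its first element is the ANOVA table.
aov_row <- summary(fit)[[1]]
f_stat  <- aov_row[["F value"]][1]
p_value <- aov_row[["Pr(>F)"]][1]

# Show a few significant digits so the small p-value is not rounded to 0.
cat("F =", signif(f_stat, 6), " p =", signif(p_value, 3), "\n")
```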

Normally Distributed?

plot(FloodFlowAOV,2)

The normal probability plot of the residuals is roughly linear in the middle but departs from the reference line at both ends. Because of this tailing, we conclude that the residuals are not normally distributed.
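The visual judgment from the normal probability plot can be supplemented with a formal test. The sketch below applies the Shapiro-Wilk test to the ANOVA residuals. This test is not part of the original solution, and the data frame `flood` is rebuilt here only so the block runs on its own. With 24 observations the test has limited power, so it is best read alongside the plot rather than in place of it.

```r
# Rebuild the long-format data (same values as the table in the problem statement).
flood <- data.frame(
  value = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
            0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
            6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
            17.15, 11.82, 10.97, 17.20, 14.35, 16.82),
  name = factor(rep(paste0("EstMethod", 1:4), each = 6))
)

fit <- aov(value ~ name, data = flood)

# Shapiro-Wilk test of normality on the model residuals
# (H0: the residuals come from a normal distribution).
sw <- shapiro.test(residuals(fit))
print(sw)
```

A small p-value here would support the reading of the plot; a large one would mean the test alone cannot rule out normality.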

Constant Variance?

boxplot(value~name, data = FloodFlowTableLong)

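The box plot shows that the spread of the observations differs markedly among the estimation methods and grows as the discharge estimates get larger (Method 4 is far more variable than Method 1), so the variance does not appear to be constant.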
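The impression from the box plot can be checked numerically. The sketch below computes the standard deviation within each method and runs Bartlett's test for homogeneity of variance. Neither step is in the original solution. Bartlett's test is itself sensitive to non-normality, so its result should be treated as supporting evidence only. The data frame `flood` is an illustrative rebuild of the data.

```r
# Rebuild the long-format data (same values as the table in the problem statement).
flood <- data.frame(
  value = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
            0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
            6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
            17.15, 11.82, 10.97, 17.20, 14.35, 16.82),
  name = factor(rep(paste0("EstMethod", 1:4), each = 6))
)

# Standard deviation of the observations within each estimation method.
group_sd <- tapply(flood$value, flood$name, sd)
print(round(group_sd, 3))

# Bartlett's test (H0: all group variances are equal).
bt <- bartlett.test(value ~ name, data = flood)
print(bt)
```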

Part (c)

Perform a Kruskal-Wallis test in R (\(\alpha=0.05\))

kruskal.test(value~name,data=FloodFlowTableLong)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by name
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

The calculated p-value of \(9.77 \times 10^{-5}\) is much smaller than \(\alpha=0.05\), so we reject the null hypothesis. Because the Kruskal-Wallis test is rank-based, the conclusion is that at least one estimation method tends to produce discharge estimates that differ significantly from those of the other methods.
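To show where the test statistic comes from, the sketch below recomputes it by hand from the rank sums, using \(H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)\) divided by the usual tie correction, and compares it with `kruskal.test`. The variable names are illustrative, and the data are rebuilt so the block runs on its own.

```r
# Same observations as in the problem statement, in method order.
value <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
           0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
           6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
           17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
name <- factor(rep(paste0("EstMethod", 1:4), each = 6))

N <- length(value)
r <- rank(value)               # tied values receive the average rank
R <- tapply(r, name, sum)      # rank sum for each method
n <- tapply(r, name, length)   # sample size for each method

# Uncorrected Kruskal-Wallis statistic.
H <- 12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N),
# where t is the size of each group of tied values.
ties <- as.numeric(table(value))
H_corrected <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

# Chi-squared approximation with (number of groups - 1) degrees of freedom.
p_value <- pchisq(H_corrected, df = nlevels(name) - 1, lower.tail = FALSE)

cat("H =", round(H_corrected, 3), " p =", signif(p_value, 4), "\n")

# Built-in result for comparison.
kw <- kruskal.test(value ~ name)
```

The only tie in the data is the repeated 0.12 in Method 1, so the correction changes the statistic only slightly.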

Part (d)

Select an appropriate transformation using Box Cox, transform the data, and test the hypothesis in R (\(\alpha=0.05\)).

Creating Box Cox Plot

boxcox(value~name,data=FloodFlowTableLong)

From the Box Cox plot, the log-likelihood is maximized at approximately \(\lambda=0.6\). We will now transform the data by raising each observation to that power.
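Rather than reading \(\lambda\) off the plot, the maximizing value and its approximate 95% interval can be pulled from the object that `boxcox()` returns. This is a sketch under the assumption that a grid of step 0.01 is fine enough. The names `flood`, `fit`, `lambda_hat`, and `lambda_ci` are illustrative, and the original analysis used the visual estimate instead.

```r
library(MASS)  # provides boxcox()

# Rebuild the long-format data (same values as the table in the problem statement).
flood <- data.frame(
  value = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
            0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
            6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
            17.15, 11.82, 10.97, 17.20, 14.35, 16.82),
  name = factor(rep(paste0("EstMethod", 1:4), each = 6))
)

# Keep the response and QR decomposition so boxcox() can use the fit directly.
fit <- lm(value ~ name, data = flood, y = TRUE, qr = TRUE)

# Profile log-likelihood on a fine grid, without drawing the plot.
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Grid value of lambda with the largest log-likelihood.
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum (the cutoff line drawn on the plot).
in_ci <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[in_ci])

cat("lambda_hat =", lambda_hat, " approx. 95% CI =", lambda_ci, "\n")
```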

Transforming Data with lambda=0.6 and Creating New Box Cox Plot

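# Power chosen from the Box Cox plot above
lambda <- 0.6
# Overwrite the response with its transformed value (run this line only once)
FloodFlowTableLong$value <- FloodFlowTableLong$value^lambda
# Re-draw the Box Cox plot for the transformed response
boxcox(value~name,data=FloodFlowTableLong)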

Running AOV on Transformed Data

FloodFlowTransformedAOV <- aov(value ~ name, data = FloodFlowTableLong)
summary(FloodFlowTransformedAOV)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## name         3  63.71  21.236   85.76 1.36e-11 ***
## Residuals   20   4.95   0.248                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The Box Cox plot for the transformed data shows that \(\lambda = 1\) now falls within the 95% confidence interval, which indicates that no further transformation is needed. Re-running the AOV gives a p-value of \(1.36 \times 10^{-11}\), still far smaller than \(\alpha=0.05\), so we reject the null hypothesis and conclude that at least one estimation method's mean (transformed) discharge estimate differs significantly from the others.
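One way to check that the transformation did its job is to compare the within-method standard deviations before and after raising the data to the power 0.6. The sketch below does this with illustrative names (`flood`, `sd_raw`, `sd_transformed`). It is an added check, not part of the original solution. A ratio of largest to smallest standard deviation that is closer to 1 suggests more homogeneous variances.

```r
# Rebuild the long-format data (same values as the table in the problem statement).
flood <- data.frame(
  value = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
            0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
            6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
            17.15, 11.82, 10.97, 17.20, 14.35, 16.82),
  name = factor(rep(paste0("EstMethod", 1:4), each = 6))
)

lambda <- 0.6                              # power chosen from the Box Cox plot
flood$transformed <- flood$value^lambda    # kept in a separate column here

# Within-method standard deviations on both scales.
sd_raw         <- tapply(flood$value, flood$name, sd)
sd_transformed <- tapply(flood$transformed, flood$name, sd)
print(round(rbind(raw = sd_raw, transformed = sd_transformed), 3))

# Ratio of the largest to the smallest standard deviation on each scale.
ratio_raw         <- max(sd_raw) / min(sd_raw)
ratio_transformed <- max(sd_transformed) / min(sd_transformed)
cat("max/min SD, raw:", round(ratio_raw, 2),
    " transformed:", round(ratio_transformed, 2), "\n")
```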