1 Problem Statement

A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in cubic feet per second) are recorded below.

We first set up our data and build a data frame for analysis.

observation_1 = c(.34,.12,1.23,.70,1.75,.12)
observation_2 = c(.91,2.94,2.14,2.36,2.86,4.55)
observation_3 = c(6.31,8.37,9.75,6.09,9.82,7.24)
observation_4 = c(17.15,11.82,10.97,17.20,14.35,16.82)


all_observations<-c(observation_1, observation_2,observation_3,observation_4)
estimation_method<-c(rep(1,6), rep(2,6),rep(3,6),rep(4,6))

#Create data frame for analysis
data_frame<-data.frame(all_observations,estimation_method)
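As a quick sanity check on the grouping (a sketch; these summary calls are not part of the original analysis), the per-method sample means and variances can be computed directly:

#Per-method sample means and variances (sketch, not part of the original analysis)
tapply(all_observations, estimation_method, mean)
tapply(all_observations, estimation_method, var)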

Our linear effects equation is:

\[{x_{ij}} = \mu + {\tau _i} + {\varepsilon _{ij}}\]
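Here \(\mu\) is the overall mean, \({\tau _i}\) is the effect of the \(i\)th estimation method, and \({\varepsilon _{ij}}\) is the random error for the \(j\)th observation under method \(i\), with \(i = 1,\dots,4\) and \(j = 1,\dots,6\).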

Our hypothesis is:

\[{H_0}:{\mu _1} = {\mu _2} = {\mu _3} = {\mu _4}\] \[{H_A}:{\mu _i} \ne {\mu _j}\;\text{for at least one pair}\;i \ne j\]

1.1 Normality of the individual data sets

We now examine the data to assess normality, both individually by method and collectively, and we also look at the data's variance. We will use box plots and normal probability plots (NPPs) to inspect the data.

#Let's do a NPP for each method to see how normal our data is!
#Type1 NPP
qqnorm(observation_1, ylab = "Estimation 1 Samples", main = "Normal Probability Type 1")
qqline(observation_1, col = "purple", lwd = 2)

#Type2 NPP
qqnorm(observation_2, ylab = "Estimation 2 Samples", main = "Normal Probability Type 2")
qqline(observation_2, col = "red2", lwd = 2)

#Type3 NPP
qqnorm(observation_3, ylab = "Estimation 3 Samples", main = "Normal Probability Type 3")
qqline(observation_3, col = "blue4", lwd = 2)

#Type4 NPP
qqnorm(observation_4, ylab = "Estimation 4 Samples", main = "Normal Probability Type 4")
qqline(observation_4, col = "yellow", lwd = 2)

Observing the four NPP plots individually, data sets 1, 2, and 3 appear reasonably linear, while data set 4 is clearly not. With only six points per set, however, it is difficult to draw a definitive conclusion from any one plot. We will reach a firmer conclusion when we consider the collective data later in this document.
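As a complementary check (a sketch, not part of the original analysis), a formal normality test such as Shapiro-Wilk could be applied to each method, keeping in mind that it has very little power with only six observations per group:

#Shapiro-Wilk normality test for each method (sketch; low power at n = 6)
shapiro.test(observation_1)
shapiro.test(observation_2)
shapiro.test(observation_3)
shapiro.test(observation_4)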

1.2 Variance and Collective NPP

boxplot(all_observations~estimation_method, xlab="estimation method", ylab = "observations",main="Boxplot of Estimation Methods")

#Compute each method's sample mean and the residuals about it
mean_estimation_mthd = c(rep(mean(observation_1),6), rep(mean(observation_2),6), rep(mean(observation_3),6), rep(mean(observation_4),6))
observation_residuals = all_observations - mean_estimation_mthd

#NPP of the pooled residuals
qqnorm(observation_residuals, main = "Normal Probability Plot of Residuals")

#Residuals against the group means to check for constant variance
plot(mean_estimation_mthd, observation_residuals, xlab = "group mean", ylab = "residuals", main = "Residuals vs Group Means")

1.3 Variance and NPP Verdict

We can clearly see from the box plot that the variances of the data sets differ widely, and the means differ greatly as well. Looking at the collective NPP plot, it is clear that the residuals do not have a normal shape and would not be considered normal. The wide variance can also be seen in the residual plot. We will attempt a transformation to condition the data for a better analysis. As currently conditioned, the data already suggest that the null hypothesis will not be accepted: there is a clear difference among the data sets, and the four methods of estimating flood flow frequency are evidently not equivalent.
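To back up the visual impression of unequal variances (a sketch only; this test is not part of the original analysis, and Bartlett's test is itself sensitive to non-normality), a formal homogeneity-of-variance test could be run:

#Bartlett's test of equal variances across estimation methods (sketch)
bartlett.test(all_observations ~ factor(estimation_method), data = data_frame)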

1.4 ANOVA test with aov

model<-aov(all_observations~estimation_method, data = data_frame)
summary(model)
##                   Df Sum Sq Mean Sq F value  Pr(>F)    
## estimation_method  1  672.0   672.0   149.9 2.7e-11 ***
## Residuals         22   98.6     4.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
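
One caveat: estimation_method is stored as a numeric vector, so aov above fits it as a single regression slope (1 Df) rather than as four groups (3 Df). The conventional one-way ANOVA would treat it as a factor; a minimal sketch of that fit (model_factor is a name introduced here, and its output is not shown):

#One-way ANOVA with the estimation method treated as a factor (3 Df between groups)
model_factor <- aov(all_observations ~ factor(estimation_method), data = data_frame)
summary(model_factor)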

1.5 ANOVA Conclusion

The p-value is extremely small, so we reject the null hypothesis at any reasonable significance level.
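
Having rejected the null hypothesis, a natural follow-up (a sketch, assuming the factor-based model_factor fit from the note above) is a post-hoc comparison to see which methods differ from each other:

#Tukey honest significant differences between pairs of methods (sketch)
TukeyHSD(model_factor)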

1.6 Box-Cox transformation

We now apply a Box-Cox analysis to condition the data. From the initial Box-Cox plot below we can see that a lambda of about 0.5 may help transform the data, so we apply the transform with an exponent of 0.5 (a square-root transformation).

library(MASS)
boxcox(all_observations~estimation_method)

lambda = 0.5
all_observations_lambda_mod = (all_observations)^lambda

oneway.test(all_observations_lambda_mod~estimation_method,var.equal = TRUE)
## 
##  One-way analysis of means
## 
## data:  all_observations_lambda_mod and estimation_method
## F = 81.166, num df = 3, denom df = 20, p-value = 2.266e-11
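
Because the box plots show clearly unequal variances, Welch's version of the one-way test (the default when var.equal is not set to TRUE) may be the safer choice; a sketch, not part of the original analysis:

#Welch one-way test, which does not assume equal variances (sketch)
oneway.test(all_observations_lambda_mod ~ estimation_method, var.equal = FALSE)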
boxcox(all_observations_lambda_mod~estimation_method)

boxplot(all_observations_lambda_mod~estimation_method, xlab="estimation method", ylab = "observations",main="Boxplot of Estimation Methods")
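
To see whether the square-root transform actually improved things (a sketch; these diagnostics are not part of the original analysis), the residuals of the transformed data can be inspected the same way as before:

#Residual diagnostics for the transformed data (sketch)
residuals_lambda_mod = all_observations_lambda_mod - ave(all_observations_lambda_mod, estimation_method)
qqnorm(residuals_lambda_mod, main = "NPP of Transformed Residuals")
qqline(residuals_lambda_mod, col = "blue4", lwd = 2)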

kruskal.test(all_observations~estimation_method, data = data_frame)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  all_observations by estimation_method
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
kruskal.test(all_observations_lambda_mod~estimation_method, data = data_frame)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  all_observations_lambda_mod by estimation_method
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

1.7 Summary and Conclusion

Even with the transformation, the data still show wide variance and greatly differing means. The p-values for all of the tests, on both the original and the transformed data, are extremely small, so we can strongly conclude that the null hypothesis should be rejected. (The two Kruskal-Wallis results are identical because the test is rank-based, and a monotone transformation such as the square root does not change the ranks.) The methods the engineer is attempting to use differ greatly in their results.

2 Source Code

knitr::opts_chunk$set(echo = TRUE,warning=FALSE, message=FALSE)

observation_1 = c(.34,.12,1.23,.70,1.75,.12)
observation_2 = c(.91,2.94,2.14,2.36,2.86,4.55)
observation_3 = c(6.31,8.37,9.75,6.09,9.82,7.24)
observation_4 = c(17.15,11.82,10.97,17.20,14.35,16.82)


all_observations<-c(observation_1, observation_2,observation_3,observation_4)
estimation_method<-c(rep(1,6), rep(2,6),rep(3,6),rep(4,6))

#Create data frame for analysis
data_frame<-data.frame(all_observations,estimation_method)

#Let's do a NPP for each method to see how normal our data is!
#Type1 NPP
qqnorm(observation_1, ylab = "Estimation 1 Samples", main = "Normal Probability Type 1")
qqline(observation_1, col = "purple", lwd = 2)

#Type2 NPP
qqnorm(observation_2, ylab = "Estimation 2 Samples", main = "Normal Probability Type 2")
qqline(observation_2, col = "red2", lwd = 2)

#Type3 NPP
qqnorm(observation_3, ylab = "Estimation 3 Samples", main = "Normal Probability Type 3")
qqline(observation_3, col = "blue4", lwd = 2)

#Type4 NPP
qqnorm(observation_4, ylab = "Estimation 4 Samples", main = "Normal Probability Type 4")
qqline(observation_4, col = "yellow", lwd = 2)



boxplot(all_observations~estimation_method, xlab="estimation method", ylab = "observations",main="Boxplot of Estimation Methods")
#Compute each method's sample mean and the residuals about it
mean_estimation_mthd = c(rep(mean(observation_1),6), rep(mean(observation_2),6), rep(mean(observation_3),6), rep(mean(observation_4),6))
observation_residuals = all_observations - mean_estimation_mthd
qqnorm(observation_residuals, main = "Normal Probability Plot of Residuals")
plot(mean_estimation_mthd, observation_residuals, xlab = "group mean", ylab = "residuals", main = "Residuals vs Group Means")

#do our One-Way test with AOV and get a summary
model<-aov(all_observations~estimation_method, data = data_frame)
summary(model)


library(MASS)
boxcox(all_observations~estimation_method)

lambda = 0.5
all_observations_lambda_mod = (all_observations)^lambda

oneway.test(all_observations_lambda_mod~estimation_method,var.equal = TRUE)

boxcox(all_observations_lambda_mod~estimation_method)
boxplot(all_observations_lambda_mod~estimation_method, xlab="estimation method", ylab = "observations",main="Boxplot of Estimation Methods")
kruskal.test(all_observations~estimation_method, data = data_frame)
kruskal.test(all_observations_lambda_mod~estimation_method, data = data_frame)
