A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in cubic feet per second)
We first setup our data and data frame
observation_1 = c(.34,.12,1.23,.70,1.75,.12)
observation_2 = c(.91,2.94,2.14,2.36,2.86,4.55)
observation_3 = c(6.31,8.37,9.75,6.09,9.82,7.24)
observation_4 = c(17.15,11.82,10.97,17.20,14.35,16.82)
all_observations<-c(observation_1, observation_2,observation_3,observation_4)
estimation_method<-c(rep(1,6), rep(2,6),rep(3,6),rep(4,6))
#Create data frame for analysis
data_frame<-data.frame(all_observations,estimation_method)
Our linear effects equation is:
\[{x_{ij}} = \mu + {\tau _i} + {\varepsilon _{ij}}\]
Our hypothesis is:
\[{H_0}:{\mu _1} = {\mu _2} = {\mu _3} = {\mu _4}\] \[{H_A}:{\mu _1} \ne {\mu _2} \ne {\mu _3} \ne {\mu _4}\]
We now look a the data to look at the normality of the data individually and collectively. We also will look at the data’s variance. We will use box plots and NPP plots to inspect the data.
#Let's do a NPP for each type to look out how normal our data is!
#Type1 NPP
qqnorm(observation_1, xlab = "Estimation 1 samples ", main = "Normal Probability Type 1")
qqline(observation_1, col = "purple", lwd = 2)
#Type2 NPP
qqnorm(observation_2, xlab = "Estimation 2 Samples ", main = "Normal Probability Type 2")
qqline(observation_2, col = "red2", lwd = 2)
#Type3 NPP
qqnorm(observation_3, xlab = "Estimation 3 Samples ", main = "Normal Probability Type 3")
qqline(observation_3, col = "blue4", lwd = 2)
#Type4 NPP
qqnorm(observation_4, xlab = "Estimation 4 Samples ", main = "Normal Probability Type 4")
qqline(observation_4, col = "yellow", lwd = 2)
Observing the four data sets NPP plots individually, we can see the 1,2,and 3 seem reasonably linear. However data set 4 is clearly not linear. We also have an issue with so few data points in all the sets to make a definite clear conclusion. We will make a more definite conclusion we we consider the collective data later in this document.
boxplot(all_observations~estimation_method, xlab="estimation method", ylab = "observations",main="Boxplot of Estimation Methods")
mean_estimation_mthd=c(rep(mean(observation_1),6),rep(mean(observation_2),6),rep(mean(observation_3),6),rep(mean(observation_4),6))
observation_residuals= all_observations - mean_estimation_mthd
qqnorm(observation_residuals)
plot(mean_estimation_mthd,observation_residuals, xlab = "population average", ylab="residuals", main="constant variance vs population means")
We can cleary see from the box plot that the variance of the data sets vary widely. The means vary greatly also. Looking at the collective NPP plot is is clear that this data does not have a normal shape and would not be considered normal. We will attempt to use a transform to condition the data for a better analysis. The wide variance can also be seen in the residual plot. As currently condition the data seems to indicate that the null hypothesis would not be accepted. There is a clear difference in the data sets.Clearly the 4 different methods of measuring flood flow frequency are not equivalent.
model<-aov(all_observations~estimation_method, data = data_frame)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## estimation_method 1 672.0 672.0 149.9 2.7e-11 ***
## Residuals 22 98.6 4.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The P value is extremely small so we would reject the null hypothesis in the strongest way.
Using a Box Cox transformation to attempt the data we have the following From this initial plot we can see that a lambda of 0.5 may help transform the data. So we will attempt the transform with a lambda exponent of 0.5
library(MASS)
boxcox(all_observations~estimation_method)
lambda = 0.5
all_observations_lambda_mod = (all_observations)^lambda
oneway.test(all_observations_lambda_mod~estimation_method,var.equal = TRUE)
##
## One-way analysis of means
##
## data: all_observations_lambda_mod and estimation_method
## F = 81.166, num df = 3, denom df = 20, p-value = 2.266e-11
boxcox(all_observations_lambda_mod~estimation_method)
boxplot(all_observations_lambda_mod~estimation_method, xlab="estimation method", ylab = "observations",main="Boxplot of Estimation Methods")
kruskal.test(all_observations~estimation_method, data = data_frame)
##
## Kruskal-Wallis rank sum test
##
## data: all_observations by estimation_method
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
kruskal.test(all_observations_lambda_mod~estimation_method, data = data_frame)
##
## Kruskal-Wallis rank sum test
##
## data: all_observations_lambda_mod by estimation_method
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
Even with transformation we can see that the data still has a wide variance and greatly differing means. The P values of the all the modified data has an extremely small value. So we can strong conclude that the null hypothesis is definitely false. The methods that the engineer is attempting to use differ greatly in their results.
knitr::opts_chunk$set(echo = TRUE,warning=FALSE, message=FALSE)
observation_1 = c(.34,.12,1.23,.70,1.75,.12)
observation_2 = c(.91,2.94,2.14,2.36,2.86,4.55)
observation_3 = c(6.31,8.37,9.75,6.09,9.82,7.24)
observation_4 = c(17.15,11.82,10.97,17.20,14.35,16.82)
all_observations<-c(observation_1, observation_2,observation_3,observation_4)
estimation_method<-c(rep(1,6), rep(2,6),rep(3,6),rep(4,6))
#Create data frame for analysis
data_frame<-data.frame(all_observations,estimation_method)
#Let's do a NPP for each type to look out how normal our data is!
#Type1 NPP
qqnorm(observation_1, xlab = "Estimation 1 samples ", main = "Normal Probability Type 1")
qqline(observation_1, col = "purple", lwd = 2)
#Type2 NPP
qqnorm(observation_2, xlab = "Estimation 2 Samples ", main = "Normal Probability Type 2")
qqline(observation_2, col = "red2", lwd = 2)
#Type3 NPP
qqnorm(observation_3, xlab = "Estimation 3 Samples ", main = "Normal Probability Type 3")
qqline(observation_3, col = "blue4", lwd = 2)
#Type4 NPP
qqnorm(observation_4, xlab = "Estimation 4 Samples ", main = "Normal Probability Type 4")
qqline(observation_4, col = "yellow", lwd = 2)
boxplot(all_observations~estimation_method, xlab="estimation method", ylab = "observations",main="Boxplot of Estimation Methods")
mean_estimation_mthd=c(rep(mean(observation_1),6),rep(mean(observation_2),6),rep(mean(observation_3),6),rep(mean(observation_4),6))
observation_residuals= all_observations - mean_estimation_mthd
qqnorm(observation_residuals)
plot(mean_estimation_mthd,observation_residuals, xlab = "population average", ylab="residuals", main="constant variance vs population means")
model<-aov(all_observations~estimation_method, data = data_frame)
summary(model)
#do our One-Way test with AOV and get a summary
#model<-aov(,data=df) #write aov to an object, easy to call over and over
#summary(model) #summary of an aov object gives back anova
library(MASS)
boxcox(all_observations~estimation_method)
lambda = 0.5
all_observations_lambda_mod = (all_observations)^lambda
oneway.test(all_observations_lambda_mod~estimation_method,var.equal = TRUE)
boxcox(all_observations_lambda_mod~estimation_method)
boxplot(all_observations_lambda_mod~estimation_method, xlab="estimation method", ylab = "observations",main="Boxplot of Estimation Methods")
kruskal.test(all_observations~estimation_method, data = data_frame)
kruskal.test(all_observations_lambda_mod~estimation_method, data = data_frame)
```