A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in cubic feet per second) are shown below.
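a) Write the linear effects equation and the hypotheses you are testing.
The linear effects model is $y_{ij} = \mu + \tau_i + \epsilon_{ij}$. Here $i = 1, \dots, 4$ indexes the estimation methods and $j = 1, \dots, 6$ indexes the replicate runs. $\mu$ is the overall mean, $\tau_i$ is the effect of method $i$, and $\epsilon_{ij}$ is the random error. The null hypothesis is that every method effect is zero, $H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0$. The alternative is $H_1: \tau_i \neq 0$ for at least one $i$.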
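The fixed-effects model and the ANOVA test statistic used by the parametric test are written out in full below. The sum-to-zero constraint and the normal error assumption are the standard textbook conventions; they are added here and are not stated in the original answer. With $a = 4$ methods and $N = 24$ runs, the reference distribution is $F_{3,20}$, which matches the degrees of freedom in the ANOVA table.

```latex
\begin{aligned}
y_{ij} &= \mu + \tau_i + \epsilon_{ij}, \qquad i = 1,\dots,4,\; j = 1,\dots,6, \qquad \sum_{i=1}^{4} \tau_i = 0,\\
\epsilon_{ij} &\overset{\text{iid}}{\sim} N(0, \sigma^2),\\
H_0 &: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0, \qquad H_1: \tau_i \neq 0 \text{ for at least one } i,\\
F_0 &= \frac{SS_{\text{Treatments}}/(a-1)}{SS_E/(N-a)} = \frac{MS_{\text{Treatments}}}{MS_E} \sim F_{a-1,\,N-a} \text{ under } H_0 .
\end{aligned}
```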
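b) Does it appear that the data are normally distributed? Does it appear that the variance is constant?
The data do not look normally distributed, because the points in the normal Q-Q plot of the raw discharges depart from a straight line. The variance also does not appear constant. The boxplots show the spread growing as the typical discharge of a method increases: Method 1 ranges from 0.12 to 1.75 cfs, while Method 4 ranges from 10.97 to 17.20 cfs. However, with only six observations per method, there is not enough data to draw a firm conclusion about the normality assumption.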
library(MASS)   # boxcox()
library(tidyr)  # pivot_longer()
# Peak discharge estimates (cfs), six runs per method
Estimation_Method_1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
Estimation_Method_2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
Estimation_Method_3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
Estimation_Method_4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
Estimation_Methods <- data.frame(Estimation_Method_1, Estimation_Method_2, Estimation_Method_3, Estimation_Method_4)
# Reshape to long format: one row per run, with columns "name" (method) and "value" (discharge)
Estimation_Method_Long <- pivot_longer(Estimation_Methods, cols = everything())
# Normal Q-Q plot of the pooled raw discharges, with a reference line
qqnorm(Estimation_Method_Long$value)
qqline(Estimation_Method_Long$value)
# Side-by-side boxplots to compare the spread of each method
boxplot(value ~ name, data = Estimation_Method_Long, names = c("Method 1", "Method 2", "Method 3", "Method 4"))
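The Q-Q plot above pools all 24 observations, so the large differences between method means can bend it away from a straight line on their own. The sketch below is a more targeted check. It assumes the same data and examines the residuals of the one-way model together with the per-method standard deviations. `shapiro.test()` and `bartlett.test()` are base R (stats) functions. With six runs per method, both have little power, so treat them as supporting evidence at best. Bartlett's test is also sensitive to non-normality. The names `discharge`, `fit_raw`, and `res` are illustrative.

```r
# Raw peak-discharge estimates (cfs), six runs per method
discharge <- data.frame(
  method = factor(rep(paste("Method", 1:4), each = 6)),
  value  = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
             0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
             6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
             17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
)

# Fit the one-way effects model on the raw scale
fit_raw <- aov(value ~ method, data = discharge)
res <- residuals(fit_raw)

# Residual Q-Q plot: checks normality of the errors, not of the pooled data
qqnorm(res)
qqline(res)

# Residuals vs fitted values: a funnel shape suggests non-constant variance
plot(fitted(fit_raw), res, xlab = "Fitted value (cfs)", ylab = "Residual")
abline(h = 0, lty = 2)

# Per-method means and standard deviations
group_means <- tapply(discharge$value, discharge$method, mean)
group_sds   <- tapply(discharge$value, discharge$method, sd)
print(round(cbind(mean = group_means, sd = group_sds), 3))

# Formal tests (small samples: supporting evidence only)
print(shapiro.test(res))
print(bartlett.test(value ~ method, data = discharge))
```

On the raw scale, the standard deviation for Method 4 (about 2.80 cfs) is roughly four times that of Method 1 (about 0.66 cfs). This is consistent with the non-constant variance noted above.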
c) Perform a Kruskal-Wallis test in R (alpha = 0.05) (nonparametric)
The Kruskal-Wallis test gives a p-value of $9.771 \times 10^{-5}$. This is less than 0.05, so there is enough evidence to reject the null hypothesis that the four methods produce equivalent peak-discharge estimates.
kruskal.test(value~name, data = Estimation_Method_Long)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
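The sketch below shows where the Kruskal-Wallis statistic comes from. It recomputes the statistic by hand from the pooled ranks with the textbook formula $H = \frac{12}{N(N+1)}\sum_i R_i^2/n_i - 3(N+1)$. It then applies the usual correction for tied values and compares the result against `kruskal.test()`. The object names (`discharge`, `H_corr`, and so on) are illustrative and are not part of the original analysis.

```r
# Peak-discharge estimates (cfs), six runs per method
discharge <- list(
  "Method 1" = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12),
  "Method 2" = c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55),
  "Method 3" = c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24),
  "Method 4" = c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
)
y <- unlist(discharge, use.names = FALSE)
g <- factor(rep(names(discharge), times = lengths(discharge)))
N <- length(y)

# Rank all 24 observations together; tied values get their average rank
r <- rank(y)
R_i <- tapply(r, g, sum)     # rank sum per method
n_i <- tapply(r, g, length)  # runs per method

# Kruskal-Wallis statistic before the tie correction
H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)

# Tie correction (the value 0.12 occurs twice in Method 1)
t_counts <- table(y)
H_corr <- H / (1 - sum(t_counts^3 - t_counts) / (N^3 - N))

# Large-sample chi-squared reference distribution with k - 1 = 3 df
p_value <- pchisq(H_corr, df = nlevels(g) - 1, lower.tail = FALSE)

# Compare with the built-in test
kt <- kruskal.test(y, g)
print(R_i)
cat("H (tie-corrected) =", round(H_corr, 3), " p-value =", signif(p_value, 4), "\n")
print(kt)
```

The rank sums are 23, 55, 93, and 129. Because every Method 3 and Method 4 run outranks every Method 1 and Method 2 run, the tie-corrected statistic is large (21.156, matching the output above).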
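d) (parametric) Select an appropriate transformation using Box-Cox, transform the data, and test the hypothesis in R (alpha = 0.05)
Using the Box-Cox plot as a guide, we transformed each discharge as $y^{0.7}$. After the transformation, the data appear approximately normally distributed with nearly constant variance. However, six observations per method are still not enough to draw a firm conclusion about the normality assumption. The one-way ANOVA on the transformed data gives $F = 86.86$ on 3 and 20 degrees of freedom, with a p-value of $1.21 \times 10^{-11}$. This is far below 0.05, so there is enough evidence to reject the null hypothesis.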
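For reference, `boxcox()` profiles the log-likelihood $\ell(\lambda)$ of the standard Box-Cox family shown below. For fixed $\lambda > 0$, $y^{(\lambda)}$ is an increasing affine function of $y^{\lambda}$. As a result, raising the discharges to the power 0.7 gives the same ANOVA F statistic as the formal transform with $\lambda = 0.7$. The approximate 95% interval for $\lambda$ drawn on the plot is the set of values whose log-likelihood lies within $\tfrac12\chi^2_{0.95,1} \approx 1.92$ of the maximum.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
\ln y, & \lambda = 0,
\end{cases}
\qquad y > 0,
\qquad
\mathrm{CI}_{95\%} = \Bigl\{\lambda : \ell(\lambda) \ge \ell(\hat\lambda) - \tfrac{1}{2}\chi^2_{0.95,1}\Bigr\}
```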
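# Box-Cox profile log-likelihood for lambda (MASS::boxcox); the analysis below uses lambda = 0.7
boxcox(Estimation_Method_Long$value~Estimation_Method_Long$name)
# Power-transform the response as value^0.7 (the object name says "Sqrt", but the exponent is 0.7, not 0.5);
# data.frame() auto-names the columns Estimation_Method_Long.name and X.Estimation_Method_Long.value..0.7
Sqrt_Estimation_method <- data.frame(Estimation_Method_Long$name, (Estimation_Method_Long$value)^.7)
# Re-check normality and spread on the transformed scale
qqnorm(Sqrt_Estimation_method$X.Estimation_Method_Long.value..0.7)
qqline(Sqrt_Estimation_method$X.Estimation_Method_Long.value..0.7)
boxplot(Sqrt_Estimation_method$X.Estimation_Method_Long.value..0.7~Sqrt_Estimation_method$Estimation_Method_Long.name, names = c("Method 1","Method 2",'Method 3','Method 4'))
# One-way ANOVA on the transformed discharges
Anova_test <- aov(Sqrt_Estimation_method$X.Estimation_Method_Long.value..0.7~Sqrt_Estimation_method$Estimation_Method_Long.name)
summary(Anova_test)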
## Df Sum Sq Mean Sq F value
## Sqrt_Estimation_method$Estimation_Method_Long.name 3 119.69 39.90 86.86
## Residuals 20 9.19 0.46
## Pr(>F)
## Sqrt_Estimation_method$Estimation_Method_Long.name 1.21e-11 ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
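The analysis above reads the exponent 0.7 off the Box-Cox plot. The minimal sketch below assumes the same data and instead reads $\lambda$ off the profile log-likelihood programmatically. It calls `MASS::boxcox(..., plotit = FALSE)`, which returns the $\lambda$ grid as `x` and the log-likelihood as `y`. It then refits the ANOVA with readable column names. `discharge`, `fit_raw`, `lambda_hat`, `ci`, and `fit_t` are hypothetical names chosen for this example.

```r
library(MASS)  # boxcox()

# Peak-discharge estimates (cfs), six runs per method
discharge <- data.frame(
  method = factor(rep(paste("Method", 1:4), each = 6)),
  value  = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
             0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
             6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
             17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
)

# Profile the Box-Cox log-likelihood on a fine lambda grid without plotting;
# y = TRUE stores the response so boxcox() does not need to refit the model
fit_raw <- lm(value ~ method, data = discharge, y = TRUE, qr = TRUE)
bc <- boxcox(fit_raw, lambda = seq(-1, 2, by = 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2])
cat("lambda_hat =", lambda_hat, " approx. 95% CI:", ci, "\n")

# Apply the exponent used in the analysis above and refit with readable names
discharge$value_t <- discharge$value^0.7
fit_t <- aov(value_t ~ method, data = discharge)
print(summary(fit_t))
```

This refit uses the same transformation and grouping as `Anova_test`. Its table should therefore show the same F value (86.86 on 3 and 20 degrees of freedom), with the term labelled `method` instead of the long auto-generated name.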
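# Re-run Box-Cox on the transformed response; if the interval for lambda now contains 1, no further power transformation is indicated
boxcox(Sqrt_Estimation_method$X.Estimation_Method_Long.value..0.7~Sqrt_Estimation_method$Estimation_Method_Long.name)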