VST and Kruskal-Wallis Assignment
A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in cubic feet per second) are shown below.
| Estimation Method | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 | Obs. 5 | Obs. 6 |
|---|---|---|---|---|---|---|
| 1 | 0.34 | 0.12 | 1.23 | 0.70 | 1.75 | 0.12 |
| 2 | 0.91 | 2.94 | 2.14 | 2.36 | 2.86 | 4.55 |
| 3 | 6.31 | 8.37 | 9.75 | 6.09 | 9.82 | 7.24 |
| 4 | 17.15 | 11.82 | 10.97 | 17.20 | 14.35 | 16.82 |
Question a
Linear Effects Equation:
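\[ Y_{ij}=\mu +\tau_{i}+\epsilon _{ij}, \quad i=1,\dots,4, \quad j=1,\dots,6 \]
Hypothesis:
\[ H_0: \tau_{1}=\tau_{2}=\tau_{3}=\tau_{4}=0 \]
\[ H_a: \tau_{i}\neq 0 \text{ for at least one } i \]
Where μ = overall mean, τ_i = effect of the i-th estimation method, ε_ij = random error term, H_0 = null hypothesis and H_a = alternative hypothesis.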
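What the effects model estimates can be sketched in R. The least-squares estimates are μ̂ = ȳ.. (the grand mean) and τ̂_i = ȳ_i. − ȳ... This assumes the sum-to-zero constraint Σ τ_i = 0. R's `aov()` parameterizes the model with treatment contrasts internally, but the fitted values are the same. The names `flow`, `group_means`, `mu_hat` and `tau_hat` are illustrative only.

```r
# Peak discharge (cfs) for the four estimation methods
flow <- list(
  m1 = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12),
  m2 = c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55),
  m3 = c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24),
  m4 = c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
)
group_means <- sapply(flow, mean)   # y-bar_i. for each method
mu_hat <- mean(unlist(flow))        # grand mean y-bar.., the estimate of mu
tau_hat <- group_means - mu_hat     # tau_i hat = y-bar_i. - y-bar..
print(round(c(mu = mu_hat, tau_hat), 3))
```

Because the design is balanced, the estimated effects sum to zero.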
Question b
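We fit a one-way ANOVA model and check its assumptions with the diagnostic plots produced by plot(aov.model). The Residuals vs Fitted plot checks for constant variance, and the Normal Q-Q plot checks for normality.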
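library(tidyr) # pivot_longer()
library(dplyr)
# Peak discharge (cfs) for each estimation method
data1<-c(0.34,0.12,1.23,0.70,1.75,0.12)
data2<-c(0.91,2.94,2.14,2.36,2.86,4.55)
data3<-c(6.31,8.37,9.75,6.09,9.82,7.24)
data4<-c(17.15,11.82,10.97,17.20,14.35,16.82)
data<-data.frame(data1,data2,data3,data4)
# Long format: column `name` holds the method, column `value` the discharge
datapivot<-pivot_longer(data,c(data1,data2,data3,data4))
# One-way ANOVA of discharge on estimation method
aov.model<-aov(value~name,data=datapivot)
summary(aov.model)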
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 708.7 236.2 76.29 4e-11 ***
## Residuals 20 61.9 3.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)
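As a check, the F value in the table above can be reproduced by hand. This sketch computes F = MS_Treatments / MS_E with a − 1 = 3 and N − a = 20 degrees of freedom. It should agree with summary(aov.model) up to rounding. The variable names are illustrative, and the data are re-entered so the block runs on its own.

```r
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))   # estimation method 1-4
a <- nlevels(method)                   # number of treatments
N <- length(flow)                      # total number of observations
group_means <- tapply(flow, method, mean)
n_i <- tapply(flow, method, length)
# Between-method and within-method (residual) sums of squares
ss_treat <- sum(n_i * (group_means - mean(flow))^2)
ss_error <- sum((flow - group_means[method])^2)
# F statistic and its upper-tail p-value on (a - 1, N - a) degrees of freedom
f_stat <- (ss_treat / (a - 1)) / (ss_error / (N - a))
p_val <- pf(f_stat, a - 1, N - a, lower.tail = FALSE)
print(c(SS_treat = ss_treat, SS_error = ss_error, F = f_stat, p = p_val))
```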
Conclusion:
Looking at the Residuals vs Fitted plot, the spread of the residuals is not constant across the fitted values. It widens as the fitted value increases, indicating non-constant (unstable) variance.
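The residual plot can be backed with numbers. This sketch tabulates, for each method, the residual standard deviation next to its fitted value (the method mean). For this data the residual SD increases with the fitted value. Names are illustrative.

```r
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))
fit <- aov(flow ~ method)
# Fitted value (method mean) and residual spread within each method
fitted_by_method <- tapply(fitted(fit), method, mean)
resid_sd <- tapply(residuals(fit), method, sd)
print(round(rbind(fitted = fitted_by_method, resid_sd = resid_sd), 3))
```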
To confirm this visually, we can also compare boxplots of the observations for each estimation method, as shown below. The heights of the boxes differ considerably between methods: the boxes for methods 3 and 4 are much taller than those for methods 1 and 2. This again indicates unstable variance.
dataplot<-c(data1,data2,data3,data4)
# Method labels 1-4 as a factor (one group per estimation method)
x<-factor(rep(1:4, each = 6))
# Group mean repeated for each of the 6 observations per method
meanx<-rep(c(mean(data1),mean(data2),mean(data3),mean(data4)), each = 6)
boxplot(dataplot~x,xlab="Estimation Method",ylab="Peak Discharge (cfs)",main="Boxplot of Observations")
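The boxplot comparison can also be backed by a formal test, which was not part of the original analysis. Base R's `bartlett.test()` tests H0: all four method variances are equal. It assumes normally distributed errors, so here it is only a rough check. Names are illustrative.

```r
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))
# Bartlett's test of homogeneity of variances across the four methods
bt <- bartlett.test(flow ~ method)
print(bt)
```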
Question c:
Performing the Kruskal-Wallis test
kruskal.test(value~name,data=datapivot)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
Conclusion:
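Since the p-value (9.771e-05) is smaller than the 0.05 level of significance, we reject the null hypothesis. We conclude that at least one estimation method produces a different distribution of peak discharge estimates.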
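The Kruskal-Wallis statistic can be reproduced from the ranks as a check. The sketch computes H = 12/(N(N+1)) Σ R_i²/n_i − 3(N+1). It then divides H by the tie correction 1 − Σ(t³ − t)/(N³ − N), which is needed because the value 0.12 appears twice. Names are illustrative.

```r
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))
N <- length(flow)
r <- rank(flow)                      # average ranks: the two 0.12s share rank 1.5
R_i <- tapply(r, method, sum)        # rank sum for each method
n_i <- tapply(r, method, length)
H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)
t_j <- table(flow)                   # sizes of groups of tied values
H_adj <- H / (1 - sum(t_j^3 - t_j) / (N^3 - N))   # tie-corrected statistic
p_val <- pchisq(H_adj, df = nlevels(method) - 1, lower.tail = FALSE)
print(c(H = H_adj, p = p_val))
```

Because the four methods barely overlap, the rank sums are almost as extreme as possible, which is why H is so large.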
Question d:
We will use the Box-Cox transformation to stabilize the variance.
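A rough empirical guide can be applied before Box-Cox. This is a swapped-in heuristic, not the method used in this assignment: regress log(s_i) on log(ȳ_i) across the methods. If σ ∝ μ^α, then y^(1−α) approximately stabilizes the variance. For this data the suggested power comes out close to 0.5. Names are illustrative.

```r
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))
# Group standard deviations and means as plain vectors
s_i <- as.vector(tapply(flow, method, sd))
ybar_i <- as.vector(tapply(flow, method, mean))
# If sigma is proportional to mu^alpha, the slope of log(s_i) on log(ybar_i) estimates alpha
alpha_hat <- unname(coef(lm(log(s_i) ~ log(ybar_i)))[2])
lambda_hat <- 1 - alpha_hat          # suggested power: y^(1 - alpha)
print(round(c(alpha = alpha_hat, lambda = lambda_hat), 3))
```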
library(MASS)
boxcox(dataplot~x)
Based on the above plot, the log-likelihood peaks at approximately λ = 0.5, which corresponds to a square-root transformation.
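λ can also be read numerically instead of from the plot. With `plotit = FALSE`, `MASS::boxcox()` returns the λ grid (`x`) and the profile log-likelihood (`y`). This sketch makes a few assumptions:

- It codes the method as a factor, so the model matches the one-way ANOVA. A numeric 1-4 code would be fit as a straight-line trend.
- It uses a finer λ grid than the default.
- It takes an approximate 95% interval as every λ within qchisq(0.95, 1)/2 of the maximum.

Names are illustrative.

```r
library(MASS)   # boxcox()
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))
# Profile log-likelihood over a fine lambda grid; plotit = FALSE returns list(x, y)
bc <- boxcox(flow ~ method, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)
lambda_best <- bc$x[which.max(bc$y)]
# Approximate 95% CI: lambdas within qchisq(0.95, 1) / 2 of the maximum log-likelihood
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])
print(c(lambda = lambda_best, lower = ci[1], upper = ci[2]))
```

If 1 lies outside this interval, a transformation is warranted.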
lambda<-0.5 # apply the transformation only if 1 is not in the CI on lambda
dataplot<-dataplot^(lambda) # power transformation (lambda is not zero)
#dataplot<-log(dataplot) # use this instead if lambda is equal to zero
boxcox(dataplot~x)
The plot now peaks at a value of λ close to 1, meaning no further transformation is indicated. The square-root transformation was therefore effective in stabilizing the variance.
boxplot(dataplot~x,xlab="Estimation Method",ylab="sqrt(Peak Discharge)",main="Boxplot of Square-Root Transformed Observations")
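The stabilization can also be checked numerically. This sketch compares the ratio of the largest to the smallest group variance before and after the square-root transformation; a ratio near 1 means equal spread. For this data the ratio drops from roughly 18 to about 2. Names are illustrative.

```r
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))
v_raw <- tapply(flow, method, var)          # group variances on the original scale
v_sqrt <- tapply(sqrt(flow), method, var)   # group variances after y^0.5
ratio <- c(raw = max(v_raw) / min(v_raw), sqrt = max(v_sqrt) / min(v_sqrt))
print(round(ratio, 2))
```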