1 Assignment Description

VST and Kruskal-Wallis Assignment

A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in cubic feet per second) are shown below.

Estimation Method	Observations (cubic feet per second)
1	0.34	0.12	1.23	0.70	1.75	0.12
2	0.91	2.94	2.14	2.36	2.86	4.55
3	6.31	8.37	9.75	6.09	9.82	7.24
4	17.15	11.82	10.97	17.20	14.35	16.82

a) Write the linear effects equation and the hypothesis you are testing.
b) Does it appear that the data are normally distributed? Does it appear that the variance is constant?
c) (nonparametric) Perform a Kruskal-Wallis test in R (α = 0.05).
d) (parametric) Select an appropriate transformation using Box-Cox, transform the data, and test the hypothesis in R (α = 0.05).

1.1 Solution

Question a

Linear Effects Equation:

\[ Y_{ij}=\mu +\tau_{i}+\epsilon _{ij} \]

Hypothesis:

\[ H_0: \tau _{1}=\tau _{2}=\tau _{3}=\tau _{4}=0 \]

\[ H_a: \tau _{i}\neq 0 \text{ for at least one } i \]

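where \(\mu\) = overall mean, \(\tau_i\) = effect of the i-th estimation method (i = 1, ..., 4), \(\epsilon_{ij}\) = random error of the j-th observation under method i (j = 1, ..., 6), \(H_0\) = null hypothesis, and \(H_a\) = alternative hypothesis.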
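As a rough illustration of the effects model, the sketch below estimates the overall mean and the method effects from the sample. The names `flow`, `method`, `mu_hat`, `tau_hat` and `resid_ij` are introduced here for illustration only and are not part of the original solution.

```r
# Peak discharge observations, six per estimation method (from the table above)
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))

# Estimate of the overall mean (mu)
mu_hat <- mean(flow)

# Estimated effect of each method (tau_i): method mean minus overall mean
tau_hat <- tapply(flow, method, mean) - mu_hat

# Estimated errors (epsilon_ij): observation minus its method mean
resid_ij <- flow - mu_hat - as.numeric(tau_hat)[as.integer(method)]

mu_hat
tau_hat
```

Because the design is balanced, the estimated effects sum to zero, so the null hypothesis amounts to all four method means being equal.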

Question b

We fit a one-way ANOVA model and use its diagnostic plots (Residuals vs Fitted and Normal Q-Q) to assess constant variance and normality.

library(tidyr)
library(dplyr)
data1<-c(0.34,0.12,1.23,0.70,1.75,0.12)
data2<-c(0.91,2.94,2.14,2.36,2.86,4.55)
data3<-c(6.31,8.37,9.75,6.09,9.82,7.24)
data4<-c(17.15,11.82,10.97,17.20,14.35,16.82)
data<-data.frame(data1,data2,data3,data4)
datapivot<-pivot_longer(data,c(data1,data2,data3,data4))

aov.model<-aov(value~name,data=datapivot)
summary(aov.model)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## name         3  708.7   236.2   76.29  4e-11 ***
## Residuals   20   61.9     3.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(aov.model)

Conclusion:

In the Residuals vs Fitted plot, the spread of the residuals is not constant across the fitted values, which indicates unstable (non-constant) variance.

For validation, we can also check the variance visually with the boxplot shown below. The height of the boxes differs considerably between the estimation methods, again indicating unstable variance.

dataplot<-c(data1,data2,data3,data4)
x<-c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
meanx<-c(rep(mean(data1),6),rep(mean(data2),6),rep(mean(data3),6),rep(mean(data4),6)) # group means, each repeated six times
boxplot(dataplot~x,xlab="Estimation Method",ylab="observation",main="Boxplot of Observations")
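The visual checks above could be complemented by formal tests. The following is a minimal sketch, assuming the Shapiro-Wilk test on the ANOVA residuals and Bartlett's test across methods are acceptable choices here; neither is used in the original solution, and the names `flow`, `method`, `fit` and `group_sd` are illustrative.

```r
# Peak discharge observations, six per estimation method
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))
fit <- aov(flow ~ method)

# Shapiro-Wilk test on the residuals (null: residuals are normally distributed)
shapiro.test(residuals(fit))

# Bartlett's test (null: equal variances across methods);
# note that it is itself sensitive to departures from normality
bartlett.test(flow ~ method)

# Standard deviation within each method, for a direct comparison of spread
group_sd <- tapply(flow, method, sd)
group_sd
```

With only six observations per method these tests have limited power, so they are best read alongside the plots rather than in place of them.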

Question c:

Performing the Kruskal-Wallis test

kruskal.test(value~name,data=datapivot)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by name
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

Conclusion:

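The p-value of 9.771e-05 is below the 0.05 level of significance, so we reject the null hypothesis and conclude that the four estimation methods do not all produce equivalent peak discharge estimates.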
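To show where the test statistic comes from, the sketch below recomputes it by hand from the ranks, including the usual correction for ties (the value 0.12 occurs twice). The variable names are illustrative; the result should agree with the chi-squared value reported by `kruskal.test` above.

```r
# Peak discharge observations, six per estimation method
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))

N <- length(flow)
r <- rank(flow)                 # tied values receive the average of their ranks
R <- tapply(r, method, sum)     # rank sum per method
n <- tapply(r, method, length)  # sample size per method

# Uncorrected Kruskal-Wallis statistic
H <- 12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
ties <- as.numeric(table(flow))
H_adj <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

# p-value from the chi-squared approximation with (number of methods - 1) df
p_value <- pchisq(H_adj, df = nlevels(method) - 1, lower.tail = FALSE)

H_adj
p_value
```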

Question d:

We will use the Box-Cox transformation to stabilize the variance.

library(MASS)
boxcox(dataplot~x)

Based on the plot above, lambda is approximately 0.5, which corresponds to a square-root transformation.
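The value read off the plot can be cross-checked numerically. This is a minimal sketch, assuming the estimation method is treated as a factor (the code above passes it as the numeric vector `x`, so the curve may differ slightly); `flow`, `method`, `bc`, `lambda_hat` and `ci` are illustrative names.

```r
library(MASS)

# Peak discharge observations, six per estimation method
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))

# Profile log-likelihood on a fine grid of lambda values, without plotting
bc <- boxcox(flow ~ method, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Lambda that maximizes the log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])

lambda_hat
ci
```

If 1 lies outside the interval `ci`, a transformation is warranted, and a convenient value inside the interval (such as 0.5) is usually preferred over the exact maximizer.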

lambda=.5  # only if 1 is not in CI on lambda
dataplot<-dataplot^(lambda) # if lambda is not zero
#dataplot<-log(dataplot) # if lambda is equal to zero
boxcox(dataplot~x)

The plot above now shows a value of lambda close to 1, meaning no further transformation is needed and indicating that the transformation was effective.

boxplot(dataplot~x,xlab="Method Type",ylab="Flow Frequency",main="Boxplot of Observations")
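The remaining step of question d, testing the hypothesis on the transformed data, could be sketched as follows. This assumes the square-root transformation (lambda = 0.5) selected above and treats the estimation method as a factor; `flow`, `method`, `flow_sqrt` and `fit_sqrt` are illustrative names rather than objects from the original code.

```r
# Peak discharge observations, six per estimation method
flow <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
          0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
          6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
          17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(1:4, each = 6))

# Square-root transformation (Box-Cox with lambda = 0.5)
flow_sqrt <- sqrt(flow)

# One-way ANOVA on the transformed response
fit_sqrt <- aov(flow_sqrt ~ method)
summary(fit_sqrt)

# Extract the p-value for the method effect and compare it with alpha = 0.05
p_value <- summary(fit_sqrt)[[1]][["Pr(>F)"]][1]
p_value < 0.05

# Within-method standard deviations after the transformation
tapply(flow_sqrt, method, sd)
```

The null hypothesis is rejected at the 0.05 level when `p_value < 0.05` evaluates to `TRUE`; the standard deviations give a quick check on whether the spread is now similar across methods.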

2 Complete R Code

#Question b
library(tidyr)
library(dplyr)
data1<-c(0.34,0.12,1.23,0.70,1.75,0.12)
data2<-c(0.91,2.94,2.14,2.36,2.86,4.55)
data3<-c(6.31,8.37,9.75,6.09,9.82,7.24)
data4<-c(17.15,11.82,10.97,17.20,14.35,16.82)
data<-data.frame(data1,data2,data3,data4)
datapivot<-pivot_longer(data,c(data1,data2,data3,data4))

#ANOVA way
aov.model<-aov(value~name,data=datapivot)
summary(aov.model)
plot(aov.model)
#visual way
dataplot<-c(data1,data2,data3,data4)
x<-c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
meanx<-c(rep(mean(data1),6),rep(mean(data2),6),rep(mean(data3),6),rep(mean(data4),6)) # group means, each repeated six times
boxplot(dataplot~x,xlab="Estimation Method",ylab="observation",main="Boxplot of Observations")

#Question c
kruskal.test(value~name,data=datapivot)

#Question d
library(MASS)
boxcox(dataplot~x)
lambda=.5  # only if 1 is not in CI on lambda
dataplot<-dataplot^(lambda) # if lambda is not zero
#dataplot<-log(dataplot) # if lambda is equal to zero
boxcox(dataplot~x)
boxplot(dataplot~x,xlab="Method Type",ylab="Flow Frequency",main="Boxplot of Observations")

RMarkdown Test Document

Author Ajala, Ponmile

Author Kininge, Rucha

Author Tejada, Omar

Last compiled on October 05, 2023 at 10:09 PM - CDT
