VST and Kruskal-Wallis Assignment:
| Estimation Method | Obs 1 | Obs 2 | Obs 3 | Obs 4 | Obs 5 | Obs 6 |
|---|---|---|---|---|---|---|
| 1 | 0.34 | 0.12 | 1.23 | 0.70 | 1.75 | 0.12 |
| 2 | 0.91 | 2.94 | 2.14 | 2.36 | 2.86 | 4.55 |
| 3 | 6.31 | 8.37 | 9.75 | 6.09 | 9.82 | 7.24 |
| 4 | 17.15 | 11.82 | 10.97 | 17.20 | 14.35 | 16.82 |
a) Write the linear effects equation and the hypothesis you are testing.
b) Does it appear the data is normally distributed? Does it appear that the variance is constant?
c) (nonparametric) Perform a Kruskal-Wallis test in R (\(\alpha = 0.05\)).
d) (parametric) Select an appropriate transformation using Box-Cox, transform the data, and test the hypothesis in R (\(\alpha = 0.05\)).
PART A:
Linear Effects Equation:
\[ Y_{i,j}=\mu +\tau _{i}+\epsilon _{i,j} \]
Where \(\mu\) is the overall mean, \(\tau_i\) is the effect of the \(i\)-th estimation method, and \(\epsilon_{i,j}\) is the random error, for \(i=1,\dots,4\) and \(j=1,\dots,6\).
Stating the Hypothesis:
\[ H_0: \tau_{1}=\tau_{2}=\tau_{3}=\tau_{4}=0 \]
\[ H_a: \tau_{i}\neq0 \text{ for at least one } i \]
PART B:
We fit an ANOVA model so that we can examine the residuals and assess normality and constant variance:
method1<-c(0.34,0.12,1.23,0.70,1.75,0.12)
method2<-c(0.91,2.94,2.14,2.36,2.86,4.55)
method3<-c(6.31,8.37,9.75,6.09,9.82,7.24)
method4<-c(17.15,11.82,10.97,17.20,14.35,16.82)
dat<-data.frame(method1,method2,method3,method4)
Reshaping the data into long format with tidyr:
library(tidyr)
dat<-pivot_longer(dat, c(method1,method2,method3,method4))
print(dat)
## # A tibble: 24 × 2
## name value
## <chr> <dbl>
## 1 method1 0.34
## 2 method2 0.91
## 3 method3 6.31
## 4 method4 17.2
## 5 method1 0.12
## 6 method2 2.94
## 7 method3 8.37
## 8 method4 11.8
## 9 method1 1.23
## 10 method2 2.14
## # … with 14 more rows
str(dat)
## tibble [24 × 2] (S3: tbl_df/tbl/data.frame)
## $ name : chr [1:24] "method1" "method2" "method3" "method4" ...
## $ value: num [1:24] 0.34 0.91 6.31 17.15 0.12 ...
Performing the ANOVA analysis:
aov.model<-aov(value~name,data=dat)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 708.7 236.2 76.29 4e-11 ***
## Residuals 20 61.9 3.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)
Comment:
---> There are too few data points per group to make a reliable assessment of normality. For constant variance, the "Residuals vs Fitted" plot shows that the spread of the residuals grows with the fitted values, so the variance does not appear to be constant.
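Not required by the assignment, but as a quick numeric cross-check of what the plots suggest, base R's Shapiro-Wilk and Bartlett tests can be run on the same fit; with only six observations per method these tests have little power, so the diagnostic plots remain the primary evidence. A minimal sketch, using the aov.model and dat objects created above:
shapiro.test(residuals(aov.model)) #Shapiro-Wilk test of normality on the ANOVA residuals
bartlett.test(value~name, data = dat) #Bartlett's test of equal variances across estimation methods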
We also use a box plot to check the variance visually:
observations <- c(.34,.12,1.23,.7,1.75,.12,
.91, 2.94,2.14,2.36,2.86,4.55,
6.31,8.37,9.75,6.09,9.82,7.24,
17.15,11.82,10.97,17.20,14.35,16.82)
estMethod <- as.factor(c(1,1,1,1,1,1,
2,2,2,2,2,2,
3,3,3,3,3,3,
4,4,4,4,4,4))
boxplot(observations~estMethod, main="Boxplot of Observations vs Estimating Method")
Comment:
---> The diagnostic plots and the box plot above show that the variance is not the same across estimation methods.
PART C:
Since the variance is not stable across the different estimation methods, we use a non-parametric test.
kruskal.test(value~name, data = dat)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
Conclusion:
---> The p-value is very small (9.771e-05), so we reject the null hypothesis at the \(\alpha = 0.05\) level of significance.
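The Kruskal-Wallis test only indicates that at least one method differs. A follow-up is not required by the assignment, but a sketch of a non-parametric pairwise comparison with base R's pairwise.wilcox.test(), using the long-format dat from above, would be:
#Pairwise Wilcoxon rank-sum tests with Holm adjustment for multiple comparisons
#(with ties present, R will warn that exact p-values cannot be computed)
pairwise.wilcox.test(dat$value, dat$name, p.adjust.method = "holm")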
PART D:
Transforming the data using Box-Cox so that we can test the hypothesis on the transformed data:
library(MASS)
boxcox(value~name, data=dat)
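The boxcox() call above plots the profile log-likelihood over a grid of \(\lambda\) values. Since boxcox() also returns that grid invisibly, a short sketch of how the numerically best \(\lambda\) could be read off is:
bc <- boxcox(value~name, data = dat, plotit = FALSE) #same model, but return the grid instead of plotting
bc$x[which.max(bc$y)] #lambda with the largest profile log-likelihood
In practice a convenient value inside the plotted confidence interval is chosen, which is what we do next.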
Based on the results of the Box-Cox analysis, we use \(\lambda = 0.5\) (a square-root transformation):
lambda = 0.5
dat2<-dat$value^(lambda)
dat2<-cbind(dat$name, dat2)
dat2<-as.data.frame(dat2)
print(dat2)
## V1 dat2
## 1 method1 0.58309518948453
## 2 method2 0.953939201416946
## 3 method3 2.51197133741609
## 4 method4 4.14125584816973
## 5 method1 0.346410161513775
## 6 method2 1.71464281994822
## 7 method3 2.89309522829789
## 8 method4 3.43802268753422
## 9 method1 1.10905365064094
## 10 method2 1.46287388383278
## 11 method3 3.1224989991992
## 12 method4 3.31209903233584
## 13 method1 0.836660026534076
## 14 method2 1.53622914957372
## 15 method3 2.46779253585061
## 16 method4 4.14728827066554
## 17 method1 1.3228756555323
## 18 method2 1.69115345252878
## 19 method3 3.13368792319848
## 20 method4 3.78813938497516
## 21 method1 0.346410161513775
## 22 method2 2.13307290077015
## 23 method3 2.69072480941474
## 24 method4 4.10121933088198
str(dat2)
## 'data.frame': 24 obs. of 2 variables:
## $ V1 : chr "method1" "method2" "method3" "method4" ...
## $ dat2: chr "0.58309518948453" "0.953939201416946" "2.51197133741609" "4.14125584816973" ...
dat2$dat2 <- as.numeric(dat2$dat2) #cbind() coerced the transformed values to character, so convert back to numeric before modelling
boxcox(dat2~dat$name, data=dat2)
---> After the transformation, the Box-Cox plot shows a value of lambda close to 1, indicating that no further transformation is needed.
dat3<-observations^(lambda)
boxplot(dat3~estMethod,xlab="Method Type",ylab="Flood Flow Frequency",main="Boxplot of Observations")
---> The variances now look more constant (though not perfect), and the range on the y-axis is much smaller.
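As an optional numeric check of the same point, the per-method variances before and after the transformation can be compared directly (using the observations, dat3, and estMethod vectors defined above):
tapply(observations, estMethod, var) #group variances on the original scale
tapply(dat3, estMethod, var) #group variances after the square-root transformation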
Testing Hypothesis:
model<-aov(dat2~dat$name,data=dat2)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## dat$name 3 32.69 10.898 81.17 2.27e-11 ***
## Residuals 20 2.69 0.134
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model)
CONCLUSION:
---> The ANOVA on the transformed data gives a very small p-value (2.27e-11), so at the 0.05 level of significance we reject the null hypothesis and conclude that the estimation method has a significant effect on mean flood flow frequency.
Looking at the residual plots of the transformed model, the variance across the different estimation methods is now much more stable.
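Not asked for in the assignment, but since the ANOVA rejects the null hypothesis, a parametric follow-up on the transformed scale could use Tukey's HSD. A sketch (tukey.dat is a hypothetical helper data frame introduced only for this example, refitting with an explicit factor so TukeyHSD() can locate the term):
tukey.dat <- data.frame(method = factor(dat$name), resp = dat$value^lambda) #transformed response with an explicit factor
TukeyHSD(aov(resp ~ method, data = tukey.dat)) #pairwise differences with family-wise 95% confidence intervals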
APPENDIX: Complete R script for this assignment
#Question 1B:
#We fit an ANOVA model so that we can examine the residuals and assess normality and constant variance
method1<-c(0.34,0.12,1.23,0.70,1.75,0.12)
method2<-c(0.91,2.94,2.14,2.36,2.86,4.55)
method3<-c(6.31,8.37,9.75,6.09,9.82,7.24)
method4<-c(17.15,11.82,10.97,17.20,14.35,16.82)
dat<-data.frame(method1,method2,method3,method4)
library(tidyr)
dat<-pivot_longer(dat, c(method1,method2,method3,method4))
print(dat)
str(dat)
#ANOVA ANALYSIS:
aov.model<-aov(value~name,data=dat)
summary(aov.model)
plot(aov.model)
#There are too few data points per group to make a reliable assessment of normality. The "Residuals vs Fitted" plot shows that the variance is not stable across methods.
observations <- c(.34,.12,1.23,.7,1.75,.12,
.91, 2.94,2.14,2.36,2.86,4.55,
6.31,8.37,9.75,6.09,9.82,7.24,
17.15,11.82,10.97,17.20,14.35,16.82)
estMethod <- as.factor(c(1,1,1,1,1,1,
2,2,2,2,2,2,
3,3,3,3,3,3,
4,4,4,4,4,4))
boxplot(observations~estMethod, main="Boxplot of Observations vs Estimating Method")
#plots show concerns about the variance of the data
#PART C:
#Since the variance is not stable across the different estimation methods, we use a non-parametric test.
kruskal.test(value~name, data = dat)
#The p-value is very small, so we reject the null hypothesis at the 0.05 level of significance.
#PART D:
#Transforming the data using Box-Cox so we can test the hypothesis on the transformed data:
library(MASS)
boxcox(value~name, data=dat)
#Based on the results of the Box-Cox analysis, we use lambda = 0.5 (a square-root transformation)
lambda = 0.5
dat2<-dat$value^(lambda)
dat2<-cbind(dat$name, dat2)
dat2<-as.data.frame(dat2)
print(dat2)
str(dat2)
dat2$dat2 <- as.numeric(dat2$dat2) #cbind() coerced the transformed values to character; convert back to numeric before modelling
boxcox(dat2~dat$name, data=dat2)
#The value of lambda is close to 1 after the Box-Cox transformation
dat3<-observations^(lambda)
boxplot(dat3~estMethod,xlab="Method Type",ylab="Flood Flow Frequency",main="Boxplot of Observations")
#Our variances then looked more constant as the range on the y axis became much smaller.
#Testing Hypothesis:
model<-aov(dat2~dat$name,data=dat2)
summary(model)
plot(model)