Flipped Assignment 10, VST and Kruskal-Wallace Assignment

A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in cubic feet per second) are shown below.

# load data into a data frame
dat<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/FlippedAssignment10.csv", header=TRUE)
names(dat) <- sub('X', '', names(dat))
dat

##      1    2    3     4
## 1 0.34 0.91 6.31 17.15
## 2 0.12 2.94 8.37 11.82
## 3 1.23 2.14 9.75 10.97
## 4 0.70 2.36 6.09 17.20
## 5 1.75 2.86 9.82 14.35
## 6 0.12 4.55 7.24 16.82

a. Write the linear effects equation and the hypothesis you are testing

The following equation describes the components of the linear effects model:

The hypotheses to test are:

  H₀: µ_k = µ_l for all k,l ∈ {1,2,3,4}
  H₁: µ_k ≠ µ_l for any k,l ∈ {1,2,3,4} where k ≠ l

b. Does it appear the data is normally distributed? Does it appear that the variance is constant?

The data is NOT normally distributed, because the ends deviate in different directions.

# AOV normal Q-Q plots of the data for the 4 estimation methods
aov.model<-aov(values ~ ind,data=(stack(dat)))
plot(aov.model,2)

The data sets for each treatment do not have equal variance.

# AOV residuals vs fitted comparison the data for the 4 estimation methods
aov.model<-aov(values ~ ind,data=(stack(dat)))
plot(aov.model,1)

# boxplots comparing the 4 estimation methods
boxplot(values ~ ind, data = stack(dat), col=c("steelblue","firebrick2", "forestgreen","darkorange"),xlab = "estimation methods",ylab = "discharge (cubic fee/second)",main="peak discharge estimates")

c. (nonparametric) Perform a Kruskal-Wallis test in R (α=0.05)

The P-value = 9.771e-05 < 0.05 = α, therefore we reject the null hypothesis. One or more of the estimation methods differs significantly from at least one of the others.

# Kruskal-Wallis Test
kruskal.test(values ~ ind,data=(stack(dat)))

## 
##  Kruskal-Wallis rank sum test
## 
## data:  values by ind
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

d. (parametric) Select an appropriate transformation using Box Cox, transform the data and test hypothesis in R (α=0.05)

From boxcox, our λ ≈ 0.55, and after the power transformation, the P-value = 1.67e-11 < 0.05 = α, therefore we reject the null hypothesis.

# Boxcox estimate
library(MASS)
boxcox(stack(dat)[,1]~stack(dat)[,2], plotit=TRUE )

# power transformation of the data using λ ≈ 0.55
dat[,1]<-dat[,1]^0.55
dat[,2]<-dat[,2]^0.55
dat[,3]<-dat[,3]^0.55
dat[,4]<-dat[,4]^0.55
dat

##           1         2        3        4
## 1 0.5524760 0.9494515 2.754331 4.773592
## 2 0.3115657 1.8096352 3.217355 3.889911
## 3 1.1205928 1.5195940 3.499069 3.733478
## 4 0.8218715 1.6036205 2.701093 4.781241
## 5 1.3604135 1.7823842 3.512864 4.327814
## 6 0.3115657 2.3009452 2.970682 4.722852

# Verify that boxcox λ ≈ 1 after transformation = SUCCESS
boxcox(stack(dat)[,1]~stack(dat)[,2], plotit=TRUE )

# One-way ANOVA
aov<-aov(values ~ ind,data=(stack(dat)))
summary(aov)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## ind          3  45.90  15.299   83.91 1.67e-11 ***
## Residuals   20   3.65   0.182                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Complete Code

Complete R code used in this analysis.

# load data into a data frame
dat<-read.csv("https://raw.githubusercontent.com/forestwhite/RStatistics/main/FlippedAssignment10.csv", header=TRUE)
names(dat) <- sub('X', '', names(dat))
dat

# AOV normal Q-Q plots of the data for the 4 estimation methods
aov.model<-aov(values ~ ind,data=(stack(dat)))
plot(aov.model,2)

# AOV residuals vs fitted comparison the data for the 4 estimation methods
aov.model<-aov(values ~ ind,data=(stack(dat)))
plot(aov.model,1)

# boxplots comparing the 4 estimation methods
boxplot(values ~ ind, data = stack(dat), col=c("steelblue","firebrick2", "forestgreen","darkorange"),xlab = "estimation methods",ylab = "discharge (cubic fee/second)",main="peak discharge estimates")

# Kruskal-Wallis Test
kruskal.test(values ~ ind,data=(stack(dat)))

# Boxcox estimate
library(MASS)
boxcox(stack(dat)[,1]~stack(dat)[,2], plotit=TRUE )

# power transformation of the data using λ ≈ 0.55
dat[,1]<-dat[,1]^0.55
dat[,2]<-dat[,2]^0.55
dat[,3]<-dat[,3]^0.55
dat[,4]<-dat[,4]^0.55
dat

# Verify that boxcox λ ≈ 1 after transformation = SUCCESS
boxcox(stack(dat)[,1]~stack(dat)[,2], plotit=TRUE )

# One-way ANOVA
aov<-aov(values ~ ind,data=(stack(dat)))
summary(aov)

# Kruskal-Wallis Test
kruskal.test(values ~ ind,data=(stack(dat)))