Summary

A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in cubic feet per second) are shown below.

We begin with the linear effects equation:

\[Y_{ij}=\mu + \tau_i + \epsilon_{ij}\]

where:

\[i = 1,2,3,4 \text{ (methods)}\]

\[j=1,2,3,4,5,6 \text{ (replicates per method)}\]

The hypothesis that we are testing is that all method effects are equal, given by:

\[H_o:\tau_1=\tau_2=\tau_3=\tau_4\] \[H_a:\text{at least one method is different}\]

Data Normality and Variance

Let’s look at the normality of the data by plotting a Q-Q plot.

library(ggplot2)
library(MASS)#for boxcox

#build the data set
method1<-c(.34,.12,1.23,.7,1.75,.12)
method2<-c(.91,2.94,2.14,2.36,2.86,4.55)
method3<-c(6.31,8.37,9.75,6.09,9.82,7.24)
method4<-c(17.15,11.82,10.97,17.20,14.35,16.82)

discharge_data<-c(method1,method2,method3,method4)
methods<-rep(c("Method1","Method2","Method3","Method4"),each=6)

#Plot Q-Q plot to look at normality
qqnorm(discharge_data);qqline(discharge_data)

Looking at the output of the Q-Q plot, we can see that there is significant deviation in the data, particularly the tails. The data is not normally distributed, and the variance is not constant.

ANOVA

We fit the data to an ANOVA model:

#Build data frame for ANOVA
dat<-data.frame(
  Discharge=discharge_data,
  Methods=as.factor(methods)
)

#Perform 1-way ANOVA
results_anova<-aov(Discharge~Methods,data=dat)
summary(results_anova)
##             Df Sum Sq Mean Sq F value Pr(>F)    
## Methods      3  708.7   236.2   76.29  4e-11 ***
## Residuals   20   61.9     3.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Looking at the summary of the ANOVA, it shows a F-statistic of 76.29 and a very small p-value. This indicates that there is a significant difference between the means of the methods.

Plotting the results of the ANOVA:

#Plot ANOVA output
plot(results_anova)

Residuals vs Fitted

This plot shows that the variance increases as the fitted values increase. This indicates non-constant variance.

Q-Q plot

This plot shows that the data is not normally distributed with apparent curves at each end.

Scale-Location

The red line is not flat, which indicates that the variance is not constant.

Residuals vs Leverage

This plot also indicates non-constant variance due to the red line not being flat.

Box Cox

Taking the ANOVA results, we can run the BoxCox function to find an optimal lambda value.

boxcox(results_anova)

We can see from the plot that the optimal lambda is: \[\lambda=0.5\] This lambda will be a square-root transformation of the data.

#Transform the data with a square-root transformation. (lambda=0.5)
dat$Transformed_Discharge<-dat$Discharge^0.5

#run the anova with transformed data
results_anova_transformed<-aov(Transformed_Discharge~Methods,data=dat)

summary(results_anova_transformed)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Methods      3  32.69  10.898   81.17 2.27e-11 ***
## Residuals   20   2.69   0.134                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(results_anova_transformed)

Summary of Transformed ANOVA

Looking at the ANOVA results of the transformed data, the F-statistic is 81.17 and the p-values is very small. Since the p-value is much smaller than the alpha level of .05, we reject the null hypothesis.

Residuals vs. Fitted

The data appears to be improved and the red line is now much flatter. This indicates that the constant variance assumption is now met.

Q-Q Residuals

Completing a square-root transformation of the data helped to cleanup some of the tails on the Q-Q plot, but the data still appears to look to not be normally distributed.

Scale-Location

The Scale-Location plot has tightened up, but the red line is not flat, so there may still be some non-constant variance.

Residuals vs Leverage

The Residuals vs Leverage plot for the transformed data does show an improvement and the red line is now much flatter, indicating constant variance.

Kruskal-Wallis Test

We now take the data and perform a Kruskal-Wallis test on it:

#Kruskal-Wallis test
Kruskal_Results<-kruskal.test(Discharge~Methods, data=dat)

#Print the Kruskal-Wallis test results
print(Kruskal_Results)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Discharge by Methods
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

Results

The Kruskal-Wallis test outputs a chi-squared of 21.156. The degrees of freedom is 3, which matches the fact that it should be the number of groups minus 1. The p-value is much smaller than the alpha 0.05, therefore we reject the null hypothesis.

The Kruskal-Wallis test tells us that there is evidence that there is a significant difference in the mean of discharge among the four methods.

All Code

knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE)
library(ggplot2)
library(MASS)#for boxcox

#build the data set
method1<-c(.34,.12,1.23,.7,1.75,.12)
method2<-c(.91,2.94,2.14,2.36,2.86,4.55)
method3<-c(6.31,8.37,9.75,6.09,9.82,7.24)
method4<-c(17.15,11.82,10.97,17.20,14.35,16.82)

discharge_data<-c(method1,method2,method3,method4)
methods<-rep(c("Method1","Method2","Method3","Method4"),each=6)

#Plot Q-Q plot to look at normality
qqnorm(discharge_data);qqline(discharge_data)

#Build data frame for ANOVA
dat<-data.frame(
  Discharge=discharge_data,
  Methods=as.factor(methods)
)

#Perform 1-way ANOVA
results_anova<-aov(Discharge~Methods,data=dat)
summary(results_anova)

#Plot ANOVA output
plot(results_anova)
boxcox(results_anova)
#Transform the data with a square-root transformation. (lambda=0.5)
dat$Transformed_Discharge<-dat$Discharge^0.5

#run the anova with transformed data
results_anova_transformed<-aov(Transformed_Discharge~Methods,data=dat)

summary(results_anova_transformed)
plot(results_anova_transformed)
#Kruskal-Wallis test
Kruskal_Results<-kruskal.test(Discharge~Methods, data=dat)

#Print the Kruskal-Wallis test results
print(Kruskal_Results)