Flipped Assignment 10

VST and Kruskal-Wallace

A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Each procedure is used six times on the watershed, and the resulting discharge data (in cubic feet per second) are shown below:

Method	Observations
1	.34 .12 1.23 .70 1.75 .12
2	.91 2.94 2.14 2.36 2.86 4.55
3	6.31 8.37 9.75 6.09 9.82 7.42
4	17.15 11.82 10.97 17.20 14.35 16.82

Part A

Write the linear effects equation and the hypothesis you are testing

\[Y = \mu + \tau + \epsilon\] where \[\mu = mean,\ \tau=effects, \epsilon=error\]

Hypothesis: \[H_0: \mu_1=\mu_2=\mu_3=\mu_4\] \[H_1: At\ least\ one \ \mu \neq \ to \ others \]

Part B

Does it appear the data is normally distributed? Does it appear that the variance is constant?

An aov will be run so that we can look at residuals

library(MASS)
observations <- c(.34,.12,1.23,.7,1.75,.12,
                  .91, 2.94,2.14,2.36,2.86,4.55,
                  6.31,8.37,9.75,6.09,9.82,7.24,
                  17.15,11.82,10.97,17.20,14.35,16.82)
estMethod <- as.factor(c(1,1,1,1,1,1,
                         2,2,2,2,2,2,
                         3,3,3,3,3,3,
                         4,4,4,4,4,4))
dat1 <- cbind.data.frame(observations,estMethod)
aov1 <- aov(observations~estMethod, data = dat1)
summary(aov1)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## estMethod    3  708.7   236.2   76.29  4e-11 ***
## Residuals   20   61.9     3.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# look for normality
qqnorm(aov1$residuals, main = "QQNormal Plot of Residuals")
qqline(aov1$residuals)

The data looks like it could be roughly normal. There are some concerns about tails though.

Now look at the variance.

plot(aov1$fitted.values,aov1$residuals, main="Plot of Residuals vs Fitted Values")

boxplot(dat1$observations~dat1$estMethod, main="Boxplot of Observations vs Estimating Method")

The plots show concerns about the variance of the data.

Part C

Perform a non-parametric Kruskal-Wallace test in R (alpha=0.05)

kruskal.test(observations~estMethod,data = dat1)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  observations by estMethod
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

Based on the p value of 9.771e-5 we reject the null and conclude that at least one of the means is different.

Part D

Select an appropriate transformation using Box Cox, transform the data and test the hypothesis in R with alpha = 0.05

boxcox(observations~estMethod)

Based on the results of the Box-Cox analysis .5 is a reasonable value for lambda

lambda <- .5
sqrtObservations <- observations^(lambda)
dat2 <- cbind.data.frame(sqrtObservations,estMethod)

Run anova and check validity

aov2 <- aov(sqrtObservations~estMethod, data = dat2)
summary(aov2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## estMethod    3  32.69  10.898   81.17 2.27e-11 ***
## Residuals   20   2.69   0.134                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now check for normality

qqnorm(aov2$residuals, main = "QQNormal Plot of Residuals")
qqline(aov2$residuals)

The data looks like it could be roughly normal. There are concerns about tails though.

Now look at the variance.

plot(aov2$fitted.values,aov2$residuals, main = "Residules vs Fitted")

boxplot(dat2$sqrtObservations~dat2$estMethod, main = "Boxplot of Observations")

There are still concerns about the tails of the data. One population still looks like it could have an unequal variance. But overall, the variance is much improved.

Based on the p value of 2.27e-11 we would reject the null hypothesis and conclude that at least one of the means is different.

The results of the Kruskal-Wallace and the parametric tests reach the same conclusion that the null hypothesis can be rejected.

R Code

All of the R Code run is included here.


library(MASS)
observations <- c(.34,.12,1.23,.7,1.75,.12,
                  .91, 2.94,2.14,2.36,2.86,4.55,
                  6.31,8.37,9.75,6.09,9.82,7.24,
                  17.15,11.82,10.97,17.20,14.35,16.82)
estMethod <- as.factor(c(1,1,1,1,1,1,
                         2,2,2,2,2,2,
                         3,3,3,3,3,3,
                         4,4,4,4,4,4))
dat1 <- cbind.data.frame(observations,estMethod)

aov1 <- aov(observations~estMethod, data = dat1)
summary(aov1)

qqnorm(aov1$residuals)
qqline(aov1$residuals)



plot(aov1$fitted.values,aov1$residuals)
boxplot(dat1$observations~dat1$estMethod)

kruskal.test(observations~estMethod,data = dat1)

boxcox(observations~estMethod)

lambda <- .5
sqrtObservations <- observations^(lambda)
dat2 <- cbind.data.frame(sqrtObservations,estMethod)

aov2 <- aov(sqrtObservations~estMethod, data = dat2)
summary(aov2)

qqnorm(aov2$residuals)
qqline(aov2$residuals)

plot(aov2$fitted.values,aov2$residuals)
boxplot(dat2$sqrtObservations~dat2$estMethod)