The linear effect equation is:
\[ Y_{i,j} = \mu + \tau_{i} + \epsilon_{i,j} \]
Where \(\mu\) = mean \(\tau\) = effect \(\epsilon\) = error
The hypothesis we are testing is :
Null hypothesis:
Mean peak discharge for all four methods of estimating flood flow frequency are the same.
That is: \(H_0: \mu_{1} = \mu_{2} =\mu_{3} = \mu_{4}\)
Alternate hypothesis :
At least one of the mean peak discharge is different.
That is: H_a : At least one \(\mu_{i}\) differs.
# Laoding the data in dataframe
m1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
m2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
m3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
m4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dat <- data.frame(m1, m2, m3, m4)
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.4.3
tidy_dat<-pivot_longer(dat,c(m1,m2,m3,m4))
colnames(tidy_dat) <- c("methods", "freq")
aov_model <- aov(y~x, data = dat2)
summary(aov_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## x 1 672.0 672.0 149.9 2.7e-11 ***
## Residuals 22 98.6 4.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The mean squared value of the residuals is 4.5. Observing the boxplots, we can say the constant variance assumption is not valid as there is varying spread for each population as indicated by interquartile range in the boxplots.
plot(aov_model)
Looking at the residual vs fitted plot for anova, we notice the points in a funnel like shape which indicates variances of the error term is not constant. Due to this, use of anova is inappropriate to test the hypothesis.
Hence we move on to performing the boxcox transformation of the responses to stabilize the variances as the distribution of the populations are not known.
library(MASS)
boxcox(y~x, data=dat2)
From the log-likelihood against lambda plot, it is sensible to take lambda as 0.5 as it falls within the 95% confidence interval range.
transformed_y <- ((sqrt(y)-1)/0.5)
cbind(x, transformed_y)
## x transformed_y
## [1,] 1 -0.8338096
## [2,] 1 -1.3071797
## [3,] 1 0.2181073
## [4,] 1 -0.3266799
## [5,] 1 0.6457513
## [6,] 1 -1.3071797
## [7,] 2 -0.0921216
## [8,] 2 1.4292856
## [9,] 2 0.9257478
## [10,] 2 1.0724583
## [11,] 2 1.3823069
## [12,] 2 2.2661458
## [13,] 3 3.0239427
## [14,] 3 3.7861905
## [15,] 3 4.2449980
## [16,] 3 2.9355851
## [17,] 3 4.2673758
## [18,] 3 3.3814496
## [19,] 4 6.2825117
## [20,] 4 4.8760454
## [21,] 4 4.6241981
## [22,] 4 6.2945765
## [23,] 4 5.5762788
## [24,] 4 6.2024387
dat3 <- data.frame(x, transformed_y)
dat3
## x transformed_y
## 1 1 -0.8338096
## 2 1 -1.3071797
## 3 1 0.2181073
## 4 1 -0.3266799
## 5 1 0.6457513
## 6 1 -1.3071797
## 7 2 -0.0921216
## 8 2 1.4292856
## 9 2 0.9257478
## 10 2 1.0724583
## 11 2 1.3823069
## 12 2 2.2661458
## 13 3 3.0239427
## 14 3 3.7861905
## 15 3 4.2449980
## 16 3 2.9355851
## 17 3 4.2673758
## 18 3 3.3814496
## 19 4 6.2825117
## 20 4 4.8760454
## 21 4 4.6241981
## 22 4 6.2945765
## 23 4 5.5762788
## 24 4 6.2024387
boxplot(transformed_y~x, data=dat3)
Testing for the hypothesis after boxcox transformation.
transformed_aov_model <- aov(transformed_y~x, data = dat3)
summary(transformed_aov_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## x 1 130.1 130.12 251.2 1.62e-13 ***
## Residuals 22 11.4 0.52
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The mean squared value of the residuals is 0.52. Performing power transformation has reduced this value from before transformation. This means the transformation has stabilized the variance.
plot(transformed_aov_model)
Points appears to be evenly scattered around zero with no clear pattern. The initial funnel like spread is gone. It also indicates that the variance is stabilized.
kruskal.test(transformed_y~x, data = dat3)
##
## Kruskal-Wallis rank sum test
##
## data: transformed_y by x
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
P-value is less than significant level (alpha) of 0.05, we reject the null hypothesis. Which means at least one of the mean peak discharge for these different methods of estimating flood flow frequency is not equal to the other. One of the mean differs.
# Laoding the data in dataframe
m1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
m2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
m3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
m4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dat <- data.frame(m1, m2, m3, m4)
#Making the data tidy
library(tidyr)
tidy_dat<-pivot_longer(dat,c(m1,m2,m3,m4))
colnames(tidy_dat) <- c("methods", "freq")
# making boxplot
x <- c(rep(1,6), rep(2,6), rep(3,6), rep(4,6))
y <-c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dat2 <- data.frame(x,y)
boxplot(y~x, xlab = "methods", ylab="frequency of flow")
# Fitting the model using ANOVA
aov_model <- aov(y~x, data = dat2)
summary(aov_model)
plot(aov_model)
# Doing boxcox
library(MASS)
boxcox(y~x, data=dat2)
# ANOVA after BOXCOX
transformed_y <- ((sqrt(y)-1)/0.5)
cbind(x, transformed_y)
dat3 <- data.frame(x, transformed_y)
dat3
boxplot(transformed_y~x, data=dat3)
transformed_aov_model <- aov(transformed_y~x, data = dat3)
summary(transformed_aov_model)
plot(transformed_aov_model)
# kruskal wallas test
kruskal.test(transformed_y~x, data = dat3)
Comment:-
We do not have enough data to check for normality for each population
From the boxplots we can see that the variances aren’t the same(constant).