DoE Assignment 10

a. Linear effect equation and hypothesis:-

The linear effect equation is:

\[ Y_{i,j} = \mu + \tau_{i} + \epsilon_{i,j} \]

Where \(\mu\) = mean \(\tau\) = effect \(\epsilon\) = error

The hypothesis we are testing is :

Null hypothesis:

Mean peak discharge for all four methods of estimating flood flow frequency are the same.

That is: \(H_0: \mu_{1} = \mu_{2} =\mu_{3} = \mu_{4}\)

Alternate hypothesis :

At least one of the mean peak discharge is different.

That is: H_a : At least one \(\mu_{i}\) differs.

# Laoding the data in dataframe

m1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
m2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
m3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
m4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)

dat <- data.frame(m1, m2, m3, m4)

library(tidyr)

## Warning: package 'tidyr' was built under R version 4.4.3

tidy_dat<-pivot_longer(dat,c(m1,m2,m3,m4))
colnames(tidy_dat) <- c("methods", "freq")

b. Checking for normality and constant variance

Comment:-

We do not have enough data to check for normality for each population

x <- c(rep(1,6), rep(2,6), rep(3,6), rep(4,6))
y <-c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
             0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
             6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
             17.15, 11.82, 10.97, 17.20, 14.35, 16.82)

dat2 <- data.frame(x,y)

boxplot(y~x, xlab = "methods", ylab="frequency of flow")

From the boxplots we can see that the variances aren’t the same(constant).

c. Fitting the model using AOV()

aov_model <- aov(y~x, data = dat2)
summary(aov_model)

##             Df Sum Sq Mean Sq F value  Pr(>F)    
## x            1  672.0   672.0   149.9 2.7e-11 ***
## Residuals   22   98.6     4.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comment:-

The mean squared value of the residuals is 4.5. Observing the boxplots, we can say the constant variance assumption is not valid as there is varying spread for each population as indicated by interquartile range in the boxplots.

plot(aov_model)

Comment:-

Looking at the residual vs fitted plot for anova, we notice the points in a funnel like shape which indicates variances of the error term is not constant. Due to this, use of anova is inappropriate to test the hypothesis.

Hence we move on to performing the boxcox transformation of the responses to stabilize the variances as the distribution of the populations are not known.

d. Power Transformation (Box- Cox)

library(MASS)
boxcox(y~x, data=dat2)

From the log-likelihood against lambda plot, it is sensible to take lambda as 0.5 as it falls within the 95% confidence interval range.

e. ANOVA after BOXCOX

transformed_y <- ((sqrt(y)-1)/0.5)
cbind(x, transformed_y)

##       x transformed_y
##  [1,] 1    -0.8338096
##  [2,] 1    -1.3071797
##  [3,] 1     0.2181073
##  [4,] 1    -0.3266799
##  [5,] 1     0.6457513
##  [6,] 1    -1.3071797
##  [7,] 2    -0.0921216
##  [8,] 2     1.4292856
##  [9,] 2     0.9257478
## [10,] 2     1.0724583
## [11,] 2     1.3823069
## [12,] 2     2.2661458
## [13,] 3     3.0239427
## [14,] 3     3.7861905
## [15,] 3     4.2449980
## [16,] 3     2.9355851
## [17,] 3     4.2673758
## [18,] 3     3.3814496
## [19,] 4     6.2825117
## [20,] 4     4.8760454
## [21,] 4     4.6241981
## [22,] 4     6.2945765
## [23,] 4     5.5762788
## [24,] 4     6.2024387

dat3 <- data.frame(x, transformed_y)
dat3

##    x transformed_y
## 1  1    -0.8338096
## 2  1    -1.3071797
## 3  1     0.2181073
## 4  1    -0.3266799
## 5  1     0.6457513
## 6  1    -1.3071797
## 7  2    -0.0921216
## 8  2     1.4292856
## 9  2     0.9257478
## 10 2     1.0724583
## 11 2     1.3823069
## 12 2     2.2661458
## 13 3     3.0239427
## 14 3     3.7861905
## 15 3     4.2449980
## 16 3     2.9355851
## 17 3     4.2673758
## 18 3     3.3814496
## 19 4     6.2825117
## 20 4     4.8760454
## 21 4     4.6241981
## 22 4     6.2945765
## 23 4     5.5762788
## 24 4     6.2024387

boxplot(transformed_y~x, data=dat3)

Testing for the hypothesis after boxcox transformation.

transformed_aov_model <- aov(transformed_y~x, data = dat3)
summary(transformed_aov_model)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## x            1  130.1  130.12   251.2 1.62e-13 ***
## Residuals   22   11.4    0.52                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comment:-

The mean squared value of the residuals is 0.52. Performing power transformation has reduced this value from before transformation. This means the transformation has stabilized the variance.

plot(transformed_aov_model)

Comment:-

Points appears to be evenly scattered around zero with no clear pattern. The initial funnel like spread is gone. It also indicates that the variance is stabilized.

f. Kruskal Wallace Test

kruskal.test(transformed_y~x, data = dat3)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  transformed_y by x
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

Conclusion:-

P-value is less than significant level (alpha) of 0.05, we reject the null hypothesis. Which means at least one of the mean peak discharge for these different methods of estimating flood flow frequency is not equal to the other. One of the mean differs.

Complete Unevaluated Code

# Laoding the data in dataframe

m1 <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12)
m2 <- c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55)
m3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
m4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)

dat <- data.frame(m1, m2, m3, m4)

#Making the data tidy
library(tidyr)
tidy_dat<-pivot_longer(dat,c(m1,m2,m3,m4))
colnames(tidy_dat) <- c("methods", "freq")

# making boxplot
x <- c(rep(1,6), rep(2,6), rep(3,6), rep(4,6))
y <-c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
             0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
             6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
             17.15, 11.82, 10.97, 17.20, 14.35, 16.82)

dat2 <- data.frame(x,y)

boxplot(y~x, xlab = "methods", ylab="frequency of flow")

# Fitting the model using ANOVA
aov_model <- aov(y~x, data = dat2)
summary(aov_model)
plot(aov_model)

# Doing boxcox
library(MASS)
boxcox(y~x, data=dat2)

# ANOVA after BOXCOX
transformed_y <- ((sqrt(y)-1)/0.5)
cbind(x, transformed_y)
dat3 <- data.frame(x, transformed_y)
dat3
boxplot(transformed_y~x, data=dat3)
transformed_aov_model <- aov(transformed_y~x, data = dat3)
summary(transformed_aov_model)
plot(transformed_aov_model)

# kruskal wallas test
kruskal.test(transformed_y~x, data = dat3)

DoE Assignment 10

Bishwa, Manas, Sanvitha

2025-10-07

a. Linear effect equation and hypothesis:-

b. Checking for normality and constant variance

Comment:-

c. Fitting the model using AOV()

Comment:-

Comment:-

d. Power Transformation (Box- Cox)

e. ANOVA after BOXCOX

Comment:-

Comment:-

f. Kruskal Wallace Test

Conclusion:-

Complete Unevaluated Code