library(tidyr)
library(MASS)
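Linear effects equation:
$Y_{ij} = \mu + \tau_i + \epsilon_{ij}$
where $\mu$ is the overall mean, $\tau_i$ is the effect of the $i$-th estimation method, and $\epsilon_{ij}$ is the random error of observation $j$ under method $i$.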
Hypothesis testing:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: at least one $\mu_i$ differs
method1 <- c(0.34,0.12,1.23,0.7,1.75,0.12)
method2 <- c(0.91,2.94,2.14,2.36,2.86,4.55)
method3 <- c(6.31,8.37,9.75,6.09,9.82,7.24)
method4 <- c(17.15,11.82,10.97,17.2,14.35,16.82)
dat <- data.frame(method1,method2,method3,method4)
dat<-pivot_longer(dat, c(method1,method2,method3,method4))
print(dat)
## # A tibble: 24 × 2
## name value
## <chr> <dbl>
## 1 method1 0.34
## 2 method2 0.91
## 3 method3 6.31
## 4 method4 17.2
## 5 method1 0.12
## 6 method2 2.94
## 7 method3 8.37
## 8 method4 11.8
## 9 method1 1.23
## 10 method2 2.14
## # … with 14 more rows
str(dat)
## tibble [24 × 2] (S3: tbl_df/tbl/data.frame)
## $ name : chr [1:24] "method1" "method2" "method3" "method4" ...
## $ value: num [1:24] 0.34 0.91 6.31 17.15 0.12 ...
Performing ANOVA:
model <- aov(value~name,data=dat)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 708.7 236.2 76.29 4e-11 ***
## Residuals 20 61.9 3.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
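As a cross-check on the table above, the F statistic can be rebuilt from the between-method and within-method sums of squares. The following is a minimal sketch rather than part of the original analysis; the names `value`, `method`, `ss_between`, `ss_within` and `f_stat` are introduced here for illustration only.

```r
# Hand computation of the one-way ANOVA F statistic (illustrative names).
value <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
           0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
           6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
           17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(paste0("method", 1:4), each = 6))

grand_mean  <- mean(value)
group_means <- sapply(split(value, method), mean)
group_n     <- sapply(split(value, method), length)

# Between-method variation: group means around the grand mean
ss_between <- sum(group_n * (group_means - grand_mean)^2)
# Within-method variation: observations around their own group mean
ss_within  <- sum((value - ave(value, method))^2)

df_between <- nlevels(method) - 1
df_within  <- length(value) - nlevels(method)

f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
print(c(SS_between = ss_between, SS_within = ss_within, F = f_stat, p = p_value))
```

The printed values should agree with the `Sum Sq`, `F value` and `Pr(>F)` columns of `summary(model)` up to rounding.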
plot(model)
From the plots we can see that, with only six observations per method, the sample is too small to judge the normality assumption with confidence. In the Residuals vs Fitted plot the spread of the residuals is not the same across the fitted values, so the variance does not appear to be constant.
We can check this further with a boxplot:
observations <- c(.34,.12,1.23,.7,1.75,.12,
.91, 2.94,2.14,2.36,2.86,4.55,
6.31,8.37,9.75,6.09,9.82,7.24,
17.15,11.82,10.97,17.20,14.35,16.82)
estMtd <- as.factor(c(rep(1,6),rep(2,6),rep(3,6),rep(4,6)))
boxplot(observations~estMtd, main="Boxplot of Observations vs Estimating Method")
From the boxplot above, the spread clearly differs between the methods, which supports the conclusion that the variance is not constant.
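The visual impression from the boxplot can be backed up with per-method summary statistics. This is a small sketch under the assumption that the sample standard deviation is an adequate measure of spread here; `group_mean` and `group_sd` are names introduced for illustration.

```r
# Per-method mean and standard deviation (illustrative check, not in the original analysis).
observations <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
                  0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
                  6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
                  17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
estMtd <- factor(rep(1:4, each = 6))

group_mean <- sapply(split(observations, estMtd), mean)
group_sd   <- sapply(split(observations, estMtd), sd)

# Side-by-side table: the spread can be compared against the level of each method
print(round(cbind(mean = group_mean, sd = group_sd), 3))
```

If the standard deviation grows together with the mean, that pattern is the usual motivation for a variance-stabilising power transformation.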
1c)
Since the variance does not appear to be stable, we use a non-parametric test.
kruskal.test(value~name, data = dat)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
From the Kruskal-Wallis test we get a p-value of 9.771e-05, which is much smaller than alpha = 0.05, so we reject the null hypothesis.
Next we use the Box-Cox procedure to choose a power transformation, and then test the hypothesis again on the transformed data.
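To make the rank-based nature of the test concrete, the statistic can be reproduced by hand from the rank sums, including the correction for the tied values. This is a sketch using the standard Kruskal-Wallis formula; all variable names are introduced here for illustration.

```r
# Hand computation of the Kruskal-Wallis statistic with tie correction (illustrative names).
value <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
           0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
           6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
           17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
method <- factor(rep(paste0("method", 1:4), each = 6))

n_total   <- length(value)
r         <- rank(value)                       # tied values receive their average rank
rank_sums <- sapply(split(r, method), sum)
group_n   <- sapply(split(r, method), length)

# Uncorrected statistic
h_raw <- 12 / (n_total * (n_total + 1)) * sum(rank_sums^2 / group_n) - 3 * (n_total + 1)

# Tie correction: 1 - sum(t^3 - t) / (N^3 - N), with t the size of each group of ties
tie_sizes <- as.vector(table(value))
tie_corr  <- 1 - sum(tie_sizes^3 - tie_sizes) / (n_total^3 - n_total)

h_stat  <- h_raw / tie_corr
p_value <- pchisq(h_stat, df = nlevels(method) - 1, lower.tail = FALSE)
print(c(H = h_stat, p = p_value))
```

The result should match the `Kruskal-Wallis chi-squared` and `p-value` reported by `kruskal.test()` above.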
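For reference, the Box-Cox family of transformations that `boxcox()` profiles over is usually written as follows (standard textbook form; the notation $y^{(\lambda)}$ is introduced here and is not from the original write-up):

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
\log y, & \lambda = 0.
\end{cases}
```

For $\lambda \neq 0$ this differs from the plain power $y^{\lambda}$ only by subtracting a constant and dividing by $\lambda$, so an ANOVA F test on $y^{\lambda}$ gives the same F statistic as one on $y^{(\lambda)}$.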
boxcox(value~name, data=dat)
From the plot, lambda appears to be around 0.6.
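Instead of reading lambda off the plot, the maximising value can also be extracted from the object that `boxcox()` returns. The sketch below assumes a finer lambda grid than the default; `bc`, `lambda_hat` and `ci` are illustrative names, and the interval is the usual approximate 95% profile-likelihood interval.

```r
library(MASS)

# Rebuild the long-format data so the sketch is self-contained
dat <- data.frame(
  name  = rep(paste0("method", 1:4), each = 6),
  value = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
            0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
            6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
            17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
)

# plotit = FALSE returns the grid (x) and the profile log-likelihood (y) without drawing
bc <- boxcox(value ~ name, data = dat,
             lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2])

print(lambda_hat)
print(ci)
```

A rounded value inside this interval, such as the 0.6 chosen from the plot, is a common practical choice.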
lambda = 0.6
trans<-dat$value^(lambda)
dat2<-cbind(dat$name, trans)
dat2<-as.data.frame(dat2)
print(dat2)
## V1 trans
## 1 method1 0.523464639736905
## 2 method2 0.944984827018149
## 3 method3 3.0200742160381
## 4 method4 5.50248002457842
## 5 method1 0.280226206470681
## 6 method2 1.90989019105263
## 7 method3 3.57795830085097
## 8 method4 4.40119373345773
## 9 method1 1.13225192276187
## 10 method2 1.57851336743286
## 11 method3 3.92105351297984
## 12 method4 4.20846748905397
## 13 method1 0.807344375447297
## 14 method2 1.67396819322283
## 15 method3 2.95644889069996
## 16 method4 5.5120997492531
## 17 method1 1.39901647622902
## 18 method2 1.87853642234007
## 19 method3 3.93792003264493
## 20 method4 4.94437202610694
## 21 method1 0.280226206470681
## 22 method2 2.48202892290285
## 23 method3 3.27976811692847
## 24 method4 5.43870615120193
# cbind() stored trans as text in dat2 (see the printout above); convert it back to numeric before modelling
dat2$trans <- as.numeric(as.character(dat2$trans))
boxcox(trans~dat$name, data=dat2)
From the plot we can see that lambda is close to 1 for the transformed data, so no further transformation is indicated.
Boxplot for the transformed data:
dat3<-observations^(lambda)
boxplot(dat3~estMtd,xlab="Method Type",ylab="Flood Flow Frequency",main="Boxplot of Observations")
From the boxplot we can see that the variance is much closer to constant than for the untransformed data, although it is still not exactly constant.
Testing the hypothesis:
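The improvement can be quantified by comparing the ratio of the largest to the smallest group variance before and after the transformation. This is a rough sketch, not a formal test; `var_ratio`, `ratio_raw` and `ratio_trans` are hypothetical names introduced here.

```r
# Ratio of largest to smallest group variance, before and after the power transform.
observations <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
                  0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
                  6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
                  17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
estMtd <- factor(rep(1:4, each = 6))
lambda <- 0.6

# Helper (hypothetical): max/min of the per-group sample variances
var_ratio <- function(x, g) {
  v <- sapply(split(x, g), var)
  max(v) / min(v)
}

ratio_raw   <- var_ratio(observations, estMtd)
ratio_trans <- var_ratio(observations^lambda, estMtd)
print(c(raw = ratio_raw, transformed = ratio_trans))
```

A ratio closer to 1 indicates more homogeneous variances across the methods.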
model<-aov(trans~dat$name,data=dat2)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## dat$name 3 63.71 21.236 85.76 1.36e-11 ***
## Residuals 20 4.95 0.248
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We get a p-value of 1.36e-11, which is much smaller than alpha = 0.05. So we reject the null hypothesis and conclude that the estimation method has a significant effect on mean flood flow frequency.
plot(model)
If we look at the residual plots, we can see that the variance has become much more stable after the transformation.