library(tidyr)
library(MASS)

1a)

Linear effects equation:

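Y_ij = μ + τ_i + ϵ_ij

where μ is the grand mean, τ_i is the effect of the i-th estimation method, and ϵ_ij is the random error of the j-th observation under method i.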
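As a rough illustration of the terms in this model, the grand mean and the method effects can be estimated directly from the observations analysed below. This is only a sketch: the names `flows`, `mu_hat` and `tau_hat` are chosen here for illustration and are not part of the original analysis.

```r
# Observations for the four estimation methods (same values as in the analysis)
flows <- list(
  method1 = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12),
  method2 = c(0.91, 2.94, 2.14, 2.36, 2.86, 4.55),
  method3 = c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24),
  method4 = c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
)

# Estimate of the grand mean (mu): mean of all 24 observations
mu_hat <- mean(unlist(flows))

# Estimated effect of each method (tau_i): group mean minus grand mean
tau_hat <- sapply(flows, mean) - mu_hat

print(mu_hat)
print(round(tau_hat, 3))
```

Because the design is balanced (six observations per method), the estimated effects sum to zero.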

Hypothesis Testing :

H0: μ1=μ2=μ3=μ4

Ha: At least one μi differs

method1 <- c(0.34,0.12,1.23,0.7,1.75,0.12)
method2 <- c(0.91,2.94,2.14,2.36,2.86,4.55)
method3 <- c(6.31,8.37,9.75,6.09,9.82,7.24)
method4 <- c(17.15,11.82,10.97,17.2,14.35,16.82)
dat <- data.frame(method1,method2,method3,method4)

dat<-pivot_longer(dat, c(method1,method2,method3,method4))
print(dat)

## # A tibble: 24 × 2
##    name    value
##    <chr>   <dbl>
##  1 method1  0.34
##  2 method2  0.91
##  3 method3  6.31
##  4 method4 17.2 
##  5 method1  0.12
##  6 method2  2.94
##  7 method3  8.37
##  8 method4 11.8 
##  9 method1  1.23
## 10 method2  2.14
## # … with 14 more rows

str(dat)

## tibble [24 × 2] (S3: tbl_df/tbl/data.frame)
##  $ name : chr [1:24] "method1" "method2" "method3" "method4" ...
##  $ value: num [1:24] 0.34 0.91 6.31 17.15 0.12 ...

Performing ANOVA:

model <- aov(value~name,data=dat)
summary(model)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## name         3  708.7   236.2   76.29  4e-11 ***
## Residuals   20   61.9     3.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
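To make the ANOVA table less of a black box, the F statistic can also be rebuilt from the between-method and within-method sums of squares. The sketch below assumes the same 24 observations; the variable names (`ss_between`, `ss_within`, `f_stat`) are illustrative and do not come from the original code.

```r
# Same 24 observations, ordered by method (6 per method)
value <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
           0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
           6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
           17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
name <- factor(rep(paste0("method", 1:4), each = 6))

n_total <- length(value)   # total number of observations
k <- nlevels(name)         # number of methods

# Mean of the observation's own group, repeated for every observation
fitted_means <- ave(value, name)

# Between-method and within-method sums of squares
ss_between <- sum((fitted_means - mean(value))^2)
ss_within  <- sum((value - fitted_means)^2)

# F statistic = mean square between / mean square within
f_stat  <- (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_value <- pf(f_stat, k - 1, n_total - k, lower.tail = FALSE)

print(c(ss_between = ss_between, ss_within = ss_within, F = f_stat))
print(p_value)
```

The degrees of freedom used here (3 and 20) are the ones shown in the `Df` column of the table above.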

plot(model)

1b)

With only six observations per method, the sample is too small to judge normality reliably from the Normal Q-Q plot. In the Residuals vs Fitted plot, the spread of the residuals is not the same across the fitted values (it widens as the fitted value increases), so the variance does not appear to be constant.

We can check this further with a boxplot:

observations <- c(.34,.12,1.23,.7,1.75,.12,
                  .91, 2.94,2.14,2.36,2.86,4.55,
                  6.31,8.37,9.75,6.09,9.82,7.24,
                  17.15,11.82,10.97,17.20,14.35,16.82)
estMtd <- as.factor(c(rep(1,6),rep(2,6),rep(3,6),rep(4,6)))
boxplot(observations~estMtd, main="Boxplot of Observations vs Estimating Method")

The boxplot above supports this: the spread of the observations differs clearly between methods, so the variance is not constant.
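The visual impression can also be checked numerically. The following sketch computes the sample variance per method and, as an optional extra that is not part of the original analysis, runs Bartlett's test from base R. Bartlett's test is itself sensitive to non-normality, so with six observations per group its result should be read with caution.

```r
observations <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
                  0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
                  6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
                  17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
estMtd <- factor(rep(1:4, each = 6))

# Sample variance within each estimation method
group_var <- tapply(observations, estMtd, var)
print(round(group_var, 3))

# Ratio of largest to smallest variance; values far above 1 suggest unequal variances
var_ratio <- max(group_var) / min(group_var)
print(var_ratio)

# Bartlett's test of homogeneity of variances (assumes normal data)
bt <- bartlett.test(observations ~ estMtd)
print(bt)
```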

1c)

Since the variance does not appear to be constant, we use a non-parametric test (Kruskal-Wallis).

kruskal.test(value~name, data = dat)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by name
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

The Kruskal-Wallis test gives a p-value of 9.771e-05, which is much smaller than alpha = 0.05, so we reject the null hypothesis.
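For reference, the Kruskal-Wallis statistic depends only on the ranks of the observations. The sketch below rebuilds it by hand, including the usual correction for ties (the value 0.12 occurs twice). Variable names such as `h_raw` and `tie_corr` are illustrative.

```r
value <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
           0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
           6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
           17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
name <- factor(rep(paste0("method", 1:4), each = 6))

n <- length(value)
r <- rank(value)                       # tied values receive their average rank

rank_sums <- tapply(r, name, sum)      # sum of ranks within each method
n_i       <- tapply(r, name, length)   # observations per method

# Uncorrected Kruskal-Wallis statistic
h_raw <- 12 / (n * (n + 1)) * sum(rank_sums^2 / n_i) - 3 * (n + 1)

# Tie correction: 1 - sum(t^3 - t) / (n^3 - n), with t the size of each tie group
ties     <- table(value)
tie_corr <- 1 - sum(ties^3 - ties) / (n^3 - n)

h <- h_raw / tie_corr
p <- pchisq(h, df = nlevels(name) - 1, lower.tail = FALSE)
print(c(H = h, p = p))
```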

1d)

We use a Box-Cox transformation and then test the hypothesis on the transformed data.

boxcox(value~name, data=dat)

From the plot, lambda appears to be around 0.6.
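Instead of reading lambda off the plot by eye, the value that maximises the profile log-likelihood can be extracted from the object that `boxcox()` returns. This is a sketch under the assumption that a grid from -2 to 2 in steps of 0.01 is fine enough; `dat_bc`, `lambda_hat` and `ci` are names introduced here for illustration.

```r
library(MASS)

dat_bc <- data.frame(
  value = c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
            0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
            6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
            17.15, 11.82, 10.97, 17.20, 14.35, 16.82),
  name = factor(rep(paste0("method", 1:4), each = 6))
)

# Profile log-likelihood on a fine grid, without drawing the plot
bc <- boxcox(value ~ name, data = dat_bc,
             lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Lambda with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2])

print(lambda_hat)
print(ci)
```

Any convenient value inside the interval is usually acceptable as the working lambda.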

lambda <- 0.6
trans<-dat$value^(lambda)
dat2<-cbind(dat$name, trans)
dat2<-as.data.frame(dat2)
print(dat2)

##         V1             trans
## 1  method1 0.523464639736905
## 2  method2 0.944984827018149
## 3  method3   3.0200742160381
## 4  method4  5.50248002457842
## 5  method1 0.280226206470681
## 6  method2  1.90989019105263
## 7  method3  3.57795830085097
## 8  method4  4.40119373345773
## 9  method1  1.13225192276187
## 10 method2  1.57851336743286
## 11 method3  3.92105351297984
## 12 method4  4.20846748905397
## 13 method1 0.807344375447297
## 14 method2  1.67396819322283
## 15 method3  2.95644889069996
## 16 method4   5.5120997492531
## 17 method1  1.39901647622902
## 18 method2  1.87853642234007
## 19 method3  3.93792003264493
## 20 method4  4.94437202610694
## 21 method1 0.280226206470681
## 22 method2  2.48202892290285
## 23 method3  3.27976811692847
## 24 method4  5.43870615120193
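Note that `cbind()` builds a matrix, and a matrix holding both method names and numbers is coerced to character, which is why the transformed values above are printed as long strings. A possible alternative, sketched here with the illustrative names `dat2_num` and `dat2_chr`, is to build the data frame directly so that the transformed column stays numeric.

```r
value <- c(0.34, 0.91, 6.31, 17.15, 0.12, 2.94, 8.37, 11.82,
           1.23, 2.14, 9.75, 10.97, 0.70, 2.36, 6.09, 17.20,
           1.75, 2.86, 9.82, 14.35, 0.12, 4.55, 7.24, 16.82)
name <- rep(paste0("method", 1:4), times = 6)
lambda <- 0.6

# data.frame() keeps each column's own type: name is text, trans is numeric
dat2_num <- data.frame(name = name, trans = value^lambda)
str(dat2_num)

# For comparison: cbind() first creates a character matrix,
# so the second column is no longer numeric
dat2_chr <- as.data.frame(cbind(name, value^lambda))
str(dat2_chr)
```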

boxcox(trans~dat$name, data=dat2)

From the plot we can see that lambda is close to 1 for the transformed data, so no further transformation is needed.

Boxplot for the transformed data:

dat3<-observations^(lambda)

boxplot(dat3~estMtd,xlab="Method Type",ylab="Flood Flow Frequency",main="Boxplot of Observations")

From the boxplot, the variance appears much closer to constant than for the untransformed data, although it is still not exactly constant.
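One rough way to quantify this comparison is to look at the ratio of the largest to the smallest group variance before and after the transformation. The sketch below assumes lambda = 0.6 as chosen above; `var_ratio` is a helper defined here for illustration only.

```r
observations <- c(0.34, 0.12, 1.23, 0.70, 1.75, 0.12,
                  0.91, 2.94, 2.14, 2.36, 2.86, 4.55,
                  6.31, 8.37, 9.75, 6.09, 9.82, 7.24,
                  17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
estMtd <- factor(rep(1:4, each = 6))
lambda <- 0.6

# Hypothetical helper: largest group variance divided by smallest group variance
var_ratio <- function(y, g) {
  v <- tapply(y, g, var)
  max(v) / min(v)
}

ratio_raw   <- var_ratio(observations, estMtd)          # untransformed data
ratio_trans <- var_ratio(observations^lambda, estMtd)   # power-transformed data

print(c(raw = ratio_raw, transformed = ratio_trans))
```

A ratio closer to 1 after the transformation is consistent with what the boxplot shows.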

Testing Hypothesis :

model<-aov(trans~dat$name,data=dat2)
summary(model)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## dat$name     3  63.71  21.236   85.76 1.36e-11 ***
## Residuals   20   4.95   0.248                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is 1.36e-11, which is much smaller than alpha = 0.05, so we reject the null hypothesis and conclude that the estimation method has a significant effect on the mean flood flow frequency.

plot(model)

Looking at the residual plots, the variance is much more stable after the transformation.

Flipped Assignment 10 Group 3

Dowthyaksai Lagadapati, Armina Rahman Mim, Rahul Vithalani

2023-10-05
