VST and Kruskal-Wallis Assignment:
| Estimation Method | Obs 1 | Obs 2 | Obs 3 | Obs 4 | Obs 5 | Obs 6 |
|---|---|---|---|---|---|---|
| 1 | 0.34 | 0.12 | 1.23 | 0.70 | 1.75 | 0.12 |
| 2 | 0.91 | 2.94 | 2.14 | 2.36 | 2.86 | 4.55 |
| 3 | 6.31 | 8.37 | 9.75 | 6.09 | 9.82 | 7.24 |
| 4 | 17.15 | 11.82 | 10.97 | 17.20 | 14.35 | 16.82 |
a) Write the linear effects equation and the hypothesis you are testing.
b) Does it appear the data is normally distributed? Does it appear that the variance is constant?
c) (nonparametric) Perform a Kruskal-Wallis test in R (\(\alpha = 0.05\)).
d) (parametric) Select an appropriate transformation using Box-Cox, transform the data, and test the hypothesis in R (\(\alpha = 0.05\)).
PART A:
Linear Effects Equation:
\[ Y_{i,j}=\mu +\tau _{i}+\epsilon _{i,j} \]
Where \(\mu\) is the overall mean, \(\tau_i\) is the effect of the \(i\)-th estimation method, and \(\epsilon_{i,j}\) is the random error, for \(i=1,\dots,4\) and \(j=1,\dots,6\).
Stating the Hypothesis:
\[ H_0: \tau_{1}=\tau_{2}=\tau_{3}=\tau_{4}=0 \]
\[ H_a: \tau_{i}\neq0 \text{ for at least one } i \]
PART B:
We fit an ANOVA model so that we can examine the residuals and assess normality and constant variance:
method1<-c(0.34,0.12,1.23,0.70,1.75,0.12)
method2<-c(0.91,2.94,2.14,2.36,2.86,4.55)
method3<-c(6.31,8.37,9.75,6.09,9.82,7.24)
method4<-c(17.15,11.82,10.97,17.20,14.35,16.82)
dat<-data.frame(method1,method2,method3,method4)
Reshaping the data into long format with tidyr:
library(tidyr)
dat<-pivot_longer(dat, c(method1,method2,method3,method4))
print(dat)
## # A tibble: 24 × 2
## name value
## <chr> <dbl>
## 1 method1 0.34
## 2 method2 0.91
## 3 method3 6.31
## 4 method4 17.2
## 5 method1 0.12
## 6 method2 2.94
## 7 method3 8.37
## 8 method4 11.8
## 9 method1 1.23
## 10 method2 2.14
## # … with 14 more rows
str(dat)
## tibble [24 × 2] (S3: tbl_df/tbl/data.frame)
## $ name : chr [1:24] "method1" "method2" "method3" "method4" ...
## $ value: num [1:24] 0.34 0.91 6.31 17.15 0.12 ...
Performing the ANOVA analysis:
aov.model<-aov(value~name,data=dat)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 708.7 236.2 76.29 4e-11 ***
## Residuals 20 61.9 3.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)
Comment:
---> There are too few data points per group to make a reliable assessment of normality. For constant variance, the "Residuals vs Fitted" plot shows that the spread of the residuals grows with the fitted values, so the variance does not appear to be constant.
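Not required by the assignment, but as a quick numeric cross-check of what the plots suggest, base R's Shapiro-Wilk and Bartlett tests can be run on the same fit; with only six observations per method these tests have little power, so the diagnostic plots remain the primary evidence. A minimal sketch, using the aov.model and dat objects created above:
shapiro.test(residuals(aov.model)) #Shapiro-Wilk test of normality on the ANOVA residuals
bartlett.test(value~name, data = dat) #Bartlett's test of equal variances across estimation methods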
We also use a box plot to check the variance visually:
observations <- c(.34,.12,1.23,.7,1.75,.12,
.91, 2.94,2.14,2.36,2.86,4.55,
6.31,8.37,9.75,6.09,9.82,7.24,
17.15,11.82,10.97,17.20,14.35,16.82)
estMethod <- as.factor(c(1,1,1,1,1,1,
2,2,2,2,2,2,
3,3,3,3,3,3,
4,4,4,4,4,4))
boxplot(observations~estMethod, main="Boxplot of Observations vs Estimating Method")
Comment:
---> The diagnostic plots and the box plot above show that the variance is not the same across estimation methods.
PART C:
Since the variance is not stable across the different estimation methods, we use a non-parametric test.
kruskal.test(value~name, data = dat)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
Conclusion:
---> The p-value is very small (9.771e-05), so we reject the null hypothesis at the \(\alpha = 0.05\) level of significance.
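The Kruskal-Wallis test only indicates that at least one method differs. A follow-up is not required by the assignment, but a sketch of a non-parametric pairwise comparison with base R's pairwise.wilcox.test(), using the long-format dat from above, would be:
#Pairwise Wilcoxon rank-sum tests with Holm adjustment for multiple comparisons
#(with ties present, R will warn that exact p-values cannot be computed)
pairwise.wilcox.test(dat$value, dat$name, p.adjust.method = "holm")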
PART D:
Transforming the data using Box-Cox so that we can test the hypothesis on the transformed data:
library(MASS)
boxcox(value~name, data=dat)
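The boxcox() call above plots the profile log-likelihood over a grid of \(\lambda\) values. Since boxcox() also returns that grid invisibly, a short sketch of how the numerically best \(\lambda\) could be read off is:
bc <- boxcox(value~name, data = dat, plotit = FALSE) #same model, but return the grid instead of plotting
bc$x[which.max(bc$y)] #lambda with the largest profile log-likelihood
In practice a convenient value inside the plotted confidence interval is chosen, which is what we do next.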
Based on the results of the Box-Cox analysis, we use \(\lambda = 0.5\) (a square-root transformation):
lambda = 0.5
dat2<-dat$value^(lambda)
dat2<-cbind(dat$name, dat2)
dat2<-as.data.frame(dat2)
print(dat2)
## V1 dat2
## 1 method1 0.58309518948453
## 2 method2 0.953939201416946
## 3 method3 2.51197133741609
## 4 method4 4.14125584816973
## 5 method1 0.346410161513775
## 6 method2 1.71464281994822
## 7 method3 2.89309522829789
## 8 method4 3.43802268753422
## 9 method1 1.10905365064094
## 10 method2 1.46287388383278
## 11 method3 3.1224989991992
## 12 method4 3.31209903233584
## 13 method1 0.836660026534076
## 14 method2 1.53622914957372
## 15 method3 2.46779253585061
## 16 method4 4.14728827066554
## 17 method1 1.3228756555323
## 18 method2 1.69115345252878
## 19 method3 3.13368792319848
## 20 method4 3.78813938497516
## 21 method1 0.346410161513775
## 22 method2 2.13307290077015
## 23 method3 2.69072480941474
## 24 method4 4.10121933088198
str(dat2)
## 'data.frame': 24 obs. of 2 variables:
## $ V1 : chr "method1" "method2" "method3" "method4" ...
## $ dat2: chr "0.58309518948453" "0.953939201416946" "2.51197133741609" "4.14125584816973" ...
dat2$dat2 <- as.numeric(dat2$dat2) #cbind() coerced the transformed values to character, so convert back to numeric before modelling
boxcox(dat2~dat$name, data=dat2)
---> After the transformation, the Box-Cox plot shows a value of lambda close to 1, indicating that no further transformation is needed.
dat3<-observations^(lambda)
boxplot(dat3~estMethod,xlab="Method Type",ylab="Flood Flow Frequency",main="Boxplot of Observations")
---> The variances now look more constant (though not perfect), and the range on the y-axis is much smaller.
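As an optional numeric check of the same point, the per-method variances before and after the transformation can be compared directly (using the observations, dat3, and estMethod vectors defined above):
tapply(observations, estMethod, var) #group variances on the original scale
tapply(dat3, estMethod, var) #group variances after the square-root transformation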
Testing Hypothesis:
model<-aov(dat2~dat$name,data=dat2)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## dat$name 3 32.69 10.898 81.17 2.27e-11 ***
## Residuals 20 2.69 0.134
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model)
CONCLUSION:
---> The ANOVA on the transformed data gives a very small p-value (2.27e-11), so at the 0.05 level of significance we reject the null hypothesis and conclude that the estimation method has a significant effect on mean flood flow frequency.
Looking at the residual plots of the transformed model, the variance across the different estimation methods is now much more stable.
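Not asked for in the assignment, but since the ANOVA rejects the null hypothesis, a parametric follow-up on the transformed scale could use Tukey's HSD. A sketch (tukey.dat is a hypothetical helper data frame introduced only for this example, refitting with an explicit factor so TukeyHSD() can locate the term):
tukey.dat <- data.frame(method = factor(dat$name), resp = dat$value^lambda) #transformed response with an explicit factor
TukeyHSD(aov(resp ~ method, data = tukey.dat)) #pairwise differences with family-wise 95% confidence intervals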
APPENDIX: Complete R script for this assignment
#Question 1B:
#We fit an ANOVA model so that we can examine the residuals and assess normality and constant variance
method1<-c(0.34,0.12,1.23,0.70,1.75,0.12)
method2<-c(0.91,2.94,2.14,2.36,2.86,4.55)
method3<-c(6.31,8.37,9.75,6.09,9.82,7.24)
method4<-c(17.15,11.82,10.97,17.20,14.35,16.82)
dat<-data.frame(method1,method2,method3,method4)
library(tidyr)
dat<-pivot_longer(dat, c(method1,method2,method3,method4))
print(dat)
str(dat)
#ANOVA ANALYSIS:
aov.model<-aov(value~name,data=dat)
summary(aov.model)
plot(aov.model)
#There are too few data points per group to make a reliable assessment of normality. The "Residuals vs Fitted" plot shows that the variance is not stable across methods.
observations <- c(.34,.12,1.23,.7,1.75,.12,
.91, 2.94,2.14,2.36,2.86,4.55,
6.31,8.37,9.75,6.09,9.82,7.24,
17.15,11.82,10.97,17.20,14.35,16.82)
estMethod <- as.factor(c(1,1,1,1,1,1,
2,2,2,2,2,2,
3,3,3,3,3,3,
4,4,4,4,4,4))
boxplot(observations~estMethod, main="Boxplot of Observations vs Estimating Method")
#plots show concerns about the variance of the data
#PART C:
#Since the variance is not stable across the different estimation methods, we use a non-parametric test.
kruskal.test(value~name, data = dat)
#The p-value is very small, so we reject the null hypothesis at the 0.05 level of significance.
#PART D:
#Transforming the data using Box-Cox so we can test the hypothesis on the transformed data:
library(MASS)
boxcox(value~name, data=dat)
#Based on the results of the Box-Cox analysis, we use lambda = 0.5 (a square-root transformation)
lambda = 0.5
dat2<-dat$value^(lambda)
dat2<-cbind(dat$name, dat2)
dat2<-as.data.frame(dat2)
print(dat2)
str(dat2)
dat2$dat2 <- as.numeric(dat2$dat2) #cbind() coerced the transformed values to character; convert back to numeric before modelling
boxcox(dat2~dat$name, data=dat2)
#The value of lambda is close to 1 after the Box-Cox transformation
dat3<-observations^(lambda)
boxplot(dat3~estMethod,xlab="Method Type",ylab="Flood Flow Frequency",main="Boxplot of Observations")
#Our variances then looked more constant as the range on the y axis became much smaller.
#Testing Hypothesis:
model<-aov(dat2~dat$name,data=dat2)
summary(model)
plot(model)