1 Assignment on VST and Box-Cox transform

1.1 Question 1.a

  • Linear Effects Equation:
    • \(y_{i,j} = \mu + \tau_{i} + \epsilon _{i,j}\)

      Where \(\mu\): overall mean; \(\tau_{i}\): effect of treatment \(i\); \(\epsilon_{i,j}\): random error of observation \(j\) under treatment \(i\)

  • Hypothesis:
    • \(H_{0}: \tau_{i} = 0\) for all \(i\)
    • \(H_{a}: \tau_{i} \neq 0\) for at least one \(i\)
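
As a sketch of how this hypothesis is tested under the model above: with \(a\) treatments, \(n_i\) observations in treatment \(i\) and \(N\) observations in total (symbols introduced here for illustration), one-way ANOVA compares the between-treatment and within-treatment mean squares.

```latex
F_0 = \frac{SS_{\text{Treatments}}/(a-1)}{SS_{E}/(N-a)}, \qquad
SS_{\text{Treatments}} = \sum_{i=1}^{a} n_i\,(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot})^2, \qquad
SS_{E} = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (y_{i,j}-\bar{y}_{i\cdot})^2
```

For this data set \(a = 4\) and \(N = 24\), so \(F_0\) is referred to an F distribution with 3 and 20 degrees of freedom, provided the errors are normal with equal variance.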

1.2 Question 1.b

Entering the given data and reshaping it into long format for the tests.

pop1<- c(.34,   .12,    1.23,   .70,    1.75,   .12)
pop2<- c(.91,   2.94,   2.14,   2.36,   2.86,   4.55)
pop3<- c(6.31,  8.37,   9.75,   6.09,   9.82,   7.24)
pop4<- c(17.15, 11.82,  10.97,  17.20,  14.35,  16.82)
dframe<- cbind(pop1,pop2,pop3,pop4)
dat<- data.frame(pop1,pop2,pop3,pop4)
library(tidyr)
dat<- pivot_longer(dat,c(pop1,pop2,pop3,pop4))
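# pivot_longer() returns the rows in the order pop1, pop2, pop3, pop4, pop1, ...,
# so the numeric group code is derived from the name column
dat$factor<- as.integer(factor(dat$name))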

Check normality and equality of variance in the data.

#Check for normality
qqnorm(dat$value)

## The data do not look normal; the number of samples is also too small to confirm this.

#Check for equal variance
boxplot(dframe, xlab="population(methods)",ylab= "value", 
        main= "Boxplot of all methods or populations" )

# the variances are not equal, as the size of each box differs
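
The visual impression from the boxplot can be backed up with per-group summaries. The sketch below is not part of the original analysis: it rebuilds the data in long format with base R (the name `dat_long` is introduced here) and regresses log standard deviation on log mean. Under the usual variance-stabilizing assumption that the standard deviation is proportional to the mean raised to a power alpha, 1 - alpha is a rough suggestion for the power transformation.

```r
# Data re-entered with base R (assumption: same values as in the assignment)
pop1 <- c(.34, .12, 1.23, .70, 1.75, .12)
pop2 <- c(.91, 2.94, 2.14, 2.36, 2.86, 4.55)
pop3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
pop4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dat_long <- data.frame(
  name  = rep(c("pop1", "pop2", "pop3", "pop4"), each = 6),
  value = c(pop1, pop2, pop3, pop4)
)

# Mean and standard deviation of each population
grp_mean <- sapply(split(dat_long$value, dat_long$name), mean)
grp_sd   <- sapply(split(dat_long$value, dat_long$name), sd)
print(round(cbind(mean = grp_mean, sd = grp_sd), 3))

# Slope of log(sd) on log(mean) estimates alpha in sd ~ mean^alpha
alpha <- unname(coef(lm(log(grp_sd) ~ log(grp_mean)))[2])
lambda_vst <- 1 - alpha   # rough suggestion for the power transformation
print(c(alpha = alpha, lambda_vst = lambda_vst))
```

If the standard deviation grows with the mean, as the differing box sizes suggest, the equal-variance assumption of ANOVA is doubtful on the raw scale.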
  • Comment: The plots above show neither normality nor equal variance in the data.

1.3 Question 1.c

Performing the Kruskal-Wallis test.

#Question 1.c (Kruskal-Wallis test)
kruskal.test(value~name, data = dat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by name
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05
## The p-value is very small, so the null hypothesis is rejected.
  • Comment: The p-value (9.771e-05) is very small, so the null hypothesis is rejected. At least one population differs significantly from the others.
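
To make clear what the test statistic measures, the following sketch recomputes it by hand from the rank sums, including the standard correction for ties (the value 0.12 occurs twice). The variable names are introduced here for illustration; the data are re-entered so that the block runs on its own.

```r
pop1 <- c(.34, .12, 1.23, .70, 1.75, .12)
pop2 <- c(.91, 2.94, 2.14, 2.36, 2.86, 4.55)
pop3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
pop4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
value <- c(pop1, pop2, pop3, pop4)
group <- rep(c("pop1", "pop2", "pop3", "pop4"), each = 6)

N <- length(value)
r <- rank(value)                 # tied values receive their average rank
R <- tapply(r, group, sum)       # rank sum of each population
n <- tapply(r, group, length)    # sample size of each population

# Kruskal-Wallis statistic before the tie correction
H_raw <- 12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
ties <- table(value)
H <- H_raw / (1 - sum(ties^3 - ties) / (N^3 - N))

# Chi-squared approximation with (number of groups - 1) degrees of freedom
p_value <- pchisq(H, df = length(R) - 1, lower.tail = FALSE)
print(c(H = H, p_value = p_value))
```

The result should agree with the `kruskal.test()` output reported above.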

1.4 Question 1.d

Performing the Box-Cox transformation.

#Question 1.d (Box-Cox transformation and ANOVA)
#install.packages("MASS")
library(MASS)
boxcox(value~name, data = dat)

# the plot gives a lambda value of about 0.5
  • The plot gives a lambda value of about 0.5, which corresponds to a square-root transformation.
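
Instead of reading lambda off the plot, the maximizing value can also be extracted from the object that `boxcox()` returns. This is a minimal sketch, assuming the data are rebuilt in long format with base R (`dat_long`, `lambda_hat` and `ci` are names introduced here); with `plotit = FALSE` the profile is evaluated on the default grid from -2 to 2 in steps of 0.1, so the result is only accurate to that grid.

```r
library(MASS)

pop1 <- c(.34, .12, 1.23, .70, 1.75, .12)
pop2 <- c(.91, 2.94, 2.14, 2.36, 2.86, 4.55)
pop3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
pop4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dat_long <- data.frame(
  name  = rep(c("pop1", "pop2", "pop3", "pop4"), each = 6),
  value = c(pop1, pop2, pop3, pop4)
)

# Profile log-likelihood over the default lambda grid, without drawing the plot
bc <- boxcox(value ~ name, data = dat_long, plotit = FALSE)

# Grid value with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% confidence interval: lambdas whose log-likelihood lies
# within qchisq(0.95, 1) / 2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])
print(lambda_hat)
print(ci)
```

In practice a convenient value inside the interval, such as 0.5, is usually preferred over the exact maximizer because it is easier to interpret.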

Transform the data using this lambda value.
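
For reference, the Box-Cox family is usually written in the following form (the notation \(y^{(\lambda)}\) is introduced here and does not appear in the assignment):

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[2ex]
\ln y, & \lambda = 0
\end{cases}
```

For \(\lambda = 0.5\) this gives \(y^{(0.5)} = 2(\sqrt{y} - 1)\), an increasing linear function of \(\sqrt{y}\). The plain power \(y^{\lambda}\) used in the code that follows therefore leads to the same ANOVA F statistic as the full Box-Cox form.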

lambda=0.5
dat2<-dat$value^(lambda)
dat2<-cbind (dat$name,dat2)
dat2<- data.frame(dat2)
str(dat2)
## 'data.frame':    24 obs. of  2 variables:
##  $ V1  : chr  "pop1" "pop2" "pop3" "pop4" ...
##  $ dat2: chr  "0.58309518948453" "0.953939201416946" "2.51197133741609" "4.14125584816973" ...
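# cbind() stored the transformed values as text (see str() above); convert them back to numbers
dat2$dat2<- as.numeric(dat2$dat2)
boxcox(dat2~dat$name, data=dat2)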

# lambda = 1 lies within the confidence interval, so no further transformation is needed before ANOVA
  • Comment: lambda = 1 lies within the 95% confidence interval of the new Box-Cox plot, so the transformed data need no further transformation and ANOVA can be performed.
#Hypothesis testing
Hyptest<-aov(dat2~dat$name,data=dat2)
summary(Hyptest)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## dat$name     3  32.69  10.898   81.17 2.27e-11 ***
## Residuals   20   2.69   0.134                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(Hyptest)

  • The ANOVA summary shows that the methods (populations) differ significantly: F = 81.17 on 3 and 20 degrees of freedom, p = 2.27e-11.
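
The diagnostic plots can be complemented by formal checks on the transformed scale. The sketch below is an addition, not part of the original analysis: it refits the ANOVA on the square root of the data, built with base R under the names `dat_long` and `value_sqrt` (introduced here), and applies the Shapiro-Wilk test to the residuals and Bartlett's test to the group variances.

```r
pop1 <- c(.34, .12, 1.23, .70, 1.75, .12)
pop2 <- c(.91, 2.94, 2.14, 2.36, 2.86, 4.55)
pop3 <- c(6.31, 8.37, 9.75, 6.09, 9.82, 7.24)
pop4 <- c(17.15, 11.82, 10.97, 17.20, 14.35, 16.82)
dat_long <- data.frame(
  name  = rep(c("pop1", "pop2", "pop3", "pop4"), each = 6),
  value = c(pop1, pop2, pop3, pop4)
)

# Square-root transformation (lambda = 0.5)
dat_long$value_sqrt <- sqrt(dat_long$value)

# One-way ANOVA on the transformed response
fit <- aov(value_sqrt ~ name, data = dat_long)
f_val <- summary(fit)[[1]][["F value"]][1]
print(f_val)

# Normality of the residuals
sw <- shapiro.test(residuals(fit))
print(sw)

# Equality of variances across populations
bt <- bartlett.test(value_sqrt ~ name, data = dat_long)
print(bt)
```

Large p-values in both tests would be consistent with the assumptions of ANOVA holding after the transformation. With only six observations per group, these tests have limited power and should be read alongside the plots.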

2 Complete R Code

pop1<- c(.34,   .12,    1.23,   .70,    1.75,   .12)
pop2<- c(.91,   2.94,   2.14,   2.36,   2.86,   4.55)
pop3<- c(6.31,  8.37,   9.75,   6.09,   9.82,   7.24)
pop4<- c(17.15, 11.82,  10.97,  17.20,  14.35,  16.82)
dframe<- cbind(pop1,pop2,pop3,pop4)
dat<- data.frame(pop1,pop2,pop3,pop4)
library(tidyr)
dat<- pivot_longer(dat,c(pop1,pop2,pop3,pop4))
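# pivot_longer() returns the rows in the order pop1, pop2, pop3, pop4, pop1, ...,
# so the numeric group code is derived from the name column
dat$factor<- as.integer(factor(dat$name))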
#Question 1.b
#Check for normality
qqnorm(dat$value)
## The data do not look normal; the number of samples is also too small to confirm this.

#Check for equal variance
boxplot(dframe, xlab="population(methods)",ylab= "value", 
        main= "Boxplot of all methods or populations" )
# the variances are not equal, as the size of each box differs

#Question 1.c (Kruskal-Wallis test)
kruskal.test(value~name, data = dat)
## The p-value is very small, so the null hypothesis is rejected.

#Question 1.d (Box-Cox transformation and ANOVA)
#install.packages("MASS")
library(MASS)
boxcox(value~name, data = dat)
# the plot gives a lambda value of about 0.5

lambda=0.5
dat2<-dat$value^(lambda)
dat2<-cbind (dat$name,dat2)
dat2<- data.frame(dat2)
str(dat2)

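# cbind() stored the transformed values as text; convert them back to numbers
dat2$dat2<- as.numeric(dat2$dat2)
boxcox(dat2~dat$name, data=dat2)
# lambda = 1 lies within the confidence interval, so no further transformation is needed before ANOVA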

#Hypothesis testing
Hyptest<-aov(dat2~dat$name,data=dat2)
summary(Hyptest)
plot(Hyptest)