ANOVA

Let’s conduct a one-way analysis of variance (ANOVA).

We will use a dataset that records weight loss under different diet plans.

Let’s load the data first.

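# Read the tab-separated file; stringsAsFactors = TRUE keeps Diet a factor in R >= 4.0
wtlossdata = read.table(file = "../Dataset/DietWeigthLoss.txt", header = TRUE, sep = "\t", stringsAsFactors = TRUE)
# attach() lets us refer to the columns directly by name in the code that follows
attach(wtlossdata)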

Let’s take a brief look at the data, starting with the variable names in the dataset:

names(wtlossdata)

## [1] "WeightLoss" "Diet"

Let’s check the class of these two variables:

class(WeightLoss)

## [1] "numeric"

class(Diet)

## [1] "factor"

Since Diet is a factor, let’s see which diet plans appear in the dataset:

levels(Diet)

## [1] "A" "B" "C" "D"

For brief documentation on the ANOVA function, we can simply open its help page:

help(aov)
#or,
?aov

Let’s visualize weight loss by diet plan using boxplots:

boxplot(WeightLoss ~ Diet)

Hypothesis Testing

Let’s define the null and alternative hypotheses for the one-way ANOVA:

\(H_0\): Mean weight loss is the same for all diets.
\(H_A\): Mean weight loss differs for at least one diet.

ANOVATEST = aov(WeightLoss ~ Diet)
ANOVATEST

## Call:
##    aov(formula = WeightLoss ~ Diet)
## 
## Terms:
##                      Diet Residuals
## Sum of Squares   97.32983 296.98667
## Deg. of Freedom         3        56
## 
## Residual standard error: 2.302897
## Estimated effects may be unbalanced

Let’s look at the ANOVA table for our test:

summary(ANOVATEST)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Diet         3  97.33   32.44   6.118 0.00113 **
## Residuals   56 296.99    5.30                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
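
The entries of this table can be reproduced by hand from the sums of squares and degrees of freedom printed earlier. The following is only an illustrative sketch: the variable names are our own, and the numbers are copied from the output above rather than taken from the fitted object.

```r
# Sums of squares and degrees of freedom copied from the aov() output
ss_diet  <- 97.32983
ss_resid <- 296.98667
df_diet  <- 3
df_resid <- 56

# Mean square = sum of squares / degrees of freedom
ms_diet  <- ss_diet / df_diet
ms_resid <- ss_resid / df_resid

# F statistic = between-group mean square / residual mean square
f_value <- ms_diet / ms_resid

# Upper-tail probability of the F distribution gives Pr(>F)
p_value <- pf(f_value, df_diet, df_resid, lower.tail = FALSE)

# Residual standard error is the square root of the residual mean square
rse <- sqrt(ms_resid)

print(round(c(ms_diet = ms_diet, ms_resid = ms_resid, f_value = f_value), 3))
print(signif(p_value, 3))
print(rse)
```

The printed values should agree with the Mean Sq, F value and Pr(>F) columns of the table, and with the residual standard error reported by aov().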

Now, let’s see the components stored in the object returned by our test:

attributes(ANOVATEST)

## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "contrasts"     "xlevels"       "call"          "terms"        
## [13] "model"        
## 
## $class
## [1] "aov" "lm"
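
Each of these names can be pulled out of the fitted object with `$`. As a minimal sketch, the example below uses `PlantGrowth`, a dataset that ships with R, as a stand-in for the diet data (an assumption made only so the code runs without the original file). It also passes the data frame through the `data` argument, which is an alternative to attach().

```r
# PlantGrowth ships with R (datasets package); it is a stand-in here, not the diet data
fit <- aov(weight ~ group, data = PlantGrowth)  # data = avoids the need for attach()

# Components are accessed by name with $
print(fit$df.residual)          # residual degrees of freedom
print(head(fit$residuals))      # first few residuals
print(head(fit$fitted.values))  # first few fitted values (the group means)

# Extractor functions are the more conventional route to the same information
print(coef(fit))
print(df.residual(fit))
```

The same pattern applies to ANOVATEST, for example ANOVATEST$df.residual.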

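Now, let’s interpret the output of our hypothesis test.

First, let’s look at the model coefficients. With R’s default treatment contrasts, the intercept is the mean weight loss for diet A, and each remaining coefficient is the difference between that diet’s mean and the mean for diet A: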

ANOVATEST$coefficients

## (Intercept)       DietB       DietC       DietD 
##   9.1800000  -0.2733333   2.9333333   1.3600000
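
As a quick check on how to read these coefficients, the mean weight loss for each diet can be recovered from them. This sketch assumes the default treatment contrasts were in effect when the model was fitted; the variable names are ours and the numbers are copied from the output above.

```r
# Coefficients copied from the output above
coefs <- c("(Intercept)" = 9.18, DietB = -0.2733333, DietC = 2.9333333, DietD = 1.36)

# Assumption: treatment contrasts, so the intercept is the diet A mean
# and every other coefficient is an offset from it
group_means <- c(
  A = coefs[["(Intercept)"]],
  B = coefs[["(Intercept)"]] + coefs[["DietB"]],
  C = coefs[["(Intercept)"]] + coefs[["DietC"]],
  D = coefs[["(Intercept)"]] + coefs[["DietD"]]
)

print(group_means)
```

Diet C has the largest mean and diet B the smallest, which matches the signs of the coefficients.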

The p-value of the test (0.00113) is well below the usual 0.05 significance level, so we reject the null hypothesis and conclude that the mean weight loss differs for at least one diet.

To find out which diet plans differ significantly from one another, we can run Tukey’s honest significant difference (HSD) test:

TukeyHSD(ANOVATEST)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = WeightLoss ~ Diet)
## 
## $Diet
##           diff        lwr       upr     p adj
## B-A -0.2733333 -2.4999391 1.9532725 0.9880087
## C-A  2.9333333  0.7067275 5.1599391 0.0051336
## D-A  1.3600000 -0.8666058 3.5866058 0.3773706
## C-B  3.2066667  0.9800609 5.4332725 0.0019015
## D-B  1.6333333 -0.5932725 3.8599391 0.2224287
## D-C -1.5733333 -3.7999391 0.6532725 0.2521236
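
Every interval in this table has the same width, which is what we would expect when each diet has the same number of observations. The sketch below reconstructs that width from the studentized range distribution. It assumes 15 observations per diet (60 in total, inferred from the degrees of freedom, not read from the data file), and the variable names are ours.

```r
n_groups <- 4    # diets A to D
n_per    <- 15   # assumption: 60 observations split evenly across the 4 diets
df_resid <- 56
ms_resid <- 296.98667 / df_resid   # residual mean square from the ANOVA table

# Critical value of the studentized range distribution at the 95% family-wise level
q_crit <- qtukey(0.95, nmeans = n_groups, df = df_resid)

# Half-width of each interval when all groups have the same size
half_width <- q_crit * sqrt(ms_resid / n_per)

# Example: rebuild the interval for C-A from its estimated difference
diff_ca <- 2.9333333
ci_ca <- c(lwr = diff_ca - half_width, upr = diff_ca + half_width)

print(half_width)
print(ci_ca)
```

Under that assumption, the bounds should agree with the lwr and upr columns for the C-A row.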

To make the comparison easier, we can visualise these intervals by passing the result to plot():

plot(TukeyHSD(ANOVATEST), las=1)

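From the above plot, we can conclude that mean weight loss differs significantly between diets A and C and between diets B and C: the 95% confidence intervals for C-A and C-B do not contain zero, and their adjusted p-values (0.0051 and 0.0019) are below 0.05. The intervals for the remaining pairs, including D-B, do contain zero, so those differences are not statistically significant.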
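
The significant pairs can also be read off the Tukey table programmatically rather than by eye. This is a small sketch only: the data frame and its column names are our own, and the values are copied from the TukeyHSD() output above.

```r
# Tukey results copied from the TukeyHSD() output
tukey <- data.frame(
  pair  = c("B-A", "C-A", "D-A", "C-B", "D-B", "D-C"),
  lwr   = c(-2.4999391, 0.7067275, -0.8666058, 0.9800609, -0.5932725, -3.7999391),
  upr   = c(1.9532725, 5.1599391, 3.5866058, 5.4332725, 3.8599391, 0.6532725),
  p_adj = c(0.9880087, 0.0051336, 0.3773706, 0.0019015, 0.2224287, 0.2521236),
  stringsAsFactors = FALSE
)

# An interval excludes zero when both bounds have the same sign
tukey$excludes_zero <- tukey$lwr * tukey$upr > 0

# Pairs whose adjusted p-value is below the 5% level
significant <- tukey$pair[tukey$p_adj < 0.05]

print(tukey)
print(significant)
```

Both criteria pick out the same pairs, since an adjusted p-value below 0.05 corresponds to a 95% interval that excludes zero.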