Let’s conduct a one-way analysis of variance (ANOVA).
To perform a one-way ANOVA, we use a dataset that records weight loss under different diet plans.
So, let’s load the data first.
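# stringsAsFactors = TRUE keeps Diet as a factor (no longer the default from R 4.0.0 onwards)
wtlossdata = read.table(file = "../Dataset/DietWeigthLoss.txt", header = TRUE, sep = "\t", stringsAsFactors = TRUE)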
attach(wtlossdata)
Let’s take a brief look at the data, starting with the variable names present in the dataset:
names(wtlossdata)
## [1] "WeightLoss" "Diet"
Let’s verify the class of these two variables:
class(WeightLoss)
## [1] "numeric"
class(Diet)
## [1] "factor"
As the Diet variable is a factor, let’s see which unique diet plans we have in our dataset:
levels(Diet)
## [1] "A" "B" "C" "D"
If we want brief information about ANOVA in R, we can simply run either of the following commands to open the help page:
help(aov)
#or,
?aov
Let’s visualize the weight loss data by diet plan using boxplots:
boxplot(WeightLoss ~ Diet)
Let’s define our null and alternative hypotheses for the one-way ANOVA:
\(H_0\): Mean weight loss is the same for all the diets.
\(H_A\): Mean weight loss is different for at least one diet.
ANOVATEST = aov(WeightLoss ~ Diet)
ANOVATEST
## Call:
## aov(formula = WeightLoss ~ Diet)
##
## Terms:
## Diet Residuals
## Sum of Squares 97.32983 296.98667
## Deg. of Freedom 3 56
##
## Residual standard error: 2.302897
## Estimated effects may be unbalanced
Let’s look at the ANOVA table for our test:
summary(ANOVATEST)
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 3 97.33 32.44 6.118 0.00113 **
## Residuals 56 296.99 5.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
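The F value and p-value in this table can be reproduced by hand from the sums of squares and degrees of freedom printed earlier. Here is a minimal sketch, with the numbers copied from the output above (the variable names are only illustrative):

```r
# Sums of squares and degrees of freedom copied from the aov() output
ss_diet <- 97.32983
df_diet <- 3
ss_res  <- 296.98667
df_res  <- 56

# Mean squares: sum of squares divided by its degrees of freedom
ms_diet <- ss_diet / df_diet
ms_res  <- ss_res / df_res

# F statistic: between-group mean square over residual mean square
f_value <- ms_diet / ms_res

# p-value: upper tail of the F distribution with (3, 56) degrees of freedom
p_value <- pf(f_value, df_diet, df_res, lower.tail = FALSE)

round(c(F = f_value, p = p_value), 5)
```

The two numbers should agree with the `F value` and `Pr(>F)` columns of the summary table up to rounding.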
Now, let’s see the various attributes present in the output of our test :
attributes(ANOVATEST)
## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "contrasts" "xlevels" "call" "terms"
## [13] "model"
##
## $class
## [1] "aov" "lm"
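Now, let’s understand the output of our hypothesis test.
First, let’s look at the coefficients. The intercept is the mean weight loss for diet A (the reference level), and the other coefficients are differences from that mean: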
ANOVATEST$coefficients
## (Intercept) DietB DietC DietD
## 9.1800000 -0.2733333 2.9333333 1.3600000
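With R’s default treatment contrasts, the mean of each diet can be recovered by adding its coefficient to the intercept. Here is a small sketch using the coefficient values copied from the output above (`coefs` and `group_means` are illustrative names):

```r
# Coefficients copied from ANOVATEST$coefficients
coefs <- c("(Intercept)" = 9.18,
           DietB = -0.2733333,
           DietC = 2.9333333,
           DietD = 1.36)

# Diet A is the reference level, so its mean is the intercept itself;
# every other mean is the intercept plus that diet's coefficient
intercept <- coefs[["(Intercept)"]]
group_means <- c(A = intercept,
                 B = intercept + coefs[["DietB"]],
                 C = intercept + coefs[["DietC"]],
                 D = intercept + coefs[["DietD"]])

round(group_means, 2)
```

With the data loaded, `tapply(WeightLoss, Diet, mean)` should give the same four means directly.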
Since the p-value of the test (0.00113) is well below 0.05, we reject the null hypothesis and conclude that the mean weight loss is not the same for all diets.
To find out which diet plans differ significantly from one another, we run Tukey’s HSD test:
TukeyHSD(ANOVATEST)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = WeightLoss ~ Diet)
##
## $Diet
## diff lwr upr p adj
## B-A -0.2733333 -2.4999391 1.9532725 0.9880087
## C-A 2.9333333 0.7067275 5.1599391 0.0051336
## D-A 1.3600000 -0.8666058 3.5866058 0.3773706
## C-B 3.2066667 0.9800609 5.4332725 0.0019015
## D-B 1.6333333 -0.5932725 3.8599391 0.2224287
## D-C -1.5733333 -3.7999391 0.6532725 0.2521236
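To see where the `lwr` and `upr` bounds come from, the interval for one pair can be rebuilt from the studentized range distribution. The sketch below assumes a balanced design with 15 observations per diet. This is an assumption inferred from the 60 total observations (3 + 56 + 1 degrees of freedom) and the identical interval widths, not something stated in the data description:

```r
# Values copied from the outputs above
ms_res  <- 296.98667 / 56   # residual mean square
df_res  <- 56               # residual degrees of freedom
n_diets <- 4                # number of diet plans
n_per   <- 15               # ASSUMPTION: equal group sizes of 15

# Critical value of the studentized range at the 95% family-wise level
q_crit <- qtukey(0.95, nmeans = n_diets, df = df_res)

# Half-width of each Tukey interval
half_width <- q_crit / sqrt(2) * sqrt(ms_res * (1 / n_per + 1 / n_per))

# Rebuild the C-A interval from its estimated difference
diff_ca <- 2.9333333
c(lwr = diff_ca - half_width, upr = diff_ca + half_width)
```

If the assumption holds, the result should match the `C-A` row of the table up to rounding.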
To make the comparison easier, we can visualise these intervals by simply passing the result to the plot() function:
plot(TukeyHSD(ANOVATEST), las=1)
From the above plot, we can conclude that the mean weight loss differs significantly between diets A and C and between diets B and C: the 95% confidence intervals for C-A and C-B do not contain zero. For every other pair the interval includes zero, so we cannot conclude that those means differ.
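The reading of the plot can be cross-checked against the adjusted p-values: a pair differs significantly at the 5% level exactly when its `p adj` is below 0.05. Here is a small sketch with the values copied from the TukeyHSD output (`p_adj` and `significant` are illustrative names):

```r
# Adjusted p-values copied from the TukeyHSD() table
p_adj <- c("B-A" = 0.9880087,
           "C-A" = 0.0051336,
           "D-A" = 0.3773706,
           "C-B" = 0.0019015,
           "D-B" = 0.2224287,
           "D-C" = 0.2521236)

# Keep only the pairs whose adjusted p-value is below 0.05
significant <- names(p_adj)[p_adj < 0.05]
significant
```

With the fitted model available, the same vector can be taken from `TukeyHSD(ANOVATEST)$Diet[, "p adj"]` instead of being typed in by hand.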