mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$gear <- as.factor(mtcars$gear)
# Verify that their data type has been changed
class(mtcars$cyl)
## [1] "factor"
class(mtcars$am)
## [1] "factor"
class(mtcars$vs)
## [1] "factor"
class(mtcars$gear)
## [1] "factor"
H0: The mean weight of the cars does not differ significantly across cylinder groups (cyl = 4, 6, 8)
H1: The mean weight of the cars differs significantly across at least two cylinder groups (cyl = 4, 6, 8)
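The summary table below comes from a one-way ANOVA of weight on cylinder count. A minimal sketch of the call that likely produced it, using the model object name AnovaOneWay that is referenced later when extracting residuals, and the formula shown in the Tukey output's Fit line:
# Fit the one-way ANOVA of weight (wt) on cylinder count (cyl)
AnovaOneWay <- aov(wt ~ cyl, data = mtcars)
# Display the ANOVA table
summary(AnovaOneWay)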
## Df Sum Sq Mean Sq F value Pr(>F)
## cyl 2 18.18 9.088 22.91 1.07e-06 ***
## Residuals 29 11.50 0.397
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value of the test is 1.07e-06, which is less than the significance level alpha = 0.05. We can REJECT the null hypothesis and conclude that the mean weight of the cars differs significantly across the cylinder groups (cyl = 4, 6, 8).
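The pairwise comparisons below are Tukey's HSD applied to the fitted model; a minimal sketch, assuming the AnovaOneWay object from above:
# Tukey multiple pairwise comparisons between cylinder groups
TukeyHSD(AnovaOneWay)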
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = wt ~ cyl, data = mtcars)
##
## $cyl
## diff lwr upr p adj
## 6-4 0.8314156 0.07939155 1.583440 0.0278777
## 8-4 1.7134870 1.08680032 2.340174 0.0000006
## 8-6 0.8820714 0.16206323 1.602080 0.0138630
The ANOVA test assumes that the data are normally distributed and that the variance across groups is homogeneous. We can check these assumptions with diagnostic plots and formal tests such as Levene's test.
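Levene's test for homogeneity of variance is provided by the car package (hence the carData loading message below); a minimal sketch of the call that produces this output:
# Levene's test for homogeneity of variance across cylinder groups
library(car)
leveneTest(wt ~ cyl, data = mtcars)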
## Loading required package: carData
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.4995 0.612
## 29
From the output above we can see that the p-value (0.612) is greater than the significance level of 0.05. This means there is no evidence that the variance in car weight differs significantly across the three cylinder groups. Therefore, we can assume homogeneity of variance in weight across the three cylinder groups.
An alternative procedure (the Welch one-way test), which does not require the equal-variance assumption, is implemented in the function oneway.test().
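A minimal sketch of the Welch test call that produces the output below (oneway.test() does not assume equal variances by default):
# Welch one-way test of weight across cylinder groups
oneway.test(wt ~ cyl, data = mtcars)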
##
## One-way analysis of means (not assuming equal variances)
##
## data: wt and cyl
## F = 20.249, num df = 2.000, denom df = 18.974, p-value = 1.963e-05
# extract the residuals
aov_residuals <- residuals(object = AnovaOneWay)
# run Shapiro-Wilk test
shapiro.test(x = aov_residuals)
##
## Shapiro-Wilk normality test
##
## data: aov_residuals
## W = 0.9025, p-value = 0.007175
The Shapiro-Wilk p-value (0.007) is below 0.05, indicating that the normality assumption is violated. Note that a non-parametric alternative to one-way ANOVA is the Kruskal-Wallis rank sum test, which can be used when the ANOVA assumptions are not met.
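A minimal sketch of the Kruskal-Wallis call that produces the output below:
# Kruskal-Wallis rank sum test of weight across cylinder groups
kruskal.test(wt ~ cyl, data = mtcars)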
##
## Kruskal-Wallis rank sum test
##
## data: wt by cyl
## Kruskal-Wallis chi-squared = 22.807, df = 2, p-value = 1.116e-05
H0: The mean weight of the cars does not differ significantly across cylinder groups (cyl = 4, 6, 8) or transmission types (am = 0, 1)
H1: The mean weight of the cars differs significantly across cylinder groups (cyl = 4, 6, 8) and/or transmission types (am = 0, 1)
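The summary table below corresponds to an additive two-way ANOVA using the formula shown in the Tukey output's Fit line. A minimal sketch, with AnovaTwoWay as a hypothetical name for the model object:
# Fit the additive two-way ANOVA of weight on transmission type (am) and cylinder count (cyl)
# (AnovaTwoWay is an assumed object name, not shown in the original code)
AnovaTwoWay <- aov(wt ~ am + cyl, data = mtcars)
# Display the ANOVA table
summary(AnovaTwoWay)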
## Df Sum Sq Mean Sq F value Pr(>F)
## am 1 14.232 14.232 48.49 1.43e-07 ***
## cyl 2 7.228 3.614 12.31 0.000146 ***
## Residuals 28 8.219 0.294
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-values for both factors (1.43e-07 for am and 0.000146 for cyl) are less than the significance level alpha = 0.05. We can reject the null hypothesis and conclude that the mean weight of the cars differs significantly across both cylinder groups (cyl = 4, 6, 8) and transmission types (am = 0, 1).
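A minimal sketch of the Tukey HSD call behind the output below, using the hypothetical AnovaTwoWay object from above:
# Tukey multiple pairwise comparisons for transmission type and cylinder groups
TukeyHSD(AnovaTwoWay)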
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = wt ~ am + cyl, data = mtcars)
##
## $am
## diff lwr upr p adj
## 1-0 -1.357895 -1.757343 -0.9584467 1e-07
##
## $cyl
## diff lwr upr p adj
## 6-4 0.4258107 -0.2223305 1.073952 0.2518163
## 8-4 0.9199122 0.3797945 1.460030 0.0006710
## 8-6 0.4941015 -0.1264464 1.114649 0.1383557