Reading the data

Convert numeric or character columns into factor variables using as.factor(). Each line of output below confirms that one converted column now has class "factor".

## [1] "factor"
## [1] "factor"
## [1] "factor"
## [1] "factor"
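The conversion step can be sketched as follows. This is a minimal sketch, assuming the data set is the built-in mtcars (it is named in the Tukey output further down). The output above shows four converted columns, but only cyl and am are used in the rest of the analysis, so only those two are converted here; which other two columns were converted in the original is not recoverable.

```r
# mtcars ships with base R (package 'datasets'), so no file has to be read.
data(mtcars)

# Assumption: cyl and am are the grouping variables used later on,
# so they are the columns converted to factors here.
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am  <- as.factor(mtcars$am)

# Confirm the conversion and inspect the group labels.
class(mtcars$cyl)
class(mtcars$am)
levels(mtcars$cyl)
levels(mtcars$am)
```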

Comparing Means (ANOVA)

One Way ANOVA

Goal: test whether the mean weight of the cars (wt) differs across the cylinder groups (cyl = 4, 6, 8).

H0: The mean weight of the cars is the same for all cylinder groups (cyl = 4, 6, 8).

H1: The mean weight of the cars differs for at least one cylinder group.

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## cyl          2  18.18   9.088   22.91 1.07e-06 ***
## Residuals   29  11.50   0.397                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value of the test is 1.07e-06, which is less than the significance level alpha = 0.05. We can reject the null hypothesis and conclude that the mean weight of the cars differs significantly across the cylinder groups (cyl = 4, 6, 8).
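A table like the one above can be obtained with aov() and summary(). The sketch below assumes the built-in mtcars data with cyl converted to a factor; the object names res_aov, aov_table and p_value are illustrative and not taken from the original.

```r
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)

# Fit the one-way ANOVA: weight explained by cylinder group.
res_aov <- aov(wt ~ cyl, data = mtcars)

# summary() returns a list; its first element is the ANOVA table.
aov_table <- summary(res_aov)[[1]]
print(aov_table)

# Pull out the p-value for cyl and compare it with alpha = 0.05.
p_value <- aov_table[["Pr(>F)"]][1]
p_value < 0.05
```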

Pairwise comparison using Tukey’s Post-Hoc test

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = wt ~ cyl, data = mtcars)
## 
## $cyl
##          diff        lwr      upr     p adj
## 6-4 0.8314156 0.07939155 1.583440 0.0278777
## 8-4 1.7134870 1.08680032 2.340174 0.0000006
## 8-6 0.8820714 0.16206323 1.602080 0.0138630
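The pairwise comparisons above correspond to a call to TukeyHSD() on the fitted model. A minimal sketch, again assuming the built-in mtcars data; res_aov and tukey are illustrative names.

```r
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)

res_aov <- aov(wt ~ cyl, data = mtcars)

# Tukey's honest significant differences at a 95% family-wise level.
tukey <- TukeyHSD(res_aov, conf.level = 0.95)
print(tukey)

# The result is a list with one matrix per factor; keep the pairs
# whose adjusted p-value is below 0.05.
sig_pairs <- rownames(tukey$cyl)[tukey$cyl[, "p adj"] < 0.05]
sig_pairs
```

A pair is significantly different when its adjusted p-value is below 0.05, which is equivalent to its confidence interval (lwr, upr) excluding zero.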

Check ANOVA Assumptions

The ANOVA test assumes that the data are normally distributed and that the variance is homogeneous across groups. We can check these assumptions with diagnostic plots and formal tests.

Check The Homogeneity Of Variance Assumption

Levene’s Test for Homogeneity of Variance

## Loading required package: carData
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.4995  0.612
##       29

From the output above we can see that the p-value (0.612) is greater than the significance level of 0.05. There is therefore no evidence that the variance in weight differs across the three cylinder groups, and we can assume homogeneity of variance.
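Levene's test is not part of base R. The "Loading required package: carData" message above suggests it came from the car package, so the sketch below assumes car is installed and skips the test otherwise. The name lev is illustrative.

```r
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)

# leveneTest() lives in the 'car' package (assumed to be installed).
if (requireNamespace("car", quietly = TRUE)) {
  # The default centre is the median, matching the header of the output above.
  lev <- car::leveneTest(wt ~ cyl, data = mtcars)
  print(lev)

  # TRUE means there is no evidence against equal variances at alpha = 0.05.
  print(lev[["Pr(>F)"]][1] > 0.05)
} else {
  message("Package 'car' is not installed; skipping Levene's test.")
}
```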

Relaxing the homogeneity of variance assumption

An alternative procedure, the Welch one-way test, does not require that assumption; it is implemented in the function oneway.test().

## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  wt and cyl
## F = 20.249, num df = 2.000, denom df = 18.974, p-value = 1.963e-05
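The Welch test above can be reproduced with oneway.test(), which does not assume equal variances by default (var.equal = FALSE). A short sketch on the built-in mtcars data; welch is an illustrative name.

```r
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)

# Welch one-way test: var.equal = FALSE is the default and is
# spelled out here only to make the choice visible.
welch <- oneway.test(wt ~ cyl, data = mtcars, var.equal = FALSE)
print(welch)

# The components of the result can be accessed individually.
welch$statistic   # F statistic
welch$parameter   # numerator and (approximate) denominator df
welch$p.value
```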

Check the Normality Assumption

Normality plot of residuals

Shapiro-Wilk Test

## 
##  Shapiro-Wilk normality test
## 
## data:  aov_residuals
## W = 0.9025, p-value = 0.007175
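The normality check can be sketched as below. The name aov_residuals comes from the output above; taking the residuals from the one-way model wt ~ cyl and drawing the plot with qqnorm() and qqline() are assumptions, since the original code and figure are not shown.

```r
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)

res_aov <- aov(wt ~ cyl, data = mtcars)

# Assumption: the residuals tested are those of the one-way model.
aov_residuals <- residuals(res_aov)

# Normal Q-Q plot: points close to the line suggest normal residuals.
qqnorm(aov_residuals)
qqline(aov_residuals)

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(aov_residuals)
print(sw)
```

A p-value below 0.05, as in the output above, indicates that the residuals depart from a normal distribution.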

Non-Parametric Test Alternative To One-Way ANOVA Test

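A non-parametric alternative to one-way ANOVA is the Kruskal-Wallis rank sum test, which can be used when the ANOVA assumptions are not met. Here the Shapiro-Wilk test above gives p = 0.007175, below 0.05, so the residuals depart from normality and the rank-based test is a reasonable cross-check.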

## 
##  Kruskal-Wallis rank sum test
## 
## data:  wt by cyl
## Kruskal-Wallis chi-squared = 22.807, df = 2, p-value = 1.116e-05
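The Kruskal-Wallis output above corresponds to kruskal.test() from base R. A minimal sketch on the built-in mtcars data; kw is an illustrative name.

```r
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)

# Rank-based comparison of wt across the three cylinder groups.
kw <- kruskal.test(wt ~ cyl, data = mtcars)
print(kw)

# Reject the hypothesis of identical distributions if p < 0.05.
kw$p.value < 0.05
```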

Two Way ANOVA

Goal: test whether the mean weight of the cars differs across cylinder groups (cyl = 4, 6, 8) and transmission types (am = 0, 1).

H0: The mean weight of the cars is the same across cylinder groups (cyl = 4, 6, 8) and across transmission types (am = 0, 1).

H1: The mean weight of the cars differs for at least one cylinder group or transmission type.

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## am           1 14.232  14.232   48.49 1.43e-07 ***
## cyl          2  7.228   3.614   12.31 0.000146 ***
## Residuals   28  8.219   0.294                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-values for am (1.43e-07) and cyl (0.000146) are both less than the significance level alpha = 0.05. We can reject the null hypothesis and conclude that the mean weight of the cars differs significantly across cylinder groups (cyl = 4, 6, 8) and transmission types (am = 0, 1).
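The two-way table above matches an additive model without an interaction term, wt ~ am + cyl, which is the formula reported in the Tukey output further down. A sketch on the built-in mtcars data; res_aov2 and aov2_table are illustrative names.

```r
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am  <- as.factor(mtcars$am)

# Additive two-way ANOVA: main effects of am and cyl, no interaction.
res_aov2 <- aov(wt ~ am + cyl, data = mtcars)
aov2_table <- summary(res_aov2)[[1]]
print(aov2_table)

# p-values for the two main effects (row names are space-padded,
# so they are trimmed before being used as labels).
p_values <- aov2_table[["Pr(>F)"]][1:2]
names(p_values) <- trimws(rownames(aov2_table))[1:2]
p_values
```

aov() uses sequential (Type I) sums of squares, so in an unbalanced design such as this one the order of the terms in the formula affects each term's sum of squares and p-value.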

Pairwise comparison using Tukey’s Post-Hoc test

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = wt ~ am + cyl, data = mtcars)
## 
## $am
##          diff       lwr        upr p adj
## 1-0 -1.357895 -1.757343 -0.9584467 1e-07
## 
## $cyl
##          diff        lwr      upr     p adj
## 6-4 0.4258107 -0.2223305 1.073952 0.2518163
## 8-4 0.9199122  0.3797945 1.460030 0.0006710
## 8-6 0.4941015 -0.1264464 1.114649 0.1383557
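For the two-way model, TukeyHSD() returns one comparison table per factor, as in the output above. A minimal sketch on the built-in mtcars data; res_aov2 and tukey2 are illustrative names.

```r
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am  <- as.factor(mtcars$am)

res_aov2 <- aov(wt ~ am + cyl, data = mtcars)

# One table of pairwise differences for each factor in the model.
tukey2 <- TukeyHSD(res_aov2, conf.level = 0.95)
print(tukey2)

# Restrict the comparison to a single factor with the 'which' argument.
tukey_cyl <- TukeyHSD(res_aov2, which = "cyl")
tukey_cyl$cyl[, "p adj"] < 0.05
```

In the output above, only the 8-4 cylinder comparison remains significant once transmission type is in the model, while the 6-4 and 8-6 intervals both include zero.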