library(DescTools)
library(MASS)
library(car)

Exercise 1: Analysis of Variance

The heartbpchol.csv data set contains continuous cholesterol (Cholesterol) and blood pressure status (BP_Status) (category: High/ Normal/ Optimal) for alive patients. For the heartbpchol.xlsx data set, consider a one-way ANOVA model to identify differences between group cholesterol means. The normality assumption is reasonable, so you can proceed without testing normality.

a) Exercise 1A:

Perform a one-way ANOVA for Cholesterol with BP_Status as the categorical predictor. Comment on statistical significance of BP_Status, the amount of variation described by the model, and whether or not the equal variance assumption can be trusted.

Importing data

heart <- read.csv("heartbpchol.csv")
str(heart)
## 'data.frame':    541 obs. of  2 variables:
##  $ Cholesterol: int  221 188 292 319 205 247 202 150 228 280 ...
##  $ BP_Status  : chr  "Optimal" "High" "High" "Normal" ...

Since, the normality assumptions is reasonable, we will proceed with the one way ANOVA test

One-way ANOVA (Cholesterol ~ BP_Status)

aov.heart <- aov(Cholesterol ~ BP_Status, heart)
summary(aov.heart)
##              Df  Sum Sq Mean Sq F value  Pr(>F)   
## BP_Status     2   25211   12605   6.671 0.00137 **
## Residuals   538 1016631    1890                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: Since p-value is 0.00317 (below significance level of 0.05), we have enough evidence to reject Ho. This means that at least one observation of BP Status has a significant effect on Cholesterol (at least one of them has a different mean of Cholesterol)

R-Square (Variation of response variable explained by BP_Status)

lm.res <- lm(Cholesterol ~ BP_Status, heart)
summary(lm.res)$r.squared
## [1] 0.02419833

Observation: Only 2.4% of the variation of Cholesterol can be explained by the model. This means that the remaining 97.58% is from the Sum of Squares of residuals, and the main effect of the mean of the Cholesterol, doesn’t come from the Blood Pressure, but from the Residuals.

Levene Test for Equal Variance Assumption

LeveneTest(aov.heart)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  0.1825 0.8332
##       538

Conclusion: Since the p-value is 0.8332 (above significance level of 0.05), it’s safe to say that we do not have enough evidence to reject the Ho. This means that we can assume all BP_Status groups have equal variance.

Normality Check

par(mfrow=c(1,1)) # Diagnostic plot
plot(aov.heart, 2)

Observation: From the analysis of QQPlot above, we can conclude that the variance is not much different between the groups, hence the variance equality assumption is reasonable.

b) Exercise 1B:

Comment on any significantly different cholesterol means as determined by the post-hoc test comparing all pairwise differences. Specifically explain what that tells us about differences in cholesterol levels across blood pressure status groups, like which group has the highest or lowest mean values of Cholesterol.

ScheffeTest(aov.heart)
## 
##   Posthoc multiple comparisons of means: Scheffe Test 
##     95% family-wise confidence level
## 
## $BP_Status
##                      diff    lwr.ci    upr.ci   pval    
## Normal-High    -11.543481 -21.35092 -1.736038 0.0159 *  
## Optimal-High   -18.646679 -33.46702 -3.826341 0.0089 ** 
## Optimal-Normal  -7.103198 -21.81359  7.607194 0.4958    
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion:

  • From the Scheffe Test above, we can see that the Normal-High group and the Optimal-High group have different mean values (below significance level of 0.05) of Cholesterol, and thus we can reject Ho. We can also see that the Optimal-Normal group, has the least significant mean value of Cholesterol (above significance level of 0.05), which means that Optimal and Normal BP_Status both have equal means.

More specifically, below are some of the effects of BP_Status on Cholesterol that can be seen:

  • High > Normal (Mean value of Cholesterol for High is higher than mean Cholesterol of Normal)
  • High > Optimal (Mean value of Cholesterol for High is higher than mean Cholesterol of Optimal)
  • Optimal = Normal (mean Cholesterol of Optimal is the same as mean Cholesterol of Normal)

Exercise 2: Analysis of Variance

For this problem use the bupa.csv data set. Check UCI Machine Learning Repository for more information (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders). The mean corpuscular volume and alkaline phosphate are blood tests thought to be sensitive to liver disorder related to excessive alcohol consumption. We assume that normality and independence assumptions are valid.

a) Exercise 2A:

Perform a one-way ANOVA for mcv as a function of drinkgroup. Comment on significance of the drinkgroup, the amount of variation described by the model, and whether or not the equal variance assumption can be trusted.

Importing data

bupa <- read.csv("bupa.csv")
bupa$drinkgroup <- as.factor(bupa$drinkgroup)
str(bupa)
## 'data.frame':    345 obs. of  3 variables:
##  $ mcv       : int  85 85 86 91 87 98 88 88 92 90 ...
##  $ alkphos   : int  92 64 54 78 70 55 62 67 54 60 ...
##  $ drinkgroup: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...

One-way ANOVA (mcv ~ drinkgroup)

aov.mcv <- aov(mcv ~ drinkgroup, bupa)
summary(aov.mcv)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## drinkgroup    4    733  183.29   10.26 7.43e-08 ***
## Residuals   340   6073   17.86                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: Since p-value is 7.43e-08 (below significance level of 0.05), it’s safe to say that we can reject the Ho. This means that at least one observation of drinkgroup has a significance effect on mcv (at least one of them has a different mean of mcv)

R-Square (Variation of response variable explained by drinkgroup)

lm.res_mcv <- lm(mcv ~ drinkgroup, bupa)
summary(lm.res_mcv)$r.squared
## [1] 0.1077214

Observation: Only 10.78% of the variation of mcv can be explained by the drinkgroup

Levene Test for Equal Variance Assumption

LeveneTest(aov.mcv)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   4  0.3053 0.8744
##       340

Conclusion: Since the p-value is 0.8744 (above significance level of 0.05), we do not have enough evidence to reject the Ho. This means that we can assume all drinkgroups groups have equal variance.

Normality Check

par(mfrow=c(1,1)) # Diagnostic plot 
plot(aov.mcv, 2)

Observation: From the analysis of QQPlot, the variance in the groups is equal, hence the variance equality assumption is reasonable.

b) Exercise 2B:

Perform a one-way ANOVA for alkphos as a function of drinkgroup. Comment on statistical significance of the drinkgroup, the amount of variation described by the model, and whether or not the equal variance assumption can be trusted.

One-way ANOVA (alkphos ~ drinkgroup)

aov.alkphos <- aov(alkphos ~ drinkgroup, bupa)
summary(aov.alkphos)
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## drinkgroup    4   4946  1236.4   3.792 0.00495 **
## Residuals   340 110858   326.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: Since p-value is 0.00495 (below significance level of 0.05), we have enough evidence to reject the Ho. This means that at least one observation of drinkgroup has a significance effect on alkphos (at least one of them has a different mean of alkphos)

R-Square (Variation of response variable explained by drinkgroup)

lm.alkphos <- lm(alkphos ~ drinkgroup, bupa)
summary(lm.alkphos)$r.squared
## [1] 0.04270721

Observation: Only 4.27% of variation alkphos can be explained by the drinkgroup.This means that the remaining 95.73% is from the Sum of Squares of residuals, and the main effect of the mean of the alkphos, doesn’t come from the drinkgroup, but from the Residuals.

Levene Test for Equal Variance Assumption

LeveneTest(aov.alkphos)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   4  0.8089 0.5201
##       340

Conclusion: Since the p-value is 0.5201 (above significance level of 0.05), we cannot reject the Ho. This means that we can assume all drinkgroup groups have equal variance.

Normality Check

par(mfrow=c(1,1)) # Diagnostic plot 
plot(aov.alkphos, 2)

Observation: From the analysis of QQPlot, we see that the variance in the groups is not that different, hence the variance equality assumption is reasonable.

c) Exercise 2C:

Perform post-hoc tests for models in a) and b). Comment on any similarities or differences you observe from their results.

mcv:

ScheffeTest(aov.mcv)
## 
##   Posthoc multiple comparisons of means: Scheffe Test 
##     95% family-wise confidence level
## 
## $drinkgroup
##             diff      lwr.ci   upr.ci    pval    
## 2-1  1.241452991 -0.94020481 3.423111  0.5410    
## 3-1  0.938131313 -0.90892674 2.785189  0.6495    
## 4-1  3.744610282  1.73913894 5.750082 1.9e-06 ***
## 5-1  3.746031746  0.64379565 6.848268  0.0081 ** 
## 3-2 -0.303321678 -2.59291786 1.986275  0.9966    
## 4-2  2.503157290  0.08395442 4.922360  0.0380 *  
## 5-2  2.504578755 -0.87987039 5.889028  0.2646    
## 4-3  2.806478969  0.68408993 4.928868  0.0025 ** 
## 5-3  2.807900433 -0.37116998 5.986971  0.1151    
## 5-4  0.001421464 -3.27222796 3.275071  1.0000    
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion:

From the Scheffe Test above, we can see that the groups 4-1, 5-1, 4-2, and 4-3 have different mean values (below significance level of 0.05) of mcv, and thus we have enough evidence to reject the null. We can also see that groups 2-1, 3-1, 3-2, 5-2, 5-3, and 5-4 do not have different mean values (above significance level of 0.05) of mcv, and thus we do not have enough evidence to reject the null.

Below are some of the effects of drinkgroup on mcv that can be seen:

  • 4 > 1 (mean mcv of 4 is greater than mean mcv of 1)
  • 5 > 1 (mean mcv of 5 is greater than mean mcv of 1)
  • 4 > 2 (mean mcv of 4 is greater than mean mcv of 2)
  • 4 > 3 (mean mcv of 4 is greater than mean mcv of 3)

alkphos:

ScheffeTest(aov.alkphos)
## 
##   Posthoc multiple comparisons of means: Scheffe Test 
##     95% family-wise confidence level
## 
## $drinkgroup
##          diff      lwr.ci    upr.ci   pval    
## 2-1 -2.645299 -11.9663647  6.675766 0.9419    
## 3-1 -4.056138 -11.9476367  3.835360 0.6389    
## 4-1 -1.148743  -9.7170578  7.419571 0.9965    
## 5-1 12.572650  -0.6815582 25.826857 0.0734 .  
## 3-2 -1.410839 -11.1930681  8.371390 0.9953    
## 4-2  1.496556  -8.8394138 11.832525 0.9952    
## 5-2 15.217949   0.7579944 29.677903 0.0329 *  
## 4-3  2.907395  -6.1604467 11.975236 0.9117    
## 5-3 16.628788   3.0463078 30.211268 0.0069 ** 
## 5-4 13.721393  -0.2651729 27.707959 0.0578 .  
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion:

From the Scheffe Test above, we can see that groups 5-2 and 5-3 have significantly different mean values (below significance level of 0.05) of alkphos, and thus we have enough evidence to reject the null. The groups 5-1 and 5-4 are only a little higher than our alpha (0.05) which shows that they might be statistically significantly different as well. We can also see that groups 2-1, 3-1, 4-1, 3-2, 4-2, 4-3 and do not have significantly different mean values of alkphos (above significance level of 0.05), and thus we do not have enough evidence to reject the null.

Below are some of the effects of drinkgroup on alkphos that can be seen:

  • 5 > 3 (mean alkphos of 5 is higher than mean alkphos of 3)
  • 5 > 2 (mean alkphos of 5 is higher than mean alkphos of 2)
  • Equivalent mean values can be found in drinkgroup pairs 2-1, 3-1, 4-1, 5-1, 3-2, 4-2, 4-3, 5-4

Similarities:

Group pairs 2-1, 3-1, 3-2 and 5-4 have equal mean values, since their p-values are above the significance level of 0.05. This means that they don’t have an effect on either mcv and alkphos.

Exercise 3

The psychology department at a hypothetical university has been accused of underpaying female faculty members. The data represent salary (in thousands of dollars) for all 22 professors in the department. This problem is from Maxwell and Delaney (2004).

a) Exercise 3A:

Fit a two-way ANOVA model including sex (F, M) and rank (Assistant, Associate) the interaction term. What do the Type 1 and Type 3 sums of squares tell us about significance of effects? Is the interaction between sex and rank significant? Also comment on the variation explained by the model.

Importing data

psych <- read.csv("psych.csv")
str(psych)
## 'data.frame':    22 obs. of  3 variables:
##  $ sex   : chr  "F" "F" "F" "F" ...
##  $ rank  : chr  "Assist" "Assist" "Assist" "Assist" ...
##  $ salary: int  33 36 35 38 42 37 39 38 40 44 ...

Two-way ANOVA (Type 1)

aov.psych1 <- aov(salary ~ sex * rank, psych)
summary(aov.psych1)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## sex          1 155.15  155.15  17.007 0.000637 ***
## rank         1 169.82  169.82  18.616 0.000417 ***
## sex:rank     1   0.63    0.63   0.069 0.795101    
## Residuals   18 164.21    9.12                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two-way ANOVA (Type 1)

aov.psych2 <- aov(salary ~ rank * sex, psych)
summary(aov.psych2)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## rank         1 252.22  252.22  27.647 5.33e-05 ***
## sex          1  72.76   72.76   7.975   0.0112 *  
## rank:sex     1   0.63    0.63   0.069   0.7951    
## Residuals   18 164.21    9.12                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Observation:

  • Since both groups have low p-values (below the significance level of 0.05), we have enough evidence to reject the null hypothesis and conclude that these groups have a significant effect on salary.

  • We can see interaction between sex and rank result has a much higher p-value (above the significance level of 0.05), which means that the interaction of sex and rank doesn’t have significant effect on salary (we do not have enough evidence to reject the null hypothesis)

Two-way ANOVA (Type 3)

Anova(aov.psych1, type = 3)
## Anova Table (Type III tests)
## 
## Response: salary
##             Sum Sq Df  F value  Pr(>F)    
## (Intercept) 8140.2  1 892.2994 < 2e-16 ***
## sex           28.0  1   3.0711 0.09671 .  
## rank          70.4  1   7.7189 0.01240 *  
## sex:rank       0.6  1   0.0695 0.79510    
## Residuals    164.2 18                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Observation:

  • Since rank is the only group that has a p-value below the significance level of 0.05, we have enough evidence to reject the null hypothesis and conclude that the rank group has a significant effect on salary.

  • sex and interaction groups have p-values above the significance level of 0.05, which means we do not have enough evidence to reject Ho and conclude that these groups don’t have a significant effect on salary.

Variation Explained by the model (sex and rank)

lm.psych1 <- lm(salary ~ sex*rank, psych)
summary(lm.psych1)$r.squared
## [1] 0.6647566

Observation: 66.47% of salary variation can be explained by the model (sex and rank)

b) Exercise 3B:

Refit the model without the interaction term. Comment on the significance of effects and variation explained. Report and interpret the Type 1 and Type 3 tests of the main effects. Are the main effects of rank and sex significant?

Two-way ANOVA (Type 1)

aov.psych3 <- aov(salary ~ rank + sex, psych)
summary(aov.psych3)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## rank         1 252.22  252.22  29.071 3.34e-05 ***
## sex          1  72.76   72.76   8.386  0.00926 ** 
## Residuals   19 164.84    8.68                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two-way ANOVA (Type 1)

aov.psych4 <- aov(salary ~ sex + rank, psych)
summary(aov.psych4)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## sex          1  155.2  155.15   17.88 0.000454 ***
## rank         1  169.8  169.82   19.57 0.000291 ***
## Residuals   19  164.8    8.68                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Observation:

Without any interaction, we conclude that both groups have significant effect on salary since their p-values are below the significance level of 0.05. Thus, we have enough evidence to reject the null hypothesis, which means that at least one group in sex and rank have different mean values.

Two-way ANOVA (Type 3)

Anova(aov.psych3, type = 3)
## Anova Table (Type III tests)
## 
## Response: salary
##              Sum Sq Df   F value    Pr(>F)    
## (Intercept) 10227.6  1 1178.8469 < 2.2e-16 ***
## rank          169.8  1   19.5743 0.0002912 ***
## sex            72.8  1    8.3862 0.0092618 ** 
## Residuals     164.8 19                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Observation:

Without any interaction, we conclude that both groups have significant effect on salary since their p-values are below the significance level of 0.05. Thus, we have enough evidence to reject the null hypothesis, which means that at least one group in sex and rank have different mean values.

Variance Explained by the Model

lm.psych3 <- lm(salary ~ rank + sex, psych)
summary(lm.psych3)$r.squared
## [1] 0.6634627

Observation: Only 66.34% of salary variation can be explained by the model (sex and rank)

c) Exercise 3C:

Obtain model diagnostics to validate your Normality assumptions.

Normality Check

par(mfrow=c(1,1)) # Diagnostic plot 
plot(aov.psych3, 2)

Observation: From the analysis of QQPlot, we can conclude that a normal distribution is visible.

d) Exercise 3D:

Choose a final model based on your results from parts (a) and (b). Comment on any significant group differences through the post-hoc test. State the differences in salary across different main effect groups and interaction (if included) between them.

Based on the result (a) and (b), we can conclude that interaction effect does NOT exist. Thus, the two-way ANOVA - Type 3 is selected to see unique contribution of each categorical variable. As every effect is adjusted for all other effects, this model is the best option.

Post-hoc Test

TukeyHSD(aov.psych4)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = salary ~ sex + rank, data = psych)
## 
## $sex
##         diff      lwr      upr     p adj
## M-F 5.333333 2.693648 7.973019 0.0004544
## 
## $rank
##                  diff      lwr      upr     p adj
## Assoc-Assist 5.377778 2.738092 8.017463 0.0004193

Conclusion:

sex - Because the p-value is below the significance level of 0.05, it’s apparent that sex has a significant effect on salary. M > F (mean salary of Male is higher than mean salary of Female)

rank - Because the p-value is below the significance level of 0.05, it’s apparent that rank also has a significant effect on salary.

Associate > Assistant (Sum of square difference = 5.3777784. Mean salary of an Associate is higher than mean salary of an Assistant)

Exercise 4

Use the cars_new.csv. See HW1 for detailed information of variables

a) Exercise 4A:

Start with a three-way main effects ANOVA and choose the best main effects ANOVA model for mpg_highway as a function of cylinders, origin, and type for the cars in this set. Comment on which terms should be kept in a model for mpg_highway and why based on Type 3 SS. For the model with just predictors you decide to keep, comment on the significant effects in the model and comment on how much variation in highway fuel efficiency the model describes.

Importing data

cars <- read.csv("cars_new.csv")
cars$cylinders <- as.factor(cars$cylinders)
str(cars)
## 'data.frame':    180 obs. of  4 variables:
##  $ type       : chr  "Sedan" "Sedan" "Sedan" "Sedan" ...
##  $ origin     : chr  "Asia" "Asia" "Asia" "Asia" ...
##  $ cylinders  : Factor w/ 2 levels "4","6": 1 1 2 2 2 2 2 2 2 2 ...
##  $ mpg_highway: int  31 29 28 24 24 24 30 29 30 28 ...

Three-way ANOVA (Type 3)

aov.cars <- aov(mpg_highway ~ cylinders + origin + type, cars)
Anova(aov.cars, type=3)
## Anova Table (Type III tests)
## 
## Response: mpg_highway
##             Sum Sq  Df   F value  Pr(>F)    
## (Intercept)  69548   1 6501.6715 < 2e-16 ***
## cylinders     1453   1  135.8499 < 2e-16 ***
## origin           1   1    0.0786 0.77948    
## type           108   1   10.1018 0.00175 ** 
## Residuals     1883 176                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Observation:

  • From the ANOVA table above, we can see that cylinders and type have p-values below the significant level of 0.05. This means that we have enough evidence to reject the null hypothesis and conclude that these two groups have a significant effect on mpg_highway.

  • Additionally, at least one group has different mean values

  • Since origin has a p-value higher than than significant level of 0.05, we can say that we do not have enough evidence to reject the the null hypothesis and conclude that this group doesn’t have a significant effect on mpg_highway. Thus, we will remove origin from the model since it has insignificant effect.

Variation Explained by the Model (cylinders and type)

lm.cars <- lm(mpg_highway ~ cylinders + type, cars)
summary(lm.cars)$r.squared
## [1] 0.4572163

Observation: 45.72% of the variation of mpg_highway can be explained by the model

b) Exercise 4B:

Starting with main effects chosen in part (a), find your best ANOVA model by adding in any additional interaction terms that will significantly improve the model. For your final model, comment on the significant effects and variation explained by the model.

aov.cars2 <- aov(mpg_highway ~ cylinders + type + cylinders * type, cars)
Anova(aov.cars2, type = 3)
## Anova Table (Type III tests)
## 
## Response: mpg_highway
##                Sum Sq  Df  F value    Pr(>F)    
## (Intercept)     85471   1 8358.838 < 2.2e-16 ***
## cylinders        1558   1  152.397 < 2.2e-16 ***
## type              198   1   19.392 1.844e-05 ***
## cylinders:type     84   1    8.201  0.004696 ** 
## Residuals        1800 176                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Observation:

The best model is two-way ANOVA - Type 3 with the implementation of cylinders, type and its interaction. All three of them have p-values lower than the significance level of 0.05, which means that we have enough evidence to reject the null hypothesis and conclude that these have a significant effect on mpg_highway.

Variation Explained by the Model

lm.cars2 <- lm(mpg_highway ~ cylinders + type + cylinders * type, cars)
summary(lm.cars2)$r.squared
## [1] 0.4813821

Observation: 48.13% of mpg_highway can be explained by cylinders, type, and its interaction.

Normality Check

par(mfrow=c(1,1)) # Diagnostic plot 
plot(aov.cars2, 2)

Observation: From the analysis of QQPlot above, we can conclude that a normal distribution is visible

c) Exercise 4C:

Comment on any significant group differences through the post-hoc test. What does this tell us about fuel efficiency differences across cylinders, origin, or type groups? See Hint in Exercise 3.

Post-hoc Test

TukeyHSD(aov.cars2)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mpg_highway ~ cylinders + type + cylinders * type, data = cars)
## 
## $cylinders
##          diff       lwr       upr p adj
## 6-4 -5.722662 -6.664343 -4.780981     0
## 
## $type
##                   diff       lwr       upr     p adj
## Sports-Sedan -2.817931 -4.470787 -1.165075 0.0009407
## 
## $`cylinders:type`
##                         diff       lwr       upr     p adj
## 6:Sedan-4:Sedan   -6.1723315 -7.469178 -4.875485 0.0000000
## 4:Sports-4:Sedan  -5.2275641 -8.306639 -2.148489 0.0001079
## 6:Sports-4:Sedan  -6.6025641 -9.681639 -3.523489 0.0000006
## 4:Sports-6:Sedan   0.9447674 -2.120956  4.010491 0.8546517
## 6:Sports-6:Sedan  -0.4302326 -3.495956  2.635491 0.9834567
## 6:Sports-4:Sports -1.3750000 -5.521993  2.771993 0.8253946

Conclusion:

  • type: Since the p-value for type is smaller than the significance level of 0.05, we have enough evidence to reject the null hypothesis and conclude that the type has a significant effect on mpg_highway. Also, Sports < Sedan (mean mpg_highway of Sedan is higher than mean mpg_highway of Sports)

  • cylinders: Since the p-value for cylinder is smaller than the significance level 0.05, we have enough evidence to reject the null hypothesis and conclude that the cylinders has a significant effect on mpg_highway. Also, 6 < 4 (mean mpg_highway of 6 is lower than mean mpg_highway of 4)

  • cylinders and type interaction:

    • Because the p-value is below the significance level of 0.05, type groups 6:Sedan-4:Sedan, 4:Sports-4:Sedan, and 6:Sports-4:Sedan have a significant effect on mpg_highway.
    • Groups 4:Sports-6:Sedan, 6:Sports-6:Sedan, and 6:Sports-4:Sports all have p-values above the significance level of 0.05, and we conclude that they do not have significant effect on mpg_highway.
    • 6:Sedan < 4:Sedan (mean mpg_highway of 6:Sedan is smaller than mean mpg_highway of 4:Sedan)
    • 4:Sports < 4:Sedan (mean mpg_highway of 4:Sports is smaller than mean mpg_highway of 4:Sedan)
    • 6:Sports < 4:Sedan (mean mpg_highway of 6:Sports is smaller than mean mpg_highway of 4:Sedan)

We can finally conclude that:

  • Sedans have higher mpg_highway than sport cars (Sedans have better fuel efficiency).
  • 4 Cylinder cars have a higher mpg_highway than 6 cylinder cars (4 cylinder cars have better fuel efficiency).
  • 4 cylinder sedans have a higher mpg_highway than 6 cylinder sedans (4 cylinder cars have better fuel efficiency).
  • 4 cylinder sedans have a higher mpg_highway than 4 cylinder sports (4 cylinder cars have better fuel efficiency).
  • 4 cylinder sedans have a higher mpg_highway than 6 cylinder sports (4 cylinder cars have better fuel efficiency)