We would like to examine the effect of disp on mpg for types of gear from the mtcars data.Here the response variable is “mpg”, idependent variable is “disp” and the factor is “gear”.

attach(mtcars)

Lets create dummy variables for “gear”. Number of dummy variables is (n-1) where n is the number of levels. As number of levels for gear is 3, we can create two dummy variables. First add two columns “one” and “two” holding zeros as dummy variables to mtcars data. Then conditionally change values in the following way:

For “gear” == 3, “one” == 1 & “two” == 0

For “gear” == 4, “one” == 0 & “two” == 1

For “gear” == 5, “one” == 0 & “two” == 0

Lets see how dummy variables would look like for “gear”. As R uses difference from the first level as shown below, we can flip the order of levels in “gear”. That is instead of 3 its 5, and instead of 5 its 3.

contrasts(as.factor(gear))
##   4 5
## 3 0 0
## 4 1 0
## 5 0 1

Data Preparation

mtcars <- within(mtcars, c(z1 <- rep(0, dim(mtcars)[1]), z2 <- rep(0, dim(mtcars)[1])))

mtcars[,"z1"] <- ifelse(gear == 3,1,0) 

mtcars[,"z2"] <- ifelse(gear == 4,1,0) 

mtcars_data <- mtcars[,c("mpg","disp","gear","z1","z2")] 

head(mtcars_data, n = 10) # print first 10 obs
##                    mpg  disp gear z1 z2
## Mazda RX4         21.0 160.0    4  0  1
## Mazda RX4 Wag     21.0 160.0    4  0  1
## Datsun 710        22.8 108.0    4  0  1
## Hornet 4 Drive    21.4 258.0    3  1  0
## Hornet Sportabout 18.7 360.0    3  1  0
## Valiant           18.1 225.0    3  1  0
## Duster 360        14.3 360.0    3  1  0
## Merc 240D         24.4 146.7    4  0  1
## Merc 230          22.8 140.8    4  0  1
## Merc 280          19.2 167.6    4  0  1

Model fitting

mtcars.fit.model <- lm(mpg ~ disp + z1 + z2, data =mtcars_data)

print(mtcars.fit.model)
## 
## Call:
## lm(formula = mpg ~ disp + z1 + z2, data = mtcars_data)
## 
## Coefficients:
## (Intercept)         disp           z1           z2  
##    29.63589     -0.04077     -0.22471     -0.08669

A more straightforward way to achieve the above is by directly fitting “gear” in the model. As class of gear is number, convert it to factor.

mtcars[,"gear"] <- as.factor(mtcars$gear) # Convert class of gear from num to factor

mtcars.fit <- lm(mpg ~ disp + gear, data =mtcars)

print(mtcars.fit)
## 
## Call:
## lm(formula = mpg ~ disp + gear, data = mtcars)
## 
## Coefficients:
## (Intercept)         disp        gear4        gear5  
##    29.41118     -0.04077      0.13802      0.22471

Regression equation when fitted with dummy variable is different from the regression equation when fitted with the factor itself. However, the ANOVA ouputs are identical.

anova(mtcars.fit.model) # model fit with dummy variables
## Analysis of Variance Table
## 
## Response: mpg
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## disp       1 808.89  808.89 71.4449 3.426e-09 ***
## z1         1   0.12    0.12  0.0109    0.9177    
## z2         1   0.02    0.02  0.0021    0.9637    
## Residuals 28 317.01   11.32                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(mtcars.fit) # model fit without dummy variables
## Analysis of Variance Table
## 
## Response: mpg
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## disp       1 808.89  808.89 71.4449 3.426e-09 ***
## gear       2   0.15    0.07  0.0065    0.9935    
## Residuals 28 317.01   11.32                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Regression equation: mpg = 29.63589 - 0.04077(disp) - 0.22471(z1) - 0.08669(z2)

Average mpg for “gear” ==3 : mpg = 29.63589 - 0.04077(160) - 0.22471(1) - 0.08669(0)

Average mpg for “gear” ==4 : mpg = 29.63589 - 0.04077(160) - 0.22471(0) - 0.08669(1)

Average mpg for “gear” ==5 : mpg = 29.63589 - 0.04077(160) - 0.22471(0) - 0.08669(0)

avg_mpg_gear3 <- 29.63589 - 0.04077*(160) - 0.22471*(1) - 0.08669*(0)

avg_mpg_gear4 <- 29.63589 - 0.04077*(160) - 0.22471*(0) - 0.08669*(1)

avg_mpg_gear5 <- 29.63589 - 0.04077*(160) - 0.22471*(0) - 0.08669*(0)

cat("Gear3:",avg_mpg_gear3,"Gear4:",avg_mpg_gear4,"Gear5:", avg_mpg_gear5, sep = "\n")
## Gear3:
## 22.88798
## Gear4:
## 23.026
## Gear5:
## 23.11269

Inference

The effect for the reference (baseline) category is constrained to be null(here its gear 5), and the estimates from the other dummy variables ( which are gear 3 and gear 4) are relative effects, that is the effect of moving from the reference category to the target one(i.e from gear 5 to gear 3).

Null hypothesis: H1 = H2 = H3

Alternate hypotheis: Atleast one is not equal

From the ANOVA table for the model fit using dummy variables, we can conclude that there is statistically no significant difference in mean mpg between gear 5 and gear 3 and between gear 5 and gear 4 at disp 160 as the p values are more then 0.05.

Conclusion

At disp 160 there is no difference in the mpg between different gears.