Engine Displacement Relative to Highway and City Miles Per Gallon (MPG)

The “MPG” data set is a subset of the fuel economy data that the Environmental Protection Agency (EPA) makes available on http://fueleconomy.gov. The data set contains fuel economy data from 1999 and 2008 for 38 popular car models, with 11 variables and 234 observations.

The 11 variables in the data are shown in the str() output below.

Histograms

library(ggplot2)
data(mpg)
str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame':    234 obs. of  11 variables:
##  $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
##  $ model       : chr  "a4" "a4" "a4" "a4" ...
##  $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr  "f" "f" "f" "f" ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr  "p" "p" "p" "p" ...
##  $ class       : chr  "compact" "compact" "compact" "compact" ...
ggplot(data = mpg, aes(x = cty)) + geom_histogram(fill = "blue")

ggplot(data = mpg, aes(x = hwy)) + geom_histogram(fill = "red")

From the above histograms, we can see that “HWY” has a roughly normal distribution, but a few observations (outliers) lie far from the rest of the data. “CTY” has a right-skewed (positively skewed) distribution, also with a few outliers lying far from the rest.
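The visual impression of skew and outliers can also be checked numerically. The sketch below is one possible approach and is not part of the original analysis: `sample_skewness` is a hypothetical helper that computes moment-based skewness, and `boxplot.stats()` flags points lying more than 1.5 times the box length beyond the hinges, which is only one convention for calling an observation an outlier.

```r
library(ggplot2)  # provides the mpg data set
data(mpg)

# Hypothetical helper (not from the original analysis):
# moment-based sample skewness. Values near 0 suggest symmetry;
# positive values suggest a right skew.
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Skewness of city and highway MPG
sapply(list(cty = mpg$cty, hwy = mpg$hwy), sample_skewness)

# Observations flagged by the 1.5 * IQR rule
boxplot.stats(mpg$cty)$out
boxplot.stats(mpg$hwy)$out
```

A clearly positive skewness for `cty` together with a value closer to zero for `hwy` would be consistent with the reading of the histograms given above.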

Linear Models

highway <- lm(hwy ~ displ, data = mpg)
summary(highway)
## 
## Call:
## lm(formula = hwy ~ displ, data = mpg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.1039 -2.1646 -0.2242  2.0589 15.0105 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  35.6977     0.7204   49.55   <2e-16 ***
## displ        -3.5306     0.1945  -18.15   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.836 on 232 degrees of freedom
## Multiple R-squared:  0.5868, Adjusted R-squared:  0.585 
## F-statistic: 329.5 on 1 and 232 DF,  p-value: < 2.2e-16
ggplot(mpg, aes(displ,hwy)) + 
  geom_point() + 
  geom_smooth(method = "lm")

City <- lm(cty ~ displ, data = mpg)
summary(City)
ggplot(mpg, aes(displ,cty)) + 
  geom_point() + 
  geom_smooth(method = "lm")

library(texreg)
screenreg(list(highway, City))
## 
## ===================================
##              Model 1     Model 2   
## -----------------------------------
## (Intercept)   35.70 ***   25.99 ***
##               (0.72)      (0.48)   
## displ         -3.53 ***   -2.63 ***
##               (0.19)      (0.13)   
## -----------------------------------
## R^2            0.59        0.64    
## Adj. R^2       0.59        0.64    
## Num. obs.    234         234       
## RMSE           3.84        2.57    
## ===================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

For every 1-liter increase in engine displacement, highway fuel economy decreases by about 3.53 miles per gallon. The graph and the slope coefficient both show a negative relationship between displacement and highway MPG, and both the intercept and the slope are highly significant (p < 0.001).

For every 1-liter increase in engine displacement, city fuel economy decreases by about 2.63 miles per gallon. The graph and the slope coefficient both show a negative relationship between displacement and city MPG, and both the intercept and the slope are highly significant (p < 0.001).
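One way to make the slope interpretation concrete is to compare predictions for two engines that differ by exactly 1 liter. The sketch below refits the two models so that it runs on its own; the 2.0 and 3.0 liter values are arbitrary illustrative choices, not figures taken from the analysis.

```r
library(ggplot2)  # provides the mpg data set

highway <- lm(hwy ~ displ, data = mpg)
City <- lm(cty ~ displ, data = mpg)

# Two hypothetical engines that differ by exactly 1 liter
engines <- data.frame(displ = c(2, 3))

hwy_pred <- predict(highway, newdata = engines)
cty_pred <- predict(City, newdata = engines)

# The gap between the two predictions equals the fitted slope
diff(hwy_pred)
coef(highway)["displ"]

diff(cty_pred)
coef(City)["displ"]
```

Because both models are straight lines, the same difference appears for any pair of displacements that are 1 liter apart.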

Interaction between Drv and Displacement

Drive <- lm(hwy ~ drv * displ, data = mpg)
summary(Drive)
## 
## Call:
## lm(formula = hwy ~ drv * displ, data = mpg)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.489 -1.895 -0.191  1.797 13.467 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  30.6831     1.0961  27.994  < 2e-16 ***
## drvf          6.6950     1.5670   4.272 2.84e-05 ***
## drvr         -4.9034     4.1821  -1.172   0.2422    
## displ        -2.8785     0.2638 -10.913  < 2e-16 ***
## drvf:displ   -0.7243     0.4979  -1.455   0.1471    
## drvr:displ    1.9550     0.8148   2.400   0.0172 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.034 on 228 degrees of freedom
## Multiple R-squared:  0.746,  Adjusted R-squared:  0.7405 
## F-statistic:   134 on 5 and 228 DF,  p-value: < 2.2e-16

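There is an interaction effect between drv and displacement: the drvr:displ term is significant (p = 0.0172), while the drvf:displ term is not (p = 0.1471). The reference level is four-wheel drive, for which each 1-liter increase in displacement lowers highway MPG by about 2.88. For a rear-wheel-drive car, the slope is -2.8785 + 1.9550 = -0.9235, so each 1-liter increase in displacement lowers highway MPG by only about 0.92, a decline that is about 1.96 MPG per liter smaller than for four-wheel drive.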
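In a model with a `drv * displ` interaction, the displacement slope for each drive type is the `displ` coefficient plus that type's interaction term, with four-wheel drive (`drv = "4"`) as the reference level. The sketch below works this out from the coefficients printed in the `summary(Drive)` output; the variable names are illustrative only.

```r
# Coefficients copied from the summary(Drive) output above
b_displ     <- -2.8785  # displ: slope for the reference level (drv = "4")
b_front_int <- -0.7243  # drvf:displ: slope adjustment for front-wheel drive
b_rear_int  <-  1.9550  # drvr:displ: slope adjustment for rear-wheel drive

# Change in highway MPG per 1-liter increase, by drive type
slopes <- c(
  four_wheel  = b_displ,
  front_wheel = b_displ + b_front_int,
  rear_wheel  = b_displ + b_rear_int
)
slopes
# four_wheel: -2.8785, front_wheel: -3.6028, rear_wheel: -0.9235
```

Only the rear-wheel adjustment is statistically significant at the 5% level, so the front-wheel slope of about -3.60 cannot be clearly distinguished from the four-wheel slope in this model.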