The “MPG” data set is a subset of the fuel economy data that the Environmental Protection Agency (EPA) makes available at http://fueleconomy.gov. It contains fuel economy data from 1999 and 2008 for 38 popular car models, with 11 variables and 234 observations.
The 11 variables in the data are:
Manufacturer: Manufacturer name
Model: Model Name
Displ: Engine Displacement, in liters
Year: Year of Manufacture
Cyl: Number of Cylinders
Trans: Type of Transmission
Drv: Drive type (f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive)
Cty: City miles per gallon
Hwy: Highway miles per gallon
Fl: Fuel type
Class: “Type” of car
library(ggplot2)
data(mpg)
str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame': 234 obs. of 11 variables:
## $ manufacturer: chr "audi" "audi" "audi" "audi" ...
## $ model : chr "a4" "a4" "a4" "a4" ...
## $ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr "f" "f" "f" "f" ...
## $ cty : int 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr "p" "p" "p" "p" ...
## $ class : chr "compact" "compact" "compact" "compact" ...
ggplot(data = mpg, aes(x = cty)) + geom_histogram( fill = "blue")
ggplot(data = mpg, aes(x = hwy)) + geom_histogram( fill = "red")
The histograms above suggest that “HWY” is roughly normally distributed, apart from a few outliers that lie far from the bulk of the observations. “CTY” has a right-skewed (positively skewed) distribution, again with a few outliers far from the rest.
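The visual impression of skew can be checked numerically. The sketch below computes a moment-based sample skewness for both variables; `skewness_moment` is a hypothetical helper written for this example (it is not part of base R or ggplot2), and positive values indicate a longer right tail.

```r
library(ggplot2)  # provides the mpg data set

# Hypothetical helper: third central moment divided by the
# second central moment raised to the power 1.5.
skewness_moment <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Skewness of city and highway mileage
skew_values <- c(
  cty = skewness_moment(mpg$cty),
  hwy = skewness_moment(mpg$hwy)
)
print(skew_values)
```

A value near zero is consistent with a symmetric distribution, while a clearly positive value supports the right-skew reading of the histogram.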
highway <- lm(hwy ~ displ, data = mpg)
summary(highway)
##
## Call:
## lm(formula = hwy ~ displ, data = mpg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.1039 -2.1646 -0.2242 2.0589 15.0105
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.6977 0.7204 49.55 <2e-16 ***
## displ -3.5306 0.1945 -18.15 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.836 on 232 degrees of freedom
## Multiple R-squared: 0.5868, Adjusted R-squared: 0.585
## F-statistic: 329.5 on 1 and 232 DF, p-value: < 2.2e-16
ggplot(mpg, aes(displ,hwy)) +
geom_point() +
geom_smooth(method = "lm")
City <- lm(cty ~ displ, data = mpg)
summary(City)
ggplot(mpg, aes(displ,cty)) +
geom_point() +
geom_smooth(method = "lm")
library(texreg)
screenreg(list(highway, City))
##
## ===================================
## Model 1 Model 2
## -----------------------------------
## (Intercept) 35.70 *** 25.99 ***
## (0.72) (0.48)
## displ -3.53 *** -2.63 ***
## (0.19) (0.13)
## -----------------------------------
## R^2 0.59 0.64
## Adj. R^2 0.59 0.64
## Num. obs. 234 234
## RMSE 3.84 2.57
## ===================================
## *** p < 0.001, ** p < 0.01, * p < 0.05
For every 1-liter increase in engine displacement, predicted Highway MPG decreases by 3.53 miles per gallon. Both the scatterplot and the negative slope coefficient show a negative relationship between displacement and Highway MPG, and the coefficient is highly significant (p < 0.001).
For every 1-liter increase in engine displacement, predicted City MPG decreases by 2.63 miles per gallon. Again, the scatterplot and the negative slope coefficient show a negative relationship between displacement and City MPG, and the coefficient is highly significant (p < 0.001).
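To make the slope interpretation concrete, the following sketch plugs the reported coefficients into the two fitted lines. The numbers are copied from the regression output above (the City coefficients only to the two decimals shown in the screenreg table); the function names `predict_hwy` and `predict_cty` are made up for this illustration.

```r
# Coefficients copied from the regression output above
b0_hwy <- 35.6977; b1_hwy <- -3.5306   # highway model
b0_cty <- 25.99;   b1_cty <- -2.63     # city model (rounded, from screenreg)

# Hypothetical helpers: fitted value = intercept + slope * displacement
predict_hwy <- function(displ) b0_hwy + b1_hwy * displ
predict_cty <- function(displ) b0_cty + b1_cty * displ

# Predicted MPG for a 2-liter and a 3-liter engine
hwy_pred <- predict_hwy(c(2, 3))
cty_pred <- predict_cty(c(2, 3))
print(hwy_pred)        # 28.6365 25.1059
print(cty_pred)        # 20.73 18.10

# The difference between the two predictions equals the slope
print(diff(hwy_pred))  # -3.5306
print(diff(cty_pred))  # -2.63
```

With the fitted model objects available, `predict(highway, newdata = data.frame(displ = c(2, 3)))` should give the same highway values up to rounding.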
Drive <- lm(hwy ~ drv * displ, data = mpg)
summary(Drive)
##
## Call:
## lm(formula = hwy ~ drv * displ, data = mpg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.489 -1.895 -0.191 1.797 13.467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.6831 1.0961 27.994 < 2e-16 ***
## drvf 6.6950 1.5670 4.272 2.84e-05 ***
## drvr -4.9034 4.1821 -1.172 0.2422
## displ -2.8785 0.2638 -10.913 < 2e-16 ***
## drvf:displ -0.7243 0.4979 -1.455 0.1471
## drvr:displ 1.9550 0.8148 2.400 0.0172 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.034 on 228 degrees of freedom
## Multiple R-squared: 0.746, Adjusted R-squared: 0.7405
## F-statistic: 134 on 5 and 228 DF, p-value: < 2.2e-16
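There is an interaction effect between Drv and Displacement: the rear-wheel interaction term (drvr:displ) is significant, with a p-value of 0.0172, while the front-wheel term (drvf:displ) is not (p = 0.1471). Four-wheel drive is the reference level, so its displacement slope is -2.8785. For a rear-wheel-drive car, a 1-liter increase in Displacement decreases predicted Highway MPG by 0.9235 (-2.8785 + 1.9550), a much shallower decline than the 2.8785 estimated for four-wheel drive.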
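The displacement slope for each drive type follows from adding the relevant interaction term to the baseline slope. The sketch below does this arithmetic with the coefficients copied from the summary(Drive) output, assuming four-wheel drive is the reference level (it is the level without its own row in the coefficient table).

```r
# Coefficients copied from the summary(Drive) output above
coefs <- c(
  "displ"      = -2.8785,  # slope for the reference level (4wd)
  "drvf:displ" = -0.7243,  # slope adjustment for front-wheel drive
  "drvr:displ" =  1.9550   # slope adjustment for rear-wheel drive
)

# Slope of hwy on displ within each drive type
slopes <- c(
  "4" = coefs[["displ"]],
  "f" = coefs[["displ"]] + coefs[["drvf:displ"]],
  "r" = coefs[["displ"]] + coefs[["drvr:displ"]]
)
print(slopes)  # 4: -2.8785, f: -3.6028, r: -0.9235
```

All three slopes are negative, so highway mileage falls with displacement for every drive type; the interaction only changes how steeply.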