Objective:

Using the mtcars dataset, which describes a collection of cars, explore the relationship between a set of variables and miles per gallon (MPG). In particular, pay attention to the following two questions:

  1. “Is an automatic or manual transmission better for MPG?”

  2. “Quantify the MPG difference between automatic and manual transmissions.”

Exploratory data analysis

Since we need to find the relationship between transmission type and MPG, let’s first draw a boxplot to visualize it directly.

library(ggplot2)
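df = mtcars
# am is coded 0 = automatic, 1 = manual; treat it as a categorical variable
df$am = as.factor(df$am)
# Map the colour to the column name inside aes(), not to df$am
amplot = ggplot(df, aes(x = am, y = mpg)) +
        geom_boxplot(aes(col = am))
amplot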

From the plot we can see that cars with a manual transmission (am = 1) tend to have higher MPG than cars with an automatic transmission (am = 0).
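The boxplot is a visual comparison only. A minimal sketch of the matching numbers is below; the Welch two-sample t-test (the default of R's `t.test`) is an added check that is not part of the original analysis, and the names `group_means` and `tt` are illustrative.

```r
# Minimal sketch: compare mean MPG by transmission type in mtcars
# (am = 0: automatic, am = 1: manual)
df <- mtcars
df$am <- as.factor(df$am)

# Mean mpg within each transmission group
group_means <- tapply(df$mpg, df$am, mean)
print(group_means)

# Added check (not in the original analysis): Welch two-sample t-test
tt <- t.test(mpg ~ am, data = df)
print(tt$p.value)
```

The difference between the two group means is the raw, unadjusted MPG gap between manual and automatic cars.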

Let’s look at other factors that may influence the MPG of cars.

# Colour points by number of cylinders, treated as a category
plot2 = ggplot(df, aes(x = wt, y = mpg)) +
        geom_point(aes(col = factor(cyl)))
plot2

From the plot we can see that the lighter the car, the higher its MPG. Cars with fewer cylinders also tend to have higher MPG.
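The two visual patterns can be put into numbers. This is a minimal sketch, assuming a Pearson correlation for weight and simple group means for cylinders are adequate summaries; `r_wt` and `mpg_by_cyl` are illustrative names.

```r
# Minimal sketch: quantify the patterns in the weight/cylinder scatter plot
df <- mtcars

# Pearson correlation between weight (in 1000 lbs) and mpg
r_wt <- cor(df$wt, df$mpg)
print(r_wt)

# Mean mpg for each cylinder count (4, 6, 8)
mpg_by_cyl <- tapply(df$mpg, factor(df$cyl), mean)
print(mpg_by_cyl)
```

A strongly negative correlation matches "lighter cars have higher MPG", and decreasing group means match "fewer cylinders, higher MPG".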

Quantify the relationship between MPG and other parameters

Model 1

First, let’s regress MPG on all of the other variables.

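# Treat the number of cylinders as a categorical variable
df$cyl = as.factor(df$cyl)
# "mpg ~ ." uses every other column of df as a predictor
fit1 = lm(mpg ~ ., data = df)
summary(fit1)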
## 
## Call:
## lm(formula = mpg ~ ., data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4734 -1.3794 -0.0655  1.0510  4.3906 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 17.81984   16.30602   1.093   0.2875  
## cyl6        -1.66031    2.26230  -0.734   0.4715  
## cyl8         1.63744    4.31573   0.379   0.7084  
## disp         0.01391    0.01740   0.799   0.4334  
## hp          -0.04613    0.02712  -1.701   0.1045  
## drat         0.02635    1.67649   0.016   0.9876  
## wt          -3.80625    1.84664  -2.061   0.0525 .
## qsec         0.64696    0.72195   0.896   0.3808  
## vs           1.74739    2.27267   0.769   0.4510  
## am1          2.61727    2.00475   1.306   0.2065  
## gear         0.76403    1.45668   0.525   0.6057  
## carb         0.50935    0.94244   0.540   0.5948  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.582 on 20 degrees of freedom
## Multiple R-squared:  0.8816, Adjusted R-squared:  0.8165 
## F-statistic: 13.54 on 11 and 20 DF,  p-value: 5.722e-07

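From the summary we can see that no individual variable is significant at the 5% level, even though the overall F-test is highly significant (p-value: 5.722e-07). This combination suggests that the predictors are strongly correlated with each other. The smallest p-value (wt, 0.0525) is only marginally significant at the 10% level.

However, hp, wt and am have the smallest p-values (0.1045, 0.0525 and 0.2065 respectively), so they can be considered for a reduced model.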
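The selection of hp, wt and am can be reproduced by ranking the full-model p-values. A minimal sketch, assuming the same factor coding as above; `pvals` and `ranked` are illustrative names.

```r
# Minimal sketch: rank the full-model coefficients by p-value
df <- mtcars
df$am <- as.factor(df$am)
df$cyl <- as.factor(df$cyl)
fit1 <- lm(mpg ~ ., data = df)

# The "Pr(>|t|)" column of the coefficient table holds two-sided p-values
pvals <- coef(summary(fit1))[, "Pr(>|t|)"]
pvals <- pvals[names(pvals) != "(Intercept)"]  # drop the intercept
ranked <- sort(pvals)
print(head(ranked, 3))
```

Ranking by p-value is a heuristic; because the predictors are correlated, a different subset could fit comparably well.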

Model 2

fit2 = lm(mpg ~ hp + wt + am, data = df)
summary(fit2)
## 
## Call:
## lm(formula = mpg ~ hp + wt + am, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4221 -1.7924 -0.3788  1.2249  5.5317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## wt          -2.878575   0.904971  -3.181 0.003574 ** 
## am1          2.083710   1.376420   1.514 0.141268    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
## F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11
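Because model 2 is nested in model 1, a partial F-test can check whether dropping the other eight coefficients significantly worsens the fit. This is a minimal sketch using `anova()`; the test is an addition to the original analysis, and `cmp` and `p_nested` are illustrative names.

```r
# Minimal sketch: partial F-test of the reduced model (fit2) against the full model (fit1)
df <- mtcars
df$am <- as.factor(df$am)
df$cyl <- as.factor(df$cyl)
fit1 <- lm(mpg ~ ., data = df)
fit2 <- lm(mpg ~ hp + wt + am, data = df)

# anova() on two nested lm fits compares their residual sums of squares
cmp <- anova(fit2, fit1)
print(cmp)
p_nested <- cmp[2, "Pr(>F)"]
```

A large p-value means the extra variables in model 1 do not significantly reduce the residual sum of squares, which supports using the smaller model.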

Let’s check the residual diagnostic plots for model 2.

par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a 2x2 grid
plot(fit2)

From the plots we can see several potential outliers, such as the Toyota Corolla, Chrysler Imperial and Fiat 128.
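The points labelled in the diagnostic plots can also be listed directly. A minimal sketch, assuming standardized residuals and Cook's distance are the measures of interest; the two rankings need not agree, and `std_res`, `top3` and `cooks` are illustrative names.

```r
# Minimal sketch: cars with the largest standardized residuals in model 2
df <- mtcars
df$am <- as.factor(df$am)
fit2 <- lm(mpg ~ hp + wt + am, data = df)

# rstandard() scales each residual by its estimated standard error;
# the names are the mtcars row names (car models)
std_res <- rstandard(fit2)
top3 <- head(sort(abs(std_res), decreasing = TRUE), 3)
print(top3)

# Cook's distance measures influence on the fit, not just residual size
cooks <- cooks.distance(fit2)
print(head(sort(cooks, decreasing = TRUE), 3))
```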

Model 3

In model 3 we regress MPG on am alone.
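With a single 0/1 predictor, least squares reduces to comparing group means: the intercept is the mean MPG of automatic cars and the slope is the difference between the manual and automatic means. A short worked derivation, where $\bar{y}$ denotes a group's sample mean MPG and the numbers are the estimates reported in the output below:

```latex
\begin{aligned}
\text{mpg}_i &= \beta_0 + \beta_1\,\text{am}_i + \varepsilon_i, \qquad \text{am}_i \in \{0, 1\} \\
\hat\beta_0 &= \bar{y}_{\text{auto}}, \qquad \hat\beta_0 + \hat\beta_1 = \bar{y}_{\text{manual}} \\
\hat\beta_1 &= \bar{y}_{\text{manual}} - \bar{y}_{\text{auto}} \\
\bar{y}_{\text{manual}} &= 17.147 + 7.245 = 24.392
\end{aligned}
```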

fit3 = lm(mpg ~ am, data = df)
summary(fit3)
## 
## Call:
## lm(formula = mpg ~ am, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am1            7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
par(mfrow = c(2, 2))
plot(fit3)

This model shows that cars with a manual transmission achieve, on average, 7.245 miles per gallon more than cars with an automatic transmission. However, it only reaches an adjusted R-squared of 0.3385, compared with 0.8227 for model 2. Transmission alone is therefore a much poorer predictor of fuel efficiency than the combination of hp, wt and am.
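The adjusted R-squared values of the three models can be collected in one place. A minimal sketch, assuming the same data preparation as above; `fits` and `adj_r2` are illustrative names.

```r
# Minimal sketch: compare adjusted R-squared across the three models
df <- mtcars
df$am <- as.factor(df$am)
df$cyl <- as.factor(df$cyl)

fits <- list(
  model1 = lm(mpg ~ ., data = df),
  model2 = lm(mpg ~ hp + wt + am, data = df),
  model3 = lm(mpg ~ am, data = df)
)

# summary.lm stores the adjusted R-squared in its adj.r.squared element
adj_r2 <- sapply(fits, function(f) summary(f)$adj.r.squared)
print(round(adj_r2, 4))
```

Adjusted R-squared penalizes extra predictors, which is why model 2 can score slightly higher than the full model 1 despite having fewer variables.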

Conclusion

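In conclusion, MPG is higher for manual cars: on average, manual cars achieve 7.245 mpg more than automatic cars (model 3). However, once horsepower and weight are taken into account (model 2), the estimated advantage shrinks to about 2.08 mpg and is no longer statistically significant (p = 0.141). This suggests that much of the raw difference is associated with hp and wt rather than with transmission type itself.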
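A minimal sketch that quantifies the manual-vs-automatic difference with 95% confidence intervals for the am1 coefficient, both unadjusted (model 3) and adjusted for hp and wt (model 2), using R's `confint()`. The names `ci2` and `ci3` are illustrative.

```r
# Minimal sketch: 95% confidence intervals for the manual-vs-automatic effect
df <- mtcars
df$am <- as.factor(df$am)

fit2 <- lm(mpg ~ hp + wt + am, data = df)  # adjusted for hp and wt
fit3 <- lm(mpg ~ am, data = df)            # unadjusted

# confint() returns a matrix with one row per coefficient
ci3 <- confint(fit3)["am1", ]
ci2 <- confint(fit2)["am1", ]
print(rbind(unadjusted = ci3, adjusted = ci2))
```

If the adjusted interval includes zero, the data are consistent with no transmission effect once hp and wt are held fixed.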