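The purpose of this report is to examine the mtcars data set in R and explore the relationship between a set of variables and miles per gallon (MPG). The report aims to address two questions: whether an automatic or a manual transmission is better for MPG, and how large the difference between the two is. The main findings are: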
-The results of the following analysis show that cars with manual transmission got better MPG than those with automatic transmission.
-If you simply look at transmission by itself, it appears that manual transmissions get on average 7.25 more MPG.
-When you account for all the other variables that can also influence MPG, a manual transmission is associated with 1.21 more MPG than an automatic transmission (the amManual coefficient in the full model), and that estimate is not statistically significant. Some of those variables could be confounding variables, though.
-The author’s best estimate comes from a multivariate model that predicts mpg from hp, wt, and transmission. In this model, manual transmissions are associated with 2.08 more MPG than automatic transmissions.
The methods used to do this analysis were:
*A simple linear model to see how transmission alone affected MPG
*A multivariate model to see how transmission along with all other variables affected MPG
*A multivariate model to see how transmission along with a select few other variables affected MPG
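The first model is a simple linear regression of our outcome, MPG, on a single predictor, am (transmission type). The intercept is dropped (the - 1 in the formula) so that each coefficient is the mean MPG of one transmission group.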
#Load in dataset. Convert all necessary variables to factor and label transmission variables.
df <- mtcars
df$am <- factor(df$am, labels = c("Automatic", "Manual"))
df$cyl <- factor(df$cyl)
df$vs <- factor(df$vs)
df$gear <- factor(df$gear)
df$carb <- factor(df$carb)
#Create linear model and output summary statistics
fit1 <- lm(mpg ~ am - 1, data = df)
summary(fit1)
##
## Call:
## lm(formula = mpg ~ am - 1, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## amAutomatic 17.147 1.125 15.25 1.13e-15 ***
## amManual 24.392 1.360 17.94 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.9487, Adjusted R-squared: 0.9452
## F-statistic: 277.2 on 2 and 30 DF, p-value: < 2.2e-16
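The model estimates that automatic transmissions average 17.1 MPG while manual transmissions average 24.4 MPG, a difference of about 7.25 MPG. Because the model is fit without an intercept, the very small p-values (below 2.2e-16) only show that each group mean differs from zero; they do not by themselves test whether the two means differ from each other, and the R^2 of 0.95 is likewise measured against zero rather than against the overall mean MPG. Figure 1 in the appendix shows the apparent difference between the two types of transmission.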
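To test the difference between the two groups directly, one option is to refit the same model with an intercept, so that the amManual coefficient becomes the manual-minus-automatic difference in mean MPG. The sketch below is not part of the original analysis: the name fit1_int is introduced here for illustration, and the Welch t-test is included only as a cross-check.

```r
# Rebuild the data frame so this block runs on its own
df <- mtcars
df$am <- factor(df$am, labels = c("Automatic", "Manual"))

# Same predictor as fit1, but with an intercept (fit1_int is a hypothetical name):
# the intercept is the automatic mean, amManual is the difference in means
fit1_int <- lm(mpg ~ am, data = df)
diff_row <- coef(summary(fit1_int))["amManual", ]
print(diff_row)

# Cross-check with a two-sample (Welch) t-test on the same grouping
tt <- t.test(mpg ~ am, data = df)
print(tt)
```

The p-value in the amManual row is the one that speaks to whether the two transmission types differ in mean MPG.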
However, one must consider that there are other variables that could affect MPG. It could even be possible that transmission type is a confounding variable or is directly related to other variables that have an effect. So we must also look at a multivariate model.
Now we will analyze a multivariate model with all the variables available to us as possible influencers.
fit2 <- lm(mpg ~ . - 1, data = df)
summary(fit2)
##
## Call:
## lm(formula = mpg ~ . - 1, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5087 -1.3584 -0.0948 0.7745 4.6251
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## cyl4 23.87913 20.06582 1.190 0.2525
## cyl6 21.23044 18.33416 1.158 0.2650
## cyl8 23.54297 18.22250 1.292 0.2159
## disp 0.03555 0.03190 1.114 0.2827
## hp -0.07051 0.03943 -1.788 0.0939 .
## drat 1.18283 2.48348 0.476 0.6407
## wt -4.52978 2.53875 -1.784 0.0946 .
## qsec 0.36784 0.93540 0.393 0.6997
## vs1 1.93085 2.87126 0.672 0.5115
## amManual 1.21212 3.21355 0.377 0.7113
## gear4 1.11435 3.79952 0.293 0.7733
## gear5 2.52840 3.73636 0.677 0.5089
## carb2 -0.97935 2.31797 -0.423 0.6787
## carb3 2.99964 4.29355 0.699 0.4955
## carb4 1.09142 4.44962 0.245 0.8096
## carb6 4.47757 6.38406 0.701 0.4938
## carb8 7.25041 8.36057 0.867 0.3995
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.833 on 15 degrees of freedom
## Multiple R-squared: 0.9914, Adjusted R-squared: 0.9817
## F-statistic: 102 on 17 and 15 DF, p-value: 1.979e-12
This model yields a very small overall p-value and a high multiple R^2 value, although the R^2 is inflated by the missing intercept, and we may have included confounding variables or simply too many regressors for 32 observations. Another potential issue is that some of the variables could be correlated with one another, which would cause variance inflation.
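The variance inflation concern can be checked numerically. As a rough sketch, the block below computes a variance inflation factor (VIF) by hand for the continuous predictors only, using VIF = 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The names continuous, predictors, and vif_manual are introduced here for illustration. For the factor predictors a generalized VIF would be more appropriate (the car package provides a vif() function for that), so this is only a partial check.

```r
# Rebuild the data frame with the same factor conversions as the report
df <- mtcars
df$am <- factor(df$am, labels = c("Automatic", "Manual"))
for (v in c("cyl", "vs", "gear", "carb")) df[[v]] <- factor(df[[v]])

continuous <- c("disp", "hp", "drat", "wt", "qsec")
predictors <- setdiff(names(df), "mpg")

# For each continuous predictor, regress it on every other predictor
# and convert the resulting R^2 into a variance inflation factor
vif_manual <- sapply(continuous, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
  1 / (1 - r2)
})
print(round(vif_manual, 1))
```

Large values indicate that a predictor is close to a linear combination of the others, which widens its standard error in fit2.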
I notice that most of the individual variables have large p-values. Horsepower (hp) and weight (wt) have the smallest individual p-values (about 0.09 each), so it may be best to look at another model with just those two along with transmission. Horsepower and weight are moderately correlated with each other (r = 0.66), so some overlap between them should be kept in mind.
cor(df$hp, df$wt)
## [1] 0.6587479
#Multivariate model with mpg predicted by hp, wt, and transmission
fit3 <- lm(mpg ~ wt + am + hp - 1, data = df)
summary(fit3)
##
## Call:
## lm(formula = mpg ~ wt + am + hp - 1, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4221 -1.7924 -0.3788 1.2249 5.5317
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## wt -2.878575 0.904971 -3.181 0.003574 **
## amAutomatic 34.002875 2.642659 12.867 2.82e-13 ***
## amManual 36.086585 1.736338 20.783 < 2e-16 ***
## hp -0.037479 0.009605 -3.902 0.000546 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared: 0.9872, Adjusted R-squared: 0.9853
## F-statistic: 538.2 on 4 and 28 DF, p-value: < 2.2e-16
This model also yields a very high multiple R^2 value and a very small p-value, and it does so with far fewer variables. I feel confident that this is a good model for predicting MPG. The difference between the amManual and amAutomatic coefficients (36.09 - 34.00) implies that manual transmissions get 2.08 more MPG than automatic transmissions on average, holding weight and horsepower fixed.
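The 2.08 figure is the gap between two group coefficients, so the summary of fit3 does not report a standard error or p-value for it. One way to obtain them, sketched here on the assumption that refitting with an intercept is acceptable, is shown below. The name fit3_int is introduced for illustration only. The two fits produce identical fitted values, so only the parameterisation changes.

```r
# Rebuild the data frame so this block runs on its own
df <- mtcars
df$am <- factor(df$am, labels = c("Automatic", "Manual"))

# The report's model (no intercept) and the same model with an intercept
fit3 <- lm(mpg ~ wt + am + hp - 1, data = df)
fit3_int <- lm(mpg ~ wt + hp + am, data = df)  # hypothetical name

# amManual is now the manual-minus-automatic difference,
# with its own standard error, t value, and p-value
am_row <- coef(summary(fit3_int))["amManual", ]
print(am_row)

# 95% confidence interval for that difference
am_ci <- confint(fit3_int, "amManual", level = 0.95)
print(am_ci)

# Confirm that both parameterisations describe the same fit
same_fit <- isTRUE(all.equal(unname(fitted(fit3)), unname(fitted(fit3_int))))
print(same_fit)
```

If the interval includes zero, the transmission effect cannot be distinguished from no effect at the 5% level once weight and horsepower are in the model.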
Lastly, it is worth analyzing the residual plots (see Figure 2). In the Residuals vs Fitted plot, everything looks fine: we don’t see any pattern that would suggest heteroskedasticity. The Q-Q plot shows approximate normality of the error terms. The Scale-Location plot, like the Residuals vs Fitted plot, doesn’t show any unusual patterns. The Residuals vs Leverage plot doesn’t show any high-leverage points with an unusual residual.
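The visual checks can be complemented with a few numbers. This is a rough sketch using base R only. The 2p/n leverage cutoff is a common rule of thumb that I am assuming here, not a threshold taken from the analysis, and the variable names are illustrative.

```r
# Rebuild the data frame and the final model so this block runs on its own
df <- mtcars
df$am <- factor(df$am, labels = c("Automatic", "Manual"))
fit3 <- lm(mpg ~ wt + am + hp - 1, data = df)

# Shapiro-Wilk test of normality on the residuals
sw <- shapiro.test(residuals(fit3))
print(sw)

# Leverage (hat values); flag cars above the 2p/n rule of thumb
h <- hatvalues(fit3)
p <- length(coef(fit3))
n <- nrow(df)
high_leverage <- names(h)[h > 2 * p / n]
print(high_leverage)

# Cook's distance combines leverage and residual size; show the three largest
cd <- cooks.distance(fit3)
print(head(sort(cd, decreasing = TRUE), 3))
```

Any car flagged for leverage is worth comparing against its Cook's distance, since a high-leverage point only distorts the fit when its residual is also large.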
As stated in the executive summary, the results of the analysis show that cars in this data set with manual transmission got better MPG than those with automatic transmission. The best answer is that when you account for horsepower and weight, which can also influence MPG in addition to the transmission type, a car with manual transmission as opposed to automatic can expect on average 2.08 more MPG.
The adjusted R^2 is 0.9853. Because the model has no intercept, this value is measured relative to zero rather than to the mean MPG, so it overstates the share of the variation in MPG that is explained by transmission type, weight, and horsepower. Some variation is still left unaccounted for.
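To make the R^2 point concrete, the sketch below computes R^2 for the final model in two ways: against zero (which is what summary() reports for a no-intercept model) and against the mean of mpg (the usual "share of variation explained"). The names r2_uncentered and r2_centered are introduced here for illustration.

```r
# Rebuild the data frame and the final model so this block runs on its own
df <- mtcars
df$am <- factor(df$am, labels = c("Automatic", "Manual"))
fit3 <- lm(mpg ~ wt + am + hp - 1, data = df)

# Residual sum of squares is the same under either definition
rss <- sum(residuals(fit3)^2)

# R^2 measured against zero: total sum of squares is sum(y^2)
r2_uncentered <- 1 - rss / sum(df$mpg^2)

# R^2 measured against the mean: total sum of squares is sum((y - mean(y))^2)
r2_centered <- 1 - rss / sum((df$mpg - mean(df$mpg))^2)

print(c(uncentered = r2_uncentered, centered = r2_centered))
```

The centered value is the one that can be read as the fraction of the variation in MPG explained by the three predictors.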
A final thought: “All models are wrong, but some models are useful.” So one thing I can be certain of is that this model is wrong, but it may still be of some use.
Figure 1: boxplot of MPG by transmission type, used to get an idea of how transmission alone affected MPG.
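library(ggplot2)
#Boxplot of MPG by transmission type (ggplot() replaces qplot(), which is deprecated in recent ggplot2 releases)
ggplot(df, aes(x = am, y = mpg, color = am)) +
  geom_boxplot() + labs(x = "Transmission Type")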
Figure 2: residual diagnostic plots for the final multivariate model (fit3), used as the best prediction model.
par(mfrow = c(2, 2)) #Arrange the four diagnostic plots in a 2 x 2 grid
plot(fit3)