Executive summary

The objective of this project was to take a data set of 32 car models from the model years 1973–74, analyze the MPG performance and car characteristics, and answer for Motor Trend magazine the following two questions:

Question 1: Is an automatic or manual transmission better for MPG (miles per gallon)?

Question 2: Quantify the MPG difference between automatic and manual transmissions

A linear regression model with MPG as the dependent variable and transmission type as the independent variable showed that manual transmission vehicles performed better, with an estimated 7.25 MPG advantage over automatic transmission vehicles.

Data Description

The data used to answer the two key questions came from the ‘mtcars’ dataset, one of the built-in datasets distributed with R (and therefore available in RStudio). The data originated from the 1974 edition of Motor Trend magazine.

The data was stored as an R data frame consisting of 32 rows of observations and 11 variables. For this analysis, only the following two columns were relevant: ‘mpg’ (miles per gallon) and ‘am’ (transmission type, where 0 = automatic and 1 = manual).

Assumptions

The two questions implied that the ‘mpg’ column represented the response, or dependent, variable, and that the transmission type was the most important independent variable. For clarity, the column name for the transmission was changed from ‘am’ to ‘transmission.’

A summary of the 32 MPG values showed that they ranged from 10.4 to 33.9, with a median of 19.2. The histogram of the MPG values is included in the Appendix.

cardata <- mtcars
# Rename the transmission column from 'am' to 'transmission' (0 = automatic, 1 = manual)
colnames(cardata)[colnames(cardata) == "am"] <- "transmission"
summary(cardata$mpg)

Data Analysis

Estimating differences due to transmission type: A Welch two-sample t-test was used to compare the mean MPG values for automatic and manual transmission vehicles. The two samples were unpaired, and equal variances were not assumed.

manuals <- cardata[cardata$transmission==1,"mpg"]
automatics <- cardata[cardata$transmission==0,"mpg"]
t.test(manuals,automatics)

The results showed t = 3.7671 and p-value = 0.001374, and the 95% confidence interval did not include zero, so one can reject the null hypothesis that there is no difference in mean MPG between the two groups.
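The individual quantities quoted above can also be read directly from the object that t.test() returns. The following is a minimal sketch that rebuilds the two samples from the built-in data; the names manual_mpg, automatic_mpg, welch and ci_excludes_zero are illustrative and not part of the original analysis.

```r
# Rebuild the two MPG samples from the built-in mtcars data
# (in mtcars, am == 1 is manual and am == 0 is automatic).
manual_mpg    <- mtcars$mpg[mtcars$am == 1]
automatic_mpg <- mtcars$mpg[mtcars$am == 0]

# t.test() defaults to the unpaired Welch test (var.equal = FALSE).
welch <- t.test(manual_mpg, automatic_mpg)

welch$statistic   # t statistic
welch$p.value     # two-sided p-value
welch$conf.int    # 95% confidence interval for the difference in means

# TRUE when the whole interval lies on one side of zero
ci_excludes_zero <- welch$conf.int[1] > 0 || welch$conf.int[2] < 0
ci_excludes_zero
```

Checking the interval against zero is equivalent to the two-sided test at the 5% level, which is why both lead to the same conclusion here.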

Although the t-test showed that the mean MPG values differ significantly, a linear regression model was more appropriate for answering the two key questions, because a single model both tests whether transmission type has a significant effect on MPG and quantifies the size of that effect.

Linear regression model for MPG: The observed MPG and transmission type values were fitted to the following linear regression model: MPG = β0 + β1*(transmission)

The β0 value modeled the automatic transmission MPG, and the β1 value modeled the influence of a manual transmission on the MPG estimate.

basereg <- lm(mpg ~ transmission, cardata)
summary(basereg)

The results of the regression analysis show that the transmission variable makes a significant contribution to the model (t=4.106, p=0.000285). In other words, one can reject the null hypothesis that β1 = 0.

While the β1 value is significantly different from zero, the R-squared value of 0.36 implies that only about a third of the variation in MPG is explained by the model.
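The coefficient table and the R-squared value can be extracted from the model summary rather than read off the printed output. This is a minimal sketch that refits the same model so that it runs on its own; the name r2 is illustrative.

```r
# Recreate the renamed data frame and the regression from the built-in data
cardata <- mtcars
colnames(cardata)[colnames(cardata) == "am"] <- "transmission"
basereg <- lm(mpg ~ transmission, data = cardata)

# Coefficient table: estimate, standard error, t value and p-value per term
coef(summary(basereg))

# Share of the MPG variance explained by transmission type alone
r2 <- summary(basereg)$r.squared
round(r2, 2)

# With a single predictor, R-squared equals the squared correlation
all.equal(r2, cor(cardata$mpg, cardata$transmission)^2)
```

The last line is only a consistency check: in a one-predictor regression the R-squared value carries the same information as the correlation between MPG and the transmission indicator.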

Although the R-squared value was relatively low, there is sufficient information from the regression to answer the two questions which were the focus of this analysis:

Question 1: Is an automatic or manual transmission better for MPG (miles per gallon)?

Based on the linear model, the estimated MPG for a manual transmission (transmission = 1) is 17.15 + 7.25 = 24.4, which is higher than the automatic transmission (transmission = 0) MPG of 17.15.

Question 2: Quantify the MPG difference between automatic and manual transmissions

Given that ‘transmission’ is a binary variable, the MPG difference is given entirely by the β1 value of the linear regression model, and that value is 7.25 miles per gallon.
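Because the only predictor is a 0/1 indicator, the fitted coefficients reduce to group means: β0 is the mean MPG of the automatic vehicles and β1 is the difference between the manual and automatic means. The sketch below checks this on the built-in data; the names b, mean_automatic and mean_manual are illustrative.

```r
# Recreate the renamed data frame and the regression from the built-in data
cardata <- mtcars
colnames(cardata)[colnames(cardata) == "am"] <- "transmission"
basereg <- lm(mpg ~ transmission, data = cardata)

# Named coefficient vector: "(Intercept)" is beta0, "transmission" is beta1
b <- coef(basereg)

# Sample means of the two groups
mean_automatic <- mean(cardata$mpg[cardata$transmission == 0])
mean_manual    <- mean(cardata$mpg[cardata$transmission == 1])

b[["(Intercept)"]]                        # estimated automatic MPG
b[["(Intercept)"]] + b[["transmission"]]  # estimated manual MPG
mean_manual - mean_automatic              # difference in sample means

# predict() returns the same two estimates for transmission = 0 and 1
predict(basereg, newdata = data.frame(transmission = c(0, 1)))
```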

Additional insights from the regression: The least squares fit produces residual values, which are the differences between the actual MPG values and the values estimated by the linear model. For the inference from the model to be reliable, these residuals should be approximately normally distributed. The residuals in this analysis ranged from -9.3923 to 9.5077, with a median of -0.2974. Although there were only 32 values, the histogram and Normal probability plot in the Appendix show that the residual values were roughly normally distributed.
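The residual range quoted above can be reproduced with a few lines. This is a minimal sketch that refits the model on the original ‘am’ column so that it runs on its own; the names fit and res are illustrative.

```r
# Refit the model on the built-in data ('am' is the original transmission column)
fit <- lm(mpg ~ am, data = mtcars)

# Residual = observed MPG minus the MPG estimated by the model
res <- resid(fit)

summary(res)   # minimum, quartiles, median, mean and maximum of the residuals

# Least squares residuals average to zero when the model has an intercept
mean(res)
```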

Appendix
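
The plots referenced in the report could be regenerated along the following lines. This is a minimal sketch, not the original plotting code: the titles, axis labels and three-panel layout are assumptions, and the model is refitted on the original ‘am’ column so that the block runs on its own.

```r
# Refit the model on the built-in data ('am' is the original transmission column)
fit <- lm(mpg ~ am, data = mtcars)
res <- resid(fit)

# Three panels side by side (the layout is an arbitrary choice)
old_par <- par(mfrow = c(1, 3))

# Distribution of the observed MPG values
hist(mtcars$mpg, main = "MPG", xlab = "Miles per gallon")

# Distribution of the regression residuals
hist(res, main = "Residuals", xlab = "Observed minus fitted MPG")

# Normal probability plot: points near the reference line suggest rough normality
qqnorm(res, main = "Normal probability plot of residuals")
qqline(res)

par(old_par)  # restore the previous graphics settings
```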