Motor Trend, a magazine about the automobile industry, is interested in exploring the relationship between a set of variables and miles per gallon (MPG). Using a data set of a collection of cars, Motor Trend is particularly interested in two questions: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two is.
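From our analysis, we concluded that manual transmission cars have a significantly higher mean MPG than automatic transmission cars when transmission type is considered on its own (a difference of about 7.2 MPG). After adjusting for number of cylinders, displacement, horsepower, and weight, the estimated advantage of manual transmission cars shrinks to about 1.8 MPG and is no longer statistically significant (p = 0.2155).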
# Loading Libraries and Data
library(ggplot2)
data(mtcars)
head(mtcars)
# Transform Variables into Factors
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am <- factor(mtcars$am,labels=c("Automatic","Manual"))
Plots for Exploratory Data Analyses are included in the Appendix (Plot 1). The boxplot suggests that automatic transmission cars have a lower MPG than manual transmission cars. A plot alone cannot establish statistical significance, so we test the difference in means formally.
# Data Analysis using a t-test
aggregate(mpg~am, data = mtcars, mean)
D_automatic <- mtcars[mtcars$am == "Automatic",]
D_manual <- mtcars[mtcars$am == "Manual",]
t.test(D_automatic$mpg, D_manual$mpg)
##
## Welch Two Sample t-test
##
## data: D_automatic$mpg and D_manual$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
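The p-value obtained is 0.001374, which is below 0.05, so we reject the null hypothesis that the two groups have the same mean MPG. The 95% confidence interval for the difference in means (automatic minus manual) runs from -11.28 to -3.21 MPG and excludes zero, which indicates that manual transmission cars have a significantly higher mean MPG.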
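The same Welch test can also be run through the formula interface, and the quantities quoted above can be read off the returned object instead of the printed output. This is a minimal sketch, not the report's own code: `cars`, `tt`, `p_value`, `ci`, and `mean_gap` are hypothetical names, and a local copy of the data is used so that `mtcars` itself is left untouched.

```r
# Hypothetical local copy so the report's own `mtcars` object is not modified.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Welch two-sample t-test (t.test() does not assume equal variances by default).
tt <- t.test(mpg ~ am, data = cars)

p_value  <- tt$p.value                  # two-sided p-value
ci       <- tt$conf.int                 # 95% CI for mean(Automatic) - mean(Manual)
mean_gap <- unname(diff(tt$estimate))   # mean(Manual) - mean(Automatic)

print(round(c(p = p_value, lower = ci[1], upper = ci[2], gap = mean_gap), 4))
```

Reading values from `tt` avoids copying numbers by hand from the console output into the text.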
# Data Analysis using a Simple Linear Regression Model
init <- lm(mpg ~ am, data = mtcars)
summary(init)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amManual 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
As shown above, the intercept (17.1) is the mean MPG for automatic transmission cars, and the amManual coefficient shows that the mean MPG for manual transmission cars is higher by 7.2 (p = 0.000285). The R^2 value is 0.36, so transmission type alone explains only 36% of the variance in MPG. Hence, we need to analyze our data with a multivariate linear regression model that accounts for other variables.
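With a single two-level factor as the predictor, the regression coefficients are just the group means in another form: the intercept is the mean of the reference level, and the slope is the difference between the two group means. The sketch below checks this on a local copy of the data; `cars`, `fit`, `group_means`, `intercept`, `slope`, and `r_squared` are hypothetical names used only for illustration.

```r
# Hypothetical local copy so the report's own `mtcars` object is not modified.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am, data = cars)
group_means <- tapply(cars$mpg, cars$am, mean)   # mean MPG per transmission type

intercept <- unname(coef(fit)["(Intercept)"])    # equals mean MPG of Automatic
slope     <- unname(coef(fit)["amManual"])       # equals Manual mean minus Automatic mean
r_squared <- summary(fit)$r.squared              # share of MPG variance explained by am

print(round(c(intercept = intercept, slope = slope, r_squared = r_squared), 4))
print(round(group_means, 4))
```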
We explored the correlations between MPG and the other variables with a pairs plot, as shown in the Appendix (Plot 2). The variables cyl, disp, hp, and wt have the strongest correlations with MPG, so we included them alongside am in a multivariate linear regression model.
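The ranking read off the pairs plot can be cross-checked numerically. The sketch below is an illustrative addition rather than part of the original analysis. It uses a fresh copy of the data in which every column, including cyl, is still numeric, which is an assumption that differs from the factor coding used elsewhere in this report. `cars`, `mpg_cor`, `ranked`, and `top4` are hypothetical names.

```r
# Fresh copy of the data: all columns are numeric here (cyl is NOT a factor).
cars <- datasets::mtcars

# Pearson correlation of every variable with mpg, dropping mpg itself.
mpg_cor <- cor(cars)["mpg", ]
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]

# Rank by absolute correlation, strongest first.
ranked <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
top4   <- names(ranked)[1:4]

print(round(ranked, 2))
```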
# Data Analysis using a Multivariate Linear Regression Model
betterFit <- lm(mpg~am + cyl + disp + hp + wt, data = mtcars)
anova(init, betterFit)
The p-value obtained is 8.637e-08, so we reject the hypothesis that the added variables contribute nothing: the multivariate linear regression model fits the data significantly better than the initial simple regression model. We examined the residuals for non-normality and heteroskedasticity, as shown in the Appendix (Plot 3). They appear approximately normally distributed and homoskedastic.
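The nested-model comparison can be reproduced, and its p-value extracted programmatically, as sketched below. The Shapiro-Wilk test at the end is a swapped-in numerical check that the original analysis does not use; it is offered only as one possible complement to the visual residual plots. `cars`, `small`, `large`, `cmp`, `f_p_value`, and `sw` are hypothetical names.

```r
# Hypothetical local copy so the report's own `mtcars` object is not modified.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))

small <- lm(mpg ~ am, data = cars)
large <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# F-test for nested models: do the extra terms reduce the residual sum of squares?
cmp <- anova(small, large)
f_p_value <- cmp[["Pr(>F)"]][2]   # row 1 is NA (no comparison for the first model)
print(cmp)

# Assumption: Shapiro-Wilk as a numerical normality check on the residuals
# (not part of the original report, which relies on diagnostic plots).
sw <- shapiro.test(residuals(large))
print(sw)
```

A small p-value from `shapiro.test` would be evidence against normal residuals; a large one does not prove normality, so the diagnostic plots remain useful.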
# Summarizing the Data
summary(betterFit)
##
## Call:
## lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9374 -1.3347 -0.3903 1.1910 5.0757
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.864276 2.695416 12.564 2.67e-12 ***
## amManual 1.806099 1.421079 1.271 0.2155
## cyl6 -3.136067 1.469090 -2.135 0.0428 *
## cyl8 -2.717781 2.898149 -0.938 0.3573
## disp 0.004088 0.012767 0.320 0.7515
## hp -0.032480 0.013983 -2.323 0.0286 *
## wt -2.738695 1.175978 -2.329 0.0282 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.453 on 25 degrees of freedom
## Multiple R-squared: 0.8664, Adjusted R-squared: 0.8344
## F-statistic: 27.03 on 6 and 25 DF, p-value: 8.861e-10
As shown above, the multivariate linear regression model explains 86.64% of the variance in MPG (adjusted R^2 of 0.8344). The variables cyl, disp, hp, and wt confound the relationship between MPG and transmission type: holding them fixed, the estimated difference in mean MPG between manual and automatic transmission cars falls from 7.2 in the simple model to 1.81, and this adjusted difference is not statistically significant (p = 0.2155).
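A confidence interval gives a more direct sense of how uncertain the adjusted transmission effect is than the point estimate alone. The sketch below refits the same model on a local copy of the data and extracts the 95% interval for the amManual coefficient; `cars`, `fit`, `est`, and `ci` are hypothetical names.

```r
# Hypothetical local copy so the report's own `mtcars` object is not modified.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

est <- unname(coef(fit)["amManual"])            # adjusted Manual vs Automatic difference
ci  <- confint(fit, "amManual", level = 0.95)   # 1 x 2 matrix: lower and upper bounds

print(round(c(estimate = est, lower = ci[1, 1], upper = ci[1, 2]), 3))
```

If the interval spans zero, the data are compatible with no adjusted difference between transmission types, which matches the non-significant p-value in the summary above.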
Plot 1 - Boxplot of MPG by Transmission Type
# Plot 1 - Boxplot of MPG by Transmission Type
boxplot(mpg ~ am, data = mtcars, col = c("red", "blue"),
        ylab = "Miles per Gallon (MPG)", xlab = "Transmission Type")
Plot 2 - Pairs Plot of Dataset
# Plot 2 - Pairs Plot of Dataset
pairs(mpg ~ ., data = mtcars)
Plot 3 - Plot of Residuals
# Plot 3 - Plot of Residuals
par(mfrow = c(2,2))
plot(betterFit)