This regression analysis uses the mtcars dataset and is written from the project's assumed perspective of a Motor Trend magazine employee. This summary reviews the relationship between a set of vehicle variables and miles per gallon (MPG), the outcome variable of interest. This project investigates two primary questions:
From the simple regression of mpg on transmission type, we find that manual-transmission vehicles average 7.245 MPG more than automatic-transmission vehicles. However, the \(R^2\) value is 0.3598, indicating that this model explains only 35.98% of the variance in MPG, so further analysis is required. The model selection carried out in the exploratory appendix section indicates that mpg ~ wt + qsec + am is the best model; in that model, the estimated manual-transmission effect, holding weight and quarter-mile time fixed, is 2.936 MPG.
The mtcars dataset was extracted from the 1974 Motor Trend US magazine. It contains 11 variables (fuel consumption plus 10 design and performance attributes) for 32 car models from 1973-1974. The description of the variables can be found in the appendix section.
The data are read into R, and the categorical variables needed for the regression analysis (am, cyl, and gear) are converted to labeled factors.
data(mtcars)
head(mtcars,4)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# create factors with value labels
mtcars$am <- factor(mtcars$am,levels=c(0,1),
labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl,levels=c(4,6,8),
labels=c("4cyl","6cyl","8cyl"))
mtcars$gear <- factor(mtcars$gear,levels=c(3,4,5),
labels=c("3gears","4gears","5gears"))
summary(mtcars)
## mpg cyl disp hp drat
## Min. :10.4 4cyl:11 Min. : 71.1 Min. : 52.0 Min. :2.76
## 1st Qu.:15.4 6cyl: 7 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.08
## Median :19.2 8cyl:14 Median :196.3 Median :123.0 Median :3.69
## Mean :20.1 Mean :230.7 Mean :146.7 Mean :3.60
## 3rd Qu.:22.8 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.92
## Max. :33.9 Max. :472.0 Max. :335.0 Max. :4.93
## wt qsec vs am gear
## Min. :1.51 Min. :14.5 Min. :0.000 Automatic:19 3gears:15
## 1st Qu.:2.58 1st Qu.:16.9 1st Qu.:0.000 Manual :13 4gears:12
## Median :3.33 Median :17.7 Median :0.000 5gears: 5
## Mean :3.22 Mean :17.8 Mean :0.438
## 3rd Qu.:3.61 3rd Qu.:18.9 3rd Qu.:1.000
## Max. :5.42 Max. :22.9 Max. :1.000
## carb
## Min. :1.00
## 1st Qu.:2.00
## Median :2.00
## Mean :2.81
## 3rd Qu.:4.00
## Max. :8.00
Various graphics are included in the appendix. Figure 1 is a boxplot of mpg by am, where am is either Automatic or Manual. The plot indicates that mpg tends to be higher when the transmission is manual.
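The boxplot comparison can also be summarized numerically. The following is a minimal sketch, assuming a fresh copy of the data with the original 0/1 coding of am; the names `cars`, `mpg_by_am`, and `group_medians` are illustrative and not part of the original analysis.

```r
# Fresh copy of the built-in data (am: 0 = automatic, 1 = manual)
cars <- datasets::mtcars

# Five-number summary plus mean of mpg within each transmission group
mpg_by_am <- tapply(cars$mpg, cars$am, summary)
mpg_by_am

# Median mpg per group; these correspond to the center lines of the boxplot
group_medians <- tapply(cars$mpg, cars$am, median)
group_medians
```

Working on a copy leaves the factor-labeled `mtcars` object created earlier untouched.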
The first step is to run a correlation analysis between mpg and the other 10 variables.
# reload the original data so that every column is numeric again, as cor() requires
data(mtcars)
cor.out <- sort(cor(mtcars)[,1])
round(cor.out, 3)
##     wt    cyl   disp     hp   carb   qsec   gear     am     vs   drat    mpg
## -0.868 -0.852 -0.848 -0.776 -0.551  0.419  0.480  0.600  0.664  0.681  1.000
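The correlation step selects the mpg column by position (`[,1]`). As a hedged alternative, the sketch below selects it by name, drops the trivial self-correlation, and ranks the predictors by absolute correlation. The names `cars` and `mpg_cor` are illustrative assumptions.

```r
# Fresh numeric copy of the built-in data
cars <- datasets::mtcars

# Correlation of every variable with mpg, selected by column name
mpg_cor <- cor(cars)[, "mpg"]

# Drop the self-correlation of mpg with itself (always 1)
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]

# Rank predictors by strength of linear association, regardless of sign
round(mpg_cor[order(abs(mpg_cor), decreasing = TRUE)], 3)
```

Ranking by absolute value puts strong negative predictors such as wt next to strong positive ones such as drat, which the signed sort separates.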
The second step is to run a Welch two-sample t-test comparing mean mpg between automatic (am = 0) and manual (am = 1) vehicles.
t_mt <- t.test(mpg ~ am, data = mtcars)
t_mt
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.28 -3.21
## sample estimates:
## mean in group 0 mean in group 1
## 17.15 24.39
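The individual pieces of the t-test result can be pulled out of the returned object rather than read off the printout. This is a minimal sketch, assuming the original 0/1 coding of am; `cars`, `t_res`, and `diff_means` are illustrative names.

```r
# Fresh numeric copy of the built-in data (am: 0 = automatic, 1 = manual)
cars <- datasets::mtcars

# t.test() uses the Welch (unequal variance) test by default
t_res <- t.test(mpg ~ am, data = cars)

# Group means, in the order automatic then manual
t_res$estimate

# Difference in means, manual minus automatic
diff_means <- unname(t_res$estimate[2] - t_res$estimate[1])
diff_means

# The interval is for automatic minus manual, so both limits are negative
t_res$conf.int

t_res$p.value
```

Note the sign convention: the confidence interval refers to the first group minus the second, which is why it is negative while the manual-minus-automatic difference is positive.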
The third step is to run a simple linear regression of mpg on transmission type. We use the AIC criterion for best-model selection (the stepwise search is shown in the appendix) and investigate outliers in the diagnostic plots provided in the appendix.
reg.1 <- lm( mpg ~ factor(am), data=mtcars)
summary(reg.1)
##
## Call:
## lm(formula = mpg ~ factor(am), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.392 -3.092 -0.297 3.244 9.508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.15 1.12 15.25 1.1e-15 ***
## factor(am)1 7.24 1.76 4.11 0.00029 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.9 on 30 degrees of freedom
## Multiple R-squared: 0.36, Adjusted R-squared: 0.338
## F-statistic: 16.9 on 1 and 30 DF, p-value: 0.000285
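With a single two-level predictor, the regression coefficients are just a restatement of the group means: the intercept is the mean mpg of the automatic group and the slope is the difference between the manual and automatic means. The sketch below checks this, assuming the original 0/1 coding; `cars`, `fit_am`, and `means` are illustrative names.

```r
# Fresh numeric copy of the built-in data
cars <- datasets::mtcars

# Simple regression of mpg on transmission type
fit_am <- lm(mpg ~ factor(am), data = cars)
coef(fit_am)

# Group means of mpg (names "0" = automatic, "1" = manual)
means <- tapply(cars$mpg, cars$am, mean)

# Intercept equals the automatic mean
all.equal(unname(coef(fit_am)[1]), unname(means["0"]))

# Slope equals the manual mean minus the automatic mean
all.equal(unname(coef(fit_am)[2]), unname(means["1"] - means["0"]))

# Share of the variance in mpg explained by transmission type alone
summary(fit_am)$r.squared
```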
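# model selected by the stepwise AIC search shown in the appendix
best <- lm(mpg ~ am + wt + qsec, data = mtcars)
# nested F-test: do wt and qsec improve on the transmission-only model?
anova(reg.1, best)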
## Analysis of Variance Table
##
## Model 1: mpg ~ factor(am)
## Model 2: mpg ~ am + wt + qsec
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 721
## 2 28 169 2 552 45.6 1.6e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
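Since model selection here relies on AIC, the two models can also be compared on that scale directly. The sketch below is an illustration under the assumption of the original 0/1 coding; `cars`, `fit_am`, and `fit_best` are illustrative names. `step()` reports AIC as computed by `extractAIC()`, whereas `AIC()` uses the full log-likelihood, so the two differ by a constant for models fitted to the same data and rank them identically.

```r
# Fresh numeric copy of the built-in data
cars <- datasets::mtcars

fit_am   <- lm(mpg ~ am, data = cars)              # transmission only
fit_best <- lm(mpg ~ wt + qsec + am, data = cars)  # model chosen by step()

# extractAIC() returns c(equivalent degrees of freedom, AIC)
# on the same scale that step() prints in its trace
extractAIC(fit_am)
extractAIC(fit_best)

# AIC() is shifted by a constant relative to extractAIC(); the ranking agrees
AIC(fit_am, fit_best)
```

Lower values are better on either scale.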
Position | Variable | Description |
---|---|---|
[,1] | mpg | Miles/(US) gallon |
[,2] | cyl | Number of cylinders |
[,3] | disp | Displacement (cubic inches) |
[,4] | hp | Gross horsepower |
[,5] | drat | Rear axle ratio |
[,6] | wt | Weight (1000 lbs) |
[,7] | qsec | 1/4 mile time (seconds) |
[,8] | vs | Engine shape (0 = V-shaped, 1 = straight) |
[,9] | am | Transmission (0 = automatic, 1 = manual) |
[,10] | gear | Number of forward gears |
[,11] | carb | Number of carburetors |
library(ggplot2)
# Boxplots of mpg by transmission style
# observations (points) are overlaid and jittered
qplot(am, mpg, data=mtcars, geom=c("boxplot", "jitter"),
fill=factor(am), main="Mileage by Transmission Type",
xlab="", ylab="Miles per Gallon")
# Regression of mpg on transmission type; group = 1 makes geom_smooth fit a
# single line across both transmission groups instead of one line per group
ggplot(mtcars, aes(x = factor(am), y = mpg)) +
geom_point(aes(color = factor(am))) +
geom_smooth(aes(group = 1), method = "lm", formula = y ~ x) +
labs(title = "Regression of MPG on Transmission Type",
x = "Transmission", y = "Miles per Gallon")
par(mfrow = c(2,2))
plot(best)
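The diagnostic plots can be complemented with numeric influence measures when looking for outliers. The sketch below is one possible check, not the procedure used in the original analysis; `cars`, `fit_best`, `lev`, and `cook` are illustrative names, and the `2 * p / n` leverage cutoff is a common rule of thumb rather than a fixed standard.

```r
# Fresh numeric copy of the built-in data
cars <- datasets::mtcars
fit_best <- lm(mpg ~ wt + qsec + am, data = cars)

# Leverage (hat values) and Cook's distance for every car
lev  <- hatvalues(fit_best)
cook <- cooks.distance(fit_best)

# Cars with the highest leverage and the largest influence on the fit
head(sort(lev, decreasing = TRUE), 3)
head(sort(cook, decreasing = TRUE), 3)

# Flag cars whose leverage exceeds twice the average leverage p / n
p <- length(coef(fit_best))
n <- nrow(cars)
names(lev)[lev > 2 * p / n]
```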
one_model <- lm(mpg ~ ., data = mtcars)
find.model <- step(one_model, direction = "both")
## Start: AIC=70.9
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
##
## Df Sum of Sq RSS AIC
## - cyl 1 0.08 148 68.9
## - vs 1 0.16 148 68.9
## - carb 1 0.41 148 69.0
## - gear 1 1.35 149 69.2
## - drat 1 1.63 149 69.2
## - disp 1 3.92 151 69.7
## - hp 1 6.84 154 70.3
## - qsec 1 8.86 156 70.8
## <none> 148 70.9
## - am 1 10.55 158 71.1
## - wt 1 27.01 174 74.3
##
## Step: AIC=68.92
## mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb
##
## Df Sum of Sq RSS AIC
## - vs 1 0.27 148 67.0
## - carb 1 0.52 148 67.0
## - gear 1 1.82 149 67.3
## - drat 1 1.98 150 67.3
## - disp 1 3.90 152 67.7
## - hp 1 7.36 155 68.5
## <none> 148 68.9
## - qsec 1 10.09 158 69.0
## - am 1 11.84 159 69.4
## + cyl 1 0.08 148 70.9
## - wt 1 27.03 175 72.3
##
## Step: AIC=66.97
## mpg ~ disp + hp + drat + wt + qsec + am + gear + carb
##
## Df Sum of Sq RSS AIC
## - carb 1 0.69 148 65.1
## - gear 1 2.14 150 65.4
## - drat 1 2.21 150 65.4
## - disp 1 3.65 152 65.8
## - hp 1 7.11 155 66.5
## <none> 148 67.0
## - am 1 11.57 159 67.4
## - qsec 1 15.68 164 68.2
## + vs 1 0.27 148 68.9
## + cyl 1 0.19 148 68.9
## - wt 1 27.38 175 70.4
##
## Step: AIC=65.12
## mpg ~ disp + hp + drat + wt + qsec + am + gear
##
## Df Sum of Sq RSS AIC
## - gear 1 1.6 150 63.5
## - drat 1 1.9 150 63.5
## <none> 148 65.1
## - disp 1 10.1 159 65.2
## - am 1 12.3 161 65.7
## - hp 1 14.8 163 66.2
## + carb 1 0.7 148 67.0
## + vs 1 0.4 148 67.0
## + cyl 1 0.4 148 67.0
## - qsec 1 26.4 175 68.4
## - wt 1 69.1 218 75.3
##
## Step: AIC=63.46
## mpg ~ disp + hp + drat + wt + qsec + am
##
## Df Sum of Sq RSS AIC
## - drat 1 3.3 153 62.2
## - disp 1 8.5 159 63.2
## <none> 150 63.5
## - hp 1 13.3 163 64.2
## + gear 1 1.6 148 65.1
## + cyl 1 1.0 149 65.2
## + vs 1 0.6 149 65.3
## + carb 1 0.1 150 65.4
## - am 1 20.0 170 65.5
## - qsec 1 25.6 176 66.5
## - wt 1 67.6 218 73.4
##
## Step: AIC=62.16
## mpg ~ disp + hp + wt + qsec + am
##
## Df Sum of Sq RSS AIC
## - disp 1 6.6 160 61.5
## <none> 153 62.2
## - hp 1 12.6 166 62.7
## + drat 1 3.3 150 63.5
## + gear 1 3.0 150 63.5
## + cyl 1 2.4 151 63.6
## + vs 1 1.1 152 63.9
## + carb 1 0.0 153 64.2
## - qsec 1 26.5 180 65.3
## - am 1 32.2 186 66.3
## - wt 1 69.0 222 72.1
##
## Step: AIC=61.52
## mpg ~ hp + wt + qsec + am
##
## Df Sum of Sq RSS AIC
## - hp 1 9.2 169 61.3
## <none> 160 61.5
## + disp 1 6.6 153 62.2
## + carb 1 3.2 157 62.9
## + drat 1 1.4 159 63.2
## - qsec 1 20.2 180 63.3
## + cyl 1 0.2 160 63.5
## + vs 1 0.2 160 63.5
## + gear 1 0.2 160 63.5
## - am 1 26.0 186 64.3
## - wt 1 78.5 239 72.3
##
## Step: AIC=61.31
## mpg ~ wt + qsec + am
##
## Df Sum of Sq RSS AIC
## <none> 169 61.3
## + hp 1 9.2 160 61.5
## + carb 1 8.0 161 61.8
## + disp 1 3.3 166 62.7
## + cyl 1 1.5 168 63.0
## + drat 1 1.4 168 63.0
## + gear 1 0.1 169 63.3
## + vs 1 0.0 169 63.3
## - am 1 26.2 195 63.9
## - qsec 1 109.0 278 75.2
## - wt 1 183.3 353 82.8
summary(find.model)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.481 -1.556 -0.726 1.411 4.661
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.618 6.960 1.38 0.17792
## wt -3.917 0.711 -5.51 7e-06 ***
## qsec 1.226 0.289 4.25 0.00022 ***
## am 2.936 1.411 2.08 0.04672 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.46 on 28 degrees of freedom
## Multiple R-squared: 0.85, Adjusted R-squared: 0.834
## F-statistic: 52.7 on 3 and 28 DF, p-value: 1.21e-11
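To quantify the uncertainty in the adjusted transmission effect, a confidence interval for the am coefficient can be computed from the selected model. This is a minimal sketch assuming the original 0/1 coding of am; `cars`, `fit_best`, and `ci_am` are illustrative names.

```r
# Fresh numeric copy of the built-in data
cars <- datasets::mtcars

# Model selected by the stepwise search above
fit_best <- lm(mpg ~ wt + qsec + am, data = cars)

# Estimated mpg difference, manual versus automatic,
# holding weight and quarter-mile time fixed
coef(fit_best)["am"]

# 95% confidence interval for that difference
ci_am <- confint(fit_best, "am", level = 0.95)
ci_am
```

Because the p-value for am is just under 0.05, the interval excludes zero only narrowly, so the adjusted effect is much less clear-cut than the unadjusted 7.245 MPG difference.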