The purpose of this analysis is to explore the relationship between a set of variables and miles per gallon (MPG), the outcome. We focus in particular on the effect of transmission type (automatic or manual) on MPG.
data(mtcars)
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
A data frame with 32 observations on 11 variables.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (lb/1000)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
The data set has 11 variables. Since we are interested in how MPG relates to the other ten, we use the cor() function to compute the correlation of mpg with each of them:
cor(mtcars$mpg,mtcars[,-1])
## cyl disp hp drat wt qsec vs am gear
## [1,] -0.8522 -0.8476 -0.7762 0.6812 -0.8677 0.4187 0.664 0.5998 0.4803
## carb
## [1,] -0.5509
Number of cylinders, displacement, gross horsepower, weight and number of carburetors are negatively correlated with miles per gallon, with weight showing the strongest correlation (-0.8677). Rear axle ratio, 1/4 mile time, engine shape (vs), transmission and number of forward gears are positively correlated with miles per gallon.
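To make the ranking easier to read, the same correlations can be sorted by absolute value. This is only a sketch; the names `mpg_cor` and `mpg_cor_sorted` are introduced here for illustration and are not part of the original analysis.

```r
# Correlation of mpg with every other column (all still numeric at this point)
data(mtcars)
mpg_cor <- cor(mtcars$mpg, mtcars[, -1])[1, ]   # drop the 1-row matrix to a named vector

# Order by strength of association, strongest first; the sign is kept
mpg_cor_sorted <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
round(mpg_cor_sorted, 4)
```

Sorting on the absolute value puts strong negative and strong positive correlations side by side, which is what matters when choosing candidate predictors.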
Some variables are then converted into factor variables:
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("Automatic", "Manual")
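Renaming levels by position relies on the factor levels being ordered "0", "1". As a hedged alternative, the mapping can be stated explicitly and then checked with a count per group. The sketch works on a fresh copy named `mt` (a name used only in this example) so that it does not touch the `mtcars` object modified above.

```r
# Fresh copy of the data set; `mt` is a name used only for this illustration
mt <- datasets::mtcars

# Map 0 -> "Automatic" and 1 -> "Manual" explicitly instead of by level position
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

levels(mt$am)   # order of the labels
table(mt$am)    # number of cars per transmission type
```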
A boxplot of MPG by transmission type (shown at the end of this report) compares automatic and manual cars. The median MPG of manual cars is clearly higher than that of automatic cars.
A Welch two-sample t-test was performed to check whether the difference in mean MPG between the two groups is statistically significant.
t.test(mtcars$mpg~mtcars$am,conf.level=0.95)
##
## Welch Two Sample t-test
##
## data: mtcars$mpg by mtcars$am
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.28 -3.21
## sample estimates:
## mean in group Automatic mean in group Manual
## 17.15 24.39
The p-value is 0.001374 (< 0.05), so we reject the null hypothesis of equal means: in this sample, manual cars average 24.39 mpg against 17.15 mpg for automatic cars. However, this comparison ignores every other variable rather than holding it constant. Variables such as weight, horsepower and number of cylinders also affect mpg and may differ between the two groups, so we need to find a model that accounts for them.
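The unadjusted nature of this comparison can be illustrated with a regression on transmission alone: its slope is simply the difference between the two group means. This is a minimal sketch; `mt`, `group_means`, `unadjusted_diff` and `simple_fit` are names assumed for the example.

```r
# Fresh copy with am recoded as in the analysis; `mt` is an illustrative name
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Raw difference in mean mpg between manual and automatic cars
group_means <- tapply(mt$mpg, mt$am, mean)
unadjusted_diff <- unname(group_means["Manual"] - group_means["Automatic"])
unadjusted_diff

# A regression on am alone reproduces that difference as its slope
simple_fit <- lm(mpg ~ am, data = mt)
coef(simple_fit)["amManual"]

# Share of the variance in mpg explained by transmission alone
summary(simple_fit)$r.squared
```

Because no other predictor is included, any effect of weight or engine size that happens to differ between the groups is absorbed into this coefficient.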
Stepwise model selection was performed using backward elimination, with AIC as the selection criterion (the default in step()), to determine the variables for the best model.
fullModel <- lm(mpg ~ ., data = mtcars)
bestModel <- step(fullModel, direction = "backward", trace = FALSE)
summary(bestModel)
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.939 -1.256 -0.401 1.125 5.051
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.7083 2.6049 12.94 7.7e-13 ***
## cyl6 -3.0313 1.4073 -2.15 0.0407 *
## cyl8 -2.1637 2.2843 -0.95 0.3523
## hp -0.0321 0.0137 -2.35 0.0269 *
## wt -2.4968 0.8856 -2.82 0.0091 **
## amManual 1.8092 1.3963 1.30 0.2065
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared: 0.866, Adjusted R-squared: 0.84
## F-statistic: 33.6 on 5 and 26 DF, p-value: 1.51e-10
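Based on the above results, the best model retains cyl, hp, wt and am (coefficients cyl6, cyl8, hp, wt and amManual). The adjusted R-squared of 0.84 indicates that the model explains about 84% of the variance in mpg after adjusting for the number of predictors. Holding the other variables constant, mpg is lower by 3.03 for 6-cylinder cars and by 2.16 for 8-cylinder cars relative to 4-cylinder cars, falls by 0.03 per unit of gross horsepower, and falls by 2.50 per 1000 lb of weight. A manual transmission is associated with an increase of 1.81 mpg over an automatic one, but this coefficient is not statistically significant (p = 0.2065), and neither is cyl8 (p = 0.3523).
Finally, we plot the residuals and diagnostic plots for this linear model.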
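The uncertainty around the transmission effect can be made explicit with a confidence interval, and the contribution of the extra predictors can be checked with a nested-model F-test. The sketch below refits the selected formula on a fresh copy of the data; `mt`, `best_fit` and `am_only_fit` are names assumed for this illustration.

```r
# Fresh copy with the two factor conversions the selected model depends on
mt <- datasets::mtcars
mt$cyl <- factor(mt$cyl)
mt$am  <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Same formula as the model chosen by backward elimination
best_fit <- lm(mpg ~ cyl + hp + wt + am, data = mt)

# 95% confidence interval for the adjusted manual-vs-automatic difference
confint(best_fit, "amManual", level = 0.95)

# F-test: do cyl, hp and wt add explanatory power over transmission alone?
am_only_fit <- lm(mpg ~ am, data = mt)
anova(am_only_fit, best_fit)
```

An interval for amManual that includes zero is consistent with the non-significant p-value in the summary above.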
Boxplot for MPG vs Transmission Type:
boxplot(mpg ~ am, data = mtcars, col = "blue", ylab = "Miles per Gallon")
Residual and diagnostic plot of the Best Model:
par(mfrow=c(2, 2))
plot(bestModel)
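The diagnostic plots can be complemented with a few numeric checks. This is a sketch under the assumption that the selected model is refit on a fresh copy of the data; `mt`, `best_fit`, `lev` and `cooks` are illustrative names, and no particular result is claimed here.

```r
# Refit the selected model on a fresh copy of the data
mt <- datasets::mtcars
mt$cyl <- factor(mt$cyl)
mt$am  <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
best_fit <- lm(mpg ~ cyl + hp + wt + am, data = mt)

# Leverage: the three cars whose predictor values are most unusual
lev <- hatvalues(best_fit)
head(sort(lev, decreasing = TRUE), 3)

# Influence: the three cars that change the fit most when left out
cooks <- cooks.distance(best_fit)
head(sort(cooks, decreasing = TRUE), 3)

# Shapiro-Wilk test of the residuals as a numeric companion to the Q-Q plot
shapiro.test(residuals(best_fit))
```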