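This project analyses mtcars, a data set extracted from the 1974 Motor Trend US magazine that comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. The goal of the analysis is to explore the relationship between a set of variables and miles per gallon (MPG). The analysis shows that cars with manual transmission average about 7.2 mpg more than cars with automatic transmission, but that once number of cylinders, horsepower, and weight are accounted for, the estimated difference shrinks to about 1.8 mpg and is no longer statistically significant.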
The figures can be found in the Appendix at the end of the document.
data(mtcars)
mtcars$am <- as.factor(mtcars$am)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$gear <- as.factor(mtcars$gear)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$carb <- as.factor(mtcars$carb)
The data set is a data frame with 32 observations on 11 variables (Figure 1). Several variables (wt, hp, cyl, disp, vs, am) appear to be correlated with mpg (Figure 2). Focusing on transmission type (am), the boxplot (Figure 3) suggests that cars with manual transmission achieve higher miles per gallon. A Welch two-sample t-test (Figure 4) supports this: the null hypothesis of no difference between manual and automatic is rejected (p = 0.0014), with a mean of 24.39 mpg for manual cars versus 17.15 mpg for automatic cars.
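The visual impression from the pairs plot can be backed by numbers. The following is a minimal sketch, not part of the original analysis: it works on an untouched copy of the data (the name `raw` is introduced here for illustration) and treats the coded variables such as cyl, vs, and am as plain numbers, which is a simplification.

```r
# Untouched copy, so the factor conversions applied to mtcars are not disturbed.
raw <- datasets::mtcars

# Pearson correlation of every other column with mpg.
mpg_cor <- cor(raw)[, "mpg"]
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]

# Sort by absolute strength so the most strongly related variables come first.
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(mpg_cor, 2))
```

A Pearson coefficient only captures linear association, so it complements rather than replaces the scatterplots in Figure 2.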
The first model has a single independent variable, am. The intercept (17.15) is the mean mpg of cars with automatic transmission, and the coefficient for am1 (7.25) represents the increase in mean mpg when moving from automatic to manual transmission. Both coefficients are statistically significant, but the R-squared shows that the model explains only about 36% of the variance in mpg.
fit_easy <- lm(mpg~am,data=mtcars)
summary(fit_easy)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## am1 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
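With a single two-level factor as predictor, the regression simply reproduces the group means. The sketch below illustrates this on a local copy of the data; the names `cars`, `group_means`, and `fit`, and the level labels, are introduced here for illustration only.

```r
# Local copy with a labelled transmission factor (labels are illustrative).
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Mean mpg per transmission type.
group_means <- tapply(cars$mpg, cars$am, mean)
fit <- lm(mpg ~ am, data = cars)

# The intercept equals the automatic mean; the slope equals
# the manual mean minus the automatic mean.
print(group_means)
print(coef(fit))

# 95% confidence interval for the difference between the two groups.
print(confint(fit, "ammanual"))
```

Because lm assumes a common residual variance for both groups, its t value for am1 (4.106) differs slightly in magnitude from the Welch statistic in Figure 4 (-3.767), which does not make that assumption.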
A multivariable model is then built using stepwise model selection, which adds and drops terms to minimise AIC, to find which variables are worth keeping as predictors.
fit_all <- lm(mpg~.,data=mtcars)
fit_multi <- step(fit_all,direction = 'both')
summary(fit_multi)
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9387 -1.2560 -0.4013 1.1253 5.0513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
## cyl6 -3.03134 1.40728 -2.154 0.04068 *
## cyl8 -2.16368 2.28425 -0.947 0.35225
## hp -0.03211 0.01369 -2.345 0.02693 *
## wt -2.49683 0.88559 -2.819 0.00908 **
## am1 1.80921 1.39630 1.296 0.20646
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
## F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
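The stepwise search ranks candidate models by AIC rather than by coefficient p-values, which is why a term such as am1 can be retained without being individually significant. As a rough check, the full and the selected model can be compared directly; the names `cars`, `fit_full`, and `fit_sel` below are assumptions made for this sketch.

```r
# Local copy with the same factor conversions as in the analysis.
cars <- datasets::mtcars
for (v in c("am", "cyl", "gear", "vs", "carb")) {
  cars[[v]] <- factor(cars[[v]])
}

fit_full <- lm(mpg ~ ., data = cars)                     # all ten predictors
fit_sel  <- lm(mpg ~ cyl + hp + wt + am, data = cars)    # formula chosen by step()

# Lower AIC is preferred; the table also lists the number of parameters (df).
print(AIC(fit_full, fit_sel))
```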
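The multivariable model has a much higher R-squared (0.87 versus 0.36), so it explains considerably more of the variance. The coefficient for am1 is still positive (about 1.81 mpg), but it is no longer statistically significant (p = 0.21): once number of cylinders, horsepower, and weight are held fixed, the data do not show a clear difference in miles per gallon between manual and automatic transmission.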
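Since the single-variable model is nested in the multivariable one, the gain in fit can also be tested formally. The sketch below, with illustrative names `fit_small`, `fit_large`, and `cmp`, runs an F-test on the two models and prints a confidence interval for the transmission effect.

```r
# Local copy; only the factors used by the two models are converted.
cars <- datasets::mtcars
cars$am  <- factor(cars$am)
cars$cyl <- factor(cars$cyl)

fit_small <- lm(mpg ~ am, data = cars)
fit_large <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# F-test for nested models: do cyl, hp, and wt jointly improve the fit?
cmp <- anova(fit_small, fit_large)
print(cmp)

# 95% confidence interval for the adjusted manual-versus-automatic difference.
print(confint(fit_large, "am1"))
```

An interval for am1 that contains zero is consistent with the non-significant p-value in the summary table.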
Finally, the model residuals are plotted (Figure 5). In the residuals-versus-fitted chart the points are scattered without an obvious pattern, and the normal Q-Q plot shows that the residuals are approximately normally distributed. The Scale-Location chart suggests roughly constant variance of the residuals. There are a few outliers, but they do not seem to have a large influence on the model.
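The visual diagnostics can be complemented with numerical ones. The following sketch is an addition to the original analysis: it applies a Shapiro-Wilk test to the residuals and lists the observations with the largest Cook's distance. The 4/n cut-off is a common rule of thumb, not a threshold taken from this report, and the names `cars`, `fit`, `sw`, and `cd` are illustrative.

```r
# Refit the selected model on a local copy of the data.
cars <- datasets::mtcars
cars$am  <- factor(cars$am)
cars$cyl <- factor(cars$cyl)
fit <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Shapiro-Wilk test: a small p-value would cast doubt on normal residuals.
sw <- shapiro.test(residuals(fit))
print(sw)

# Cook's distance measures how much each car influences the fitted coefficients.
cd <- cooks.distance(fit)
print(sort(cd, decreasing = TRUE)[1:3])

# Cars above the 4/n rule-of-thumb threshold (assumed cut-off).
print(names(cd)[cd > 4 / nrow(cars)])
```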
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
## $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
## $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
## $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
pairs(mtcars)
boxplot(mpg~am,data=mtcars, col=c('blue','red'),xlab='Transmission',ylab='MPG',
main='Transmission Type and MPG',
names=c('Automatic','Manual'))
t.test(mpg~am,data=mtcars)
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
par(mfrow=c(2,2))
plot(fit_multi)