You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). They are particularly interested in the following two questions:
1. “Is an automatic or manual transmission better for MPG?”
2. “Quantify the MPG difference between automatic and manual transmissions.”
data(mtcars)
t.test(mtcars[mtcars$am == 0,]$mpg, mtcars[mtcars$am == 1,]$mpg)
##
## Welch Two Sample t-test
##
## data: mtcars[mtcars$am == 0, ]$mpg and mtcars[mtcars$am == 1, ]$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
There appears to be a significant difference in mpg between cars with automatic transmission (am == 0, mean 17.15) and cars with manual transmission (am == 1, mean 24.39).
fit <- lm(mpg ~ factor(am), data = mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ factor(am), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## factor(am)1 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
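With a single 0/1 predictor, the factor(am)1 coefficient should be nothing more than the difference between the two group means reported by the t-test. The sketch below checks that arithmetic; the names group_means, mean_diff and slope are illustrative and not part of the original analysis.

```r
data(mtcars)

# Mean mpg per transmission group (am: 0 = automatic, 1 = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Difference in means, manual minus automatic
mean_diff <- unname(group_means["1"] - group_means["0"])

# Slope from the one-predictor regression
fit <- lm(mpg ~ factor(am), data = mtcars)
slope <- unname(coef(fit)["factor(am)1"])

# The two quantities agree up to floating-point error
all.equal(mean_diff, slope)
```

The t statistics differ between the two outputs (3.77 in absolute value versus 4.11) because t.test defaults to the Welch test, which does not assume equal variances, while lm uses a pooled residual variance.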
This model appears to be highly significant, but I suspect we are missing something, because the effect on efficiency seems too big to me. Other variables that influence efficiency, such as weight, horsepower and acceleration, are not yet taken into account.
Let’s plot our regression for now.
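The plot itself is not reproduced in this text. One possible way to draw it with base graphics is sketched below; the axis labels, colours and point style are my own choices, not necessarily those of the original figure.

```r
data(mtcars)
fit <- lm(mpg ~ factor(am), data = mtcars)

# Scatter of mpg against the 0/1 transmission indicator
plot(mtcars$am, mtcars$mpg,
     xaxt = "n", xlim = c(-0.5, 1.5), pch = 19,
     xlab = "Transmission", ylab = "Miles per gallon")
axis(1, at = c(0, 1), labels = c("Automatic", "Manual"))

# With a single 0/1 predictor the fitted line passes through both group means
abline(a = unname(coef(fit)[1]), b = unname(coef(fit)[2]), col = "red", lwd = 2)
```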
Let’s add the weight variable to the regression.
fit2 <- lm(mpg ~ factor(am) + wt, data = mtcars)
summary(fit2)
##
## Call:
## lm(formula = mpg ~ factor(am) + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5295 -2.3619 -0.1317 1.4025 6.8782
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.32155 3.05464 12.218 5.84e-13 ***
## factor(am)1 -0.02362 1.54565 -0.015 0.988
## wt -5.35281 0.78824 -6.791 1.87e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.098 on 29 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7358
## F-statistic: 44.17 on 2 and 29 DF, p-value: 1.579e-09
As I suspected, once weight is added, transmission no longer appears to have a significant effect on efficiency. Let’s plot the residuals to see whether something is wrong.
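The residual plot is not included in this text. A minimal sketch of how the residuals of fit2 could be inspected follows, first against the fitted values and then with the four standard lm diagnostic panels; the layout and labels are assumptions of mine.

```r
data(mtcars)
fit2 <- lm(mpg ~ factor(am) + wt, data = mtcars)

# Residuals against fitted values; a visible pattern would suggest a missing term
res <- resid(fit2)
plot(fitted(fit2), res, pch = 19,
     xlab = "Fitted mpg", ylab = "Residuals")
abline(h = 0, lty = 2)

# The standard diagnostic panels for an lm object, arranged in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(fit2)
par(mfrow = c(1, 1))
```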
Finally, I add another variable to the regression: acceleration (qsec, the quarter-mile time). Since acceleration seems to be uncorrelated with the weight variable, it is useful to control for it.
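The claim that the two variables are roughly uncorrelated can be checked directly. This sketch computes the Pearson correlation between wt and qsec and runs the usual test against zero correlation; the variable name r_wt_qsec is illustrative.

```r
data(mtcars)

# Pearson correlation between weight (wt) and quarter-mile time (qsec)
r_wt_qsec <- cor(mtcars$wt, mtcars$qsec)
r_wt_qsec

# Test of the null hypothesis that the true correlation is zero
cor.test(mtcars$wt, mtcars$qsec)
```

Note that qsec is a time, so a larger value means slower acceleration.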
fit5 <- lm(mpg ~ factor(am) + wt + qsec, data = mtcars)
summary(fit5)
##
## Call:
## lm(formula = mpg ~ factor(am) + wt + qsec, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## factor(am)1 2.9358 1.4109 2.081 0.046716 *
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
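Because the three models are nested, one way to check whether each added variable improves the fit is a sequential F-test with anova. This is a sketch of that comparison, not a step taken in the original analysis; the name cmp is illustrative.

```r
data(mtcars)

# The three nested models fitted above
fit  <- lm(mpg ~ factor(am), data = mtcars)
fit2 <- lm(mpg ~ factor(am) + wt, data = mtcars)
fit5 <- lm(mpg ~ factor(am) + wt + qsec, data = mtcars)

# Each row tests the model against the previous, smaller one
cmp <- anova(fit, fit2, fit5)
cmp
```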
In fit5, the coefficient for manual transmission is barely significant at a threshold of .05 (p = 0.047). Its confidence interval, which is the measure of my uncertainty, is the following.
confint(fit5)[2,]
## 2.5 % 97.5 %
## 0.04573031 5.82594408
Given that the interval is wide and its lower bound sits just above zero, I cannot conclude that I have enough data to answer the question.