Peer-graded Assignment: Regression Models Course Project

Assignment

You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). They are particularly interested in the following two questions:

1. “Is an automatic or manual transmission better for MPG?”

2. “Quantify the MPG difference between automatic and manual transmissions.”

Exploratory Data Analysis

data(mtcars)
t.test(mtcars[mtcars$am == 0,]$mpg, mtcars[mtcars$am == 1,]$mpg)

## 
##  Welch Two Sample t-test
## 
## data:  mtcars[mtcars$am == 0, ]$mpg and mtcars[mtcars$am == 1, ]$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231

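There appears to be a significant difference in mpg between cars with automatic transmission and cars with manual transmission: the Welch t-test gives p = 0.0014, with manual cars averaging 24.39 mpg against 17.15 mpg for automatic cars.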
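The group difference can also be looked at descriptively before any test. The following is a minimal sketch, assuming the standard `mtcars` coding (`am` is 0 for automatic and 1 for manual); the names `mtcars_labeled` and `group_means` are introduced here for illustration only.

```r
data(mtcars)

# Label the transmission codes (0 = automatic, 1 = manual in mtcars)
mtcars_labeled <- transform(
  mtcars,
  transmission = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# Mean mpg per transmission type; these match the t-test sample estimates
group_means <- tapply(mtcars_labeled$mpg, mtcars_labeled$transmission, mean)
print(group_means)

# Side-by-side boxplots show the spread as well as the centre of each group
boxplot(mpg ~ transmission, data = mtcars_labeled,
        xlab = "Transmission", ylab = "Miles per gallon")
```

The boxplot is only descriptive; it does not account for other variables such as weight.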

Linear Regression

fit <- lm(mpg  ~ factor(am), data = mtcars)
summary(fit)

## 
## Call:
## lm(formula = mpg ~ factor(am), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## factor(am)1    7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

This model is highly significant, but I suspect we are missing something, because the estimated effect on efficiency (about 7.2 mpg) seems too big to me. Other variables that influence efficiency, such as weight, horsepower, and acceleration, are not yet taken into account.

Let’s plot our regression for now.
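One possible way to draw this single-predictor fit is sketched below. It is an assumption about how such a plot could be produced, not the original figure: the points are jittered horizontally so they do not overlap, and the line joins the two fitted group means. The name `fitted_means` is illustrative.

```r
data(mtcars)
fit <- lm(mpg ~ factor(am), data = mtcars)

# Scatter of mpg against the 0/1 transmission code, with slight jitter
plot(jitter(mtcars$am, amount = 0.05), mtcars$mpg,
     xaxt = "n", xlab = "Transmission", ylab = "Miles per gallon")
axis(1, at = c(0, 1), labels = c("automatic", "manual"))

# Fitted values for each group: the intercept, and the intercept plus the am coefficient
fitted_means <- predict(fit, newdata = data.frame(am = c(0, 1)))
lines(c(0, 1), fitted_means, col = "red", lwd = 2)
```

With a binary predictor, the slope of the line is simply the difference between the two group means.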

Let’s add the weight variable to the regression.

fit2 <- lm(mpg  ~ factor(am) + wt, data = mtcars)
summary(fit2)

## 
## Call:
## lm(formula = mpg ~ factor(am) + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5295 -2.3619 -0.1317  1.4025  6.8782 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.32155    3.05464  12.218 5.84e-13 ***
## factor(am)1 -0.02362    1.54565  -0.015    0.988    
## wt          -5.35281    0.78824  -6.791 1.87e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.098 on 29 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7358 
## F-statistic: 44.17 on 2 and 29 DF,  p-value: 1.579e-09

As I suspected, once weight is added, transmission no longer appears to have a significant effect on efficiency (p = 0.988). Let’s plot the residuals to see whether something is wrong.
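A minimal sketch of such a residual check follows, assuming two common diagnostics (residuals against fitted values, and a normal Q-Q plot). It is one reasonable choice, not necessarily the plot originally intended.

```r
data(mtcars)
fit2 <- lm(mpg ~ factor(am) + wt, data = mtcars)

par(mfrow = c(1, 2))  # two panels side by side

# Residuals vs fitted: a curved pattern would suggest a missing term
plot(fitted(fit2), resid(fit2),
     xlab = "Fitted mpg", ylab = "Residuals", main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal residuals
qqnorm(resid(fit2))
qqline(resid(fit2))

par(mfrow = c(1, 1))  # restore the default layout
```

Calling `plot(fit2)` would produce R's standard set of diagnostic plots for the same model.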

Finally, I add another variable to the regression: acceleration (qsec, the quarter-mile time). Since acceleration seems to be largely uncorrelated with weight, it is useful to control for it.
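The claim about weight and acceleration can be checked directly. This is a small sketch using the Pearson correlation between `wt` and `qsec`; the name `wt_qsec_cor` is illustrative, and what counts as "uncorrelated enough" is a judgment call.

```r
data(mtcars)

# Pearson correlation between weight and quarter-mile time
wt_qsec_cor <- cor(mtcars$wt, mtcars$qsec)
print(wt_qsec_cor)

# Test of the null hypothesis that the true correlation is zero
print(cor.test(mtcars$wt, mtcars$qsec))
```

A value close to zero means the two predictors carry mostly separate information, so including both should not cause strong collinearity.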

fit5 <- lm(mpg ~ factor(am) + wt + qsec, data = mtcars)
summary(fit5)

## 
## Call:
## lm(formula = mpg ~ factor(am) + wt + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## factor(am)1   2.9358     1.4109   2.081 0.046716 *  
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
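Because the three models are nested, they can also be compared with sequential F-tests. The sketch below refits the models and uses `anova()`; the name `model_comparison` is illustrative, and this comparison is an addition rather than part of the original analysis.

```r
data(mtcars)

# The three nested models fitted in this report
fit  <- lm(mpg ~ factor(am), data = mtcars)
fit2 <- lm(mpg ~ factor(am) + wt, data = mtcars)
fit5 <- lm(mpg ~ factor(am) + wt + qsec, data = mtcars)

# Each row tests whether the added term reduces the residual sum of squares
model_comparison <- anova(fit, fit2, fit5)
print(model_comparison)
```

A small p-value in a given row indicates that the term added at that step improves the fit beyond the previous model.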

Uncertainty in the conclusions and regression inference

The p-value of the coefficient for manual transmission is barely significant at a threshold of .05. Its 95% confidence interval, which measures my uncertainty, is the following.

confint(fit5)[2,]

##      2.5 %     97.5 % 
## 0.04573031 5.82594408

Given how wide this interval is, and how close its lower bound is to zero, I cannot conclude that I have enough data to answer the question.

Executive Summary

Taken alone, cars with manual transmission are 7.24 MPG (miles per gallon) higher than cars with automatic transmission.
However, if weight is also considered, the former are 0.02 MPG lower than the latter; if weight and qsec (quarter-mile time) are both considered, they are 2.94 MPG higher.
Hence we cannot conclude that one transmission type is clearly better for MPG than the other.
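The three estimates quoted in this summary can be collected in one table together with their 95% confidence intervals. This is a minimal sketch; the list `models` and the matrix `am_effects` are names introduced here for illustration.

```r
data(mtcars)

# The three models from the analysis, keyed by the predictors they include
models <- list(
  am_only    = lm(mpg ~ factor(am), data = mtcars),
  am_wt      = lm(mpg ~ factor(am) + wt, data = mtcars),
  am_wt_qsec = lm(mpg ~ factor(am) + wt + qsec, data = mtcars)
)

# For each model, pull the manual-transmission coefficient and its 95% interval
am_effects <- t(sapply(models, function(m) {
  c(estimate = unname(coef(m)["factor(am)1"]),
    confint(m)["factor(am)1", ])
}))
print(round(am_effects, 2))
```

Reading the rows side by side shows how much the estimated transmission effect depends on which other variables are controlled for.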

Gaspare Mattarella

29/4/2020
