Question 1
## Warning in eval(substitute(list(...)), `_data`, parent.frame()): NAs introduced
## by coercion
##   mpg cylinders displacement horsepower weight acceleration year
## 1  18         8          307        130   3504         12.0   70
## 2  15         8          350        165   3693         11.5   70
## 3  18         8          318        150   3436         11.0   70
## 4  16         8          304        150   3433         12.0   70
## 5  17         8          302        140   3449         10.5   70
## 6  15         8          429        198   4341         10.0   70
Question 2
## `geom_smooth()` using formula 'y ~ x'
Question 3
##
## Call:
## lm(formula = mpg ~ cylinders, data = cars)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -14.2607  -3.3841  -0.6478   2.5538  17.9022
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  42.9493     0.8330   51.56   <2e-16 ***
## cylinders    -3.5629     0.1458  -24.43   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.942 on 396 degrees of freedom
## Multiple R-squared: 0.6012, Adjusted R-squared: 0.6002
## F-statistic: 597.1 on 1 and 396 DF, p-value: < 2.2e-16
Each additional cylinder is associated with a decrease of about 3.56 miles per gallon in fuel efficiency, on average. This agrees with the graph from Question 2, where the fitted line slopes downward.
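To make the slope concrete, the fitted line can be evaluated at a few typical cylinder counts. The following is a minimal sketch in R that uses only the two coefficients reported in the output above; the names `b0`, `b1`, and `predict_mpg` are illustrative and were not part of the original analysis.

```r
# Coefficients copied from the Question 3 regression output
b0 <- 42.9493   # intercept
b1 <- -3.5629   # slope on cylinders

# Hypothetical helper: fitted mpg for a given number of cylinders
predict_mpg <- function(cylinders) b0 + b1 * cylinders

fitted_mpg <- predict_mpg(c(4, 6, 8))
names(fitted_mpg) <- c("4 cyl", "6 cyl", "8 cyl")
print(round(fitted_mpg, 2))
```

Going from 4 to 8 cylinders lowers the fitted value by four times the slope, which is roughly 14.25 mpg under this simple model.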
Question 4
##
## Call:
## lm(formula = mpg ~ cylinders + weight + year, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.9727 -2.3180 -0.0755  2.0138 14.3505
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -13.925603   4.037305  -3.449 0.000623 ***
## cylinders    -0.087402   0.232075  -0.377 0.706665
## weight       -0.006511   0.000459 -14.185  < 2e-16 ***
## year          0.753286   0.049802  15.126  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.438 on 394 degrees of freedom
## Multiple R-squared: 0.8079, Adjusted R-squared: 0.8065
## F-statistic: 552.4 on 3 and 394 DF, p-value: < 2.2e-16
Holding the other variables constant, one additional cylinder is associated with a decrease of 0.0874 mpg, one additional pound of weight with a decrease of 0.0065 mpg, and one additional model year with an increase of 0.7533 mpg. The coefficient on cylinders is not statistically significant (p-value 0.7067), whereas the coefficients on weight and year are significant at the 0.1% level.
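The significance statement for cylinders can be reproduced by hand from the reported estimate, standard error, and residual degrees of freedom. This is a small sketch in R, assuming the printed (rounded) values are accurate enough for the purpose; the variable names are illustrative.

```r
# Values copied from the cylinders row of the Question 4 output
estimate  <- -0.087402
std_error <- 0.232075
df_resid  <- 394        # residual degrees of freedom reported by summary()

# t statistic and two-sided p-value
t_stat  <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# 95% confidence interval for the cylinders coefficient
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error

print(round(c(t = t_stat, p = p_value), 4))
print(round(ci, 4))
```

The interval contains zero, which is another way of saying that the data do not distinguish the cylinders effect from zero once weight and year are in the model.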
Question 5
The difference between the two answers comes from adding the new variables, which have their own effects on the dependent variable. In Question 3, including only cylinders and ignoring other variables gives a heavily biased estimate because of omitted variable bias: cylinders picks up the effects of variables it is correlated with, such as weight. Question 4 accounts for weight and year as well, which reduces that bias. The results show this: the coefficient on cylinders was -3.5629 in Question 3, a strong negative association, but it shrinks to -0.0874 once the two other variables are included. Relying on Question 3 alone, we would have greatly overstated the effect of cylinders on fuel efficiency.
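The mechanics of omitted variable bias can be illustrated with a short R sketch. It uses `mtcars`, a dataset that ships with base R, purely as a stand-in: it is not the data analysed in this report, its column names differ (`cyl`, `wt`), and it controls for weight only, so the numbers will not match the output above. The point is the identity itself, which holds exactly for OLS: the short-regression coefficient equals the long-regression coefficient plus the omitted variable's effect times its slope on the included variable.

```r
# Stand-in data: mtcars ships with base R (this is NOT the dataset used above)
short <- lm(mpg ~ cyl, data = mtcars)        # omits weight
long  <- lm(mpg ~ cyl + wt, data = mtcars)   # controls for weight
aux   <- lm(wt ~ cyl, data = mtcars)         # how weight moves with cylinders

b_short <- coef(short)[["cyl"]]
b_long  <- coef(long)[["cyl"]]
b_wt    <- coef(long)[["wt"]]
delta   <- coef(aux)[["cyl"]]

# Omitted variable bias identity: short = long + (effect of wt) * (wt on cyl)
bias <- b_wt * delta
print(c(short = b_short, long = b_long, bias = bias,
        long_plus_bias = b_long + bias))
```

Because heavier cars tend to have more cylinders and also lower mpg, the bias term is negative, so the short regression makes the cylinders coefficient look more negative than it is after controlling for weight.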
Question 6
##
## Call:
## lm(formula = mpg ~ year, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -12.024  -5.451  -0.390   4.947  18.200
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -69.55560    6.58911  -10.56   <2e-16 ***
## year          1.22445    0.08659   14.14   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.379 on 396 degrees of freedom
## Multiple R-squared: 0.3356, Adjusted R-squared: 0.3339
## F-statistic: 200 on 1 and 396 DF, p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
##
## Call:
## lm(formula = mpg ~ year + weight, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.8777 -2.3140 -0.1211  2.0591 14.3330
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.420e+01  3.968e+00  -3.578 0.000389 ***
## year         7.566e-01  4.898e-02  15.447  < 2e-16 ***
## weight      -6.664e-03  2.139e-04 -31.161  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.435 on 395 degrees of freedom
## Multiple R-squared: 0.8079, Adjusted R-squared: 0.8069
## F-statistic: 830.4 on 2 and 395 DF, p-value: < 2.2e-16
First, I ran a simple regression of fuel efficiency on year. As the scatter plot above shows, fuel efficiency has increased over time, which on its own supports my friend's assertion that fuel efficiency has improved over the years as car manufacturers appear to have become more conscious of it.
I assumed that car manufacturers would have produced lighter cars as the years progressed, since lighter cars are more fuel efficient. The second regression is consistent with this rather than against it: the coefficient on weight is negative and highly significant, meaning heavier cars get fewer miles per gallon, and the coefficient on year falls from 1.22 to 0.76 once weight is held constant, which suggests that part of the improvement over the years is associated with changes in weight. Even at a fixed weight, efficiency still increased by about 0.76 mpg per year, so the remaining improvement could be due to a different variable.
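One way to check the weight hypothesis directly from the two regressions above is to back out how weight moved with year. Both models were fit on the same observations, so the OLS identity "short coefficient = long coefficient + weight effect × slope of weight on year" applies. The sketch below, in R, uses only the printed coefficients, so the result is approximate up to rounding; the variable names are illustrative.

```r
# Coefficients copied from the two Question 6 regressions
year_short <- 1.22445     # mpg ~ year
year_long  <- 7.566e-01   # mpg ~ year + weight
weight_eff <- -6.664e-03  # coefficient on weight in mpg ~ year + weight

# OLS identity: year_short = year_long + weight_eff * delta,
# where delta is the slope from regressing weight on year
delta <- (year_short - year_long) / weight_eff
print(round(delta, 1))
```

The implied slope is about -70 pounds per model year, meaning average weight in this sample fell as the years progressed. Regressing weight on year with the original data would confirm this directly.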