Question 1

## Warning in eval(substitute(list(...)), `_data`, parent.frame()): NAs introduced
## by coercion
##   mpg cylinders displacement horsepower weight acceleration year
## 1  18         8          307        130   3504         12.0   70
## 2  15         8          350        165   3693         11.5   70
## 3  18         8          318        150   3436         11.0   70
## 4  16         8          304        150   3433         12.0   70
## 5  17         8          302        140   3449         10.5   70
## 6  15         8          429        198   4341         10.0   70

Question 2

(Figure: scatter plot of mpg against cylinders with a linear trend line from geom_smooth(), formula y ~ x)

Question 3

## 
## Call:
## lm(formula = mpg ~ cylinders, data = cars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.2607  -3.3841  -0.6478   2.5538  17.9022 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  42.9493     0.8330   51.56   <2e-16 ***
## cylinders    -3.5629     0.1458  -24.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.942 on 396 degrees of freedom
## Multiple R-squared:  0.6012, Adjusted R-squared:  0.6002 
## F-statistic: 597.1 on 1 and 396 DF,  p-value: < 2.2e-16

Each additional cylinder is associated with a decrease in fuel efficiency of about 3.56 miles per gallon on average (coefficient -3.5629, p < 2e-16). This is in line with the downward trend visible in the graph from Question 2.
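As a worked check of this interpretation, the sketch below plugs the Question 3 estimates into the fitted line, mpg = 42.9493 - 3.5629 × cylinders. The helper name `predict_mpg_q3` is hypothetical, and the fitted values are only meaningful for cylinder counts within the range of the data.

```python
# Coefficients copied from the Question 3 output: lm(mpg ~ cylinders)
INTERCEPT = 42.9493
SLOPE_CYLINDERS = -3.5629


def predict_mpg_q3(cylinders: float) -> float:
    """Fitted mpg from the simple regression of mpg on cylinders (hypothetical helper)."""
    return INTERCEPT + SLOPE_CYLINDERS * cylinders


if __name__ == "__main__":
    for cyl in (4, 6, 8):
        print(cyl, round(predict_mpg_q3(cyl), 4))
    # Moving from 4 to 8 cylinders changes fitted mpg by 4 times the slope
    print(round(predict_mpg_q3(8) - predict_mpg_q3(4), 4))
```

Because the model is linear, going from 4 to 8 cylinders lowers the fitted mpg by 4 × 3.5629 = 14.2516 mpg regardless of the starting point.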

Question 4

## 
## Call:
## lm(formula = mpg ~ cylinders + weight + year, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9727 -2.3180 -0.0755  2.0138 14.3505 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -13.925603   4.037305  -3.449 0.000623 ***
## cylinders    -0.087402   0.232075  -0.377 0.706665    
## weight       -0.006511   0.000459 -14.185  < 2e-16 ***
## year          0.753286   0.049802  15.126  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.438 on 394 degrees of freedom
## Multiple R-squared:  0.8079, Adjusted R-squared:  0.8065 
## F-statistic: 552.4 on 3 and 394 DF,  p-value: < 2.2e-16

Holding the other variables constant, each additional cylinder is associated with a decrease in fuel efficiency of 0.0874 mpg, each additional pound of weight with a decrease of 0.0065 mpg, and each additional model year with an increase of 0.7533 mpg. The cylinders coefficient is not statistically significant (p-value = 0.706665), while the weight and year coefficients are both significant at the 0.001 level.
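To make the "holding the other variables constant" reading concrete, the sketch below evaluates the Question 4 fitted equation for two cars that differ only in weight, and computes partial effects for larger, easier-to-read changes. The helper names `predict_mpg_q4` and `partial_effect` are hypothetical; year is assumed to be coded as two digits (70, 71, ...), as in the data preview from Question 1.

```python
# Coefficients copied from the Question 4 output: lm(mpg ~ cylinders + weight + year)
COEF = {
    "intercept": -13.925603,
    "cylinders": -0.087402,
    "weight": -0.006511,  # per pound
    "year": 0.753286,     # per model year (years coded 70, 71, ...)
}


def predict_mpg_q4(cylinders: float, weight: float, year: float) -> float:
    """Fitted mpg from the multiple regression in Question 4 (hypothetical helper)."""
    return (COEF["intercept"]
            + COEF["cylinders"] * cylinders
            + COEF["weight"] * weight
            + COEF["year"] * year)


def partial_effect(var: str, change: float) -> float:
    """Change in fitted mpg when one regressor moves by `change`, others held fixed."""
    return COEF[var] * change


if __name__ == "__main__":
    # Two 4-cylinder, model-year-76 cars that differ by 1,000 lb
    print(round(predict_mpg_q4(4, 3000, 76) - predict_mpg_q4(4, 2000, 76), 4))
    print(round(partial_effect("weight", 1000), 4))
    print(round(partial_effect("year", 10), 4))
```

Scaling the per-pound coefficient up shows that a 1,000 lb heavier car has a fitted mpg about 6.5 lower, and ten model years raise fitted mpg by about 7.5, with the other regressors held fixed.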

Question 5

The difference between the answers is explained by the variables added in Question 4. The model in Question 3 includes only cylinders, so it is vulnerable to omitted variable bias: because the cylinders coefficient changes so much once weight and year are added, cylinders must be correlated with weight and/or year, and in Question 3 the cylinders coefficient absorbs part of their effect on fuel efficiency. Controlling for weight and year in Question 4 reduces this bias. The cylinders coefficient shrinks from -3.5629 in Question 3, a strong negative association, to -0.0874 in Question 4, which is no longer statistically significant. Relying on Question 3 alone would therefore have greatly overstated the effect of cylinders on fuel efficiency.
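To illustrate the omitted-variable mechanism, the sketch below uses made-up data, not the cars data set: a hypothetical `weight` variable is built to rise with `cyl`, and `mpg` depends mainly on weight. Regressing mpg on cylinders alone gives a much more negative slope than the regression that also includes weight. For OLS fit on the same sample, the short-regression slope equals the long-regression cylinders slope plus the weight coefficient times the slope from regressing weight on cylinders, and the code checks this identity numerically. All data-generating numbers are assumptions chosen for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def simple_ols(x, y):
    """Intercept and slope of y regressed on a single x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_var_ols(x1, x2, y):
    """Intercept and slopes of y on x1 and x2, solving the centered normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Synthetic data (assumed values, for illustration only)
rng = random.Random(0)
cyl = [rng.choice([4, 6, 8]) for _ in range(300)]
weight = [1500 + 400 * c + rng.gauss(0, 300) for c in cyl]
mpg = [45 - 0.1 * c - 0.0065 * w + rng.gauss(0, 3) for c, w in zip(cyl, weight)]

_, short_slope = simple_ols(cyl, mpg)                    # omits weight
_, long_cyl, long_weight = two_var_ols(cyl, weight, mpg)  # includes weight
_, delta = simple_ols(cyl, weight)                       # how weight moves with cylinders

if __name__ == "__main__":
    print("short-regression cylinders slope:", round(short_slope, 3))
    print("long-regression cylinders slope: ", round(long_cyl, 3))
    # Omitted-variable identity: short = long_cyl + long_weight * delta
    print("identity gap:", short_slope - (long_cyl + long_weight * delta))
```

The identity gap is zero up to floating-point error, which shows that the extra negativity of the short-regression slope is exactly the weight effect routed through its correlation with cylinders.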

Question 6

## 
## Call:
## lm(formula = mpg ~ year, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.024  -5.451  -0.390   4.947  18.200 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -69.55560    6.58911  -10.56   <2e-16 ***
## year          1.22445    0.08659   14.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.379 on 396 degrees of freedom
## Multiple R-squared:  0.3356, Adjusted R-squared:  0.3339 
## F-statistic:   200 on 1 and 396 DF,  p-value: < 2.2e-16

(Figure: scatter plot of mpg against year with a linear trend line from geom_smooth(), formula y ~ x)

## 
## Call:
## lm(formula = mpg ~ year + weight, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8777 -2.3140 -0.1211  2.0591 14.3330 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.420e+01  3.968e+00  -3.578 0.000389 ***
## year         7.566e-01  4.898e-02  15.447  < 2e-16 ***
## weight      -6.664e-03  2.139e-04 -31.161  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.435 on 395 degrees of freedom
## Multiple R-squared:  0.8079, Adjusted R-squared:  0.8069 
## F-statistic: 830.4 on 2 and 395 DF,  p-value: < 2.2e-16

First, I ran a simple regression of fuel efficiency on year. As the scatter plot above shows, fuel efficiency has increased over time (about 1.22 mpg per model year), and this alone supports my friend's assertion that fuel efficiency has improved over the years as awareness of it appears to have grown. I also hypothesized that car manufacturers would have produced lighter cars as the years progressed, since lighter cars are more fuel efficient. In the second regression the weight coefficient is -0.006664 per pound. It looks small only because weight is measured in pounds: it amounts to about 6.7 mpg less per additional 1,000 lb, and with a t value of -31.161 it is highly significant. This negative coefficient is consistent with lighter cars being more efficient rather than proving the hypothesis wrong. Adding weight also reduces the year coefficient from 1.22445 to 0.7566, which suggests that weight and year are negatively correlated (cars got lighter over time) and that part of the improvement over the years runs through weight. However, year remains highly significant, so efficiency has still increased over the years beyond what weight explains, and this could be due to a different variable.
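The omitted-variable logic discussed in Question 5 can also be applied to the two Question 6 regressions. Assuming both were fit on the same 398 rows (the residual degrees of freedom, 396 and 395, suggest they were), the year coefficients satisfy year_short = year_long + weight_coef × δ, where δ is the slope from regressing weight on year. The sketch below solves for δ using the rounded printed coefficients, so the result is approximate; the helper name `implied_weight_trend` is hypothetical.

```python
# Coefficients copied from the Question 6 output
YEAR_SHORT = 1.22445     # lm(mpg ~ year)
YEAR_LONG = 0.7566       # lm(mpg ~ year + weight), printed as 7.566e-01
WEIGHT_LONG = -0.006664  # lm(mpg ~ year + weight), printed as -6.664e-03


def implied_weight_trend(short: float, long_: float, weight_coef: float) -> float:
    """Slope of weight on year implied by short = long_ + weight_coef * delta (hypothetical helper)."""
    return (short - long_) / weight_coef


if __name__ == "__main__":
    delta = implied_weight_trend(YEAR_SHORT, YEAR_LONG, WEIGHT_LONG)
    print(f"implied change in weight per model year: {delta:.1f} lb")
```

With the printed values, δ comes out to roughly -70 lb per model year, which points toward cars in the sample getting lighter over time. Fitting `lm(weight ~ year, data = cars)` directly would be the more direct check of that hypothesis.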