Question 1
## Warning in eval(substitute(list(...)), `_data`, parent.frame()): NAs introduced
## by coercion
##   mpg cylinders displacement horsepower weight acceleration year
## 1  18         8          307        130   3504         12.0   70
## 2  15         8          350        165   3693         11.5   70
## 3  18         8          318        150   3436         11.0   70
## 4  16         8          304        150   3433         12.0   70
## 5  17         8          302        140   3449         10.5   70
## 6  15         8          429        198   4341         10.0   70
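The coercion warning above is what R emits when `as.numeric()` meets text that is not a number. The following is a minimal sketch of that behaviour, assuming (the output does not confirm this) that a column in the raw file marks missing values with a placeholder such as `?`. The vector `raw_hp` is hypothetical and is not taken from the data.

```r
# Hypothetical raw values; the "?" placeholder is an assumption for illustration.
raw_hp <- c("130", "165", "?", "150")

# as.numeric() turns non-numeric strings into NA and warns about coercion.
# suppressWarnings() is used here only to keep the example output quiet.
hp <- suppressWarnings(as.numeric(raw_hp))

# Count how many values became NA so the affected rows can be inspected.
n_missing <- sum(is.na(hp))
print(hp)
print(n_missing)
```

Counting the `NA` values after the conversion is a quick way to check how many rows the warning actually affects.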
Question 2
## `geom_smooth()` using formula 'y ~ x'
Question 3
##
## Call:
## lm(formula = mpg ~ cylinders, data = cars)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -14.2607  -3.3841  -0.6478   2.5538  17.9022
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  42.9493     0.8330   51.56   <2e-16 ***
## cylinders    -3.5629     0.1458  -24.43   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.942 on 396 degrees of freedom
## Multiple R-squared: 0.6012, Adjusted R-squared: 0.6002
## F-statistic: 597.1 on 1 and 396 DF, p-value: < 2.2e-16
Each additional cylinder is associated with a decrease in fuel efficiency of approximately 3.56 miles per gallon (mpg), on average.
The cylinders coefficient agrees with the plot from Question 2: the coefficient is negative (-3.56), and the fitted line in the plot slopes downward.
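To make the slope concrete, the sketch below plugs the intercept and slope reported in the Question 3 output into the fitted line and compares a 4-cylinder car with an 8-cylinder car. The helper `predict_mpg` is a name introduced here for illustration; it is not part of the original analysis.

```r
# Coefficients copied from the Question 3 summary output.
intercept <- 42.9493
slope_cyl <- -3.5629

# Fitted mpg for a given number of cylinders under the simple linear model.
predict_mpg <- function(cylinders) intercept + slope_cyl * cylinders

fitted_4 <- predict_mpg(4)
fitted_8 <- predict_mpg(8)

# Fitted values for each car, then the gap between them.
print(c(four_cyl = fitted_4, eight_cyl = fitted_8))
print(fitted_8 - fitted_4)
```

The gap between the two fitted values is four times the slope, which is the same "3.56 mpg per cylinder" statement applied over four cylinders.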
Question 4
##
## Call:
## lm(formula = mpg ~ cylinders + weight + year, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.9727 -2.3180 -0.0755  2.0138 14.3505
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -13.925603   4.037305  -3.449 0.000623 ***
## cylinders    -0.087402   0.232075  -0.377 0.706665
## weight       -0.006511   0.000459 -14.185  < 2e-16 ***
## year          0.753286   0.049802  15.126  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.438 on 394 degrees of freedom
## Multiple R-squared: 0.8079, Adjusted R-squared: 0.8065
## F-statistic: 552.4 on 3 and 394 DF, p-value: < 2.2e-16
Holding weight and model year constant, each additional cylinder is associated with a decrease in fuel efficiency of approximately 0.09 mpg. Holding cylinders and model year constant, each additional pound of weight is associated with a decrease of approximately 0.0065 mpg.
Holding cylinders and weight constant, each additional model year is associated with an increase in fuel efficiency of approximately 0.75 mpg.
The coefficient on cylinders is not statistically significant, even at the 10% level: its p-value is approximately 0.71.
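The significance statement can be checked by hand from the numbers in the Question 4 output. The sketch below recomputes the t statistic and the two-sided p-value for cylinders from the reported estimate, standard error, and residual degrees of freedom. Because the inputs are rounded, the results should match the printed output only approximately.

```r
# Values copied from the cylinders row of the Question 4 summary output.
estimate  <- -0.087402
std_error <- 0.232075
df_resid  <- 394

# t statistic: estimate divided by its standard error.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with the residual degrees of freedom.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(c(t = t_value, p = p_value), 3))
```

A p-value this far above 0.10 means the data are compatible with a cylinders coefficient of zero once weight and year are in the model.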
Question 5
The results differ because the model in Question 4 adds weight and model year as control variables, which reduces the bias in the estimated effect of cylinders on mpg. The regression in Question 3 suffers from omitted variable bias. For this bias to occur, an omitted variable must both affect fuel efficiency and be correlated with the number of cylinders; weight is a plausible example, since the Question 4 output shows that it is strongly related to mpg. Once weight and year are added, the coefficient on cylinders changes from approximately -3.56 to approximately -0.09 and is no longer statistically significant, which indicates that relying on Question 3 alone would have led us to overstate the effect of cylinders on mpg.
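The mechanics of omitted variable bias can be illustrated with a small sketch. It uses R's built-in `mtcars` data as a stand-in, which is an assumption made for reproducibility: `mtcars` is not the data set analysed in this report, and its variables are named `cyl` and `wt`. The sketch checks the standard identity that the short-regression slope equals the long-regression slope plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one.

```r
# mtcars is a built-in stand-in, NOT the data set used in this report.
short <- lm(mpg ~ cyl, data = mtcars)       # weight omitted
long  <- lm(mpg ~ cyl + wt, data = mtcars)  # weight included
aux   <- lm(wt ~ cyl, data = mtcars)        # how weight moves with cylinders

b_short <- coef(short)[["cyl"]]
b_long  <- coef(long)[["cyl"]]
b_wt    <- coef(long)[["wt"]]
delta   <- coef(aux)[["cyl"]]

# Bias term: effect of the omitted variable times its association with cyl.
bias <- b_wt * delta

print(c(short = b_short, long = b_long, bias = bias,
        long_plus_bias = b_long + bias))
```

The bias term is nonzero only when both factors are nonzero, which is why the omitted variable has to matter for mpg and be correlated with cylinders.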
Question 6
##
## Call:
## lm(formula = mpg ~ year, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -12.024  -5.451  -0.390   4.947  18.200
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -69.55560    6.58911  -10.56   <2e-16 ***
## year          1.22445    0.08659   14.14   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.379 on 396 degrees of freedom
## Multiple R-squared: 0.3356, Adjusted R-squared: 0.3339
## F-statistic: 200 on 1 and 396 DF, p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
##
## Call:
## lm(formula = mpg ~ acceleration, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -18.007  -5.636  -1.242   4.758  23.192
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)    4.9698     2.0432   2.432   0.0154 *
## acceleration   1.1912     0.1292   9.217   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.101 on 396 degrees of freedom
## Multiple R-squared: 0.1766, Adjusted R-squared: 0.1746
## F-statistic: 84.96 on 1 and 396 DF, p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
##
## Call:
## lm(formula = mpg ~ acceleration + year, data = cars)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -11.9929  -5.0302  -0.5953   4.7848  18.2562
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -68.95578    6.24080 -11.049  < 2e-16 ***
## acceleration   0.78317    0.11482   6.821 3.41e-11 ***
## year           1.05615    0.08563  12.334  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.041 on 395 degrees of freedom
## Multiple R-squared: 0.4056, Adjusted R-squared: 0.4026
## F-statistic: 134.7 on 2 and 395 DF, p-value: < 2.2e-16
As the first plot shows, fuel efficiency increases with model year. The regression supports this: the coefficient on year is positive (approximately 1.22), so each additional model year is associated with an improvement of about 1.22 mpg, on average.
I also regressed mpg on both acceleration and year, because I expected fuel efficiency to improve as acceleration time improves, and I expected acceleration to improve over the years as manufacturers work on fuel efficiency. In this model both coefficients are positive and statistically significant (approximately 0.78 for acceleration and 1.06 for year), which is consistent with my expectation and with the idea that manufacturers have been improving fuel efficiency over the years.
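To show how precisely the year effect is estimated, the sketch below builds an approximate 95% confidence interval for the year coefficient in the `mpg ~ year` model, using only the estimate, standard error, and residual degrees of freedom printed in the output above. The 95% level is a choice made here for illustration.

```r
# Values copied from the year row of the mpg ~ year summary output.
estimate  <- 1.22445
std_error <- 0.08659
df_resid  <- 396

# Critical value of the t distribution for a two-sided 95% interval.
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate plus or minus the critical value times the standard error.
ci <- estimate + c(-1, 1) * t_crit * std_error
names(ci) <- c("lower", "upper")

print(round(ci, 3))
```

An interval that lies entirely above zero matches the very small p-value reported for year in that model.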