Problem 2

  1. Use the lm() function to perform a simple linear regression with mpg as the response and horsepower as the predictor. Use the summary() function to print the results. Comment on the output.
library(ISLR)
data(Auto)
str(Auto)
## 'data.frame':    392 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : num  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : num  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : num  3504 3693 3436 3433 3449 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : num  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
mod_cars<-lm(mpg~horsepower, data = Auto)
summary(mod_cars)
## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

Answer the following

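  • Is there a relationship between the predictor and the response? Yes. The slope for horsepower is significantly different from zero (t = -24.49, p < 2e-16), and the F-statistic of 599.7 has a p-value below 2.2e-16, so we reject the null hypothesis that mpg is unrelated to horsepower.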

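  • How strong is the relationship between the predictor and the response? The relationship is moderately strong: the Multiple R-squared of 0.6059 means that horsepower explains about 60.6% of the variance in mpg, and the residual standard error is 4.906 mpg.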

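  • Is the relationship between the predictor and the response positive or negative? The relationship is negative: the estimated slope is -0.1578, so each additional unit of horsepower is associated with a decrease of about 0.158 mpg on average.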

  • What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?

## 95% confidence intervals for the regression coefficients
confint(mod_cars)
##                 2.5 %     97.5 %
## (Intercept) 38.525212 41.3465103
## horsepower  -0.170517 -0.1451725
## Predicted mpg for a horsepower of 98
newdata <- data.frame(horsepower = 98)
predict(mod_cars, newdata)
##        1 
## 24.46708
The predicted mpg for a horsepower of 98 is about 24.47.
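Note that confint() reports intervals for the coefficients, not for the predicted value. A minimal sketch of how the 95% confidence and prediction intervals at a horsepower of 98 could be obtained is shown here, using the interval argument of predict(). The names new_hp, conf_int, and pred_int are illustrative and do not appear in the original analysis.

```r
library(ISLR)  # provides the Auto data set
data(Auto)

# Refit the same simple linear regression used above
mod_cars <- lm(mpg ~ horsepower, data = Auto)

# Illustrative data frame holding the horsepower value of interest
new_hp <- data.frame(horsepower = 98)

# 95% confidence interval for the mean mpg at horsepower = 98
conf_int <- predict(mod_cars, new_hp, interval = "confidence", level = 0.95)

# 95% prediction interval for a single new car at horsepower = 98
pred_int <- predict(mod_cars, new_hp, interval = "prediction", level = 0.95)

print(conf_int)
print(pred_int)
```

Both calls return a one-row matrix with columns fit, lwr, and upr. The prediction interval is wider than the confidence interval because it also accounts for the residual variation of an individual observation around the regression line.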
  2. Plot the response and the predictor. Use the abline() function to display the least squares regression line.
library(tidyverse)
names(mod_cars)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"
mod_cars$coefficients
## (Intercept)  horsepower 
##  39.9358610  -0.1578447
# Plot the raw data and overlay the fitted least squares line
ggplot(Auto, aes(x = horsepower, y = mpg)) +
  geom_point() +
  geom_abline(intercept = coef(mod_cars)[1], slope = coef(mod_cars)[2], color = "forestgreen")
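Since the exercise names abline() specifically, a base-graphics version of the same figure might look like the sketch below. It relies on abline() accepting a fitted lm object directly; the axis labels and line width are illustrative choices.

```r
library(ISLR)  # provides the Auto data set
data(Auto)

# Refit the same simple linear regression used above
mod_cars <- lm(mpg ~ horsepower, data = Auto)

# Scatterplot of the response (mpg) against the predictor (horsepower)
plot(Auto$horsepower, Auto$mpg,
     xlab = "horsepower", ylab = "mpg")

# abline() reads the intercept and slope from the fitted model
abline(mod_cars, col = "forestgreen", lwd = 2)
```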

  3. Use the plot() function to produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit.
plot(mod_cars)

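From these plots, I can see a few problems with the fit. The Residuals vs Fitted plot shows the start of a curved pattern rather than a random scatter, which suggests that the relationship between mpg and horsepower may not be strictly linear. In the Normal Q-Q plot, the residuals are slightly skewed to the right and do not closely follow a normal distribution. The Residuals vs Leverage plot shows several observations with large residuals (outliers in mpg) or high leverage (unusual horsepower values), some of them near the Cook's distance contours, and these points may be influencing the fit.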
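To view all four diagnostic plots at once and to list the observations with the largest Cook's distance and leverage, something like the following sketch could be used. The names cd and lev, and the choice to show the top five values, are illustrative.

```r
library(ISLR)  # provides the Auto data set
data(Auto)

# Refit the same simple linear regression used above
mod_cars <- lm(mpg ~ horsepower, data = Auto)

# Arrange the four default diagnostic plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(mod_cars)
par(op)  # restore the previous graphics settings

# Cook's distance and leverage (hat values) for every observation
cd  <- cooks.distance(mod_cars)
lev <- hatvalues(mod_cars)

# The five most influential and the five highest-leverage observations
print(head(sort(cd, decreasing = TRUE), 5))
print(head(sort(lev, decreasing = TRUE), 5))
```

Inspecting these values alongside the plots helps separate points that are merely unusual in horsepower from points that actually change the fitted line.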