Problem 2 for Homework #5

Problem 2

Use the lm() function to perform a simple linear regression with mpg as the response and horsepower as the predictor. Use the summary() function to print the results. Comment on the output.

library(ISLR)

## Warning: package 'ISLR' was built under R version 3.6.2

data(Auto)
str(Auto)

## 'data.frame':    392 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : num  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : num  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : num  3504 3693 3436 3433 3449 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : num  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...

mod_cars<-lm(mpg~horsepower, data = Auto)
summary(mod_cars)

## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16
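
The headline numbers in this summary can also be pulled out of the fitted object directly, which avoids copying them by hand. The following is a minimal sketch, assuming the ISLR package is installed; the names `mod_summary`, `r2`, `rse` and `slope_p` are illustrative and not part of the original analysis.

```r
library(ISLR)
data(Auto)

# Refit the same model so the block runs on its own
mod_cars <- lm(mpg ~ horsepower, data = Auto)
mod_summary <- summary(mod_cars)

# Proportion of variance in mpg explained by horsepower
r2 <- mod_summary$r.squared

# Residual standard error, in mpg units
rse <- mod_summary$sigma

# p-value for the horsepower slope, taken from the coefficient table
slope_p <- mod_summary$coefficients["horsepower", "Pr(>|t|)"]

print(c(r.squared = r2, sigma = rse, slope.p = slope_p))
```
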

Answer the following

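Is there a relationship between the predictor and the response? Yes. The horsepower coefficient is significantly different from zero (t = -24.49, p < 2e-16), and the F-statistic of 599.7 on 1 and 390 DF has a p-value below 2.2e-16, so there is strong evidence of a relationship.
How strong is the relationship between the predictor and the response? The relationship is moderately strong: the Multiple R-squared of 0.6059 means horsepower explains about 61% of the variance in mpg, and the residual standard error is 4.906 mpg.
Is the relationship between the predictor and the response positive or negative? The relationship is negative: the estimated slope is -0.1578, so each additional unit of horsepower is associated with a decrease of about 0.158 mpg on average.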
What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?

## 95% confidence intervals for the regression coefficients
confint(mod_cars)

##                 2.5 %     97.5 %
## (Intercept) 38.525212 41.3465103
## horsepower  -0.170517 -0.1451725

## Predicted mpg at a horsepower of 98
newdata=data.frame(horsepower=98)
predict(mod_cars, newdata)

##        1 
## 24.46708
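
The output above gives a point prediction of about 24.47 mpg at a horsepower of 98. Note that confint() reports intervals for the coefficients, not for the prediction itself. One way to get the 95% confidence and prediction intervals the question asks for is the `interval` argument of predict(); the sketch below assumes the same model, and the names `conf_int` and `pred_int` are illustrative.

```r
library(ISLR)
data(Auto)

# Refit the same model so the block runs on its own
mod_cars <- lm(mpg ~ horsepower, data = Auto)
newdata <- data.frame(horsepower = 98)

# 95% confidence interval for the mean mpg at horsepower = 98
conf_int <- predict(mod_cars, newdata, interval = "confidence", level = 0.95)

# 95% prediction interval for a single new car at horsepower = 98
pred_int <- predict(mod_cars, newdata, interval = "prediction", level = 0.95)

print(conf_int)
print(pred_int)
```

Both calls return a matrix with columns `fit`, `lwr` and `upr`. The prediction interval is always wider than the confidence interval because it also accounts for the residual variation of an individual observation.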

Plot the response and the predictor. Use the abline() function to display the least squares regression line.

library(tidyverse)

## -- Attaching packages ------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --

## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0

## -- Conflicts ---------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

names(mod_cars)

##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"

mod_cars$coefficients

## (Intercept)  horsepower 
##  39.9358610  -0.1578447

## Scatterplot of the raw data with the fitted least squares line overlaid
ggplot(Auto, aes(x = horsepower, y = mpg)) +
  geom_point() +
  geom_abline(intercept = coef(mod_cars)[1], slope = coef(mod_cars)[2], color = "forestgreen")
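
The question asks specifically for abline(), which belongs to base R graphics rather than ggplot2. A minimal base-graphics sketch of the same plot, assuming the same model, might look like this; the axis labels and line width are illustrative choices.

```r
library(ISLR)
data(Auto)

# Refit the same model so the block runs on its own
mod_cars <- lm(mpg ~ horsepower, data = Auto)

# Scatterplot of the response against the predictor
plot(Auto$horsepower, Auto$mpg, xlab = "horsepower", ylab = "mpg")

# abline() accepts a fitted lm object and draws its regression line
abline(mod_cars, col = "forestgreen", lwd = 2)
```
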

Use the plot() function to produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit.

plot(mod_cars)

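From these plots, I can see that the residuals are slightly skewed to the right and do not follow a normal distribution; this is consistent with the residual summary, where the maximum (16.92) lies farther from zero than the minimum (-13.57). The residuals-versus-fitted plot shows no strong pattern, though the start of one might be visible, which would suggest that the relationship is not perfectly linear. Several observations may also be driving the skew: a number of points sit near the Cook's distance contours, and some are unusual in both horsepower (high leverage) and mpg (large residuals).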
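
To back up these visual impressions with numbers, the four diagnostic plots can be shown on one page and the most influential observations listed. This is a sketch assuming the same model; the cutoff of twice the mean leverage is a common rule of thumb rather than a formal test, and the names `cooks`, `lev` and `high_lev` are illustrative.

```r
library(ISLR)
data(Auto)

# Refit the same model so the block runs on its own
mod_cars <- lm(mpg ~ horsepower, data = Auto)

# Show all four diagnostic plots in a 2 x 2 grid, then reset the layout
par(mfrow = c(2, 2))
plot(mod_cars)
par(mfrow = c(1, 1))

# Cook's distance and leverage (hat values) for every observation
cooks <- cooks.distance(mod_cars)
lev <- hatvalues(mod_cars)

# Five observations with the largest Cook's distance
print(head(sort(cooks, decreasing = TRUE), 5))

# Observations whose leverage exceeds twice the mean leverage
# (rule-of-thumb threshold, an assumption rather than a formal test)
high_lev <- which(lev > 2 * mean(lev))
print(length(high_lev))
```
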

Ben Jaffe

3/9/2020
