library(ISLR)
data(Auto)
str(Auto)
## 'data.frame': 392 obs. of 9 variables:
## $ mpg : num 18 15 18 16 17 15 14 14 14 15 ...
## $ cylinders : num 8 8 8 8 8 8 8 8 8 8 ...
## $ displacement: num 307 350 318 304 302 429 454 440 455 390 ...
## $ horsepower : num 130 165 150 150 140 198 220 215 225 190 ...
## $ weight : num 3504 3693 3436 3433 3449 ...
## $ acceleration: num 12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
## $ year : num 70 70 70 70 70 70 70 70 70 70 ...
## $ origin : num 1 1 1 1 1 1 1 1 1 1 ...
## $ name : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
mod_cars <- lm(mpg ~ horsepower, data = Auto)
summary(mod_cars)
##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
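Is there a relationship between the predictor and the response? Yes. The F-statistic is 599.7 on 1 and 390 degrees of freedom with a p-value below 2.2e-16 (equivalently, the t-test p-value for horsepower is < 2e-16), so we reject the null hypothesis that the horsepower coefficient is zero.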
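How strong is the relationship between the predictor and the response? The relationship is moderately strong: the multiple R-squared of 0.6059 means horsepower explains about 60.6% of the variance in mpg, and the residual standard error is 4.906 mpg.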
Is the relationship between the predictor and the response positive or negative? Negative. The slope estimate is -0.158, so each additional unit of horsepower is associated with an average decrease of about 0.158 mpg.
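The three answers above can be read directly off the `summary()` object rather than from the printed table. A minimal sketch, assuming only base R and the `ISLR` package; expressing the residual standard error as a percentage of mean mpg is an extra way of reading "strength" that is not part of the original answer.

```r
library(ISLR)

mod_cars <- lm(mpg ~ horsepower, data = Auto)
s <- summary(mod_cars)

# Q1: is there a relationship? Overall F-test of H0: slope = 0
f <- s$fstatistic  # named vector: value, numdf, dendf
f_pvalue <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Q2: how strong? R^2, plus the residual standard error relative to mean mpg
r_squared <- s$r.squared
rse <- s$sigma
pct_error <- rse / mean(Auto$mpg) * 100

# Q3: positive or negative? The sign of the slope estimate
slope <- unname(coef(mod_cars)["horsepower"])

c(f_pvalue = f_pvalue, r_squared = r_squared, rse = rse,
  pct_error = pct_error, slope = slope)
```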
What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?
## 95% confidence intervals for the regression coefficients (not for the predicted mpg)
confint(mod_cars, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) 38.525212 41.3465103
## horsepower -0.170517 -0.1451725
## Predicted mpg for a horsepower of 98
newdata <- data.frame(horsepower = 98)
predict(mod_cars, newdata)
## 1
## 24.46708
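`confint()` gives intervals for the intercept and slope, not for the predicted mpg the question asks about. The intervals for a prediction come from the `interval` argument of `predict()` for `lm` fits. A minimal sketch follows; it also recomputes the point prediction by hand as intercept + slope × 98. The prediction interval is wider than the confidence interval because it covers a single new car, so it includes the irreducible error term as well as the uncertainty in the fitted line.

```r
library(ISLR)

mod_cars <- lm(mpg ~ horsepower, data = Auto)
newdata <- data.frame(horsepower = 98)

# Point prediction by hand: intercept + slope * 98
manual_fit <- unname(coef(mod_cars)[1] + coef(mod_cars)[2] * 98)

# 95% confidence interval for the MEAN mpg of cars with horsepower 98
pred_conf <- predict(mod_cars, newdata, interval = "confidence", level = 0.95)

# 95% prediction interval for the mpg of a SINGLE new car with horsepower 98
pred_pred <- predict(mod_cars, newdata, interval = "prediction", level = 0.95)

manual_fit   # about 24.467, matching predict() above
pred_conf    # matrix with columns fit, lwr, upr
pred_pred
```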
library(tidyverse)
names(mod_cars)
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
mod_cars$coefficients
## (Intercept) horsepower
## 39.9358610 -0.1578447
## Scatterplot of mpg against horsepower with the fitted least squares line
ggplot(Auto, aes(x = horsepower, y = mpg)) +
  geom_point() +
  geom_abline(intercept = coef(mod_cars)[1], slope = coef(mod_cars)[2], color = "forestgreen")
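As an alternative to drawing the line from the stored coefficients, ggplot2 can refit the same simple linear regression itself with `geom_smooth(method = "lm")`. With its default settings it also shades a 95% confidence band for the mean response. This is a sketch of that option, not the author's plotting code; the `alpha` value is a cosmetic choice.

```r
library(ISLR)
library(ggplot2)

# geom_smooth(method = "lm") fits mpg ~ horsepower inside ggplot2 and,
# with se = TRUE, shades a 95% confidence band around the fitted line
p <- ggplot(Auto, aes(x = horsepower, y = mpg)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95,
              color = "forestgreen")
print(p)
```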
## Four standard diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(mod_cars)
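These diagnostic plots show several problems with the fit. The Residuals vs Fitted plot shows a curved, U-shaped pattern rather than random scatter, which suggests that the relationship between horsepower and mpg is non-linear. The Normal Q-Q plot departs from the line in the upper tail, so the residuals are slightly right-skewed rather than normally distributed. Finally, a number of observations drive this skew: some are outliers in the response, some are high-leverage points in the predictor, and several lie toward the Cook's distance contours in the Residuals vs Leverage plot, so they may be influential.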
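The visual diagnostics can be backed up with numbers. The sketch below is an assumption-laden complement, not part of the original analysis. It compares the linear fit with a model that adds a squared horsepower term, a standard check for the non-linearity the residual plot hints at, and it flags observations using common rules of thumb: |studentized residual| > 3 for outliers, leverage above twice the average (p + 1)/n, and Cook's distance above 4/n. All three cutoffs are conventions chosen here for illustration.

```r
library(ISLR)

mod_cars <- lm(mpg ~ horsepower, data = Auto)
n <- nrow(Auto)
n_pred <- length(coef(mod_cars)) - 1  # number of predictors (1)

# Non-linearity: does adding a squared term improve the fit significantly?
mod_quad <- lm(mpg ~ horsepower + I(horsepower^2), data = Auto)
quad_test <- anova(mod_cars, mod_quad)
quad_p <- quad_test$`Pr(>F)`[2]

# Outliers: studentized residuals beyond +/- 3 (rule of thumb)
stud_res <- rstudent(mod_cars)
outliers <- which(abs(stud_res) > 3)

# High leverage: hat values well above the average (p + 1) / n
lev <- hatvalues(mod_cars)
avg_lev <- (n_pred + 1) / n
high_lev <- which(lev > 2 * avg_lev)  # 2x average is an assumed cutoff

# Influence: Cook's distance above 4 / n (another common rule of thumb)
cook <- cooks.distance(mod_cars)
influential <- which(cook > 4 / n)

list(quad_p = quad_p, outliers = outliers,
     n_high_leverage = length(high_lev), n_influential = length(influential))
```

The mean of the hat values always equals (p + 1)/n for a full-rank least squares fit, which is why that quantity serves as the leverage baseline.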