library(broom)
library(MASS)
library(ISLR2)
##
## Attaching package: 'ISLR2'
## The following object is masked from 'package:MASS':
##
## Boston
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──
## ✔ dials 1.2.0 ✔ rsample 1.2.0
## ✔ dplyr 1.1.3 ✔ tibble 3.2.1
## ✔ ggplot2 3.5.0 ✔ tidyr 1.3.1
## ✔ infer 1.0.5 ✔ tune 1.1.2
## ✔ modeldata 1.2.0 ✔ workflows 1.1.3
## ✔ parsnip 1.1.1 ✔ workflowsets 1.0.1
## ✔ purrr 1.0.2 ✔ yardstick 1.2.0
## ✔ recipes 1.0.8
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::select() masks MASS::select()
## ✖ recipes::step() masks stats::step()
## • Use suppressPackageStartupMessages() to eliminate package startup messages
library(ggplot2)
Fit a simple linear regression with `mpg` as the response and `horsepower` as the predictor. Comment on the output.
data("Auto")
head(Auto)
## mpg cylinders displacement horsepower weight acceleration year origin
## 1 18 8 307 130 3504 12.0 70 1
## 2 15 8 350 165 3693 11.5 70 1
## 3 18 8 318 150 3436 11.0 70 1
## 4 16 8 304 150 3433 12.0 70 1
## 5 17 8 302 140 3449 10.5 70 1
## 6 15 8 429 198 4341 10.0 70 1
## name
## 1 chevrolet chevelle malibu
## 2 buick skylark 320
## 3 plymouth satellite
## 4 amc rebel sst
## 5 ford torino
## 6 ford galaxie 500
lm_model <- linear_reg() %>%
  set_engine('lm') %>%
  set_mode('regression')
lm_fit <- lm_model %>%
  fit(mpg ~ horsepower, data = Auto)
tidy(lm_fit)
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 39.9 0.717 55.7 1.22e-187
## 2 horsepower -0.158 0.00645 -24.5 7.03e- 81
glance(lm_fit)
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.606 0.605 4.91 600. 7.03e-81 1 -1179. 2363. 2375.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
- The p-value for the horsepower coefficient is effectively 0 (7.03e-81 in the output above), so horsepower is a statistically significant predictor of mpg. This is strong evidence of a relationship between mpg and horsepower.
- The relationship between mpg and horsepower is negative: the horsepower coefficient is -0.157845, so each additional unit of horsepower is associated with a decrease of about 0.158 in predicted mpg.
- The R-squared value of 0.6059 indicates that 60.59% of the variability in mpg is explained by horsepower.
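The three points above can be checked directly from the fitted model. The following is a minimal sketch, not part of the original solution: it refits the same model with base `lm()` (an assumption made so the block does not depend on parsnip), then extracts the slope, its 95% confidence interval, and R-squared. The variable names `fit`, `slope`, `ci`, `r2_model`, and `r2_cor` are illustrative.

```r
library(ISLR2)  # provides the Auto data set

# Refit the same model with base lm() so the sketch stands alone
fit <- lm(mpg ~ horsepower, data = Auto)

# Slope estimate and its 95% confidence interval
slope <- coef(fit)[["horsepower"]]
ci <- confint(fit, "horsepower", level = 0.95)

# In simple linear regression, R-squared equals the squared
# correlation between the response and the predictor
r2_model <- summary(fit)$r.squared
r2_cor <- cor(Auto$mpg, Auto$horsepower)^2

print(slope)
print(ci)
print(c(model = r2_model, squared_cor = r2_cor))
```

A confidence interval that lies entirely below zero supports the same conclusion as the small p-value: the association between horsepower and mpg is negative.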
ggplot(Auto, aes(x = horsepower, y = mpg)) +
  geom_point(color = "purple") + # Add scatterplot points
  geom_smooth(method = "lm", se = FALSE, color = "black", linewidth = 1) + # Add regression line
  labs(title = "Scatterplot of mpg vs horsepower with Regression Line",
       x = "Horsepower",
       y = "Miles per Gallon") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
library(ggfortify)
## Registered S3 method overwritten by 'ggfortify':
## method from
## autoplot.glmnet parsnip
# Panels 1-3 and 5: residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
autoplot(lm_fit, which = c(1:3, 5)) +
  labs(title = "Diagnostic Plots of Least Squares Regression Fit")
Looking at the Residuals vs. Fitted plot, there is a clear
U-shape to the residuals, which is a strong indicator of non-linearity
in the data. Combined with the scatterplot from 8(b), this suggests
that the simple linear regression model is not a good fit. The second
plot (normal Q-Q) suggests that the residuals are approximately normally
distributed. The third plot (scale-location) suggests that the variance
of the errors is roughly constant. Finally, the plot of standardized
residuals versus leverage indicates the presence of a few outliers
(standardized residuals above 2 or below -2) and a few high-leverage
points.
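The outlier and leverage remarks above can also be checked numerically instead of read off the plot. This is a minimal sketch, not part of the original solution. It refits the model with base `lm()`. The leverage cutoff of `2 * (p + 1) / n` is a common rule of thumb assumed here for illustration and is not stated in the text above. The names `std_res`, `lev`, `lev_cutoff`, `outliers`, and `high_leverage` are illustrative.

```r
library(ISLR2)  # provides the Auto data set

fit <- lm(mpg ~ horsepower, data = Auto)

std_res <- rstandard(fit)  # standardized residuals
lev <- hatvalues(fit)      # leverage of each observation

n <- nobs(fit)
p <- length(coef(fit)) - 1  # number of predictors (1 here)

# Outliers: same +/- 2 threshold on standardized residuals as in the text
outliers <- which(abs(std_res) > 2)

# High leverage: assumed rule of thumb, twice the average leverage (p + 1) / n
lev_cutoff <- 2 * (p + 1) / n
high_leverage <- which(lev > lev_cutoff)

cat("Observations with |standardized residual| > 2:", length(outliers), "\n")
cat("Observations with leverage above", round(lev_cutoff, 4), ":",
    length(high_leverage), "\n")
```

Roughly 5% of standardized residuals are expected to fall outside +/- 2 even when the model is adequate, so a small count here is not alarming on its own.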