Linear regression with the `cars` dataset.
```r
library(dplyr)    # data manipulation and the %>% pipe
library(broom)    # augment() for tidy model output
library(ggplot2)  # plotting
library(knitr)    # kable() for tables
```
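The built-in `cars` data frame records the speed (mph) and stopping distance (ft) of 50 cars. The first six rows:

```r
cars_df <- cars
cars_df %>% head() %>% kable()
```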
| speed | dist |
|---|---|
| 4 | 2 |
| 4 | 10 |
| 7 | 4 |
| 7 | 22 |
| 8 | 16 |
| 9 | 10 |
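```r
# Scatter plot with the least-squares line and its confidence band
cars_df %>%
  ggplot(aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Speed (mph)", y = "Stopping Distance (ft)", title = "Cars Dataset")
```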
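When `se` is true, `geom_smooth()` shades a pointwise confidence interval for the mean response around the fitted line (95% by default, controlled by its `level` argument). The following is a minimal base-R sketch of the same kind of band computed with `predict()`. The five-point grid and the names `fit`, `speed_grid`, and `band` are illustrative choices, not part of the original analysis.

```r
# Fit the same simple regression that the smoothed line uses
fit <- lm(dist ~ speed, data = cars)

# Illustrative grid of speeds spanning the observed range
speed_grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 5))

# Pointwise 95% confidence interval for the mean stopping distance:
# columns fit (point estimate), lwr and upr (interval bounds)
band <- predict(fit, newdata = speed_grid, interval = "confidence", level = 0.95)
print(cbind(speed_grid, band))
```

Passing `interval = "prediction"` instead gives the wider interval for a single new car. That is not what the shaded band shows.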
```r
lm_fit <- lm(dist ~ speed, data = cars_df)
summary(lm_fit)
```

```
##
## Call:
## lm(formula = dist ~ speed, data = cars_df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
```
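With a single predictor, least squares has a closed form. The slope is `cov(x, y) / var(x)` and the intercept is `mean(y) - slope * mean(x)`. The base-R sketch below uses that form to recompute the estimates in the Coefficients table and compares them with `coef()`. The variable names are illustrative.

```r
x <- cars$speed
y <- cars$dist

# Closed-form least-squares estimates for one predictor
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept

# Should match the lm() estimates: about -17.5791 and 3.9324
ols <- c(intercept = b0, slope = b1)
print(round(ols, 4))
print(coef(lm(dist ~ speed, data = cars)))
```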
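The multiple R-squared is 0.65, so speed alone accounts for about 65% of the variance in stopping distance. The slope estimate means each additional mph of speed is associated with roughly 3.9 ft of extra stopping distance, and its p-value (1.49e-12) is far below any conventional significance threshold.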
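R-squared is defined as `1 - SSE / SST`. Here SSE is the residual sum of squares and SST is the total sum of squares of `dist` around its mean. With one predictor it also equals the squared correlation between `speed` and `dist`. The quick base-R check below compares that definition with the value `summary()` reports.

```r
fit <- lm(dist ~ speed, data = cars)

sse <- sum(residuals(fit)^2)                  # residual sum of squares
sst <- sum((cars$dist - mean(cars$dist))^2)   # total sum of squares
r2  <- 1 - sse / sst

print(round(r2, 4))                    # 0.6511
print(summary(fit)$r.squared)          # value reported by summary()
print(cor(cars$speed, cars$dist)^2)    # same value with a single predictor
```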
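```r
# augment() adds fitted values (.fitted) and residuals (.resid) to the data
lm_df <- augment(lm_fit)

# Residuals vs. fitted values, with a reference line at zero
lm_df %>%
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  labs(x = "Fitted values", y = "Residuals")
```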
The residual plot looks reasonable. There are no obvious signs of non-linearity, though the vertical spread of the residuals may vary somewhat across the fitted values, which suggests perhaps a bit of heteroscedasticity.
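A Breusch-Pagan-style check can put a number on the possible heteroscedasticity. Regress the squared residuals on the predictor, then compare n times the R-squared of that auxiliary regression with a chi-squared distribution on 1 degree of freedom. This is Koenker's studentized variant, written here in base R only. The `lmtest::bptest()` function with its default `studentize = TRUE` should compute the same statistic. Treat the result as a rough complement to the plot rather than a replacement: the test mainly picks up variance that changes linearly with the predictor.

```r
fit <- lm(dist ~ speed, data = cars)

# Auxiliary regression: squared residuals on the original predictor
aux <- lm(residuals(fit)^2 ~ speed, data = cars)

# Koenker's studentized Breusch-Pagan statistic: n * R^2 of the auxiliary fit
n <- nrow(cars)
bp_stat <- n * summary(aux)$r.squared
p_value <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

print(c(BP = bp_stat, df = 1, p.value = p_value))
```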