DATA605_HW11_Simple Regression Analysis
library(tidyverse)
Question
Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis.)
Solution
glimpse(cars)
## Rows: 50
## Columns: 2
## $ speed <dbl> 4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13~
## $ dist <dbl> 2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34~
The cars dataset includes 50 observations of 2 variables, speed and dist (stopping distance), as shown in the glimpse output above.
Visualization
plot(cars$speed, cars$dist, main="Distance vs Speed",
     xlab="Speed", ylab="Dist")
The Linear Model Function
The simplest regression model is a straight line. It has the mathematical form:
\(\hat{y} = b_{0} + b_{1}x\)
where:
\(x\) is the input to the system,
\(b_{0}\) is the y-intercept of the line,
\(b_{1}\) is the slope, and
\(\hat{y}\) is the output value the model predicts. The ^ indicates a predicted or estimated value, not the actual observed value.
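For a single predictor, the least-squares slope and intercept have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The following is a minimal sketch of that calculation on the built-in cars data. The variable names `x`, `y`, `b0` and `b1` are chosen here for illustration, and the `lm(dist ~ speed, data = cars)` call is an equivalent way of writing the model used in this report.

```r
# Closed-form least-squares estimates for simple linear regression
x <- cars$speed
y <- cars$dist

b1 <- cov(x, y) / var(x)        # slope: cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)    # intercept: line passes through (mean(x), mean(y))

c(intercept = b0, slope = b1)

# For comparison, the coefficients estimated by lm()
coef(lm(dist ~ speed, data = cars))
```

Both approaches should give the same two numbers, up to floating-point error.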
cars_lm <- lm(cars$dist ~ cars$speed)
cars_lm
##
## Call:
## lm(formula = cars$dist ~ cars$speed)
##
## Coefficients:
## (Intercept) cars$speed
## -17.579 3.932
In this case, the y-intercept is \(b_{0} = -17.579\) and the slope is \(b_{1} = 3.932\). Thus, the final regression model is:
\(\hat{y} = -17.579 + 3.932x\)
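The fitted equation can be used to predict stopping distance at a given speed. The sketch below refits the same model with the `data = cars` form of the formula, because `predict()` matches `newdata` columns by variable name, and a formula written as `cars$dist ~ cars$speed` does not work well with `newdata`. The object names `fit`, `new_speeds`, `pred` and `manual`, and the speeds chosen, are illustrative assumptions.

```r
# Refit the same model using the data argument so predict() can use newdata
fit <- lm(dist ~ speed, data = cars)

# Hypothetical speeds at which to predict stopping distance
new_speeds <- data.frame(speed = c(10, 15, 20))

# Predictions from the fitted model
pred <- predict(fit, newdata = new_speeds)

# The same predictions computed directly from y-hat = b0 + b1 * x
manual <- unname(coef(fit)[1] + coef(fit)[2] * new_speeds$speed)

cbind(new_speeds, predicted = pred, manual = manual)
```

At a speed of 15, for example, the equation gives roughly -17.579 + 3.932 × 15 ≈ 41.4.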
plot(cars$speed, cars$dist, xlab='Speed', ylab='Distance', main='Cars Linear Regression Model')
abline(cars_lm)
summary(cars_lm)
##
## Call:
## lm(formula = cars$dist ~ cars$speed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.069 -9.525 -2.272 9.215 43.201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791 6.7584 -2.601 0.0123 *
## cars$speed 3.9324 0.4155 9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
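The quantities used to judge model quality can also be pulled out of the summary object programmatically rather than read off the printout. This is a small sketch, assuming the model is refit as `lm(dist ~ speed, data = cars)`; the names `fit`, `s`, `r2`, `rse`, `coef_table`, `r2_check` and `t_speed` are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

r2     <- s$r.squared       # multiple R-squared
adj_r2 <- s$adj.r.squared   # adjusted R-squared
rse    <- s$sigma           # residual standard error
coef_table <- coef(s)       # matrix: Estimate, Std. Error, t value, Pr(>|t|)

# In simple regression, R-squared equals the squared correlation of x and y
r2_check <- cor(cars$speed, cars$dist)^2

# Each t value is the estimate divided by its standard error
t_speed <- coef_table["speed", "Estimate"] / coef_table["speed", "Std. Error"]

round(c(r2 = r2, adj_r2 = adj_r2, rse = rse, t_speed = t_speed), 4)
```

Read this way, an R-squared of about 0.65 means that speed accounts for roughly 65% of the variation in stopping distance, and the small p-value for the slope indicates that speed is a significant predictor.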
Residual Analysis
plot(fitted(cars_lm),resid(cars_lm), main="Residuals")
abline(0, 0)
qqnorm(resid(cars_lm))
qqline(resid(cars_lm))
par(mfrow=c(2,2))
plot(cars_lm)
From the residuals vs. fitted values plot, we can see that there is no definite pattern in the residuals, so the assumptions of randomly scattered residuals and constant variance (homoscedasticity) appear to be satisfied.
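The residuals examined above are the observed stopping distances minus the fitted values. As a brief sketch, they can be recomputed by hand and compared with `resid()`. The model is refit here as `lm(dist ~ speed, data = cars)`, and the names `fit`, `manual_resid`, `same` and `resid_mean` are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

# Residual = observed value minus fitted value
manual_resid <- cars$dist - fitted(fit)

# Should agree with the residuals stored in the model object
same <- isTRUE(all.equal(unname(manual_resid), unname(resid(fit))))
same

# With an intercept in the model, least-squares residuals average to zero
resid_mean <- mean(resid(fit))
resid_mean

# Five-number summary, comparable to the Residuals block of summary()
summary(resid(fit))
```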
From the normal Q-Q plot, we can see that the residuals are approximately normally distributed.
From the overall analysis, we can say that the model fits the data reasonably well, since the assumptions of the linear regression model appear to be satisfied here.