DATA605_HW11_Simple Regression Analysis

library(tidyverse)

Question

Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis.)

Solution

glimpse(cars)
## Rows: 50
## Columns: 2
## $ speed <dbl> 4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13~
## $ dist  <dbl> 2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34~

The cars dataset contains 50 observations of 2 variables, speed and dist (stopping distance), as shown in the glimpse output above.

Visualization

plot(cars$speed,cars$dist, main="Distance vs Speed",
xlab="Speed", ylab="Dist")

The Linear Model Function

The simplest regression model is a straight line. It has the mathematical form:

\(\hat{y} = b_{0} + b_{1}x\)

where:

\(x\) is the input to the system,

\(b_{0}\) is the y-intercept of the line,

\(b_{1}\) is the slope, and

\(\hat{y}\) is the output value the model predicts. The hat (^) indicates a predicted or estimated value, not the actual observed value.
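For reference, the least-squares estimates of the intercept and slope can also be computed directly from the standard closed-form expressions: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below only illustrates that calculation on the cars data. The names x, y, b0_manual and b1_manual are introduced here for illustration and are not part of the original analysis.

```r
# Illustrative sketch: closed-form least-squares estimates for a straight line.
# The variable names are hypothetical helpers, not part of the original report.
x <- cars$speed   # predictor
y <- cars$dist    # response

b1_manual <- cov(x, y) / var(x)             # slope
b0_manual <- mean(y) - b1_manual * mean(x)  # intercept

c(intercept = b0_manual, slope = b1_manual)
```

These values should agree with the coefficients that lm() reports for the same data.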

cars_lm <- lm(cars$dist ~ cars$speed)
cars_lm
## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Coefficients:
## (Intercept)   cars$speed  
##     -17.579        3.932

In this case, the y-intercept is \(b_{0} = -17.579\) and the slope is \(b_{1} = 3.932\). Thus, the final regression model is:

\(\hat{y} = -17.579 + 3.932x\)
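As a quick worked example, the fitted line can be used to estimate the stopping distance at a given speed. The sketch below plugs a hypothetical speed of 15 into the fitted coefficients. The names new_speed and predicted_dist are assumptions made for illustration. It uses coef() rather than predict() because a model fitted with the cars$dist ~ cars$speed form does not work cleanly with predict(newdata = ...); that would need the model refitted as lm(dist ~ speed, data = cars).

```r
# Illustrative sketch: point prediction from the fitted straight line.
cars_lm <- lm(cars$dist ~ cars$speed)
b <- unname(coef(cars_lm))      # b[1] = intercept, b[2] = slope

new_speed <- 15                 # hypothetical input speed
predicted_dist <- b[1] + b[2] * new_speed
predicted_dist
```

Note that the negative intercept implies a negative predicted distance at very low speeds, so the line should only be trusted within the observed range of speeds.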

plot(cars$speed, cars$dist, xlab='Speed', ylab='Distance', main='Cars Linear Regression Model')
abline(cars_lm)

summary(cars_lm)
## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
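In the summary above, the p-value for the slope (1.49e-12) indicates that speed is a significant predictor, and the multiple R-squared of 0.6511 means that speed explains about 65% of the variation in stopping distance. The same quality measures can also be extracted from the summary object rather than read off the printout. The sketch below is one way to do that. The name cars_sum is introduced here for illustration.

```r
# Illustrative sketch: pull model-quality measures out of the summary object.
cars_lm  <- lm(cars$dist ~ cars$speed)
cars_sum <- summary(cars_lm)

cars_sum$r.squared                    # multiple R-squared
cars_sum$adj.r.squared                # adjusted R-squared
cars_sum$sigma                        # residual standard error
cars_sum$coefficients[, "Pr(>|t|)"]   # p-values for intercept and slope
confint(cars_lm, level = 0.95)        # 95% confidence intervals for the coefficients
```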

Residual Analysis

plot(fitted(cars_lm),resid(cars_lm), main="Residuals")
abline(0, 0)

qqnorm(resid(cars_lm))
qqline(resid(cars_lm))

par(mfrow=c(2,2))
plot(cars_lm)

From the residuals vs. fitted values plot, we can see that there is no definite pattern in the residuals, so the assumptions of random residuals and constant variance (homoscedasticity) appear to be satisfied.
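The visual check can be complemented with a numerical one. The sketch below is a minimal, base-R version of a studentized (Koenker-style) Breusch-Pagan check: it regresses the squared residuals on speed and compares n times the auxiliary R-squared with a chi-squared distribution on 1 degree of freedom. This is an assumption added for illustration, not part of the original analysis, and the names aux, bp_stat and bp_pval are hypothetical.

```r
# Illustrative sketch: Koenker-style Breusch-Pagan check using base R only.
cars_lm <- lm(dist ~ speed, data = cars)

# Auxiliary regression: do the squared residuals depend on the predictor?
aux <- lm(resid(cars_lm)^2 ~ speed, data = cars)

bp_stat <- nrow(cars) * summary(aux)$r.squared          # test statistic
bp_pval <- pchisq(bp_stat, df = 1, lower.tail = FALSE)  # upper-tail p-value

c(statistic = bp_stat, p_value = bp_pval)
```

A large p-value is consistent with constant variance, while a small one would suggest that the spread of the residuals changes with speed.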

From the normal Q-Q plot, we can see that the residuals are approximately normally distributed.

From the overall analysis, we can say that the model fits the data well, since the assumptions of the linear regression model appear to be satisfied.