Attach and display the built-in data set

attach(cars)
str(cars)
## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
dim(cars)
## [1] 50  2
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

The built-in cars data set contains 50 observations of two variables: speed (in mph) and stopping distance (dist, in ft).

Visualize and fit a regression line

fit <- lm(dist ~ speed, data = cars)
plot(dist ~ speed, data = cars)
abline(fit)

fit
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Coefficients:
## (Intercept)        speed  
##     -17.579        3.932

Linear Regression Model

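Based on the fitted model, the y-intercept is -17.579 and the slope is 3.932, so each additional 1 mph of speed raises the predicted stopping distance by about 3.932 ft.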

stoppingdistance = -17.579 + 3.932 * speed
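The equation above can be checked against R's own predictions. The following is a minimal sketch, assuming the same lm fit as in the text; the speeds in new_speeds are hypothetical values chosen only for illustration.

```r
# Refit the model from the text so this block runs on its own.
fit <- lm(dist ~ speed, data = cars)

# Hypothetical speeds (mph) chosen for illustration; not from the source.
new_speeds <- data.frame(speed = c(10, 15, 20))

# Predictions taken from the fitted model object.
pred <- predict(fit, newdata = new_speeds)

# The same predictions computed by hand from the coefficients.
b <- coef(fit)
manual <- b[["(Intercept)"]] + b[["speed"]] * new_speeds$speed

# Show both columns side by side; they should agree.
print(cbind(new_speeds, predicted_dist = pred, manual_dist = manual))
```

With the rounded coefficients, a speed of 15 mph gives -17.579 + 3.932 * 15, or about 41.4 ft.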

Evaluating the Quality of the Model

summary(fit)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

The p-value for speed is very small (1.49e-12), which is strong evidence against the null hypothesis that the slope is zero, so speed is a statistically significant predictor of stopping distance. The reported R-squared of 0.6511 means that the model explains about 65.11 percent of the variation in stopping distance.
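To make the summary figures less of a black box, the sketch below recomputes R-squared and the residual standard error from the residuals, and adds a 95 percent confidence interval for the slope. It assumes the same fit as above; the variable names rss, tss, r_squared, rse and ci are illustrative only.

```r
# Refit the model from the text so this block runs on its own.
fit <- lm(dist ~ speed, data = cars)

# Residual sum of squares and total sum of squares.
rss <- sum(resid(fit)^2)
tss <- sum((cars$dist - mean(cars$dist))^2)

# R-squared is the share of total variation not left in the residuals.
r_squared <- 1 - rss / tss

# Residual standard error uses the residual degrees of freedom (48 here).
rse <- sqrt(rss / df.residual(fit))

# 95 percent confidence interval for the slope on speed.
ci <- confint(fit, "speed", level = 0.95)

print(r_squared)
print(rse)
print(ci)
```

The first two values should match the Multiple R-squared and Residual standard error lines printed by summary(fit).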

Residual Analysis

Residuals vs Fitted

The residuals appear to be scattered around the horizontal zero line without a distinct pattern.

plot(fitted(fit), resid(fit))
abline(h = 0, lty = 2)  # dashed reference line at zero

Normal Q-Q

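The plot does not look too concerning, as most data points lie close to the reference line. The largest residual (43.201) is further from zero than the smallest (-29.069), which hints at a slightly heavier upper tail.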

qqnorm(resid(fit))
qqline(resid(fit))
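As an alternative to building these two plots by hand, R's plot method for lm objects can draw them directly. This is a small sketch, assuming the same fit as above; note that the built-in Q-Q panel uses standardized residuals rather than the raw residuals plotted earlier, so the axes differ slightly.

```r
# Refit the model from the text so this block runs on its own.
fit <- lm(dist ~ speed, data = cars)

# Put the two diagnostic plots side by side, remembering the old layout.
op <- par(mfrow = c(1, 2))

plot(fit, which = 1)  # Residuals vs Fitted, with a smoothed trend line
plot(fit, which = 2)  # Normal Q-Q of standardized residuals

# Restore the previous plotting layout.
par(op)
```

The built-in panels also label the most extreme observations, which makes it easier to see which points drive any departure from the line.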