# Load the built-in cars dataset (data() loads it into the workspace by name; its return value is just the name)
data(cars)
# Create scatterplot
plot(cars$speed, cars$dist, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Fit linear regression model
model <- lm(dist ~ speed, data = cars)

model
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Coefficients:
## (Intercept)        speed  
##     -17.579        3.932

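The y-intercept is \(a_0 = -17.58\) and the slope is \(a_1 = 3.93\). A simple linear regression has the form \[\hat{y} = a_0 + a_1 \cdot x_1\] so the fitted model is \[\widehat{\text{distance}} = -17.58 + 3.93 \cdot \text{speed}\] In other words, each additional mph of speed adds about 3.93 ft to the predicted stopping distance.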
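As a check on these numbers, the slope and intercept can be recomputed from the closed-form least-squares formulas \(a_1 = \operatorname{cov}(x, y) / \operatorname{var}(x)\) and \(a_0 = \bar{y} - a_1 \bar{x}\). This is a minimal base-R sketch; the 20 mph query point is an arbitrary illustrative value, not part of the original analysis.

```r
# Closed-form least-squares estimates for simple linear regression
x <- cars$speed
y <- cars$dist
a1 <- cov(x, y) / var(x)        # slope
a0 <- mean(y) - a1 * mean(x)    # intercept
c(a0 = a0, a1 = a1)

# Compare with the coefficients reported by lm()
model <- lm(dist ~ speed, data = cars)
coef(model)

# Predicted stopping distance at 20 mph (illustrative value), two ways
a0 + a1 * 20
predict(model, newdata = data.frame(speed = 20))
```

Both routes give the same line; `lm()` simply solves the same least-squares problem in matrix form.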

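# Plot the data and overlay the fitted regression line
# (base graphics draw as a side effect, so abline() goes on its own line rather than being joined to plot() with +)
plot(dist ~ speed, data = cars)
abline(model)

# abline() draws a straight line on the active plot, using the intercept and slope of the model passed as its argument.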
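To make the last comment concrete, the same line can be drawn by passing the intercept and slope to `abline()` explicitly. A small sketch using only base R; the blue colour is just to distinguish the explicit version.

```r
model <- lm(dist ~ speed, data = cars)
b <- coef(model)   # named vector: "(Intercept)" and "speed"

plot(dist ~ speed, data = cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
# Equivalent to abline(model): intercept a and slope b taken from the fitted model
abline(a = b[["(Intercept)"]], b = b[["speed"]], col = "blue")
```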
# Check summary of model
summary(model)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
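Several numbers in this summary can be reproduced by hand, which helps in reading it: each `t value` is the estimate divided by its standard error, R-squared is \(1 - \text{SSE}/\text{SST}\), and the residual standard error is \(\sqrt{\text{SSE}/(n - 2)}\) with \(n - 2 = 48\) degrees of freedom. A minimal base-R sketch of those identities:

```r
model <- lm(dist ~ speed, data = cars)
s <- summary(model)

# t value = Estimate / Std. Error
coefs <- coef(s)
coefs[, "Estimate"] / coefs[, "Std. Error"]

# R-squared = 1 - SSE / SST
sse <- sum(resid(model)^2)
sst <- sum((cars$dist - mean(cars$dist))^2)
r2 <- 1 - sse / sst
c(manual = r2, reported = s$r.squared)

# Residual standard error = sqrt(SSE / residual degrees of freedom)
c(manual = sqrt(sse / df.residual(model)), reported = s$sigma)
```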
plot(fitted(model), resid(model), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

The residuals are not evenly scattered above and below zero; their spread grows as the fitted values increase. Together with the R-squared of 0.65, this suggests that speed alone, as the sole predictor, does not fully explain stopping distance. That does not make the data useless, but a better model would need smaller, more evenly spread residuals to give more precise predictions.
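A rough numeric companion to this visual check is to split the observations at the median fitted value and compare the residual standard deviation in each half; if the spread grows with the fitted values, `sd_high` should exceed `sd_low`. The median split is an arbitrary illustrative choice, not a formal heteroscedasticity test.

```r
model <- lm(dist ~ speed, data = cars)
fit <- fitted(model)
res <- resid(model)

# Residuals for the lower and upper halves of the fitted values
low  <- res[fit <= median(fit)]
high <- res[fit >  median(fit)]

c(sd_low = sd(low), sd_high = sd(high))
```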

qqnorm(resid(model))
qqline(resid(model))

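In the normal Q-Q plot, the points at the upper end drift slightly above the reference line. This is consistent with a longer right tail in the residuals (the largest residual is 43.2 ft, while the smallest is -29.1 ft), so the residuals are close to, but not exactly, normal.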
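The Q-Q plot can be complemented with two numbers: the Shapiro-Wilk test from base R's `stats` package (a small p-value is evidence against normality) and the moment-based sample skewness, where a positive value indicates a longer right tail. A sketch only; with 50 observations the test has limited power and is best read alongside the plot.

```r
model <- lm(dist ~ speed, data = cars)
r <- resid(model)

# Shapiro-Wilk test of the null hypothesis that the residuals are normal
sw <- shapiro.test(r)
sw

# Moment-based sample skewness (g1); positive means a longer right tail
skew <- mean((r - mean(r))^3) / mean((r - mean(r))^2)^1.5
skew
```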

par(mfrow=c(2,2))
plot(model)

Residuals vs Fitted:

The residuals spread further from 0 as the fitted values increase. This does not mean the cars dataset cannot be used for a regression model; I would say it is fairly usable, because most of the residuals stay reasonably close to zero.

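Scale-Location: another way to visualize residuals against fitted values. The difference is that the y-axis shows the square root of the absolute standardized residuals (each residual divided by its estimated standard error), so an upward trend indicates that the residual spread grows with the fitted values.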
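The Scale-Location panel can be rebuilt by hand to see exactly what is plotted. This sketch uses `rstandard()` for the standardized residuals and adds a `lowess()` smoother, similar in spirit to the trend line `plot(model)` draws; the colour and labels are arbitrary choices.

```r
model <- lm(dist ~ speed, data = cars)

# Standardized residuals: e_i / (sigma * sqrt(1 - h_ii))
z <- rstandard(model)
y_sl <- sqrt(abs(z))

plot(fitted(model), y_sl,
     xlab = "Fitted values", ylab = "sqrt(|standardized residuals|)")
# A rising smoother suggests the residual spread grows with the fitted values
lines(lowess(fitted(model), y_sl), col = "red")
```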

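Residuals vs Leverage: this plot can be used to identify possible outliers and influential observations. The term “possible outliers” means that one or more observations appear to differ markedly from the rest. Leverage measures how unusual an observation's speed value is, and a point with both high leverage and a large residual can pull the fitted line toward itself; R also draws dashed Cook's distance contours on this panel when they fall within the plotting range. However, it is important to note that the presence of outliers does not necessarily mean that the data are incorrect or that the analysis is invalid. I understand the idea in principle but would like to understand it visually.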
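One way to see this visually is to compute leverage (`hatvalues()`) and Cook's distance (`cooks.distance()`) directly and plot them, scaling each point by its Cook's distance. The labelling threshold \(4/n\) is only a common rule of thumb, assumed here for illustration, and the size scaling factor is arbitrary.

```r
model <- lm(dist ~ speed, data = cars)
h <- hatvalues(model)        # leverage of each observation
d <- cooks.distance(model)   # influence of each observation on the fit
z <- rstandard(model)        # standardized residuals

# Observations with the largest Cook's distance
head(sort(d, decreasing = TRUE), 3)

# Leverage vs standardized residuals; bigger points have more influence
plot(h, z, cex = 1 + 10 * d, xlab = "Leverage", ylab = "Standardized residuals")
abline(h = 0, lty = 2)
# Label points above the (assumed) 4/n rule-of-thumb cutoff
n <- nrow(cars)
text(h, z, labels = ifelse(d > 4 / n, seq_len(n), ""), pos = 4)
```

In simple regression leverage depends only on the predictor, so the points farthest to the right are the slowest and fastest cars; the large, labelled points are the ones worth a closer look.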