# Load cars dataset
data(cars)
# Create scatterplot
plot(cars$speed, cars$dist, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
# Fit linear regression model
model <- lm(dist ~ speed, data = cars)
model
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Coefficients:
## (Intercept)        speed
##     -17.579        3.932
The y-intercept is \(a_0 = -17.579\) and the slope is \(a_1 = 3.932\). The general form of a simple linear equation is \[\hat{y} = a_0 + a_1 \cdot x_1\] so the fitted regression model is \[\widehat{\text{dist}} = -17.579 + 3.932 \cdot \text{speed}\]
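As a quick check on the fitted equation, the sketch below predicts the stopping distance at one speed in two ways: by plugging the coefficients into the formula by hand, and by calling `predict()`. The speed of 20 mph and the names `new_speed`, `by_hand`, and `by_predict` are illustrative choices, not part of the original analysis.

```r
# Refit the model so this block runs on its own
model <- lm(dist ~ speed, data = cars)

# Illustrative input: a speed of 20 mph (an assumed value for this example)
new_speed <- 20

# Prediction from the equation: a_0 + a_1 * speed
a <- coef(model)
by_hand <- unname(a["(Intercept)"] + a["speed"] * new_speed)

# Same prediction using predict()
by_predict <- unname(predict(model, newdata = data.frame(speed = new_speed)))

by_hand
by_predict
```

Both values should agree, at roughly 61 ft given the coefficients printed above.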
# Plot linear regression line
plot(dist ~ speed, data = cars)
abline(model)
# abline() draws a line on the active plot, using the intercept and slope of the linear model passed as its argument.
# Check summary of model
summary(model)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
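The numbers in the summary can also be pulled out programmatically rather than read off the printout. The following is a minimal sketch using base R accessors; the names `s`, `coef_table`, and `ci` are chosen here for illustration.

```r
# Refit the model so this block runs on its own
model <- lm(dist ~ speed, data = cars)
s <- summary(model)

# Coefficient table: estimates, standard errors, t values, p-values
coef_table <- coef(s)
coef_table

# 95% confidence intervals for the intercept and slope
ci <- confint(model, level = 0.95)
ci

# R-squared and residual standard error as plain numbers
s$r.squared
s$sigma
```

With a single predictor, the R-squared reported here equals the squared correlation between `speed` and `dist`.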
# Plot residuals against fitted values
plot(fitted(model), resid(model))
The residuals are not evenly scattered above and below zero, and their spread
grows as the fitted values increase. This suggests that speed as the sole
predictor in a straight-line model does not fully explain the variation in
stopping distance. That does not make the data useless; it means the model
needs refinement to tighten the residuals and improve prediction.
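One possible way to look for tighter residuals is to let the relationship curve. The sketch below adds a squared speed term and compares the two fits. This quadratic model is an assumption made for illustration, not something fitted in the analysis above, and other remedies (such as transforming `dist`) may work as well or better.

```r
# Refit the original model so this block runs on its own
model <- lm(dist ~ speed, data = cars)

# Hypothetical alternative: add a quadratic term in speed
model_quad <- lm(dist ~ speed + I(speed^2), data = cars)

# Compare residual standard errors (smaller means tighter residuals)
c(linear = sigma(model), quadratic = sigma(model_quad))

# F-test of whether the extra term improves the fit
anova(model, model_quad)

# Residual plot for the alternative model
plot(fitted(model_quad), resid(model_quad),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Whether the extra term is worth keeping depends on the F-test and on how the new residual plot looks, not on the residual standard error alone.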
qqnorm(resid(model))
qqline(resid(model))
In the Q-Q plot above, the points in the upper tail diverge slightly from the reference line, which suggests the residuals are not perfectly normal.
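A formal test can complement the visual reading of the Q-Q plot. As a minimal sketch, the Shapiro-Wilk test from base R is applied to the residuals below; choosing this particular test is an assumption of the example, and the name `sw` is illustrative.

```r
# Refit the model so this block runs on its own
model <- lm(dist ~ speed, data = cars)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(resid(model))
sw

# A small p-value suggests a departure from normality
sw$p.value
```

The test result is best read alongside the plot, since the plot shows where the departure occurs.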
# Show the four standard diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model)
Residuals vs Fitted: the residuals spread further from 0 as the fitted values increase. This does not mean the cars dataset is unsuitable for a regression model. I would say it is fairly usable, because most of the residuals stay reasonably close to zero.
Scale-Location: this is another way to visualize residuals against fitted values. The difference is that the residuals are standardized (each divided by its estimated standard deviation), and the plot shows the square root of their absolute values.
Residuals vs Leverage: this plot can be used to identify possible outliers and influential points. “Possible outliers” means that one or more observations in the dataset appear to differ markedly from the rest. However, the presence of outliers does not necessarily mean that the data is incorrect or that the analysis is invalid. I understand the idea, but would like to see it visually.
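To make the quantities behind these panels more concrete, the sketch below computes them directly and highlights unusual observations on the original scatterplot. The cutoffs used (leverage above `2 * p / n`, Cook's distance above `4 / n`) are common rules of thumb assumed for this example, not hard thresholds, and the variable names are illustrative.

```r
# Refit the model so this block runs on its own
model <- lm(dist ~ speed, data = cars)

# Standardized residuals: each residual divided by its estimated standard deviation
std_res <- rstandard(model)

# Quantity on the y-axis of the Scale-Location panel
scale_loc <- sqrt(abs(std_res))

# Leverage (hat values) and Cook's distance for each observation
lev <- hatvalues(model)
cooks <- cooks.distance(model)

# Rule-of-thumb cutoffs (assumptions, not hard rules)
n <- nrow(cars)
p <- length(coef(model))
flagged <- which(lev > 2 * p / n | cooks > 4 / n)

# Show the flagged rows alongside their diagnostics
cbind(cars[flagged, ], leverage = lev[flagged], cooks_d = cooks[flagged])

# Highlight the flagged observations on the original scatterplot
plot(dist ~ speed, data = cars)
points(cars$speed[flagged], cars$dist[flagged], col = "red", pch = 19)
```

Points marked in red are candidates for a closer look, not observations to delete automatically.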