Exploring the cars data set.
data(cars)
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
plot(cars$speed, cars$dist, main = "Stopping Distance vs Speed", xlab = "Speed (mph)", ylab = "Stopping Distance (feet)", pch = 19)
Linear model.
model <- lm(dist ~ speed, data = cars)
summary(model)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.069 -9.525 -2.272 9.215 43.201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791 6.7584 -2.601 0.0123 *
## speed 3.9324 0.4155 9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
Model summary: The estimated coefficient for speed is 3.9324, so each additional 1 mph of speed is associated with an expected increase of about 3.93 feet in stopping distance. The p-value for speed (1.49e-12) is far below the 0.05 threshold, so the relationship between speed and stopping distance is statistically significant. The intercept (-17.58, p = 0.0123) is also significantly different from zero, but a negative stopping distance at 0 mph has no physical meaning; it only anchors the line within the observed speed range of 4 to 25 mph. The R-squared value of 0.6511 means that about 65% of the variability in stopping distance is explained by speed.
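The numbers quoted above can also be pulled out of the fitted object instead of being read off the printed summary. The following is a minimal sketch using base R only; the 20 mph prediction point and the variable names (coef_table, ci, r2, pred_20) are illustrative choices, not part of the original analysis.

```r
# Refit the model from the text so this block runs on its own.
model <- lm(dist ~ speed, data = cars)

# Coefficient table as a numeric matrix (estimate, std. error, t value, p-value).
coef_table <- coef(summary(model))
slope <- coef_table["speed", "Estimate"]
slope_p <- coef_table["speed", "Pr(>|t|)"]

# 95% confidence intervals for the intercept and the slope.
ci <- confint(model, level = 0.95)

# Share of the variance in dist that is explained by speed.
r2 <- summary(model)$r.squared

# Predicted stopping distance at an illustrative speed of 20 mph.
pred_20 <- predict(model, newdata = data.frame(speed = 20))

print(round(coef_table, 4))
print(round(ci, 2))
cat("R-squared:", round(r2, 4), "\n")
cat("Predicted dist at 20 mph:", round(pred_20, 2), "feet\n")
```

The confidence interval for the slope gives a range of plausible values rather than a single point estimate, which is often more informative than the p-value alone. With the coefficients reported above, the prediction at 20 mph works out to roughly 61 feet.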
Visualization.
plot(cars$speed, cars$dist, main = "Stopping Distance vs Speed",
     xlab = "Speed (mph)", ylab = "Stopping Distance (feet)",
     pch = 19, col = "blue")
abline(model, col = "red")
Residual Analysis.
# Plotting Residuals vs Fitted Values
plot(fitted(model), resid(model),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted", pch = 19)
abline(h = 0, col = "red")
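The residuals vs fitted plot shows how the residuals are distributed across the fitted values. The points scatter around the zero line without a clear curve, but the spread does not look perfectly constant, which hints at slight heteroscedasticity.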
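A visual impression of the residual spread can be backed up with a numerical check. The sketch below computes a studentized (Koenker) Breusch-Pagan statistic by hand in base R: the squared residuals are regressed on speed, and n times the R-squared of that auxiliary fit is compared with a chi-squared distribution. This test is an addition for illustration, not something the original analysis ran, and the names aux, bp_stat and bp_p are hypothetical.

```r
# Refit the model from the text so this block runs on its own.
model <- lm(dist ~ speed, data = cars)

# Auxiliary regression: squared residuals on the same predictor.
sq_res <- resid(model)^2
aux <- lm(sq_res ~ speed, data = cars)

# Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary fit,
# referred to a chi-squared distribution with 1 df (one predictor).
n <- nobs(model)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

cat("BP statistic:", round(bp_stat, 3), " p-value:", round(bp_p, 4), "\n")
```

A small p-value would indicate that the residual variance changes with speed; a large one means the data give no strong evidence against constant variance.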
# Normal Q-Q Plot
qqnorm(resid(model))
qqline(resid(model), col = "red")
The normal Q-Q plot helps assess whether the residuals are normally distributed, which is an assumption of linear regression. The points lie roughly along the line, but with some deviations, especially at the ends. This suggests that the residuals are approximately normally distributed, but there may be some outliers or influential points at the right end.
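To follow up on the tail deviations, one option is to test normality formally and to list the observations that stand out. The sketch below uses shapiro.test and cooks.distance from base R; the 4/n cutoff for Cook's distance is a common rule of thumb rather than a strict threshold, and the names sw, largest, outlier_table and flagged are illustrative.

```r
# Refit the model from the text so this block runs on its own.
model <- lm(dist ~ speed, data = cars)
res <- resid(model)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
sw <- shapiro.test(res)

# The three observations with the largest absolute residuals.
largest <- head(order(abs(res), decreasing = TRUE), 3)
outlier_table <- cbind(cars[largest, ], residual = round(res[largest], 2))

# Cook's distance flags points that pull the fitted line noticeably.
# The 4 / n cutoff is a rule of thumb (an assumption, not a fixed standard).
cd <- cooks.distance(model)
flagged <- which(cd > 4 / nobs(model))

print(sw)
print(outlier_table)
print(round(cd[flagged], 3))
```

The largest residual belongs to the car travelling at 24 mph that needed 120 feet to stop, which matches the maximum residual of 43.201 in the model summary.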
Overall impression of the model: The model fits the data reasonably well, but some assumptions of linear regression may not be fully met. The residuals vs fitted plot suggests slight heteroscedasticity, and the Q-Q plot shows slight deviations from normality that could be due to outliers. Even so, the model is statistically significant and, judging by the R-squared of 0.6511, has a moderately good fit.