This analysis builds a simple linear regression model that predicts stopping distance from speed, using the “cars” dataset available in R. The methodology follows the structured approach outlined in Chapter 3 of the textbook: exploratory data analysis, model building, evaluation of model quality, and residual analysis.
The “cars” dataset, which is built into R, is used for this analysis. It contains 50 observations of two variables: speed (in miles per hour) and dist, the stopping distance (in feet).
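As a minimal sketch of a first look at the data (base R only; the variable name `n_obs` is introduced here purely for illustration), the structure and summary statistics can be inspected as follows:

```r
# `cars` ships with base R (package `datasets`), so nothing needs to be installed
data(cars)

str(cars)      # column names and types
summary(cars)  # min, quartiles, median, mean, and max for each column
head(cars)     # first six rows

# Number of observations (illustrative helper variable)
n_obs <- nrow(cars)
n_obs
```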
Histograms suggest that neither speed nor stopping distance is clearly
normally distributed. The scatterplot, however, shows an approximately
linear, positive relationship between the two variables.
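The exploratory plots described above could be produced along these lines. This is a sketch in base R graphics, not necessarily the code used for the original figures; the axis labels and the correlation summary `r` are additions for illustration.

```r
data(cars)

# Side-by-side histograms of the two variables
op <- par(mfrow = c(1, 2))
hist(cars$speed, main = "Speed", xlab = "Speed (mph)")
hist(cars$dist, main = "Stopping distance", xlab = "Distance (ft)")
par(op)  # restore the previous plotting layout

# Scatterplot of stopping distance against speed
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Pearson correlation as a numeric summary of the linear association
r <- cor(cars$speed, cars$dist)
r
```

For a simple regression with one predictor, the square of this correlation equals the model's multiple R-squared.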
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
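The summary above corresponds to the call shown in its `Call:` line. A minimal sketch that reproduces it is given below; the object names `fit`, `coefs`, and `r2` are chosen here for illustration.

```r
data(cars)

# Fit the simple linear regression of stopping distance on speed
fit <- lm(dist ~ speed, data = cars)

# Print coefficient estimates, standard errors, R-squared, and the F-test
summary(fit)

# Pull out individual quantities for later use
coefs <- coef(fit)               # named vector: (Intercept), speed
r2 <- summary(fit)$r.squared     # multiple R-squared
```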
The linear regression model indicates that speed is a statistically significant predictor of stopping distance \((p = 1.49 \times 10^{-12})\).
With a single predictor, the overall F-test is equivalent to the t-test
on the slope, so the model as a whole is significant at the same level.
Speed explains about 65% of the variance in stopping distance
\((R^2 = 0.6511)\). The coefficient for speed suggests that each 1 mph
increase in speed is associated with an expected increase of about 3.93
feet in stopping distance.
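To make the slope interpretation concrete, the fitted model can be used to predict stopping distance at a few speeds. The sketch below uses base R; the speeds 10, 15, and 20 mph are arbitrary illustrative values inside the observed range, and the names `new_speeds`, `pred`, and `slope_ci` are introduced only for this example.

```r
data(cars)
fit <- lm(dist ~ speed, data = cars)

# Illustrative speeds (mph) at which to predict stopping distance
new_speeds <- data.frame(speed = c(10, 15, 20))

# Predicted mean stopping distance with 95% confidence bounds;
# the result is a matrix with columns fit, lwr, upr
pred <- predict(fit, newdata = new_speeds, interval = "confidence")
pred

# 95% confidence interval for the slope (feet per additional mph)
slope_ci <- confint(fit, "speed", level = 0.95)
slope_ci
```

Consecutive predictions differ by five times the slope, because the chosen speeds are 5 mph apart. The negative intercept means the fitted line should not be read literally near 0 mph, which lies outside the observed speeds.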
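library(ggplot2)   # ggplot(), aes(), geom_point(), geom_smooth()
library(magrittr)  # provides the %>% pipe
cars %>%
  ggplot(aes(speed, dist)) +
  geom_point() +
  geom_smooth(formula = y ~ x, method = 'lm', se = TRUE)  # least-squares line with confidence band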
The scatterplot with the regression line overlaid indicates a reasonable
fit of predicted values to actual values.
The residuals-versus-fitted plot shows no clear evidence of
heteroscedasticity, and the residuals appear randomly scattered around
zero. The Q-Q plot indicates that the residuals are approximately
normally distributed, with some outliers in the right tail. The leverage
plot shows that no single observation exerts undue influence on the
parameter estimates.
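The three diagnostic plots discussed above can be drawn with base R's plotting method for `lm` objects. This is a sketch rather than the original plotting code. The Cook's distance check at the end is an added numeric companion to the leverage plot, and the threshold of 1 is a common rule of thumb, not a criterion stated in this analysis.

```r
data(cars)
fit <- lm(dist ~ speed, data = cars)

# Diagnostic plots: 1 = residuals vs fitted, 2 = normal Q-Q,
# 5 = residuals vs leverage (with Cook's distance contours)
op <- par(mfrow = c(1, 3))
plot(fit, which = 1)
plot(fit, which = 2)
plot(fit, which = 5)
par(op)  # restore the previous plotting layout

# Numeric check on influence: Cook's distance for each observation
cooks <- cooks.distance(fit)
max_cook <- max(cooks)                # largest influence value
most_influential <- which.max(cooks)  # index of that observation
max_cook
most_influential
```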
The linear regression model provides a satisfactory fit for predicting stopping distance from speed. Although the variables themselves do not appear normally distributed, linear regression places its normality assumption on the residuals rather than on the variables, and the residual diagnostics indicate that the key assumptions are reasonably met. Overall, the analysis supports speed as an effective predictor of stopping distance and underscores the importance of proper model evaluation and residual analysis in regression modeling.