I. Data Visualization
plot(cars[,"speed"],cars[,"dist"], main="Stopping Distance as a Function of Speed", xlab="Speed", ylab="Stopping Distance")

The plot shows that as car speed increases, the stopping distance also increases, as expected.
A regression model will help us quantify this relationship.
II. The Linear Model Function
#attach(cars)
cars.lm <- lm(dist ~ speed, data=cars)
cars.lm
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Coefficients:
## (Intercept)        speed
##     -17.579        3.932
The fitted line is: stopping distance = -17.579 + 3.932 × speed
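To make the fitted equation concrete, here is a minimal sketch that predicts stopping distance for a few speeds, once with predict() and once by hand from the coefficients. The speeds 10, 15 and 20 and the names new.speeds, pred and by.hand are illustrative assumptions, not part of the original analysis.

```r
# Refit the model so this snippet runs on its own (cars ships with base R).
cars.lm <- lm(dist ~ speed, data = cars)

# Hypothetical speeds chosen only for illustration.
new.speeds <- data.frame(speed = c(10, 15, 20))

# Predicted stopping distances from the fitted line.
pred <- unname(predict(cars.lm, newdata = new.speeds))

# The same values computed by hand: intercept + slope * speed.
b <- coef(cars.lm)
by.hand <- unname(b["(Intercept)"] + b["speed"] * new.speeds$speed)

print(cbind(new.speeds, pred, by.hand))
```

Both columns should agree; at a speed of 15 the equation gives roughly -17.579 + 3.932 × 15, or about 41.4.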
# The formula interface avoids attach(), which can mask other objects
plot(dist ~ speed, data=cars)
abline(cars.lm)

III. Evaluating the Quality of the Model
summary(cars.lm)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
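The median residual is close to zero (-2.272). The minimum and maximum residuals (-29.069 and 43.201) are moderate relative to the observed range of stopping distances (2 to 120), although the maximum is larger in magnitude than the minimum. The 1st and 3rd quartiles are almost identical in magnitude (-9.525 and 9.215), so the middle half of the residuals is roughly symmetric about zero, which is consistent with (though not proof of) normally distributed residuals.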
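The residual summary can be reproduced directly, which makes the symmetry argument easier to check. The sketch below is only one way to do it; the names q, quartile.ratio and tail.ratio are introduced here for illustration.

```r
# Refit the model so this snippet runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)
res <- resid(cars.lm)

# Five-number summary; should match the Residuals block of summary(cars.lm).
q <- quantile(res)
print(round(q, 3))

# Rough symmetry checks: a ratio near 1 means the two sides are balanced.
quartile.ratio <- unname(abs(q["25%"]) / q["75%"])  # inner half of the residuals
tail.ratio     <- unname(abs(q["0%"]) / q["100%"])  # extreme residuals
print(c(quartile.ratio = quartile.ratio, tail.ratio = tail.ratio))
```

A quartile ratio close to 1 supports symmetry in the middle of the distribution, while a tail ratio well below 1 points to a longer positive tail.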
The summary also shows the standard error (SE) for the estimated intercept and slope: 6.7584 and 0.4155 respectively. The SE is somewhat high for the intercept (as a rule of thumb we want it to be at least 5 times smaller than the estimate, and here it is only about 2.6 times smaller) but good for the slope (about 9.5 times smaller than the estimate). Both \(p\) values are low: 0.0123 for the intercept, which is good at the “*” level, and 1.49e-12 for the slope, which is excellent at the “***” level. This means that the independent variable (speed) is relevant to this model.
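The estimate-to-SE comparison above is the same quantity R reports as the t value. As a minimal sketch (the names coefs and ratio are assumptions introduced here), the ratios can be pulled from the coefficient table, and confint() gives the matching 95% confidence intervals:

```r
# Refit the model so this snippet runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)

# Coefficient table as a numeric matrix.
coefs <- coef(summary(cars.lm))

# How many times larger each estimate is than its standard error.
ratio <- abs(coefs[, "Estimate"]) / coefs[, "Std. Error"]
print(round(ratio, 3))

# 95% confidence intervals for the intercept and slope.
print(confint(cars.lm, level = 0.95))
```

The ratios equal the absolute t values in the summary, so the 5-times rule of thumb amounts to asking for |t| of at least 5.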
Finally, a Multiple R-squared of 0.6511 and an Adjusted R-squared of 0.6438 tell us that the model explains about 65% of the variation in stopping distance. This means that our model is a good fit, but not an excellent fit, for the data provided.
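To show where these two numbers come from, the sketch below recomputes them from the residual and total sums of squares. The variable names (ss.res, ss.tot, r2, adj.r2) are illustrative; p = 1 because the model has a single predictor.

```r
# Refit the model so this snippet runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)

# Residual and total sums of squares.
ss.res <- sum(resid(cars.lm)^2)
ss.tot <- sum((cars$dist - mean(cars$dist))^2)

# R-squared: share of the variation in dist explained by the model.
r2 <- 1 - ss.res / ss.tot

# Adjusted R-squared penalizes for the number of predictors (p).
n <- nrow(cars)
p <- 1
adj.r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(c(r.squared = r2, adj.r.squared = adj.r2), 4))
```

With only one predictor, R-squared is also the squared correlation between speed and dist, that is, cor(cars$speed, cars$dist)^2.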
IV. Residual Analysis
plot(fitted(cars.lm), resid(cars.lm))

Plotting the residuals against the fitted values for the speed vs. stopping distance model, we see that much of the data is clustered near the zero line, so I would say that it is a good model. The model does, however, have a tendency to underestimate some of the actual values by a wide margin: there are a few positive outliers (residuals of about +40) as speed increases. This suggests that the model predicts stopping distance better at lower speeds than at higher speeds.
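One way to make the outliers easier to spot is to add a zero reference line and list the observations whose residual exceeds twice the residual standard error. The 2-sigma cutoff is a common rule of thumb assumed here, not something the original analysis specifies, and the names sigma.hat and flagged are illustrative.

```r
# Refit the model so this snippet runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)
res <- resid(cars.lm)

# Residuals vs fitted values, with a dashed reference line at zero.
plot(fitted(cars.lm), res,
     xlab = "Fitted stopping distance", ylab = "Residual")
abline(h = 0, lty = 2)

# Flag observations more than 2 residual standard errors from zero
# (assumed cutoff, for illustration only).
sigma.hat <- summary(cars.lm)$sigma
is.large <- abs(res) > 2 * sigma.hat
flagged <- cbind(cars[is.large, ], residual = res[is.large])
print(flagged)
```

The speed column of the flagged rows shows directly whether the large misses are concentrated at higher speeds.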
qqnorm(resid(cars.lm))
qqline(resid(cars.lm))

Another use of residuals is to generate a quantile-quantile (Q-Q) plot of the sample residual quantiles against theoretical quantiles (the values expected if the residuals followed a normal distribution). As we can see from the graph, for the lower quantiles and up to about the first positive theoretical quantile, the points line up closely with the theoretical qqline. This indicates that the residuals are approximately normally distributed over that range. We can see a divergence, though, towards the higher positive quantiles.
As with the residual analysis, the Q-Q plot analysis also shows that the model is an excellent representation of the observed data except for larger values of the observed data.
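The visual reading of the Q-Q plot can be complemented with a numeric check. The sketch below is an addition, not part of the original analysis: it computes the correlation between sample and theoretical quantiles and runs a Shapiro-Wilk normality test on the residuals. The names qq, qq.cor and sw are assumptions.

```r
# Refit the model so this snippet runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)
res <- resid(cars.lm)

# Q-Q coordinates without drawing: x = theoretical, y = sample quantiles.
qq <- qqnorm(res, plot.it = FALSE)

# Correlation near 1 means the points lie close to a straight line.
qq.cor <- cor(qq$x, qq$y)
print(qq.cor)

# Shapiro-Wilk test: a small p value is evidence against normality.
sw <- shapiro.test(res)
print(sw)
```

A high Q-Q correlation combined with a small Shapiro-Wilk p value would match what the plot suggests: mostly normal residuals with a heavier upper tail.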
Thus, for speeds below about 20 mph (roughly the 75th percentile of speed), the model is an excellent predictor of stopping distance.
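A simple way to test this conclusion is to compare the size of the prediction errors below and above the cutoff. The sketch below uses root-mean-square error (RMSE) per group; the choice of RMSE and the names group, rmse and counts are assumptions made for illustration.

```r
# Refit the model so this snippet runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)
res <- resid(cars.lm)

# Split the observations at the cutoff used in the text.
group <- ifelse(cars$speed < 20, "speed < 20", "speed >= 20")

# Root-mean-square residual and number of observations in each group.
rmse <- tapply(res, group, function(r) sqrt(mean(r^2)))
counts <- table(group)

print(data.frame(n = as.vector(counts),
                 rmse = round(as.vector(rmse), 2),
                 row.names = names(rmse)))
```

A clearly smaller RMSE in the lower-speed group would support the conclusion; similar values would suggest that the large misses are not confined to high speeds.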