N Selina Assignment 11 - Data 605

Noori Selina

Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis.)

First, we load the built-in dataset and look at its first few rows.

head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

Next, we plot the data. The plot shows that stopping distance tends to increase as speed increases.

plot(cars, xlab = "Speed", ylab = "Stopping distance")
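As a numeric companion to the scatter plot, the sample correlation between the two columns can be computed. This is a minimal sketch that is not part of the original analysis; the variable name r is chosen here only for illustration.

```r
# Pearson correlation between speed and stopping distance in the built-in cars data
r <- cor(cars$speed, cars$dist)
print(round(r, 4))

# For a single-predictor linear model, r^2 equals the multiple R-squared
print(round(r^2, 4))
```

A positive value close to 1 would agree with the upward trend visible in the plot.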

Now, let’s construct a linear model based on a single-factor regression. In this model, the independent variable (input) is speed, while the dependent variable (output) is stopping distance.

Using the coefficients from the summary output below, the fitted model equation takes the form: stopping distance = -17.5791 + 3.9324 * speed.
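To make the equation concrete, here is a small worked example that plugs a speed into it. It uses the rounded coefficients quoted above; the helper predict_dist is hypothetical and introduced only for this illustration.

```r
# Coefficients as quoted in the model equation (rounded values)
intercept <- -17.5791
slope <- 3.9324

# Hypothetical helper, introduced only for this illustration
predict_dist <- function(speed) intercept + slope * speed

# Predicted stopping distance at a speed of 20
print(predict_dist(20))  # 61.0689
```

Note that at a speed of 0 the same equation gives -17.5791, which is not physically meaningful, so the intercept is best read as a fitting constant rather than a real stopping distance.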

We use the lm() function in R to fit the regression, and then summarize the results to examine the relationship between speed and stopping distance.

cars.lm <- lm(dist ~ speed, data = cars)
summary(cars.lm)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

After running the linear model, we can read the following from the summary:

  • The residual quartiles are roughly symmetric about zero (1Q = -9.525, median = -2.272, 3Q = 9.215), which is consistent with approximately normal residuals, although the maximum (43.201) is larger in magnitude than the minimum (-29.069).
  • The speed coefficient (3.9324) is about 9.5 times its standard error (0.4155); this ratio is the t value of 9.464, and a ratio this large indicates that the slope is estimated precisely.
  • The extremely low p-value (1.49e-12) indicates that speed significantly affects stopping distance in the model.
  • With a p-value of 0.0123, the intercept is also statistically significant at the 0.05 level.
  • The multiple R-squared value of 0.6511 means that the model accounts for approximately 65.11% of the variation in stopping distance.
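The figures quoted in these bullets can be recomputed directly from the fitted object. The sketch below refits the model so that it runs on its own; the names fit, coefs, est, se, rss, and tss are illustrative and not from the original analysis.

```r
# Refit the model so this block is self-contained
fit <- lm(dist ~ speed, data = cars)
coefs <- summary(fit)$coefficients

# t value for speed = estimate divided by its standard error
est <- coefs["speed", "Estimate"]
se  <- coefs["speed", "Std. Error"]
t_speed <- est / se
print(round(t_speed, 3))

# R-squared = 1 - (residual sum of squares / total sum of squares)
rss <- sum(resid(fit)^2)
tss <- sum((cars$dist - mean(cars$dist))^2)
r_squared <- 1 - rss / tss
print(round(r_squared, 4))
```

Both values should match the corresponding entries in the summary output, which is a useful check that the table is being read correctly.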

Plotting the linear model

plot(cars, xlab = "Speed", ylab = "Stopping distance")
abline(cars.lm)

Plotting the residuals

The plot shows that the residuals are scattered around zero, with points falling both above and below the zero line.

plot(fitted(cars.lm), resid(cars.lm))
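The visual impression from the residual plot can be backed up with a rough numeric check. The sketch below is one possible approach, not part of the original analysis: it confirms that the residuals average to zero and compares their spread in the lower and upper halves of the fitted values. The names fit, res, upper, and spread are illustrative.

```r
# Refit the model so this block is self-contained
fit <- lm(dist ~ speed, data = cars)
res <- resid(fit)

# With an intercept in the model, least-squares residuals average to zero
print(mean(res))

# Split observations at the median fitted value and compare residual spread
upper <- fitted(fit) > median(fitted(fit))
spread <- tapply(res, upper, sd)
print(spread)
```

If the two standard deviations differ noticeably, that would suggest the residual variance changes with the fitted value, which is worth keeping in mind when reading the plot.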

Plotting the normal QQ plot

The plot reveals some skewness to the right, and the points deviating from the straight line suggest that the residuals might not be entirely normally distributed.

qqnorm(resid(cars.lm))
qqline(resid(cars.lm))
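A formal test can complement the visual reading of the QQ plot. The sketch below applies the Shapiro-Wilk test from base R to the residuals; it is an optional addition, not part of the original analysis, and the names fit and sw are illustrative.

```r
# Refit the model so this block is self-contained
fit <- lm(dist ~ speed, data = cars)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(resid(fit))
print(sw)
```

A small p-value would count as evidence against normality, in line with the deviation seen in the QQ plot, while a large p-value would not by itself prove that the residuals are normal.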