First, we load R's built-in ‘cars’ dataset, which records speed (mph) and stopping distance (ft) for 50 cars.
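The loading step is not shown in the code here, so the following is only a minimal sketch of what it might look like. It assumes the dataset is the `cars` data frame from base R's `datasets` package, which is normally attached by default.

```r
# Load the built-in cars data frame (from the datasets package).
data(cars)

# Inspect the structure: two numeric columns, speed and dist.
str(cars)

# Show the first few rows.
head(cars)
```

Calling `data(cars)` is optional in most sessions, since `cars` is available as soon as R starts, but it makes the dependency explicit.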
Then, we take a look at each variable with a histogram.
hist(cars$dist)
hist(cars$speed)
Below, we build our model and look at its summary.
linear.model <- lm(cars$dist~cars$speed)
summary(linear.model)
##
## Call:
## lm(formula = cars$dist ~ cars$speed)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
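The numbers in the summary can also be pulled out programmatically rather than read off the printout. The sketch below refits the same model so that it runs on its own. The names `model.summary`, `r.squared`, `coef.table` and `conf.int` are illustrative and not part of the original analysis.

```r
# Refit the same model as in the text so this block is self-contained.
linear.model <- lm(cars$dist ~ cars$speed)
model.summary <- summary(linear.model)

# Proportion of variance in stopping distance explained by speed.
r.squared <- model.summary$r.squared

# Coefficient table: estimates, standard errors, t values, p-values.
coef.table <- coef(model.summary)

# 95% confidence intervals for the intercept and the slope.
conf.int <- confint(linear.model, level = 0.95)

round(r.squared, 4)
coef.table
conf.int
```

The confidence interval for the slope gives a range of plausible values for the extra stopping distance per additional mph, which is often easier to interpret than the p-value alone.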
plot(x = cars$speed, y = cars$dist)
intercept <- coef(linear.model)[1]
slope <- coef(linear.model)[2]
slope
## cars$speed
## 3.932409
intercept
## (Intercept)
## -17.57909
These coefficients give the simple linear model fitted to our data. In particular: \[ \widehat{\text{stopping distance}} = -17.6 + 3.93 \times \text{speed} \] where speed is in mph and stopping distance is in feet.
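To see the fitted line against the data and use it for a prediction, something like the following could be used. This is a sketch under one assumption: the model is refitted with the `data = cars` form (here called `model.by.name`, a hypothetical name) because `predict()` needs to match column names in new data, which does not work when the formula is written as `cars$dist ~ cars$speed`. The speed of 15 mph is an arbitrary example value.

```r
# Assumption: refit using column names so predict() can accept new data.
model.by.name <- lm(dist ~ speed, data = cars)

# Scatterplot of the raw data with the fitted regression line on top.
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(model.by.name)

# Predicted stopping distance at a hypothetical speed of 15 mph.
new.speed <- data.frame(speed = 15)
predicted.dist <- predict(model.by.name, newdata = new.speed)
predicted.dist
```

Plugging 15 mph into the equation above by hand, -17.6 + 3.93 × 15, gives roughly 41.4 ft, which should agree with `predicted.dist` up to rounding of the coefficients.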
Next, we conduct a residual analysis, starting with a histogram of the residuals and a plot of the residuals against the fitted values.
residual <- residuals(linear.model)
residual <- as.data.frame(residual)
hist(residual$residual)
plot(fitted(linear.model), resid(linear.model))
Next, we draw a normal Q-Q plot of the residuals.
qqnorm(resid(linear.model))
qqline(resid(linear.model))
Then we use a Shapiro-Wilk normality test to see whether the residuals come from a normally distributed population.
shapiro.test(linear.model$residuals)
##
## Shapiro-Wilk normality test
##
## data: linear.model$residuals
## W = 0.94509, p-value = 0.02152
Because the p-value is 0.0215 (about 2.2%), which is below the 0.05 significance level, we reject the null hypothesis that the residuals came from a normally distributed population.
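The decision rule can be written out explicitly. The sketch below reruns the test and compares the p-value with a significance level. The 0.05 threshold is an assumption (a conventional choice, not one stated in the analysis), and the names `sw`, `alpha` and `reject.normality` are illustrative.

```r
# Refit the model and rerun the Shapiro-Wilk test on its residuals.
linear.model <- lm(cars$dist ~ cars$speed)
sw <- shapiro.test(residuals(linear.model))

# Assumed significance level; 0.05 is a common convention.
alpha <- 0.05

# The p-value as a proportion (not a percentage).
sw$p.value

# TRUE means we reject the null hypothesis of normality at level alpha.
reject.normality <- sw$p.value < alpha
reject.normality
```

Note that `sw$p.value` is a proportion, so a value near 0.02 corresponds to about 2%, not 0.02%.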