Exploration: Stopping distance as a function of speed.
As speed increases, so does stopping distance. But by how much?
First we’ll do some data investigation.
attach(cars)
plot(speed, dist, col='blue')
abline(lm(dist~speed, data=cars), col='red', lwd=3)
Distributions
Speed looks roughly normal; dist is right-skewed and looks more like a gamma.
par(mfrow=c(1,2))
hist(speed, freq=F, col='lightblue', breaks=10)
lines(density(speed), col='red', lwd=3)
hist(dist, freq=F, col='lightblue', breaks=10)
lines(density(dist), col='red', lwd=3)
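One rough way to check the "normal" and "gamma" impressions is sketched below. It is not part of the original analysis: the Shapiro-Wilk test and the method-of-moments gamma fit are assumptions chosen for illustration, and the names sw_speed, shape_hat and rate_hat are made up here.

```r
# Sketch only: quick checks of the distribution shapes seen in the histograms.
data(cars)

# Shapiro-Wilk test of normality for speed (a large p-value means no
# evidence against normality; it does not prove normality)
sw_speed <- shapiro.test(cars$speed)
print(sw_speed)

# Method-of-moments gamma fit for dist:
# mean = shape / rate and variance = shape / rate^2
m <- mean(cars$dist)
v <- var(cars$dist)
shape_hat <- m^2 / v
rate_hat  <- m / v

# Overlay the fitted gamma density on the histogram of dist
hist(cars$dist, freq = FALSE, col = 'lightblue', breaks = 10)
curve(dgamma(x, shape = shape_hat, rate = rate_hat),
      add = TRUE, col = 'red', lwd = 3)
```

If the red curve tracks the histogram reasonably well, the gamma description is at least plausible; a maximum-likelihood fit would be the more careful choice.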
What is observation 49? Appears to be an outlier.
library(car)  # Boxplot() comes from the car package and labels outliers
par(mfrow=c(1,2))
Boxplot(speed, data=cars, col='lightblue')
Boxplot(dist, data=cars, col='lightblue')
## [1] 49
Oh yeah. 120 feet to stop from 24 mph? It’s on the long side.
cars[which(cars$speed > 20 & cars$speed < 30), ]
##    speed dist
## 44    22   66
## 45    23   54
## 46    24   70
## 47    24   92
## 48    24   93
## 49    24  120
## 50    25   85
library(psych)  # describe() comes from the psych package
describe(cars)[2,-c(1,6,7,10)]
##       n  mean    sd median min max skew kurtosis   se
## dist 50 42.98 25.77     36   2 120 0.76     0.12 3.64
Lose observation 49 and see the difference.
describe(cars[-49,])[2,-c(1,6,7,10)]
##       n  mean    sd median min max skew kurtosis   se
## dist 49 41.41 23.49     36   2  93  0.5    -0.64 3.36
Meh. Not much. Leave it in.
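The summary statistics barely move, but it may also be worth checking how much observation 49 pulls on the regression line itself. The sketch below is an addition, not part of the original analysis; fit_all, fit_drop and cd are names made up for illustration.

```r
# Sketch only: how much does observation 49 move the regression line?
data(cars)

fit_all  <- lm(dist ~ speed, data = cars)         # all 50 observations
fit_drop <- lm(dist ~ speed, data = cars[-49, ])  # observation 49 removed

# Side-by-side intercept and slope for the two fits
print(rbind(all = coef(fit_all), without_49 = coef(fit_drop)))

# Cook's distance measures each observation's influence on the fit;
# show the three largest values
cd <- cooks.distance(fit_all)
print(head(sort(cd, decreasing = TRUE), 3))
```

If the two rows of coefficients are close, leaving the point in is a defensible call.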
Now for that regression model. Speed matters!
For every additional 1 mph of speed, stopping distance increases by about 3.9 feet. (Extrapolated to 60 mph, well beyond the 25 mph maximum in the data, that is roughly three-quarters of a football field, or 17 car lengths.) The model explains only about 64 percent of the variation in the data, as measured by adjusted R2 (0.6438).
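Then there is that standard error. For speed it is 0.4155, about 9.46 times smaller than the coefficient itself. That ratio is the t value, and with a p-value of 1.49e-12 the slope is clearly distinguishable from zero, so the estimate is reasonably precise.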
The residual distribution looks good too: the 1st and 3rd quartiles (-9.525 and 9.215) are roughly the same magnitude, though smaller than the 1.5 times standard error that is sometimes recommended as a rule of thumb.
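The 60 mph figure can be reproduced from the fitted line. This is a minimal sketch: fit60 and pred_60 are names made up here, and the 300 ft football field (100 yards, no end zones) is an assumption used only for the comparison.

```r
# Sketch only: reproduce the 60 mph figure from the fitted line.
data(cars)
fit60 <- lm(dist ~ speed, data = cars)  # same model formula as in the text

# Predicted stopping distance (ft) at 60 mph. The data stop at 25 mph,
# so this is an extrapolation and should be read loosely.
pred_60 <- predict(fit60, newdata = data.frame(speed = 60))
print(pred_60)

# Fraction of an assumed 300 ft football field
print(pred_60 / 300)
```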
fit <- lm(dist~speed, data=cars)
par(mfrow=c(2,2))
summary(fit)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
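The numbers quoted from this summary can also be pulled out programmatically rather than read off the printout. A minimal sketch follows; fit_chk, coefs, t_speed and adj_r2 are names made up for illustration.

```r
# Sketch only: extract the quantities discussed in the text from the model.
data(cars)
fit_chk <- lm(dist ~ speed, data = cars)

# Coefficient table as a matrix with named rows and columns
coefs <- coef(summary(fit_chk))

# The t value is the estimate divided by its standard error
t_speed <- coefs["speed", "Estimate"] / coefs["speed", "Std. Error"]
print(t_speed)
print(coefs["speed", "t value"])  # should match the ratio above

# Adjusted R-squared
adj_r2 <- summary(fit_chk)$adj.r.squared
print(adj_r2)

# Residual quartiles next to the residual standard error
print(quantile(residuals(fit_chk), c(0.25, 0.75)))
print(sigma(fit_chk))
```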
Here is a plot of the model (same as above, really).
plot(cars$speed, cars$dist, main='Stopping distance',
     col='blue', xlab='Speed (mph)', ylab='Distance (ft)')
abline(fit, col='red', lwd=3)
Here’s a look at the residuals. They appear to be evenly distributed.
plot(fit)