Our data show the number of crashes, injuries, and deaths where the police listed ‘racing’ as one of the contributing causes of the crash. They were released by the office of Judith Collins, Police Minister, to confirm that a 2009 law change allowing crushing of the offenders’ cars was effective. We’re going to see how much the data support the claim.
crashdata<-read.csv("~/racing-crashes.csv")
head(crashdata)
##   year deaths injured crashes
## 1 2001      7      77      70
## 2 2002      3      88      85
## 3 2003      9      79      72
## 4 2004      8      71      77
## 5 2005      6      97      97
## 6 2006      9      87      92
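As always, we start with a graph. Last time I looked at crashes; this time I’ll look at injuries. The code fits two ‘broken-stick’ trend lines to the injury counts, one with its corner at 2007 and one with its corner at 2009, and draws both over the data.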
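# Broken-stick terms: 'pre' rises up to the change year and is then constant;
# 'post' is constant up to the change year and rises afterwards
crashdata$pre9 <- with(crashdata, pmin(2009, year))
crashdata$post9 <- with(crashdata, pmax(2009, year))
crashdata$pre7 <- with(crashdata, pmin(2007, year))
crashdata$post7 <- with(crashdata, pmax(2007, year))
# Linear trends with a corner at 2007 and at 2009
change7model <- lm(injured ~ pre7 + post7, data = crashdata)
change9model <- lm(injured ~ pre9 + post9, data = crashdata)
# Observed injuries (blue), 2007 fit (black dotted), 2009 fit (orange dotted)
plot(injured ~ year, data = crashdata, type = "o", col = "blue", ylim = c(0, 120))
lines(crashdata$year, fitted(change7model), lty = 3, lwd = 2)
lines(crashdata$year, fitted(change9model), col = "orange", lty = 3, lwd = 2)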
summary(change7model)
##
## Call:
## lm(formula = injured ~ pre7 + post7, data = crashdata)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -11.877  -4.256  -2.234   6.174  15.430
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21166.5595  2646.0110   7.999 6.54e-06 ***
## pre7            0.1558     1.5668   0.099    0.923
## post7         -10.6613     1.3187  -8.085 5.91e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.722 on 11 degrees of freedom
## Multiple R-squared: 0.9005, Adjusted R-squared: 0.8824
## F-statistic: 49.78 on 2 and 11 DF, p-value: 3.076e-06
summary(change9model)
##
## Call:
## lm(formula = injured ~ pre9 + post9, data = crashdata)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -15.322  -8.403  -4.276  10.293  20.360
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28539.757   4267.261   6.688 3.43e-05 ***
## pre9           -3.159      1.548  -2.041  0.06604 .
## post9         -11.015      2.619  -4.206  0.00147 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.25 on 11 degrees of freedom
## Multiple R-squared: 0.8153, Adjusted R-squared: 0.7817
## F-statistic: 24.28 on 2 and 11 DF, p-value: 9.235e-05
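To see why the coefficients on the `pre` and `post` terms can be read as the slopes before and after the corner, here is a minimal sketch on made-up, noise-free numbers. The values 100 and -10 are hypothetical, chosen only for illustration; they are not the fitted values from the police data.

```r
# Illustration only: a made-up broken-stick line, not the police data.
year <- 2001:2014
knot <- 2007
pre  <- pmin(knot, year)   # 2001, ..., 2007, then stays at 2007
post <- pmax(knot, year)   # stays at 2007 until the knot, then 2008, ..., 2014

# Hypothetical trend: flat at 100 up to the knot, then falling 10 per year.
y <- 100 - 10 * (post - knot)

# Year-on-year changes: zero before the corner, -10 after it.
slopes <- diff(y)

# Fitting the same form as change7model recovers the two slopes.
fit <- lm(y ~ pre + post)
round(coef(fit)[c("pre", "post")], 6)
```

The line is continuous at the corner by construction, because `pre` and `post` both equal the knot value there; only the slope changes.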
We’d like to have a formal assessment of which model is better. This is a little tricky, because of the ‘corner’ in the model and because we have to account for choosing 2007 based on the graph. Treating the choice of year as a continuous parameter is (slightly) conservative, so we do that: the comparison is an F test that charges one degree of freedom for having estimated the change year from the data.
deviance(change9model)-deviance(change7model)
## [1] 890.1547
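# F-type statistic: drop in residual sum of squares, scaled by the 2007 model's residual variance
test <- (deviance(change9model) - deviance(change7model)) / summary(change7model)$sigma^2
# One numerator df for the change year, and one fewer residual df to pay for it
pf(test, 1, change9model$df.residual - 1, lower.tail = FALSE)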
## [1] 0.01186402
This is the same conclusion as for crashes, though the evidence is slightly weaker. The data fit a decrease starting in 2007 better than one starting in late 2008 or 2009.
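For anyone who wants to see what ‘treating the change year as a continuous parameter’ looks like in practice, here is a rough sketch that profiles the residual sum of squares over a grid of candidate change years. It runs on simulated data, not the police figures, and the helper `fit_knot`, the grid, and the simulation settings are all my own assumptions for illustration.

```r
# Simulated data, NOT the police figures: flat near 85 until 2007,
# then falling by about 10 per year, plus noise.
set.seed(1)
year    <- 2001:2014
injured <- 85 - 10 * (pmax(2007, year) - 2007) + rnorm(length(year), sd = 8)
sim     <- data.frame(year = year, injured = injured)

# Hypothetical helper: broken-stick fit with its corner at `knot`.
fit_knot <- function(knot, data) {
  data$pre  <- pmin(knot, data$year)
  data$post <- pmax(knot, data$year)
  lm(injured ~ pre + post, data = data)
}

# Residual sum of squares for each candidate corner on a fine grid.
knots <- seq(2003, 2012, by = 0.25)
rss   <- sapply(knots, function(k) deviance(fit_knot(k, sim)))
best  <- knots[which.min(rss)]

# Compare a fixed 2009 corner with the best-fitting corner, charging one
# degree of freedom for having estimated the corner, as in the test above.
fit_best <- fit_knot(best, sim)
fit_2009 <- fit_knot(2009, sim)
f_stat   <- (deviance(fit_2009) - deviance(fit_best)) / summary(fit_best)$sigma^2
p_value  <- pf(f_stat, 1, fit_2009$df.residual - 1, lower.tail = FALSE)
```

Plotting `rss` against `knots` shows how sharply the data pin down the change year. With only 14 annual observations the profile is usually fairly flat near its minimum, which is one reason to be cautious about reading too much into the exact year.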