Our data show the number of crashes, injuries, and deaths where the police listed ‘racing’ as one of the contributing causes of the crash. They were released by the office of Judith Collins, the Police Minister, as evidence that a 2009 law change allowing offenders’ cars to be crushed was effective. We’re going to see how well the data support that claim.

crashdata<-read.csv("~/racing-crashes.csv")
head(crashdata)
##   year deaths injured crashes
## 1 2001      7      77      70
## 2 2002      3      88      85
## 3 2003      9      79      72
## 4 2004      8      71      77
## 5 2005      6      97      97
## 6 2006      9      87      92

First, we want to look at graphs.

Last time I looked at crashes. This time I’ll look at injuries.

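# Broken-stick terms: 'pre' changes only up to the change year,
# 'post' changes only after it, so each gets its own slope.
crashdata$pre9 <- with(crashdata, pmin(2009, year))
crashdata$post9 <- with(crashdata, pmax(2009, year))
crashdata$pre7 <- with(crashdata, pmin(2007, year))
crashdata$post7 <- with(crashdata, pmax(2007, year))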
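
To see what these variables do, here is a small sketch on a bare vector of years. The names `years` and `basis_steps` are just for illustration, and the 2001 to 2014 range is taken from the plotting code. `pmin(2007, year)` rises until 2007 and is then flat, so its coefficient is the slope before 2007; `pmax(2007, year)` is flat until 2007 and then rises, so its coefficient is the slope afterwards. Together they describe a line with a single corner at the chosen year.

```r
# Illustration only: the years covered by the data (2001 to 2014)
years <- 2001:2014

pre7  <- pmin(2007, years)   # rises by 1 per year up to 2007, then stays at 2007
post7 <- pmax(2007, years)   # stays at 2007 until 2007, then rises by 1 per year

# Year-on-year change in each variable: 1 where its slope applies, 0 elsewhere
basis_steps <- data.frame(from       = head(years, -1),
                          pre7_step  = diff(pre7),
                          post7_step = diff(post7))
print(basis_steps)
```

In every year exactly one of the two variables is moving, which is why the model can have different slopes on either side of the corner while staying continuous.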

change7model <- lm(injured~pre7+post7,data=crashdata)
change9model <- lm(injured~pre9+post9,data=crashdata)
plot(injured~year,data=crashdata,type="o",col="blue",ylim=c(0,120))
lines(crashdata$year,fitted(change7model),lty=3,lwd=2)
lines(crashdata$year,fitted(change9model), col="orange",lty=3,lwd=2)

summary(change7model)
## 
## Call:
## lm(formula = injured ~ pre7 + post7, data = crashdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.877  -4.256  -2.234   6.174  15.430 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 21166.5595  2646.0110   7.999 6.54e-06 ***
## pre7            0.1558     1.5668   0.099    0.923    
## post7         -10.6613     1.3187  -8.085 5.91e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.722 on 11 degrees of freedom
## Multiple R-squared:  0.9005, Adjusted R-squared:  0.8824 
## F-statistic: 49.78 on 2 and 11 DF,  p-value: 3.076e-06
summary(change9model)
## 
## Call:
## lm(formula = injured ~ pre9 + post9, data = crashdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.322  -8.403  -4.276  10.293  20.360 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28539.757   4267.261   6.688 3.43e-05 ***
## pre9           -3.159      1.548  -2.041  0.06604 .  
## post9         -11.015      2.619  -4.206  0.00147 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.25 on 11 degrees of freedom
## Multiple R-squared:  0.8153, Adjusted R-squared:  0.7817 
## F-statistic: 24.28 on 2 and 11 DF,  p-value: 9.235e-05

We’d like a formal assessment of which model fits better. This is a little tricky, both because of the ‘corner’ in the model and because we have to account for having chosen 2007 by looking at the graph. Treating the change year as a continuous parameter estimated from the data is (slightly) conservative, so we do that, giving up one residual degree of freedom for it:

deviance(change9model)-deviance(change7model)
## [1] 890.1547
test<-(deviance(change9model)-deviance(change7model))/summary(change7model)$sigma^2
pf(test, 1, df.residual(change9model)-1, lower.tail=FALSE)
## [1] 0.01186402
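
As a check on the arithmetic, the test can be roughly reproduced by hand from the numbers printed in the two summaries. This is a sketch using the rounded values shown above, so it should agree with the output only to about three significant figures; the variable names are mine.

```r
# Values copied (rounded) from the model summaries above
sigma7 <- 9.722      # residual standard error, 2007 model
sigma9 <- 13.25      # residual standard error, 2009 model
df_res <- 11         # residual degrees of freedom in both models

# For a linear model, deviance = residual sum of squares = sigma^2 * residual df
rss7 <- sigma7^2 * df_res
rss9 <- sigma9^2 * df_res
rss_diff <- rss9 - rss7   # should be close to the printed difference, up to rounding

# F statistic with 1 numerator df; one denominator df is given up
# for having chosen the change year from the data
f_stat  <- rss_diff / sigma7^2
p_value <- pf(f_stat, 1, df_res - 1, lower.tail = FALSE)
print(round(c(rss_diff = rss_diff, f_stat = f_stat, p_value = p_value), 4))
```

The difference in residual sums of squares comes out near the 890 printed above, and the p-value near 0.012.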

This is the same conclusion as for crashes, though the evidence is slightly weaker. The data fit a decrease starting in 2007 better than one starting in late 2008 or 2009.
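
One way to make ‘choosing the year from the data’ explicit is to fit the same broken-stick model at every candidate change year and compare residual sums of squares. The sketch below does that on made-up, noise-free data with a corner placed at 2007 by construction, so it illustrates the mechanics only and says nothing about the real crash data. The names `toy`, `rss_for_knot` and `best_knot`, and the range of candidate years, are hypothetical choices of mine.

```r
# Made-up, noise-free data: flat at 90 until 2007, then falling by 10 a year.
# This is NOT the crash data; it only shows how the comparison works.
toy <- data.frame(year = 2001:2014)
toy$injured <- 90 - 10 * pmax(0, toy$year - 2007)

# Residual sum of squares for a broken-stick model with its corner at `knot`
rss_for_knot <- function(knot, data) {
  data$pre  <- pmin(knot, data$year)
  data$post <- pmax(knot, data$year)
  deviance(lm(injured ~ pre + post, data = data))
}

# Try each candidate change year and keep the one with the smallest RSS
knots <- 2003:2012
rss   <- sapply(knots, rss_for_knot, data = toy)
best_knot <- knots[which.min(rss)]
print(data.frame(knot = knots, rss = round(rss, 2)))
```

Here the smallest residual sum of squares is at 2007 because the toy data were built that way. With real data, passing `crashdata` in place of `toy` would give the profile over change years that the F-test above is implicitly allowing for.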