Our data show the number of crashes, injuries, and deaths where the police listed ‘racing’ as one of the contributing causes of the crash. They were released by the office of Judith Collins, Police Minister, to confirm that a 2009 law change allowing crushing of the offenders’ cars was effective. We’re going to see how much the data support the claim.

crashdata<-read.csv("~/racing-crashes.csv")
head(crashdata)
##   year deaths injured crashes
## 1 2001      7      77      70
## 2 2002      3      88      85
## 3 2003      9      79      72
## 4 2004      8      71      77
## 5 2005      6      97      97
## 6 2006      9      87      92

First, we want to look at graphs.

The law changed in 2009, so we could fit one linear model to the data up to 2009 and another to the data from 2009 onwards:

plot(crashes~year,data=crashdata,type="o",col="blue",ylim=c(0,120))
pre_model <- lm(crashes~year, data=subset(crashdata, year<=2009))
post_model <- lm(crashes~year, data=subset(crashdata, year>=2009))
lines(2001:2009,fitted(pre_model),lty=2)
lines(2009:2014,fitted(post_model),lty=2)

It looks like a better fit has the change in 2007:

plot(crashes~year,data=crashdata,type="o",col="blue",ylim=c(0,120))
pre7_model <- lm(crashes~year, data=subset(crashdata, year<=2007))
post7_model <- lm(crashes~year, data=subset(crashdata, year>=2007))
lines(2001:2007,fitted(pre7_model),lty=2,col="orange",lwd=2)
lines(2007:2014,fitted(post7_model),lty=2, col="orange",lwd=2)

We can force the two lines to meet at the change year, which gives a single model for each choice of change year.

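# pre* follows the year up to the change year and is constant afterwards;
# post* is constant up to the change year and follows the year afterwards
crashdata$pre9 <- with(crashdata, pmin(2009, year))
crashdata$post9 <- with(crashdata, pmax(2009, year))
crashdata$pre7 <- with(crashdata, pmin(2007, year))
crashdata$post7 <- with(crashdata, pmax(2007, year))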

change7model <- lm(crashes~pre7+post7,data=crashdata)
change9model <- lm(crashes~pre9+post9,data=crashdata)
plot(crashes~year,data=crashdata,type="o",col="blue",ylim=c(0,120))
lines(2001:2014,fitted(change7model),lty=3,lwd=2)
lines(2001:2014,fitted(change9model), col="orange",lty=3,lwd=2)
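
To see why this construction makes the lines meet, note that pmin(2007, year) rises with the year until 2007 and is then constant, while pmax(2007, year) does the opposite, so the fitted curve is continuous with one slope before 2007 and another after. The sketch below checks this on made-up numbers (not the police data) by comparing against an equivalent and perhaps more familiar form: a straight line plus a change in slope at the knot. The names knot, wiggle, two_slopes, and slope_shift are illustrative assumptions, not part of the original analysis.

```r
# Synthetic illustration only: these numbers are made up, not the police data.
year   <- 2001:2014
knot   <- 2007
wiggle <- rep(c(3, -2, -1), length.out = length(year))  # deterministic "noise"
y <- 70 + 4 * (pmin(year, knot) - 2001) - 10 * (pmax(year, knot) - knot) + wiggle

pre  <- pmin(knot, year)   # follows the year up to the knot, then stays at 2007
post <- pmax(knot, year)   # stays at 2007 up to the knot, then follows the year

# Parametrisation used in the text: slope before the knot, slope after the knot
two_slopes <- lm(y ~ pre + post)

# Equivalent form: slope before the knot, plus the change in slope at the knot
slope_shift <- lm(y ~ year + pmax(year - knot, 0))

# The fitted values should agree up to rounding error
all.equal(unname(fitted(two_slopes)), unname(fitted(slope_shift)))

# The change in slope is the difference between the two slopes
coef(two_slopes)[["post"]] - coef(two_slopes)[["pre"]]
coef(slope_shift)[[3]]
```

The two forms span the same set of curves because pmin(knot, year) + pmax(knot, year) always equals knot + year. The version in the text is convenient here because each coefficient is directly a slope.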

summary(change7model)
## 
## Call:
## lm(formula = crashes ~ pre7 + post7, data = crashdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.234  -7.526  -1.532   6.886  16.831 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13105.571   2651.317   4.943  0.00044 ***
## pre7            4.753      1.570   3.028  0.01150 *  
## post7         -11.234      1.321  -8.502 3.65e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.742 on 11 degrees of freedom
## Multiple R-squared:  0.8772, Adjusted R-squared:  0.8549 
## F-statistic:  39.3 on 2 and 11 DF,  p-value: 9.768e-06
summary(change9model)
## 
## Call:
## lm(formula = crashes ~ pre9 + post9, data = crashdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.791  -9.769  -1.586   7.448  32.169 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 25237.6409  4736.2179   5.329 0.000242 ***
## pre9            0.5145     1.7181   0.299 0.770163    
## post9         -13.0346     2.9069  -4.484 0.000925 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.7 on 11 degrees of freedom
## Multiple R-squared:  0.7204, Adjusted R-squared:  0.6696 
## F-statistic: 14.17 on 2 and 11 DF,  p-value: 0.0009031
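
The two summaries compare only 2007 and 2009. One way to check that 2007 is not an arbitrary choice is to fit the same continuous two-slope model for every candidate change year and compare the residual sums of squares. The helper below is a minimal sketch, assuming a data frame with year and crashes columns as in crashdata; rss_by_change_year is a hypothetical name, not something from the original analysis.

```r
# Hypothetical helper: residual sum of squares of the continuous two-slope
# model, one value per candidate change year.
rss_by_change_year <- function(data, candidates) {
  rss <- sapply(candidates, function(k) {
    d <- data.frame(crashes = data$crashes,
                    pre  = pmin(k, data$year),   # slope applies up to year k
                    post = pmax(k, data$year))   # slope applies from year k on
    deviance(lm(crashes ~ pre + post, data = d))
  })
  setNames(rss, candidates)
}
```

With the data loaded as above, rss_by_change_year(crashdata, 2002:2013) would return one residual sum of squares per year. The entries for 2007 and 2009 are the same quantities as deviance(change7model) and deviance(change9model).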

We’d like to have a formal assessment of which model is better. This is a little tricky, because of the ‘corner’ in the model and because we have to account for choosing 2007 based on the graph. Treating the choice of year as a continuous parameter is (slightly) conservative, so we do that:

deviance(change9model)-deviance(change7model)
## [1] 1333.456
test<-(deviance(change9model)-deviance(change7model))/summary(change7model)$sigma^2
# one numerator df; the residual df is reduced by one for the estimated change year
pf(test, 1, df.residual(change9model)-1, lower.tail=FALSE)
## [1] 0.003792894
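
The arithmetic behind this test can be reproduced from the printed output alone. The sketch below plugs in the rounded values shown above: the difference in deviance, the residual standard error of change7model, and the 11 residual degrees of freedom. It should therefore agree with the p-value above only up to rounding. The variable names are illustrative. The numerator has one degree of freedom, and the denominator uses one fewer than the residual degrees of freedom, which appears to be how the code accounts for treating the change year as an estimated parameter.

```r
# Values copied from the printed output above (rounded, so results are approximate)
rss_diff <- 1333.456   # deviance(change9model) - deviance(change7model)
sigma7   <- 9.742      # residual standard error of change7model
df_resid <- 11         # residual degrees of freedom of both models

# F statistic: extra residual sum of squares relative to the residual variance
f_stat <- rss_diff / sigma7^2

# Upper-tail probability on 1 and (11 - 1) degrees of freedom
p_value <- pf(f_stat, 1, df_resid - 1, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```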

We’d reach the same conclusion looking at injuries rather than crashes. Fortunately, there aren’t enough deaths to say much about them statistically.

In a more complete analysis we’d also want to compare the trends to overall trends in road crashes (which turns out not to change the conclusions) and perhaps to some other sorts of crime. However, while the data do show a drop in crashes due to racing, the drop seems to have started a couple of years before the law change. The data therefore do not provide much support for the claim that the law change was effective.