Our data show the number of crashes, injuries, and deaths where the police listed ‘racing’ as one of the contributing causes of the crash. They were released by the office of Judith Collins, Police Minister, to confirm that a 2009 law change allowing crushing of the offenders’ cars was effective. We’re going to see how much the data support the claim.

crashdata<-read.csv("~/racing-crashes.csv")
head(crashdata)
##   year deaths injured crashes
## 1 2001      7      77      70
## 2 2002      3      88      85
## 3 2003      9      79      72
## 4 2004      8      71      77
## 5 2005      6      97      97
## 6 2006      9      87      92

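I’ve already done a traditional statistical analysis of these data. This time I’ll try a visual line-up, as suggested by Di Cook and co-workers: the real data are hidden among plots of data simulated from a model, and if a viewer can pick out the real plot, the model is missing some structure that the data have. It’s not quite the usual visual line-up, since this problem has a lot more imposed structure than data sets where ‘any structure’ is the question.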

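First, I’ll fit a model that really does have a change in 2009: a broken-stick regression, linear in year on each side of 2009, with the slope allowed to change there.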

crashdata$pre9 <- with(crashdata, pmin(2009, year))
crashdata$post9 <- with(crashdata, pmax(2009, year))

change9model <- lm(crashes~pre9+post9,data=crashdata)
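The `pre9`/`post9` pair is a broken-stick (piecewise-linear) basis: the fitted line has slope `coef(change9model)["pre9"]` up to 2009 and slope `coef(change9model)["post9"]` afterwards, and the two pieces join at 2009. A minimal sketch checking this on synthetic, noise-free numbers; the coefficients 14143, 1 and -8 are made up for illustration and have nothing to do with the crash data:

```r
# Broken-stick basis with a knot at 2009 (same construction as pre9/post9 above)
year <- 2001:2014
knot <- 2009
pre  <- pmin(knot, year)  # follows year up to the knot, then stays at 2009
post <- pmax(knot, year)  # stays at 2009 up to the knot, then follows year

# Synthetic, noise-free series (NOT the crash data):
# slope +1 per year before 2009, -8 per year after
y <- 14143 + 1 * pre - 8 * post

fit <- lm(y ~ pre + post)
print(round(coef(fit), 6))      # approximately 14143, 1, -8

# Year-on-year changes in the fitted line: +1 up to 2009, then -8
print(round(diff(fitted(fit)), 6))
```

Because the basis is continuous at the knot, the model can change slope in 2009 but cannot jump there.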

Now I can simulate data from the model by adding resampled residuals to the fitted values and rounding to whole numbers of crashes:

make_data<-function(){
  newdata<-fitted(change9model)+sample(resid(change9model),replace=TRUE) 
  round(newdata)
}
make_data()
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 
## 80 72 66 80 72 67 98 88 94 76 58 45 23  9
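`make_data()` is a residual bootstrap: each simulated series is the fitted line plus the model’s own residuals, resampled with replacement. A sketch of the same idea written for any fitted `lm`, so it can be reused for other change points; `simulate_from` is a hypothetical name, and the toy model below is fitted to made-up numbers purely to show the call:

```r
# Residual bootstrap from any fitted linear model (hypothetical helper)
simulate_from <- function(model) {
  res <- resid(model)
  # fitted values + residuals drawn with replacement, rounded to whole counts
  round(fitted(model) + sample(res, length(res), replace = TRUE))
}

# Made-up example data (NOT the crash data), just to exercise the function
set.seed(2009)
toy <- data.frame(year = 2001:2014)
toy$count <- round(80 - 3 * pmax(0, toy$year - 2009) + rnorm(14, sd = 5))
toy_model <- lm(count ~ year, data = toy)

sim <- simulate_from(toy_model)
print(sim)
```

Passing the sample size explicitly to `sample()` avoids its special case where a single number `x` is treated as `1:x`.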

Next, put the real data set in a randomly chosen position among 19 simulated ones, and plot all 20:

simulated<-replicate(19,make_data())
real<-crashdata$crashes
secret<-sample(1:20,1)
combined<-matrix(nrow=length(real),ncol=20)
combined[,secret]<-real
combined[,-secret]<-simulated

par(mfrow=c(5,4),mar=c(2,2,1,1))
for(i in 1:20){
  plot(2001:2014,combined[,i],type="o",ylim=range(0,combined))
  abline(v=2009,col="blue",lty=3)
}
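The line-up steps above can be wrapped into one function that hides the real series at a random position among `n - 1` simulations. A self-contained sketch; `make_lineup` and its arguments are hypothetical names, and the constant "real" and "simulated" series in the example are placeholders chosen only so the hidden column is easy to check:

```r
# Build a line-up matrix: one column of real data hidden among n - 1 simulations
make_lineup <- function(real, simulator, n = 20) {
  secret <- sample.int(n, 1)                 # random position for the real data
  sims <- replicate(n - 1, simulator())      # length(real) x (n - 1) matrix
  combined <- matrix(NA_real_, nrow = length(real), ncol = n)
  combined[, secret] <- real
  combined[, -secret] <- sims                # fills the remaining columns in order
  list(data = combined, secret = secret)
}

# Placeholder series: "real" is all zeros, each "simulation" is all ones
set.seed(1)
lineup <- make_lineup(real = rep(0, 14), simulator = function() rep(1, 14))
dim(lineup$data)   # 14 20
```

Returning `secret` alongside the matrix keeps the answer available for checking after the viewer has chosen a panel.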

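For comparison, here is the same line-up with the simulated data generated from a model whose change point is in 2007 instead. I think the real data are harder to pick out in this one. The dotted blue line still marks 2009, the year of the law change.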

crashdata$pre7 <- with(crashdata, pmin(2007, year))
crashdata$post7 <- with(crashdata, pmax(2007, year))
change7model <- lm(crashes~pre7+post7,data=crashdata)

make_data7<-function(){
  newdata<-fitted(change7model)+sample(resid(change7model),replace=TRUE) 
  round(newdata)
}


simulated<-replicate(19,make_data7())
real<-crashdata$crashes
secret7<-sample(1:20,1)
combined<-matrix(nrow=length(real),ncol=20)
combined[,secret7]<-real
combined[,-secret7]<-simulated

par(mfrow=c(5,4),mar=c(2,2,1,1))
for(i in 1:20){
  plot(2001:2014,combined[,i],type="o",ylim=range(0,combined))
  abline(v=2009,col="blue",lty=3)
}
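The 2007 code above repeats the 2009 code with only the change-point year altered. One way to avoid the duplication is to make the knot an argument. A sketch, assuming a data frame with `year` and `crashes` columns as in `crashdata`; `fit_changepoint` is a hypothetical name, and the example runs on synthetic numbers rather than the real data:

```r
# Fit a continuous broken-stick model with a change in slope at `knot`
fit_changepoint <- function(data, knot) {
  data$pre  <- pmin(knot, data$year)   # carries the slope before the knot
  data$post <- pmax(knot, data$year)   # carries the slope after the knot
  lm(crashes ~ pre + post, data = data)
}

# Synthetic, noise-free series (NOT the crash data): flat, then falling from 2007
fake <- data.frame(year = 2001:2014)
fake$crashes <- 80 - 6 * pmax(0, fake$year - 2007)

m7 <- fit_changepoint(fake, 2007)
print(round(coef(m7), 6))   # approximately 12122, 0, -6: slope 0 before, -6 after
```

Applied to the real data, `fit_changepoint(crashdata, 2007)` fits the same model as `change7model`, with the terms named `pre` and `post` rather than `pre7` and `post7`.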

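Strictly, the second line-up should be shown to different people, because it uses the same real data and anyone who has seen the first line-up already knows what the real series looks like. However, that bias goes against the way I think the data are: recognising the real series makes it easier to pick out, which would make the 2007 model look worse, not better. So it should be relatively safe.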

The real data were in panel 12 of the first line-up and panel 15 of the second, counting along rows.
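With `par(mfrow = c(5, 4))`, R fills the 5 × 4 grid row by row, so panel k sits in row ceiling(k / 4) and column ((k - 1) mod 4) + 1. A small hypothetical helper for turning a panel number into a grid position:

```r
# Row and column of panel k in a grid filled row by row (as par(mfrow = ...) does)
panel_position <- function(k, ncol = 4) {
  c(row = ceiling(k / ncol), col = (k - 1) %% ncol + 1)
}

panel_position(12)   # row 3, col 4
panel_position(15)   # row 4, col 3
```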