Changes in the probability of drawn matches

Introduction

Following on from the analysis of first innings batting totals since 1970, I decided to look at how the probability of obtaining a result in a test match has changed over time. Geoffrey Boycott’s complaint revolves around the inability of modern teams to graft. Matches dominated by grafting tend to end in a draw when a team fails to take ten last innings wickets.

The data table has all four innings but it contains a column with the result of the entire match. So we can extract just the first innings and then look at the probability of a match being drawn, won or lost by the team batting first.

knitr::opts_chunk$set(warning = FALSE,message = FALSE,error = TRUE)

d<-read.csv("match_innings.csv") ## Get this from github
d$Date<-as.Date(d$Date) ## Need to restore the format to date
d$Overs<-as.numeric(as.character(d$Overs)) ## Forgot to do this. Convert to numeric
d$draw<-(d$Result=="draw")*1 ## Change to binary for logistic regression
d$win<-(d$Result=="won")*1
d$lost<-(d$Result=="lost")*1
d<-subset(d,d$Year>1970)
d$decade<-as.factor(cut(d$Year,c(1970,1980,1990,2000,2010,2017)))
levels(d$decade)<-c("seventies","eighties","nineties","noughties","recent")
d<-subset(d,d$Inns==1)
d$txt<-paste(d$Team,d$Opposition,d$Ground,d$Year,d$Result,sep=" ")
library(ggplot2)
library(plotly)

Overall probability of a match being drawn

Using the magic of ggplot and plotly produces a line for a binomial glm (logistic regression) that can be queried for any year by hovering over it. This is much more understandable than a table with the parameters of a logistic regresion, uses exactly the same calculations, and avoids the need for back transformation. Note that for this figure its only sensible to hover over the trend line, rather than the points.

library(mgcv)
g0<-ggplot(d,aes(x=Year,y=draw))
g1<-g0+geom_point()+stat_smooth(method="glm",method.args=list(family="binomial"))
g1<-g1 +theme_bw()
ggplotly(g1)

There is a very clear downward trend. Draws are thankfully becoming much less common than they once were.

I am only using data on the first innings here, so splitting the results by team does not produce the whole picture for the national side, as it excludes the matches in which they batted second. However it is still interesting.

g1<-g1+facet_wrap("Team")
ggplotly(g1)

There is a general downward trend among both strong teams (that tend to win rather than draw) and weaker teams that lose.

Are matches with high first innings totals more likely to be drawn?

A high first innings total suggests that batting conditions may be good for both teams, leading to a higher probability of the game running out of time. Hovering over the points themselves show the actual match result for reference.

g0<-ggplot(d,aes(x=Total,y=draw))
g1<-g0+geom_point(aes(text=txt))+stat_smooth(method="glm",method.args=list(family="binomial"))
g1<-g1 +theme_bw()
ggplotly(g1)

Splitting this up by decade might help show any changes in the general pattern that lead to high scoring games end up drawn.

g1<-g1+facet_wrap("decade")
ggplotly(g1)

So, Geoffrey Boycott really does have a point. The art of grinding out a draw after a team posts a large first innings total may be on the decline. Whether this is a good or bad aspect of the modern game depends on your viewpoint of course.

Probability of a win given a result

I’ll now exclude all the drawn matches from the data frame and only take those with a result either way. We can now ask whether the team batting first is more likely to win matches which do have a result.

d<-subset(d,d$draw!=1)
g0<-ggplot(d,aes(x=Year,y=win))
g1<-g0+geom_point()+stat_smooth(method="glm",method.args=list(family="binomial"))
g1<-g1 +theme_bw()
ggplotly(g1)

So no real trend. So just batting first does not give a team an overall advantage when a match is not drawn, despite the commonly held wisdom that it should. Again this can be split by team. This will show up overall changes in performance rather than any advantage by batting first of course.

g1<-g1+facet_wrap("Team")
ggplotly(g1)

This sadly shows the decline of the West Indies again.

Probability of winning based on total scored

So, batting first does not, on average, give a team an increased chance of winning. However posting a large first innings total (or a large second innings total) clearly should.

g0<-ggplot(d,aes(x=Total,y=win))
g1<-g0+geom_point(aes(text=txt))+stat_smooth(method="glm",method.args=list(family="binomial"))
g1<-g1 +theme_bw()
ggplotly(g1)

The point where a team has an increased chance of the result going in their favour is at around 300.

Again this can be facet wrapped. Hovering over the mid point shows that teams with weaker bowling attacks do tend to need to post higher first innings totals in order to win the match.

g1<-g1+facet_wrap("Team")
ggplotly(g1)