Geoffrey Boycott constantly laments the inability of modern batsmen to build innings on his model. After listening to another of his discourses I decided to have a look at the statistics myself. I placed some code to download the data here:
http://rpubs.com/dgolicher/cricket_download
A GitHub pull will bring down the resulting csv files needed to reproduce my analysis.
To answer the question I decided to focus only on the first innings of every match played since 1970. The reasoning behind this is that many other factors come into play in later innings as the game develops. However, there is a fairly consistent and uniform aim in the first innings of every game, which is simply to make as many runs as the conditions and quality of the bowling allow. Very few first innings are declared or cut short by the weather, so this produces a good baseline for comparisons.
knitr::opts_chunk$set(warning = FALSE,message = FALSE,error = TRUE)
d<-read.csv("match_innings.csv") ## Get this from github
d$Date<-as.Date(d$Date) ## Need to restore the format to date
d$Overs<-as.numeric(as.character(d$Overs)) ## Forgot to do this. Convert to numeric
d$txt<-paste(d$Team,d$Opposition,d$Ground,sep=" ")
dd<-subset(d,d$Inns==1) ## Just take first innings
dd<-subset(dd,dd$Year>1970) ## Since 1970 (Year>1970, so 1970 itself is excluded)
Has there been any change in mean total score for the first innings? Plotting the data and fitting a spline would show up any trend within the noise.
library(plotly)
theme_set(theme_bw())
g0<-ggplot(dd,aes(x=Date,y=Total))
g1<-g0+geom_point(aes(text=txt))+geom_smooth()
ggplotly(g1)
This is a plotly figure so the details can be investigated by hovering over the line or points.
The smoothed mean first innings total hit a low of 314 in 1980 and dipped again at the end of the 1990s, but is now higher than it has ever been, at 369.
It might be argued that this could be attributable to changes in which teams are playing on which grounds. This can be checked by looking at the individual patterns for each team.
g1<-g1+facet_wrap("Team")
ggplotly(g1)
The trend is consistent for most teams, although the sad decline of the West Indian team since the 70s is evident. Bangladesh have improved greatly. Zimbabwe is something of a special case. South Africa were banned from test cricket at the beginning of the period. So it does look to the eye as if there is a general trend. A statistical model of this could take team and ground as random effects, in order to allow for the mean performance of each team and on each ground, and test for a fixed effect of time (a linear trend since 1970).
library(lmerTest)
mod<-lmer(data=dd,Total~Year+(1|Team)+(1|Ground))
summary(mod)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
## to degrees of freedom [lmerMod]
## Formula: Total ~ Year + (1 | Team) + (1 | Ground)
## Data: dd
##
## REML criterion at convergence: 20096.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.58892 -0.75224 -0.09537 0.71532 3.05231
##
## Random effects:
## Groups Name Variance Std.Dev.
## Ground (Intercept) 410.1 20.25
## Team (Intercept) 1684.7 41.05
## Residual 17134.1 130.90
## Number of obs: 1593, groups: Ground, 93; Team, 10
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -2181.3099 556.0322 1423.9000 -3.923 9.16e-05 ***
## Year 1.2537 0.2781 1428.2000 4.509 7.06e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Year -1.000
This shows a significant linear trend: mean first innings scores have increased by about 1.25 runs per year since 1970, after taking into account differences between the teams playing and the grounds used.
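To put the slope in context, an approximate 95% interval can be worked out directly from the estimate and standard error printed above. This is only a rough sketch using a normal (Wald) approximation; it was not part of the original analysis, and the variable names are illustrative.

```r
# Year coefficient and its standard error, copied from the summary above
slope <- 1.2537
se    <- 0.2781

# Approximate 95% Wald interval (normal approximation; an assumption that is
# reasonable here because the degrees of freedom are large)
ci <- slope + c(-1, 1) * qnorm(0.975) * se
round(ci, 2)

# The same trend expressed as runs per decade
per_decade <- slope * 10
per_decade
```

On these figures the trend lies somewhere between roughly 0.7 and 1.8 runs per year, or about 12.5 runs per decade at the point estimate.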
Fitting the model with an interaction between year and team would detect which teams did not follow the trend.
mod<-lmer(data=dd,Total~Year*Team+(1|Ground))
summary(mod)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
## to degrees of freedom [lmerMod]
## Formula: Total ~ Year * Team + (1 | Ground)
## Data: dd
##
## REML criterion at convergence: 19951.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6682 -0.7393 -0.0797 0.6959 3.2315
##
## Random effects:
## Groups Name Variance Std.Dev.
## Ground (Intercept) 452.4 21.27
## Residual 16894.7 129.98
## Number of obs: 1593, groups: Ground, 93
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -3.293e+03 1.216e+03 1.566e+03 -2.707 0.00685 **
## Year 1.836e+00 6.095e-01 1.565e+03 3.012 0.00264 **
## TeamBangladesh -1.736e+04 7.255e+03 1.572e+03 -2.393 0.01683 *
## TeamEngland -1.141e+01 1.676e+03 1.550e+03 -0.007 0.99457
## TeamIndia 9.730e+01 1.845e+03 1.569e+03 0.053 0.95796
## TeamNew Zealand -1.443e+03 2.168e+03 1.558e+03 -0.666 0.50577
## TeamPakistan 4.922e+03 1.980e+03 1.565e+03 2.486 0.01301 *
## TeamSouth Africa 2.243e+03 3.563e+03 1.560e+03 0.630 0.52908
## TeamSri Lanka -1.905e+03 2.650e+03 1.562e+03 -0.719 0.47224
## TeamWest Indies 5.828e+03 1.979e+03 1.560e+03 2.946 0.00327 **
## TeamZimbabwe 4.816e+03 5.814e+03 1.553e+03 0.828 0.40758
## Year:TeamBangladesh 8.575e+00 3.616e+00 1.572e+03 2.371 0.01784 *
## Year:TeamEngland -1.413e-02 8.401e-01 1.551e+03 -0.017 0.98658
## Year:TeamIndia -5.870e-02 9.247e-01 1.568e+03 -0.063 0.94939
## Year:TeamNew Zealand 6.968e-01 1.085e+00 1.559e+03 0.642 0.52094
## Year:TeamPakistan -2.486e+00 9.924e-01 1.565e+03 -2.506 0.01233 *
## Year:TeamSouth Africa -1.125e+00 1.778e+00 1.560e+03 -0.633 0.52694
## Year:TeamSri Lanka 9.201e-01 1.325e+00 1.562e+03 0.695 0.48741
## Year:TeamWest Indies -2.942e+00 9.919e-01 1.560e+03 -2.966 0.00306 **
## Year:TeamZimbabwe -2.465e+00 2.904e+00 1.553e+03 -0.849 0.39615
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A careful inspection of the table shows that the formal statistics, based on the assumption of linear trends (rather than the curvature of the fitted splines), support what the plots suggest. The West Indies and Pakistan have significantly higher intercepts than the reference team used in the model output table (Australia), but their decline is shown by the significantly negative interaction terms with year. Bangladesh show the opposite pattern, with a significantly positive interaction term.
It could be that first innings scores have increased over time simply because innings are taking longer to complete in terms of overs. However, we can also look at the strike rate over the innings in terms of runs scored per over. Has there been any change in mean scoring rate for the first innings expressed as runs per over?
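Each Year:Team coefficient in the table is a difference from the reference team's Year slope, so a team's own trend is the sum of the two. A minimal sketch of that arithmetic, using values copied from the output above (the object names are illustrative, not from the original code):

```r
# Coefficients copied from the fixed-effects table above
year_ref <- 1.836   # Year slope for the reference team (Australia)

# Year:Team interaction terms that differ significantly from zero
year_by_team <- c(
  Bangladesh    =  8.575,
  Pakistan      = -2.486,
  "West Indies" = -2.942
)

# Team-specific slope = reference slope + interaction term
team_slopes <- year_ref + year_by_team
round(team_slopes, 2)
```

This gives slopes of about 10.4 runs per year for Bangladesh, -0.65 for Pakistan and -1.1 for the West Indies. The p-values in the table test whether each team's slope differs from Australia's, not whether the team's own slope differs from zero, and the Bangladesh estimate carries a large standard error.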
g0<-ggplot(dd,aes(x=Date,y=RPO))
g1<-g0+geom_point(aes(text=txt))+geom_smooth()
ggplotly(g1)
There is a clear ramping up of the strike rate that seems to correspond to the increasing influence of twenty20 games in the mid 2000s. Strike rate moves up from a fairly consistent 2.7 to 2.9 runs per over to a mean of 3.1 to 3.4 in the first innings of modern (post twenty20) tests. Again, a facet wrap can look at this per team.
g1<-g1+facet_wrap("Team")
ggplotly(g1)
mod<-lmer(data=dd,RPO~Year+(1|Team)+(1|Ground))
summary(mod)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
## to degrees of freedom [lmerMod]
## Formula: RPO ~ Year + (1 | Team) + (1 | Ground)
## Data: dd
##
## REML criterion at convergence: 2624.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9129 -0.6695 -0.0774 0.6149 4.2802
##
## Random effects:
## Groups Name Variance Std.Dev.
## Ground (Intercept) 0.005319 0.07293
## Team (Intercept) 0.028422 0.16859
## Residual 0.292370 0.54071
## Number of obs: 1593, groups: Ground, 93; Team, 10
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -2.914e+01 2.286e+00 1.408e+03 -12.75 <2e-16 ***
## Year 1.610e-02 1.143e-03 1.412e+03 14.09 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Year -1.000
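The Year coefficient in this output is small in absolute terms, so it helps to scale it up. The sketch below is a back-of-envelope calculation, not part of the original analysis; the 90 over day is an assumption used only to give a sense of scale.

```r
# Year coefficient (1.610e-02) copied from the summary above
rpo_slope <- 0.0161

# Change in runs per over across one decade
rpo_per_decade <- rpo_slope * 10
rpo_per_decade

# Extra runs over a full day's play per decade, assuming a 90 over day
# (an illustrative assumption)
extra_runs_per_day <- rpo_per_decade * 90
extra_runs_per_day
```

Under the linear model the scoring rate rises by about 0.16 runs per over each decade, which is around 14 extra runs across a 90 over day. The plot suggests that most of the change came after the mid 2000s, so the linear figure should be read as an average over the whole period.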
There is evidence that Geoffrey Boycott is right. Test cricket does seem to have been influenced by twenty20. The result has been to speed up scoring rates. However, this does not seem to have reduced the number of runs scored overall in the first innings, when players are most focussed on building a large total. In fact the reverse has occurred. They’re not playing with sticks of rhubarb, Geoffrey.