1 Introduction

Cricket, with its long-standing tradition of statistical analysis, has traditionally relied on the batting average, defined as the total runs scored divided by the number of dismissals, to evaluate a batter’s performance. This metric, while simple and intuitive, often fails to capture the true range of a player’s contribution, especially in Test cricket, where innings can span many sessions and involve varied scoring patterns. In this study, we focus on the top 25 run-getters in Test cricket during the 2010s decade (1st January 2010 to 31st December 2019) and explore an alternative measure, the Expected Runs per Dismissal (ERPD), which provides a more nuanced assessment of batting performance.

2 Drawbacks of batting average

The batting average, despite its widespread use, has several limitations.

Sensitivity to not-out innings: Batters who remain not out receive an artificial boost to their average, since the runs from these innings are counted but the innings are not counted as dismissals. This can overestimate a batter’s effectiveness, particularly for those who are often left unbeaten.
Loss of distributional information: By aggregating performance into a single number, the batting average ignores the variability and distribution of scores. Two batters with similar averages might have vastly different scoring patterns: one might be highly consistent, while another might have sporadic high scores offset by low scores.
Neglect of survival dynamics: The batting average does not consider how long a batter survives in an innings before being dismissed. In Test cricket, the ability to occupy the crease and build an innings is as crucial as the runs scored, and survival analysis offers a framework to incorporate this dimension.
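The not-out effect described in the first point can be seen with a small made-up example. The sketch below uses hypothetical scores (not taken from any batter in this study) and compares the batting average with the plain mean score per innings.

```r
# Hypothetical innings (illustrative data only): runs scored and dismissal status
runs      <- c(12, 45, 30, 8, 60)
dismissed <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# Batting average = total runs / number of dismissals
batting_average <- sum(runs) / sum(dismissed)

# Mean score per innings, ignoring whether the batter was dismissed
runs_per_innings <- mean(runs)

cat("Batting average :", batting_average, "\n")
cat("Runs per innings:", runs_per_innings, "\n")
```

In this toy case the two unbeaten innings add runs to the numerator without adding dismissals to the denominator, so the average sits well above the runs scored per innings.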

Given these drawbacks, researchers and practitioners have proposed alternative metrics. For example, APM (Average Player Multiplier), Expected Average, BEREX (Bernoulli Run Expectation) and RAAR (Runs Above Average Replacement) have been suggested as potential improvements over the traditional average. These measures, however, often still summarize performance without fully accounting for the survival aspect of batting.

3 Expected Runs per Dismissal (ERPD)

3.1 Introduction to ERPD

The Expected Runs per Dismissal (ERPD) is introduced as a more robust measure that overcomes many of the limitations of the batting average. ERPD is defined as the expected number of runs a batter is likely to score before being dismissed, taking into account the complete distribution of scores.

Mathematically, ERPD is expressed as:

\[ \text{ERPD} = E[X \mid \text{Dismissal}] = \int_0^\infty x f(x) \, dx \]

where \(X\) denotes the runs scored in an innings and \(f(x)\) is the probability density function of \(X\).

3.2 Distributional assumption: Lognormal fit

After statistical analysis using goodness-of-fit tools such as the Kolmogorov-Smirnov test, and a comparison of criteria such as the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), the lognormal distribution was found to offer the best fit for the runs scored by batters in Test innings that ended in a dismissal.

The following output (batter under study: Alastair Cook) shows the lognormal distribution being selected as the best fit among a few well-known distributions.

Goodness-of-fit statistics
                             1-mle-norm 2-mle-exp 3-mle-gamma 4-mle-lnorm
Kolmogorov-Smirnov statistic   0.204965 0.1268075  0.09541033  0.05958898
Cramer-von Mises statistic     2.407679 0.5559498  0.25110920  0.10153055
Anderson-Darling statistic    13.453451 2.7504675  1.27300494  0.80366991

Goodness-of-fit criteria
                               1-mle-norm 2-mle-exp 3-mle-gamma 4-mle-lnorm
Akaike's Information Criterion   1968.408  1749.770    1748.639    1746.379
Bayesian Information Criterion   1974.838  1752.985    1755.069    1752.809
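The kind of comparison reported above can be reproduced in miniature on simulated data. The sketch below is only an illustration: it uses base R (`ks.test` and hand-computed maximum-likelihood estimates) rather than the package that produced the output above, and the simulated scores, seed and sample size are assumptions chosen for the example.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated right-skewed "scores" (illustrative, not real batting data)
scores <- rlnorm(1000, meanlog = 3, sdlog = 1.2)

# Maximum-likelihood estimates under each candidate model
log_scores <- log(scores)
mu_hat     <- mean(log_scores)
sigma_hat  <- sqrt(mean((log_scores - mu_hat)^2))  # MLE uses divisor n
norm_mean  <- mean(scores)
norm_sd    <- sqrt(mean((scores - norm_mean)^2))
exp_rate   <- 1 / mean(scores)

# Kolmogorov-Smirnov distance between the data and each fitted model
ks_stats <- c(
  norm  = unname(ks.test(scores, "pnorm", mean = norm_mean, sd = norm_sd)$statistic),
  exp   = unname(ks.test(scores, "pexp", rate = exp_rate)$statistic),
  lnorm = unname(ks.test(scores, "plnorm", meanlog = mu_hat, sdlog = sigma_hat)$statistic)
)
print(round(ks_stats, 3))
```

A smaller statistic means the fitted distribution function stays closer to the empirical one, which is how the Kolmogorov-Smirnov row in the tables above is read.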

The lognormal model is particularly attractive because it naturally accommodates the skewness observed in cricket scores, with many low to moderate scores and a long right tail capturing the occasional big innings.

Fitting of the distribution ' lnorm ' by maximum likelihood 
Parameters : 
        estimate Std. Error
meanlog 3.061100 0.09510433
sdlog   1.290058 0.06724874
Loglikelihood:  -871.1894   AIC:  1746.379   BIC:  1752.809 
Correlation matrix:
             meanlog        sdlog
meanlog 1.000000e+00 1.454202e-09
sdlog   1.454202e-09 1.000000e+00

\[ \ln(X) \sim N(\mu, \sigma^2) \]

Accordingly, the expected value is given by: \[ E[X] = \exp\left(\mu + \frac{\sigma^2}{2} \right) \]

This forms the basis for the ERPD calculation for innings in which a batter has been dismissed.
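As a quick check of the closed form, the sketch below plugs in the estimates reported above for Alastair Cook and compares the result with the mean of a large simulated lognormal sample. The seed and the simulation size are arbitrary choices made for this illustration.

```r
# Parameter estimates reported above for Alastair Cook's dismissed innings
mu    <- 3.061100
sigma <- 1.290058

# Closed-form lognormal mean: exp(mu + sigma^2 / 2)
closed_form <- exp(mu + sigma^2 / 2)

# Monte Carlo check: average of many simulated lognormal scores
set.seed(42)  # arbitrary seed
simulated   <- rlnorm(1e6, meanlog = mu, sdlog = sigma)
monte_carlo <- mean(simulated)

cat("Closed form :", closed_form, "\n")
cat("Monte Carlo :", monte_carlo, "\n")
```

The two numbers should agree up to simulation noise, which is sizeable here because the fitted distribution has a long right tail.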

3.3 Implementing Survival Analysis for Not-out Innings

Furthermore, survival analysis concepts are applied to extend ERPD to innings in which a batter has remained not out. If we denote the survival function by:

\[ S(x) = P(X > x) = 1 - F(x), \]

where \(F(x)\) is the cumulative distribution function of \(X\), the conditional expectation of runs given that a batter has already scored \(x\) many runs can be expressed as:

\[ E[X \mid X > x] = \frac{\int_x^\infty t f(t) \, dt}{S(x)} = \frac{\int_x^\infty t f(t) \, dt}{\int_x^\infty f(t) \, dt} \]

For innings in which a batter has not been dismissed, \(x\) is a right-censored observation and \(E[X \mid X > x]\) serves as an estimate of the score the batter would have reached had they hypothetically batted on until dismissal. By incorporating these techniques into the ERPD framework, we obtain a metric that not only reflects the central tendency of a batter’s scoring but also incorporates their survival ability and the inherent risk of dismissal.
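Under the lognormal assumption of Section 3.2, the conditional expectation above also has a closed form, because the partial expectation of a lognormal variable is \(\int_x^\infty t f(t)\,dt = \exp(\mu + \sigma^2/2)\,\Phi\big((\mu + \sigma^2 - \ln x)/\sigma\big)\). The sketch below compares that closed form with direct numerical integration. The function names and the not-out score of 50 are hypothetical; the parameters are the estimates reported for Alastair Cook.

```r
# Closed-form E[X | X > x] for X ~ Lognormal(mu, sigma) (hypothetical helper)
cond_mean_lnorm <- function(x, mu, sigma) {
  partial_expectation <- exp(mu + sigma^2 / 2) *
    pnorm((mu + sigma^2 - log(x)) / sigma)
  survival <- plnorm(x, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
  partial_expectation / survival
}

# Same quantity by numerically integrating t * f(t) over (x, Inf)
cond_mean_numeric <- function(x, mu, sigma) {
  numerator <- integrate(function(t) t * dlnorm(t, meanlog = mu, sdlog = sigma),
                         lower = x, upper = Inf)$value
  numerator / plnorm(x, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
}

mu    <- 3.061100  # meanlog reported for Alastair Cook
sigma <- 1.290058  # sdlog reported for Alastair Cook
x     <- 50        # hypothetical not-out score

closed_value  <- cond_mean_lnorm(x, mu, sigma)
numeric_value <- cond_mean_numeric(x, mu, sigma)
cat("Closed form :", closed_value, "\n")
cat("Numerical   :", numeric_value, "\n")
```

Either route gives a value above the not-out score itself, since the expectation is conditioned on the batter having already passed \(x\).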

3.4 Calculating ERPD for a batter

Consider Murali Vijay’s scores in Test cricket from 1st January 2010 to 31st December 2019 as the data of interest. His ERPD for innings in which he was dismissed has been derived as follows.

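library(fitdistrplus)  # provides fitdist() and gofstat()

data <- read.csv("//Users//rhitankarsmacbook//Library//Mobile Documents//com~apple~CloudDocs/Cricket Stats//ERPD//Murali_Vijay.csv")

# Scores from innings that ended in a dismissal (drop missing entries first,
# so that NA values are not counted as ducks)
out_values <- data$Out
out_values <- out_values[!is.na(out_values)]

# Count ducks separately: zero scores cannot enter a lognormal fit
duck_innings <- sum(out_values == 0)
out_values <- out_values[out_values > 0]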

hist(out_values, probability = TRUE, breaks = 20, col = "lightblue", main = "Histogram of out values")
lines(density(out_values), col = "red", lwd = 2)

fit_norm <- fitdist(out_values, "norm")
fit_exp <- fitdist(out_values, "exp")
fit_gamma <- fitdist(out_values, "gamma")
fit_lognorm <- fitdist(out_values, "lnorm")

# Compare goodness-of-fit statistics and information criteria (AIC, BIC)
gofstat(list(fit_norm, fit_exp, fit_gamma, fit_lognorm))

Goodness-of-fit statistics
                             1-mle-norm 2-mle-exp 3-mle-gamma 4-mle-lnorm
Kolmogorov-Smirnov statistic  0.2122892 0.1464157   0.1241796  0.08092825
Cramer-von Mises statistic    1.3611348 0.4167276   0.2853081  0.09442460
Anderson-Darling statistic    7.6805112 2.2910722   1.6777359  0.66486409

Goodness-of-fit criteria
                               1-mle-norm 2-mle-exp 3-mle-gamma 4-mle-lnorm
Akaike's Information Criterion   975.0590  877.7462    879.1151    873.2391
Bayesian Information Criterion   980.1242  880.2788    884.1803    878.3043

The lowest Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling statistics, together with the lowest AIC and BIC values, suggest that the lognormal distribution is the best fit for Vijay’s scores in innings where he was dismissed.

summary(fit_lognorm)

Fitting of the distribution ' lnorm ' by maximum likelihood 
Parameters : 
        estimate Std. Error
meanlog 3.062178 0.12567095
sdlog   1.211927 0.08886251
Loglikelihood:  -434.6195   AIC:  873.2391   BIC:  878.3043 
Correlation matrix:
              meanlog         sdlog
meanlog  1.000000e+00 -4.760964e-10
sdlog   -4.760964e-10  1.000000e+00
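The AIC and BIC in this summary follow directly from the reported log-likelihood. The sketch below recomputes them; the sample size n = 93 is an assumption, taken as Vijay's 102 innings less 1 not-out and 8 ducks as listed in the averages table of Section 4.

```r
loglik <- -434.6195  # log-likelihood reported in the summary above
k      <- 2          # estimated parameters: meanlog and sdlog
n      <- 93         # ASSUMPTION: positive dismissed scores (102 - 1 - 8)

aic <- 2 * k - 2 * loglik        # Akaike's Information Criterion
bic <- k * log(n) - 2 * loglik   # Bayesian Information Criterion

cat("AIC:", aic, " BIC:", bic, "\n")
```

Because BIC penalizes each parameter by log(n) rather than 2, it favours the one-parameter exponential model more strongly than AIC does, which is why the exponential and lognormal BIC values in the tables are closer together than their AIC values.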

lognorm_mu <- fit_lognorm$estimate["meanlog"]
lognorm_sigma <- fit_lognorm$estimate["sdlog"]
ERPD_out <- exp(lognorm_mu + (lognorm_sigma^2) / 2)

cat("ERPD for out values (excluding ducks) :", ERPD_out, "\n")

ERPD for out values (excluding ducks) : 44.54775

For the remaining right-censored data, i.e. innings in which Vijay remained not out, we calculate the conditional expectation \(E[X \mid X > x]\) (the not-out score plus the Mean Residual Life, MRL) for each knock to obtain the corresponding ERPD.

notout_values <- data$Notout[!is.na(data$Notout)]
mu <- fit_lognorm$estimate["meanlog"] 
sigma <- fit_lognorm$estimate["sdlog"] 

ERPD_notout <- function(x, mu, sigma) {
  # Survival function S(x) = P(X > x), taken directly from the upper tail
  Sx <- plnorm(x, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
  
  # Partial expectation: integral of t * f(t) over (x, Inf)
  Ex_given_x <- integrate(function(t) t * dlnorm(t, meanlog = mu, sdlog = sigma), 
                          lower = x, upper = Inf)$value
  
  # Conditional expectation E[X | X > x]
  return(Ex_given_x / Sx) 
}

ERPD_notout_values <- sapply(notout_values, ERPD_notout, mu = mu, sigma = sigma)
cat("ERPD for notout values:", ERPD_notout_values, "\n")

ERPD for notout values: 90.71932

Finally, the overall ERPD of Murali Vijay is calculated as follows, with each duck contributing zero runs to the numerator and one innings to the denominator.

num_out <- length(out_values)
num_notout <- length(ERPD_notout_values)

total_innings <- num_out + num_notout + duck_innings
overall_ERPD <- (ERPD_out * num_out + sum(ERPD_notout_values)) / total_innings
cat("Overall ERPD:", overall_ERPD, "\n")

Overall ERPD: 40.70827

4 Ranking the best Test batters of 2010s decade using ERPD

The study considers the top 25 run-getters in Test cricket during the 2010s decade (1st January 2010 to 31st December 2019). One can rank them by virtue of their batting averages in this period as follows.

Batters ranked by Averages in Test cricket (2010-2019)
Player	Inns	Not Outs	Runs	Average	Strike Rate	100s	50s	Ducks	4s	6s
SPD Smith (AUS)	130	16	7164	62.84	55.59	26	28	4	795	42
KC Sangakkara (SL)	86	7	4851	61.40	52.22	17	20	7	514	29
AB de Villiers (SA)	98	10	5059	57.48	55.65	13	27	7	573	45
V Kohli (IND)	141	10	7202	54.97	57.81	27	22	10	805	22
Younis Khan (PAK)	101	12	4839	54.37	50.47	18	12	7	444	46
KS Williamson (NZ)	137	13	6379	51.44	51.55	21	31	9	694	14
Misbah-ul-Haq (PAK)	101	17	4225	50.29	46.15	8	35	6	405	73
HM Amla (SA)	146	12	6695	49.96	50.48	21	27	7	834	12
CA Pujara (IND)	124	8	5740	49.48	46.69	18	24	7	682	14
MJ Clarke (AUS)	107	10	4717	48.62	58.37	16	10	4	556	22
JE Root (ENG)	164	12	7359	48.41	54.37	17	45	8	829	20
DA Warner (AUS)	153	6	7088	48.21	73.04	23	30	9	840	56
LRPL Taylor (NZ)	133	19	5486	48.12	60.12	15	25	14	637	35
AN Cook (ENG)	201	11	8818	46.41	46.93	23	37	6	1010	9
IR Bell (ENG)	114	15	4436	44.80	48.88	13	25	7	539	25
AD Mathews (SL)	140	20	5325	44.37	48.54	9	32	2	566	52
BB McCullum (NZ)	95	5	3979	44.21	66.39	9	16	6	471	79
AM Rahane (IND)	105	11	4112	43.74	50.65	11	22	6	463	29
Azhar Ali (PAK)	146	8	5885	42.64	41.82	16	31	14	549	16
LD Chandimal (SL)	100	7	3846	41.35	49.01	11	18	4	411	22
F du Plessis (SA)	106	14	3799	41.29	45.98	9	21	9	466	20
Asad Shafiq (PAK)	122	6	4528	39.03	48.61	12	26	13	498	29
M Vijay (IND)	102	1	3821	37.83	45.78	12	14	8	450	32
FDM Karunaratne (SL)	124	4	4421	36.84	49.04	9	24	12	442	8
JM Bairstow (ENG)	123	7	4030	34.74	55.07	6	21	10	473	26

However, if the batters are ranked by their ERPD values instead, the order changes noticeably.

Batters ranked by ERPD in Test cricket (2010-2019)
Batter	Runs	Average	ERPD
Kumar Sangakkara	4851	61.40	81.51080
Steve Smith	7164	62.84	75.98680
Kane Williamson	6379	51.44	68.89218
Younis Khan	4839	54.37	66.92585
Virat Kohli	7202	54.97	66.37127
Joe Root	7359	48.41	63.03840
Hashim Amla	6695	49.96	62.53119
Cheteshwar Pujara	5740	49.48	61.60445
AB de Villiers	5059	57.48	61.13865
Ian Bell	4436	44.80	59.87259
Misbah-ul-Haq	4225	50.29	59.24620
David Warner	7088	48.21	58.51765
Michael Clarke	4717	48.62	57.08995
Ajinkya Rahane	4112	43.74	56.06439
Alastair Cook	8818	46.41	56.04557
Ross Taylor	5486	48.12	55.89375
Angelo Mathews	5325	44.37	53.81018
Azhar Ali	5885	42.64	53.28048
Faf du Plessis	3799	41.29	51.65885
Brendon McCullum	3979	44.21	49.02250
Dinesh Chandimal	3846	41.35	49.02013
Dimuth Karunaratne	4421	36.84	42.34182
Asad Shafiq	4528	39.03	41.01311
Murali Vijay	3821	37.83	40.70827
Jonny Bairstow	4030	34.74	36.51510

Note that, although broadly comparable, Expected Runs per Dismissal is not a direct equivalent of the batting average. Kumar Sangakkara’s ERPD should not be read as him being expected to score 81.5108 runs every time he came out to bat during the 2010s. Batters remaining not out on extremely high scores (a common case when declarations follow the completion of specific milestones) are a major reason for ERPD being generally higher than the batting average in Test cricket. The abrupt rise in ERPD caused by such innings can be controlled by a slight modification of the working formula of the MRL.

Brian Lara’s 400* against England in 2004 remains the highest individual Test score to date. A deeper look into history tells us that there have been only 6 instances of batters crossing the 350-run mark in Test cricket. Further investigation yields a count of only 8 batters scoring more than 335 runs in the 148-year history of the game, none of them after 2006.

Based on this fact, suppose the ERPD for unbeaten knocks is calculated with the upper limit of integration capped at 335:

\[ E[X \mid X > x] = \frac{\int_x^{335} t f(t) \, dt}{S(x)} \]

The resulting Adjusted ERPD (AERPD) of the batters, based on their performance in Test cricket during the 2010s decade, is given below.
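The capped formula above can be sketched in R as a small variation on the not-out function of Section 3.4. The function name `AERPD_notout`, the `cap` argument and the not-out score of 50 are hypothetical choices for this illustration; the parameters are the estimates reported for Murali Vijay, and the sketch assumes the not-out score lies below the cap.

```r
# Adjusted conditional mean: the numerator integral stops at `cap`, not Inf
# (hypothetical helper; assumes x < cap)
AERPD_notout <- function(x, mu, sigma, cap = 335) {
  Sx <- plnorm(x, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
  truncated <- integrate(function(t) t * dlnorm(t, meanlog = mu, sdlog = sigma),
                         lower = x, upper = cap)$value
  truncated / Sx
}

mu    <- 3.062178  # meanlog reported for Murali Vijay
sigma <- 1.211927  # sdlog reported for Murali Vijay
x     <- 50        # hypothetical not-out score

adjusted <- AERPD_notout(x, mu, sigma)

# Unadjusted E[X | X > x] from the lognormal closed form, for comparison
unadjusted <- exp(mu + sigma^2 / 2) * pnorm((mu + sigma^2 - log(x)) / sigma) /
  plnorm(x, meanlog = mu, sdlog = sigma, lower.tail = FALSE)

cat("Adjusted   :", adjusted, "\n")
cat("Unadjusted :", unadjusted, "\n")
```

Capping the integral removes the contribution of scores beyond 335 from the numerator only, so the adjusted value is always below the unadjusted one.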

Batters ranked by Adjusted ERPD in Test cricket (2010-2019)
Batter	Runs	Average	ERPD	AERPD	Shift	MOA
Kumar Sangakkara	4851	61.40	81.51080	73.42050	-8.09030	1.1957736
Steve Smith	7164	62.84	75.98680	65.24929	-10.73751	1.0383401
Virat Kohli	7202	54.97	66.37127	59.97174	-6.39953	1.0909904
Kane Williamson	6379	51.44	68.89218	57.99365	-10.89853	1.1274038
Joe Root	7359	48.41	63.03840	56.25753	-6.78087	1.1621056
Cheteshwar Pujara	5740	49.48	61.60445	55.43618	-6.16827	1.1203755
AB de Villiers	5059	57.48	61.13865	55.13152	-6.00713	0.9591427
Younis Khan	4839	54.37	66.92585	54.72924	-12.19661	1.0066073
David Warner	7088	48.21	58.51765	52.72440	-5.79325	1.0936403
Misbah-ul-Haq	4225	50.29	59.24620	52.51657	-6.72963	1.0442746
Hashim Amla	6695	49.96	62.53119	52.07122	-10.45997	1.0422582
Ian Bell	4436	44.80	59.87259	51.56033	-8.31226	1.1509002
Ajinkya Rahane	4112	43.74	56.06439	51.02417	-5.04022	1.1665334
Alastair Cook	8818	46.41	56.04557	50.79877	-5.24680	1.0945652
Ross Taylor	5486	48.12	55.89375	49.43804	-6.45571	1.0273907
Angelo Mathews	5325	44.37	53.81018	49.19461	-4.61557	1.1087359
Brendon McCullum	3979	44.21	49.02250	47.59190	-1.43060	1.0764963
Azhar Ali	5885	42.64	53.28048	46.69366	-6.58682	1.0950671
Faf du Plessis	3799	41.29	51.65885	46.42369	-5.23516	1.1243325
Michael Clarke	4717	48.62	57.08995	45.31727	-11.77268	0.9320705
Dinesh Chandimal	3846	41.35	49.02013	43.84775	-5.17238	1.0604051
Dimuth Karunaratne	4421	36.84	42.34182	40.96594	-1.37588	1.1119962
Murali Vijay	3821	37.83	40.70827	40.55762	-0.15065	1.0721020
Asad Shafiq	4528	39.03	41.01311	39.87997	-1.13314	1.0217774
Jonny Bairstow	4030	34.74	36.51510	35.24685	-1.26825	1.0145898

In the above table, Shift is the difference between AERPD and ERPD, and MOA denotes the Multiplier on Average, the ratio of a batter’s AERPD to their batting average \((= \frac{AERPD}{Average})\). Only AB de Villiers and Michael Clarke have an MOA below 1, i.e. an AERPD lower than their batting average. This rare case arises primarily from a more heavily right-skewed distribution of scores than that of the other batters.
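The Shift and MOA columns can be recomputed from the other columns of the table. The sketch below does so for three rows copied from the table above; the data frame name is a hypothetical choice for this illustration.

```r
# Three rows copied from the AERPD table (hypothetical data frame name)
aerpd_table <- data.frame(
  Batter  = c("Kumar Sangakkara", "AB de Villiers", "Michael Clarke"),
  Average = c(61.40, 57.48, 48.62),
  ERPD    = c(81.51080, 61.13865, 57.08995),
  AERPD   = c(73.42050, 55.13152, 45.31727)
)

# Shift: change from ERPD to AERPD; MOA: AERPD relative to the batting average
aerpd_table$Shift <- aerpd_table$AERPD - aerpd_table$ERPD
aerpd_table$MOA   <- aerpd_table$AERPD / aerpd_table$Average

print(aerpd_table)
```

An MOA above 1 means the AERPD exceeds the batting average, as for Sangakkara, while the other two rows are the only cases in the table where it falls below 1.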

Note that the adjusted values substantially reduce the extent to which ERPD exceeds a batter’s average. Notable shifts are evident from the AERPD table: Hashim Amla and Younis Khan drop down the rankings, while Virat Kohli, Cheteshwar Pujara and AB de Villiers rise. Kumar Sangakkara’s AERPD still remains well ahead of everyone else in the list, though. The Sri Lankan maverick was quite extraordinary after all!

Although neither ERPD nor AERPD can yet be claimed to be a perfect metric for judging a batter’s ability, each provides a different perspective from the oversimplified batting average that has been in use for ages. To conclude, this project is a simple attempt in the spirit of George E. P. Box’s famous quote: “All models are wrong, but some are useful.”

Expected Runs per Dismissal (ERPD): an alternative metric to batting average

Rhitankar Bandyopadhyay

Indian Statistical Institute, Kolkata

March 13, 2025