“Curiouser and curiouser!” cried Alice

“The time has come,” the walrus said, “to talk of many things: Of shoes and ships - and sealing wax - of cabbages and kings”

“Begin at the beginning,”the King said, very gravely,“and go on till you come to the end: then stop.”

“And what is the use of a book,” thought Alice, “without pictures or conversation?”

            Excerpts from Alice in Wonderland by Lewis Carroll

Introduction

In this post I take my package cricketr for a spin. For this analysis I focus on the Indian batting legends - Sachin Tendulkar (Master Blaster) - Rahul Dravid (The will) - Sourav Ganguly ( The Dada Prince) - Sunil Gavaskar (Little Master)

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency The plot below indicate the Tendulkar’s average is the highest. He is followed by Dravid, Gavaskar and then Ganguly

batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")

batsmanPerfBoxHist("./dravid.csv","Rahul Dravid")

batsmanPerfBoxHist("./ganguly.csv","Sourav Ganguly")

batsmanPerfBoxHist("./gavaskar.csv","Sunil Gavaskar")

Relative Mean Strike Rate

In this first plot I plot the Mean Strike Rate of the batsmen. Tendulkar leads in the Mean Strike Rate for each runs in the range 100- 180. Ganguly has a very good Mean Strike Rate for runs range 40 -80

frames <- list("./tendulkar.csv","./dravid.csv","ganguly.csv","gavaskar.csv")
names <- list("Tendulkar","Dravid","Ganguly","Gavaskar")
relativeBatsmanSR(frames,names)

Relative Runs Frequency Percentage

The plot below show the percentage contribution in each 10 runs bucket over the entire career.The percentage Runs Frequency is fairly close but Gavaskar seems to lead most of the way

frames <- list("./tendulkar.csv","./dravid.csv","ganguly.csv","gavaskar.csv")
names <- list("Tendulkar","Dravid","Ganguly","Gavaskar")
relativeRunsFreqPerf(frames,names)

Moving Average of runs over career

The moving average for the 4 batsmen indicate the following - Tendulkar and Ganguly’s career has a downward trend and their retirement didn’t come too soon - Dravid and Gavaskar’s career definitely shows an upswing. They probably had a year or two left.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkar.csv","Tendulkar")
batsmanMovingAverage("./dravid.csv","Dravid")
batsmanMovingAverage("./ganguly.csv","Ganguly")
batsmanMovingAverage("./gavaskar.csv","Gavaskar")

dev.off()
## null device 
##           1

Runs forecast

The forecast for the batsman is shown below. The plots indicate that only Tendulkar seemed to maintain a consistency over the period while the rest seem to score less than their forecasted runs in the last 10% of the career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./dravid.csv","Rahul Dravid")
batsmanPerfForecast("./ganguly.csv","Sourav Ganguly")
batsmanPerfForecast("./gavaskar.csv","Sunil Gavaskar")

dev.off()
## null device 
##           1

Check for batsman in-form/out-of-form

The following snippet checks whether the batsman is in-inform or ouyt-of-form during the last 10% innings of the career. This is done by choosing the null hypothesis (h0) to indicate that the batsmen are in-form. Ha is the alternative hypothesis that they are not-in-form. The population is based on the 1st 90% of career runs. The last 10% is taken as the sample and a check is made on the lower tail to see if the sample mean is less than 95% confidence interval. If this difference is >0.05 then the batsman is considered out-of-form.

The computation show that Tendulkar was out-of-form while the other’s weren’t. While Dravid and Gavaskar’s moving average do show an upward trend the surprise is Ganguly. This could be that Ganguly was able to keep his average in the last 10% to with the 95$ confidence interval. It has to be noted that Ganguly’s average was much lower than Tendulkar

checkBatsmanInForm("./tendulkar.csv","Tendulkar")
## *******************************************************************************************
## 
## Population size: 294  Mean of population: 50.48 
## Sample size: 33  Mean of sample: 32.42 SD of sample: 29.8 
## 
## Null hypothesis H0 : Tendulkar 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Tendulkar 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./dravid.csv","Dravid")
## *******************************************************************************************
## 
## Population size: 256  Mean of population: 46.98 
## Sample size: 29  Mean of sample: 43.48 SD of sample: 40.89 
## 
## Null hypothesis H0 : Dravid 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Dravid 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Dravid 's Form Status: In-Form because the p value: 0.324138  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./ganguly.csv","Ganguly")
## *******************************************************************************************
## 
## Population size: 169  Mean of population: 38.94 
## Sample size: 19  Mean of sample: 33.21 SD of sample: 32.97 
## 
## Null hypothesis H0 : Ganguly 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Ganguly 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Ganguly 's Form Status: In-Form because the p value: 0.229006  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./gavaskar.csv","Gavaskar")
## *******************************************************************************************
## 
## Population size: 125  Mean of population: 44.67 
## Sample size: 14  Mean of sample: 57.86 SD of sample: 58.55 
## 
## Null hypothesis H0 : Gavaskar 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Gavaskar 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Gavaskar 's Form Status: In-Form because the p value: 0.793276  is greater than alpha=  0.05"
## *******************************************************************************************
dev.off()
## null device 
##           1

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Tendulkar")
battingPerf3d("./dravid.csv","Dravid")

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./ganguly.csv","Ganguly")
battingPerf3d("./gavaskar.csv","Gavaskar")

dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
dravid <- batsmanRunsPredict("./dravid.csv","Dravid",newdataframe=newDF)
ganguly <- batsmanRunsPredict("./ganguly.csv","Ganguly",newdataframe=newDF)
gavaskar <- batsmanRunsPredict("./gavaskar.csv","Gavaskar",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease. It can be seen Tendulkar has a much higher Runs scored than all of the others.

Tendulkar is followed by Ganguly who we saw earlier had a very good strike rate. However it must be noted that Dravid and Gavaskar have a better average.

batsmen <-cbind(round(tendulkar$Runs),round(dravid$Runs),round(ganguly$Runs),round(gavaskar$Runs))
colnames(batsmen) <- c("Tendulkar","Dravid","Ganguly","Gavaskar")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Dravid Ganguly Gavaskar
## 1          10           30         7      1       7        4
## 2          38           71        23     14      21       17
## 3          66          111        39     27      35       30
## 4          94          152        54     40      50       43
## 5         121          193        70     54      64       56
## 6         149          234        86     67      78       69
## 7         177          274       102     80      93       82
## 8         205          315       118     94     107       95
## 9         233          356       134    107     121      108
## 10        261          396       150    120     136      121
## 11        289          437       165    134     150      134
## 12        316          478       181    147     165      147
## 13        344          519       197    160     179      160
## 14        372          559       213    173     193      173
## 15        400          600       229    187     208      186

Contribution to matches won and lost

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost(35320,"Tendulkar")
batsmanContributionWonLost(28114,"Dravid")
batsmanContributionWonLost(28779,"Ganguly")
batsmanContributionWonLost(28794,"Gavaskar")