In this post my package ‘cricketr’ takes a swing at One Day Internationals (ODIs). Like Test batsmen who adapt to ODIs with some innovative strokes, the cricketr package has some additional functions and some modified functions to handle the high strike and economy rates in ODIs. As before I have chosen my top 4 ODI batsmen and top 4 ODI bowlers. You should be able to install the package from GitHub and use many of the functions available in the package.
Please be mindful of the ESPN Cricinfo Terms of Use.
You can also read this post at Rpubs as odi-cricketr. Download this report as a PDF file from odi-cricketr.
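Batsmen: Virender Sehwag, AB Devilliers, Chris Gayle, Glenn Maxwell
Bowlers: Mitchell Johnson, Lasith Malinga, Dale Steyn, Tim Southee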
I have sprinkled the plots with a few of my comments. Feel free to draw your own conclusions! The analysis is included below.
library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)
The data for a particular player can be obtained with the getPlayerDataOD() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Virender Sehwag. This will bring up a page which has the profile number for the player, e.g. for Virender Sehwag this would be http://www.espncricinfo.com/india/content/player/35263.html. Hence, Sehwag’s profile is 35263. This can be used to get the data for Virender Sehwag as shown below.
sehwag <- getPlayerDataOD(35263,dir=".",file="sehwag.csv",type="batting")
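The profile number is just the numeric part of the player page URL. As a small sketch, it can be pulled out with a regular expression in base R. The helper name profileFromUrl is hypothetical and is not part of cricketr.

```r
# Hypothetical helper (not part of cricketr): extract the numeric profile id
# from an ESPN Cricinfo player URL of the form .../player/<number>.html
profileFromUrl <- function(url) {
  id <- sub(".*/player/([0-9]+)\\.html$", "\\1", url)
  as.integer(id)
}

sehwagProfile <- profileFromUrl(
  "http://www.espncricinfo.com/india/content/player/35263.html"
)
print(sehwagProfile)
```

The resulting integer can then be passed as the first argument of getPlayerDataOD().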
The following plots give the analysis of the 4 ODI batsmen.
The 3 charts below plot the number of 4s, the number of 6s, and the scoring rate of each batsman.
A regression line is fitted in each of these plots for each of the ODI batsmen.
A. Virender Sehwag
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./sehwag.csv","Sehwag")
batsman6s("./sehwag.csv","Sehwag")
batsmanScoringRateODTT("./sehwag.csv","Sehwag")
dev.off()
## null device
## 1
B. AB Devilliers
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./devilliers.csv","Devilliers")
batsman6s("./devilliers.csv","Devilliers")
batsmanScoringRateODTT("./devilliers.csv","Devilliers")
dev.off()
## null device
## 1
C. Chris Gayle
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./gayle.csv","Gayle")
batsman6s("./gayle.csv","Gayle")
batsmanScoringRateODTT("./gayle.csv","Gayle")
dev.off()
## null device
## 1
D. Glenn Maxwell
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./maxwell.csv","Maxwell")
batsman6s("./maxwell.csv","Maxwell")
batsmanScoringRateODTT("./maxwell.csv","Maxwell")
dev.off()
## null device
## 1
In this first plot I plot the mean strike rate of the batsmen. It can be seen that Maxwell has an awesome strike rate in ODIs. However, we need to keep in mind that Maxwell has played far fewer innings (only 45). He is followed by Sehwag (most innings, 245), who also has an excellent strike rate till 100 runs, after which Devilliers roars ahead. This is also seen in the overall strike rate plots above.
par(mar=c(4,4,2,2))
frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
relativeBatsmanSRODTT(frames,names)
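For readers who want to see what a mean strike rate per runs range looks like numerically, here is a minimal sketch on simulated innings. The data, the 20-run band width and the variable names are assumptions made for illustration, and this is not the package's internal code.

```r
set.seed(3)  # simulated innings, not real data
BF <- rpois(200, 40) + 1                   # balls faced (at least 1)
Runs <- round(BF * runif(200, 0.6, 1.4))   # runs scored

# Strike rate = runs scored per 100 balls faced
SR <- 100 * Runs / BF

# Mean strike rate within 20-run bands of the innings score
band <- cut(Runs, breaks = seq(0, 200, by = 20), include.lowest = TRUE)
meanSR <- tapply(SR, band, mean)

# Drop bands with no innings before printing
print(round(meanSR[!is.na(meanSR)], 1))
```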
Sehwag leads in the percentage of runs in 10-run ranges up to 50 runs. Maxwell and Devilliers lead in the 55-66 and 66-85 ranges respectively.
frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
relativeRunsFreqPerfODTT(frames,names)
The plot below shows the percentage of runs made by the batsmen by way of 1s, 2s and 3s, 4s, and 6s. It can be seen that Sehwag has the highest percentage of 4s (33.36%) in his overall runs in ODIs. Maxwell has the highest percentage of 6s (13.36%) in his ODI career. If we take the overall 4s+6s, then Sehwag leads (33.36 + 5.95 = 39.31%), followed by Gayle (27.80 + 10.15 = 37.95%).
frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
runs4s6s <-batsman4s6s(frames,names)
print(runs4s6s)
##                Sehwag Devilliers Gayle Maxwell
## Runs(1s,2s,3s)  60.69      67.39 62.05   62.11
## 4s              33.36      24.28 27.80   24.53
## 6s               5.95       8.32 10.15   13.36
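The 4s+6s totals can be checked directly from the printed percentages. The sketch below re-enters the table by hand (it does not call the package) and adds the two boundary rows for each batsman.

```r
# Percentages copied from the printed table above (filled column by column)
runs4s6s <- matrix(
  c(60.69, 33.36,  5.95,
    67.39, 24.28,  8.32,
    62.05, 27.80, 10.15,
    62.11, 24.53, 13.36),
  nrow = 3,
  dimnames = list(c("Runs(1s,2s,3s)", "4s", "6s"),
                  c("Sehwag", "Devilliers", "Gayle", "Maxwell"))
)

# Share of runs that came in boundaries (4s + 6s) for each batsman
boundaryPct <- runs4s6s["4s", ] + runs4s6s["6s", ]
print(sort(boundaryPct, decreasing = TRUE))
```

On these figures Maxwell (37.89%) sits just behind Gayle, with Devilliers last at 32.60%.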
The forecast for each of the batsmen is shown below.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./sehwag.csv","Sehwag")
batsmanPerfForecast("./devilliers.csv","Devilliers")
batsmanPerfForecast("./gayle.csv","Gayle")
batsmanPerfForecast("./maxwell.csv","Maxwell")
dev.off()
## null device
## 1
The plot is a scatter plot of Runs vs Balls faced and Minutes at crease. A prediction plane is fitted.
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./sehwag.csv","V Sehwag")
battingPerf3d("./devilliers.csv","AB Devilliers")
dev.off()
## null device
## 1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./gayle.csv","C Gayle")
battingPerf3d("./maxwell.csv","G Maxwell")
dev.off()
## null device
## 1
A multivariate regression plane is fitted between Runs and Balls faced + Minutes at crease.
BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)
sehwag <- batsmanRunsPredict("./sehwag.csv","Sehwag",newdataframe=newDF)
devilliers <- batsmanRunsPredict("./devilliers.csv","Devilliers",newdataframe=newDF)
gayle <- batsmanRunsPredict("./gayle.csv","Gayle",newdataframe=newDF)
maxwell <- batsmanRunsPredict("./maxwell.csv","Maxwell",newdataframe=newDF)
The fitted model is then used to predict the runs that the batsmen will score for hypothetical values of Balls faced and Minutes at crease. It can be seen that Maxwell sets a searing pace in the predicted runs for a given Balls faced and Minutes at crease, followed by Sehwag. But we have to keep in mind that Maxwell has only around 1/5th of the innings of Sehwag (45 to Sehwag’s 245 innings). They are followed by Devilliers and then finally Gayle.
batsmen <-cbind(round(sehwag$Runs),round(devilliers$Runs),round(gayle$Runs),round(maxwell$Runs))
colnames(batsmen) <- c("Sehwag","Devilliers","Gayle","Maxwell")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Sehwag Devilliers Gayle Maxwell
## 1          10           30     11         12    11      18
## 2          31           51     33         32    28      43
## 3          52           72     55         52    46      67
## 4          73           93     77         71    63      92
## 5          94          114    100         91    81     117
## 6         116          136    122        111    98     141
## 7         137          157    144        130   116     166
## 8         158          178    167        150   133     191
## 9         179          199    189        170   151     215
## 10        200          220    211        190   168     240
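The idea of fitting a plane of Runs on Balls faced and Minutes at crease, and then predicting on a grid, can be sketched with base R's lm(). The innings below are simulated and the coefficients are made up, so this only illustrates the technique and is not the code inside batsmanRunsPredict().

```r
set.seed(42)  # reproducible simulated innings, not real data

# Simulated innings: balls faced, minutes at crease, and runs scored
n <- 100
BF <- round(runif(n, 1, 150))
Mins <- round(BF * 1.4 + rnorm(n, sd = 5))
Runs <- round(0.9 * BF + 0.1 * Mins + rnorm(n, sd = 8))
innings <- data.frame(Runs, BF, Mins)

# Fit the regression plane Runs ~ BF + Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict runs for the same hypothetical grid used in the post
newDF <- data.frame(BF = seq(10, 200, length.out = 10),
                    Mins = seq(30, 220, length.out = 10))
predicted <- cbind(round(newDF), Runs = round(predict(fit, newdata = newDF)))
print(predicted)
```

Note that the upper end of the grid lies beyond the simulated innings, so those rows are extrapolations of the fitted plane.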
The plots below show the runs likelihood of each batsman. This uses K-Means. It can be seen that Devilliers has a 27.75% likelihood of making around 90+ runs. Sehwag and Gayle each have roughly a 35% likelihood (34.75% and 35.16%) of making 40+ runs.
A. Virender Sehwag
batsmanRunsLikelihood("./sehwag.csv","Sehwag")
## Summary of Sehwag 's runs scoring likelihood
## **************************************************
##
## There is a 53.81 % likelihood that Sehwag will make 11 Runs in 13 balls over 18 Minutes
## There is a 34.75 % likelihood that Sehwag will make 44 Runs in 41 balls over 63 Minutes
## There is a 11.44 % likelihood that Sehwag will make 106 Runs in 95 balls over 140 Minutes
B. AB Devilliers
batsmanRunsLikelihood("./devilliers.csv","Devilliers")
## Summary of Devilliers 's runs scoring likelihood
## **************************************************
##
## There is a 27.75 % likelihood that Devilliers will make 94 Runs in 88 balls over 127 Minutes
## There is a 36.99 % likelihood that Devilliers will make 10 Runs in 14 balls over 20 Minutes
## There is a 35.26 % likelihood that Devilliers will make 40 Runs in 42 balls over 60 Minutes
C. Chris Gayle
batsmanRunsLikelihood("./gayle.csv","Gayle")
## Summary of Gayle 's runs scoring likelihood
## **************************************************
##
## There is a 50.23 % likelihood that Gayle will make 9 Runs in 14 balls over 18 Minutes
## There is a 35.16 % likelihood that Gayle will make 44 Runs in 50 balls over 70 Minutes
## There is a 14.61 % likelihood that Gayle will make 110 Runs in 119 balls over 167 Minutes
D. Glenn Maxwell
batsmanRunsLikelihood("./maxwell.csv","Maxwell")
## Summary of Maxwell 's runs scoring likelihood
## **************************************************
##
## There is a 44.44 % likelihood that Maxwell will make 5 Runs in 7 balls over 9 Minutes
## There is a 15.56 % likelihood that Maxwell will make 87 Runs in 62 balls over 77 Minutes
## There is a 40 % likelihood that Maxwell will make 36 Runs in 28 balls over 36 Minutes
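The summaries above each report three groups whose percentages add up to 100, which is what one would get by clustering innings into 3 groups and reporting each cluster's share. Here is a minimal sketch of that idea with base R's kmeans() on simulated innings. Treating the cluster share as the likelihood is my assumption about how such a summary can be produced, not a description of the package internals.

```r
set.seed(7)  # reproducible simulated innings, not real data

# Simulated innings drawn from three rough types: short, medium, long
BF <- c(rpois(60, 12), rpois(30, 45), rpois(10, 100))
Mins <- round(BF * 1.4)
Runs <- round(BF * 0.95)
innings <- data.frame(Runs, BF, Mins)

# Cluster the innings into 3 groups
km <- kmeans(innings, centers = 3, nstart = 20)

# Likelihood of each cluster = share of innings that fall in it
likelihood <- round(100 * km$size / sum(km$size), 2)
centers <- round(km$centers)

for (i in seq_along(likelihood)) {
  cat("There is a", likelihood[i], "% likelihood of making",
      centers[i, "Runs"], "Runs in", centers[i, "BF"], "balls over",
      centers[i, "Mins"], "Minutes\n")
}
```

The order of the clusters is arbitrary, which is why the printed summaries above do not list the groups from smallest to largest score.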
A. Virender Sehwag
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./sehwag.csv","Sehwag")
batsmanAvgRunsOpposition("./sehwag.csv","Sehwag")
dev.off()
## null device
## 1
B. AB Devilliers
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./devilliers.csv","Devilliers")
batsmanAvgRunsOpposition("./devilliers.csv","Devilliers")
dev.off()
## null device
## 1
C. Chris Gayle
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./gayle.csv","Gayle")
batsmanAvgRunsOpposition("./gayle.csv","Gayle")
dev.off()
## null device
## 1
D. Glenn Maxwell
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./maxwell.csv","Maxwell")
batsmanAvgRunsOpposition("./maxwell.csv","Maxwell")
dev.off()
## null device
## 1
The moving averages for the 4 batsmen are shown below.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./sehwag.csv","Sehwag")
batsmanMovingAverage("./devilliers.csv","Devilliers")
batsmanMovingAverage("./gayle.csv","Gayle")
batsmanMovingAverage("./maxwell.csv","Maxwell")
dev.off()
## null device
## 1
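As a rough illustration of what a moving average of innings scores is, the sketch below computes a simple trailing mean with stats::filter() on simulated scores. The window of 5 innings is an arbitrary choice, and the smoother used by batsmanMovingAverage() may well be different.

```r
set.seed(1)  # simulated innings scores, not real data
runs <- rpois(40, lambda = 35)

# Trailing moving average over the last k innings (NA until k innings exist)
k <- 5
movAvg <- stats::filter(runs, rep(1 / k, k), sides = 1)

# Show the smoothed values for the most recent innings
print(round(tail(as.numeric(movAvg)), 1))
```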
checkBatsmanInForm("./sehwag.csv","Sehwag")
## *******************************************************************************************
##
## Population size: 143 Mean of population: 33.76
## Sample size: 16 Mean of sample: 37.44 SD of sample: 55.15
##
## Null hypothesis H0 : Sehwag 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Sehwag 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Sehwag 's Form Status: In-Form because the p value: 0.603525 is greater than alpha= 0.05"
## *******************************************************************************************
checkBatsmanInForm("./devilliers.csv","Devilliers")
## *******************************************************************************************
##
## Population size: 111 Mean of population: 43.5
## Sample size: 13 Mean of sample: 57.62 SD of sample: 40.69
##
## Null hypothesis H0 : Devilliers 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Devilliers 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Devilliers 's Form Status: In-Form because the p value: 0.883541 is greater than alpha= 0.05"
## *******************************************************************************************
checkBatsmanInForm("./gayle.csv","Gayle")
## *******************************************************************************************
##
## Population size: 140 Mean of population: 37.1
## Sample size: 16 Mean of sample: 17.25 SD of sample: 20.25
##
## Null hypothesis H0 : Gayle 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Gayle 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Gayle 's Form Status: Out-of-Form because the p value: 0.000609 is less than alpha= 0.05"
## *******************************************************************************************
checkBatsmanInForm("./maxwell.csv","Maxwell")
## *******************************************************************************************
##
## Population size: 28 Mean of population: 25.25
## Sample size: 4 Mean of sample: 64.25 SD of sample: 36.97
##
## Null hypothesis H0 : Maxwell 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Maxwell 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Maxwell 's Form Status: In-Form because the p value: 0.948744 is greater than alpha= 0.05"
## *******************************************************************************************
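The printed p values for Sehwag and Gayle can be approximately reproduced from the summary figures with a one-sided (lower-tail) one-sample t-test. This is an inference from the output rather than a statement about how checkBatsmanInForm() is implemented, and the function name formPValue is hypothetical.

```r
# One-sided (lower-tail) one-sample t-test from summary statistics.
# Assumption: this mirrors the test behind the printed p values. The inputs
# are the rounded figures from the output, so results agree only roughly.
formPValue <- function(popMean, sampleMean, sampleSD, n) {
  tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
  pt(tStat, df = n - 1)  # P(T <= tStat): small values suggest a drop in form
}

pSehwag <- formPValue(33.76, 37.44, 55.15, 16)
pGayle  <- formPValue(37.10, 17.25, 20.25, 16)

print(round(c(Sehwag = pSehwag, Gayle = pGayle), 4))
```

A p value above 0.05 means there is no evidence that the recent average has fallen below the career average, which is why a batsman whose recent mean is higher than his career mean is always reported as in form.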
Malinga has the highest number of innings and wickets, followed closely by Mitchell Johnson. Steyn and Southee have relatively fewer innings.
To get the bowler’s data use
malinga <- getPlayerDataOD(49758,dir=".",file="malinga.csv",type="bowling")
This plot gives the percentage frequency of each number of wickets taken (1, 2, 3, etc.).
par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./mitchell.csv","M Johnson")
bowlerWktsFreqPercent("./malinga.csv","Malinga")
bowlerWktsFreqPercent("./steyn.csv","Steyn")
bowlerWktsFreqPercent("./southee.csv","Southee")
dev.off()
## null device
## 1
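A wickets frequency percentage of this kind can be computed in base R with table() and prop.table(). The wickets vector below is made up for illustration, and the sketch is not the code inside bowlerWktsFreqPercent().

```r
# Made-up wickets per innings for one bowler (illustrative only)
wkts <- c(0, 1, 1, 2, 0, 3, 1, 2, 4, 1, 0, 2, 1, 5, 2, 1, 0, 3, 1, 2)

# Percentage of innings in which the bowler took 0, 1, 2, ... wickets
wktsPct <- round(100 * prop.table(table(wkts)), 1)
print(wktsPct)
```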
The plot below gives a boxplot of the runs conceded for each number of wickets taken by the bowlers. M Johnson and Steyn are more economical than Malinga and Southee, corroborating the figures above.
par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("./mitchell.csv","M Johnson")
bowlerWktsRunsPlot("./malinga.csv","Malinga")
bowlerWktsRunsPlot("./steyn.csv","Steyn")
bowlerWktsRunsPlot("./southee.csv","Southee")
dev.off()
## null device
## 1
A. Mitchell Johnson
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./mitchell.csv","M Johnson")
bowlerAvgWktsOpposition("./mitchell.csv","M Johnson")
dev.off()
## null device
## 1
B. Lasith Malinga
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./malinga.csv","Malinga")
bowlerAvgWktsOpposition("./malinga.csv","Malinga")
dev.off()
## null device
## 1
C. Dale Steyn
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./steyn.csv","Steyn")
bowlerAvgWktsOpposition("./steyn.csv","Steyn")
dev.off()
## null device
## 1
D. Tim Southee
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./southee.csv","Southee")
bowlerAvgWktsOpposition("./southee.csv","Southee")
dev.off()
## null device
## 1
The plot below shows that Mitchell Johnson and Southee have more wickets in the 3-4 wicket range, while Steyn and Malinga have more in the 1-2 wicket range.
frames <- list("./mitchell.csv","./malinga.csv","steyn.csv","southee.csv")
names <- list("M Johnson","Malinga","Steyn","Southee")
relativeBowlingPerf(frames,names)
Steyn has the best economy rate, followed by M Johnson. Malinga and Southee have poorer economy rates.
frames <- list("./mitchell.csv","./malinga.csv","steyn.csv","southee.csv")
names <- list("M Johnson","Malinga","Steyn","Southee")
relativeBowlingERODTT(frames,names)
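As a reminder of what is being compared, the economy rate is the number of runs conceded per over bowled. A tiny worked example with made-up figures for two hypothetical bowlers:

```r
# Made-up spell figures for two hypothetical bowlers (not real data)
spells <- data.frame(
  bowler = c("Bowler A", "Bowler B"),
  overs  = c(10, 8),
  runs   = c(42, 48)
)

# Economy rate = runs conceded per over bowled (lower is better)
spells$economy <- spells$runs / spells$overs
print(spells)
```

Overs are usually recorded as, say, 9.3 for 9 overs and 3 balls, so real figures need converting to balls before dividing.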
The career wickets graphs of Johnson and Steyn are on the upswing. Southee is maintaining a reasonable record, while Malinga shows a decline in ODI performance.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./mitchell.csv","M Johnson")
bowlerMovingAverage("./malinga.csv","Malinga")
bowlerMovingAverage("./steyn.csv","Steyn")
bowlerMovingAverage("./southee.csv","Southee")
dev.off()
## null device
## 1
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./mitchell.csv","M Johnson")
## Warning in HoltWinters(ts.train): optimization difficulties: ERROR:
## ABNORMAL_TERMINATION_IN_LNSRCH
bowlerPerfForecast("./malinga.csv","Malinga")
bowlerPerfForecast("./steyn.csv","Steyn")
bowlerPerfForecast("./southee.csv","Southee")
dev.off()
## null device
## 1
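The warning above shows that the forecast calls HoltWinters() on a training series. A minimal sketch of that kind of forecast on simulated wickets data follows. The 50-innings training window and the choice to switch off the trend and seasonal components are my assumptions to keep the example simple, and need not match what bowlerPerfForecast() does.

```r
set.seed(11)  # simulated wickets per innings, not real data
wkts <- rpois(60, lambda = 1.6)

# Treat the first 50 innings as the training series
tsTrain <- ts(wkts[1:50])

# Exponential smoothing only: no trend (beta) or seasonal (gamma) component
fit <- HoltWinters(tsTrain, beta = FALSE, gamma = FALSE)

# Forecast the next 10 innings
fc <- predict(fit, n.ahead = 10)
print(round(as.numeric(fc), 2))
```

With only the level component fitted, the forecast is a flat line at the last smoothed level.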
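All the bowlers except Southee are shown to be still in form. Southee is flagged as out-of-form, with a p value of 0.044302 just below alpha = 0.05.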
checkBowlerInForm("./mitchell.csv","J Mitchell")
## *******************************************************************************************
##
## Population size: 135 Mean of population: 1.55
## Sample size: 15 Mean of sample: 2 SD of sample: 1.07
##
## Null hypothesis H0 : J Mitchell 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : J Mitchell 's sample average is below the 95% confidence
## interval of population average
##
## [1] "J Mitchell 's Form Status: In-Form because the p value: 0.937917 is greater than alpha= 0.05"
## *******************************************************************************************
checkBowlerInForm("./malinga.csv","Malinga")
## *******************************************************************************************
##
## Population size: 163 Mean of population: 1.58
## Sample size: 19 Mean of sample: 1.58 SD of sample: 1.22
##
## Null hypothesis H0 : Malinga 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Malinga 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Malinga 's Form Status: In-Form because the p value: 0.5 is greater than alpha= 0.05"
## *******************************************************************************************
checkBowlerInForm("./steyn.csv","Steyn")
## *******************************************************************************************
##
## Population size: 93 Mean of population: 1.59
## Sample size: 11 Mean of sample: 1.45 SD of sample: 0.69
##
## Null hypothesis H0 : Steyn 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : Steyn 's sample average is below the 95% confidence
## interval of population average
##
## [1] "Steyn 's Form Status: In-Form because the p value: 0.257438 is greater than alpha= 0.05"
## *******************************************************************************************
checkBowlerInForm("./southee.csv","southee")
## *******************************************************************************************
##
## Population size: 86 Mean of population: 1.48
## Sample size: 10 Mean of sample: 0.8 SD of sample: 1.14
##
## Null hypothesis H0 : southee 's sample average is within 95% confidence interval
## of population average
## Alternative hypothesis Ha : southee 's sample average is below the 95% confidence
## interval of population average
##
## [1] "southee 's Form Status: Out-of-Form because the p value: 0.044302 is less than alpha= 0.05"
## *******************************************************************************************
Here are some key conclusions
ODI batsmen
ODI bowlers