Yet all experience is an arch wherethro’
Gleams that untravell’d world whose margin fades
For ever and forever when I move.
How dull it is to pause, to make an end,
To rust unburnish’d, not to shine in use!

            Ulysses by Alfred Tennyson
            

Introduction

In this introductory post I introduce ‘cricketr’, a cricketing package I have created. This package is a natural culmination of many earlier posts on cricketers and of my completing 9 modules of the absorbing Data Science Specialization from Johns Hopkins University at Coursera. The thought of creating this package struck me some time back, and I have finally been able to bring it to fruition.

So here it is: my R package ‘cricketr’!!!

This package uses the statistics available in ESPN Cricinfo Statsguru. The current version of this package uses data from Test cricket only. I plan to develop functionality for One-day and Twenty20 cricket later.

You should be able to install the package from GitHub and use the many functions available in it. Please be mindful of the ESPN Cricinfo Terms of Use.

The cricketr package

The cricketr package has several functions that perform different analyses on both batsmen and bowlers. These include functions that plot the percentage frequency of runs or wickets, the runs likelihood for a batsman, the relative runs/strike rates of batsmen, and the relative performance/economy rates of bowlers.

Other interesting functions include a moving average of batting performance, a forecast, and a function to check whether a batsman is in form or out of form.

The data for a particular player can be obtained with the getPlayerData() function. To do this, go to ESPN Cricinfo Player and type in the name of the player, e.g. Ricky Ponting or Sachin Tendulkar. This brings up the player's page, whose URL contains the profile number; for Sachin Tendulkar this is http://www.espncricinfo.com/india/content/player/35320.html. Hence, Sachin’s profile number is 35320. This can be used to get the data for Tendulkar as shown below.
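If you collect many players, picking the profile number out of the URL by hand gets tedious. The sketch below does it with a regular expression. `profileFromUrl()` is a hypothetical helper, not part of cricketr, and it assumes player URLs end in `/player/<number>.html` like the Tendulkar example.

```r
# Hypothetical helper, not part of cricketr: extract the numeric profile
# number from an ESPN Cricinfo player URL ending in /player/<number>.html
profileFromUrl <- function(url) {
  id <- sub(".*/player/([0-9]+)\\.html$", "\\1", url)
  # sub() returns the input unchanged when the pattern does not match
  if (identical(id, url)) stop("not an ESPN Cricinfo player URL: ", url)
  as.integer(id)
}

profileFromUrl("http://www.espncricinfo.com/india/content/player/35320.html")
## [1] 35320
```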

The cricketr package can be installed from GitHub with

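library(devtools)                    # provides install_github()
install_github("tvganesh/cricketr")  # install cricketr from GitHub (needed only once)
library(cricketr)
# Fetch Tendulkar's (profile 35320) batting data and save it as tendulkar.csv in dir.
# Note: dir=".." is the parent directory, while the examples below read ./tendulkar.csv,
# so choose dir (or adjust the later paths) so that the two match.
tendulkar <- getPlayerData(35320,dir="..",file="tendulkar.csv",type="batting",homeOrAway=c(1,2),
                          result=c(1,2,4))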

Important note: This needs to be done only once per player. The function stores the player’s data in a CSV file (e.g. tendulkar.csv above), which can then be reused by all the other functions. Once we have the data for the players, many analyses can be done. This post uses the CSV files obtained with prior getPlayerData() calls for all subsequent analyses.

Sachin Tendulkar’s performance - Basic Analyses

The three plots below show the following for Tendulkar:

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges
par(mfrow=c(1,3))    # arrange the next three plots in 1 row x 3 columns
par(mar=c(4,4,2,2))  # margins in lines: bottom, left, top, right
batsmanRunsFreqPerf("./tendulkar.csv","Sachin Tendulkar")
batsmanMeanStrikeRate("./tendulkar.csv","Sachin Tendulkar")
batsmanRunsRanges("./tendulkar.csv","Sachin Tendulkar")

dev.off()
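The first two plots boil down to bucketing each innings into a 10-run range and then computing, per range, the share of innings and the mean strike rate. The sketch below does this in base R on made-up toy numbers (not real player data); cricketr's own implementation may bin or summarise differently.

```r
# Made-up toy innings, not real player data
runs <- c(0, 4, 12, 15, 27, 33, 48, 52, 61, 75, 88, 104, 120, 9, 19)
bf   <- c(3, 10, 25, 30, 50, 70, 95, 100, 110, 140, 160, 190, 230, 20, 40)

# 10-run ranges [0,10), [10,20), ... wide enough to include the top score
breaks <- seq(0, max(runs) %/% 10 * 10 + 10, by = 10)
runRange <- cut(runs, breaks = breaks, right = FALSE)

# Percentage of innings whose score falls in each range (empty ranges show 0)
freqPct <- 100 * prop.table(table(runRange))

# Mean strike rate (runs per 100 balls) of the innings in each range (NA if empty)
meanSR <- tapply(100 * runs / bf, runRange, mean)

round(cbind(freqPct, meanSR), 1)
```

Using `right = FALSE` makes each range include its lower bound, so a score of exactly 10 lands in the 10-19 bucket rather than 0-9.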

3D scatter plot and prediction plane

The plots below show a 3D scatter plot of Sachin’s Runs versus Balls Faced and Minutes at crease. A linear regression model of Runs on Balls Faced and Minutes at crease is then fitted and drawn as a prediction plane.

battingPerf3d("./tendulkar.csv","Sachin Tendulkar")
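The prediction plane is just a multiple linear regression, Runs = b0 + b1·BF + b2·Mins, evaluated over a grid. Here is a minimal base R sketch on made-up toy innings (not real data); the grid size and the commented `persp()` call are illustrative choices, not what battingPerf3d() necessarily does.

```r
# Made-up toy innings, not real player data
innings <- data.frame(
  Runs = c(5, 18, 32, 47, 60, 85, 102, 130, 12, 70),
  BF   = c(12, 35, 60, 90, 110, 160, 190, 240, 30, 130),
  Mins = c(15, 50, 85, 120, 150, 210, 260, 330, 40, 170)
)

# Fit Runs = b0 + b1*BF + b2*Mins by ordinary least squares
fit <- lm(Runs ~ BF + Mins, data = innings)
b <- unname(coef(fit))

# Evaluate the fitted plane on a 20 x 20 grid spanning the observed data
bfGrid   <- seq(min(innings$BF), max(innings$BF), length.out = 20)
minsGrid <- seq(min(innings$Mins), max(innings$Mins), length.out = 20)
plane <- outer(bfGrid, minsGrid, function(x, y) b[1] + b[2] * x + b[3] * y)

# The plane could then be drawn, e.g.:
# persp(bfGrid, minsGrid, plane, xlab = "Balls Faced", ylab = "Minutes", zlab = "Runs")
```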

Predict runs for batsman given Balls Faced and Minutes at Crease

The linear regression model above can be used to predict the runs for the batsman, given the Balls Faced and Minutes at crease, as follows:

# Ten evenly spaced (Balls Faced, Minutes) pairs at which to predict runs
BF <- seq(10, 100, length.out=10)
Mins <- seq(30, 200, length.out=10)
newDF <- data.frame(BF,Mins)
batsmanRunsPredict("./tendulkar.csv","Sachin Tendulkar",newdataframe=newDF)
##    Balls Faced   Minutes      Runs
## 1           10  30.00000  6.960818
## 2           20  48.88889 13.428240
## 3           30  67.77778 19.895663
## 4           40  86.66667 26.363086
## 5           50 105.55556 32.830509
## 6           60 124.44444 39.297932
## 7           70 143.33333 45.765355
## 8           80 162.22222 52.232778
## 9           90 181.11111 58.700201
## 10         100 200.00000 65.167624
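Under the hood, a prediction like this is `predict()` on a fitted `lm` with a new data frame. The sketch below reuses the same BF and Mins grid on made-up toy innings (not real data), so its numbers will not match Tendulkar's table; it only shows the mechanics.

```r
# Made-up toy innings, not real player data
innings <- data.frame(
  Runs = c(5, 18, 32, 47, 60, 85, 102, 130, 12, 70),
  BF   = c(12, 35, 60, 90, 110, 160, 190, 240, 30, 130),
  Mins = c(15, 50, 85, 120, 150, 210, 260, 330, 40, 170)
)
fit <- lm(Runs ~ BF + Mins, data = innings)

# The same grid of Balls Faced and Minutes as in the example above
newDF <- data.frame(BF   = seq(10, 100, length.out = 10),
                    Mins = seq(30, 200, length.out = 10))
newDF$Runs <- predict(fit, newdata = newDF)
newDF
```

Because both BF and Mins rise in equal steps and the model is linear, the predicted runs also rise by a constant amount per row. That is why each row of Tendulkar's table above is about 6.47 runs more than the previous one.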

Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. Sachin’s innings are plotted as a 3D scatter plot of Runs versus Balls Faced and Minutes at crease, and K-Means clustering groups them into 3 clusters whose centroids are computed and plotted. The centroids represent Sachin Tendulkar’s most likely scoring tendencies, and the percentage of innings in each cluster gives the likelihood reported in the summary.

batsmanRunsLikelihood("./tendulkar.csv","Tendulkar")

## Summary of  Tendulkar 's runs scoring likelihood
## **************************************************
## 
## There is a 16.51 % likelihood that Tendulkar  will make  139 Runs in  251 balls over 353  Minutes 
## There is a 58.41 % likelihood that Tendulkar  will make  16 Runs in  31 balls over  44  Minutes 
## There is a 25.08 % likelihood that Tendulkar  will make  66 Runs in  122 balls over 167  Minutes
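The three percentages above add up to 100, which fits reading each "likelihood" as the share of innings in a cluster and each runs/balls/minutes triple as that cluster's centroid. The sketch below reproduces that idea with base R's `kmeans()` on made-up toy innings (not real data). The features are left unscaled here, so Minutes dominates the distances; scaling with `scale()` first is a design choice, and I have not assumed either way for cricketr.

```r
set.seed(42)  # k-means starts from random centres

# Made-up toy innings, not real player data
innings <- data.frame(
  Runs = c(2, 8, 15, 20, 11, 5, 60, 72, 55, 68, 140, 155, 128),
  BF   = c(6, 18, 30, 40, 25, 12, 110, 130, 100, 125, 250, 270, 230),
  Mins = c(9, 25, 45, 55, 35, 15, 150, 175, 140, 165, 340, 370, 320)
)

# Three clusters, best of 10 random starts
km <- kmeans(innings, centers = 3, nstart = 10)

# Share of innings in each cluster, read as the likelihood of that kind of innings
pct <- round(100 * km$size / nrow(innings), 2)
summaryTbl <- data.frame(likelihoodPct = pct, round(km$centers))
summaryTbl[order(-summaryTbl$likelihoodPct), ]
```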

A look at the Top 4 batsmen - Tendulkar, Kallis, Ponting and Sangakkara

The batsmen with the most hundreds in Test cricket are

  1. Sachin Tendulkar: Average 53.78, 100s - 51, 50s - 68
  2. Jacques Kallis: Average 55.47, 100s - 45, 50s - 58
  3. Ricky Ponting: Average 51.85, 100s - 41, 50s - 62
  4. Kumar Sangakkara: Average 58.04, 100s - 38, 50s - 52

in that order.

The following plots take a closer look at their performances. The box plots show the mean (red line) and median (blue line). The two ends of each box mark the 25th and 75th percentiles.

Box Histogram Plot

This plot combines a boxplot of the runs ranges with a histogram of the runs frequency.

batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")

batsmanPerfBoxHist("./kallis.csv","Jacques Kallis")

batsmanPerfBoxHist("./ponting.csv","Ricky Ponting")

batsmanPerfBoxHist("./sangakkara.csv","K Sangakkara")
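The statistics behind these box plots can be computed directly. The sketch below uses made-up toy runs (not real data) to compute the quartiles and the mean, then marks the mean in red and the median in blue on a base R box plot, mirroring the colour scheme described above. Note that `boxplot()` draws its box edges from `fivenum()` hinges, which can differ slightly from `quantile()` for some sample sizes.

```r
# Made-up toy runs per innings, not real player data
runs <- c(0, 4, 12, 15, 27, 33, 48, 52, 61, 75, 88, 104, 120, 9, 19)

q <- quantile(runs, probs = c(0.25, 0.5, 0.75))  # 25th percentile, median, 75th percentile
m <- mean(runs)

boxplot(runs, ylab = "Runs")
abline(h = m, col = "red")            # mean
abline(h = q[["50%"]], col = "blue")  # median
```

With runs, the mean usually sits above the median because a few big hundreds pull it up; here the mean is about 44.5 against a median of 33.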

Contribution to won and lost matches

The plot below shows the contribution of Tendulkar, Kallis, Ponting and Sangakkara in matches won and lost. The plots show the range of runs scored as a boxplot (25th and 75th percentiles) along with the mean runs scored. The total number of matches won and lost is also printed in each plot.

All the players have scored more in the matches they won than in the matches they lost. Ricky Ponting stands out as having more won matches to his credit than the others. This could be because he was a member of a strong Australian team.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost("35320","Tendulkar")
batsmanContributionWonLost("45789","Kallis")
batsmanContributionWonLost("7133","Ponting")
batsmanContributionWonLost("50710","Sangakkara")

dev.off()
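A won/lost comparison like this is a grouped summary: split the innings by match result, then count and average. The sketch below does it on made-up toy innings (not real data). It counts innings rather than matches, since a batsman can bat twice in one Test; how cricketr counts matches is not assumed here.

```r
# Made-up toy innings, not real player data: runs and the result of the match
innings <- data.frame(
  Runs   = c(120, 45, 8, 77, 15, 0, 102, 33, 61, 22),
  Result = factor(c("won", "won", "lost", "won", "lost",
                    "lost", "won", "lost", "won", "lost"))
)

table(innings$Result)                               # innings in won vs. lost matches
meanByResult <- tapply(innings$Runs, innings$Result, mean)
meanByResult                                        # mean runs per innings by result

# Whisker ends, hinges and median for each result, computed without drawing
boxplot(Runs ~ Result, data = innings, plot = FALSE)$stats
```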

Performance at home and overseas

From the plot below it can be seen that Tendulkar has played more matches overseas than at home, and his performance is consistent at all venues, home or abroad. Ponting has played fewer innings than Tendulkar and performs equally well at home and overseas. Kallis and Sangakkara perform less well abroad than at home.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway("35320","Tendulkar")
batsmanPerfHomeAway("45789","Kallis")
batsmanPerfHomeAway("7133","Ponting")
batsmanPerfHomeAway("50710","Sangakkara")

dev.off()
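A home/away comparison can be summarised the same way by venue. The sketch below, on made-up toy innings (not real data), uses `split()` and `sapply()` to get one column per venue with the number of innings, mean and median runs. Reporting the median alongside the mean helps show whether a difference is driven by a couple of very big scores.

```r
# Made-up toy innings, not real player data: runs and venue of each innings
innings <- data.frame(
  Runs  = c(88, 12, 140, 35, 54, 7, 66, 101, 23, 45, 0),
  Venue = c("home", "away", "home", "away", "away", "home",
            "away", "home", "away", "away", "home")
)

# One column per venue: number of innings, mean and median runs
venueSummary <- sapply(split(innings$Runs, innings$Venue),
                       function(r) c(innings = length(r), mean = mean(r), median = median(r)))
venueSummary
```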

Relative Mean Strike Rate plot

The plot below compares the Mean Strike Rate of the batsmen in each 10-run range. The plot indicates the following:

  1. Range 0 - 50 runs: Ponting leads, followed by Tendulkar
  2. Range 50 - 100 runs: Ponting, followed by Sangakkara
  3. Range 100 - 150 runs: Ponting, then Tendulkar

frames <- list("./tendulkar.csv","./kallis.csv","./ponting.csv","./sangakkara.csv")
playerNames <- list("Tendulkar","Kallis","Ponting","Sangakkara")
relativeBatsmanSR(frames,playerNames)
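A relative plot like this needs the same per-range summary for every player, lined up in one table. The sketch below applies a hypothetical helper, `meanSRByRange()` (not part of cricketr), to a list of players, mirroring the frames/names lists above, and draws one line per player with `matplot()`. The player data are made-up toy numbers, and capping the ranges at 150 runs is an assumption for illustration.

```r
# Hypothetical helper, not part of cricketr: mean strike rate in each 10-run range
meanSRByRange <- function(runs, bf, maxRuns = 150) {
  keep <- bf > 0 & runs < maxRuns                   # drop zero-ball innings and scores past the last range
  rng <- cut(runs[keep], breaks = seq(0, maxRuns, by = 10), right = FALSE)
  tapply(100 * runs[keep] / bf[keep], rng, mean)    # NA for ranges with no innings
}

# Made-up toy innings for two players, not real data
players <- list(
  playerA = list(runs = c(5, 12, 30, 55, 80, 110, 140), bf = c(10, 20, 50, 90, 120, 180, 220)),
  playerB = list(runs = c(8, 18, 35, 60, 95, 120),      bf = c(20, 40, 80, 120, 190, 230))
)

# One row per run range, one column per player
srTable <- sapply(players, function(p) meanSRByRange(p$runs, p$bf))

matplot(srTable, type = "b", pch = 1:2, lty = 1, xaxt = "n",
        xlab = "Runs range", ylab = "Mean strike rate")
axis(1, at = seq_len(nrow(srTable)), labels = rownames(srTable), las = 2)
legend("topleft", legend = colnames(srTable), col = 1:2, pch = 1:2)
```

Ranges with no innings come out as NA and simply leave a gap in that player's line, rather than being drawn as a misleading zero.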

Relative Runs Frequency plot

The plot below gives the relative Runs Frequency Percentages for each 10-run bucket. It shows that Sangakkara leads, followed by Ponting.

frames <- list("./tendulkar.csv","./kallis.csv","./ponting.csv","./sangakkara.csv")
playerNames <- list("Tendulkar","Kallis","Ponting","Sangakkara")
relativeRunsFreqPerf(frames,playerNames)

Moving Average of runs in career

Take a look at the Moving Average across the careers of the Top 4. Clearly:

  1. Kallis and Sangakkara have a few more years of great batting ahead. Their moving averages hover around 50.
  2. Tendulkar and Ponting definitely show a slump in their later years.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkar.csv","Sachin Tendulkar")
batsmanMovingAverage("./kallis.csv","Jacques Kallis")
batsmanMovingAverage("./ponting.csv","Ricky Ponting")
batsmanMovingAverage("./sangakkara.csv","K Sangakkara")
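One simple way to compute a career moving average is a trailing mean over the last k innings. The sketch below uses `stats::filter()` on made-up toy runs (not real data). The window k = 4 is an arbitrary assumption, and batsmanMovingAverage() may use a different window or smoother. `stats::` is written out because dplyr's `filter()` masks it when dplyr is loaded.

```r
# Made-up toy runs per innings in chronological order, not real player data
runs <- c(10, 50, 30, 70, 20, 90, 40, 60)

# Trailing moving average over the last k innings; the first k - 1 values are NA
k <- 4
ma <- stats::filter(runs, rep(1 / k, k), sides = 1)
as.numeric(ma)
```

`sides = 1` uses only the current and earlier innings, so each point reflects form up to that innings rather than peeking ahead.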