“The unexamined life is not worth living.” - Socrates
“There is no easy way from the earth to the stars.” - Seneca
“If you want to go fast, go alone. If you want to go far, go together.” - African Proverb
In this post, I put my R package cricketr to analyze the Indian and Australia WTC squad ahead of the World Test Championship 2023.My R package cricketr had its birth on Jul 4, 2015. Cricketr uses data from Cricinfo.
Rohit Sharma (Captain), Shubman Gill, Cheteshwar Pujara, Virat Kohli, Ajinkya Rahane, Ravindra Jadeja, Shardul Thakur, Mohd. Shami, Mohd. Siraj, Ishan Kishan (wk).
According to me, Ishan Kishan has more experience than KS Bharat, though Rishabh Pant would have been the ideal wicket keeper/left-handed batsman. I think Shardul Thakur would be handful in the English conditions. For a spinner it either Ashwin or Jadeja. Maybe the balance shifts in favor of Jadeja
Pat Cummins (capt), Alex Carey (wk), Cameron Green, Josh Hazlewood, Usman Khawaja, Marnus Labuschagne, Nathan Lyon, Todd Murphy, Steven Smith (vice-capt), Mitchell Starc, David Warner.
Not sure if Scott Boland would fill in, instead of Todd Murphy
Let me give you a lay-of-the-land (post) below
The post below is organized into the following parts
All the above analysis use data from ESPN Statsguru and use my R pakage cricketr
The data for the different players have been obtained using calls such as the ones below.
# Get Shubman Gill's batting data
#shubman <-getPlayerData(1070173,dir=".",file="shubman.csv",type="batting",homeOrAway=c(1,2), result=c(1,2,4))
#shubmansp <- getPlayerDataSp(1070173,tdir=".",tfile="shubmansp.csv",ttype="batting")
#Get Shubman Gill's data from Jan 2016 - May 2023
#df <-getPlayerDataHA(1070173,tfile="shubman1.csv",type="batting", matchType="Test")
#df1=getPlayerDataOppnHA(infile="shubman1.csv",outfile="shubmanTestAus.csv",startDate="2016-01-01",endDate="2023-05-01")
#Get Shubman Gills data from Jan 2016 - May 2023, against Australia
#df <-getPlayerDataHA(1070173,tfile="shubman1.csv",type="batting", matchType="Test")
#df1=getPlayerDataOppnHA(infile="shubman1.csv",outfile="shubmanTestAus.csv",opposition="Australia",startDate="2016-01-01",endDate="2023-05-01")
Note: To get data for bowlers we need to use the corresponding profile no and use type =‘bowling’. Details in my posts below
To do similar analysis please go through the following posts
Note 1: I will not be analysing each and every chart as the charts are quite self-explanatory Note 2: I have had to tile charts together otherwise this will become a very, very long post. You are free to use my R package cricketr and check out for yourself ##3. Analysis of India WTC batsmen from Jan 2016 - May 2023
## Findings
Note 3: You can also read this post at Rpubs at Cricketr analyzes Ind-Aus faceoff in WTC 2023!! The formatting will be nicer!
Note 4: You can download this post as PDF to read at your leisure ind-aus-WTC.pdf
if (!require("cricketr")){
install.packages("cricketr",lib = "c:/test")
}
library(cricketr)
The analyses below include - Runs frequency plot - Mean strike rate - Run Ranges
Kohli’s strike rate increases with increasing runs, while Gill’s seems to drop. So it is with Pujara & Rahane
par(mfrow=c(3,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("kohliTest.csv","Kohli")
batsmanMeanStrikeRate("kohliTest.csv","Kohli")
batsmanRunsRanges("kohliTest.csv","Kohli")
batsmanRunsFreqPerf("rohitTest.csv","Rohit")
batsmanMeanStrikeRate("rohitTest.csv","Rohit")
batsmanRunsRanges("rohitTest.csv","Rohit")
batsmanRunsFreqPerf("shubmanTest.csv","S Gill")
batsmanMeanStrikeRate("shubmanTest.csv","S Gill")
batsmanRunsRanges("shubmanTest.csv","S Gill")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("rahaneTest.csv","Rahane")
batsmanMeanStrikeRate("rahaneTest.csv","Rahane")
batsmanRunsRanges("rahaneTest.csv","Rahane")
batsmanRunsFreqPerf("pujaraTest.csv","Pujara")
batsmanMeanStrikeRate("pujaraTest.csv","Pujara")
batsmanRunsRanges("pujaraTest.csv","Pujara")
Kohli hits roughly 5 4s in his 50 versus Gill,Pujara who is able to smash 6 4s.
par(mfrow=c(3,3))
par(mar=c(4,4,2,2))
batsman4s("kohliTest.csv","Kohli")
batsman6s("kohliTest.csv","Kohli")
batsmanMeanStrikeRate("kohliTest.csv","Kohli")
batsman4s("rohitTest.csv","Rohit")
batsman6s("rohitTest.csv","Rohit")
batsmanMeanStrikeRate("rohitTest.csv","Rohit")
batsman4s("shubmanTest.csv","S Gill")
batsman6s("shubmanTest.csv","S Gill")
batsmanMeanStrikeRate("shubmanTest.csv","S Gill")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsman4s("rahaneTest.csv","Rahane")
batsman6s("rahaneTest.csv","Rahane")
batsmanMeanStrikeRate("rahane.csv","Rahane")
batsman4s("pujaraTest.csv","Pujara")
batsman6s("pujaraTest.csv","Pujara")
batsmanMeanStrikeRate("pujaraTest.csv","Pujara")
This plot shows a combined boxplot of the Runs ranges and a histog2ram of the Runs Frequency Kohli’s average is 48, while Rohit,Pujara is 40 with Rahane and Gill around 33.
batsmanPerfBoxHist("kohliTest.csv","Kohli")
batsmanPerfBoxHist("rohitTest.csv","Rohit")
batsmanPerfBoxHist("shubmanTest.csv","S Gill")
batsmanPerfBoxHist("rahaneTest.csv","Rahane")
batsmanPerfBoxHist("pujaraTest.csv","Pujara")
For the functions below you will have to use the getPlayerDataSp() function. When Rohit Sharma and Pujara have played well India have tended to win more often
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost("kohlisp.csv","Kohli")
batsmanContributionWonLost("rohitsp.csv","Rohit")
batsmanContributionWonLost("rahanesp.csv","Rahane")
batsmanContributionWonLost("pujarasp.csv","Pujara")
This function also requires the use of getPlayerDataSp() as shown above. This can only be used for Test matches
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway("kohlisp.csv","Kohli")
batsmanPerfHomeAway("rohitsp.csv","Rohit")
batsmanPerfHomeAway("rahanesp.csv","Rahane")
batsmanPerfHomeAway("pujarasp.csv","Pujara")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("kohliTest.csv","Kohli")
batsmanAvgRunsGround("rohitTest.csv","Rohit")
batsmanAvgRunsGround("rahaneTest.csv","Rahane")
batsmanAvgRunsGround("pujaraTest.csv","Pujara")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("kohliTest.csv","Kohli")
batsmanAvgRunsOpposition("rohitTest.csv","Rohit")
batsmanAvgRunsOpposition("rahaneTest.csv","Rahane")
batsmanAvgRunsOpposition("pujaraTest.csv","Pujara")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanRunsLikelihood("kohli.csv","Kohli")
## Summary of Kohli 's runs scoring likelihood
## **************************************************
##
## There is a 16.28 % likelihood that Kohli will make 142 Runs in 237 balls over 335 Minutes
## There is a 52.91 % likelihood that Kohli will make 12 Runs in 26 balls over 35 Minutes
## There is a 30.81 % likelihood that Kohli will make 52 Runs in 100 balls over 139 Minutes
batsmanRunsLikelihood("rohit.csv","Rohit")
## Summary of Rohit 's runs scoring likelihood
## **************************************************
##
## There is a 10.81 % likelihood that Rohit will make 110 Runs in 199 balls over 282 Minutes
## There is a 45.95 % likelihood that Rohit will make 46 Runs in 85 balls over 124 Minutes
## There is a 43.24 % likelihood that Rohit will make 10 Runs in 21 balls over 32 Minutes
batsmanRunsLikelihood("rahane.csv","Rahane")
## Summary of Rahane 's runs scoring likelihood
## **************************************************
##
## There is a 59.69 % likelihood that Rahane will make 11 Runs in 24 balls over 35 Minutes
## There is a 11.63 % likelihood that Rahane will make 109 Runs in 200 balls over 289 Minutes
## There is a 28.68 % likelihood that Rahane will make 49 Runs in 104 balls over 147 Minutes
batsmanRunsLikelihood("pujara.csv","Pujara")
## Summary of Pujara 's runs scoring likelihood
## **************************************************
##
## There is a 60.49 % likelihood that Pujara will make 15 Runs in 38 balls over 55 Minutes
## There is a 31.48 % likelihood that Pujara will make 62 Runs in 142 balls over 204 Minutes
## There is a 8.02 % likelihood that Pujara will make 153 Runs in 319 balls over 445 Minutes
Kohli’s moving average in tests seem to havw dropped after a peak in 2017, 2018. So it is with Rahane
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("kohli.csv","Kohli")
batsmanMovingAverage("rohit.csv","Rohit")
batsmanMovingAverage("rahane.csv","Rahane")
batsmanMovingAverage("pujara.csv","Pujara")
Kohli’s cumulative average averages to ~48. Shubman Gill’s cumulative average is on the rise.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("kohliTest.csv","Kohli")
batsmanCumulativeAverageRuns("rohitTest.csv","Rohit")
batsmanCumulativeAverageRuns("rahaneTest.csv","Rahane")
batsmanCumulativeAverageRuns("pujaraTest.csv","Pujara")
batsmanCumulativeAverageRuns("shubmanTest.csv","S Gill")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("kohliTest.csv","Kohli")
batsmanCumulativeStrikeRate("rohitTest.csv","Rohit")
batsmanCumulativeStrikeRate("rahaneTest.csv","Rahane")
batsmanCumulativeStrikeRate("pujaraTest.csv","Pujara")
batsmanCumulativeStrikeRate("shubmanTest.csv","S Gill")
Here are plots that forecast how the batsman will perform in future. In this case 90% of the career runs trend is uses as the training set. the remaining 10% is the test set.
A Holt-Winters forecating model is used to forecast future performance based on the 90% training set. The forecated runs trend is plotted. The test set is also plotted to see how close the forecast and the actual matches
Take a look at the runs forecasted for the batsman below.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("kohli.csv","Kohli")
batsmanPerfForecast("rohit.csv","Rohit")
batsmanPerfForecast("rahane.csv","Rahane")
batsmanPerfForecast("pujara.csv","Pujara")
The plot below compares the Mean Strike Rate of the batsman for each of the runs ranges of 10 and plots them. The plot indicate the following
frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeBatsmanSR(frames,names)
The plot below gives the relative Runs Frequency Percetages for each 10 run bucket. The plot below show
frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeRunsFreqPerf(frames,names)
Kohli’s tops the list, followed by Pujara and Rohit is 3rd. Gill is on the upswing. Hope he performs well.
frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeBatsmanCumulativeAvgRuns(frames,names)
ROhit has the best strike rate followed by Kohli, with Shubman Gill ctaching up fast
frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeBatsmanCumulativeStrikeRate(frames,names)
The below computation uses Null Hypothesis testing and p-value to determine if the batsman is in-form or out-of-form. For this 90% of the career runs is chosen as the population and the mean computed. The last 10% is chosen to be the sample set and the sample Mean and the sample Standard Deviation are caculated.
The Null Hypothesis (H0) assumes that the batsman continues to stay in-form where the sample mean is within 95% confidence interval of population mean The Alternative (Ha) assumes that the batsman is out of form the sample mean is beyond the 95% confidence interval of the population mean.
A significance value of 0.05 is chosen and p-value us computed If p-value >= .05 - Batsman In-Form If p-value < 0.05 - Batsman Out-of-Form
Note Ideally the p-value should be done for a population that follows the Normal Distribution. But the runs population is usually left skewed. So some correction may be needed. I will revisit this later
This is done for the Top 4 batsman
checkBatsmanInForm("kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ****************************\n\n Population size: 154 Mean of population: 47.03 \n Sample size: 18 Mean of sample: 32.22 SD of sample: 42.45 \n\n Null hypothesis H0 : Kohli 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Kohli 's sample average is below the 95% confidence interval of population average\n\n Kohli 's Form Status: In-Form because the p value: 0.078058 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("rohit.csv","Rohit")
## [1] "**************************** Form status of Rohit ****************************\n\n Population size: 66 Mean of population: 37.03 \n Sample size: 8 Mean of sample: 37.88 SD of sample: 35.38 \n\n Null hypothesis H0 : Rohit 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Rohit 's sample average is below the 95% confidence interval of population average\n\n Rohit 's Form Status: In-Form because the p value: 0.526254 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("rahane.csv","Rahane")
## [1] "**************************** Form status of Rahane ****************************\n\n Population size: 116 Mean of population: 34.78 \n Sample size: 13 Mean of sample: 21.38 SD of sample: 21.96 \n\n Null hypothesis H0 : Rahane 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Rahane 's sample average is below the 95% confidence interval of population average\n\n Rahane 's Form Status: Out-of-Form because the p value: 0.023244 is less than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("pujara.csv","Pujara")
## [1] "**************************** Form status of Pujara ****************************\n\n Population size: 145 Mean of population: 41.93 \n Sample size: 17 Mean of sample: 33.24 SD of sample: 31.74 \n\n Null hypothesis H0 : Pujara 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Pujara 's sample average is below the 95% confidence interval of population average\n\n Pujara 's Form Status: In-Form because the p value: 0.137319 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("shubman.csv","S Gill")
## [1] "**************************** Form status of S Gill ****************************\n\n Population size: 23 Mean of population: 30.43 \n Sample size: 3 Mean of sample: 51.33 SD of sample: 66.88 \n\n Null hypothesis H0 : S Gill 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : S Gill 's sample average is below the 95% confidence interval of population average\n\n S Gill 's Form Status: In-Form because the p value: 0.687033 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.
BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
kohli1 <- batsmanRunsPredict("kohli.csv","Kohli",newdataframe=newDF)
rohit1 <- batsmanRunsPredict("rohit.csv","Rohit",newdataframe=newDF)
pujara1 <- batsmanRunsPredict("pujara.csv","Pujara",newdataframe=newDF)
rahane1 <- batsmanRunsPredict("rahane.csv","Rahane",newdataframe=newDF)
sgill1 <- batsmanRunsPredict("shubman.csv","S Gill",newdataframe=newDF)
batsmen <-cbind(round(kohli1$Runs),round(rohit1$Runs),round(pujara1$Runs),round(rahane1$Runs),round(sgill1$Runs))
colnames(batsmen) <- c("Kohli","Rohit","Pujara","Rahane","S Gill")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
## BallsFaced MinsAtCrease Kohli Rohit Pujara Rahane S Gill
## 1 10 30 6 3 3 2 7
## 2 38 71 24 19 16 17 24
## 3 66 111 41 35 29 31 40
## 4 94 152 58 51 42 45 56
## 5 121 193 76 66 55 59 73
## 6 149 234 93 82 68 74 89
## 7 177 274 110 98 80 88 106
## 8 205 315 128 114 93 102 122
## 9 233 356 145 129 106 116 139
## 10 261 396 163 145 119 130 155
## 11 289 437 180 161 132 145 171
## 12 316 478 197 177 144 159 188
## 13 344 519 215 192 157 173 204
## 14 372 559 232 208 170 187 221
## 15 400 600 249 224 183 202 237
Against Australia specifically between 2016 - 2023, Pujara has the best record followed by Rahane, with Gill in hot pursuit. Kohli and Rohit trail behind
frames <- list("kohliTestAus.csv","rohitTestAus.csv","pujaraTestAus.csv","rahaneTestAus.csv","shubmanTestAus.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeBatsmanCumulativeAvgRuns(frames,names)
In the Strike Rate department Gill tops followed by Rohit and Rahane
frames <- list("kohliTestAus.csv","rohitTestAus.csv","pujaraTestAus.csv","rahaneTestAus.csv","shubmanTestAus.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeBatsmanCumulativeStrikeRate(frames,names)
par(mfrow=c(3,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("stevesmithTest.csv","S Smith")
batsmanMeanStrikeRate("stevesmithTest.csv","S Smith")
batsmanRunsRanges("stevesmithTest.csv","S Smith")
batsmanRunsFreqPerf("warnerTest.csv","Warner")
batsmanMeanStrikeRate("warnerTest.csv","Warner")
batsmanRunsRanges("warnerTest.csv","Warner")
batsmanRunsFreqPerf("labuschagneTest.csv","M Labuschagne")
batsmanMeanStrikeRate("labuschagneTest.csv","M Labuschagne")
batsmanRunsRanges("labuschagneTest.csv","M Labuschagne")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("cgreenTest.csv","C Green")
batsmanMeanStrikeRate("cgreenTest.csv","C Green")
batsmanRunsRanges("cgreenTest.csv","C Green")
batsmanRunsFreqPerf("khwajaTest.csv","Khwaja")
batsmanMeanStrikeRate("khwajaTest.csv","Khwaja")
batsmanRunsRanges("khwajaTest.csv","Khwaja")
par(mfrow=c(3,3))
par(mar=c(4,4,2,2))
batsman4s("stevesmithTest.csv","S Smith")
batsman6s("stevesmithTest.csv","S Smith")
batsmanMeanStrikeRate("stevesmithTest.csv","S Smith")
batsman4s("warnerTest.csv","Warner")
batsman6s("warnerTest.csv","Warner")
batsmanMeanStrikeRate("warnerTest.csv","Warner")
batsman4s("labuschagneTest.csv","M Labuschagne")
batsman6s("labuschagneTest.csv","M Labuschagne")
batsmanMeanStrikeRate("labuschagneTest.csv","M Labuschagne")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsman4s("cgreenTest.csv","C Green")
batsman6s("cgreenTest.csv","C Green")
batsmanMeanStrikeRate("cgreenTest.csv","C Green")
batsman4s("khwajaTest.csv","Khwaja")
batsman6s("khwajaTest.csv","Khwaja")
batsmanMeanStrikeRate("khwajaTest.csv","Khwaja")
This plot shows a combined boxplot of the Runs ranges and a histog2ram of the Runs Frequency
Smith, Labuschagne has an average of 53+ since 2016!! Warner & Khwaja are at ~46
batsmanPerfBoxHist("stevesmithTest.csv","S Smith")
batsmanPerfBoxHist("warnerTest.csv","Warner")
batsmanPerfBoxHist("labuschagneTest.csv","M Labuschagne")
batsmanPerfBoxHist("cgreenTest.csv","C Green")
batsmanPerfBoxHist("khwajaTest.csv","Khwaja")
For the 2 functions below you will have to use the getPlayerDataSp() function. Australia has won matches when Smith, Warner and Khwaja have played well.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost("stevesmithsp.csv","S Smith")
## Warning in clean(file): NAs introduced by coercion
batsmanContributionWonLost("warnersp.csv","Warner")
## Warning in clean(file): NAs introduced by coercion
batsmanContributionWonLost("labuschagnesp.csv","M Labuschagne")
## Warning in clean(file): NAs introduced by coercion
batsmanContributionWonLost("cgreensp.csv","C Green")
batsmanContributionWonLost("khwajasp.csv","Khwaja")
This function also requires the use of getPlayerDataSp() as shown above. This can only be used for Test matches
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway("stevesmithsp.csv","S Smith")
## Warning in clean(file): NAs introduced by coercion
batsmanPerfHomeAway("warnersp.csv","Warner")
## Warning in clean(file): NAs introduced by coercion
batsmanPerfHomeAway("labuschagnesp.csv","M Labuschagne")
## Warning in clean(file): NAs introduced by coercion
batsmanPerfHomeAway("cgreensp.csv","C Green")
batsmanPerfHomeAway("khwajasp.csv","Khwaja")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("stevesmithTest.csv","S Smith")
batsmanAvgRunsGround("warnerTest.csv","Warner")
batsmanAvgRunsGround("labuschagneTest.csv","M Labuschagne")
batsmanAvgRunsGround("cgreenTest.csv","C Green")
batsmanAvgRunsGround("khwajaTest.csv","Khwaja")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("stevesmithTest.csv","S Smith")
batsmanAvgRunsOpposition("warnerTest.csv","Warner")
batsmanAvgRunsOpposition("labuschagneTest.csv","M Labuschagne")
batsmanAvgRunsOpposition("khwajaTest.csv","Khwaja")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanRunsLikelihood("stevesmithTest.csv","S Smith")
## Summary of S Smith 's runs scoring likelihood
## **************************************************
##
## There is a 58.76 % likelihood that S Smith will make 21 Runs in 38 balls over 56 Minutes
## There is a 24.74 % likelihood that S Smith will make 70 Runs in 148 balls over 210 Minutes
## There is a 16.49 % likelihood that S Smith will make 148 Runs in 268 balls over 398 Minutes
batsmanRunsLikelihood("warnerTest.csv","Warner")
## Summary of Warner 's runs scoring likelihood
## **************************************************
##
## There is a 7.22 % likelihood that Warner will make 155 Runs in 253 balls over 372 Minutes
## There is a 62.89 % likelihood that Warner will make 14 Runs in 21 balls over 32 Minutes
## There is a 29.9 % likelihood that Warner will make 65 Runs in 94 balls over 135 Minutes
batsmanRunsLikelihood("labuschagneTest.csv","M Labuschagne")
## Summary of M Labuschagne 's runs scoring likelihood
## **************************************************
##
## There is a 32.76 % likelihood that M Labuschagne will make 74 Runs in 144 balls over 206 Minutes
## There is a 55.17 % likelihood that M Labuschagne will make 22 Runs in 37 balls over 54 Minutes
## There is a 12.07 % likelihood that M Labuschagne will make 168 Runs in 297 balls over 420 Minutes
batsmanRunsLikelihood("khwajaTest.csv","Khwaja")
## Summary of Khwaja 's runs scoring likelihood
## **************************************************
##
## There is a 64.94 % likelihood that Khwaja will make 14 Runs in 29 balls over 42 Minutes
## There is a 27.27 % likelihood that Khwaja will make 79 Runs in 148 balls over 210 Minutes
## There is a 7.79 % likelihood that Khwaja will make 165 Runs in 351 balls over 515 Minutes
Smith and Warner’s moving average has been on a downward trend lately. Khwaja is playing well
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("stevesmith.csv","S Smith")
## Warning in clean(file): NAs introduced by coercion
batsmanMovingAverage("warner.csv","Warner")
## Warning in clean(file): NAs introduced by coercion
batsmanMovingAverage("labuschagne.csv","M Labuschagne")
## Warning in clean(file): NAs introduced by coercion
batsmanMovingAverage("khwaja.csv","Khwaja")
Labuschagne, SMith and Warner havwe very good cumulative average
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("stevesmithTest.csv","S Smith")
batsmanCumulativeAverageRuns("warnerTest.csv","Warner")
batsmanCumulativeAverageRuns("labuschagneTest.csv","M Labuschagne")
batsmanCumulativeAverageRuns("khwajaTest.csv","Khwaja")
Warner towers over the others in the cumulative strike rate, followed by Labuschagne and Smith
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("stevesmithTest.csv","S Smith")
batsmanCumulativeStrikeRate("warnerTest.csv","Warner")
batsmanCumulativeStrikeRate("labuschagneTest.csv","M Labuschagne")
batsmanCumulativeStrikeRate("khwajaTest.csv","Khwaja")
Here are plots that forecast how the batsman will perform in future. In this case 90% of the career runs trend is uses as the training set. the remaining 10% is the test set.
A Holt-Winters forecating model is used to forecast future performance based on the 90% training set. The forecated runs trend is plotted. The test set is also plotted to see how close the forecast and the actual matches
Take a look at the runs forecasted for the batsman below.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("stevesmithTest.csv","S Smith")
batsmanPerfForecast("warnerTest.csv","Warner")
batsmanPerfForecast("labuschagneTest.csv","M Labuschagne")
batsmanPerfForecast("khwajaTest.csv","Khwaja")
The plot below compares the Mean Strike Rate of the batsman for each of the runs ranges of 10 and plots them. The plot indicate the following
frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeBatsmanSR(frames,names)
The plot below gives the relative Runs Frequency Percetages for each 10 run bucket. The plot below show
frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeRunsFreqPerf(frames,names)
frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeBatsmanCumulativeAvgRuns(frames,names)
frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeBatsmanCumulativeStrikeRate(frames,names)
The below computation uses Null Hypothesis testing and p-value to determine if the batsman is in-form or out-of-form. For this 90% of the career runs is chosen as the population and the mean computed. The last 10% is chosen to be the sample set and the sample Mean and the sample Standard Deviation are caculated.
The Null Hypothesis (H0) assumes that the batsman continues to stay in-form where the sample mean is within 95% confidence interval of population mean The Alternative (Ha) assumes that the batsman is out of form the sample mean is beyond the 95% confidence interval of the population mean.
A significance value of 0.05 is chosen and p-value us computed If p-value >= .05 - Batsman In-Form If p-value < 0.05 - Batsman Out-of-Form
Note Ideally the p-value should be done for a population that follows the Normal Distribution. But the runs population is usually left skewed. So some correction may be needed. I will revisit this later
This is done for the Top 4 batsman
checkBatsmanInForm("stevesmith.csv","S Smith")
## Warning in clean(file): NAs introduced by coercion
## [1] "**************************** Form status of S Smith ****************************\n\n Population size: 144 Mean of population: 53.76 \n Sample size: 17 Mean of sample: 45.65 SD of sample: 56.4 \n\n Null hypothesis H0 : S Smith 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : S Smith 's sample average is below the 95% confidence interval of population average\n\n S Smith 's Form Status: In-Form because the p value: 0.280533 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("warner.csv","Warner")
## Warning in clean(file): NAs introduced by coercion
## [1] "**************************** Form status of Warner ****************************\n\n Population size: 164 Mean of population: 45.2 \n Sample size: 19 Mean of sample: 26.63 SD of sample: 44.62 \n\n Null hypothesis H0 : Warner 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Warner 's sample average is below the 95% confidence interval of population average\n\n Warner 's Form Status: Out-of-Form because the p value: 0.042744 is less than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("labuschagne.csv","M Labuschagne")
## Warning in clean(file): NAs introduced by coercion
## [1] "**************************** Form status of M Labuschagne ****************************\n\n Population size: 52 Mean of population: 59.56 \n Sample size: 6 Mean of sample: 29.67 SD of sample: 19.96 \n\n Null hypothesis H0 : M Labuschagne 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : M Labuschagne 's sample average is below the 95% confidence interval of population average\n\n M Labuschagne 's Form Status: Out-of-Form because the p value: 0.005239 is less than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("khwaja.csv","Khwaja")
## [1] "**************************** Form status of Khwaja ****************************\n\n Population size: 89 Mean of population: 41.62 \n Sample size: 10 Mean of sample: 53.1 SD of sample: 76.34 \n\n Null hypothesis H0 : Khwaja 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Khwaja 's sample average is below the 95% confidence interval of population average\n\n Khwaja 's Form Status: In-Form because the p value: 0.677691 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.
BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
ssmith1 <- batsmanRunsPredict("stevesmith.csv","S Smith",newdataframe=newDF)
## Warning in clean(file): NAs introduced by coercion
warner1 <- batsmanRunsPredict("warner.csv","Warner",newdataframe=newDF)
## Warning in clean(file): NAs introduced by coercion
khwaja1 <- batsmanRunsPredict("khwaja.csv","Khwaja",newdataframe=newDF)
labuschagne1 <- batsmanRunsPredict("labuschagne.csv","Labuschagne",newdataframe=newDF)
## Warning in clean(file): NAs introduced by coercion
cgreen1 <- batsmanRunsPredict("cgreen.csv","C Green",newdataframe=newDF)
batsmen <-cbind(round(ssmith1$Runs),round(warner1$Runs),round(khwaja1$Runs),round(labuschagne1$Runs),round(cgreen1$Runs))
colnames(batsmen) <- c("S Smith","Warner","Khwaja","Labuschagne","C Green")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
## BallsFaced MinsAtCrease S Smith Warner Khwaja Labuschagne C Green
## 1 10 30 7 10 10 9 13
## 2 38 71 23 30 24 24 29
## 3 66 111 38 50 38 40 44
## 4 94 152 53 70 53 55 60
## 5 121 193 69 90 67 70 75
## 6 149 234 84 110 81 85 91
## 7 177 274 100 130 95 100 106
## 8 205 315 115 150 109 116 122
## 9 233 356 130 170 123 131 137
## 10 261 396 146 190 137 146 153
## 11 289 437 161 210 151 161 168
## 12 316 478 177 230 165 176 184
## 13 344 519 192 250 179 192 199
## 14 372 559 207 270 193 207 215
## 15 400 600 223 290 207 222 230
Labuschagne, Smith and C Green have good records against India
frames <- list("stevesmithTestInd.csv","warnerTestInd.csv","khwajaTestInd.csv","labuschagneTestInd.csv","cgreenTestInd.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeBatsmanCumulativeAvgRuns(frames,names)
Warner, Labuschagne and Smith have a good strike rate against India
frames <- list("stevesmithTestInd.csv","warnerTestInd.csv","khwajaTestInd.csv","labuschagneTestInd.csv","cgreenTestInd.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeBatsmanCumulativeStrikeRate(frames,names)
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("shamiTest.csv","Shami")
bowlerWktsFreqPercent("sirajTest.csv","Siraj")
bowlerWktsFreqPercent("ashwinTest.csv","Ashwin")
bowlerWktsFreqPercent("jadejaTest.csv","Jadeja")
bowlerWktsFreqPercent("shardulTest.csv","Shardul")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("shamiTest.csv","Shami")
bowlerWktsRunsPlot("sirajTest.csv","Siraj")
bowlerWktsRunsPlot("ashwinTest.csv","Ashwin")
bowlerWktsRunsPlot("jadejaTest.csv","Jadeja")
bowlerWktsRunsPlot("shardulTest.csv","Shardul")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("shamiTest.csv","Shami")
bowlerAvgWktsGround("sirajTest.csv","Siraj")
bowlerAvgWktsGround("ashwinTest.csv","Ashwin")
bowlerAvgWktsGround("jadejaTest.csv","Jadeja")
bowlerAvgWktsGround("shardulTest.csv","Shardul")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerAvgWktsOpposition("shamiTest.csv","Shami")
bowlerAvgWktsOpposition("sirajTest.csv","Siraj")
bowlerAvgWktsOpposition("ashwinTest.csv","Ashwin")
bowlerAvgWktsOpposition("jadejaTest.csv","Jadeja")
bowlerAvgWktsOpposition("shardulTest.csv","Shardul")
Ashwin’s performance has dropped over the years, while Siraj has been becoming better
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgWickets("shamiTest.csv","Shami")
bowlerCumulativeAvgWickets("sirajTest.csv","Siraj")
bowlerCumulativeAvgWickets("ashwinTest.csv","Ashwin")
bowlerCumulativeAvgWickets("jadejaTest.csv","Jadeja")
bowlerCumulativeAvgWickets("shardulTest.csv","Shardul")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgEconRate("shamiTest.csv","Shami")
bowlerCumulativeAvgEconRate("sirajTest.csv","Siraj")
bowlerCumulativeAvgEconRate("ashwinTest.csv","Ashwin")
bowlerCumulativeAvgEconRate("jadejaTest.csv","Jadeja")
bowlerCumulativeAvgEconRate("shardulTest.csv","Shardul")
Here are plots that forecast how the bowler will perform in future. In this case 90% of the career wickets trend is used as the training set. the remaining 10% is the test set.
A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted wickets trend is plotted. The test set is also plotted to see how close the forecast and the actual matches
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerPerfForecast("shamiTest.csv","Shami")
#bowlerPerfForecast("sirajTest.csv","Siraj")
bowlerPerfForecast("ashwinTest.csv","Ashwin")
bowlerPerfForecast("jadejaTest.csv","Jadeja")
bowlerPerfForecast("shardulTest.csv","Shardul")
frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlingPerf(frames,names)
frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlingER(frames,names)
Ashwin has the highest wickets followed by Jadeja against all teams
frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlerCumulativeAvgWickets(frames,names)
Jadeja has the best economy rate followed by Ashwin
frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlerCumulativeAvgEconRate(frames,names)
The below computation uses Null Hypothesis testing and p-value to determine if the bowler is in-form or out-of-form. For this 90% of the career wickets is chosen as the population and the mean computed. The last 10% is chosen to be the sample set and the sample Mean and the sample Standard Deviation are caculated.
The Null Hypothesis (H0) assumes that the bowler continues to stay in-form where the sample mean is within 95% confidence interval of population mean The Alternative (Ha) assumes that the bowler is out of form the sample mean is beyond the 95% confidence interval of the population mean.
A significance value of 0.05 is chosen and p-value us computed If p-value >= .05 - Batsman In-Form If p-value < 0.05 - Batsman Out-of-Form
Note Ideally the p-value should be done for a population that follows the Normal Distribution. But the runs population is usually left skewed. So some correction may be needed. I will revisit this later
Note: The check for the form status of the bowlers indicate
checkBowlerInForm("shami.csv","Shami")
## [1] "**************************** Form status of Shami ****************************\n\n Population size: 106 Mean of population: 1.93 \n Sample size: 12 Mean of sample: 1.33 SD of sample: 1.23 \n\n Null hypothesis H0 : Shami 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Shami 's sample average is below the 95% confidence\n interval of population average\n\n Shami 's Form Status: In-Form because the p value: 0.058427 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("siraj.csv","Siraj")
## [1] "**************************** Form status of Siraj ****************************\n\n Population size: 29 Mean of population: 1.59 \n Sample size: 4 Mean of sample: 0.25 SD of sample: 0.5 \n\n Null hypothesis H0 : Siraj 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Siraj 's sample average is below the 95% confidence\n interval of population average\n\n Siraj 's Form Status: Out-of-Form because the p value: 0.002923 is less than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("ashwin.csv","Ashwin")
## [1] "**************************** Form status of Ashwin ****************************\n\n Population size: 154 Mean of population: 2.77 \n Sample size: 18 Mean of sample: 2.44 SD of sample: 1.76 \n\n Null hypothesis H0 : Ashwin 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Ashwin 's sample average is below the 95% confidence\n interval of population average\n\n Ashwin 's Form Status: In-Form because the p value: 0.218345 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("jadeja.csv","Jadeja")
## [1] "**************************** Form status of Jadeja ****************************\n\n Population size: 108 Mean of population: 2.22 \n Sample size: 12 Mean of sample: 1.92 SD of sample: 2.35 \n\n Null hypothesis H0 : Jadeja 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Jadeja 's sample average is below the 95% confidence\n interval of population average\n\n Jadeja 's Form Status: In-Form because the p value: 0.333095 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("shardul.csv","Shardul")
## [1] "**************************** Form status of Shardul ****************************\n\n Population size: 13 Mean of population: 2 \n Sample size: 2 Mean of sample: 0.5 SD of sample: 0.71 \n\n Null hypothesis H0 : Shardul 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Shardul 's sample average is below the 95% confidence\n interval of population average\n\n Shardul 's Form Status: Out-of-Form because the p value: 0.04807 is less than alpha= 0.05 \n *******************************************************************************************\n\n"
Against Australia specifically Jadeja has the best record followed by Ashwin
frames <- list("shamiTestAus.csv","sirajTestAus.csv","ashwinTestAus.csv","jadejaTestAus.csv","shardulTestAus.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlerCumulativeAvgWickets(frames,names)
Jadeja has the best economy followed by Siraj, then Ashwin
frames <- list("shamiTestAus.csv","sirajTestAus.csv","ashwinTestAus.csv","jadejaTestAus.csv","shardulTestAus.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlerCumulativeAvgEconRate(frames,names)
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("cumminsTest.csv","Cummins")
bowlerWktsFreqPercent("starcTest.csv","Starc")
bowlerWktsFreqPercent("hazzlewoodTest.csv","Hazzlewood")
bowlerWktsFreqPercent("todd.csv","Todd")
bowlerWktsFreqPercent("lyonTest.csv","N Lyon")
### 8b. Wickets frequency chart
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("cumminsTest.csv","Cummins")
bowlerWktsRunsPlot("starcTest.csv","Starc")
bowlerWktsRunsPlot("hazzlewoodTest.csv","Hazzlewood")
bowlerWktsRunsPlot("todd.csv","Todd")
bowlerWktsRunsPlot("lyonTest.csv","N Lyon")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("cumminsTest.csv","Cummins")
bowlerAvgWktsGround("starcTest.csv","Starc")
bowlerAvgWktsGround("hazzlewoodTest.csv","Hazzlewood")
bowlerAvgWktsGround("todd.csv","Todd")
bowlerAvgWktsGround("lyonTest.csv","N Lyon")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerAvgWktsOpposition("cumminsTest.csv","Cummins")
bowlerAvgWktsOpposition("starcTest.csv","Starc")
bowlerAvgWktsOpposition("hazzlewoodTest.csv","Hazzlewood")
bowlerAvgWktsOpposition("todd.csv","Todd")
bowlerAvgWktsOpposition("lyonTest.csv","N Lyon")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgWickets("cumminsTest.csv","Cummins")
bowlerCumulativeAvgWickets("starcTest.csv","Starc")
bowlerCumulativeAvgWickets("hazzlewoodTest.csv","Hazzlewood")
bowlerCumulativeAvgWickets("todd.csv","Todd")
bowlerCumulativeAvgWickets("lyonTest.csv","N Lyon")
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgEconRate("cumminsTest.csv","Cummins")
bowlerCumulativeAvgEconRate("starcTest.csv","Starc")
bowlerCumulativeAvgEconRate("hazzlewoodTest.csv","Hazzlewood")
bowlerCumulativeAvgEconRate("todd.csv","Todd")
bowlerCumulativeAvgEconRate("lyonTest.csv","N Lyon")
Here are plots that forecast how the bowler will perform in future. In this case 90% of the career wickets trend is used as the training set. the remaining 10% is the test set.
A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecated wickets trend is plotted. The test set is also plotted to see how close the forecast and the actual matches
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerPerfForecast("cumminsTest.csv","Cummins")
bowlerPerfForecast("starcTest.csv","Starc")
bowlerPerfForecast("hazzlewoodTest.csv","Hazzlewood")
bowlerPerfForecast("lyonTest.csv","N Lyon")
frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
relativeBowlingPerf(frames,names)
frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
relativeBowlingER(frames,names)
Cummins, Starc and Lyons are the best performers
frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
relativeBowlerCumulativeAvgWickets(frames,names)
Hazzlewood, Cummins have the best economy against all oppostion
frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
relativeBowlerCumulativeAvgEconRate(frames,names)
The below computation uses Null Hypothesis testing and p-value to determine if the bowler is in-form or out-of-form. For this 90% of the career wickets is chosen as the population and the mean computed. The last 10% is chosen to be the sample set and the sample Mean and the sample Standard Deviation are caculated.
The Null Hypothesis (H0) assumes that the bowler continues to stay in-form where the sample mean is within 95% confidence interval of population mean The Alternative (Ha) assumes that the bowler is out of form the sample mean is beyond the 95% confidence interval of the population mean.
A significance value of 0.05 is chosen and p-value us computed If p-value >= .05 - Batsman In-Form If p-value < 0.05 - Batsman Out-of-Form
Note Ideally the p-value should be done for a population that follows the Normal Distribution. But the runs population is usually left skewed. So some correction may be needed. I will revisit this later
Note: The check for the form status of the bowlers indicate
checkBowlerInForm("cummins.csv","Cummins")
## [1] "**************************** Form status of Cummins ****************************\n\n Population size: 81 Mean of population: 2.46 \n Sample size: 9 Mean of sample: 2 SD of sample: 1.5 \n\n Null hypothesis H0 : Cummins 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Cummins 's sample average is below the 95% confidence\n interval of population average\n\n Cummins 's Form Status: In-Form because the p value: 0.190785 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("starc.csv","Starc")
## [1] "**************************** Form status of Starc ****************************\n\n Population size: 126 Mean of population: 2.18 \n Sample size: 15 Mean of sample: 1.67 SD of sample: 1.18 \n\n Null hypothesis H0 : Starc 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Starc 's sample average is below the 95% confidence\n interval of population average\n\n Starc 's Form Status: In-Form because the p value: 0.057433 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("hazzlewood.csv","Hazzlewood")
## [1] "**************************** Form status of Hazzlewood ****************************\n\n Population size: 99 Mean of population: 2.04 \n Sample size: 12 Mean of sample: 1.67 SD of sample: 1.5 \n\n Null hypothesis H0 : Hazzlewood 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Hazzlewood 's sample average is below the 95% confidence\n interval of population average\n\n Hazzlewood 's Form Status: In-Form because the p value: 0.204787 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("lyon.csv","N Lyon")
## [1] "**************************** Form status of N Lyon ****************************\n\n Population size: 193 Mean of population: 2.08 \n Sample size: 22 Mean of sample: 2.95 SD of sample: 1.96 \n\n Null hypothesis H0 : N Lyon 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : N Lyon 's sample average is below the 95% confidence\n interval of population average\n\n N Lyon 's Form Status: In-Form because the p value: 0.975407 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
Against India Lyon, Cummins and Hazzlewood have performed well
frames <- list("cumminsTestInd.csv","starcTestInd.csv","hazzlewoodTestInd.csv","lyonTestInd.csv")
names <- list("Cummins","Starc","Hazzlewood","N Lyon")
relativeBowlerCumulativeAvgWickets(frames,names)
Hazzlewood, Lyon have a good economy rate against India
frames <- list("cumminsTestInd.csv","starcTestInd.csv","hazzlewoodTestInd.csv","lyonTestInd.csv")
names <- list("Cummins","Starc","Hazzlewood","N Lyon")
relativeBowlerCumulativeAvgEconRate(frames,names)
#The data for India & Australia teams were obtained with the following calls
#indiaTest <-getTeamDataHomeAway(dir=".",teamView="bat",matchType="Test",file="indiaTest.csv",save=TRUE,teamName="India")
#australiaTest <- getTeamDataHomeAway(matchType="Test",file="australiaTest.csv",save=TRUE,teamName="Australia")
Against Australia India has won 17 times, lost 60 and drawn 22 in Australia. At home India won 42, tied 2, lost 28 and drawn 24
teamWinLossStatusVsOpposition("indiaTest.csv",teamName="India",opposition=c("all"),homeOrAway=c("all"),matchType="Test",plot=TRUE)
teamWinLossStatusVsOpposition("australiaTest.csv",teamName="Australia",opposition=c("all"),homeOrAway=c("all"),matchType="Test",plot=TRUE)
Against Australia India has won 17 times, lost 60 and drawn 22 in Australia. At home India won 42, tied 2, lost 28 and drawn 24
teamWinLossStatusVsOpposition("indiaTest.csv",teamName="India",opposition=c("Australia"),homeOrAway=c("all"),matchType="Test",plot=TRUE)
At the Oval where WTC is going to be held India has won 4, lost 10 and drawn 10.
teamWinLossStatusAtGrounds("indiaTest.csv",teamName="India",opposition=c("all"),homeOrAway=c("away"),matchType="Test",plot=TRUE)
plotTimelineofWinsLosses("indiaTest.csv",team="India",opposition=c("Australia"),
homeOrAway=c("away","neutral"), startDate="2016-01-01",endDate="2023-05-01")
The above analysis performs various analysis of India and Australia. While we know the performance of the player at India or Australia, we cannot judge how the match will progress in the swinging conditions of the Oval. Let us hope for a good match!
Feel free to try out your own analysis with cricketr. Have fun with cricketr!!