Full many a gem of purest ray serene,
The dark unfathomed caves of ocean bear;
Full many a flower is born to blush unseen,
And waste its sweetness on the desert air.
Thomas Gray, An Elegy Written In A Country Churchyard
It is finally here! cricpy, the Python avatar of my R package cricketr, is now ready to rock-n-roll! My R package cricketr had its genesis a little over 3 years ago and went through a couple of enhancements. During this time I have always thought about creating an equivalent Python package. Now I have finally done it.
So here it is: my Python package ‘cricpy’!!!
This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports only Test cricket.
You should be able to install the package using pip install cricpy and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.
This post is also hosted on Rpubs at Introducing cricpy. You can also download the PDF version of this post at cricpy.pdf
Do check out my post on R package cricketr at Re-introducing cricketr! : An R package to analyze performances of cricketers
The cricpy package has several functions that perform different analyses on both batsmen and bowlers. These include functions that plot the percentage frequency of runs or wickets, the runs likelihood for a batsman, the relative runs/strike rates of batsmen and the relative performance/economy rates of bowlers.
Other interesting functions include batting performance moving average, forecasting, performance of a player against different oppositions, contribution to wins and losses etc.
The data for a particular player can be obtained with the getPlayerData() function. To do this, you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Rahul Dravid, Virat Kohli, Alastair Cook etc. This will bring up a page which has the profile number for the player, e.g. for Rahul Dravid this would be http://www.espncricinfo.com/india/content/player/28114.html. Hence, Dravid’s profile is 28114. This can be used to get the data for Rahul Dravid as shown below.
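The profile number is simply the digits before `.html` in the player URL. The helper below is a hypothetical convenience function (it is not part of cricpy) that extracts it with a regular expression, assuming the URL follows the `.../player/<number>.html` pattern shown above; ESPN Cricinfo's URL scheme may have changed since.

```python
import re

def profile_id_from_url(url):
    """Return the numeric profile ID from a URL like '.../player/28114.html'.

    Hypothetical helper, not part of cricpy; assumes the URL pattern used in this post.
    """
    match = re.search(r"/player/(\d+)\.html", url)
    if match is None:
        raise ValueError(f"no profile ID found in {url!r}")
    return int(match.group(1))

dravid_id = profile_id_from_url("http://www.espncricinfo.com/india/content/player/28114.html")
print(dravid_id)  # 28114
```

The returned integer is what getPlayerData() takes as its profile argument.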
The cricpy package is now available with pip install cricpy!!!
The cricpy package is almost a clone of my R package cricketr. The signatures of all the Python functions are identical to those of its clone ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact, if you are familiar with one of the languages, you can look up the package in the other and you will notice the parallel constructs. You can fork/clone the package at Github cricpy
The following 2 examples show the similarity between the cricketr and cricpy packages.
Importing cricketr in R
#install.packages("cricketr")
library(cricketr)
# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy
# You could either do
#1.
import cricpy.analytics as ca
#ca.batsman4s("../dravid.csv","Rahul Dravid")
# Or
#2.
from cricpy.analytics import *
#batsman4s("../dravid.csv","Rahul Dravid")
You can see how the 2 calls are identical for both the R package cricketr and the Python package cricpy.
library(cricketr)
batsman4s("../dravid.csv","Rahul Dravid")
import cricpy.analytics as ca
ca.batsman4s("../dravid.csv","Rahul Dravid")
#help("getPlayerData")
help(ca.getPlayerData)
## Help on function getPlayerData in module cricpy.analytics:
##
## getPlayerData(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2], result=[1, 2, 4], create=True)
## Get the player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory
##
## Description
##
## Get the player data given the profile of the batsman. The allowed inputs are home,away or both and won,lost or draw of matches. The data is stored in a <player>.csv file in a directory specified. This function also returns a data frame of the player
##
## Usage
##
## getPlayerData(profile,opposition="",host="",dir="./data",file="player001.csv",
## type="batting", homeOrAway=c(1,2),result=c(1,2,4))
## Arguments
##
## profile
## This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Sachin Tendulkar this turns out to be http://www.espncricinfo.com/india/content/player/35320.html. Hence the profile for Sachin is 35320
## opposition
## The numerical value of the opposition country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5,Pakistan:7,South Africa:3,Sri Lanka:8, West Indies:4, Zimbabwe:9
## host
## The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5,Pakistan:7,South Africa:3,Sri Lanka:8, West Indies:4, Zimbabwe:9
## dir
## Name of the directory to store the player data into. If not specified the data is stored in a default directory "./data". Default="./data"
## file
## Name of the file to store the data into for e.g. tendulkar.csv. This can be used for subsequent functions. Default="player001.csv"
## type
## type of data required. This can be "batting" or "bowling"
## homeOrAway
## This is a list with either 1,2 or both. 1 is for home 2 is for away
## result
## This is a list that can take values 1,2,4. 1 - won match 2- lost match 4- draw
## Details
##
## More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##
## Value
##
## Returns the player's dataframe
##
## Note
##
## Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##
## Author(s)
##
## Tinniam V Ganesh
##
## References
##
## http://www.espncricinfo.com/ci/content/stats/index.html
## https://gigadom.wordpress.com/
##
## See Also
##
## getPlayerDataSp
##
## Examples
##
## ## Not run:
## # Both home and away. Result = won,lost and drawn
## tendulkar = getPlayerData(35320,dir=".", file="tendulkar1.csv",
## type="batting", homeOrAway=[1,2],result=[1,2,4])
##
## # Only away. Get data only for won and lost innings
## tendulkar = getPlayerData(35320,dir=".", file="tendulkar2.csv",
## type="batting",homeOrAway=[2],result=[1,2])
##
## # Get bowling data and store in file for future
## kumble = getPlayerData(30176,dir=".",file="kumble1.csv",
## type="bowling",homeOrAway=[1],result=[1,2])
##
## #Get the Tendulkar's Performance against Australia in Australia
## tendulkar = getPlayerData(35320, opposition = 2,host=2,dir=".",
## file="tendulkarVsAusInAus.csv",type="batting")
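The opposition and host arguments take the numeric country codes listed in the help text. A small lookup table avoids having to remember them; the dictionary and helper below are hypothetical (not part of cricpy), with the codes copied from the help text above.

```python
# Country codes as listed in the getPlayerData() help text (illustrative helper, not cricpy API)
COUNTRY_CODES = {
    "Australia": 2, "Bangladesh": 25, "England": 1, "India": 6,
    "New Zealand": 5, "Pakistan": 7, "South Africa": 3,
    "Sri Lanka": 8, "West Indies": 4, "Zimbabwe": 9,
}

def country_code(name):
    """Return the numeric code for a country name, with a helpful error for typos."""
    try:
        return COUNTRY_CODES[name]
    except KeyError:
        raise KeyError(f"unknown country {name!r}; expected one of {sorted(COUNTRY_CODES)}") from None

# e.g. Tendulkar against Australia in Australia, as in the last help example:
# tendulkar = ca.getPlayerData(35320, opposition=country_code("Australia"),
#                              host=country_code("Australia"), dir=".",
#                              file="tendulkarVsAusInAus.csv", type="batting")
```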
The details below will introduce the different functions that are available in cricpy.
Important note: This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (e.g. dravid.csv as above), which can then be reused for all other functions. Once we have the data for the players, many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData for all subsequent analyses.
import cricpy.analytics as ca
# Uncomment to download the data; this only needs to be done once per player
#dravid =ca.getPlayerData(28114,dir="..",file="dravid.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
#acook =ca.getPlayerData(11728,dir="..",file="acook.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
#lara =ca.getPlayerData(52337,dir="..",file="lara.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
#kohli =ca.getPlayerData(253802,dir="..",file="kohli.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
The 3 plots below show the runs frequency, the mean strike rate and the runs ranges for Rahul Dravid.
import cricpy.analytics as ca
import matplotlib.pyplot as plt
ca.batsmanRunsFreqPerf("../dravid.csv","Rahul Dravid")
ca.batsmanMeanStrikeRate("../dravid.csv","Rahul Dravid")
ca.batsmanRunsRanges("../dravid.csv","Rahul Dravid")
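The runs frequency plot is essentially a histogram expressed as percentages. Here is a minimal sketch of that computation, assuming 10-run buckets (the bucket width cricpy uses is not stated in this post) and scores already cleaned to integers. Statsguru marks not-out scores with a `*` and non-batting innings with entries such as DNB, which would need handling first.

```python
from collections import Counter

def runs_frequency_percent(runs, width=10):
    """Percentage of innings falling in each bucket [k*width, (k+1)*width).

    Returns {bucket lower bound: percentage}; empty buckets are omitted.
    """
    if not runs:
        return {}
    counts = Counter((r // width) * width for r in runs)
    total = len(runs)
    return {lower: 100.0 * n / total for lower, n in sorted(counts.items())}

# Hypothetical scores, not Dravid's actual data
scores = [0, 4, 12, 18, 25, 33, 47, 52, 61, 148]
print(runs_frequency_percent(scores))
# {0: 20.0, 10: 20.0, 20: 10.0, 30: 10.0, 40: 10.0, 50: 10.0, 60: 10.0, 140: 10.0}
```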
import cricpy.analytics as ca
ca.batsman4s("../dravid.csv","Rahul Dravid")
ca.batsman6s("../dravid.csv","Rahul Dravid")
ca.batsmanDismissals("../dravid.csv","Rahul Dravid")
The plots below show a 3D scatter plot of Dravid’s Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease.
import cricpy.analytics as ca
ca.battingPerf3d("../dravid.csv","Rahul Dravid")
The plot below gives the average runs scored by Dravid at different grounds. The plot also shows the number of innings at each ground as a label on the x-axis. It can be seen that Dravid did great in Rawalpindi, Leeds and Georgetown overseas, and in Mohali and Bangalore at home.
import cricpy.analytics as ca
ca.batsmanAvgRunsGround("../dravid.csv","Rahul Dravid")
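Per-ground averages (and, by swapping the grouping key, per-opposition averages) boil down to a group-by. The sketch below assumes innings records with 'Ground', 'Opposition' and 'Runs' fields, named after the Statsguru columns but not checked against cricpy's CSV. It computes the simple mean of runs per innings together with the innings count used for the x-axis labels. That mean is not the official batting average (runs per dismissal), and which of the two cricpy plots is an assumption here.

```python
from collections import defaultdict

def mean_runs_by(innings, key):
    """Group innings by `key` and return {group: (mean runs per innings, innings count)}."""
    groups = defaultdict(list)
    for inn in innings:
        groups[inn[key]].append(inn["Runs"])
    return {g: (sum(r) / len(r), len(r)) for g, r in groups.items()}

# Hypothetical innings records (field names assumed, values made up)
innings = [
    {"Ground": "Leeds", "Opposition": "England", "Runs": 148},
    {"Ground": "Leeds", "Opposition": "England", "Runs": 20},
    {"Ground": "Mohali", "Opposition": "Australia", "Runs": 45},
]
print(mean_runs_by(innings, "Ground"))      # {'Leeds': (84.0, 2), 'Mohali': (45.0, 1)}
print(mean_runs_by(innings, "Opposition"))  # {'England': (84.0, 2), 'Australia': (45.0, 1)}
```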
This plot computes the average runs scored by Dravid against different countries. Dravid has an average of 50+ in England, New Zealand, West Indies and Zimbabwe.
import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("../dravid.csv","Rahul Dravid")
The plot below shows the Runs Likelihood for a batsman. For this, Dravid’s performance is plotted as a 3D scatter plot of Runs versus Balls Faced + Minutes at crease. K-Means is then used to compute the centroids of 3 clusters, which are plotted and indicate Dravid’s most likely scoring tendencies.
import cricpy.analytics as ca
ca.batsmanRunsLikelihood("../dravid.csv","Rahul Dravid")
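K-Means alternates two steps: assign each innings to its nearest centroid, then move each centroid to the mean of its innings. The post does not say which K-Means implementation cricpy uses; the sketch below is a plain-Python version of Lloyd's algorithm with deterministic starting centroids so the result is reproducible. Note that unscaled Euclidean distances are dominated by the feature with the largest range (here Minutes at crease), so standardising the columns first is a common choice.

```python
import math

def kmeans(points, k, init=None, iters=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples, e.g. (BF, Mins, Runs).

    init: optional starting centroids; defaults to the first k points.
    Returns the final centroids as a list of tuples.
    """
    centroids = [tuple(c) for c in (init if init is not None else points[:k])]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assignment step: nearest centroid by Euclidean distance
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: the assignments can no longer change
        centroids = updated
    return centroids

# Hypothetical (BF, Mins, Runs) innings, not Dravid's data
innings = [(10, 15, 5), (120, 160, 60), (300, 400, 150),
           (12, 17, 7), (118, 150, 58), (310, 420, 160)]
print(sorted(kmeans(innings, 3)))
# [(11.0, 16.0, 6.0), (119.0, 155.0, 59.0), (305.0, 410.0, 155.0)]
```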
The following batsmen have been very prolific in Test cricket and will be used for the analyses: Rahul Dravid, Alastair Cook, Brian Lara and Virat Kohli.
The following plots take a closer look at their performances. The box plots show the median and the 1st and 3rd quartiles of the runs.
This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency
import cricpy.analytics as ca
ca.batsmanPerfBoxHist("../dravid.csv","Rahul Dravid")
ca.batsmanPerfBoxHist("../acook.csv","Alastair Cook")
ca.batsmanPerfBoxHist("../lara.csv","Brian Lara")
ca.batsmanPerfBoxHist("../kohli.csv","Virat Kohli")
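The box in a box plot spans the 1st to 3rd quartile, with the median marked inside. A minimal sketch of that five-number summary using only the standard library follows; `method="inclusive"` interpolates linearly between data points, which matches NumPy's default percentile method. Whether cricpy's plotting library uses exactly this convention is an assumption.

```python
import statistics

def box_summary(runs):
    """Min, 1st quartile, median, 3rd quartile and max of a list of scores (at least 2)."""
    q1, median, q3 = statistics.quantiles(runs, n=4, method="inclusive")
    return {"min": min(runs), "q1": q1, "median": median, "q3": q3, "max": max(runs)}

print(box_summary([0, 10, 20, 30, 40]))
# {'min': 0, 'q1': 10.0, 'median': 20.0, 'q3': 30.0, 'max': 40}
```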
The plot below shows the contribution of Dravid, Cook, Lara and Kohli in matches won and lost. It can be seen that in matches India won, Dravid and Kohli scored more and must have been instrumental in the wins.
For the 2 functions below you will have to use the getPlayerDataSp() function as shown below. I have commented these calls out as I already have these files.
import cricpy.analytics as ca
#dravidsp = ca.getPlayerDataSp(28114,tdir=".",tfile="dravidsp.csv",ttype="batting")
#acooksp = ca.getPlayerDataSp(11728,tdir=".",tfile="acooksp.csv",ttype="batting")
#larasp = ca.getPlayerDataSp(52337,tdir=".",tfile="larasp.csv",ttype="batting")
#kohlisp = ca.getPlayerDataSp(253802,tdir=".",tfile="kohlisp.csv",ttype="batting")
import cricpy.analytics as ca
ca.batsmanContributionWonLost("../dravidsp.csv","Rahul Dravid")
ca.batsmanContributionWonLost("../acooksp.csv","Alastair Cook")
ca.batsmanContributionWonLost("../larasp.csv","Brian Lara")
ca.batsmanContributionWonLost("../kohlisp.csv","Virat Kohli")
From the plots below it can be seen that Dravid has a higher median overseas than at home, while Cook, Lara and Kohli have a lower median of runs overseas than at home.
This function also requires the use of getPlayerDataSp() as shown above
import cricpy.analytics as ca
ca.batsmanPerfHomeAway("../dravidsp.csv","Rahul Dravid")
ca.batsmanPerfHomeAway("../acooksp.csv","Alastair Cook")
ca.batsmanPerfHomeAway("../larasp.csv","Brian Lara")
ca.batsmanPerfHomeAway("../kohlisp.csv","Virat Kohli")
Take a look at the moving average across the careers of the top 4 (ignore the dip at the end of all the plots; I need to check why this is so!). Lara’s performance seems to have been quite good before his retirement (I wonder why he retired so early!). Kohli’s performance has been steadily improving over the years.
import cricpy.analytics as ca
ca.batsmanMovingAverage("../dravid.csv","Rahul Dravid")
ca.batsmanMovingAverage("../acook.csv","Alastair Cook")
ca.batsmanMovingAverage("../lara.csv","Brian Lara")
ca.batsmanMovingAverage("../kohli.csv","Virat Kohli")
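The post does not specify the window or smoothing method behind batsmanMovingAverage. As an illustration only, here is a simple moving average over consecutive innings with an arbitrary window of 10. It averages only full windows, so it introduces no padding at either end of a career.

```python
def moving_average(values, window=10):
    """Simple moving average of consecutive innings; returns len(values) - window + 1 points."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window forward by one innings
        averages.append(total / window)
    return averages

print(moving_average([10, 20, 30, 40, 50], window=3))  # [20.0, 30.0, 40.0]
```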
This function provides the cumulative average runs of the batsman over the career. Dravid averages around 48, Cook around 44, Lara around 50 and Kohli shows a steady improvement in his cumulative average. Kohli seems to be getting better with time.
import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("../dravid.csv","Rahul Dravid")
ca.batsmanCumulativeAverageRuns("../acook.csv","Alastair Cook")
ca.batsmanCumulativeAverageRuns("../lara.csv","Brian Lara")
ca.batsmanCumulativeAverageRuns("../kohli.csv","Virat Kohli")
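A cumulative average is the running mean after each innings. The sketch below assumes it is computed per innings. That definition counts not-out innings as if they were completed, so it sits at or below the official batting average (runs per dismissal). Which definition cricpy uses is an assumption here. The same function applies unchanged to the cumulative average wickets of the bowlers.

```python
from itertools import accumulate

def cumulative_average(values):
    """Running mean: sum of the first i values divided by i, for each i."""
    return [total / i for i, total in enumerate(accumulate(values), start=1)]

print(cumulative_average([50, 10, 60]))  # [50.0, 30.0, 40.0]
```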
Lara has a terrific strike rate of 52+. Cook has a better strike rate than Dravid. Kohli’s strike rate has improved over the years.
import cricpy.analytics as ca
ca.batsmanCumulativeStrikeRate("../dravid.csv","Rahul Dravid")
ca.batsmanCumulativeStrikeRate("../acook.csv","Alastair Cook")
ca.batsmanCumulativeStrikeRate("../lara.csv","Brian Lara")
ca.batsmanCumulativeStrikeRate("../kohli.csv","Virat Kohli")
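Strike rate is 100 × runs / balls faced. A career (cumulative) strike rate can be computed from running totals, as sketched below, or as a running mean of per-innings strike rates. The two differ: for innings of 50 off 100 balls and 70 off 60 balls they give 75.0 versus about 83.3. The post does not say which one cricpy uses.

```python
from itertools import accumulate

def cumulative_strike_rate(runs, balls):
    """Career strike rate after each innings: 100 * total runs / total balls faced.

    Reports 0.0 until at least one ball has been faced.
    """
    if len(runs) != len(balls):
        raise ValueError("runs and balls must have the same length")
    return [100.0 * r / b if b else 0.0
            for r, b in zip(accumulate(runs), accumulate(balls))]

print(cumulative_strike_rate([50, 70], [100, 60]))  # [50.0, 75.0]
```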
Future Runs forecast: Here are plots that forecast how the batsman will perform in the future. Currently ARIMA is used for the forecast.
import cricpy.analytics as ca
ca.batsmanPerfForecast("../dravid.csv","Rahul Dravid")
## ARIMA Model Results
## ==============================================================================
## Dep. Variable: D.runs No. Observations: 284
## Model: ARIMA(5, 1, 0) Log Likelihood -1522.837
## Method: css-mle S.D. of innovations 51.488
## Date: Mon, 29 Oct 2018 AIC 3059.673
## Time: 19:16:22 BIC 3085.216
## Sample: 07-04-1996 HQIC 3069.914
## - 01-24-2012
## ================================================================================
## coef std err z P>|z| [0.025 0.975]
## --------------------------------------------------------------------------------
## const -0.1336 0.884 -0.151 0.880 -1.867 1.599
## ar.L1.D.runs -0.7729 0.058 -13.322 0.000 -0.887 -0.659
## ar.L2.D.runs -0.6234 0.071 -8.753 0.000 -0.763 -0.484
## ar.L3.D.runs -0.5199 0.074 -7.038 0.000 -0.665 -0.375
## ar.L4.D.runs -0.3490 0.071 -4.927 0.000 -0.488 -0.210
## ar.L5.D.runs -0.2116 0.058 -3.665 0.000 -0.325 -0.098
## Roots
## =============================================================================
## Real Imaginary Modulus Frequency
## -----------------------------------------------------------------------------
## AR.1 0.5789 -1.1743j 1.3093 -0.1771
## AR.2 0.5789 +1.1743j 1.3093 0.1771
## AR.3 -1.3617 -0.0000j 1.3617 -0.5000
## AR.4 -0.7227 -1.2257j 1.4230 -0.3348
## AR.5 -0.7227 +1.2257j 1.4230 0.3348
## -----------------------------------------------------------------------------
## 0
## count 284.000000
## mean -0.306769
## std 51.632947
## min -106.653589
## 25% -33.835148
## 50% -8.954253
## 75% 21.024763
## max 223.152901
##
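The model summary above is an ARIMA(5, 1, 0): the runs series is differenced once and an AR(5) model is fitted to the differences. Written out below, assuming statsmodels reports `const` as the mean μ of the differenced series (which is how its ARMA parameterisation is understood here): y_t is the runs in innings t, φ_1 … φ_5 are the `ar.L1.D.runs` … `ar.L5.D.runs` coefficients (e.g. φ_1 = −0.7729 for Dravid), μ = −0.1336 and σ is the "S.D. of innovations" (51.488). All five AR roots have modulus greater than 1 (1.31 to 1.42), the usual condition for the AR part to be stationary. The same form applies to the bowler forecast further down, with wickets in place of runs.

```latex
% ARIMA(5,1,0): first difference, then AR(5) on the differences
\Delta y_t = y_t - y_{t-1}

(\Delta y_t - \mu) = \sum_{i=1}^{5} \phi_i \,(\Delta y_{t-i} - \mu) + \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)

% One-step-ahead forecast of the next innings
\hat{y}_{t+1} = y_t + \mu + \sum_{i=1}^{5} \phi_i \,(\Delta y_{t+1-i} - \mu)
```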
The plot below compares the relative cumulative average runs of the batsmen against the number of innings. The plot indicates the following: in the range of 30 to 100 innings, Lara leads, followed by Dravid; beyond 100 innings, Kohli races ahead of the rest.
import cricpy.analytics as ca
frames = ["../dravid.csv","../acook.csv","../lara.csv","../kohli.csv"]
names = ["Dravid","A Cook","Brian Lara","V Kohli"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)
The plot below compares the relative cumulative strike rates of the four batsmen. Brian Lara towers over Dravid, Cook and Kohli. However, you will notice that Kohli’s strike rate is going up.
import cricpy.analytics as ca
frames = ["../dravid.csv","../acook.csv","../lara.csv","../kohli.csv"]
names = ["Dravid","A Cook","Brian Lara","V Kohli"]
ca.relativeBatsmanCumulativeStrikeRate(frames,names)
The plot is a scatter plot of Runs vs Balls faced and Minutes at crease. A prediction plane is fitted.
import cricpy.analytics as ca
ca.battingPerf3d("../dravid.csv","Rahul Dravid")
ca.battingPerf3d("../acook.csv","Alastair Cook")
ca.battingPerf3d("../lara.csv","Brian Lara")
ca.battingPerf3d("../kohli.csv","Virat Kohli")
A multivariate regression plane is fitted between Runs and Balls faced + Minutes at crease.
import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
dravid = ca.batsmanRunsPredict("../dravid.csv",newDF,"Dravid")
print(dravid)
## BF Mins Runs
## 0 10.000000 30.000000 0.519667
## 1 37.857143 70.714286 13.821794
## 2 65.714286 111.428571 27.123920
## 3 93.571429 152.142857 40.426046
## 4 121.428571 192.857143 53.728173
## 5 149.285714 233.571429 67.030299
## 6 177.142857 274.285714 80.332425
## 7 205.000000 315.000000 93.634552
## 8 232.857143 355.714286 106.936678
## 9 260.714286 396.428571 120.238805
## 10 288.571429 437.142857 133.540931
## 11 316.428571 477.857143 146.843057
## 12 344.285714 518.571429 160.145184
## 13 372.142857 559.285714 173.447310
## 14 400.000000 600.000000 186.749436
The fitted model is then used to predict the runs that the batsman will score for given values of Balls faced and Minutes at crease.
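The post does not state how batsmanRunsPredict fits its plane. Assuming an ordinary least-squares fit of Runs on BF and Mins, the sketch below solves the normal equations directly, with no NumPy, so the mechanics are visible. In practice a library routine such as numpy.linalg.lstsq is the better choice numerically, especially since BF and Mins tend to be strongly correlated.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(bf, mins, runs):
    """OLS for Runs = b0 + b1*BF + b2*Mins via (X^T X) b = X^T y and Cramer's rule."""
    rows = [(1.0, x1, x2) for x1, x2 in zip(bf, mins)]  # design matrix X
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]  # X^T X
    v = [sum(r[i] * y for r, y in zip(rows, runs)) for i in range(3)]          # X^T y
    d = det3(a)
    if d == 0:  # exact check only; near-collinear data still gives unstable coefficients
        raise ValueError("BF and Mins are collinear; the plane is not unique")
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k of X^T X with X^T y
        ak = [[v[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        coeffs.append(det3(ak) / d)
    return tuple(coeffs)

def predict_runs(coeffs, bf, mins):
    """Evaluate the fitted plane at new (BF, Mins) pairs."""
    b0, b1, b2 = coeffs
    return [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(bf, mins)]

# Hypothetical innings, not Dravid's data
bf = [10, 40, 80, 120, 200]
mins = [30, 50, 140, 150, 300]
runs = [5, 25, 56, 79, 132]
coeffs = fit_plane(bf, mins, runs)
print(predict_runs(coeffs, [100, 250], [180, 400]))
```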
The following 3 bowlers have had excellent careers and will be used for the analysis: Glenn McGrath, Kapil Dev and James Anderson.
How do Glenn McGrath, Kapil Dev and James Anderson compare with one another with respect to wickets taken and economy rate? The next set of plots compute and plot precisely these analyses.
The plot below computes the percentage frequency of the number of wickets taken, e.g. 1 wicket x%, 2 wickets y% etc., and plots them as a continuous line.
import cricpy.analytics as ca
#mcgrath =ca.getPlayerData(6565,dir=".",file="mcgrath.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#kapil =ca.getPlayerData(30028,dir=".",file="kapil.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#anderson =ca.getPlayerData(8608,dir=".",file="anderson.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
The plots below show the frequency of wickets taken by each of the bowlers.
import cricpy.analytics as ca
ca.bowlerWktsFreqPercent("../mcgrath.csv","Glenn McGrath")
ca.bowlerWktsFreqPercent("../kapil.csv","Kapil Dev")
ca.bowlerWktsFreqPercent("../anderson.csv","James Anderson")
The plots below are box plots showing the 1st and 3rd quartiles of runs conceded versus the number of wickets taken.
import cricpy.analytics as ca
ca.bowlerWktsRunsPlot("../mcgrath.csv","Glenn McGrath")
ca.bowlerWktsRunsPlot("../kapil.csv","Kapil Dev")
ca.bowlerWktsRunsPlot("../anderson.csv","James Anderson")
The plot gives the average wickets taken by the bowlers at different venues. McGrath’s best performances are at Centurion, Lord’s and Port of Spain, averaging about 4 wickets. Kapil Dev does well at Kingston and Wellington. Anderson averages 4 wickets at Dunedin and Nagpur.
import cricpy.analytics as ca
ca.bowlerAvgWktsGround("../mcgrath.csv","Glenn McGrath")
ca.bowlerAvgWktsGround("../kapil.csv","Kapil Dev")
ca.bowlerAvgWktsGround("../anderson.csv","James Anderson")
The plot gives the average wickets taken by the bowlers against different countries. The x-axis also includes the number of innings against each team.
import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("../mcgrath.csv","Glenn McGrath")
ca.bowlerAvgWktsOpposition("../kapil.csv","Kapil Dev")
ca.bowlerAvgWktsOpposition("../anderson.csv","James Anderson")
From the plots below it can be seen that James Anderson has had a solid performance over the years, averaging about wickets
import cricpy.analytics as ca
ca.bowlerMovingAverage("../mcgrath.csv","Glenn McGrath")
ca.bowlerMovingAverage("../kapil.csv","Kapil Dev")
ca.bowlerMovingAverage("../anderson.csv","James Anderson")
The plots below give the cumulative average wickets taken by the bowlers. McGrath plateaus around 2.4 wickets, Kapil Dev’s performance deteriorates over the years, and Anderson holds rock steady around 2 wickets.
import cricpy.analytics as ca
ca.bowlerCumulativeAvgWickets("../mcgrath.csv","Glenn McGrath")
ca.bowlerCumulativeAvgWickets("../kapil.csv","Kapil Dev")
ca.bowlerCumulativeAvgWickets("../anderson.csv","James Anderson")
The plots below give the cumulative average economy rate of the bowlers. McGrath was very expensive early in his career, conceding about 2.8 runs per over, which drops to around 2.5 runs towards the end. Kapil Dev’s economy rate drops from 3.6 to 2.8. Anderson is probably more expensive than the other 2.
import cricpy.analytics as ca
ca.bowlerCumulativeAvgEconRate("../mcgrath.csv","Glenn McGrath")
ca.bowlerCumulativeAvgEconRate("../kapil.csv","Kapil Dev")
ca.bowlerCumulativeAvgEconRate("../anderson.csv","James Anderson")
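Economy rate is runs conceded per over. One gotcha when computing it from scorecard data: overs are written as "24.3", meaning 24 overs and 3 balls, not 24.3 overs, so they should be converted to balls before dividing. The sketch below assumes 6-ball overs and computes the cumulative economy rate from running totals; whether cricpy uses running totals or averages the per-innings economy rates is an assumption here.

```python
from itertools import accumulate

def overs_to_balls(overs):
    """Convert scorecard overs such as '24.3' (24 overs, 3 balls) to balls, assuming 6-ball overs."""
    whole, _, part = str(overs).partition(".")
    extra = int(part) if part else 0
    if not 0 <= extra < 6:
        raise ValueError(f"invalid overs value {overs!r}")
    return int(whole) * 6 + extra

def cumulative_economy(runs_conceded, overs):
    """Career economy rate (runs per 6-ball over) after each innings, from running totals."""
    balls = [overs_to_balls(o) for o in overs]
    return [6 * r / b if b else 0.0
            for r, b in zip(accumulate(runs_conceded), accumulate(balls))]

print(overs_to_balls("24.3"))                     # 147
print(cumulative_economy([60, 36], ["20", "4"]))  # [3.0, 4.0]
```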
import cricpy.analytics as ca
ca.bowlerPerfForecast("../mcgrath.csv","Glenn McGrath")
## ARIMA Model Results
## ==============================================================================
## Dep. Variable: D.Wickets No. Observations: 236
## Model: ARIMA(5, 1, 0) Log Likelihood -480.815
## Method: css-mle S.D. of innovations 1.851
## Date: Sun, 28 Oct 2018 AIC 975.630
## Time: 09:28:32 BIC 999.877
## Sample: 11-12-1993 HQIC 985.404
## - 01-02-2007
## ===================================================================================
## coef std err z P>|z| [0.025 0.975]
## -----------------------------------------------------------------------------------
## const 0.0037 0.033 0.113 0.910 -0.061 0.068
## ar.L1.D.Wickets -0.9432 0.064 -14.708 0.000 -1.069 -0.818
## ar.L2.D.Wickets -0.7254 0.086 -8.469 0.000 -0.893 -0.558
## ar.L3.D.Wickets -0.4827 0.093 -5.217 0.000 -0.664 -0.301
## ar.L4.D.Wickets -0.3690 0.085 -4.324 0.000 -0.536 -0.202
## ar.L5.D.Wickets -0.1709 0.064 -2.678 0.008 -0.296 -0.046
## Roots
## =============================================================================
## Real Imaginary Modulus Frequency
## -----------------------------------------------------------------------------
## AR.1 0.5630 -1.2761j 1.3948 -0.1839
## AR.2 0.5630 +1.2761j 1.3948 0.1839
## AR.3 -0.8433 -1.0820j 1.3718 -0.3554
## AR.4 -0.8433 +1.0820j 1.3718 0.3554
## AR.5 -1.5981 -0.0000j 1.5981 -0.5000
## -----------------------------------------------------------------------------
## 0
## count 236.000000
## mean -0.005142
## std 1.856961
## min -3.457002
## 25% -1.433391
## 50% -0.080237
## 75% 1.446149
## max 5.840050
As discussed above, the next 2 charts require the use of getPlayerDataSp().
import cricpy.analytics as ca
#mcgrathsp =ca.getPlayerDataSp(6565,tdir=".",tfile="mcgrathsp.csv",ttype="bowling")
#kapilsp =ca.getPlayerDataSp(30028,tdir=".",tfile="kapilsp.csv",ttype="bowling")
#andersonsp =ca.getPlayerDataSp(8608,tdir=".",tfile="andersonsp.csv",ttype="bowling")
The plot below is extremely interesting: Glenn McGrath has been more instrumental in Australia’s wins than Kapil and Anderson have been for their teams, as he seems to have taken more wickets when Australia won.
import cricpy.analytics as ca
ca.bowlerContributionWonLost("../mcgrathsp.csv","Glenn McGrath")
ca.bowlerContributionWonLost("../kapilsp.csv","Kapil Dev")
ca.bowlerContributionWonLost("../andersonsp.csv","James Anderson")
McGrath and Kapil Dev have performed better overseas than at home. Anderson has performed about the same at home and overseas.
import cricpy.analytics as ca
ca.bowlerPerfHomeAway("../mcgrathsp.csv","Glenn McGrath")
ca.bowlerPerfHomeAway("../kapilsp.csv","Kapil Dev")
ca.bowlerPerfHomeAway("../andersonsp.csv","James Anderson")
The Relative cumulative economy rate shows that McGrath has the best economy rate followed by Kapil Dev and then Anderson.
import cricpy.analytics as ca
frames = ["../mcgrath.csv","../kapil.csv","../anderson.csv"]
names = ["Glenn McGrath","Kapil Dev","James Anderson"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)
McGrath has been economical regardless of the number of wickets taken. Kapil Dev has been slightly more expensive when he takes more wickets.
import cricpy.analytics as ca
frames = ["../mcgrath.csv","../kapil.csv","../anderson.csv"]
names = ["Glenn McGrath","Kapil Dev","James Anderson"]
ca.relativeBowlingER(frames,names)
The plot below shows that McGrath has the best overall cumulative average wickets. Kapil leads Anderson until about 150 innings, after which Anderson takes over.
import cricpy.analytics as ca
frames = ["../mcgrath.csv","../kapil.csv","../anderson.csv"]
names = ["Glenn McGrath","Kapil Dev","James Anderson"]
ca.relativeBowlerCumulativeAvgWickets(frames,names)
The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.
Here are the main findings from the analysis above
The analysis of the top 4 Test batsmen, Dravid, Cook, Lara and Kohli, shows the following