“Gold medals aren’t really made of gold. They’re made of sweat, determination, & a hard-to-find alloy called guts.” Dan Gable
“It doesn’t matter whether you are pursuing success in business, sports, the arts, or life in general: The bridge between wishing and accomplishing is discipline” Harvey Mackay
“I won’t predict anything historic. But nothing is impossible.” Michael Phelps
In this post, I introduce 2 new functions in my Python package ‘cricpy’ (cricpy v0.20) see Introducing cricpy:A python package to analyze performances of cricketers which enable granular analysis of batsmen and bowlers. They are
Note All the existing cricpy functions can be used on this smaller fine-grained data set for a closer analysis of players
This post has been published in Rpubs and can be accessed at Cricpy performs granular analysis of players
You can download a PDF version of this post at Cricpy performs granular analysis of players
Note:I have also updated cricpy template with the latest changes. See cricpy-template
The following functions analyze Rahul Dravid during 3 different periods of his illustrious career. a) 1st Jan 2001-1st Jan 2002 b) 1st Jan 2004-1st Jan 2005 c) 1st Jan 2009-1st Jan 2010
import cricpy.analytics as ca
# Get the homeOrAway dataset for Dravid in matches
# Note:Since I have already got the data I reuse the CSV file
#df=ca.getPlayerDataHA(28114,tfile="dravidTestHA.csv",matchType="Test")
# Get Dravid's data for 2001-02
## C:\Users\Ganesh\ANACON~1\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
## from pandas.core import datetools
df1=ca.getPlayerDataOppnHA(infile="dravidTestHA.csv",outfile="dravidTest2001.csv",startDate="2001-01-01",endDate="2002-01-01")
# Get Dravid's data for 2004-05
df2=ca.getPlayerDataOppnHA(infile="dravidTestHA.csv",outfile="dravidTest2004.csv", startDate="2004-01-01",endDate="2005-01-01")
# Get Dravid's data for 2009-10
df3=ca.getPlayerDataOppnHA(infile="dravidTestHA.csv",outfile="dravidTest2009.csv",startDate="2009-01-01",endDate="2010-01-01")
`
Note: Any of the cricpy functions can be used on the fine-grained subset of data as below.
import cricpy.analytics as ca
ca.batsmanAvgRunsGround("dravidTest2001.csv","Dravid-2001")
ca.batsmanAvgRunsGround("dravidTest2004.csv","Dravid-2004")
ca.batsmanAvgRunsGround("dravidTest2009.csv","Dravid-2009")
import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("dravidTest2001.csv","Dravid-2001")
ca.batsmanAvgRunsOpposition("dravidTest2004.csv","Dravid-2004")
ca.batsmanAvgRunsOpposition("dravidTest2009.csv","Dravid-2009")
The plot below compares Dravid’s cumulative strike rate and cumulative average during 3 different stages of his career
import cricpy.analytics as ca
frames=["dravidTest2001.csv","dravidTest2004.csv","dravidTest2009.csv"]
names=["Dravid-2001","Dravid-2004","Dravid-2009"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)
ca.relativeBatsmanCumulativeStrikeRate(frames,names)
The analysis below looks at Kohli’s performance against England in ‘away’ venues (England) in 2014 and 2018
import cricpy.analytics as ca
# Get the homeOrAway data for Kohli in Test matches
#df=ca.getPlayerDataHA(253802,tfile="kohliTestHA.csv",type="batting",matchType="Test")
# Get the homeOrAway data for Kohli in Test matches
df=ca.getPlayerDataHA(253802,tfile="kohliTestHA.csv",type="batting",matchType="Test")
# Get the subset if data of Kohli's performance against England in England in 2014
## Working...
df=ca.getPlayerDataOppnHA(infile="kohliTestHA.csv",outfile="kohliTestEng2014.csv", opposition=["England"],homeOrAway=["away"],startDate="2014-01-01",endDate="2015-01-01")
# Get the subset if data of Kohli's performance against England in England in 2018
df1=ca.getPlayerDataOppnHA(infile="kohliTestHA.csv",outfile="kohliTestEng2018.csv",
opposition=["England"],homeOrAway=["away"],startDate="2018-01-01",endDate="2019-01-01")
Kohli had a miserable outing to England in 2014 with a string of low scores. In 2018 Kohli pulls himself out of the morass
import cricpy.analytics as ca
ca.batsmanAvgRunsGround("kohliTestEng2014.csv","Kohli-Eng-2014")
ca.batsmanAvgRunsGround("kohliTestEng2018.csv","Kohli-Eng-2018")
Kohli’s cumulative average runs in 2014 is in the low 15s, while in 2018 it is 70+. Kohli stamps his class back again and undoes the bad memories of 2014
import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("kohliTestEng2014.csv", "Kohli-Eng-2014")
ca.batsmanCumulativeAverageRuns("kohliTestEng2018.csv", "Kohli-Eng-2018")
The analyses below compares the performances of Sourav Ganguly, Rahul Dravid and VVS Laxman against Australia, South Africa, and England in ‘away’ venues between 01 Jan 2002 to 01 Jan 2008
import cricpy.analytics as ca
#Get the HA data for Ganguly, Dravid and Laxman
#df=ca.getPlayerDataHA(28779,tfile="gangulyTestHA.csv",type="batting",matchType="Test")
#df=ca.getPlayerDataHA(28114,tfile="dravidTestHA.csv",type="batting",matchType="Test")
#df=ca.getPlayerDataHA(30750,tfile="laxmanTestHA.csv",type="batting",matchType="Test")
# Slice the data
df=ca.getPlayerDataOppnHA(infile="gangulyTestHA.csv",outfile="gangulyTestAES2002-08.csv" ,opposition=["Australia", "England", "South Africa"], homeOrAway=["away"],startDate="2002-01-01",endDate="2008-01-01")
df=ca.getPlayerDataOppnHA(infile="dravidTestHA.csv",outfile="dravidTestAES2002-08.csv" ,opposition=["Australia", "England", "South Africa"], homeOrAway=["away"],startDate="2002-01-01",endDate="2008-01-01")
df=ca.getPlayerDataOppnHA(infile="laxmanTestHA.csv",outfile="laxmanTestAES2002-08.csv",opposition=["Australia", "England", "South Africa"], homeOrAway=["away"],startDate="2002-01-01",endDate="2008-01-01")
Plot the relative cumulative average runs and relative cumative strike rate of Ganguly, Dravid and Laxman
-Dravid towers over Laxman and Ganguly with respect to cumulative average runs. - Ganguly has a superior strike rate followed by Laxman and then Dravid
import cricpy.analytics as ca
frames=["gangulyTestAES2002-08.csv","dravidTestAES2002-08.csv","laxmanTestAES2002-08.csv"]
names=["GangulyAusEngSA2002-08","DravidAusEngSA2002-08","LaxmanAusEngSA2002-08"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)
ca.relativeBatsmanCumulativeStrikeRate(frames,names)
Compare the performances of Rohit Sharma, Joe Root and Kane williamson in away & neutral venues against Australia, West Indies and Soouth Africa
import cricpy.analytics as ca
# Get the ODI HA data for Rohit, Root and Williamson
#df=ca.getPlayerDataHA(34102,tfile="rohitODIHA.csv",type="batting",matchType="ODI")
#df=ca.getPlayerDataHA(303669,tfile="joerootODIHA.csv",type="batting",matchType="ODI")
#df=ca.getPlayerDataHA(277906,tfile="williamsonODIHA.csv",type="batting",matchType="ODI")
# Subset the data for specific opposition in away and neutral venues
## C:\Users\Ganesh\ANACON~1\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
## from pandas.core import datetools
df=ca.getPlayerDataOppnHA(infile="rohitODIHA.csv",outfile="rohitODIAusWISA.csv"
,opposition=["Australia", "West Indies", "South Africa"],
homeOrAway=["away","neutral"])
df=ca.getPlayerDataOppnHA(infile="joerootODIHA.csv",outfile="joerootODIAusWISA.csv"
,opposition=["Australia", "West Indies", "South Africa"],
homeOrAway=["away","neutral"])
df=ca.getPlayerDataOppnHA(infile="williamsonODIHA.csv",outfile="williamsonODIAusWiSA.csv",opposition=["Australia", "West Indies", "South Africa"], homeOrAway=["away","neutral"])
The relative cumulative strike rate of all 3 are comparable
import cricpy.analytics as ca
frames=["rohitODIAusWISA.csv","joerootODIAusWISA.csv","williamsonODIAusWiSA.csv"]
names=["Rohit-ODI-AusWISA","Joe Root-ODI-AusWISA","Williamson-ODI-AusWISA"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)
ca.relativeBatsmanCumulativeStrikeRate(frames,names)
Plot the performances of Dhoni against Australia, West Indies, South Africa and England
import cricpy.analytics as ca
# Get the HA T20 data for Dhoni
#df=ca.getPlayerDataHA(28081,tfile="dhoniT20HA.csv",type="batting",matchType="T20")
#Subset the data
df=ca.getPlayerDataOppnHA(infile="dhoniT20HA.csv",outfile="dhoniT20AusWISAEng.csv",opposition=["Australia", "West Indies", "South Africa","England"], homeOrAway=["all"])
Note You can use any of cricpy’s functions against the fine grained data
import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("dhoniT20AusWISAEng.csv","Dhoni")
ca.batsmanAvgRunsGround("dhoniT20AusWISAEng.csv","Dhoni")
ca.batsmanCumulativeStrikeRate("dhoniT20AusWISAEng.csv","Dhoni")
ca.batsmanCumulativeAverageRuns("dhoniT20AusWISAEng.csv","Dhoni")
Compute the performances of Kumble, Warne and Maralitharan against New Zealand, West Indies, South Africa and England in pitches that are not ‘home’ pithes
import cricpy.analytics as ca
# Get the bowling data for Kumble, Warne and Muralitharan in Test matches
#df=ca.getPlayerDataHA(30176,tfile="kumbleTestHA.csv",type="bowling",matchType="Test")
#df=ca.getPlayerDataHA(8166,tfile="warneTestHA.csv",type="bowling",matchType="Test")
#df=ca.getPlayerDataHA(49636,tfile="muraliTestHA.csv",type="bowling",matchType="Test")
# Subset the data
df=ca.getPlayerDataOppnHA(infile="kumbleTestHA.csv",outfile="kumbleTest-NZWISAEng.csv",opposition=["New Zealand", "West Indies", "South Africa","England"],
homeOrAway=["away"])
df=ca.getPlayerDataOppnHA(infile="warneTestHA.csv",outfile="warneTest-NZWISAEng.csv"
,opposition=["New Zealand", "West Indies", "South Africa","England"], homeOrAway=["away"])
df=ca.getPlayerDataOppnHA(infile="muraliTestHA.csv",outfile="muraliTest-NZWISAEng.csv"
,opposition=["New Zealand", "West Indies", "South Africa","England"], homeOrAway=["away"])
import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("kumbleTest-NZWISAEng.csv","Kumble-NZWISAEng-AN")
ca.bowlerAvgWktsOpposition("warneTest-NZWISAEng.csv","Warne-NZWISAEng-AN")
ca.bowlerAvgWktsOpposition("muraliTest-NZWISAEng.csv","Murali-NZWISAEng-AN")
import cricpy.analytics as ca
ca.bowlerAvgWktsGround("kumbleTest-NZWISAEng.csv","Kumble")
ca.bowlerAvgWktsGround("warneTest-NZWISAEng.csv","Warne")
ca.bowlerAvgWktsGround("muraliTest-NZWISAEng.csv","Murali")
import cricpy.analytics as ca
frames=["kumbleTest-NZWISAEng.csv","warneTest-NZWISAEng.csv","muraliTest-NZWISAEng.csv"]
names=["Kumble","Warne","Murali"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)
ca.relativeBowlerCumulativeAvgWickets(frames,names)
import cricpy.analytics as ca
# Get the HA data for Bumrah in ODI in bowling
#df=ca.getPlayerDataHA(625383,tfile="bumrahODIHA.csv",type="bowling",matchType="ODI")
# Slice the data for periods 2016, 2017 and 2018
df=ca.getPlayerDataOppnHA(infile="bumrahODIHA.csv",outfile="bumrahODI2016.csv",
startDate="2016-01-01",endDate="2017-01-01")
df=ca.getPlayerDataOppnHA(infile="bumrahODIHA.csv",outfile="bumrahODI2017.csv",
startDate="2017-01-01",endDate="2018-01-01")
df=ca.getPlayerDataOppnHA(infile="bumrahODIHA.csv",outfile="bumrahODI2018.csv",
startDate="2018-01-01",endDate="2019-01-01")
import cricpy.analytics as ca
frames=["bumrahODI2016.csv","bumrahODI2017.csv","bumrahODI2018.csv"]
names=["Bumrah-2016","Bumrah-2017","Bumrah-2018"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)
ca.relativeBowlerCumulativeAvgWickets(frames,names)
import cricpy.analytics as ca
# Get the HA bowling data for Shakib, Bumrah and Jadeja
#df=ca.getPlayerDataHA(56143,tfile="shakibT20HA.csv",type="bowling",matchType="T20")
#df=ca.getPlayerDataHA(625383,tfile="bumrahT20HA.csv",type="bowling",matchType="T20")
#df=ca.getPlayerDataHA(234675,tfile="jadejaT20HA.csv",type="bowling",matchType="T20")
# Slice the data for performances against Sri Lanka, Australia, South Africa and England
df=ca.getPlayerDataOppnHA(infile="shakibT20HA.csv",outfile="shakibT20-SLAusSAEng.csv" ,opposition=["Sri Lanka","Australia", "South Africa","England"],
homeOrAway=["all"])
df=ca.getPlayerDataOppnHA(infile="bumrahT20HA.csv",outfile="bumrahT20-SLAusSAEng.csv",opposition=["Sri Lanka","Australia", "South Africa","England"],
homeOrAway=["all"])
df=ca.getPlayerDataOppnHA(infile="jadejaT20HA.csv",outfile="jadejaT20-SLAusSAEng.csv" ,opposition=["Sri Lanka","Australia", "South Africa","England"], homeOrAway=["all"])
import cricpy.analytics as ca
frames=["shakibT20-SLAusSAEng.csv","bumrahT20-SLAusSAEng.csv","jadejaT20-SLAusSAEng.csv"]
names=["Shakib-SLAusSAEng","Bumrah-SLAusSAEng","Jadeja-SLAusSAEng"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)
ca.relativeBowlerCumulativeAvgWickets(frames,names)
By getting the homeOrAway data for players using the profileNo, you can slice and dice the data based on your choice of opposition, whether you want matches that were played at home/away/neutral venues. Finally by specifying the period for which the data has to be subsetted you can create fine grained analysis.
Hope you have a great time with cricpy!!!