“Gold medals aren’t really made of gold. They’re made of sweat, determination, & a hard-to-find alloy called guts.” Dan Gable

“It doesn’t matter whether you are pursuing success in business, sports, the arts, or life in general: The bridge between wishing and accomplishing is discipline” Harvey Mackay

“I won’t predict anything historic. But nothing is impossible.” Michael Phelps

Introduction

In this post, I introduce 2 new functions in my Python package ‘cricpy’ (cricpy v0.20) see Introducing cricpy:A python package to analyze performances of cricketers which enable granular analysis of batsmen and bowlers. They are

  1. Step 1: getPlayerDataHA - This function is a wrapper around getPlayerData(), getPlayerDataOD() and getPlayerDataTT(), and adds an extra column ‘homeOrAway’ which says whether the match was played at home/away/neutral venues. A CSV file is created with this new column.
  2. Step 2: getPlayerDataOppnHA - This function allows you to slice & dice the data for batsmen and bowlers against specific oppositions, at home/away/neutral venues and between certain periods. This reducedsubset of data can be used to perform analyses. A CSV file is created as an output based on the parameters of opposition, home or away and the interval of time

Note All the existing cricpy functions can be used on this smaller fine-grained data set for a closer analysis of players

This post has been published in Rpubs and can be accessed at Cricpy performs granular analysis of players

You can download a PDF version of this post at Cricpy performs granular analysis of players

Note:I have also updated cricpy template with the latest changes. See cricpy-template

1. Analyzing Rahul Dravid at 3 different stages of his career

The following functions analyze Rahul Dravid during 3 different periods of his illustrious career. a) 1st Jan 2001-1st Jan 2002 b) 1st Jan 2004-1st Jan 2005 c) 1st Jan 2009-1st Jan 2010

import cricpy.analytics as ca
# Get the homeOrAway dataset for Dravid in matches
# Note:Since I have already got the data I reuse the CSV file
#df=ca.getPlayerDataHA(28114,tfile="dravidTestHA.csv",matchType="Test")

# Get Dravid's data for 2001-02
## C:\Users\Ganesh\ANACON~1\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
##   from pandas.core import datetools
df1=ca.getPlayerDataOppnHA(infile="dravidTestHA.csv",outfile="dravidTest2001.csv",startDate="2001-01-01",endDate="2002-01-01")

# Get Dravid's data for 2004-05
df2=ca.getPlayerDataOppnHA(infile="dravidTestHA.csv",outfile="dravidTest2004.csv", startDate="2004-01-01",endDate="2005-01-01")

# Get Dravid's data for 2009-10
df3=ca.getPlayerDataOppnHA(infile="dravidTestHA.csv",outfile="dravidTest2009.csv",startDate="2009-01-01",endDate="2010-01-01")

`

1a. Plot the performance of Dravid at venues during 2001,2004,2009

Note: Any of the cricpy functions can be used on the fine-grained subset of data as below.

import cricpy.analytics as ca
ca.batsmanAvgRunsGround("dravidTest2001.csv","Dravid-2001")

ca.batsmanAvgRunsGround("dravidTest2004.csv","Dravid-2004")

ca.batsmanAvgRunsGround("dravidTest2009.csv","Dravid-2009")

1b. Plot the performance of Dravid against different oppositions during 2001,2004,2009

import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("dravidTest2001.csv","Dravid-2001")