Comparing Adaptability to Pitch Variety

Spinners make up an important part of any cricket team. Having a good spinner in test cricket can win you games you otherwise wouldn’t, particularly if you’re bowling on day 5. However, at the international level the pitches differ from country to country, so bowlers need to adapt to different conditions to succeed.

Bowling strike rate will be used in this case study to measure success. It is a measure of how many balls on average a given spinner bowls before they take a wicket. The lower the bowling strike rate is, the more successful a spinner is.
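As a formula, strike rate is balls bowled divided by wickets taken. A minimal R sketch of that calculation follows; the function name `strike_rate()` is hypothetical and not part of cricketdata. The example figures are Shane Warne’s totals in the UAE (414 balls, 16 wickets) from the lowest-strike-rate table later in this study.

```r
# Hypothetical helper: bowling strike rate = balls bowled / wickets taken.
# Lower values mean a bowler takes wickets more often.
strike_rate <- function(balls, wickets) {
  # R returns Inf when wickets is 0, which is how a bowler with no
  # wickets in a country shows up in the results.
  balls / wickets
}

strike_rate(414, 16)  # 25.875 balls per wicket
strike_rate(486, 0)   # Inf: balls bowled without taking a wicket
```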

This case study compares how successful some of the highest-wicket-taking spinners have been in different countries to see whether any trends or insights emerge. Spinners from many countries were deliberately included to add diversity to the sample.

Data

Data used in this case study was sourced from the R package cricketdata, which provides functions to download individual player data for all matches from ESPNCricinfo. Please see the following citations for the cricketdata package and the R statistical software.

cricketdata - Hyndman R, Gray C, Gupta S, Hyndman T, Rafique H, Tran J (2023). cricketdata: International Cricket Data. R package version 0.2.3, https://CRAN.R-project.org/package=cricketdata.

R program - R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

Data Processing

The first step was to load the packages used in this analysis:

library(tidyverse)
library(cricketdata)
library(janitor)
library(skimr)

Data was obtained using the fetch_player_data function from the cricketdata package. In order to use the function you need to know the Cricinfo player ID for the specific player. Thankfully, the cricketdata package also has a function for this. I will work through the process here for Australian spinner Shane Warne to provide an example:

find_player_id("Shane Warne")
## # A tibble: 1 × 5
##      ID Name                   searchstring Country Played                      
##   <dbl> <chr>                  <chr>        <chr>   <chr>                       
## 1  8166 SK Warne (Shane Warne) Shane Warne  AUS     Test matches player (1991/9…

This reveals that the player ID for Shane Warne is 8166. This can be used to obtain the individual player data for Shane Warne:

s_warne <- fetch_player_data(8166, matchtype = "test", activity = "bowling")

This generates a table of the bowling statistics of Shane Warne for his entire test career. However, the data in this table needs to be manipulated to suit the needs of this study. First, some table formatting to make things easier:

s_warne1 <- s_warne %>% 
  clean_names() %>%                           # consistent lowercase column names
  filter(overs != "DNB", overs != "TDNB") %>%  # drop innings where he did not bowl
  mutate(overs = as.double(overs),
         mdns = as.integer(mdns),
         wkts = as.integer(wkts),
         econ = as.double(econ))

This code converts all column names to a consistent lowercase format for ease of use. It then filters out the innings in which the spinner did not bowl (recorded as “DNB” or “TDNB”), since we are only interested in their impact when they did bowl. Finally, it converts certain columns from character to numeric, which is needed for the calculations.
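The order of these steps matters: converting first would turn “DNB” into `NA` with a coercion warning. The base R sketch below shows the same filter-then-convert idea on a small hypothetical data frame; the rows and the “-” placeholder are made up for illustration and are not taken from fetch_player_data output.

```r
# Hypothetical raw rows: overs stored as text, with "DNB"/"TDNB"
# marking innings in which the player did not bowl.
raw <- data.frame(
  overs = c("10.2", "DNB", "25", "TDNB"),
  wkts  = c("2", "-", "4", "-"),
  stringsAsFactors = FALSE
)

# Drop the did-not-bowl rows first, then change the column types,
# so no text placeholders are coerced to NA.
bowled <- raw[!raw$overs %in% c("DNB", "TDNB"), ]
bowled$overs <- as.double(bowled$overs)
bowled$wkts  <- as.integer(bowled$wkts)

str(bowled)
```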

Next, since strike rate is calculated from the number of balls bowled divided by the number of wickets taken, a column will be added for balls bowled which can be calculated from the “overs” column since 1 over = 6 balls.

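s_warne2 <- s_warne1 %>% 
  # completed overs x 6, plus the balls recorded after the decimal point;
  # round() guards against floating-point error (5.1 - 5 is not exactly 0.1)
  mutate(balls = trunc(overs) * 6 + round((overs - trunc(overs)) * 10))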

This code accounts for the way overs are recorded. For example, a figure of 5.5 overs does not mean 5 and a half overs (33 balls); it means 5 complete overs (30 balls) plus 5 balls, for a total of 35 balls.
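The same conversion can be written as a small reusable function. This sketch adds two things that are assumptions rather than part of the study’s code: it rounds the fractional part, because values such as 5.1 are not stored exactly (`(5.1 - 5) * 10` is slightly less than 1), and it takes a hypothetical `balls_per_over` argument. The second matters for older players: Tests in Australia, New Zealand and South Africa were played with eight-ball overs for parts of the 20th century, so a six-ball conversion would undercount balls for bowlers such as Benaud, Tayfield or Gibbs in those matches. How the downloaded data identifies eight-ball-over matches is not covered here.

```r
# Convert overs notation (e.g. 5.5 = 5 overs + 5 balls) to balls.
# balls_per_over is a hypothetical parameter: 6 by default, 8 for
# matches played with eight-ball overs.
overs_to_balls <- function(overs, balls_per_over = 6) {
  completed <- trunc(overs)
  # round() removes floating-point error, e.g. (5.1 - 5) * 10 = 0.999...
  extra <- round((overs - completed) * 10)
  if (any(extra >= balls_per_over, na.rm = TRUE)) {
    warning("Fractional part is not a valid ball count for this over length")
  }
  completed * balls_per_over + extra
}

overs_to_balls(5.5)                       # 35, not 33
overs_to_balls(c(5.1, 10, 0.3))           # 31 60 3
overs_to_balls(29.7, balls_per_over = 8)  # 239
```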

This study seeks to observe differences in bowler success depending on pitch conditions, which vary with where the match is played. The player data only records the ground at which each match took place, so a new column for the host country is needed to aggregate by country. This meant writing code to assign the country based on the ground. The following code lists the different grounds the spinner played at during their career:

unique(s_warne2$ground)
##  [1] "Sydney"        "Adelaide"      "Colombo (SSC)" "Moratuwa"     
##  [5] "Melbourne"     "W.A.C.A"       "Christchurch"  "Wellington"   
##  [9] "Auckland"      "Manchester"    "Lord's"        "Nottingham"   
## [13] "Leeds"         "Birmingham"    "The Oval"      "Hobart"       
## [17] "Brisbane"      "Johannesburg"  "Cape Town"     "Durban"       
## [21] "Karachi"       "Rawalpindi"    "Lahore"        "Bridgetown"   
## [25] "St John's"     "Port of Spain" "Kingston"      "Gqeberha"     
## [29] "Centurion"     "Chennai"       "Eden Gardens"  "Bengaluru"    
## [33] "Kandy"         "Galle"         "Harare"        "Hamilton"     
## [37] "Wankhede"      "Colombo (PSS)" "Sharjah"       "Darwin"       
## [41] "Cairns"        "Nagpur"        "Fatullah"      "Chattogram"

This list can be used to manually write code that assigns the correct country to each ground. The code can then be reused for the next player’s data, adding any grounds that player played at which the previous players had not. This is the completed code used to assign countries in this study:

s_warne3 <- s_warne2 %>% 
  mutate(country = case_when(
    ground %in% c("Perth", "Adelaide", "Brisbane", "Sydney", "W.A.C.A", "Melbourne",
                  "Hobart", "Darwin", "Cairns", "Canberra") ~ "Australia",
    ground %in% c("Manchester", "Lord's", "Nottingham", "Leeds", "Birmingham",
                  "The Oval", "Cardiff", "Chester-le-Street", "Southampton") ~ "England",
    ground %in% c("Christchurch", "Wellington", "Auckland", "Hamilton", "Dunedin",
                  "Napier", "Mount Maunganui") ~ "New Zealand",
    ground %in% c("Bridgetown", "St John's", "Port of Spain", "Kingston", "Providence",
                  "Gros Islet", "Roseau", "Georgetown", "Basseterre",
                  "North Sound") ~ "West Indies",
    ground %in% c("Harare", "Bulawayo") ~ "Zimbabwe",
    ground %in% c("Fatullah", "Chattogram", "Mirpur", "Bogra", "Dhaka",
                  "Khulna") ~ "Bangladesh",
    ground %in% c("Kandy", "Galle", "Colombo (SSC)", "Colombo (PSS)", "Moratuwa",
                  "Colombo RPS", "Pallekele") ~ "Sri Lanka",
    ground %in% c("Karachi", "Rawalpindi", "Lahore", "Faisalabad", "Peshawar",
                  "Sialkot", "Multan") ~ "Pakistan",
    ground %in% c("Sharjah", "Dubai (DICS)", "Abu Dhabi") ~ "UAE",
    # check: ESPNCricinfo may also label the Grenada (West Indies) Test ground "St George's"
    ground %in% c("Johannesburg", "Cape Town", "Durban", "Gqeberha", "Centurion",
                  "Bloemfontein", "St George's", "Potchefstroom") ~ "South Africa",
    # any ground not listed above falls through to India
    TRUE ~ "India"
  ))
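In the country-assignment code, any ground that is not explicitly listed is assigned to “India”, so a ground from another country that was missed would be silently misassigned. One alternative, sketched here under the assumption that a named lookup vector is acceptable, keeps the ground lists in one place and returns `NA` for anything unmatched, so missing grounds can be listed and added. The object names are hypothetical and only a few grounds per country are shown.

```r
# Hypothetical lookup: one entry per country, listing its Test grounds
# (abbreviated; full lists would mirror the study's assignment code).
ground_lists <- list(
  Australia = c("Sydney", "Adelaide", "Melbourne", "W.A.C.A"),
  England   = c("Lord's", "The Oval", "Manchester"),
  India     = c("Chennai", "Eden Gardens", "Bengaluru", "Nagpur"),
  Pakistan  = c("Karachi", "Rawalpindi", "Lahore")
)

# Flatten into a named vector: names are grounds, values are countries.
ground_country <- setNames(
  rep(names(ground_lists), lengths(ground_lists)),
  unlist(ground_lists, use.names = FALSE)
)

grounds <- c("Sydney", "Eden Gardens", "Lahore", "Harare")
country <- unname(ground_country[grounds])
country                          # "Australia" "India" "Pakistan" NA

# Grounds still missing from the lookup, to add before aggregating.
unique(grounds[is.na(country)])  # "Harare"
```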

The data can now be distilled into a summary table and the process repeated for each individual spinner. The summary tables of each spinner can then be combined into a table consisting of all spinners included in this study.
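The repeat-and-combine step can be written once as a function and applied to every player. The sketch below uses base R’s `aggregate()` rather than the dplyr pipeline used in this study, and runs on small made-up data frames (hypothetical players and figures), assuming each already has the `country`, `balls` and `wkts` columns created in the earlier steps.

```r
# Summarise one player's innings by country: total balls, total wickets,
# strike rate, and a player label so rows stay identifiable after combining.
summarise_by_country <- function(df, player) {
  out <- aggregate(cbind(balls, wkts) ~ country, data = df, FUN = sum)
  names(out)[names(out) == "balls"] <- "balls_agg"
  names(out)[names(out) == "wkts"] <- "wkts_agg"
  out$s_r <- out$balls_agg / out$wkts_agg
  out$player <- player
  out
}

# Made-up innings for two hypothetical players.
innings <- list(
  "Player A" = data.frame(country = c("Australia", "Australia", "India"),
                          balls = c(120, 60, 90), wkts = c(3, 1, 0)),
  "Player B" = data.frame(country = c("India", "England"),
                          balls = c(150, 200), wkts = c(5, 2))
)

# Apply the summary to every player and stack the results into one table.
all_players <- do.call(rbind, Map(summarise_by_country, innings, names(innings)))
rownames(all_players) <- NULL
all_players
```

Adding a spinner then only means adding one more data frame to the list.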

Data Analysis

The following code generates the table for an individual player that will be merged with all other players’ tables that are included in this study:

s_warne_merge <- s_warne3 %>% 
  group_by(country) %>%
  summarise(balls_agg = sum(balls),
            wkts_agg = sum(wkts),
            econ_agg = mean(econ)) %>%
  mutate(s_r = balls_agg/wkts_agg, player = "S Warne")

This generates a table with the data aggregated by country and includes the success measure for this study - bowling strike rate (s_r). It also generates a column for the player name so data can be aggregated by player after the merge. The code for the final merged table with all spinners included in this study is as follows:
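One detail of this summary: `econ_agg = mean(econ)` gives every innings the same weight, so a two-over spell counts as much as a thirty-over one. If a runs-based economy rate is wanted, pooling runs and balls is one option. The worked example below uses two hypothetical innings and assumes a runs-conceded column is available, which the code in this study does not use.

```r
# Two hypothetical innings: a short expensive spell and a long economical one.
balls <- c(12, 180)   # 2 overs and 30 overs
runs  <- c(20, 60)

econ_per_innings <- runs / (balls / 6)   # 10 and 2 runs per over
mean(econ_per_innings)                   # 6: each innings weighted equally
sum(runs) / (sum(balls) / 6)             # 2.5: weighted by balls bowled
```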

# each *_merge table is built by repeating the steps above for that player
spin_bowlers <- bind_rows(s_warne_merge, m_murali_merge, a_kumble_merge,
                          d_vettori_merge, g_swann_merge, h_singh_merge,
                          n_lyon_merge, r_ashwin_merge, r_benaud_merge,
                          r_herath_merge, s_macgill_merge, h_tayfield_merge,
                          l_gibbs_merge, k_maharaj_merge, y_shah_merge)

This table now enables the desired comparison between spinners across different conditions in each country. The following graph shows the strike rates of spinners in each country they have played in:

spin_bowlers %>% ggplot() +
  geom_col(mapping = aes(x = player, y = s_r, fill = player)) +
  facet_wrap(~country) + 
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  # spin_bowlers_fill is not defined in the code shown here; it is assumed
  # to be a vector of bar colours, one per player
  scale_fill_manual(values = spin_bowlers_fill) +
  labs(title = "Spinner Strike Rates by Country",
       caption = "Based on data from ESPNCricinfo") +
  ylab("Strike Rate") + 
  xlab("Player") + 
  guides(fill = "none")

Some initial insights from this visual include

These observations can be backed up with the numbers showing the combined strike rate of all the bowlers in each country.

spin_bowlers %>% group_by(country) %>% summarize(s_r = sum(balls_agg)/sum(wkts_agg))
## # A tibble: 11 × 2
##    country        s_r
##    <chr>        <dbl>
##  1 Australia     66.8
##  2 Bangladesh    46.0
##  3 England       66.0
##  4 India         59.1
##  5 New Zealand   71.8
##  6 Pakistan      70.4
##  7 South Africa  66.8
##  8 Sri Lanka     52.8
##  9 UAE           56.7
## 10 West Indies   66.4
## 11 Zimbabwe      62.6
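The combined figure in this table is a pooled strike rate: total balls divided by total wickets across every bowler in that country, so bowlers who bowled more balls carry more weight, and wicketless spells still add their balls. Averaging the individual strike rates instead would give every bowler equal weight, and a single `Inf` (a bowler with no wickets) would make the average infinite. A small illustration with hypothetical figures:

```r
# Hypothetical country totals for three bowlers.
balls_agg <- c(1200, 300, 90)
wkts_agg  <- c(24, 3, 0)

s_r <- balls_agg / wkts_agg       # 50, 100, Inf
sum(balls_agg) / sum(wkts_agg)    # pooled: 1590 / 27, about 58.9
mean(s_r)                         # Inf, because one bowler took no wickets
mean(s_r[is.finite(s_r)])         # 75: equal weight to the other two bowlers
```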

The numbers show that New Zealand (71.8) and Pakistan (70.4) have the highest overall strike rates, making them the hardest places for these spinners to take wickets. Pakistan proving difficult for spinners is surprising, since the subcontinent has long been associated with spin-friendly pitches. By contrast, India (59.1), Sri Lanka (52.8), and Bangladesh (46.0) all appear spin friendly.

We can also look at the lowest strike rates in each country.

low_sr <- spin_bowlers %>% 
  group_by(country) %>%
  filter(s_r == min(s_r)) %>%   # lowest strike rate in each country
  ungroup() %>%
  arrange(country) %>%
  as.data.frame()

low_sr
##         country balls_agg wkts_agg econ_agg      s_r        player
## 1     Australia      7196      135 3.232593 53.30370     S Macgill
## 2    Bangladesh       487       15 2.885000 32.46667      A Kumble
## 3       England      2317       48 3.021000 48.27083 M Muralidaran
## 4         India     17628      383 2.949291 46.02611      R Ashwin
## 5   New Zealand       705       23 3.811250 30.65217        N Lyon
## 6      Pakistan       766       15 3.124000 51.06667     S Macgill
## 7  South Africa      1455       30 2.485556 48.50000      R Benaud
## 8     Sri Lanka      1904       48 2.954375 39.66667       S Warne
## 9           UAE       414       16 2.002500 25.87500       S Warne
## 10  West Indies       803       22 2.270000 36.50000     K Maharaj
## 11     Zimbabwe       696       19 2.507500 36.63158      R Herath

Again, some interesting insights:

We can repeat the process for the highest strike rates.

high_sr <- spin_bowlers %>% 
  group_by(country) %>%
  filter(s_r == max(s_r)) %>%   # highest strike rate in each country
  ungroup() %>%
  arrange(country) %>%
  as.data.frame()

high_sr
##         country balls_agg wkts_agg econ_agg       s_r        player
## 1     Australia       806        5 3.608571 161.20000     K Maharaj
## 2    Bangladesh       162        1 2.200000 162.00000       L Gibbs
## 3       England      2619       25 2.694500 104.76000      R Benaud
## 4         India       762        6 4.403333 127.00000     K Maharaj
## 5   New Zealand       977        8 3.422500 122.12500      R Herath
## 6      Pakistan       486        0 4.250000       Inf       H Singh
## 7  South Africa       196        1 4.120000 196.00000        Y Shah
## 8     Sri Lanka      2417       26 3.598750  92.96154       H Singh
## 9           UAE      1547       15 3.157500 103.13333        N Lyon
## 10  West Indies       618        3 2.112500 206.00000      R Herath
## 11     Zimbabwe      2019       26 2.372500  77.65385 M Muralidaran

With this we can now look at the differences between the highest and lowest strike rates.

data.frame(country = high_sr$country, s_r_diff = high_sr$s_r - low_sr$s_r)
##         country  s_r_diff
## 1     Australia 107.89630
## 2    Bangladesh 129.53333
## 3       England  56.48917
## 4         India  80.97389
## 5   New Zealand  91.47283
## 6      Pakistan       Inf
## 7  South Africa 147.50000
## 8     Sri Lanka  53.29487
## 9           UAE  77.25833
## 10  West Indies 169.50000
## 11     Zimbabwe  41.02227

This gives some insight into how difficult it was to adapt, at least for the spinners included in this study. A small difference means that even the bowlers who struggled most in those conditions were not much worse than the most successful. Conversely, a large difference means that while some bowlers had success, at least one bowler performed very poorly there.
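Because the range is the gap between two extremes, a single bowler with very few wickets in a country can dominate it; for example, the highest strike rates in Bangladesh and South Africa in the table above each come from a single wicket. One way to check this, sketched here with a hypothetical minimum-wickets threshold and made-up data, is to drop those rows before computing the range. Any threshold of at least one wicket also removes infinite strike rates such as the one in Pakistan.

```r
# Hypothetical per-country summaries (same columns as spin_bowlers).
sr <- data.frame(
  country  = c("A", "A", "A", "B", "B", "B"),
  wkts_agg = c(20, 10, 1, 15, 12, 0),
  s_r      = c(30, 60, 160, 40, 50, Inf)
)

# Range (max - min) of strike rates per country, ignoring rows with
# fewer than min_wkts wickets.
sr_range_by_country <- function(df, min_wkts = 0) {
  kept <- df[df$wkts_agg >= min_wkts, ]
  rng <- tapply(kept$s_r, kept$country, function(x) max(x) - min(x))
  data.frame(country = names(rng), s_r_diff = as.vector(rng))
}

sr_range_by_country(sr)                # A: 130, B: Inf
sr_range_by_country(sr, min_wkts = 5)  # A: 30,  B: 10
```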

Finally, we can look at each individual player’s highest and lowest strike rates.

player_min_sr <- spin_bowlers %>%
  group_by(player) %>%
  filter(s_r == min(s_r)) %>%   # each player's best (lowest) strike rate
  ungroup() %>%
  arrange(player)

player_max_sr <- spin_bowlers %>%
  group_by(player) %>%
  filter(s_r == max(s_r)) %>%   # each player's worst (highest) strike rate
  ungroup() %>%
  arrange(player)

bind_cols(player = player_min_sr$player, min_country = player_min_sr$country, min_sr = player_min_sr$s_r, max_country = player_max_sr$country, max_sr = player_max_sr$s_r, sr_diff = player_max_sr$s_r-player_min_sr$s_r)
## # A tibble: 15 × 6
##    player        min_country  min_sr max_country  max_sr sr_diff
##    <chr>         <chr>         <dbl> <chr>         <dbl>   <dbl>
##  1 A Kumble      Bangladesh     32.5 New Zealand   104.     72.0
##  2 D Vettori     Bangladesh     36.8 Pakistan      240     203. 
##  3 G Swann       Sri Lanka      45.4 Australia      98.5    53.1
##  4 H Singh       West Indies    56.1 Pakistan      Inf     Inf  
##  5 H Tayfield    Australia      55.8 England        79.1    23.3
##  6 K Maharaj     West Indies    36.5 Australia     161.    125. 
##  7 L Gibbs       Pakistan       67.3 Bangladesh    162      94.7
##  8 M Muralidaran Bangladesh     40.6 Australia     131      90.4
##  9 N Lyon        New Zealand    30.7 Pakistan      110.     78.8
## 10 R Ashwin      Sri Lanka      41.2 South Africa  110      68.8
## 11 R Benaud      South Africa   48.5 England       105.     56.3
## 12 R Herath      Zimbabwe       36.6 West Indies   206     169. 
## 13 S Macgill     Bangladesh     34.1 Sri Lanka      66.4    32.3
## 14 S Warne       UAE            25.9 India          81.0    55.2
## 15 Y Shah        Sri Lanka      49   South Africa  196     147

And we can count how many times each country appears as these spinners’ most or least successful location:

merge((player_min_sr %>% count(country, name = "min_sr_count")), (player_max_sr %>% count(country, name = "max_sr_count")), by.x = "country", by.y = "country", all = TRUE)
##         country min_sr_count max_sr_count
## 1     Australia            1            3
## 2    Bangladesh            4            1
## 3       England           NA            2
## 4         India           NA            1
## 5   New Zealand            1            1
## 6      Pakistan            1            3
## 7  South Africa            1            2
## 8     Sri Lanka            3            1
## 9           UAE            1           NA
## 10  West Indies            2            1
## 11     Zimbabwe            1           NA

The difference in strike rates between a spinner’s worst and best countries shows which spinners adapted best to every condition they played in and which struggled in at least one country. The spinner with the smallest difference was Hugh Tayfield (23.3), but we only have data for him playing in 4 countries (Australia, England, New Zealand, and South Africa), so he did not need to adapt as much as others who played in many more. The largest difference (apart from Harbhajan Singh’s infinite strike rate in Pakistan) belonged to Daniel Vettori (203), who, like several others, found success in Bangladesh but struggled with the conditions in Pakistan. The count of each country appearing as a spinner’s most or least successful location supports the earlier findings: Bangladesh (4 spinners) and Sri Lanka (3) were the best locations for several spinners, while Pakistan and Australia (3 each) were the worst.

Conclusion

The goal of this study was to compare how well some successful spinners adapted to the conditions found in different countries. The results are limited to the spinners included, which could skew them. The most successful, highest-wicket-taking spinners should be among the most adaptable and so should give a best-case picture of spinners bowling in different conditions. Choosing spinners from many different countries was also intended to diversify the pitches on which the spinners grew up playing.

Overall, Sri Lanka was consistently among the most spin-friendly countries on every metric, and any good spinner will likely enjoy bowling there. The surprise of this study is Pakistan. Given its location on the subcontinent, you might expect it to be spin friendly, but it was consistently one of the hardest places for spin according to the data in this study. Perhaps this is reflected in some of Pakistan’s best bowlers: many would immediately think of Waqar Younis, Wasim Akram, and Shoaib Akhtar, who are all pace bowlers. Finally, the difference between each player’s best and worst strike rates gave insight into the most adaptable players. Tayfield, Macgill, Swann, Warne, Benaud, and Ashwin all had relatively small differences in their strike rates; each of them found some success in every location they played. But even for them, some locations were more challenging than others.

Spin bowlers are an important part of a test team. The best bowlers, who are never forgotten, are the ones that find a way to succeed consistently in as many conditions as possible to keep their team on top.