Spinners make up an important part of any cricket team. Having a good spinner in test cricket can win you games you otherwise wouldn’t, particularly if you’re bowling on day 5. However, at the international level the pitches being played on differ from country to country so bowlers need to be able to adapt to the different conditions to have success.
Bowling strike rate will be used in this case study to measure success. It is a measure of how many balls on average a given spinner bowls before they take a wicket. The lower the bowling strike rate is, the more successful a spinner is.
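As a quick worked example of this definition, the sketch below computes a strike rate in base R. The function name `strike_rate` is an illustrative assumption; the figures (1904 balls, 48 wickets) are Shane Warne's totals in Sri Lanka as reported in the tables later in this study.

```r
# Bowling strike rate: balls bowled per wicket taken (lower is better).
# `strike_rate` is a hypothetical helper name used only for this example.
strike_rate <- function(balls, wickets) {
  balls / wickets
}

# Shane Warne in Sri Lanka: 1904 balls for 48 wickets
warne_sl <- strike_rate(1904, 48)
round(warne_sl, 2)
```

On average, that is a wicket roughly every 40 balls.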
This case study compares the success of some of the highest wicket-taking spinners across different countries to see whether any trends or insights emerge. Spinners from many countries were included to improve diversity.
Data
Data used in this case study was sourced from the R package cricketdata. The package contains functions to source individual player data from all matches played from ESPNCricinfo. Please see the following citations for the cricketdata package and the R statistical program.
cricketdata - Hyndman R, Gray C, Gupta S, Hyndman T, Rafique H, Tran J (2023). cricketdata: International Cricket Data. R package version 0.2.3, https://CRAN.R-project.org/package=cricketdata.
R program - R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Data Processing
The first step was to load the packages used in this analysis:
library(tidyverse)
library(cricketdata)
library(janitor)
library(skimr)
Data was obtained using the fetch_player_data function from the cricketdata package. In order to use the function you need to know the Cricinfo player ID for the specific player. Thankfully, the cricketdata package also has a function for this. I will work through the process here for Australian spinner Shane Warne to provide an example:
find_player_id("Shane Warne")
## # A tibble: 1 × 5
## ID Name searchstring Country Played
## <dbl> <chr> <chr> <chr> <chr>
## 1 8166 SK Warne (Shane Warne) Shane Warne AUS Test matches player (1991/9…
This reveals that the player ID for Shane Warne is 8166. This can be used to obtain the individual player data for Shane Warne:
s_warne <- fetch_player_data(8166, matchtype = "test", activity = "bowling")
This generates a table of the bowling statistics of Shane Warne for his entire test career. However, the data in this table needs to be manipulated to suit the needs of this study. First, some table formatting to make things easier:
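s_warne1 <- s_warne %>%
  clean_names() %>%
  # drop innings where the bowler did not bowl
  filter(overs != "DNB" & overs != "TDNB") %>%
  # convert text columns to numeric for later calculations
  mutate(overs = as.double(overs),
         mdns = as.integer(mdns),
         wkts = as.integer(wkts),
         econ = as.double(econ))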
This code converts all column names to a consistent lowercase format for ease of use. It then filters out the innings where the spinner didn’t bowl, since we’re only interested in their impact when they did bowl. Finally, it changes the data type of certain columns from string to numeric, which is needed for the calculations.
Next, since strike rate is calculated from the number of balls bowled divided by the number of wickets taken, a column will be added for balls bowled which can be calculated from the “overs” column since 1 over = 6 balls.
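s_warne2 <- s_warne1 %>%
  # whole overs are worth 6 balls each; the decimal digit counts extra balls
  # round() guards against floating-point error in the decimal part
  mutate(balls = trunc(overs) * 6 + round((overs - trunc(overs)) * 10))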
This code accounts for how overs are recorded. For example, a bowler recorded as having bowled 5.5 overs did not bowl five and a half overs (33 balls); they bowled 5 overs (30 balls) plus 5 balls, for a total of 35 balls.
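The same conversion can be checked on a few values outside the pipeline. This is a minimal sketch in base R; `overs_to_balls` is a hypothetical helper name, not part of the cricketdata package.

```r
# Convert cricket overs notation to balls: the digit after the decimal point
# counts balls, not tenths of an over.
# `overs_to_balls` is a hypothetical helper name used only for this example.
overs_to_balls <- function(overs) {
  full_overs <- trunc(overs)
  # round() guards against floating-point error in the decimal part
  extra_balls <- round((overs - full_overs) * 10)
  full_overs * 6 + extra_balls
}

overs_to_balls(c(5.5, 20, 12.3))  # 35 120 75
```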
This study seeks to observe differences in bowler success depending on pitch conditions, which vary with where the match is played. The player data only records the ground where each match took place, so a new column for the country is needed in order to aggregate by country. This requires code that assigns a country based on the ground. The following code lists the different grounds the spinner played at during his career:
unique(s_warne2$ground)
## [1] "Sydney" "Adelaide" "Colombo (SSC)" "Moratuwa"
## [5] "Melbourne" "W.A.C.A" "Christchurch" "Wellington"
## [9] "Auckland" "Manchester" "Lord's" "Nottingham"
## [13] "Leeds" "Birmingham" "The Oval" "Hobart"
## [17] "Brisbane" "Johannesburg" "Cape Town" "Durban"
## [21] "Karachi" "Rawalpindi" "Lahore" "Bridgetown"
## [25] "St John's" "Port of Spain" "Kingston" "Gqeberha"
## [29] "Centurion" "Chennai" "Eden Gardens" "Bengaluru"
## [33] "Kandy" "Galle" "Harare" "Hamilton"
## [37] "Wankhede" "Colombo (PSS)" "Sharjah" "Darwin"
## [41] "Cairns" "Nagpur" "Fatullah" "Chattogram"
This list can be used to manually write code that assigns the correct country. The code can then be reused for the next player’s data and extended with any grounds that player appeared at that the previous players had not. Note that any ground not explicitly listed is assigned to India. This is the completed code to assign countries for this study:
s_warne3 <- s_warne2 %>%
  mutate(country = case_when(
    ground %in% c("Perth", "Adelaide", "Brisbane", "Sydney", "W.A.C.A", "Melbourne",
                  "Hobart", "Darwin", "Cairns", "Canberra") ~ "Australia",
    ground %in% c("Manchester", "Lord's", "Nottingham", "Leeds", "Birmingham", "The Oval",
                  "Cardiff", "Chester-le-Street", "Southampton") ~ "England",
    ground %in% c("Christchurch", "Wellington", "Auckland", "Hamilton", "Dunedin",
                  "Napier", "Mount Maunganui") ~ "New Zealand",
    ground %in% c("Bridgetown", "St John's", "Port of Spain", "Kingston", "Providence",
                  "Gros Islet", "Roseau", "Georgetown", "Basseterre", "North Sound") ~ "West Indies",
    ground %in% c("Harare", "Bulawayo") ~ "Zimbabwe",
    ground %in% c("Fatullah", "Chattogram", "Mirpur", "Bogra", "Dhaka", "Khulna") ~ "Bangladesh",
    ground %in% c("Kandy", "Galle", "Colombo (SSC)", "Colombo (PSS)", "Moratuwa",
                  "Colombo RPS", "Pallekele") ~ "Sri Lanka",
    ground %in% c("Karachi", "Rawalpindi", "Lahore", "Faisalabad", "Peshawar",
                  "Sialkot", "Multan") ~ "Pakistan",
    ground %in% c("Sharjah", "Dubai (DICS)", "Abu Dhabi") ~ "UAE",
    ground %in% c("Johannesburg", "Cape Town", "Durban", "Gqeberha", "Centurion",
                  "Bloemfontein", "St George's", "Potchefstroom") ~ "South Africa",
    # every ground not listed above is treated as being in India
    TRUE ~ "India"
  ))
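Because every ground that is not explicitly listed falls through to "India" in the code above, a new ground from another country would be mislabelled silently. One possible safeguard, sketched here in base R with a deliberately shortened lookup, is to map grounds through a named vector and inspect whatever fails to match. The names `ground_country` and `grounds` are illustrative assumptions, and "Some New Ground" is a made-up placeholder rather than a real venue in the data.

```r
# Shortened ground-to-country lookup (only a few of the grounds listed above)
ground_country <- c(
  "Sydney"       = "Australia",
  "Lord's"       = "England",
  "Galle"        = "Sri Lanka",
  "Eden Gardens" = "India"
)

# Example input; "Some New Ground" is a made-up placeholder
grounds <- c("Sydney", "Galle", "Some New Ground", "Eden Gardens")

# Unmatched names come back as NA rather than defaulting to a country
country <- unname(ground_country[grounds])
unmapped <- unique(grounds[is.na(country)])
unmapped
```

Any ground that shows up in `unmapped` can then be added to the lookup before the data is aggregated.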
The data can now be distilled into a summary table and the process repeated for each individual spinner. The summary tables of each spinner can then be combined into a table consisting of all spinners included in this study.
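Since the same steps are repeated for every spinner, the per-country aggregation could be wrapped in a small function. The sketch below is one way to do that in base R. The name `summarise_spinner` is hypothetical, the input rows are made up purely for illustration (they are not real match data), and the economy column is left out to keep the example short.

```r
# Hypothetical helper: summarise one spinner's innings by country (base R).
# `innings` needs the columns `country`, `balls` and `wkts`.
summarise_spinner <- function(innings, player) {
  agg <- aggregate(cbind(balls, wkts) ~ country, data = innings, FUN = sum)
  names(agg) <- c("country", "balls_agg", "wkts_agg")
  agg$s_r <- agg$balls_agg / agg$wkts_agg  # strike rate per country
  agg$player <- player
  agg
}

# Made-up innings purely for illustration, not real match data
toy <- data.frame(
  country = c("Australia", "Australia", "India"),
  balls   = c(120, 60, 90),
  wkts    = c(3, 1, 2)
)
toy_summary <- summarise_spinner(toy, "Example Spinner")
toy_summary
```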
Data Analysis
The following code generates the table for an individual player that will be merged with all other players’ tables that are included in this study:
s_warne_merge <- s_warne3 %>%
group_by(country) %>%
summarise(balls_agg = sum(balls),
wkts_agg = sum(wkts),
econ_agg = mean(econ)) %>%
mutate(s_r = balls_agg/wkts_agg, player = "S Warne")
This generates a table with the data aggregated by country and includes the success measure for this study - bowling strike rate (s_r). It also generates a column for the player name so data can be aggregated by player after the merge. The code for the final merged table with all spinners included in this study is as follows:
spin_bowlers <- bind_rows(s_warne_merge, m_murali_merge, a_kumble_merge, d_vettori_merge, g_swann_merge, h_singh_merge, n_lyon_merge, r_ashwin_merge, r_benaud_merge, r_herath_merge, s_macgill_merge, h_tayfield_merge, l_gibbs_merge, k_maharaj_merge, y_shah_merge)
This table now enables the desired comparison between spinners across different conditions in each country. The following graph shows the strike rates of spinners in each country they have played in:
# spin_bowlers_fill: vector of fill colours for the players, not defined in this write-up
spin_bowlers %>% ggplot() +
  geom_col(mapping = aes(x = player, y = s_r, fill = player)) +
  facet_wrap(~country) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_fill_manual(values = spin_bowlers_fill) +
  labs(title = "Spinner Strike Rates by Country",
       caption = "Based on data from ESPNCricinfo") +
  ylab("Strike Rate") +
  xlab("Player") +
  guides(fill = "none")
Some initial insights from this visual include
These observations can be backed up with the numbers showing the combined strike rate of all the bowlers in each country.
spin_bowlers %>% group_by(country) %>% summarize(s_r = sum(balls_agg)/sum(wkts_agg))
## # A tibble: 11 × 2
## country s_r
## <chr> <dbl>
## 1 Australia 66.8
## 2 Bangladesh 46.0
## 3 England 66.0
## 4 India 59.1
## 5 New Zealand 71.8
## 6 Pakistan 70.4
## 7 South Africa 66.8
## 8 Sri Lanka 52.8
## 9 UAE 56.7
## 10 West Indies 66.4
## 11 Zimbabwe 62.6
The numbers show New Zealand and Pakistan as the countries with the highest overall strike rates. Pakistan proving more difficult for spinners is surprising, since it is on the subcontinent, which has long been associated with spin-friendly pitches. By contrast, the other subcontinent countries (India, Sri Lanka, and Bangladesh) all have lower combined strike rates, with Bangladesh and Sri Lanka the lowest of all.
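The combined figures above pool balls and wickets before dividing, which is not the same as averaging the individual strike rates. A small sketch using two UAE rows that appear in the per-player tables of this study (S Warne: 414 balls, 16 wickets; N Lyon: 1547 balls, 15 wickets) illustrates the difference. The variable names are illustrative, and because only two bowlers are used, the result is not the full UAE figure reported above.

```r
# Two UAE rows from this study's per-player tables
balls <- c(warne = 414, lyon = 1547)
wkts  <- c(warne = 16,  lyon = 15)

# Pooled strike rate: total balls over total wickets (the approach used above)
pooled_sr <- sum(balls) / sum(wkts)

# Unweighted mean of the individual strike rates, shown for contrast
mean_sr <- mean(balls / wkts)

round(c(pooled = pooled_sr, mean = mean_sr), 2)
```

Pooling weights each bowler by how much they bowled, so a short spell with an extreme strike rate has less influence on the country figure.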
We can also look at the lowest strike rates in each country.
low_sr <- spin_bowlers %>%
  group_by(country) %>%
  filter(s_r == min(s_r)) %>%   # lowest (best) strike rate in each country
  ungroup() %>%
  arrange(country) %>%
  as.data.frame()
print(low_sr)
## country balls_agg wkts_agg econ_agg s_r player
## 1 Australia 7196 135 3.232593 53.30370 S Macgill
## 2 Bangladesh 487 15 2.885000 32.46667 A Kumble
## 3 England 2317 48 3.021000 48.27083 M Muralidaran
## 4 India 17628 383 2.949291 46.02611 R Ashwin
## 5 New Zealand 705 23 3.811250 30.65217 N Lyon
## 6 Pakistan 766 15 3.124000 51.06667 S Macgill
## 7 South Africa 1455 30 2.485556 48.50000 R Benaud
## 8 Sri Lanka 1904 48 2.954375 39.66667 S Warne
## 9 UAE 414 16 2.002500 25.87500 S Warne
## 10 West Indies 803 22 2.270000 36.50000 K Maharaj
## 11 Zimbabwe 696 19 2.507500 36.63158 R Herath
Again, some interesting insights:
We can repeat the process for the highest strike rates:
high_sr <- spin_bowlers %>%
  group_by(country) %>%
  filter(s_r == max(s_r)) %>%   # highest (worst) strike rate in each country
  ungroup() %>%
  arrange(country) %>%
  as.data.frame()
print(high_sr)
## country balls_agg wkts_agg econ_agg s_r player
## 1 Australia 806 5 3.608571 161.20000 K Maharaj
## 2 Bangladesh 162 1 2.200000 162.00000 L Gibbs
## 3 England 2619 25 2.694500 104.76000 R Benaud
## 4 India 762 6 4.403333 127.00000 K Maharaj
## 5 New Zealand 977 8 3.422500 122.12500 R Herath
## 6 Pakistan 486 0 4.250000 Inf H Singh
## 7 South Africa 196 1 4.120000 196.00000 Y Shah
## 8 Sri Lanka 2417 26 3.598750 92.96154 H Singh
## 9 UAE 1547 15 3.157500 103.13333 N Lyon
## 10 West Indies 618 3 2.112500 206.00000 R Herath
## 11 Zimbabwe 2019 26 2.372500 77.65385 M Muralidaran
With this we can now look at the differences between the highest and lowest strike rates.
data.frame(country = high_sr$country, s_r_diff = high_sr$s_r - low_sr$s_r)
## country s_r_diff
## 1 Australia 107.89630
## 2 Bangladesh 129.53333
## 3 England 56.48917
## 4 India 80.97389
## 5 New Zealand 91.47283
## 6 Pakistan Inf
## 7 South Africa 147.50000
## 8 Sri Lanka 53.29487
## 9 UAE 77.25833
## 10 West Indies 169.50000
## 11 Zimbabwe 41.02227
This gives some insight into difficulties in adapting, at least for the spinners included in this study. A small difference means the gap between the least successful and the most successful bowler is narrow, which suggests that even the bowlers who struggled with the conditions weren’t much worse than the most successful. Conversely, a large difference suggests that while some bowlers had success, at least one bowler performed very poorly in those conditions.
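The infinite difference for Pakistan comes from a bowler with no wickets there: dividing by zero wickets gives Inf in R. If that is not wanted, one option (a suggestion only, not what this study does) is to set such strike rates to NA so they can be excluded with `na.rm = TRUE`. The sketch uses the Pakistan figures for H Singh (486 balls, 0 wickets) and S Macgill (766 balls, 15 wickets) from the tables above; the variable names are illustrative.

```r
# Pakistan rows for two bowlers, taken from the tables above
balls <- c(h_singh = 486, s_macgill = 766)
wkts  <- c(h_singh = 0,   s_macgill = 15)

s_r <- balls / wkts                       # 486 / 0 evaluates to Inf
s_r_finite <- ifelse(wkts == 0, NA, s_r)  # treat wicketless spells as missing

max(s_r)                       # Inf
max(s_r_finite, na.rm = TRUE)  # largest finite strike rate
```

Whether a wicketless spell should count as missing or as the worst possible outcome is a judgment call, so the choice is worth stating alongside any results.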
Finally, we can look at each individual player’s highest and lowest strike rates.
player_min_sr <- spin_bowlers %>%
  group_by(player) %>%
  filter(s_r == min(s_r)) %>%   # each player's best country
  ungroup() %>%
  arrange(player)
player_max_sr <- spin_bowlers %>%
  group_by(player) %>%
  filter(s_r == max(s_r)) %>%   # each player's worst country
  ungroup() %>%
  arrange(player)
bind_cols(player = player_min_sr$player, min_country = player_min_sr$country, min_sr = player_min_sr$s_r, max_country = player_max_sr$country, max_sr = player_max_sr$s_r, sr_diff = player_max_sr$s_r-player_min_sr$s_r)
## # A tibble: 15 × 6
## player min_country min_sr max_country max_sr sr_diff
## <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 A Kumble Bangladesh 32.5 New Zealand 104. 72.0
## 2 D Vettori Bangladesh 36.8 Pakistan 240 203.
## 3 G Swann Sri Lanka 45.4 Australia 98.5 53.1
## 4 H Singh West Indies 56.1 Pakistan Inf Inf
## 5 H Tayfield Australia 55.8 England 79.1 23.3
## 6 K Maharaj West Indies 36.5 Australia 161. 125.
## 7 L Gibbs Pakistan 67.3 Bangladesh 162 94.7
## 8 M Muralidaran Bangladesh 40.6 Australia 131 90.4
## 9 N Lyon New Zealand 30.7 Pakistan 110. 78.8
## 10 R Ashwin Sri Lanka 41.2 South Africa 110 68.8
## 11 R Benaud South Africa 48.5 England 105. 56.3
## 12 R Herath Zimbabwe 36.6 West Indies 206 169.
## 13 S Macgill Bangladesh 34.1 Sri Lanka 66.4 32.3
## 14 S Warne UAE 25.9 India 81.0 55.2
## 15 Y Shah Sri Lanka 49 South Africa 196 147
And we can count how many times each country appears as these spinners’ most or least successful location:
merge((player_min_sr %>% count(country, name = "min_sr_count")), (player_max_sr %>% count(country, name = "max_sr_count")), by.x = "country", by.y = "country", all = TRUE)
## country min_sr_count max_sr_count
## 1 Australia 1 3
## 2 Bangladesh 4 1
## 3 England NA 2
## 4 India NA 1
## 5 New Zealand 1 1
## 6 Pakistan 1 3
## 7 South Africa 1 2
## 8 Sri Lanka 3 1
## 9 UAE 1 NA
## 10 West Indies 2 1
## 11 Zimbabwe 1 NA
The difference in strike rates between a spinner’s worst country and their best country shows which spinners adapted best to every condition they played in and which struggled in at least one country. The spinner with the smallest difference was Hugh Tayfield, but we only have data for him in 4 countries (Australia, England, New Zealand, and South Africa), so he didn’t need to adapt as much as others who played in many more countries. The largest difference (outside of Harbhajan Singh, who has an infinite strike rate in Pakistan) belonged to Daniel Vettori who, like several others, found success in Bangladesh but struggled with the conditions in Pakistan. The count of each country appearing as a spinner’s most or least successful location shows similar results to those found previously: Bangladesh and Sri Lanka were the best locations for several spinners, and Pakistan and Australia were the worst locations for several spinners.
Conclusion
The goal of this study was to compare how well some successful spinners adapted to the different conditions posed by different countries. The results are limited to the spinners included here, which has the potential to skew them. The most successful, highest wicket-taking spinners should be the most adaptable and should hopefully provide a best-case scenario for spinners bowling in different conditions. Choosing spinners from many different countries was also desirable to increase diversity in the pitches the spinners grew up playing on.
Overall, Sri Lanka was consistently among the most spin-friendly countries in all metrics. Any good spinner will likely enjoy bowling there. The surprise of this study is Pakistan. Being on the subcontinent, you might expect it to be spin friendly, but it was consistently one of the hardest places for spin according to the data in this study. Perhaps this is reflected when considering some of Pakistan’s best bowlers. Many would immediately think of Waqar Younis, Wasim Akram, and Shoaib Akhtar, who are all pace bowlers. Finally, the difference between each player’s best and worst strike rates gave insight into the most adaptable players. Tayfield, Macgill, Swann, Warne, Benaud, and Ashwin all had relatively low differences in their strike rates, and all of them were able to find some success in all of the locations they played. But even for them, some locations were more challenging than others.
Spin bowlers are an important part of a test team. The best bowlers, who are never forgotten, are the ones that find a way to succeed consistently in as many conditions as possible to keep their team on top.