MATH1324 Assignment 2

Statistical Investigation of the Strike Rate of Left-Handed and Right-Handed Batsmen

Akshay Prasannan - S3818611

Last updated: 29 October, 2020

Introduction

*Cricket is the second most popular sport after football, with over 1 billion fans across the world, and the World Cup is its biggest stage. (1)

*Each cricket team consists of 11 players, divided into batsmen, bowlers, fielders and one wicket-keeper.

*The Indian Premier League is a popular professional Twenty20 cricket league, held in India every year since 2008, which attracts about 269 million viewers.

*A batsman's batting style is either left-handed or right-handed, according to his natural ability.

*Strike rate is one of the key factors considered when assessing the performance of a batsman.

Introduction Cont.

*A study reports that about 10.6% of the world population is left-handed. (2)

*By comparison, research on the top 8 international cricket teams suggests that 30% of the players batting in positions 1 to 6 are left-handed. (3)

*This study investigates data obtained from Kaggle (4), https://www.kaggle.com/ramjidoolla/ipl-data-set, to evaluate the relationship between the strike rates of left-handed and right-handed batsmen.

Problem Statement

*Determine whether the difference between the batting strike rates of left-handed and right-handed batsmen is statistically significant.

*A two-sample t-test will be conducted to test whether the difference between the population mean strike rates of left-handed and right-handed batsmen is statistically significant.

Data

*The data for this study were extracted from Kaggle: https://www.kaggle.com/ramjidoolla/ipl-data-set

*The datasets Players and most_runs_average_strikerate were used to conduct this study.

*The two datasets were merged using the merge() function.

*The new dataset (PlayerSR) consists of 10 variables. Of these, the following two will be used in this study:

  1. Batting_Hand: Factor describing the player's batting style (right hand or left hand)

  2. strikerate: The number of runs scored by the batsman per 100 balls played
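
As a worked check of this definition, the strike rate can be written as a formula and applied to the first player in the data preview shown later in this document (A Ashish Reddy, 280 runs from 191 balls). The names in the formula are simply the columns total_runs and numberofballs; that strikerate was derived exactly this way is an assumption, although the figures agree:

\[\text{strikerate} = \frac{\text{total\_runs}}{\text{numberofballs}} \times 100\]

\[\frac{280}{191} \times 100 \approx 146.60\]

This matches the strikerate value of 146.59686 listed for that player.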

Data Cont.

The variable Batting_Hand was converted into a factor.

The levels of the factor are as follows:

  1. Left_Hand - left-handed batting style
  2. Right_Hand - right-handed batting style
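
The data preparation described above (merging the two tables, then converting Batting_Hand to a factor) might look roughly like the sketch below. It is not the original script: the two small data frames are toy stand-ins built from the preview rows, the object names players_demo, runs_demo and PlayerSR_demo are hypothetical, and joining on Player_Name is an assumption about the key used.

```r
# Toy stand-ins for the two Kaggle tables (hypothetical names, a few
# values copied from the data preview; the real files are much larger).
players_demo <- data.frame(
  Player_Name  = c("A Ashish Reddy", "A Chandila", "A Flintoff"),
  Batting_Hand = c("Right_Hand", "Right_Hand", "Right_Hand"),
  stringsAsFactors = FALSE
)
runs_demo <- data.frame(
  Player_Name   = c("A Ashish Reddy", "A Chandila", "A Flintoff"),
  total_runs    = c(280, 4, 62),
  numberofballs = c(191, 7, 53),
  stringsAsFactors = FALSE
)

# Merge on the player name (assumed join key).
PlayerSR_demo <- merge(runs_demo, players_demo, by = "Player_Name")

# Convert the batting hand to a factor with both levels stated explicitly,
# so the level order does not depend on which values happen to be present.
PlayerSR_demo$Batting_Hand <- factor(
  PlayerSR_demo$Batting_Hand,
  levels = c("Left_Hand", "Right_Hand")
)

levels(PlayerSR_demo$Batting_Hand)
```

Stating the levels explicitly keeps Left_Hand as the first group, which is the order in which t.test() reports the group means later on.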

Descriptive Statistics and Visualisation

*For a general overview, histograms and a boxplot of the strike rates of right-handed and left-handed batsmen were drawn.

*The variable names and the first few observations are given below.

head(PlayerSR) -> table5
knitr::kable(table5)

| Player_Name | total_runs | out | numberofballs | average | strikerate | DOB | Batting_Hand | Bowling_Skill | Country |
|---|---|---|---|---|---|---|---|---|---|
| A Ashish Reddy | 280 | 15 | 191 | 18.66667 | 146.59686 | 1991-02-24 | Right_Hand | Right-arm medium | India |
| A Chandila | 4 | 1 | 7 | 4.00000 | 57.14286 | 1983-12-05 | Right_Hand | Right-arm offbreak | India |
| A Chopra | 53 | 5 | 71 | 10.60000 | 74.64789 | 1977-09-19 | Right_Hand | Right-arm offbreak | India |
| A Choudhary | 25 | 2 | 20 | 12.50000 | 125.00000 | NA | Right_Hand | Left-arm fast-medium | NA |
| A Dananjaya | 4 | 0 | 5 | NA | 80.00000 | NA | Right_Hand | Right-arm offbreak | NA |
| A Flintoff | 62 | 2 | 53 | 31.00000 | 116.98113 | 1977-12-06 | Right_Hand | Right-arm fast-medium | England |

library(dplyr)  # provides %>%, filter(), group_by() and summarise()
# Split the data by batting hand
PlayerSR_right <- PlayerSR %>% filter(Batting_Hand == "Right_Hand")
PlayerSR_Left <- PlayerSR %>% filter(Batting_Hand == "Left_Hand")

# Histograms of strike rate for each group
PlayerSR_Left$strikerate %>%
  hist(col = "blue", xlim = c(0, 200),
       xlab = "Strikerate - Left handed batsman",
       main = "Strikerate - Left handed batsman")

PlayerSR_right$strikerate %>%
  hist(col = "blue", xlim = c(0, 200),
       xlab = "Strikerate - Right handed",
       main = "Strikerate - Right handed batsman")

# Side-by-side boxplot of strike rate by batting hand
PlayerSR %>%
  boxplot(strikerate ~ Batting_Hand, data = .,
          ylab = "Strike Rate", col = "yellow",
          main = "Strike rate of left and right handed batsman")

Descriptive Statistics Cont.

*Summary statistics of the variable analysed, strikerate, for left-handed and right-handed batsmen:

PlayerSR %>% group_by(Batting_Hand) %>% summarise(Min = min(strikerate,na.rm = TRUE),
                                           Q1 = quantile(strikerate,probs = .25,na.rm = TRUE),
                                           Median = median(strikerate, na.rm = TRUE),
                                           Q3 = quantile(strikerate,probs = .75,na.rm = TRUE),
                                           Max = max(strikerate,na.rm = TRUE),
                                           Mean = mean(strikerate, na.rm = TRUE),
                                           SD = sd(strikerate, na.rm = TRUE),
                                           n = n(),
                                           Missing = sum(is.na(strikerate))) -> table1
knitr::kable(table1)
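
| Batting_Hand | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| Left_Hand | 0 | 98.14815 | 119.4805 | 133.2732 | 172.7273 | 112.0160 | 33.27622 | 133 | 0 |
| Right_Hand | 0 | 78.98981 | 109.3220 | 129.4414 | 250.0000 | 103.1476 | 41.41030 | 383 | 0 |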

Hypothesis Testing

library(car)  # provides qqPlot() and leveneTest()
qqPlot(PlayerSR_Left$strikerate, distribution = "norm")

## [1]   2 126
qqPlot(PlayerSR_right$strikerate, distribution = "norm")

## [1]  58 172

Hypothesis Testing Cont.

*The homogeneity of variance (the assumption of equal variances) was checked using Levene’s test.

*Levene’s test has the following statistical hypotheses:

\[H_0: \sigma_1^2 = \sigma_2^2 \]

\[H_A: \sigma_1^2 \ne \sigma_2^2\]

leveneTest(strikerate ~ Batting_Hand, data= PlayerSR)

*Levene’s test returned a p-value of 0.003.

*Levene’s test is therefore statistically significant (p < 0.05), so it is not safe to assume equal variances.

*We therefore proceed with a two-sample t-test assuming unequal variances (Welch's t-test).

*The two-sample t-test has the following statistical hypotheses:

\[H_0: \mu_1 = \mu_2 \]

\[H_A: \mu_1 \ne \mu_2\]

t.test(
  strikerate ~ Batting_Hand,
  data = PlayerSR,
  var.equal = FALSE,
  alternative = "two.sided"
  )
## 
##  Welch Two Sample t-test
## 
## data:  strikerate by Batting_Hand
## t = 2.4785, df = 283.79, p-value = 0.01377
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   1.825448 15.911488
## sample estimates:
##  mean in group Left_Hand mean in group Right_Hand 
##                 112.0160                 103.1476

*The p-value is 0.01377, which is less than 0.05. Therefore we reject the null hypothesis.
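
As a cross-check, the Welch statistic can be recomputed by hand from the group means, standard deviations and sample sizes in the descriptive statistics table. The sketch below uses base R only and hypothetical variable names. It is not part of the original analysis, and because the inputs are rounded it should agree with the t.test() output only up to rounding.

```r
# Summary statistics copied from the descriptive statistics table
mean_left  <- 112.0160; sd_left  <- 33.27622; n_left  <- 133
mean_right <- 103.1476; sd_right <- 41.41030; n_right <- 383

# Variance of each group's sample mean
v_left  <- sd_left^2  / n_left
v_right <- sd_right^2 / n_right

# Welch t statistic: difference in means over its standard error
se     <- sqrt(v_left + v_right)
t_stat <- (mean_left - mean_right) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_left + v_right)^2 /
  (v_left^2 / (n_left - 1) + v_right^2 / (n_right - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci      <- (mean_left - mean_right) + c(-1, 1) * qt(0.975, df = df_welch) * se

round(c(t = t_stat, df = df_welch, p = p_value), 4)
round(ci, 3)
```

Comparing these values with the t, df, p-value and confidence interval printed by t.test() is a quick way to confirm that the unequal-variance version of the test was the one actually run.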

Discussion

*A two-sample t-test was used to test for a significant difference between the mean strike rates of right-handed and left-handed batsmen.

*The normality of the strike rate for both right-handed and left-handed batsmen was checked using Q-Q plots.

*Both distributions displayed non-normality upon inspection of the normal Q-Q plots.

*Since both samples are larger than 30 (n = 133 and n = 383), the central limit theorem suggests that the t-test can still be applied.

*Levene’s test was conducted to check homogeneity of variance. The test indicated that equal variances could not be assumed, as the p-value was less than 0.05.

*A two-sample t-test assuming unequal variances found a statistically significant difference between the mean strike rates of left-handed and right-handed batsmen, t(df = 283.79) = 2.4785, p = 0.01377, 95% CI for the difference in means [1.825448, 15.911488].

*The results of the investigation suggest that left-handed batsmen have a significantly higher average strike rate than right-handed batsmen.

References

1. ICC [internet] https://www.icc-cricket.com/media-releases/759733

2. 'Human handedness: A meta-analysis', Psychological Bulletin. DOI: 10.1037/bul0000229

3. ICC [internet] https://www.icc-cricket.com/media-releases/759733

4. Kaggle [internet] https://www.kaggle.com/ramjidoolla/ipl-data-set