week10

my_data <- read.csv('C:/Users/dell/Downloads/Ball_By_Ball.csv')

Selecting a binary column

I use “Out_type,” which records the type of dismissal for a batsman in a cricket match. It can be converted into a binary variable that indicates whether the batsman was dismissed (1) or not dismissed (0).

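# Remove rows with missing values in the "Out_type" column and make it binary
my_data <- my_data[complete.cases(my_data$Out_type), ]
# Note: this marks a row as 1 only when Out_type equals 1; every other value,
# including any text label for a dismissal type, becomes 0
my_data$Out_type <- ifelse(my_data$Out_type == 1, 1, 0)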
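
Before modeling, it is worth checking that the recoded response contains both classes. The sketch below uses a small made-up data frame rather than the real file; the dismissal labels ("caught", "bowled", "Not Applicable") are assumptions for illustration and may not match the labels in Ball_By_Ball.csv.

```r
# Toy stand-in for my_data; the labels are hypothetical, not taken from the CSV
toy <- data.frame(
  Out_type = c("caught", "Not Applicable", "bowled", "Not Applicable", NA),
  stringsAsFactors = FALSE
)
toy <- toy[complete.cases(toy$Out_type), , drop = FALSE]

# Comparing a text column with 1 never matches, so every row becomes 0
naive <- ifelse(toy$Out_type == 1, 1, 0)
table(naive)

# Alternative recode: 1 when a dismissal type is recorded, 0 otherwise
# ("Not Applicable" as the no-dismissal label is an assumption)
dismissed <- ifelse(toy$Out_type != "Not Applicable", 1, 0)
table(dismissed)
```

If table() shows a single class, a logistic regression on that response cannot be estimated meaningfully.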

Selecting explanatory variables

When using “Out_type” as the response variable in a logistic regression model, I consider the following possible explanatory variables from my dataset to explain the factors influencing dismissal:

Striker_Batting_Position: The position at which the striker bats in the order. This variable can influence the type of dismissal, as openers might face different scenarios compared to middle-order or lower-order batsmen.

Bowler_Wicket: The number of wickets taken by the bowler during their spell. This variable can help determine whether a bowler with more wickets is more likely to dismiss a batsman in a specific way.

Match_Location: The geographical location or venue where the match is played. Different pitches and ground dimensions can lead to varying types of dismissals.
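
A categorical variable such as Match_Location would enter a model as a set of indicator columns rather than as a single number. The sketch below shows that expansion on made-up rows; the venue names are placeholders, not values from the dataset.

```r
# Toy rows; "Venue A/B/C" are hypothetical placeholders
toy <- data.frame(
  Striker_Batting_Position = c(1, 2, 5, 7),
  Match_Location = factor(c("Venue A", "Venue B", "Venue A", "Venue C"))
)

# model.matrix() builds the design matrix that glm() would use:
# the first factor level becomes the baseline, the others get indicator columns
X <- model.matrix(~ Striker_Batting_Position + Match_Location, data = toy)
colnames(X)
```

With many venues this adds one coefficient per venue beyond the baseline, which is worth keeping in mind when deciding whether to include the variable.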

Building the logistic regression model using two of these variables

# Preparing the data frame with selected variables
selected_data <- my_data[c("Out_type", "Striker_Batting_Position", "Bowler_Wicket")]

# Logistic regression model
logistic_model <- glm(Out_type ~ Striker_Batting_Position + Bowler_Wicket, data = selected_data, family = "binomial")

## Warning: glm.fit: algorithm did not converge

# Summarize the model
summary(logistic_model)

## 
## Call:
## glm(formula = Out_type ~ Striker_Batting_Position + Bowler_Wicket, 
##     family = "binomial", data = selected_data)
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)
## (Intercept)              -2.657e+01  1.882e+03  -0.014    0.989
## Striker_Batting_Position  1.327e-13  4.496e+02   0.000    1.000
## Bowler_Wicket            -3.277e-13  4.698e+03   0.000    1.000
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 136589  degrees of freedom
## Residual deviance: 7.9244e-07  on 136587  degrees of freedom
##   (13861 observations deleted due to missingness)
## AIC: 6
## 
## Number of Fisher Scoring iterations: 25

Interpreting the results

The results of the logistic regression model for “Out_type” using “Striker_Batting_Position” and “Bowler_Wicket” as explanatory variables are as follows:

Intercept: The intercept is the log-odds of the response when both explanatory variables are zero. The estimate of -26.57 corresponds to a predicted probability of essentially zero. Its standard error (1.882e+03) is far larger than the estimate, so the p-value of 0.989 reflects a fit that did not converge rather than evidence about the baseline log-odds.

Striker_Batting_Position: The coefficient for “Striker_Batting_Position” is approximately 1.33e-13, which is zero to within numerical precision. With a standard error of 4.496e+02 and a p-value of 1.000, the fit shows no association between “Striker_Batting_Position” and the log-odds of “Out_type.” Because the algorithm did not converge, this should not be read as evidence that no relationship exists.

Bowler_Wicket: The coefficient for “Bowler_Wicket” is approximately -3.28e-13, again zero to within numerical precision. With a standard error of 4.698e+03 and a p-value of 1.000, the fit shows no association between “Bowler_Wicket” and the log-odds of “Out_type,” with the same caveat about non-convergence.
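
To read a log-odds value such as the intercept of -26.57 as a probability, the inverse logit can be applied. A minimal sketch using base R's plogis():

```r
# Intercept estimate copied from the summary output above
intercept_log_odds <- -26.57

# Inverse logit: p = 1 / (1 + exp(-log_odds))
p_baseline <- plogis(intercept_log_odds)
p_baseline

# Same calculation written out by hand
p_manual <- 1 / (1 + exp(-intercept_log_odds))
```

The result is on the order of 1e-12, so the fitted model predicts a dismissal probability of practically zero for every row.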

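Overall, the fit is degenerate rather than informative. glm warned that the algorithm did not converge, and it stopped after 25 Fisher scoring iterations. The null deviance is exactly 0, which for a binary response happens only when every observation used in the fit has the same outcome; together with the intercept of -26.57, this means “Out_type” is 0 in all 136,590 rows used. The likely cause is the recoding step, which sets “Out_type” to 1 only where the original value equals 1. With no variation in the response, neither “Striker_Batting_Position” nor “Bowler_Wicket” can be assessed as a predictor. In addition, 13,861 observations were deleted because of missing values.

The AIC (Akaike Information Criterion) of 6 says nothing about fit quality here. For a binary response, the AIC equals the residual deviance plus twice the number of estimated coefficients, so it is about 0 + 2 × 3 = 6. A lower AIC normally indicates a better model, but only when comparing models fitted to the same data with a response that actually varies.

In summary, the logistic regression model as fitted does not provide meaningful insights or predictability for the “Out_type” variable, because the recoded response contains no dismissals. The response needs to be recoded so that both outcomes are present before the model is refit.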
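
A null deviance of 0 is the signature of a response with a single class. The sketch below reproduces that situation on simulated data (the data frame and its columns are made up for illustration) and shows two quick checks; warnings are suppressed because glm() is expected to complain on such data.

```r
set.seed(1)
# Simulated data: the response is 0 in every row, x is an arbitrary predictor
toy <- data.frame(y = rep(0, 50), x = rnorm(50))

# Check 1: tabulate the response before fitting
table(toy$y)

# Check 2: fit anyway and inspect the null deviance and fitted probabilities
fit <- suppressWarnings(glm(y ~ x, data = toy, family = "binomial"))
fit$null.deviance   # 0 when the response never varies
max(fitted(fit))    # every fitted probability is essentially 0
```

Running table() on the response of the real data would show directly whether both outcomes are present.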
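
The reported AIC can be reproduced by hand from the summary output. For a 0/1 response the residual deviance equals minus twice the log-likelihood, so AIC is the residual deviance plus two times the number of estimated coefficients. A short worked calculation with the values printed above:

```r
# Values copied from the summary output
residual_deviance <- 7.9244e-07
n_coefficients <- 3   # intercept + Striker_Batting_Position + Bowler_Wicket

# AIC = -2 * logLik + 2 * k, and -2 * logLik equals the deviance for 0/1 data
aic_by_hand <- residual_deviance + 2 * n_coefficients
round(aic_by_hand, 3)
```

The value is driven entirely by the parameter count, since the deviance term is numerically zero.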

Transformation of an explanatory variable

# Logarithmic transformation of Runs_Scored.
# log1p(x) computes log(1 + x), which stays finite when Runs_Scored is 0
# (plain log(0) is -Inf)
my_data$Log_Runs_Scored <- log1p(my_data$Runs_Scored)

# Create a scatter plot
plot(my_data$Log_Runs_Scored, my_data$Striker_Batting_Position, 
     xlab = "log(1 + Runs_Scored)", ylab = "Striker_Batting_Position",
     main = "Scatter Plot of log(1 + Runs_Scored) vs. Striker_Batting_Position")

The code above applies a logarithmic transformation to “Runs_Scored” and then creates a scatter plot of the transformed variable “Log_Runs_Scored” against “Striker_Batting_Position.” The plot makes it possible to visualize any relationship between these variables after the transformation.
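
Run counts can be zero, and the logarithm of zero is not finite, so it helps to see how log() and log1p() behave on such values. The numbers below are made up for illustration and are not taken from the dataset.

```r
# Made-up run values, including a zero
runs <- c(0, 1, 2, 4, 6)

plain_log <- log(runs)     # first element is -Inf
shifted_log <- log1p(runs) # log(1 + runs), finite for every value

plain_log
shifted_log

# Count how many values a plain log would turn into non-finite numbers
sum(!is.finite(plain_log))
```

Non-finite values are left out of a scatter plot, so a plain log can hide every zero-run observation.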