library(ggplot2)   # attached explicitly instead of relying on ggrepel to load it
library(ggthemes)
library(ggrepel)
library(AmesHousing)
library(boot)
library(broom)
library(lindia)
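# Postseason pitching records; the absolute path is specific to the author's machine
Data_set <- "/Users/ba/Documents/IUPUI/Masters/First Sem/Statistics/Dataset/PitchingPost.csv"
Pitching_Data <- read.csv(Data_set)
# Binary outcome: 1 if the record comes from the World Series round ("WS"), 0 otherwise
Pitching_Data$played_in_WS <- ifelse(Pitching_Data$round == "WS", 1, 0)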
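# Outcome vs. outs pitched: points are jittered vertically so the 0/1 values do not
# overplot, and the smoother is the logistic (binomial GLM) curve rather than a straight line
Pitching_Data |>
  ggplot(aes(x = IPouts, y = played_in_WS)) +
  geom_jitter(width = 0, height = 0.1, shape = 'O', size = 3) +
  geom_smooth(method = "glm", method.args = list(family = binomial), se = FALSE) +
  theme_economist()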
## `geom_smooth()` using formula = 'y ~ x'

model <- glm(played_in_WS ~ IPouts, family = binomial, data = Pitching_Data)
model
## 
## Call:  glm(formula = played_in_WS ~ IPouts, family = binomial, data = Pitching_Data)
## 
## Coefficients:
## (Intercept)       IPouts  
##    -2.22412      0.02834  
## 
## Degrees of Freedom: 3749 Total (i.e. Null);  3748 Residual
## Null Deviance:       2953 
## Residual Deviance: 2915  AIC: 2919

Interpretation:

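Since logistic regression models the log-odds, the effect of IPouts on the probability itself is non-linear. The intercept (-2.22412) is the estimated log-odds that a pitching record comes from the World Series when IPouts = 0. The slope (0.02834) means each additional out pitched raises the log-odds by 0.02834, which multiplies the odds by exp(0.02834) ≈ 1.029, an increase of roughly 2.9% per out. How much one extra out changes the probability depends on the current value of IPouts: the change is largest where the predicted probability is near 0.5 and smaller near 0 or 1.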
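To make the non-linearity concrete, the sketch below plugs the printed coefficients into the inverse-logit function. It does not refit anything; the coefficients are copied (already rounded) from the output above, and the IPouts values 0, 27, 54 and 81 (0, 9, 18 and 27 innings) are arbitrary illustration points, not values taken from the data.

```r
# Coefficients copied from the printed glm() output above (rounded as printed)
b0 <- -2.22412  # intercept: log-odds when IPouts = 0
b1 <- 0.02834   # change in log-odds per additional out pitched

# Multiplicative change in the odds for one extra out
odds_ratio <- exp(b1)

# Illustrative IPouts values (0, 9, 18 and 27 innings); chosen arbitrarily
ipouts   <- c(0, 27, 54, 81)
log_odds <- b0 + b1 * ipouts      # linear on the log-odds scale
prob     <- plogis(log_odds)      # inverse logit: 1 / (1 + exp(-log_odds))

# Probability change from one extra out, which varies with the starting point
change_per_out <- plogis(b0 + b1 * (ipouts + 1)) - prob

print(round(odds_ratio, 4))
print(data.frame(IPouts = ipouts,
                 log_odds = round(log_odds, 4),
                 probability = round(prob, 4),
                 change_per_out = round(change_per_out, 5)))
```

The log-odds rise by the same amount at every step, but `change_per_out` grows as the probability moves toward 0.5. With the fitted model available, `predict(model, newdata = data.frame(IPouts = c(0, 27, 54, 81)), type = "response")` should return the same probabilities without rounding the coefficients.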

ci <- confint(model, "IPouts", level = 0.95)
## Waiting for profiling to be done...
ci
##      2.5 %     97.5 % 
## 0.01949196 0.03703962

Interpretation:

This confidence interval indicates that we are 95% confident that the true coefficient for IPouts lies between approximately 0.01949 and 0.03704. Because the interval excludes zero, the relationship between IPouts (innings pitched, measured in outs) and the likelihood that a pitching record comes from the World Series is statistically significant at the 5% level. The entire interval is positive, consistent with the point estimate of 0.02834: more outs pitched are associated with a higher likelihood that the record is from the World Series.
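The printed deviances give a complementary check of significance. A minimal sketch of the likelihood-ratio (deviance) test follows, using the values from the `glm()` output. Those deviances are rounded to whole numbers, so the statistic and p-value here are approximate.

```r
# Deviances and degrees of freedom copied from the printed glm() output (rounded)
null_dev  <- 2953
resid_dev <- 2915
df_diff   <- 3749 - 3748   # one extra parameter: the IPouts slope

# Likelihood-ratio (deviance) test of H0: coefficient on IPouts = 0
lr_stat <- null_dev - resid_dev
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# Consistency check: for a 0/1 response, AIC = residual deviance + 2 * (number of parameters)
aic <- resid_dev + 2 * 2

cat("LR statistic:", lr_stat, "on", df_diff, "df; p =", signif(p_value, 3), "\n")
cat("Implied AIC:", aic, "\n")
```

The implied AIC matches the printed value of 2919. On the fitted model, `anova(model, test = "Chisq")` runs the same test using the unrounded deviances.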

The coefficient is the change in the log-odds that a pitching record comes from the World Series for each additional out pitched. The interval therefore suggests that each additional out raises these log-odds by an amount between approximately 0.0195 and 0.0370.
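Exponentiating the interval endpoints turns it into a confidence interval for the odds ratio, which is often easier to read than log-odds. Because `exp()` is monotone, the transformed endpoints still bound a 95% interval. The sketch copies the bounds from the `confint()` output above. The 27-out (nine-inning) scaling is only an illustration.

```r
# Profile-likelihood bounds for the IPouts coefficient, copied from confint() above
ci_log_odds <- c(lower = 0.01949196, upper = 0.03703962)

# Odds ratio per additional out, and the implied percentage change in the odds
ci_odds_ratio <- exp(ci_log_odds)
pct_change    <- 100 * (ci_odds_ratio - 1)

# Odds ratio for 27 additional outs (nine innings): exp(27 * coefficient)
ci_odds_ratio_27 <- exp(27 * ci_log_odds)

print(round(ci_odds_ratio, 4))
print(round(pct_change, 2))
print(round(ci_odds_ratio_27, 3))
```

Read this way, each additional out multiplies the odds by roughly 1.020 to 1.038, a 2% to 3.8% increase. With the fitted model available, `exp(confint(model, "IPouts"))` produces the per-out interval directly.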