library(ggthemes)
library(ggrepel)
## Loading required package: ggplot2
library(AmesHousing)
library(boot)
library(broom)
library(lindia)
Data_set <- "/Users/ba/Documents/IUPUI/Masters/First Sem/Statistics/Dataset/PitchingPost.csv"
Pitching_Data <- read.csv(Data_set)
Pitching_Data$played_in_WS <- ifelse(Pitching_Data$round == "WS", 1, 0)
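# Plot the World Series indicator against outs pitched
Pitching_Data |>
  ggplot(aes(x = IPouts, y = played_in_WS)) +
  # jitter vertically so overlapping 0/1 observations stay visible
  geom_jitter(width = 0, height = 0.1, shape = 'O', size = 3) +
  geom_point() +
  # straight-line (lm) fit through the 0/1 outcome
  geom_smooth(method = "lm", se = FALSE) +
  theme_economist()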
## `geom_smooth()` using formula = 'y ~ x'
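Because played_in_WS only takes the values 0 and 1, a straight-line smoother can drift outside the [0, 1] range. One alternative is to let geom_smooth() fit a binomial GLM instead. The sketch below is only an illustration: sim_data and its simulated values are hypothetical stand-ins for Pitching_Data, not the contents of PitchingPost.csv.

```r
library(ggplot2)

# Simulated stand-in for Pitching_Data (hypothetical values, not the real file)
set.seed(42)
sim_data <- data.frame(IPouts = rpois(500, lambda = 10))
sim_data$played_in_WS <- rbinom(500, size = 1,
                                prob = plogis(-2.2 + 0.03 * sim_data$IPouts))

# Same layout as the plot above, but with a logistic (binomial GLM) smoother
logit_plot <- ggplot(sim_data, aes(x = IPouts, y = played_in_WS)) +
  geom_jitter(width = 0, height = 0.1, shape = "O", size = 3) +
  geom_smooth(method = "glm",
              method.args = list(family = binomial),
              se = FALSE)
logit_plot
```

The fitted curve stays between 0 and 1 by construction, which matches the model fitted with glm() later in this analysis.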
model <- glm(played_in_WS ~ IPouts, family = binomial, data = Pitching_Data)
model
##
## Call: glm(formula = played_in_WS ~ IPouts, family = binomial, data = Pitching_Data)
##
## Coefficients:
## (Intercept) IPouts
## -2.22412 0.02834
##
## Degrees of Freedom: 3749 Total (i.e. Null); 3748 Residual
## Null Deviance: 2953
## Residual Deviance: 2915 AIC: 2919
Interpretation:
Logistic regression models the log-odds of the outcome, so the coefficients are linear on the log-odds scale and their effect on the probability is non-linear.
The coefficient for IPouts is positive (0.02834), so the estimated probability of having played in the World Series rises with IPouts. The size of that rise for one extra out is not constant, however: it depends on the value of IPouts you start from.
A summary that does not depend on the starting value is the odds ratio, exp(0.02834) ≈ 1.0287. Each additional out pitched multiplies the odds of having played in the World Series by about 1.0287, an increase of approximately 2.87%.
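The arithmetic behind this interpretation can be checked in a few lines. This is a minimal sketch that copies the two coefficients from the glm() output above rather than refitting the model; the grid of IPouts values is an arbitrary choice for illustration.

```r
# Coefficients copied from the glm() output above
b0 <- -2.22412   # intercept
b1 <- 0.02834    # slope for IPouts

# Odds ratio for one additional out
odds_ratio <- exp(b1)
odds_ratio

# Predicted probability at a few illustrative IPouts values (arbitrary grid)
ipouts_grid <- c(0, 10, 20, 30)
prob <- plogis(b0 + b1 * ipouts_grid)

data.frame(IPouts = ipouts_grid,
           probability = round(prob, 4),
           odds = round(prob / (1 - prob), 4))

# Change in probability between consecutive grid points:
# it is not constant, even though the odds ratio per out is
diff(prob)
```

Over this range each 10-out step adds a little more probability than the step before it, while the odds grow by the same factor every time.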
ci <- confint(model, "IPouts", level = 0.95)
## Waiting for profiling to be done...
ci
## 2.5 % 97.5 %
## 0.01949196 0.03703962
Interpretation:
We are 95% confident that the true coefficient for IPouts lies between approximately 0.0195 and 0.0370 on the log-odds scale. Because the interval excludes zero, the association between IPouts (outs pitched, i.e. innings pitched times three) and the likelihood of having played in the World Series is statistically significant at the 5% level. This agrees with the positive point estimate of 0.02834: more outs pitched are associated with a higher likelihood of having played in the World Series.
The coefficient is the change in the log-odds of having played in the World Series for each additional out pitched, so the interval implies that each additional out raises the log-odds by between approximately 0.0195 and 0.0370.
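Log-odds are hard to read directly, so it may help to move the interval onto the odds-ratio scale by exponentiating both limits. The sketch below copies the limits from the confint() output above instead of refitting the model.

```r
# Limits for the IPouts coefficient, copied from the confint() output above
ci_log_odds <- c(lower = 0.01949196, upper = 0.03703962)

# Exponentiate to get the interval on the odds-ratio scale
ci_odds_ratio <- exp(ci_log_odds)
ci_odds_ratio

# Percent change in the odds per additional out
(ci_odds_ratio - 1) * 100
```

This gives an odds ratio between roughly 1.020 and 1.038, that is, each additional out raises the odds by about 2.0% to 3.8%. With the fitted object still in scope, exp(confint(model, "IPouts")) should return the same interval directly.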