Contents:

1. ANOVA - comparing Earned Runs of teams (Boston Red Sox, Cleveland Guardians, New York Yankees)

2. Linear Regression Model - Wins (Dependent Variable) and Earned Runs (Independent Variable). W~ER

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggthemes)
library(ggrepel)
library(effsize)
library(pwrss)
## 
## Attaching package: 'pwrss'
## The following object is masked from 'package:stats':
## 
##     power.t.test
Data_set <- "/Users/ba/Documents/IUPUI/Masters/First Sem/Statistics/Dataset/PitchingPost.csv"
Pitching_Data <- read.csv(Data_set)
Regression_Data <-
  Pitching_Data |>
  filter(is.finite(ERA),
         is.finite(BAOpp))

ANOVA: ER ~ teamID

Regression_Data |>
  # Keep only the three teams being compared
  filter(teamID %in% c("BOS", "CLE", "NYA")) |>
  ggplot(aes(x = teamID, y = ER, fill = teamID, color = teamID)) +
  geom_boxplot() +
  theme_economist()

ANOVA_test <- aov(ER~teamID,data=Regression_Data)
summary(ANOVA_test)
##               Df Sum Sq Mean Sq F value Pr(>F)  
## teamID        31    163   5.251   1.342 0.0982 .
## Residuals   3670  14365   3.914                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation: The null hypothesis is that mean earned runs (ER) are equal across teams. The model was fit on the full Regression_Data, so the test compares all 32 teams in the data (Df = 31), not only the three teams shown in the boxplot. The p-value is 0.0982, which is greater than the commonly used significance level of 0.05, so we fail to reject the null hypothesis at that level. The significance codes printed under the table flag how small the p-value is: the more asterisks, the lower the p-value. Here the teamID row carries only a '.', which marks a p-value between 0.05 and 0.1, so the result is not statistically significant at the conventional levels (0.05, 0.01, 0.001).
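To test only the three teams named in the contents, the data would need to be subset before calling aov(). The sketch below shows one way to do that. It uses a small made-up data frame (`toy`, with invented ER values and an extra team "SEA") purely so that it runs on its own; it is not the PitchingPost data and its output says nothing about the real teams.

```r
# Hypothetical toy data standing in for Regression_Data (all values are made up).
toy <- data.frame(
  teamID = c("BOS", "BOS", "BOS", "CLE", "CLE", "CLE",
             "NYA", "NYA", "NYA", "SEA", "SEA", "SEA"),
  ER     = c(2, 0, 3, 1, 4, 2, 0, 1, 3, 5, 2, 1)
)

teams_of_interest <- c("BOS", "CLE", "NYA")

# Keep only the three teams, then store teamID as a factor with exactly
# those three levels so aov() compares three groups.
three_teams <- subset(toy, teamID %in% teams_of_interest)
three_teams$teamID <- droplevels(factor(three_teams$teamID))

# One-way ANOVA on the subset; with three groups the teamID term has Df = 2.
anova_three <- aov(ER ~ teamID, data = three_teams)
anova_table <- summary(anova_three)[[1]]
print(anova_table)
```

Applied to the real data, the same pattern would replace `toy` with Regression_Data; the resulting F statistic and p-value would differ from the 32-team table above.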

Linear Regression: W (Wins) ~ ER (Earned Runs)

model <- lm(W~ER, data=Regression_Data)
model
## 
## Call:
## lm(formula = W ~ ER, data = Regression_Data)
## 
## Coefficients:
## (Intercept)           ER  
##    0.208688     0.005726

Interpretation:

  • Intercept (0.208688): This is the estimated value of W when the explanatory variable ER is zero. An observation with no earned runs has a predicted W of approximately 0.208688.

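  • ER (0.005726): This coefficient is the estimated change in the response variable W for a one-unit increase in the explanatory variable ER. Because ER is the only predictor in this model, there are no other variables to hold constant. For each additional earned run, the estimated value of W increases by approximately 0.005726. This printout shows only the point estimates; it does not show whether the slope differs significantly from zero.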

In simpler terms, the intercept is the baseline value of W at ER = 0, and the coefficient for ER is how much W is expected to change for each one-unit increase in ER. The fitted line is W = 0.208688 + 0.005726 × ER.
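The interpretation above can be checked with a few lines of arithmetic. The sketch below copies the two coefficients from the printed lm() output and evaluates the fitted line at some arbitrary ER values; `predict_wins` and `er_values` are names introduced here for illustration only.

```r
# Coefficients copied from the printed lm() output above.
intercept <- 0.208688
slope_er  <- 0.005726

# Fitted line: W_hat = intercept + slope_er * ER
predict_wins <- function(er) intercept + slope_er * er

# Illustrative ER values (chosen arbitrarily, not taken from the data).
er_values <- c(0, 1, 5, 10)
fitted_w  <- predict_wins(er_values)
print(data.frame(ER = er_values, W_hat = fitted_w))

# Raising ER by one unit changes the prediction by exactly the slope.
one_unit_change <- predict_wins(1) - predict_wins(0)
print(one_unit_change)
```

With the fitted model object available, predict(model, newdata = data.frame(ER = er_values)) should give the same values up to the rounding of the printed coefficients.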