Interpretation: The p-value is 0.0982, which is greater than the commonly used significance level of 0.05. Therefore, we fail to reject the null hypothesis of equal mean earned runs across teams at the 0.05 significance level. In R's ANOVA output, asterisks next to the p-value indicate its level of significance: the more asterisks, the lower the p-value and the stronger the evidence against the null hypothesis. In this case there are no asterisks, indicating that the result is not statistically significant at the conventional levels (0.05, 0.01, 0.001).
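
The decision rule and R's significance markers can be sketched as follows. This is a minimal illustration, not part of the original analysis: the helper name `sig_marker` is hypothetical, and the cutpoints follow the legend R prints under its ANOVA and regression tables (`***` 0.001, `**` 0.01, `*` 0.05, `.` 0.1).

```r
# Hypothetical helper: map a p-value to the marker R prints next to it.
sig_marker <- function(p) {
  cutpoints <- c(0.001, 0.01, 0.05, 0.1)
  symbols   <- c("***", "**", "*", ".", "")
  # left.open = TRUE gives intervals of the form (lower, upper], as in R's legend
  symbols[findInterval(p, cutpoints, left.open = TRUE) + 1]
}

p_value <- 0.0982   # p-value reported in the ANOVA output
alpha   <- 0.05     # conventional significance level

decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
marker   <- sig_marker(p_value)

cat("decision:", decision, "\n")
cat("marker:", marker, "\n")
```

With p = 0.0982 the helper returns `.`, the marker R uses for p-values between 0.05 and 0.1, which is why no asterisk appears next to the result.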

Explanation on Continuous and Categorical Variables:

‘ER’ (earned runs) is a quantitative measure: the number of runs a pitcher allows that are not the result of errors or passed balls. Although it is strictly a count, we treat ‘ER’ as a continuous variable. It provides a direct metric of a pitcher’s performance that can be influenced by various factors, including team strategy, opposing team strength, and in-game circumstances. Treating ‘ER’ as continuous allows us to use regression techniques to estimate and understand variation in pitcher performance across different teams.

The objective was to quantify the impact of playing for different teams on the earned runs a pitcher is likely to allow. This could uncover insights such as whether certain teams are associated with better or worse pitching performances due to factors like defensive support, ballpark factors, or team management strategies.

‘TeamID’ classifies the data into groups based on the team for which each pitcher played. This variable is categorical because each team is distinct and represents a unique grouping within the dataset. By using ‘TeamID’ as a categorical variable, we can perform ANOVA to compare the mean earned runs across different teams, providing insights into team-level differences and effects.

The use of ‘TeamID’ as an explanatory variable in ANOVA allows us to assess whether there are statistically significant differences in pitching outcomes (i.e., earned runs allowed) among teams. This can help identify if certain teams consistently foster better or poorer pitching performances, which could be useful for team management and strategic planning.
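
As a rough sketch of the setup described above, the following fits a one-way ANOVA with a categorical team variable and a quantitative response. The data frame `toy_data`, the team labels, and the simulated values are all hypothetical stand-ins, not the report's data; only the modelling calls (`factor`, `tapply`, `aov`, `summary`) are standard R.

```r
# Hypothetical data: simulated earned runs for three teams (not the report's data).
set.seed(42)
toy_data <- data.frame(
  TeamID = rep(c("BOS", "CLE", "NYA"), each = 25),
  ER     = rpois(75, lambda = 30)
)

# Treat TeamID as categorical so aov() compares one mean per team.
toy_data$TeamID <- factor(toy_data$TeamID)

# Group means that the ANOVA compares.
team_means <- tapply(toy_data$ER, toy_data$TeamID, mean)

# One-way ANOVA: does mean ER differ across teams?
fit <- aov(ER ~ TeamID, data = toy_data)

# summary() returns a list; its first element is the ANOVA table.
anova_table <- summary(fit)[[1]]
f_stat  <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

print(team_means)
print(anova_table)
```

Converting `TeamID` with `factor()` is what makes R treat it as a grouping variable; with three teams the table has 2 degrees of freedom for `TeamID`.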

Linear Regression: W (Wins) ~ ER (Earned Runs)

# Fit a simple linear regression of wins (W) on earned runs (ER)
model <- lm(W ~ ER, data = Regression_Data)
model

## 
## Call:
## lm(formula = W ~ ER, data = Regression_Data)
## 
## Coefficients:
## (Intercept)           ER  
##    0.208688     0.005726

Interpretation:

Intercept (0.208688): This is the estimated value of W when the explanatory variable ER is zero. In other words, when ER is zero, the model predicts approximately 0.21 wins.
ER (0.005726): This coefficient is the estimated change in the response variable W for a one-unit increase in the explanatory variable ER. For each additional earned run, the predicted value of W increases by approximately 0.0057. Because ER is the only explanatory variable in this model, no other variables are being held constant. The coefficient describes an association, not a causal effect.

In simpler terms, the intercept is the baseline value of W when ER is zero, and the coefficient for ER is how much W is expected to change for each additional earned run.
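
To make the interpretation concrete, the fitted line W = 0.208688 + 0.005726 × ER can be evaluated directly. The sketch below uses only the two coefficients reported in the lm() output above; the function name `predict_wins` and the ER values are hypothetical, chosen for illustration.

```r
# Coefficients copied from the lm() output above.
intercept <- 0.208688
slope_er  <- 0.005726

# Fitted line: predicted wins for a given number of earned runs.
predict_wins <- function(er) intercept + slope_er * er

er_values <- c(0, 10, 50, 100)   # hypothetical ER values for illustration
predicted <- predict_wins(er_values)

print(data.frame(ER = er_values, predicted_W = predicted))
```

For example, at ER = 50 the predicted value is 0.208688 + 0.005726 × 50 = 0.494988. With the fitted object available, `predict(model, newdata = data.frame(ER = er_values))` should give the same values up to the rounding of the printed coefficients.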

Week - 8

2024-03-11

Contents:

1. ANOVA - comparing Earned Runs of teams (Boston Red Sox, Cleveland Guardians, New York Yankees)

2. Linear Regression Model - Wins (dependent variable) regressed on Earned Runs (independent variable): W ~ ER
