Final Project: Analysis of Pitcher Fatigue Over Seasons

Description: This project investigates whether pitchers who appear in more games in a season show signs of fatigue over the course of their careers. Using a dataset of post-season pitching performances, I focus on three performance metrics: strikeouts, walks, and earned runs. The objectives are (1) to quantify changes in these metrics that might indicate fatigue, and (2) to identify the relationship between the number of games played each year and those changes.

The performance metrics this project focuses on:

  1. Strikeouts: This metric typically indicates a pitcher’s dominance. A decreasing trend could suggest a decline in a pitcher’s ability to overpower hitters, which might be due to fatigue, injury, or loss of skill.
  2. Walks: An increase in walks can often be a sign of a pitcher losing control or command, which could be related to fatigue, especially if it correlates with an increase in games played or innings pitched.
  3. Earned Runs: Earned runs are a critical metric for pitchers. An upward trend in earned runs per game over time could imply that the pitcher is becoming less effective, which could be due to fatigue or decline in skill.
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(readr)
pitching_data <- read_csv("/Users/ba/Documents/IUPUI/Masters/First Sem/Statistics/Dataset/PitchingPost.csv")
## Rows: 3750 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): playerID, round, teamID, lgID
## dbl (26): yearID, W, L, G, GS, CG, SHO, SV, IPouts, H, ER, HR, BB, SO, BAOpp...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pitching_summary <- pitching_data %>%
  group_by(playerID, yearID) %>%
  summarize(
    Total_Games = sum(G),
    Total_SO = sum(SO),
    Total_BB = sum(BB),
    Total_ER = sum(ER),
    SO_per_Game = Total_SO / Total_Games,
    BB_per_Game = Total_BB / Total_Games,
    ER_per_Game = Total_ER / Total_Games,
    .groups = 'drop'
  )
head(pitching_summary)
## # A tibble: 6 × 9
##   playerID yearID Total_Games Total_SO Total_BB Total_ER SO_per_Game BB_per_Game
##   <chr>     <dbl>       <dbl>    <dbl>    <dbl>    <dbl>       <dbl>       <dbl>
## 1 abadfe01   2014           1        0        0        0         0          0   
## 2 abbotpa…   2000           2        4        6        4         2          3   
## 3 abbotpa…   2001           2        5       13        8         2.5        6.5 
## 4 abreubr…   2019           1        0        2        2         0          2   
## 5 abreubr…   2022          10       19        4        0         1.9        0.4 
## 6 aceveal…   2009           4        2        3        2         0.5        0.75
## # ℹ 1 more variable: ER_per_Game <dbl>

We have grouped the data by ‘playerID’ and ‘yearID’ so that each row summarizes one player in one year. For each player and year, we compute the following:

  1. The total number of games played by the pitcher in each year. (Total_Games)
  2. The total number of strikeouts made by the pitcher in each year. (Total_SO)
  3. The total number of walks issued by the pitcher in each year. (Total_BB)
  4. The total number of earned runs allowed by the pitcher in each year. (Total_ER)
  5. The average number of strikeouts per game. (SO_per_Game)
  6. The average number of walks per game. (BB_per_Game)
  7. The average number of earned runs per game. (ER_per_Game)
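
To make the aggregation concrete, here is a minimal sketch on made-up rows. The player IDs, rounds, and counts are hypothetical and are not taken from PitchingPost.csv, and the sketch uses base R's aggregate() instead of dplyr only so that it runs without extra packages. It shows that each per-game rate is a ratio of yearly totals, which is not the same as averaging the per-round rates.

```r
# Hypothetical postseason rows: one row per player, year, and round.
toy <- data.frame(
  playerID = c("aaa01", "aaa01", "bbb01"),
  yearID   = c(2001, 2001, 2001),
  round    = c("ALDS", "ALCS", "ALDS"),
  G        = c(1, 3, 2),
  SO       = c(3, 3, 4)
)

# Sum games and strikeouts over rounds within each player-year.
toy_summary <- aggregate(cbind(G, SO) ~ playerID + yearID, data = toy, FUN = sum)
names(toy_summary)[names(toy_summary) == "G"]  <- "Total_Games"
names(toy_summary)[names(toy_summary) == "SO"] <- "Total_SO"

# Ratio of totals, matching the definition of SO_per_Game used above.
toy_summary$SO_per_Game <- toy_summary$Total_SO / toy_summary$Total_Games

# For comparison: the unweighted mean of the per-round rates for player aaa01.
is_a <- toy$playerID == "aaa01"
mean_of_round_rates <- mean(toy$SO[is_a] / toy$G[is_a])

print(toy_summary)
print(mean_of_round_rates)
```

For the made-up player aaa01, the ratio of totals is 6 / 4 = 1.5 strikeouts per game, while the mean of the two per-round rates is 2. The summary above uses the first definition, which weights each round by the number of games played in it.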
career_games <- pitching_summary %>%
  group_by(playerID) %>%
  summarize(Total_Career_Games = sum(Total_Games)) %>%
  arrange(desc(Total_Career_Games)) %>%
  slice_head(n = 3)  

print(career_games)
## # A tibble: 3 × 2
##   playerID  Total_Career_Games
##   <chr>                  <dbl>
## 1 riverma01                 65
## 2 janseke01                 59
## 3 madsory01                 57
# Keep only the three pitchers with the most career games (taken from career_games above)
selected_players_data <- pitching_summary %>%
  filter(playerID %in% career_games$playerID)
# linewidth replaces size, which is deprecated for lines since ggplot2 3.4.0
ggplot(selected_players_data, aes(x = yearID)) +
  geom_line(aes(y = SO_per_Game, color = "Strikeouts"), linewidth = 1.5) +
  geom_line(aes(y = BB_per_Game, color = "Walks"), linewidth = 1.5) +
  geom_line(aes(y = ER_per_Game, color = "Earned Runs"), linewidth = 1.5) +
  facet_wrap(~ playerID, scales = "free_y") +
  labs(title = "Annual Performance Metrics for Selected Players",
       subtitle = "Trend analysis of Strikeouts, Walks, and Earned Runs per Game",
       x = "Year",
       y = "Performance per Game",
       color = "Metric Type") +
  theme_minimal() +
  theme(legend.position = "bottom")

  1. Player “janseke01”:
    • There are noticeable spikes in the earned runs per game, which may suggest a temporary decline in performance or could also indicate an injury or other temporary issues during that particular season.

    • There also appears to be a decrease in walks per game, indicating improved control or recovery from any previous issues.

    • Strikeout rates for this player are quite volatile. If the spikes in strikeouts coincide with the spikes in earned runs, one possible explanation is that pitching more aggressively for strikeouts also led to more hits and, in turn, more earned runs.

  2. Player “madsory01”:
    • This player exhibits a sharp increase in earned runs per game, which could be a sign of fatigue, injury, or a decline in performance due to other factors.

    • This player also shows a significant drop in strikeouts per game in the same period, reinforcing the possibility of a decline in performance or an injury.

  3. Player “riverma01”:
    • This player shows a peak in strikeouts followed by a sharp increase in both walks and earned runs per game. This might be because of a period of overexertion or potential injury that affected the performance.

    • Following this peak, there’s a notable decline in all performance metrics, which might indicate a phase of recovery or adjustment in pitching style to compensate for any physical limitations.
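
Reading spikes and drops off a plot is subjective. One way to put numbers on them is to compute the change in a rate from a player's previous appearance. The sketch below does this on made-up values (the player IDs, years, and rates are hypothetical) using base R only; it is an illustration of the idea, not part of the analysis above.

```r
# Hypothetical per-year rates for two made-up players (not real data).
toy <- data.frame(
  playerID    = c("aaa01", "aaa01", "aaa01", "bbb01", "bbb01"),
  yearID      = c(2001, 2002, 2004, 2010, 2011),
  ER_per_Game = c(0.50, 0.25, 1.00, 0.40, 0.10)
)

# Order by player and year so that differences run forward in time.
toy <- toy[order(toy$playerID, toy$yearID), ]

# Change from the player's previous appearance in the data (NA for the first one).
toy$ER_change <- ave(toy$ER_per_Game, toy$playerID,
                     FUN = function(v) c(NA, diff(v)))

print(toy)
```

Because pitchers do not reach the postseason every year, consecutive rows can be more than one year apart, as with 2002 and 2004 in the toy data. Any interpretation of a change should take that gap into account.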

reshaped_data <- selected_players_data %>%
  select(playerID, yearID, Total_Games, Total_SO, Total_BB, Total_ER) %>%
  pivot_longer(cols = -c(playerID, yearID), names_to = "Metric", values_to = "Value")

ggplot(reshaped_data, aes(x = yearID, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  facet_wrap(~ playerID, scales = "free_x") +
  labs(title = "Total Games, Strikeouts, Walks, and Earned Runs per Year",
       x = "Year",
       y = "Total Count",
       fill = "Metric") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1), 
        legend.position = "bottom")

  1. Player “janseke01”:
    • There are years with high games played. However, this doesn’t always correspond to an increase in strikeouts.

    • Strikeouts peaked and then fluctuated. This could reflect changes in performance or playing style, or it could be due to other factors such as injuries or changes in the team’s pitching strategy.

    • Walks and earned runs seem to be relatively lower than strikeouts, suggesting better control and pitching effectiveness.

  2. Player “madsory01”:
    • The number of games played peaks, which might indicate heavy workload.

    • Earned runs and strikeouts appear to peak around the same period, which might suggest an aggressive pitching style that results in both high strikeouts and high earned runs, or it could be indicating a particularly challenging season.

    • The variability in the number of walks and earned runs over different years suggests fluctuating control and effectiveness.

  3. Player “riverma01”:
    • Walks and earned runs do not show a consistent trend with games played, but there are spikes that could be associated with specific seasons or conditions.

INFERENCE

For all three players, there doesn’t seem to be a consistent trend where an increase in games played corresponds to an increase in the other metrics. This suggests that the number of games is not the only factor affecting performance. Peaks in strikeouts and earned runs do not always align, indicating that a good year for strikeouts is not necessarily a bad year for earned runs, and vice versa.

  • To check this, let’s analyze the correlation between the number of games and the performance metrics.
correlation_results <- selected_players_data %>%
  select(Total_Games, SO_per_Game, BB_per_Game, ER_per_Game) %>%
  cor()

knitr::kable(correlation_results, caption = "Correlation Matrix between Number of Games and Performance Metrics for Selected Players")
Correlation Matrix between Number of Games and Performance Metrics for Selected Players

              Total_Games  SO_per_Game  BB_per_Game  ER_per_Game
Total_Games     1.0000000    0.2029000    0.1593766    0.2666918
SO_per_Game     0.2029000    1.0000000    0.3832299    0.3038527
BB_per_Game     0.1593766    0.3832299    1.0000000    0.2716628
ER_per_Game     0.2666918    0.3038527    0.2716628    1.0000000

  1. Total_Games and SO_per_Game: There is a weak positive correlation between the total games played and strikeouts per game. This suggests that as the number of games increases, there is a slight tendency for strikeouts per game to increase, although this relationship is not strong.
  2. Total_Games and BB_per_Game: There is a very weak positive correlation between the total games played and walks per game. Playing more games has only a slight positive association with walks per game, too weak to support any conclusion on its own.
  3. Total_Games and ER_per_Game: There is a weak positive correlation between the total games played and earned runs per game. This could suggest that players who play more games might give up slightly more earned runs per game, which could be a sign of fatigue. However, the correlation is not strong, and further investigation would be needed.
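
The matrix reports only the coefficients, not how much evidence they carry. As a rough check, the sketch below converts each correlation with Total_Games into a t statistic and a two-sided p-value using t = r * sqrt(n - 2) / sqrt(1 - r^2). The sample size n = 28 is an assumption here, inferred from the 27 null degrees of freedom reported in the model summaries later in this report.

```r
# Correlations with Total_Games, copied from the matrix above.
r <- c(SO_per_Game = 0.2029000,
       BB_per_Game = 0.1593766,
       ER_per_Game = 0.2666918)

# Assumed number of player-seasons (see the note above; not stated in this section).
n <- 28

# t statistic and two-sided p-value for the null hypothesis of zero correlation.
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(cbind(r, t_stat, p_value), 3))
```

Under that assumption the t statistics come out at roughly 1.06, 0.82, and 1.41, and none of the p-values falls below 0.05, so "weak" is the most that can be said about these associations.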

After exploring the correlation between the total number of games played and the key performance metrics (strikeouts, walks, and earned runs per game), it became evident that, while relationships exist, they are weak and may be more complex than simple correlation coefficients reveal. So let’s fit Gaussian Generalized Linear Models to see if we can draw more insights.
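
A note on what these models are: glm() with family = gaussian() and its default identity link fits the same model as ordinary least squares, so each model in this section is a simple linear regression. The sketch below illustrates this on simulated data; x and y are hypothetical variables, not the pitching data.

```r
# Simulated data for illustration only; x and y are hypothetical variables.
set.seed(1)
x <- rpois(30, lambda = 5) + 1
y <- 1 + 0.03 * x + rnorm(30, sd = 0.6)

fit_glm <- glm(y ~ x, family = gaussian())
fit_lm  <- lm(y ~ x)

# Compare the fitted coefficients of the two models.
print(coef(fit_glm))
print(coef(fit_lm))
print(all.equal(coef(fit_glm), coef(fit_lm)))

# Compare the p-value of the slope in the two summaries.
p_glm <- summary(fit_glm)$coefficients["x", "Pr(>|t|)"]
p_lm  <- summary(fit_lm)$coefficients["x", "Pr(>|t|)"]
print(all.equal(p_glm, p_lm))
```

One consequence is that, with a single predictor, the slope test in each model carries the same information as the corresponding correlation coefficient.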

  • Generalized Linear Models
  1. Gaussian GLM for Strikeouts per Game
glm_SO_gaussian <- glm(SO_per_Game ~ Total_Games, family = gaussian(), data = selected_players_data)
summary(glm_SO_gaussian)
## 
## Call:
## glm(formula = SO_per_Game ~ Total_Games, family = gaussian(), 
##     data = selected_players_data)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.02219    0.22485   4.546 0.000111 ***
## Total_Games  0.03162    0.02992   1.057 0.300428    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.3679645)
## 
##     Null deviance: 9.9778  on 27  degrees of freedom
## Residual deviance: 9.5671  on 26  degrees of freedom
## AIC: 55.392
## 
## Number of Fisher Scoring iterations: 2

Interpretation of the Output:

  • Intercept (β₀): The intercept, estimated at 1.02219, is statistically significant (p < 0.001). It is the expected value of strikeouts per game (SO_per_Game) when the number of games played (Total_Games) is zero, about 1.02. This is an extrapolation rather than a meaningful quantity, because a pitcher with zero games has no per-game rate.

  • Coefficient for Total_Games (β₁): The coefficient for Total_Games is 0.03162 and is not statistically significant (p = 0.300428). The point estimate implies about 0.03 more strikeouts per game for each additional game played, but the data cannot distinguish this from no effect.

  • Dispersion Parameter: For the Gaussian family, the dispersion parameter is the estimated residual variance. Here it is about 0.368, which corresponds to a residual standard deviation of roughly 0.61 strikeouts per game around the predicted mean.

  • Deviance Information:

    • Null Deviance: Represents the goodness of fit of a model with just the intercept. Here, it’s 9.9778 on 27 degrees of freedom.

    • Residual Deviance: After including the predictor (Total_Games), the residual deviance is 9.5671 on 26 degrees of freedom. The slight reduction in deviance suggests that Total_Games provides little additional explanatory power over just using the mean.

    • AIC: The Akaike Information Criterion value is 55.392, which provides a measure for model comparison where lower values suggest a better model relative to alternatives.

  • Thoughts: The coefficient for Total_Games is not statistically significant, so the increase in strikeouts per game with more games played is not reliably different from zero in this dataset. Based on this model alone, we cannot conclude that there are clear signs of fatigue in the form of a decreased ability to strike out batters. The positive sign of the coefficient points, if anything, toward more strikeouts per game as pitchers play more games, which is the opposite of what fatigue would predict.
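
The dispersion and deviance figures in the summary above are linked by simple arithmetic, sketched below with the values copied from the output. For a Gaussian model the dispersion estimate is the residual deviance divided by the residual degrees of freedom, and the share of the null deviance removed by the predictor plays the role of R-squared.

```r
# Values copied from the summary(glm_SO_gaussian) output above.
null_deviance     <- 9.9778
residual_deviance <- 9.5671
residual_df       <- 26

# For a Gaussian model, dispersion = residual deviance / residual degrees of freedom.
dispersion <- residual_deviance / residual_df

# Share of the null deviance removed by adding Total_Games (R-squared for this model).
deviance_explained <- 1 - residual_deviance / null_deviance

print(round(dispersion, 4))
print(round(deviance_explained, 4))
```

The second figure is about 0.041, so Total_Games accounts for roughly 4% of the variation in strikeouts per game. This matches the square of the 0.2029 correlation reported earlier.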

  2. Gaussian GLM for Walks per Game
glm_BB_gaussian <- glm(BB_per_Game ~ Total_Games, family = gaussian(), data = selected_players_data)
summary(glm_BB_gaussian)
## 
## Call:
## glm(formula = BB_per_Game ~ Total_Games, family = gaussian(), 
##     data = selected_players_data)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.17529    0.09395   1.866   0.0734 .
## Total_Games  0.01029    0.01250   0.823   0.4179  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.06424807)
## 
##     Null deviance: 1.7140  on 27  degrees of freedom
## Residual deviance: 1.6704  on 26  degrees of freedom
## AIC: 6.5254
## 
## Number of Fisher Scoring iterations: 2

Interpretation of the Output:

  • Intercept (β₀): The intercept is estimated at 0.17529, indicating that when the total number of games played (Total_Games) is zero, the expected value of walks per game (BB_per_Game) is about 0.175. With a p-value of 0.0734, it is not statistically significant at the 0.05 level, although it would be at a 0.10 threshold.

  • Coefficient for Total_Games (β₁): The coefficient for Total_Games is 0.01029, which is not statistically significant (p = 0.4179). This suggests a very slight, and statistically uncertain, increase in the number of walks per game as the number of games increases.

  • Dispersion Parameter: The estimated dispersion parameter (the residual variance under the Gaussian family) is approximately 0.064, which corresponds to a residual standard deviation of about 0.25 walks per game. It is measured on the scale of this response, so it should not be compared directly with the dispersion of the strikeout model.

  • Deviance Information:

    • Null Deviance: 1.7140 on 27 degrees of freedom. This is the total variation in BB_per_Game around its mean, that is, the deviance of the intercept-only model.

    • Residual Deviance: 1.6704 on 26 degrees of freedom, showing a very slight improvement with the inclusion of Total_Games as a predictor.

    • AIC: 6.5254. AIC is useful only for comparing models fitted to the same response on the same data, where lower values are preferred because they balance fit against model complexity. It has no interpretation on its own.

  • Thoughts: While the coefficient for Total_Games is positive, suggesting an increase in walks as more games are played, the lack of statistical significance means this effect is uncertain in this dataset. The intercept describes only the extrapolated walk rate at zero games and says nothing about consistency across players. If fatigue were a significant factor, one might expect walks to increase as pitchers lose control over their pitches. However, the data does not support a strong conclusion here, as the relationship is not statistically significant.

  3. Gaussian GLM for Earned Runs per Game
glm_ER_gaussian <- glm(ER_per_Game ~ Total_Games, family = gaussian(), data = selected_players_data)
summary(glm_ER_gaussian)
## 
## Call:
## glm(formula = ER_per_Game ~ Total_Games, family = gaussian(), 
##     data = selected_players_data)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.118265   0.069371   1.705     0.10
## Total_Games 0.013026   0.009232   1.411     0.17
## 
## (Dispersion parameter for gaussian family taken to be 0.03502587)
## 
##     Null deviance: 0.98040  on 27  degrees of freedom
## Residual deviance: 0.91067  on 26  degrees of freedom
## AIC: -10.461
## 
## Number of Fisher Scoring iterations: 2

Interpretation of the Output:

  • Intercept (β₀): The intercept is estimated at 0.118265, suggesting that the expected value of earned runs per game (ER_per_Game) when no games have been played is approximately 0.118. This value is not statistically significant (p = 0.10), so the baseline level of earned runs is not significantly different from zero at the 5% significance level.

  • Coefficient for Total_Games (β₁): The coefficient for Total_Games is 0.013026, which indicates a slight increase in earned runs per game with each additional game played. This coefficient is also not statistically significant (p = 0.17), suggesting that the data does not provide strong evidence of a linear relationship between the number of games played and earned runs per game.

  • Dispersion Parameter: The estimated dispersion parameter (the residual variance under the Gaussian family) is approximately 0.035, which corresponds to a residual standard deviation of about 0.19 earned runs per game around the predicted mean.

  • Deviance Information:

    • Null Deviance: 0.98040 on 27 degrees of freedom. This is the total variation in ER_per_Game around its mean, that is, the deviance of the intercept-only model.

    • Residual Deviance: 0.91067 on 26 degrees of freedom, which shows only a slight improvement upon including Total_Games as a predictor.

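    • AIC: -10.461. A negative AIC is not unusual and carries no special meaning. The value matters only relative to other models for ER_per_Game fitted to the same data, and it cannot be compared with the AIC of the strikeout or walk models because the responses differ.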

  • Thoughts: The lack of statistical significance for the Total_Games coefficient means there is no strong evidence that earned runs per game increase as a pitcher appears in more games in a year, which is what one would expect if fatigue reduced pitching effectiveness. The analysis does not confirm this, possibly because the effect is small and the sample (28 player-seasons) offers too little statistical power to detect it.
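
The three slope estimates can also be read as intervals rather than as significant or not. The sketch below computes 95% t-based intervals by hand from the estimates and standard errors reported in the three summaries above. This is a manual calculation under the usual linear-model assumptions, not the output of a function called in the original analysis.

```r
# Slope estimates and standard errors copied from the three summaries above.
slopes <- data.frame(
  response  = c("SO_per_Game", "BB_per_Game", "ER_per_Game"),
  estimate  = c(0.03162, 0.01029, 0.013026),
  std_error = c(0.02992, 0.01250, 0.009232)
)

# 95% t-based interval using the 26 residual degrees of freedom reported above.
t_crit <- qt(0.975, df = 26)
slopes$lower <- slopes$estimate - t_crit * slopes$std_error
slopes$upper <- slopes$estimate + t_crit * slopes$std_error

print(slopes)
```

All three intervals include zero, which is the same conclusion as the p-values. They also show the range of slopes, both negative and positive, that the data cannot rule out.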

Finally, let’s plot the residuals to check for normality and homoscedasticity.

par(mfrow = c(2, 2))
plot(glm_SO_gaussian)

1. Residuals vs. Fitted Plot

  • Purpose: This plot helps check for non-linear patterns and equal variance of residuals.

  • Interpretation: The residuals are evenly scattered around the horizontal line at zero, suggesting no obvious patterns that would indicate non-linearity or issues with the model’s linearity assumption. The distribution of residuals also appears consistent across the range of fitted values, indicating that the residuals have equal variance (homoscedasticity).

2. Q-Q Plot of Residuals

  • Purpose: Assesses the normality of residuals.

  • Interpretation: The Q-Q plot displays the quantiles of residuals against the expected quantiles if they were normally distributed. The points largely adhere to the line, with minor deviations at the tails. This generally supports the assumption that residuals are normally distributed, which is crucial for the validity of regression analysis.

3. Scale-Location Plot

  • Purpose: Used to check if residuals are spread equally along the ranges of predictors, confirming homoscedasticity.

  • Interpretation: The plot does not show any patterns that would suggest changing variance in residuals across the range of predicted values. The flat trend line across the plot supports the assumption of homoscedasticity.

4. Residuals vs. Leverage Plot

  • Purpose: Identifies influential cases that might disproportionately influence the regression estimates.

  • Interpretation: No individual data points are showing high leverage or large residuals. This indicates that no single data point is disproportionately influencing the model’s predictions. Cook’s distance for all observations is well within acceptable limits, suggesting no concerns about influential outliers.

THOUGHTS: These diagnostic plots indicate that the Gaussian GLM for strikeouts per game reasonably meets the assumptions of linear regression. The residuals appear homoscedastic and approximately normal, and no influential outliers are apparent, which supports the reliability of the fit. The lack of a significant relationship between total games played and strikeouts per game is therefore unlikely to stem from a violated assumption, although with only 28 observations a small effect could still go undetected.
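
Visual checks like these can be complemented with numbers. The sketch below runs on simulated data (games and rate are hypothetical stand-ins, not the pitching data) and applies two functions from base R's stats package: shapiro.test() for the normality of the residuals and cooks.distance() for influential observations. It is a sketch of one possible follow-up, not something the analysis above ran.

```r
# Simulated data for illustration only; not the pitching data.
set.seed(42)
games <- sample(1:10, 28, replace = TRUE)
rate  <- 1 + 0.03 * games + rnorm(28, sd = 0.6)
fit   <- glm(rate ~ games, family = gaussian())

# Shapiro-Wilk test of normality on the residuals (a small p-value suggests non-normality).
sw <- shapiro.test(residuals(fit))

# Cook's distance per observation; large values flag influential points.
cd <- cooks.distance(fit)

print(sw$p.value)
print(max(cd))
```

With a sample this small, a normality test has little power, so a large p-value should be read as "no evidence against normality" rather than as confirmation of it.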

par(mfrow = c(2, 2))
plot(glm_BB_gaussian)

1. Residuals vs. Fitted Plot

  • Purpose: To check for any non-linear patterns, homoscedasticity, and outliers.

  • Interpretation: The residuals scatter around the zero line uniformly across the range of predicted values, indicating good homoscedasticity. There’s no apparent pattern suggesting any non-linearity or systematic bias in the model predictions. This plot shows that the linear model assumption of constant variance is appropriate for this dataset.

2. Q-Q Plot of Residuals

  • Purpose: To check the normality of residuals.

  • Interpretation: The points largely align along the theoretical line, suggesting that the residuals are normally distributed. There are minor deviations at the ends, but these are not significant enough to suggest any substantial departure from normality.

3. Scale-Location Plot

  • Purpose: To verify that residuals are spread equally along the ranges of predictors.

  • Interpretation: This plot shows a fairly constant spread across the range of fitted values, indicating that the variance of the residuals is consistent (homoscedasticity). There’s no apparent pattern that suggests changing variance, which supports the assumptions behind the regression model.

4. Residuals vs. Leverage Plot

  • Purpose: To identify influential cases that might disproportionately influence the model.

  • Interpretation: No points significantly stand out in terms of having high leverage or high residuals, which would have indicated influential points. All data points have Cook’s distances well below the common threshold of concern (e.g., 0.5), indicating that there are no unduly influential points in this analysis.

THOUGHTS: These diagnostic plots for the Gaussian GLM on walks per game suggest that the model’s assumptions are being met adequately. The residuals are well-behaved and do not show any signs of violating the assumptions necessary for a valid linear regression analysis:

  • The residuals are distributed normally.

  • The variance of the residuals is consistent across the range of predictions.

  • There is no evidence of influential outliers that could skew the model’s results.

This thorough examination supports the model’s reliability and the validity of its predictions. However, given that the model’s outputs did not reveal significant relationships between the total number of games played and walks per game, this suggests that other factors not included in the model may be influencing walks or that the effect of games on walks is inherently weak or non-existent in the dataset.

par(mfrow = c(2, 2))
plot(glm_ER_gaussian)

1. Residuals vs. Fitted Plot

  • Purpose: To check for non-linear patterns, homoscedasticity, and outliers.

  • Interpretation: The residuals are evenly distributed around the zero line, suggesting no obvious patterns that would indicate issues with the model’s linearity assumption. The scatter of residuals appears uniform across the range of predicted values, supporting the assumption of homoscedasticity (constant variance across the range of predicted values).

2. Q-Q Plot of Residuals

  • Purpose: To assess the normality of residuals.

  • Interpretation: This plot shows that the residuals mostly align with the theoretical line, indicating that they are approximately normally distributed. A few deviations at the extremes suggest slight departures from normality, but these are not severe enough to be a major concern.

3. Scale-Location Plot

  • Purpose: To verify that residuals are spread equally across all levels of fitted values.

  • Interpretation: The plot shows a fairly constant spread of residuals across the range of fitted values, which suggests homoscedasticity. There is no visible pattern indicating varying spread, which is good for the validity of the regression assumptions.

4. Residuals vs. Leverage Plot

  • Purpose: To identify any influential cases that might disproportionately influence the regression estimates.

  • Interpretation: No points are showing unusually high leverage or significant residuals that could indicate problematic influential observations. The Cook’s distance for all observations is within acceptable limits, suggesting that no single point is unduly influencing the model’s overall fit.

THOUGHTS: The diagnostic checks for the ER_per_Game model, together with those for the strikeout and walk models, consistently indicate that the regression assumptions are reasonably met. They lend support to the conclusion from the GLM analyses that the total number of games played has no statistically significant effect on the performance metrics studied (strikeouts, walks, earned runs), and so provides no clear sign of player fatigue.

CONCLUSION

From the time series plots, we observed trends and fluctuations in three performance metrics: strikeouts per game, walks per game, and earned runs per game. While there were noticeable variations in these metrics over time, the relationship between total games played and signs of fatigue was not clearly established. Correlation analysis revealed only weak positive correlations between the number of games and the performance metrics, suggesting a slight increase in strikeouts, walks, and earned runs with more games played, but these correlations were too weak to indicate fatigue.

The subsequent analysis using Gaussian Generalized Linear Models (GLMs) further explored these relationships. The GLMs quantified the effect of total games played on each performance metric under the assumption of normally distributed residuals. The coefficients for total games played were not statistically significant for any of the three performance metrics (strikeouts, walks, and earned runs). This indicates that, even with a formal regression model, there is no strong statistical evidence that playing more games affects these performance metrics in a way that would indicate fatigue.

Moreover, diagnostic checks on the GLMs (including residuals vs. fitted values, Q-Q plots for normality, scale-location plots for homoscedasticity, and residuals vs. leverage plots for influential data points) were consistent with the assumptions required for valid linear regression analysis. This supports the reliability of the model outcomes and suggests that the lack of significant findings is not due to a violated assumption, although the small sample (28 player-seasons from three pitchers) limits the power to detect a modest effect.

In conclusion, our comprehensive analysis incorporating time series plots, correlation studies, and Gaussian GLM did not find clear and consistent evidence that fatigue developed based on the total number of games played. This is likely due to the complex nature of fatigue which is multifactorial and can be influenced by more than just the number of games played. Other elements such as innings pitched, number of pitches per game, days of rest, training routines, nutrition, and even psychological factors can contribute to a player’s fatigue level. Additionally, the lack of contextual data in our analysis means that factors such as player injuries, pitching intensity, travel schedules, or mid-season breaks, which could significantly affect fatigue and performance, were not accounted for. This underscores the need for a more holistic approach to studying athletic performance and fatigue, incorporating a wider range of variables to fully understand the dynamics at play.