In MLB, pitchers are prohibited from applying foreign substances to the ball. Only rosin is allowed, a rule meant to protect the integrity of the game. Illegal foreign substances include pine tar, Spider Tack, and other sticky compounds. They are illegal because previous research has found that spin rates increase dramatically when pitchers use them. In recent years, MLB has pushed for pitchers to be checked for foreign substances before, during, and after games.
One case in particular is the controversy over Joe Musgrove’s ears during the 2022 Wild Card Series between the San Diego Padres and the New York Mets. In the bottom of the 6th inning, Mets manager Buck Showalter asked the umpires to check Musgrove for foreign substances. Throughout the game, attention had focused on Musgrove’s ears and on his spin rates and velocity, which were higher than his average. Musgrove was also giving the Mets a hard time, keeping them scoreless.
When the umpires checked Musgrove, they rubbed his ears and found no foreign substances. Musgrove continued to pitch and to dominate on the mound. In this report, I investigate whether it is plausible that Musgrove was cheating by comparing this game to his regular season performance.
The data come from a Kaggle dataset derived from Baseball Savant, a public site affiliated with MLB that publishes Statcast and other baseball statistics. Baseball Savant gathers its data directly from MLB Advanced Media, L.P.
I converted the game_date column from a string to a Date.
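The conversion can be sketched as follows. This is a minimal base R illustration, assuming the dates are stored in ISO `YYYY-MM-DD` form; the `raw` data frame and its values are hypothetical stand-ins, since the loading code is not shown in this report.

```r
# Hypothetical stand-in for the raw CSV; values are illustrative only.
raw <- data.frame(
  game_date = c("2022-10-09", "2022-10-21", "2022-04-08"),
  release_spin_rate = c(2950, 2840, 2700),
  stringsAsFactors = FALSE
)

# Convert the character column to Date (ISO format assumed).
raw$game_date <- as.Date(raw$game_date, format = "%Y-%m-%d")

# The column is now a Date, so filtering and sorting are chronological.
class(raw$game_date)
raw[raw$game_date == as.Date("2022-10-09"), ]
```

Storing the column as a Date lets later filters and time-series plots treat the games in calendar order instead of as text.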
## Rows: 3120 Columns: 17
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (11): game_date, pitcher_name, pitcher_home_away, batter, batter_stance,...
## dbl (6): release_spin_rate, spin_axis, effective_speed, exit_velocity, at_b...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
## New names:
## * pitcher -> pitcher...8
## * fielder_2 -> fielder_2...42
## * pitcher -> pitcher...60
## * fielder_2 -> fielder_2...61
##
## Rows: 3121 Columns: 92
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (17): pitch_type, game_date, player_name, events, description, des, game...
## dbl (67): release_speed, release_pos_x, release_pos_z, batter, pitcher...8, ...
## lgl (8): spin_dir, spin_rate_deprecated, break_angle_deprecated, break_leng...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 6 x 17
## game_date pitcher_name pitcher_home_away batter batter_stance opponent
## <date> <chr> <chr> <chr> <chr> <chr>
## 1 2022-10-21 Musgrove, Joe away Bohm, Alec R PHI
## 2 2022-10-21 Musgrove, Joe away Castellanos~ R PHI
## 3 2022-10-21 Musgrove, Joe away Castellanos~ R PHI
## 4 2022-10-21 Musgrove, Joe away Castellanos~ R PHI
## 5 2022-10-21 Musgrove, Joe away Harper, Bry~ L PHI
## 6 2022-10-21 Musgrove, Joe away Harper, Bry~ L PHI
## # ... with 11 more variables: pitch_name <chr>, release_spin_rate <dbl>,
## # spin_axis <dbl>, effective_speed <dbl>, pitch_result <chr>,
## # exit_velocity <dbl>, game_type <chr>, season_type <chr>,
## # at_bat_outcome <chr>, at_bat_number <dbl>, strikes <dbl>
I will investigate the average spin rate for each pitch type: slider, curveball, 4-seam fastball, changeup, sinker, and cutter. I will compare the spin rates in the Mets game with those from the entire 2022 regular season.
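library(dplyr)

# Pitches from the Wild Card game against the Mets, in at-bat order
mets <- dat %>%
  filter(game_date == as.Date('2022-10-09')) %>%
  arrange(at_bat_number, strikes, at_bat_outcome)
head(mets)

# Average spin rate per pitch type in the Mets game (NA readings skipped)
mets_avg_spin <- mets %>%
  group_by(pitch_name) %>%
  summarise(avg_spin_rate = mean(release_spin_rate, na.rm = TRUE)) %>%
  mutate(season_type = 'Mets Game')

# Average spin rate per pitch type across the regular season
reg_spin_rate <- dat %>%
  filter(season_type == 'Regular Season') %>%
  group_by(pitch_name) %>%
  summarise(avg_spin_rate = mean(release_spin_rate, na.rm = TRUE)) %>%
  mutate(season_type = 'Regular Season')

# Stack both summaries into one table for plotting
mets_vs_reg_spinrate <- rbind(mets_avg_spin, reg_spin_rate)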
The following plot shows an increase in spin rate for all of Musgrove’s pitches: for every pitch type, his average spin rate in the Mets game was higher than his regular season average. An across-the-board increase like this naturally raises suspicion of something fishy or, in other words, sticky.
The slider shows a large jump in rpm during the Mets game. Sliders and curveballs are breaking balls, which have the most movement and therefore the highest spin rates. The plot bears this out, and both pitches had higher spin rates during the Mets game.
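A grouped bar chart of this comparison could be drawn roughly as below. This is a base R sketch, not necessarily the plotting code used for the report; the `mets_vs_reg` data frame is a hypothetical, two-pitch stand-in whose averages are copied from the test output later in this report.

```r
# Averages (rpm) copied from the t-test output later in this report.
mets_vs_reg <- data.frame(
  pitch_name = c("Curveball", "Slider", "Curveball", "Slider"),
  avg_spin_rate = c(2892.923, 2963.941, 2721.774, 2714.747),
  season_type = c("Mets Game", "Mets Game", "Regular Season", "Regular Season")
)

# Reshape to a season_type x pitch_name table for a grouped bar chart.
spin_matrix <- xtabs(avg_spin_rate ~ season_type + pitch_name, data = mets_vs_reg)

# One cluster of bars per pitch type, one bar per season type.
barplot(spin_matrix, beside = TRUE, legend.text = TRUE,
        ylab = "Average spin rate (rpm)",
        main = "Average spin rate by pitch type")
```

Placing the two bars for each pitch side by side makes the per-pitch gap visible directly.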
Next, I compare the spin rates between the regular season and the post season.
The following plots show that the average spin rate during the post season was higher than during the regular season. The post season averages do include the Mets game; even so, the averages across the post season were higher.
This suggests that the increase was not limited to the Mets game: the other playoff games also showed higher spin rates for all pitches.
I created a line graph of the average spin rate for each pitch over time. The spin rates fluctuate throughout the season, but on closer examination there is a strong upward trend beginning at the end of September, as the playoffs approach.
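The per-game averages behind such a line graph can be computed along these lines. This is a base R sketch; the `pitches` data frame and its values are made up for illustration and are not Musgrove's actual pitches.

```r
# Hypothetical pitch-level rows; spin rates are illustrative, not real data.
pitches <- data.frame(
  game_date = as.Date(c("2022-09-20", "2022-09-20", "2022-09-27",
                        "2022-09-27", "2022-10-09", "2022-10-09")),
  pitch_name = "Slider",
  release_spin_rate = c(2700, 2720, 2760, 2780, 2950, 2970)
)

# Average spin rate per game and pitch type.
by_game <- aggregate(release_spin_rate ~ game_date + pitch_name,
                     data = pitches, FUN = mean)
by_game <- by_game[order(by_game$game_date), ]

# Line graph of the per-game averages over time.
plot(by_game$game_date, by_game$release_spin_rate, type = "b",
     xlab = "Game date", ylab = "Average spin rate (rpm)",
     main = "Slider: average spin rate by game")
```

With several pitch types, the same aggregation applies and each pitch can be drawn as its own line.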
#### Distribution of Pitches and Spin Rates
I used boxplots and histograms to show how Musgrove’s pitches were distributed by spin rate. These show the difference in distributions between the regular season and the post season, as well as between the regular season and the Mets game. The regular season distribution appears approximately normal, which will be useful for the statistical analysis later.
We saw that average spin rates differ noticeably between the regular season and the post season. To provide more evidence, we run t-tests (`t.test`) comparing the Mets game average and the post season average against the regular season average. Each test is one-sample and one-sided: the regular season average is treated as a fixed reference value.
Because the post season and Mets game samples are small, I also ran each test on log-transformed spin rates as a check on the normality assumption. A Wilcoxon signed-rank test, which does not assume normality, was run as well in case that assumption is violated.
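The distribution plots can be produced along the following lines. This is a base R sketch on simulated values; the means and standard deviations are assumptions chosen only to resemble the scale of the data, and the Q-Q plot is an addition of mine that is often paired with a histogram when judging normality.

```r
set.seed(1)
# Simulated spin rates (rpm); NOT Musgrove's actual pitches.
spin <- data.frame(
  release_spin_rate = c(rnorm(200, mean = 2715, sd = 90),
                        rnorm(50, mean = 2905, sd = 95)),
  season_type = rep(c("Regular Season", "Post Season"), times = c(200, 50))
)

# Side-by-side boxplots by season type.
boxplot(release_spin_rate ~ season_type, data = spin,
        ylab = "Spin rate (rpm)")

# Histogram of the regular season values to eyeball normality.
reg <- spin$release_spin_rate[spin$season_type == "Regular Season"]
hist(reg, breaks = 20, xlab = "Spin rate (rpm)",
     main = "Regular season spin rates")

# Q-Q plot as a complementary visual check.
qqnorm(reg)
qqline(reg)
```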
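The three tests can be sketched as below. This is an illustration on simulated data: `mets_slider_sim` is a hypothetical sample, while the reference value is the regular season slider average reported in this analysis. Note that the null value is logged for the log-scale test, which matches the hypothesized means shown in the output.

```r
set.seed(42)
# Simulated sample of 17 slider spin rates (rpm); NOT the real pitches.
mets_slider_sim <- rnorm(17, mean = 2960, sd = 60)

# Regular season slider average reported in this analysis.
reg_mean <- 2714.747

# One-sided, one-sample t-test: is the sample mean greater than reg_mean?
t_raw <- t.test(mets_slider_sim, mu = reg_mean, alternative = "greater")

# Same test on the log scale; the null value is logged as well.
t_log <- t.test(log(mets_slider_sim), mu = log(reg_mean),
                alternative = "greater")

# Nonparametric check that does not assume normality.
w <- wilcox.test(mets_slider_sim, mu = reg_mean, alternative = "greater")

# Collect the three p-values for comparison.
c(t_raw = t_raw$p.value, t_log = t_log$p.value, wilcoxon = w$p.value)
```

If the three tests agree, the conclusion does not hinge on the normality assumption.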
Below we find that the post season curveball average spin rate is statistically greater than the regular season average:
##
## One Sample t-test
##
## data: postseason_curve$release_spin_rate
## t = 9.3494, df = 60, p-value = 1.281e-13
## alternative hypothesis: true mean is greater than 2721.774
## 95 percent confidence interval:
## 2817.258 Inf
## sample estimates:
## mean of x
## 2838.033
##
## One Sample t-test
##
## data: log(postseason_curve$release_spin_rate)
## t = 9.3649, df = 60, p-value = 1.207e-13
## alternative hypothesis: true mean is greater than 7.909039
## 95 percent confidence interval:
## 7.942928 Inf
## sample estimates:
## mean of x
## 7.950287
##
## Wilcoxon signed rank test with continuity correction
##
## data: postseason_curve$release_spin_rate
## V = 1791, p-value = 6.415e-10
## alternative hypothesis: true location is greater than 2721.774
Below we find that the post season slider average spin rate is statistically greater than the regular season average:
##
## One Sample t-test
##
## data: postseason_slider$release_spin_rate
## t = 14.033, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 2714.747
## 95 percent confidence interval:
## 2884.354 Inf
## sample estimates:
## mean of x
## 2907.346
##
## One Sample t-test
##
## data: log(postseason_slider$release_spin_rate)
## t = 14.309, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 7.906454
## 95 percent confidence interval:
## 7.966465 Inf
## sample estimates:
## mean of x
## 7.974423
##
## Wilcoxon signed rank test with continuity correction
##
## data: postseason_slider$release_spin_rate
## V = 1373, p-value = 2.412e-10
## alternative hypothesis: true location is greater than 2714.747
Below we find that the Mets game curveball average spin rate is statistically greater than the regular season average:
##
## One Sample t-test
##
## data: mets_curve$release_spin_rate
## t = 11.272, df = 12, p-value = 4.841e-08
## alternative hypothesis: true mean is greater than 2721.774
## 95 percent confidence interval:
## 2865.861 Inf
## sample estimates:
## mean of x
## 2892.923
##
## One Sample t-test
##
## data: log(mets_curve$release_spin_rate)
## t = 11.595, df = 12, p-value = 3.542e-08
## alternative hypothesis: true mean is greater than 7.909039
## 95 percent confidence interval:
## 7.960509 Inf
## sample estimates:
## mean of x
## 7.969858
##
## Wilcoxon signed rank exact test
##
## data: mets_curve$release_spin_rate
## V = 91, p-value = 0.0001221
## alternative hypothesis: true location is greater than 2721.774
Below we find that the Mets game slider average spin rate is statistically greater than the regular season average:
##
## One Sample t-test
##
## data: mets_slider$release_spin_rate
## t = 16.745, df = 16, p-value = 7.255e-12
## alternative hypothesis: true mean is greater than 2714.747
## 95 percent confidence interval:
## 2937.96 Inf
## sample estimates:
## mean of x
## 2963.941
##
## One Sample t-test
##
## data: log(mets_slider$release_spin_rate)
## t = 17.436, df = 16, p-value = 3.922e-12
## alternative hypothesis: true mean is greater than 7.906454
## 95 percent confidence interval:
## 7.9853 Inf
## sample estimates:
## mean of x
## 7.994073
##
## Wilcoxon signed rank exact test
##
## data: mets_slider$release_spin_rate
## V = 153, p-value = 7.629e-06
## alternative hypothesis: true location is greater than 2714.747
Below we find that the Mets game average spin rate across all pitches is statistically greater than the regular season average:
##
## One Sample t-test
##
## data: mets$release_spin_rate
## t = 6.7065, df = 85, p-value = 1.041e-09
## alternative hypothesis: true mean is greater than 2587.204
## 95 percent confidence interval:
## 2698.011 Inf
## sample estimates:
## mean of x
## 2734.547
##
## One Sample t-test
##
## data: log(mets$release_spin_rate)
## t = 6.2379, df = 85, p-value = 8.303e-09
## alternative hypothesis: true mean is greater than 7.858333
## 95 percent confidence interval:
## 7.896819 Inf
## sample estimates:
## mean of x
## 7.910808
##
## Wilcoxon signed rank test with continuity correction
##
## data: mets$release_spin_rate
## V = 3213, p-value = 3.764e-09
## alternative hypothesis: true location is greater than 2587.204
Overall, in every case the average spin rates during the post season and the Mets game were statistically greater than the regular season average.
Given the recurring pattern of higher spin rates during the Mets game compared to the regular season, it is reasonable to suspect that Joe Musgrove could have been using a foreign substance to increase his spin rate. The statistics and the visuals both show increases that are consistent with substance use.
However, the umpires did check Joe Musgrove for substances and found nothing suspicious. Was this complacency by the umpires? Did they miss a spot? We will never know unless Musgrove confesses. All we know is that Musgrove’s spin rates were higher than his average. Therefore, he was either cheating or simply having a great game as an ace.
Other factors could contribute to the higher spin rate. For example, other MLB players have said that players put “Red Hot” on their ears to get psyched up and perform better. Another factor is that these were the playoffs, the most important games for players. Musgrove was likely more focused and more motivated to do his best, which could explain the higher numbers during the playoffs.
An argument against the use of foreign substances is that the change in Musgrove’s spin rate was not enormous compared to what studies of foreign substances have reported. One study found that spin rates increased from 1600 rpm to 2200 rpm, a 600 rpm increase. That is far larger than Musgrove’s increase of roughly 250 rpm. Therefore, it could be the case that there was no foreign substance.
In conclusion, it is possible that Joe Musgrove was cheating, given the higher numbers across the board. However, we must consider the situation and other factors. Cheating is never good for the game, but it is always great to see a pitcher perform at his best.
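The size of the increase can be checked with a few lines of arithmetic. This sketch uses the curveball and slider averages from the test output above; the variable names are mine, and expressing the increases as percentages is an added comparison rather than part of the original analysis.

```r
# Averages (rpm) reported in the test output above.
reg_avg  <- c(Curveball = 2721.774, Slider = 2714.747)
mets_avg <- c(Curveball = 2892.923, Slider = 2963.941)

# Absolute and relative increase in the Mets game.
increase_rpm <- mets_avg - reg_avg
increase_pct <- 100 * increase_rpm / reg_avg

round(increase_rpm, 1)  # Curveball 171.1, Slider 249.2
round(increase_pct, 1)  # Curveball 6.3, Slider 9.2

# The cited study's jump, for comparison: 1600 rpm to 2200 rpm.
study_pct <- 100 * (2200 - 1600) / 1600  # 37.5
```

On a relative scale, the largest Mets game increase (the slider, about 9%) is roughly a quarter of the 37.5% jump in the cited study.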