Abstract
Baseball is a game of statistics, more so in recent years with the advent of “big data”. A wealth of baseball data is freely available, including spatial information in the form of location coordinates for individual batted balls captured by high-speed cameras. This work focuses on the modeling and visualization of baseball data via point pattern analysis. We use kernel smoothing to produce spatial heat maps of batted balls for Major League Baseball players and study the evolution of these patterns over time. After rejecting the hypothesis of complete spatial randomness, we employ kernel smoothing to model and visualize the space-varying intensity function of the resulting non-homogeneous Poisson process. The procedure includes a routine for the selection of optimal smoothing parameters. The resulting heat maps accurately identify the distinctive patterns in which hitters place batted balls on the field.

Statistics have long played a significant role in baseball: various types of data are collected during every game for the analysis, evaluation and prediction of players’ performance. Batting average and on-base percentage are examples of statistics used to evaluate the offensive performance of a hitter, whereas earned run average and the number of strikeouts help explain the performance of a pitcher. However, these statistics are not sufficient to fully understand a player’s performance. Spatial statistics improve the analysis because they offer additional information about players’ skills, and visualizations of spatial statistics such as heat maps provide better insight into a player’s ability and performance. Cross and Sylvan (2015) used a covariance function to model spatial batting ability and, through the application of geostatistics, developed a method for producing more accurate heat maps than the traditional ones.
The heat maps reveal the strengths and weaknesses of a hitter based on the locations of pitches inside and outside the strike zone. Heat maps are used not just for hitters but also for pitchers. Wilcox and Mannshardt (2013) developed a model of the intensity function for pitchers based on the location of pitches and used heat maps of those intensity functions to identify the spatial patterns in which pitchers throw their pitches. The objective of this paper is to analyze the point pattern data of the batted balls of various players over different seasons and to understand their offensive performance through the analysis of heat maps.
# importing necessary R packages
library(tidyverse)
library(dplyr)
library(baseballr)
library(sportyR)
library(splancs)
library(spatstat)
library(RColorBrewer)
library(kableExtra)
library(rmarkdown)
library(ggpubr)
I used the data of players who won the Silver Slugger Award between 2018 and 2022: Freddie Freeman, Kyle Schwarber, Bryce Harper, Paul Goldschmidt, Aaron Judge, Mookie Betts, Josh Bell, Jose Ramirez, and Francisco Lindor. This award is given to the best offensive player at each position, as selected by the managers and coaches around the league (Casella, 2022). I did not extract the data from the 2020 season because it was a shortened season and its sample size is considerably smaller than that of the other seasons, which makes it difficult to compare with the data from other years. The function scrape_statcast_savant_batter() from the baseballr package extracts the data for each of these players from baseballsavant.mlb.com, given the corresponding batter ID and the start and end dates of each of the 2018, 2019, 2021 and 2022 seasons.
After some data cleaning, I extracted the variables hc_x and hc_y which are the x- and y-coordinates of the batted balls for each at-bat (observation), respectively, from the dataset of each player. Then I created new variables hc_x_adjusted and hc_y_adjusted, which are the modified values of hc_x and hc_y, using the formula discussed by Bill Petti (2017):
\[hc\_x\_adjusted = hc\_x - 125.42, \tag{1}\]
\[hc\_y\_adjusted = 198.27 - hc\_y. \tag{2}\]
This adjustment shifts the coordinates so that home plate is at the origin (0,0) and flips the y-axis so that the field extends upward from home plate, which simplifies the R code for producing the point pattern datasets and heat maps in the later sections.
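As a minimal sketch of equations (1) and (2), the adjustment can be applied with base R. The data frame `batted` and its three rows are made up for illustration; only the column names hc_x and hc_y and the two constants come from the text.

```r
# Toy data: three made-up batted-ball locations in raw Statcast coordinates.
batted <- data.frame(
  hc_x = c(125.42, 100.00, 150.00),
  hc_y = c(198.27, 120.00, 90.00)
)

# Equations (1) and (2): shift so that home plate sits at the origin (0, 0)
# and flip the y-axis so that the outfield has positive y.
batted$hc_x_adjusted <- batted$hc_x - 125.42
batted$hc_y_adjusted <- 198.27 - batted$hc_y

print(batted)
```

The first row sits exactly at home plate, so both of its adjusted coordinates are zero.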
Before generating the point pattern data of batted balls, I first look at the raw spatial data using scatter plots. I chose and compared the batted balls of three players: Paul Goldschmidt, Freddie Freeman and Josh Bell. Goldschmidt is a right-handed hitter, Freeman is a left-handed hitter, and Bell is a switch hitter, who can hit from both sides of the batter’s box.
Figure 1: Batted balls for Freeman, Bell and Goldschmidt from 2018, 2019, 2021, 2022 seasons on the xy-plane
Figure 1 shows the scatter plots of the batted balls for Paul Goldschmidt, Freddie Freeman and Josh Bell from the 2018, 2019, 2021 and 2022 seasons. The x-axis and y-axis represent the x- and y-coordinates of the batted balls, respectively. The V-shaped lines represent the foul lines of the baseball field. Each color represents a different outcome of the batted ball. In these scatter plots, Goldschmidt has a cluster of points on the left side of the infield in each season, whereas Freeman has a cluster of points on the right side of the infield. Bell has clusters of points on both sides of the infield in each season, so his data will be split and analyzed separately as a left-handed hitter and as a right-handed hitter in a later section.
In order to analyze how the players hit during the four seasons with spatial statistics, the point pattern dataset for each player was created using the function ppp() from the spatstat package. This function requires the x- and y-coordinates of the batted balls as well as the ranges of the x- and y-coordinates. So that the point pattern dataset contains all of the batted balls from the player's original dataset, the range of the x-coordinates was determined from the leftmost and rightmost points, and the range of the y-coordinates from the highest and lowest points. For example, in the dataset for Freddie Freeman, the x-coordinates of the leftmost and rightmost points are -109.48 and 118.50, respectively, so the range of the x-coordinates is \([-110,119]\). Likewise, the y-coordinates of the highest and lowest points are 188.23 and -21.77, respectively, so the range of the y-coordinates is \([-22,189]\). With these ranges, the window of the point pattern dataset for Freddie Freeman is 229 units by 211 units. These point pattern datasets will be used for the quadrat test for complete spatial randomness and for producing heat maps in the later sections.
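The construction described above might be sketched as follows. The five points are made up so that their extremes match the Freeman values quoted in the text, and rounding the extremes outward with floor() and ceiling() is an assumption about how the ranges were obtained.

```r
library(spatstat)

# Made-up adjusted coordinates; the extremes mimic the Freeman values quoted above.
x <- c(-109.48, 118.50, 10.25, -35.70, 60.10)
y <- c(50.00, 75.30, 188.23, -21.77, 20.40)

# Round the extremes outward so every point lies inside the window.
x_range <- c(floor(min(x)), ceiling(max(x)))
y_range <- c(floor(min(y)), ceiling(max(y)))

# Build the point pattern on the rectangular window.
batted_ppp <- ppp(x, y, xrange = x_range, yrange = y_range)

# Width and height of the window.
diff(x_range)
diff(y_range)
```

Rounding outward rather than to the nearest integer guarantees that no batted ball falls on or outside the window boundary.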
In this section, I discuss the methods used to analyze the point pattern datasets of the batted balls.
The first analysis of the point pattern dataset of the batted balls is the test for complete spatial randomness. To conduct this test, the study area where the points are plotted is divided into quadrats of equal size. Then, for each quadrat, the number of points is counted to compute the intensity. After computing the intensity, the test statistic is calculated using the following formula:
\[\frac{(m-1)s^2}{\bar{x}}, \tag{3}\]
where \(m\) is the number of quadrats, \(s^2\) is the observed variance of the quadrat counts and \(\bar{x}\) is their observed mean. This test statistic is then compared with the critical value from the \(\chi^2\)-distribution with \(m-1\) degrees of freedom.
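A small base-R sketch of statistic (3) follows. The nine quadrat counts are made up for illustration and are not taken from any player's data.

```r
# Made-up point counts for the nine quadrats of a 3x3 grid.
counts <- c(10, 2, 3, 4, 20, 6, 1, 8, 9)

m     <- length(counts)   # number of quadrats
x_bar <- mean(counts)     # observed mean count per quadrat
s2    <- var(counts)      # observed (sample) variance of the counts

# Equation (3): index-of-dispersion test statistic.
test_stat <- (m - 1) * s2 / x_bar

# Upper-tail p-value from the chi-square distribution with m - 1 degrees of freedom.
p_value <- pchisq(test_stat, df = m - 1, lower.tail = FALSE)

c(statistic = test_stat, df = m - 1, p.value = p_value)
```

Because \((m-1)s^2 = \sum_k (x_k - \bar{x})^2\), this statistic equals the Pearson form \(\sum_k (x_k - \bar{x})^2/\bar{x}\). The spatstat function quadrat.test() carries out a quadrat-count test of this kind directly on a point pattern object.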
The three-dimensional test for complete spatial randomness adds time as the third dimension, so the study area is divided into prisms of equal size. There are 36 prisms because the study area for each season is divided into nine quadrats of equal size, just as in the two-dimensional test, and there are four seasons of data. The calculation of the test statistic also differs from the two-dimensional test because the expected count \(\bar{x}\) is the total number of observations divided by 36 (the number of prisms). The test statistic is calculated by the formula:
\[\sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{t=1}^{4} \frac{(x_{i,j,t} - \bar{x})^2}{\bar{x}}, \tag{4}\]
where \(x_{i,j,t}\) is the number of points in the quadrat in the \(i^{th}\) row and \(j^{th}\) column in season \(t\), with \(i,j = 1,2,3\) and \(t = 1,2,3,4\) indexing the 2018, 2019, 2021 and 2022 seasons, and \(\bar{x}\) is the total number of points divided by 36.
The test statistic for the three-dimensional test is therefore the sum of the contributions \((x_{i,j,t} - \bar{x})^2/\bar{x}\) from all 36 prisms. I then compare this statistic with the critical value from the \(\chi^2\)-distribution with \((3-1)(3-1)(4-1) = 12\) degrees of freedom, where 3, 3 and 4 are the numbers of rows, columns and seasons.
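Statistic (4) can be sketched in base R on a 3 x 3 x 4 array of counts. The counts are made up, and the 5% significance level is an assumption; it is the level at which a \(\chi^2\)-distribution with 12 degrees of freedom gives a critical value of 21.026.

```r
# Made-up counts on a 3 x 3 x 4 array: rows, columns, seasons.
counts <- array(rep(c(5, 15), times = 18), dim = c(3, 3, 4))

# Expected count per prism: total number of points divided by 36.
x_bar <- sum(counts) / length(counts)

# Equation (4): sum of the scaled squared deviations over all 36 prisms.
test_stat <- sum((counts - x_bar)^2 / x_bar)

# Degrees of freedom as described in the text: (3 - 1)(3 - 1)(4 - 1) = 12.
df <- prod(dim(counts) - 1)

# Critical value at the 5% level (assumed significance level).
crit <- qchisq(0.95, df = df)

test_stat > crit   # TRUE means complete spatial randomness is rejected
```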
Heat maps are one of the visualization tools for the analysis of spatial data that help identify the location of the cluster of observations. In order to create heat maps using R, I first need to determine the bandwidth for the kernel estimation for the intensity in the point pattern dataset for each player. The function bw.diggle() in the spatstat package calculates the bandwidth that minimizes the mean squared error.
The algorithm in this function utilizes the method of Berman and Diggle (1989) to calculate the quantity
\[M(\sigma) = \frac{MSE(\sigma)}{\lambda^2} - g(0), \tag{5}\] as a function of the bandwidth \(\sigma\), where \(MSE(\sigma)\) is the mean squared error at bandwidth \(\sigma\), \(\lambda\) is the mean intensity, and \(g\) is the pair correlation function.
This bandwidth is then used to estimate the intensity (density) function from the point pattern data. I plot these estimates using plot() to produce the heat maps.
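The bandwidth selection and smoothing steps might look like the sketch below. The point pattern is simulated as a stand-in for one season of batted balls, and the use of density() for the kernel estimate is an assumption, since the text names only bw.diggle() and plot().

```r
library(spatstat)

set.seed(1)

# Simulated stand-in for one season of batted balls: a cluster near (40, 60)
# plus uniform background noise, on a window like the one used for Freeman.
n_cluster <- 150
n_noise   <- 50
x <- c(rnorm(n_cluster, mean = 40, sd = 15), runif(n_noise, -110, 119))
y <- c(rnorm(n_cluster, mean = 60, sd = 15), runif(n_noise, -22, 189))
inside <- x > -110 & x < 119 & y > -22 & y < 189
pattern <- ppp(x[inside], y[inside], xrange = c(-110, 119), yrange = c(-22, 189))

# Bandwidth chosen by the Berman-Diggle mean squared error criterion.
sigma_hat <- bw.diggle(pattern)

# Kernel-smoothed intensity estimate (a pixel image of class "im").
intensity_hat <- density(pattern, sigma = sigma_hat)

# Heat map of the estimated intensity.
plot(intensity_hat, main = "Simulated example")
```

Calling plot(sigma_hat) displays the criterion \(M(\sigma)\) against \(\sigma\), which is a quick way to check whether the selected bandwidth sits at a clear minimum.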
Using the point pattern dataset for Freddie Freeman, we test for complete spatial randomness. Figure 2 below shows the point pattern data of the batted balls for Freddie Freeman for each season on a 3x3 grid. Clearly, there are overlapping points in each plot.
Figure 2: Point pattern data of batted balls for Freeman on the 3x3 grid
Table 1 below summarizes the test for complete spatial randomness for the point pattern data for Freddie Freeman for each year.
kbl(Table1_FF, booktabs = TRUE, caption = "Summary of Quadrat Test for Point Pattern Dataset for Freddie Freeman on the 3x3 Grid") %>%
  kable_styling(latex_options = c("HOLD_position", "striped"))
| Year | 2018 | 2019 | 2021 | 2022 |
| Test Statistic | 295.1 | 239.25 | 355.21 | 316.88 |
| P-value | 0 | 0 | 0 | 0 |
| Reject the Null / Fail to Reject the Null | Reject | Reject | Reject | Reject |
The critical value for this test is 9.448, which is far smaller than each of the computed test statistics. In addition, the reported p-value is 0 for each point pattern dataset. Based on these results, we reject the null hypothesis that the point pattern datasets for Freddie Freeman from the 2018, 2019, 2021 and 2022 seasons exhibit complete spatial randomness. Therefore, there is a spatial pattern in the way he hit in each season.
Using (4), the test statistic for the three-dimensional test is 1218.778. This value is far greater than the critical value for this test, which is 21.026. Therefore, we reject the null hypothesis that the point pattern data for Freddie Freeman from the 2018, 2019, 2021 and 2022 seasons exhibit complete spatial randomness over space and time. This again indicates a spatial pattern in the way he hit across these seasons.
In this section, I present the heat maps and discuss the spatial pattern of the batted balls for each player. Each player’s heat map is labeled with the year and the abbreviated name of the team/teams he played for during that year. The V-shaped lines represent the foul lines of the baseball field which will help us locate the cluster of points in the corresponding area on the actual baseball field.
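The plotting code repeated for each player below follows one pattern, which can be sketched as a small helper. The function `plot_heat_map`, the simulated intensity estimates and the rule for the shared colour range (0 to the largest estimated intensity across panels) are illustrative assumptions, not the code used to produce the figures.

```r
library(spatstat)

# Hypothetical helper: plot one heat map and overlay the foul lines.
plot_heat_map <- function(dens, title, colours) {
  plot(dens, main = title, col = colours)
  segments(0, 0, -98, 98)   # left-field foul line
  segments(0, 0, 98, 98)    # right-field foul line
}

# Simulated stand-ins for two seasons of intensity estimates.
set.seed(2)
win <- owin(xrange = c(-110, 119), yrange = c(-22, 189))
dens_list <- list(
  "Season A" = density(runifpoint(200, win = win), sigma = 10),
  "Season B" = density(runifpoint(300, win = win), sigma = 10)
)

# One colour map for all panels, from 0 to the largest estimated intensity,
# so that the same colour means the same intensity in every season.
max_intensity <- max(vapply(dens_list, function(d) max(d), numeric(1)))
shared_colours <- colourmap(col = heat.colors(12, rev = TRUE),
                            range = c(0, max_intensity))

par(mfrow = c(1, 2), mar = c(1, 1, 1, 0), adj = 0.5)
for (season in names(dens_list)) {
  plot_heat_map(dens_list[[season]], season, shared_colours)
}
```

Sharing one colour map across the panels of a figure is what makes the seasons visually comparable: a red region in one panel then corresponds to the same intensity as a red region in another.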
Freddie Freeman
Freddie Freeman is a first baseman who played for the Atlanta Braves in 2018, 2019 and 2021, and for the Los Angeles Dodgers in 2022. He won the Silver Slugger Award in 2019 and 2021 and was the runner-up for the batting title in 2022 with a .325 batting average. Figure 3 shows the heat maps of the batted balls for Freeman.
# legends for the heatmaps
LegendColorsFF <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.181))
# heat maps
par(mfrow=c(2,2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(FF_density_18, main = "2018 (ATL)", col = LegendColorsFF)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FF_density_19, main = "2019 (ATL)", col = LegendColorsFF)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FF_density_21, main = "2021 (ATL)", col = LegendColorsFF)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FF_density_22, main = "2022 (LAD)", col = LegendColorsFF)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 3: Heat maps of batted balls for Freddie Freeman from 2018, 2019, 2021, 2022 seasons
The orange and red areas on the right side of the infield in the maps indicate that Freeman tended to hit toward the right side of the infield in each of the seasons. Even though he moved to the Dodgers after playing for the Braves in the three previous seasons, the way he hit the ball did not change drastically.
Kyle Schwarber
Kyle Schwarber is an outfielder who played for the Chicago Cubs in 2018 and 2019, the Washington Nationals and the Boston Red Sox in 2021, and the Philadelphia Phillies in 2022. He led the National League with 46 home runs and won the Silver Slugger Award in 2022. Figure 4 shows the heat maps of the batted balls for Schwarber.
# legends for the heatmaps
LegendColorsKS <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0, 0.15189))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(KS_density_18, main = "2018 (CHC)", col = LegendColorsKS)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(KS_density_19, main = "2019 (CHC)", col = LegendColorsKS)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(KS_density_21, main = "2021 (WSN & BOS)", col = LegendColorsKS)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(KS_density_22, main = "2022 (PHI)", col = LegendColorsKS)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 4: Heat maps of batted balls for Kyle Schwarber from 2018, 2019, 2021, 2022 seasons
The orange and red areas on the right side of the infield in the maps indicate that Schwarber tended to hit to the right side of the infield in each of the seasons. Even though he played for four different teams during these four seasons, there does not seem to be a significant change in the way he hit the ball.
Bryce Harper
Bryce Harper played for the Washington Nationals in 2018 and the Philadelphia Phillies in 2019, 2021 and 2022. He won the Silver Slugger Award and was voted the National League Most Valuable Player in 2021. Figure 5 shows the heat maps of the batted balls for Harper.
# legends for the heatmaps
LegendColorsBH <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.20413))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(BH_density_18, main = "2018 (WSN)", col = LegendColorsBH)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(BH_density_19, main = "2019 (PHI)", col = LegendColorsBH)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(BH_density_21, main = "2021 (PHI)", col = LegendColorsBH)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(BH_density_22, main = "2022 (PHI)", col = LegendColorsBH)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 5: Heat maps of batted balls for Bryce Harper from 2018, 2019, 2021, 2022 seasons
The location of the cluster of points is harder to identify in the heat map from the 2021 season due to the smaller sample size compared to the other three seasons. However, the yellower areas on the map indicate that there is a cluster of points on the right side of the infield as well as on the left side of the outfield. This implies that Harper tended to hit to these areas of the field in the 2021 season. In the other seasons, he tended to hit to the right side of the infield, as there are clusters of points on the right side of the infield in the maps.
Overall, all of the left-handed hitters mentioned above generally hit toward the right side of the infield of the baseball field.
Paul Goldschmidt
Paul Goldschmidt is a first baseman who played for the Arizona Diamondbacks in 2018 and the St. Louis Cardinals in 2019, 2021 and 2022. He won the Silver Slugger Award in 2018 and 2022 and was voted the National League Most Valuable Player in 2022. Figure 6 shows the heat maps of the batted balls for Goldschmidt.
# legends for the heatmaps
LegendColorsPG <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.191))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(PG_density_18, main = "2018 (ARI)", col = LegendColorsPG)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(PG_density_19, main = "2019 (STL)", col = LegendColorsPG)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(PG_density_21, main = "2021 (STL)", col = LegendColorsPG)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(PG_density_22, main = "2022 (STL)", col = LegendColorsPG)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 6: Heat maps of batted balls for Goldschmidt from 2018, 2019, 2021, 2022 seasons
The location of the red and orange areas in the heat maps implies that Goldschmidt tended to hit to the left side of the infield of the baseball field. The way he hit did not change between the 2018 and 2019 seasons even though he played for two different teams.
Mookie Betts
Mookie Betts is an outfielder who played for the Boston Red Sox in 2018 and 2019 and the Los Angeles Dodgers in 2021 and 2022. He won the Silver Slugger Award in 2018, 2019 and 2022, and was the American League Most Valuable Player in 2018. Figure 7 shows the heat maps of the batted balls for Betts.
# legends for the heatmaps
LegendColorsMB <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0, 0.21629))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(MB_density_18, main = "2018 (BOS)", col = LegendColorsMB)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(MB_density_19, main = "2019 (BOS)", col = LegendColorsMB)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(MB_density_21, main = "2021 (LAD)", col = LegendColorsMB)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(MB_density_22, main = "2022 (LAD)", col = LegendColorsMB)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 7: Heat maps of batted balls for Betts from 2018, 2019, 2021, 2022 seasons
Betts tended to hit to the left side of the baseball field as the red and orange areas indicate the clusters of points on the left side of the infield of the baseball field in the maps. The way he hit did not change significantly despite playing for two different teams.
Aaron Judge
Aaron Judge is an outfielder for the New York Yankees who was voted the American League Most Valuable Player for the 2022 season and holds the American League home run record with 62 home runs. He won the Silver Slugger Award for the 2021 and 2022 seasons. Figure 8 shows the heat maps of the batted balls for Judge.
# legends for the heatmaps
LegendColorsAJ <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.16728))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(AJ_density_18, main = "2018 (NYY)", col = LegendColorsAJ)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(AJ_density_19, main = "2019 (NYY)", col = LegendColorsAJ)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(AJ_density_21, main = "2021 (NYY)", col = LegendColorsAJ)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(AJ_density_22, main = "2022 (NYY)", col = LegendColorsAJ)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 8: Heat maps of batted balls for Judge from 2018, 2019, 2021, 2022 seasons
Like Betts and Goldschmidt, Judge tended to hit to the left side of the infield in each of the four seasons, as the clusters of points are on the left side of the infield in the maps.
The right-handed hitters discussed above generally hit toward the left side of the infield of the baseball field.
The analysis of the heat maps for switch hitters is slightly different from the analyses discussed previously. First, I examine the heat maps for all batted balls for each of these switch hitters. Then, I split the dataset for each player based on the side of the batter’s box where he stands and discuss the resulting heat maps.
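The split by batting side might be sketched as follows, assuming the Statcast column `stand`, which records the batter's side as "L" or "R". The five rows are made up for illustration.

```r
# Toy data: made-up batted balls for a switch hitter.
# `stand` mimics the Statcast column giving the batter's side ("L" or "R").
batted <- data.frame(
  hc_x_adjusted = c(30.5, 42.1, -25.0, -40.2, 18.7),
  hc_y_adjusted = c(60.0, 75.3, 55.1, 80.4, 40.9),
  stand         = c("L", "L", "R", "R", "L")
)

# One data frame per side of the batter's box.
by_side <- split(batted, batted$stand)

batted_left  <- by_side[["L"]]   # batted balls hit as a left-handed hitter
batted_right <- by_side[["R"]]   # batted balls hit as a right-handed hitter

nrow(batted_left)
nrow(batted_right)
```

Each subset can then be turned into its own point pattern and heat map, which is why the legends for the split maps have different ranges from the combined ones.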
Josh Bell
Josh Bell is a first baseman who played for the Pittsburgh Pirates in 2018 and 2019, the Washington Nationals in 2021 and part of 2022, and the San Diego Padres for the remainder of 2022. He won the Silver Slugger Award in 2022 as a designated hitter. Figure 9 below shows the heat maps of the batted balls for Bell.
# legends for the heatmaps
LegendColorsJB <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.14603))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(JB_density_18, main = "2018 (PIT)", col = LegendColorsJB)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JB_density_19, main = "2019 (PIT)", col = LegendColorsJB)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JB_density_21, main = "2021 (WSN)", col = LegendColorsJB)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JB_density_22, main = "2022 (WSN & SDP)", col = LegendColorsJB)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 9: Heat maps of batted balls for Bell from 2018, 2019, 2021, 2022 seasons
The red and orange areas on both sides of the infield in the maps indicate that there are clusters of points on both sides of the infield of the baseball field. This implies that Bell hit evenly to both sides of the infield during these four seasons.
Francisco Lindor
Francisco Lindor is a shortstop who played for the Cleveland Indians in 2018 and 2019 and the New York Mets in 2021 and 2022. He won the Silver Slugger Award in 2018 and 2019. Figure 10 below shows the heat maps of the batted balls for Lindor.
# legends for the heatmaps
LegendColorsFL <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.25136))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(FL_density_18, main = "2018 (CLE)", col = LegendColorsFL)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FL_density_19, main = "2019 (CLE)", col = LegendColorsFL)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FL_density_21, main = "2021 (NYM)", col = LegendColorsFL)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FL_density_22, main = "2022 (NYM)", col = LegendColorsFL)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 10: Heat maps of batted balls for Lindor from 2018, 2019, 2021, 2022 seasons
A cluster of points is more difficult to identify in the heat map from the 2021 season due to the smaller sample size compared to the other three seasons. However, the yellower areas on both sides of the field in the map indicate that there are clusters of points on both sides of the infield. This implies that Lindor hit evenly to both sides of the infield during the 2021 season. Likewise, for the other three seasons, the red and orange areas in the maps indicate that Lindor hit evenly to both sides of the infield.
Jose Ramirez
Jose Ramirez is a third baseman for the Cleveland Indians/Guardians who won the Silver Slugger Award in 2018 and 2022. Figure 11 below shows the heat maps of the batted balls for Ramirez.
# legends for the heatmaps
LegendColorsJR <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.27217))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(JR_density_18, main = "2018 (CLE)", col = LegendColorsJR)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JR_density_19, main = "2019 (CLE)", col = LegendColorsJR)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JR_density_21, main = "2021 (CLE)", col = LegendColorsJR)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JR_density_22, main = "2022 (CLE)", col = LegendColorsJR)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 11: Heat maps of batted balls for Ramirez from 2018, 2019, 2021, 2022 seasons
A cluster of points is difficult to identify in the heat map from the 2021 season due to the smaller sample size compared to the other three seasons. However, the yellower areas on both sides of the field in the map indicate that there are clusters of points on both sides of the infield. As for the other three seasons, the red and orange areas in the maps indicate that Ramirez hit to both sides of the infield of the baseball field.
Josh Bell (as a left-handed hitter)
Now I analyze each switch hitter as a left-handed hitter. Figure 12 below shows the heat maps of the batted balls for Bell when he hit from the left side of the batter’s box.
# legends for the heatmaps
LegendColorsJB_L <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.12737))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(JB_density_18_L, main = "2018 (PIT)", col = LegendColorsJB_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JB_density_19_L, main = "2019 (PIT)", col = LegendColorsJB_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JB_density_21_L, main = "2021 (WSN)", col = LegendColorsJB_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JB_density_22_L, main = "2022 (WSN & SDP)", col = LegendColorsJB_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 12: Heat maps of batted balls for Bell as a left-handed hitter from 2018, 2019, 2021, 2022 seasons
In each season, Bell tended to hit to the right side of the infield when he batted from the left side of the batter’s box, as there are clusters of points on the right side of the infield in the maps. Despite playing for three different teams across four seasons, the way he hit the ball did not change significantly.
Francisco Lindor (as a left-handed hitter)
Figure 13 below shows the heat maps of the batted balls for Lindor when he hit from the left side of the batter’s box.
# legends for the heatmaps
LegendColorsFL_L <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.23904))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(FL_density_18_L, main = "2018 (CLE)", col = LegendColorsFL_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FL_density_19_L, main = "2019 (CLE)", col = LegendColorsFL_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FL_density_21_L, main = "2021 (NYM)", col = LegendColorsFL_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FL_density_22_L, main = "2022 (NYM)", col = LegendColorsFL_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 13: Heat maps of batted balls for Lindor as a left-handed hitter from 2018, 2019, 2021, 2022 seasons
A cluster of points is difficult to identify in the heat map from the 2021 season due to the smaller sample size compared to the other three seasons. However, the yellower areas on the right side of the field in the map indicate that there is a cluster of points on the right side of the infield. This indicates that Lindor tended to hit to the right side of the infield in the 2021 season as a left-handed hitter. Similarly, in the other three seasons he tended to hit to the right side of the infield, as there are clusters of points in the same area in each of the maps.
Jose Ramirez (as a left-handed hitter)
Figure 14 below shows the heat maps of the batted balls for Ramirez when he hit from the left side of the batter’s box.
# legends for the heatmaps
LegendColorsJR_L <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.31029))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(JR_density_18_L, main = "2018 (CLE)", col = LegendColorsJR_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JR_density_19_L, main = "2019 (CLE)", col = LegendColorsJR_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JR_density_21_L, main = "2021 (CLE)", col = LegendColorsJR_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JR_density_22_L, main = "2022 (CLE)", col = LegendColorsJR_L)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 14: Heat maps of batted balls for Ramirez as a left-handed hitter from 2018, 2019, 2021, 2022 seasons
A cluster of points is difficult to identify in the heat map from the 2021 season due to the smaller sample size compared to the other three seasons. However, the yellower area on the right side of the infield indicates that there is a cluster of points in that area. Therefore, Ramirez tended to hit to the right side of the field in the 2021 season when he batted from the left side of the batter’s box. In the other three seasons, Ramirez also tended to hit to the right side of the field when he batted from the left side of the batter’s box, as there are clusters of points on the right side of the maps.
Josh Bell (as a right-handed hitter)
Figure 15 below shows the heat maps of the batted balls for Bell when he hit from the right side of the batter’s box.
# legends for the heatmaps
LegendColorsJB_R <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.08734))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(JB_density_18_R, main = "2018 (PIT)", col = LegendColorsJB_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JB_density_19_R, main = "2019 (PIT)", col = LegendColorsJB_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JB_density_21_R, main = "2021 (WSN)", col = LegendColorsJB_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JB_density_22_R, main = "2022 (WSN & SDP)", col = LegendColorsJB_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 15: Heat maps of batted balls for Bell as a right-handed hitter from 2018, 2019, 2021, 2022 seasons
In each season, Bell tended to hit to the left side of the infield when he batted as a right-handed hitter, as the clusters of points are located on the left side of the infield in the maps.
Francisco Lindor (as a right-handed hitter)
Figure 16 below shows the heat maps of the batted balls for Lindor when he hit from the right side of the batter’s box.
# legends for the heatmaps
LegendColorsFL_R <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.14599))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(FL_density_18_R, main = "2018 (CLE)", col = LegendColorsFL_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FL_density_19_R, main = "2019 (CLE)", col = LegendColorsFL_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FL_density_21_R, main = "2021 (NYM)", col = LegendColorsFL_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(FL_density_22_R, main = "2022 (NYM)", col = LegendColorsFL_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 16: Heat maps of batted balls for Lindor as a right-handed hitter in the 2018, 2019, 2021, and 2022 seasons
A cluster of points is difficult to identify in the heat map for the 2021 season because the sample is smaller than in the other three seasons. However, the yellower area on the left side of the infield indicates a cluster of points there, which implies that Lindor tended to hit to the left side of the field when batting right-handed. In the other three seasons, Lindor also tended to hit to the left side of the field, as the heat maps show clusters of points on the left side of the infield.
Jose Ramirez (as a right-handed hitter)
Figure 17 below shows the heat maps of the batted balls for Ramirez when he hit from the right side of the batter’s box.
# legends for the heatmaps
LegendColorsJR_R <- colourmap(col = heat.colors(12, rev = TRUE), range = c(0,0.09504))
# heat maps
par(mfrow = c(2, 2), mar = c(1, 1, 1, 0), adj = 0.5)
plot(JR_density_18_R, main = "2018 (CLE)", col = LegendColorsJR_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JR_density_19_R, main = "2019 (CLE)", col = LegendColorsJR_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JR_density_21_R, main = "2021 (CLE)", col = LegendColorsJR_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
plot(JR_density_22_R, main = "2022 (CLE)", col = LegendColorsJR_R)
segments(0, 0, -98, 98)
segments(0, 0, 98, 98)
Figure 17: Heat maps of batted balls for Ramirez as a right-handed hitter in the 2018, 2019, 2021, and 2022 seasons
A cluster of points is difficult to identify in the heat maps for the 2019 and 2021 seasons because the samples are smaller than in the other two seasons. However, the yellower regions on the left side of the infield indicate clusters of points there. Therefore, Ramirez tended to hit to the left side of the infield in the 2019 and 2021 seasons. In the 2018 and 2022 seasons, Ramirez likewise tended to hit to the left side of the field when batting right-handed, as the heat maps show clusters of points on the left side of the infield.
By splitting the dataset and analyzing the resulting heat maps, I found that the switch hitters mentioned above hit just like the strictly left-handed and right-handed hitters do.
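The split by batting side can also be summarized numerically. The sketch below uses a simulated data frame and assumes that coordinates are centered on home plate, so that `x < 0` is the left side of the field. The column names `x`, `y`, and `stand` and the simulated values are illustrative assumptions, not the actual dataset.

```r
# Sketch only: a toy data frame with assumed column names x, y, stand.
set.seed(2)
batted <- data.frame(
  x = c(rnorm(60, mean = 30, sd = 25),    # toy balls hit while batting left-handed
        rnorm(40, mean = -30, sd = 25)),  # toy balls hit while batting right-handed
  y = runif(100, 20, 120),
  stand = rep(c("L", "R"), times = c(60, 40))
)

# Split the batted balls by the side of the batter's box.
by_side <- split(batted, batted$stand)

# Share of batted balls on the left side of the field (x < 0) for each side.
share_left <- sapply(by_side, function(d) mean(d$x < 0))
print(round(share_left, 2))
```

A share well above one half for the right-handed subset and well below one half for the left-handed subset would match the pattern read from the heat maps. The heat maps remain the primary evidence, since a single proportion ignores how far the ball travels.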
The analysis of the point pattern data for batted balls helps us understand hitters’ performance and abilities. The rejection of the null hypothesis in the test for complete spatial randomness indicated that there is a clustered pattern in the batted balls. The heat maps helped identify where those clusters are on the baseball field. The left-handed hitters tend to hit to the right side of the infield, whereas the right-handed hitters tend to hit to the left side of the infield. For switch hitters, the point pattern data showed clusters of points on both sides of the infield. However, after splitting the dataset by where they stood in the batter’s box, the clusters were found on the right side of the infield when they hit left-handed and on the left side of the infield when they hit right-handed. By comparing the heat maps from different seasons, I found that the ways in which the players hit the baseball did not change drastically over time, even when they played for different teams. Future work will analyze the batted balls in more depth by splitting the dataset by the pitches the players hit or by the types of pitchers they faced (for non-switch hitters). In addition, I look forward to analyzing the batted balls from the 2023 season and determining whether the new rule that bans the use of defensive shifts affects the way players hit baseballs, by comparing them with the batted balls from past seasons.
Berman, M. and Diggle, P. (1989) Estimating weighted integrals of the second-order intensity of a spatial point process. Journal of the Royal Statistical Society, series B 51, 81–92.
Casella, P. (2022, November 11). Here are the 2022 silver slugger winners. MLB.com. https://www.mlb.com/news/silver-slugger-award-winners-2022
Cross, J., & Sylvan, D. (2015). Modeling spatial batting ability using a known covariance matrix. Journal of Quantitative Analysis in Sports, 11, 155–167.
Diggle, P.J. (1985) A kernel method for smoothing point process data. Applied Statistics (Journal of the Royal Statistical Society, Series C) 34, 138–147.
Diggle, P.J. (2003) Statistical analysis of spatial point patterns, Second edition. Arnold.
Petti, B. (2018, January 23). Research notebook: New format for Statcast Data Export at baseball savant. The Hardball Times. Retrieved December 16, 2022, from https://tht.fangraphs.com/research-notebook-new-format-for-statcast-data-export-at-baseball-savant/
Wilcox, A.G., & Mannshardt, E. (2013). Baseball scouting reports via a marked point process for pitch types. https://repository.lib.ncsu.edu/bitstream/handle/1840.4/8556/mimeo2655_Wilcox.pdf?sequence=1&isAllowed=y