Data Cleaning and Wrangling
Load Packages
library(tidyverse) # attaches dplyr, ggplot2, readr, and tidyr
library(cowplot)   # plot_grid() for side-by-side plots
Load Data
PitchData <- read_csv("C:/Users/hjpar/OneDrive/Desktop/CalebPark/Game2Analysis/AnalyticsQuestionnairePitchData.csv")
Observe and Clean Data
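# Keep game 2 pitches thrown by Pitchers 7 and 1, the two starting pitchers
PitchData2 <- PitchData %>%
  filter(GamePk == 2 & PitcherId %in% c(7, 1)) %>%
  # convert the 7th column to character (positional selection, so it depends on column order)
  mutate(across(7, as.character)) %>%
  # order the rows by inning and half-inning
  arrange(Inning, IsTop) %>%
  # drop every row with a missing value; this removes pick-off attempts
  # along with any other incomplete rows
  na.omit()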
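The `na.omit()` call above drops a row if any column is missing, which can remove more than pick-off attempts. A minimal sketch of a narrower filter follows. It assumes pick-off attempts are the rows with no `PitchType`, which is a guess about this data set, and the `toy` data frame is made up purely for illustration.

```r
library(dplyr)

# Hypothetical toy data: row 2 mimics a pick-off attempt (no pitch type),
# row 3 is a real pitch that is only missing its release speed.
toy <- tibble::tibble(
  PitchType    = c("FF", NA, "SL", "CU"),
  ReleaseSpeed = c(94.1, NA, NA, 78.3)
)

# na.omit() removes every row with any NA: rows 2 and 3 are dropped
dropped_any <- na.omit(toy)

# Filtering on a single column removes only row 2
dropped_pickoffs <- toy %>% filter(!is.na(PitchType))

nrow(dropped_any)      # 2
nrow(dropped_pickoffs) # 3
```

Comparing `nrow()` before and after each approach on the real data would show whether `na.omit()` is discarding actual pitches.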
Pitcher Comparison
Bar Plot
# frequency table: how many pitches of each type each pitcher threw
freq_table <- PitchData2 %>%
count(PitcherId, PitchType) %>%
group_by(PitchType)
freq_table
## # A tibble: 10 × 3
## # Groups: PitchType [6]
## PitcherId PitchType n
## <chr> <chr> <int>
## 1 1 CH 11
## 2 1 CU 12
## 3 1 FC 10
## 4 1 FF 46
## 5 1 SL 17
## 6 7 CU 13
## 7 7 FC 10
## 8 7 FF 21
## 9 7 SI 20
## 10 7 SL 11
# plot to display this table
freq_table %>%
ggplot(aes(x = PitcherId, y = n, fill = PitchType)) +
geom_col(position = "dodge") +
ylab("Number of Pitches") +
theme_bw()

- Pitcher 1 appears to rely heavily on the four-seam fastball (FF, 46
of 96 pitches) to set up their breaking pitches, whereas Pitcher 7
mixes their pitches more evenly (no pitch type exceeds 21 of 75)
- The bar plot shows this distribution
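The comparison is easier to read as shares of each pitcher's total. The sketch below re-enters the counts from the frequency table printed above and converts them to proportions; `freq_counts`, `pitch_mix`, and `share` are names introduced here for illustration.

```r
library(dplyr)

# Counts copied from the frequency table printed above
freq_counts <- tibble::tribble(
  ~PitcherId, ~PitchType, ~n,
  "1", "CH", 11,
  "1", "CU", 12,
  "1", "FC", 10,
  "1", "FF", 46,
  "1", "SL", 17,
  "7", "CU", 13,
  "7", "FC", 10,
  "7", "FF", 21,
  "7", "SI", 20,
  "7", "SL", 11
)

# Share of each pitch type within each pitcher's total
pitch_mix <- freq_counts %>%
  group_by(PitcherId) %>%
  mutate(share = n / sum(n)) %>%
  ungroup() %>%
  arrange(PitcherId, desc(share))

pitch_mix
```

In the actual analysis the same `mutate()` step could be applied to `freq_table` after regrouping it by `PitcherId`.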
Heat Map
# For Pitcher 1
p1 <- PitchData2 %>%
  filter(PitcherId == 1) %>%
  ggplot(aes(x = TrajectoryLocationX, y = TrajectoryLocationZ)) +
  # smoothed 2D density of pitch locations, drawn as a raster
  stat_density_2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE, na.rm = TRUE) +
  scale_fill_gradientn(colors = c("blue", "white", "red")) +
  # unfilled black rectangle as a visual reference
  annotate("rect", xmin = -2.5, xmax = 2.5,
           ymin = 0, ymax = 8,
           fill = NA, color = "black") +
  ylim(-1, 9) + xlim(-4, 4) +
  theme_classic() +
  xlab("Horizontal Pitch Location") +
  ylab("Vertical Pitch Location") +
  ggtitle("", subtitle = "Pitcher's Perspective") +
  guides(fill = "none")
# For Pitcher 7
p2 <- PitchData2 %>%
  filter(PitcherId == 7) %>%
  ggplot(aes(x = TrajectoryLocationX, y = TrajectoryLocationZ)) +
  # smoothed 2D density of pitch locations, drawn as a raster
  stat_density_2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE, na.rm = TRUE) +
  scale_fill_gradientn(colors = c("blue", "white", "red")) +
  # unfilled black rectangle as a visual reference
  annotate("rect", xmin = -1.5, xmax = 2.5,
           ymin = 0, ymax = 5,
           fill = NA, color = "black") +
  ylim(-1.5, 6.5) + xlim(-3, 4) +
  theme_classic() +
  xlab("Horizontal Pitch Location") +
  ylab("Vertical Pitch Location") +
  ggtitle("", subtitle = "Pitcher's Perspective") +
  guides(fill = "none")
# put plots next to each other
plot_grid(p1, p2, labels = c('Pitcher 1', 'Pitcher 7'), label_size = 12)

- These heat maps show the difference in pitch location. Pitcher 1
tends to work a little lower in the zone, with a smaller cluster of
pitches up in the zone, most of them fastballs
- Pitcher 7 throws more middle-middle pitches and fills up the plate
more with a diverse arsenal. Compared with Pitcher 1, Pitcher 7's
pitches appear to be spread across more zones
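The impression that one pitcher is more spread out can also be checked numerically. The sketch below uses the standard deviation of horizontal and vertical location per pitcher as a rough measure of spread. `location_spread()` is a hypothetical helper, and the `toy` locations are invented to show the call, not taken from the game data.

```r
library(dplyr)

# Hypothetical helper: standard deviation of pitch location per pitcher.
# Larger values mean the pitches are more spread out on that axis.
location_spread <- function(data) {
  data %>%
    group_by(PitcherId) %>%
    summarize(
      sd_x = sd(TrajectoryLocationX, na.rm = TRUE),
      sd_z = sd(TrajectoryLocationZ, na.rm = TRUE),
      n_pitches = n(),
      .groups = "drop"
    )
}

# Made-up locations, only to demonstrate the function
toy <- tibble::tibble(
  PitcherId           = c("1", "1", "1", "7", "7", "7"),
  TrajectoryLocationX = c(-0.5, 0, 0.5, -1, 0, 1),
  TrajectoryLocationZ = c(2, 2.5, 3, 1, 2.5, 4)
)

spread <- location_spread(toy)
spread
```

On the real data the call would be `location_spread(PitchData2)`. Standard deviation is only one possible measure of spread and is sensitive to a few wild pitches.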
Pitch Velocity Differences
# CH and SI are excluded because only one of the two pitchers throws each
PitchData2 %>%
filter(!PitchType %in% c("CH", "SI")) %>%
ggplot(aes(PitcherId, ReleaseSpeed)) +
geom_jitter() +
facet_wrap(~ PitchType)

- These plots show the release speed of every pitch, for each pitch
type that both pitchers throw. Pitcher 1 generally throws harder than
Pitcher 7, especially the four-seam fastball (FF) and the cutter (FC)
- This suggests that Pitcher 1 relies heavily on the fastball to set
up their breaking pitches, whereas Pitcher 7 relies more on sequencing
to deceive batters
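The velocity gap can be put into numbers by averaging release speed per pitch type and pitcher and then subtracting. This is a sketch only: `velo_gap()` is a hypothetical helper that assumes the two pitchers are labeled "1" and "7", and the `toy` speeds are invented for the demonstration.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: mean release speed per pitch type and pitcher,
# reshaped to one row per pitch type with the Pitcher 1 minus Pitcher 7 gap.
velo_gap <- function(data) {
  data %>%
    group_by(PitchType, PitcherId) %>%
    summarize(avg_vel = mean(ReleaseSpeed, na.rm = TRUE), .groups = "drop") %>%
    pivot_wider(names_from = PitcherId, values_from = avg_vel,
                names_prefix = "pitcher_") %>%
    mutate(gap = pitcher_1 - pitcher_7)
}

# Made-up speeds, only to demonstrate the function
toy <- tibble::tibble(
  PitcherId    = c("1", "1", "7", "7", "1", "7"),
  PitchType    = c("FF", "FF", "FF", "FF", "SL", "SL"),
  ReleaseSpeed = c(95, 97, 92, 94, 85, 84)
)

gaps <- velo_gap(toy)
gaps
```

A positive `gap` means Pitcher 1 throws that pitch type harder on average. Pitch types thrown by only one pitcher would produce `NA` in this column.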
Velo Through Innings
# data wrangling: average four-seam fastball (FF) velocity per inning for each starter
Vel_Inn1 <- PitchData2 %>%
filter(PitcherId == "1" & PitchType == "FF") %>%
group_by(Inning, PitchType) %>%
summarize(
avg_vel = mean(ReleaseSpeed, na.rm = TRUE)
) %>%
mutate(PitcherId = "1")
Vel_Inn7 <- PitchData2 %>%
filter(PitcherId == "7" & PitchType == "FF") %>%
group_by(Inning, PitchType) %>%
summarize(
avg_vel = mean(ReleaseSpeed, na.rm = TRUE)
) %>%
mutate(PitcherId = "7")
# plot
ggplot() +
  geom_line(data = Vel_Inn1, mapping = aes(x = Inning, y = avg_vel, color = PitcherId), linewidth = 2) +
  geom_line(data = Vel_Inn7, mapping = aes(x = Inning, y = avg_vel, color = PitcherId), linewidth = 2) +
  xlab("Inning") +
  ylab("Average Velocity") +
  scale_x_continuous(limits = c(1, 7),
                     breaks = seq(1, 7, 1)) +
  labs(title = "Fastball Avg Velocity By Inning") +
  # complete theme first, then the tweaks (a complete theme added last would reset them)
  theme_bw() +
  theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14),
        legend.position = "bottom",
        legend.text = element_text(size = 12))

- The line graph shows that, over the course of this one game,
Pitcher 1's fastball was faster than Pitcher 7's
- One interesting note is that Pitcher 1's velocity dips around
innings 4 and 6. This may be due to the pitcher facing the batting
order for the second and third time. A possible explanation is that
their off-speed pitches were mixed in more in the later innings than
in the first three innings
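The two wrangling blocks above differ only in the pitcher ID, so they could be collapsed into one grouped summary. The sketch below also counts the fastballs behind each inning average, since a dip based on only a few pitches deserves less weight. `fastball_velo_by_inning()` is a hypothetical helper, and the `toy` rows are invented for the demonstration.

```r
library(dplyr)

# Hypothetical helper: average four-seam fastball velocity per pitcher and
# inning, with the number of fastballs behind each average.
fastball_velo_by_inning <- function(data) {
  data %>%
    filter(PitchType == "FF") %>%
    group_by(PitcherId, Inning) %>%
    summarize(
      avg_vel   = mean(ReleaseSpeed, na.rm = TRUE),
      n_pitches = n(),
      .groups   = "drop"
    )
}

# Made-up rows, only to demonstrate the function
toy <- tibble::tibble(
  PitcherId    = c("1", "1", "1", "7", "1"),
  Inning       = c(1, 1, 2, 1, 2),
  PitchType    = c("FF", "FF", "FF", "FF", "SL"),
  ReleaseSpeed = c(95, 96, 93, 92, 85)
)

vel_inn <- fastball_velo_by_inning(toy)
vel_inn
```

With a single data frame like this, one `geom_line(aes(x = Inning, y = avg_vel, color = PitcherId))` layer draws both pitchers.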