Data Cleaning and Wrangling

Load Packages

library(dplyr)
library(tidyverse)
library(ggplot2)
library(cowplot)

Load Data

PitchData <- read_csv("C:/Users/hjpar/OneDrive/Desktop/CalebPark/Game2Analysis/AnalyticsQuestionnairePitchData.csv")

Observe and Clean Data

# Pitch Data for Pitcher 7 and 1 as they are the starting pitchers
PitchData2 <- PitchData %>% 
  filter(GamePk == 2 & PitcherId %in% c(7, 1)) %>% 
  mutate(across(7, as.character)) %>% 
  # arrange the data to be in pitch order
  arrange(Inning, IsTop) %>% 
  # drop every row that has a missing value in any column
  # (intended to remove pick-off attempts, which carry no pitch data)
  na.omit()
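
Because na.omit() removes a row when any column is missing, it can also discard real pitches that merely lack one reading. The sketch below contrasts it with dropping rows on a single column. The toy rows are hypothetical, and the idea that pick-off attempts show up as a missing PitchType is an assumption about the file, not something the data above confirms.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows; the real file's columns and NA pattern may differ
toy <- data.frame(
  PitcherId    = c("1", "1", "7", "7"),
  PitchType    = c("FF", NA, "SL", "CU"),  # NA assumed to mark a pick-off attempt
  ReleaseSpeed = c(94.1, NA, 84.3, NA)     # last row: a real pitch with no speed reading
)

# Drops any row with a missing value in any column
all_complete <- toy %>% na.omit()

# Drops only rows where PitchType is missing
pitches_only <- toy %>% drop_na(PitchType)

nrow(all_complete)
nrow(pitches_only)
```

Comparing the two row counts on the real data would show whether na.omit() is removing more than the pick-off attempts.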

Pitcher Comparison

Bar Plot

# make a frequency table counting how many of each pitch type each pitcher threw
freq_table <- PitchData2 %>% 
  count(PitcherId, PitchType) %>% 
  group_by(PitchType)
freq_table
## # A tibble: 10 × 3
## # Groups:   PitchType [6]
##    PitcherId PitchType     n
##    <chr>     <chr>     <int>
##  1 1         CH           11
##  2 1         CU           12
##  3 1         FC           10
##  4 1         FF           46
##  5 1         SL           17
##  6 7         CU           13
##  7 7         FC           10
##  8 7         FF           21
##  9 7         SI           20
## 10 7         SL           11
# plot to display this table
freq_table %>% 
  ggplot(aes(x = PitcherId, y = n, fill = PitchType)) +
  # geom_col() already uses the y values as bar heights, so no stat argument is needed
  geom_col(position = "dodge") +
  ylab("Number of Pitches") +
  theme_bw()

  • Pitcher 1 leans heavily on the four-seam fastball (46 of 96 pitches) to set up their breaking pitches, whereas Pitcher 7 mixes five pitch types more evenly, with no pitch type thrown more than 21 times out of 75
  • The bar plot shows this distribution
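
Raw counts are hard to compare because the two pitchers threw different numbers of pitches, so a within-pitcher share may be a fairer view. The sketch below recomputes the table as proportions, using the counts copied from the frequency table printed above; the names pitch_share and share are introduced here for illustration.

```r
library(dplyr)

# Counts copied from the frequency table output above
freq_table <- data.frame(
  PitcherId = c(rep("1", 5), rep("7", 5)),
  PitchType = c("CH", "CU", "FC", "FF", "SL", "CU", "FC", "FF", "SI", "SL"),
  n         = c(11, 12, 10, 46, 17, 13, 10, 21, 20, 11)
)

# Share of each pitch type within each pitcher's own total
pitch_share <- freq_table %>%
  group_by(PitcherId) %>%
  mutate(share = n / sum(n)) %>%
  ungroup() %>%
  arrange(PitcherId, desc(share))

pitch_share
```

With these counts, the four-seam fastball accounts for 46/96 of Pitcher 1's pitches and 21/75 of Pitcher 7's.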

Heat Map

# For Pitcher 1
p1 <- PitchData2 %>%
  group_by('Pitch' = PitchType) %>% 
  filter(PitcherId == 1) %>% 
  ggplot(aes(x = TrajectoryLocationX, y = TrajectoryLocationZ)) +
    stat_density_2d(aes(fill = after_stat(density)), geom = 'raster', contour = FALSE, na.rm = TRUE) +
    scale_fill_gradientn(colors = c("blue", "white", "red")) +
    annotate("rect", xmin = -2.5, xmax = 2.5,
             ymin = 0,ymax = 8,
             fill= NA,color= "black", 
             alpha = .1) +
    ylim(-1, 9) + xlim(-4, 4) + 
    theme_classic() +
    xlab("Horizontal Pitch Location") +
    ylab("Vertical Pitch Location") +
    ggtitle("", subtitle = "Pitcher's Perspective") +
    guides(fill = 'none')

# Pitcher 7
p2 <- PitchData2 %>%
  group_by('Pitch' = PitchType) %>% 
  filter(PitcherId == 7) %>% 
  ggplot(aes(x = TrajectoryLocationX, y = TrajectoryLocationZ)) +
    stat_density_2d(aes(fill = after_stat(density)), geom = 'raster', contour = FALSE, na.rm = TRUE) +
    scale_fill_gradientn(colors = c("blue", "white", "red")) +
    annotate("rect", xmin = -1.5, xmax = 2.5,
             ymin = 0,ymax = 5,
             fill= NA,color= "black", 
             alpha = .1) +
    ylim(-1.5, 6.5) + xlim(-3, 4) + 
    theme_classic() +
    xlab("Horizontal Pitch Location") +
    ylab("Vertical Pitch Location") +
    ggtitle("", subtitle = "Pitcher's Perspective") +
    guides(fill = 'none')

# put plots next to each other
plot_grid(p1, p2, labels = c('Pitcher 1', 'Pitcher 7'), label_size = 12)

  • The heat maps show the difference in pitch location. Pitcher 1 tends to stay a little lower in the zone, with a smaller cluster of pitches up in the zone, most of them fastballs
  • Pitcher 7 throws more middle-middle pitches and fills up the plate with a diverse arsenal. Pitcher 7's pitches appear somewhat more spread out across zones than Pitcher 1's
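
The "more spread out" impression could be checked with a simple number. The sketch below uses the standard deviation of pitch location per pitcher as a rough measure of spread. This is a swapped-in measure for illustration, not part of the analysis above, and the toy locations are hypothetical.

```r
library(dplyr)

# Hypothetical toy locations; replace with PitchData2 to use the real pitches
toy <- data.frame(
  PitcherId           = rep(c("1", "7"), each = 3),
  TrajectoryLocationX = c(-0.5, 0, 0.5, -1, 0, 1),
  TrajectoryLocationZ = c(2, 2.5, 3, 1.5, 2.5, 3.5)
)

# A larger standard deviation means locations vary more around the pitcher's average spot
spread <- toy %>%
  group_by(PitcherId) %>%
  summarize(
    sd_x = sd(TrajectoryLocationX, na.rm = TRUE),
    sd_z = sd(TrajectoryLocationZ, na.rm = TRUE),
    .groups = "drop"
  )

spread
```

Since the two heat maps use different axis limits, a numeric summary like this may be easier to compare than the panels themselves.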

Pitch Velocity Differences

PitchData2 %>% 
  filter(PitchType != 'CH' & PitchType != 'SI') %>% 
    ggplot(aes(PitcherId, ReleaseSpeed)) +
    geom_jitter() +
    facet_wrap(~ PitchType)

  • These plots show the velocity of every pitch for each pitch type that both pitchers throw. Pitcher 1 generally throws harder than Pitcher 7, especially in the FF and FC categories
  • This suggests Pitcher 1 relies heavily on the fastball to set up their breaking pitches, whereas Pitcher 7 relies more on sequencing to deceive batters
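
To put a number on the velocity gap, the per-pitch-type averages can be placed side by side. The following is a minimal sketch on hypothetical toy speeds; the names velo_gap, pitcher_1, pitcher_7, and gap are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy speeds; replace with PitchData2 to use the real pitches
toy <- data.frame(
  PitcherId    = c("1", "1", "7", "7", "1", "7"),
  PitchType    = c("FF", "FF", "FF", "FF", "SL", "SL"),
  ReleaseSpeed = c(95, 93, 90, 92, 86, 84)
)

velo_gap <- toy %>%
  group_by(PitcherId, PitchType) %>%
  summarize(avg_vel = mean(ReleaseSpeed, na.rm = TRUE), .groups = "drop") %>%
  # one row per pitch type, one column per pitcher
  pivot_wider(names_from = PitcherId, values_from = avg_vel, names_prefix = "pitcher_") %>%
  # positive gap: Pitcher 1 throws that pitch type harder on average
  mutate(gap = pitcher_1 - pitcher_7)

velo_gap
```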

Velo Through Innings

# data wrangling
Vel_Inn1 <- PitchData2 %>% 
  filter(PitcherId == "1" & PitchType == "FF") %>% 
  group_by(Inning, PitchType) %>% 
  summarize(
    avg_vel = mean(ReleaseSpeed, na.rm = TRUE),
    .groups = "drop"
  ) %>% 
  mutate(PitcherId = "1")


Vel_Inn7 <- PitchData2 %>% 
  filter(PitcherId == "7" & PitchType == "FF") %>% 
  group_by(Inning, PitchType) %>% 
  summarize(
    avg_vel = mean(ReleaseSpeed, na.rm = TRUE),
    .groups = "drop"
  ) %>% 
  mutate(PitcherId = "7")

# plot
ggplot() +
  geom_line(Vel_Inn1, mapping = aes(x = Inning, y = avg_vel, color = PitcherId), linewidth = 2) +
  geom_line(Vel_Inn7, mapping = aes(x = Inning, y = avg_vel, color = PitcherId), linewidth = 2) +
  xlab("Inning") +
  ylab("Average Velocity") +
  scale_x_continuous(limits = c(1, 7),
                     breaks = seq(1, 7, 1)) +
  labs(title = "Fastball Avg Velocity By Inning") +
  # apply the complete theme first so the theme() tweaks below are not overwritten
  theme_bw() +
  theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5), axis.text = element_text(size = 12)) +
  theme(legend.position = "bottom", legend.text = element_text(size = 12), axis.title = element_text(size = 14))

  • The line graph shows that Pitcher 1’s fastball was faster than Pitcher 7’s over the course of this one game
  • One interesting note is that Pitcher 1’s velocity dips around innings 4 and 6. This may be due to the pitcher facing the batting order for the second and third time. It may also mean their off-speed pitches were being mixed in more in the later innings than in the first three
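
An inning average built from only a few fastballs can swing on a single pitch, so it may help to carry the pitch count alongside each average before reading much into a dip. The sketch below computes both pitchers in one grouped pipeline on hypothetical toy data; n_pitches is a column introduced here for illustration.

```r
library(dplyr)

# Hypothetical toy pitches; replace with PitchData2 to use the real pitches
toy <- data.frame(
  PitcherId    = c("1", "1", "1", "7", "7", "7"),
  PitchType    = c("FF", "FF", "SL", "FF", "FF", "FF"),
  Inning       = c(1, 1, 1, 1, 2, 2),
  ReleaseSpeed = c(95, 93, 85, 90, 91, 89)
)

vel_by_inning <- toy %>%
  filter(PitchType == "FF") %>%
  # grouping by pitcher handles both pitchers in one pass
  group_by(PitcherId, Inning) %>%
  summarize(
    avg_vel   = mean(ReleaseSpeed, na.rm = TRUE),
    n_pitches = n(),  # how many fastballs each average is based on
    .groups   = "drop"
  )

vel_by_inning
```

Because PitcherId stays in the result, a single geom_line() layer with color = PitcherId would be enough to draw both lines.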