Introduction: Welcome to my Code-Through. This explainer focuses on one of my favorite packages to reference, ‘baseballr’, and on how Statcast data can be used to analyze a pitcher’s strengths and weaknesses. There is a lot of nuance to batting and generating power, but I adore looking at pitches to find what makes a good pitch and, more importantly, what makes a good pitcher. Begin by loading the packages below. You may need to install any that you have not used before.

Note Because the function that scrapes pitchers’ data is currently broken, we have to pull the csv containing the data ourselves, using the Baseball Savant URLs defined below.

#Load Packages and csv data

library(baseballr) #loads MLB statcast data
library(lubridate) #Used to make times readable in baseballr's statcast data.
library(dplyr) #data manipulation
library(ggplot2) #data visualization

url_2024 <- "https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfGT=R%7CPO%7CS%7C&player_type=pitcher&pitchers_lookup%5B%5D=601713&game_date_gt=2024-03-01&game_date_lt=2024-11-30&type=details"
url_2025 <- "https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfGT=R%7CPO%7CS%7C&player_type=pitcher&pitchers_lookup%5B%5D=601713&game_date_gt=2025-03-01&game_date_lt=2025-11-30&type=details"
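
The two URLs differ only in the season, so they can be generated instead of typed out. The following is a minimal sketch: `savant_csv_url()` is a hypothetical helper written for this explainer (it is not part of baseballr), and it assumes the query string keeps the same layout as the URLs above.

```r
# Hypothetical helper (not part of baseballr): builds a Statcast search CSV URL
# for one pitcher and one season, mirroring the query string used above.
# "%%" is how sprintf() writes a literal "%".
savant_csv_url <- function(pitcher_id, season) {
  sprintf(
    paste0(
      "https://baseballsavant.mlb.com/statcast_search/csv?all=true",
      "&hfGT=R%%7CPO%%7CS%%7C&player_type=pitcher",
      "&pitchers_lookup%%5B%%5D=%d",
      "&game_date_gt=%d-03-01&game_date_lt=%d-11-30&type=details"
    ),
    pitcher_id, season, season
  )
}

# 601713 is the pitcher ID that appears in the URLs above.
url_2024_check <- savant_csv_url(601713L, 2024L)
url_2025_check <- savant_csv_url(601713L, 2025L)
```

Swapping in another pitcher's ID or another season should then only require changing the arguments.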

What is this Code-Through About? This Code-Through analyzes the data a pitcher generates on the mound. We will look at pitch location, strikeout percentage, and pitch usage to see how a pitcher changes from season to season to improve his dominance.

Background I am a fan of the San Diego Padres, and Nick Pivetta has been a phenomenal player for them. What is most dynamic about his 2025 season is how he went from being a 4-5 option in 2024 (for reference, each team usually has 5 starting pitchers, and the 1st pitcher is usually the best, or ace) to being the ace of the San Diego Padres in 2025. This is reflected in his ERA, or earned run average, which dropped from 4.15 in 2024 to 2.88 in 2025.

Data Steps We will begin by loading Nick Pivetta’s data for the 2024 season. In the next block, use the same method and naming scheme to load his 2025 data.

#Example provided 
#Nick Pivetta 2024 pitch data
piv_2024 <- read.csv(url_2024) %>%
  mutate(season = 2024)

Hint The 2025 URL already limits the download to 2025 game dates. Do not forget to use dplyr’s mutate() to label every row with season = 2025 in this next block.

#Nick Pivetta 2025 pitch data
piv_2025 <- read.csv(url_2025) %>%
  mutate(season = 2025)

Now combine the two seasons into one data frame so they are easy to work with. I added some cleaning and filtering so the data looks more organized.

#Combine the data here
pivetta <- bind_rows(piv_2024, piv_2025)

#Cleaning I added: drop rows that are missing a pitch type or a pitch location
pivetta <- pivetta %>%
  filter(
    !is.na(pitch_type),
    !is.na(plate_x),
    !is.na(plate_z)
  )
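
One caveat about this filter: read.csv() reads a blank text field as an empty string rather than NA, so !is.na(pitch_type) may not drop pitches whose type is blank. The sketch below shows the behavior on a made-up three-row csv (the values are illustrative, not Pivetta's data) and one possible way to treat blanks as missing too.

```r
# Toy CSV: the second row has a blank pitch_type but valid locations.
toy <- read.csv(
  text = "pitch_type,plate_x,plate_z\nFF,0.10,2.50\n,0.30,2.20\nSL,-0.40,1.90\n",
  stringsAsFactors = FALSE
)

# Blank text fields come in as "" (not NA), so is.na() does not flag them.
n_na_types <- sum(is.na(toy$pitch_type))

# Treat blanks as missing as well, then drop incomplete rows.
toy_clean <- toy[
  !is.na(toy$pitch_type) & nzchar(toy$pitch_type) &
    !is.na(toy$plate_x) & !is.na(toy$plate_z),
]
nrow(toy_clean)
```

With dplyr, the same idea could be written by adding pitch_type != "" as another condition inside filter().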

Step 1 Use ggplot to create a heatmap of each of Nick Pivetta’s pitch types for the 2024 and 2025 seasons. The x and y aesthetics should be the location of his pitches. Use stat_bin2d(bins = 30) to see where the pitches landed relative to a strike zone (will be added later). This is not supposed to be a strike zone specifically, but rather a highlight of the density of Pivetta’s pitches. Note any observations you may have about the number and frequency of his pitches.

Hint ‘plate_x’ and ‘plate_z’ are the variables used to describe the horizontal and vertical location of his pitches respectively.

ggplot(pivetta, aes(plate_x, plate_z)) +
  stat_bin2d(bins = 30) +
  scale_fill_viridis_c() +
  coord_fixed() +
  facet_grid(season ~ pitch_type) +
  labs(
    title = "Nick Pivetta Heatmaps by Pitch Type and Season",
    x = "Horizontal Location (plate_x)",
    y = "Vertical Location (plate_z)"
  ) +
  theme_minimal()
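
To make the binning less of a black box, here is a rough base R sketch of the idea behind a binned heatmap: cut each axis into equal-width bins and count the pitches in every tile. The locations are made up, and cut() will not place its bin edges exactly where ggplot2 does, so treat this as an illustration of the concept rather than a reproduction of the plot.

```r
# Toy pitch locations in feet (illustrative values, not Pivetta's data).
plate_x <- c(-0.9, -0.2, -0.1, 0.0, 0.1, 0.4, 0.8)
plate_z <- c( 1.2,  2.4,  2.6, 2.5, 3.1, 3.4, 1.8)

# Cut each axis into 3 equal-width bins (the plot above uses 30).
x_bin <- cut(plate_x, breaks = 3)
z_bin <- cut(plate_z, breaks = 3)

# Cross-tabulate: each cell is the number of pitches that landed in that tile.
bin_counts <- table(z_bin, x_bin)
bin_counts
```

Each cell of the table plays the role of one colored tile in the heatmap, with the count mapped to the fill color.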

Step 2 To visualize a strike zone, I have created a data frame containing the dimensions of a strike zone. Because each batter has a different height, the top and bottom of the real zone change from batter to batter, so the zone we draw is only an approximation. However, it is good enough to serve as a visual reference.

Hints Use ‘geom_rect()’ to add the strike zone to the plot. Make sure you also set inherit.aes = FALSE so the rectangle layer does not inherit the plot-level aesthetics (plate_x and plate_z) and is drawn only from the ‘sz’ data frame.

#Provided data frame for strikezone
sz <- data.frame(
  xmin = -0.83,
  xmax =  0.83,
  ymin = 1.5,
  ymax = 3.5
)

#Create plot here
ggplot(pivetta, aes(plate_x, plate_z)) +
  stat_bin2d(bins = 30) +
  geom_rect(
    data = sz,
    aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
    inherit.aes = FALSE,   # <-- Important! Ignore the plot-level aes(plate_x, plate_z)
    fill = NA,
    color = "black",
    linewidth = 0.8
  ) +
  scale_fill_viridis_c() +
  coord_fixed() +
  facet_grid(season ~ pitch_type) +
  labs(
    title = "Nick Pivetta Pitch Location Heatmaps (2024–2025)"
  ) +
  theme_minimal()
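
The rectangle can also be used numerically. The sketch below does two things under stated assumptions: it shows that a half-width of 0.83 ft happens to match half of a 17-inch plate plus half of a ball (the roughly 2.9-inch ball diameter is my assumption, not something stated above), and it computes the share of pitches inside the rectangle per season on made-up data.

```r
# Assumed dimensions: 17-inch plate and a roughly 2.9-inch ball diameter.
half_width_ft <- (17 / 2 + 2.9 / 2) / 12
round(half_width_ft, 2)

# Toy pitches (illustrative values, not Pivetta's data).
toy <- data.frame(
  season  = c(2024, 2024, 2024, 2025, 2025, 2025, 2025),
  plate_x = c(-1.2, 0.1, 0.5, -0.3, 0.0, 0.9, 0.6),
  plate_z = c( 2.0, 2.8, 3.9,  1.7, 2.5, 2.4, 3.2)
)

# TRUE when a pitch falls inside the same rectangle as `sz`.
toy$in_zone <- abs(toy$plate_x) <= 0.83 &
  toy$plate_z >= 1.5 & toy$plate_z <= 3.5

# Share of pitches inside the rectangle, per season.
zone_rate <- tapply(toy$in_zone, toy$season, mean)
zone_rate
```

Running the same comparison on the real data would give an approximate in-zone rate per season, with the caveat that the fixed top and bottom ignore batter height.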

Step 3 In Step 2, you have likely noticed that Pivetta’s favorite pitch is his four-seam fastball. However, we want to confirm this by counting his pitch usage as a percentage as well as his strikeout percentage with each type of pitch. Since he threw more pitches in 2025 than 2024, this will help standardize his pitch usage to easily compare the two seasons.

Use group_by() to separate his pitches by season and pitch type. Then count the total number of pitches and strikeouts to compute two percentages: how often Pivetta used each pitch type, and what share of those pitches finished a strikeout.

Use geom_bar(stat = "identity", position = "dodge") to create a bar graph containing side-by-side bars for his pitch usage each season. Similarly, add a feature that shows his strikeout percentage with each pitch (in this case I recommend a dot with a corresponding value).

Note your observations and compare your bar chart to the heatmaps created in Step 2. Does this graph corroborate your observations?

pitch_summary <- pivetta %>%
  group_by(season, pitch_type) %>%
  summarise(
    n_pitches = n(),
    strikeouts = sum(events == "strikeout", na.rm = TRUE),
    .groups = "drop"
  ) %>%
  group_by(season) %>%
  mutate(
    pct_usage = n_pitches / sum(n_pitches),
    k_pct = strikeouts / n_pitches
  )
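
Note that k_pct above divides by every pitch of that type, so it is a strikeouts-per-pitch rate. A different denominator, shown here only as an alternative and not as the method used in this Code-Through, is the number of plate appearances that ended on that pitch type. The sketch assumes that ‘events’ is an empty string on pitches that did not end a plate appearance, and it uses made-up rows.

```r
# Toy pitch-level data (illustrative values, not Pivetta's data).
toy <- data.frame(
  pitch_type = c("FF", "FF", "FF", "FF", "SL", "SL"),
  events     = c("", "strikeout", "", "single", "strikeout", ""),
  stringsAsFactors = FALSE
)

# Assumption: a non-empty `events` value marks the pitch that ended a plate appearance.
ended_pa <- nzchar(toy$events)
is_k     <- toy$events == "strikeout"

# Strikeouts per pitch thrown (same denominator as k_pct above).
k_per_pitch <- tapply(is_k, toy$pitch_type, mean)

# Strikeouts per plate appearance that ended on that pitch type.
k_per_pa <- tapply(is_k[ended_pa], toy$pitch_type[ended_pa], mean)

rbind(k_per_pitch, k_per_pa)
```

The per-pitch rate is always the smaller of the two, which is worth keeping in mind when comparing these numbers to published strikeout percentages.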

ggplot(pitch_summary, aes(x = pitch_type, y = pct_usage, fill = factor(season))) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = paste0(round(pct_usage*100), "%")),
            position = position_dodge(width = 0.9), vjust = -0.5) +
  geom_point(aes(y = k_pct), color = "red", size = 3,
             position = position_dodge(width = 0.9)) +
  geom_text(aes(y = k_pct, label = paste0(round(k_pct*100), "%")),
            color = "red",
            position = position_dodge(width = 0.9),
            vjust = -1) +
  scale_fill_manual(
    values = c("2024" = "saddlebrown", "2025" = "gold3")
  ) +
  labs(
    title = "Nick Pivetta Pitch Usage & Strikeout % by Season",
    x = "Pitch Type",
    y = "Proportion of Pitches",
    fill = "Season"
  ) +
  theme_minimal(base_size = 14)

Step 4 Lastly, we want a deeper insight into Pivetta’s fastball. As his most used pitch, it likely contributed to his improvement between the 2024 and 2025 seasons.

Use filter(pitch_type == "FF") to narrow the combined data frame to only include fastballs. Save this to a new data frame ‘fastball_data’.

Use ggplot to create two heatmaps of his fastball, one for 2024 and one for 2025. Use ‘stat_density_2d()’ with ‘geom = "raster"’ and contour = FALSE to draw a smoothed density for each season, which keeps the two maps comparable even though he threw a different number of fastballs each year. Add facet_wrap(~season) to place the heatmaps side by side.

Lastly, use scale_fill_viridis_c() with an ‘option’ (palette) of your choice to fill in the heatmap and add any details you would like.

How did Pivetta’s fastball change from 2024 to 2025? Do you think this change contributed meaningfully to his performance?

fastball_data <- pivetta %>%
  filter(pitch_type == "FF")  #FF is the code for a four-seam fastball, or a regular fastball

ggplot(fastball_data, aes(x = plate_x, y = plate_z)) +
  stat_density_2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE) +
  scale_fill_viridis_c(option = "magma") + # I chose a palette I liked, rather than a standard one
  coord_fixed() +
  facet_wrap(~season) +
  labs(
    title = "Nick Pivetta Fastball Strikezone Heatmaps (Standardized)",
    x = "Horizontal Location",
    y = "Vertical Location",
    fill = "Density"
  ) +
  theme_minimal(base_size = 14)
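
A heatmap comparison can be hard to judge by eye, so it may help to back it up with a number. The sketch below computes the average fastball location per season with base R's aggregate(). The rows are made up for illustration and say nothing about how Pivetta's fastball actually moved. On the real data you would pass fastball_data instead of the toy frame.

```r
# Toy fastball locations in feet (illustrative values, not Pivetta's data).
toy_ff <- data.frame(
  season  = c(2024, 2024, 2024, 2025, 2025, 2025),
  plate_x = c(-0.2, 0.3, 0.1, -0.1, 0.2, 0.0),
  plate_z = c( 2.4, 2.8, 2.6,  3.0, 3.3, 3.3)
)

# Mean horizontal and vertical location per season, one row per season.
ff_location <- aggregate(cbind(plate_x, plate_z) ~ season, data = toy_ff, FUN = mean)
ff_location
```

A shift in the mean plate_z between seasons would suggest the fastball was located higher or lower on average, which can then be checked against the heatmaps.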