I’m currently a Baseball Analytics Intern at Archbishop Moeller High School. This is the data I collected while working there during the 2025 Varsity Baseball season.
Each row in the dataset represents a single pitch during a Moeller Varsity Baseball game in the 2025 season. Below is a detailed explanation of each variable:
Load the dataset and relevant packages for analysis.
rm(list = ls())
library(tidyverse)
library(viridis)
Varsity <- read_csv("Moeller_2025_Final_Season - Moeller_2025_Final_Season.csv")
Each row represents a single pitch during Moeller’s 2025 varsity baseball season.
Do pitch outcomes (PitchResult) vary based on the pitch count (Balls and Strikes)? We will use a bar graph to show the proportion of pitch results and how they vary based on the current pitch count.
Varsity_pitch_count <- Varsity %>%
  mutate(CountType = paste(Balls, "Balls:", Strikes, "Strikes")) %>%
  filter(!is.na(Balls) & !is.na(Strikes))

ggplot(Varsity_pitch_count, aes(x = CountType, fill = PitchResult)) +
  geom_bar(position = "fill") +
  labs(
    title = "Pitch Outcomes by Count",
    x = "Count",
    y = "Proportion of Pitch Results"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_viridis_d()
This plot shows how pitch results vary with the count. On the first pitch of an at-bat, a ball and a strike are about equally likely. In every other count, the pitch is more likely to result in a strike. On 3-0 counts hitters rarely swing, which follows a common “unwritten rule” in baseball against swinging at a 3-0 pitch.
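The proportions behind the stacked bars can also be tabulated directly, which makes a statement such as "about 50/50" easy to check. The sketch below is only illustrative: it uses a small made-up data frame (`toy_pitches`) in place of `Varsity`, and the `PitchResult` labels are assumptions about how that column is coded, not values taken from the dataset.

```r
library(dplyr)

# Hypothetical stand-in for Varsity; the real PitchResult labels may differ.
toy_pitches <- tibble(
  Balls       = c(0, 0, 0, 0, 3, 3),
  Strikes     = c(0, 0, 0, 0, 0, 0),
  PitchResult = c("Ball", "Strike Looking", "Ball", "Strike Swinging",
                  "Ball", "Strike Looking")
)

count_props <- toy_pitches %>%
  # Keep only pitches with a recorded count
  filter(!is.na(Balls), !is.na(Strikes)) %>%
  mutate(CountType = paste(Balls, "Balls:", Strikes, "Strikes")) %>%
  # One row per (count, result) pair with the number of pitches
  count(CountType, PitchResult, name = "Pitches") %>%
  # Share of each result within its count
  group_by(CountType) %>%
  mutate(Proportion = Pitches / sum(Pitches)) %>%
  ungroup()

print(count_props)
```

Swapping `toy_pitches` for `Varsity` should give the same numbers the bar heights represent, along with the number of pitches behind each bar.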
Analyze how different pitch types influence the outcomes of pitches. This will be done by creating a heat map comparing pitch type to pitch result.
Varsity %>%
  # Drop only rows missing the two variables used here, rather than
  # every row with an NA in any column
  filter(!is.na(PitchType), !is.na(PitchResult)) %>%
  group_by(PitchType, PitchResult) %>%
  summarise(Count = n(), .groups = "drop") %>%
  # Proportion of each result within a pitch type
  group_by(PitchType) %>%
  mutate(Proportion = Count / sum(Count)) %>%
  ggplot(aes(x = PitchType, y = PitchResult, fill = Proportion)) +
  geom_tile() +
  # Log scale keeps rare outcomes visible next to common ones
  scale_fill_viridis_c(trans = "log", labels = scales::percent_format()) +
  labs(
    title = "Pitch Type vs. Pitch Outcome",
    x = "Pitch Type",
    y = "Pitch Outcome",
    fill = "Proportion"
  ) +
  theme_minimal()
This heat map highlights the effectiveness and tendencies of different pitch types. The most effective pitch is the Cut Fastball, most likely because the only pitcher who throws this pitch is committed to play baseball at LSU. The Fastball is the pitch hitters are least likely to take (not swing at), probably because a Fastball is the easiest pitch to hit.
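Because each column of the heat map is normalised within its pitch type, a pitch thrown only a handful of times can look as decisive as one thrown hundreds of times. The following is a rough sketch of how the sample size could be shown next to each proportion. It uses a made-up data frame (`toy_pitches`) rather than `Varsity`, and the pitch type and result labels are illustrative assumptions.

```r
library(dplyr)

# Hypothetical stand-in for Varsity; labels are assumed, not from the data.
toy_pitches <- tibble(
  PitchType   = c("Fast Ball", "Fast Ball", "Fast Ball", "Fast Ball",
                  "Cut Fastball", "Cut Fastball"),
  PitchResult = c("Ball", "In Play", "Strike Swinging", "Ball",
                  "Strike Swinging", "Strike Swinging")
)

type_props <- toy_pitches %>%
  filter(!is.na(PitchType), !is.na(PitchResult)) %>%
  count(PitchType, PitchResult, name = "Count") %>%
  group_by(PitchType) %>%
  mutate(
    TypeTotal  = sum(Count),        # pitches of this type overall
    Proportion = Count / TypeTotal  # share of each result within the type
  ) %>%
  ungroup() %>%
  arrange(PitchType, desc(Proportion))

print(type_props)
```

Reading `TypeTotal` alongside `Proportion` helps separate a pitch that is effective from one that was simply thrown a few times by a single pitcher.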
Determine the most common attack zones for Fastballs thrown by Moeller pitchers. A density plot shows how often pitches are thrown in each Attack Zone. I chose to investigate the Fastball to see where it is most commonly located.
moeller_fastballs <- Varsity %>%
  filter(PitcherTeam == "Moeller") %>%
  filter(PitchType == "Fast Ball") %>%
  filter(!is.na(AttackZone))

ggplot(moeller_fastballs, aes(x = AttackZone, fill = AttackZone)) +
  geom_density(alpha = 0.5) +
  scale_fill_viridis_d() +
  labs(
    title = "Density Plot of Attack Zone for Fastballs",
    x = "Attack Zone",
    y = "Density"
  ) +
  theme_minimal()
This density plot shows how often Moeller Fastballs are thrown in each attack zone. The most targeted area is the Heart, which means the most common Moeller Fastball in the data is one down the middle of the plate.
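Since `AttackZone` is a categorical variable, a plain count of pitches per zone is a direct way to confirm which zone is most common. The sketch below assumes a made-up data frame (`toy_fastballs`) in place of `moeller_fastballs`; "Heart" is the zone named above, while "Shadow" and "Chase" are assumed labels that may not match the dataset.

```r
library(dplyr)

# Hypothetical stand-in for moeller_fastballs; zone labels other than
# "Heart" are assumptions.
toy_fastballs <- tibble(
  AttackZone = c("Heart", "Heart", "Heart", "Shadow", "Shadow", "Chase", NA)
)

zone_counts <- toy_fastballs %>%
  filter(!is.na(AttackZone)) %>%
  # Number of pitches per zone, most common zone first
  count(AttackZone, name = "Pitches", sort = TRUE) %>%
  # Share of all fastballs thrown to each zone
  mutate(Share = Pitches / sum(Pitches))

print(zone_counts)
```

The same table could feed a bar chart through `geom_col()`, which would show zone frequencies as bar heights.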