1. The Data

1.1 Introduction

I am currently a Baseball Analytics Intern at Archbishop Moeller High School. This dataset contains the pitch data I collected there during the 2025 Varsity Baseball season.

1.2 Description

Each row in the dataset represents a single pitch during a Moeller Varsity Baseball game in the 2025 season. Below is a detailed explanation of each variable:

  • PitchNo: Sequence number of the pitch in the dataset
  • Date: Date the pitch was thrown
  • Time: Time the pitch occurred
  • PAofInning: Plate appearance number within the inning
  • Pitcher: Name or identifier of the pitcher
  • PitcherHand: Pitcher’s throwing hand (Left or Right)
  • PitcherTeam: Team name of the pitcher
  • Batter: Name or identifier of the batter
  • Batter Hand: Batter’s hitting hand (Left or Right)
  • BatterTeam: Team name of the batter
  • Inning: Current inning during the pitch
  • Top/Bottom: Indicates whether the team is hitting in the top or bottom half of the inning
  • Outs: Number of outs at the time of the pitch
  • Balls: Number of balls in the count
  • Strikes: Number of strikes in the count
  • Count: Balls-strikes count as a string (e.g., “2-1”)
  • PitchType: Type of pitch thrown (e.g., Fastball, Slider)
  • PitchResult: Immediate outcome of the pitch (e.g., Ball, Strike, Foul)
  • AtBatResult: Final result of the at-bat (e.g., Single, Strikeout, Walk)
  • PitchVelo: Velocity of the pitch in miles per hour (MPH)
  • Location: Categorical or numeric value representing pitch location, coded as follows:
      • 1–9: Heart of the strike zone (always a strike).
      • 11–19: Shadow of the strike zone (could be a strike).
      • 21–29: Near the strike zone (ball-like).
      • 31–39: Wild pitch, far outside the strike zone.
      • Note: Codes are also segmented by column: left (1, 4, 7), middle (2, 5, 8), and right (3, 6, 9).
  • AttackZone: Heart, Shadow, Chase (Near strike zone), Waste (Wild pitch)
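The Location codes and the AttackZone labels above appear to line up by tens digit. As a minimal sketch, assuming that mapping holds (0 for Heart, 1 for Shadow, 2 for Chase, 3 for Waste) and that codes ending in 0 are not used, a zone could be derived from a code as follows. The function name attack_zone_from_location is hypothetical and is not part of the dataset or any package.

```r
# Hypothetical helper: derive an attack zone label from a numeric Location code.
# Assumes the tens digit encodes the zone and that codes ending in 0 are invalid.
attack_zone_from_location <- function(location) {
  zone_labels <- c("Heart", "Shadow", "Chase", "Waste")
  tens <- location %/% 10
  ones <- location %% 10

  # A code is usable only if it is present, within 1-39, and does not end in 0
  valid <- !is.na(location) & location >= 1 & location <= 39 & ones != 0

  zone <- rep(NA_character_, length(location))
  zone[valid] <- zone_labels[tens[valid] + 1]
  zone
}

# Example codes: one per zone, plus an unused code and a missing value
example_zones <- attack_zone_from_location(c(5, 12, 27, 33, 10, NA))
print(example_zones)
```

Codes outside the described ranges come back as NA rather than being forced into a zone.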

2. Data Preparation

2.1 Objective

Load the dataset and relevant packages for analysis.

2.2 R Code

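# Clear objects from the current workspace
rm(list = ls())

# tidyverse supplies read_csv(), the dplyr verbs, and ggplot2; viridis supplies color palettes
library(tidyverse)
library(viridis)

# Read the pitch-level data (one row per pitch)
Varsity <- read_csv("Moeller_2025_Final_Season - Moeller_2025_Final_Season.csv")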

2.3 Interpretation

After loading, Varsity holds one row per pitch thrown during Moeller’s 2025 varsity baseball season.
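Because later steps filter out missing values, a quick count of NAs per column can show how many pitches each filter would drop. The sketch below uses a small made-up data frame (toy_pitches is hypothetical, not the real data); on the real data the same calls could be applied to Varsity.

```r
# Made-up rows for illustration only
toy_pitches <- data.frame(
  PitchType   = c("Fast Ball", "Slider", NA, "Fast Ball"),
  PitchResult = c("Ball", "Strike", "Foul", NA),
  AtBatResult = c(NA, NA, NA, "Walk"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy_pitches))
print(na_counts)

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(toy_pitches))
print(complete_rows)
```

Comparing the per-column counts with the number of fully complete rows shows whether a filter on all columns would discard far more pitches than a filter on only the columns being plotted.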

3. Pitch Outcomes by Count

3.1 Objective

Do pitch outcomes (PitchResult) vary with the count (Balls and Strikes)? A stacked bar graph shows the proportion of each pitch result at each count.

3.2 R Code

# Keep pitches with a recorded count, then label each count (e.g., "3 Balls: 0 Strikes")
Varsity_pitch_count <- Varsity %>%
  filter(!is.na(Balls), !is.na(Strikes)) %>%
  mutate(CountType = paste(Balls, "Balls:", Strikes, "Strikes"))

# position = "fill" scales each bar to 1, so segments show proportions rather than raw totals
ggplot(Varsity_pitch_count, aes(x = CountType, fill = PitchResult)) +
  geom_bar(position = "fill") +
  labs(
    title = "Pitch Outcomes by Count",
    x = "Count",
    y = "Proportion of Pitch Results"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_viridis_d()

3.3 Interpretation

This plot shows how the pitch result varies with the count. On the first pitch of an at-bat (0-0), the result is split roughly 50/50 between a ball and a strike. In every other count, the pitch is more likely to result in a strike. On 3-0 counts, hitters rarely swing: it is a common “unwritten rule” in baseball not to swing at a 3-0 pitch.
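To put numbers on statements such as “about 50/50”, the proportions behind the bars can be tabulated directly. This is a minimal sketch on made-up rows (toy_counts is hypothetical, not the real data); it assumes the dplyr package, which is loaded with the tidyverse.

```r
library(dplyr)

# Made-up pitches for illustration only
toy_counts <- tibble(
  Balls       = c(0, 0, 0, 0, 3, 3, 3),
  Strikes     = c(0, 0, 0, 0, 0, 0, 0),
  PitchResult = c("Ball", "Strike", "Ball", "Strike", "Strike", "Strike", "Ball")
)

# Count each result within each count, then convert to within-count proportions
result_share <- toy_counts %>%
  count(Balls, Strikes, PitchResult) %>%
  group_by(Balls, Strikes) %>%
  mutate(Proportion = n / sum(n)) %>%
  ungroup()

print(result_share)
```

On the real data, replacing toy_counts with Varsity_pitch_count would give the exact share of each result at every count.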

4. Pitch Type vs. Pitch Outcome

4.1 Objective

Analyze how different pitch types influence the outcomes of pitches. This will be done by creating a heat map comparing pitch type to pitch result.

4.2 R Code

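Varsity %>%
  # Keep pitches where both plotted variables are recorded
  filter(!is.na(PitchType), !is.na(PitchResult)) %>%
  # Count each type/result pair, then convert to proportions within each pitch type
  group_by(PitchType, PitchResult) %>%
  summarise(Count = n(), .groups = "drop") %>%
  group_by(PitchType) %>%
  mutate(Proportion = Count / sum(Count)) %>%
  ggplot(aes(x = PitchType, y = PitchResult, fill = Proportion)) +
  geom_tile() +
  # Log-scaled fill keeps small proportions distinguishable
  scale_fill_viridis_c(trans = "log", labels = scales::percent_format()) +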
  labs(
    title = "Pitch Type vs. Pitch Outcome",
    x = "Pitch Type",
    y = "Pitch Outcome",
    fill = "Proportion"
  ) +
  theme_minimal()

4.3 Interpretation

This heat map highlights the tendencies and effectiveness of each pitch type. The most effective pitch is the Cut Fastball, likely because the only pitcher who throws it is committed to play baseball at LSU. Hitters take (do not swing at) Fastballs less often than any other pitch type, likely because a Fastball is the easiest pitch to hit.
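The point about the Cut Fastball depends on who throws each pitch, so it can help to see how many pitches and how many distinct pitchers stand behind each pitch type. This is a minimal sketch on made-up rows (toy_types and the pitcher labels P1 to P3 are hypothetical), assuming dplyr:

```r
library(dplyr)

# Made-up pitches for illustration only
toy_types <- tibble(
  Pitcher   = c("P1", "P1", "P2", "P3", "P2", "P1"),
  PitchType = c("Fast Ball", "Cut Fastball", "Fast Ball",
                "Fast Ball", "Slider", "Cut Fastball")
)

# Pitches thrown and number of distinct pitchers for each pitch type
type_summary <- toy_types %>%
  group_by(PitchType) %>%
  summarise(
    Pitches  = n(),
    Pitchers = n_distinct(Pitcher),
    .groups  = "drop"
  )

print(type_summary)
```

A pitch type thrown by a single pitcher reflects that pitcher as much as the pitch itself, so its row in the heat map is best read with the sample size in mind.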

5. Fastball Attack Zone Density

5.1 Objective

Determine the most common attack zones for Fastballs thrown by Moeller pitchers. A density plot compares each attack zone with how often Fastballs are thrown there. I chose the Fastball to see where this pitch is most commonly located.

5.2 R Code

# Moeller Fastballs with a recorded attack zone ("Fast Ball" must match the PitchType label in the data)
moeller_fastballs <- Varsity %>%
  filter(
    PitcherTeam == "Moeller",
    PitchType == "Fast Ball",
    !is.na(AttackZone)
  )

ggplot(moeller_fastballs, aes(x = AttackZone, fill = AttackZone)) +
  geom_density(alpha = 0.5) +
  scale_fill_viridis_d() +
  labs(
    title = "Density Plot of Attack Zone for Fastballs",
    x = "Attack Zone",
    y = "Density"
  ) +
  theme_minimal()

5.3 Interpretation

This density plot shows how often Moeller Fastballs land in each attack zone. The most targeted area is the Heart, which means the most common Fastball location in the data is down the middle of the plate.
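Because AttackZone is a categorical variable, the share of Fastballs in each zone can also be tabulated directly as a cross-check on the plot. This is a minimal sketch on made-up values (toy_zones is hypothetical, not the real data); the level order Heart, Shadow, Chase, Waste is my assumption of a sensible inside-to-outside ordering.

```r
# Made-up attack zones for illustration only
toy_zones <- c("Heart", "Heart", "Shadow", "Chase",
               "Heart", "Shadow", "Waste", "Heart")

# Order the zones from the middle of the plate outward instead of alphabetically
zone_levels <- c("Heart", "Shadow", "Chase", "Waste")
zone_factor <- factor(toy_zones, levels = zone_levels)

# Share of pitches in each zone
zone_share <- prop.table(table(zone_factor))
print(round(zone_share, 3))

# Zone with the largest share
most_common <- names(zone_share)[which.max(zone_share)]
print(most_common)
```

On the real data, moeller_fastballs$AttackZone would take the place of toy_zones.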