Overview / Introduction

The article, “We Watched 906 Foul Balls To Find Out Where The Most Dangerous Ones Land,” was written by Annette Choi and explores the statistics behind MLB foul balls with regard to their speed and trajectory. Choi used data from Baseball Savant, covering the 10 stadiums with the highest foul counts from the start of the 2019 season through June 5th. After the stadium was divided into zones, as illustrated below, the data showed that the majority of high-speed foul balls landed in zones 4 and 5, which is where the protective netting usually ends. Choi concludes by noting that MLB has been slowly adding more netting around its stadiums, with players actively advocating for the safety of the fans.

knitr::include_graphics('https://fivethirtyeight.com/wp-content/uploads/2019/07/choi-foul-0625-4-2.png?resize=1536,1398')

Image 1: Choi MLB Stadium Zoning Diagram (Choi, 2019)

Load Packages and Data

Let’s start this short analysis by loading the tidyverse, which includes dplyr, and importing the ‘foul-balls’ data set into our environment.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
foul_balls <- read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/foul-balls/foul-balls.csv')
glimpse(foul_balls)
## Rows: 906
## Columns: 7
## $ matchup        <chr> "Seattle Mariners VS Minnesota Twins", "Seattle Mariner…
## $ game_date      <chr> "2019-05-18", "2019-05-18", "2019-05-18", "2019-05-18",…
## $ type_of_hit    <chr> "Ground", "Fly", "Fly", "Fly", "Fly", "Ground", "Fly", …
## $ exit_velocity  <dbl> NA, NA, 56.9, 78.8, NA, NA, 74.8, NA, 70.7, 73.4, 76.0,…
## $ predicted_zone <int> 1, 4, 4, 1, 2, 1, 2, 1, 4, 4, 5, 1, 2, 4, 5, 2, 1, 2, 2…
## $ camera_zone    <int> 1, NA, NA, 1, NA, 1, NA, 1, NA, NA, 5, 1, NA, NA, 5, NA…
## $ used_zone      <int> 1, 4, 4, 1, 2, 1, 2, 1, 4, 4, 5, 1, 2, 4, 5, 2, 1, 2, 2…

Prepare Subset

A glimpse of the original data set shows that many rows are missing an exit velocity. The ‘used_zone’ column is the one used for the analysis. Therefore, let us create a subset that drops the rows with no exit velocity (326 of the 906) and the ‘predicted_zone’ and ‘camera_zone’ columns, which were only needed to determine ‘used_zone’.

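# Keep rows with a recorded exit velocity; select columns by name rather than position
foul_balls_trimmed <- subset(foul_balls, !is.na(exit_velocity),
                             select = c(matchup, game_date, type_of_hit, exit_velocity, used_zone))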
glimpse(foul_balls_trimmed)
## Rows: 580
## Columns: 5
## $ matchup       <chr> "Seattle Mariners VS Minnesota Twins", "Seattle Mariners…
## $ game_date     <chr> "2019-05-18", "2019-05-18", "2019-05-18", "2019-05-18", …
## $ type_of_hit   <chr> "Fly", "Fly", "Fly", "Fly", "Fly", "Fly", "Fly", "Line",…
## $ exit_velocity <dbl> 56.9, 78.8, 74.8, 70.7, 73.4, 76.0, 72.1, 95.9, 74.4, 69…
## $ used_zone     <int> 4, 1, 2, 4, 4, 5, 2, 5, 2, 1, 2, 4, 5, 3, 6, 4, 1, 5, 4,…
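
Before dropping rows, it can help to confirm where the missing values actually sit. The following is a minimal sketch in base R on a small made-up data frame (`toy_fouls` is hypothetical and only mimics the structure of the real data); it counts the missing values per column and then applies the same kind of filter used above.

```r
# Hypothetical toy data that mimics a few columns of foul_balls
toy_fouls <- data.frame(
  type_of_hit   = c("Ground", "Fly", "Fly", "Line"),
  exit_velocity = c(NA, 56.9, NA, 95.9),
  used_zone     = c(1L, 4L, 2L, 5L)
)

# Count the missing values in each column
na_counts <- colSums(is.na(toy_fouls))
print(na_counts)

# Keep only the rows with a recorded exit velocity
toy_trimmed <- subset(toy_fouls, !is.na(exit_velocity))
print(nrow(toy_trimmed))
```

Running `colSums(is.na(foul_balls))` on the real data set would give the same kind of per-column summary.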

Rename and Add Columns

Let us also rename the ‘used_zone’ column to ‘landing_zone’ to make its meaning clearer. We then add two columns: ‘danger’, which flags whether the exit velocity is at least 90 mph or below it, and ‘home_team’, which holds the first team listed in ‘matchup’.

foul_balls_trimmed <- foul_balls_trimmed %>%
  rename(landing_zone = used_zone) %>%
  mutate(danger = ifelse(exit_velocity >= 90, '>= 90 mph','< 90 mph')) %>%
  mutate(home_team = sub(" vs.*","",matchup,ignore.case = TRUE))
glimpse(foul_balls_trimmed)
## Rows: 580
## Columns: 7
## $ matchup       <chr> "Seattle Mariners VS Minnesota Twins", "Seattle Mariners…
## $ game_date     <chr> "2019-05-18", "2019-05-18", "2019-05-18", "2019-05-18", …
## $ type_of_hit   <chr> "Fly", "Fly", "Fly", "Fly", "Fly", "Fly", "Fly", "Line",…
## $ exit_velocity <dbl> 56.9, 78.8, 74.8, 70.7, 73.4, 76.0, 72.1, 95.9, 74.4, 69…
## $ landing_zone  <int> 4, 1, 2, 4, 4, 5, 2, 5, 2, 1, 2, 4, 5, 3, 6, 4, 1, 5, 4,…
## $ danger        <chr> "< 90 mph", "< 90 mph", "< 90 mph", "< 90 mph", "< 90 mp…
## $ home_team     <chr> "Seattle Mariners", "Seattle Mariners", "Seattle Mariner…
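
To see how the two derived columns behave at the edges, here is a small sketch on made-up inputs (`toy_matchups` and `toy_velocity` are hypothetical values, not rows from the data set). It applies the same `sub()` pattern and the same 90 mph threshold as the code above.

```r
# Hypothetical matchup strings with different capitalizations of "vs"
toy_matchups <- c("Seattle Mariners VS Minnesota Twins",
                  "Oakland A's vs Texas Rangers")

# Drop everything from " vs" onward, ignoring case, to keep the first team
home <- sub(" vs.*", "", toy_matchups, ignore.case = TRUE)
print(home)

# Hypothetical exit velocities, including the boundary value and a missing one
toy_velocity <- c(89.9, 90, 101.2, NA)

# Exactly 90 mph counts as dangerous; a missing velocity stays NA
danger <- ifelse(toy_velocity >= 90, ">= 90 mph", "< 90 mph")
print(danger)
```

Note that `ifelse()` propagates `NA`, which is one reason the rows without an exit velocity were removed before this step.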

Most Dangerous Foul Balls

The original authors of the data set charted the landing zones of the fastest, and therefore most dangerous, foul balls with a bar chart similar to the one below. The difference is that we removed the observations lacking an exit velocity beforehand.

ggplot(data = foul_balls_trimmed, aes(y = landing_zone, fill = danger)) +
  geom_bar()
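
The same comparison can also be read as a table. The sketch below uses a small made-up data frame (`toy_zones` is hypothetical) to cross-tabulate landing zone against the danger flag and to compute, within each zone, the share of fouls in each category. It is offered as one possible companion to the chart, not as the method used in the article.

```r
# Hypothetical toy data: landing zone and danger flag for seven foul balls
toy_zones <- data.frame(
  landing_zone = c(1L, 1L, 4L, 4L, 5L, 5L, 5L),
  danger       = c("< 90 mph", "< 90 mph", "< 90 mph", ">= 90 mph",
                   ">= 90 mph", ">= 90 mph", "< 90 mph")
)

# Counts of fouls by zone (rows) and danger category (columns)
counts <- table(toy_zones$landing_zone, toy_zones$danger)
print(counts)

# Row-wise proportions: within each zone, the share in each category
share <- prop.table(counts, margin = 1)
print(round(share, 2))
```

Applied to the real data, the equivalent call would be `table(foul_balls_trimmed$landing_zone, foul_balls_trimmed$danger)`.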

Stadiums Where Most Dangerous Foul Balls Occurred

Another interesting statistic is which of these 10 stadiums saw the most foul balls at or above 90 mph. Let’s go ahead and summarize this.

foul_balls_trimmed %>%
  filter(danger == '>= 90 mph') %>%
  group_by(home_team) %>%
  summarize(
    mean = mean(exit_velocity),
    n = n()
  ) %>%
  arrange(-n)
## # A tibble: 10 × 3
##    home_team              mean     n
##    <chr>                 <dbl> <int>
##  1 Baltimore Orioles     101.     14
##  2 Seattle Mariners       98.3    11
##  3 New York Yankees       96.3     8
##  4 Texas Rangers          98.0     8
##  5 Los Angeles Dodgers    96.9     7
##  6 Pittsburgh Pirates    101.      7
##  7 Milwaukee Brewers      98.9     6
##  8 Oakland A's            95.8     6
##  9 Philadelphia Phillies  96.4     5
## 10 Atlanta Braves         94       2
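
Raw counts depend on how many fouls with a recorded exit velocity each stadium contributed, so a share can be a useful complement. The following is a rough sketch on made-up data (`toy_games` and its team names are hypothetical) that computes, per home team, the fraction of fouls at or above 90 mph.

```r
# Hypothetical toy data: home team and exit velocity for five foul balls
toy_games <- data.frame(
  home_team     = c("Team A", "Team A", "Team A", "Team B", "Team B"),
  exit_velocity = c(95, 80, 91, 70, 92)
)

# Flag fouls at or above the 90 mph threshold
toy_games$is_dangerous <- toy_games$exit_velocity >= 90

# The mean of a logical vector is the share of TRUE values in each group
share <- tapply(toy_games$is_dangerous, toy_games$home_team, mean)

# Sort from the highest share to the lowest
sorted <- sort(share, decreasing = TRUE)
print(round(sorted, 2))
```

With the real data, the same idea would group `foul_balls_trimmed` by `home_team`; whether a share or a count is more informative depends on the question being asked.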

Conclusions / Findings and Recommendations

The summary above shows the stadiums where the largest number of these dangerous foul balls occurred. Since no two stadiums are designed exactly alike, their structure could be analyzed further to see whether it plays a role in how many foul balls reach the stands. Annette Choi’s findings were interesting and revealing of the dangers posed by foul balls. Since the article was written in 2019, it would be worth following up to see whether more stadiums have implemented the necessary safety measures and whether the number of injuries caused by foul balls has decreased over time.

Works Cited

Choi, A. (2019, July 15). We Watched 906 Foul Balls To Find Out Where The Most Dangerous Ones Land. FiveThirtyEight. https://fivethirtyeight.com/features/we-watched-906-foul-balls-to-find-out-where-the-most-dangerous-ones-land/