In the last few years, injuries to spectators struck by foul balls at Major League Baseball games have increased, and some have been serious or even fatal. Many children and senior adults who cannot protect themselves have suffered near-fatal injuries. This is not surprising with pitchers throwing faster, hitters getting stronger and baseballs being wound tighter than in earlier years. The safety net that had previously been erected behind home plate has been extended to the dugouts at most stadiums. Signs warning about the dangers of high-velocity foul balls have been placed in certain sections. Is that enough to protect fans? Should the safety nets be extended all the way to the foul poles? Can the MLB do more to protect fans?
The dangers of sitting in areas susceptible to hard-hit foul balls are explored in this assignment. The foul ball data were collected from the heaviest foul ball days at the ten stadiums with the highest average number of foul balls in the MLB during the 2019 season, prior to June 5th of that year.
# Load needed libraries
library(devtools)
library(tidyverse)
library(RCurl)
library(plyr)
library(knitr)
# Source the file from the 538 Website github repository and set NA strings to 0
filename <- getURL("https://raw.githubusercontent.com/fivethirtyeight/data/master/foul-balls/foul-balls.csv")
foul_ball_danger <- read.csv(text = filename,na.strings = "")
foul_ball_danger[is.na(foul_ball_danger)] <- 0
head(foul_ball_danger, 10)
## ï..matchup game_date type_of_hit exit_velocity
## 1 Seattle Mariners VS Minnesota Twins 2019-05-18 Ground 0.0
## 2 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 0.0
## 3 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 56.9
## 4 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 78.8
## 5 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 0.0
## 6 Seattle Mariners VS Minnesota Twins 2019-05-18 Ground 0.0
## 7 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 74.8
## 8 Seattle Mariners VS Minnesota Twins 2019-05-18 Ground 0.0
## 9 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 70.7
## 10 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 73.4
## predicted_zone camera_zone used_zone
## 1 1 1 1
## 2 4 0 4
## 3 4 0 4
## 4 1 1 1
## 5 2 0 2
## 6 1 1 1
## 7 2 0 2
## 8 1 1 1
## 9 4 0 4
## 10 4 0 4
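The stray characters in front of `matchup` in the header above look like residue from a UTF-8 byte-order mark (BOM) at the start of the downloaded text. That is an assumption based on the printed output, not something the data source states. A minimal sketch of stripping such a mark before parsing, using a small made-up CSV string (`csv_text` and its values are hypothetical) rather than the real download:

```r
# Hypothetical CSV text that starts with a UTF-8 byte-order mark (U+FEFF).
csv_text <- paste0("\ufeff", "matchup,exit_velocity\n",
                   "Team A vs Team B,70.1\n",
                   "Team A vs Team B,\n")

# Remove a leading BOM, if present, so it cannot leak into the first column name.
clean_text <- sub("^\ufeff", "", csv_text)

# Same read.csv() call style as in the report: empty fields become NA.
demo <- read.csv(text = clean_text, na.strings = "")
names(demo)
```

Because the columns are renamed in the next step anyway, this cleanup is cosmetic for this report, but it keeps the raw column names readable.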
# Rename the columns and change the matchup values to reflect the Team home (Stadium)
names(foul_ball_danger) <- c("Stadium","Date","FoulType","ExitVelocity","PredictedZone","CameraZone","FoulZone")
foul_ball_danger$Stadium[foul_ball_danger$Stadium == "Seattle Mariners VS Minnesota Twins"] <- "Mariners"
foul_ball_danger$Stadium[foul_ball_danger$Stadium == "Baltimore Orioles VS Minnesota Twins"] <- "Orioles"
foul_ball_danger$Stadium[foul_ball_danger$Stadium == "Pittsburgh Pirates VS Milwaukee Brewers"] <- "Pirates"
foul_ball_danger$Stadium[foul_ball_danger$Stadium == "Oakland A's vs Houston Astros"] <- "A's"
foul_ball_danger$Stadium[foul_ball_danger$Stadium == "Texas Rangers vs Toronto Blue Jays"] <- "Rangers"
foul_ball_danger$Stadium[foul_ball_danger$Stadium == "Los Angeles Dodgers vs Arizona Diamondsbacks"] <- "Dodgers"
foul_ball_danger$Stadium[foul_ball_danger$Stadium == "Milwaukee Brewers vs New York Mets"] <- "Brewers"
foul_ball_danger$Stadium[foul_ball_danger$Stadium == "Philadelphia Phillies vs Miami Marlins"] <- "Phillies"
foul_ball_danger$Stadium[foul_ball_danger$Stadium == "Atlanta Braves vs New York Mets"] <- "Braves"
foul_ball_danger$Stadium[foul_ball_danger$Stadium == "New York Yankees vs Baltimore Orioles"] <- "Yankees"
head(foul_ball_danger,10)
## Stadium Date FoulType ExitVelocity PredictedZone CameraZone FoulZone
## 1 Mariners 2019-05-18 Ground 0.0 1 1 1
## 2 Mariners 2019-05-18 Fly 0.0 4 0 4
## 3 Mariners 2019-05-18 Fly 56.9 4 0 4
## 4 Mariners 2019-05-18 Fly 78.8 1 1 1
## 5 Mariners 2019-05-18 Fly 0.0 2 0 2
## 6 Mariners 2019-05-18 Ground 0.0 1 1 1
## 7 Mariners 2019-05-18 Fly 74.8 2 0 2
## 8 Mariners 2019-05-18 Ground 0.0 1 1 1
## 9 Mariners 2019-05-18 Fly 70.7 4 0 4
## 10 Mariners 2019-05-18 Fly 73.4 4 0 4
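The ten near-identical assignments above could also be written as a single lookup. The sketch below is one possible alternative, not the approach used in this report; the vectors `matchup` and `home_team` are hypothetical stand-ins that cover only two of the ten matchups.

```r
# Hypothetical sample of matchup strings; the real column has ten distinct values.
matchup <- c("Seattle Mariners VS Minnesota Twins",
             "Baltimore Orioles VS Minnesota Twins",
             "Seattle Mariners VS Minnesota Twins")

# Named vector: names are the original matchup strings, values are the short labels.
home_team <- c("Seattle Mariners VS Minnesota Twins"  = "Mariners",
               "Baltimore Orioles VS Minnesota Twins" = "Orioles")

# Index the lookup by the matchup strings; unmatched strings would become NA.
stadium <- unname(home_team[matchup])
stadium
```

A lookup like this makes a misspelled matchup string visible as an `NA` instead of silently leaving the long name in place.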
class(foul_ball_danger)
## [1] "data.frame"
ggplot(data = foul_ball_danger) +
geom_bar(mapping = aes(x = Stadium, fill = Stadium)) +
labs(title = "Number of Foul Balls by Top Ten Foul Average Location", x = "Team Stadium", y = "Number of Foul Balls" )
I have watched and attended many MLB games in my life, but looking closely at these ten stadiums, I am surprised at how many foul balls can be hit in a single game. I am even more surprised that more fans are not injured. There were one hundred and twenty foul balls at Camden Yards (Orioles) and over one hundred and five at Oakland Coliseum (A’s) and PNC Park (Pirates).
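The per-stadium totals quoted above can be read off the bar chart, but they can also be tabulated directly. A minimal sketch, assuming one row per foul ball; the `stadium` vector here is made up for illustration and does not reproduce the real counts:

```r
# Hypothetical stadium labels, one entry per foul ball.
stadium <- c("Orioles", "Orioles", "Orioles", "A's", "A's", "Pirates")

# table() counts rows per label; sort() puts the busiest stadium first.
foul_counts <- sort(table(stadium), decreasing = TRUE)
foul_counts
```

On the real data, the same call on `foul_ball_danger$Stadium` would give the exact number of foul balls per stadium.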
ggplot(data = foul_ball_danger) +
geom_bar(mapping = aes(x = FoulType, fill = FoulType)) +
labs(title = "Number of Foul Balls by Foul Type", x = "Foul Type", y = "Number of Foul Balls" )
The image below shows the foul ball zones used in the data, and the black arc shows the current foul ball safety barrier in place at all MLB stadiums. All zones and the barrier in the image are approximations.
fb_zones_img <- "https://raw.githubusercontent.com/audiorunner13/Masters-Coursework/main/DATA607%20Spring%202021/Week1/Assignment/images/fb_zones.jpg"
# attr(fb_zones_img, "info")
include_graphics(fb_zones_img)
The bar charts below show the number of foul balls hit into each zone by foul type. A few things to note.
ggplot(data = foul_ball_danger) +
geom_bar(mapping = aes(x = FoulZone)) +
facet_wrap(~ FoulType) +
labs(title = "Number of Foul Balls by Zone by Foul Type", x = "Foul Zone", y = "Number of Foul Balls" )
avg_ev <- round(mean(foul_ball_danger$ExitVelocity),2)
sprintf("The mean exit velocity of the 906 foul balls is: %s mph", avg_ev)
## [1] "The mean exit velocity of the 906 foul balls is: 48.91 mph"
Because the exit velocity of some foul balls was not captured, and those missing values were set to 0 when the data was loaded, the average exit velocity of the entire dataset is a low 48.91 mph. To get a more accurate average exit velocity, the records with no exit velocity must be filtered out.
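The effect of the zero-filled readings can be seen on a small made-up vector. This is only an illustration: the values in `exit_velocity` are hypothetical, and `na.rm = TRUE` is shown as an alternative to the row filtering this report uses.

```r
# Hypothetical exit velocities in mph; NA marks a reading that was not captured.
exit_velocity <- c(NA, 56.9, 78.8, NA, 74.8)

# Recoding NA as 0 treats missing readings as if they were real 0 mph hits.
with_zeros <- ifelse(is.na(exit_velocity), 0, exit_velocity)
mean_with_zeros <- mean(with_zeros)

# Dropping the missing readings averages only the measured foul balls.
mean_measured <- mean(exit_velocity, na.rm = TRUE)

c(with_zeros = mean_with_zeros, measured_only = mean_measured)
```

The zero-filled mean is pulled well below every measured value, which is the same distortion that produces the 48.91 mph figure.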
# Keep only rows with a recorded exit velocity (> 0) and the five columns used below.
# Not every foul ball's exit velocity was recorded.
foul_ball_danger_v2 <- foul_ball_danger[foul_ball_danger$ExitVelocity > 0, c("Stadium","Date","FoulType","ExitVelocity","FoulZone")]
head(foul_ball_danger_v2,10)
## Stadium Date FoulType ExitVelocity FoulZone
## 3 Mariners 2019-05-18 Fly 56.9 4
## 4 Mariners 2019-05-18 Fly 78.8 1
## 7 Mariners 2019-05-18 Fly 74.8 2
## 9 Mariners 2019-05-18 Fly 70.7 4
## 10 Mariners 2019-05-18 Fly 73.4 4
## 11 Mariners 2019-05-18 Fly 76.0 5
## 13 Mariners 2019-05-18 Fly 72.1 2
## 15 Mariners 2019-05-18 Line 95.9 5
## 16 Mariners 2019-05-18 Fly 74.4 2
## 17 Mariners 2019-05-18 Batter hits self 69.9 1
Although we have filtered out 326 foul balls, the remaining 580 give a more accurate average exit velocity.
avg_ev <- round(mean(foul_ball_danger_v2$ExitVelocity),2)
sprintf("The mean exit velocity of the 580 foul balls is: %s mph", avg_ev)
## [1] "The mean exit velocity of the 580 foul balls is: 76.4 mph"
# Display the exit velocity by the foul zone by the type of foul ball
ggplot(data = foul_ball_danger_v2) +
geom_point(mapping = aes(x = ExitVelocity, y = FoulZone)) +
facet_wrap(~ FoulType, nrow = 2) +
labs(title = "Foul Ball Exit Velocity by Zone by Foul Ball Type", x = "Exit Velocity (mph)", y = "Foul Zone" )
# Subset dataset to only use data with foul types ground, line, fly and hit into zones 4 through 7.
foul_ball_danger_grln45 <- subset(foul_ball_danger_v2, FoulType %in% c("Ground", "Line", "Fly") & FoulZone %in% 4:7)
head(foul_ball_danger_grln45,10)
## Stadium Date FoulType ExitVelocity FoulZone
## 3 Mariners 2019-05-18 Fly 56.9 4
## 9 Mariners 2019-05-18 Fly 70.7 4
## 10 Mariners 2019-05-18 Fly 73.4 4
## 11 Mariners 2019-05-18 Fly 76.0 5
## 15 Mariners 2019-05-18 Line 95.9 5
## 20 Mariners 2019-05-18 Fly 72.0 4
## 22 Mariners 2019-05-18 Line 84.9 5
## 26 Mariners 2019-05-18 Fly 104.6 6
## 27 Mariners 2019-05-18 Ground 96.2 4
## 31 Mariners 2019-05-18 Line 76.1 5
# Calculate the mean exit velocity for ground, line and fly foul balls into zones 4 through 7.
avg_exv_z47 <- round(mean(foul_ball_danger_grln45$ExitVelocity),2)
sprintf("The mean exit velocity of ground, line and fly foul balls to Zones 4 through 7 is: %s mph", avg_exv_z47)
## [1] "The mean exit velocity of ground, line and fly foul balls to Zones 4 through 7 is: 79.77 mph"
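A single mean over Zones 4 through 7 hides any differences between zones. The sketch below shows one way to break the figure down per zone with `aggregate()` and `table()`. The data frame `fouls` is a hypothetical stand-in with made-up rows, so its numbers are not results from the real dataset.

```r
# Hypothetical rows standing in for the filtered data frame.
fouls <- data.frame(
  FoulZone     = c(4, 4, 5, 5, 6),
  ExitVelocity = c(56.9, 70.7, 95.9, 84.9, 104.6)
)

# Mean exit velocity per zone.
zone_means <- aggregate(ExitVelocity ~ FoulZone, data = fouls, FUN = mean)

# Number of foul balls per zone, for comparison with the means.
zone_counts <- table(fouls$FoulZone)

zone_means
zone_counts
```

Reading the counts and the means side by side shows whether the zones that receive the most foul balls are also the ones that receive the fastest ones.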
I have often been asked why I enjoy attending baseball games. My answer has always been for love of the game. I also enjoy baseball because it is a game where one can socialize while watching. Even so, it is apparent that one must stay attentive to the game, especially when sitting with young or senior fans in sections where many foul balls are hit. The data show that most foul balls are hit into Zones 4 and 5, and that ground, line and fly foul balls hit into Zones 4 through 7 have an average exit velocity of about 80 mph. It is my belief and recommendation that the MLB can further protect fans by doing the following.