The goal is to create a programming sample “vignette” that demonstrates how to use one or more capabilities of a selected tidyverse package with a selected dataset.
We will load the tidyverse package and read in the foul-balls.csv data.
library("tidyverse")
## ── Attaching packages ───────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
raw_data <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/foul-balls/foul-balls.csv", na.strings = c("NA", "NULL"))
head(raw_data)
##                               matchup  game_date type_of_hit exit_velocity
## 1 Seattle Mariners VS Minnesota Twins 2019-05-18      Ground            NA
## 2 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly            NA
## 3 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly          56.9
## 4 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly          78.8
## 5 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly            NA
## 6 Seattle Mariners VS Minnesota Twins 2019-05-18      Ground            NA
##   predicted_zone camera_zone used_zone
## 1              1           1         1
## 2              4          NA         4
## 3              4          NA         4
## 4              1           1         1
## 5              2          NA         2
## 6              1           1         1
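Since this vignette is about the tidyverse, the same import could also be done with readr. The following is a minimal sketch, not the code used to produce the output above. It assumes a small hypothetical CSV written to a temporary file (the rows and team names are made up) so that it runs without network access. The na argument of read_csv() plays the role that na.strings plays in read.csv(), and the result is a tibble rather than a base data frame.

```r
library(readr)

# Hypothetical two-row CSV written to a temporary file for illustration
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "matchup,type_of_hit,exit_velocity",
  "Team A VS Team B,Ground,NULL",
  "Team A VS Team B,Fly,56.9"
), tmp)

# na = tells readr which strings to treat as missing values
foul_tbl <- read_csv(tmp, na = c("NA", "NULL"))

# Printing a tibble shows the column types under the header
foul_tbl
```

Pointing read_csv() at the foul-balls.csv URL from the read.csv() call should work the same way, although the printed preview will look different from the head() output shown above.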
We will use the tidyverse functions to analyze the exit velocities of foul balls.
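First, we limit the data to complete rows. Then, using dplyr functions, we group the data by type of hit and summarize the exit velocities. Note that cols is not an argument of base R's na.omit() for data frames and is silently ignored, so the call below drops rows with a missing value in any column (camera_zone included), not only rows missing exit_velocity. This is why row 3 above (Fly, 56.9 mph, no camera_zone) does not appear in the summary, whose Fly minimum is 60.3.
# cols is ignored for data frames; rows with an NA in any column are dropped
my_data <- na.omit(raw_data, cols = "exit_velocity")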
analysis_df <- group_by(my_data, type_of_hit)
analysis_df <- summarize(analysis_df,
                         mean = round(mean(exit_velocity), 1),
                         median = median(exit_velocity),
                         max = max(exit_velocity),
                         min = min(exit_velocity))
analysis_df <- as.data.frame(analysis_df)
analysis_df
##        type_of_hit mean median   max  min
## 1 Batter hits self 69.4   68.3  82.0 60.4
## 2              Fly 81.6   79.1 108.5 60.3
## 3           Ground 75.5   74.8 107.0 25.4
## 4             Line 82.1   82.6 110.6 39.9
## 5           Pop Up 77.9   77.5  90.7 69.7
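The filter-group-summarize steps above can also be written as a single pipeline. The sketch below is an illustration under stated assumptions, not the code that produced the table: it uses a hypothetical toy data frame (made-up values) in place of the foul-ball data, and it swaps na.omit() for tidyr's drop_na(), which drops only the rows missing the named column. On a data frame, na.omit() drops a row if any column is missing, so the two approaches can keep different rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows standing in for the foul-ball data
toy <- data.frame(
  type_of_hit   = c("Fly", "Fly", "Ground", "Line", "Line"),
  exit_velocity = c(60, NA, 75, 80, 90),
  camera_zone   = c(NA, 4, 1, NA, 2),
  stringsAsFactors = FALSE
)

# na.omit() on a data frame drops rows with an NA in any column
nrow(na.omit(toy))

# drop_na() with a column name drops only rows missing that column;
# the pipe then passes the result through group_by() and summarize()
toy_summary <- toy %>%
  drop_na(exit_velocity) %>%
  group_by(type_of_hit) %>%
  summarize(
    mean   = round(mean(exit_velocity), 1),
    median = median(exit_velocity),
    max    = max(exit_velocity),
    min    = min(exit_velocity)
  )

toy_summary
```

Applied to the real data, the column-specific filter would retain rows that lack a camera_zone but do have an exit velocity, so the summary statistics would likely differ from the table above.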
With a data frame that contains summary exit velocity statistics for each type of hit, we can use ggplot2, the tidyverse plotting package, to visualize the summary.
ggplot(analysis_df, aes(x = type_of_hit, y = mean)) +
  geom_bar(width = .75, stat = "identity", position = "dodge") +
  ggtitle("Average Exit Velocity of Foul Balls by Type of Hit") +
  labs(x = "Type of Hit", y = "Average Exit Velocity (mph)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(breaks = seq(0, 100, by = 5))
ggplot(analysis_df, aes(x = type_of_hit, y = max)) +
  geom_bar(width = .75, stat = "identity", position = "dodge") +
  ggtitle("Max Exit Velocity of Foul Balls by Type of Hit") +
  labs(x = "Type of Hit", y = "Max Exit Velocity (mph)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(breaks = seq(0, 115, by = 5))
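As an optional variation on the bar charts above, the sketch below uses geom_col(), which takes bar heights directly from the y aesthetic, and forcats' fct_reorder() to sort the bars by value. The summary_df data frame is an assumption made so the example is self-contained: its values are copied by hand from the mean column of the summary table printed earlier rather than computed from the data.

```r
library(ggplot2)
library(forcats)

# Mean exit velocities copied from the summary table in this vignette
summary_df <- data.frame(
  type_of_hit = c("Batter hits self", "Fly", "Ground", "Line", "Pop Up"),
  mean        = c(69.4, 81.6, 75.5, 82.1, 77.9)
)

# Reorder the factor levels so bars appear in ascending order of the mean
summary_df$type_of_hit <- fct_reorder(factor(summary_df$type_of_hit),
                                      summary_df$mean)

# geom_col() uses y as the bar height, like geom_bar(stat = "identity")
p <- ggplot(summary_df, aes(x = type_of_hit, y = mean)) +
  geom_col(width = 0.75) +
  labs(title = "Average Exit Velocity of Foul Balls by Type of Hit",
       x = "Type of Hit", y = "Average Exit Velocity (mph)") +
  theme(plot.title = element_text(hjust = 0.5))

p
```

Sorting the bars makes it easier to see that the Fly and Line averages are close, which an alphabetical ordering can hide.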
We can observe that line hits have both the highest average exit velocity (82.1 mph) and the highest maximum exit velocity (110.6 mph) in our dataset.