Introduction

This vignette demonstrates how to use two tidyverse packages, dplyr and ggplot2, to summarize and plot the foul-ball dataset published by FiveThirtyEight.

We load the tidyverse package and read in the foul-balls.csv data, treating both "NA" and "NULL" as missing values.

library("tidyverse")
## ── Attaching packages ───────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
raw_data <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/foul-balls/foul-balls.csv",na.strings=c("NA", "NULL"))

head(raw_data)
##                               matchup  game_date type_of_hit exit_velocity
## 1 Seattle Mariners VS Minnesota Twins 2019-05-18      Ground            NA
## 2 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly            NA
## 3 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly          56.9
## 4 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly          78.8
## 5 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly            NA
## 6 Seattle Mariners VS Minnesota Twins 2019-05-18      Ground            NA
##   predicted_zone camera_zone used_zone
## 1              1           1         1
## 2              4          NA         4
## 3              4          NA         4
## 4              1           1         1
## 5              2          NA         2
## 6              1           1         1
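Because this vignette is about the tidyverse, it may be worth noting that readr offers read_csv() as an alternative to read.csv(); it returns a tibble, and its na argument plays the role of na.strings. The sketch below is only an illustration: it reads a small hypothetical CSV written to a temporary file (the rows and the names tmp and sample_tbl are made up), not the real foul-balls.csv.

```r
library(readr)

# Hypothetical three-row sample standing in for foul-balls.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "type_of_hit,exit_velocity,camera_zone",
  "Ground,NULL,1",
  "Fly,56.9,NA",
  "Fly,78.8,1"
), tmp)

# na = c("NA", "NULL") treats both strings as missing values
sample_tbl <- read_csv(tmp, na = c("NA", "NULL"))
sample_tbl
```

The rest of this vignette keeps the data frame returned by read.csv(), so the printed output that follows is unaffected.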

Use of tidyverse functions to transform data

We will use the tidyverse functions to analyze the exit velocities of foul balls.

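First, we remove rows with missing values. Then, using dplyr functions, we group the data by type of hit and summarize it. Note that raw_data is a plain data frame, and the data frame method of na.omit() has no cols argument (cols belongs to the data.table method), so the argument below is silently ignored: rows with an NA in any column, including camera_zone, are dropped, not only rows without an exit velocity.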

my_data <- na.omit(raw_data,cols="exit_velocity")
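If the aim is to keep every row that has a recorded exit velocity, whatever the other columns hold, one option is dplyr's filter() combined with is.na(). The sketch below contrasts the two approaches on a hypothetical toy data frame (toy is invented for illustration and is not part of the foul-ball data).

```r
library(dplyr)

# Hypothetical rows shaped like the foul-ball data
toy <- data.frame(
  type_of_hit   = c("Ground", "Fly", "Fly", "Line"),
  exit_velocity = c(NA, 56.9, 78.8, 90.1),
  camera_zone   = c(1, NA, 1, NA)
)

# na.omit() on a data frame drops rows with an NA in ANY column
nrow(na.omit(toy))                        # 1

# filter() keeps every row that has an exit velocity
nrow(filter(toy, !is.na(exit_velocity)))  # 3
```

Applied to raw_data, filter() would keep rows such as row 3 in the head() output (exit velocity 56.9, camera_zone NA), so the resulting summary would not match the table printed in this vignette, which was produced with na.omit().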

Use group_by to select type_of_hit as the analysis group.

analysis_df <- group_by(my_data, type_of_hit)

Use summarize to create summary columns for each type of hit.

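analysis_df <- summarize(analysis_df,
    mean = round(mean(exit_velocity), 1),
    median = median(exit_velocity),
    max = max(exit_velocity),
    min = min(exit_velocity))

Convert analysis_df from a tibble to a plain data frame so that it prints in base R format. ggplot2 accepts either, so this step is optional for plotting.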

analysis_df <- as.data.frame(analysis_df)

analysis_df
##        type_of_hit mean median   max  min
## 1 Batter hits self 69.4   68.3  82.0 60.4
## 2              Fly 81.6   79.1 108.5 60.3
## 3           Ground 75.5   74.8 107.0 25.4
## 4             Line 82.1   82.6 110.6 39.9
## 5           Pop Up 77.9   77.5  90.7 69.7
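The group_by, summarize, and as.data.frame steps above can also be chained with the %>% pipe, which is the more common tidyverse idiom. Here is a minimal sketch on hypothetical values (the toy data frame and its numbers are made up and not taken from the foul-ball data):

```r
library(dplyr)

# Hypothetical exit velocities, not taken from the real data
toy <- data.frame(
  type_of_hit   = c("Fly", "Fly", "Line", "Line", "Line"),
  exit_velocity = c(70.2, 80.4, 85.0, 90.5, 95.0)
)

# Group, summarize, and convert in a single pipeline
toy_summary <- toy %>%
  group_by(type_of_hit) %>%
  summarize(
    mean   = round(mean(exit_velocity), 1),
    median = median(exit_velocity),
    max    = max(exit_velocity),
    min    = min(exit_velocity)
  ) %>%
  as.data.frame()

toy_summary
```

The pipe passes the result of each step as the first argument of the next, which avoids reassigning analysis_df at every stage.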

Plotting using ggplot functions

With a data frame of summary exit velocities for foul balls, we can use ggplot2, the tidyverse plotting package, to visualize the summary.

Average exit velocity in ggplot

ggplot(analysis_df, aes(x = type_of_hit, y = mean)) +
    geom_col(width = .75) +
    ggtitle("Average Exit Velocity of Foul Balls by Type of Hit") +
    labs(x = "Type of Hit", y = "Average Exit Velocity (mph)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    scale_y_continuous(breaks = seq(0, 100, by = 5))

Max exit velocity in ggplot

ggplot(analysis_df, aes(x = type_of_hit, y = max)) +
    geom_col(width = .75) +
    ggtitle("Max Exit Velocity of Foul Balls by Type of Hit") +
    labs(x = "Type of Hit", y = "Max Exit Velocity (mph)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    scale_y_continuous(breaks = seq(0, 115, by = 5))
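The two charts could also be combined into one by reshaping the summary to long format with tidyr's pivot_longer() and dodging the bars by statistic. The following is a sketch, not the approach used above: summary_df re-types the mean and max values from the analysis_df printout, and the names summary_df, long_df, combined_plot, statistic, and mph are chosen here for illustration.

```r
library(ggplot2)
library(tidyr)

# Summary values copied from the analysis_df printout
summary_df <- data.frame(
  type_of_hit = c("Batter hits self", "Fly", "Ground", "Line", "Pop Up"),
  mean = c(69.4, 81.6, 75.5, 82.1, 77.9),
  max  = c(82.0, 108.5, 107.0, 110.6, 90.7)
)

# Reshape to one row per (type_of_hit, statistic) pair
long_df <- pivot_longer(summary_df, cols = c("mean", "max"),
                        names_to = "statistic", values_to = "mph")

# Side-by-side bars: one bar per statistic within each type of hit
combined_plot <- ggplot(long_df, aes(x = type_of_hit, y = mph, fill = statistic)) +
    geom_col(width = .75, position = "dodge") +
    labs(title = "Mean and Max Exit Velocity of Foul Balls by Type of Hit",
         x = "Type of Hit", y = "Exit Velocity (mph)") +
    theme(plot.title = element_text(hjust = 0.5))

combined_plot
```

Here position = "dodge" does real work, because each type of hit has two bars that would otherwise be stacked.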

Conclusion

We can observe that line hits have both the highest average exit velocity (82.1 mph) and the highest maximum exit velocity (110.6 mph) among the foul balls summarized above.
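This reading of the table can be double-checked in code by sorting the summary with dplyr's arrange() and desc(). In this small sketch, summary_df re-types the mean and max columns from the analysis_df printout, and the names summary_df, top_mean, and top_max are illustrative.

```r
library(dplyr)

# Summary values copied from the analysis_df printout
summary_df <- data.frame(
  type_of_hit = c("Batter hits self", "Fly", "Ground", "Line", "Pop Up"),
  mean = c(69.4, 81.6, 75.5, 82.1, 77.9),
  max  = c(82.0, 108.5, 107.0, 110.6, 90.7),
  stringsAsFactors = FALSE
)

# Sort descending and take the first row for each statistic
top_mean <- arrange(summary_df, desc(mean))$type_of_hit[1]
top_max  <- arrange(summary_df, desc(max))$type_of_hit[1]

top_mean  # "Line"
top_max   # "Line"
```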