Description

In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions.

GitHub repository: https://github.com/pkowalchuk/SPRING2024TIDYVERSE

FiveThirtyEight.com datasets.

Kaggle datasets.

Your task here is to Create an Example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset.

Data

The dataset I chose for this assignment comes from FiveThirtyEight.com and accompanies the article “We Watched 906 Foul Balls To Find Out Where The Most Dangerous Ones Land”. The article is available at: https://fivethirtyeight.com/features/we-watched-906-foul-balls-to-find-out-where-the-most-dangerous-ones-land/

I downloaded the data from: https://github.com/fivethirtyeight/data/blob/master/foul-balls/foul-balls.csv

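# Load dplyr for the pipe (%>%) and the verbs used below
library(dplyr)

# Read the CSV file downloaded from the FiveThirtyEight repository
foul_balls <- read.csv("foul-balls.csv")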

# Summary of the data
summary(foul_balls)
##    matchup           game_date         type_of_hit        exit_velocity  
##  Length:906         Length:906         Length:906         Min.   : 25.4  
##  Class :character   Class :character   Class :character   1st Qu.: 69.7  
##  Mode  :character   Mode  :character   Mode  :character   Median : 75.7  
##                                                           Mean   : 76.4  
##                                                           3rd Qu.: 81.7  
##                                                           Max.   :110.6  
##                                                           NA's   :326    
##  predicted_zone   camera_zone      used_zone    
##  Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :3.000   Median :1.000   Median :3.000  
##  Mean   :3.038   Mean   :2.369   Mean   :3.058  
##  3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:5.000  
##  Max.   :7.000   Max.   :7.000   Max.   :7.000  
##                  NA's   :513
# Display the first six rows of the dataset
head(foul_balls)
##                               matchup  game_date type_of_hit exit_velocity
## 1 Seattle Mariners VS Minnesota Twins 2019-05-18      Ground            NA
## 2 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly            NA
## 3 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly          56.9
## 4 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly          78.8
## 5 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly            NA
## 6 Seattle Mariners VS Minnesota Twins 2019-05-18      Ground            NA
##   predicted_zone camera_zone used_zone
## 1              1           1         1
## 2              4          NA         4
## 3              4          NA         4
## 4              1           1         1
## 5              2          NA         2
## 6              1           1         1

Renaming Data Columns

names(foul_balls) <- c("MatchUP", "Game_Date", "Type_of_Hit", "Exit_Velocity",
                       "Predicted_Zone", "Camera_Zone", "Used_Zone")

colnames(foul_balls)
## [1] "MatchUP"        "Game_Date"      "Type_of_Hit"    "Exit_Velocity" 
## [5] "Predicted_Zone" "Camera_Zone"    "Used_Zone"
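The same renaming can also be done with a tidyverse verb. The following is a minimal sketch using dplyr's rename(), which takes new_name = old_name pairs. The small data frame raw is a hypothetical stand-in built from the first rows shown by head() above, so that the example runs without the CSV file.

```r
library(dplyr)

# Hypothetical stand-in for the raw file, copied from the first rows printed by head()
raw <- data.frame(
  matchup        = rep("Seattle Mariners VS Minnesota Twins", 3),
  game_date      = rep("2019-05-18", 3),
  type_of_hit    = c("Ground", "Fly", "Fly"),
  exit_velocity  = c(NA, NA, 56.9),
  predicted_zone = c(1, 4, 4),
  camera_zone    = c(1, NA, NA),
  used_zone      = c(1, 4, 4)
)

# rename() maps new_name = old_name and leaves the data itself unchanged
renamed <- raw %>%
  rename(
    MatchUP        = matchup,
    Game_Date      = game_date,
    Type_of_Hit    = type_of_hit,
    Exit_Velocity  = exit_velocity,
    Predicted_Zone = predicted_zone,
    Camera_Zone    = camera_zone,
    Used_Zone      = used_zone
  )

colnames(renamed)
```

Unlike assigning to names(), rename() matches columns by name rather than by position, so it still works if the column order of the file changes.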

Filtering Data

I filter the data down to the two outermost predicted zones (6 and 7) so that the exit velocities of those foul balls can be inspected.

foul_balls %>%
    filter(Predicted_Zone %in% c(6, 7))
##                                         MatchUP  Game_Date Type_of_Hit
## 1           Seattle Mariners VS Minnesota Twins 2019-05-18         Fly
## 2           Seattle Mariners VS Minnesota Twins 2019-05-18         Fly
## 3          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 4          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 5          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 6          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 7       Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01         Fly
## 8       Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01         Fly
## 9            Texas Rangers vs Toronto Blue Jays 2019-05-03         Fly
## 10           Texas Rangers vs Toronto Blue Jays 2019-05-03         Fly
## 11 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29         Fly
##    Exit_Velocity Predicted_Zone Camera_Zone Used_Zone
## 1          104.6              6           6         6
## 2           94.0              7           7         7
## 3          105.3              6           6         6
## 4           92.7              6           6         6
## 5          108.5              6           6         6
## 6          108.5              7           7         7
## 7           96.6              7           7         7
## 8          102.3              7           7         7
## 9           94.0              6           6         6
## 10          91.9              7           7         7
## 11         100.0              6           6         6
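The filter above lists the rows for zones 6 and 7 but does not yet put their exit velocities next to those of the other zones. One possible way to make that comparison, sketched here under assumptions, is to flag the outer zones with mutate() and summarise each group. The data frame foul_sample and the column Outer_Zone are hypothetical names, and the five rows are copied from the output printed earlier, so the numbers describe only this small sample and not the full dataset.

```r
library(dplyr)

# Hypothetical sample: five rows copied from the output shown earlier
foul_sample <- data.frame(
  Exit_Velocity  = c(56.9, 78.8, 104.6, 94.0, 105.3),
  Predicted_Zone = c(4, 1, 6, 7, 6)
)

zone_compare <- foul_sample %>%
  # TRUE for the two outermost zones, FALSE for every other zone
  mutate(Outer_Zone = Predicted_Zone %in% c(6, 7)) %>%
  group_by(Outer_Zone) %>%
  # Mean exit velocity and number of foul balls in each group
  summarise(
    Mean_Velocity = mean(Exit_Velocity, na.rm = TRUE),
    Balls         = n()
  )

zone_compare
```

Running the same pipeline on foul_balls instead of foul_sample would give the comparison for all 906 foul balls.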

Arranging Data

Since this is a large dataset, I sorted it by exit velocity in descending order and looked at the 20 highest values.

foul_balls %>%
    arrange(desc(Exit_Velocity)) %>%
    head(20)
##                                         MatchUP  Game_Date Type_of_Hit
## 1          Baltimore Orioles VS Minnesota Twins 2019-04-20        Line
## 2            Milwaukee Brewers vs New York Mets 2019-05-04         Fly
## 3          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 4          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 5            Texas Rangers vs Toronto Blue Jays 2019-05-03      Ground
## 6         New York Yankees vs Baltimore Orioles 2019-03-31         Fly
## 7       Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01      Ground
## 8           Seattle Mariners VS Minnesota Twins 2019-05-18      Ground
## 9          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 10         Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 11           Texas Rangers vs Toronto Blue Jays 2019-05-03         Fly
## 12          Seattle Mariners VS Minnesota Twins 2019-05-18         Fly
## 13 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29        Line
## 14                Oakland A's vs Houston Astros 2019-06-02      Ground
## 15         Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 16      Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01        Line
## 17      Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01         Fly
## 18        New York Yankees vs Baltimore Orioles 2019-03-31        Line
## 19           Milwaukee Brewers vs New York Mets 2019-05-04         Fly
## 20 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29        Line
##    Exit_Velocity Predicted_Zone Camera_Zone Used_Zone
## 1          110.6              4           4         4
## 2          108.7              4          NA         4
## 3          108.5              6           6         6
## 4          108.5              7           7         7
## 5          107.6              5          NA         5
## 6          107.1              5          NA         5
## 7          107.0              4           4         4
## 8          106.6              5           5         5
## 9          106.2              5           5         5
## 10         105.3              6           6         6
## 11         105.3              4           4         4
## 12         104.6              6           6         6
## 13         103.3              4           4         4
## 14         103.0              5           5         5
## 15         102.9              5          NA         5
## 16         102.7              5          NA         5
## 17         102.3              7           7         7
## 18         102.3              5           5         5
## 19         101.8              5           5         5
## 20         101.7              4           4         4
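The arrange() and head() combination above can also be written with slice_max(), available in dplyr 1.0.0 and later. The sketch below is illustrative only: foul_sample is a hypothetical five-row data frame built from rows printed earlier, and n = 3 is used instead of 20 to fit the sample. Setting with_ties = FALSE is a choice made here so that exactly n rows come back even when velocities are tied.

```r
library(dplyr)

# Hypothetical sample: rows copied from the output shown earlier
foul_sample <- data.frame(
  Type_of_Hit   = c("Line", "Fly", "Fly", "Ground", "Fly"),
  Exit_Velocity = c(110.6, 108.7, 56.9, NA, 78.8)
)

# Keep the three rows with the highest exit velocity;
# with_ties = FALSE caps the result at exactly n rows
top3 <- foul_sample %>%
  slice_max(Exit_Velocity, n = 3, with_ties = FALSE)

top3
```

With the default with_ties = TRUE, slice_max() may return more than n rows when the n-th value is tied, which differs from what head(20) does.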

Summarise

I used the summarise function from dplyr to compute summary statistics for the exit velocity, grouped by type of hit.

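# Drop every row with a missing value in any column
# (this also removes rows where only Camera_Zone is NA)
foul_info <- na.omit(foul_balls)

# Group the remaining rows by type of hit
foul_sum <- group_by(foul_info, Type_of_Hit)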

Summarising the minimum, maximum, median, and mean of “Exit_Velocity” for each type of hit.

foul_sum <- summarise(foul_sum,
                      Min = min(Exit_Velocity),
                      Max = max(Exit_Velocity),
                      Median = median(Exit_Velocity),
                      Mean = round(mean(Exit_Velocity), 1))
foul_sum <- as.data.frame(foul_sum)

foul_sum
##        Type_of_Hit  Min   Max Median Mean
## 1 Batter hits self 60.4  82.0   68.3 69.4
## 2              Fly 60.3 108.5   79.1 81.6
## 3           Ground 25.4 107.0   74.8 75.5
## 4             Line 39.9 110.6   82.6 82.1
## 5           Pop Up 69.7  90.7   77.5 77.9
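Because na.omit() removes a row when any column is missing, the table above leaves out foul balls that have an exit velocity but no camera zone. A possible variation, sketched below under assumptions, filters only on Exit_Velocity so those rows are kept. The data frame foul_sample and the result name foul_sum_alt are hypothetical, and the five rows are copied from output printed earlier, so the statistics describe the sample only.

```r
library(dplyr)

# Hypothetical sample: rows copied from the output shown earlier
foul_sample <- data.frame(
  Type_of_Hit   = c("Ground", "Fly", "Fly", "Fly", "Line"),
  Exit_Velocity = c(NA, 56.9, 78.8, 108.7, 110.6),
  Camera_Zone   = c(1, NA, 1, NA, 4)
)

foul_sum_alt <- foul_sample %>%
  # Drop only the rows that lack an exit velocity; a missing Camera_Zone is kept
  filter(!is.na(Exit_Velocity)) %>%
  group_by(Type_of_Hit) %>%
  summarise(
    Min    = min(Exit_Velocity),
    Max    = max(Exit_Velocity),
    Median = median(Exit_Velocity),
    Mean   = round(mean(Exit_Velocity), 1)
  )

foul_sum_alt
```

On this sample, na.omit() would keep two rows while the filter keeps four, so the two approaches can give different statistics on the full dataset as well.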