In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions.
GitHub repository: https://github.com/pkowalchuk/SPRING2024TIDYVERSE
Your task here is to Create an Example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset.
The dataset I chose for this assignment comes from FiveThirtyEight.com and accompanies the article “We Watched 906 Foul Balls To Find Out Where The Most Dangerous Ones Land”. Here is the link to the article: https://fivethirtyeight.com/features/we-watched-906-foul-balls-to-find-out-where-the-most-dangerous-ones-land/
I downloaded the data from: https://github.com/fivethirtyeight/data/blob/master/foul-balls/foul-balls.csv
library(tidyverse)  # attaches dplyr and the %>% pipe used below
foul_balls <- read.csv("foul-balls.csv")
# Summary of the data
summary(foul_balls)
## matchup game_date type_of_hit exit_velocity
## Length:906 Length:906 Length:906 Min. : 25.4
## Class :character Class :character Class :character 1st Qu.: 69.7
## Mode :character Mode :character Mode :character Median : 75.7
## Mean : 76.4
## 3rd Qu.: 81.7
## Max. :110.6
## NA's :326
## predicted_zone camera_zone used_zone
## Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :3.000 Median :1.000 Median :3.000
## Mean :3.038 Mean :2.369 Mean :3.058
## 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.:5.000
## Max. :7.000 Max. :7.000 Max. :7.000
## NA's :513
# Display the first six rows of the dataset
head(foul_balls)
## matchup game_date type_of_hit exit_velocity
## 1 Seattle Mariners VS Minnesota Twins 2019-05-18 Ground NA
## 2 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly NA
## 3 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 56.9
## 4 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 78.8
## 5 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly NA
## 6 Seattle Mariners VS Minnesota Twins 2019-05-18 Ground NA
## predicted_zone camera_zone used_zone
## 1 1 1 1
## 2 4 NA 4
## 3 4 NA 4
## 4 1 1 1
## 5 2 NA 2
## 6 1 1 1
names(foul_balls)<-c("MatchUP", "Game_Date", "Type_of_Hit", "Exit_Velocity", "Predicted_Zone", "Camera_Zone", "Used_Zone")
colnames(foul_balls)
## [1] "MatchUP" "Game_Date" "Type_of_Hit" "Exit_Velocity"
## [5] "Predicted_Zone" "Camera_Zone" "Used_Zone"
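The renaming step above uses base R's names<-. As a hedged alternative, the same result can be sketched with dplyr's rename(), which maps each new name to an old one explicitly and so does not depend on column order. The toy data frame below is hypothetical and only mimics three of the original column names.

```r
library(dplyr)

# Hypothetical one-row data frame; not the real foul-balls data.
toy <- data.frame(
  matchup       = "Seattle Mariners VS Minnesota Twins",
  game_date     = "2019-05-18",
  exit_velocity = 78.8
)

# rename() takes new_name = old_name pairs.
renamed <- toy %>%
  rename(
    MatchUP       = matchup,
    Game_Date     = game_date,
    Exit_Velocity = exit_velocity
  )

names(renamed)
```

Because each pair is spelled out, a reordered or extra column in the CSV would not silently shift the labels.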
I am filtering the data to the two outermost predicted zones (6 and 7) to see the exit velocities of the foul balls that landed there.
foul_balls %>%
  filter(Predicted_Zone %in% c(6, 7))
## MatchUP Game_Date Type_of_Hit
## 1 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly
## 2 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly
## 3 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 4 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 5 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 6 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 7 Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01 Fly
## 8 Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01 Fly
## 9 Texas Rangers vs Toronto Blue Jays 2019-05-03 Fly
## 10 Texas Rangers vs Toronto Blue Jays 2019-05-03 Fly
## 11 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29 Fly
## Exit_Velocity Predicted_Zone Camera_Zone Used_Zone
## 1 104.6 6 6 6
## 2 94.0 7 7 7
## 3 105.3 6 6 6
## 4 92.7 6 6 6
## 5 108.5 6 6 6
## 6 108.5 7 7 7
## 7 96.6 7 7 7
## 8 102.3 7 7 7
## 9 94.0 6 6 6
## 10 91.9 7 7 7
## 11 100.0 6 6 6
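To make the comparison with exit velocity explicit, one option is to label each row as outer (zones 6 and 7) or inner and summarise by that label. This is only a sketch: the toy data frame, the Zone_Group column, and the zone_compare name are assumptions introduced for illustration, not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy rows that mimic the renamed columns; not the real data.
toy <- data.frame(
  Exit_Velocity  = c(104.6, 94.0, 75.2, NA, 68.1, 91.9),
  Predicted_Zone = c(6, 7, 1, 4, 2, 7)
)

# Drop rows without a measured velocity, tag the zone group,
# then count rows and average the velocity within each group.
zone_compare <- toy %>%
  filter(!is.na(Exit_Velocity)) %>%
  mutate(Zone_Group = if_else(Predicted_Zone %in% c(6, 7), "outer", "inner")) %>%
  group_by(Zone_Group) %>%
  summarise(N = n(), Mean_Velocity = mean(Exit_Velocity))

zone_compare
```

Running the same pipeline on foul_balls instead of toy would give the comparison for the real data.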
Since the data set has 906 rows, I look only at the 20 highest exit velocities.
foul_balls %>%
  arrange(desc(Exit_Velocity)) %>%
  head(20)
## MatchUP Game_Date Type_of_Hit
## 1 Baltimore Orioles VS Minnesota Twins 2019-04-20 Line
## 2 Milwaukee Brewers vs New York Mets 2019-05-04 Fly
## 3 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 4 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 5 Texas Rangers vs Toronto Blue Jays 2019-05-03 Ground
## 6 New York Yankees vs Baltimore Orioles 2019-03-31 Fly
## 7 Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01 Ground
## 8 Seattle Mariners VS Minnesota Twins 2019-05-18 Ground
## 9 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 10 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 11 Texas Rangers vs Toronto Blue Jays 2019-05-03 Fly
## 12 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly
## 13 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29 Line
## 14 Oakland A's vs Houston Astros 2019-06-02 Ground
## 15 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 16 Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01 Line
## 17 Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01 Fly
## 18 New York Yankees vs Baltimore Orioles 2019-03-31 Line
## 19 Milwaukee Brewers vs New York Mets 2019-05-04 Fly
## 20 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29 Line
## Exit_Velocity Predicted_Zone Camera_Zone Used_Zone
## 1 110.6 4 4 4
## 2 108.7 4 NA 4
## 3 108.5 6 6 6
## 4 108.5 7 7 7
## 5 107.6 5 NA 5
## 6 107.1 5 NA 5
## 7 107.0 4 4 4
## 8 106.6 5 5 5
## 9 106.2 5 5 5
## 10 105.3 6 6 6
## 11 105.3 4 4 4
## 12 104.6 6 6 6
## 13 103.3 4 4 4
## 14 103.0 5 5 5
## 15 102.9 5 NA 5
## 16 102.7 5 NA 5
## 17 102.3 7 7 7
## 18 102.3 5 5 5
## 19 101.8 5 5 5
## 20 101.7 4 4 4
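The arrange() plus head() pattern above can also be written with dplyr's slice_max(), which selects the rows with the largest values of a column in one step. The sketch below uses a hypothetical toy data frame and removes missing velocities first, so the result does not depend on how a given dplyr version orders NA values.

```r
library(dplyr)

# Hypothetical toy rows; not the real foul-balls data.
toy <- data.frame(
  Type_of_Hit   = c("Line", "Fly", "Ground", "Fly", "Pop Up"),
  Exit_Velocity = c(110.6, NA, 107.6, 108.7, 77.5)
)

# Keep the 3 rows with the highest exit velocity.
# with_ties = FALSE returns exactly n rows even when values are tied.
top3 <- toy %>%
  filter(!is.na(Exit_Velocity)) %>%
  slice_max(Exit_Velocity, n = 3, with_ties = FALSE)

top3
```

With the default with_ties = TRUE, tied values at the cutoff would all be kept, so the result could contain more than n rows.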
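I used the summarise function from dplyr to compute statistics for the exit velocity, grouped by type of hit. Note that na.omit drops every row with a missing value in any column, so rows that are missing only Camera_Zone are excluded as well. This is why the maximums below for Fly (108.5) and Ground (107.0) are lower than the 108.7 and 107.6 that appear in the top-20 list.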
foul_info <- na.omit(foul_balls)
foul_sum <- group_by(foul_info, Type_of_Hit)
Summarising the minimum, maximum, median, and mean of “Exit_Velocity” for each type of hit.
foul_sum <- summarise(foul_sum, Min = min(Exit_Velocity), Max = max(Exit_Velocity),
                      Median = median(Exit_Velocity), Mean = round(mean(Exit_Velocity), 1))
foul_sum<-as.data.frame(foul_sum)
foul_sum
## Type_of_Hit Min Max Median Mean
## 1 Batter hits self 60.4 82.0 68.3 69.4
## 2 Fly 60.3 108.5 79.1 81.6
## 3 Ground 25.4 107.0 74.8 75.5
## 4 Line 39.9 110.6 82.6 82.1
## 5 Pop Up 69.7 90.7 77.5 77.9
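The summary above relies on na.omit, which removes a row when any column is missing. If the goal is to keep every row that has a measured exit velocity, a possible variation is to filter on that column alone. The sketch below uses a hypothetical toy data frame to show the difference; velocity_sum and the N column are illustrative names, not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy rows; not the real foul-balls data.
toy <- data.frame(
  Type_of_Hit   = c("Fly", "Fly", "Ground", "Ground", "Line"),
  Exit_Velocity = c(108.7, 79.1, 107.6, NA, 82.6),
  Camera_Zone   = c(NA, 4, NA, 1, 4)
)

# na.omit() keeps only rows that are complete in every column.
nrow(na.omit(toy))  # 2

# Filtering on Exit_Velocity alone keeps rows that lack only Camera_Zone.
velocity_sum <- toy %>%
  filter(!is.na(Exit_Velocity)) %>%
  group_by(Type_of_Hit) %>%
  summarise(
    N      = n(),
    Min    = min(Exit_Velocity),
    Max    = max(Exit_Velocity),
    Median = median(Exit_Velocity),
    Mean   = round(mean(Exit_Velocity), 1)
  )

velocity_sum
```

In the toy data, na.omit would discard the two fastest balls because their Camera_Zone is missing, while the targeted filter keeps them in the statistics.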