In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions.
FiveThirtyEight.com datasets.
Your task here is to Create an Example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset. (25 points)
Later (see next assignment below), you’ll be asked to extend an existing vignette. Using one of your classmates’ examples (as created above), you’ll then extend his or her example with additional annotated code. (15 points)
You should clone the provided repository. Once you have code to submit, you should make a pull request on the shared repository. You should also update the README.md file with your example.
After you’ve created your vignette, please submit your GitHub handle in the submission link provided below.
You should complete your submission on the schedule stated in the course syllabus.
The dataset I chose comes from the FiveThirtyEight.com article “We Watched 906 Foul Balls To Find Out Where The Most Dangerous Ones Land.”
library(tidyverse)
## -- Attaching packages ----------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts -------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
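# fileEncoding = "UTF-8-BOM" strips the byte-order mark that otherwise garbles the first column name
foul <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/foul-balls/foul-balls.csv", na.strings = c("NA", "NULL"), fileEncoding = "UTF-8-BOM")
head(foul)
## matchup game_date type_of_hit exit_velocity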
## 1 Seattle Mariners VS Minnesota Twins 2019-05-18 Ground NA
## 2 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly NA
## 3 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 56.9
## 4 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly 78.8
## 5 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly NA
## 6 Seattle Mariners VS Minnesota Twins 2019-05-18 Ground NA
## predicted_zone camera_zone used_zone
## 1 1 1 1
## 2 4 NA 4
## 3 4 NA 4
## 4 1 1 1
## 5 2 NA 2
## 6 1 1 1
names(foul)<-c("Matchup","Game_Date", "Type_of_Hit","Exit_Velocity", "Predicted_Zone", "Camera_Zone", "Used_Zone")
colnames(foul)
## [1] "Matchup" "Game_Date" "Type_of_Hit" "Exit_Velocity"
## [5] "Predicted_Zone" "Camera_Zone" "Used_Zone"
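Since this vignette is about dplyr, the same renaming can also be sketched with rename(), which pairs each new name with the old one instead of relying on column order. The data frame below is a hypothetical two-row stand-in (foul_raw is an illustrative name) built from rows of the head(foul) output above, so the block runs without downloading the file.

```r
library(dplyr)

# Hypothetical stand-in: two rows copied from the head(foul) output above,
# keeping the original lowercase column names
foul_raw <- data.frame(
  matchup = "Seattle Mariners VS Minnesota Twins",
  game_date = "2019-05-18",
  type_of_hit = c("Fly", "Fly"),
  exit_velocity = c(56.9, 78.8),
  predicted_zone = c(4L, 1L),
  camera_zone = c(NA, 1L),
  used_zone = c(4L, 1L)
)

# rename() takes new_name = old_name pairs, so a change in column order
# upstream cannot silently mislabel a column
foul_renamed <- foul_raw %>%
  rename(
    Matchup = matchup,
    Game_Date = game_date,
    Type_of_Hit = type_of_hit,
    Exit_Velocity = exit_velocity,
    Predicted_Zone = predicted_zone,
    Camera_Zone = camera_zone,
    Used_Zone = used_zone
  )

names(foul_renamed)
```

Unlike assigning to names(), rename() stops with an error if one of the old names is missing, which makes a mismatch easier to notice.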
To show where the designated zones are, I attached the stadium map from the article “We Watched 906 Foul Balls To Find Out Where The Most Dangerous Ones Land.”
Generic Stadium Map
I will use dplyr, one of the tidyverse packages.
As a huge baseball fan, I wanted to keep only the outermost predicted zones (6 and 7) and look at the exit velocities of the balls that landed there.
foul %>%
  filter(Predicted_Zone %in% c(6, 7))
## Matchup Game_Date Type_of_Hit
## 1 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly
## 2 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly
## 3 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 4 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 5 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 6 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 7 Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01 Fly
## 8 Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01 Fly
## 9 Texas Rangers vs Toronto Blue Jays 2019-05-03 Fly
## 10 Texas Rangers vs Toronto Blue Jays 2019-05-03 Fly
## 11 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29 Fly
## Exit_Velocity Predicted_Zone Camera_Zone Used_Zone
## 1 104.6 6 6 6
## 2 94.0 7 7 7
## 3 105.3 6 6 6
## 4 92.7 6 6 6
## 5 108.5 6 6 6
## 6 108.5 7 7 7
## 7 96.6 7 7 7
## 8 102.3 7 7 7
## 9 94.0 6 6 6
## 10 91.9 7 7 7
## 11 100.0 6 6 6
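To make the comparison between zone and exit velocity more direct, the filtered rows can be grouped by zone and summarised. The following is a minimal sketch, not part of the original analysis: zone_sample is a hypothetical data frame that copies only the Exit_Velocity and Predicted_Zone columns from the 11 rows printed above.

```r
library(dplyr)

# Hypothetical stand-in: the Exit_Velocity and Predicted_Zone columns
# of the 11 rows shown in the filter() output above
zone_sample <- data.frame(
  Exit_Velocity = c(104.6, 94.0, 105.3, 92.7, 108.5, 108.5,
                    96.6, 102.3, 94.0, 91.9, 100.0),
  Predicted_Zone = c(6L, 7L, 6L, 6L, 6L, 7L, 7L, 7L, 6L, 7L, 6L)
)

# Keep zones 6 and 7, then count the rows and average the velocity per zone
zone_summary <- zone_sample %>%
  filter(Predicted_Zone %in% c(6, 7)) %>%
  group_by(Predicted_Zone) %>%
  summarise(
    n = n(),
    Mean_Exit_Velocity = mean(Exit_Velocity),
    .groups = "drop"
  )

zone_summary
```

The %in% test keeps the condition short when more zones are added, and .groups = "drop" returns an ungrouped result without the informational message from summarise().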
Since the data set has too many rows to read through, I sorted it by exit velocity in descending order and looked at the 20 highest values.
foul %>%
arrange(desc(Exit_Velocity))%>%
head(20)
## Matchup Game_Date Type_of_Hit
## 1 Baltimore Orioles VS Minnesota Twins 2019-04-20 Line
## 2 Milwaukee Brewers vs New York Mets 2019-05-04 Fly
## 3 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 4 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 5 Texas Rangers vs Toronto Blue Jays 2019-05-03 Ground
## 6 New York Yankees vs Baltimore Orioles 2019-03-31 Fly
## 7 Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01 Ground
## 8 Seattle Mariners VS Minnesota Twins 2019-05-18 Ground
## 9 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 10 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 11 Texas Rangers vs Toronto Blue Jays 2019-05-03 Fly
## 12 Seattle Mariners VS Minnesota Twins 2019-05-18 Fly
## 13 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29 Line
## 14 Oakland A's vs Houston Astros 2019-06-02 Ground
## 15 Baltimore Orioles VS Minnesota Twins 2019-04-20 Fly
## 16 Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01 Line
## 17 Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01 Fly
## 18 New York Yankees vs Baltimore Orioles 2019-03-31 Line
## 19 Milwaukee Brewers vs New York Mets 2019-05-04 Fly
## 20 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29 Line
## Exit_Velocity Predicted_Zone Camera_Zone Used_Zone
## 1 110.6 4 4 4
## 2 108.7 4 NA 4
## 3 108.5 6 6 6
## 4 108.5 7 7 7
## 5 107.6 5 NA 5
## 6 107.1 5 NA 5
## 7 107.0 4 4 4
## 8 106.6 5 5 5
## 9 106.2 5 5 5
## 10 105.3 6 6 6
## 11 105.3 4 4 4
## 12 104.6 6 6 6
## 13 103.3 4 4 4
## 14 103.0 5 5 5
## 15 102.9 5 NA 5
## 16 102.7 5 NA 5
## 17 102.3 7 7 7
## 18 102.3 5 5 5
## 19 101.8 5 5 5
## 20 101.7 4 4 4
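dplyr 1.0.0 and later also provide slice_max(), which combines the sort and the cut-off in one step. The sketch below uses a hypothetical five-row sample (velo_sample) copied from the top of the table above, and asks for the top 3 rather than the top 20 to keep it short. One behaviour worth knowing: by default slice_max() keeps ties, so it can return more rows than requested.

```r
library(dplyr)

# Hypothetical stand-in: the five highest exit velocities from the table above
velo_sample <- data.frame(
  Type_of_Hit = c("Line", "Fly", "Fly", "Fly", "Ground"),
  Exit_Velocity = c(110.6, 108.7, 108.5, 108.5, 107.6)
)

# Default with_ties = TRUE: both 108.5 rows tie for third place and are kept
top3_with_ties <- velo_sample %>%
  slice_max(Exit_Velocity, n = 3)

# with_ties = FALSE returns exactly n rows, like arrange(desc()) %>% head(n)
top3_exact <- velo_sample %>%
  slice_max(Exit_Velocity, n = 3, with_ties = FALSE)

nrow(top3_with_ties)
nrow(top3_exact)
```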
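I used the summarise function from dplyr, together with group_by, to generate the minimum, maximum, median, and mean exit velocity for each type of hit. Note that na.omit drops every row with a missing value in any column, so rows with a missing Camera_Zone are excluded even when an exit velocity was recorded. For example, the 108.7 fly ball in the top-20 table above has no Camera_Zone and therefore does not appear in the Fly maximum below.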
foul_info<-na.omit(foul)
foul_sum <- group_by(foul_info, Type_of_Hit)
foul_sum <- summarise(foul_sum,Min=min(Exit_Velocity),Max = max(Exit_Velocity),Median=median(Exit_Velocity), Mean=round(mean(Exit_Velocity),1))
## `summarise()` ungrouping output (override with `.groups` argument)
foul_sum<-as.data.frame(foul_sum)
foul_sum
## Type_of_Hit Min Max Median Mean
## 1 Batter hits self 60.4 82.0 68.3 69.4
## 2 Fly 60.3 108.5 79.1 81.6
## 3 Ground 25.4 107.0 74.8 75.5
## 4 Line 39.9 110.6 82.6 82.1
## 5 Pop Up 69.7 90.7 77.5 77.9
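If the goal is to summarise every recorded exit velocity, one alternative is to drop only the rows where Exit_Velocity itself is missing. This is a minimal sketch under that assumption, not a rerun of the analysis: hit_sample is a hypothetical four-row data frame built from rows shown earlier in this vignette, chosen so that the two approaches give different answers.

```r
library(dplyr)

# Hypothetical stand-in built from rows printed earlier in this vignette:
# one Fly row has no Camera_Zone, one Ground row has no Exit_Velocity
hit_sample <- data.frame(
  Type_of_Hit = c("Fly", "Fly", "Line", "Ground"),
  Exit_Velocity = c(108.7, 108.5, 110.6, NA),
  Camera_Zone = c(NA, 6L, 4L, 1L)
)

# Drop rows only when the summarised column is missing
velocity_by_hit <- hit_sample %>%
  filter(!is.na(Exit_Velocity)) %>%
  group_by(Type_of_Hit) %>%
  summarise(
    Max = max(Exit_Velocity),
    Mean = round(mean(Exit_Velocity), 1),
    .groups = "drop"
  )

# For comparison: na.omit() also discards the Fly row without a Camera_Zone
complete_only <- na.omit(hit_sample) %>%
  group_by(Type_of_Hit) %>%
  summarise(Max = max(Exit_Velocity), .groups = "drop")

velocity_by_hit
complete_only
```

On this sample the Fly maximum differs between the two results, because the 108.7 row is complete in Exit_Velocity but not in Camera_Zone. Which rows to keep is a judgment call that depends on whether the zone columns matter for the question being asked.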