Assignment

In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions.

GitHub repository

FiveThirtyEight.com datasets.

Kaggle datasets

Your task here is to Create an Example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset. (25 points)

Later (see next assignment below), you’ll be asked to extend an existing vignette. Using one of your classmates’ examples (as created above), you’ll then extend that example with additional annotated code. (15 points)

You should clone the provided repository. Once you have code to submit, you should make a pull request on the shared repository. You should also update the README.md file with your example.

After you’ve created your vignette, please submit your GitHub handle using the submission link provided below.

You should complete your submission on the schedule stated in the course syllabus.

Data

The dataset I chose comes from the FiveThirtyEight.com article “We Watched 906 Foul Balls To Find Out Where The Most Dangerous Ones Land.”

library(tidyverse)
## -- Attaching packages ----------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts -------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Reading the Data from GitHub

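# fileEncoding = "UTF-8-BOM" strips the byte-order mark that otherwise garbles the first column name
foul <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/foul-balls/foul-balls.csv",
                 na.strings = c("NA", "NULL"), fileEncoding = "UTF-8-BOM")

head(foul)
##                               matchup  game_date type_of_hit exit_velocity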
## 1 Seattle Mariners VS Minnesota Twins 2019-05-18      Ground            NA
## 2 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly            NA
## 3 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly          56.9
## 4 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly          78.8
## 5 Seattle Mariners VS Minnesota Twins 2019-05-18         Fly            NA
## 6 Seattle Mariners VS Minnesota Twins 2019-05-18      Ground            NA
##   predicted_zone camera_zone used_zone
## 1              1           1         1
## 2              4          NA         4
## 3              4          NA         4
## 4              1           1         1
## 5              2          NA         2
## 6              1           1         1
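
The `na.strings` argument used above tells `read.csv()` which literal strings to treat as missing values. As a minimal sketch, the snippet below reads a small inline CSV whose values are made up for illustration (they are not rows from the foul-ball file) and shows that both "NA" and "NULL" become `NA`, so the affected columns stay numeric.

```r
# Toy CSV text; the values are invented for illustration only.
csv_text <- "type_of_hit,exit_velocity,camera_zone
Fly,56.9,NULL
Ground,NA,1
Line,78.8,4"

# Treat both "NA" and "NULL" as missing while parsing.
toy <- read.csv(text = csv_text, na.strings = c("NA", "NULL"))

# Inspect the column types and count the missing values per column.
str(toy)
colSums(is.na(toy))
```

Without `na.strings`, the "NULL" entry would be kept as text and the whole `camera_zone` column would be read as character rather than integer.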

Renaming Data Columns

names(foul)<-c("Matchup","Game_Date", "Type_of_Hit","Exit_Velocity", "Predicted_Zone", "Camera_Zone", "Used_Zone")

colnames(foul)
## [1] "Matchup"        "Game_Date"      "Type_of_Hit"    "Exit_Velocity" 
## [5] "Predicted_Zone" "Camera_Zone"    "Used_Zone"
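
Assigning to `names()` replaces every column name by position. A tidyverse alternative is `dplyr::rename()`, which maps names explicitly. The sketch below uses a small made-up data frame (hypothetical values) with two of the original lower-case column names.

```r
library(dplyr)

# Toy data frame with two of the original lower-case column names;
# the values are invented for illustration.
toy <- data.frame(type_of_hit   = c("Fly", "Ground"),
                  exit_velocity = c(56.9, 78.8))

# rename() takes new_name = old_name pairs and leaves other columns untouched.
toy_renamed <- toy %>%
  rename(Type_of_Hit   = type_of_hit,
         Exit_Velocity = exit_velocity)

colnames(toy_renamed)
```

Because each new name is tied to an old name, this form does not silently mislabel columns if their order in the source file changes.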

To help visualize the numbered zones, here is the stadium diagram from the article “We Watched 906 Foul Balls To Find Out Where The Most Dangerous Ones Land.”

Generic Stadium Map

Dplyr

I will use dplyr, one of the core packages in the tidyverse.

filter()

As a huge baseball fan, I wanted to keep only the foul balls in the outermost predicted zones (6 and 7) and look at their exit velocities.

foul %>%
    filter(Predicted_Zone %in% c(6, 7))
##                                         Matchup  Game_Date Type_of_Hit
## 1           Seattle Mariners VS Minnesota Twins 2019-05-18         Fly
## 2           Seattle Mariners VS Minnesota Twins 2019-05-18         Fly
## 3          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 4          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 5          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 6          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 7       Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01         Fly
## 8       Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01         Fly
## 9            Texas Rangers vs Toronto Blue Jays 2019-05-03         Fly
## 10           Texas Rangers vs Toronto Blue Jays 2019-05-03         Fly
## 11 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29         Fly
##    Exit_Velocity Predicted_Zone Camera_Zone Used_Zone
## 1          104.6              6           6         6
## 2           94.0              7           7         7
## 3          105.3              6           6         6
## 4           92.7              6           6         6
## 5          108.5              6           6         6
## 6          108.5              7           7         7
## 7           96.6              7           7         7
## 8          102.3              7           7         7
## 9           94.0              6           6         6
## 10          91.9              7           7         7
## 11         100.0              6           6         6
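
To relate zone and exit velocity more directly, `filter()` can take several conditions at once. The sketch below, run on a small made-up data frame (hypothetical values, with column names matching the renamed dataset), keeps only outer-zone balls hit harder than 100 mph. The 100 mph cutoff is an arbitrary choice for illustration.

```r
library(dplyr)

# Toy data; values are invented for illustration only.
toy <- data.frame(
  Type_of_Hit    = c("Fly", "Ground", "Fly", "Line"),
  Exit_Velocity  = c(104.6, 75.0, 94.0, NA),
  Predicted_Zone = c(6, 1, 7, 6)
)

# Conditions separated by commas are combined with AND.
# Rows where a condition evaluates to NA (here the missing velocity) are dropped.
hard_outer <- toy %>%
  filter(Predicted_Zone %in% c(6, 7), Exit_Velocity > 100)

hard_outer
```

Only the first row survives: the second is in zone 1, the third is below the cutoff, and the fourth has no recorded exit velocity.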

arrange()

The dataset is too long to scan by eye, so I sorted it by exit velocity in descending order and looked at the 20 highest values.

foul %>%
    arrange(desc(Exit_Velocity))%>%
    head(20)
##                                         Matchup  Game_Date Type_of_Hit
## 1          Baltimore Orioles VS Minnesota Twins 2019-04-20        Line
## 2            Milwaukee Brewers vs New York Mets 2019-05-04         Fly
## 3          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 4          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 5            Texas Rangers vs Toronto Blue Jays 2019-05-03      Ground
## 6         New York Yankees vs Baltimore Orioles 2019-03-31         Fly
## 7       Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01      Ground
## 8           Seattle Mariners VS Minnesota Twins 2019-05-18      Ground
## 9          Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 10         Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 11           Texas Rangers vs Toronto Blue Jays 2019-05-03         Fly
## 12          Seattle Mariners VS Minnesota Twins 2019-05-18         Fly
## 13 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29        Line
## 14                Oakland A's vs Houston Astros 2019-06-02      Ground
## 15         Baltimore Orioles VS Minnesota Twins 2019-04-20         Fly
## 16      Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01        Line
## 17      Pittsburgh Pirates VS Milwaukee Brewers 2019-06-01         Fly
## 18        New York Yankees vs Baltimore Orioles 2019-03-31        Line
## 19           Milwaukee Brewers vs New York Mets 2019-05-04         Fly
## 20 Los Angeles Dodgers vs Arizona Diamondsbacks 2019-03-29        Line
##    Exit_Velocity Predicted_Zone Camera_Zone Used_Zone
## 1          110.6              4           4         4
## 2          108.7              4          NA         4
## 3          108.5              6           6         6
## 4          108.5              7           7         7
## 5          107.6              5          NA         5
## 6          107.1              5          NA         5
## 7          107.0              4           4         4
## 8          106.6              5           5         5
## 9          106.2              5           5         5
## 10         105.3              6           6         6
## 11         105.3              4           4         4
## 12         104.6              6           6         6
## 13         103.3              4           4         4
## 14         103.0              5           5         5
## 15         102.9              5          NA         5
## 16         102.7              5          NA         5
## 17         102.3              7           7         7
## 18         102.3              5           5         5
## 19         101.8              5           5         5
## 20         101.7              4           4         4
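
One detail worth knowing about `arrange()` is that missing values are always sorted to the end, even when the column is wrapped in `desc()`. That is why the top rows above contain no `NA` exit velocities. A minimal sketch on made-up data (hypothetical values):

```r
library(dplyr)

# Toy data; values are invented for illustration only.
toy <- data.frame(
  Type_of_Hit   = c("Fly", "Ground", "Line", "Fly"),
  Exit_Velocity = c(94.0, NA, 110.6, 108.5)
)

# Sort from highest to lowest exit velocity; the NA row lands last.
sorted <- toy %>%
  arrange(desc(Exit_Velocity))

# Keep the two hardest-hit balls.
top2 <- head(sorted, 2)

sorted
top2
```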

summarise()

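I used the summarise() function from dplyr to compute the minimum, maximum, median, and mean exit velocity for each type of hit. Note that na.omit() drops every row with a missing value in any column, so rows without a Camera_Zone are excluded even when an exit velocity was recorded.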

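# Drop rows with a missing value in any column
foul_info <- na.omit(foul)
# Group by hit type, then compute exit-velocity statistics for each group
foul_sum <- group_by(foul_info, Type_of_Hit)
foul_sum <- summarise(foul_sum, Min = min(Exit_Velocity), Max = max(Exit_Velocity), Median = median(Exit_Velocity), Mean = round(mean(Exit_Velocity), 1))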
## `summarise()` ungrouping output (override with `.groups` argument)
foul_sum<-as.data.frame(foul_sum)
foul_sum
##        Type_of_Hit  Min   Max Median Mean
## 1 Batter hits self 60.4  82.0   68.3 69.4
## 2              Fly 60.3 108.5   79.1 81.6
## 3           Ground 25.4 107.0   74.8 75.5
## 4             Line 39.9 110.6   82.6 82.1
## 5           Pop Up 69.7  90.7   77.5 77.9
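
The same kind of summary can be written as a single pipeline. The sketch below is an alternative, not the code that produced the table above: it drops only rows with a missing `Exit_Velocity` rather than rows with a missing value in any column, and it sets `.groups = "drop"` to silence the ungrouping message. The data frame is made up for illustration (hypothetical values).

```r
library(dplyr)

# Toy data; values are invented for illustration only.
toy <- data.frame(
  Type_of_Hit   = c("Fly", "Fly", "Ground", "Ground", "Ground"),
  Exit_Velocity = c(80, 100, 70, NA, 90),
  Camera_Zone   = c(NA, 4, 1, 1, NA)
)

toy_sum <- toy %>%
  filter(!is.na(Exit_Velocity)) %>%   # drop only rows lacking an exit velocity
  group_by(Type_of_Hit) %>%
  summarise(
    N      = n(),
    Min    = min(Exit_Velocity),
    Max    = max(Exit_Velocity),
    Median = median(Exit_Velocity),
    Mean   = round(mean(Exit_Velocity), 1),
    .groups = "drop"                  # return an ungrouped tibble
  )

toy_sum

# For comparison: na.omit() keeps only fully complete rows.
nrow(na.omit(toy))
```

In this toy example the pipeline keeps four rows (two per hit type), while `na.omit()` would keep only two, which illustrates how much the choice of missing-value handling can change the statistics.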