---
date: "11/30/2019"
output: html_document
---

library(knitr)
library(tidyverse)
## -- Attaching packages ----------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts -------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Read Data using readr

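We read the CSV from a fixed commit so that the input does not change underneath the analysis. This version of the file has a data issue, which is handled in the height-parsing step below.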

df <- read_csv("https://raw.githubusercontent.com/willoutcault/Data607-Data-Acquisition/975ba8d48e0c3d8590bd9afa79e75632152aabfc/runningbacks%20consolidated.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   Team = col_character(),
##   DisplayName = col_character(),
##   GameClock = col_time(format = ""),
##   PossessionTeam = col_character(),
##   OffenseFormation = col_character(),
##   OffensePersonnel = col_character(),
##   DefensePersonnel = col_character(),
##   PlayDirection = col_character(),
##   TimeHandoff = col_datetime(format = ""),
##   TimeSnap = col_datetime(format = ""),
##   PlayerHeight = col_character(),
##   Stadium = col_character()
## )
## See spec(...) for full column specifications.

Select Factors of Interest

df2 <- select(df, "DisplayName", "PlayerHeight","PossessionTeam", "PlayerWeight")

Separate Feet and Inches in Height Column

The height column has been misparsed as a date: a height of 6 feet 2 inches appears as 02-Jun, with the month encoding the feet and the day encoding the inches. We can use lubridate to recover the values, with one wrinkle: someone exactly 6 feet tall appears as Jun-00, and day 00 is not a valid date. We therefore use stringr to replace 00 with 13 before parsing; 13 is a safe sentinel because a valid inches value never exceeds 11. After parsing, the month component gives the feet. The day component gives the inches, except that a day of 13 is mapped back to 0.

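# Swap the invalid day "00" for the sentinel "13", then parse both layouts (day-month and month-day)
df2$playerHeightDate <- lubridate::parse_date_time(str_replace(df2$PlayerHeight, "00", "13"), c("d-b", "b-d"))
df3 <- df2
# Month = feet; day = inches, with the sentinel 13 mapped back to 0
df3$feet <- lubridate::month(df2$playerHeightDate)
df3$inches <- ifelse(lubridate::day(df2$playerHeightDate) == 13, 0, lubridate::day(df2$playerHeightDate))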
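
As a cross-check on the lubridate approach, the same recovery can be sketched without any date parsing, which also avoids any dependence on how month abbreviations are interpreted. The helper name `height_to_inches` and the sample strings are illustrative assumptions, not part of the original analysis. The sketch only assumes that each value is an English three-letter month abbreviation and a two-digit day joined by a hyphen, in either order.

```r
# Hypothetical helper (not in the original analysis): convert a height that
# was mangled into a date string back to total inches.
height_to_inches <- function(x) {
  month_part <- gsub("[^A-Za-z]", "", x)  # letters only, e.g. "Jun"
  day_part   <- gsub("[^0-9]", "", x)     # digits only, e.g. "02" or "00"
  feet   <- match(month_part, month.abb)  # position in Jan..Dec, "Jun" -> 6
  inches <- as.integer(day_part)          # "02" -> 2, "00" -> 0
  feet * 12L + inches
}

# Made-up sample values in both layouts
sample_heights <- c("02-Jun", "Jun-00", "08-May")
height_to_inches(sample_heights)
```

Because `00` is read here as a plain integer rather than a day of the month, the sentinel replacement is not needed in this variant.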

Create Column with Total Inches

df4 <- df3 %>% mutate(Height = (feet*12) + inches) %>%
  select("DisplayName", "Height", "PlayerWeight", "PossessionTeam")

Use BMI Formula to Calculate Player’s BMI

df5 <- df4 %>%
  mutate(bmi = 703*PlayerWeight/(Height^2))
kable(head(df5))
DisplayName    Height  PlayerWeight  PossessionTeam       bmi
Matt Ryan          76           217  ATL             26.41118
Andy Levitre       74           303  ATL             38.89865
Alex Mack          76           311  ATL             37.85197
Brandon Fusco      76           306  ATL             37.24342
Julio Jones        75           220  ATL             27.49511
Logan Paulsen      77           268  ATL             31.77669
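
The constant 703 in the formula comes from converting pounds and inches to kilograms and meters. The sketch below checks Matt Ryan's row (76 in, 217 lb) against the metric definition of BMI. The helper names `bmi_imperial` and `bmi_metric` are hypothetical and not part of the original analysis.

```r
# Hypothetical helpers for illustration
bmi_imperial <- function(weight_lb, height_in) 703 * weight_lb / height_in^2
bmi_metric   <- function(weight_kg, height_m) weight_kg / height_m^2

lb_to_kg <- 0.45359237  # kilograms per pound (exact by definition)
in_to_m  <- 0.0254      # meters per inch (exact by definition)

# Matt Ryan's row: 76 in, 217 lb
imperial <- bmi_imperial(217, 76)
metric   <- bmi_metric(217 * lb_to_kg, 76 * in_to_m)

# The unrounded conversion factor is lb_to_kg / in_to_m^2, roughly 703.07
exact_factor <- lb_to_kg / in_to_m^2
print(c(imperial = imperial, metric = metric, factor = exact_factor))
```

The two BMI values differ by less than 0.01; the small gap comes from rounding the conversion factor to 703.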
df5 %>% ggplot(aes(bmi)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
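
The `stat_bin()` message can be addressed by choosing a bin width explicitly. The sketch below uses a small made-up data frame (`toy_bmi`, hypothetical values) and a width of 1 BMI unit. That width is an arbitrary choice for illustration, not a recommendation from the original analysis.

```r
library(ggplot2)

# Made-up BMI values for illustration only
toy_bmi <- data.frame(bmi = c(24.8, 26.4, 27.5, 31.8, 37.2, 37.9, 38.9))

# An explicit binwidth replaces the default of 30 bins
p <- ggplot(toy_bmi, aes(bmi)) +
  geom_histogram(binwidth = 1) +
  labs(x = "BMI", y = "Number of rows")
print(p)
```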

Extended part by Tony Mei

  1. Convert the height of each player from inches to centimeters (metric system)
height_in_cm <- df5 %>% mutate(Height_in_cm = Height * 2.54)
height_in_cm
## # A tibble: 1,999 x 6
##    DisplayName    Height PlayerWeight PossessionTeam   bmi Height_in_cm
##    <chr>           <dbl>        <dbl> <chr>          <dbl>        <dbl>
##  1 Matt Ryan          76          217 ATL             26.4         193.
##  2 Andy Levitre       74          303 ATL             38.9         188.
##  3 Alex Mack          76          311 ATL             37.9         193.
##  4 Brandon Fusco      76          306 ATL             37.2         193.
##  5 Julio Jones        75          220 ATL             27.5         190.
##  6 Logan Paulsen      77          268 ATL             31.8         196.
##  7 Mohamed Sanu       74          215 ATL             27.6         188.
##  8 Ryan Schraeder     79          300 ATL             33.8         201.
##  9 Devonta Freem~     68          206 ATL             31.3         173.
## 10 Jake Matthews      77          309 ATL             36.6         196.
## # ... with 1,989 more rows
  2. Find out which team has the tallest players on average
tallest_players <- height_in_cm %>%
  group_by(PossessionTeam) %>%
  summarise(Total=mean(Height_in_cm)) %>%
  arrange(desc(Total))
tallest_players
## # A tibble: 6 x 2
##   PossessionTeam Total
##   <chr>          <dbl>
## 1 PIT             190.
## 2 CLV             189.
## 3 BLT             189.
## 4 ATL             189.
## 5 BUF             188.
## 6 PHI             187.

Based on the analysis above, the Pittsburgh Steelers have the tallest players on average (about 190 cm) among the 6 teams.
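
One caveat: the mean above is taken over rows. If a player appears in more than one row (an assumption about this dataset that is not verified here), each team's mean is weighted by how often its players appear. The sketch below uses made-up data (`toy`, hypothetical names and values) to show how a row-level mean and a per-player mean can differ.

```r
library(dplyr)

# Made-up data: player "A" appears three times, the others once
toy <- data.frame(
  DisplayName    = c("A", "A", "A", "B", "C", "D"),
  PossessionTeam = c("X", "X", "X", "X", "Y", "Y"),
  Height         = c(70, 70, 70, 76, 72, 74),
  stringsAsFactors = FALSE
)

# Mean over rows: repeated players count once per appearance
row_means <- toy %>%
  group_by(PossessionTeam) %>%
  summarise(mean_height_cm = mean(Height * 2.54))

# Mean over distinct players: each player counts once
player_means <- toy %>%
  distinct(DisplayName, PossessionTeam, Height) %>%
  group_by(PossessionTeam) %>%
  summarise(mean_height_cm = mean(Height * 2.54))

row_means
player_means
```

In this toy example, team X has a lower row-level mean than per-player mean because its shortest player is repeated, while team Y is unaffected.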

  3. Find out which team has the heaviest players on average.
heaviest_players<-height_in_cm %>% 
  group_by(PossessionTeam) %>% 
  summarise(Total=mean(PlayerWeight)) %>% 
  arrange(desc(Total))
heaviest_players
## # A tibble: 6 x 2
##   PossessionTeam Total
##   <chr>          <dbl>
## 1 PIT             260.
## 2 CLV             254.
## 3 BLT             253.
## 4 BUF             252.
## 5 ATL             251.
## 6 PHI             248.

Based on the analysis above, the Pittsburgh Steelers also have the heaviest players on average (about 260 lb) among the 6 teams. Taken together, the two summaries show that the Pittsburgh Steelers have both the tallest and the heaviest players of the 6 teams, on average.
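
Both rankings can also be produced in a single pass, which makes it easier to compare teams side by side. This is a sketch on made-up data; the data frame `toy_teams` and the column names `mean_height_cm` and `mean_weight_lb` are hypothetical and do not appear in the original analysis.

```r
library(dplyr)

# Made-up data for illustration only
toy_teams <- data.frame(
  PossessionTeam = c("X", "X", "Y", "Y"),
  Height         = c(74, 76, 72, 70),      # inches
  PlayerWeight   = c(250, 300, 220, 210),  # pounds
  stringsAsFactors = FALSE
)

# One summarise() call computes both averages per team
team_summary <- toy_teams %>%
  group_by(PossessionTeam) %>%
  summarise(
    mean_height_cm = mean(Height * 2.54),
    mean_weight_lb = mean(PlayerWeight)
  ) %>%
  arrange(desc(mean_height_cm))

team_summary
```

Naming the summary columns after the statistic and unit they hold keeps the output readable without referring back to the code.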