mkolp0-"
date: "11/30/2019"
output: html_document
---
library(knitr)
library(tidyverse)
## -- Attaching packages ----------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts -------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
We read the file from a fixed commit because this version of the data contains the issue we want to work around.
df <- read_csv("https://raw.githubusercontent.com/willoutcault/Data607-Data-Acquisition/975ba8d48e0c3d8590bd9afa79e75632152aabfc/runningbacks%20consolidated.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## .default = col_double(),
## Team = col_character(),
## DisplayName = col_character(),
## GameClock = col_time(format = ""),
## PossessionTeam = col_character(),
## OffenseFormation = col_character(),
## OffensePersonnel = col_character(),
## DefensePersonnel = col_character(),
## PlayDirection = col_character(),
## TimeHandoff = col_datetime(format = ""),
## TimeSnap = col_datetime(format = ""),
## PlayerHeight = col_character(),
## Stadium = col_character()
## )
## See spec(...) for full column specifications.
df2 <- select(df, "DisplayName", "PlayerHeight","PossessionTeam", "PlayerWeight")
The height column has been misparsed as a date (e.g. 02-Jun for 6 feet 2 inches). We can use lubridate to recover the original values, but it takes some care: someone who is exactly 6 feet tall appears as Jun-00, and 00 is not a valid day of the month. So we first replace the 00 with 13 using stringr; 13 is safe as a placeholder because a real inches value is never above 11. We then take the month component as feet. For the inches component, a day of 13 means 0 inches; otherwise we use the day directly.
df2$playerHeightDate <- lubridate::parse_date_time(str_replace(df2$PlayerHeight,"00","13"), c("d-b","b-d"))
df3 <- df2
df3$feet <- lubridate::month(df2$playerHeightDate)
df3$inches <- ifelse(lubridate::day(df2$playerHeightDate)==13,0,lubridate::day(df2$playerHeightDate))
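As a cross-check on the lubridate approach, the same recovery can be sketched in base R by splitting each string on the hyphen and looking the month abbreviation up in `month.abb`. The helper name `height_to_inches` is hypothetical, and the sketch assumes every value looks like `02-Jun` or `Jun-00` with English month abbreviations.

```r
# Hypothetical helper: convert misparsed height strings to total inches.
# Assumes one part is an English month abbreviation (feet) and the other
# is a number (inches); anything else becomes NA.
height_to_inches <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2) return(NA_real_)
    is_month <- p %in% month.abb
    if (sum(is_month) != 1) return(NA_real_)
    feet <- match(p[is_month], month.abb)   # "Jun" -> 6
    inches <- as.numeric(p[!is_month])      # "02" -> 2, "00" -> 0
    feet * 12 + inches
  }, numeric(1))
}

height_to_inches(c("02-Jun", "Jun-00", "11-May"))  # 74 72 71
```

Because the inches part is read as a plain number, this version needs no placeholder for 00.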
df4 <- df3 %>% mutate(Height = (feet*12) + inches) %>%
select("DisplayName", "Height", "PlayerWeight", "PossessionTeam")
df5 <- df4 %>%
mutate(bmi = 703*PlayerWeight/(Height^2))
kable(head(df5))
| DisplayName | Height | PlayerWeight | PossessionTeam | bmi |
|---|---|---|---|---|
| Matt Ryan | 76 | 217 | ATL | 26.41118 |
| Andy Levitre | 74 | 303 | ATL | 38.89865 |
| Alex Mack | 76 | 311 | ATL | 37.85197 |
| Brandon Fusco | 76 | 306 | ATL | 37.24342 |
| Julio Jones | 75 | 220 | ATL | 27.49511 |
| Logan Paulsen | 77 | 268 | ATL | 31.77669 |
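The bmi column uses the imperial form of the BMI formula, 703 × weight (lb) / height (in)². A minimal sketch that checks the first table row by hand; the function name `bmi_imperial` is hypothetical.

```r
# Imperial BMI: 703 * weight (lb) / height (in)^2, the same expression
# used in the mutate() call.
bmi_imperial <- function(weight_lb, height_in) {
  703 * weight_lb / height_in^2
}

# First row of the table: 217 lb, 76 in.
round(bmi_imperial(217, 76), 5)  # 26.41118
```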
df5 %>% ggplot(aes(bmi)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
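The message can be silenced by choosing the bin width explicitly. The sketch below uses only the six BMI values from the table as a stand-in for df5; `binwidth = 1` is an assumed choice, not a value from the original analysis.

```r
library(ggplot2)

# The six BMI values shown in the table, used only as a small stand-in for df5.
bmi_sample <- data.frame(
  bmi = c(26.41118, 38.89865, 37.85197, 37.24342, 27.49511, 31.77669)
)

# binwidth = 1 (one BMI unit per bar) is an assumption; adjust to taste.
p <- ggplot(bmi_sample, aes(bmi)) +
  geom_histogram(binwidth = 1)
p
```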
# Note: despite its name, Height_in_inches holds centimetres (inches * 2.54).
height_in_cm <- df5 %>% mutate(Height_in_inches = Height * 2.54)
height_in_cm
## # A tibble: 1,999 x 6
## DisplayName Height PlayerWeight PossessionTeam bmi Height_in_inches
## <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 Matt Ryan 76 217 ATL 26.4 193.
## 2 Andy Levitre 74 303 ATL 38.9 188.
## 3 Alex Mack 76 311 ATL 37.9 193.
## 4 Brandon Fusco 76 306 ATL 37.2 193.
## 5 Julio Jones 75 220 ATL 27.5 190.
## 6 Logan Paulsen 77 268 ATL 31.8 196.
## 7 Mohamed Sanu 74 215 ATL 27.6 188.
## 8 Ryan Schraeder 79 300 ATL 33.8 201.
## 9 Devonta Freem~ 68 206 ATL 31.3 173.
## 10 Jake Matthews 77 309 ATL 36.6 196.
## # ... with 1,989 more rows
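The conversion itself is 1 inch = 2.54 cm. A small sketch with a column name that states the unit; `inches_to_cm` and `Height_cm` are hypothetical names, and the three heights are taken from the first rows of the output.

```r
# 1 inch is exactly 2.54 cm.
inches_to_cm <- function(height_in) height_in * 2.54

# Heights (inches) of the first three players in the output.
heights <- data.frame(Height = c(76, 74, 76))
heights$Height_cm <- inches_to_cm(heights$Height)  # hypothetical, unit-accurate name
heights
```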
tallest_players<-height_in_cm %>%
group_by(PossessionTeam) %>%
summarise(Total=mean(Height_in_inches)) %>%
arrange(desc(Total))
tallest_players
## # A tibble: 6 x 2
## PossessionTeam Total
## <chr> <dbl>
## 1 PIT 190.
## 2 CLV 189.
## 3 BLT 189.
## 4 ATL 189.
## 5 BUF 188.
## 6 PHI 187.
Based on the analysis above, the Pittsburgh Steelers have the tallest players on average (about 190 cm) of the 6 teams.
heaviest_players<-height_in_cm %>%
group_by(PossessionTeam) %>%
summarise(Total=mean(PlayerWeight)) %>%
arrange(desc(Total))
heaviest_players
## # A tibble: 6 x 2
## PossessionTeam Total
## <chr> <dbl>
## 1 PIT 260.
## 2 CLV 254.
## 3 BLT 253.
## 4 BUF 252.
## 5 ATL 251.
## 6 PHI 248.
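Based on the analysis above, the Pittsburgh Steelers also have the heaviest players on average (about 260 lb) of the 6 teams. These basic observations suggest that Pittsburgh has both the tallest and the heaviest players of the six teams, although the gaps are small: roughly 3 cm and 12 lb separate the first and last teams.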
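Both per-team averages can also be computed in a single summarise() call, which keeps the two rankings side by side. This is a sketch under assumptions: it uses only the six ATL rows shown by kable(head(df5)) as stand-in data, and the column names `n_rows`, `mean_height_cm`, and `mean_weight` are hypothetical.

```r
library(dplyr)

# The six rows shown by kable(head(df5)), used as a stand-in for the full data.
players <- tibble(
  PossessionTeam = rep("ATL", 6),
  Height = c(76, 74, 76, 76, 75, 77),
  PlayerWeight = c(217, 303, 311, 306, 220, 268)
)

# One row per team with the row count and both averages.
team_summary <- players %>%
  group_by(PossessionTeam) %>%
  summarise(
    n_rows = n(),
    mean_height_cm = mean(Height * 2.54),
    mean_weight = mean(PlayerWeight)
  ) %>%
  arrange(desc(mean_height_cm))
team_summary
```

Including the row count makes it easier to judge how much data sits behind each team's average.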