In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions.
GitHub repository: https://github.com/acatlin/FALL2022TIDYVERSE
Your task here is to Extend an Existing Example. Using one of your classmates’ examples (as created above), extend his or her example with additional annotated code. (15 points)
You should clone the provided repository. Once you have code to submit, you should make a pull request on the shared repository. You should also update the README.md file with your example.
After you’ve extended your classmate’s vignette, please submit your GitHub handle name in the submission link provided below. This will let your instructor know that your work is ready to be peer-graded.
You should complete your submission on the schedule stated in the course syllabus.
I will be extending Ariana Nolan's tidyverse example with additional code.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
myData <- read.csv('https://raw.githubusercontent.com/arinolan/Tidyverse-Create/main/nhl_elo.csv')
glimpse(myData)
## Rows: 66,030
## Columns: 24
## $ season <int> 1918, 1918, 1918, 1918, 1918, 1918, 1918, 19…
## $ date <chr> "1917-12-18", "1917-12-18", "1917-12-20", "1…
## $ playoff <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ neutral <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ status <chr> "post", "post", "post", "post", "post", "pos…
## $ ot <chr> "", "", "", "", "", "", "", "", "", "", "", …
## $ home_team <chr> "Ottawa Senators", "Montreal Wanderers", "To…
## $ away_team <chr> "Montreal Canadiens", "Toronto Maple Leafs",…
## $ home_team_abbr <chr> "OTS", "MTW", "TOR", "MTW", "MTW", "TOR", "M…
## $ away_team_abbr <chr> "MTL", "TOR", "OTS", "MTL", "OTS", "MTL", "T…
## $ home_team_pregame_rating <dbl> 1380.000, 1380.000, 1377.980, 1382.020, 1374…
## $ away_team_pregame_rating <dbl> 1380.000, 1380.000, 1374.590, 1385.410, 1369…
## $ home_team_winprob <dbl> 0.5714631, 0.5714631, 0.5762352, 0.5666778, …
## $ away_team_winprob <dbl> 0.4285369, 0.4285369, 0.4237648, 0.4333222, …
## $ overtime_prob <dbl> 0.2348521, 0.2348521, 0.2342543, 0.2354509, …
## $ home_team_expected_points <dbl> 1.243569, 1.243569, 1.251739, 1.235382, 1.25…
## $ away_team_expected_points <dbl> 0.9912831, 0.9912831, 0.9825153, 1.0000693, …
## $ home_team_score <int> 4, 10, 11, 2, 3, 7, 9, 9, 1, 5, 1, 6, 6, 9, …
## $ away_team_score <int> 7, 9, 4, 11, 6, 5, 2, 2, 0, 6, 0, 5, 4, 4, 6…
## $ home_team_postgame_rating <dbl> 1374.590, 1382.020, 1383.198, 1374.109, 1368…
## $ away_team_postgame_rating <dbl> 1385.410, 1377.980, 1369.372, 1393.321, 1374…
## $ game_quality_rating <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ game_importance_rating <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ game_overall_rating <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
myData <- myData %>%
  rename(Overtime = 'ot',
         Game_Type = 'status')
myData %>%
  select(home_team, home_team_score, away_team, away_team_score, season, playoff) %>%
  filter(season >= 2018) %>%
  group_by(season) %>%
  summarise(mean_home_score = mean(home_team_score),
            mean_away_score = mean(away_team_score))
## # A tibble: 6 × 3
## season mean_home_score mean_away_score
## <int> <dbl> <dbl>
## 1 2018 3.12 2.82
## 2 2019 3.15 2.85
## 3 2020 3.08 2.90
## 4 2021 3.07 2.77
## 5 2022 3.29 3.00
## 6 2023 NA NA
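The NA values in the 2023 row most likely come from games that had not yet been played when the data was pulled: mean() returns NA as soon as any input is NA. A minimal sketch on made-up data (toy_games and its values are hypothetical, not taken from nhl_elo.csv) shows how na.rm = TRUE changes the result:

```r
library(dplyr)

# Hypothetical toy data: season 2023 contains one unplayed game (NA score)
toy_games <- tibble(
  season = c(2022, 2022, 2023, 2023),
  home_team_score = c(3, 5, 2, NA)
)

toy_means <- toy_games %>%
  group_by(season) %>%
  summarise(
    mean_default = mean(home_team_score),               # NA if any score is NA
    mean_na_rm   = mean(home_team_score, na.rm = TRUE)  # drops NA scores first
  )

toy_means
```

Whether dropping unplayed games is appropriate depends on the question being asked; filtering them out before summarising would be another option.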
season_Ducks <- myData %>%
  filter(season >= 2023 & home_team == 'Anaheim Ducks' & date <= '2022-10-30') %>%
  ggplot(aes(date, home_team_score, group = 1)) +
  geom_point(na.rm = TRUE, color = 'black') +
  geom_line(na.rm = TRUE, color = 'red') +
  labs(title = 'Ducks Home Score 2022-23 Season', x = 'Date', y = 'Home Score') +
  theme_classic() +
  guides(x = guide_axis(n.dodge = 4))
season_Ducks
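In the plot above, date is still a character column, so ggplot2 treats it as a discrete axis. That is why group = 1 and the dodged axis labels are needed. As a possible alternative, the column could be converted with as.Date() so the axis becomes continuous. The sketch below uses made-up data (toy_home is hypothetical, and the "%b %d" label format is just one choice):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, not taken from nhl_elo.csv
toy_home <- tibble(
  date = c("2022-10-12", "2022-10-26", "2022-10-28", "2022-10-30"),
  home_team_score = c(5, 2, 4, 3)
) %>%
  mutate(date = as.Date(date))  # character -> Date

toy_plot <- ggplot(toy_home, aes(date, home_team_score)) +
  geom_point(color = 'black') +
  geom_line(color = 'red') +               # no group = 1 needed on a Date axis
  scale_x_date(date_labels = "%b %d") +    # e.g. "Oct 12"
  labs(x = 'Date', y = 'Home Score') +
  theme_classic()

toy_plot
```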
The next few examples use two frequently used verbs from the dplyr package: mutate, which adds or modifies columns, and arrange, which sorts rows.
myData %>%
  select(home_team, home_team_score, away_team, away_team_score, season, playoff) %>%
  filter(season >= 2018) %>%
  group_by(season) %>%
  summarise(mean_home_score = mean(home_team_score),
            mean_away_score = mean(away_team_score)) %>%
  mutate(flag_away_score_high = ifelse(mean_away_score > mean_home_score, 1, 0)) %>%
  arrange(desc(flag_away_score_high))
## # A tibble: 6 × 4
## season mean_home_score mean_away_score flag_away_score_high
## <int> <dbl> <dbl> <dbl>
## 1 2018 3.12 2.82 0
## 2 2019 3.15 2.85 0
## 3 2020 3.08 2.90 0
## 4 2021 3.07 2.77 0
## 5 2022 3.29 3.00 0
## 6 2023 NA NA NA
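Because every non-missing flag is 0 in the output above, arrange() leaves the rows in the same order as before. A sketch where the sort has a visible effect, assuming the rounded season means printed above are re-entered by hand (the names season_means, season_edge, and home_edge are introduced here for illustration only):

```r
library(dplyr)

# Rounded values copied from the summary table above (2018 to 2022)
season_means <- tibble(
  season = 2018:2022,
  mean_home_score = c(3.12, 3.15, 3.08, 3.07, 3.29),
  mean_away_score = c(2.82, 2.85, 2.90, 2.77, 3.00)
)

season_edge <- season_means %>%
  mutate(home_edge = mean_home_score - mean_away_score) %>%  # new numeric column
  arrange(home_edge)                                         # smallest home edge first

season_edge
```

Sorting on a numeric difference rather than a 0/1 flag keeps more information and makes the effect of arrange() easier to see.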
# Convert data from long to wide
myData %>%
  filter(season >= 2018 & season <= 2022) %>%
  select(playoff, season, away_team_score) %>%
  group_by(playoff, season) %>%
  summarize(mean_away_score = mean(away_team_score)) %>%
  arrange(desc(mean_away_score)) %>%
  spread(season, mean_away_score)
## `summarise()` has grouped output by 'playoff'. You can override using the
## `.groups` argument.
## # A tibble: 2 × 6
## # Groups: playoff [2]
## playoff `2018` `2019` `2020` `2021` `2022`
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 2.81 2.86 2.90 2.80 3.02
## 2 1 2.89 2.66 2.91 2.45 2.74
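spread() still works, but the tidyr documentation marks it as superseded in favour of pivot_wider(). A rough equivalent on a small made-up summary table (toy_summary and its values are hypothetical) might look like the following. names_sort = TRUE is used here so the season columns come out in ascending order, as they do with spread():

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format summary: one row per playoff/season pair
toy_summary <- tibble(
  playoff = c(0, 0, 1, 1),
  season = c(2021, 2022, 2021, 2022),
  mean_away_score = c(2.8, 3.0, 2.4, 2.7)
)

toy_wide <- toy_summary %>%
  pivot_wider(
    names_from = season,            # each season becomes a column
    values_from = mean_away_score,  # cell values come from the means
    names_sort = TRUE               # sort the new columns by season
  )

toy_wide
```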
I extended Ariana Nolan's tidyverse example with additional code. The extension demonstrates two more functions from the dplyr package, mutate and arrange, and uses the tidyr package to convert data from long to wide format.