In this assignment, you’ll practice collaborating on a code project with GitHub. You can think of our collective work as building a book of examples on how to use tidyverse functions.
GitHub repository: https://github.com/acatlin/SPRING2020TIDYVERSE
Kaggle datasets.
Your task here is to create an example. Using one or more tidyverse packages and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more capabilities of the selected tidyverse package with your selected dataset. (25 points)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
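I will be using the spi_matches dataset (spi_matches.csv) from FiveThirtyEight's club soccer predictions. Each row is one match, with the two teams' SPI ratings (spi1, spi2), the predicted probabilities of a team1 win, a team2 win, or a tie (prob1, prob2, probtie), and the final score (score1, score2).
I downloaded the CSV from the FiveThirtyEight.com datasets page and uploaded it to GitHub, so it can be read directly from its raw URL with readr's read_csv().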
spi_matches <- read_csv("https://raw.githubusercontent.com/nathtrish334/Data-607/main/spi_matches.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## date = col_date(format = ""),
## league = col_character(),
## team1 = col_character(),
## team2 = col_character()
## )
## i Use `spec()` for the full column specifications.
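readr guessed the column types and printed the specification above. A minimal sketch of passing that same specification back through col_types, which fixes the types and silences the message. It assumes a small hypothetical CSV written to a temporary file so the example runs offline; the names spec_spi and sample_matches are illustrative, the probabilities loosely follow UEFA Champions League rows printed later in this vignette, and the dates are placeholders.

```r
library(readr)

# Hypothetical two-row CSV with a subset of spi_matches' columns
# (dates are placeholders, not taken from the real data)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "season,date,league,team1,team2,prob1,prob2,probtie",
  "2020,2020-10-20,UEFA Champions League,Dynamo Kiev,Juventus,0.283,0.473,0.244",
  "2020,2020-10-20,UEFA Champions League,Chelsea,Sevilla FC,0.500,0.248,0.252"
), tmp)

# The specification readr printed above, written out explicitly:
# every column is double unless listed otherwise
spec_spi <- cols(
  .default = col_double(),
  date     = col_date(format = ""),  # "" means the default ISO 8601 date format
  league   = col_character(),
  team1    = col_character(),
  team2    = col_character()
)

sample_matches <- read_csv(tmp, col_types = spec_spi)
sample_matches
```

With the real file, the same idea would be read_csv(<raw GitHub URL>, col_types = spec_spi).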
head(spi_matches)
## # A tibble: 6 x 23
## season date league_id league team1 team2 spi1 spi2 prob1 prob2 probtie
## <dbl> <date> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2016 2016-07-09 7921 FA Wo~ Live~ Read~ 51.6 50.4 0.439 0.277 0.284
## 2 2016 2016-07-10 7921 FA Wo~ Arse~ Nott~ 46.6 54.0 0.357 0.361 0.282
## 3 2016 2016-07-10 7921 FA Wo~ Chel~ Birm~ 59.8 54.6 0.480 0.249 0.271
## 4 2016 2016-07-16 7921 FA Wo~ Live~ Nott~ 53 52.4 0.429 0.270 0.301
## 5 2016 2016-07-17 7921 FA Wo~ Chel~ Arse~ 59.4 61.0 0.412 0.316 0.272
## 6 2016 2016-07-24 7921 FA Wo~ Read~ Birm~ 50.8 55.0 0.382 0.32 0.298
## # ... with 12 more variables: proj_score1 <dbl>, proj_score2 <dbl>,
## # importance1 <dbl>, importance2 <dbl>, score1 <dbl>, score2 <dbl>,
## # xg1 <dbl>, xg2 <dbl>, nsxg1 <dbl>, nsxg2 <dbl>, adj_score1 <dbl>,
## # adj_score2 <dbl>
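head() hides 12 of the 23 columns behind "... with 12 more variables". A sketch of dplyr's glimpse(), which prints one line per column instead. It assumes a hypothetical one-row tibble standing in for spi_matches (wide_sample is an illustrative name); the values come from one of the UEFA Champions League rows printed in this vignette, and columns whose values are not shown are left as NA.

```r
library(dplyr)

# Hypothetical stand-in for spi_matches; unknown values are NA
wide_sample <- tibble(
  season = 2020, league = "UEFA Champions League",
  team1 = "Dynamo Kiev", team2 = "Juventus",
  prob1 = 0.283, prob2 = 0.473, probtie = 0.244,
  proj_score1 = NA_real_, score1 = 0, score2 = 2, adj_score1 = NA_real_
)

# glimpse() lists every column with its type and first values,
# then returns its input invisibly so it can sit inside a pipeline
returned <- glimpse(wide_sample)
```

On the real data this would simply be glimpse(spi_matches).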
# Keep only the columns used below: season, league, teams, predicted probabilities and final scores
spi_matches_select <- select(spi_matches, season, league, team1, team2, prob1, prob2, probtie, score1, score2)
head(spi_matches_select)
## # A tibble: 6 x 9
## season league team1 team2 prob1 prob2 probtie score1 score2
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2016 FA Women's Su~ Liverpool~ Reading 0.439 0.277 0.284 2 0
## 2 2016 FA Women's Su~ Arsenal W~ Notts Coun~ 0.357 0.361 0.282 2 0
## 3 2016 FA Women's Su~ Chelsea F~ Birmingham~ 0.480 0.249 0.271 1 1
## 4 2016 FA Women's Su~ Liverpool~ Notts Coun~ 0.429 0.270 0.301 0 0
## 5 2016 FA Women's Su~ Chelsea F~ Arsenal Wo~ 0.412 0.316 0.272 1 2
## 6 2016 FA Women's Su~ Reading Birmingham~ 0.382 0.32 0.298 1 1
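Listing every column works, but select() also accepts tidyselect helpers. A sketch, assuming a hypothetical toy tibble that reuses some spi_matches column names (matches, explicit and helpers are illustrative names; prob and score values come from rows printed in this vignette, the extra columns are NA), showing that starts_with() picks the same nine columns as the explicit list.

```r
library(dplyr)

# Hypothetical toy data; proj_score1 and adj_score1 are included to show they are NOT matched
matches <- tibble(
  season = c(2020, 2020), league = "UEFA Champions League",
  team1 = c("Dynamo Kiev", "Chelsea"), team2 = c("Juventus", "Sevilla FC"),
  prob1 = c(0.283, 0.500), prob2 = c(0.473, 0.248), probtie = c(0.244, 0.252),
  proj_score1 = NA_real_, score1 = c(0, 0), score2 = c(2, 0), adj_score1 = NA_real_
)

# Explicit list of columns, as in the vignette
explicit <- select(matches, season, league, team1, team2,
                   prob1, prob2, probtie, score1, score2)

# Same result with helpers: starts_with("prob") matches prob1, prob2, probtie
# (not proj_score1); starts_with("score") matches score1, score2 (not adj_score1)
helpers <- select(matches, season, league, team1, team2,
                  starts_with("prob"), starts_with("score"))
```

Helpers keep the call short when a dataset has many related columns, such as the paired xg1/xg2 and nsxg1/nsxg2 columns in spi_matches.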
# Keep UEFA Champions League matches from the 2020 season onwards
spi_matches_filter <- filter(spi_matches_select, season >= 2020 & league == "UEFA Champions League")
head(spi_matches_filter)
## # A tibble: 6 x 9
## season league team1 team2 prob1 prob2 probtie score1 score2
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2020 UEFA Champio~ Dynamo Kiev Juventus 0.283 0.473 0.244 0 2
## 2 2020 UEFA Champio~ Zenit St P~ Club Brug~ 0.565 0.182 0.253 1 2
## 3 2020 UEFA Champio~ Lazio Borussia ~ 0.275 0.479 0.246 3 1
## 4 2020 UEFA Champio~ Stade Renn~ FC Krasno~ 0.501 0.224 0.274 1 1
## 5 2020 UEFA Champio~ Barcelona Ferencvar~ 0.865 0.0218 0.113 5 1
## 6 2020 UEFA Champio~ Chelsea Sevilla FC 0.500 0.248 0.252 0 0
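The vignette combines its two conditions with &. A sketch, assuming a hypothetical four-row tibble (the team names "Team A" to "Team C" are placeholders), showing that filter() treats comma-separated conditions as AND, and that %in% keeps rows matching any of several leagues.

```r
library(dplyr)

# Hypothetical toy data: only season, league and team1
matches <- tibble(
  season = c(2019, 2020, 2020, 2020),
  league = c("UEFA Champions League", "UEFA Champions League",
             "French Ligue 1", "Barclays Premier League"),
  team1  = c("Team A", "Chelsea", "Team B", "Team C")
)

# Commas between conditions are combined with AND, so these are equivalent
by_and   <- filter(matches, season >= 2020 & league == "UEFA Champions League")
by_comma <- filter(matches, season >= 2020, league == "UEFA Champions League")

# %in% keeps rows whose league is any of the listed values
two_leagues <- filter(matches, season >= 2020,
                      league %in% c("UEFA Champions League", "French Ligue 1"))
```

Here by_and and by_comma both keep only the 2020 Chelsea row, while two_leagues keeps two rows.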
# Count matches per league, most frequent first
spi_matches_count <- spi_matches_select %>% count(league, name = "Count", sort = TRUE)
head(spi_matches_count)
## # A tibble: 6 x 2
## league Count
## <chr> <int>
## 1 English League Championship 2223
## 2 Barclays Premier League 1900
## 3 French Ligue 1 1900
## 4 Italy Serie A 1900
## 5 Spanish Primera Division 1900
## 6 Spanish Segunda Division 1865
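count() is a shortcut for a grouped summary. A sketch, assuming a hypothetical six-row tibble (league names taken from the table above, row counts made up), showing the equivalent group_by() + summarise(n()) + arrange() chain and one way to add each league's share of all matches.

```r
library(dplyr)

# Hypothetical toy data: one row per match, only the league column
matches <- tibble(league = c(rep("English League Championship", 3),
                             rep("French Ligue 1", 2),
                             "Spanish Segunda Division"))

# As in the vignette
via_count <- count(matches, league, name = "Count", sort = TRUE)

# The same summary spelled out step by step
via_summarise <- matches %>%
  group_by(league) %>%
  summarise(Count = n(), .groups = "drop") %>%  # drop grouping after summarising
  arrange(desc(Count))

# Each league's share of all matches
shares <- via_count %>% mutate(share = Count / sum(Count))
```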
I have demonstrated four tidyverse capabilities: reading a CSV with readr's read_csv(), then selecting columns with select(), filtering rows with filter(), and summarising with count(), all three from dplyr.
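The selecting, filtering and counting steps can also be chained into a single pipeline with %>%. A sketch, assuming a hypothetical toy tibble (placeholder team names "Team A" to "Team D", made-up scores, and an extra xg1 column that select() drops; ucl_by_season is an illustrative name), counting UEFA Champions League matches per season.

```r
library(dplyr)

# Hypothetical toy rows
matches <- tibble(
  season = c(2019, 2020, 2020, 2020),
  league = c("UEFA Champions League", "UEFA Champions League",
             "UEFA Champions League", "French Ligue 1"),
  team1  = c("Team A", "Dynamo Kiev", "Chelsea", "Team B"),
  team2  = c("Team C", "Juventus", "Sevilla FC", "Team D"),
  score1 = c(1, 0, 0, 2),
  score2 = c(1, 2, 0, 0),
  xg1    = NA_real_          # extra column, dropped by select()
)

ucl_by_season <- matches %>%
  select(season, league, team1, team2, score1, score2) %>%  # keep needed columns
  filter(league == "UEFA Champions League") %>%             # one competition
  count(season, name = "Count", sort = TRUE)                # matches per season
```

Each step receives the previous step's result as its first argument, so no intermediate objects such as spi_matches_select are needed.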