Data 607 TidyVerse CREATE assignment

In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions.

Your task here is to Create an Example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset. (25 points)

Libraries

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Dataset

I will be using the spi_matches dataset (spi_matches.csv).

I downloaded this dataset from the FiveThirtyEight.com datasets page and uploaded the CSV to GitHub so that it can be read directly from a raw URL.

Capabilities

read_csv

spi_matches <- read_csv("https://raw.githubusercontent.com/nathtrish334/Data-607/main/spi_matches.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   date = col_date(format = ""),
##   league = col_character(),
##   team1 = col_character(),
##   team2 = col_character()
## )
## i Use `spec()` for the full column specifications.

head(spi_matches)

## # A tibble: 6 x 23
##   season date       league_id league team1 team2  spi1  spi2 prob1 prob2 probtie
##    <dbl> <date>         <dbl> <chr>  <chr> <chr> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1   2016 2016-07-09      7921 FA Wo~ Live~ Read~  51.6  50.4 0.439 0.277   0.284
## 2   2016 2016-07-10      7921 FA Wo~ Arse~ Nott~  46.6  54.0 0.357 0.361   0.282
## 3   2016 2016-07-10      7921 FA Wo~ Chel~ Birm~  59.8  54.6 0.480 0.249   0.271
## 4   2016 2016-07-16      7921 FA Wo~ Live~ Nott~  53    52.4 0.429 0.270   0.301
## 5   2016 2016-07-17      7921 FA Wo~ Chel~ Arse~  59.4  61.0 0.412 0.316   0.272
## 6   2016 2016-07-24      7921 FA Wo~ Read~ Birm~  50.8  55.0 0.382 0.32    0.298
## # ... with 12 more variables: proj_score1 <dbl>, proj_score2 <dbl>,
## #   importance1 <dbl>, importance2 <dbl>, score1 <dbl>, score2 <dbl>,
## #   xg1 <dbl>, xg2 <dbl>, nsxg1 <dbl>, nsxg2 <dbl>, adj_score1 <dbl>,
## #   adj_score2 <dbl>
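The column specification printed above is guessed by read_csv from the data. As a minimal sketch, assuming you would rather declare the types yourself, the example below writes a small made-up CSV to a temporary file and reads it back with an explicit col_types argument. The rows, the file and the name sample_matches are hypothetical and only mimic a few columns of spi_matches.

```r
library(readr)

# Hypothetical three-row sample that mimics a few spi_matches columns.
csv_lines <- c(
  "season,date,league,team1,team2,score1,score2",
  "2016,2016-07-09,League A,Team 1,Team 2,2,0",
  "2016,2016-07-10,League A,Team 3,Team 4,1,1",
  "2020,2020-10-20,League B,Team 5,Team 6,0,2"
)

# Write the sample to a temporary file so the example needs no download.
sample_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, sample_path)

# Declare every column type instead of letting read_csv guess.
sample_matches <- read_csv(
  sample_path,
  col_types = cols(
    season = col_integer(),
    date   = col_date(format = "%Y-%m-%d"),
    league = col_character(),
    team1  = col_character(),
    team2  = col_character(),
    score1 = col_integer(),
    score2 = col_integer()
  )
)

sample_matches
```

Declaring the types also silences the column specification message and makes the import fail loudly if a column stops matching the expected type.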

select
Select and display only a subset of the columns.

spi_matches_select <- select(spi_matches, season, league, team1, team2, prob1, prob2, probtie, score1, score2)
head(spi_matches_select)

## # A tibble: 6 x 9
##   season league         team1      team2       prob1 prob2 probtie score1 score2
##    <dbl> <chr>          <chr>      <chr>       <dbl> <dbl>   <dbl>  <dbl>  <dbl>
## 1   2016 FA Women's Su~ Liverpool~ Reading     0.439 0.277   0.284      2      0
## 2   2016 FA Women's Su~ Arsenal W~ Notts Coun~ 0.357 0.361   0.282      2      0
## 3   2016 FA Women's Su~ Chelsea F~ Birmingham~ 0.480 0.249   0.271      1      1
## 4   2016 FA Women's Su~ Liverpool~ Notts Coun~ 0.429 0.270   0.301      0      0
## 5   2016 FA Women's Su~ Chelsea F~ Arsenal Wo~ 0.412 0.316   0.272      1      2
## 6   2016 FA Women's Su~ Reading    Birmingham~ 0.382 0.32    0.298      1      1
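Besides listing columns one by one, select also accepts selection helpers and negation. The sketch below uses a small made-up tibble (sample_matches, with hypothetical values) to show starts_with for picking every probability column at once, and a minus sign for dropping a column.

```r
library(dplyr)

# Hypothetical two-row sample with the same column names as spi_matches.
sample_matches <- tibble(
  season  = c(2016, 2020),
  league  = c("League A", "League B"),
  team1   = c("Team 1", "Team 3"),
  team2   = c("Team 2", "Team 4"),
  prob1   = c(0.44, 0.28),
  prob2   = c(0.28, 0.47),
  probtie = c(0.28, 0.25),
  xg1     = c(1.2, 0.8)
)

# Keep the team names plus every column whose name starts with "prob".
probs <- select(sample_matches, team1, team2, starts_with("prob"))

# Drop a single column and keep everything else.
no_xg <- select(sample_matches, -xg1)

names(probs)
names(no_xg)
```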

filter
I am going to keep only the UEFA Champions League matches from the 2020 season onwards.

spi_matches_filter <- filter(spi_matches_select, season >= 2020, league == "UEFA Champions League")
head(spi_matches_filter)

## # A tibble: 6 x 9
##   season league        team1       team2      prob1  prob2 probtie score1 score2
##    <dbl> <chr>         <chr>       <chr>      <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
## 1   2020 UEFA Champio~ Dynamo Kiev Juventus   0.283 0.473    0.244      0      2
## 2   2020 UEFA Champio~ Zenit St P~ Club Brug~ 0.565 0.182    0.253      1      2
## 3   2020 UEFA Champio~ Lazio       Borussia ~ 0.275 0.479    0.246      3      1
## 4   2020 UEFA Champio~ Stade Renn~ FC Krasno~ 0.501 0.224    0.274      1      1
## 5   2020 UEFA Champio~ Barcelona   Ferencvar~ 0.865 0.0218   0.113      5      1
## 6   2020 UEFA Champio~ Chelsea     Sevilla FC 0.500 0.248    0.252      0      0
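Conditions passed to filter as separate arguments are combined with a logical AND, so a comma behaves like & here. The sketch below illustrates this on a made-up tibble (sample_matches, hypothetical rows) and also shows %in% for matching several leagues and a column-to-column comparison for finding draws.

```r
library(dplyr)

# Hypothetical sample rows; only the column names follow spi_matches.
sample_matches <- tibble(
  season = c(2019, 2020, 2020, 2021),
  league = c("UEFA Champions League", "UEFA Champions League",
             "Barclays Premier League", "UEFA Europa League"),
  score1 = c(1, 0, 3, 2),
  score2 = c(1, 2, 1, 2)
)

# Comma-separated conditions are ANDed together.
recent_ucl <- filter(sample_matches,
                     season >= 2020,
                     league == "UEFA Champions League")

# %in% matches any value from a set of leagues.
recent_uefa <- filter(sample_matches,
                      season >= 2020,
                      league %in% c("UEFA Champions League", "UEFA Europa League"))

# Columns can be compared with each other, here to keep only draws.
draws <- filter(sample_matches, score1 == score2)
```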

count
I am going to count the number of times each league appears in the dataset. count() is roughly equivalent to group_by() followed by summarise(n = n()).

# Count the matches per league and sort from most to fewest
spi_matches_count <- spi_matches_select %>% count(league, name = "Count", sort = TRUE)
head(spi_matches_count)

## # A tibble: 6 x 2
##   league                      Count
##   <chr>                       <int>
## 1 English League Championship  2223
## 2 Barclays Premier League      1900
## 3 French Ligue 1               1900
## 4 Italy Serie A                1900
## 5 Spanish Primera Division     1900
## 6 Spanish Segunda Division     1865
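To make the link between count and summarise concrete, the sketch below builds the same table two ways on a made-up tibble (sample_matches, with hypothetical league names): once with count, and once with an explicit group_by, summarise and arrange pipeline.

```r
library(dplyr)

# Hypothetical sample: one row per match, league names are placeholders.
sample_matches <- tibble(
  league = c("League A", "League B", "League A",
             "League C", "League A", "League B")
)

# Shortcut: count rows per league, name the column, sort descending.
by_count <- count(sample_matches, league, name = "Count", sort = TRUE)

# Longer form: group, summarise with n(), then sort descending.
by_summarise <- sample_matches %>%
  group_by(league) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count))

by_count
by_summarise
```

The longer form is worth knowing because summarise can compute other statistics in the same call, for example a mean of score1 per league alongside the count.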


Trishita Nath

4/10/2021
