Import your data

library(tidyverse)  # provides read_csv(), %>%, and the tidyr verbs used below

data <- read_csv("../00_data/myData.csv")
## Rows: 900 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (11): country, city, stage, home_team, away_team, outcome, win_conditio...
## dbl   (3): year, home_score, away_score
## date  (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data
## # A tibble: 900 × 15
##     year country city      stage home_…¹ away_…² home_…³ away_…⁴ outcome win_c…⁵
##    <dbl> <chr>   <chr>     <chr> <chr>   <chr>     <dbl>   <dbl> <chr>   <chr>  
##  1  1930 Uruguay Montevid… Grou… France  Mexico        4       1 H       <NA>   
##  2  1930 Uruguay Montevid… Grou… Belgium United…       0       3 A       <NA>   
##  3  1930 Uruguay Montevid… Grou… Brazil  Yugosl…       1       2 A       <NA>   
##  4  1930 Uruguay Montevid… Grou… Peru    Romania       1       3 A       <NA>   
##  5  1930 Uruguay Montevid… Grou… Argent… France        1       0 H       <NA>   
##  6  1930 Uruguay Montevid… Grou… Chile   Mexico        3       0 H       <NA>   
##  7  1930 Uruguay Montevid… Grou… Bolivia Yugosl…       0       4 A       <NA>   
##  8  1930 Uruguay Montevid… Grou… Paragu… United…       0       3 A       <NA>   
##  9  1930 Uruguay Montevid… Grou… Uruguay Peru          1       0 H       <NA>   
## 10  1930 Uruguay Montevid… Grou… Argent… Mexico        6       3 H       <NA>   
## # … with 890 more rows, 5 more variables: winning_team <chr>,
## #   losing_team <chr>, date <date>, month <chr>, dayofweek <chr>, and
## #   abbreviated variable names ¹​home_team, ²​away_team, ³​home_score,
## #   ⁴​away_score, ⁵​win_conditions

Pivoting

wide to long form

data %>%
    pivot_longer(cols = c(home_team, winning_team))
## # A tibble: 1,800 × 15
##     year country city      stage away_…¹ home_…² away_…³ outcome win_c…⁴ losin…⁵
##    <dbl> <chr>   <chr>     <chr> <chr>     <dbl>   <dbl> <chr>   <chr>   <chr>  
##  1  1930 Uruguay Montevid… Grou… Mexico        4       1 H       <NA>    Mexico 
##  2  1930 Uruguay Montevid… Grou… Mexico        4       1 H       <NA>    Mexico 
##  3  1930 Uruguay Montevid… Grou… United…       0       3 A       <NA>    Belgium
##  4  1930 Uruguay Montevid… Grou… United…       0       3 A       <NA>    Belgium
##  5  1930 Uruguay Montevid… Grou… Yugosl…       1       2 A       <NA>    Brazil 
##  6  1930 Uruguay Montevid… Grou… Yugosl…       1       2 A       <NA>    Brazil 
##  7  1930 Uruguay Montevid… Grou… Romania       1       3 A       <NA>    Peru   
##  8  1930 Uruguay Montevid… Grou… Romania       1       3 A       <NA>    Peru   
##  9  1930 Uruguay Montevid… Grou… France        1       0 H       <NA>    France 
## 10  1930 Uruguay Montevid… Grou… France        1       0 H       <NA>    France 
## # … with 1,790 more rows, 5 more variables: date <date>, month <chr>,
## #   dayofweek <chr>, name <chr>, value <chr>, and abbreviated variable names
## #   ¹​away_team, ²​home_score, ³​away_score, ⁴​win_conditions, ⁵​losing_team
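The call above keeps the default output column names `name` and `value`. As a minimal sketch, assuming a hypothetical two-match tibble called `matches` (not the full data set), `names_to` and `values_to` can give the stacked columns more descriptive names:

```r
library(dplyr)   # for %>%
library(tidyr)
library(tibble)

# Hypothetical sample; column names mirror the data set above
matches <- tribble(
  ~home_team, ~away_team,      ~home_score, ~away_score,
  "France",   "Mexico",        4,           1,
  "Belgium",  "United States", 0,           3
)

# Stack the two score columns: each match becomes two rows,
# one for the home side and one for the away side
scores_long <- matches %>%
  pivot_longer(
    cols      = c(home_score, away_score),
    names_to  = "side",    # holds the former column names
    values_to = "goals"    # holds the former cell values
  )

scores_long
```

Pivoting two columns doubles the row count, which matches the jump from 900 to 1,800 rows in the output above.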

long to wide form

data %>%
    pivot_wider(names_from = winning_team, values_from = outcome)
## # A tibble: 900 × 80
##     year country city      stage home_…¹ away_…² home_…³ away_…⁴ win_c…⁵ losin…⁶
##    <dbl> <chr>   <chr>     <chr> <chr>   <chr>     <dbl>   <dbl> <chr>   <chr>  
##  1  1930 Uruguay Montevid… Grou… France  Mexico        4       1 <NA>    Mexico 
##  2  1930 Uruguay Montevid… Grou… Belgium United…       0       3 <NA>    Belgium
##  3  1930 Uruguay Montevid… Grou… Brazil  Yugosl…       1       2 <NA>    Brazil 
##  4  1930 Uruguay Montevid… Grou… Peru    Romania       1       3 <NA>    Peru   
##  5  1930 Uruguay Montevid… Grou… Argent… France        1       0 <NA>    France 
##  6  1930 Uruguay Montevid… Grou… Chile   Mexico        3       0 <NA>    Mexico 
##  7  1930 Uruguay Montevid… Grou… Bolivia Yugosl…       0       4 <NA>    Bolivia
##  8  1930 Uruguay Montevid… Grou… Paragu… United…       0       3 <NA>    Paragu…
##  9  1930 Uruguay Montevid… Grou… Uruguay Peru          1       0 <NA>    Peru   
## 10  1930 Uruguay Montevid… Grou… Argent… Mexico        6       3 <NA>    Mexico 
## # … with 890 more rows, 70 more variables: date <date>, month <chr>,
## #   dayofweek <chr>, France <chr>, `United States` <chr>, Yugoslavia <chr>,
## #   Romania <chr>, Argentina <chr>, Chile <chr>, Uruguay <chr>, Paraguay <chr>,
## #   Brazil <chr>, Sweden <chr>, Austria <chr>, Germany <chr>, Spain <chr>,
## #   Czechoslovakia <chr>, Hungary <chr>, Italy <chr>, Switzerland <chr>,
## #   `NA` <chr>, `West Germany` <chr>, Cuba <chr>, England <chr>, Turkey <chr>,
## #   `Northern Ireland` <chr>, `Soviet Union` <chr>, Wales <chr>, …
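The result above has 80 columns because every distinct `winning_team` becomes its own column, and each row can hold a value in only one of them. `pivot_wider()` tends to fit better when one column holds a small set of variable names. The following is a sketch on a hypothetical long-form tibble, `scores_long`, with an assumed `match_id` key that is not part of the original data:

```r
library(dplyr)   # for %>%
library(tidyr)
library(tibble)

# Hypothetical long-form scores: one row per match and side
scores_long <- tribble(
  ~match_id, ~side,  ~goals,
  1,         "home", 4,
  1,         "away", 1,
  2,         "home", 0,
  2,         "away", 3
)

# Spread the `side` labels into columns, filling them with `goals`
scores_wide <- scores_long %>%
  pivot_wider(names_from = side, values_from = goals)

scores_wide
```

Here the four long rows collapse into two rows, one per `match_id`, with a `home` and an `away` column.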

Separating and Uniting

Separate a column

data %>%
    separate(col = year, into = c("country", "month"))
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 900 rows [1, 2, 3, 4, 5,
## 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## # A tibble: 900 × 14
##    country month city      stage home_…¹ away_…² home_…³ away_…⁴ outcome win_c…⁵
##    <chr>   <chr> <chr>     <chr> <chr>   <chr>     <dbl>   <dbl> <chr>   <chr>  
##  1 1930    <NA>  Montevid… Grou… France  Mexico        4       1 H       <NA>   
##  2 1930    <NA>  Montevid… Grou… Belgium United…       0       3 A       <NA>   
##  3 1930    <NA>  Montevid… Grou… Brazil  Yugosl…       1       2 A       <NA>   
##  4 1930    <NA>  Montevid… Grou… Peru    Romania       1       3 A       <NA>   
##  5 1930    <NA>  Montevid… Grou… Argent… France        1       0 H       <NA>   
##  6 1930    <NA>  Montevid… Grou… Chile   Mexico        3       0 H       <NA>   
##  7 1930    <NA>  Montevid… Grou… Bolivia Yugosl…       0       4 A       <NA>   
##  8 1930    <NA>  Montevid… Grou… Paragu… United…       0       3 A       <NA>   
##  9 1930    <NA>  Montevid… Grou… Uruguay Peru          1       0 H       <NA>   
## 10 1930    <NA>  Montevid… Grou… Argent… Mexico        6       3 H       <NA>   
## # … with 890 more rows, 4 more variables: winning_team <chr>,
## #   losing_team <chr>, date <date>, dayofweek <chr>, and abbreviated variable
## #   names ¹​home_team, ²​away_team, ³​home_score, ⁴​away_score, ⁵​win_conditions
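The warning above appears because `year` contains no separator, so the second piece is always `NA`. A column with a real delimiter splits cleanly. As a minimal sketch, assuming a hypothetical tibble `match_dates` that stores dates as text in the same `YYYY-MM-DD` layout as the `date` column:

```r
library(dplyr)   # for %>%
library(tidyr)
library(tibble)

# Hypothetical sample with dates stored as character strings
match_dates <- tibble(
  home_team = c("France", "Brazil"),
  date      = c("1930-07-13", "1930-07-14")
)

# Split on the hyphen into three new character columns;
# the original `date` column is dropped by default
date_parts <- match_dates %>%
  separate(col = date, into = c("year", "month", "day"), sep = "-")

date_parts
```

Passing `convert = TRUE` would additionally turn the pieces into integers where possible.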

Unite two columns

data %>%
    unite(col = "stage", home_team:outcome, sep = "/")
## # A tibble: 900 × 10
##     year country city     stage win_c…¹ winni…² losin…³ date       month dayof…⁴
##    <dbl> <chr>   <chr>    <chr> <chr>   <chr>   <chr>   <date>     <chr> <chr>  
##  1  1930 Uruguay Montevi… Fran… <NA>    France  Mexico  1930-07-13 Jul   Sunday 
##  2  1930 Uruguay Montevi… Belg… <NA>    United… Belgium 1930-07-13 Jul   Sunday 
##  3  1930 Uruguay Montevi… Braz… <NA>    Yugosl… Brazil  1930-07-14 Jul   Monday 
##  4  1930 Uruguay Montevi… Peru… <NA>    Romania Peru    1930-07-14 Jul   Monday 
##  5  1930 Uruguay Montevi… Arge… <NA>    Argent… France  1930-07-15 Jul   Tuesday
##  6  1930 Uruguay Montevi… Chil… <NA>    Chile   Mexico  1930-07-16 Jul   Wednes…
##  7  1930 Uruguay Montevi… Boli… <NA>    Yugosl… Bolivia 1930-07-17 Jul   Thursd…
##  8  1930 Uruguay Montevi… Para… <NA>    United… Paragu… 1930-07-17 Jul   Thursd…
##  9  1930 Uruguay Montevi… Urug… <NA>    Uruguay Peru    1930-07-18 Jul   Friday 
## 10  1930 Uruguay Montevi… Arge… <NA>    Argent… Mexico  1930-07-19 Jul   Saturd…
## # … with 890 more rows, and abbreviated variable names ¹​win_conditions,
## #   ²​winning_team, ³​losing_team, ⁴​dayofweek
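Because the call above names the new column `stage`, the united text takes the place of the existing `stage` column in the output. A sketch that avoids the name clash, assuming a hypothetical two-match tibble `matches` and a new column name `score` chosen only for illustration:

```r
library(dplyr)   # for %>%
library(tidyr)
library(tibble)

# Hypothetical sample; column names mirror the data set above
matches <- tribble(
  ~home_team, ~away_team,      ~home_score, ~away_score,
  "France",   "Mexico",        4,           1,
  "Belgium",  "United States", 0,           3
)

# Paste the two score columns into one text column such as "4-1";
# the input columns are removed by default
matches_score <- matches %>%
  unite(col = "score", home_score, away_score, sep = "-")

matches_score
```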

Missing Values

data %>%
    pivot_wider(names_from = year, values_from = date)
## # A tibble: 900 × 34
##    country city    stage home_…¹ away_…² home_…³ away_…⁴ outcome win_c…⁵ winni…⁶
##    <chr>   <chr>   <chr> <chr>   <chr>     <dbl>   <dbl> <chr>   <chr>   <chr>  
##  1 Uruguay Montev… Grou… France  Mexico        4       1 H       <NA>    France 
##  2 Uruguay Montev… Grou… Belgium United…       0       3 A       <NA>    United…
##  3 Uruguay Montev… Grou… Brazil  Yugosl…       1       2 A       <NA>    Yugosl…
##  4 Uruguay Montev… Grou… Peru    Romania       1       3 A       <NA>    Romania
##  5 Uruguay Montev… Grou… Argent… France        1       0 H       <NA>    Argent…
##  6 Uruguay Montev… Grou… Chile   Mexico        3       0 H       <NA>    Chile  
##  7 Uruguay Montev… Grou… Bolivia Yugosl…       0       4 A       <NA>    Yugosl…
##  8 Uruguay Montev… Grou… Paragu… United…       0       3 A       <NA>    United…
##  9 Uruguay Montev… Grou… Uruguay Peru          1       0 H       <NA>    Uruguay
## 10 Uruguay Montev… Grou… Argent… Mexico        6       3 H       <NA>    Argent…
## # … with 890 more rows, 24 more variables: losing_team <chr>, month <chr>,
## #   dayofweek <chr>, `1930` <date>, `1934` <date>, `1938` <date>,
## #   `1950` <date>, `1954` <date>, `1958` <date>, `1962` <date>, `1966` <date>,
## #   `1970` <date>, `1974` <date>, `1978` <date>, `1982` <date>, `1986` <date>,
## #   `1990` <date>, `1994` <date>, `1998` <date>, `2002` <date>, `2006` <date>,
## #   `2010` <date>, `2014` <date>, `2018` <date>, and abbreviated variable names
## #   ¹​home_team, ²​away_team, ³​home_score, ⁴​away_score, ⁵​win_conditions, …

The World Cup has been held every four years since 1930. The year columns in this table skip 1942 and 1946: no tournament was held in those years because of World War II and its aftermath.
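The gap described above is an implicit missing value: 1942 and 1946 have no rows at all, so nothing in the data flags them. As a minimal sketch, assuming a hypothetical tibble `tournaments` that lists only the first few tournament years from the table, tidyr's `complete()` can turn such gaps into explicit rows:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical subset of the tournament years shown in the table above
tournaments <- tibble(year = c(1930, 1934, 1938, 1950, 1954))

# Expand to every four-year slot and flag the slots with no tournament
all_slots <- tournaments %>%
  mutate(held = TRUE) %>%
  complete(
    year = seq(1930, 1954, by = 4),
    fill = list(held = FALSE)   # newly created rows get held = FALSE
  )

# Years that were expected by the four-year cycle but are absent
missing_years <- all_slots %>%
  filter(!held) %>%
  pull(year)

missing_years
```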