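# Load the tidyverse, which provides read_csv(), %>%, and the tidyr verbs used below
library(tidyverse)
data <- read_csv("../00_data/myData.csv")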
## Rows: 900 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): country, city, stage, home_team, away_team, outcome, win_conditio...
## dbl (3): year, home_score, away_score
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data
## # A tibble: 900 × 15
## year country city stage home_…¹ away_…² home_…³ away_…⁴ outcome win_c…⁵
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 1930 Uruguay Montevid… Grou… France Mexico 4 1 H <NA>
## 2 1930 Uruguay Montevid… Grou… Belgium United… 0 3 A <NA>
## 3 1930 Uruguay Montevid… Grou… Brazil Yugosl… 1 2 A <NA>
## 4 1930 Uruguay Montevid… Grou… Peru Romania 1 3 A <NA>
## 5 1930 Uruguay Montevid… Grou… Argent… France 1 0 H <NA>
## 6 1930 Uruguay Montevid… Grou… Chile Mexico 3 0 H <NA>
## 7 1930 Uruguay Montevid… Grou… Bolivia Yugosl… 0 4 A <NA>
## 8 1930 Uruguay Montevid… Grou… Paragu… United… 0 3 A <NA>
## 9 1930 Uruguay Montevid… Grou… Uruguay Peru 1 0 H <NA>
## 10 1930 Uruguay Montevid… Grou… Argent… Mexico 6 3 H <NA>
## # … with 890 more rows, 5 more variables: winning_team <chr>,
## # losing_team <chr>, date <date>, month <chr>, dayofweek <chr>, and
## # abbreviated variable names ¹home_team, ²away_team, ³home_score,
## # ⁴away_score, ⁵win_conditions
data %>%
  pivot_longer(cols = c(home_team, winning_team))
## # A tibble: 1,800 × 15
## year country city stage away_…¹ home_…² away_…³ outcome win_c…⁴ losin…⁵
## <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 1930 Uruguay Montevid… Grou… Mexico 4 1 H <NA> Mexico
## 2 1930 Uruguay Montevid… Grou… Mexico 4 1 H <NA> Mexico
## 3 1930 Uruguay Montevid… Grou… United… 0 3 A <NA> Belgium
## 4 1930 Uruguay Montevid… Grou… United… 0 3 A <NA> Belgium
## 5 1930 Uruguay Montevid… Grou… Yugosl… 1 2 A <NA> Brazil
## 6 1930 Uruguay Montevid… Grou… Yugosl… 1 2 A <NA> Brazil
## 7 1930 Uruguay Montevid… Grou… Romania 1 3 A <NA> Peru
## 8 1930 Uruguay Montevid… Grou… Romania 1 3 A <NA> Peru
## 9 1930 Uruguay Montevid… Grou… France 1 0 H <NA> France
## 10 1930 Uruguay Montevid… Grou… France 1 0 H <NA> France
## # … with 1,790 more rows, 5 more variables: date <date>, month <chr>,
## # dayofweek <chr>, name <chr>, value <chr>, and abbreviated variable names
## # ¹away_team, ²home_score, ³away_score, ⁴win_conditions, ⁵losing_team
data %>%
  pivot_wider(names_from = winning_team, values_from = outcome)
## # A tibble: 900 × 80
## year country city stage home_…¹ away_…² home_…³ away_…⁴ win_c…⁵ losin…⁶
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 1930 Uruguay Montevid… Grou… France Mexico 4 1 <NA> Mexico
## 2 1930 Uruguay Montevid… Grou… Belgium United… 0 3 <NA> Belgium
## 3 1930 Uruguay Montevid… Grou… Brazil Yugosl… 1 2 <NA> Brazil
## 4 1930 Uruguay Montevid… Grou… Peru Romania 1 3 <NA> Peru
## 5 1930 Uruguay Montevid… Grou… Argent… France 1 0 <NA> France
## 6 1930 Uruguay Montevid… Grou… Chile Mexico 3 0 <NA> Mexico
## 7 1930 Uruguay Montevid… Grou… Bolivia Yugosl… 0 4 <NA> Bolivia
## 8 1930 Uruguay Montevid… Grou… Paragu… United… 0 3 <NA> Paragu…
## 9 1930 Uruguay Montevid… Grou… Uruguay Peru 1 0 <NA> Peru
## 10 1930 Uruguay Montevid… Grou… Argent… Mexico 6 3 <NA> Mexico
## # … with 890 more rows, 70 more variables: date <date>, month <chr>,
## # dayofweek <chr>, France <chr>, `United States` <chr>, Yugoslavia <chr>,
## # Romania <chr>, Argentina <chr>, Chile <chr>, Uruguay <chr>, Paraguay <chr>,
## # Brazil <chr>, Sweden <chr>, Austria <chr>, Germany <chr>, Spain <chr>,
## # Czechoslovakia <chr>, Hungary <chr>, Italy <chr>, Switzerland <chr>,
## # `NA` <chr>, `West Germany` <chr>, Cuba <chr>, England <chr>, Turkey <chr>,
## # `Northern Ireland` <chr>, `Soviet Union` <chr>, Wales <chr>, …
data %>%
  separate(col = year, into = c("country", "month"))
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 900 rows [1, 2, 3, 4, 5,
## 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## # A tibble: 900 × 14
## country month city stage home_…¹ away_…² home_…³ away_…⁴ outcome win_c…⁵
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 1930 <NA> Montevid… Grou… France Mexico 4 1 H <NA>
## 2 1930 <NA> Montevid… Grou… Belgium United… 0 3 A <NA>
## 3 1930 <NA> Montevid… Grou… Brazil Yugosl… 1 2 A <NA>
## 4 1930 <NA> Montevid… Grou… Peru Romania 1 3 A <NA>
## 5 1930 <NA> Montevid… Grou… Argent… France 1 0 H <NA>
## 6 1930 <NA> Montevid… Grou… Chile Mexico 3 0 H <NA>
## 7 1930 <NA> Montevid… Grou… Bolivia Yugosl… 0 4 A <NA>
## 8 1930 <NA> Montevid… Grou… Paragu… United… 0 3 A <NA>
## 9 1930 <NA> Montevid… Grou… Uruguay Peru 1 0 H <NA>
## 10 1930 <NA> Montevid… Grou… Argent… Mexico 6 3 H <NA>
## # … with 890 more rows, 4 more variables: winning_team <chr>,
## # losing_team <chr>, date <date>, dayofweek <chr>, and abbreviated variable
## # names ¹home_team, ²away_team, ³home_score, ⁴away_score, ⁵win_conditions
data %>%
  unite(col = "stage", home_team:outcome, sep = "/")
## # A tibble: 900 × 10
## year country city stage win_c…¹ winni…² losin…³ date month dayof…⁴
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <date> <chr> <chr>
## 1 1930 Uruguay Montevi… Fran… <NA> France Mexico 1930-07-13 Jul Sunday
## 2 1930 Uruguay Montevi… Belg… <NA> United… Belgium 1930-07-13 Jul Sunday
## 3 1930 Uruguay Montevi… Braz… <NA> Yugosl… Brazil 1930-07-14 Jul Monday
## 4 1930 Uruguay Montevi… Peru… <NA> Romania Peru 1930-07-14 Jul Monday
## 5 1930 Uruguay Montevi… Arge… <NA> Argent… France 1930-07-15 Jul Tuesday
## 6 1930 Uruguay Montevi… Chil… <NA> Chile Mexico 1930-07-16 Jul Wednes…
## 7 1930 Uruguay Montevi… Boli… <NA> Yugosl… Bolivia 1930-07-17 Jul Thursd…
## 8 1930 Uruguay Montevi… Para… <NA> United… Paragu… 1930-07-17 Jul Thursd…
## 9 1930 Uruguay Montevi… Urug… <NA> Uruguay Peru 1930-07-18 Jul Friday
## 10 1930 Uruguay Montevi… Arge… <NA> Argent… Mexico 1930-07-19 Jul Saturd…
## # … with 890 more rows, and abbreviated variable names ¹win_conditions,
## # ²winning_team, ³losing_team, ⁴dayofweek
data %>%
  pivot_wider(names_from = year, values_from = date)
## # A tibble: 900 × 34
## country city stage home_…¹ away_…² home_…³ away_…⁴ outcome win_c…⁵ winni…⁶
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 Uruguay Montev… Grou… France Mexico 4 1 H <NA> France
## 2 Uruguay Montev… Grou… Belgium United… 0 3 A <NA> United…
## 3 Uruguay Montev… Grou… Brazil Yugosl… 1 2 A <NA> Yugosl…
## 4 Uruguay Montev… Grou… Peru Romania 1 3 A <NA> Romania
## 5 Uruguay Montev… Grou… Argent… France 1 0 H <NA> Argent…
## 6 Uruguay Montev… Grou… Chile Mexico 3 0 H <NA> Chile
## 7 Uruguay Montev… Grou… Bolivia Yugosl… 0 4 A <NA> Yugosl…
## 8 Uruguay Montev… Grou… Paragu… United… 0 3 A <NA> United…
## 9 Uruguay Montev… Grou… Uruguay Peru 1 0 H <NA> Uruguay
## 10 Uruguay Montev… Grou… Argent… Mexico 6 3 H <NA> Argent…
## # … with 890 more rows, 24 more variables: losing_team <chr>, month <chr>,
## # dayofweek <chr>, `1930` <date>, `1934` <date>, `1938` <date>,
## # `1950` <date>, `1954` <date>, `1958` <date>, `1962` <date>, `1966` <date>,
## # `1970` <date>, `1974` <date>, `1978` <date>, `1982` <date>, `1986` <date>,
## # `1990` <date>, `1994` <date>, `1998` <date>, `2002` <date>, `2006` <date>,
## # `2010` <date>, `2014` <date>, `2018` <date>, and abbreviated variable names
## # ¹home_team, ²away_team, ³home_score, ⁴away_score, ⁵win_conditions, …
The World Cup has been held every four years since 1930, with one interruption. In the output above, where each tournament year becomes its own column, the columns jump from 1938 to 1950: the 1942 and 1946 tournaments were not held because of World War II and its aftermath.
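The gap can also be checked directly rather than read off the column names. The following is a minimal base R sketch, not part of the original analysis: the `years` vector is typed in by hand from the year columns printed above, and the names `gaps`, `expected`, and `missing_years` are illustrative. With the data loaded, `sort(unique(data$year))` should produce the same vector.

```r
# Tournament years, copied from the year columns of the wide table above.
# Assumption: with the data loaded, sort(unique(data$year)) gives this vector.
years <- c(1930, 1934, 1938, seq(1950, 2018, by = 4))

# Difference between consecutive tournament years.
# A strict four-year cycle would give 4 for every element.
gaps <- diff(years)

# Years a strict four-year cycle starting in 1930 would contain,
# minus the years that actually appear in the data.
expected <- seq(min(years), max(years), by = 4)
missing_years <- setdiff(expected, years)

print(gaps)
print(missing_years)
```

Every element of `gaps` is 4 except the one between 1938 and 1950, and `missing_years` holds the two tournaments that were skipped.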