Import two related datasets from the TidyTuesday project.
World_cup_matches <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-29/wcmatches.csv')
## Rows: 900 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): country, city, stage, home_team, away_team, outcome, win_conditio...
## dbl (3): year, home_score, away_score
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
World_Cups <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-29/worldcups.csv')
## Rows: 21 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): host, winner, second, third, fourth
## dbl (5): year, goals_scored, teams, games, attendance
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
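Describe the two datasets:
Data 1: World Cup matches, one row per match (900 rows, 15 columns), including the year, the teams, the home and away scores, and the outcome.
Data 2: World Cup tournaments, one row per tournament (21 rows, 10 columns), including the year, the host, the top four teams, and totals for goals, teams, games, and attendance.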
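library(dplyr) # provides %>%, select(), sample_n(), and the join functions
set.seed(1234) # make the random samples reproducible
World_cup_matches_small <- World_cup_matches %>% select(year, winning_team, home_score, away_score) %>% sample_n(15)
World_Cups_small <- World_Cups %>% select(year, winner, second) %>% sample_n(15)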
World_cup_matches_small
## # A tibble: 15 × 4
## year winning_team home_score away_score
## <dbl> <chr> <dbl> <dbl>
## 1 1978 <NA> 0 0
## 2 2018 Sweden 1 0
## 3 1954 West Germany 3 2
## 4 2002 <NA> 1 1
## 5 2006 Germany 4 2
## 6 1986 Brazil 4 0
## 7 1954 West Germany 1 6
## 8 1958 Brazil 0 3
## 9 2010 Argentina 4 1
## 10 2002 Spain 3 1
## 11 1982 Soviet Union 3 0
## 12 1954 Yugoslavia 0 1
## 13 2018 Belgium 0 1
## 14 1974 West Germany 2 1
## 15 1986 Denmark 6 1
World_Cups_small
## # A tibble: 15 × 3
## year winner second
## <dbl> <chr> <chr>
## 1 1950 Uruguay Brazil
## 2 2018 France Croatia
## 3 1966 England West Germany
## 4 1938 Italy Hungary
## 5 2014 Germany Argentina
## 6 1994 Brazil Italy
## 7 1998 France Brazil
## 8 1986 Argentina West Germany
## 9 1974 West Germany Netherlands
## 10 1954 West Germany Hungary
## 11 1934 Italy Czechoslovakia
## 12 2010 Spain Netherlands
## 13 2002 Brazil Germany
## 14 1990 West Germany Argentina
## 15 1970 Brazil Italy
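The sampling step above uses sample_n(), which current dplyr releases mark as superseded in favor of slice_sample(). A minimal sketch of the newer call on a small hand-typed tibble (the name cups_toy is hypothetical; its rows are copied from the output above). It is not guaranteed to draw the same rows as sample_n() under the same seed.

```r
library(dplyr)

# Toy stand-in for World_Cups; rows copied from the printed sample above.
cups_toy <- tibble(
  year   = c(1950, 2018, 1966, 1938, 2014),
  winner = c("Uruguay", "France", "England", "Italy", "Germany")
)

set.seed(1234)                                   # reproducible draw
cups_sample <- cups_toy %>% slice_sample(n = 3)  # 3 rows, without replacement
nrow(cups_sample)
```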
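Describe the resulting data: The inner join keeps only the sampled matches whose year also appears in World_Cups_small, and adds the winner and second columns to them.
How is it different from the original two datasets? It has 11 rows and 6 columns. The 4 sampled matches with no matching year are dropped, and so are the World Cups that have no sampled match.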
World_cup_matches_small %>% inner_join(World_Cups_small)
## Joining with `by = join_by(year)`
## # A tibble: 11 × 6
## year winning_team home_score away_score winner second
## <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 2018 Sweden 1 0 France Croatia
## 2 1954 West Germany 3 2 West Germany Hungary
## 3 2002 <NA> 1 1 Brazil Germany
## 4 1986 Brazil 4 0 Argentina West Germany
## 5 1954 West Germany 1 6 West Germany Hungary
## 6 2010 Argentina 4 1 Spain Netherlands
## 7 2002 Spain 3 1 Brazil Germany
## 8 1954 Yugoslavia 0 1 West Germany Hungary
## 9 2018 Belgium 0 1 France Croatia
## 10 1974 West Germany 2 1 West Germany Netherlands
## 11 1986 Denmark 6 1 Argentina West Germany
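The "Joining with" message appears because no key was supplied, so dplyr joins on every column name the two tables share (here only year). A small sketch that names the key explicitly, using hand-typed toy tibbles (matches_toy and cups_toy are hypothetical names; the values are copied from the samples above). Because 1954 appears twice among the matches, its World Cup row is repeated once per match.

```r
library(dplyr)

# Toy stand-ins for the two sampled tables.
matches_toy <- tibble(
  year         = c(1978, 2018, 1954, 1954, 2006),
  winning_team = c(NA, "Sweden", "West Germany", "Yugoslavia", "Germany")
)
cups_toy <- tibble(
  year   = c(2018, 1954, 1950),
  winner = c("France", "West Germany", "Uruguay")
)

# Naming the key avoids the message and guards against accidental
# joins on other shared column names.
inner <- inner_join(matches_toy, cups_toy, by = "year")
inner
```

Only the 2018 match and the two 1954 matches survive; 1978, 2006, and the 1950 World Cup have no partner and are dropped.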
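Describe the resulting data: The left join keeps all 15 sampled matches and adds winner and second wherever the year has a match in World_Cups_small.
How is it different from the original two datasets? It has 15 rows and 6 columns. The 4 matches from years that are not in World_Cups_small (1978, 2006, 1958, 1982) get NA for winner and second, and World Cups with no sampled match do not appear.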
World_cup_matches_small %>% left_join(World_Cups_small)
## Joining with `by = join_by(year)`
## # A tibble: 15 × 6
## year winning_team home_score away_score winner second
## <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 1978 <NA> 0 0 <NA> <NA>
## 2 2018 Sweden 1 0 France Croatia
## 3 1954 West Germany 3 2 West Germany Hungary
## 4 2002 <NA> 1 1 Brazil Germany
## 5 2006 Germany 4 2 <NA> <NA>
## 6 1986 Brazil 4 0 Argentina West Germany
## 7 1954 West Germany 1 6 West Germany Hungary
## 8 1958 Brazil 0 3 <NA> <NA>
## 9 2010 Argentina 4 1 Spain Netherlands
## 10 2002 Spain 3 1 Brazil Germany
## 11 1982 Soviet Union 3 0 <NA> <NA>
## 12 1954 Yugoslavia 0 1 West Germany Hungary
## 13 2018 Belgium 0 1 France Croatia
## 14 1974 West Germany 2 1 West Germany Netherlands
## 15 1986 Denmark 6 1 Argentina West Germany
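One way to see which rows a left join could not match is to filter on NA in a column that comes from the right-hand table. A sketch on hand-typed toy tibbles (matches_toy and cups_toy are hypothetical names; values copied from the samples above). It assumes the right-hand column has no NA of its own, which holds for winner here.

```r
library(dplyr)

matches_toy <- tibble(
  year         = c(1978, 2018, 1954, 2006),
  winning_team = c(NA, "Sweden", "West Germany", "Germany")
)
cups_toy <- tibble(
  year   = c(2018, 1954, 1950),
  winner = c("France", "West Germany", "Uruguay")
)

# Every row of matches_toy is kept; winner is NA where no year matched.
left <- left_join(matches_toy, cups_toy, by = "year")

# Rows that found no partner in cups_toy (valid only because
# cups_toy$winner itself contains no NA).
unmatched <- left %>% filter(is.na(winner))
unmatched$year
```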
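Describe the resulting data: The right join keeps every row of World_Cups_small and adds the match columns wherever the year has a match in World_cup_matches_small.
How is it different from the original two datasets? It is the mirror image of the left join, not the same result. It has 20 rows and 6 columns: the 11 matched rows plus 9 World Cups with no sampled match, which get NA for winning_team, home_score, and away_score. The 4 matches with no matching World Cup are dropped.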
World_cup_matches_small %>% right_join(World_Cups_small)
## Joining with `by = join_by(year)`
## # A tibble: 20 × 6
## year winning_team home_score away_score winner second
## <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 2018 Sweden 1 0 France Croatia
## 2 1954 West Germany 3 2 West Germany Hungary
## 3 2002 <NA> 1 1 Brazil Germany
## 4 1986 Brazil 4 0 Argentina West Germany
## 5 1954 West Germany 1 6 West Germany Hungary
## 6 2010 Argentina 4 1 Spain Netherlands
## 7 2002 Spain 3 1 Brazil Germany
## 8 1954 Yugoslavia 0 1 West Germany Hungary
## 9 2018 Belgium 0 1 France Croatia
## 10 1974 West Germany 2 1 West Germany Netherlands
## 11 1986 Denmark 6 1 Argentina West Germany
## 12 1950 <NA> NA NA Uruguay Brazil
## 13 1966 <NA> NA NA England West Germany
## 14 1938 <NA> NA NA Italy Hungary
## 15 2014 <NA> NA NA Germany Argentina
## 16 1994 <NA> NA NA Brazil Italy
## 17 1998 <NA> NA NA France Brazil
## 18 1934 <NA> NA NA Italy Czechoslovakia
## 19 1990 <NA> NA NA West Germany Argentina
## 20 1970 <NA> NA NA Brazil Italy
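A right join returns the same rows as a left join with the two tables swapped; only the column order and row order differ. A sketch that checks this on hand-typed toy tibbles (matches_toy and cups_toy are hypothetical names; values copied from the samples above):

```r
library(dplyr)

matches_toy <- tibble(
  year         = c(1978, 2018, 1954, 1954, 2006),
  winning_team = c(NA, "Sweden", "West Germany", "Yugoslavia", "Germany")
)
cups_toy <- tibble(
  year   = c(2018, 1954, 1950),
  winner = c("France", "West Germany", "Uruguay")
)

by_right <- right_join(matches_toy, cups_toy, by = "year")
by_left  <- left_join(cups_toy, matches_toy, by = "year")

# Put both results in the same column and row order before comparing.
a <- by_right %>% arrange(year, winning_team)
b <- by_left %>%
  select(all_of(names(by_right))) %>%
  arrange(year, winning_team)

same <- identical(a$year, b$year) &&
  identical(a$winning_team, b$winning_team) &&
  identical(a$winner, b$winner)
same
```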
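Describe the resulting data: The full join keeps every row from both sampled datasets and fills in NA wherever one side has no match.
How is it different from the original two datasets? It has 24 rows and 6 columns (year, winning_team, home_score, away_score, winner, second): the 11 matched rows, the 4 matches with no matching World Cup, and the 9 World Cups with no sampled match.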
World_cup_matches_small %>% full_join(World_Cups_small)
## Joining with `by = join_by(year)`
## # A tibble: 24 × 6
## year winning_team home_score away_score winner second
## <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 1978 <NA> 0 0 <NA> <NA>
## 2 2018 Sweden 1 0 France Croatia
## 3 1954 West Germany 3 2 West Germany Hungary
## 4 2002 <NA> 1 1 Brazil Germany
## 5 2006 Germany 4 2 <NA> <NA>
## 6 1986 Brazil 4 0 Argentina West Germany
## 7 1954 West Germany 1 6 West Germany Hungary
## 8 1958 Brazil 0 3 <NA> <NA>
## 9 2010 Argentina 4 1 Spain Netherlands
## 10 2002 Spain 3 1 Brazil Germany
## # ℹ 14 more rows
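The row count of a full join can be checked by adding up its three parts: matched rows, rows only in the left table, and rows only in the right table. In the output above that is 11 + 4 + 9 = 24. A sketch of the same bookkeeping on hand-typed toy tibbles (matches_toy and cups_toy are hypothetical names; values copied from the samples above):

```r
library(dplyr)

matches_toy <- tibble(
  year         = c(1978, 2018, 1954, 1954, 2006),
  winning_team = c(NA, "Sweden", "West Germany", "Yugoslavia", "Germany")
)
cups_toy <- tibble(
  year   = c(2018, 1954, 1950),
  winner = c("France", "West Germany", "Uruguay")
)

n_inner      <- nrow(inner_join(matches_toy, cups_toy, by = "year")) # matched rows
n_left_only  <- nrow(anti_join(matches_toy, cups_toy, by = "year"))  # matches without a cup
n_right_only <- nrow(anti_join(cups_toy, matches_toy, by = "year"))  # cups without a match
n_full       <- nrow(full_join(matches_toy, cups_toy, by = "year"))

# The three parts add up to the full join.
n_full == n_inner + n_left_only + n_right_only
```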
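Describe the resulting data: The semi join keeps the sampled matches whose year appears in World_Cups_small, without adding any columns from it.
How is it different from the original two datasets? It has 11 rows and only the 4 columns of World_cup_matches_small (year, winning_team, home_score, away_score). These are the same 11 rows as in the inner join, minus the winner and second columns.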
World_cup_matches_small %>% semi_join(World_Cups_small)
## Joining with `by = join_by(year)`
## # A tibble: 11 × 4
## year winning_team home_score away_score
## <dbl> <chr> <dbl> <dbl>
## 1 2018 Sweden 1 0
## 2 1954 West Germany 3 2
## 3 2002 <NA> 1 1
## 4 1986 Brazil 4 0
## 5 1954 West Germany 1 6
## 6 2010 Argentina 4 1
## 7 2002 Spain 3 1
## 8 1954 Yugoslavia 0 1
## 9 2018 Belgium 0 1
## 10 1974 West Germany 2 1
## 11 1986 Denmark 6 1
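With a single key column, a semi join behaves like a filter on membership: it keeps rows of the left table whose key occurs in the right table and leaves the columns untouched. A sketch comparing the two on hand-typed toy tibbles (matches_toy and cups_toy are hypothetical names; values copied from the samples above):

```r
library(dplyr)

matches_toy <- tibble(
  year         = c(1978, 2018, 1954, 1954, 2006),
  winning_team = c(NA, "Sweden", "West Germany", "Yugoslavia", "Germany")
)
cups_toy <- tibble(
  year   = c(2018, 1954, 1950),
  winner = c("France", "West Germany", "Uruguay")
)

# Filtering join: keeps rows of matches_toy, adds nothing from cups_toy.
semi <- semi_join(matches_toy, cups_toy, by = "year")

# The same selection written as an explicit membership test.
filtered <- matches_toy %>% filter(year %in% cups_toy$year)

names(semi) # same columns as matches_toy
```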
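Describe the resulting data: The anti join keeps the sampled matches whose year does not appear in World_Cups_small.
How is it different from the original two datasets? It has 4 rows (the 1978, 2006, 1958, and 1982 matches) and only the 4 columns of World_cup_matches_small (year, winning_team, home_score, away_score). It is the complement of the semi join.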
World_cup_matches_small %>% anti_join(World_Cups_small)
## Joining with `by = join_by(year)`
## # A tibble: 4 × 4
## year winning_team home_score away_score
## <dbl> <chr> <dbl> <dbl>
## 1 1978 <NA> 0 0
## 2 2006 Germany 4 2
## 3 1958 Brazil 0 3
## 4 1982 Soviet Union 3 0
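The semi join and the anti join split the left table into two non-overlapping parts, which is why their row counts above add up to the sample size (11 + 4 = 15). A sketch of that check on hand-typed toy tibbles (matches_toy and cups_toy are hypothetical names; values copied from the samples above):

```r
library(dplyr)

matches_toy <- tibble(
  year         = c(1978, 2018, 1954, 1954, 2006),
  winning_team = c(NA, "Sweden", "West Germany", "Yugoslavia", "Germany")
)
cups_toy <- tibble(
  year   = c(2018, 1954, 1950),
  winner = c("France", "West Germany", "Uruguay")
)

kept    <- semi_join(matches_toy, cups_toy, by = "year") # year found in cups_toy
dropped <- anti_join(matches_toy, cups_toy, by = "year") # year not found

# Every row of matches_toy lands in exactly one of the two results.
nrow(kept) + nrow(dropped) == nrow(matches_toy)
dropped$year
```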