Import two related datasets from the TidyTuesday project.
wcmatches <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-29/wcmatches.csv')
## Rows: 900 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): country, city, stage, home_team, away_team, outcome, win_conditio...
## dbl (3): year, home_score, away_score
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
worldcups <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-29/worldcups.csv')
## Rows: 21 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): host, winner, second, third, fourth
## dbl (5): year, goals_scored, teams, games, attendance
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
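The messages above suggest two ways to quiet the column specification printout. A minimal sketch of both follows, assuming readr 2.0 or later. The literal CSV text is a hand-typed stand-in for the remote file, and the names `csv_text`, `cups_quiet`, and `cups_typed` are made up for this illustration.

```r
library(readr)

# Literal CSV text stands in for the remote file (two rows copied by hand
# from the worldcups data shown in this document; illustration only)
csv_text <- I("year,winner,second\n1954,West Germany,Hungary\n2002,Brazil,Germany\n")

# Option 1: keep type guessing but silence the column specification message
cups_quiet <- read_csv(csv_text, show_col_types = FALSE)

# Option 2: declare the column types up front, which also silences the message
cups_typed <- read_csv(
  csv_text,
  col_types = cols(
    year = col_double(),
    winner = col_character(),
    second = col_character()
  )
)
```

Declaring the types is the stricter choice, since a column that does not parse as declared is reported instead of being silently guessed as another type.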
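Describe the two datasets:
Data 1: wcmatches has one row per World Cup match (900 rows, 15 columns), including the year, stage, teams, and scores.
Data 2: worldcups has one row per tournament (21 rows, 10 columns), including the host, the top four teams, goals scored, and attendance.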
library(dplyr) # provides %>%, select(), sample_n(), and the join verbs
set.seed(1234)
wcmatches_small <- wcmatches %>% select(year, winning_team, home_score, away_score) %>% sample_n(10)
worldcups_small <- worldcups %>% select(year, winner, second) %>% sample_n(10)
wcmatches_small
## # A tibble: 10 × 4
## year winning_team home_score away_score
## <dbl> <chr> <dbl> <dbl>
## 1 1978 <NA> 0 0
## 2 2018 Sweden 1 0
## 3 1954 West Germany 3 2
## 4 2002 <NA> 1 1
## 5 2006 Germany 4 2
## 6 1986 Brazil 4 0
## 7 1954 West Germany 1 6
## 8 1958 Brazil 0 3
## 9 2010 Argentina 4 1
## 10 2002 Spain 3 1
worldcups_small
## # A tibble: 10 × 3
## year winner second
## <dbl> <chr> <chr>
## 1 1958 Brazil Sweden
## 2 1994 Brazil Italy
## 3 1990 West Germany Argentina
## 4 2010 Spain Netherlands
## 5 1950 Uruguay Brazil
## 6 2002 Brazil Germany
## 7 1954 West Germany Hungary
## 8 1966 England West Germany
## 9 1998 France Brazil
## 10 2006 Italy France
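Describe the resulting data: The inner join keeps only the matches whose year also appears in worldcups_small, giving 7 rows and 6 columns.
How is it different from the original two datasets?
The three matches from 1978, 2018, and 1986 and the five World Cups with no sampled match are dropped, and each remaining match gains the winner and second columns.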
wcmatches_small %>% inner_join(worldcups_small)
## Joining with `by = join_by(year)`
## # A tibble: 7 × 6
## year winning_team home_score away_score winner second
## <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 1954 West Germany 3 2 West Germany Hungary
## 2 2002 <NA> 1 1 Brazil Germany
## 3 2006 Germany 4 2 Italy France
## 4 1954 West Germany 1 6 West Germany Hungary
## 5 1958 Brazil 0 3 Brazil Sweden
## 6 2010 Argentina 4 1 Spain Netherlands
## 7 2002 Spain 3 1 Brazil Germany
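The "Joining with" message appears because no key was named, so dplyr picked the only shared column, year. A minimal sketch of naming the key explicitly is below. The `matches` and `cups` tables are hand-typed copies of a few rows from the samples above, used only for illustration.

```r
library(dplyr)

# Hand-typed copies of a few sampled rows (illustration only)
matches <- tibble::tibble(
  year = c(1978, 1954, 1954, 2002),
  winning_team = c(NA, "West Germany", "West Germany", "Spain")
)
cups <- tibble::tibble(
  year = c(1954, 2002, 1994),
  winner = c("West Germany", "Brazil", "Brazil")
)

# Naming the key suppresses the "Joining with" message and documents intent
inner <- inner_join(matches, cups, by = "year")
inner
```

Stating `by` also protects against accidental joins on extra columns if the two tables later gain another column with the same name.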
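Describe the resulting data: The left join keeps all 10 rows of wcmatches_small and adds the winner and second columns from worldcups_small.
How is it different from the original two datasets?
Matches from years missing in worldcups_small (1978, 2018, 1986) get NA for winner and second, and World Cups with no sampled match do not appear.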
wcmatches_small %>% left_join(worldcups_small)
## Joining with `by = join_by(year)`
## # A tibble: 10 × 6
## year winning_team home_score away_score winner second
## <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 1978 <NA> 0 0 <NA> <NA>
## 2 2018 Sweden 1 0 <NA> <NA>
## 3 1954 West Germany 3 2 West Germany Hungary
## 4 2002 <NA> 1 1 Brazil Germany
## 5 2006 Germany 4 2 Italy France
## 6 1986 Brazil 4 0 <NA> <NA>
## 7 1954 West Germany 1 6 West Germany Hungary
## 8 1958 Brazil 0 3 Brazil Sweden
## 9 2010 Argentina 4 1 Spain Netherlands
## 10 2002 Spain 3 1 Brazil Germany
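Describe the resulting data: The right join keeps every World Cup in worldcups_small, giving 12 rows and 6 columns: 7 matched rows plus 5 World Cups with no sampled match.
How is it different from the original two datasets?
The unmatched World Cups (1994, 1990, 1950, 1966, 1998) get NA for winning_team, home_score, and away_score. The years 1954 and 2002 appear twice because two sampled matches share each year, and the matches from 1978, 2018, and 1986 are dropped.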
wcmatches_small %>% right_join(worldcups_small)
## Joining with `by = join_by(year)`
## # A tibble: 12 × 6
## year winning_team home_score away_score winner second
## <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 1954 West Germany 3 2 West Germany Hungary
## 2 2002 <NA> 1 1 Brazil Germany
## 3 2006 Germany 4 2 Italy France
## 4 1954 West Germany 1 6 West Germany Hungary
## 5 1958 Brazil 0 3 Brazil Sweden
## 6 2010 Argentina 4 1 Spain Netherlands
## 7 2002 Spain 3 1 Brazil Germany
## 8 1994 <NA> NA NA Brazil Italy
## 9 1990 <NA> NA NA West Germany Argentina
## 10 1950 <NA> NA NA Uruguay Brazil
## 11 1966 <NA> NA NA England West Germany
## 12 1998 <NA> NA NA France Brazil
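Describe the resulting data: The full join keeps every row from both tables, giving 15 rows and 6 columns.
How is it different from the original two datasets?
Nothing is dropped: the 10 sampled matches all appear, followed by the 5 World Cups with no sampled match, and NA fills whichever columns have no counterpart in the other table.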
wcmatches_small %>% full_join(worldcups_small)
## Joining with `by = join_by(year)`
## # A tibble: 15 × 6
## year winning_team home_score away_score winner second
## <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 1978 <NA> 0 0 <NA> <NA>
## 2 2018 Sweden 1 0 <NA> <NA>
## 3 1954 West Germany 3 2 West Germany Hungary
## 4 2002 <NA> 1 1 Brazil Germany
## 5 2006 Germany 4 2 Italy France
## 6 1986 Brazil 4 0 <NA> <NA>
## 7 1954 West Germany 1 6 West Germany Hungary
## 8 1958 Brazil 0 3 Brazil Sweden
## 9 2010 Argentina 4 1 Spain Netherlands
## 10 2002 Spain 3 1 Brazil Germany
## 11 1994 <NA> NA NA Brazil Italy
## 12 1990 <NA> NA NA West Germany Argentina
## 13 1950 <NA> NA NA Uruguay Brazil
## 14 1966 <NA> NA NA England West Germany
## 15 1998 <NA> NA NA France Brazil
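The four mutating joins above return 7, 10, 12, and 15 rows. The same pattern can be checked on a smaller scale; the sketch below uses hand-typed copies of a few sampled rows, and the names `matches`, `cups`, and `counts` are made up for this illustration.

```r
library(dplyr)

# Hand-typed copies of a few sampled rows (illustration only)
matches <- tibble::tibble(
  year = c(1978, 1954, 1954, 2002),
  winning_team = c(NA, "West Germany", "West Germany", "Spain")
)
cups <- tibble::tibble(
  year = c(1954, 2002, 1994),
  winner = c("West Germany", "Brazil", "Brazil")
)

# Row counts for each mutating join on the same pair of tables
counts <- c(
  inner = nrow(inner_join(matches, cups, by = "year")),
  left  = nrow(left_join(matches, cups, by = "year")),
  right = nrow(right_join(matches, cups, by = "year")),
  full  = nrow(full_join(matches, cups, by = "year"))
)
counts

# The full join is the inner join plus the unmatched rows from each side
unmatched_x <- nrow(anti_join(matches, cups, by = "year"))
unmatched_y <- nrow(anti_join(cups, matches, by = "year"))
```

In the sampled data the same arithmetic holds: 7 matched rows plus 3 unmatched matches plus 5 unmatched World Cups gives the 15 rows of the full join.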
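Describe the resulting data: The semi join keeps the 7 rows of wcmatches_small whose year appears in worldcups_small, with only the original 4 columns.
How is it different from the original two datasets?
It returns the same rows as the inner join but adds no columns from worldcups_small, so it acts as a filter on wcmatches_small.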
wcmatches_small %>% semi_join(worldcups_small)
## Joining with `by = join_by(year)`
## # A tibble: 7 × 4
## year winning_team home_score away_score
## <dbl> <chr> <dbl> <dbl>
## 1 1954 West Germany 3 2
## 2 2002 <NA> 1 1
## 3 2006 Germany 4 2
## 4 1954 West Germany 1 6
## 5 1958 Brazil 0 3
## 6 2010 Argentina 4 1
## 7 2002 Spain 3 1
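Describe the resulting data: The anti join keeps the rows of the first table whose year has no match in the second: 3 matches (1978, 2018, 1986), or 5 World Cups when the tables are swapped.
How is it different from the original two datasets?
No columns are added, and the result contains exactly the rows that the semi join in the same direction leaves out.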
wcmatches_small %>% anti_join(worldcups_small)
## Joining with `by = join_by(year)`
## # A tibble: 3 × 4
## year winning_team home_score away_score
## <dbl> <chr> <dbl> <dbl>
## 1 1978 <NA> 0 0
## 2 2018 Sweden 1 0
## 3 1986 Brazil 4 0
worldcups_small %>% anti_join(wcmatches_small)
## Joining with `by = join_by(year)`
## # A tibble: 5 × 3
## year winner second
## <dbl> <chr> <chr>
## 1 1994 Brazil Italy
## 2 1990 West Germany Argentina
## 3 1950 Uruguay Brazil
## 4 1966 England West Germany
## 5 1998 France Brazil
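The semi and anti joins split the first table into two non-overlapping pieces: 7 plus 3 rows for the 10 sampled matches above. A minimal sketch of that relationship is below, using hand-typed copies of a few sampled rows; `matches`, `cups`, `kept`, and `dropped` are names made up for this illustration.

```r
library(dplyr)

# Hand-typed copies of a few sampled rows (illustration only)
matches <- tibble::tibble(
  year = c(1978, 1954, 1954, 2002),
  winning_team = c(NA, "West Germany", "West Germany", "Spain")
)
cups <- tibble::tibble(
  year = c(1954, 2002, 1994),
  winner = c("West Germany", "Brazil", "Brazil")
)

# Filtering joins keep or drop rows of the first table; no columns are added
kept    <- semi_join(matches, cups, by = "year")
dropped <- anti_join(matches, cups, by = "year")

# Equivalent filters written with %in% on the key column
kept_filter    <- filter(matches, year %in% cups$year)
dropped_filter <- filter(matches, !(year %in% cups$year))
```

With a single key column the `%in%` form is a reasonable substitute; the join form scales more naturally to keys made of several columns.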