1. Import your data

Import two related datasets from the TidyTuesday project.

attendance <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-02-04/attendance.csv')
## Rows: 10846 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): team, team_name
## dbl (6): year, total, home, away, week, weekly_attendance
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
standings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-02-04/standings.csv')
## Rows: 638 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): team, team_name, playoffs, sb_winner
## dbl (11): year, wins, loss, points_for, points_against, points_differential,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
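
The message printed after each import suggests specifying the column types. A minimal sketch of that, assuming a tiny inline CSV (two rows copied from the standings output further down) stands in for the real download, and that readr 2.0 or later is installed so that `I()` can mark literal data:

```r
library(readr)

# Inline stand-in for the remote file (assumption: illustrative rows only)
csv_text <- "team,year,wins\nCarolina,2005,11\nBuffalo,2000,8\n"

# Declaring col_types up front suppresses the column specification message
standings_demo <- read_csv(
  I(csv_text),
  col_types = cols(
    team = col_character(),
    year = col_double(),
    wins = col_double()
  )
)
standings_demo
```

Passing `show_col_types = FALSE` instead keeps the guessed types and only silences the message.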

2. Make data small

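Describe the two datasets:

Data 1: attendance_small, a random sample of 10 rows from attendance, keeping three columns: team_name, year, and week.

Data 2: standings_small, a random sample of 10 rows from standings, keeping three columns: team, year, and wins.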

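library(dplyr)  # provides %>%, select(), sample_n(), and the join functions

set.seed(1234)  # make the random samples reproducible
attendance_small <- attendance %>% select(team_name, year, week) %>% sample_n(10)
standings_small <- standings %>% select(team, year, wins) %>% sample_n(10)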

attendance_small
## # A tibble: 10 × 3
##    team_name   year  week
##    <chr>      <dbl> <dbl>
##  1 Steelers    2013     6
##  2 Chargers    2014     9
##  3 Browns      2013     5
##  4 Buccaneers  2014    11
##  5 Colts       2013    10
##  6 Titans      2016    16
##  7 Bears       2001    11
##  8 Steelers    2001    16
##  9 Chiefs      2005     7
## 10 Cardinals   2004     4
standings_small
## # A tibble: 10 × 3
##    team           year  wins
##    <chr>         <dbl> <dbl>
##  1 Indianapolis   2003    12
##  2 Tampa Bay      2018     5
##  3 Cincinnati     2010     4
##  4 Philadelphia   2002    12
##  5 Kansas City    2008     2
##  6 St. Louis      2011     2
##  7 Carolina       2005    11
##  8 San Francisco  2017     6
##  9 Buffalo        2000     8
## 10 Tennessee      2017     9
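
When `by` is not supplied, dplyr joins on every column name the two tables have in common, so it is worth checking that set before joining. A minimal sketch, assuming the two samples are rebuilt by hand from the printed output above rather than re-sampled:

```r
library(dplyr)

# Hand-copied from the printed samples above (an assumption; not re-sampled)
attendance_small <- tibble::tibble(
  team_name = c("Steelers", "Chargers", "Browns", "Buccaneers", "Colts",
                "Titans", "Bears", "Steelers", "Chiefs", "Cardinals"),
  year = c(2013, 2014, 2013, 2014, 2013, 2016, 2001, 2001, 2005, 2004),
  week = c(6, 9, 5, 11, 10, 16, 11, 16, 7, 4)
)
standings_small <- tibble::tibble(
  team = c("Indianapolis", "Tampa Bay", "Cincinnati", "Philadelphia", "Kansas City",
           "St. Louis", "Carolina", "San Francisco", "Buffalo", "Tennessee"),
  year = c(2003, 2018, 2010, 2002, 2008, 2011, 2005, 2017, 2000, 2017),
  wins = c(12, 5, 4, 12, 2, 2, 11, 6, 8, 9)
)

# Column names present in both tables; these become the default join key
shared_cols <- intersect(names(attendance_small), names(standings_small))
shared_cols
# "year"
```

Because one table keeps team_name and the other keeps team, year is the only shared column, and every join in the following sections matches on year alone.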

3. inner_join

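Describe the resulting data:

One row and five columns. 2005 is the only year that appears in both samples, so it is the only row that survives.

How is it different from the original two datasets?

It combines the columns of both tables (team_name, week, team, wins) around the shared key year and drops every row without a match. Because the key is year alone, the matched row pairs two unrelated teams (Chiefs and Carolina).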

attendance_small %>% inner_join(standings_small)
## Joining with `by = join_by(year)`
## # A tibble: 1 × 5
##   team_name  year  week team      wins
##   <chr>     <dbl> <dbl> <chr>    <dbl>
## 1 Chiefs     2005     7 Carolina    11
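
The "Joining with" message reports the key dplyr picked automatically. Naming the key explicitly silences the message and guards against an unintended key if the columns change later. A minimal sketch, again assuming the samples are hand-copied from the printed output:

```r
library(dplyr)

# Hand-copied from the printed samples above (an assumption; not re-sampled)
attendance_small <- tibble::tibble(
  team_name = c("Steelers", "Chargers", "Browns", "Buccaneers", "Colts",
                "Titans", "Bears", "Steelers", "Chiefs", "Cardinals"),
  year = c(2013, 2014, 2013, 2014, 2013, 2016, 2001, 2001, 2005, 2004),
  week = c(6, 9, 5, 11, 10, 16, 11, 16, 7, 4)
)
standings_small <- tibble::tibble(
  team = c("Indianapolis", "Tampa Bay", "Cincinnati", "Philadelphia", "Kansas City",
           "St. Louis", "Carolina", "San Francisco", "Buffalo", "Tennessee"),
  year = c(2003, 2018, 2010, 2002, 2008, 2011, 2005, 2017, 2000, 2017),
  wins = c(12, 5, 4, 12, 2, 2, 11, 6, 8, 9)
)

# Same inner join as above, with the key stated explicitly
inner_explicit <- inner_join(attendance_small, standings_small, by = "year")
inner_explicit
```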

4. left_join

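Describe the resulting data:

Ten rows and five columns. Every row of attendance_small is kept; team and wins are filled in only for 2005 and are NA for the nine rows whose year has no match in standings_small.

How is it different from the original two datasets?

It has the same rows as attendance_small, in the same order, with the two standings columns appended. Nine of the ten rows of standings_small do not appear at all.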

left_join(attendance_small, standings_small)
## Joining with `by = join_by(year)`
## # A tibble: 10 × 5
##    team_name   year  week team      wins
##    <chr>      <dbl> <dbl> <chr>    <dbl>
##  1 Steelers    2013     6 <NA>        NA
##  2 Chargers    2014     9 <NA>        NA
##  3 Browns      2013     5 <NA>        NA
##  4 Buccaneers  2014    11 <NA>        NA
##  5 Colts       2013    10 <NA>        NA
##  6 Titans      2016    16 <NA>        NA
##  7 Bears       2001    11 <NA>        NA
##  8 Steelers    2001    16 <NA>        NA
##  9 Chiefs      2005     7 Carolina    11
## 10 Cardinals   2004     4 <NA>        NA
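
The NA values mark rows of the left table that found no partner, so counting them is a quick check on how well the key matches. A minimal sketch, assuming hand-copied samples and that team has no missing values in the source table (true for the sample shown):

```r
library(dplyr)

# Hand-copied from the printed samples above (an assumption; not re-sampled)
attendance_small <- tibble::tibble(
  team_name = c("Steelers", "Chargers", "Browns", "Buccaneers", "Colts",
                "Titans", "Bears", "Steelers", "Chiefs", "Cardinals"),
  year = c(2013, 2014, 2013, 2014, 2013, 2016, 2001, 2001, 2005, 2004),
  week = c(6, 9, 5, 11, 10, 16, 11, 16, 7, 4)
)
standings_small <- tibble::tibble(
  team = c("Indianapolis", "Tampa Bay", "Cincinnati", "Philadelphia", "Kansas City",
           "St. Louis", "Carolina", "San Francisco", "Buffalo", "Tennessee"),
  year = c(2003, 2018, 2010, 2002, 2008, 2011, 2005, 2017, 2000, 2017),
  wins = c(12, 5, 4, 12, 2, 2, 11, 6, 8, 9)
)

left_result <- left_join(attendance_small, standings_small, by = "year")

# Rows where the right-hand columns were filled with NA found no match
n_unmatched <- sum(is.na(left_result$team))
n_unmatched
# 9
```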

5. right_join

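Describe the resulting data:

Ten rows and five columns. Every row of standings_small is kept; team_name and week are filled in only for 2005 and are NA for the other nine rows. The matched row is listed first.

How is it different from the original two datasets?

The columns of attendance_small still come first, but the rows are those of standings_small. Nine of the ten rows of attendance_small do not appear at all.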

right_join(attendance_small, standings_small)
## Joining with `by = join_by(year)`
## # A tibble: 10 × 5
##    team_name  year  week team           wins
##    <chr>     <dbl> <dbl> <chr>         <dbl>
##  1 Chiefs     2005     7 Carolina         11
##  2 <NA>       2003    NA Indianapolis     12
##  3 <NA>       2018    NA Tampa Bay         5
##  4 <NA>       2010    NA Cincinnati        4
##  5 <NA>       2002    NA Philadelphia     12
##  6 <NA>       2008    NA Kansas City       2
##  7 <NA>       2011    NA St. Louis         2
##  8 <NA>       2017    NA San Francisco     6
##  9 <NA>       2000    NA Buffalo           8
## 10 <NA>       2017    NA Tennessee         9
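
A right join keeps the same rows as a left join with the two tables swapped; only the column order and row order differ. A minimal sketch of that check, assuming hand-copied samples:

```r
library(dplyr)

# Hand-copied from the printed samples above (an assumption; not re-sampled)
attendance_small <- tibble::tibble(
  team_name = c("Steelers", "Chargers", "Browns", "Buccaneers", "Colts",
                "Titans", "Bears", "Steelers", "Chiefs", "Cardinals"),
  year = c(2013, 2014, 2013, 2014, 2013, 2016, 2001, 2001, 2005, 2004),
  week = c(6, 9, 5, 11, 10, 16, 11, 16, 7, 4)
)
standings_small <- tibble::tibble(
  team = c("Indianapolis", "Tampa Bay", "Cincinnati", "Philadelphia", "Kansas City",
           "St. Louis", "Carolina", "San Francisco", "Buffalo", "Tennessee"),
  year = c(2003, 2018, 2010, 2002, 2008, 2011, 2005, 2017, 2000, 2017),
  wins = c(12, 5, 4, 12, 2, 2, 11, 6, 8, 9)
)

right_version <- right_join(attendance_small, standings_small, by = "year")
left_version  <- left_join(standings_small, attendance_small, by = "year")

# Align column order and row order, then compare as plain data frames
right_sorted <- right_version %>% arrange(year, team) %>% as.data.frame()
left_sorted  <- left_version %>%
  select(all_of(names(right_version))) %>%
  arrange(year, team) %>%
  as.data.frame()

same_content <- isTRUE(all.equal(right_sorted, left_sorted))
same_content
```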

6. full_join

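Describe the resulting data:

Nineteen rows and five columns: the one matched row (2005), nine standings rows with no attendance match, and nine attendance rows with no standings match. The standings columns come first because standings_small is the first argument.

How is it different from the original two datasets?

No row from either table is dropped. Columns that come from the other table are NA wherever the year has no match.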

full_join(standings_small, attendance_small, by = c("year"))
## # A tibble: 19 × 5
##    team           year  wins team_name   week
##    <chr>         <dbl> <dbl> <chr>      <dbl>
##  1 Indianapolis   2003    12 <NA>          NA
##  2 Tampa Bay      2018     5 <NA>          NA
##  3 Cincinnati     2010     4 <NA>          NA
##  4 Philadelphia   2002    12 <NA>          NA
##  5 Kansas City    2008     2 <NA>          NA
##  6 St. Louis      2011     2 <NA>          NA
##  7 Carolina       2005    11 Chiefs         7
##  8 San Francisco  2017     6 <NA>          NA
##  9 Buffalo        2000     8 <NA>          NA
## 10 Tennessee      2017     9 <NA>          NA
## 11 <NA>           2013    NA Steelers       6
## 12 <NA>           2014    NA Chargers       9
## 13 <NA>           2013    NA Browns         5
## 14 <NA>           2014    NA Buccaneers    11
## 15 <NA>           2013    NA Colts         10
## 16 <NA>           2016    NA Titans        16
## 17 <NA>           2001    NA Bears         11
## 18 <NA>           2001    NA Steelers      16
## 19 <NA>           2004    NA Cardinals      4
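
The row count of a full join can be accounted for as matched rows plus the unmatched rows from each side. A minimal sketch of that arithmetic, assuming hand-copied samples:

```r
library(dplyr)

# Hand-copied from the printed samples above (an assumption; not re-sampled)
attendance_small <- tibble::tibble(
  team_name = c("Steelers", "Chargers", "Browns", "Buccaneers", "Colts",
                "Titans", "Bears", "Steelers", "Chiefs", "Cardinals"),
  year = c(2013, 2014, 2013, 2014, 2013, 2016, 2001, 2001, 2005, 2004),
  week = c(6, 9, 5, 11, 10, 16, 11, 16, 7, 4)
)
standings_small <- tibble::tibble(
  team = c("Indianapolis", "Tampa Bay", "Cincinnati", "Philadelphia", "Kansas City",
           "St. Louis", "Carolina", "San Francisco", "Buffalo", "Tennessee"),
  year = c(2003, 2018, 2010, 2002, 2008, 2011, 2005, 2017, 2000, 2017),
  wins = c(12, 5, 4, 12, 2, 2, 11, 6, 8, 9)
)

n_matched         <- nrow(inner_join(standings_small, attendance_small, by = "year"))
n_only_standings  <- nrow(anti_join(standings_small, attendance_small, by = "year"))
n_only_attendance <- nrow(anti_join(attendance_small, standings_small, by = "year"))
n_full            <- nrow(full_join(standings_small, attendance_small, by = "year"))

# 1 matched + 9 standings-only + 9 attendance-only = 19
c(n_matched, n_only_standings, n_only_attendance, n_full)
```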

7. semi_join

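Describe the resulting data:

One row and three columns: the single row of standings_small whose year (2005) also appears in attendance_small.

How is it different from the original two datasets?

It is a filtered copy of standings_small. Unlike the joins above, no columns from attendance_small are added.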

semi_join(standings_small, attendance_small)
## Joining with `by = join_by(year)`
## # A tibble: 1 × 3
##   team      year  wins
##   <chr>    <dbl> <dbl>
## 1 Carolina  2005    11
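
With a single key column, a semi join behaves like filtering the first table on membership in the second table's key. A minimal sketch comparing the two, assuming hand-copied samples:

```r
library(dplyr)

# Hand-copied from the printed samples above (an assumption; not re-sampled)
attendance_small <- tibble::tibble(
  team_name = c("Steelers", "Chargers", "Browns", "Buccaneers", "Colts",
                "Titans", "Bears", "Steelers", "Chiefs", "Cardinals"),
  year = c(2013, 2014, 2013, 2014, 2013, 2016, 2001, 2001, 2005, 2004),
  week = c(6, 9, 5, 11, 10, 16, 11, 16, 7, 4)
)
standings_small <- tibble::tibble(
  team = c("Indianapolis", "Tampa Bay", "Cincinnati", "Philadelphia", "Kansas City",
           "St. Louis", "Carolina", "San Francisco", "Buffalo", "Tennessee"),
  year = c(2003, 2018, 2010, 2002, 2008, 2011, 2005, 2017, 2000, 2017),
  wins = c(12, 5, 4, 12, 2, 2, 11, 6, 8, 9)
)

semi_result   <- semi_join(standings_small, attendance_small, by = "year")
filter_result <- filter(standings_small, year %in% attendance_small$year)

# Both keep only the standings rows whose year occurs in attendance_small
same_rows <- isTRUE(all.equal(as.data.frame(semi_result), as.data.frame(filter_result)))
same_rows
```

The `filter()` form only works this simply for one key column; `semi_join()` extends to multi-column keys without extra code.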

8. anti_join

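Describe the resulting data:

Nine rows and three columns: the rows of attendance_small whose year does not appear in standings_small. Only the 2005 row (Chiefs) is removed.

How is it different from the original two datasets?

It is a filtered copy of attendance_small and adds no columns from standings_small. It is the complement of what semi_join(attendance_small, standings_small) would keep.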

anti_join(attendance_small, standings_small)
## Joining with `by = join_by(year)`
## # A tibble: 9 × 3
##   team_name   year  week
##   <chr>      <dbl> <dbl>
## 1 Steelers    2013     6
## 2 Chargers    2014     9
## 3 Browns      2013     5
## 4 Buccaneers  2014    11
## 5 Colts       2013    10
## 6 Titans      2016    16
## 7 Bears       2001    11
## 8 Steelers    2001    16
## 9 Cardinals   2004     4
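
An anti join is handy as a diagnostic: it lists the key values that would be lost in an inner join. A minimal sketch that extracts the unmatched years, assuming hand-copied samples:

```r
library(dplyr)

# Hand-copied from the printed samples above (an assumption; not re-sampled)
attendance_small <- tibble::tibble(
  team_name = c("Steelers", "Chargers", "Browns", "Buccaneers", "Colts",
                "Titans", "Bears", "Steelers", "Chiefs", "Cardinals"),
  year = c(2013, 2014, 2013, 2014, 2013, 2016, 2001, 2001, 2005, 2004),
  week = c(6, 9, 5, 11, 10, 16, 11, 16, 7, 4)
)
standings_small <- tibble::tibble(
  team = c("Indianapolis", "Tampa Bay", "Cincinnati", "Philadelphia", "Kansas City",
           "St. Louis", "Carolina", "San Francisco", "Buffalo", "Tennessee"),
  year = c(2003, 2018, 2010, 2002, 2008, 2011, 2005, 2017, 2000, 2017),
  wins = c(12, 5, 4, 12, 2, 2, 11, 6, 8, 9)
)

# Distinct attendance years with no counterpart in standings_small
unmatched_years <- attendance_small %>%
  anti_join(standings_small, by = "year") %>%
  distinct(year) %>%
  pull(year) %>%
  sort()
unmatched_years
# 2001 2004 2013 2014 2016
```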