1. Import your data

Import two related datasets from the TidyTuesday project.

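library(dplyr)  # assumed setup: provides %>%, select(), sample_n(), and the join functions used below
team_results <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-03-26/team-results.csv')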
## Rows: 236 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): TEAM, F4PERCENT, CHAMPPERCENT
## dbl (17): TEAMID, PAKE, PAKERANK, PASE, PASERANK, GAMES, W, L, WINPERCENT, R...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
public_picks <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-03-26/public-picks.csv')
## Rows: 64 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): TEAM, R64, R32, S16, E8, F4, FINALS
## dbl (2): YEAR, TEAMNO
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

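Describe the two datasets:

Data 1: public_picks. 64 rows and 9 columns: YEAR and TEAMNO (numeric), plus TEAM and six round columns R64, R32, S16, E8, F4, FINALS (character).

Data 2: team_results. 236 rows and 20 columns: three character columns (TEAM, F4PERCENT, CHAMPPERCENT) and 17 numeric columns, including TEAMID, PAKE, PASE, GAMES, W, L, and WINPERCENT.

Both datasets share the TEAM column, which serves as the join key below. Each small dataset is a random sample of 30 rows with two columns.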

team_results %>% select(TEAM, TEAMID) %>% sample_n(30)
## # A tibble: 30 × 2
##    TEAM           TEAMID
##    <chr>           <dbl>
##  1 Miami FL          113
##  2 UNC Greensboro    215
##  3 Saint Peter's     176
##  4 Holy Cross         75
##  5 Wisconsin         240
##  6 Seton Hall        181
##  7 BYU                25
##  8 Saint Joseph's    173
##  9 UTSA              223
## 10 Norfolk St.       134
## # ℹ 20 more rows
# Note: sample_n() is random, so this stored sample differs from the one printed above.
team_results_small <- team_results %>% select(TEAM, TEAMID) %>% sample_n(30)

public_picks %>% select(TEAM, YEAR) %>% sample_n(30)
## # A tibble: 30 × 2
##    TEAM          YEAR
##    <chr>        <dbl>
##  1 Stetson       2024
##  2 Clemson       2024
##  3 Alabama       2024
##  4 Iowa St.      2024
##  5 Saint Mary's  2024
##  6 TCU           2024
##  7 Kentucky      2024
##  8 Colgate       2024
##  9 Kansas        2024
## 10 Vermont       2024
## # ℹ 20 more rows
# Note: sample_n() is random, so this stored sample differs from the one printed above.
public_picks_small <- public_picks %>% select(TEAM, YEAR) %>% sample_n(30)

team_results_small
## # A tibble: 30 × 2
##    TEAM         TEAMID
##    <chr>         <dbl>
##  1 Lafayette        92
##  2 Morehead St.    125
##  3 Hampton          71
##  4 Dayton           45
##  5 Davidson         44
##  6 Princeton       165
##  7 Norfolk St.     134
##  8 Xavier          244
##  9 Wofford         241
## 10 Kent St.         89
## # ℹ 20 more rows
public_picks_small
## # A tibble: 30 × 2
##    TEAM              YEAR
##    <chr>            <dbl>
##  1 Saint Mary's      2024
##  2 McNeese St.       2024
##  3 Wisconsin         2024
##  4 South Dakota St.  2024
##  5 Kansas            2024
##  6 UAB               2024
##  7 Oakland           2024
##  8 Florida Atlantic  2024
##  9 Iowa St.          2024
## 10 Longwood          2024
## # ℹ 20 more rows
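
Because the sampling is random, printing one sample and storing another gives two different sets of teams. A minimal sketch of a reproducible alternative, assuming a hypothetical stand-in table `toy_results` (not the TidyTuesday data) and an arbitrary seed: sample once, store the result, then print the stored object.

```r
library(dplyr)

# Hypothetical stand-in for team_results; the real data has 236 rows.
toy_results <- tibble(
  TEAM   = paste("Team", 1:50),
  TEAMID = 1:50
)

set.seed(1234)  # arbitrary seed, chosen only for illustration

# Sample once and keep the result, so the printed and stored rows agree.
toy_small <- toy_results %>%
  select(TEAM, TEAMID) %>%
  slice_sample(n = 30)

toy_small
```

`slice_sample()` is the newer dplyr verb for row sampling; `sample_n()` would work the same way here.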

3. inner_join

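Describe the resulting data: 4 rows and 3 columns (TEAM, TEAMID, YEAR).

How is it different from the original two datasets? It keeps only the 4 teams that appear in both 30-row samples and combines the columns of both tables.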

team_results_small %>% inner_join(public_picks_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 4 × 3
##   TEAM                  TEAMID  YEAR
##   <chr>                  <dbl> <dbl>
## 1 Morehead St.             125  2024
## 2 Dayton                    45  2024
## 3 Creighton                 43  2024
## 4 College of Charleston     37  2024
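
The "Joining with" message appears because no key was given, so dplyr joins on every shared column name. A minimal sketch on hypothetical toy tables (`results` and `picks` are illustrative, not the sampled data) that names the key explicitly:

```r
library(dplyr)

# Hypothetical toy tables, not the TidyTuesday samples.
results <- tribble(
  ~TEAM,     ~TEAMID,
  "Dayton",       45,
  "Xavier",      244,
  "Wofford",     241
)
picks <- tribble(
  ~TEAM,    ~YEAR,
  "Dayton",  2024,
  "Kansas",  2024
)

# Passing `by` makes the key explicit and silences the message.
matched <- inner_join(results, picks, by = "TEAM")
matched
```

Only "Dayton" is in both toy tables, so it is the only row returned.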

4. left_join

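Describe the resulting data: 30 rows and 3 columns (TEAM, YEAR, TEAMID).

How is it different from the original two datasets? Every row of public_picks_small is kept. TEAMID is added from team_results_small and is NA for the 26 teams that have no match there.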

left_join(public_picks_small, team_results_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 30 × 3
##    TEAM              YEAR TEAMID
##    <chr>            <dbl>  <dbl>
##  1 Saint Mary's      2024     NA
##  2 McNeese St.       2024     NA
##  3 Wisconsin         2024     NA
##  4 South Dakota St.  2024     NA
##  5 Kansas            2024     NA
##  6 UAB               2024     NA
##  7 Oakland           2024     NA
##  8 Florida Atlantic  2024     NA
##  9 Iowa St.          2024     NA
## 10 Longwood          2024     NA
## # ℹ 20 more rows
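
Since only the first 10 rows are printed, the NA values can hide how many rows actually matched. A minimal sketch on hypothetical toy tables that counts the unmatched rows after a left join:

```r
library(dplyr)

# Hypothetical toy tables, not the TidyTuesday samples.
picks <- tribble(
  ~TEAM,     ~YEAR,
  "Dayton",   2024,
  "Kansas",   2024,
  "Oakland",  2024
)
results <- tribble(
  ~TEAM,    ~TEAMID,
  "Dayton",      45,
  "Xavier",     244
)

joined <- left_join(picks, results, by = "TEAM")

# Rows of `picks` without a partner in `results` get NA in TEAMID.
n_unmatched <- sum(is.na(joined$TEAMID))
n_unmatched
```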

5. right_join

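Describe the resulting data: 30 rows and 3 columns (TEAM, YEAR, TEAMID).

How is it different from the original two datasets? Every row of team_results_small is kept. YEAR is added from public_picks_small and is NA for the 26 teams that have no match there. The 4 matched teams are listed first.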

right_join(public_picks_small, team_results_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 30 × 3
##    TEAM                   YEAR TEAMID
##    <chr>                 <dbl>  <dbl>
##  1 Dayton                 2024     45
##  2 Morehead St.           2024    125
##  3 College of Charleston  2024     37
##  4 Creighton              2024     43
##  5 Lafayette                NA     92
##  6 Hampton                  NA     71
##  7 Davidson                 NA     44
##  8 Princeton                NA    165
##  9 Norfolk St.              NA    134
## 10 Xavier                   NA    244
## # ℹ 20 more rows
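
A right join is a left join with the tables swapped. A minimal sketch on hypothetical toy tables that checks this, after aligning column and row order (which do differ between the two calls):

```r
library(dplyr)

# Hypothetical toy tables, not the TidyTuesday samples.
picks <- tribble(
  ~TEAM,     ~YEAR,
  "Dayton",   2024,
  "Kansas",   2024,
  "Oakland",  2024
)
results <- tribble(
  ~TEAM,    ~TEAMID,
  "Dayton",      45,
  "Xavier",     244
)

via_right <- right_join(picks, results, by = "TEAM") %>%
  arrange(TEAM)

# Same rows from the swapped left join; reorder columns to match.
via_left <- left_join(results, picks, by = "TEAM") %>%
  select(TEAM, YEAR, TEAMID) %>%
  arrange(TEAM)

same_content <- isTRUE(all.equal(as.data.frame(via_right), as.data.frame(via_left)))
same_content
```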

6. full_join

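Describe the resulting data: 56 rows and 3 columns (TEAM, YEAR, TEAMID).

How is it different from the original two datasets? It keeps every team from either sample: 30 + 30 - 4 shared teams = 56 rows. YEAR or TEAMID is NA wherever a team appears in only one sample.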

full_join(public_picks_small, team_results_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 56 × 3
##    TEAM              YEAR TEAMID
##    <chr>            <dbl>  <dbl>
##  1 Saint Mary's      2024     NA
##  2 McNeese St.       2024     NA
##  3 Wisconsin         2024     NA
##  4 South Dakota St.  2024     NA
##  5 Kansas            2024     NA
##  6 UAB               2024     NA
##  7 Oakland           2024     NA
##  8 Florida Atlantic  2024     NA
##  9 Iowa St.          2024     NA
## 10 Longwood          2024     NA
## # ℹ 46 more rows
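
The row count of a full join can be checked against the inner join. A minimal sketch on hypothetical toy tables, assuming TEAM is unique within each table (with duplicated keys the identity below would not hold):

```r
library(dplyr)

# Hypothetical toy tables, not the TidyTuesday samples.
picks <- tribble(
  ~TEAM,     ~YEAR,
  "Dayton",   2024,
  "Kansas",   2024,
  "Oakland",  2024
)
results <- tribble(
  ~TEAM,    ~TEAMID,
  "Dayton",      45,
  "Xavier",     244
)

n_both <- nrow(inner_join(picks, results, by = "TEAM"))
n_full <- nrow(full_join(picks, results, by = "TEAM"))

# Rows in either table = rows in x + rows in y - rows in both.
count_checks_out <- n_full == nrow(picks) + nrow(results) - n_both
count_checks_out
```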

7. semi_join

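Describe the resulting data: 4 rows and 2 columns (TEAM, YEAR).

How is it different from the original two datasets? It returns only the rows of public_picks_small that have a match in team_results_small, and it adds no columns from team_results_small.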

semi_join(public_picks_small, team_results_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 4 × 2
##   TEAM                   YEAR
##   <chr>                 <dbl>
## 1 Dayton                 2024
## 2 Morehead St.           2024
## 3 College of Charleston  2024
## 4 Creighton              2024
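
A semi join filters the first table but never adds columns, which is what separates it from an inner join. A minimal sketch on hypothetical toy tables comparing the two:

```r
library(dplyr)

# Hypothetical toy tables, not the TidyTuesday samples.
picks <- tribble(
  ~TEAM,     ~YEAR,
  "Dayton",   2024,
  "Kansas",   2024,
  "Oakland",  2024
)
results <- tribble(
  ~TEAM,    ~TEAMID,
  "Dayton",      45,
  "Xavier",     244
)

semi  <- semi_join(picks, results, by = "TEAM")   # filters picks only
inner <- inner_join(picks, results, by = "TEAM")  # also brings in TEAMID

names(semi)
names(inner)
```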

8. anti_join

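Describe the resulting data: 26 rows and 2 columns (TEAM, YEAR).

How is it different from the original two datasets? It returns only the rows of public_picks_small that have no match in team_results_small, and it adds no columns from team_results_small.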

anti_join(public_picks_small, team_results_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 26 × 2
##    TEAM              YEAR
##    <chr>            <dbl>
##  1 Saint Mary's      2024
##  2 McNeese St.       2024
##  3 Wisconsin         2024
##  4 South Dakota St.  2024
##  5 Kansas            2024
##  6 UAB               2024
##  7 Oakland           2024
##  8 Florida Atlantic  2024
##  9 Iowa St.          2024
## 10 Longwood          2024
## # ℹ 16 more rows
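
Semi and anti joins split the first table into matched and unmatched rows. A minimal sketch on hypothetical toy tables that checks that the two pieces add back up to the original:

```r
library(dplyr)

# Hypothetical toy tables, not the TidyTuesday samples.
picks <- tribble(
  ~TEAM,     ~YEAR,
  "Dayton",   2024,
  "Kansas",   2024,
  "Oakland",  2024
)
results <- tribble(
  ~TEAM,    ~TEAMID,
  "Dayton",      45,
  "Xavier",     244
)

kept    <- semi_join(picks, results, by = "TEAM")  # rows with a match
dropped <- anti_join(picks, results, by = "TEAM")  # rows without a match

# Every row of picks lands in exactly one of the two pieces.
partition_ok <- nrow(kept) + nrow(dropped) == nrow(picks)
partition_ok
```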