Import two related datasets from the TidyTuesday project.
# dplyr supplies %>%, select(), sample_n(), and the join functions used below
library(dplyr)
team_results <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-03-26/team-results.csv')
## Rows: 236 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): TEAM, F4PERCENT, CHAMPPERCENT
## dbl (17): TEAMID, PAKE, PAKERANK, PASE, PASERANK, GAMES, W, L, WINPERCENT, R...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
public_picks <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-03-26/public-picks.csv')
## Rows: 64 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): TEAM, R64, R32, S16, E8, F4, FINALS
## dbl (2): YEAR, TEAMNO
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
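The messages above name two ways to quiet the column report. Below is a minimal sketch of both. It uses a small inline CSV so that it runs without a network connection; the two rows and the object names `csv_text`, `picks_quiet`, and `picks_typed` are made up for illustration and are not taken from the TidyTuesday files.

```r
library(readr)

# Hypothetical two-row CSV, passed as literal data with I()
csv_text <- I("TEAM,YEAR,TEAMNO\nKansas,2024,1\nDayton,2024,2\n")

# Option 1: suppress the column specification message
picks_quiet <- read_csv(csv_text, show_col_types = FALSE)

# Option 2: declare the column types up front, which also silences the message
picks_typed <- read_csv(
  csv_text,
  col_types = cols(
    TEAM = col_character(),
    YEAR = col_double(),
    TEAMNO = col_double()
  )
)
```

Declaring the types has the added benefit of failing loudly if a column arrives in an unexpected format.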
Describe the two datasets:
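Data 1: public_picks (64 rows, 9 columns: TEAM, YEAR, TEAMNO, and the character columns R64 through FINALS)
Data 2: team_results (236 rows, 20 columns, including TEAM, TEAMID, GAMES, W, L, and WINPERCENT)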
team_results %>% select(TEAM, TEAMID) %>% sample_n(30)
## # A tibble: 30 × 2
## TEAM TEAMID
## <chr> <dbl>
## 1 Miami FL 113
## 2 UNC Greensboro 215
## 3 Saint Peter's 176
## 4 Holy Cross 75
## 5 Wisconsin 240
## 6 Seton Hall 181
## 7 BYU 25
## 8 Saint Joseph's 173
## 9 UTSA 223
## 10 Norfolk St. 134
## # ℹ 20 more rows
team_results_small <- team_results %>% select(TEAM, TEAMID) %>% sample_n(30)
public_picks %>% select(TEAM, YEAR) %>% sample_n(30)
## # A tibble: 30 × 2
## TEAM YEAR
## <chr> <dbl>
## 1 Stetson 2024
## 2 Clemson 2024
## 3 Alabama 2024
## 4 Iowa St. 2024
## 5 Saint Mary's 2024
## 6 TCU 2024
## 7 Kentucky 2024
## 8 Colgate 2024
## 9 Kansas 2024
## 10 Vermont 2024
## # ℹ 20 more rows
public_picks_small <- public_picks %>% select(TEAM, YEAR) %>% sample_n(30)
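Because `sample_n()` draws a fresh random sample on every call, the rows printed by the preview calls are not the rows stored in `team_results_small` and `public_picks_small`. Below is a sketch of one way to make the draw reproducible: set a seed, sample once, and print the stored object. The seed value 2024 and the toy tibble `toy_teams` are illustrative assumptions, not part of the original. The sketch uses `slice_sample()`, which dplyr's documentation recommends in place of the superseded `sample_n()`.

```r
library(dplyr)

# Toy stand-in for team_results; the names and IDs are made up
toy_teams <- tibble(TEAM = paste("Team", 1:10), TEAMID = 1:10)

# Fix the seed, sample once, store the result, then print the stored object
set.seed(2024)
toy_small <- toy_teams %>% slice_sample(n = 3)
toy_small

# Re-running with the same seed reproduces the same rows
set.seed(2024)
toy_again <- toy_teams %>% slice_sample(n = 3)
identical(toy_small, toy_again)
```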
team_results_small
## # A tibble: 30 × 2
## TEAM TEAMID
## <chr> <dbl>
## 1 Lafayette 92
## 2 Morehead St. 125
## 3 Hampton 71
## 4 Dayton 45
## 5 Davidson 44
## 6 Princeton 165
## 7 Norfolk St. 134
## 8 Xavier 244
## 9 Wofford 241
## 10 Kent St. 89
## # ℹ 20 more rows
public_picks_small
## # A tibble: 30 × 2
## TEAM YEAR
## <chr> <dbl>
## 1 Saint Mary's 2024
## 2 McNeese St. 2024
## 3 Wisconsin 2024
## 4 South Dakota St. 2024
## 5 Kansas 2024
## 6 UAB 2024
## 7 Oakland 2024
## 8 Florida Atlantic 2024
## 9 Iowa St. 2024
## 10 Longwood 2024
## # ℹ 20 more rows
Describe the resulting data:
How is it different from the original two datasets?
team_results_small %>% inner_join(public_picks_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 4 × 3
## TEAM TEAMID YEAR
## <chr> <dbl> <dbl>
## 1 Morehead St. 125 2024
## 2 Dayton 45 2024
## 3 Creighton 43 2024
## 4 College of Charleston 37 2024
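The `Joining with by = join_by(TEAM)` message appears because no key was given, so dplyr joins on every column name the two tables share. The sketch below names the key explicitly and shows that an inner join keeps only the rows whose key appears in both tables. The tibbles `x` and `y` are hypothetical.

```r
library(dplyr)

# Made-up tables sharing the key column TEAM
x <- tibble(TEAM = c("A", "B", "C"), TEAMID = c(1, 2, 3))
y <- tibble(TEAM = c("B", "C", "D"), YEAR = c(2024, 2024, 2024))

# Naming the key silences the message and guards against accidental matches
joined <- inner_join(x, y, by = "TEAM")
joined
```

Only "B" and "C" occur in both tables, so the result has two rows and carries the columns of both inputs.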
Describe the resulting data:
How is it different from the original two datasets?
left_join(public_picks_small, team_results_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 30 × 3
## TEAM YEAR TEAMID
## <chr> <dbl> <dbl>
## 1 Saint Mary's 2024 NA
## 2 McNeese St. 2024 NA
## 3 Wisconsin 2024 NA
## 4 South Dakota St. 2024 NA
## 5 Kansas 2024 NA
## 6 UAB 2024 NA
## 7 Oakland 2024 NA
## 8 Florida Atlantic 2024 NA
## 9 Iowa St. 2024 NA
## 10 Longwood 2024 NA
## # ℹ 20 more rows
Describe the resulting data:
How is it different from the original two datasets?
right_join(public_picks_small, team_results_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 30 × 3
## TEAM YEAR TEAMID
## <chr> <dbl> <dbl>
## 1 Dayton 2024 45
## 2 Morehead St. 2024 125
## 3 College of Charleston 2024 37
## 4 Creighton 2024 43
## 5 Lafayette NA 92
## 6 Hampton NA 71
## 7 Davidson NA 44
## 8 Princeton NA 165
## 9 Norfolk St. NA 134
## 10 Xavier NA 244
## # ℹ 20 more rows
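`right_join(x, y)` keeps every row of `y`, so it returns the same set of rows as `left_join(y, x)`; only the row and column order can differ. The sketch below checks this on made-up tibbles (`x` and `y` are hypothetical).

```r
library(dplyr)

# Made-up tables sharing the key column TEAM
x <- tibble(TEAM = c("A", "B", "C"), YEAR = 2024)
y <- tibble(TEAM = c("B", "C", "D"), TEAMID = c(2, 3, 4))

right <- right_join(x, y, by = "TEAM")
left_swapped <- left_join(y, x, by = "TEAM")

# Align column and row order before comparing
right_sorted <- right %>% select(TEAM, YEAR, TEAMID) %>% arrange(TEAM)
left_sorted <- left_swapped %>% select(TEAM, YEAR, TEAMID) %>% arrange(TEAM)
all.equal(right_sorted, left_sorted)
```

Team "D" has no partner in `x`, so its `YEAR` is `NA` in both results, mirroring the `NA` values in the output above.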
Describe the resulting data:
How is it different from the original two datasets?
full_join(public_picks_small, team_results_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 56 × 3
## TEAM YEAR TEAMID
## <chr> <dbl> <dbl>
## 1 Saint Mary's 2024 NA
## 2 McNeese St. 2024 NA
## 3 Wisconsin 2024 NA
## 4 South Dakota St. 2024 NA
## 5 Kansas 2024 NA
## 6 UAB 2024 NA
## 7 Oakland 2024 NA
## 8 Florida Atlantic 2024 NA
## 9 Iowa St. 2024 NA
## 10 Longwood 2024 NA
## # ℹ 46 more rows
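The 56 rows follow from the earlier results. When `TEAM` is unique within each table, a full join returns the rows of both tables with each match counted once, here 30 + 30 - 4 = 56. The sketch below checks the same identity on made-up tibbles (`x` and `y` are hypothetical).

```r
library(dplyr)

# Made-up tables with unique keys in each
x <- tibble(TEAM = c("A", "B", "C"), YEAR = 2024)
y <- tibble(TEAM = c("B", "C", "D"), TEAMID = c(2, 3, 4))

n_full <- nrow(full_join(x, y, by = "TEAM"))
n_match <- nrow(inner_join(x, y, by = "TEAM"))

# Rows in the full join = rows in x + rows in y - matched rows
n_full == nrow(x) + nrow(y) - n_match
```

The identity does not hold if a key is duplicated in either table, because each duplicate then produces extra matched rows.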
Describe the resulting data:
How is it different from the original two datasets?
semi_join(public_picks_small, team_results_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 4 × 2
## TEAM YEAR
## <chr> <dbl>
## 1 Dayton 2024
## 2 Morehead St. 2024
## 3 College of Charleston 2024
## 4 Creighton 2024
Describe the resulting data:
How is it different from the original two datasets?
anti_join(public_picks_small, team_results_small)
## Joining with `by = join_by(TEAM)`
## # A tibble: 26 × 2
## TEAM YEAR
## <chr> <dbl>
## 1 Saint Mary's 2024
## 2 McNeese St. 2024
## 3 Wisconsin 2024
## 4 South Dakota St. 2024
## 5 Kansas 2024
## 6 UAB 2024
## 7 Oakland 2024
## 8 Florida Atlantic 2024
## 9 Iowa St. 2024
## 10 Longwood 2024
## # ℹ 16 more rows
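Together, `semi_join()` and `anti_join()` split `public_picks_small` into two non-overlapping parts: the 4 teams with a match and the 26 without, which add up to the original 30 rows. Neither function adds columns from the second table. The sketch below shows the same split on made-up tibbles (`x` and `y` are hypothetical).

```r
library(dplyr)

# Made-up tables sharing the key column TEAM
x <- tibble(TEAM = c("A", "B", "C"), YEAR = 2024)
y <- tibble(TEAM = c("B", "C", "D"), TEAMID = c(2, 3, 4))

matched <- semi_join(x, y, by = "TEAM")    # rows of x with a partner in y
unmatched <- anti_join(x, y, by = "TEAM")  # rows of x without one

# The two pieces account for every row of x
nrow(matched) + nrow(unmatched) == nrow(x)

# Only the columns of x are returned
names(matched)
```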