1. Import your data

Import two related datasets from TidyTuesday Project.

nhl_rosters <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-01-09/nhl_rosters.csv')
## Rows: 54883 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (10): team_code, position_type, headshot, first_name, last_name, positi...
## dbl   (7): season, player_id, sweater_number, height_in_inches, weight_in_po...
## date  (1): birth_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nhl_teams <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-01-09/nhl_teams.csv')
## Rows: 59 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): team_code, full_name
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets:

Data1: nhl_rosters

nhl_rosters_small <- nhl_rosters %>% select(team_code, last_name, sweater_number, position_type) %>% sample_n(10)

nhl_rosters_small
## # A tibble: 10 × 4
##    team_code last_name   sweater_number position_type
##    <chr>     <chr>                <dbl> <chr>        
##  1 ATL       Valabik                  5 defensemen   
##  2 PIT       Cunneyworth             19 forwards     
##  3 NYR       Bathgate                 9 forwards     
##  4 PHX       Hudson                  15 forwards     
##  5 BUF       Bogosian                 4 defensemen   
##  6 VAN       Mazur                   33 forwards     
##  7 TBL       Angelidis               10 forwards     
##  8 VAN       Benning                  3 defensemen   
##  9 BOS       Millar                   1 goalies      
## 10 CHI       Amonte                  10 forwards

Data 2: nhl_teams

nhl_teams_small <- nhl_teams %>% select(team_code, full_name) %>% sample_n(10)

nhl_teams_small
## # A tibble: 10 × 2
##    team_code full_name              
##    <chr>     <chr>                  
##  1 MNS       Minnesota North Stars  
##  2 QUA       Philadelphia Quakers   
##  3 SJS       San Jose Sharks        
##  4 DET       Detroit Red Wings      
##  5 LAK       Los Angeles Kings      
##  6 CGS       California Golden Seals
##  7 NYI       New York Islanders     
##  8 TAN       Toronto Arenas         
##  9 CHI       Chicago Blackhawks     
## 10 PHX       Phoenix Coyotes

3. inner_join

Describe the resulting data:

How is it different from the original two datasets? - 2 rows in output - all columns from the 2 datasets

nhl_rosters_small %>% inner_join(nhl_teams_small, by = c("team_code"))
## # A tibble: 2 × 5
##   team_code last_name sweater_number position_type full_name         
##   <chr>     <chr>              <dbl> <chr>         <chr>             
## 1 PHX       Hudson                15 forwards      Phoenix Coyotes   
## 2 CHI       Amonte                10 forwards      Chicago Blackhawks

4. left_join

Describe the resulting data:

How is it different from the original two datasets? - all columns from the 2 datasets - Shows the players as rows; full team name of the team the player plays for IF the team code shows in both samples of 10

nhl_rosters_small %>% left_join(nhl_teams_small, by = c("team_code"))
## # A tibble: 10 × 5
##    team_code last_name   sweater_number position_type full_name         
##    <chr>     <chr>                <dbl> <chr>         <chr>             
##  1 ATL       Valabik                  5 defensemen    <NA>              
##  2 PIT       Cunneyworth             19 forwards      <NA>              
##  3 NYR       Bathgate                 9 forwards      <NA>              
##  4 PHX       Hudson                  15 forwards      Phoenix Coyotes   
##  5 BUF       Bogosian                 4 defensemen    <NA>              
##  6 VAN       Mazur                   33 forwards      <NA>              
##  7 TBL       Angelidis               10 forwards      <NA>              
##  8 VAN       Benning                  3 defensemen    <NA>              
##  9 BOS       Millar                   1 goalies       <NA>              
## 10 CHI       Amonte                  10 forwards      Chicago Blackhawks

5. right_join

Describe the resulting data:

How is it different from the original two datasets? - all columns from the 2 datasets - Shows the teams as rows, includes all information on player if team code shows for that player in both samples of 10

nhl_rosters_small %>% right_join(nhl_teams_small, by = c("team_code"))
## # A tibble: 10 × 5
##    team_code last_name sweater_number position_type full_name              
##    <chr>     <chr>              <dbl> <chr>         <chr>                  
##  1 PHX       Hudson                15 forwards      Phoenix Coyotes        
##  2 CHI       Amonte                10 forwards      Chicago Blackhawks     
##  3 MNS       <NA>                  NA <NA>          Minnesota North Stars  
##  4 QUA       <NA>                  NA <NA>          Philadelphia Quakers   
##  5 SJS       <NA>                  NA <NA>          San Jose Sharks        
##  6 DET       <NA>                  NA <NA>          Detroit Red Wings      
##  7 LAK       <NA>                  NA <NA>          Los Angeles Kings      
##  8 CGS       <NA>                  NA <NA>          California Golden Seals
##  9 NYI       <NA>                  NA <NA>          New York Islanders     
## 10 TAN       <NA>                  NA <NA>          Toronto Arenas

6. full_join

Describe the resulting data:

How is it different from the original two datasets? - inludes all columns from both tables. If there are no relations with regard to the team_code, the teams without associated players are listed at the end

nhl_rosters_small %>% full_join(nhl_teams_small, by = c("team_code"))
## # A tibble: 18 × 5
##    team_code last_name   sweater_number position_type full_name              
##    <chr>     <chr>                <dbl> <chr>         <chr>                  
##  1 ATL       Valabik                  5 defensemen    <NA>                   
##  2 PIT       Cunneyworth             19 forwards      <NA>                   
##  3 NYR       Bathgate                 9 forwards      <NA>                   
##  4 PHX       Hudson                  15 forwards      Phoenix Coyotes        
##  5 BUF       Bogosian                 4 defensemen    <NA>                   
##  6 VAN       Mazur                   33 forwards      <NA>                   
##  7 TBL       Angelidis               10 forwards      <NA>                   
##  8 VAN       Benning                  3 defensemen    <NA>                   
##  9 BOS       Millar                   1 goalies       <NA>                   
## 10 CHI       Amonte                  10 forwards      Chicago Blackhawks     
## 11 MNS       <NA>                    NA <NA>          Minnesota North Stars  
## 12 QUA       <NA>                    NA <NA>          Philadelphia Quakers   
## 13 SJS       <NA>                    NA <NA>          San Jose Sharks        
## 14 DET       <NA>                    NA <NA>          Detroit Red Wings      
## 15 LAK       <NA>                    NA <NA>          Los Angeles Kings      
## 16 CGS       <NA>                    NA <NA>          California Golden Seals
## 17 NYI       <NA>                    NA <NA>          New York Islanders     
## 18 TAN       <NA>                    NA <NA>          Toronto Arenas

7. semi_join

Describe the resulting data:

How is it different from the original two datasets? - Displays only the columns from nhl_rosters_small for rows(players) that are related to nhl_teams_small by team_code

nhl_rosters_small %>% semi_join(nhl_teams_small, by = c("team_code"))
## # A tibble: 2 × 4
##   team_code last_name sweater_number position_type
##   <chr>     <chr>              <dbl> <chr>        
## 1 PHX       Hudson                15 forwards     
## 2 CHI       Amonte                10 forwards

8. anti_join

Describe the resulting data:

How is it different from the original two datasets? - opposite to semi_join, shows the columns from nhl_rosters_small for rows(players) that are not related to nhl_teams_small by team_code

nhl_rosters_small %>% anti_join(nhl_teams_small, by = c("team_code"))
## # A tibble: 8 × 4
##   team_code last_name   sweater_number position_type
##   <chr>     <chr>                <dbl> <chr>        
## 1 ATL       Valabik                  5 defensemen   
## 2 PIT       Cunneyworth             19 forwards     
## 3 NYR       Bathgate                 9 forwards     
## 4 BUF       Bogosian                 4 defensemen   
## 5 VAN       Mazur                   33 forwards     
## 6 TBL       Angelidis               10 forwards     
## 7 VAN       Benning                  3 defensemen   
## 8 BOS       Millar                   1 goalies