1. Import your data

Import two related datasets from TidyTuesday Project.

    rosters <- read_excel("../00_data/myData.xlsx", sheet = "nhl_rosters")
rosters
## # A tibble: 54,883 × 18
##    team_code   season position_type player_id headshot      first_name last_name
##    <chr>        <dbl> <chr>             <dbl> <chr>         <chr>      <chr>    
##  1 ATL       19992000 forwards        8467867 https://asse… Bryan      Adams    
##  2 ATL       19992000 forwards        8445176 https://asse… Donald     Audette  
##  3 ATL       19992000 forwards        8460014 https://asse… Eric       Bertrand 
##  4 ATL       19992000 forwards        8460510 https://asse… Jason      Botterill
##  5 ATL       19992000 forwards        8459596 https://asse… Andrew     Brunette 
##  6 ATL       19992000 forwards        8445733 https://asse… Kelly      Buchberg…
##  7 ATL       19992000 forwards        8460573 https://asse… Hnat       Domenich…
##  8 ATL       19992000 forwards        8459450 https://asse… Shean      Donovan  
##  9 ATL       19992000 forwards        8446675 https://asse… Nelson     Emerson  
## 10 ATL       19992000 forwards        8446823 https://asse… Ray        Ferraro  
## # ℹ 54,873 more rows
## # ℹ 11 more variables: sweater_number <chr>, position_code <chr>,
## #   shoots_catches <chr>, height_in_inches <dbl>, weight_in_pounds <dbl>,
## #   height_in_centimeters <dbl>, weight_in_kilograms <dbl>, birth_date <dttm>,
## #   birth_city <chr>, birth_country <chr>, birth_state_province <chr>
births <- read_excel("../00_data/myData.xlsx", sheet = "nhl_player_births")

births
## # A tibble: 8,474 × 9
##    player_id first_name last_name   birth_date          birth_city birth_country
##        <dbl> <chr>      <chr>       <dttm>              <chr>      <chr>        
##  1   8467867 Bryan      Adams       1977-03-20 00:00:00 Fort St. … CAN          
##  2   8445176 Donald     Audette     1969-09-23 00:00:00 Laval      CAN          
##  3   8460014 Eric       Bertrand    1975-04-16 00:00:00 St-Ephrem  CAN          
##  4   8460510 Jason      Botterill   1976-05-19 00:00:00 Edmonton   CAN          
##  5   8459596 Andrew     Brunette    1973-08-24 00:00:00 Sudbury    CAN          
##  6   8445733 Kelly      Buchberger  1966-12-02 00:00:00 Langenburg CAN          
##  7   8460573 Hnat       Domenichel… 1976-02-17 00:00:00 Edmonton   CAN          
##  8   8459450 Shean      Donovan     1975-01-22 00:00:00 Timmins    CAN          
##  9   8446675 Nelson     Emerson     1967-08-17 00:00:00 Hamilton   CAN          
## 10   8446823 Ray        Ferraro     1964-08-23 00:00:00 Trail      CAN          
## # ℹ 8,464 more rows
## # ℹ 3 more variables: birth_state_province <chr>, birth_year <dbl>,
## #   birth_month <dbl>

2. Make data small

set.seed(1234)
births_small <- births %>% select(player_id, first_name, last_name, birth_month) %>% sample_n(10)
births_small
## # A tibble: 10 × 4
##    player_id first_name last_name  birth_month
##        <dbl> <chr>      <chr>            <dbl>
##  1   8470760 Marc       Methot               6
##  2   8478048 Igor       Shesterkin          12
##  3   8474194 Timo       Pielmeier            7
##  4   8476345 Gregory    Hofmann             11
##  5   8447169 Greg       Gilbert              1
##  6   8446541 Mike       Eaves                6
##  7   8449286 Bob        Mason                4
##  8   8449129 Emory      Sparrow              9
##  9   8480948 Egor       Yakovlev             9
## 10   8459484 Jamie      Rivers               3
rosters_small <- rosters %>% select(player_id, first_name, last_name, sweater_number) %>% sample_n(10)
rosters_small
## # A tibble: 10 × 4
##    player_id first_name last_name    sweater_number
##        <dbl> <chr>      <chr>        <chr>         
##  1   8459670 Kimmo      Timonen      44            
##  2   8473419 Brad       Marchand     63            
##  3   8446311 Dan        Daoust       24            
##  4   8445371 Real       Chevrefils   12            
##  5   8447536 Lowell     MacDonald    18            
##  6   8460516 Dan        Cloutier     39            
##  7   8474697 Mark       Borowiecki   74            
##  8   8447733 Archie     Henderson    29            
##  9   8447502 Ed         Litzenberger 12            
## 10   8468517 Andy       Hilbert      11

Describe the two datasets:

Data1

Data 2

3. inner_join

Describe the resulting data:

How is it different from the original two datasets?

inner_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 0 × 5
## # ℹ 5 variables: player_id <dbl>, first_name <chr>, last_name <chr>,
## #   birth_month <dbl>, sweater_number <chr>

4. left_join

Describe the resulting data:

How is it different from the original two datasets? I got one more column from the right.

left_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 10 × 5
##    player_id first_name last_name  birth_month sweater_number
##        <dbl> <chr>      <chr>            <dbl> <chr>         
##  1   8470760 Marc       Methot               6 <NA>          
##  2   8478048 Igor       Shesterkin          12 <NA>          
##  3   8474194 Timo       Pielmeier            7 <NA>          
##  4   8476345 Gregory    Hofmann             11 <NA>          
##  5   8447169 Greg       Gilbert              1 <NA>          
##  6   8446541 Mike       Eaves                6 <NA>          
##  7   8449286 Bob        Mason                4 <NA>          
##  8   8449129 Emory      Sparrow              9 <NA>          
##  9   8480948 Egor       Yakovlev             9 <NA>          
## 10   8459484 Jamie      Rivers               3 <NA>

5. right_join

Describe the resulting data:

How is it different from the original two datasets?

right_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 10 × 5
##    player_id first_name last_name    birth_month sweater_number
##        <dbl> <chr>      <chr>              <dbl> <chr>         
##  1   8459670 Kimmo      Timonen               NA 44            
##  2   8473419 Brad       Marchand              NA 63            
##  3   8446311 Dan        Daoust                NA 24            
##  4   8445371 Real       Chevrefils            NA 12            
##  5   8447536 Lowell     MacDonald             NA 18            
##  6   8460516 Dan        Cloutier              NA 39            
##  7   8474697 Mark       Borowiecki            NA 74            
##  8   8447733 Archie     Henderson             NA 29            
##  9   8447502 Ed         Litzenberger          NA 12            
## 10   8468517 Andy       Hilbert               NA 11

6. full_join

Describe the resulting data:

How is it different from the original two datasets?

full_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 20 × 5
##    player_id first_name last_name    birth_month sweater_number
##        <dbl> <chr>      <chr>              <dbl> <chr>         
##  1   8470760 Marc       Methot                 6 <NA>          
##  2   8478048 Igor       Shesterkin            12 <NA>          
##  3   8474194 Timo       Pielmeier              7 <NA>          
##  4   8476345 Gregory    Hofmann               11 <NA>          
##  5   8447169 Greg       Gilbert                1 <NA>          
##  6   8446541 Mike       Eaves                  6 <NA>          
##  7   8449286 Bob        Mason                  4 <NA>          
##  8   8449129 Emory      Sparrow                9 <NA>          
##  9   8480948 Egor       Yakovlev               9 <NA>          
## 10   8459484 Jamie      Rivers                 3 <NA>          
## 11   8459670 Kimmo      Timonen               NA 44            
## 12   8473419 Brad       Marchand              NA 63            
## 13   8446311 Dan        Daoust                NA 24            
## 14   8445371 Real       Chevrefils            NA 12            
## 15   8447536 Lowell     MacDonald             NA 18            
## 16   8460516 Dan        Cloutier              NA 39            
## 17   8474697 Mark       Borowiecki            NA 74            
## 18   8447733 Archie     Henderson             NA 29            
## 19   8447502 Ed         Litzenberger          NA 12            
## 20   8468517 Andy       Hilbert               NA 11

7. semi_join

Describe the resulting data:

How is it different from the original two datasets?

semi_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 0 × 4
## # ℹ 4 variables: player_id <dbl>, first_name <chr>, last_name <chr>,
## #   birth_month <dbl>

8. anti_join

Describe the resulting data:

How is it different from the original two datasets?

anti_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 10 × 4
##    player_id first_name last_name  birth_month
##        <dbl> <chr>      <chr>            <dbl>
##  1   8470760 Marc       Methot               6
##  2   8478048 Igor       Shesterkin          12
##  3   8474194 Timo       Pielmeier            7
##  4   8476345 Gregory    Hofmann             11
##  5   8447169 Greg       Gilbert              1
##  6   8446541 Mike       Eaves                6
##  7   8449286 Bob        Mason                4
##  8   8449129 Emory      Sparrow              9
##  9   8480948 Egor       Yakovlev             9
## 10   8459484 Jamie      Rivers               3