Import two related datasets from TidyTuesday Project.
rosters <- read_excel("../00_data/myData.xlsx", sheet = "nhl_rosters")
rosters
## # A tibble: 54,883 × 18
## team_code season position_type player_id headshot first_name last_name
## <chr> <dbl> <chr> <dbl> <chr> <chr> <chr>
## 1 ATL 19992000 forwards 8467867 https://asse… Bryan Adams
## 2 ATL 19992000 forwards 8445176 https://asse… Donald Audette
## 3 ATL 19992000 forwards 8460014 https://asse… Eric Bertrand
## 4 ATL 19992000 forwards 8460510 https://asse… Jason Botterill
## 5 ATL 19992000 forwards 8459596 https://asse… Andrew Brunette
## 6 ATL 19992000 forwards 8445733 https://asse… Kelly Buchberg…
## 7 ATL 19992000 forwards 8460573 https://asse… Hnat Domenich…
## 8 ATL 19992000 forwards 8459450 https://asse… Shean Donovan
## 9 ATL 19992000 forwards 8446675 https://asse… Nelson Emerson
## 10 ATL 19992000 forwards 8446823 https://asse… Ray Ferraro
## # ℹ 54,873 more rows
## # ℹ 11 more variables: sweater_number <chr>, position_code <chr>,
## # shoots_catches <chr>, height_in_inches <dbl>, weight_in_pounds <dbl>,
## # height_in_centimeters <dbl>, weight_in_kilograms <dbl>, birth_date <dttm>,
## # birth_city <chr>, birth_country <chr>, birth_state_province <chr>
births <- read_excel("../00_data/myData.xlsx", sheet = "nhl_player_births")
births
## # A tibble: 8,474 × 9
## player_id first_name last_name birth_date birth_city birth_country
## <dbl> <chr> <chr> <dttm> <chr> <chr>
## 1 8467867 Bryan Adams 1977-03-20 00:00:00 Fort St. … CAN
## 2 8445176 Donald Audette 1969-09-23 00:00:00 Laval CAN
## 3 8460014 Eric Bertrand 1975-04-16 00:00:00 St-Ephrem CAN
## 4 8460510 Jason Botterill 1976-05-19 00:00:00 Edmonton CAN
## 5 8459596 Andrew Brunette 1973-08-24 00:00:00 Sudbury CAN
## 6 8445733 Kelly Buchberger 1966-12-02 00:00:00 Langenburg CAN
## 7 8460573 Hnat Domenichel… 1976-02-17 00:00:00 Edmonton CAN
## 8 8459450 Shean Donovan 1975-01-22 00:00:00 Timmins CAN
## 9 8446675 Nelson Emerson 1967-08-17 00:00:00 Hamilton CAN
## 10 8446823 Ray Ferraro 1964-08-23 00:00:00 Trail CAN
## # ℹ 8,464 more rows
## # ℹ 3 more variables: birth_state_province <chr>, birth_year <dbl>,
## # birth_month <dbl>
set.seed(1234)
births_small <- births %>% select(player_id, first_name, last_name, birth_month) %>% sample_n(10)
births_small
## # A tibble: 10 × 4
## player_id first_name last_name birth_month
## <dbl> <chr> <chr> <dbl>
## 1 8470760 Marc Methot 6
## 2 8478048 Igor Shesterkin 12
## 3 8474194 Timo Pielmeier 7
## 4 8476345 Gregory Hofmann 11
## 5 8447169 Greg Gilbert 1
## 6 8446541 Mike Eaves 6
## 7 8449286 Bob Mason 4
## 8 8449129 Emory Sparrow 9
## 9 8480948 Egor Yakovlev 9
## 10 8459484 Jamie Rivers 3
rosters_small <- rosters %>% select(player_id, first_name, last_name, sweater_number) %>% sample_n(10)
rosters_small
## # A tibble: 10 × 4
## player_id first_name last_name sweater_number
## <dbl> <chr> <chr> <chr>
## 1 8459670 Kimmo Timonen 44
## 2 8473419 Brad Marchand 63
## 3 8446311 Dan Daoust 24
## 4 8445371 Real Chevrefils 12
## 5 8447536 Lowell MacDonald 18
## 6 8460516 Dan Cloutier 39
## 7 8474697 Mark Borowiecki 74
## 8 8447733 Archie Henderson 29
## 9 8447502 Ed Litzenberger 12
## 10 8468517 Andy Hilbert 11
Describe the two datasets:
Data1
Data 2
Describe the resulting data:
How is it different from the original two datasets?
inner_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 0 × 5
## # ℹ 5 variables: player_id <dbl>, first_name <chr>, last_name <chr>,
## # birth_month <dbl>, sweater_number <chr>
Describe the resulting data:
How is it different from the original two datasets? I got one more column from the right.
left_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 10 × 5
## player_id first_name last_name birth_month sweater_number
## <dbl> <chr> <chr> <dbl> <chr>
## 1 8470760 Marc Methot 6 <NA>
## 2 8478048 Igor Shesterkin 12 <NA>
## 3 8474194 Timo Pielmeier 7 <NA>
## 4 8476345 Gregory Hofmann 11 <NA>
## 5 8447169 Greg Gilbert 1 <NA>
## 6 8446541 Mike Eaves 6 <NA>
## 7 8449286 Bob Mason 4 <NA>
## 8 8449129 Emory Sparrow 9 <NA>
## 9 8480948 Egor Yakovlev 9 <NA>
## 10 8459484 Jamie Rivers 3 <NA>
Describe the resulting data:
How is it different from the original two datasets?
right_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 10 × 5
## player_id first_name last_name birth_month sweater_number
## <dbl> <chr> <chr> <dbl> <chr>
## 1 8459670 Kimmo Timonen NA 44
## 2 8473419 Brad Marchand NA 63
## 3 8446311 Dan Daoust NA 24
## 4 8445371 Real Chevrefils NA 12
## 5 8447536 Lowell MacDonald NA 18
## 6 8460516 Dan Cloutier NA 39
## 7 8474697 Mark Borowiecki NA 74
## 8 8447733 Archie Henderson NA 29
## 9 8447502 Ed Litzenberger NA 12
## 10 8468517 Andy Hilbert NA 11
Describe the resulting data:
How is it different from the original two datasets?
full_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 20 × 5
## player_id first_name last_name birth_month sweater_number
## <dbl> <chr> <chr> <dbl> <chr>
## 1 8470760 Marc Methot 6 <NA>
## 2 8478048 Igor Shesterkin 12 <NA>
## 3 8474194 Timo Pielmeier 7 <NA>
## 4 8476345 Gregory Hofmann 11 <NA>
## 5 8447169 Greg Gilbert 1 <NA>
## 6 8446541 Mike Eaves 6 <NA>
## 7 8449286 Bob Mason 4 <NA>
## 8 8449129 Emory Sparrow 9 <NA>
## 9 8480948 Egor Yakovlev 9 <NA>
## 10 8459484 Jamie Rivers 3 <NA>
## 11 8459670 Kimmo Timonen NA 44
## 12 8473419 Brad Marchand NA 63
## 13 8446311 Dan Daoust NA 24
## 14 8445371 Real Chevrefils NA 12
## 15 8447536 Lowell MacDonald NA 18
## 16 8460516 Dan Cloutier NA 39
## 17 8474697 Mark Borowiecki NA 74
## 18 8447733 Archie Henderson NA 29
## 19 8447502 Ed Litzenberger NA 12
## 20 8468517 Andy Hilbert NA 11
Describe the resulting data:
How is it different from the original two datasets?
semi_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 0 × 4
## # ℹ 4 variables: player_id <dbl>, first_name <chr>, last_name <chr>,
## # birth_month <dbl>
Describe the resulting data:
How is it different from the original two datasets?
anti_join(births_small, rosters_small)
## Joining with `by = join_by(player_id, first_name, last_name)`
## # A tibble: 10 × 4
## player_id first_name last_name birth_month
## <dbl> <chr> <chr> <dbl>
## 1 8470760 Marc Methot 6
## 2 8478048 Igor Shesterkin 12
## 3 8474194 Timo Pielmeier 7
## 4 8476345 Gregory Hofmann 11
## 5 8447169 Greg Gilbert 1
## 6 8446541 Mike Eaves 6
## 7 8449286 Bob Mason 4
## 8 8449129 Emory Sparrow 9
## 9 8480948 Egor Yakovlev 9
## 10 8459484 Jamie Rivers 3