Import two related datasets from the TidyTuesday project.
game_goals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-03/game_goals.csv')
## Rows: 49384 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): player, age, team, at, opp, location, outcome
## dbl (17): season, rank, game_num, goals, assists, points, plus_minus, penal...
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
season_goals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-03/season_goals.csv')
## Rows: 4810 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): position, hand, player, years, status, season, team, league, headshot
## dbl (14): rank, total_goals, yr_start, age, season_games, goals, assists, po...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Describe the two datasets:
Data 1: Game-by-game goal records for a set of players. The player, goals, and assists columns are used.
Data 2: Season records for each player. The same three columns are used.
Each sampled subset below has:
* Columns: 3
* Rows: 20
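library(dplyr)

# Draw 20 random games and keep only the columns used in the joins
Games <- game_goals %>%
  sample_n(20) %>%
  select(player, goals, assists)

# Draw 20 random player-seasons and keep the same three columns
Season <- season_goals %>%
  sample_n(20) %>%
  select(player, goals, assists)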
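Because the rows are drawn at random, every run produces different subsets and therefore different join results. The following is a minimal sketch of making the draw repeatable, assuming a small made-up tibble `toy` (not taken from the TidyTuesday data) and an arbitrary seed value. It uses `slice_sample()`, which dplyr documents as the successor to `sample_n()`.

```r
library(dplyr)

# Hypothetical stand-in for game_goals; the values are made up for illustration.
toy <- tibble(
  player  = c("A", "B", "C", "D", "E", "F"),
  goals   = c(0, 1, 2, 0, 1, 3),
  assists = c(1, 0, 2, 1, 0, 1)
)

# Fixing the seed (2020 is an arbitrary choice) makes the random draw repeatable.
set.seed(2020)
draw1 <- toy %>% slice_sample(n = 3) %>% select(player, goals, assists)

set.seed(2020)
draw2 <- toy %>% slice_sample(n = 3) %>% select(player, goals, assists)

# Same seed, same rows
all.equal(draw1, draw2)
```

Calling `set.seed()` once before the two sampling steps in the assignment would have the same effect on `Games` and `Season`.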
Describe the resulting data:
## No one matched up from both data sets
How is it different from the original two datasets?
inner_join(Games, Season, by = "goals")
## # A tibble: 6 × 5
##   player.x        goals assists.x player.y     assists.y
##   <chr>           <dbl>     <dbl> <chr>            <dbl>
## 1 Zach Parise         1         0 John MacLean         0
## 2 Joe Sakic           1         1 John MacLean         0
## 3 Steven Stamkos      1         0 John MacLean         0
## 4 Wayne Gretzky       1         1 John MacLean         0
## 5 Patrick Marleau     1         0 John MacLean         0
## 6 Sidney Crosby       1         2 John MacLean         0
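Joining on goals pairs every Games row with every Season row that happens to have the same goal count, which is why one Season player repeats next to six unrelated Games players in the output above. The sketch below uses two made-up tibbles (`games_toy` and `season_toy` are hypothetical names and values) to contrast that with joining on player.

```r
library(dplyr)

# Hypothetical miniature versions of Games and Season
games_toy <- tibble(
  player  = c("A", "B", "C"),
  goals   = c(1, 1, 0),
  assists = c(0, 2, 1)
)
season_toy <- tibble(
  player  = c("A", "Z"),
  goals   = c(30, 1),
  assists = c(40, 0)
)

# Key = goals: A and B (1 goal each) both pair with Z, an unrelated player.
by_goals <- inner_join(games_toy, season_toy, by = "goals")

# Key = player: only A appears in both tables, so a single row is returned.
by_player <- inner_join(games_toy, season_toy, by = "player")

by_goals
by_player
```

With goals as the key, the non-key columns are suffixed as `player.x` and `player.y`; with player as the key, the suffixes move to `goals` and `assists` instead.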
Describe the resulting data:
How is it different from the original two datasets?
left_join(Games, Season, by = "player")
## # A tibble: 20 × 5
##    player          goals.x assists.x goals.y assists.y
##    <chr>             <dbl>     <dbl>   <dbl>     <dbl>
##  1 Joe Thornton          0         0      NA        NA
##  2 Steve Yzerman         0         0      NA        NA
##  3 Max Pacioretty        0         0      NA        NA
##  4 Dino Ciccarelli       0         0      NA        NA
##  5 Mario Lemieux         0         1      NA        NA
##  6 Ryan Getzlaf          0         0      24        58
##  7 Evgeni Malkin         0         1      NA        NA
##  8 Zach Parise           1         0      NA        NA
##  9 Jeff Carter           0         1      NA        NA
## 10 Patrick Kane          0         0      NA        NA
## 11 Joe Sakic             1         1      NA        NA
## 12 Dave Andreychuk       0         0      NA        NA
## 13 James Neal            0         0      NA        NA
## 14 Eric Staal            0         0      NA        NA
## 15 Dustin Brown          0         0      NA        NA
## 16 Steven Stamkos        1         0      NA        NA
## 17 Wayne Gretzky         1         1      NA        NA
## 18 Patrick Marleau       1         0      44        39
## 19 Sidney Crosby         1         2      NA        NA
## 20 Ryan Getzlaf          0         0      24        58
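In the left_join output, rows with NA in goals.y and assists.y are Games players with no match in Season. The sketch below, on made-up tibbles with hypothetical names and values, shows one way those unmatched rows could be counted. The `suffix` argument is used only to give the overlapping columns clearer names than `.x` and `.y`.

```r
library(dplyr)

# Hypothetical miniature versions of Games and Season
games_toy  <- tibble(player = c("A", "B", "A"), goals = c(0, 1, 0), assists = c(0, 0, 1))
season_toy <- tibble(player = c("A", "Z"), goals = c(24, 10), assists = c(58, 5))

joined <- left_join(games_toy, season_toy, by = "player",
                    suffix = c("_game", "_season"))

# Every Games row is kept. The row counts are equal here because each
# player appears at most once in season_toy.
nrow(joined) == nrow(games_toy)

# B has no Season record, so its Season columns are NA.
sum(is.na(joined$goals_season))
```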
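For contrast with `left_join()`, dplyr's `right_join()` keeps every row of the second table instead of the first. The sketch below runs on made-up tibbles (hypothetical names and values), not on the sampled Games and Season data, so it does not reproduce any output from this assignment.

```r
library(dplyr)

# Hypothetical miniature versions of Games and Season
games_toy  <- tibble(player = c("A", "B"), goals = c(0, 1))
season_toy <- tibble(player = c("A", "Z"), goals = c(24, 10))

right <- right_join(games_toy, season_toy, by = "player")

# Both Season players are kept. Z has no Games row, so goals.x is NA for Z,
# and B is dropped because B does not appear in season_toy.
right
```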
Describe the resulting data:
How is it different from the original two datasets?
full_join(Games, Season, by = "player")
## # A tibble: 38 × 5
##    player          goals.x assists.x goals.y assists.y
##    <chr>             <dbl>     <dbl>   <dbl>     <dbl>
##  1 Joe Thornton          0         0      NA        NA
##  2 Steve Yzerman         0         0      NA        NA
##  3 Max Pacioretty        0         0      NA        NA
##  4 Dino Ciccarelli       0         0      NA        NA
##  5 Mario Lemieux         0         1      NA        NA
##  6 Ryan Getzlaf          0         0      24        58
##  7 Evgeni Malkin         0         1      NA        NA
##  8 Zach Parise           1         0      NA        NA
##  9 Jeff Carter           0         1      NA        NA
## 10 Patrick Kane          0         0      NA        NA
## # ℹ 28 more rows
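The 38 rows follow from how `full_join()` works: it returns the 20 rows a left_join gives, plus the Season rows that matched nothing in Games, which must therefore number 18. The sketch below checks the same relationship on made-up tibbles (hypothetical names and values), using `anti_join()` to count the unmatched Season rows.

```r
library(dplyr)

# Hypothetical miniature versions of Games and Season
games_toy  <- tibble(player = c("A", "B", "A"), goals = c(0, 1, 0))
season_toy <- tibble(player = c("A", "Y", "Z"), goals = c(24, 10, 7))

full        <- full_join(games_toy, season_toy, by = "player")
left        <- left_join(games_toy, season_toy, by = "player")
season_only <- anti_join(season_toy, games_toy, by = "player")

# full_join rows = left_join rows + Season rows with no match in Games
nrow(full) == nrow(left) + nrow(season_only)
```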
Describe the resulting data:
How is it different from the original two datasets?
semi_join(Games, Season, by = "player")
## # A tibble: 3 × 3
##   player          goals assists
##   <chr>           <dbl>   <dbl>
## 1 Ryan Getzlaf        0       0
## 2 Patrick Marleau     1       0
## 3 Ryan Getzlaf        0       0
Describe the resulting data:
How is it different from the original two datasets?
anti_join(Games, Season, by = "player")
## # A tibble: 17 × 3
##    player          goals assists
##    <chr>           <dbl>   <dbl>
##  1 Joe Thornton        0       0
##  2 Steve Yzerman       0       0
##  3 Max Pacioretty      0       0
##  4 Dino Ciccarelli     0       0
##  5 Mario Lemieux       0       1
##  6 Evgeni Malkin       0       1
##  7 Zach Parise         1       0
##  8 Jeff Carter         0       1
##  9 Patrick Kane        0       0
## 10 Joe Sakic           1       1
## 11 Dave Andreychuk     0       0
## 12 James Neal          0       0
## 13 Eric Staal          0       0
## 14 Dustin Brown        0       0
## 15 Steven Stamkos      1       0
## 16 Wayne Gretzky       1       1
## 17 Sidney Crosby       1       2
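The semi_join and anti_join results split Games into complementary parts: the 3 rows with a match in Season and the 17 rows without one add up to the original 20. The sketch below checks that property on made-up tibbles (hypothetical names and values).

```r
library(dplyr)

# Hypothetical miniature versions of Games and Season
games_toy  <- tibble(player = c("A", "B", "A", "C"), goals = c(0, 1, 0, 2))
season_toy <- tibble(player = c("A", "Z"), goals = c(24, 10))

matched   <- semi_join(games_toy, season_toy, by = "player")
unmatched <- anti_join(games_toy, season_toy, by = "player")

# Filtering joins only keep or drop Games rows; no Season columns are added.
names(matched)

# Every Games row lands in exactly one of the two results.
nrow(matched) + nrow(unmatched) == nrow(games_toy)
```

Unlike the mutating joins earlier in the assignment, neither result carries `.x` or `.y` columns, which matches the three-column outputs shown for both calls.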