Import your data

data <- read_excel("../00_data/myData.xlsx")
## New names:
## • `` -> `...1`
data
## # A tibble: 4,810 × 24
##     ...1  rank position hand  player   years total…¹ status yr_st…² season   age
##    <dbl> <dbl> <chr>    <chr> <chr>    <chr>   <dbl> <chr>    <dbl> <chr>  <dbl>
##  1     1     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18
##  2     2     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18
##  3     3     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18
##  4     4     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1979-…    19
##  5     5     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1980-…    20
##  6     6     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1981-…    21
##  7     7     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1982-…    22
##  8     8     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1983-…    23
##  9     9     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1984-…    24
## 10    10     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1985-…    25
## # … with 4,800 more rows, 13 more variables: team <chr>, league <chr>,
## #   season_games <dbl>, goals <dbl>, assists <dbl>, points <dbl>,
## #   plus_minus <chr>, penalty_min <dbl>, goals_even <chr>,
## #   goals_power_play <chr>, goals_short_handed <chr>, goals_game_winner <chr>,
## #   headshot <chr>, and abbreviated variable names ¹​total_goals, ²​yr_start

Chapter 13

What are primary keys in your data?

Primary keys in my data are: Rank, season, and team

data %>% count(rank,season,team)
## # A tibble: 4,810 × 4
##     rank season  team      n
##    <dbl> <chr>   <chr> <int>
##  1     1 1978-79 EDO       1
##  2     1 1978-79 INR       1
##  3     1 1978-79 TOT       1
##  4     1 1979-80 EDM       1
##  5     1 1980-81 EDM       1
##  6     1 1981-82 EDM       1
##  7     1 1982-83 EDM       1
##  8     1 1983-84 EDM       1
##  9     1 1984-85 EDM       1
## 10     1 1985-86 EDM       1
## # … with 4,800 more rows

Can you divide your data into two?

Divide it using dplyr::select in a way the two have a common variable, which you could use to join the two.

data_1half <- data %>% select(rank:team) %>% head(250)
data2_half <- data %>% select(team:goals_game_winner) %>% head(250)

Can you join the two together?

Use tidyr::left_join or other joining functions.

left_join(data_1half, data2_half)
## Joining, by = "team"
## # A tibble: 6,490 × 22
##     rank position hand  player   years total…¹ status yr_st…² season   age team 
##    <dbl> <chr>    <chr> <chr>    <chr>   <dbl> <chr>    <dbl> <chr>  <dbl> <chr>
##  1     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18 TOT  
##  2     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18 TOT  
##  3     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18 TOT  
##  4     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18 TOT  
##  5     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18 TOT  
##  6     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18 TOT  
##  7     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18 TOT  
##  8     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18 TOT  
##  9     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18 TOT  
## 10     1 C        Left  Wayne G… 1979…     894 Retir…    1979 1978-…    18 TOT  
## # … with 6,480 more rows, 11 more variables: league <chr>, season_games <dbl>,
## #   goals <dbl>, assists <dbl>, points <dbl>, plus_minus <chr>,
## #   penalty_min <dbl>, goals_even <chr>, goals_power_play <chr>,
## #   goals_short_handed <chr>, goals_game_winner <chr>, and abbreviated variable
## #   names ¹​total_goals, ²​yr_start