Import data

library(tidyverse)  # filter() and ggplot()
library(readxl)     # read_excel()
# Excel file
data <- read_excel("../00_data/Data3.xlsx")
data
## # A tibble: 8,474 × 9
##    player_id first_name last_name   birth_date          birth_city birth_country
##        <dbl> <chr>      <chr>       <dttm>              <chr>      <chr>        
##  1   8467867 Bryan      Adams       1977-03-20 00:00:00 Fort St. … CAN          
##  2   8445176 Donald     Audette     1969-09-23 00:00:00 Laval      CAN          
##  3   8460014 Eric       Bertrand    1975-04-16 00:00:00 St-Ephrem  CAN          
##  4   8460510 Jason      Botterill   1976-05-19 00:00:00 Edmonton   CAN          
##  5   8459596 Andrew     Brunette    1973-08-24 00:00:00 Sudbury    CAN          
##  6   8445733 Kelly      Buchberger  1966-12-02 00:00:00 Langenburg CAN          
##  7   8460573 Hnat       Domenichel… 1976-02-17 00:00:00 Edmonton   CAN          
##  8   8459450 Shean      Donovan     1975-01-22 00:00:00 Timmins    CAN          
##  9   8446675 Nelson     Emerson     1967-08-17 00:00:00 Hamilton   CAN          
## 10   8446823 Ray        Ferraro     1964-08-23 00:00:00 Trail      CAN          
## # ℹ 8,464 more rows
## # ℹ 3 more variables: birth_state_province <chr>, birth_year <dbl>,
## #   birth_month <dbl>

State one question

Is there a correlation between birth month and the number of Canadian-born NHL players?

Plot data

# Keep only Canadian-born players so the plot matches the question
data_can <- filter(data, birth_country == "CAN")

ggplot(data_can, aes(x = birth_month, fill = birth_state_province)) + 
  geom_bar(position = "stack")

data_can
## # A tibble: 5,468 × 9
##    player_id first_name last_name   birth_date          birth_city birth_country
##        <dbl> <chr>      <chr>       <dttm>              <chr>      <chr>        
##  1   8467867 Bryan      Adams       1977-03-20 00:00:00 Fort St. … CAN          
##  2   8445176 Donald     Audette     1969-09-23 00:00:00 Laval      CAN          
##  3   8460014 Eric       Bertrand    1975-04-16 00:00:00 St-Ephrem  CAN          
##  4   8460510 Jason      Botterill   1976-05-19 00:00:00 Edmonton   CAN          
##  5   8459596 Andrew     Brunette    1973-08-24 00:00:00 Sudbury    CAN          
##  6   8445733 Kelly      Buchberger  1966-12-02 00:00:00 Langenburg CAN          
##  7   8460573 Hnat       Domenichel… 1976-02-17 00:00:00 Edmonton   CAN          
##  8   8459450 Shean      Donovan     1975-01-22 00:00:00 Timmins    CAN          
##  9   8446675 Nelson     Emerson     1967-08-17 00:00:00 Hamilton   CAN          
## 10   8446823 Ray        Ferraro     1964-08-23 00:00:00 Trail      CAN          
## # ℹ 5,458 more rows
## # ℹ 3 more variables: birth_state_province <chr>, birth_year <dbl>,
## #   birth_month <dbl>

Interpret

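The number of NHL players seems to fall as the birth month gets later in the year: players born in the early months are over-represented. In terms of month number (1 to 12), this is a negative association with player count.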
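
The visual impression from the bar chart could be checked numerically. The following is a minimal sketch, not part of the original analysis: `count_by_month()` is a hypothetical helper written in base R, and it assumes `data` is the tibble imported above with the columns `birth_country` and `birth_month`. It counts players per birth month for one country and then computes a Spearman rank correlation between month number and count.

```r
# Hypothetical helper (not from the original analysis):
# count players per birth month for a single country.
count_by_month <- function(df, country = "CAN") {
  # Keep rows for the requested country; rows with a missing country are dropped
  keep <- !is.na(df$birth_country) & df$birth_country == country
  sub <- df[keep, ]
  # factor() with levels 1:12 keeps months that have zero players
  counts <- table(factor(sub$birth_month, levels = 1:12))
  data.frame(birth_month = 1:12, n = as.integer(counts))
}

# Run only when `data` is the imported data frame
# (by default `data` is a base R function, hence the is.data.frame() check).
if (exists("data") && is.data.frame(data)) {
  month_counts <- count_by_month(data, country = "CAN")
  print(month_counts)
  # Spearman rank correlation between month number and player count;
  # a value near -1 would mean counts fall steadily through the year.
  print(cor(month_counts$birth_month, month_counts$n, method = "spearman"))
}
```

With only 12 monthly counts, the coefficient is a rough summary of the trend rather than a formal test, and it says nothing about why early-year births might be over-represented.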