Import data

https://www.hockey-reference.com/leagues/NHL_2023_skaters.html#stats::goals

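# Load the tidyverse (provides %>%, ggplot2, and dplyr, used below)
library(tidyverse)

# CSV file from the TidyTuesday repository (2022-01-11)
colony <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11/colony.csv')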

## Rows: 1222 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): months, state
## dbl (8): year, colony_n, colony_max, colony_lost, colony_lost_pct, colony_ad...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
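The message above suggests specifying the column types. A minimal sketch of what that could look like follows. It uses a small inline CSV with made-up values (hypothetical, standing in for colony.csv) so that it runs without a download, and it assumes readr 2.0 or later, where literal data is wrapped in `I()`.

```r
library(readr)

# Hypothetical inline data: column names mirror colony.csv, values are made up
csv_text <- I("year,months,state,colony_n
2015,January-March,Alabama,7000
2015,January-March,Arizona,35000
")

# Supplying col_types declares each column explicitly,
# so readr has nothing to guess and prints no specification message
colony_small <- read_csv(
  csv_text,
  col_types = cols(
    year     = col_double(),
    months   = col_character(),
    state    = col_character(),
    colony_n = col_double()
  )
)

colony_small
```

The same `col_types` argument could be passed when reading the real URL. Setting `show_col_types = FALSE` instead would only hide the message and leave the types guessed.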

colony

skimr::skim(colony)

7.3 Variation

7.3.2 Unusual values

In the first histogram, the common bins hold so many observations that the rare bins are too short to see.

The second plot zooms the y-axis to the range 0 to 50 with coord_cartesian(), which makes the rare bins visible.

colony %>%
    
    ggplot(aes(colony_n)) +
    geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 47 rows containing non-finite values (`stat_bin()`).

colony %>%
    
    ggplot(aes(colony_n)) +
    geom_histogram() +
    coord_cartesian(ylim = c(0,50))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 47 rows containing non-finite values (`stat_bin()`).
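Rare bins can also be inspected as a table instead of a zoomed plot. The sketch below counts observations per bin with `dplyr::count()` and `ggplot2::cut_width()`. The `toy` data frame and the bin width of 500,000 are assumptions chosen for illustration and are not taken from the colony data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: many small values and two very large ones
toy <- tibble(
  colony_n = c(rep(5000, 200), rep(15000, 40), 1500000, 2900000)
)

# Count how many observations fall into each bin of width 500,000
bin_counts <- toy %>%
  filter(!is.na(colony_n)) %>%                       # drop missing values, as stat_bin() does
  count(bin = cut_width(colony_n, width = 500000, boundary = 0))

bin_counts
```

Bins with only a handful of observations show up as ordinary rows here, so they cannot be hidden by the scale of the y-axis.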

7.4 Missing values

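The second plot shows the result of replacing outliers in colony_reno (values above 400,000) with NA. The warnings show the effect: 139 rows are dropped from the second plot instead of 131, so 8 more points are removed.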

colony %>%
    
    ggplot(aes(colony_lost, colony_reno)) +
    geom_point()

## Warning: Removed 131 rows containing missing values (`geom_point()`).

colony %>%
    
    mutate(colony_reno = ifelse(colony_reno > 4e+05, NA, colony_reno)) %>%
    
    ggplot(aes(colony_lost, colony_reno)) +
    geom_point()

## Warning: Removed 139 rows containing missing values (`geom_point()`).
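The change in the warning counts can be checked directly by counting the rows that have a missing value in either plotted variable. This is a sketch on a hypothetical five-row data frame. `rows_dropped()` is a helper introduced here for illustration, and `if_else()` with `NA_real_` is used as a type-strict alternative to the `ifelse()` call above.

```r
library(dplyr)

# Hypothetical data with one outlier in colony_reno and two pre-existing NAs
toy <- tibble(
  colony_lost = c(100, 200, NA, 400, 500),
  colony_reno = c(50, 500000, 70, NA, 90)
)

# Replace values above 400,000 with NA; rows that are already NA stay NA
cleaned <- toy %>%
  mutate(colony_reno = if_else(colony_reno > 4e5, NA_real_, colony_reno))

# Rows geom_point() would drop: those missing x or y
rows_dropped <- function(df) {
  sum(is.na(df$colony_lost) | is.na(df$colony_reno))
}

rows_dropped(toy)
rows_dropped(cleaned)
```

The difference between the two counts is the number of points removed by the outlier rule.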

7.5 Covariation

7.5.2 Two categorical variables

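The dataset has only two categorical variables, months and state. Each tile shows how many rows exist for a months-state combination. With ggplot2's default fill scale, the darker blue marks the lower count (6) and the lighter blue the higher count (7).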

colony %>%
    
    count(months, state) %>%
    
    ggplot(aes(months, state)) +
    geom_tile(aes(fill = n))
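To confirm which counts the tile colors stand for, the combination counts can be tabulated once more. The sketch below uses a hypothetical five-row data frame. The column name `combinations` is introduced here for illustration.

```r
library(dplyr)

# Hypothetical data: a few months-state pairs, one of them repeated
toy <- tibble(
  months = c("January-March", "January-March", "April-June",
             "January-March", "April-June"),
  state  = c("Alabama", "Alabama", "Alabama", "Arizona", "Arizona")
)

# Number of rows per months-state combination (the value mapped to fill)
combo_counts <- toy %>%
  count(months, state)

# How many combinations share each count
count_summary <- combo_counts %>%
  count(n, name = "combinations")

count_summary
```

Run against colony, the same two steps would list every distinct count that appears in the legend.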

Week 5: Apply 4

Daniel Lee

2022-09-15