Import data

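# Packages: tidyverse (dplyr, ggplot2, stringr) and readxl for read_excel()
library(tidyverse)
library(readxl)
# excel file
data <- read_excel("../00_data/MyData.xlsx")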
data
## # A tibble: 19 × 100
##    Date                Opponent   Score Column1 `All shifts` `Time on ice` Goals
##    <dttm>              <chr>      <chr>   <dbl>        <dbl> <chr>         <dbl>
##  1 2024-10-24 00:00:00 vs New En… 5:2         0           17 18:02          1   
##  2 2024-11-01 00:00:00 @ Univers… 0:5         0           23 22:59          0   
##  3 2024-11-03 00:00:00 @ Babson … 4:5         0           23 25:07          0   
##  4 2024-11-07 00:00:00 vs SUNY-P… 5:3         0           22 21:26          0   
##  5 2024-11-09 00:00:00 @ New Eng… 5:1         0           20 21:19          0   
##  6 2024-11-15 00:00:00 vs Worces… 8:2         0           18 18:16          0   
##  7 2024-11-21 00:00:00 vs Framin… 7:1         0           20 19:11          0   
##  8 2024-11-23 00:00:00 @ Salem S… 4:3         0           24 20:05          0   
##  9 2024-11-26 00:00:00 vs Trinit… 1:2         0           24 21:10          0   
## 10 2024-12-05 00:00:00 @ UMass D… 4:2         0           17 19:00          0   
## 11 2024-12-07 00:00:00 vs Westfi… 1:0         0           27 22:20          1   
## 12 2025-01-03 00:00:00 @ Hamilto… 0:4         0           20 19:35          0   
## 13 2025-01-04 00:00:00 @ William… 8:2         0           19 18:18          0   
## 14 2025-01-09 00:00:00 @ Rivier … 6:1         0           20 18:30          0   
## 15 2025-01-11 00:00:00 vs Fitchb… 3:2         0           24 21:30          0   
## 16 2025-01-16 00:00:00 @ Worcest… 5:2         0           20 20:22          0   
## 17 2025-01-18 00:00:00 vs Massac… 3:1         0           19 17:18          0   
## 18 2025-01-23 00:00:00 @ Anna Ma… 2:4         0           25 25:17          0   
## 19 NA                  Average p… <NA>      169           20 18:59          0.17
## # ℹ 93 more variables: `First assist` <dbl>, `Second assist` <dbl>,
## #   Assists <dbl>, `Puck touches` <dbl>, `Puck control time` <chr>,
## #   Points <dbl>, `+/-` <dbl>, Plus <dbl>, Minus <dbl>, Penalties <dbl>,
## #   `Penalties drawn` <dbl>, `Penalty time` <chr>, Hits <dbl>,
## #   `Hits against` <dbl>, `Error leading to goal` <dbl>, `Dump ins` <dbl>,
## #   `Dump outs` <dbl>, `Team goals when on ice` <dbl>,
## #   `Opponent's goals when on ice` <dbl>, Shots <dbl>, `Shots on goal` <dbl>, …

Introduction

This is an analysis of my hockey data across a season. It examines metrics such as shots on goal, plus/minus, and hits using several data visualization techniques. One recurring problem is that the last row of the dataset holds calculated season averages rather than a single game, so it shows up in the plots unless it is filtered out.

Questions

  1. How are my shot counts per game distributed?
  2. What are my typical plus/minus ratings?
    • Plus/minus compares the goals scored for and against my team while I am on the ice. A positive value means I was on the ice for more of my team's goals than the opponent's.
  3. Any outliers in my shots/game distribution?
  4. How do my takeaways vary between road and home games?
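
As a rough illustration of the plus/minus definition in question 2, the sketch below computes the rating from made-up on-ice goal counts. The vectors goals_for_on_ice and goals_against_on_ice are hypothetical names and values, not columns from my data, and leagues can exclude some goals (for example on the power play), so treat this as the simplest form of the statistic.

```r
# Hypothetical on-ice goal counts for three games (illustrative values only)
goals_for_on_ice     <- c(3, 0, 1)
goals_against_on_ice <- c(1, 2, 1)

# Simplest form of plus/minus: goals for minus goals against while on the ice
plus_minus <- goals_for_on_ice - goals_against_on_ice
print(plus_minus)
```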

Variation

Visualizing distributions

data %>% 
    ggplot(aes(x = Shots)) + 
    geom_histogram(binwidth = 0.5) +
    labs(title = "Distribution of Shots per Game", 
        x = "# of Shots", 
        y = "Frequency")

Typical values

How are the typical plus/minus values distributed over games?

data %>%
    ggplot(aes(x = `+/-`)) +
    geom_histogram() +
    labs(title = "Plus/Minus Rating Distribution")

Note

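Here the season-average row is included in the plot. Game-level +/- values are always whole numbers, so we can exclude the average with the modulo operator: x %% 1 == 0 is true only for whole numbers. This works as long as the average itself is not a whole number: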

data %>%
    filter(`+/-` %% 1 == 0) %>%
    ggplot(aes(x = `+/-`)) +
    geom_histogram() +
    labs(title = "Plus/Minus Rating Distribution")
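
To see what the modulo filter does, here is a minimal sketch on made-up values. The tibble toy and its numbers are hypothetical and only stand in for the real data. It also shows an alternative that I am assuming fits this dataset: the average row has no Date, so keeping only rows with a Date removes it even if an average happened to be a whole number.

```r
library(dplyr)

# Hypothetical toy data: three games plus an average row with no date
toy <- tibble(
    Date  = as.Date(c("2024-10-24", "2024-11-01", "2024-11-03", NA)),
    `+/-` = c(2, -1, 0, 0.17)
)

# Modulo approach: keep rows whose +/- is a whole number
by_mod <- toy %>% filter(`+/-` %% 1 == 0)

# Alternative (assumption: only the average row lacks a Date)
by_date <- toy %>% filter(!is.na(Date))

print(by_mod)
print(by_date)
```

On this toy data both filters keep the same three game rows. The Date-based filter does not depend on the value of the average.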

Unusual values

Let's look at the distribution of shots per game and see if there are any outliers.

data %>%
    ggplot(aes(Shots)) + 
    geom_histogram() + 
    coord_cartesian(xlim = c(0, 10)) +
    labs(title = "Distribution of Shot Count",
         x = "Shots per Game",
         y = "Frequency")

Note

Seems like 6 is my lucky number.
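
A histogram shows outliers by eye. One common rule of thumb, which this analysis has not used so far, flags values more than 1.5 times the interquartile range beyond the quartiles. Below is a minimal sketch of that rule on hypothetical shot counts, not my real numbers.

```r
# Hypothetical shot counts for illustration only (not the real season data)
shots <- c(3, 4, 6, 6, 5, 6, 2, 7, 6, 15)

# Quartiles and interquartile range
q   <- unname(quantile(shots, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR below the first and above the third quartile
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers <- shots[shots < lower | shots > upper]
print(outliers)
```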

Missing Values

No missing values in my game data. The only NA entries shown above are the Date and Score of the season-average row.
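
One way to check a claim like this is to count NA entries per column. The sketch below does that on a small hypothetical data frame (toy, with made-up values); the same colSums(is.na(...)) call could be run on data to get the real counts.

```r
# Hypothetical toy data: two games plus a summary row with missing Date and Score
toy <- data.frame(
    Date  = as.Date(c("2024-10-24", "2024-11-01", NA)),
    Score = c("5:2", "0:5", NA),
    Shots = c(4, 6, 5)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Rows that contain at least one missing value
rows_with_na <- which(!complete.cases(toy))
print(rows_with_na)
```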

Covariation

Note

My dataset lacks categorical variables, so I create one, Game_Location, from the Opponent column.

A categorical and continuous variable

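# Drop the season-average row (its Date is NA) so it is not mislabeled as a home game
data <- data %>%
    filter(!is.na(Date)) %>%
    mutate(Game_Location = ifelse(str_detect(Opponent, "^@"), "Away", "Home"))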


home_games <- data %>%
    filter(Game_Location == "Home")

ggplot(home_games, mapping = aes(x = `Puck battles`)) + 
    geom_freqpoly() +
    labs(title = "Distribution of Puck Battles in Home Games",
         x = "Number of puck battles", 
         y = "Frequency")
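
The location label above depends on the pattern "^@", which matches only strings that begin with "@". Here is a minimal sketch of how that behaves, using hypothetical opponent strings (the team names and the "Season summary" label are made up). It also shows a stricter variant, my own assumption rather than part of the analysis, that labels a row "Home" only when it starts with "vs".

```r
library(stringr)

# Hypothetical opponent labels: a home game, an away game, and a non-game row
opponents <- c("vs Team A", "@ Team B", "Season summary")

# Same rule as above: anything not starting with "@" is treated as Home
location <- ifelse(str_detect(opponents, "^@"), "Away", "Home")
print(location)

# Stricter variant: rows matching neither prefix get NA instead of "Home"
strict <- ifelse(str_detect(opponents, "^@"), "Away",
          ifelse(str_detect(opponents, "^vs"), "Home", NA_character_))
print(strict)
```

With the simple rule, any row that is not a game is counted as a home game, so it is worth removing such rows first or using the stricter variant.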

Two continuous variables

ggplot(data, mapping = aes(x = Takeaways)) + 
    geom_freqpoly(mapping = aes(colour = Game_Location), binwidth = 1) +
    labs(title = "Distribution - Takeaways by Game Location",
         x = "Number of Takeaways",
         y = "Frequency",
         colour = "Game Location")
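
A numeric summary can back up the plot when comparing home and away takeaways. The sketch below groups a hypothetical tibble (toy, with made-up values) by Game_Location and computes the game count, mean, and median. Applying the same group_by() and summarise() steps to data is an assumption about how I would extend the analysis, not something done above.

```r
library(dplyr)

# Hypothetical toy data: two home games and two away games
toy <- tibble(
    Game_Location = c("Home", "Away", "Home", "Away"),
    Takeaways     = c(3, 6, 4, 2)
)

# Per-location game count, mean, and median takeaways
by_location <- toy %>%
    group_by(Game_Location) %>%
    summarise(
        games            = n(),
        mean_takeaways   = mean(Takeaways),
        median_takeaways = median(Takeaways)
    )

print(by_location)
```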