Final Project

Author

Bella Meyerratken

National Hockey League Analysis:

Exploring Relationships Between Performance Metrics and Overall Team Success

Introduction

The National Hockey League, better known as the NHL, comprises thirty-two teams, twenty-five located in the United States and seven in Canada. Games are divided into three twenty-minute periods, and each team plays eighty-two games in the regular season. Hockey is a game that takes physicality, speed, finesse, grit, endurance, and, most importantly, strategy.

Analyses of NHL team-level statistics over one or more seasons aid decision making and performance evaluation. A team's strengths and weaknesses can be identified through its offensive, defensive, special teams, and goalie performance. Based on these strengths and weaknesses, team decisions such as trades, starting lineups, and coaching changes can be made. Team statistics can help identify which areas need improvement and shape an overall team strategy for the season. In addition, statistics of opposing teams can be used to build strategies for specific games.

This report investigates the relationships between specific performance statistics and success at the team, conference, and divisional levels. It is hypothesized that there is a positive relationship between winning percentage and goals scored in regulation, overtime, and shootouts. The following questions will be answered in regard to the hypothesis:

  1. Do teams with a greater number of shootout points have higher win percentages?

  2. Do teams that score more goals tend to allow more or fewer goals against them at the team, conference, and divisional level?

  3. Is there a relationship between the ability to win shootouts (‘Pct_SO’) and the ability to win games before shootouts (‘PPG’) at the team, conference, and division level?

Data

An analysis of the 2022-2023 NHL regular season will be conducted at the team level to find strengths and weaknesses across conferences and divisions. The data used for this analysis come from the SCORE Sports Data Repository. The data set contains team statistics for the 2022-2023 regular season, including variables such as wins, losses, goals, shootout goals, and penalties. To extend the analysis, regular season team data from the 2021-2022 through 2024-2025 seasons will be scraped from NHL.com to look at trends over multiple seasons.

The data used in this analysis can be accessed through these links:

Table 1: Variable Dictionary For 2022-2023 Season Team Data

Variable Definition
Team team abbreviation (3-letters)
GP total games played
SOG number of games that went into shootouts
W wins
L losses
OTL overtime losses
PTS, P points
Reg_PTS points won per game in regulation or overtime (2 pts per game)
PPG points won per game in regulation time or OT (# non-shootout pts / # of non-SO games)
SO_PTS points won from SO (1 extra point per SO)
RW number of regulation wins
ROW number of regulation wins & OT wins
SOW number of shootout wins
SOL number of shootout losses
GF total goals scored
GA total goals against
DIFF goal differential (GF-GA)

Table 2: Variable Dictionary For 2021-2025 Season Team Data

Variable Definition
Team name of NHL team
Season NHL season
GP games played
W wins
L losses
T ties
P points (2 pts per win; 1 for OT/SO loss)
P% Points / Max Possible Points
RW regulation win
ROW regulation + overtime wins (does not include shoot outs)
S/O Win shootout wins
GF goals for (total goals scored)
GA goals against (total goals allowed)
GF/GP average goals for per game
GA/GP average goals against per game
PP% power play percentage (power play goals / opportunities * 100)
PK% penalty kill percentage (penalty killed / penalties faced *100)
Net PP% power play efficiency adjusted for shorthanded goals against
Net PK% penalty kill efficiency adjusted for shorthanded goals for
Shots/GP average shots for per game
SA/GP average shots against per game
FOW% faceoff win percentage

Table 3: Points Summary Statistics

Data        MIN    Q1     MED    Q3      MAX    MEAN   ST. DEV.
2022-2023   58.0   80.0   92.5   107.25  135.0  91.44  18.8986
2021-2025   47.0   49.75  92.0   105.25  135.0  90.84  17.8168

Table 4: Wins Summary Statistics

Data        MIN    Q1     MED    Q3      MAX    MEAN   ST. DEV.
2022-2023   22.0   35.0   42.0   47.0    65.0   41.0   9.8995
2021-2025   19.0   35.0   42.0   48.25   65.0   41.0   9.3548

See code below for more summary statistics

# Packages Used
library(tidyverse)
library(skimr)

# 2022-2023
summary(NHL22_23)
sd(NHL22_23$PTS)
sd(NHL22_23$W)

# 2021-2025
summary(NHL21_25)
sd(NHL21_25$P)
sd(NHL21_25$W)
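
The seven statistics reported in Tables 3 and 4 can also be collected in a single call. The sketch below is one possible way to do that in base R; the helper name `summary_row` and the toy values are assumptions made for illustration and are not part of the original analysis.

```r
# Hypothetical helper: returns the seven statistics shown in Tables 3 and 4
# for a numeric vector. quantile() uses the same default method as summary().
summary_row <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  c(MIN = q[1], Q1 = q[2], MED = q[3], Q3 = q[4], MAX = q[5],
    MEAN = mean(x), SD = sd(x))
}

# Made-up values for illustration only (not NHL data)
toy_points <- c(60, 80, 90, 100, 120)
round(summary_row(toy_points), 2)
```

With the report's data loaded, the equivalent calls would be `summary_row(NHL22_23$PTS)` and `summary_row(NHL21_25$P)`.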

Descriptive Analysis

2022-2023 Season Analysis

Before beginning the analysis of the 2022-2023 NHL regular season, “conference” and “division” variables need to be created to allow for analysis at those levels.

# Load Packages: 
library(tidyverse) # for cleaning, transforming, visualizing, and analyzing data 
library(dplyr) # for data manipulation 
library(ggimage) # for logos to be used as points in visuals 
library(patchwork) #package used to put visual side-by-side
library(broom) #for r^2 values 

#Load Data: 
NHL22_23 <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/meyerratkeni_xavier_edu/Ecko0LcMGjJCsN1RZOK4RsoBmlTCgqaaHw_Yy1ehLWqZxQ?download=1")
NHL22_23<- 
  NHL22_23 %>% 
  mutate(SOW_Pct = SOW / (SOW + SOL),
         OTW = ROW - RW,
         TSOG =  SOW + SOL,
         winning_pct = W/GP) %>% 
  # The original lookups spelled Columbus two ways ("CB" and "CBJ"); both are
  # listed in both lookups so the team is matched whichever one the data uses
  mutate(Conference = ifelse(Team %in% c("BOS","CAR","NJ","TOR","NYR","TB","NYI","FLA",
                                  "PIT","BUF","OTT","DET","WSH","PHI","MTL","CB","CBJ"),
                      "Eastern", "Western")) %>% 
  mutate(Division = case_when(
    Team %in% c("BOS", "BUF", "DET", "FLA", "MTL", "OTT", "TB", "TOR") ~ "Atlantic",
    Team %in% c("CAR", "CB", "CBJ", "NJ", "NYI", "NYR", "PHI", "PIT", "WSH") ~ "Metropolitan",
    Team %in% c("CHI", "COL", "DAL", "MIN", "NSH", "STL", "WPG", "ARI") ~ "Central",
    Team %in% c("ANA", "CGY", "EDM", "LA", "SEA", "SJ", "VAN", "VGK") ~ "Pacific"
  ))

The analysis starts at a basic level to give an overview of the season. The relationship between the number of wins and the number of goals scored for and against is investigated to see whether each is positive or negative.

pts_for<- ggplot(NHL22_23, aes(x = W, y = GF)) +
  geom_point()+
  geom_image(aes(image = logo), size = 0.05) +
  theme_minimal() +
  labs(x = "Wins", 
       y = "Goals For")

pts_against<- ggplot(NHL22_23, aes(x = W, y = GA)) +
  geom_point()+
  geom_image(aes(image = logo), size = 0.05) +
  theme_minimal() +
  labs( x = "Wins", 
       y = "Goals Against")

(pts_for | pts_against) + plot_layout(guides = "collect")+
  plot_annotation(
    title = "Relationship Between Goals Scored For/Against & Wins",
    subtitle= "2022-2023 Regular Season",
        theme = theme(
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 12, hjust = 0.5)))

As seen in the visual above, there is a positive relationship between goals scored for and number of season wins, whereas there is a negative relationship between goals scored against and number of season wins. These relationships make sense because the primary objective of the game is to score more goals than your opponent within the three twenty-minute periods.

To expand upon this finding, shootout points are going to be analyzed to determine if there is a relationship between shootout points and winning percentage. Do teams with a greater number of shootout points have higher winning percentages? Are shootout points a significant factor in increasing winning percentage? Is the relationship strong or weak, positive or negative?

divisional_SO<- NHL22_23 %>% 
  ggplot(aes(x = SO_PTS, y = winning_pct, color = Division)) +
  geom_point(alpha = 0.8) +
  geom_smooth(method = lm, color = "black", se = FALSE, linetype = "dashed") +
  geom_text(aes(label = Team), hjust = 1.1, vjust = 0.5, size = 3, 
            check_overlap = TRUE) +
  labs(
    x = "SO Points Scored",
    y = "Win Percentage",
    color = "Division"
  ) +
  theme_minimal()+
  theme(legend.position = "bottom") 

conference_SO<- NHL22_23 %>% 
  ggplot(aes(x = SO_PTS, y = winning_pct, color = Conference)) +
  geom_point(alpha = 0.8) +
  geom_smooth(method = lm, color = "black", se = FALSE, linetype = "dashed") +
  geom_text(aes(label = Team), hjust = 1.1, vjust = 0.5, size = 3, 
            check_overlap = TRUE) +
  labs(
    x = "SO Points Scored",
    y = "Win Percentage",
    color = "Conference"
  ) +
  theme_minimal()+
  theme(legend.position = "bottom")

# place visuals side-by-side 
(conference_SO | divisional_SO) +
  plot_annotation(
    title = "Relationship Between Shootout Points & Win Percentage",
    subtitle = "2022–2023 Regular Season",
    theme = theme(
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 12, hjust = 0.5)
    )
  )

Looking at the visual above, there is a relatively weak, positive relationship between the number of shootout points earned and winning percentage, which suggests that teams do not rely on shootout points to increase their winning percentage. Teams are evenly dispersed above and below the regression line, with a larger share of teams having three or fewer shootout points. There are no divisional clusters, so we cannot determine whether shootout points have a greater influence on winning percentage in certain divisions than in others. Because more of the teams above the regression line have three or fewer shootout points than have more than three, we can infer that regulation points have a greater impact on winning percentage than shootout points.

At the conference level, there is a cluster of Eastern Conference teams above and below the regression line between one and three shootout points, whereas there is a cluster of Western Conference teams between five and seven shootout points. The Eastern Conference teams are more similar to one another in number of shootout points than the Western Conference teams, which are more spread out. This could be because Eastern Conference games are decided in regulation or overtime more often than Western Conference games, resulting in more shootout opportunities in the Western Conference.
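
One way to put a number on how weak or strong such a relationship is would be the Pearson correlation coefficient. The sketch below is only an illustration: the data frame `toy` holds made-up values, not the 2022-2023 data, and only the column names `SO_PTS` and `winning_pct` are taken from the code above.

```r
# Made-up values for illustration only (not taken from the 2022-2023 data)
toy <- data.frame(
  SO_PTS      = c(1, 2, 2, 3, 4, 5, 6, 7),
  winning_pct = c(0.45, 0.55, 0.40, 0.60, 0.50, 0.48, 0.62, 0.52)
)

# Pearson correlation: the sign gives the direction, the magnitude the strength
r <- cor(toy$SO_PTS, toy$winning_pct)

# cor.test() adds a p-value and confidence interval for the same coefficient
ct <- cor.test(toy$SO_PTS, toy$winning_pct)

r
ct$p.value
```

For a regression with a single predictor, the square of this coefficient equals the R² of the fitted line, so the same quantity describes the dashed regression lines in the plots.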

Next, the question “does offensive strength come at the cost of defensive weakness?” will be investigated. To do this, “goals scored for” will be plotted against “goals scored against” and colored by division.

division_GF_GA<- NHL22_23 %>% 
  ggplot(aes(x = GA, y = GF, color = Division, size = winning_pct)) +
  geom_point(alpha = 0.8) +
  geom_smooth(aes(color=Division),method = lm, se = FALSE, linetype = "dashed") + # regression line at divisional level 
  geom_text(aes(label = Team), check_overlap = TRUE, vjust = 1.2, size = 3) +
  labs( 
    x = "Goals Against",
    y = "Goals Scored",
    color = "Division",
  ) +
  guides(size = "none") +  #removes size legend
  theme_minimal() +
  theme(legend.position = "bottom")

conference_GF_GA<- NHL22_23 %>% 
  ggplot(aes(x = GA, y = GF, color = Conference, size = winning_pct)) +
  geom_point(alpha = 0.8) +
  geom_smooth(aes(color=Conference),method = lm, se = FALSE, linetype = "dashed") + # regression line at conference level 
  geom_text(aes(label = Team), check_overlap = TRUE, vjust = 1.2, size = 3) +
  labs(
    x = "Goals Against",
    y = "Goals Scored",
    color = "Conference",
  ) +
  guides(size = "none") +  #removes size legend
  theme_minimal() +
  theme(legend.position = "bottom")

# Place visuals side-by-side 
(conference_GF_GA | division_GF_GA) + 
  plot_layout(guides = "collect") +
  plot_annotation(
    title = "Goal Scoring vs. Goal Prevention",
    subtitle = "2022–2023 NHL Regular Season | Left: Conference | Right: Division",
    theme = theme(
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 12, hjust = 0.5),
      legend.position = "bottom"
    )
  )

Overall, there is a moderate, negative relationship between “goals scored for” and “goals scored against”. This means that teams who tend to score more goals get scored on less often, and vice versa. This negative relationship could be caused by teams that score more often dominating possession of the puck, keeping it in the offensive zone for longer periods of time and limiting the scoring chances of the opposing team.

At the conference level, the Western Conference has a stronger negative relationship than the Eastern Conference. The Edmonton Oilers are an outlier in the Western Conference with the highest number of goals scored, pulling the line upwards at one end, while a cluster of teams with many goals scored against and few goals scored for pulls it downwards at the other, creating a steeper slope. In the Eastern Conference, the teams are fairly evenly dispersed throughout the scatterplot with one outlier, the Boston Bruins, who have a high number of goals scored and a very small number of goals scored against, pulling the regression line upwards.

At the divisional level, the Pacific Division has the strongest negative relationship, followed by the Metropolitan, Atlantic, and Central Divisions. The Pacific Division has two strong outliers, the Edmonton Oilers and the Anaheim Ducks, which pull the regression line in opposite directions, resulting in a steep slope.

To build upon the findings on offensive and defensive strengths and weaknesses from goals for and goals against, a visual will be built to try to answer the following question:

  • Is offensive ability (‘GF’) or defensive ability (‘GA’) a better predictor of overall performance of a team (as measured by ‘PTS’)?

Offensive ability will be measured through goals for, and defensive ability will be measured through goals against. The overall performance of a team will be measured by the number of standings points it earns, which is different from the number of goals it scores. The NHL uses a point system to determine team standings in the regular season, and it plays a major role in playoff qualification. Points are awarded as follows:

  • 2 points per win in regulation time, overtime, or shootouts

  • 1 point per loss in overtime or shootouts

  • 0 points for a regulation time loss
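
As a quick check of the rules above, the following minimal sketch computes standings points from a team's record. The function name `standings_points` is hypothetical; the record used is the 2022-2023 Boston Bruins (65 wins, 12 regulation losses, 5 overtime or shootout losses), whose totals match the maximum wins and points in Tables 3 and 4.

```r
# Hypothetical helper: standings points under the rules listed above
standings_points <- function(wins, ot_so_losses) {
  2 * wins + 1 * ot_so_losses   # regulation losses contribute 0 points
}

# 2022-2023 Boston Bruins: 65 wins and 5 overtime/shootout losses
standings_points(wins = 65, ot_so_losses = 5)  # 135
```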

# Prepare R² values
r2_GF <- summary(lm(PTS ~ GF, data = NHL22_23))$r.squared
r2_GA <- summary(lm(PTS ~ GA, data = NHL22_23))$r.squared

# Plot 1: Offensive Ability
gf_plot <- NHL22_23 %>% 
  ggplot(aes(x = GF, y = PTS)) +
  geom_point() +
  geom_image(aes(image = logo), size = 0.05) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, color = "red",
              linetype= "dashed") +
  labs(
    subtitle = paste("Offensive Ability | R² =", round(r2_GF, 2)),
    x = "Goals For (GF)",
    y = "Overall Points"
  ) +
  theme_minimal()


# Plot 2: Defensive Ability
ga_plot <- NHL22_23 %>% 
  ggplot(aes(x = GA, y = PTS)) +
  geom_point() +  
  geom_image(aes(image = logo), size = 0.05) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, color = "red", 
              linetype = "dashed") +
  labs(
    subtitle = paste("Defensive Ability | R² =", round(r2_GA, 2)),
    x = "Goals Against (GA)",
    y = "Overall Points"
  ) +
  theme_minimal()

# Combine and annotate
(gf_plot | ga_plot) + 
  plot_layout(guides = "collect") +
  theme(legend.position = "bottom")+
  plot_annotation(
    title = "Is Offensive or Defensive Ability a Better Indicator of Team Success?",
    subtitle= "2022-2023 Regular Season | Analysis of Goals For and Goals Against",
    theme = theme(
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 12, hjust = 0.5) 
    ))

The red regression lines show that there is a positive relationship between scoring more goals and earning more points, and a negative relationship between allowing more goals and earning points. These visuals differ from the one directly above in that each reports an R² value, which is computed from the residuals of the fitted line. The R² value shows how much of the variability in the dependent variable can be explained by the independent variable. As seen in the visual above, 64% of the variation in team performance across the league can be explained by offensive ability, and 82% can be explained by defensive performance. At the league level, this suggests that defensive performance is a better predictor of team performance. Broken down by conference and division, the R² values are the following:

Table 5: Conference Level R² Values

Conference   R²: Offensive Ability   R²: Defensive Ability
Eastern      0.5752                  0.8015
Western      0.6575                  0.8185

Table 6: Divisional Level R² Values

Division       R²: Offensive Ability   R²: Defensive Ability
Atlantic       0.6177                  0.8961
Central        0.6494                  0.3881
Metropolitan   0.8004                  0.8734
Pacific        0.7209                  0.9039

Looking at Tables 5 and 6, we can see that at the conference level, defensive ability is a better indicator of team performance. At the divisional level, defensive ability is also a better indicator of team performance, except in the Central Division, where offensive ability is a better indicator. This suggests a trend across the league in which defensive ability and defensive players are a key element of success, while Central Division teams do not follow the trend and tend to rely more on offensive ability and offensive players.
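
The code that produced the conference and divisional values is not shown in this report. The sketch below is one possible way to obtain R² per group in base R, assuming the same model form as the league-level fits (`PTS ~ GF` and `PTS ~ GA`). The helper `r_squared` and the data frame `toy` are hypothetical, and the numbers are made up for illustration.

```r
# R-squared computed from its definition: 1 - SS_residual / SS_total
r_squared <- function(y, x) {
  fit <- lm(y ~ x)
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

# Made-up values for illustration only (not NHL data)
toy <- data.frame(
  Conference = rep(c("Eastern", "Western"), each = 5),
  GF  = c(210, 230, 250, 270, 300, 200, 225, 240, 280, 320),
  GA  = c(290, 260, 250, 220, 180, 300, 270, 255, 230, 210),
  PTS = c(70, 85, 92, 104, 130, 62, 80, 95, 105, 110)
)

# One R-squared per conference for each predictor
r2_by_conf <- sapply(split(toy, toy$Conference), function(d) {
  c(Offensive = r_squared(d$PTS, d$GF),
    Defensive = r_squared(d$PTS, d$GA))
})
round(r2_by_conf, 3)
```

Splitting on `Division` instead of `Conference` would give the divisional version. With only eight teams per division, a single outlier can move these values noticeably, so small differences between groups should be read with caution.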

Now that we have found that defensive ability is a better indicator of team success throughout the league, with the exception of the Central Division, we are going to build upon this and look at shootout ability. Shootouts are a critical part of the game of hockey but do not happen very often. Excelling in shootouts shows that a team handles stressful, high-stakes, high-pressure situations well, which in turn may indicate which teams will respond well to the pressure of the Stanley Cup Playoffs. The following question will be examined:

  • Is there a relationship between the ability to win shootouts (‘Pct_SO’) and the ability to win games before shootouts (‘PPG’)?

# Calculate R² value
r2_SO <- summary(lm(Pct_SO ~ PPG, data = NHL22_23))$r.squared

# Enhanced visual
NHL22_23 %>% 
  ggplot(aes(x = PPG, y = Pct_SO)) +
  geom_point() +
  geom_image(aes(image = logo), size = 0.05) +
  geom_smooth(method = lm, se = FALSE, linetype = "dashed", color = "red") +
  labs(
    title = "Do Teams That Earn More Points Per Game Excel in Shootouts?",
    subtitle = paste("2022–2023 Season | R² =", round(r2_SO, 2)),
    x = "Points Per Game (PPG)",
    y = "Shootout Win Percentage (Pct_SO)"
  ) +
  theme_minimal() 

The scatterplot above shows that there is essentially no relationship between a team’s ability to win shootout games and its ability to win regulation games. Only 0.1764% of the variability in a team’s ability to win shootout games can be explained by its ability to win regulation games. At the conference and divisional levels, slightly more of the variability can be explained in most groups, but not at a meaningful level. The highest share of the variability in shootout ability that can be explained by regulation win ability is 25.5%, in the Central Division. The tables below show the R² values at the conference and divisional levels.

Table 7: Shootout Win Ability R² Values by Conference

Conference   R²: Shootout Ability
Eastern      0.0518
Western      0.000568

Table 8: Shootout Win Ability R² Values by Division

Division       R²: Shootout Ability
Atlantic       0.121
Central        0.255
Metropolitan   0.0227
Pacific        0.00663

The analysis will continue by determining whether more of the variability in shootout ability can be explained when the data are extended from one season to four.

2021-2025 Seasons Analysis

The 2021-2025 team data set contains the same statistics as the data set above, with the addition of power play statistics. Table 2, in the Data section, shows the names of the additional variables. Before analyzing the 2021-2025 seasons, “conference” and “division” variables need to be added, as with the 2022-2023 data set, to allow comparison. The goal of this analysis is to see whether trends at the team, conference, and division levels change when looking at four seasons (2021-2025) rather than a single season (2022-2023).

# load in data set 
NHL21_25 <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/meyerratkeni_xavier_edu/EUXgCEUuBBRBgeIzRYAzW-EBnWjJTz49PL0MDTlUJQZKJA?download=1")

# Create Variable of Average Goals Per game & Winning Percentage 
NHL21_25 <- NHL21_25 %>% 
  group_by(Team) %>% 
  mutate(winning_pct = W / GP) %>%   # Number of Wins / Total # of Games 
  mutate(Conference = case_when(
      Team %in% c(
        "Anaheim Ducks", "Arizona Coyotes", "Calgary Flames", 
        "Chicago Blackhawks","Colorado Avalanche", "Dallas Stars", 
        "Edmonton Oilers", "Los Angeles Kings","Minnesota Wild", 
        "Nashville Predators", "San Jose Sharks", "Seattle Kraken",
        "St. Louis Blues", "Vancouver Canucks", "Vegas Golden Knights",
        "Winnipeg Jets", "Utah Hockey Club"
      ) ~ "Western",
      
      Team %in% c(
        "Boston Bruins", "Buffalo Sabres", "Carolina Hurricanes",
        "Columbus Blue Jackets", "Detroit Red Wings", "Florida Panthers",
        "Montréal Canadiens", "New Jersey Devils",
        "New York Islanders", "New York Rangers", "Ottawa Senators",
        "Philadelphia Flyers","Pittsburgh Penguins", 
        "Tampa Bay Lightning", "Toronto Maple Leafs", 
        "Washington Capitals"
      ) ~ "Eastern",
  )) %>% 
   mutate(Division = case_when(
      Team %in% c(
        "Boston Bruins", "Buffalo Sabres", "Detroit Red Wings", 
        "Florida Panthers","Montréal Canadiens", "Ottawa Senators", 
        "Tampa Bay Lightning", "Toronto Maple Leafs"
      ) ~ "Atlantic",
      
      Team %in% c(
        "Carolina Hurricanes", "Columbus Blue Jackets",
        "New Jersey Devils","New York Islanders", "New York Rangers", 
        "Philadelphia Flyers","Pittsburgh Penguins", 
        "Washington Capitals"
      ) ~ "Metropolitan",
      
      Team %in% c(
        "Arizona Coyotes", "Chicago Blackhawks", "Colorado Avalanche", 
        "Dallas Stars","Minnesota Wild", "Nashville Predators", 
        "St. Louis Blues", "Winnipeg Jets", "Utah Hockey Club"
      ) ~ "Central",
      
      Team %in% c(
        "Anaheim Ducks", "Calgary Flames", "Edmonton Oilers", 
        "Los Angeles Kings","San Jose Sharks", "Seattle Kraken", 
        "Vancouver Canucks", "Vegas Golden Knights"
      ) ~ "Pacific"))

# Identify the five teams with the highest average net power play percentage
top_teams <- NHL21_25 %>%
  group_by(Team) %>%
  summarize(Avg_PP = mean(Net.PP.)) %>%
  arrange(desc(Avg_PP)) %>%
  slice_head(n = 5)

# Filter original data to just top 5 teams
top_team_data <- NHL21_25 %>%
  filter(Team %in% top_teams$Team) %>% 
  mutate(Net.PP. = Net.PP./100,
         Net.PK. = Net.PK./100)

To start the analysis of the 2021-2022 through 2024-2025 regular seasons, a simple line graph will be created to show the average goals scored per game and the average goals allowed per game at the divisional level. This gives a broader picture of which divisions fluctuate and which stay consistent from season to season.

# Make sure 'Season' is an ordered factor
NHL21_25 <- NHL21_25 %>%
  mutate(Season = factor(Season, levels = c("2021-22", "2022-23", "2023-24", "2024-25")))

# Reference index for the vertical line
ref_line <- which(levels(NHL21_25$Season) == "2022-23")

# Plot 1: Goals For per Game, averaged by Division
gf_plot <- ggplot(NHL21_25, aes(x = Season, y = GF.GP, group = Division, color = Division)) +
  stat_summary(fun = mean, geom = "line", linewidth = 1.5) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  stat_summary(fun = mean, geom = "point", size = 3) +
  geom_vline(xintercept = ref_line, linetype = "dashed", color = "black") +
  labs(x = "Season", 
       y = "Average # of Goals For") +
  theme_minimal()+
   theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") # to remove legend

# Plot 2: Goals Against per Game, averaged by Division
ga_plot <- ggplot(NHL21_25, aes(x = Season, y = GA.GP, group = Division, color = Division)) +
  stat_summary(fun = mean, geom = "line", linewidth = 1.5) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  stat_summary(fun = mean, geom = "point", size = 3) +
  geom_vline(xintercept = ref_line, linetype = "dashed", color = "black") +
  labs(x = "Season", 
       y = "Average # of Goals Against") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right")

# Combine with shared legend
(gf_plot | ga_plot) + 
  plot_annotation(
    title = "Average Goals Scored vs Conceded by Division",
    subtitle = "NHL Regular Seasons (2021–2025)",
    theme = theme(
      plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 13, hjust = 0.5, face = "italic")
    )
  )

In goals scored per game, the Metropolitan Division stays relatively consistent, increasing slightly. The Pacific Division spikes in the 2022-2023 season but then decreases dramatically, while the Central Division drops in the 2022-2023 season, increases, and then drops again in the 2024-2025 season. Lastly, the Atlantic Division peaks in the 2022-2023 season and has been decreasing since.

Looking at goals allowed per game, the Atlantic and Central Divisions have been consistently decreasing, whereas the Metropolitan Division has been steadily increasing. The Pacific Division spiked in the 2022-2023 season and then started to decrease. These decreases could be due to teams acquiring better goaltenders or defensemen who allow fewer goals. Across the league in the 2024-2025 season, the average number of goals scored decreased, which may be related to opposing teams allowing fewer goals.
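
The values plotted in the line graphs can also be tabulated, which makes it easier to check statements about increases and decreases. The sketch below shows one way to compute the mean per season and division in base R. The data frame `toy` contains made-up values for illustration, and only the column names `Season`, `Division`, and `GF.GP` are taken from the code above.

```r
# Made-up values for illustration only (not NHL data)
toy <- data.frame(
  Season   = rep(c("2021-22", "2022-23"), each = 4),
  Division = rep(c("Atlantic", "Atlantic", "Pacific", "Pacific"), times = 2),
  GF.GP    = c(3.0, 3.4, 2.8, 3.2, 3.1, 3.5, 3.0, 3.6)
)

# Mean goals for per game within each Season x Division cell, which is the
# value stat_summary(fun = mean) draws at each point of the line graph
division_means <- aggregate(GF.GP ~ Season + Division, data = toy, FUN = mean)
division_means
```

Replacing `GF.GP` with `GA.GP` would give the corresponding table for goals allowed per game.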

Since the second data set contains more seasons and additional power play variables, the relationship between power play efficiency and penalty kill efficiency is going to be examined. Power play efficiency shows how well a team performs when it is up a man, and penalty kill efficiency shows how well a team prevents the opposing team from scoring when it is down a man. A visual will be built in an effort to answer the following question:

  • Are there teams that are strong in both power plays and penalty kills (balanced), or only in one?

# Reference index for the vertical line
ref_line <- which(levels(NHL21_25$Season) == "2022-23")

# Plot 1: Power Play Efficiency
pp_plot <- ggplot(top_team_data, aes(x = Season, y = Net.PP. , 
                                   group = Team, color = Team)) +
  geom_line(linewidth = 1.2) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(size = 3) +
  geom_vline(xintercept = ref_line, linetype = "dashed", color = "black") +
  labs(title = "Power Play Efficiency (%)", 
       x = "Season", 
       y = "PP%") +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") # to remove legend 

# Plot 2: Penalty Kill Efficiency
pk_plot <- ggplot(top_team_data, aes(x = Season, y = Net.PK.,
                                   group = Team, color = Team)) +
  geom_line(linewidth = 1.2) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(size = 3) +
  geom_vline(xintercept = ref_line, linetype = "dashed", color = "black") +
  labs(title = "Penalty Kill Efficiency (%)", 
       x = "Season", 
       y = "PK%") +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right") # to place legend on right of visual 

# Put Plot 1 & Plot 2 side-by-side for easy comparison & a single legend 
(pp_plot | pk_plot) + 
  plot_annotation(
    title = "Top Team Special Teams Efficiency",
    subtitle = "2021–2025 NHL Regular Seasons",
    theme = theme(
      plot.title = element_text(size = 17, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 13, hjust = 0.5)
    )
  )

The visual above shows the five teams with the highest average net power play percentage over the four seasons, starting in 2021 and ending in 2025. In power play efficiency, the Colorado Avalanche have stayed the most consistent at around 22.5%, whereas the other teams show a steady decrease or variation between seasons. In penalty kill efficiency, there is a lot more variation among all five teams. Looking at the y-axes, the range of power play efficiency is 20-30% and the range of penalty kill efficiency is 80-88%. These five teams succeed far more often on the penalty kill than they score on the power play. This could be due to having stronger defensive lines than offensive lines or to how goalies perform under power play pressure, although the two scales are not directly comparable, since a successful penalty kill only requires that the opposing power play fail to score.

The last analysis looks at whether offensive or defensive efficiency is a better indicator of team success and whether shootout ability can be explained by regulation game success. Both questions were examined for the 2022-2023 season, and further analysis will show whether the answers change when four seasons of data are used.

First, we are going to answer the question:

  • “Does the better indicator of team success, offensive or defensive efficiency, differ between the 2022-2023 season and the 2021-2025 seasons?”

# Prepare R² values
r2_GF_2125 <- summary(lm(P ~ GF, data = NHL21_25))$r.squared
r2_GA_2125 <- summary(lm(P ~ GA, data = NHL21_25))$r.squared

# Plot 1: Offensive Ability
gf_plot_2125 <- NHL21_25 %>% 
  ggplot(aes(x = GF, y = P)) +
  geom_point(alpha = 1) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, color = "red",
              linetype= "dashed") +
  labs(
    subtitle = paste("Offensive Ability | R² =", round(r2_GF_2125, 2)),
    x = "Goals For (GF)",
    y = "Overall Points"
  ) +
  theme_minimal()


# Plot 2: Defensive Ability
ga_plot_2125 <- NHL21_25 %>% 
  ggplot(aes(x = GA, y = P)) +
  geom_point(alpha = 1) +  
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, color = "red", 
              linetype = "dashed") +
  labs(
    subtitle = paste("Defensive Ability | R² =", round(r2_GA_2125, 2)),
    x = "Goals Against (GA)",
    y = "Overall Points"
  ) +
  theme_minimal()

# Combine and annotate
(gf_plot_2125 | ga_plot_2125) + 
   plot_annotation(
    title = "NHL Team Performance (2021–2025)",
    subtitle = "Relationship Between Goal Metrics and Overall Points",
    theme = theme(
      plot.title = element_text(size = 16, face = "bold"),
      plot.subtitle = element_text(size = 12),
      plot.margin = margin(10, 10, 10, 10)
    )
  )

Defensive ability continues to explain more of the variability in team success than offensive ability. 74.36% of the variability in team success from 2021-2025 can be explained by defensive ability, whereas 65.18% can be explained by offensive ability. Moving from one season to four, the R² of offensive ability increases and the R² of defensive ability decreases, but defensive ability remains the better indicator.

Looking at the R² values at the conference and division levels over the four seasons, defensive ability was the better indicator across all conferences and divisions, including the Central Division. The R² values are the following:

Table 9: Conference Level R² Values

Conference   R²: Offensive Ability   R²: Defensive Ability
Eastern      0.593                   0.742
Western      0.700                   0.760

Table 10: Divisional Level R² Values

Division       R²: Offensive Ability   R²: Defensive Ability
Atlantic       0.622                   0.711
Central        0.726                   0.764
Metropolitan   0.583                   0.814
Pacific        0.673                   0.758

Second, the relationship between shootout success and regulation game success is examined over the four-season period and compared with the 2022-2023 season.

NHL21_25<- NHL21_25 %>% 
  mutate(regular_W = W - S.O.Win)

# Calculate R² value
r2_SO_2125 <- summary(lm(S.O.Win ~ regular_W, data = NHL21_25))$r.squared

# Enhanced visual
# Plot the same data set and variables that the R² model above was fitted on
NHL21_25 %>% 
  ggplot(aes(x = regular_W, y = S.O.Win)) +
  geom_point(alpha =0.5, size = 3) +
  geom_smooth(method = lm, se = FALSE, linetype = "dashed", color = "red") +
  labs(
    title = "Do Teams That Win More Before Shootouts Win More Shootouts?",
    subtitle = paste("2021–2025 Seasons | R² =", round(r2_SO_2125, 2)),
    x = "Regulation and Overtime Wins",
    y = "Shootout Wins"
  ) +
  theme_minimal() +
    theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 11),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    legend.position = "none",
    panel.grid.minor = element_blank()
  )

Similar to the 2022-2023 season analysis, there is no relationship between regulation game performance and shootout ability. Only 0.421% of the variability in shootout ability can be explained by regulation game performance. Extending the analysis from a single season to four seasons did not reveal a different relationship. There are also no meaningful relationships at the conference or divisional level. The R² values are as follows:

Table 11: R² Values: Conference

Conference   R²: Shootout Ability
Eastern      0.0142
Western      0.000367

Table 12: R² Values: Division

Division       R²: Shootout Ability
Atlantic       0.0445
Central        0.00576
Metropolitan   0.0046
Pacific        0.0217

Discussion & Conclusion

An analysis of the 2021-2025 NHL regular seasons was conducted at the team level. Different relationships between performance metrics and overall team success were explored. At the most basic level, the relationship between the number of goals scored in a season and the number of wins in a season is positive, whereas the relationship between the number of goals scored against and the number of wins is negative. For shootouts, there is a weak positive relationship between the number of shootout points and winning percentage. At the conference level, trends differ between the Eastern and Western Conferences. Lastly, performance was analyzed using offensive ability, defensive ability, and shootout ability. In both the 2022-2023 and 2021-2025 analyses, defensive ability was a stronger indicator of overall team success at the league, conference, and divisional levels, with the exception of the Central Division in the 2022-2023 analysis. In addition, it was found that regulation wins are not an indicator of shootout ability.

This analysis can be used by the NHL and individual teams to identify trends throughout the league, conferences, and divisions to help inform decisions. For example, defensive performance is a strong indicator of the overall success of a team. Investing in strong, skilled, dominant defensemen and in coaching may help increase the success of the team overall. When success increases, the number of points also increases, allowing for better standing and better chances of qualifying for the Stanley Cup Playoffs. Further analyses that could extend the findings of this paper include using historical data to see trends over a longer span of seasons, analyzing a single conference or division, and looking at a specific team and its players’ statistics to find strengths and weaknesses.