NHL Shots

Oilers forward Zach Hyman had the most shots on goal (179) from the high-danger area, and Edmonton had the most shots on goal (932) and goals (178) from there as a team. Tampa Bay Lightning center Brayden Point scored the most goals (36).

Anaheim Ducks goalie John Gibson faced the most shots on goal (641) and made the most saves (527) from the high-danger area, while Toronto Maple Leafs goalie Ilya Samsonov had the highest save percentage (.875).

The 10 skaters who spent the most time in the offensive zone and the least time in the defensive zone played for the Carolina Hurricanes. No surprise. The Hurricanes led the NHL in offensive zone time at even strength (47.0 percent) and spent the least time in the defensive zone (34.1 percent). That makes sense, considering they led the NHL in shot attempts percentage at 5-on-5 (60.4 percent).

Hockey is a low-scoring sport compared with sports such as basketball. We want to understand the relationship between the location of the shooter and the probability of scoring. We analyzed data on about 122 thousand shots taken by NHL (National Hockey League) players during the 2022-2023 season. Below is a heat plot of the locations from which the shots were taken.

library(ggplot2)
library(dplyr)  # provides %>% and filter(), used in the next plot

# Show the area only
ggplot(d, aes(x = arenaAdjustedXCord, y = arenaAdjustedYCord)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon")

Then we look at the density of the shots that led to a goal.

ggplot(d %>% filter(goal == 1), aes(x = arenaAdjustedXCord, y = arenaAdjustedYCord)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon")

The two plots look slightly different, but there are similarities. Players know which locations they are likely to score from and try not to shoot the puck from low-probability spots.

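Further, we use logistic regression with $y \in \{\text{goal}, \text{no goal}\}$ and two predictors: $x_1$, the absolute value of the angle of the shot in degrees, and $x_2$, the distance of the shot from the net in feet. The following is the summary of the fitted model. Note that the call to glm omits family = binomial, so R falls back to the default gaussian family (see the dispersion line in the output). The coefficients shown are therefore those of a linear probability model, on the probability scale, rather than log-odds from a logistic fit.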

m = glm(goal~shotAngleAdjusted+arenaAdjustedShotDistance, data=d)
summary(m)
## 
## Call:
## glm(formula = goal ~ shotAngleAdjusted + arenaAdjustedShotDistance, 
##     data = d)
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.791e-01  1.978e-03   90.56   <2e-16 ***
## shotAngleAdjusted         -9.378e-04  3.501e-05  -26.78   <2e-16 ***
## arenaAdjustedShotDistance -2.300e-03  3.825e-05  -60.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.06474427)
## 
##     Null deviance: 8151.7  on 122025  degrees of freedom
## Residual deviance: 7900.3  on 122023  degrees of freedom
## AIC: 12277
## 
## Number of Fisher Scoring iterations: 2
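For comparison, here is a minimal sketch of what a true logistic fit would look like. It runs on simulated data, not on the NHL shots: the column names mirror the ones used above, but the sample, the coefficients used to generate it, and the object names sim and m_logit are assumptions made for illustration only.

```r
# Simulated stand-in for the shot data; values and "true" coefficients are made up
set.seed(1)
n <- 5000
sim <- data.frame(
  shotAngleAdjusted = runif(n, 0, 90),
  arenaAdjustedShotDistance = runif(n, 5, 90)
)
eta <- -1 - 0.01 * sim$shotAngleAdjusted - 0.04 * sim$arenaAdjustedShotDistance
sim$goal <- rbinom(n, size = 1, prob = plogis(eta))

# family = binomial requests the logit link; without it glm() defaults to gaussian
m_logit <- glm(goal ~ shotAngleAdjusted + arenaAdjustedShotDistance,
               data = sim, family = binomial)

# Fitted probabilities from a logistic model lie strictly between 0 and 1
p_hat <- predict(m_logit, type = "response")
range(p_hat)
```

With family = binomial the coefficients are on the log-odds scale, and the summary reports z values rather than t values.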

Findings

The probability of a goal is low to begin with

sum(d$goal==1)/nrow(d)
## [1] 0.07198466

and it gets lower as distance and angle increase. No surprise.
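To make the effect concrete, we can plug a few shots into the fitted equation. Because the summary above reports a gaussian family, the fitted value can be read directly as a probability. The sketch below copies the three coefficients from that output; the helper name p_goal and the example shots are ours.

```r
# Coefficients copied from the summary output above
b0      <- 1.791e-01
b_angle <- -9.378e-04
b_dist  <- -2.300e-03

# Fitted value under the gaussian (linear probability) fit
p_goal <- function(angle, distance) b0 + b_angle * angle + b_dist * distance

p_goal(angle = 0, distance = 10)    # close, straight on
p_goal(angle = 30, distance = 60)   # far, off-angle

# Straight-on distance (in feet) at which the fitted value reaches zero
-b0 / b_dist
```

The first shot has a fitted probability of about 0.156 and the second about 0.013. The last line gives roughly 77.9 feet: beyond that distance the linear fit returns a negative "probability", which is one reason to prefer the logit link.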

Lasso and Ridge

library(glmnet)
dm = subset(d,select=c("arenaAdjustedShotDistance","shotAngleAdjusted","shotType","defendingTeamDefencemenOnIce","shootingTeamDefencemenOnIce","shootingTeamForwardsOnIce","shooterLeftRight","goal"))
options(na.action="na.fail")
mm = model.matrix(goal~.-1,dm)
cvfit <- cv.glmnet(mm, dm$goal, family = "binomial")
plot(cvfit)

cvfit$lambda.min
## [1] 0.0002959287
a = coef(cvfit, s = "lambda.min")
i = which(a!=0)
knitr::kable(cbind(rownames(a)[i], a[i]),digits = 2)

(Intercept)                     0.10279606562845
arenaAdjustedShotDistance      -0.0482104607941848
shotAngleAdjusted              -0.0132156803406798
shotType                        0.0836410409022707
shotTypeBACK                   -0.428225836099433
shotTypeDEFL                   -0.42096298445421
shotTypeSLAP                    0.3119458344746
shotTypeSNAP                    0.328781424704178
shotTypeTIP                    -0.569351118170474
shotTypeWRAP                   -0.708994027982635
defendingTeamDefencemenOnIce   -0.906924281257946
shootingTeamForwardsOnIce       0.290804349326006
shooterLeftRightL              -0.0442436852632846

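Since this model was fit with family = "binomial", the coefficients above are on the log-odds scale. Exponentiating one gives the multiplicative change in the odds of a goal per unit of that predictor, holding the others fixed. A small sketch using two values copied from the table (keep in mind that lasso estimates are shrunk toward zero, so this is a rough reading):

```r
# Two lasso coefficients copied from the table above (log-odds scale)
b <- c(arenaAdjustedShotDistance = -0.0482104607941848,
       shotAngleAdjusted         = -0.0132156803406798)

# exp() turns a log-odds coefficient into a multiplicative change in odds
odds_ratio <- exp(b)
round(odds_ratio, 3)
```

Both ratios are below 1: each extra foot of distance lowers the odds of a goal by roughly 4.7 percent, and each extra degree of angle by roughly 1.3 percent.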
fit <- glmnet(mm, dm$goal, family = "binomial")
plot(fit, xvar = "lambda", label = TRUE)
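The calls above rely on the glmnet default alpha = 1, which is the lasso penalty. A ridge fit uses the same functions with alpha = 0. Here is a minimal sketch of the contrast on simulated data; the matrix x, the outcome y, and the generating coefficients are made up for illustration and are not the NHL data.

```r
library(glmnet)

# Simulated predictors and binary outcome; not the NHL data
set.seed(1)
n <- 2000
x <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("x", 1:5)))
y <- rbinom(n, size = 1, prob = plogis(-1 + 1.0 * x[, 1] - 0.5 * x[, 2]))

# alpha = 1 is the lasso penalty (the glmnet default); alpha = 0 is ridge
cv_lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1)
cv_ridge <- cv.glmnet(x, y, family = "binomial", alpha = 0)

# Lasso can set coefficients exactly to zero; ridge only shrinks them
coefs <- cbind(lasso = as.matrix(coef(cv_lasso, s = "lambda.1se"))[, 1],
               ridge = as.matrix(coef(cv_ridge, s = "lambda.1se"))[, 1])
coefs
```

In the ridge column every predictor keeps a nonzero coefficient, while the lasso column may contain exact zeros for the predictors that carry no signal (x3 to x5 in this simulation).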