Oilers forward Zach Hyman had the most shots on goal (179) from the high-danger area, and Edmonton had the most shots on goal (932) and goals (178) from there as a team. Tampa Bay Lightning center Brayden Point scored the most goals (36).
Anaheim Ducks goalie John Gibson faced the most shots on goal (641) and made the most saves (527) from the high-danger area, while Toronto Maple Leafs goalie Ilya Samsonov had the highest save percentage (.875).
The 10 skaters who spent the most time in the offensive zone and the least time in the defensive zone played for the Carolina Hurricanes. No surprise. The Hurricanes led the NHL in offensive zone time at even strength (47.0 percent) and spent the least time in the defensive zone (34.1 percent). That makes sense, considering they led the NHL in shot attempts percentage at 5-on-5 (60.4 percent).
Hockey is a low-scoring sport compared with sports such as basketball. We want to understand the relationship between the location of the shooter and the probability of scoring. We analyzed data on roughly 122 thousand shots taken by NHL (National Hockey League) players during the 2022-2023 season. Below is a density heat map of the locations from which the shots were taken.
library(ggplot2)
# Filled 2D density contours of all shot locations
ggplot(d, aes(x = arenaAdjustedXCord, y = arenaAdjustedYCord)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon")
Then we look at the density of the shots that led to a goal.
library(dplyr)
# Same density plot, restricted to shots that scored
ggplot(d %>% filter(goal == 1), aes(x = arenaAdjustedXCord, y = arenaAdjustedYCord)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon")
The two plots look slightly different, but there are similarities. Players know
which locations they are likely to score from and try not to shoot the
puck from low-probability spots.
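One way to put numbers on this visual impression is to bin shots by distance and compute the share that score in each bin. The sketch below uses simulated data, not the NHL shots: the column names mirror the ones used here, but the decay rate, the sample size, and the 15-foot bin width are assumptions chosen only for illustration.

```r
# Simulated stand-in for the shot data; the values are made up for illustration.
set.seed(1)
n <- 5000
sim <- data.frame(arenaAdjustedShotDistance = runif(n, 0, 90))
# Assumed relationship: scoring probability decays with distance.
sim$goal <- rbinom(n, 1, plogis(-1 - 0.05 * sim$arenaAdjustedShotDistance))

# Bin distance into 15-foot intervals and compute the share of shots that score.
sim$distBin <- cut(sim$arenaAdjustedShotDistance,
                   breaks = seq(0, 90, by = 15), include.lowest = TRUE)
goal_rate <- tapply(sim$goal, sim$distBin, mean)
print(round(goal_rate, 3))
```

Running the same two lines (cut and tapply) on the real data frame d would give the empirical goal rate by distance.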
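Further, we model the outcome $y \in \{\text{goal}, \text{no goal}\}$ using two predictors: $x_1$, the absolute value of the shot angle in degrees, and $x_2$, the distance of the shot from the net in feet. The call below does not set family = binomial, so glm falls back to its default gaussian family and fits a linear probability model rather than a logistic regression; the summary that follows is for that fit.
# No family argument, so glm uses the gaussian default (linear probability model)
m = glm(goal ~ shotAngleAdjusted + arenaAdjustedShotDistance, data = d)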
summary(m)
##
## Call:
## glm(formula = goal ~ shotAngleAdjusted + arenaAdjustedShotDistance,
## data = d)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.791e-01 1.978e-03 90.56 <2e-16 ***
## shotAngleAdjusted -9.378e-04 3.501e-05 -26.78 <2e-16 ***
## arenaAdjustedShotDistance -2.300e-03 3.825e-05 -60.13 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.06474427)
##
## Null deviance: 8151.7 on 122025 degrees of freedom
## Residual deviance: 7900.3 on 122023 degrees of freedom
## AIC: 12277
##
## Number of Fisher Scoring iterations: 2
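Since the summary above reports a gaussian family, a logistic fit would need family = binomial in the glm call. The sketch below shows what that looks like on simulated data; the data frame sim, the true coefficients used to generate it, and the example shot are all assumptions for illustration, so the estimates are not comparable to the NHL results.

```r
# Simulated shots; column names mirror the real data, values are made up.
set.seed(42)
n <- 10000
sim <- data.frame(
  shotAngleAdjusted = runif(n, 0, 80),
  arenaAdjustedShotDistance = runif(n, 0, 90)
)
# Assumed true log-odds of scoring, decreasing in angle and distance.
eta <- -0.5 - 0.015 * sim$shotAngleAdjusted - 0.045 * sim$arenaAdjustedShotDistance
sim$goal <- rbinom(n, 1, plogis(eta))

# family = binomial gives a logistic regression (logit link by default).
m_logit <- glm(goal ~ shotAngleAdjusted + arenaAdjustedShotDistance,
               data = sim, family = binomial)

# Predicted scoring probability for a hypothetical shot: 15 feet out, 10 degrees.
new_shot <- data.frame(shotAngleAdjusted = 10, arenaAdjustedShotDistance = 15)
p_hat <- predict(m_logit, newdata = new_shot, type = "response")
print(coef(m_logit))
print(p_hat)
```

With type = "response", predictions are probabilities and always fall between 0 and 1, which a gaussian fit does not guarantee.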
The probability of a goal is low to begin with
sum(d$goal==1)/nrow(d)
## [1] 0.07198466
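and it gets lower as distance and angle increase: by about 0.0023 per additional foot and 0.0009 per additional degree in the fit above. No surprise.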
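As a worked example, plugging the estimates from the summary above into the fitted equation for a shot taken from 20 feet at a 30-degree angle gives

$$\hat{p} = 0.1791 - 0.0009378 \times 30 - 0.0023 \times 20 \approx 0.105,$$

while a straight-on shot ($x_1 = 0$) from 80 feet gives

$$\hat{p} = 0.1791 - 0.0023 \times 80 \approx -0.005.$$

The second value is negative, which is not a valid probability. This is a known limitation of a gaussian (linear) fit to a binary outcome.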
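Next, we add more predictors and fit a lasso-penalized logistic regression with glmnet, choosing the penalty $\lambda$ by cross-validation.
library(glmnet)
dm = subset(d, select = c("arenaAdjustedShotDistance", "shotAngleAdjusted", "shotType",
                          "defendingTeamDefencemenOnIce", "shootingTeamDefencemenOnIce",
                          "shootingTeamForwardsOnIce", "shooterLeftRight", "goal"))
# Raise an error, rather than silently dropping rows, if any value is missing
options(na.action = "na.fail")
# Expand factors into dummy columns; -1 drops the intercept column
mm = model.matrix(goal ~ . - 1, dm)
# Cross-validated lasso logistic regression (glmnet default alpha = 1)
cvfit <- cv.glmnet(mm, dm$goal, family = "binomial")
plot(cvfit)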
cvfit$lambda.min
## [1] 0.0002959287
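For reference, with its default alpha = 1 and family = "binomial", glmnet minimizes the penalized negative log-likelihood below, and lambda.min is the value of $\lambda$ with the smallest cross-validated error. Here $x_i$ denotes the full predictor vector for shot $i$ (a row of mm) and $N$ is the number of shots; this notation follows the glmnet documentation and is not used elsewhere in this section.

$$\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i\left(\beta_0 + x_i^\top \beta\right) - \log\left(1 + e^{\beta_0 + x_i^\top \beta}\right)\right] + \lambda \lVert \beta \rVert_1$$

A value of $\lambda$ as small as the one above corresponds to a light penalty, so only a few coefficients should be expected to shrink exactly to zero.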
a = coef(cvfit, s = "lambda.min")
i = which(a!=0)
# cbind() returns a character matrix, so the coefficients print at full precision
knitr::kable(cbind(rownames(a)[i], a[i]))
| Term | Coefficient |
|---|---|
| (Intercept) | 0.10279606562845 |
| arenaAdjustedShotDistance | -0.0482104607941848 |
| shotAngleAdjusted | -0.0132156803406798 |
| shotType | 0.0836410409022707 |
| shotTypeBACK | -0.428225836099433 |
| shotTypeDEFL | -0.42096298445421 |
| shotTypeSLAP | 0.3119458344746 |
| shotTypeSNAP | 0.328781424704178 |
| shotTypeTIP | -0.569351118170474 |
| shotTypeWRAP | -0.708994027982635 |
| defendingTeamDefencemenOnIce | -0.906924281257946 |
| shootingTeamForwardsOnIce | 0.290804349326006 |
| shooterLeftRightL | -0.0442436852632846 |
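These coefficients are on the log-odds scale, so exponentiating them gives the multiplicative change in the odds of a goal. The sketch below does this for a few values copied from the table above; the choice of which coefficients to show and the 10-foot comparison are only illustrative. Because lasso estimates are shrunk toward zero, the resulting odds ratios are best read as approximate.

```r
# Selected coefficients copied from the table above (log-odds scale).
coefs <- c(
  arenaAdjustedShotDistance = -0.0482104607941848,
  shotAngleAdjusted         = -0.0132156803406798,
  shotTypeSNAP              =  0.328781424704178,
  shotTypeWRAP              = -0.708994027982635
)

# exp() turns a log-odds coefficient into a multiplicative change in the odds.
odds_ratio <- exp(coefs)
print(round(odds_ratio, 3))

# Change in the odds of scoring for a shot taken 10 feet farther out.
ten_feet <- exp(10 * coefs[["arenaAdjustedShotDistance"]])
print(round(ten_feet, 3))
```

For instance, each additional foot of distance multiplies the odds of scoring by roughly 0.95, holding the other predictors fixed.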
fit <- glmnet(mm, dm$goal, family = "binomial")
plot(fit, xvar = "lambda", label = TRUE)