Number of Goals Scored in NHL Games

About the Data:

A historical NHL data set with one row per team per game. The data set has 15 features and 14,882 observations. Features include the head coach's name, whether the team played at home or away, the team ID, the number of goals, hits, and shots, giveaways, takeaways, and whether the team won.

Importing the Hockey Data

library(readr)   # provides read_csv()
hockeyData=read_csv("/Users/gregmaghakian/Documents/Soc 712/Week 6 Zelig/Homework/gameStats.csv")
#A Glimpse of the data
head(hockeyData)
## # A tibble: 6 x 15
##     game_id team_id HoA   won   settled_in head_coach    goals shots  hits
##       <int>   <int> <chr> <lgl> <chr>      <chr>         <int> <int> <int>
## 1    2.01e⁹       3 away  F     OT         John Tortore…     2    35    44
## 2    2.01e⁹       6 home  T     OT         Claude Julien     3    48    51
## 3    2.01e⁹       3 away  F     REG        John Tortore…     2    37    33
## 4    2.01e⁹       6 home  T     REG        Claude Julien     5    32    36
## 5    2.01e⁹       6 away  T     REG        Claude Julien     2    34    28
## 6    2.01e⁹       3 home  F     REG        John Tortore…     1    24    37
## # ... with 6 more variables: pim <int>, powerPlayOpportunities <int>,
## #   powerPlayGoals <int>, faceOffWinPercentage <dbl>, giveaways <int>,
## #   takeaways <int>
#recoding HoA to factor
hockeyData$HoA=as.factor(hockeyData$HoA)

An analysis of Goals Scored by Home Field Advantage

A hockey game is fast-paced and action-packed. To win an NHL game, a team must score more goals than its opponent. Goals, however, are hard to come by, so any information about which features are associated with scoring more goals is very valuable. From years of watching hockey and reading articles and statistics, I have come to believe that teams usually win more often at home than on the road. I want to test whether this home advantage also shows up in scoring by explaining the number of goals scored (goals) with whether a team is playing at home or away (HoA).

To conduct this analysis, we will run a Poisson regression model since we are dealing with count data for the number of goals scored in a game.
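To make the model concrete, here is a minimal sketch of a Poisson regression fitted with base R's glm() on synthetic data (not the NHL data). The log link means each coefficient is an additive change in log expected goals, so exponentiating it gives a multiplicative rate ratio. The baseline of about 2.6 goals for away teams, the home effect of 0.10, and the name sim_data are assumptions chosen for illustration. As far as we understand, Zelig's "poisson" model wraps this same kind of GLM, so point estimates from zelig() and glm() on the same data should agree.

```r
# Minimal sketch: Poisson regression with base R's glm() on SYNTHETIC data.
# The simulated counts below are illustrative only, not the NHL data.
set.seed(712)

n <- 14882                                  # same size as the NHL data set
HoA <- factor(sample(c("away", "home"), n, replace = TRUE),
              levels = c("away", "home"))   # "away" is the reference level
true_b0 <- log(2.6)                         # assumed: ~2.6 expected goals away
true_b_home <- 0.10                         # assumed home effect (log scale)

# Poisson model: log(E[goals]) = b0 + b_home * 1{home}
lambda <- exp(true_b0 + true_b_home * (HoA == "home"))
sim_data <- data.frame(goals = rpois(n, lambda), HoA = HoA)  # hypothetical data

fit <- glm(goals ~ HoA, family = poisson(link = "log"), data = sim_data)
b <- coef(fit)

# Coefficients are on the log scale; exponentiate to get a rate ratio
rate_ratio_home <- exp(b[["HoAhome"]])
print(round(b, 4))
print(round(rate_ratio_home, 3))
```

With this much data, the fitted coefficients should land close to the values used to simulate the counts, which is a quick sanity check that the log-link interpretation is being read correctly.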

Picking our specified model based on AIC/BIC

We will start with our main explanatory variable, HoA (home or away), and add features one at a time to try to best explain the number of goals scored. We will then choose our working model based on the lowest AIC and BIC.

library(Zelig)   # provides zelig()
#Nested Poisson models, adding one predictor at a time
reg1=zelig(goals~HoA,model = "poisson",data=hockeyData,cite=F)
reg2=zelig(goals~HoA+faceOffWinPercentage,model = "poisson",data=hockeyData,cite=F)
reg3=zelig(goals~HoA+faceOffWinPercentage+shots,model = "poisson",data=hockeyData,cite=F)
reg4=zelig(goals~HoA+faceOffWinPercentage+shots+hits,model = "poisson",data=hockeyData,cite=F)
reg5=zelig(goals~HoA+faceOffWinPercentage+shots+hits+powerPlayOpportunities,model = "poisson",data=hockeyData,cite=F)
reg6=zelig(goals~HoA+faceOffWinPercentage+shots+hits+powerPlayOpportunities+giveaways,model = "poisson",data=hockeyData,cite=F)
reg7=zelig(goals~HoA+faceOffWinPercentage+shots+hits+powerPlayOpportunities+giveaways+takeaways,model = "poisson",data=hockeyData,cite=F)

library(texreg)   # provides htmlreg()
htmlreg(list(reg1,reg2,reg3,reg4,reg5,reg6,reg7),digits = 5)
Statistical models
                        Model 1      Model 2      Model 3      Model 4      Model 5      Model 6      Model 7
(Intercept)             0.94988***   0.89730***   0.64221***   0.79584***   0.73634***   0.73665***   0.67071***
                        (0.00721)    (0.03522)    (0.03944)    (0.04172)    (0.04239)    (0.04281)    (0.04321)
HoAhome                 0.09933***   0.09566***   0.08275***   0.09256***   0.08579***   0.08594***   0.06443***
                        (0.00995)    (0.01024)    (0.01028)    (0.01031)    (0.01035)    (0.01074)    (0.01092)
faceOffWinPercentage                 0.00109      -0.00016     -0.00037     -0.00035     -0.00035     -0.00024
                                     (0.00071)    (0.00072)    (0.00072)    (0.00072)    (0.00072)    (0.00072)
shots                                             0.01061***   0.01075***   0.00973***   0.00973***   0.00904***
                                                  (0.00073)    (0.00073)    (0.00074)    (0.00074)    (0.00075)
hits                                                           -0.00647***  -0.00609***  -0.00609***  -0.00596***
                                                               (0.00058)    (0.00058)    (0.00058)    (0.00058)
powerPlayOpportunities                                                      0.02654***   0.02654***   0.02899***
                                                                            (0.00328)    (0.00328)    (0.00329)
giveaways                                                                                -0.00006     -0.00247*
                                                                                         (0.00109)    (0.00111)
takeaways                                                                                             0.01496***
                                                                                                      (0.00136)
AIC                     55561.37223  55561.04358  55353.28287  55227.15304  55164.26062  55166.25800  55049.67302
BIC                     55576.58805  55583.86730  55383.71450  55265.19258  55209.90807  55219.51335  55110.53628
Log Likelihood          -27778.68612 -27777.52179 -27672.64143 -27608.57652 -27576.13031 -27576.12900 -27516.83651
Deviance                16111.96955  16109.64090  15899.88019  15771.75036  15706.85795  15706.85532  15588.27034
Num. obs.               14882        14882        14882        14882        14882        14882        14882
***p < 0.001; **p < 0.01; *p < 0.05

Model 7 (reg7) has both the lowest AIC and the lowest BIC, so we will use it for our simulation and analysis.
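As a check on the model comparison, AIC and BIC can be recomputed by hand from the log-likelihoods in the regression table, using AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of estimated coefficients and n = 14,882. The sketch below assumes k counts the intercept plus each predictor (a Poisson GLM has no separate dispersion parameter), so k runs from 2 for Model 1 to 8 for Model 7.

```r
# Recompute AIC and BIC from the log-likelihoods in the regression table.
# Assumption: k = intercept + predictors (no dispersion parameter in a Poisson GLM).
loglik <- c(-27778.68612, -27777.52179, -27672.64143, -27608.57652,
            -27576.13031, -27576.12900, -27516.83651)
k <- 2:8                      # Model 1 has 2 coefficients, Model 7 has 8
n <- 14882                    # number of observations

aic <- 2 * k - 2 * loglik
bic <- log(n) * k - 2 * loglik

results <- data.frame(model = paste("Model", 1:7),
                      AIC = round(aic, 2),
                      BIC = round(bic, 2))
print(results)
cat("Lowest AIC: Model", which.min(aic), " Lowest BIC: Model", which.min(bic), "\n")
```

BIC's log(n) penalty (about 9.6 per coefficient here) is much harsher than AIC's penalty of 2, which is why the two criteria disagree on whether Model 2 improves on Model 1, yet both still favor Model 7.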

Setting the counterfactuals, simulating our data, and plotting

Let us set our counterfactuals for a team being home or away.

reg7$setx(HoA="away")
reg7$setx1(HoA="home")

Simulating our Data:

reg7$sim()

Plotting our Data:

reg7$graph()

Here, we set up a first-difference simulation for HoA in which all other features in regression 7 are held at Zelig's default values (a central tendency such as the mean, median, or mode, depending on the variable type). We set x to a team that is playing away and x1 to a team that is playing at home.
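The idea behind the first-difference simulation can be sketched in a few lines. This is a simplified illustration, not Zelig's own code: as we understand it, Zelig draws the full coefficient vector from a multivariate normal based on the estimated covariance matrix, whereas this sketch assumes only the HoA coefficient is uncertain and fixes the away team's expected goals at about 2.6 (the value reported from the simulation). The estimate 0.06443 and standard error 0.01092 are the HoAhome values for Model 7 in the regression table.

```r
# Simplified sketch of a simulation-based first difference (NOT Zelig's code).
# Assumptions: only the HoA coefficient is uncertain, and the away-team
# expected count is fixed at ~2.6 goals instead of being rebuilt from covariates.
set.seed(712)

n_sims <- 10000
b_home_hat <- 0.06443      # HoAhome estimate from Model 7
se_home <- 0.01092         # its standard error
lambda_away <- 2.6         # assumed expected goals under x (away)

# 1. Draw plausible values of the HoA coefficient from its sampling distribution
b_home_draws <- rnorm(n_sims, mean = b_home_hat, sd = se_home)

# 2. Expected goals under each counterfactual; home multiplies the rate by exp(b)
ev_away <- rep(lambda_away, n_sims)
ev_home <- lambda_away * exp(b_home_draws)

# 3. First difference = E[goals | home] - E[goals | away], one value per draw
fd <- ev_home - ev_away
print(summary(fd))
print(quantile(fd, c(0.025, 0.975)))
```

Because it ignores uncertainty in the intercept and the other coefficients, this sketch should give a slightly different spread than Zelig's simulation, but its center should be close to the roughly 0.17 goals Zelig reports.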

From the plots, a team that plays at home has an expected value of about 0.17 more goals than a team that plays away. Broken down further, a team playing away has an expected value of about 2.6 goals, whereas a team playing at home has an expected value of about 2.79 goals.
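The size of this first difference can be checked roughly by hand from the Model 7 coefficient. In a Poisson model with a log link, switching HoA from away to home while holding the other features fixed multiplies the expected count by e^β, where β = 0.06443 is the HoAhome coefficient and η_away is the linear predictor for the away scenario. Plugging in the away expected value of about 2.6 goals:

```latex
\begin{aligned}
E[\text{goals} \mid \text{home}] &= \exp(\eta_{\text{away}} + \beta_{\text{home}})
  = E[\text{goals} \mid \text{away}]\, e^{\beta_{\text{home}}} \\
\text{FD} &= E[\text{goals} \mid \text{home}] - E[\text{goals} \mid \text{away}]
  = E[\text{goals} \mid \text{away}]\,\bigl(e^{\beta_{\text{home}}} - 1\bigr) \\
&\approx 2.6 \times \bigl(e^{0.06443} - 1\bigr) \approx 2.6 \times 0.0667 \approx 0.173
\end{aligned}
```

In other words, playing at home is associated with roughly 6.7% more expected goals, which on a base of about 2.6 goals works out to the 0.17 figure from the simulation.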

A further look

library(magrittr)   # provides the %>% pipe
reg7$get_qi(xvalue="x1",qi="fd")%>%
  data.frame()%>%
  summary()
##        .          
##  Min.   :0.08664  
##  1st Qu.:0.15459  
##  Median :0.17371  
##  Mean   :0.17352  
##  3rd Qu.:0.19315  
##  Max.   :0.25497

Thinking about the numbers more gives further insight into the data. The mean first difference between home and away teams is about 0.17 goals, with an interquartile range of roughly 0.15 to 0.19, so the difference in expected goals scored is small. However, HoA (being home or away) is statistically significant at the 0.1% level in our best model, regression 7. This means the home coefficient, although small, is reliably different from zero: playing at home is associated with scoring more goals. This is important because in hockey, knowing what gives a team an edge, even a small one, can help determine whether it wins a championship.
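The three stars on HoAhome in Model 7 can be reproduced from the table with a Wald z-test: divide the coefficient by its standard error and compare the result with a standard normal distribution. The sketch below also builds a 95% Wald interval and exponentiates it to the rate-ratio scale. It uses only the coefficient (0.06443) and standard error (0.01092) from the table; we assume the table's stars come from this same normal-approximation test, which is what glm-based summaries report.

```r
# Wald z-test for the HoAhome coefficient in Model 7, using the table values.
b_home <- 0.06443
se_home <- 0.01092

z <- b_home / se_home                    # test statistic under H0: beta = 0
p_value <- 2 * pnorm(-abs(z))            # two-sided p-value, normal reference

# 95% Wald interval on the log scale, then exponentiated to a rate ratio
ci_log <- b_home + c(-1, 1) * qnorm(0.975) * se_home
ci_rate_ratio <- exp(ci_log)

cat("z =", round(z, 2), " p =", signif(p_value, 2), "\n")
print(round(ci_rate_ratio, 3))
```

A z of about 5.9 is far beyond the 0.1% critical value of roughly 3.29, and the interval says playing at home multiplies expected goals by roughly 1.04 to 1.09: clearly above 1, but modest, which matches the small first difference.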

In Conclusion:

To summarize, even though the difference is small, there is a statistically significant relationship between playing at home and the number of goals a team scores. A team that is playing at home really does have a “home field advantage”: holding the other features at their default values, its expected number of goals is about 0.17 higher than when it plays away.