This is a historical data set on NHL hockey in which each row records one team's statistics for one game. The data set has 15 features and 14,882 observations. Features include the head coach's name, whether the team was home or away, a team identifier, goals scored, hits, shots, giveaways, takeaways, and whether the team won.
library(tidyverse) # provides read_csv() and the %>% pipe used below
hockeyData=read_csv("/Users/gregmaghakian/Documents/Soc 712/Week 6 Zelig/Homework/gameStats.csv")
#A Glimpse of the data
head(hockeyData)
## # A tibble: 6 x 15
## game_id team_id HoA won settled_in head_coach goals shots hits
## <int> <int> <chr> <lgl> <chr> <chr> <int> <int> <int>
## 1 2.01e⁹ 3 away F OT John Tortore… 2 35 44
## 2 2.01e⁹ 6 home T OT Claude Julien 3 48 51
## 3 2.01e⁹ 3 away F REG John Tortore… 2 37 33
## 4 2.01e⁹ 6 home T REG Claude Julien 5 32 36
## 5 2.01e⁹ 6 away T REG Claude Julien 2 34 28
## 6 2.01e⁹ 3 home F REG John Tortore… 1 24 37
## # ... with 6 more variables: pim <int>, powerPlayOpportunities <int>,
## # powerPlayGoals <int>, faceOffWinPercentage <dbl>, giveaways <int>,
## # takeaways <int>
#recoding HoA to factor
hockeyData$HoA=as.factor(hockeyData$HoA)
A hockey game is fast-paced and action-packed. To win an NHL game, a team must score more goals than its opponent. Goals, however, are hard to come by, so any information about which features contribute to scoring more goals is very valuable! From watching hockey for years, reading articles, and looking at stats, I have come to believe that a team usually wins more often at home than on the road. I want to put a related hypothesis to the test by explaining the number of goals scored (goals) by whether or not a team has home-field advantage (HoA).
To conduct this analysis, we will run a Poisson regression model since we are dealing with count data for the number of goals scored in a game.
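For reference, the Poisson regression described above can be written out as follows. This is the standard log-link form; the symbols (λ for the expected goal count, β for the coefficients, x for the additional features) are notation introduced here for illustration and do not appear in the original analysis.

```latex
\[
\begin{aligned}
\text{goals}_i &\sim \text{Poisson}(\lambda_i), \\
\log \lambda_i &= \beta_0 + \beta_1\,\text{HoAhome}_i + \sum_{j \ge 2} \beta_j x_{ij}, \\
E[\text{goals}_i] &= \lambda_i = \exp\Big(\beta_0 + \beta_1\,\text{HoAhome}_i + \sum_{j \ge 2} \beta_j x_{ij}\Big).
\end{aligned}
\]
```

Because the model is linear on the log scale, each coefficient acts multiplicatively on the expected number of goals: moving from away to home multiplies the expected count by exp(β₁).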
We will start with our main explanatory variable, HoA (home or away), and add features one at a time to try to better explain the number of goals scored. We will then choose as our working model the one with the lowest AIC and BIC.
library(Zelig) # zelig(), setx(), sim()
library(texreg) # htmlreg()
reg1=zelig(goals~HoA,model = "poisson",data=hockeyData,cite=FALSE)
reg2=zelig(goals~HoA+faceOffWinPercentage,model = "poisson",data=hockeyData,cite=FALSE)
reg3=zelig(goals~HoA+faceOffWinPercentage+shots,model = "poisson",data=hockeyData,cite=FALSE)
reg4=zelig(goals~HoA+faceOffWinPercentage+shots+hits,model = "poisson",data=hockeyData,cite=FALSE)
reg5=zelig(goals~HoA+faceOffWinPercentage+shots+hits+powerPlayOpportunities,model = "poisson",data=hockeyData,cite=FALSE)
reg6=zelig(goals~HoA+faceOffWinPercentage+shots+hits+powerPlayOpportunities+giveaways,model = "poisson",data=hockeyData,cite=FALSE)
reg7=zelig(goals~HoA+faceOffWinPercentage+shots+hits+powerPlayOpportunities+giveaways+takeaways,model = "poisson",data=hockeyData,cite=FALSE)
htmlreg(list(reg1,reg2,reg3,reg4,reg5,reg6,reg7),digits = 5)
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.94988*** | 0.89730*** | 0.64221*** | 0.79584*** | 0.73634*** | 0.73665*** | 0.67071*** |
| | (0.00721) | (0.03522) | (0.03944) | (0.04172) | (0.04239) | (0.04281) | (0.04321) |
| HoAhome | 0.09933*** | 0.09566*** | 0.08275*** | 0.09256*** | 0.08579*** | 0.08594*** | 0.06443*** |
| | (0.00995) | (0.01024) | (0.01028) | (0.01031) | (0.01035) | (0.01074) | (0.01092) |
| faceOffWinPercentage | | 0.00109 | -0.00016 | -0.00037 | -0.00035 | -0.00035 | -0.00024 |
| | | (0.00071) | (0.00072) | (0.00072) | (0.00072) | (0.00072) | (0.00072) |
| shots | | | 0.01061*** | 0.01075*** | 0.00973*** | 0.00973*** | 0.00904*** |
| | | | (0.00073) | (0.00073) | (0.00074) | (0.00074) | (0.00075) |
| hits | | | | -0.00647*** | -0.00609*** | -0.00609*** | -0.00596*** |
| | | | | (0.00058) | (0.00058) | (0.00058) | (0.00058) |
| powerPlayOpportunities | | | | | 0.02654*** | 0.02654*** | 0.02899*** |
| | | | | | (0.00328) | (0.00328) | (0.00329) |
| giveaways | | | | | | -0.00006 | -0.00247* |
| | | | | | | (0.00109) | (0.00111) |
| takeaways | | | | | | | 0.01496*** |
| | | | | | | | (0.00136) |
| AIC | 55561.37223 | 55561.04358 | 55353.28287 | 55227.15304 | 55164.26062 | 55166.25800 | 55049.67302 |
| BIC | 55576.58805 | 55583.86730 | 55383.71450 | 55265.19258 | 55209.90807 | 55219.51335 | 55110.53628 |
| Log Likelihood | -27778.68612 | -27777.52179 | -27672.64143 | -27608.57652 | -27576.13031 | -27576.12900 | -27516.83651 |
| Deviance | 16111.96955 | 16109.64090 | 15899.88019 | 15771.75036 | 15706.85795 | 15706.85532 | 15588.27034 |
| Num. obs. | 14882 | 14882 | 14882 | 14882 | 14882 | 14882 | 14882 |
| ***p < 0.001, **p < 0.01, *p < 0.05 | | | | | | | |
Model 7 (reg7) has both the lowest AIC and the lowest BIC, so we will use it for our simulation and analysis!
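As a sanity check on the model comparison, the AIC and BIC values in the table can be recomputed from the reported log likelihoods. The sketch below uses base R only and copies the log likelihoods and the number of observations from the table. The parameter counts `k` are an assumption: one intercept plus one coefficient per added feature, giving 2 through 8.

```r
# Log likelihoods copied from the regression table (Models 1 to 7).
logLik_vals <- c(-27778.68612, -27777.52179, -27672.64143, -27608.57652,
                 -27576.13031, -27576.12900, -27516.83651)
k <- 2:8      # assumed parameter counts: intercept + 1..7 slopes
n <- 14882    # number of observations from the table

# Standard definitions: AIC = 2k - 2*logLik, BIC = k*log(n) - 2*logLik.
aic <- 2 * k - 2 * logLik_vals
bic <- k * log(n) - 2 * logLik_vals

comparison <- data.frame(model = paste("Model", 1:7),
                         AIC = round(aic, 2),
                         BIC = round(bic, 2))
print(comparison)

# Index of the model that each criterion prefers.
best_aic <- which.min(aic)
best_bic <- which.min(bic)
```

BIC penalizes each extra parameter by log(14882), roughly 9.6, instead of 2. That is why BIC rises from Model 1 to Model 2 and from Model 5 to Model 6, where the added feature barely changes the log likelihood.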
Let us set our counterfactuals for a team being home or away.
reg7$setx(HoA="away")
reg7$setx1(HoA="home")
Simulating our Data:
reg7$sim()
Plotting our Data:
reg7$graph()
Here, we set up a first-difference simulation for HoA in which all other features in regression 7 are held at their default values (means for numeric variables, modes for unordered factors, medians for ordered factors). We set X to be a team that is playing away, and X1 to be a team that is playing at home.
From the plots, we can see that a team playing at home is expected to score about 0.17 more goals than a team playing away. Breaking it down further, a team playing away has an expected value of about 2.6 goals, whereas a team playing at home has an expected value of about 2.79 goals.
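To make the idea of a simulated first difference concrete, here is a minimal sketch of the general approach. It fits a Poisson GLM, draws coefficient vectors from their estimated sampling distribution, and compares expected counts for an away and a home profile. It is not Zelig's internal code, and it runs on simulated toy data rather than the NHL data set. The data frame `toy`, the assumed true coefficients, and the use of only one extra covariate (`shots`, held at its median) are all hypothetical choices made for illustration.

```r
# Illustrative only: simulated data, NOT the NHL data set.
set.seed(712)
n <- 20000
toy <- data.frame(
  HoA   = factor(sample(c("away", "home"), n, replace = TRUE)),
  shots = rpois(n, 30)
)
# Assumed true model: home adds 0.065 on the log scale.
toy$goals <- rpois(n, exp(0.65 + 0.065 * (toy$HoA == "home") + 0.01 * toy$shots))

# Plain Poisson GLM with log link.
fit <- glm(goals ~ HoA + shots, family = poisson, data = toy)

# Draw 1000 coefficient vectors from N(beta_hat, vcov(beta_hat)).
draws <- MASS::mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# Covariate profiles in coefficient order: (Intercept), HoAhome, shots.
x_away <- c(1, 0, median(toy$shots))
x_home <- c(1, 1, median(toy$shots))

# Expected goals under each profile, one value per coefficient draw.
ev_away <- exp(draws %*% x_away)
ev_home <- exp(draws %*% x_home)

# First difference: home minus away, per draw.
fd <- as.vector(ev_home - ev_away)
summary(fd)
```

The spread of `fd` reflects estimation uncertainty in the coefficients, which is what the first-difference density in the Zelig plot is meant to convey.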
reg7$get_qi(xvalue="x1",qi="fd")%>%
data.frame()%>%
summary()
## .
## Min. :0.08664
## 1st Qu.:0.15459
## Median :0.17371
## Mean :0.17352
## 3rd Qu.:0.19315
## Max. :0.25497
Thinking about the numbers more, we can gain further insight into our data. Looking at the mean and the interquartile range of the first difference between home and away teams (a mean of about 0.17, with the middle half of simulations between about 0.15 and 0.19), we can see that there really isn't much of a difference in the number of goals scored. However, in our best model, regression 7, HoA (being home or away) is statistically significant at the 0.1% level. This means that the effect, however small, is unlikely to be due to chance alone. This is quite important because in hockey, knowing what gives you an edge, even a small one, can help determine whether a team wins a championship or not.
To summarize, even though the difference is small, there is a statistically significant relationship between a team's home-field advantage and the number of goals that team scores. A team playing at home really does have “home field advantage”: its expected value is about 0.17 goals higher than that of a team playing away.
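The simulated first difference can be cross-checked by hand against the Model 7 coefficient. The arithmetic below uses the HoAhome estimate from the table (0.06443) and the approximate away expected value quoted earlier (about 2.6 goals). The λ symbols are notation introduced here, and the result is a rough approximation because the baseline is rounded.

```latex
\[
\begin{aligned}
\frac{\lambda_{\text{home}}}{\lambda_{\text{away}}}
  &= \exp(\hat\beta_{\text{HoAhome}}) = \exp(0.06443) \approx 1.0666, \\
\lambda_{\text{home}} - \lambda_{\text{away}}
  &= \lambda_{\text{away}}\,\big(\exp(\hat\beta_{\text{HoAhome}}) - 1\big)
  \approx 2.6 \times 0.0666 \approx 0.17 .
\end{aligned}
\]
```

In other words, playing at home is associated with roughly a 6.7% higher expected goal count, holding the other features fixed, which lines up with the simulated mean first difference of about 0.17 goals.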