606 Final Project

Brian Weinfeld

May 7th, 2018

Introduction

In the National Hockey League (NHL), is there a relationship between the number of goals scored and the number of shots taken when considering the period?

\[goals\sim shots + period\]

I am a big fan of the NHL, and one thing I have noticed is that more goals seem to be scored in the 2nd and 3rd periods than in the 1st. I decided to explore this relationship to determine whether the perceived difference is statistically significant.

I also believe that this investigation may offer insight into possible future rule changes the NHL could make to increase scoring while still keeping the “spirit” of the game.

I will be performing a multiple linear regression analysis considering goals scored, shots taken and period of play (1st, 2nd or 3rd).

Data Collection

The raw data was collected and tidied in R. Below is a sample of the collected data.

id period goals shots
29019 1 1 18
29019 2 1 31
29019 3 2 17
29020 1 4 23
29020 2 1 17
29020 3 0 21
29021 1 2 24
29021 2 2 16
29021 3 2 34
29022 1 3 22
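
For readers who want to experiment with this layout, the ten sample rows can be entered by hand as a data frame. This is only a sketch: the name `box.sample` is hypothetical, and storing `period` as a factor is an assumption inferred from the `period2` and `period3` terms in the regression output later in this document.

```r
# The ten sample rows shown above, typed in by hand.
# `box.sample` is a hypothetical name; the full data set is not reproduced here.
box.sample <- data.frame(
  id     = c(29019, 29019, 29019, 29020, 29020, 29020, 29021, 29021, 29021, 29022),
  # Assumption: period is categorical (a factor), not a numeric quantity.
  period = factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1), levels = c(1, 2, 3)),
  goals  = c(1, 1, 2, 4, 1, 0, 2, 2, 2, 3),
  shots  = c(18, 31, 17, 23, 17, 21, 24, 16, 34, 22)
)

# Each row is one period of one game (id).
str(box.sample)
table(box.sample$period)
```

Treating `period` as a factor lets a model estimate a separate offset for the 2nd and 3rd periods relative to the 1st, rather than forcing a straight-line trend across periods.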

Exploratory Analysis

period n mean sd median min max range
1 1230 19.06829 4.436173 19 7 37 30
2 1230 20.49512 4.540521 20 6 35 29
3 1230 18.90406 4.438228 19 4 40 36

The distributions of shots per period are very similar: the means are within about 1.6 shots of one another and the standard deviations are nearly identical. This suggests that the period of play has little effect on the number of shots taken.
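
A per-period summary with these columns could be computed along the following lines. This is a minimal base-R sketch, not necessarily how the table above was produced; the helper name `summarise_by_period` is hypothetical, and it is run here only on the ten sample rows, so its numbers will not match the full-season table.

```r
# Summary statistics of a numeric vector within each level of a grouping variable.
summarise_by_period <- function(values, period) {
  per_group <- lapply(split(values, period), function(x) {
    data.frame(
      n      = length(x),
      mean   = mean(x),
      sd     = sd(x),
      median = median(x),
      min    = min(x),
      max    = max(x),
      range  = max(x) - min(x)
    )
  })
  # One row per period; row names are the period labels.
  do.call(rbind, per_group)
}

# Illustration on the ten sample rows only (not the 1230 observations per period).
period <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1)
shots  <- c(18, 31, 17, 23, 17, 21, 24, 16, 34, 22)
shot_stats <- summarise_by_period(shots, period)
print(shot_stats)
```

The same helper can be applied to the goals column to obtain the corresponding summary of goals per period.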

period n mean sd median min max range
1 1230 1.452846 1.193543 1 0 7 7
2 1230 1.833333 1.291152 2 0 7 7
3 1230 1.914634 1.252511 2 0 6 6

The number of goals scored shows some variation across periods. The difference is small, but given the size of the sample, a difference of nearly \(\frac{1}{2}\) a goal between periods (about 0.46 between the 1st and 3rd) could be meaningful.

##               Df Sum Sq Mean Sq F value Pr(>F)    
## period         2    149   74.75   48.12 <2e-16 ***
## Residuals   3687   5728    1.55                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The exploratory analysis is promising. The one-way ANOVA of goals by period above (F = 48.12, p < 2e-16) indicates a significant difference in the mean number of goals scored per period.
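
The pieces of the ANOVA table can be checked against one another using only the printed values. The sketch below assumes the table came from a call such as `summary(aov(goals ~ period, data = box.data))`, which is a guess based on the output format; the variable names are illustrative.

```r
# Values copied from the printed ANOVA table (already rounded in the output).
df_period <- 2
df_resid  <- 3687
ss_resid  <- 5728
ms_period <- 74.75

# Mean square = sum of squares / degrees of freedom.
ms_resid <- ss_resid / df_resid

# F statistic = between-period mean square / residual mean square.
f_value <- ms_period / ms_resid

# Upper-tail probability of the F distribution with (2, 3687) degrees of freedom.
p_value <- pf(f_value, df_period, df_resid, lower.tail = FALSE)

cat("Residual mean square:", round(ms_resid, 2), "\n")
cat("F value:", round(f_value, 2), "\n")
cat("p-value below 2e-16:", p_value < 2e-16, "\n")
```

The recomputed F value agrees with the reported 48.12 up to rounding of the printed inputs.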

Precondition Verification

Before running the regression analysis, I needed to ensure that the preconditions for the analysis were met.
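
The original diagnostic plots are not reproduced here. As a rough sketch, the usual conditions for linear regression (linearity, nearly normal residuals, and constant variability) are commonly checked with residual plots like the ones below. The names `box.sample` and `plot_diagnostics` are hypothetical, and the model is fit only to the ten sample rows shown earlier, so the plots illustrate the procedure rather than the actual results.

```r
# Ten sample rows from the data sample earlier in the document; illustration only.
box.sample <- data.frame(
  period = factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1)),
  goals  = c(1, 1, 2, 4, 1, 0, 2, 2, 2, 3),
  shots  = c(18, 31, 17, 23, 17, 21, 24, 16, 34, 22)
)

# Same model formula as the regression in this document, on the sample rows.
fit      <- lm(goals ~ shots + period, data = box.sample)
res      <- resid(fit)
fit_vals <- fitted(fit)

plot_diagnostics <- function(fitted_values, residuals) {
  old_par <- par(mfrow = c(1, 3))
  on.exit(par(old_par))
  # Linearity and constant variability: look for no pattern and even spread around 0.
  plot(fitted_values, residuals, xlab = "Fitted goals", ylab = "Residual",
       main = "Residuals vs fitted")
  abline(h = 0, lty = 2)
  # Nearly normal residuals: roughly symmetric histogram, points near the Q-Q line.
  hist(residuals, main = "Histogram of residuals", xlab = "Residual")
  qqnorm(residuals)
  qqline(residuals)
}

# Draw the plots only in an interactive session; remove the guard to plot from a script.
if (interactive()) plot_diagnostics(fit_vals, res)
```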

Regression Analysis

\[\widehat{goals}=0.066448\times shots+0.285678\times period2+0.472701\times period3 + 0.185791\]

## 
## Call:
## lm(formula = goals ~ shots + period, data = box.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7848 -0.9134 -0.0662  0.8164  5.2859 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.185791   0.091754   2.025    0.043 *  
## shots       0.066448   0.004458  14.904  < 2e-16 ***
## period2     0.285678   0.049229   5.803 7.06e-09 ***
## period3     0.472701   0.048822   9.682  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.211 on 3686 degrees of freedom
## Multiple R-squared:  0.08083,    Adjusted R-squared:  0.08008 
## F-statistic:   108 on 3 and 3686 DF,  p-value: < 2.2e-16

While the \(R^2_{adj}\) is low, indicating that shots and period explain only a small portion (about 8%) of the variability in goals, the p-values for shots and for both period terms are well below 0.01, so these predictors are statistically significant.
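
As a consistency check on the reported values, the adjusted \(R^2\) can be recovered from the multiple \(R^2\). This assumes the standard adjustment formula with \(n = 3690\) observations (1230 per period, which matches the 3686 residual degrees of freedom plus 4 estimated coefficients) and \(k = 3\) predictor terms:

\[R^2_{adj}=1-(1-R^2)\times\frac{n-1}{n-k-1}=1-(1-0.08083)\times\frac{3689}{3686}\approx 0.0801\]

This agrees with the 0.08008 printed in the summary. With so many observations relative to the number of predictors, the adjustment barely changes the value.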

The regression indicates that, all else being equal, the 3rd period is expected to produce nearly an extra \(\frac{1}{2}\) of a goal compared to the 1st period. The effect for the 2nd period is less pronounced but still amounts to about \(\frac{2}{7}\) of a goal.
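
To make the period effects concrete, the fitted equation can be evaluated at a fixed shot count. The sketch below hard-codes the reported coefficients; the helper `predict_goals` and the choice of 20 shots are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the regression summary above.
coefs <- c(intercept = 0.185791, shots = 0.066448,
           period2 = 0.285678, period3 = 0.472701)

# Expected goals for a given shot count and period (1, 2 or 3).
# The 1st period is the baseline, so it adds no period term.
predict_goals <- function(shots, period) {
  stopifnot(period %in% 1:3)
  unname(coefs["intercept"] + coefs["shots"] * shots +
           coefs["period2"] * (period == 2) +
           coefs["period3"] * (period == 3))
}

# Expected goals at 20 shots in each period: about 1.51, 1.80 and 1.99.
print(round(predict_goals(20, 1:3), 2))
```

Because the model has no interaction term, the gap between periods is the same at every shot count: 0.285678 for the 2nd period and 0.472701 for the 3rd, both relative to the 1st.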

Conclusion

\[\widehat{goals}=0.066448\times shots+0.285678\times period2+0.472701\times period3 + 0.185791\]