#Package Loading
library(Hmisc)
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::src() masks Hmisc::src()
## ✖ dplyr::summarize() masks Hmisc::summarize()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(haven)
# Load the MoneyPuck shot dataset
mpd <- read.csv("C:/Users/Logan/Downloads/shots_2024_1/shots_2024.csv")
#adding descriptors to dataframe
#Load the data dictionary (update with your file path)
#data_dict <- read.csv("C:/Users/Logan/Downloads/MoneyPuck_Shot_Data_Dictionary (1) (1).csv")
#Iterate through the data dictionary and assign labels (from ChatGPT -- QOL Step)
#for (i in 1:nrow(data_dict)) {
#column_name <- data_dict$Variable[i]
#description <- data_dict$Definition[i]
#if (column_name %in% colnames(mpd)) {
#label(mpd[[column_name]]) <- description
#}
#}
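# Fit a logistic regression of goal on shot location, speed from the last
# event, and time since faceoff. The data frame is named 'mpd'.
# The mpd$ prefixes are redundant with data = mpd; they are why the
# coefficient labels in the output carry the mpd$ prefix.
model <- glm(goal ~ mpd$arenaAdjustedXCord + mpd$arenaAdjustedYCord + mpd$speedFromLastEvent + mpd$timeSinceFaceoff,
             data = mpd, family = binomial)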
# View the summary of the model
summary(model)
##
## Call:
## glm(formula = goal ~ mpd$arenaAdjustedXCord + mpd$arenaAdjustedYCord +
## mpd$speedFromLastEvent + mpd$timeSinceFaceoff, family = binomial,
## data = mpd)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.4638339 0.0369316 -66.713 < 2e-16 ***
## mpd$arenaAdjustedXCord -0.0003300 0.0003252 -1.015 0.31025
## mpd$arenaAdjustedYCord 0.0009017 0.0009937 0.907 0.36423
## mpd$speedFromLastEvent -0.0220580 0.0027839 -7.924 2.31e-15 ***
## mpd$timeSinceFaceoff 0.0009867 0.0003459 2.853 0.00433 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 17971 on 34998 degrees of freedom
## Residual deviance: 17884 on 34994 degrees of freedom
## AIC: 17894
##
## Number of Fisher Scoring iterations: 5
Based on the logistic regression model, speedFromLastEvent and
timeSinceFaceoff are significantly associated with the likelihood of
scoring a goal. Both coefficients are on the log-odds scale, so each
is the change in the log-odds of a goal per one-unit increase in the
predictor. The negative coefficient for speedFromLastEvent (-0.0221)
indicates that higher speeds from the last event are associated with
lower odds of scoring, potentially due to reduced control or accuracy
at higher speeds. Conversely, the positive coefficient for
timeSinceFaceoff (0.0010) suggests that the odds of scoring rise
slightly as more time passes since the faceoff, possibly due to better
positioning or strategic play. The coefficients for arenaAdjustedXCord
and arenaAdjustedYCord are not statistically significant (p = 0.31 and
p = 0.36), so the model detects no linear relationship between the raw
x and y coordinates and the log-odds of scoring in this dataset. This
does not rule out a nonlinear dependence on shot location. The overall
improvement in fit is modest: the residual deviance is 17884 against a
null deviance of 17971.
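To make the log-odds interpretation concrete, the sketch below fits the same kind of model and compares predicted goal probabilities for two shots that differ only in speed. It is a minimal illustration, not the analysis above: the data frame `sim` is simulated (the values and the coefficients used to generate it are assumptions, not MoneyPuck data), and only the column names mirror the model. The formula uses bare column names rather than `mpd$` prefixes, which is what lets `predict()` accept new data.

```r
# Simulated stand-in for the shot data (hypothetical values, not MoneyPuck data)
set.seed(1)
n <- 5000
sim <- data.frame(
  arenaAdjustedXCord = runif(n, -100, 100),
  arenaAdjustedYCord = runif(n, -42, 42),
  speedFromLastEvent = rexp(n, rate = 1 / 10),
  timeSinceFaceoff   = runif(n, 0, 120)
)

# Generate goals from a known logistic model (assumed coefficients)
eta <- -2.5 - 0.02 * sim$speedFromLastEvent + 0.001 * sim$timeSinceFaceoff
sim$goal <- rbinom(n, size = 1, prob = plogis(eta))

# Bare column names in the formula; glm() looks them up in `data`
fit <- glm(goal ~ arenaAdjustedXCord + arenaAdjustedYCord +
             speedFromLastEvent + timeSinceFaceoff,
           data = sim, family = binomial)

# Two hypothetical shots that differ only in speed from the last event
new_shots <- data.frame(
  arenaAdjustedXCord = c(70, 70),
  arenaAdjustedYCord = c(5, 5),
  speedFromLastEvent = c(5, 25),
  timeSinceFaceoff   = c(30, 30)
)

# type = "response" returns probabilities instead of log-odds
p_goal <- predict(fit, newdata = new_shots, type = "response")
print(p_goal)
```

With `mpd$` terms in the formula, `predict()` would ignore `newdata` for those terms and return fitted values for the original rows, so the bare-name form is the safer habit.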
# Extract coefficient estimates and standard errors (summary() is called once;
# 'est' avoids shadowing the base function coef())
coef_table <- summary(model)$coefficients
est <- coef_table[, "Estimate"]
se <- coef_table[, "Std. Error"]
# Calculate 95% Wald confidence intervals (normal approximation, z = 1.96)
ci_lower <- est - 1.96 * se
ci_upper <- est + 1.96 * se
# Combine into a data frame for easy viewing
ci <- data.frame(Coefficient = est, Lower_CI = ci_lower, Upper_CI = ci_upper)
print(ci)
## Coefficient Lower_CI Upper_CI
## (Intercept) -2.4638338618 -2.5362197412 -2.3914479825
## mpd$arenaAdjustedXCord -0.0003300306 -0.0009675145 0.0003074533
## mpd$arenaAdjustedYCord 0.0009016557 -0.0010460807 0.0028493921
## mpd$speedFromLastEvent -0.0220579931 -0.0275143686 -0.0166016175
## mpd$timeSinceFaceoff 0.0009867410 0.0003088239 0.0016646582
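The intervals above are Wald intervals: estimate plus or minus a normal quantile times the standard error. As a sketch, base R's `confint.default()` computes the same quantity directly. The example uses a small simulated data set (the column `speed` and the generating coefficients are assumptions made for illustration) so that it runs without the shot file.

```r
# Simulated data (hypothetical values) so the example runs on its own
set.seed(42)
n <- 2000
sim <- data.frame(speed = rexp(n, rate = 1 / 10))
sim$goal <- rbinom(n, size = 1, prob = plogis(-2 - 0.03 * sim$speed))

fit <- glm(goal ~ speed, data = sim, family = binomial)

# Manual Wald interval, using the exact normal quantile rather than 1.96
est <- coef(fit)
se <- sqrt(diag(vcov(fit)))
z <- qnorm(0.975)
manual <- cbind(lower = est - z * se, upper = est + z * se)

# Built-in equivalent: confint.default() computes the same Wald interval
wald <- confint.default(fit, level = 0.95)

print(manual)
print(wald)
```

Calling `confint()` on a glm gives profile-likelihood intervals instead, which can differ slightly from the Wald intervals, mainly in smaller samples.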
Intercept: The coefficient for the intercept is -2.4638, with a 95% confidence interval ranging from -2.5362 to -2.3914. This is the baseline log-odds of scoring a goal when all explanatory variables are zero, which corresponds to a probability of about 0.078. The negative value therefore indicates a low probability of scoring under these conditions.
mpd$arenaAdjustedXCord: The coefficient is -0.00033, with a 95% confidence interval from -0.00097 to 0.00031. Since the interval includes zero and the p-value (0.31) is not significant, the model finds no evidence of a linear association between the x-coordinate and the log-odds of scoring a goal.
mpd$arenaAdjustedYCord: The coefficient is 0.0009, with a 95% confidence interval from -0.00105 to 0.00285. As with the x-coordinate, the interval includes zero and the p-value (0.36) is not significant, so there is no evidence of a linear association between the y-coordinate and the log-odds of scoring.
mpd$speedFromLastEvent: The coefficient is -0.0221, with a 95% confidence interval from -0.0275 to -0.0166. This negative coefficient is statistically significant: each one-unit increase in speed from the last event is associated with roughly 2.2% lower odds of scoring (odds ratio of about 0.978). The confidence interval does not include zero, consistent with the small p-value.
mpd$timeSinceFaceoff: The coefficient is 0.0010, with a 95% confidence interval from 0.00031 to 0.00166. This positive coefficient is statistically significant, but the effect is small: each one-unit increase in time since the faceoff is associated with about 0.1% higher odds of scoring. The confidence interval does not include zero, consistent with the p-value of 0.004.
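Because the coefficients and their intervals are on the log-odds scale, exponentiating them gives odds ratios, which are often easier to read. The sketch below does this for the two significant predictors, using estimates and standard errors copied from the summary(model) output; the object names (`est`, `se`, `or_table`, `baseline_prob`) are illustrative only.

```r
# Estimates and standard errors copied from the summary(model) output
est <- c(speedFromLastEvent = -0.0220580, timeSinceFaceoff = 0.0009867)
se  <- c(speedFromLastEvent =  0.0027839, timeSinceFaceoff = 0.0003459)

# 95% Wald limits on the log-odds scale, then exponentiated to odds ratios
z <- qnorm(0.975)
or_table <- data.frame(
  Odds_Ratio = exp(est),
  Lower_CI   = exp(est - z * se),
  Upper_CI   = exp(est + z * se)
)
print(round(or_table, 4))

# Baseline probability implied by the intercept (all predictors equal to zero)
baseline_prob <- plogis(-2.4638339)
print(round(baseline_prob, 3))
```

An odds ratio below 1 corresponds to a negative coefficient and one above 1 to a positive coefficient; an interval that excludes 1 on this scale is the same statement as an interval that excludes 0 on the log-odds scale.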