#Package Loading
library(Hmisc)
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::src() masks Hmisc::src()
## ✖ dplyr::summarize() masks Hmisc::summarize()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
#Loading the MoneyPuck Shot Dataset
mpd = read.csv('./shots_2024.csv')
#adding descriptors to dataframe
# Load the data dictionary (update with your file path)
data_dict <- read.csv('./MoneyPuck_Shot_Data_Dictionary (1) (1).csv')
# Iterate through the data dictionary and assign labels (from ChatGPT -- QOL Step)
for (i in 1:nrow(data_dict)) {
column_name <- data_dict$Variable[i]
description <- data_dict$Definition[i]
if (column_name %in% colnames(mpd)) {
label(mpd[[column_name]]) <- description
}
}
#filter to team NYR
mpd_nyr <- mpd %>% filter(teamCode == "NYR")
#add a calculated column tiredness differential
mpd_nyr$OffenseTimeAdv = mpd_nyr$shootingTeamAverageTimeOnIce - mpd_nyr$defendingTeamAverageTimeOnIce
#add a calculated column offense adv
mpd_nyr$offenseManAdv = mpd_nyr$shootingTeamForwardsOnIce + mpd_nyr$shootingTeamDefencemenOnIce - mpd_nyr$defendingTeamDefencemenOnIce - mpd_nyr$defendingTeamForwardsOnIce
# Create scatter plot
ggplot(mpd_nyr, aes(x = OffenseTimeAdv, y = xGoal)) +
geom_point(color = "blue") + # Scatter points in blue
labs(title = "Scatterplot of OffenseTimeAdv vs xGoal",
x = "Offense Time Advantage",
y = "Expected Goals (xGoal)") +
theme_minimal() # Clean theme
After inspecting the scatterplot for Offense Time Advantage versus xGoal, the points are widely dispersed with no noticeable clusters or extreme outliers. This even spread suggests that no single observation is dominating the relationship, and that the variable Offense Time Advantage does not appear to have a strong linear influence on xGoal. This insight indicates that other factors may be at play in determining expected goals.
correlation <- cor(mpd_nyr$OffenseTimeAdv, mpd_nyr$xGoal)
print(correlation)
## [1] 0.0299162
The computed correlation coefficient is approximately 0.03, which is nearly zero. This very low correlation reinforces what we observed in the plot—a lack of a linear relationship between Offense Time Advantage and xGoal. This result leads us to question whether a non-linear relationship exists or if additional variables might better explain the variability in expected goals.
###Confidence Interval
ci_xGoal <- t.test(mpd_nyr$xGoal, conf.level = 0.95)
print(ci_xGoal$conf.int)
## [1] 0.06859590 0.07794544
## attr(,"conf.level")
## [1] 0.95
The 95% confidence interval for xGoal, ranging from roughly 0.0686 to 0.0779, is quite narrow, indicating a precise estimate of the mean expected goals. Despite this precision, the minimal correlation suggests that while xGoal is reliably measured, its changes are not explained by Offense Time Advantage alone. This finding invites further exploration into other potential predictors or interactions that could clarify what drives variations in xGoal.
# Create scatter plot
ggplot(mpd_nyr, aes(x = offenseManAdv, y = xPlayContinuedInZone)) +
geom_point(color = "blue") + # Scatter points in blue
labs(title = "Scatterplot of OffenseManAdv vs xPlayContInZone",
x = "Offense Man Advantage",
y = "Expected Play Continues In Zone") +
theme_minimal() # Clean theme
The scatterplot for Offense Man Advantage versus xPlayContinuedInZone reveals a clear upward trend with a moderate clustering of points and no significant outliers. Although a few points deviate slightly from the general trend, the overall pattern is consistent, suggesting a stable relationship where an increased manpower advantage corresponds with a higher likelihood of continued play in the zone.
correlation <- cor(mpd_nyr$offenseManAdv, mpd_nyr$xPlayContinuedInZone, method = "spearman")
print(correlation)
## [1] 0.4169057
Here, the correlation coefficient is approximately 0.42, indicating a moderate positive linear relationship between Offense Man Advantage and xPlayContinuedInZone. This moderate association is well-supported by the visual trend in the plot, confirming that as the offensive manpower advantage increases, so does the expected continuation of play.
ci_xGoal <- t.test(mpd_nyr$xPlayContinuedInZone, conf.level = 0.95)
print(ci_xGoal$conf.int)
## [1] 0.3877935 0.3956812
## attr(,"conf.level")
## [1] 0.95
The 95% confidence interval for xPlayContinuedInZone, estimated to be between about 0.3878 and 0.3957, suggests a very precise estimate of its mean value. The narrow interval, together with the moderate correlation, indicates that while Offense Man Advantage contributes to predicting continued play, there is still room to investigate other contributing factors or potential confounders. This reliable estimation invites further research into how situational or contextual game elements might interact with manpower advantage to affect offensive play.