#Package Loading
library(Hmisc)
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()    masks stats::filter()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ dplyr::src()       masks Hmisc::src()
## ✖ dplyr::summarize() masks Hmisc::summarize()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
#Loading the MoneyPuck Shot Dataset
mpd = read.csv('./shots_2024.csv')

#adding descriptors to dataframe

# Load the data dictionary (update with your file path)
data_dict <- read.csv('./MoneyPuck_Shot_Data_Dictionary (1) (1).csv')

# Iterate through the data dictionary and assign labels (from ChatGPT -- QOL Step)
for (i in 1:nrow(data_dict)) {
  column_name <- data_dict$Variable[i]
  description <- data_dict$Definition[i]
  
  if (column_name %in% colnames(mpd)) {
    label(mpd[[column_name]]) <- description
  }
}

#filter to team NYR

mpd_nyr <- mpd %>% filter(teamCode == "NYR")

#add a calculated column tiredness differential

mpd_nyr$OffenseTimeAdv = mpd_nyr$shootingTeamAverageTimeOnIce - mpd_nyr$defendingTeamAverageTimeOnIce

#add a calculated column offense adv
mpd_nyr$offenseManAdv = mpd_nyr$shootingTeamForwardsOnIce + mpd_nyr$shootingTeamDefencemenOnIce - mpd_nyr$defendingTeamDefencemenOnIce - mpd_nyr$defendingTeamForwardsOnIce

Plot 1. xGoal vs. Offense Time Adv

# Create scatter plot
ggplot(mpd_nyr, aes(x = OffenseTimeAdv, y = xGoal)) +
  geom_point(color = "blue") +  # Scatter points in blue
  labs(title = "Scatterplot of OffenseTimeAdv vs xGoal",
       x = "Offense Time Advantage",
       y = "Expected Goals (xGoal)") +
  theme_minimal()  # Clean theme

After inspecting the scatterplot for Offense Time Advantage versus xGoal, the points are widely dispersed with no noticeable clusters or extreme outliers. This even spread suggests that no single observation is dominating the relationship, and that the variable Offense Time Advantage does not appear to have a strong linear influence on xGoal. This insight indicates that other factors may be at play in determining expected goals.

Correlation Coefficient

correlation <- cor(mpd_nyr$OffenseTimeAdv, mpd_nyr$xGoal)

print(correlation)
## [1] 0.0299162

The computed correlation coefficient is approximately 0.03, which is nearly zero. This very low correlation reinforces what we observed in the plot—a lack of a linear relationship between Offense Time Advantage and xGoal. This result leads us to question whether a non-linear relationship exists or if additional variables might better explain the variability in expected goals.

###Confidence Interval

ci_xGoal <- t.test(mpd_nyr$xGoal, conf.level = 0.95)

print(ci_xGoal$conf.int)
## [1] 0.06859590 0.07794544
## attr(,"conf.level")
## [1] 0.95

The 95% confidence interval for xGoal, ranging from roughly 0.0686 to 0.0779, is quite narrow, indicating a precise estimate of the mean expected goals. Despite this precision, the minimal correlation suggests that while xGoal is reliably measured, its changes are not explained by Offense Time Advantage alone. This finding invites further exploration into other potential predictors or interactions that could clarify what drives variations in xGoal.

Plot 4. xPlayContinuedInZone vs Offense Man Adv

# Create scatter plot
ggplot(mpd_nyr, aes(x = offenseManAdv, y = xPlayContinuedInZone)) +
  geom_point(color = "blue") +  # Scatter points in blue
  labs(title = "Scatterplot of OffenseManAdv vs xPlayContInZone",
       x = "Offense Man Advantage",
       y = "Expected Play Continues In Zone") +
  theme_minimal()  # Clean theme

The scatterplot for Offense Man Advantage versus xPlayContinuedInZone reveals a clear upward trend with a moderate clustering of points and no significant outliers. Although a few points deviate slightly from the general trend, the overall pattern is consistent, suggesting a stable relationship where an increased manpower advantage corresponds with a higher likelihood of continued play in the zone.

Correlation Coefficient

correlation <- cor(mpd_nyr$offenseManAdv, mpd_nyr$xPlayContinuedInZone, method = "spearman")

print(correlation)
## [1] 0.4169057

Here, the correlation coefficient is approximately 0.42, indicating a moderate positive linear relationship between Offense Man Advantage and xPlayContinuedInZone. This moderate association is well-supported by the visual trend in the plot, confirming that as the offensive manpower advantage increases, so does the expected continuation of play.

Confidence Interval

ci_xGoal <- t.test(mpd_nyr$xPlayContinuedInZone, conf.level = 0.95)

print(ci_xGoal$conf.int)
## [1] 0.3877935 0.3956812
## attr(,"conf.level")
## [1] 0.95

The 95% confidence interval for xPlayContinuedInZone, estimated to be between about 0.3878 and 0.3957, suggests a very precise estimate of its mean value. The narrow interval, together with the moderate correlation, indicates that while Offense Man Advantage contributes to predicting continued play, there is still room to investigate other contributing factors or potential confounders. This reliable estimation invites further research into how situational or contextual game elements might interact with manpower advantage to affect offensive play.