Assignment 3

This is an analysis of post-season statistics for the NHL and its preceding leagues. I started by loading the data into a data frame.

games <- read.csv("C:/Users/egdos/Desktop/homework/ADAFall24/games.csv", header=TRUE)

Graphing Data

Next, I created a histogram of goals against, where each observation is one team's post-season appearance. The distribution of total goals against is right-skewed, with a long tail toward high totals.

hist(games$GA,breaks = seq(0,100,by=5),xlab="Goals Against",ylab="Frequency",main="Goals Against Frequencies")
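
The direction of skew in a histogram like this can also be checked numerically. The sketch below is only an illustration: `ga_toy` is a made-up vector (not values from games.csv) and `sample_skewness` is a hypothetical helper, since base R has no built-in skewness function.

```r
# Toy data only: a small made-up vector of goals-against totals, not taken from games.csv.
ga_toy <- c(3, 5, 8, 10, 12, 15, 18, 22, 30, 45, 70)

# Moment-based sample skewness: mean cubed deviation over the cubed (population) sd.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew <- sample_skewness(ga_toy)

# Positive values indicate a right skew (long upper tail); negative values a left skew.
direction <- if (skew > 0) "right" else if (skew < 0) "left" else "none"
print(c(mean = mean(ga_toy), median = median(ga_toy)))
print(direction)
```

On the real data, the same helper could be called as sample_skewness(games$GA). A mean well above the median points the same way as a positive skewness value.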

Then, I created a graph showing the number of post-season appearances for each team in the dataset.

appear_frq <- table(games$tmID)
barplot(appear_frq[1:26],las=2,xlab="Teams",ylab="Appearances",main="# of Post-Season Appearances")

barplot(appear_frq[27:52],las=2,xlab="Teams",ylab="Appearances",main="# of Post-Season Appearances")

barplot(appear_frq[53:77],las=2,xlab="Teams",ylab="Appearances",main="# of Post-Season Appearances")

This has been split into three graphs so the labels remain readable.
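
The three hand-indexed calls above could also be written as a loop over chunks. This is a minimal sketch under assumptions: `appear_toy` is a made-up stand-in for appear_frq with 77 hypothetical team codes, and the chunk size of 26 simply mirrors the split used above.

```r
# Toy stand-in for table(games$tmID): 77 hypothetical team codes with made-up counts.
team_ids <- sprintf("T%02d", 1:77)
appear_toy <- setNames(rep(1:7, length.out = 77), team_ids)

# Assign each position to a chunk of at most chunk_size bars.
chunk_size <- 26
chunk_id <- ceiling(seq_along(appear_toy) / chunk_size)
chunks <- split(appear_toy, chunk_id)

# One bar plot per chunk; las = 2 rotates the team labels as in the plots above.
for (chunk in chunks) {
  barplot(chunk, las = 2, xlab = "Teams", ylab = "Appearances",
          main = "# of Post-Season Appearances")
}
```

Computing the chunk boundaries from the length of the table means the code keeps working if the number of teams changes.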

Statistical Tests

First, I computed some descriptive statistics for goals against, to give an idea of why the raw post-season totals are not to be trusted until adjusted for the shorter post-seasons of earlier eras.

  mean(games$GA)
## [1] 25.60194
  sd(games$GA)
## [1] 15.45656

The standard deviation of about 15.5, against a mean of about 25.6, shows how widely goals against vary in this unadjusted state.

Next, I tested for a correlation between losses and goals against, hypothesizing that teams allowing more goals would also record more losses.

cor.test(games$GA,games$L)
## 
##  Pearson's product-moment correlation
## 
## data:  games$GA and games$L
## t = 46.348, df = 925, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8155840 0.8544577
## sample estimates:
##       cor 
## 0.8360672

The data support the hypothesis. The correlation is strong (r = 0.84, 95% CI 0.82 to 0.85) and highly significant (p < 2.2e-16).

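But does this relationship hold for the modern NHL? I grew curious and ran the same test for the modern era (1967 onward, coded as year > 1966), followed by the premodern era (year < 1966). Note that rows with year equal to 1966 fall into neither subset; judging by the degrees of freedom, that is 4 of the 927 rows.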

cor.test(games$L[games$year>1966],games$GA[games$year>1966])
## 
##  Pearson's product-moment correlation
## 
## data:  games$L[games$year > 1966] and games$GA[games$year > 1966]
## t = 34.192, df = 679, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7660003 0.8214046
## sample estimates:
##       cor 
## 0.7953578
cor.test(games$L[games$year<1966],games$GA[games$year<1966])
## 
##  Pearson's product-moment correlation
## 
## data:  games$L[games$year < 1966] and games$GA[games$year < 1966]
## t = 24.456, df = 240, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8043517 0.8774046
## sample estimates:
##       cor 
## 0.8447693
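
The two era correlations reported above can be compared with Fisher's z-transformation. The sketch below uses only the printed estimates and degrees of freedom; it assumes the two subsets are independent samples, which is an approximation here because many teams appear in both eras.

```r
# Correlations and sample sizes as reported above (n = df + 2 for cor.test).
r_modern <- 0.7953578
n_modern <- 679 + 2
r_premodern <- 0.8447693
n_premodern <- 240 + 2

# Fisher's z-transformation makes the sampling distribution of r roughly normal.
z_modern <- atanh(r_modern)
z_premodern <- atanh(r_premodern)

# Standard error of the difference between two independent transformed correlations.
se_diff <- sqrt(1 / (n_modern - 3) + 1 / (n_premodern - 3))

# Two-sided z-test for the difference (assumes independent samples).
z_stat <- (z_modern - z_premodern) / se_diff
p_value <- 2 * pnorm(-abs(z_stat))
print(c(z = z_stat, p = p_value))
```

A negative z here means the modern correlation is the smaller of the two; the p-value indicates whether the gap is larger than sampling noise alone would suggest.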

The correlation is slightly weaker in the modern era (r = 0.80) than in the premodern era (r = 0.84). I then performed a t-test on goals against for the two periods, dividing the number of goals by the number of games to control for the longer post-season of the modern period.

t.test((games$GA[games$year>1966]/games$G[games$year>1966]),(games$GA[games$year<1966]/games$G[games$year<1966]))
## 
##  Welch Two Sample t-test
## 
## data:  (games$GA[games$year > 1966]/games$G[games$year > 1966]) and (games$GA[games$year < 1966]/games$G[games$year < 1966])
## t = 11.587, df = 399.96, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.7207753 1.0153335
## sample estimates:
## mean of x mean of y 
##  3.285313  2.417259
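
The same comparison can be set up with an explicit era column, so that every row belongs to exactly one group. This is a sketch on made-up data: the `toy` data frame, its values, and the choice to place 1966 in the premodern group are all assumptions for illustration, not part of the analysis above.

```r
# Toy data only: made-up rows standing in for games.csv (columns year, G, GA).
toy <- data.frame(
  year = c(1950, 1960, 1965, 1966, 1967, 1975, 1990, 2005),
  G    = c(5, 7, 6, 6, 12, 15, 20, 18),
  GA   = c(11, 16, 15, 14, 38, 52, 66, 55)
)

# A single cutoff assigns every season to exactly one era, so no year is dropped.
# Assumption: the modern era starts at year 1967 in this coding.
toy$era <- ifelse(toy$year >= 1967, "modern", "premodern")

# Goals against per game, computed once per row.
toy$ga_per_game <- toy$GA / toy$G

# Welch two-sample t-test via the formula interface (unequal variances is the default).
result <- t.test(ga_per_game ~ era, data = toy)
print(table(toy$era))
print(result$estimate)
```

Storing the rate and the era as columns avoids repeating the subsetting expression four times and makes the cutoff easy to change in one place.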

There is a significant difference in mean goals against per game: about 3.29 in the modern era versus 2.42 in the premodern era, a gap of roughly 0.87 goals per game (95% CI 0.72 to 1.02). Because the comparison is per game, this difference cannot be attributed solely to more games being played in the modern post-season. As a continuation of this project, I would like to look at the data set’s player data and track changes in player scoring over time to see if there are trends based on eras and major rule changes.