DATA 607 Final Project

Purpose

Advanced Statistics are making a major impact in the National Hockey League. This is a professional sport where the amount of money a team can spend on average (Salary Cap Hit) is strictly limited. Therefore, decisions on how much, and how long, to pay players are crucial to a team’s success. Typically, these contract negotians include some type of ‘comparables’ where the performance of other similar players is used to set a market value. The goal for this assignment is as follows:

  1. Determine the metrics which best represent player performance as it relates to salary cap.

  2. Create a prediction function to estimate a player’s value based on the aforementioned metrics.

  3. Use the estimated player values to attempt a prediction on team success.

Setup

Required Data Packages

suppressWarnings(suppressMessages(library(XML)))
suppressWarnings(suppressMessages(library(httr)))
suppressWarnings(suppressMessages(library(rvest)))
suppressWarnings(suppressMessages(library(stringr)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(tidyr)))
suppressWarnings(suppressMessages(library(data.table)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(ggthemes)))
suppressWarnings(suppressMessages(library(knitr)))
suppressWarnings(suppressMessages(library(kableExtra)))

Necesary Functions

This function is needed to load all the tables of multi-page html files.

fill_df <- function(pages, data_df, url) {
  for(i in 2:pages){
    tmp_url <- paste0(url,i)
    players_html <- read_html(tmp_url)
    tmp_df <- as.data.frame(players_html %>% html_nodes(xpath = "//*/table[@id='brwt']") %>% html_table())
    data_df <- rbind(data_df, tmp_df)
  }
  return(data_df)
}

This function formats values to millions.

formattermillion <- function(x){ 
    x/1000000
}

Load html files, convert date to data frames and clean

#Even Strength
nst_on_ice_doc_2015 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=5v5&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2015 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=5v5&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
players_html_2015 <- read_html("https://www.capfriendly.com/browse/active/2015/caphit/all/all/all/desc/1")
nst_team_doc_2015 <- read_html("https://www.naturalstattrick.com/teamtable.php?season=20142015&stype=2&sit=5v5&score=all&rate=n&vs=all&loc=B&gpf=82&fd=2014-10-08&td=2015-04-11")

nst_on_ice_doc_2016 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=5v5&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2016 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=5v5&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
players_html_2016 <- read_html("https://www.capfriendly.com/browse/active/2016/caphit/all/all/all/desc/1")
nst_team_doc_2016 <- read_html("https://www.naturalstattrick.com/teamtable.php?season=20152016&stype=2&sit=5v5&score=all&rate=n&vs=all&loc=B&gpf=82&fd=2015-10-07&td=2016-04-10")

nst_on_ice_doc_2017 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=5v5&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2017 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=5v5&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
players_html_2017 <- read_html("https://www.capfriendly.com/browse/active/2017/caphit/all/all/all/desc/1")
nst_team_doc_2017 <- read_html("https://www.naturalstattrick.com/teamtable.php?season=20162017&stype=2&sit=5v5&score=all&rate=n&vs=all&loc=B&gpf=82&fd=2016-10-12&td=2017-04-09")

nst_on_ice_doc_2018 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=5v5&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2018 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=5v5&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
players_html_2018 <- read_html("https://www.capfriendly.com/browse/active/2018/caphit/all/all/all/desc/1")
nst_team_doc_2018 <- read_html("https://www.naturalstattrick.com/teamtable.php?season=20172018&stype=2&sit=5v5&score=all&rate=n&vs=all&loc=B&gpf=82&fd=2016-10-12&td=2017-04-09")

#PowerPlay
nst_on_ice_doc_2015_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=pp&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2015_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=pp&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")

nst_on_ice_doc_2016_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=pp&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2016_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=pp&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")

nst_on_ice_doc_2017_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=pp&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2017_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=pp&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")

nst_on_ice_doc_2018_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=pp&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2018_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=pp&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")

#PenaltyKill
nst_on_ice_doc_2015_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=pk&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2015_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=pk&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")

nst_on_ice_doc_2016_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=pk&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2016_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=pk&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")

nst_on_ice_doc_2017_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=pk&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2017_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=pk&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")

nst_on_ice_doc_2018_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=pk&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2018_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=pk&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")

team_cap_html <- read_html("https://www.capfriendly.com/")
team_cap_df <- as.data.frame(team_cap_html %>% html_nodes(xpath = "//*/table[@id='ich']") %>% html_table())

#Even Strength
nst_on_ice_players_2015 <- as.data.frame(nst_on_ice_doc_2015 %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2015)
nst_on_ice_players_2016 <- as.data.frame(nst_on_ice_doc_2016 %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2016)
nst_on_ice_players_2017 <- as.data.frame(nst_on_ice_doc_2017 %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2017)
nst_on_ice_players_2018 <- as.data.frame(nst_on_ice_doc_2018 %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2018)

#Power Play
nst_on_ice_players_2015_pp <- as.data.frame(nst_on_ice_doc_2015_pp %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2015)
colnames(nst_on_ice_players_2015_pp) <- paste(colnames(nst_on_ice_players_2015_pp), "pp", sep = "_")
nst_on_ice_players_2015_pp <- nst_on_ice_players_2015_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)

nst_on_ice_players_2016_pp <- as.data.frame(nst_on_ice_doc_2016_pp %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2016)
colnames(nst_on_ice_players_2016_pp) <- paste(colnames(nst_on_ice_players_2016_pp), "pp", sep = "_")
nst_on_ice_players_2016_pp <- nst_on_ice_players_2016_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)

nst_on_ice_players_2017_pp <- as.data.frame(nst_on_ice_doc_2017_pp %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2017)
colnames(nst_on_ice_players_2017_pp) <- paste(colnames(nst_on_ice_players_2017_pp), "pp", sep = "_")
nst_on_ice_players_2017_pp <- nst_on_ice_players_2017_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)

nst_on_ice_players_2018_pp <- as.data.frame(nst_on_ice_doc_2018_pp %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2018)
colnames(nst_on_ice_players_2018_pp) <- paste(colnames(nst_on_ice_players_2018_pp), "pp", sep = "_")
nst_on_ice_players_2018_pp <- nst_on_ice_players_2018_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)

#Penalty Kill
nst_on_ice_players_2015_pk <- as.data.frame(nst_on_ice_doc_2015_pk %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2015)
colnames(nst_on_ice_players_2015_pk) <- paste(colnames(nst_on_ice_players_2015_pk), "pk", sep = "_")
nst_on_ice_players_2015_pk <- nst_on_ice_players_2015_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)

nst_on_ice_players_2016_pk <- as.data.frame(nst_on_ice_doc_2016_pk %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2016)
colnames(nst_on_ice_players_2016_pk) <- paste(colnames(nst_on_ice_players_2016_pk), "pk", sep = "_")
nst_on_ice_players_2016_pk <- nst_on_ice_players_2016_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)

nst_on_ice_players_2017_pk <- as.data.frame(nst_on_ice_doc_2017_pk %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2017)
colnames(nst_on_ice_players_2017_pk) <- paste(colnames(nst_on_ice_players_2017_pk), "pk", sep = "_")
nst_on_ice_players_2017_pk <- nst_on_ice_players_2017_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)

nst_on_ice_players_2018_pk <- as.data.frame(nst_on_ice_doc_2018_pk %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2018)
colnames(nst_on_ice_players_2018_pk) <- paste(colnames(nst_on_ice_players_2018_pk), "pk", sep = "_")
nst_on_ice_players_2018_pk <- nst_on_ice_players_2018_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)

#Even Strength
nst_individual_players_2015 <- as.data.frame(nst_individual_doc_2015 %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
nst_individual_players_2016 <- as.data.frame(nst_individual_doc_2016 %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
nst_individual_players_2017 <- as.data.frame(nst_individual_doc_2017 %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
nst_individual_players_2018 <- as.data.frame(nst_individual_doc_2018 %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))

#Power Play
nst_individual_players_2015_pp <- as.data.frame(nst_individual_doc_2015_pp %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2015_pp) <- paste(colnames(nst_individual_players_2015_pp), "pp", sep = "_")
nst_individual_players_2015_pp <- nst_individual_players_2015_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)

nst_individual_players_2016_pp <- as.data.frame(nst_individual_doc_2016_pp %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2016_pp) <- paste(colnames(nst_individual_players_2016_pp), "pp", sep = "_")
nst_individual_players_2016_pp <- nst_individual_players_2016_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)

nst_individual_players_2017_pp <- as.data.frame(nst_individual_doc_2017_pp %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2017_pp) <- paste(colnames(nst_individual_players_2017_pp), "pp", sep = "_")
nst_individual_players_2017_pp <- nst_individual_players_2017_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)

nst_individual_players_2018_pp <- as.data.frame(nst_individual_doc_2018_pp %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2018_pp) <- paste(colnames(nst_individual_players_2018_pp), "pp", sep = "_")
nst_individual_players_2018_pp <- nst_individual_players_2018_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)

#Penalty Kill
nst_individual_players_2015_pk <- as.data.frame(nst_individual_doc_2015_pk %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2015_pk) <- paste(colnames(nst_individual_players_2015_pk), "pk", sep = "_")
nst_individual_players_2015_pk <- nst_individual_players_2015_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)

nst_individual_players_2016_pk <- as.data.frame(nst_individual_doc_2016_pk %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2016_pk) <- paste(colnames(nst_individual_players_2016_pk), "pk", sep = "_")
nst_individual_players_2016_pk <- nst_individual_players_2016_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)

nst_individual_players_2017_pk <- as.data.frame(nst_individual_doc_2017_pk %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2017_pk) <- paste(colnames(nst_individual_players_2017_pk), "pk", sep = "_")
nst_individual_players_2017_pk <- nst_individual_players_2017_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)

nst_individual_players_2018_pk <- as.data.frame(nst_individual_doc_2018_pk %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2018_pk) <- paste(colnames(nst_individual_players_2018_pk), "pk", sep = "_")
nst_individual_players_2018_pk <- nst_individual_players_2018_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)

last_page_2015 <- as.numeric(str_match_all(players_html_2015 %>% html_nodes(xpath = "//*/div[@class='r']") %>% html_node('span') %>% html_text(), "of\\s+(\\d+)")[[1]][2])
players_cap_raw_df_2015 <- as.data.frame(players_html_2015 %>% html_nodes(xpath = "//*/table[@id='brwt']") %>% html_table())
players_cap_raw_df_2015 <- fill_df(last_page_2015, players_cap_raw_df_2015, "https://www.capfriendly.com/browse/active/2015/caphit/all/all/all/desc/")

last_page_2016 <- as.numeric(str_match_all(players_html_2016 %>% html_nodes(xpath = "//*/div[@class='r']") %>% html_node('span') %>% html_text(), "of\\s+(\\d+)")[[1]][2])
players_cap_raw_df_2016 <- as.data.frame(players_html_2016 %>% html_nodes(xpath = "//*/table[@id='brwt']") %>% html_table())
players_cap_raw_df_2016 <- fill_df(last_page_2016, players_cap_raw_df_2016, "https://www.capfriendly.com/browse/active/2017/caphit/all/all/all/desc/")

last_page_2017 <- as.numeric(str_match_all(players_html_2017 %>% html_nodes(xpath = "//*/div[@class='r']") %>% html_node('span') %>% html_text(), "of\\s+(\\d+)")[[1]][2])
players_cap_raw_df_2017 <- as.data.frame(players_html_2017 %>% html_nodes(xpath = "//*/table[@id='brwt']") %>% html_table())
players_cap_raw_df_2017 <- fill_df(last_page_2017, players_cap_raw_df_2017, "https://www.capfriendly.com/browse/active/2017/caphit/all/all/all/desc/")

last_page_2018 <- as.numeric(str_match_all(players_html_2018 %>% html_nodes(xpath = "//*/div[@class='r']") %>% html_node('span') %>% html_text(), "of\\s+(\\d+)")[[1]][2])
players_cap_raw_df_2018 <- as.data.frame(players_html_2018 %>% html_nodes(xpath = "//*/table[@id='brwt']") %>% html_table())
players_cap_raw_df_2018 <- fill_df(last_page_2018, players_cap_raw_df_2018, "https://www.capfriendly.com/browse/active/2018/caphit/all/all/all/desc/")

nst_team_2015_df <- as.data.frame(nst_team_doc_2015 %>% html_nodes(xpath = "//*/table[@id='teams']") %>% html_table())
nst_team_2015_df$year <- 2015
nst_team_2016_df <- as.data.frame(nst_team_doc_2016 %>% html_nodes(xpath = "//*/table[@id='teams']") %>% html_table())
nst_team_2016_df$year <- 2016
nst_team_2017_df <- as.data.frame(nst_team_doc_2017 %>% html_nodes(xpath = "//*/table[@id='teams']") %>% html_table())
nst_team_2017_df$year <- 2017
nst_team_2018_df <- as.data.frame(nst_team_doc_2018 %>% html_nodes(xpath = "//*/table[@id='teams']") %>% html_table())
nst_team_2018_df$year <- 2018
#Be kind, rewind.... and cache web scrapes.
load(url("https://github.com/john-grando/data607-finalproject/raw/master/html_files.RData"))
#remove duplicate columns
drop <- c("","Position","TOI", "GP", "Var.1")
nst_raw_players_2015 <- left_join(nst_on_ice_players_2015[, !names(nst_on_ice_players_2015) %in% c("Var.1")], nst_individual_players_2015[, !names(nst_individual_players_2015) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2015_pp[, !names(nst_on_ice_players_2015_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2015_pp[, !names(nst_individual_players_2015_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2015_pk[, !names(nst_on_ice_players_2015_pk) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2015_pk[, !names(nst_individual_players_2015_pk) %in% drop], by = c("Player" = "Player","Team" = "Team"))

nst_raw_players_2016 <- left_join(nst_on_ice_players_2016[, !names(nst_on_ice_players_2016) %in% c("Var.1")], nst_individual_players_2016[, !names(nst_individual_players_2016) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2016_pp[, !names(nst_on_ice_players_2016_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2016_pp[, !names(nst_individual_players_2016_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2016_pk[, !names(nst_on_ice_players_2016_pk) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2016_pk[, !names(nst_individual_players_2016_pk) %in% drop], by = c("Player" = "Player","Team" = "Team"))

nst_raw_players_2017 <- left_join(nst_on_ice_players_2017[, !names(nst_on_ice_players_2017) %in% c("Var.1")], nst_individual_players_2017[, !names(nst_individual_players_2017) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2017_pp[, !names(nst_on_ice_players_2017_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2017_pp[, !names(nst_individual_players_2017_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2017_pk[, !names(nst_on_ice_players_2017_pk) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2017_pk[, !names(nst_individual_players_2017_pk) %in% drop], by = c("Player" = "Player","Team" = "Team"))

nst_raw_players_2018 <- left_join(nst_on_ice_players_2018[, !names(nst_on_ice_players_2018) %in% c("Var.1")], nst_individual_players_2018[, !names(nst_individual_players_2018) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2018_pp[, !names(nst_on_ice_players_2018_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2018_pp[, !names(nst_individual_players_2018_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2018_pk[, !names(nst_on_ice_players_2018_pk) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2018_pk[, !names(nst_individual_players_2018_pk) %in% drop], by = c("Player" = "Player","Team" = "Team"))

nst_players_2015 <- nst_raw_players_2015 %>% mutate(Player = as.character(Player)) %>% mutate(Player = gsub("[^[:alpha:]]","_",Player))
keep <- c("PLAYER","EXPIRY","CAP.HIT..")
nst_players_2016 <- nst_raw_players_2016 %>% mutate(Player = as.character(Player)) %>% mutate(Player = gsub("[^[:alpha:]]","_",Player))
keep <- c("PLAYER","EXPIRY","CAP.HIT..")
nst_players_2017 <- nst_raw_players_2017 %>% mutate(Player = as.character(Player)) %>% mutate(Player = gsub("[^[:alpha:]]","_",Player))
nst_players_2018 <- nst_raw_players_2018 %>% mutate(Player = as.character(Player)) %>% mutate(Player = gsub("[^[:alpha:]]","_",Player))
keep <- c("PLAYER","EXPIRY","CAP.HIT..")

players_cap_df_2015 <- players_cap_raw_df_2015[names(players_cap_raw_df_2015) %in% keep] %>% mutate(PLAYER = as.character(trimws(sapply(PLAYER, function(x){str_match_all(x, "\\.(.*)")[[1]][2]})))) %>% mutate(PLAYER = gsub("[^[:alpha:]]","_",PLAYER)) %>% mutate(CAP.HIT.. = as.numeric(gsub("[^0-9]","",CAP.HIT..)))
players_cap_df_2016 <- players_cap_raw_df_2016[names(players_cap_raw_df_2016) %in% keep] %>% mutate(PLAYER = as.character(trimws(sapply(PLAYER, function(x){str_match_all(x, "\\.(.*)")[[1]][2]})))) %>% mutate(PLAYER = gsub("[^[:alpha:]]","_",PLAYER)) %>% mutate(CAP.HIT.. = as.numeric(gsub("[^0-9]","",CAP.HIT..)))
players_cap_df_2017 <- players_cap_raw_df_2017[names(players_cap_raw_df_2017) %in% keep] %>% mutate(PLAYER = as.character(trimws(sapply(PLAYER, function(x){str_match_all(x, "\\.(.*)")[[1]][2]})))) %>% mutate(PLAYER = gsub("[^[:alpha:]]","_",PLAYER)) %>% mutate(CAP.HIT.. = as.numeric(gsub("[^0-9]","",CAP.HIT..)))
players_cap_df_2018 <- players_cap_raw_df_2018[names(players_cap_raw_df_2018) %in% keep] %>% mutate(PLAYER = as.character(trimws(sapply(PLAYER, function(x){str_match_all(x, "\\.(.*)")[[1]][2]})))) %>% mutate(PLAYER = gsub("[^[:alpha:]]","_",PLAYER)) %>% mutate(CAP.HIT.. = as.numeric(gsub("[^0-9]","",CAP.HIT..)))

setnames(players_cap_df_2015, old=c("PLAYER"), new=c("Player"))
setnames(players_cap_df_2016, old=c("PLAYER"), new=c("Player"))
setnames(players_cap_df_2017, old=c("PLAYER"), new=c("Player"))
setnames(players_cap_df_2018, old=c("PLAYER"), new=c("Player"))

full_player_df_2015 <- unique(inner_join(nst_players_2015, players_cap_df_2015, by = "Player"))
full_player_df_2016 <- unique(inner_join(nst_players_2016, players_cap_df_2016, by = "Player"))
full_player_df_2017 <- unique(inner_join(nst_players_2017, players_cap_df_2017, by = "Player"))
full_player_df_2018 <- unique(inner_join(nst_players_2018, players_cap_df_2018, by = "Player"))

full_player_df <- rbind(full_player_df_2015, full_player_df_2016, full_player_df_2017, full_player_df_2018)

Convert stats to per-minute-played metrics

player_per_game_df <- full_player_df[,c("CF","CA","FF","FA","SF","SA","GF","GA","SCF","SCA","HDCF","HDCA","HDGF","HDGA","MDCF","MDCA", "MDGF","MDGA","LDCF","LDCA","LDGF","LDGA","Off..Zone.Faceoffs","Neu..Zone.Faceoffs","Def..Zone.Faceoffs","Goals","Total.Assists", "First.Assists","Second.Assists","Total.Points","Shots","iCF","iSCF","iHDCF", "Rebounds.Created", "PIM", "Total.Penalties", "Minor", "Major", "Misconduct", "Penalties.Drawn", "Giveaways", "Takeaways", "Hits", "Hits.Taken", "Shots.Blocked", "CF_pp", "CA_pp", "FF_pp", "FA_pp", "SF_pp", "SA_pp", "GF_pp", "GA_pp", "SCF_pp", "SCA_pp", "HDCF_pp", "HDCA_pp", "HDGF_pp", "HDGA_pp", "MDCF_pp", "MDCA_pp", "MDGF_pp", "MDGA_pp", "LDCF_pp", "LDCA_pp", "LDGF_pp", "LDGA_pp", "Off..Zone.Faceoffs_pp","Neu..Zone.Faceoffs_pp","Def..Zone.Faceoffs_pp","Goals_pp","Total.Assists_pp", "First.Assists_pp", "Second.Assists_pp", "Total.Points_pp", "Shots_pp", "iCF_pp","iSCF_pp", "iHDCF_pp", "Rebounds.Created_pp", "PIM_pp", "Total.Penalties_pp", "Minor_pp", "Major_pp", "Misconduct_pp", "Penalties.Drawn_pp", "Giveaways_pp", "Takeaways_pp", "Hits_pp", "Hits.Taken_pp", "Shots.Blocked_pp", "CF_pk", "CA_pk", "FF_pk", "FA_pk", "SF_pk", "SA_pk", "GF_pk", "GA_pk", "SCF_pk", "SCA_pk", "HDCF_pk", "HDCA_pk", "HDGF_pk", "HDGA_pk", "MDCF_pk", "MDCA_pk", "MDGF_pk", "MDGA_pk", "LDCF_pk","LDCA_pk", "LDGF_pk","LDGA_pk", "Off..Zone.Faceoffs_pk", "Neu..Zone.Faceoffs_pk", "Def..Zone.Faceoffs_pk","Goals_pk","Total.Assists_pk", "First.Assists_pk", "Second.Assists_pk", "Total.Points_pk", "Shots_pk", "iCF_pk","iSCF_pk", "iHDCF_pk", "Rebounds.Created_pk", "PIM_pk", "Total.Penalties_pk", "Minor_pk", "Major_pk", "Misconduct_pk", "Penalties.Drawn_pk", "Giveaways_pk", "Takeaways_pk", "Hits_pk", "Hits.Taken_pk", "Shots.Blocked_pk")] / full_player_df[,5]
player_per_game_df <- cbind(full_player_df[,1:4],player_per_game_df, full_player_df[,c("TOI","EXPIRY", "CAP.HIT..", "year")])
player_per_game_df$Position <- ifelse(player_per_game_df$Position=="D","D","F")
player_per_game_df$Position <- factor(player_per_game_df$Position, levels = c("F", "D"))

Initial Data Summary

Team Cap Graph

Here, it is noted that not all teams spend to their maximum cap so it is important to analyze the individual metrics for players.

team_cap_df <- team_cap_df %>% mutate(CAP.HIT = 75000000 - as.numeric(gsub('[$,]', '', PROJECTED.CAP.SPACE)))
ggplot(team_cap_df, aes(x=reorder(TEAM, -CAP.HIT), y=CAP.HIT)) + geom_col(fill = "steelblue") + theme_economist() + labs(x = "Team", y = "Cap Hit (Millions)", title = "NHL Teams") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12)) + coord_flip() + scale_y_continuous(limits = c(0,80000000), breaks = (seq(0, 80000000, 5000000)), labels = formattermillion)

Player Salary Distributions

This is where the data gets a little challenging. There are actual a few different contract types in the NHL. First, there is an Entry Level Contract (ELC) where a player signs for a relatively fixed amount (typically less than one million per year for three years). Next, there is a Restricted Free Agent Contract (RFA) where a player must sign with their team, and if another team were to extend an “offer sheet” to a player to have him switch teams, it would come with additional compensation to the former team (draft picks). Finally, there are Unrestricted Free Agent Contracts (UFA) where players may sign for any amount or term (with some capped restrictions). In order to try and determine a “true” player value, this analysis tried to focus on UFA contracts by basing a filter on the expiration status of a contract (RFA, UFA) and cap hit (more than one million dollars).

ufa_player_df <- player_per_game_df[player_per_game_df$EXPIRY=="UFA" & player_per_game_df$CAP.HIT..>1000000 & player_per_game_df$GP>41 & player_per_game_df$year < 2018,]
#ufa_player_df <- player_per_game_df[!duplicated(player_per_game_df[,c("Player","CAP.HIT..")]),]
f_per_game_df <- ufa_player_df[ufa_player_df$Position!="D",]
d_per_game_df <- ufa_player_df[ufa_player_df$Position=="D",]
all_median = median(player_per_game_df$CAP.HIT..)
ufa_median = median(ufa_player_df$CAP.HIT..)
#ggplot(player_per_game_df, aes(x=CF, fill=Position)) + geom_histogram(bins = 20)
ggplot(player_per_game_df, aes(x=CAP.HIT.., fill=Position)) + geom_histogram(bins=20, position = "identity", alpha = 0.2) + facet_wrap(~EXPIRY) + geom_vline(xintercept = all_median) + geom_text(aes(x=(all_median-500000), y=150, label="All player median"), angle=90, size = 5) + geom_vline(xintercept = ufa_median) + geom_text(aes(x=(ufa_median-500000), y=150, label="UFA player median"), angle=90, size = 5) + theme_economist() + labs(x = "Cap Hit", y = "Count", title = "NHL Players") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_x_continuous(limits = c(0, 11000000), breaks = (seq(0, 11000000, 1000000)), labels = formattermillion) + scale_fill_manual(values=c("mediumblue", "springgreen"))

ggplot(ufa_player_df, aes(x=CAP.HIT.., fill=Position)) + geom_histogram(bins=20, position = "identity", alpha = 0.2) + geom_vline(xintercept = ufa_median) + geom_text(aes(x=(ufa_median-300000), y=30, label="UFA player median"), angle=90, size = 5) + theme_economist() + labs(x = "Cap Hit", y = "Count", title = "UFA Players") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_x_continuous(limits = c(0, 11000000), breaks = (seq(0, 11000000, 1000000)), labels = formattermillion) + scale_fill_manual(values=c("mediumblue", "springgreen"))

Analysis

Mix up the data and separate into training and development sets

Note, since the roles of forwards and defensemen are so different, they have been separately analyzed in this study.

f_complete_player_df <- f_per_game_df[complete.cases(f_per_game_df), ]
d_complete_player_df <- d_per_game_df[complete.cases(d_per_game_df), ]
sample_size <- floor(0.8 * nrow(f_per_game_df))
set.seed(8345)
f_train_dev_ind <- sample(seq_len(nrow(f_complete_player_df)), size = sample_size)
f_train_df <-f_complete_player_df[f_train_dev_ind,]
f_dev_df <- f_complete_player_df[-f_train_dev_ind,]

sample_size <- floor(0.8 * nrow(d_per_game_df))
set.seed(3454)
d_train_dev_ind <- sample(seq_len(nrow(d_complete_player_df)), size = sample_size)
d_train_df <-d_complete_player_df[d_train_dev_ind,]
d_dev_df <- d_complete_player_df[-d_train_dev_ind,]

Perform glm regressions with step functions

f_glm <- glm(data = f_train_df, CAP.HIT.. ~ CF + CA + SCF + SCA + HDCF + HDCA + MDCF + MDCA + LDCF + LDCA + Goals + Total.Assists + First.Assists + Second.Assists + Total.Points + Shots + iCF + iSCF + iHDCF + Rebounds.Created + PIM + Total.Penalties + Minor + Major + Misconduct + Penalties.Drawn + Giveaways + Takeaways + Hits + Hits.Taken + Shots.Blocked + CF_pp + CA_pp + SCF_pp + SCA_pp + HDCF_pp + HDCA_pp + MDCF_pp + MDCA_pp + LDCF_pp + LDCA_pp + Goals_pp + Total.Assists_pp + First.Assists_pp + Second.Assists_pp + Total.Points_pp + Shots_pp + iCF_pp + iSCF_pp + iHDCF_pp + Rebounds.Created_pp + PIM_pp + Total.Penalties_pp + Minor_pp + Major_pp + Misconduct_pp + Penalties.Drawn_pp + Giveaways_pp + Takeaways_pp + Hits_pp + Hits.Taken_pp + Shots.Blocked_pp + CF_pk + CA_pk + SCF_pk + SCA_pk + HDCF_pk + HDCA_pk + MDCF_pk + MDCA_pk + LDCF_pk + LDCA_pk + Goals_pk + Total.Assists_pk + First.Assists_pk + Second.Assists_pk + Total.Points_pk + Shots_pk + iCF_pk + iSCF_pk + iHDCF_pk + Rebounds.Created_pk + PIM_pk + Total.Penalties_pk + Minor_pk + Major_pk + Misconduct_pk + Penalties.Drawn_pk + Giveaways_pk + Takeaways_pk + Hits_pk + Hits.Taken_pk + Shots.Blocked_pk, family = "Gamma" (link = 'log'))
step_f_glm <- step(f_glm, trace = FALSE)
summary(step_f_glm)
## 
## Call:
## glm(formula = CAP.HIT.. ~ CF + Goals + Total.Assists + Shots + 
##     Total.Penalties + Minor + Takeaways + Hits.Taken + CF_pp + 
##     CA_pp + SCA_pp + HDCF_pp + LDCA_pp + Goals_pp + Total.Assists_pp + 
##     Shots_pp + iSCF_pp + iHDCF_pp + SCA_pk + HDCF_pk + HDCA_pk + 
##     Total.Assists_pk + iCF_pk + iSCF_pk + Rebounds.Created_pk + 
##     Penalties.Drawn_pk, family = Gamma(link = "log"), data = f_train_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2470  -0.2638  -0.0255   0.2118   0.8513  
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          14.2087     0.1867  76.103  < 2e-16 ***
## CF                    0.4760     0.2324   2.048 0.041183 *  
## Goals               -10.0489     4.3190  -2.327 0.020449 *  
## Total.Assists         9.5873     3.3909   2.827 0.004914 ** 
## Shots                 2.2237     0.7988   2.784 0.005612 ** 
## Total.Penalties     -63.3366    13.0787  -4.843 1.79e-06 ***
## Minor                64.4153    13.5545   4.752 2.75e-06 ***
## Takeaways             2.6666     1.7598   1.515 0.130449    
## Hits.Taken           -2.3178     0.5818  -3.983 7.98e-05 ***
## CF_pp                 1.8019     0.4640   3.883 0.000119 ***
## CA_pp               -10.1331     5.9970  -1.690 0.091816 .  
## SCA_pp               14.7489     7.3924   1.995 0.046662 *  
## HDCF_pp              -5.5121     1.7483  -3.153 0.001730 ** 
## LDCA_pp              19.6599     7.6634   2.565 0.010645 *  
## Goals_pp            -16.2565     8.4995  -1.913 0.056464 .  
## Total.Assists_pp     14.7384     5.9731   2.467 0.013998 *  
## Shots_pp              9.8613     2.8874   3.415 0.000698 ***
## iSCF_pp              -9.9211     3.2312  -3.070 0.002274 ** 
## iHDCF_pp              9.0660     3.0494   2.973 0.003115 ** 
## SCA_pk               -2.0989     1.2450  -1.686 0.092538 .  
## HDCF_pk             -22.8880     7.9500  -2.879 0.004190 ** 
## HDCA_pk               5.6729     3.1166   1.820 0.069429 .  
## Total.Assists_pk     55.7194    26.9421   2.068 0.039231 *  
## iCF_pk              -12.7809     7.6090  -1.680 0.093746 .  
## iSCF_pk              19.0641    11.7231   1.626 0.104646    
## Rebounds.Created_pk  96.7542    29.8999   3.236 0.001307 ** 
## Penalties.Drawn_pk   15.0288     9.1585   1.641 0.101540    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.1137775)
## 
##     Null deviance: 119.038  on 453  degrees of freedom
## Residual deviance:  52.455  on 427  degrees of freedom
## AIC: 14106
## 
## Number of Fisher Scoring iterations: 5
d_glm <- glm(data = d_train_df, CAP.HIT.. ~ CF + CA + SCF + SCA + HDCF + HDCA + MDCF + MDCA + LDCF + LDCA + Goals + Total.Assists + First.Assists + Second.Assists + Total.Points + Shots + iCF + iSCF + iHDCF + Rebounds.Created + PIM + Total.Penalties + Minor + Major + Misconduct + Penalties.Drawn + Giveaways + Takeaways + Hits + Hits.Taken + Shots.Blocked + CF_pp + CA_pp + SCF_pp + SCA_pp + HDCF_pp + HDCA_pp + MDCF_pp + MDCA_pp + LDCF_pp + LDCA_pp + Goals_pp + Total.Assists_pp + First.Assists_pp + Second.Assists_pp + Total.Points_pp + Shots_pp + iCF_pp + iSCF_pp + iHDCF_pp + Rebounds.Created_pp + PIM_pp + Total.Penalties_pp + Minor_pp + Major_pp + Misconduct_pp + Penalties.Drawn_pp + Giveaways_pp + Takeaways_pp + Hits_pp + Hits.Taken_pp + Shots.Blocked_pp + CF_pk + CA_pk + SCF_pk + SCA_pk + HDCF_pk + HDCA_pk + MDCF_pk + MDCA_pk + LDCF_pk + LDCA_pk + Goals_pk + Total.Assists_pk + First.Assists_pk + Second.Assists_pk + Total.Points_pk + Shots_pk + iCF_pk + iSCF_pk + iHDCF_pk + Rebounds.Created_pk + PIM_pk + Total.Penalties_pk + Minor_pk + Major_pk + Misconduct_pk + Penalties.Drawn_pk + Giveaways_pk + Takeaways_pk + Hits_pk + Hits.Taken_pk + Shots.Blocked_pk, family = "Gamma" (link = 'log'))
step_d_glm <- step(d_glm, trace = FALSE)
summary(step_d_glm)
## 
## Call:
## glm(formula = CAP.HIT.. ~ Goals + Shots + iSCF + Hits.Taken + 
##     Shots.Blocked + CF_pp + SCF_pp + LDCF_pp + Shots_pp + Rebounds.Created_pp + 
##     PIM_pp + Total.Penalties_pp + Minor_pp + CA_pk + SCA_pk + 
##     LDCA_pk + Total.Assists_pk + PIM_pk + Total.Penalties_pk + 
##     Minor_pk + Giveaways_pk, family = Gamma(link = "log"), data = d_train_df)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.00038  -0.22529   0.01786   0.21782   0.72854  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           15.4079     0.1647  93.533  < 2e-16 ***
## Goals                -43.5053    13.6619  -3.184  0.00164 ** 
## Shots                  3.7504     1.8355   2.043  0.04213 *  
## iSCF                  -6.2090     2.7194  -2.283  0.02330 *  
## Hits.Taken            -2.5047     0.7526  -3.328  0.00101 ** 
## Shots.Blocked         -2.7418     1.1342  -2.417  0.01639 *  
## CF_pp                 24.4404    10.4749   2.333  0.02047 *  
## SCF_pp               -24.2866    10.8339  -2.242  0.02590 *  
## LDCF_pp              -25.2514    10.9237  -2.312  0.02165 *  
## Shots_pp              15.1178     5.0835   2.974  0.00324 ** 
## Rebounds.Created_pp  -42.5128    18.6720  -2.277  0.02369 *  
## PIM_pp                72.2512    35.0528   2.061  0.04037 *  
## Total.Penalties_pp  -459.8654   191.7612  -2.398  0.01725 *  
## Minor_pp             351.2200   128.4503   2.734  0.00672 ** 
## CA_pk                -22.5169     7.7119  -2.920  0.00384 ** 
## SCA_pk                24.6426     7.8301   3.147  0.00186 ** 
## LDCA_pk               22.1086     8.2589   2.677  0.00795 ** 
## Total.Assists_pk      94.3023    39.9053   2.363  0.01893 *  
## PIM_pk                56.7550    25.6723   2.211  0.02801 *  
## Total.Penalties_pk  -562.6875   200.4699  -2.807  0.00542 ** 
## Minor_pk             356.7556   152.0071   2.347  0.01975 *  
## Giveaways_pk          24.5560     8.9598   2.741  0.00660 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.1146516)
## 
##     Null deviance: 58.450  on 259  degrees of freedom
## Residual deviance: 32.468  on 238  degrees of freedom
## AIC: 8101
## 
## Number of Fisher Scoring iterations: 6

Regression analysis results

f_train_df$EstCapHit <- step_f_glm$fitted.values
f_train_df$EstCapHit_res <- step_f_glm$residuals
d_train_df$EstCapHit <- step_d_glm$fitted.values
d_train_df$EstCapHit_res <- step_d_glm$residuals

ggplot(f_train_df, aes(x=EstCapHit_res)) + geom_histogram(bins = 15) + theme_economist() + labs(x = "Residuals", y = "Count", title = "Forwards Residuals") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_x_continuous(limits = c(-1.6, 1.6), breaks = (seq(-1.6, 1.6, 0.2)))

qqnorm(step_f_glm$residuals)
qqline(step_f_glm$residuals)

ggplot(d_train_df, aes(x=EstCapHit_res)) + geom_histogram(bins = 15) + theme_economist() + labs(x = "Residuals", y = "Count", title = "Defense Residuals") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_x_continuous(limits = c(-1.6, 1.6), breaks = (seq(-1.6, 1.6, 0.2)))

qqnorm(step_d_glm$residuals)
qqline(step_d_glm$residuals)

Test against development data

f_dev_df$EstCapHit <- predict(step_f_glm, f_dev_df[,c("CF", "Goals", "Total.Assists", "Shots", "Total.Penalties", "Minor", "Takeaways", "Hits.Taken", "CF_pp", "CA_pp", "SCA_pp", "HDCF_pp", "LDCA_pp", "Goals_pp", "Total.Assists_pp", "Shots_pp", "iSCF_pp", "iHDCF_pp", "SCA_pk", "HDCF_pk", "HDCA_pk", "Total.Assists_pk", "iCF_pk", "iSCF_pk", "Rebounds.Created_pk", "Penalties.Drawn_pk")], type = "response")

d_dev_df$EstCapHit <- predict(step_d_glm, d_dev_df[,c("Goals", "Shots", "iSCF", "Hits.Taken", "Shots.Blocked", "CF_pp", "SCF_pp", "LDCF_pp", "Shots_pp", "Rebounds.Created_pp", "PIM_pp", "Total.Penalties_pp", "Minor_pp", "CA_pk", "SCA_pk", "LDCA_pk", "Total.Assists_pk", "PIM_pk", "Total.Penalties_pk", "Minor_pk", "Giveaways_pk")], type = "response")

f_dev_summary_df <- f_dev_df %>% mutate(variation = 1 - CAP.HIT.. / EstCapHit) %>% summarize(avg = mean(variation), max = max(variation), min = min(variation), sd = sd(variation))
d_dev_summary_df <- d_dev_df %>% mutate(variation = 1 - CAP.HIT.. / EstCapHit) %>% summarize(avg = mean(variation), max = max(variation), min = min(variation), sd = sd(variation))
#variation
f_dev_summary_df
##          avg      max       min        sd
## 1 0.00282645 0.701659 -1.581916 0.4013641
d_dev_summary_df
##          avg       max       min        sd
## 1 0.02126109 0.6985328 -1.637093 0.4381454
f_train_df$type = "Training"
f_dev_df$type = "Development"
f_train_dev_df <- rbind(f_train_df[,c("type","CAP.HIT..", "EstCapHit")], f_dev_df[,c("type","CAP.HIT..", "EstCapHit")]) %>% gather(captype, "value", 2:3)
f_train_dev_df <- transform(f_train_dev_df, type=factor(type, levels=c("Training","Development")))
ggplot(f_train_dev_df, aes(x = captype, y=value)) + geom_boxplot() + facet_wrap(~type) + theme_economist() + labs(x = "Group", y = "Cap Hit (Millions)", title = "Actual Cap Hit vs. Training and Development Data") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_y_continuous(limits = c(0, 15000000), breaks = (seq(0, 15000000, 2000000)), labels = formattermillion) + scale_x_discrete(labels = c("Cap Hit", "Estimated"))

Application of Results

Apply glm predictions to all data

player_per_game_df[is.na(player_per_game_df)] <- 0
player_per_game_df$EstCapHit <- ifelse(player_per_game_df$Position!="D", predict(step_f_glm, player_per_game_df[,c("CF", "Goals", "Total.Assists", "Shots", "Total.Penalties", "Minor", "Takeaways", "Hits.Taken", "CF_pp", "CA_pp", "SCA_pp", "HDCF_pp", "LDCA_pp", "Goals_pp", "Total.Assists_pp", "Shots_pp", "iSCF_pp", "iHDCF_pp", "SCA_pk", "HDCF_pk", "HDCA_pk", "Total.Assists_pk", "iCF_pk", "iSCF_pk", "Rebounds.Created_pk", "Penalties.Drawn_pk")], type = "response"), predict(step_d_glm, player_per_game_df[,c("Goals", "Shots", "iSCF", "Hits.Taken", "Shots.Blocked", "CF_pp", "SCF_pp", "LDCF_pp", "Shots_pp", "Rebounds.Created_pp", "PIM_pp", "Total.Penalties_pp", "Minor_pp", "CA_pk", "SCA_pk", "LDCA_pk", "Total.Assists_pk", "PIM_pk", "Total.Penalties_pk", "Minor_pk", "Giveaways_pk")], type = "response"))

player_f_det_df <- player_per_game_df %>% filter(Position != "D") %>% select(Player, Team, year, CF, Goals, Total.Assists, Shots, Total.Penalties, Minor, Takeaways, Hits.Taken, CF_pp, CA_pp, SCA_pp, HDCF_pp, LDCA_pp, Goals_pp, Total.Assists_pp, Shots_pp, iSCF_pp, iHDCF_pp, SCA_pk, HDCF_pk, HDCA_pk, Total.Assists_pk, iCF_pk, iSCF_pk, Rebounds.Created_pk, Penalties.Drawn_pk, CAP.HIT.., EstCapHit)

player_d_det_df <- player_per_game_df %>% filter(Position == "D") %>% select(Player, Team, year, Goals, Shots, iSCF, Hits.Taken, Shots.Blocked, CF_pp, SCF_pp, LDCF_pp, Shots_pp, Rebounds.Created_pp, PIM_pp, Total.Penalties_pp, Minor_pp, CA_pk, SCA_pk, LDCA_pk, Total.Assists_pk, PIM_pk, Total.Penalties_pk, Minor_pk, Giveaways_pk, EstCapHit)

player_f_df <- player_per_game_df %>% filter(Position != "D") %>% filter(year < 2018) %>% filter(GP > 60) %>% select(Player, Team, year, GP, CAP.HIT.., EstCapHit) %>% rename(Cap_Hit = CAP.HIT.., Estimated_Cap_Hit = EstCapHit)

player_d_df <- player_per_game_df %>% filter(Position == "D") %>% filter(year < 2018) %>% filter(GP > 60) %>% select(Player, Team, year, GP, CAP.HIT.., EstCapHit) %>% rename(Cap_Hit = CAP.HIT.., Estimated_Cap_Hit = EstCapHit)

Higest value forwards historically

head(player_f_df %>% arrange(-Estimated_Cap_Hit), n = 20) %>% kable("html") %>% kable_styling()
Player Team year GP Cap_Hit Estimated_Cap_Hit
Claude_Giroux PHI 2016 78 8275000 9536499
Jason_Spezza DAL 2015 82 7000000 9058724
Alex_Ovechkin WSH 2017 82 9538462 8692597
Tyler_Seguin DAL 2015 71 5750000 8291754
Ryan_O_Reilly BUF 2017 72 7500000 8187712
Jason_Spezza DAL 2016 75 7500000 8016420
Tyler_Seguin DAL 2017 82 5750000 7953395
Phil_Kessel TOR 2015 82 8000000 7935799
Claude_Giroux PHI 2017 82 8275000 7836780
Alex_Ovechkin WSH 2015 81 9538462 7798531
Phil_Kessel PIT 2016 82 8000000 7572535
Joe_Thornton S.J 2016 82 6750000 7527376
Jason_Spezza DAL 2017 68 7500000 7518205
Evgeni_Malkin PIT 2017 62 9500000 7453605
Mark_Stone OTT 2016 75 3500000 7372516
Bobby_Ryan OTT 2015 78 5100000 7342898
Jamie_Benn DAL 2016 82 5250000 7328143
Anze_Kopitar L.A 2016 81 6800000 7151951
Mike_Hoffman OTT 2017 74 5187500 7081869
Tyler_Seguin DAL 2016 72 5750000 7075758

Higest value defensemen historically

head(player_d_df %>% arrange(-Estimated_Cap_Hit), n = 20) %>% kable("html") %>% kable_styling()
Player Team year GP Cap_Hit Estimated_Cap_Hit
P_K__Subban MTL 2016 68 9000000 10954811
Keith_Yandle FLA 2017 82 6350000 9352123
Dion_Phaneuf TOR 2015 70 7000000 8852881
Shea_Weber MTL 2017 78 7857143 8819301
Dougie_Hamilton BOS 2015 72 925000 7688128
John_Carlson WSH 2017 72 3966667 7613749
Kris_Letang PIT 2016 71 7250000 7353238
Alexander_Edler VAN 2015 74 5000000 7305675
Drew_Doughty L.A 2017 82 7000000 7257779
Shea_Weber NSH 2015 78 7857143 7082954
Aaron_Ekblad FLA 2017 68 925000 7072747
Oliver_Ekman_Larsson ARI 2016 75 5500000 7044832
Jake_Muzzin L.A 2017 82 4000000 6993346
Brent_Burns S.J 2016 82 5760000 6947536
Shayne_Gostisbehere PHI 2016 64 925000 6771965
Kris_Letang PIT 2015 69 7250000 6715196
Drew_Doughty L.A 2016 82 7000000 6577072
P_K__Subban MTL 2015 82 9000000 6553050
Ryan_Suter MIN 2016 82 7538462 6472063
Torey_Krug BOS 2016 81 5250000 6439972

Higest value forwards this year

player_2018_f_df <- player_per_game_df %>% filter(Position != "D") %>% filter(year == 2018) %>% filter(GP > 5) %>% select(Player, Team, year, CAP.HIT.., EstCapHit) %>% rename(Cap_Hit = CAP.HIT.., Estimated_Cap_Hit = EstCapHit)
head(player_2018_f_df %>% arrange(-Estimated_Cap_Hit), n = 20) %>% kable("html") %>% kable_styling()
Player Team year Cap_Hit Estimated_Cap_Hit
Jakub_Voracek PHI 2018 8250000 17932527
Jonathan_Toews CHI 2018 10500000 15682475
Evgeni_Malkin PIT 2018 9500000 11020279
Evgeny_Kuznetsov WSH 2018 7800000 9733922
Jonathan_Drouin MTL 2018 5500000 9508431
Joe_Thornton S.J 2018 8000000 9469489
Mike_Hoffman OTT 2018 5187500 9319404
David_Backes BOS 2018 6000000 9225687
Patrick_Kane CHI 2018 10500000 9144583
Johnny_Gaudreau CGY 2018 6750000 8738026
Patrice_Bergeron BOS 2018 6875000 8357984
Vladimir_Tarasenko STL 2018 7500000 8332791
Blake_Wheeler WPG 2018 5600000 8156201
Steven_Stamkos T.B 2018 8500000 8151298
Max_Pacioretty MTL 2018 4500000 8024670
Taylor_Hall N.J 2018 6000000 7982195
Bobby_Ryan OTT 2018 7250000 7870804
Thomas_Vanek VAN 2018 2000000 7842372
Barclay_Goodrow S.J 2018 650000 7837137
Alex_Galchenyuk MTL 2018 4900000 7772333

Higest value defensemen this year

player_2018_d_df <- player_per_game_df %>% filter(Position == "D") %>% filter(year == 2018) %>% filter(GP > 5) %>% select(Player, Team, year, CAP.HIT.., EstCapHit) %>% rename(Cap_Hit = CAP.HIT.., Estimated_Cap_Hit = EstCapHit)
head(player_2018_d_df %>% arrange(-Estimated_Cap_Hit), n = 20) %>% kable("html") %>% kable_styling()
Player Team year Cap_Hit Estimated_Cap_Hit
Dustin_Byfuglien WPG 2018 7600000 18654542
Roman_Josi NSH 2018 4000000 12868584
P_K__Subban NSH 2018 9000000 12836368
Brad_Hunt VGK 2018 650000 12238751
Duncan_Keith CHI 2018 5538462 9839100
Sami_Vatanen ANA 2018 4875000 9756144
Matt_Tennyson BUF 2018 650000 9441197
Keith_Yandle FLA 2018 6350000 9389085
Kris_Letang PIT 2018 7250000 9269856
Shea_Weber MTL 2018 7857143 8908570
Adam_Pelech NYI 2018 1600000 8905920
Ben_Hutton VAN 2018 2800000 8678523
Rasmus_Ristolainen BUF 2018 5400000 8518337
Karl_Alzner MTL 2018 4625000 8307787
Erik_Karlsson OTT 2018 6500000 8246111
Drew_Doughty L.A 2018 7000000 7940454
John_Carlson WSH 2018 3966667 7925082
Alec_Martinez L.A 2018 4000000 7835307
Colin_Miller VGK 2018 1000000 7808783
Alex_Pietrangelo STL 2018 6500000 7207172

For team analysis, limit the maximum and minimums to realistic values

player_per_game_df$EstCapHit <- sapply(player_per_game_df$EstCapHit, function(x){
  x <- ifelse(x > 15000000, 15000000, x)
})
player_per_game_df$EstCapHit <- sapply(player_per_game_df$EstCapHit, function(x){
  x <- ifelse(x < 650000, 650000, x)
})

Format Team Stats

nst_team_stats_df <- rbind(nst_team_2015_df, nst_team_2016_df, nst_team_2017_df, nst_team_2018_df)
nst_team_stats_df$points <- nst_team_stats_df$W * 2 + nst_team_stats_df$OTL

nst_team_stats_df$teamabbr <- sapply(nst_team_stats_df$Team, function(x) {
  x <- ifelse(grepl("anaheim", tolower(x)), "ANA",
       ifelse(grepl("arizona", tolower(x)), "ARI",
       ifelse(grepl("boston", tolower(x)), "BOS",
       ifelse(grepl("buffalo", tolower(x)), "BUF",
       ifelse(grepl("calgary", tolower(x)), "CGY",
       ifelse(grepl("carolina", tolower(x)), "CAR",
       ifelse(grepl("chicago", tolower(x)), "CHI",
       ifelse(grepl("colorado", tolower(x)), "COL",
       ifelse(grepl("columbus", tolower(x)), "CBJ",
       ifelse(grepl("dallas", tolower(x)), "DAL",
       ifelse(grepl("detroit", tolower(x)), "DET",
       ifelse(grepl("edmonton", tolower(x)), "EDM",
       ifelse(grepl("florida", tolower(x)), "FLA",
       ifelse(grepl("los angeles", tolower(x)), "L.A",
       ifelse(grepl("minnesota", tolower(x)), "MIN",
       ifelse(grepl("montreal", tolower(x)), "MTL",
       ifelse(grepl("nashville", tolower(x)), "NSH",
       ifelse(grepl("new jersey", tolower(x)), "N.J",
       ifelse(grepl("islanders", tolower(x)), "NYI",
       ifelse(grepl("rangers", tolower(x)), "NYR",
       ifelse(grepl("ottawa", tolower(x)), "OTT",
       ifelse(grepl("philadelphia", tolower(x)), "PHI",
       ifelse(grepl("pittsburgh", tolower(x)), "PIT",
       ifelse(grepl("san jose", tolower(x)), "S.J",
       ifelse(grepl("st louis", tolower(x)), "STL",
       ifelse(grepl("tampa bay", tolower(x)), "T.B",
       ifelse(grepl("toronto", tolower(x)), "TOR",
       ifelse(grepl("los angeles", tolower(x)), "L.A",
       ifelse(grepl("vancouver", tolower(x)), "VAN",
       ifelse(grepl("vegas", tolower(x)), "VGK",
       ifelse(grepl("washington", tolower(x)), "WSH",
       ifelse(grepl("winnipeg", tolower(x)), "WPG",
              "other"))))))))))))))))))))))))))))))))
})

player_per_game_df$CapxTOI <- player_per_game_df$CAP.HIT.. * player_per_game_df$TOI
player_per_game_df$EstCapxTOI <- player_per_game_df$EstCapHit * player_per_game_df$TOI
team_summary_df <- player_per_game_df %>% filter(GP>10) %>% filter(year<2018) %>% mutate(countcolumn = 1) %>% group_by(Team, year) %>% summarize(weighted = sum(CapxTOI), estweighted = sum(EstCapxTOI), captotal = sum(CAP.HIT..), time = sum(TOI), players = sum(countcolumn)) %>% mutate(weighted_cap = weighted / time) %>% mutate(est_weighted_cap = estweighted / time) %>% mutate(cap = captotal / sum(players))
keep <- c("teamabbr", "year","points", "PDO", "SV.", "W", "CF.", "FF.", "SF.", "LDSV.", "MDSV.", "HDSV.", "GF", "GP", "HDCF")
team_summary_df <- merge(team_summary_df, nst_team_stats_df[,names(nst_team_stats_df) %in% keep], by.x = c("Team", "year"), by.y = c("teamabbr", "year")) %>% mutate(year = factor(year))

Team Results compared to weighted player values

ggplot(team_summary_df, aes(x=est_weighted_cap, y=points, color=year)) + geom_point() + theme_economist() + labs(x = "Estimated Average Player Cap Hit (millions)", y = "Points", title = "Estimated Average Player Cap Hit Relation to Points") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_x_continuous(limits = c(3000000, 5500000), breaks = (seq(3000000, 5500000, 500000)), labels = formattermillion) + scale_y_continuous(limits = c(40, 130), breaks = (seq(40, 120, 10))) + scale_color_manual(values = c("mediumblue", "steelblue", "springgreen"))

Run glm based on weighted player values and save percentage

I have added in the varying types of save percentage (shots from low-danger, mid-danger, and high-danger areas) since this study has not included the impact of goaltending.

team_lm <- lm(data = team_summary_df, points ~ est_weighted_cap + LDSV. + MDSV. + HDSV.)
team_step_lm <- step(team_lm, trace = FALSE)
summary(team_step_lm)
## 
## Call:
## lm(formula = points ~ est_weighted_cap + LDSV. + MDSV. + HDSV., 
##     data = team_summary_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -33.815  -8.980   2.360   8.143  26.283 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -2.047e+03  4.374e+02  -4.681 1.07e-05 ***
## est_weighted_cap  1.020e-05  4.095e-06   2.491  0.01470 *  
## LDSV.             1.251e+01  4.417e+00   2.832  0.00578 ** 
## MDSV.             6.393e+00  2.094e+00   3.053  0.00303 ** 
## HDSV.             2.855e+00  9.299e-01   3.070  0.00287 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.15 on 85 degrees of freedom
## Multiple R-squared:  0.333,  Adjusted R-squared:  0.3016 
## F-statistic: 10.61 on 4 and 85 DF,  p-value: 5.089e-07

Prediction

team_prediction_df <- player_per_game_df %>% filter(GP>0) %>% filter(year==2018) %>% mutate(countcolumn = 1) %>% group_by(Team, year) %>% summarize(weighted = sum(CapxTOI), estweighted = sum(EstCapxTOI), captotal = sum(CAP.HIT..), time = sum(TOI), players = sum(countcolumn)) %>% mutate(weighted_cap = weighted / time) %>% mutate(est_weighted_cap = estweighted / time) %>% mutate(cap = captotal / sum(players))
keep <- c("teamabbr", "year","points", "PDO", "SV.", "W", "CF.", "FF.", "SF.", "LDSV.", "MDSV.", "HDSV.", "GP", "GF", "HDCF")
team_prediction_df <- merge(team_prediction_df, nst_team_stats_df[,names(nst_team_stats_df) %in% keep], by.x = c("Team", "year"), by.y = c("teamabbr", "year"))
team_prediction_df$PredictedPoints <- predict(team_step_lm, team_prediction_df[,c("est_weighted_cap", "LDSV.", "MDSV.", "HDSV.")])
team_prediction_df <- team_prediction_df %>% mutate(positionrank = rank(-points, ties.method = "average"), estpositionrank = rank(-PredictedPoints, ties.method = "average")) %>% mutate(year = factor(year))
ggplot(team_prediction_df, aes(x=est_weighted_cap, y=PredictedPoints, color=year)) + geom_point() + theme_economist() + labs(x = "Estimated Average Player Cap Hit (millions)", y = "Estimated Points", title = "Estimated 2018 Results") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "none") + scale_x_continuous(limits = c(3000000, 6500000), breaks = (seq(3000000, 6500000, 500000)), labels = formattermillion) + scale_y_continuous(limits = c(60, 130), breaks = (seq(60, 130, 10))) + scale_color_manual(values = c("mediumblue"))

colnames(team_prediction_df) <- c("Team", "year", "weighted", "estweighted", "captotal", "time", "players", "weighted_cap",  "Estimated_Cap", "cap", "GP","Wins", "CF_Perc", "FF_Perc", "SF_Perc", "GF", "HDCF","HDSV_Perc", "MDSV_Perc", "LDSV_Perc", "SV_Perc", "PDO", "Points", "Predicted_Points", "Rank", "Est_Rank")

Predicted finishing points for each team based on glm

head(team_prediction_df %>% arrange(-Predicted_Points), n = 32) %>% select(Team, Estimated_Cap, LDSV_Perc, MDSV_Perc, HDSV_Perc, PDO, Points, Rank, Est_Rank, Predicted_Points) %>% kable("html") %>% kable_styling()
Team Estimated_Cap LDSV_Perc MDSV_Perc HDSV_Perc PDO Points Rank Est_Rank Predicted_Points
WSH 4579948 99.28 96.65 87.86 1.020 35 10.0 1 109.77320
CHI 5688806 98.53 95.80 88.71 1.004 29 22.0 2 108.69379
PHI 4141515 99.09 96.61 89.96 1.017 29 22.0 3 108.66507
ANA 3968388 99.52 96.87 87.55 1.013 30 19.0 4 107.05955
L.A 4044817 98.14 98.06 90.23 1.019 41 2.0 5 105.83647
S.J 4585989 99.36 96.33 86.70 0.979 32 15.5 6 105.47810
NSH 5262799 98.67 95.65 88.39 1.005 39 4.0 7 104.22788
T.B 4447205 98.94 96.11 88.68 1.029 42 1.0 8 103.05582
STL 4737125 98.67 95.99 88.32 1.013 40 3.0 9 100.84051
WPG 5372630 98.91 94.99 87.14 1.010 38 5.0 10 100.56211
VAN 4493144 97.21 97.79 90.60 1.009 32 15.5 11 98.10664
MTL 5273781 98.86 95.82 84.42 0.989 30 19.0 12 96.46904
CAR 4508526 98.82 94.62 89.79 0.991 28 24.0 13 95.82431
CBJ 3915007 98.74 96.73 87.50 1.003 35 10.0 14 95.72130
CGY 4993286 97.49 97.27 87.88 0.990 32 15.5 15 95.61992
NYR 4228625 98.83 95.43 88.68 1.007 32 15.5 16 95.10387
N.J 4099493 98.53 96.39 86.76 1.018 36 7.0 17 90.68997
VGK 4695134 98.06 97.19 84.47 1.004 35 10.0 18 89.46206
BOS 4063974 99.35 93.77 88.68 0.999 30 19.0 19 89.31715
NYI 4185823 97.92 96.37 87.50 1.017 35 10.0 20 85.92537
BUF 4947457 98.37 95.81 83.41 0.971 18 31.0 21 84.06481
DET 4164180 98.61 95.44 85.60 0.989 27 25.0 22 82.96564
COL 4662615 97.71 97.23 83.64 0.989 26 26.5 23 82.63877
PIT 4611159 98.34 94.03 87.55 0.961 35 10.0 24 80.70054
OTT 4483309 97.74 95.98 85.24 0.985 25 28.0 25 77.76257
TOR 3240157 97.73 95.79 89.58 1.025 37 6.0 26 76.13523
FLA 4263779 98.25 94.85 85.59 0.993 26 26.5 27 75.67839
ARI 3643413 98.00 95.87 86.41 0.978 19 30.0 28 75.08618
DAL 3651222 97.95 93.72 90.55 0.995 33 13.0 29 72.61582
EDM 3492833 98.19 93.89 89.43 0.989 24 29.0 30 71.89163
MIN 3662911 97.30 95.25 89.50 0.996 29 22.0 31 71.38784

Predicted finishing points for each team based on weighted player values only

head(team_prediction_df %>% arrange(-Estimated_Cap), n = 32) %>% select(Team, Estimated_Cap, LDSV_Perc, MDSV_Perc, HDSV_Perc, PDO, Points, Rank, Est_Rank, Predicted_Points) %>% kable("html") %>% kable_styling()
Team Estimated_Cap LDSV_Perc MDSV_Perc HDSV_Perc PDO Points Rank Est_Rank Predicted_Points
CHI 5688806 98.53 95.80 88.71 1.004 29 22.0 2 108.69379
WPG 5372630 98.91 94.99 87.14 1.010 38 5.0 10 100.56211
MTL 5273781 98.86 95.82 84.42 0.989 30 19.0 12 96.46904
NSH 5262799 98.67 95.65 88.39 1.005 39 4.0 7 104.22788
CGY 4993286 97.49 97.27 87.88 0.990 32 15.5 15 95.61992
BUF 4947457 98.37 95.81 83.41 0.971 18 31.0 21 84.06481
STL 4737125 98.67 95.99 88.32 1.013 40 3.0 9 100.84051
VGK 4695134 98.06 97.19 84.47 1.004 35 10.0 18 89.46206
COL 4662615 97.71 97.23 83.64 0.989 26 26.5 23 82.63877
PIT 4611159 98.34 94.03 87.55 0.961 35 10.0 24 80.70054
S.J 4585989 99.36 96.33 86.70 0.979 32 15.5 6 105.47810
WSH 4579948 99.28 96.65 87.86 1.020 35 10.0 1 109.77320
CAR 4508526 98.82 94.62 89.79 0.991 28 24.0 13 95.82431
VAN 4493144 97.21 97.79 90.60 1.009 32 15.5 11 98.10664
OTT 4483309 97.74 95.98 85.24 0.985 25 28.0 25 77.76257
T.B 4447205 98.94 96.11 88.68 1.029 42 1.0 8 103.05582
FLA 4263779 98.25 94.85 85.59 0.993 26 26.5 27 75.67839
NYR 4228625 98.83 95.43 88.68 1.007 32 15.5 16 95.10387
NYI 4185823 97.92 96.37 87.50 1.017 35 10.0 20 85.92537
DET 4164180 98.61 95.44 85.60 0.989 27 25.0 22 82.96564
PHI 4141515 99.09 96.61 89.96 1.017 29 22.0 3 108.66507
N.J 4099493 98.53 96.39 86.76 1.018 36 7.0 17 90.68997
BOS 4063974 99.35 93.77 88.68 0.999 30 19.0 19 89.31715
L.A 4044817 98.14 98.06 90.23 1.019 41 2.0 5 105.83647
ANA 3968388 99.52 96.87 87.55 1.013 30 19.0 4 107.05955
CBJ 3915007 98.74 96.73 87.50 1.003 35 10.0 14 95.72130
MIN 3662911 97.30 95.25 89.50 0.996 29 22.0 31 71.38784
DAL 3651222 97.95 93.72 90.55 0.995 33 13.0 29 72.61582
ARI 3643413 98.00 95.87 86.41 0.978 19 30.0 28 75.08618
EDM 3492833 98.19 93.89 89.43 0.989 24 29.0 30 71.89163
TOR 3240157 97.73 95.79 89.58 1.025 37 6.0 26 76.13523

Notes

  1. Goaltending value was not calculated.

  2. This analysis does not take into account scheduling, which can greatly impact the possible number of points for each team.

  3. Relative values for advanced statistics (e.g. Corsi, Fenwick) should be pulled from other data sources and analyzed.

  4. Given recent rule changes, it appears the average number of goals scored per game has now increased from previous years.

pre_2018_scoring_df <- team_summary_df %>% group_by(year) %>% summarize(GPT = sum(GP), GFT = sum(GF)) %>% mutate(GPG = GFT / GPT)
cur_2018_scoring_df <- team_prediction_df %>% mutate(year = as.numeric(as.character(year))) %>% summarize(year = max(year), GPT = sum(GP), GFT = sum(GF)) %>% mutate(GPG = GFT / GPT)
rbind(cur_2018_scoring_df, pre_2018_scoring_df) %>% arrange(year) %>% rename(games_played = GPT, goals_for = GFT, goals_per_game_per_team = GPG) %>% kable("html") %>% kable_styling()
year games_played goals_for goals_per_game_per_team
2015 2460 4404 1.790244
2016 2460 4248 1.726829
2017 2460 4475 1.819106
2018 876 1651 1.884703